Harish Sharma
Vivek Shrivastava
Ashish Kumar Tripathi
Lipo Wang
Editors
Communication and Intelligent Systems
Proceedings of ICCIS 2023, Volume 2
Lecture Notes in Networks and Systems
Volume 968
Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland
Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of Campinas—
UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Türkiye
Derong Liu, Department of Electrical and Computer Engineering, University
of Illinois at Chicago, Chicago, USA
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of
Alberta, Alberta, Canada
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems, and others. Of particular value to both
the contributors and the readership are the short publication timeframe and
the world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.
For proposals from Asia please contact Aninda Bose (aninda.bose@springer.com).
Harish Sharma · Vivek Shrivastava ·
Ashish Kumar Tripathi · Lipo Wang
Editors
Communication
and Intelligent Systems
Proceedings of ICCIS 2023, Volume 2
Editors

Harish Sharma
Department of Computer Science and Engineering
Rajasthan Technical University
Kota, Rajasthan, India

Vivek Shrivastava
Department of Electrical and Electronics Engineering
National Institute of Technology Uttarakhand
Srinagar, Uttarakhand, India

Ashish Kumar Tripathi
Department of Computer Science
Malaviya National Institute of Technology
Jaipur, Rajasthan, India

Lipo Wang
School of Electrical and Electronic Engineering
Nanyang Technological University
Singapore, Singapore
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
This book contains outstanding research papers presented at the 5th International Conference on Communication and Intelligent Systems (ICCIS 2023), held on 16–17 December 2023 at Malaviya National Institute of Technology Jaipur, India, under the technical sponsorship of the Soft Computing Research Society, India. The conference was conceived as a platform for disseminating and exchanging the ideas, concepts, and results of researchers from academia and industry, and for developing a comprehensive understanding of the challenges of advances in intelligence from a computational viewpoint. The book will help strengthen congenial networking between academia and industry. It presents novel contributions to communication and intelligent systems and is reference material for advanced research. The topics covered are: intelligent systems, algorithms and applications; smart data analytics and computing; informatics and applications; and communication and control systems.
ICCIS 2023 received 750 research submissions from distinguished participants at home and abroad. After a very stringent peer-review process, only 102 high-quality papers were finally accepted for presentation and publication.
This book presents the second volume of 34 research papers related to commu-
nication and intelligent systems and serves as reference material for advanced
research.
Editors and Contributors
Dr. Ashish Kumar Tripathi (Senior Member, IEEE) received his M.Tech. and Ph.D.
degrees in computer science and engineering from the Department of Computer
Science and Engineering, Delhi Technological University, Delhi, India, in 2013 and
2019, respectively. He is currently working as Assistant Professor at the Department of Computer Science and Engineering, Malaviya National Institute of Technology (MNIT), Jaipur, India. His research interests include big data analytics, social
media analytics, soft computing, image analysis, and natural language processing.
Dr. Tripathi has published several papers in international journals and conferences
including IEEE transactions. He is Active Reviewer for several journals of repute.
Dr. Lipo Wang received the bachelor’s degree from National University of Defense
Technology (China) and Ph.D. from Louisiana State University (USA). He is
presently on the faculty of the School of Electrical and Electronic Engineering,
Nanyang Technological University, Singapore. His research interest is artificial intel-
ligence with applications to image/video processing, biomedical engineering, and
data mining. He has 330+ publications, a US patent in neural networks, and a patent
in systems. He has co-authored 2 monographs and (co-)edited 15 books. He has
8000+ Google Scholar citations, with H-index 43. He was Keynote Speaker for 36
international conferences. He is/was Associate Editor/Editorial Board Member of
30 international journals, including 4 IEEE Transactions, and Guest Editor for 10
journal special issues. He was Member of the Board of Governors of the International
Neural Network Society, IEEE Computational Intelligence Society (CIS), and the
IEEE Biometrics Council. He served as CIS Vice President for Technical Activi-
ties and Chair of Emergent Technologies Technical Committee, as well as Chair of
Education Committee of the IEEE Engineering in Medicine and Biology Society
(EMBS). He was President of the Asia-Pacific Neural Network Assembly (APNNA)
and received the APNNA Excellent Service Award. He was Founding Chair of both
the EMBS Singapore Chapter and CIS Singapore Chapter. He serves/served as Chair/
Committee Members of over 200 international conferences.
Contributors
M. M. Jani
Information Technology, Dr. Subhash University, Junagadh, Gujarat 362001, India
e-mail: mayur.jani@dsuni.ac.in
S. R. Panchal
Electronics and Communication Engineering, Dr. Subhash University, Junagadh, Gujarat 362001,
India
e-mail: sandip.panchal@dsuni.ac.in
H. H. Patel
Computer Science and Engineering, Dr. Subhash University, Junagadh, Gujarat 362001, India
e-mail: hemant.patel@dsuni.ac.in
A. Raiyani (B)
Information Management, Nirma University, Ahmedabad, Gujarat 382421, India
e-mail: ashwin.raiyani@nirmauni.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_1
2 M. M. Jani et al.
1 Introduction
An automatic speech recognition (ASR) system translates spoken words into text. Speech recognition has become an important tool in many industries, including health care, finance, education, and customer service. It enables more natural and intuitive interaction between people, computers, and other devices, as keyboards and other input devices are not required. The necessity for multilingual speech recognition has grown significantly as the globe becomes increasingly interconnected. With this technology, people from various linguistic backgrounds may communicate in their native tongues with one another, and with computers, thereby lowering communication barriers.
Weaknesses: Since a single model is used, the system might not capture language-
specific nuances effectively. Performance might vary across languages, with some
languages being more accurately recognized than others.
Suitability: Acoustic model sharing is suitable when there is limited training data
for individual languages or when the focus is on low-resource languages.
Technique: This approach trains separate acoustic models for each target language.
Each model is specialized in recognizing speech in a specific language, capturing its
unique phonetic and linguistic characteristics.
Strengths: Language-specific models tend to achieve higher accuracy as they can
focus on the intricacies of individual languages. They are effective for languages
with significant pronunciation and vocabulary differences.
Weaknesses: Training and maintaining separate models for each language require
more resources and computational power. It can be challenging to collect sufficient
training data for low-resource languages.
Suitability: Language-specific models are suitable when the goal is to achieve high
accuracy and performance for individual languages, especially for languages with
distinct phonetic features.
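The routing trade-off between the two approaches can be illustrated with a small sketch. This is a hypothetical illustration, not an API from this chapter: the AcousticModel class, the language codes, and the model inventory are invented stand-ins; only the fallback policy (prefer a language-specific model, fall back to a shared multilingual one for low-resource languages) reflects the discussion above.

```python
class AcousticModel:
    """Illustrative stand-in for an acoustic model covering a set of languages."""

    def __init__(self, languages):
        self.languages = set(languages)

    def supports(self, lang):
        return lang in self.languages


def pick_model(lang, specific_models, shared_model):
    """Prefer a language-specific model when one exists; otherwise fall back
    to the shared multilingual model (useful for low-resource languages)."""
    model = specific_models.get(lang)
    if model is not None and model.supports(lang):
        return model, "language-specific"
    return shared_model, "shared"


# Hypothetical inventory: specific models only for English and Hindi,
# a shared model covering four languages.
shared = AcousticModel(["en", "hi", "gu", "ta"])
specific = {"en": AcousticModel(["en"]), "hi": AcousticModel(["hi"])}

print(pick_model("hi", specific, shared)[1])  # language-specific
print(pick_model("gu", specific, shared)[1])  # shared
```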
media, market research, legal services, and academia, where transcription of audio
recordings is required. It eliminates the need for manual transcription, saving time
and effort [3, 7, 8, 13, 34–36].
Education: Smart attendance systems that use voice recognition technology are a more efficient method that saves teachers time. Students can use the technology to help them prepare for or recall lecture topics, and several technologies have been employed to create lecture notes easily [5].
6 Future Directions
Multilingual speech-to-text systems will become more widely available and acces-
sible, allowing more individuals to utilize and profit from them. The accuracy of
multilingual voice recognition systems will continue to increase as deep learning
algorithms are used and more training data becomes available. Integrating STT and
text-to-speech (TTS) capabilities with other technologies, such as machine transla-
tion and natural language processing, will enable more comprehensive and seam-
less communication across several languages. Overall, the future of multilingual
voice-to-text systems is predicted to entail the development of increasingly accu-
rate, efficient, and accessible systems capable of facilitating successful communica-
tion across diverse cultures and languages. These systems will become increasingly
significant in improving information accessibility, breaking down language barriers,
and improving the precision and effectiveness of language learning and translation
technologies.
7 Conclusions
Due to the scarcity of research in this field, the study’s goal is to complete an
evaluation of multilingual speech-to-text (STT) systems. The study’s primary goals
are to explore the difficulties and constraints encountered in creating multilingual
STT systems. These may involve, among other things, challenges with linguistic
variety, code-switching, speaker variability, vocabulary variances, and real-time
processing. The study aims to find different tools and approaches for improving the
efficacy and accuracy of multilingual STT systems. This might include investigating
language modeling, acoustic modeling, speaker diarization, vocabulary adaptation,
and domain-specific adaptation approaches. The study will also investigate methods for enhancing recognition speed in multilingual STT systems, particularly when dealing with mixed-language speech.
Additionally, lowering the Word Error Rate, which evaluates the correctness of
transcriptions, will be a priority. The study seeks to shed light on, and suggest, new lines of inquiry in multilingual speech-to-text. This might include investigating new methods, data-gathering approaches, or assessment metrics, or tackling open issues with multilingual
STT. This study aims to identify current barriers to multilingual STT system devel-
opment, highlight useful resources and techniques, and offer recommendations for
future research in this field. The ultimate objective is to improve multilingual speech-
to-text technology’s effectiveness, precision, and application to satisfy the demands
of bilingual and multilingual users in various communication scenarios.
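The Word Error Rate mentioned above can be computed with a standard dynamic-programming edit distance over words. The following is a minimal sketch of the standard WER definition, not code from the study:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference
    words, computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)


# One dropped word out of six reference words:
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # ≈ 0.1667
```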
References
34. Iranzo-Sánchez J et al (2020) Europarl-ST: a multilingual corpus for speech translation of parliamentary debates. In: 2020 IEEE international conference on acoustics, speech, and signal processing (ICASSP 2020). IEEE
35. Wang C et al (2020) CoVoST: a diverse multilingual speech-to-text translation corpus. arXiv preprint arXiv:2002.01320
36. Nakamura S et al (2006) The ATR multilingual speech-to-speech translation system. IEEE Trans Audio Speech Lang Process 14(2):365–376
37. Udhaykumar N, Ramakrishnan SK, Swaminathan R (2004) Multilingual speech recognition for information retrieval in Indian context. In: Proceedings of the student research workshop at HLT-NAACL 2004
38. Anwar M et al (2023) MuAViC: a multilingual audio-visual corpus for robust speech recognition and robust speech-to-text translation. arXiv preprint arXiv:2303.00628
39. Schultz T (2002) GlobalPhone: a multilingual speech and text database developed at Karlsruhe University. In: Seventh international conference on spoken language processing
40. Gonzalez-Dominguez J et al (2014) A real-time end-to-end multilingual speech recognition architecture. IEEE J Sel Top Signal Process 9(4):749–759
Performance Evaluation of Job Shop
Scheduling Problem Using Proposed
Hybrid of Black Hole and Firefly
Algorithms
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_2
16 J. Kaur and A. Pal
2 Related Work
In the SFLA of [5], the entire population is divided into several memeplexes using a tournament-selection-based technique. The search in each memeplex is conducted based on the memeplex's optimal solution, combining a global search phase with a multiple-neighbourhood search step. The work [6] examines the FJSP, minimizing workload imbalance and overall energy consumption, and analyses the conflicts that arise between the two goals; based on a three-string coding method, an SFLA is
to address the FJJS issue. The algorithm has an extremal optimization in informa-
tion exchange and an adjustment sequence to construct the local search strategy.
When compared to existing heuristic algorithms, the computational result demon-
strates the suggested algorithm’s string search capability in resolving the flexible
job shop scheduling problem. The advancements in fuzzy set technology allow JSP,
where the fuzzy processing time has been used (FJSP) [8] to simulate scheduling
more thoroughly. With a hybrid adaptive differential evolution (HADE) approach,
the multi-objective FJSP can be reduced to a single-objective optimization problem.
The authors consider the jobs’ utmost accomplishment time, total waiting time, and
overall power consumption. All its parameters—CR and F—are engineered to be
normally distributed and adaptable. The authors in [9] suggest a mathematical model
for a novel FJSSP (SOC-FJSP) that is bound by setup operators and assumes antic-
ipatory setup operations (detached). A setup operator merely needs to remain near
the equipment while setting it up, unlike a machine tender. Once setup is finished,
an operator can go on to another machine and continue setting things up. Because it
is assumed that a setup is independent of operations, it is possible to overlap a job’s
setup operation with the previous one. As a result, installation operators and machine
tools are used more effectively, and duration is decreased. The issue of maximum
lateness, a performance metric based on due dates, is tackled by the authors in [10].
After stating the problem, they derive a dominance relation. Additionally, they provide
three methods for the problem: EDD, Tabu search, and particle swarm optimization
(PSO). The authors in [11] used an insert operator to change the particle move-
ment and priority to change the particle location representation when compared to
the original PSO. To translate a particle position into a schedule, they also put into
practice a modified parameterized active schedule generation method (mP-ASG).
By adjusting the maximum delay duration permitted in mP-ASG, one can narrow or
widen the search space between non-delay schedules and active schedules. For the
grid scheduling problem, this work [12] provides an enhanced particle swarm opti-
mization (PSO) algorithm with discrete coding rules. All the benefits of the regular
PSO, including ease of implementation, low computing load, and few control parameters, can be retained by the enhanced PSO method. Experiments demonstrate that the algorithm has low variability and is stable. In this study [13], which uses a new GA-SA algorithm, the crossover operator integrates the Metropolis acceptance criterion. This could preserve the positive traits of the preceding generation while lessening
the disruptive effects of genetic operators. The authors also introduce two brand-new
features for this JSP-solving algorithm. To create a schedule that can further narrow
the search space, a FAS representation is first given. Second, for the operation-
based representation, the authors suggest Precedence Operation Crossover (POX),
a brand-new crossover operator. This research [14] proposes an efficient two-stage approach based on a convolutional neural network (CNN) to handle the FJSP with device malfunction. The objectives of the DFJSP model are robustness and maximum completion time. The first stage of the two-stage technique trains the prediction model with a CNN; the second stage then uses the trained model to forecast how robust a schedule will be. To minimize the total weighted tardiness, the study in [15] examines an FJSP with lot-streaming and machine reconfigurations (FJSP-LSMR); an improved imperialist competitive algorithm (ICA) is first suggested to provide teaching and learning information. A novel ABC [16] is proposed to solve the
issue, considering its high complexity. After identifying a neighbourhood property
of the problem, a tree search technique is developed to improve ABC’s exploitation
potential. For wide ranging JSSP where the overall weighted delay must be reduced,
a decomposition-based hybrid optimization approach [17] is provided. A new sub-
problem that is first defined by a simulated annealing approach and then refined in
each iteration is solved by a genetic algorithm. The authors develop a fuzzy inference
method to find the tasks’ bottleneck characteristic values, which display the char-
acteristic information at different optimization stages. To increase the optimization
efficiency, this information is subsequently used to direct the immune mechanism’s
sub-problem-solving process. In article [18], to increase the diversity of the population, a mixed selection operator based on fitness and concentration values was offered. To fully leverage the qualities of the problem itself, new crossover operators based on the machine and mutation operators based on the critical path
were specifically designed. A new algorithm for determining the critical path from a schedule was presented. Additionally, a local search operator was created, which significantly enhances the GA's local search capabilities. A
new combination of GA was suggested, and its convergence was demonstrated based
on all of these. The imprecise durations of execution and completion in JSSPs are discussed in [19]. The authors select the traditional differential evolution (DE) algorithm as the fundamental basis for optimization; DE's benefit of a unique evolutionary approach is used in the mutation operation over different vector sets. Nevertheless, DE is not always successful in resolving FJSSP
cases. This work [20] introduces a generalized FJSP in which additional tough constraints, such as equipment capability, schedule delays, and holding periods, are taken into account in addition to the classical constraints of the FJSP. The problem is based on an actual circumstance observed at a maker of seamless rolled rings, and it is represented by constraint programming (CP) and mixed-integer linear programming (MILP) models.
Recently, research has concentrated on scheduling problems that arise in manufacturing and service environments, where tasks are activities, machines are resources, and every machine can handle one task at a time. The job shop system, often known as the low-volume system, is the main topic of the present paper. Products in this kind of environment are made to order. A JSSP can be described as follows: m machines {M_1, M_2, …, M_m} perform n jobs {N_1, N_2, …, N_n}. An operation is denoted O_ij, the operation of the ith job performed on the jth machine. The requirement that each job be processed by the machines in a specific order is known as a technological constraint. There may be no interruptions once a machine has begun processing a job. The makespan is the amount of time needed for all operations to finish; our goal in this paper is to minimize this makespan value. At least one of the optimal schedules when the makespan is minimized is semi-active (no operation may be started earlier without violating technological constraints). Figure 1 represents the solution of the job shop problem
using a Gantt chart. For a given schedule, define w_ij as the time job j waits before processing on machine i, and define C_ij as the time at which job j completes processing on machine i. We are interested in objective functions with two parts, each of which depends on the schedule: an intermediate holding cost component HC, and a component C(S) that depends on the times at which each job is completed. The intermediate holding cost component can be expressed as in Eq. (1):

HC = Σ_{j=1}^{n} Σ_{i=1}^{m_j} h_{ij} w_{ij}. (1)
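The makespan objective described above can be evaluated for a concrete schedule. The following sketch uses an illustrative toy instance and a dispatch-order encoding of our own choosing, not data from the paper: each job is a sequence of (machine, duration) operations, and each operation starts as soon as both its machine and its job are free.

```python
def makespan(jobs, order):
    """Compute the makespan of a job shop schedule.

    jobs:  list of jobs, each a list of (machine, duration) operations
           that must run in the given order (technological constraint).
    order: dispatch order over job ids, one entry per operation.
    """
    machine_free = {}            # machine id -> time it becomes free
    job_free = [0] * len(jobs)   # job id -> time its previous operation finished
    next_op = [0] * len(jobs)    # next unscheduled operation index per job
    finish = 0
    for j in order:
        m, dur = jobs[j][next_op[j]]
        start = max(machine_free.get(m, 0), job_free[j])  # semi-active start
        end = start + dur
        machine_free[m] = job_free[j] = end
        next_op[j] += 1
        finish = max(finish, end)
    return finish


# Two jobs, two machines: job 0 = M0(3) then M1(2); job 1 = M1(2) then M0(4)
jobs = [[(0, 3), (1, 2)], [(1, 2), (0, 4)]]
print(makespan(jobs, [0, 1, 0, 1]))  # 7
```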
4 Proposed Methodology
Yang designed the firefly algorithm [21] in 2008 by imitating the distinctive behaviours of fireflies. A population of fireflies exhibits distinctive luminescent flashing behaviours that function as a channel to communicate, to attract mates, and to warn of potential predators. Drawing inspiration from these activities, Yang developed the strategy on the suppositions that all fireflies are unisex, so that each can attract any other firefly, and that an individual's attractiveness is directly correlated with its light level. Consequently, the more brilliant fireflies entice the less brilliant ones to come closer; if no firefly is brighter than a given one, that firefly moves randomly. The algorithm is explained in the steps below.
Light Intensity: The physics of light intensity suggests a suitable measure over the separation r of any two fireflies, since intensity is inversely proportional to the square of the distance, as in Eq. (2):

I ∝ 1/r². (2)

Here, I indicates intensity and r indicates distance. In addition, γ denotes the light absorption coefficient.
Brightness: The attractiveness parameter β is defined in Eq. (3) using an exponential formula and is based on the light travelling between two fireflies:

β = β₀ e^(−γ r²), (3)

where β₀ is the brightness at the source (r = 0) and β is the brightness perceived at distance r.
Movement of Fireflies: When a firefly i moves towards a more alluring (brighter) firefly j, its motion is given by Eq. (4):

z_i = z_i + β₀ e^(−γ r_{ij}²) (z_j − z_i). (4)
Updating Equation: The final updating equation, Eq. (6), is formed by combining Eq. (4) with the random-step term of Eq. (5), and is given as

z_i = z_i + β₀ e^(−γ r_{ij}²) (z_j − z_i) + α(rand() − 0.5). (6)
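The attractiveness and movement rules above combine into a single position update per firefly pair. The following is a generic sketch of that update with illustrative parameter values, not the authors' implementation:

```python
import math
import random


def move_firefly(zi, zj, beta0=1.0, gamma=1.0, alpha=0.2, rng=random.random):
    """Move firefly at position zi towards a brighter firefly at zj.

    Attractiveness beta = beta0 * exp(-gamma * r^2) decays with the squared
    distance r^2 between the two fireflies; alpha scales a random step
    (rand - 0.5) added per dimension.
    """
    r2 = sum((a - b) ** 2 for a, b in zip(zi, zj))   # squared distance r_ij^2
    beta = beta0 * math.exp(-gamma * r2)             # attractiveness
    return [a + beta * (b - a) + alpha * (rng() - 0.5)
            for a, b in zip(zi, zj)]


random.seed(0)
print(move_firefly([0.0, 0.0], [1.0, 1.0]))
```

With gamma = 0 and alpha = 0 the move degenerates to jumping straight onto the brighter firefly, which is a quick sanity check on the formula.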
The simplest definition of a black hole is a region of space containing so much mass that nothing can escape its gravitational attraction. Light, and anything else that enters a black hole, is lost forever. When the BH algorithm is used, the candidate that performs best overall at each iteration becomes the black hole. Stars: all the other candidates in the BH algorithm [22] form the normal stars. The black hole is not created randomly; it is one of the actual candidates of the population. All candidates are then shifted towards the black hole, by an amount depending on their present locations and a random number.
1. The BH algorithm begins by calculating the objective function for a population of candidate solutions to an optimisation problem.
2. The best candidate is chosen to become the black hole at every iteration, with the remaining candidates acting as normal stars. Once initialization is complete, the black hole begins to draw stars towards it.
3. If a star approaches the black hole too closely, it is sucked in and vanishes forever. In this situation, a new star (candidate solution) is produced at random and placed in the search area, and the search continues.
Fitness Value Calculation: The fitness value is calculated as follows.
1. A population of candidate solutions (the stars) is created at random in the search space of the problem or function.
f_i = eval(v_i(t)), i = 1, 2, …, n, and f_BH = eval(v_BH(t)),
where n is the population size and f_i and f_BH are the fitness values of the ith star and of the black hole, respectively, within the initial population. After the population is evaluated, the candidate with the best fitness value, f_i, among the stars is selected to be the black hole, and the remaining stars continue to function as regular stars. The black hole may swallow the stars in its near surroundings: once the black hole and the other stars are formed, the black hole starts to consume the neighbouring stars as they approach it.
Absorption Rate of Stars: The stars in the black hole's immediate vicinity begin to be absorbed, and they all start moving towards it. The formula below describes how the stars are absorbed by the BH:

y_i(t + 1) = y_i(t) + rand × (X_BH − y_i(t)), i = 1, 2, …, n, (7)

where y_i(t) and y_i(t + 1) are the positions of the ith star at iterations t and t + 1, respectively, X_BH is the position of the black hole in the search space, and rand is a randomly generated number in the interval [0, 1]. A star may move towards the black hole and end up at a location less costly than the black hole itself; in this case, the black hole relocates to the position of that star, and vice versa. Once the black hole is in its new location, the algorithm resumes, and the stars begin to move towards it.
Event Horizon: There is a chance that travelling stars cross the event horizon as they get closer to the black hole. The black hole draws in every star (candidate solution) that crosses its event horizon. Whenever a star perishes in this way, a new search is initiated by generating a fresh candidate (star) and scattering it randomly over the search region; this keeps the number of candidates fixed. The next iteration begins after all the stars have been moved.
The following formula is used in the BHA to calculate the radius of the event horizon:

R = f_BH / Σ_{i=1}^{k} f_i, (8)

where f_i and f_BH are the fitness values of the ith star and of the black hole, respectively.
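One iteration of the black hole mechanics described above, with stars moving towards the best candidate and an event-horizon check based on Eq. (8), can be sketched as follows. The 1-D sphere objective, bounds, and parameter choices are illustrative assumptions, not the authors' setup:

```python
import random


def bh_step(stars, objective, bounds, rng=random.Random(1)):
    """One BH iteration (minimisation): move each star towards the black hole
    (the best candidate); a star that crosses the event-horizon radius
    R = f_BH / sum(f_i) is replaced by a fresh random star."""
    fitness = [objective(s) for s in stars]
    bh_idx = min(range(len(stars)), key=fitness.__getitem__)  # best = black hole
    x_bh, f_bh = stars[bh_idx], fitness[bh_idx]
    radius = abs(f_bh) / (sum(abs(f) for f in fitness) or 1.0)  # event horizon
    new_stars = []
    for i, s in enumerate(stars):
        if i == bh_idx:
            new_stars.append(s)          # the black hole itself stays put
            continue
        moved = s + rng.random() * (x_bh - s)   # pull towards the black hole
        if abs(moved - x_bh) < radius:          # crossed the event horizon
            moved = rng.uniform(*bounds)        # respawn a random star
        new_stars.append(moved)
    return new_stars, x_bh


# Minimise f(x) = x^2 on [-5, 5] as an illustrative objective.
stars = [4.0, -3.0, 1.5, 0.5]
best = None
for _ in range(50):
    stars, best = bh_step(stars, lambda x: x * x, (-5.0, 5.0))
print(round(best, 2))
```

Because the black hole is always retained, the best objective value found never worsens between iterations.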
5 Hybrid Approach
The hybrid global optimization approach combines two well-known algorithms from the domain of global optimisation: the firefly algorithm (FA) and the black hole algorithm (BHA). Compared with other population-based nature-inspired variants, the FA and BHA perform efficiently on some optimization problems, but they are not suited to excessively intricate functions and may be prone to becoming trapped in local optima. To get around these limitations and improve search performance, a unique hybrid BHA-FA variant is developed. The FA is good at finding the global optimum, but, unlike the black hole algorithm, it cannot exploit the globally best solution immediately at every iteration. The proposed recombination [23] between the FA and the BHA is used to overcome the drawbacks of the two algorithms. The hybrid strategy for the FA and BHA approaches is shown in Fig. 2.
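One way such a hybrid loop could look is for fireflies to first move towards brighter neighbours (FA exploration) and for every candidate then to be pulled towards the incumbent best solution, which plays the role of the black hole (BHA exploitation). This sketch is our reading of the general idea, not the authors' exact recombination scheme [23]; the 1-D objective, search space, and parameter values are illustrative:

```python
import math
import random


def hybrid_fa_bh(objective, n=10, iters=100, beta0=1.0, gamma=0.5,
                 alpha=0.1, rng=random.Random(7)):
    """Minimise `objective` over [-5, 5] with an illustrative FA + BH hybrid."""
    pop = [rng.uniform(-5.0, 5.0) for _ in range(n)]
    for _ in range(iters):
        fit = [objective(x) for x in pop]
        best = pop[min(range(n), key=fit.__getitem__)]   # the "black hole"
        for i in range(n):
            for j in range(n):
                if fit[j] < fit[i]:                      # j is "brighter"
                    r2 = (pop[i] - pop[j]) ** 2
                    pop[i] += (beta0 * math.exp(-gamma * r2) * (pop[j] - pop[i])
                               + alpha * (rng.random() - 0.5))
            pop[i] += rng.random() * (best - pop[i])     # black-hole pull
    return min(pop, key=objective)


print(round(hybrid_fa_bh(lambda x: x * x), 3))
```

The incumbent best is never moved within a generation (no neighbour is brighter and the pull towards itself is zero), so the black-hole step only intensifies the search around it.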
6 Computational Results
In this work, various randomly generated problem instances, together with their operations, are used to establish the minimum makespan. The tables below display the execution and completion times for each job independently. The hybrid optimization approach generates the minimum makespan for all test instances; the results of the various optimisation techniques are shown in the tables below, together with the minimum makespan and the primary screening period. The makespan of the job shop scheduling procedure is clearly displayed in Tables 1, 2, and 3. The problem sizes in Table 1 are P1 (10 × 10), P2 (10 × 15), P3 (10 × 20), and P4 (15 × 10), and the set-up time for each job is distributed uniformly as UD(1, 40). The problem sizes in Table 2 are P5 (10 × 10), P6 (10 × 15), P7 (10 × 20), and P8 (15 × 10), and the set-up time for each job is distributed uniformly as UD(20, 40). The problem sizes in Table 3 are P9 (10 × 10), P10 (10 × 15), P11 (10 × 20), and P12 (15 × 10), and the set-up time for each job is distributed uniformly as UD(1, 20). All the results are compiled in MATLAB 2022(b) on an AMD Ryzen 5 5625U with Radeon Graphics, 2.30 GHz, 16.0 GB RAM, 64-bit operating system, x64-based processor.
Tables 1, 2, and 3 show that most of the best-known solutions can be obtained with
the HBFA employed in this work, particularly for instances with greater flexibility.
The approach also offers competitive solutions to most instances. Furthermore, we
find that the performance of our HBFA is either superior to or comparable with that
of the PSO, FA, and SFLA. Compared with other heuristic algorithms, the findings
show that our technique is a viable algorithm for the FJSP. To evaluate performance
across a range of problems, instances of different sizes were generated and divided
into three groups.
The hybrid optimization algorithm, which combines the firefly and black hole
algorithms, achieves the shortest makespan in this experiment. The total processing
time of each job is calculated at the start and shared between the two approaches.
The job sequences form the input; their durations are accumulated across jobs to
produce the smallest total time required for the whole set of jobs.
The minimum makespan is attained once all jobs have completed all of their
operations. Figure 3 compares all algorithms using a bar graph, illustrating the
performance of the proposed hybrid HBFA against the other three algorithms.
Figure 4 depicts the convergence graphs for the different instances used in the
proposed work.
The goal of this research is to reduce the makespan in the JSSP; randomly generated
tests of different sizes are examined to demonstrate the performance of the hybrid
HBFA. Here, the global search capability of the FA has been combined with the
local search capability of the BHA to obtain a proper balance between exploration
and exploitation. The obtained results were compared with the FA, PSO, and SFLA
to check the performance of the hybrid HBFA, and the proposed algorithm came out
best among all these algorithms on the generated JSSP instances.
Future work could apply the proposed algorithm with different parameter settings
and in a fuzzy environment. Moreover, the algorithm could be applied to other
real-life problems such as power system generation, feature selection, image
processing, and clustering analysis.
References
1. Walker RA, Chaudhuri S (1995) Introduction to the scheduling problem. IEEE Des Test Comput
12(2):60–69. https://doi.org/10.1109/54.386007
22. Azar AT, Vaidyanathan S (2015) Blackhole algorithms and applications. Stud Comput Intell
575:v–vii. https://doi.org/10.1007/978-3-319-11017-2
23. Kaur J, Pal A (2023) Development and analysis of a novel hybrid HBFA using firefly and
blackhole algorithm. In: Third congress on intelligent systems, pp 799–816 [Online]. Available:
https://link.springer.com/chapter/10.1007/978-981-19-9225-4_58. Accessed 03 Oct 2023
Machine Learning and Healthcare:
A Comprehensive Study
Abstract This paper delves into the dynamic intersection of machine learning (ML)
and healthcare, envisioning a paradigm shift in diagnostic accuracy, personalized
treatment, and streamlined administration. It meticulously explores various ML algo-
rithms, spanning deep learning, decision trees, and clustering techniques, pivotal
in domains like early cancer detection, diabetes detection, heart disease detection,
autism spectrum disorder detection, and Parkinson’s disease detection. Rigorous
model evaluation, employing accuracy, precision, F1-score, specificity, and mean
squared error metrics, ensures algorithm dependability. However, data privacy chal-
lenges, amplified by intricate regulations, persist. Ethical considerations add compli-
cated dimensions, including algorithmic bias and cultivating patient trust. Address-
ing these necessitates robust education for healthcare professionals and alignment
with legal frameworks. Despite challenges, the paper advocates for a conscientious
integration of ML, emphasizing its transformative potential in healthcare and urg-
ing judicious technology amalgamation to propel advancements in patient care and
clinical outcomes.
1 Introduction
The fusion of machine learning and the healthcare sector represents a pivotal junc-
ture in the history of medical science and its practical applications. This intersection
of machine learning algorithms with the extensive reservoir of health-related data
has unveiled uncharted opportunities to reshape the landscape of healthcare. This
transformative synergy carries the potential to fundamentally redefine medical diag-
nostics, prognostications, and the quality of patient care, transcending the traditional
confines of medical knowledge and practice [18].
This paper embarks on a comprehensive exploration of the machine learning
algorithms that underpin these innovations, including deep learning, decision trees,
and clustering techniques. These algorithmic paradigms are the cornerstones upon
which groundbreaking healthcare applications are erected, spanning early disease
detection, personalized treatment strategies, optimized resource allocation, stream-
lined administrative workflows, and much more.
Data privacy and security, owing to the highly sensitive nature of healthcare data,
assume an imperative stance. The inadvertent exposure of patient data poses grave
threats, mandating the rigorous implementation of robust encryption, stringent access
controls, and continuous monitoring. Additionally, navigating the intricate landscape
of regulatory adherence, encompassing compliance with the “General Data Protec-
tion Regulation” and the “Health Insurance Portability and Accountability Act” [7],
presents formidable challenges to healthcare establishments and machine learning
developers.
The forthcoming sections of this paper undertake a meticulous examination of
machine learning algorithms and their healthcare applications. They probe the sub-
tleties of model assessment, delve into the intricacies of data privacy and security,
and dissect the plethora of challenges that the confluence of machine learning and
healthcare presents.
This section provides an overview of various ML models and their key features,
along with their applications in the healthcare domain. These models have shown
remarkable potential in improving healthcare services, diagnosis, and patient care.
Table 1 presents an in-depth comparative analysis of the algorithms.
The random forest algorithm, a versatile and powerful machine learning technique,
has found valuable applications in healthcare, offering robust predictive capabilities
while mitigating overfitting and enhancing model interpretability. At its core, the
random forest algorithm operates by constructing a multitude of decision trees during
training [12]. Each tree is grown using a bootstrapped subset of the training data, and
at each node of the tree, a random subset of features is considered for splitting. This
inherent randomness helps in creating diverse and uncorrelated trees. Once the forest
of trees is built, predictions are made by aggregating the results from individual trees.
For regression tasks this typically involves averaging the predictions from each
tree, while for classification tasks, it employs a majority vote mechanism to determine
the final class prediction. Random forest offers several advantages for healthcare
applications. Firstly, it excels in handling high-dimensional data with numerous fea-
tures, which is common in medical datasets. Secondly, it provides a natural way to
measure feature importance, aiding in the identification of critical factors in health-
care predictions. Additionally, its ensemble approach reduces the risk of overfitting,
enhancing model generalization to unseen data [16].
Interpreting random forest models is relatively straightforward compared to com-
plex deep learning models. Feature importance scores can guide clinicians and
researchers in understanding which variables contribute most to the predictions, sup-
porting informed decision-making in healthcare scenarios. In the context of health-
care and medical research, random forest has been leveraged for tasks such as disease
risk prediction, medical image analysis, drug discovery, and patient outcome progno-
sis. Its flexibility, robustness, and interpretability make it a valuable tool for improving
healthcare diagnostics and treatment decisions while handling the challenges posed
by medical data intricacies.
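A toy sketch of the three ingredients just described (bootstrap sampling, random feature selection, and majority-vote aggregation) might look like the following; it uses single-feature decision stumps rather than full trees, which is a simplification of my own for brevity:

```python
import random
from collections import Counter

class StumpForest:
    """Toy random forest of depth-1 trees (decision stumps), illustrating
    bootstrap sampling, random feature selection, and majority voting."""

    def __init__(self, n_trees=25, seed=0):
        self.n_trees = n_trees
        self.rng = random.Random(seed)
        self.stumps = []  # each stump: (feature, threshold, left_label, right_label)

    def _fit_stump(self, X, y, feat):
        # Pick the threshold on one feature that minimizes misclassifications.
        best = None
        for t in sorted({row[feat] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[feat] <= t]
            right = [lab for row, lab in zip(X, y) if row[feat] > t]
            l_maj = Counter(left).most_common(1)[0][0] if left else y[0]
            r_maj = Counter(right).most_common(1)[0][0] if right else y[0]
            err = sum(lab != (l_maj if row[feat] <= t else r_maj)
                      for row, lab in zip(X, y))
            if best is None or err < best[0]:
                best = (err, t, l_maj, r_maj)
        return best[1:]

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]   # bootstrap sample
            feat = self.rng.randrange(d)                      # random feature choice
            Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
            self.stumps.append((feat,) + self._fit_stump(Xb, yb, feat))
        return self

    def predict(self, row):
        votes = [l if row[f] <= t else r for f, t, l, r in self.stumps]
        return Counter(votes).most_common(1)[0][0]            # majority vote
```

In a real setting one would use an established implementation with full decision trees; the sketch only makes the ensemble mechanics concrete.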
unstructured data, adapt to different data types, and provide interpretable results
makes it an invaluable tool for enhancing healthcare decision support systems and
improving patient care.
The convolutional neural network (CNN) stands as a pivotal machine learning algo-
rithm, celebrated for its remarkable proficiency in image analysis and pattern recog-
nition, which has proven indispensable in the realm of healthcare. At its core, CNNs
aim to replicate the innate capability of the human visual system to perceive and
discern patterns, employing convolutional layers as the foundational building blocks
[22]. These layers adaptively learn spatial hierarchies of features from the input
data. Their operation unfolds in a systematic fashion: convolutional layers apply
learnable filters that identify patterns such as edges and shapes, extracting the
pertinent features.
After convolution, pooling layers reduce spatial dimensions, preserving vital
information through techniques like max-pooling. Fully connected layers, akin to
traditional neural networks, learn intricate relationships in the data. Activation func-
tions, such as rectified linear unit (ReLU), add nonlinearity for complex mappings.
CNNs undergo supervised training using backpropagation and gradient descent
to optimize parameters, enhancing pattern recognition capabilities.
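The convolution, ReLU, and max-pooling steps just described can be illustrated on a tiny example; the edge-detecting kernel and the input image below are assumptions chosen for clarity, not taken from any cited system:

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries)
    of a 2-D list `image` with a 2-D list `kernel`."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def relu(fmap):
    # Nonlinearity: negative responses are clipped to zero.
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    # Downsample by keeping the strongest response in each window.
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

# A vertical-edge kernel applied to a 4x4 image containing a step edge.
image = [[0, 0, 1, 1]] * 4
edge_kernel = [[-1, 1], [-1, 1]]
fmap = max_pool(relu(conv2d(image, edge_kernel)))  # strong response at the edge
```

The pooled map keeps only the strongest filter response, which is exactly the information-preserving downsampling the text attributes to max-pooling.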
In healthcare, CNNs excel in medical image analysis, diagnosing diseases from X-
rays and MRIs, segmenting anatomical structures, and identifying anomalies. Trans-
fer learning fine-tunes pre-trained CNN models, reducing the need for extensive
medical data. Their adaptability and accuracy make CNNs indispensable in health-
care, advancing disease detection, personalized medicine, and patient care quality.
The recurrent neural network (RNN) serves as a foundational machine learning algo-
rithm, renowned for its capacity to manage sequential data, rendering it invaluable in
healthcare applications encompassing time series analysis, natural language process-
ing (NLP), and patient data modeling [3]. Unlike conventional feedforward neural
networks, RNNs manifest dynamic and recursive behaviors, enabling them to main-
tain a hidden state that retains information from previous time steps. This temporal
memory equips RNNs to adeptly process data sequences of varying lengths. The
architecture of an RNN encompasses three key components: an input layer to receive
sequential data, a hidden layer that evolves the hidden state by amalgamating current
inputs with past states, and an output layer responsible for generating predictions or
36 R. Raj and J. Kaliappan
outputs. The pivotal feature of an RNN lies in its recurrent connections within the
hidden layer [15].
At each time step, the hidden state is updated, intertwining current input with
the preceding hidden state, enabling the network to discern sequential data patterns
and dependencies. RNNs are trained using labeled sequential data, employing the
backpropagation through time (BPTT) algorithm to calculate gradients and optimize
the network’s parameters. This training process enables RNNs to recognize sequen-
tial patterns and relationships. While RNNs are robust, they do present challenges,
including issues like vanishing and exploding gradients that impede their ability
to capture extended dependencies in data. To address these limitations, advanced
variants like long short-term memory (LSTM) and gated recurrent unit (GRU) have
emerged [14], incorporating gating mechanisms for enhanced information flow reg-
ulation.
In healthcare, RNNs have demonstrated their efficacy in predicting patient out-
comes using time series data, medical speech recognition, and the analysis of elec-
tronic health records (EHRs). They excel in scenarios where the order and timing of
data points are paramount for accurate predictions. Furthermore, RNNs have found
utility in predictive modeling for disease progression and early detection.
3 Application of ML in Healthcare
Cancer detection through the utilization of machine learning (ML) represents a piv-
otal facet of healthcare. This application involves the strategic application of ML algo-
rithms and techniques to discern cancerous cells or tumors within patients, offering
the promise of early detection, a pivotal factor in enhancing treatment outcomes and
bolstering patient survival rates. The application of ML in cancer detection lever-
ages several core attributes, including automated feature extraction from medical
images, classification of medical data into pertinent categories, predictive regression
modeling, ensemble learning through combinations of models, and the potency of
deep learning. Common ML models employed in this context encompass convolu-
tional neural networks (CNNs) for image-based detection, support vector machines
(SVMs) for classifying cancer types based on gene expression data, random for-
est algorithms for feature selection and classification tasks, and logistic regression
models for assessing the probability of cancer based on clinical and demographic
features [21]. The implementation of ML in cancer detection has extended into both
research and healthcare applications, with notable instances such as Google Health’s
AI for Breast Cancer Detection, IBM Watson’s ML-driven treatment recommen-
dations, PathAI’s assistance to pathologists in cancer diagnosis [17], and research
on enhancing prostate cancer detection through MRI and clinical data. Similarly,
deep learning techniques have been applied to chest X-rays and CT scans for more
effective lung cancer detection.
Heart disease detection through machine learning (ML) involves predicting an indi-
vidual’s likelihood of having heart disease based on diverse medical and lifestyle
factors. This application is pivotal for early intervention and prevention in heart
health. ML significantly enhances heart disease detection by assessing risk factors,
enabling early identification of potential issues, providing personalized care plans,
and categorizing individuals into heart disease or non-heart disease groups based on
data patterns. Key attributes of ML in this context encompass feature selection, clas-
sification, regression, ensemble learning, and deep learning techniques, all tailored
to extract relevant insights from health data. Commonly used ML models, including
logistic regression, random forests, support vector machines, and deep neural net-
works, contribute to accurate heart disease prediction by analyzing a spectrum of
health-related data [8].
Autism spectrum disorder (ASD) poses a unique challenge given its neurodevelop-
mental nature, characterized by a wide array of symptoms impacting social inter-
action, communication, and repetitive behaviors. In this context, machine learning
(ML) has emerged as a valuable tool, particularly in enhancing the early detection and
diagnosis of ASD, offering several distinct advantages. Foremost, ML contributes to
the early identification of ASD by discerning subtle behavioral and physiological pat-
terns in children, enabling timely interventions and therapies. Its capacity to analyze
diverse data types, including clinical assessments, eye-tracking data, brain imag-
ing (fMRI), and genetic information, ensures a comprehensive assessment of ASD
risk [5]. Furthermore, ML introduces objectivity to the diagnostic process, equip-
ping clinicians with quantifiable metrics and reducing subjectivity. This objectivity is
especially crucial in ASD diagnosis. Additionally, ML supports the personalization
of interventions and therapies, adapting them to the unique needs and profiles of
individuals with ASD. ML’s applications in ASD detection encompass classification
models that categorize individuals into ASD and non-ASD groups, feature engineer-
ing techniques for identifying relevant data attributes, and the utilization of deep
learning models such as convolutional neural networks (CNNs) and recurrent neural
networks (RNNs) for the analysis of complex data, including brain scans [2]. Notable
ML models in this domain include random forest, capable of handling multiple data
types for a comprehensive assessment, and support vector machines (SVMs), ideal
for distinguishing individuals with ASD from those without based on feature patterns.
Deep neural networks, on the other hand, are employed to analyze brain imaging data,
detecting structural or functional abnormalities associated with ASD. The implemen-
tation of ASD detection using ML is actively pursued through research studies and
resources. Examples include research papers like “Deep multimodal learning for
the diagnosis of autism spectrum disorder” [19] and studies exploring eye-tracking
data analysis for ASD detection. Initiatives such as the Autism Brain Imaging Data
Exchange (ABIDE) [11] provide access to essential brain imaging data, facilitating
ML model development. This comprehensive approach harnesses the potential of ML
to make significant strides in early ASD detection and diagnosis, ultimately enhanc-
ing the quality of care and support provided to individuals on the autism spectrum.
clinical assessments. Moreover, time series data collected from sensors can be ana-
lyzed with specialized techniques such as recurrent neural networks (RNNs) to detect
subtle changes in motor patterns [23]. For instance, random forest models classify
individuals into PD and non-PD groups based on clinical and biomedical features.
Support vector machines (SVMs) are useful for binary classification tasks, effectively
separating PD patients from healthy individuals. Deep learning models, including
convolutional neural networks (CNNs) and recurrent neural networks (RNNs), ana-
lyze data like speech signals or gait patterns to identify PD-related abnormalities.
While PD detection using ML is an evolving field, several notable resources demon-
strate its potential. Initiatives like the Parkinson’s Voice Initiative [1] explore the use
of voice data and ML for PD detection, while research studies delve into gait analysis
and wearable sensors for monitoring. Publicly available datasets, such as the Parkin-
son’s Progression Markers Initiative (PPMI), provide essential data for research and
model development. This comprehensive approach harnesses the potential of ML to
make significant strides in early PD detection and monitoring, ultimately improving
the quality of care for individuals affected by this condition.
Table 2 presents a detailed comparison of the healthcare applications.
4.1 Accuracy
4.2 Precision
how many of the positive predictions were correct. The formula for calculating
precision is:

Precision = True Positives / (True Positives + False Positives). (3)
4.3 F1-Score
The F1-score is a critical metric used in healthcare and other domains to assess the
performance of a model, especially when the balance between precision and recall
is crucial. It provides a single score that considers both false positives (FP) and false
negatives (FN) and is particularly valuable when dealing with imbalanced datasets
or scenarios where the cost of false positives and false negatives differs significantly.
The formula for calculating the F1-score is:
F1-Score = 2 × (Precision × Recall) / (Precision + Recall). (4)
4.4 Specificity
Specificity = TN / (TN + FP). (5)
Mean squared error (MSE) is another widely used metric for assessing the perfor-
mance of predictive models, including those employed in healthcare applications.
MSE measures the average of the squared differences between the predicted values
and the actual (ground truth) values in a dataset. It quantifies how well a model’s
predictions align with the true outcomes while emphasizing larger errors more than
smaller ones.
The mean squared error is computed by averaging the squared errors over all data
points:

MSE = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)²,

where y_i is the actual value, ŷ_i is the predicted value, and n is the number of data
points.
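The evaluation metrics of this section can be computed directly from confusion-matrix counts and paired predictions; the following minimal sketch (function and variable names are my own) mirrors the precision, F1-score, specificity, and MSE definitions:

```python
def classification_metrics(tp, fp, tn, fn):
    """Metrics from a binary confusion matrix, following Eqs. (3)-(5)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,                           # Eq. (3)
        "f1": 2 * precision * recall / (precision + recall),  # Eq. (4)
        "specificity": tn / (tn + fp),                    # Eq. (5)
    }

def mse(y_true, y_pred):
    """Mean squared error: average of the squared prediction errors."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)
```

Because MSE squares each error, larger deviations dominate the average, which is the emphasis on large errors noted above.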
• Inherent Data Diversity: Healthcare data is inherently diverse and often unstruc-
tured, stemming from various sources and formats. This diversity poses a chal-
lenge for ML models designed for structured data [13].
• Standardization and Preprocessing: To make healthcare data compatible with
ML algorithms, it is crucial to standardize and preprocess the data. This includes
data cleaning, normalization, and structuring to enhance its usability.
4. Regulatory Compliance
5. Ethical Concerns
6 Discussion
6.2 Findings
7 Conclusion
quantitative and qualitative insights into predictive accuracy and model reliability.
Notably, the challenge of model interpretability, especially in deep learning models,
necessitates further attention. The sanctity of data privacy and security is paramount
in healthcare, where the exposure of sensitive patient information could lead to severe
consequences. Addressing these concerns mandates rigorous measures, including
robust encryption, access controls, and vigilant monitoring.
Ethical considerations surrounding algorithmic biases and patient trust are central
to responsible machine learning in healthcare. Correcting these biases is a pivotal
step in achieving fairness and equity in healthcare AI applications. The training and
adoption of machine learning technologies in healthcare necessitate educational ini-
tiatives tailored to the varying familiarity levels of healthcare professionals with these
tools. Legal matters concerning smart contract integration and data ownership are
imperative for the long-term viability of machine learning in healthcare, ensuring
alignment with existing legal frameworks. Furthermore, challenges related to infras-
tructure disparities, data quality, model interpretability, data labeling, and clinical
implementation demand nuanced solutions to realize the full potential of machine
learning in healthcare. In this pursuit of a harmonious synergy between machine
learning and healthcare, a collective commitment to ethical, secure, and compliant
innovation, along with rigorous educational advancements, holds the promise of
transformative healthcare improvements. As the journey unfolds, these considera-
tions shall drive the responsible utilization of machine learning, promising a healthier,
brighter future for all.
References
10. Luo J, Zhang Z, Fu Y, Rao F (2021) Time series prediction of Covid-19 transmission in America
using lstm and xgboost algorithms. Results Phys 27:104462
11. Martino AD, O’connor D, Chen B, Alaerts K, Anderson JS, Assaf M, Balsters JH, Baxter L,
Beggiato A, Bernaerts S et al (2017) Enhancing studies of the connectome in autism using the
autism brain imaging data exchange ii. Scientific Data 4(1):1–15
12. Martuza Ahamad M, Aktar S, Uddin MJ, Rashed-Al-Mahfuz M, Azad AKM, Uddin S, Alyami
SA, Sarker IH, Khan A, Liò P et al (2022) Adverse effects of covid-19 vaccination: machine
learning and statistical approach to identify and classify incidences of morbidity and postvac-
cination reactogenicity 11(1):31
13. Nerenz DR, McFadden B, Ulmer C et al. (2009) Race, ethnicity, and language data: standard-
ization for health care quality improvement
14. Rashid TA, Hassan MK, Mohammadi M, Fraser K (2019) Improvement of variant adaptable
lstm trained with metaheuristic algorithms for healthcare analysis. In: Advanced classification
techniques for healthcare analysis, IGI Global, pp 111–131
15. Reddy BK, Delen D (2018) Predicting hospital readmission for lupus patients: an RNN-LSTM-
based deep-learning methodology. Comput Biol Med 101:199–209
16. Rigatti SJ (2017) Random forest. J Insurance Med 47(1):31–39
17. Santosh KC, Gaur L (2022) Artificial intelligence and machine learning in public healthcare:
opportunities and societal impact. Springer Nature
18. Shailaja K, Seetharamulu B, Jabbar MA (2018) Machine learning in healthcare: a review. In:
2018 Second international conference on electronics, communication and aerospace technology
(ICECA). IEEE, pp 910–914
19. Tang M, Kumar P, Chen H, Shrivastava A (2020) Deep multimodal learning for the diagnosis
of autism spectrum disorder. J Imaging 6(6):47
20. Teshuva I, Hillel I, Gazit E, Giladi N, Mirelman A, Hausdorff JM (2019) Using wearables
to assess bradykinesia and rigidity in patients with Parkinson’s disease: a focused, narrative
review of the literature. J Neural Transmission 126:699–710
21. Wang X, Guo J, Gu D, Yang Y, Yang X, Zhu K (2019) Tracking knowledge evolution, hotspots
and future directions of emerging technologies in cancers research: a bibliometrics review. J
Cancer 10(12):2643
22. Wu J, Liu N, Li X, Fan Q, Li Z, Shang J, Wang F, Chen B, Shen Y, Cao P et al (2023)
Convolutional neural network for detecting rib fractures on chest radiographs: a feasibility
study. BMC Med Imaging 23(1):1–12
23. Xu S, Wang Z, Sun J, Zhang Z, Wu Z, Yang T, Xue G, Cheng C (2020) Using a deep recurrent
neural network with EEG signal to detect Parkinson’s disease. Annals of Translat Med 8(14)
Evolutionary Algorithms for Fibers
Upgrade Sequence Problem
on MB-EONs
Der-Rong Din
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 47
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_4
In this paper, the fibers upgrade sequence problem, involving the upgrade of SB-
EONs to MB-EONs without causing service disruption, was studied. To achieve this
goal, the lightpath through the upgraded fiber must be re-routed before the upgrade.
After upgrading, the MB-capable fiber can be utilized immediately. The demand
routing algorithm on the hybrid SB/MB-EON should be employed. Figure 1 shows
the upgrade process. Figure 1a shows all lightpaths. To upgrade fiber AB, lightpaths
l_AB and l_BA should be re-routed (shown in Fig. 1b). In Fig. 1c, the upgrading of fiber
AB results in the restoration of the original requests between A and B through the
utilization of MB transmission. In Fig. 1d, e, the upgrade focus shifts to fiber BC,
while in Fig. 1f, g, the upgrade attention is directed toward fiber AC.
In MB-EONs, the maximum distance of a lightpath is determined by [5]. The net-
work’s performance can be significantly influenced by the chosen upgrade sequence.
In this study, adhering to the no-service disruption constraint, we aim to identify the
optimal fiber upgrade sequence. This problem is formally referred to as the fibers
upgrade sequence problem (FUSP) [6]. The objective function of the problem is the
average weighted load ratio (AWLR). In a previous investigation [6], five heuristic
algorithms were introduced to address the fiber upgrade sequence problem: (1)
Random Sequence (RS), (2) Shortest Distance First (SDF), (3) Maximum Load First
(MaxLF), (4) Minimum Load First (MinLF), and (5) Longest Distance First (LDF).
These conventional heuristics cannot solve large problems well because of their
greedy approach. Owing to the hardness of the FUSP, an optimal solution in
polynomial time cannot be guaranteed. To address this problem, effective genetic
algorithm (GA) [7] and simulated annealing (SA) [8] algorithms are developed in
this article.
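The five heuristics of [6] reduce to different orderings of the fibers; as a hedged sketch (the fiber representation with "dist" and "load" fields is an assumption, and the AWLR evaluation of each resulting sequence is omitted), the sequences might be generated as:

```python
import random

def upgrade_sequence(fibers, policy, seed=0):
    """Order fibers for upgrade under one of the five heuristics of [6].

    `fibers` maps a fiber id to a dict with its 'dist' (length) and
    'load'; this representation is assumed for illustration only.
    """
    ids = list(fibers)
    if policy == "RS":                                   # Random Sequence
        random.Random(seed).shuffle(ids)
        return ids
    key = {
        "SDF": lambda f: fibers[f]["dist"],              # Shortest Distance First
        "LDF": lambda f: -fibers[f]["dist"],             # Longest Distance First
        "MaxLF": lambda f: -fibers[f]["load"],           # Maximum Load First
        "MinLF": lambda f: fibers[f]["load"],            # Minimum Load First
    }[policy]
    return sorted(ids, key=key)

# Hypothetical three-fiber instance matching the A-B-C example of Fig. 1.
fibers = {"AB": {"dist": 300, "load": 0.7},
          "BC": {"dist": 150, "load": 0.9},
          "AC": {"dist": 450, "load": 0.4}}
```

A GA or SA, by contrast, searches over permutations of `ids` directly rather than committing to one greedy ordering, which is why they can escape the limitations noted above.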
2 Related Works
2.1 MB-EONs
In MB-EONs, the transmission band is expanded to the C+L band, and even to
other bands (S, E, etc.) [9, 10]. Moreover, the demand routing method should be
re-developed by considering multi-band transmitting capability. Hence, the conven-
tional RMLSA problem in traditional EONs evolves into the routing, band, mod-
ulation format, and spectrum allocation (RBMLSA) problem within the context of
MB-EONs [11, 12]. For a given demand, the task is to find a lightpath between the
two end nodes, determine the band and modulation format, and allocate a set of FSs
on the network. These algorithms deal with pure MB-EONs, that is, networks in
which all nodes and fibers are MB-capable.
The network upgrade can be done at once or in stages. Since the upgrade of any
node or fiber interrupts transmission service, an all-at-once upgrade is almost
impossible to adopt on an existing backbone network with a high transmission
volume if service is to be provided continuously during the upgrade period. Thus, a
multi-stage upgrade strategy is more feasible [3]. In [3], the authors considered the
multi-stage cost-effectiveness of network expansion and adopted a multi-stage
upgrade strategy.
When choosing which optical fibers to upgrade, the authors considered not only
geographical location and network topology but also the annual traffic growth rate
and cost estimates. In [13], the authors considered the importance of fiber upgrade
options and explored the minimum cost of upgrading to support C+L bands under
the condition of transmission performance.
In [14, 15], the authors studied the resource allocation problem on hybrid SB/MB-
EONs. They suggested that after a partial upgrade, re-provisioning should be
performed on the lightpath to obtain the actual benefits of the upgrade. In my previous
study [6], the fiber upgrade sequence problem was first studied. Five algorithms
were used to solve the FUSP: (1) Random Sequence, (2) Shortest Distance First,
(3) Maximum Load First, (4) Minimum Load First (MLF), and (5) Longest Distance
First.
For the hybrid SB/MB network, the node set consists of two subsets, containing
the nodes with SB and MB transmission functions, respectively. The length of a
lightpath serving a demand must not exceed the limit imposed by the transmitted
signal on the hybrid SB/MB-EON. The estimation of these limits is based on the
table shown
in Table 1 [5]. In my previous article [16], the service provisioning algorithms for
the hybrid SB/MB-EONs were designed. The Least Load Ratio First (LLRF) algo-
rithm, focusing on balancing the band load across the entire network, yields optimal
performance.
3 Problem Formulation
This section outlines the notations, assumptions, and objective functions associ-
ated with the FUSP. The detailed problem formulation was stated in my previous
study [6].
3.1 Notations
• G = (V, E, dist, FSU): The physical network of the EON. V is the set of physical nodes, V = V_SB ∪ V_MB, where V_SB and V_MB represent nodes with SB and MB transmitting functions, respectively. E is the set of physical links, E = E_SB ∪ E_MB and E_SB ∩ E_MB = ∅, where E_SB and E_MB represent edges with SB and MB transmission functions, respectively. dist(e_i) is the length of link e_i ∈ E. Initially, V = V_SB, V_MB = ∅, E_SB = E, and E_MB = ∅.
• Λ_{|V|×|V|}: The traffic requirement of the network, where λ_sd ∈ Λ is the bandwidth of node pair (v_s, v_d).
• B: The set of available bands of the target MB-EON, where B = {C, L}, B = {C, L, S}, or B = {C, L, S, E}.
• FSU_bn: The total number of FSs provided by band bn ∈ B. Each FS provides 12.5 Gb/s, and FSU_C = 344, FSU_L = 480, FSU_S = 760, and FSU_E = 1,136.
• ML(m) ∈ {x | 1 ≤ x ≤ 6, x integer}: The modulation level of the modulation m ∈ M = {BPSK, QPSK, 8QAM, 16QAM, 32QAM, 64QAM}.
• TR(m, bn, B): The transparent reach (TR) of the selected modulation m ∈ M and band bn ∈ B. For the set of bands B, the value of TR(m, bn, B) can be determined from Table 1 [5].
• N_sd: The number of FSs of node pair (v_s, v_d).
Evolutionary Algorithms for Fibers Upgrade Sequence Problem … 51
3.2 Assumptions
The major requirement of the network upgrade problem considered in this paper is that there be no service disruption while the backbone is upgraded. To achieve this, all lightpaths passing through the fiber being upgraded must be re-routed onto other links that are not suspended; however, transmission delays may increase between node pairs.
When a fiber is selected to be upgraded, before performing the upgrade, the
following actions are performed:
• Using the establish-then-remove strategy, reroute all lightpaths of requests passing through this fiber to avoid service disruption.
• For all re-routed requests, the Least Load Ratio First (LLRF) algorithm [16] is performed on the current hybrid SB/MB-EON (with some upgraded nodes and fibers) to fully utilize the multi-band transmission functions and deliver traffic as efficiently as possible.
• Stop all transmissions on the fiber and install new MB-transceivers (MB-TRs) and
MB-ROADM equipment (or an MB-BS band switch) on the end nodes of the fiber.
• The quality of transmission signals for all bands on this fiber is tested.
• Perform the LLRF algorithm [16] to route the connection requests on the new
network so that the most efficient retransmission allocations are made.
In this article, in the t-th stage, the fiber e_t ∈ E is selected to be upgraded. Before upgrading the fiber e_t, the set (denoted Path_ta) of all lightpaths currently routed through e_t is re-routed. In the first part of the stage (denoted t_a), all requests are transmitted through Path_ta, and this part requires T_ta time units. In the second part of the stage (denoted t_b), after the fiber e_t is upgraded and before the next upgrade stage begins, all connection requests are reallocated to find the lightpath set Path_tb [6]. This part requires T_tb time units. Thus, there are |E| upgrade stages in total to upgrade all fibers, and the total time is Σ_{t=1,2,...,|E|} (T_ta + T_tb). The upgrading time axis and the respective sets of used lightpaths are shown in Fig. 2.
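The two-part stages described above can be sketched as a simple loop. The helper names, edge labels, and stage times below are illustrative, not from the paper:

```python
# Sketch of the multi-stage upgrade loop described above. The edge labels
# and stage times are invented for illustration.

def staged_upgrade(edges, upgrade_sequence, stage_times):
    """Upgrade one fiber per stage; return the MB edge set and total time.

    upgrade_sequence: a permutation of edge indices (the order of upgrades).
    stage_times: stage t -> (T_ta, T_tb), the durations of the two parts.
    """
    mb_edges = set()                     # E_MB, initially empty
    total_time = 0.0
    for t, e_idx in enumerate(upgrade_sequence, start=1):
        T_ta, T_tb = stage_times[t]
        # Part t_a: traffic rides on the re-routed lightpath set Path_ta
        # while edge e_t is taken out of service and upgraded.
        mb_edges.add(edges[e_idx])       # e_t now supports multi-band use
        # Part t_b: all requests are re-provisioned (e.g., by LLRF) on the
        # partially upgraded network, yielding Path_tb.
        total_time += T_ta + T_tb
    return mb_edges, total_time

edges = ["e1", "e2", "e3"]
times = {1: (2.0, 1.0), 2: (2.0, 1.0), 3: (3.0, 1.5)}
upgraded, total = staged_upgrade(edges, [0, 2, 1], times)
# after |E| = 3 stages, every edge is in E_MB and total = sum of all parts
```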
The upgrade sequence of the optical fibers affects both the set of used lightpaths and the spectrum utilization, so finding the best upgrade sequence is an important problem.
The main goal of this article is to determine the upgrade sequence of the fibers such that the overall performance of the network is optimal during the upgrade process. The problem is an optimal scheduling problem, and the objective function is the AWLR [6]. The overall performance takes into account the weights of the upgrade and non-upgrade times.
Let G_ta(V_ta, E_ta, dist, FSU) and G_tb(V_tb, E_tb, dist, FSU) represent the upgrading and the upgraded physical networks of stage t, and let LR_ta and LR_tb represent the load ratios of the network in stages t_a and t_b, respectively. Considering the upgrade time-weighted load ratio of the network, the objective function is defined as Equation (1) [6].
The load ratio (LR) signifies the FS utilization ratio within the network [6]. The LR of the network is defined as the ratio of the total number of used FSs to the total number of FSs provided by the network. The binary variable x_{ie}^{bn} is set to 1 if the i-th FS of edge e on band bn is occupied, and 0 otherwise. On the hybrid SB/MB-EON, the bands provided by the edges may differ; let B_e represent the set of all bands provided by edge e. The load ratio LR_e of edge e can be computed by
LR_e = ( Σ_{bn ∈ B_e} Σ_{i=1}^{FSU_bn} x_{ie}^{bn} ) / ( Σ_{bn ∈ B_e} FSU_bn ).   (2)
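As a toy illustration of Eq. (2), the FSU values below follow the paper, while the slot-occupancy counts are invented:

```python
# Toy computation of the per-edge load ratio LR_e in Eq. (2). FSU values
# follow the paper (12.5 Gb/s per FS); the occupancy counts are invented.

FSU = {"C": 344, "L": 480, "S": 760, "E": 1136}

def edge_load_ratio(bands, used_fs):
    """LR_e = occupied FSs over all bands in B_e / total FSs provided.

    bands: the set B_e of bands provided by the edge.
    used_fs: band -> number of occupied slots (count of x_ie^bn equal to 1).
    """
    used = sum(used_fs.get(bn, 0) for bn in bands)
    provided = sum(FSU[bn] for bn in bands)
    return used / provided

# an edge upgraded to C+L with 206 slots occupied in each band:
lr = edge_load_ratio(["C", "L"], {"C": 206, "L": 206})
# (206 + 206) / (344 + 480) = 0.5
```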
Thus, besides assessing the utilization ratio of the spectrum slots on each optical fiber, the aim is also to prevent the overloading of a single band. Therefore, the network utilization ratio LR_ta in the network upgrade stage t_a can be expressed as Equation (4) (the same formula can be deduced for stage t_b).
4 Proposed Algorithms
Conventional heuristic methods face challenges in solving large problem instances due to their greedy improvement approach. Given the complexity of the fiber upgrade sequence problem on MB-EONs, providing an optimal solution in polynomial time is impossible. To address real-world instances, this article introduces two effective algorithms: a genetic algorithm (GA) [7] and a simulated annealing (SA) algorithm [8]. In both algorithms, initial solutions are generated using previously proposed heuristics [6] or randomly generated upgrade sequences. The effectiveness of the proposed algorithms (GA and SA) is examined through numerical simulations.
In this subsection, I will provide more details about the GA designed to solve the
FUSP.
Chromosomal coding Since the FUSP involves determining the upgrade sequence of the fibers, the encoding uses a one-dimensional array of integers of size |E|. A sequence chromosome SC represents the upgrade sequence of all fibers: the value SC[t] identifies the fiber e_SC[t] ∈ E that is selected and upgraded to provide MB-EON transmission capability in the t-th stage. Note that SC is a permutation of the distinct integers {1, 2, ..., |E|}. Initially, the population of SCs is generated randomly, except for five SCs that are encoded by the algorithms presented in [6].
This encoding allows the GA to explore different fiber upgrade sequences as
potential solutions to the problem. Each chromosome in the population represents a
unique sequence in which the fibers will be upgraded over stages.
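A minimal sketch of this permutation encoding and population initialization follows; the single heuristic seed stands in for the five heuristic-generated SCs of [6]:

```python
import random

# Sketch of the permutation encoding: each chromosome SC is an ordering of
# the |E| fiber indices. The heuristic seed below is a stand-in for the
# five heuristic-generated chromosomes of [6].

def initial_population(num_edges, pop_size, heuristic_seeds=()):
    """Return pop_size chromosomes: the given seeds plus random permutations."""
    population = [list(sc) for sc in heuristic_seeds]
    while len(population) < pop_size:
        sc = list(range(1, num_edges + 1))
        random.shuffle(sc)
        population.append(sc)
    return population

pop = initial_population(num_edges=5, pop_size=8,
                         heuristic_seeds=[[1, 2, 3, 4, 5]])
# every chromosome is a permutation of the fiber indices {1, ..., 5}
```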
Fitness Function The GA maps objective costs to fitness values through a fitness function. Let α(SC) denote the objective cost of the chromosome SC, as defined by Equation (1). In the GA, a better-fit chromosome should have a higher probability of being selected as a parent, and this probability is proportional to its fitness. Since the objective is minimized, the cost is subtracted from a large number to obtain the fitness: fitness(SC) = SC_max − α(SC), where SC_max represents the maximum value of the cost function observed so far over all populations. This formulation ensures that the chromosome with the lowest cost (according to the objective function) has the highest fitness value.
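The cost-to-fitness transform can be sketched as follows; the costs are toy values, and selection_probabilities is an illustrative helper, not from the paper:

```python
# Sketch of the cost-to-fitness transform described above (toy costs).

def fitness_values(costs):
    """fitness(SC) = SC_max - alpha(SC), with SC_max the largest cost seen."""
    sc_max = max(costs)
    return [sc_max - c for c in costs]

def selection_probabilities(costs):
    """Selection probability proportional to fitness (roulette-wheel style)."""
    fits = fitness_values(costs)
    total = sum(fits)
    if total == 0:                 # all costs equal: nothing to prefer
        return [1.0 / len(fits)] * len(fits)
    return [f / total for f in fits]

probs = selection_probabilities([0.9, 0.7, 0.5])
# the cheapest chromosome (cost 0.5) receives the largest probability
```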
Crossover Operator The single-point crossover (SPC) is used in the proposed GA. First, two parents are selected at random, with probability based on their fitness values, for crossover. Second, a crossover point i is randomly selected in [1, |E|], and the traditional SPC is performed. After performing SPC, the resulting children's SCs may not be feasible sequences, since an edge may appear twice. In that case, each child SC is repaired into a feasible one by modifying the smaller portion of the child SC: the corresponding edge sequence is taken from the parent so that no edge is replicated.
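One common way to realize such an SPC with repair on permutations (the paper's exact repair rule may differ) is to keep one parent's prefix and fill the tail from the other parent in order, skipping duplicates:

```python
# SPC with a duplicate-skipping repair: keep parent A's prefix up to the
# crossover point, then fill the tail with parent B's edges in order,
# skipping any edge already present, so no edge is replicated.

def spc_repair(parent_a, parent_b, point):
    child = parent_a[:point]
    seen = set(child)
    child += [e for e in parent_b if e not in seen]
    return child

child = spc_repair([3, 1, 4, 2, 5], [5, 4, 3, 2, 1], point=2)
# child == [3, 1, 5, 4, 2]: prefix [3, 1] kept, tail filled from parent B
```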
Mutation or Perturbation Mechanism Four types of mutation operators are proposed and used in the GA. These operations are exactly the same as the perturbations developed for the SA. After performing a mutation (perturbation), the resulting SC still satisfies the constraints.
• Edge Exchanging Perturbation (EEP): First, two integers i and j (i ≠ j) in [1, |E|] are randomly selected. Then, the values of SC[i] and SC[j] are exchanged.
• Sub-sequence Reversing Perturbation (SSRP): First, two integers i and j in [1, |E|] (assume i < j) are randomly selected. Then, the sub-sequence between SC[i] and SC[j] (i.e., SC[i..j]) is reversed.
• Sub-sequence Shifting Perturbation (SSSP): First, a sub-sequence SC[i_1..i_2] (i_2 > i_1) and an integer j (assume j < i_1 or i_2 < j < |E| − (i_2 − i_1)) are randomly selected. The sub-sequence SC[i_1..i_2] is shifted to SC[j..(j + i_2 − i_1)], and the other contents are adjusted accordingly.
• Sub-sequence Exchanging Perturbation (SSEP): First, an integer i in [1, |E|] is randomly selected. Then, the two sub-sequences SC[1..i] and SC[i+1..|E|] are exchanged.
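The four operators can be sketched directly on a Python list (0-based indices here, unlike the 1-based description above); each returns a new permutation:

```python
# Sketches of the four perturbation/mutation operators on a chromosome SC.

def eep(sc, i, j):
    """Edge Exchanging: swap SC[i] and SC[j]."""
    out = sc[:]
    out[i], out[j] = out[j], out[i]
    return out

def ssrp(sc, i, j):
    """Sub-sequence Reversing: reverse SC[i..j]."""
    return sc[:i] + sc[i:j + 1][::-1] + sc[j + 1:]

def sssp(sc, i1, i2, j):
    """Sub-sequence Shifting: move SC[i1..i2] so that it starts at j."""
    sub, rest = sc[i1:i2 + 1], sc[:i1] + sc[i2 + 1:]
    return rest[:j] + sub + rest[j:]

def ssep(sc, i):
    """Sub-sequence Exchanging: swap SC[0..i] with SC[i+1..]."""
    return sc[i + 1:] + sc[:i + 1]

sc = [1, 2, 3, 4, 5]
# eep(sc, 0, 4)     -> [5, 2, 3, 4, 1]
# ssrp(sc, 1, 3)    -> [1, 4, 3, 2, 5]
# sssp(sc, 1, 2, 0) -> [2, 3, 1, 4, 5]
# ssep(sc, 1)       -> [3, 4, 5, 1, 2]
```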
Termination Rule The number of chromosomes in the chromosome pool is kept constant at N_population. The execution of the GA is halted when the number of generations N_generation surpasses a user-defined upper limit.
In this subsection, an SA algorithm is presented, and the details of the SA are described as follows:
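The overall shape of such an SA is the standard annealing loop. A minimal sketch follows, in which the cooling schedule, parameters, and toy cost are illustrative choices and not the paper's; the perturbation is an SSRP-style move, and the cost function stands in for the AWLR objective:

```python
import math
import random

def simulated_annealing(initial, cost, perturb,
                        t0=1.0, t_min=1e-3, alpha=0.95, iters_per_t=20):
    """Standard SA loop: accept worse moves with probability e^(-delta/T)."""
    current = best = initial
    temp = t0
    while temp > t_min:
        for _ in range(iters_per_t):
            cand = perturb(current)
            delta = cost(cand) - cost(current)
            if delta <= 0 or random.random() < math.exp(-delta / temp):
                current = cand
            if cost(current) < cost(best):
                best = current
        temp *= alpha                    # geometric cooling
    return best

def reverse_perturb(sc):
    """SSRP-style move: reverse a random sub-sequence of the chromosome."""
    i, j = sorted(random.sample(range(len(sc)), 2))
    return sc[:i] + sc[i:j + 1][::-1] + sc[j + 1:]

def displacement(sc):
    """Toy stand-in for the AWLR cost: distance from the sorted order."""
    return sum(abs(v - k - 1) for k, v in enumerate(sc))

random.seed(7)
best = simulated_annealing([5, 3, 1, 4, 2], displacement, reverse_perturb)
```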
5 Simulation Results
In the proposed GA, we set the crossover probability to 0.8, the population size to 50, and the number of generations to 30. However, the mutation probability p_m is a critical factor that can impact the GA's performance, particularly in relation to the traffic load. To identify the optimal value for p_m, we conducted simulations on the COST239 network, considering two load factors: 1 and 5. These simulations were performed on the target set B = {C, L, S, E}. The results, depicted in Fig. 4, show the average AWLR (denoted as p_m with the suffix a) and the best AWLR (denoted as p_m with the suffix b) for each generation. In the case of a light load (load factor 1), as illustrated in Fig. 4a, p_m = 0.15 yielded the optimal result, closely followed by the scenario with p_m = 0.0. For a heavier load (load factor 5), depicted in Fig. 4b, the optimal p_m was found to be 0.2, with the second-best scenario occurring at p_m = 0.15.
For the COST239 network and different sets B = {C, L}, B = {C, L, S}, and B = {C, L, S, E}, along with varying load factors {1, 1.5, ..., 5}, the simulation results of the GA, SA, and the initial heuristic algorithms are depicted in Fig. 5. Here, GA_initial represents the best AWLR of the initial SC obtained by applying the five heuristic algorithms [6]. Meanwhile, SA_initial is the AWLR of the initial SC determined randomly by the SA algorithm, serving as a baseline at 100%. GA_final and SA_final signify the final results of the GA and SA, respectively. In Fig. 5, the ratio of the AWLR of each method to that of SA_initial is compared. A summary over different load factors is presented in Table 2.
The results show that SA achieves better performance than the GA and the heuristic algorithms for the target sets B = {C, L, S, E} and B = {C, L, S}, shown in Fig. 5a, b, respectively. For the set B = {C, L}, the GA obtains the best result in most cases due to the lack of network resources (shown in Fig. 5c). On average, SA emerges as the superior algorithm, achieving a ratio of 95.7%.
Fig. 4 Simulation results of the GA for different p_m on the COST239 network: a load factor 1, b load factor 5
For the NSF14 network and different sets B, the results of the GA, SA, and the initial heuristic algorithms are presented in Fig. 6, and a summary over various load factors is provided in Table 2. The results show that SA achieves better performance than the GA and the heuristic algorithms in Fig. 6a. However, the average AWLRs of the GA and SA are close (95.0% vs. 95.1%). In some cases, the GA obtains a better AWLR than SA for B = {C, L, S} and B = {C, L}, shown in Fig. 6b, c, respectively.
It is apparent that the GA excels at finding the best AWLR among the chromosomes of the initial population, outperforming SA_initial. However, the crossover operation in the GA may contribute to increased diversity in the population, with perturbations affecting the chromosomes only through localized changes. While the GA's performance may be suboptimal on the COST239 network, it demonstrates an ability to achieve a better AWLR than SA on larger networks or networks with heavy traffic.
Fig. 5 Simulation results for different sets B on the COST239 network: a B = {C, L, S, E}, b B = {C, L, S}, c B = {C, L}
Fig. 6 Simulation results for different sets B on the NSF14 network: a B = {C, L, S, E}, b B = {C, L, S}, c B = {C, L}
6 Conclusions
This paper explores the fiber upgrade sequence problem (FUSP) within the context
of MB-EONs, intending to minimize the AWLR. To tackle this issue, we introduce
GA and SA algorithms and validate their effectiveness through simulations. Our
proposed algorithms exhibit superior performance when compared to traditional
heuristics. Notably, the GA surpasses the SA in smaller networks or those with high
traffic volumes. On the other hand, the SA outperforms the GA in terms of AWLR
on larger networks.
Acknowledgements This work was supported in part by the NSTC project under Grant Number
NSTC–111–2221–E–018–005 and NSTC–112–2221–E–018–008–MY2.
References
1 Introduction
G. P. Pal (B)
Department of Computer Science and Engineering, Jaypee Institute of Information Technology,
Noida, India
e-mail: ganeshpal1ster@gmail.com
R. Pal
Department of Computer Science and Engineering, School of Information and Communication
Technology, Gautam Buddha University, Greater Noida, India
e-mail: raju.pal@gbu.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 61
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_5
62 G. P. Pal and R. Pal
growth in medical data, the need for efficient and accurate image analysis techniques
has become more pronounced than ever [1].
Deep learning (DL), a subfield of machine learning (ML), has attracted considerable attention because of its ability to autonomously extract complex patterns and representations from data. Medical images, which are multi-dimensional and heterogeneous, are an ideal domain for applying deep learning techniques [2]. By integrating deep learning algorithms with medical image analysis (MIA), we can significantly enhance the precision, efficiency, and scope of clinical research and diagnostic efforts.
This paper's main goal is to undertake a thorough study of the potential of DL algorithms for analyzing medical images. We explore how these algorithms are disrupting the healthcare industry by examining their capabilities [3]. We also explore the rise of deep CNNs, which have become a dominant force in capturing complex image features and enabling precise analysis [4].
Our paper culminates in a thorough review of the current state of the art, accompanied by a critical discussion of emerging problems and future avenues of research. As a testament to the practicality of deep learning, we illuminate its application in the specific contexts of COVID-19 detection and child bone age prediction, showcasing the adaptability of deep learning algorithms to evolving medical needs [5].
In summary, this paper endeavors to unravel the potential of DL algorithms to revolutionize medical image processing (MIP). By scrutinizing their capabilities, contributions, and limitations, we aim to provide an extensive foundation for harnessing the power of deep learning in the pursuit of improved healthcare diagnostics and treatment strategies.
2 Literature Review
The use of DL algorithms for MIA has witnessed exponential growth, owing to their ability to extract complex details, features, and patterns. A multitude of studies have demonstrated the efficacy of convolutional neural networks (CNNs) in interpreting complex medical images. Figure 1 shows the schematic layout of a fundamental CNN. For instance, Smith et al. employed CNNs for robust lung nodule detection in pulmonary images, achieving remarkable sensitivity and specificity rates [12].
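The fundamental CNN layout (convolution, followed by a non-linearity and pooling) can be illustrated with a small dependency-free sketch; the image, kernel, and layer sizes below are invented for illustration, and real systems would use a framework such as PyTorch or TensorFlow:

```python
# Toy versions of the three building blocks of a fundamental CNN layer.

def conv2d(image, kernel):
    """'Valid' 2-D convolution (really cross-correlation, as in most DL libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(w)] for r in range(h)]

def relu(fmap):
    """Element-wise non-linearity max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]

def maxpool2(fmap):
    """Non-overlapping 2x2 max pooling."""
    return [[max(fmap[r][c], fmap[r][c + 1],
                 fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

# a 6x6 "image" with a bright vertical stripe, and a vertical-edge kernel
img = [[0, 0, 1, 1, 0, 0]] * 6
edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
features = maxpool2(relu(conv2d(img, edge)))
# the feature map responds strongly where the stripe's left edge lies
```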
Exploring the Potential of Deep Learning Algorithms in Medical Image … 63
Johnson et al. [13] harnessed CNNs to perform automated brain lesion segmentation
with unprecedented accuracy, aiding neurologists in timely diagnosis.
Moreover, retinal imaging, a critical tool for early disease detection, has been
augmented by deep learning. Smithson et al. employed deep learning models for
diabetic retinopathy detection, demonstrating substantial improvements in diagnostic
accuracy compared to traditional methods.
3 Research Challenges
Data extraction was performed systematically for each included study. Key infor-
mation was extracted, including study title, authors, publication year, deep learning
techniques employed, medical domain or application, primary findings, and identified
challenges. Extracted data were categorized based on the specific medical domains
addressed in the study [19].
It is important to acknowledge potential limitations related to the literature review
process. The analysis is contingent upon the quality, accuracy, and thoroughness of
the methodologies and findings presented in the included studies.
5 DL Applications in MIP
Deep learning has shown promise in accurately classifying lung nodules as malig-
nant or benign in pulmonary studies. The adoption of CNNs has led to enhanced
sensitivity and specificity in nodule detection, aiding clinicians in making critical
decisions for patient care [23].
Within the realm of digital pathology, deep learning algorithms have contributed
to the automated detection of cancerous tissue in histopathological images. The
capability of CNNs to identify subtle morphological variations has paved the way
for reliable and efficient cancer diagnosis [24].
6 Discussion
Deep learning algorithms have found extensive applications in medical image clas-
sification and disease detection. One notable use case is the early detection of breast
cancer through mammography images. By training convolutional neural networks
(CNNs) on annotated datasets, researchers have achieved high accuracy rates in
distinguishing malignant and benign lesions. Such applications are crucial for timely
interventions and improved patient outcomes [27].
The potential of deep learning extends to accurate organ segmentation for treat-
ment planning. In radiation therapy, precise delineation of tumor boundaries and
surrounding healthy tissues is imperative. Deep learning algorithms, particularly U-
Net architectures, have proven adept at segmenting organs and anatomical structures
with minimal human intervention [28]. This automation enhances treatment planning
accuracy and reduces patient risk.
Deep learning algorithms serve as valuable tools in computer-aided diagnosis,
assisting clinicians in decision-making processes. In retinal imaging, for instance,
deep learning models can automatically detect diabetic retinopathy and provide
grading for disease severity [29]. Such assistance streamlines the diagnostic process,
especially in regions with limited access to specialized healthcare professionals.
8 Future Scope
Deep learning models can inadvertently inherit biases present in training data,
potentially leading to disparities in diagnosis and treatment. Mitigating bias and
ensuring fairness in algorithmic predictions is an ongoing challenge that necessitates
transparent data collection, model auditing, and bias-mitigation techniques [42].
The evolving field of deep learning continues to expand the possibilities in medical
image processing [43]. Emerging applications in genomics, 3D imaging, point-of-
care diagnostics, and multimodal data fusion hold promise [44]. The integration of
AI in real-time image analysis during surgical procedures is also an exciting frontier
[45].
9 Conclusion
References
1. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, Sánchez CI (2017) A
survey on deep learning in medical image analysis. Med Image Anal 42:60–88
2. Shen D, Wu G, Suk HI (2017) Deep learning in medical image analysis. Annu Rev Biomed
Eng 19:221–248
3. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S (2017) Dermatologist-
level classification of skin cancer with deep neural networks. Nature 542(7639):115–118
4. Rajpurkar P, Irvin J, Ball RL, Zhu K, Yang B, Mehta H, Ng AY (2018) Deep learning for chest
radiograph diagnosis: a retrospective comparison of the CheXNeXt algorithm to practicing
radiologists. PLoS Med 15(11):e1002686
5. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
6. Huang G, Liu Z, van der Maaten L, Weinberger KQ (2017) Densely connected convolutional
networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition
(CVPR), pp 2261–2269
7. Dhiman G, Vinoth Kumar V, Kaur A, Sharma A (2021) Don: deep learning and optimization-
based framework for detection of novel coronavirus disease using x-ray images. Interdiscipl
Sci: Comput Life Sci 13:260–272
8. Narin A, Kaya C, Pamuk Z (2021) Automatic detection of coronavirus disease (covid-19) using
x-ray images and deep convolutional neural networks. Pattern Anal Appl 24:1207–1220
9. Srinidhi CL, Ciga O, Martel AL (2021) Deep neural network models for computational
histopathology: a survey. Med Image Anal 67:101813
10. Sedik A, Hammad M, Abd El-Samie FE, Gupta BB, Abd El-Latif AA (2021) Efficient deep
learning approach for augmented detection of Coronavirus disease. Neural Comput Appl 1–18
11. Shankar K, Perumal E (2021) A novel hand-crafted with deep learning features based on a
fusion model for COVID-19 diagnosis and classification using chest X-ray images. Complex
Intell Syst 7(3):1277–1293
12. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image
segmentation. In: International conference on medical image computing and computer-assisted
intervention (MICCAI), pp 234–241
13. Litjens G, Ciompi F, Sánchez CI (2019) A survey on deep learning in medical image analysis—
top 100 cited papers. Med Image Anal 58:101563
14. Zhu J, Park T, Isola P, Efros AA (2017) Unpaired image-to-image translation using cycle-
consistent adversarial networks. In: Proceedings of the IEEE International conference on
computer vision (ICCV), pp 2242–2251
15. Chen LC, Papandreou G, Schroff F, Adam H (2018) Rethinking atrous convolution for semantic
image segmentation. arXiv preprint arXiv:1706.05587
16. Ibrahim DM, Elshennawy NM, Sarhan AM (2021) Deep-chest: multi-classification deep
learning model for diagnosing COVID-19, pneumonia, and lung cancer chest diseases. Comput
Biol Med 132:104348
17. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, van der Laak JA, van
Ginneken B, Sánchez CI (2017) A survey on deep learning in medical image analysis. Med
Image Anal 42:60–88
18. Kim J, Hong J, Park H (2018) Prospects of deep learning for medical imaging. Precis Future
Med 2(2):37–52
19. Liu X, Gao K, Liu B, Pan C, Liang K, Yan L, Ma J et al (2021) Advances in deep learning-based
medical image analysis. Health Data Sci 2021
20. Kieu ST, Hwa AB, Hijazi MHA, Kolivand H (2020) A survey of deep learning for lung disease
detection on medical images: state-of-the-art, taxonomy, issues and future directions. J Imaging
6(12):131
21. Malhotra P, Gupta S, Koundal D, Zaguia A, Enbeyle W (2022) Deep neural networks for
medical image segmentation. J Healthc Eng 2022
22. Wang J, Zhu H, Wang S-H, Zhang Y-D (2021) A review of deep learning on medical image
analysis. Mobile Netw Appl 26:351–380
23. Chen X, Wang X, Zhang K, Fung K-M, Thai TC, Moore K, Mannel RS, Liu H, Zheng B, Qiu Y
(2022) Recent advances and clinical applications of deep learning in medical image analysis.
Med Image Anal 79:102444
24. Wang R, Lei T, Cui R, Zhang B, Meng H, Nandi AK (2022) Medical image segmentation using
deep learning: a survey. IET Image Proc 16(5):1243–1267
25. Durga Prasad Jasti V, Zamani AS, Arumugam K, Naved M, Pallathadka H, Sammy F, Raghu-
vanshi A, Kaliyaperumal K (2022) Computational technique based on machine learning and
image processing for medical image analysis of breast cancer diagnosis. Secur Commun Netw
2022:1–7
26. Suganyadevi S, Seethalakshmi V, Balasamy K (2022) A review on deep learning in medical
image analysis. Int J Multimedia Inf Retrieval 11(1):19–38
27. Qureshi I, Yan J, Abbas Q, Shaheed K, Riaz AB, Wahid A, Jan Khan MW, Szczuko P (2022)
Medical image segmentation using deep semantic-based methods: a review of techniques,
applications and emerging trends. Inf Fusion
28. Tchito Tchapga C, Mih TA, Kouanou AT, Fonzin TF, Fogang PK, Mezatio BA, Tchiotsop
D (2021) Biomedical image classification in a big data architecture using machine learning
algorithms. J Healthc Eng 2021:1–11
29. Ma J, Song Y, Tian X, Hua Y, Zhang R, Wu J (2020) Survey on deep learning for pulmonary
medical imaging. Front Med 14:450–469
30. Kaur A, Singh Y, Neeru N, Kaur L, Singh A (2022) A survey on deep learning approaches to
medical images and a systematic look up into real-time object detection. Arch Comput Methods
Eng 1–41
31. Liu J, Pan Y, Li M, Chen Z, Tang L, Lu C, Wang J (2018) Applications of deep learning to
MRI images: a survey. Big Data Mining Anal 1(1):1–18
32. Giger ML (2018) Machine learning in medical imaging. J Am Coll Radiol 15(3):512–520
33. Maier A, Syben C, Lasser T, Riess C (2019) A gentle introduction to deep learning in medical
image processing. Z Med Phys 29(2):86–101
34. Arabahmadi M, Farahbakhsh R, Rezazadeh J (2022) Deep learning for smart healthcare—a
survey on brain tumor detection from medical imaging. Sensors 22(5):1960
35. Pesapane F, Codari M, Sardanelli F (2018) Artificial intelligence in medical imaging: threat
or opportunity? Radiologists again at the forefront of innovation in medicine. Eur Radiol Exp
2:1–10
36. Wang L, Wang H, Huang Y, Yan B, Chang Z, Liu Z, Zhao M, Cui L, Song J, Li F (2022) Trends
in the application of deep learning networks in medical image analysis: evolution between
2012 and 2020. Eur J Radiol 146:110069
37. Li Z, Dong M, Wen S, Hu X, Zhou P, Zeng Z (2019) CLU-CNNs: object detection for medical
images. Neurocomputing 350:53–59
38. Severn C, Suresh K, Görg C, Choi YS, Jain R, Ghosh D (2022) A pipeline for the implementation
and visualization of explainable machine learning for medical imaging using radiomics features.
Sensors 22(14):5205
39. Ebied M, Elmisery FA, El-Hag NA, Sedik A, El-Shafai W, El-Banby GM, Soltan E et al (2023)
A proposed deep-learning-based framework for medical image communication, storage and
diagnosis. Wirel Pers Commun 131(4):2331–2369
40. Tuyet VTH, Binh NT, Quoc NK, Khare A (2021) Content based medical image retrieval based
on salient regions combined with deep learning. Mobile Netw Appl 26:1300–1310
41. Sharif MI, Li JP, Khan MA, Saleem MA (2020) Active deep neural network features selection
for segmentation and recognition of brain tumors using MRI images. Pattern Recogn Lett
129:181–189
42. Chola C, Mallikarjuna P, Muaad AY, Bibal Benifa JV, Hanumanthappa J, Al-antari MA (2021)
A hybrid deep learning approach for COVID-19 diagnosis via CT and X-ray medical images.
Comput Sci Math Forum 2(1):13
43. Cao X, Fan J, Dong P, Ahmad S, Yap P-T, Shen D (2020) Image registration using machine and
deep learning. In: Handbook of medical image computing and computer assisted intervention.
Academic Press, pp 319–342
44. Rukundo O (2023) Effects of image size on deep learning. Electronics 12(4):985
45. Farzaneh N, Stein EB, Soroushmehr R, Gryak J, Najarian K (2022) A deep learning frame-
work for automated detection and quantitative assessment of liver trauma. BMC Med Imaging
22(1):39
Comparative Analysis of Image
Enhancement Techniques: A Study
on Combined and Individual Approaches
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 71
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_6
72 A. Bhaskar and B. Joshi
and objective evaluation process. Utilize objective metrics to quantify the perfor-
mance of each technique, allowing for a precise evaluation and comparison, leading
to data-driven conclusions.
Identification of Optimal Methods: Identify the three most effective image processing methods by rigorously evaluating them based on PSNR, compression ratio, and MSE. The methods selected on these parameters are expected to excel in various digital image enhancement and transformation tasks. Identify the most robust and versatile techniques based on quantitative assessments, establishing a foundation for their strategic application in diverse image processing scenarios.
Insights Enrichment: Offer invaluable insights into the nuances of image processing,
enhancing the understanding of these techniques. Through qualitative assessments,
provide nuanced insights that contribute to a deeper comprehension of image
processing methodologies, fostering an enriched understanding of their practical
applications and limitations.
2 Literature Survey
In paper [1] Bipin Nair et al., the study focuses on preserving ancient Kannada
rock inscriptions containing valuable historical and mythological information from
temples. By employing a novel stage-wise processing approach involving smoothing,
sharpening, noise removal, outlier detection, and thresholding, the work transforms
degraded inscriptions into readable formats. Addressing challenges like oil stains,
erosion, and uneven illumination, the proposed model achieves an impressive 95%
accuracy, outperforming existing methods such as Otsu and Sauvola. This research
ensures the preservation of precious historical data inscribed in rocks, contributing
significantly to cultural heritage conservation.
In paper [3] Orhei et al., the rapid growth of digital photography in mobile devices
has led to diverse image sensors and quality issues due to hardware constraints.
To address this, extensive research has focused on image denoising and sharp-
ening techniques. Drawing inspiration from successful applications of dilated filters
in edge detection, this study introduces an innovative approach. By integrating
dilated filters into traditional sharpening algorithms like High Pass Filter (HPF) and
Unsharp Masking (UM), significant improvements have been achieved both visu-
ally and statistically. This modification offers enhanced image sharpness, surpassing
outcomes from conventional methods.
In paper [2] Burhan et al., this study addresses challenges in underwater image
quality caused by light scattering and absorption. Focusing on marine biology and
archaeology, the research employs color correction techniques, specifically gamma
correction and image sharpening, after compensating for color imbalances. Evaluation across various image conditions (bluish, greenish, foggy) reveals significant
improvements. Statistical metrics including Information Entropy (IE), Underwater
Color Image Quality Metric (UCIQM), and Underwater Image Quality Measure
(UIQM) demonstrate that the image sharpening algorithm outperforms gamma
correction, highlighting its efficacy in enhancing underwater image quality.
In paper [4] Panda et al., this study introduces an innovative interpolation tech-
nique to enhance low-resolution images, addressing blurring introduced during up-
sampling. A post-processing method is applied to eliminate blurring artifacts. The
approach identifies high-frequency degradation due to interpolation, sharpening
degraded edges using a high-order Laplacian filter. Experimental results demon-
strate its superiority over existing methods, showcasing improved image quality in
various natural images.
In paper [5] Xu et al., this research introduces IRLIC, an innovative image
compression framework employing invertible resampling-based layered image
compression. It utilizes flow-based generative models and invertible neural networks
(INN) to achieve effective image compression. By splitting images into down-
sampled and high-frequency parts, rescaling is applied symmetrically using INN.
This method outperforms existing approaches like BPG and other learning-based
compression, especially at bit rates below 1.8 bpp, ensuring superior image quality
in low-resolution scenarios.
In paper [6] Baba Fakruddin Ali et al., this paper delves into the growing need
for data compression in modern communication technologies. Focusing on image
compression, it addresses challenges in virtual photograph transmission and infor-
mation storage for various applications like satellite remote sensing and medical
imaging. The study emphasizes the integration of machine learning principles into
image compression algorithms, aiming to rank and analyze their effectiveness,
bridging the gap between image compression and machine learning strategies.
In paper [9] Liu et al., this paper tackles image denoising challenges using
spectrum theory and graph signal analysis. By treating images as graph signals,
the study introduces a novel denoising method leveraging graph Laplacian matrix
and frequency domain low-pass filtering. This innovative approach enhances image
smoothness by incorporating signal priors. Experimental results affirm its effec-
tiveness, surpassing traditional methods like Wiener and Gaussian filtering. The
study underscores the potential of graph signal-based techniques in revolutionizing
complex noise reduction tasks, offering a promising avenue in image processing.
In paper [10] Kuttan et al., in the fast-paced twenty-first century, the demand for
high-resolution images has surged due to expanding human activities. However, envi-
ronmental changes often introduce noise, making it challenging for researchers to
denoise images effectively. Image denoising is crucial for producing clear, appealing
images by recovering degraded pixel values. This research explores denoising tech-
niques, algorithms, and applications across domains, addressing the challenge of
noise-induced blurriness and distortion in images and videos.
In paper [7] Joshi et al., this paper introduces an innovative approach enhancing
morphological operators, crucial in applications like medical image analysis. By
Comparative Analysis of Image Enhancement Techniques: A Study … 75
3 Overview of Methods
4 Proposed Methodology
In contrast to the traditional methods explored in the previous section, our proposed
methodology introduces a comprehensive approach that integrates three funda-
mental image enhancement techniques: Compression, Smoothing, and Denoising.
This novel amalgamation aims to address multiple challenges in digital imaging,
ensuring not only noise reduction but also optimal preservation of essential image
features.
1. Compression
Efficient data storage and rapid transmission are pivotal in digital image processing.
Our methodology incorporates advanced compression techniques, optimizing data
compression ratios. By utilizing innovative compression algorithms, we minimize
storage requirements while safeguarding image fidelity, enabling seamless data
transmission in various applications.
2. Smoothing
Smoothing techniques play a crucial role in refining image quality by reducing noise
and enhancing visual aesthetics. Our methodology employs sophisticated smoothing
algorithms, such as Gaussian Blur, to ensure images attain a balanced and visu-
ally appealing texture. By eliminating unwanted artifacts, the smoothing component
enhances overall image coherence.
3. Denoising
Denoising is paramount in eliminating noise artifacts without compromising vital
image details. As shown in Fig. 1, our proposed methodology integrates cutting-
edge denoising methods, preserving essential features while achieving optimal noise
reduction. By leveraging techniques like Non-Local Means Denoising, our approach
ensures clarity and precision in the processed images.
The synergy between Compression, Smoothing, and Denoising forms the founda-
tion of our proposed methodology. This innovative integration aims to strike a delicate
balance between noise reduction and feature preservation, enhancing overall image
quality. By combining these techniques, our approach addresses the intricacies of
real-world image processing challenges, offering a versatile and effective solution
for various applications.
In the subsequent sections, we delve into the experimental results and comparative
analyses, validating the efficacy of our proposed methodology against traditional and
contemporary image enhancement techniques.
5 Basis of Analysis
Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and compres-
sion ratio serve as fundamental metrics in evaluating the effectiveness of image
enhancement techniques.
Mean Squared Error (MSE) quantifies the average squared difference between
the original and enhanced images, providing a numerical measure of reconstruction
accuracy. A lower MSE value indicates closer resemblance between the enhanced
image and the original, signifying superior enhancement quality.
Peak Signal-to-Noise Ratio (PSNR), on the other hand, measures the quality of
the enhanced image by comparing it to the original in terms of signal-to-noise ratio. It
is particularly useful in determining the extent to which noise or distortion has been
introduced during enhancement. Higher PSNR values indicate lower noise levels,
indicating enhanced image fidelity.
Compression ratio, a crucial parameter in digital image processing, measures the
reduction in data size achieved through compression techniques. A higher compres-
sion ratio signifies more efficient data storage and transmission, essential for various
applications.
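The three metrics can be computed concretely as follows (a minimal NumPy sketch; defining the compression ratio as original size over compressed size is an assumption, since the paper does not state its exact formula):

```python
import numpy as np

def mse(original: np.ndarray, enhanced: np.ndarray) -> float:
    """Mean Squared Error between two images of identical shape."""
    diff = original.astype(np.float64) - enhanced.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(original: np.ndarray, enhanced: np.ndarray, max_val: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB; higher means less distortion."""
    err = mse(original, enhanced)
    if err == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / err))

def compression_ratio(original_bytes: int, compressed_bytes: int) -> float:
    """Original size over compressed size; values above 1 mean the data shrank."""
    return original_bytes / compressed_bytes
```

For example, a uniform error of 10 gray levels on an 8-bit image gives MSE = 100 and PSNR ≈ 28.13 dB.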
By employing these metrics, we can precisely quantify the differences between the
Proposed Method, integrating Compression, Smoothing, and Denoising, and existing
basic image enhancement techniques. MSE and PSNR allow for a detailed assess-
ment of image fidelity, ensuring that the Proposed Method maintains or enhances
image quality. Additionally, the compression ratio metric ensures efficient utilization
of resources, providing a holistic view of the performance of the Proposed Method
in comparison with traditional techniques. These parameters enable a rigorous
comparison, elucidating the superiority of the Proposed Method over basic image
enhancement techniques.
6 Comparative Analysis
The comparative analysis section of our study unveils compelling insights derived
from the extensive evaluation of image enhancement techniques, both traditional and
innovative. The comparative analysis is based on the Recognize Animals dataset published by Prateek on Kaggle, which comprises 4666 animal images and is also suitable for ML-based applications.
As depicted in Fig. 2, which illustrates the Mean Squared Error (MSE) values,
it becomes evident that the Proposed Method outperforms basic enhancement tech-
niques significantly. Lower MSE values, indicating a closer resemblance to the orig-
inal image, are consistently observed in the Proposed Method, validating its superior
denoising and fidelity-preserving capabilities. Table 1 lists the MSE values obtained in our experiments; since lower is better for this metric, the results confirm the advantage of the Proposed Method.
Moving forward, Fig. 3 demonstrates the Peak Signal-to-Noise Ratio (PSNR)
values, which corroborate our findings. Higher PSNR values in the Proposed Method
reveal its ability to maintain enhanced image clarity while minimizing noise levels,
surpassing the outcomes obtained by traditional methods. This observation under-
scores the effectiveness of the Proposed Method in preserving image quality during
the enhancement process. Table 2 lists the PSNR values obtained in our experiments; since higher is better for this metric, the results again favor the Proposed Method.
Additionally, Fig. 4 provides a visual representation of the compression ratio,
a vital metric in digital image processing. The graph underscores the efficiency of
the Proposed Method in data storage and transmission, as evidenced by the higher
compression ratios achieved. This efficiency is essential for practical applications
where optimized resource utilization is paramount. Table 3 shows the values extracted
by us through examination, here the value must be as minimum as possible which
can be said to be true for our Proposed Method.
Our results not only validate the superiority of the Proposed Method but also
highlight the limitations of basic image enhancement techniques. The amalgamation
of Compression, Smoothing, and Denoising in the Proposed Method has yielded a
holistic solution that transcends the constraints of traditional methods. These observa-
tions, coupled with the robust data presented, provide a foundation for future research
in the domain of image enhancement and underscore the practical significance of our
findings. In essence, the comparative analysis presented here paints a vivid picture of
the efficacy of the Proposed Method, positioning it as a pioneering advancement in
the field of digital image processing. As the Proposed Method achieves better results across all the evaluated parameters, its accuracy can be regarded as high relative to the traditional methods it was compared against.
7 Conclusion
References
1. Bipin Nair BJ, Anusha MU, Anusha J (2022) A novel stage wise denoising approach on ancient Kannada script from rock images. In: 2022 7th International conference on communication and electronics systems (ICCES). Coimbatore, India, pp 1715–1723. https://doi.org/10.1109/ICCES54183.2022.9835997
2. Burhan S, Sadiq A (2022) Two methods for underwater images color correction: gamma correction and image sharpening algorithms. In: 2022 Fifth college of science international conference of recent trends in information technology (CSCTIT). Baghdad, Iraq, pp 31–35. https://doi.org/10.1109/CSCTIT56299.2022.1014558
3. Orhei C, Vasiu R (2022) Image sharpening using dilated filters. In: 2022 IEEE 16th International symposium on applied computational intelligence and informatics (SACI). Timisoara, Romania, pp 000117–000122. https://doi.org/10.1109/SACI55618.2022.9919568
4. Panda J, Meher S (2022) A novel image upscaling method using high order error sharpening. In: 2022 IEEE 6th conference on information and communication technology (CICT). Gwalior, India, pp 1–6. https://doi.org/10.1109/CICT56698.2022.9997936
5. Xu Y, Zhang J (2021) Invertible resampling-based layered image compression. In: 2021 Data compression conference (DCC). Snowbird, UT, USA, p 380. https://doi.org/10.1109/DCC50243.2021.00064
6. Baba Fakruddin Ali BH, Prakash R (2021) Overview on machine learning in image compression techniques. In: 2021 Innovations in power and advanced computing technologies (i-PACT). Kuala Lumpur, Malaysia, pp 1–8. https://doi.org/10.1109/i-PACT52855.2021.9696987
7. Joshi N, Jain S (2020) A robust approach for application of morphological operations on MRI. In: 2020 8th International conference on reliability, infocom technologies and optimization (trends and future directions) (ICRITO). Noida, India, pp 585–589. https://doi.org/10.1109/ICRITO48877.2020.9198011
8. Rahmayuna N, Adi K, Kusumaningrum R (2021) Tableware ceramics defect detection using morphological operation approach. In: 2021 4th International seminar on research of information technology and intelligent systems (ISRITI). Yogyakarta, Indonesia, pp 412–416. https://doi.org/10.1109/ISRITI54043.2021.9702806
9. Liu M, Wei Y (2019) Image denoising using graph-based frequency domain low-pass filtering. In: 2019 IEEE 4th International conference on image, vision and computing (ICIVC). Xiamen, China, pp 118–122. https://doi.org/10.1109/ICIVC47709.2019.8980994
10. Kuttan DB, Kaur S, Goyal B, Dogra A (2021) Image denoising: pre-processing for enhanced subsequent CAD analysis. In: 2021 2nd International conference on smart electronics and communication (ICOSEC). Trichy, India, pp 1406–1411. https://doi.org/10.1109/ICOSEC51865.2021.9591779
11. Gudkov V, Moiseev I (2020) Image smoothing algorithm based on gradient analysis. In: 2020 Ural symposium on biomedical engineering, radioelectronics and information technology (USBEREIT). Yekaterinburg, Russia, pp 403–406. https://doi.org/10.1109/USBEREIT48449.2020.9117646
12. Khetkeeree S, Thanakitivirul P (2020) Hybrid filtering for image sharpening and smoothing simultaneously. In: 2020 35th International technical conference on circuits/systems, computers and communications (ITC-CSCC). Nagoya, Japan, pp 367–371
Smishing: A SMS Phishing Detection
Using Various Machine Learning
Algorithms
Abstract Amid the pandemic, there has been a steep rise in cybercrimes against
individuals and corporations, making implementing security measures even more
imperative. This paper proposes a machine learning-based approach for detecting
phishing SMS threats using datasets of manually made SPAM and HAM texts. Addi-
tionally, pre-existing link datasets are used for training and testing spam and ham
links. Furthermore, a cloud-hosted application is developed as a proof of concept, capable of detecting malicious URLs and SMS. The VirusTotal API is integrated with the application for detecting harmful URLs using existing datasets. The datasets are evaluated using random forest (RF), long short-term memory (LSTM), logistic regression (LR), and support vector machine (SVM) algorithms, assessed in terms of precision, recall, and F1-score to ensure efficiency in distinguishing between legitimate and spam messages. This paper enhances SMS phishing protection using machine learning
advancements, demonstrating robust defense against phishing attempts, suggesting
widespread integration into mobile security frameworks.
1 Introduction
In today’s digital era, cell phones have become a vital part of modern life, with over
2.68 billion users worldwide [1]. These devices offer features like Short Message Service (SMS) and internet access, which have become vital parts of daily life [8]. The global cellular messaging market grew from 179.2 billion USD in 2010 to 253 billion USD in 2014, driving up SMS revenues. Moreover, internet usage in 2020 rose by 23% compared to 2019 owing to the widespread use of smartphones [5, 32]. But on the flip side, the rise in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 83
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_7
84 P. Prajapati et al.
the internet has led to an upsurge in cyber fraud, with the phishing attack ratio being
the highest at 69.2% during and after COVID-19 [3]. This convenience has thus come at the cost of a steep rise in cyber fraud, particularly phishing attacks [20]. According to
authors in [25], phishing is a cybercrime where cybercriminals deceive individuals
into revealing confidential information by pretending to be someone legitimate. It
has evolved into a significant cybersecurity threat. This paper focuses on phishing
SMS due to its prevalence in crucial bank messages and the lack of awareness among
elderly people, who rely heavily on SMS or WhatsApp for communication, mak-
ing them potential targets for phishing scams. Detecting phishing SMS messages
is crucial for safeguarding personal privacy, security, and business integrity. Fraud
messages sent via SMS pose a serious risk to individuals and businesses because they
use mobile communication channels to deceive recipients into providing private data
and compromising integrity. Furthermore, in the year 2022, over 300 billion phish-
ing SMS were sent, targeting individuals and organizations worldwide, resulting in
billions of dollars in financial losses [7].
Therefore, this research aims to develop a real-time system for detecting SMS
phishing messages using various machine learning algorithms. It also proposes a
comprehensive approach to detecting fake text messages that appear to come from a trusted source, the basic idea behind spoofing and identity theft for illicit gains. Our work will be capable of extracting specific features of SMS messages,
enhancing their adaptability and effectiveness in identifying various phishing tech-
niques. Our work also aims to evaluate and compare the performance of different
machine learning algorithms in phishing SMS detection, identifying strengths and
weaknesses to provide insights into the optimal use of machine learning in combating
phishing attacks.
This paper elucidates the solution to existing phishing attacks, focusing on their
evolution and techniques. It details the research methodology, including data collec-
tion and developing a dataset to address the challenge of creating a realistic dataset
for phishing SMS where privacy concerns are mitigated by employing synthetic data
generation techniques. These methods involve creating artificial but realistic samples
that preserve the statistical characteristics of genuine SMS messages while avoid-
ing the use of sensitive personal information. Additionally, to enrich the existing
datasets Spam SMS Prediction [18] and Malicious URLs [29] with authentic ham
(non-phishing) and spam samples, SMS messages were collected from end users
with their permission, ensuring that ethical standards are maintained throughout the
dataset creation process. We added 2209 SMS from the end user to the existing
Spam SMS Prediction dataset [18] which initially had 5576 entries of ham or spam
SMS. This approach allows for the development and evaluation of robust phish-
ing detection models while upholding privacy and ethical considerations. Later, the
dataset is trained and tested using machine learning algorithms for feature extraction.
Subsequently, experimental results are presented by evaluating the performance of
the proposed methodologies on real-world datasets. Further details are provided in
subsequent sections.
2 Motivation
In today’s world, internet usage has become a big part of everyone’s lives, but not
everyone knows how to stay safe online. In particular, online criminals target elderly users [4] due to their lack of knowledge about the internet and its risks. Therefore, phish-
ing awareness training can help them identify phishing messages and avoid attacks.
As per the Internet Crime Complaint Center (IC3), Federal Bureau of Investigation
(FBI) [16], 300,497 victims registered complaints regarding phishing attempts in 2022, resulting in losses of $52,089,159. Not only for this rea-
son, but the elderly are at the top of fraud lists because they lack awareness about
technology and internet threats, and they also become victims of financial crime
through communications such as bank notifications and OTP, which are frequently
sent via text message. Therefore, this work is driven by a strong desire to protect
people, especially those who might not be as familiar with online safety. This work
includes developing an application that segregates malicious from non-malicious text messages and alerts users to spam. This will result in the creation of effective defenses
against these clever scams, keeping consumers safe online.
3 Related Work
The detection and mitigation of phishing attacks through SMS messages have been
the subject of research due to their high threat ratio in the digital landscape. In this
section, the paper review comprises key studies and approaches that have signif-
icantly contributed to the field of phishing SMS detection. The authors reviewed
and compared various machine learning techniques in [10]. They have employed
machine learning algorithms to detect fake SMS, utilizing behavior and signature
detection techniques, and sending the collected data from mobile devices to a server for
spam detection. The paper utilizes deep belief networks (DBNs) to compare the algo-
rithms, whereas we have used recurrent neural networks (RNNs) and deep learning
algorithms. DBNs are pre-trained unsupervised, whereas RNNs are trained sequen-
tially. Likewise, in this work, we have tested the datasets Spam SMS Prediction [18]
and Malicious URLs [29] using various algorithms, namely logistic regression (LR),
long short-term memory (LSTM), random forest (RF), and support vector machine
(SVM). Varied machine learning models are employed to improve rule-based detection accuracy and adapt to different parameters. These algorithms have a large training capacity and readily learn the patterns of a specific dataset.
This research focuses on the systematic evaluation of spam and ham SMS datasets
to build an effective detection system. The workflow is divided into five distinct phases,
including supervision, preprocessing, model evaluation, training, and testing. Super-
vision involves human experts manually labeling a significant portion of the dataset
as spam or ham, serving as a reference for training and testing the models. More-
over, preprocessing involves cleaning and organizing the data for effective analysis,
using techniques like removing duplicates, handling missing values, and standard-
izing formats. Text normalization methods are employed to ensure consistency in
textual content, whereas the model evaluation uses metrics like accuracy, precision,
recall, and the F1-score for robust evaluation. Furthermore, training involves using
machine learning algorithms to learn patterns distinguishing spam from legitimate
messages from a dataset. The algorithms used in this work are LSTM, LR, RF, and
SVM. In the last stage, testing involves using a separate portion of the dataset unseen
during training to evaluate the performance of the trained model. While all the algorithms follow a common classification workflow, each one exhibits unique characteristics. LSTMs [30] are recurrent neural networks that excel
at handling long-term dependencies within sequential data. They capture and utilize
information over extended sequences, making them ideal for tasks requiring con-
textual understanding. LSTMs use specialized gating mechanisms, including input,
forget, and output gates, to regulate information flow and adaptively learn patterns.
They also address the vanishing or exploding gradient problem, making them useful
for sequential data analysis and prediction, whereas LR [31] is a linear classification
algorithm. It models the probability of a binary outcome, making it suitable for binary
classification tasks. It is interpretable and easy to implement. Subsequently, RF [28] is
an ensemble learning method composed of multiple decision trees. It combines their
outputs for more accurate and stable predictions. It is resistant to overfitting, handles
high-dimensional data well, and provides feature importance scores. However, it may
be computationally intensive due to its ensemble nature. Furthermore, SVM [14] is a
powerful algorithm for both linear and nonlinear classification tasks. It separates data
points with a clear margin, aiming to maximize the margin’s width. SVM is effective
in high-dimensional spaces and can handle complex decision boundaries. It may be
sensitive to hyperparameter tuning. Using all these above-mentioned algorithms, the
spam and ham SMS datasets are trained and tested.
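The supervision, preprocessing, training, and evaluation phases described above can be sketched with scikit-learn. The tiny inline corpus and the choice of TF-IDF features are illustrative assumptions standing in for the study's actual dataset and feature extraction:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support

# Toy labeled corpus (1 = spam, 0 = ham); a stand-in for the real dataset.
texts = [
    "WIN a free prize now, click this link",
    "Congratulations! You won a lottery, claim cash",
    "Urgent: verify your bank account immediately",
    "Free entry to win tickets, reply now",
    "Are we still meeting for lunch today?",
    "Can you send me the report by evening?",
    "Happy birthday! Hope you have a great day",
    "The meeting is moved to 3 pm tomorrow",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Preprocessing + feature extraction: lowercase, drop stop-words, TF-IDF weights.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X = vectorizer.fit_transform(texts)

# Training one of the four classifiers (LR shown here).
model = LogisticRegression(max_iter=1000)
model.fit(X, labels)

# Evaluation on the training texts (a real study would use a held-out split).
pred = model.predict(X)
precision, recall, f1, _ = precision_recall_fscore_support(
    labels, pred, average="binary"
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Swapping `LogisticRegression` for an SVM or random forest requires only changing the model line; LSTM training would instead use a sequence model over tokenized text.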
To make this work user-friendly and considering the security concerns discussed
in [23], an application for real-time detection is developed alongside the research
on model development. A practical SMS application has been developed to detect
phishing and spam messages. The application uses machine learning models to differ-
entiate between legitimate and malicious messages. The process involves user input,
preprocessing, feature engineering, model prediction, and alert notification. Messages are entered; thereafter, the application cleans and organizes the data, removing duplicates [24, 26] and handling missing values. Relevant features are extracted from the text,
such as word frequencies or spam content patterns, to enable the machine learning
model to process the data effectively. If the model identifies the message as spam,
the application promptly alerts the user, providing an additional layer of protection
against malicious content. The authors in [28] discuss a literature survey of exist-
ing methods, algorithms, and techniques. It briefly describes the types of phishing
attacks possible and a few different methods on how to detect and prevent them from
happening. Phishing attacks become a serious threat when websites hold large amounts of data: sites are easy to clone or replicate, causing social or economic harm to authorities and users alike. This review gives a short description
of 10 different survey papers done in the past few years based on different machine
learning or deep learning concepts. It also explores and states whether this survey
and techniques were sufficient to detect the attacks on their websites. Following the
study, this review helps prevent online fraud among customers. As it is a literature
review, it broadly explains how the dataset was collected, how it was filtered, and
the process of collecting the data from each one of them.
The authors in [2] worked on various machine learning models to detect fraud
SMS messages. The hybrid technique increases accuracy and fraud detection capa-
bilities by utilizing the advantages of many algorithms. It provides a viable method
to deal with the growing problem of fraudulent SMS messages by merging these
models. Similarly, [15] proposed a system that uses NLP techniques to improve the
identification and detection of spam messages. The study combines various embeddings, leveraging transformer-based embeddings to improve system accuracy. These ensemble learning techniques provide a more effective barrier against
unsolicited messaging. Also, [21] examines malware threats targeting Android SMS applications. In order to provide insight into these attacks’ strategies and poten-
tial weaknesses, the research attempts to identify patterns and characteristics of these
attacks. This analysis helps to improve Android security mechanisms and protect user
devices by analyzing real-world data. As per [12], a secure mechanism was proposed
to ensure the safety of customers and transactions.
Therefore, this system employs advanced authentication and verification methods
to combat voice-based phishing attempts. Because of the prevalence of bot-generated voice calls, many customers are fooled into revealing their personal information. This enhanced
system helps improve the overall security and trustworthiness of banking services.
The authors in [11] focused on the analysis of malware, forms of attacks, and security
flaws related to smartphone use. They separated the malware categories into two
main categories: approaches based on signatures and techniques based on machine
learning (behavior detection). An overview of the risks and necessary requirements
for malware security in mobile applications is provided based on this investigation.
The impact of large language models on multiple domains was observed by [19].
ChatGPT has been thoroughly studied for tasks like code generation. Its use in
identifying malicious web content—specifically, phishing sites—has not received
much attention. This approach uses a web crawler to collect data from websites and
generate prompts based on the information gathered. This method allows for the
identification of social engineering techniques in the context of entire websites and
URLs, as well as the detection of various phishing websites [6, 17].
4 Proposed Work
In this phishing SMS detection work, various machine learning algorithms are used
to analyze fraudulent text messages. It begins with rigorous preprocessing to remove
extraneous data. Adding to that, the dataset is split into 80% for training and 20% for testing. The dataset includes SMS messages, with the header and body
components being crucial, as are their associated links. The header is initially checked
for safety, then the body and later links are analyzed. If any suspicious messages are
found, an alert is triggered. The system for detecting phishing SMS messages involves
a two-tiered approach, assessing the link for potential malicious intent. The system
is integrated into an Android application hosted on a Google Cloud Platform (GCP)
instance for seamless user interaction. This cloud-based deployment enhances scal-
ability and robustness, catering to a wide user base. The application is designed for
the ease of use of users, even if they are not technically sound. The comprehensive
approach includes preprocessing, machine learning model training, and real-time
SMS analysis through the Android application, aiming to establish a robust defense
against evolving cybersecurity threats in SMS communication. Figure 1 describes
how the messages were detected and which techniques were employed. For preprocessing, the CSV file was loaded with a consistent encoding scheme, and stop-words were removed to improve training and testing. For training and testing, we compared four algorithms on our extended Spam SMS Dataset [18] and the Malicious URL Dataset [29]: RF, SVM, LSTM, and LR. Each algorithm first checks the SMS header; if the header is classified as ham, the program then examines the body of the SMS, which comprises textual data. If the SMS body contains a URL, the URL is validated separately by the same algorithms. If both the URL and the body are safe, the application displays “This SMS is Safe”; otherwise, if either is harmful, it warns the user that the message is hazardous or phony.
Moreover, Algorithm 1 elucidates a standard machine learning classification
approach. It preprocesses the SMS by removing special characters and converting the
text to lowercase. It then extracts relevant features from the SMS text and any URLs
present. After training a classification model with labeled data, it uses this model to
predict whether the given SMS is spam or ham. The final step involves displaying
the classification result to the user, with a warning for either spam or confirmation
for ham. This algorithm provides a basic structure for SMS classification; the effectiveness of the model depends on the quality and quantity of the labeled training data and the features extracted.
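The preprocessing, URL extraction, and two-tier classification steps of Algorithm 1 can be sketched with the standard library alone. The regular expressions are illustrative, and `model.predict_text` / `model.predict_url` are hypothetical stubs standing in for any of the four trained models:

```python
import re

URL_PATTERN = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)

def preprocess(sms: str) -> str:
    """Lowercase the text and strip special characters, keeping URLs aside."""
    urls = URL_PATTERN.findall(sms)
    text = URL_PATTERN.sub(" ", sms).lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # drop special characters
    text = re.sub(r"\s+", " ", text).strip()   # collapse whitespace
    return text + (" " + " ".join(urls) if urls else "")

def extract_urls(sms: str) -> list[str]:
    """Pull any embedded links out of the SMS body for separate validation."""
    return URL_PATTERN.findall(sms)

def classify(sms: str, model) -> str:
    """Two-tier check: classify the cleaned text, then each embedded URL."""
    cleaned = preprocess(sms)
    if model.predict_text(cleaned) == "spam":
        return "Warning: this SMS looks like phishing"
    for url in extract_urls(sms):
        if model.predict_url(url) == "malicious":
            return "Warning: this SMS contains a harmful link"
    return "This SMS is Safe"
```

Only a message whose text and every embedded link pass both checks is reported as safe, matching the flow described above.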
This research work was executed on a desktop with an 11th Gen Intel Core processor,
8 GB of RAM, and a dedicated GPU to evaluate different classifier methods for dis-
tinguishing spam and genuine SMS messages. The classifiers included LR, LSTM,
RF, and SVM. Each of these four machine learning algorithms—logistic regression,
LSTM, random forest, and SVM—is suitable for SMS phishing detection based on
their respective strengths. Logistic regression (LR) is effective when there are linear relationships between the input features and the output, making it relevant for detecting phishing patterns in SMS messages. Long short-term memory (LSTM) is a recurrent neural network suited to tasks involving sequential data and is considered a strong choice for text analysis, whereas random forest (RF) is an ensemble learning method that handles nonlinear relationships and complex patterns in data. Similarly, SVM is suitable for small, high-dimensional datasets and can effectively handle nonlinear relationships through kernel functions. Together, these algorithms are particularly useful for detecting diverse and nonlinear phishing messages. The performance of each algorithm is evaluated using four distinct metrics: accuracy, precision, recall, and F1-score. The phishing SMS function was also highlighted,
which integrated operations like preprocessing SMS headers and content, extract-
ing sender information, and scrutinizing embedded URLs. It assigned scores based
on sender reputation, domain age, suspicious keywords, URL analysis, and header
examination. The application was built in Android Studio using Java, with the Android 9 (Pie) SDK and a target SDK version of 33.
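The scoring logic described above can be illustrated with a simplified heuristic. The keyword list, weights, and thresholds below are hypothetical choices made only to show the structure of such a function; the paper does not specify its exact rules.

```python
import re

# Hypothetical keyword list; the paper's actual list is not given.
SUSPICIOUS_KEYWORDS = {"urgent", "verify", "winner", "password", "blocked"}


def phishing_score(sender: str, body: str,
                   sender_reputation: float, domain_age_days: int) -> float:
    """Combine sender, content, and URL signals into one risk score (0 = safe)."""
    score = 0.0
    # Sender reputation in [0, 1]: low reputation raises the score.
    score += (1.0 - sender_reputation) * 2.0
    # Domain age: very young sender domains are treated as riskier.
    if domain_age_days < 180:
        score += 1.5
    # Suspicious keywords in the SMS body.
    text = body.lower()
    score += sum(1.0 for kw in SUSPICIOUS_KEYWORDS if kw in text)
    # URL analysis: embedded links add risk, shortened links even more.
    for url in re.findall(r"https?://\S+", body):
        score += 1.0
        if any(s in url for s in ("bit.ly", "tinyurl")):
            score += 1.0
    # Header examination: non-alphabetic sender IDs add a small penalty (illustrative rule).
    if not sender.isalpha():
        score += 0.5
    return score


risk = phishing_score("VX-UNKNWN", "URGENT: verify your account at http://bit.ly/x", 0.2, 30)
print("hazardous" if risk >= 3.0 else "safe")  # prints "hazardous"
```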
Moreover, Table 1 presents the performance metrics of four different machine
learning algorithms (LR, LSTM, RF, and SVM) for two distinct tasks: Spam SMS
Prediction and Malicious URL Classification. The metrics assessed include accuracy,
precision, recall, and F1-score. Among all the machine learning algorithms, random forest outperformed the others in both tasks: spam SMS prediction and malicious URL classification. It achieved the highest accuracy [27] of 98.85% in spam SMS prediction, with a precision of 97.15%, a recall of 95.00%, and an F1-score of 95.25%. In malicious URL classification, it demonstrated exceptional results, with an accuracy of 99.94%, a precision of 99.96%, a recall of 99.91%, and an F1-score of 99.94%. This comprehensive evaluation underscores the effectiveness of random forest in these tasks and its potential as a robust solution for SMS phishing detection and malicious URL identification.
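The four reported metrics can be computed directly with scikit-learn; the label vectors below are illustrative stand-ins for a classifier's output on a test set (1 = spam, 0 = ham).

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Illustrative ground-truth and predicted labels; in the paper these come
# from each classifier's predictions on the held-out test split.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

print(f"accuracy:  {accuracy_score(y_true, y_pred):.4f}")
print(f"precision: {precision_score(y_true, y_pred):.4f}")
print(f"recall:    {recall_score(y_true, y_pred):.4f}")
print(f"f1-score:  {f1_score(y_true, y_pred):.4f}")
```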
Figure 2 compares the various machine learning algorithms used to train and test our dataset. Experimentally, the accuracies of LSTM and random forest were the highest and approximately equal, and these algorithms also learned faster. The assumption in logistic regression of a linear relationship between the input features and the log-odds of the output is advantageous for textual SMS datasets because it allows the algorithm to capture and model detailed interactions between words, assisting in effective spam categorization; hence, the recall of LR is much better. The F1-score is consistent across methods, indicating that each algorithm achieves a balanced precision and recall, striking a stable trade-off between correctly detecting positive instances and avoiding false positives, which results in equivalent overall performance. In Fig. 3, the same four algorithms are compared for URL detection. Random forest detects malicious URLs with nearly 100% accuracy, whereas the other algorithms fall short. This suggests that its ensemble learning approach, which combines many decision trees, excels at capturing complicated patterns and correlations in URL characteristics, resulting in exceptional effectiveness in detecting harmful content.
This paper elucidates the entire research work, which has made significant strides
in improving cybersecurity measures in SMS communication. The evaluation is
made using various machine learning algorithms, including LR, LSTM, RF, and
SVM, to demonstrate their effectiveness in distinguishing between legitimate and
fraudulent messages. The development of a real-time detection application similar
to [22] further enhances the research’s impact by identifying and alerting users about
potential phishing attempts, ensuring accessibility and ease of use. Moving forward,
there are several tasks for further exploration and enhancement in this research work.
The primary focus will be on testing the application [13] to ensure its effectiveness
in real-world scenarios. Additionally, hosting the application on a cloud [9] platform
would enhance scalability and robustness. These steps would serve to validate and
optimize the proposed solution, ultimately contributing to a safer digital environment
for SMS communication.
References
1. Agarwal S, Kaur S, Garhwal S (2015) SMS spam detection for Indian messages. In: 2015 1st international conference on next generation computing technologies (NGCT). IEEE, pp 634–638
2. Agrawal N, Bajpai A, Dubey K, Patro B (2023) An effective approach to classify fraud SMS using hybrid machine learning models. In: 2023 IEEE 8th international conference for convergence in technology (I2CT). IEEE, pp 1–6
3. Al-Qahtani AF, Cresci S (2022) The COVID-19 scamdemic: a survey of phishing attacks and their countermeasures during COVID-19. IET Informat Secur 16(5):324–345
Smishing: A SMS Phishing Detection Using … 93
4. Alwanain MI (2020) Phishing awareness and elderly users in social media. Int J Comput Sci
Netw Secur 20(9):114–19
5. Awan HA, Aamir A, Diwan MN, Ullah I, Pereira-Sanchez V, Ramalho R, Orsolini L, de Filippis R, Ojeahere MI, Ransing R et al (2021) Internet and pornography use during the COVID-19 pandemic: presumed impact and what can be done. Front Psych 12:623508
6. Babu MSK, Chandana A, Anusha A, Harika K, Jhansi P (2023) Examining login URLs to identify phishing threats. Turkish J Comput Math Educ (TURCOMAT) 14(03):378–383
7. Balakirsky TL (2022) To “opt out” go to court: how the public nuisance doctrine can solve the
robotext circuit split and support plaintiffs. Brook L Rev 88:719
8. Brown J, Shipman B, Vetter R (2007) SMS: the short message service. Computer 40(12):106–110
9. Butt UA, Amin R, Aldabbas H, Mohan S, Alouffi B, Ahmadian A (2023) Cloud-based email
phishing attack using machine and deep learning algorithm. Complex and Intell Syst 9(3):3043–
3070
10. Chaudhary H, Detroja A, Prajapati P, Shah P (2020) A review of various challenges in cyber-
security using artificial intelligence. In: 2020 3rd international conference on intelligent sus-
tainable systems (ICISS). IEEE, pp 829–836
11. Cinar AC, Kara TB (2023) The current state and future of mobile security in the light of the
recent mobile security threat reports. Multimedia Tools Appl 1–13
12. Denslin Brabin D, Bojjagani S (2023) A secure mechanism for prevention of vishing attack
in banking system. In: 2023 International conference on networking and communications
(ICNWC), pp 1–5
13. Gao J, Bai X, Tsai WT, Uehara T (2014) Mobile application testing: a tutorial. Computer
47(2):46–55
14. Ghosh S, Dasgupta A, Swetapadma A (2019) A study on support vector machine based linear
and non-linear pattern classification. In: 2019 International Conference on Intelligent Sustain-
able Systems (ICISS). IEEE, pp 24–28
15. Ghourabi A, Alohaly M (2023) Enhancing spam message classification and detection using
transformer-based embedding and ensemble learning. Sensors 23(8)
16. Internet Crime Complaint Center (IC3), Federal Bureau of Investigation (2022) [Online; accessed 1-Dec-2023] https://www.ic3.gov/Media/PDF/AnnualReport/2022_IC3Report.pdf
17. Jalil S, Usman M, Fong A (2023) Highly accurate phishing URL detection based on machine learning. J Amb Intell Humanized Comput 14(7):9233–9251
18. Kim E (2023) [Online accessed on 1-Dec-2023] https://www.kaggle.com/datasets/uciml/sms-
spam-collection-dataset
19. Koide T, Fukushi N, Nakano H, Chiba D (2023) Detecting phishing sites using ChatGPT
20. Kovač A, Duner I, Seljan S (2022) An overview of machine learning algorithms for detecting
phishing attacks on electronic messaging services. In: 2022 45th Jubilee international conven-
tion on information, communication and electronic technology (MIPRO). IEEE, pp 954–961
21. Kumar A, Sharma I, Sharma A (2023) Understanding the behaviour of Android SMS malware attacks with real smartphones dataset. In: 2023 International conference on innovative data communication technologies and application (ICIDCA), pp 655–660
22. Mishra S, Soni D (2020) Smishing detector: a security model to detect smishing through SMS content analysis and URL behavior analysis. Fut Generat Comput Syst 108:803–815
23. Prajapati P, Bhagat D, Shah P (2020) A review on different techniques used to detect the malicious applications for securing the Android operating system. Int J Sci Technol Res 9:5255–5258
24. Prajapati P, Shah P (2014) Efficient cross user data deduplication in remote data storage. In:
International conference for convergence for technology-2014. IEEE, pp 1–5
25. Prajapati P, Shah P (2022) A review on secure data deduplication: cloud storage security issue.
J King Saud Univ Comput Inf Sci 34(7):3996–4007
26. Prajapati P, Shah P, Ganatra A, Patel S (2017) Efficient cross user client side data deduplication
in hadoop. J Comput 12(4):362–370
94 P. Prajapati et al.
27. Prusty SR, Sainath B, Jayasingh SK, Mantri JK (2022) SMS fraud detection using machine learning. In: Intelligent systems: proceedings of ICMIB 2021. Springer, pp 595–606
28. Safi A, Singh S (2023) A systematic literature review on phishing website detection techniques.
J King Saud Univ Comput Inf Sci 35(2):590–611
29. Sidhartha M (2023) [Online accessed on 1-Dec-2023] https://www.kaggle.com/datasets/
sid321axn/malicious-urls-dataset
30. Staudemeyer RC, Morris ER (2019) Understanding LSTM: a tutorial into long short-term memory recurrent neural networks. arXiv preprint arXiv:1909.09586
31. Stoltzfus JC (2011) Logistic regression: a brief primer. Acad Emergency Med 18(10):1099–
1104
32. Sun Y, Li Y, Bao Y, Meng S, Sun Y, Schumann G, Kosten T, Strang J, Lu L, Shi J (2020) Brief report: increased addictive internet and substance use behavior during the COVID-19 pandemic in China. Am J Add 29(4):268–270
Convolution Neural Network
(CNN)-Based Live Pig Weight Estimation
in Controlled Imaging Platform
Abstract This study addresses the need for a more efficient and accurate live pig
weight monitoring system in the Indian meat production industry. Conventional
methods for measuring pig weights are labor-intensive, prompting the exploration of
AI and image processing-based solutions. The research introduces a novel regression-
based Convolutional Neural Network (CNN) model trained on a dataset of 1217
images of live pigs, each accompanied by their corresponding weight values. The
model demonstrates promising results on the test dataset, with a coefficient of deter-
mination (R2) of 0.801, mean absolute error (MAE) of 0.054, and root mean square
error (RMSE) of 0.040. Data collection involved a meticulously designed imaging
platform to ensure dataset robustness. The proposed model’s efficiency is highlighted
by its convergence behavior during training and testing, showcasing its ability to
accurately predict live pig weights and its potential to revolutionize the Indian meat
production industry.
1 Introduction
Pigs, or swine (Sus scrofa domesticus), are renowned for their high-quality meat
production, with live pig weight serving as a crucial indicator in optimizing overall
meat yield. Traditional live pig weight measurement methods are characterized by
labor-intensive and time-consuming processes. Presently, artificial intelligence (AI)
and image processing-based techniques have gained prominence for their efficiency
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 95
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_8
96 C. K. Deb et al.
2 Literature Review
Fig. 1 a In-house imaging platform developed at ICAR-IVRI, with adjustable camera height, platform width, and light intensity; b digital image of a 9-month-old pig with 88.2 kg body weight, where the camera height was 7 feet and the light intensity 250 lx
CM, 120 CM), light intensity (250 lx, 500 lx, and 750 lx), and camera height. Images were collected under different combinations of these imaging conditions, which made the dataset robust in terms of variability. In total, 1217 images of live pigs were captured in top-view mode using a mounted camera. All the images were resized to 224 × 224 pixels using the OpenCV Python library and saved in .png format.
3 Methodology
Over 2000 images were generated, and from this pool, 1217 images were carefully
chosen for experimentation in deep learning. This selected dataset underwent prepro-
cessing using the OpenCV Python library. Subsequently, the preprocessed data was
divided into two segments: the first part, used for training the model, and the second
part, employed for model evaluation. The workflow for estimating live pig weight is
illustrated in Fig. 2, while Table 1 presents various train-test splits within the dataset.
Following the train-test division, a Convolutional Neural Network (CNN) model was
meticulously crafted (Table 1). Finally, the model’s performance was assessed using
metrics such as the coefficient of determination (R2), mean absolute error (MAE),
and root mean square error (RMSE).
In this study, a novel regression-based CNN model was designed to estimate live pig
weights from digital images. This model significantly differs from conventional CNN
architectures. The constructed model comprises four convolutional layers, accom-
panied by four max-pooling layers, and two fully connected layers. The features
learned and extracted from the convolutional and pooling layers are subsequently
fed through two dense layers, culminating in a single output node. The features
extracted from this final layer are employed for predicting the output, which is the
weight of the live pigs. The input images, each with dimensions of 224 × 224 pixels,
are fed into the input layer along with the corresponding pig weight values as the
target variable. Throughout the network, with the exception of the final layer, the
‘relu’ activation function was utilized. The default training iteration was set to 200
Fig. 3 Architecture of the developed convolution neural network (CNN) model coupled with
regression neural network for live pig weight estimation
epochs with early stopping mechanisms to conserve time by halting weight updates
upon early convergence. Figure 3 illustrates the comprehensive architecture of this
regression-based CNN model designed for live pig weight estimation from digital
images.
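The described architecture can be sketched in Keras as follows. The filter counts and dense-layer widths are assumptions, since the paper specifies only the layer types and counts (four conv + max-pool blocks, two dense layers, a single linear output) and the 'relu' activation.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_model() -> keras.Model:
    """Regression CNN: four conv + max-pool blocks, two dense layers, one output node."""
    model = keras.Sequential([
        keras.Input(shape=(224, 224, 3)),
        layers.Conv2D(16, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(1),  # single linear output node: predicted live weight
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model


model = build_model()

# Early stopping halts the 200-epoch run once the validation loss converges.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_test, y_test),
#           epochs=200, callbacks=[early_stop])
```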
In this study, the entire dataset was partitioned into training and testing sets for
the purpose of model training and performance evaluation. Three separate datasets
were created, each with different training–testing ratios: Dataset 1 (train:test::80:20),
Dataset 2 (train:test::75:25), and Dataset 3 (train:test::70:30), as detailed in Table 1.
The proposed regression-based CNN model was trained using the training dataset,
and its robustness was assessed by evaluating its performance on the respective
testing datasets for each data configuration. All experiments were conducted on an
NVIDIA DGX GPU Server equipped with Tesla V100 GPUs. The development of
the proposed model was implemented using Keras, a high-level Python API backed by the TensorFlow engine.
The model exhibited its best predictive performance when applied to Dataset 3 (with a train:test ratio of 70:30), both during training and testing, as detailed in Table 2. Notably, on Dataset 3, the model achieved the lowest values for both MAE and RMSE, 0.04 and 0.054, respectively, surpassing its performance on the
other datasets. These results underscore the exceptional suitability of the proposed
regression-based CNN model for Dataset 3. They also affirm that the model accu-
rately predicted the response variable, namely the live weight of the pigs. Further-
more, the highest R2 value was observed for Dataset 3, indicating that the proposed
model adeptly extracts highly correlated features from the digital images of live
Fig. 4 Trends of losses of the proposed model during training and testing time
pigs. These learned features endow the proposed model with the capacity to effec-
tively handle variations in the response variable. Figure 4 portrays the convergence
behavior of the model’s loss function during both training and testing phases.
5 Conclusion
A Novel Image Encryption Technique
Based on DNA Theory and Chaotic Maps
Abstract High inter- and intra-pixel redundancy and correlation among adjoining pixels of digital images make it crucial to secure them during transmission over public channels. Here, we exploit chaotic and DNA theory to design a secure and robust image encryption approach. Characteristics such as pseudo-randomness and ergodicity make chaotic systems well suited to image cryptography. Chaotic theory and DNA operations are employed in the confusion and diffusion stages of image encryption, respectively. The control parameters of the chaotic maps are treated as the security key. The performance of the proposed method is evaluated using entropy analysis, histogram analysis, and correlation metrics, and its proficiency is also compared with prevailing techniques using various evaluation parameters.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 103
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_9
104 K. Verma et al.
shield the confidentiality of digital images. The high inter- and intra-pixel redundancy and correlation among adjoining pixels of digital images have led researchers to adopt encryption as a tool for image security [2–4]. Image encryption converts an image into an unrecognizable form. The common approaches applied for image encryption operate in (a) the transform domain and (b) the spatial domain. Although spatial-domain approaches are computationally efficient, they are typically not robust enough to withstand external attacks [5]. In transform-domain methods, the cover image is first transformed to the frequency domain, and then confusion–diffusion methodologies are applied to the frequency coefficients [6, 7]. Chaos theory-based data encryption methods have also demonstrated efficient image security by applying pseudo-randomness and ergodic properties [3–5, 8]. Fridrich suggested the basic confusion–diffusion architecture for chaos-based image encryption [9]. Confusion processes randomize and scramble the samples of the digital media to minimize the correlation between adjoining samples. The diffusion process, on the other hand, updates the samples of the image to spread the statistical properties of the ciphertext.
Hybridized image encryption approaches, which exploit the features of two or more techniques simultaneously, have recently proved more robust and secure than individual approaches. Guesmi et al. proposed a hybridized approach combining chaotic maps and DNA theory [8]. Chen et al. [10] applied DNA-based mechanisms within a self-adaptive permutation–diffusion process. Belazi [11] suggested a multiple-round encryption method joining features of chaotic maps and DNA computations. A hybridized encryption approach using chaotic theory and genetic operations was also suggested [12]. Besides these techniques, recent encryption approaches also employ machine learning [13] and asymmetric methodologies [14].
Here, we propose a hybrid image encryption approach using chaotic maps and random DNA procedures. In the confusion stage, the sine map chaotic system is applied to scramble the image by dislocating pixel location indices. In the diffusion stage, DNA encoding is applied, and a chaotic sequence-based DNA operator selection approach is designed. Section 2 of this paper explains the features of chaotic maps, DNA encoding, and DNA operations. Section 3 demonstrates the different stages of the presented encryption method. Sections 4 and 5 present the result analysis and assessment of the proposed method and the conclusion, respectively.
2 Background
Chaotic maps are 1D nonlinear maps with complex chaotic behavior [10]. Owing to their very simple construction, ease of implementation, and low computational complexity, 1D chaotic maps are appropriate for use in data security algorithms. The sine map (SM) is a standard 1D chaotic map, commonly written as y_(n+1) = (α/4)·sin(π·y_n),
where y_0 ∈ [0, 1] and α ∈ (3.5, 4] are the starting chaotic sample and the control parameter of the SM, respectively. The bifurcation diagram in Fig. 1a illustrates the pseudo-random properties of the chaotic SM.
The logistic map (LM), commonly written as y_(n+1) = β·y_n·(1 − y_n), is another standard 1D chaotic map, where y_0 ∈ [0, 1] and β ∈ [0, 4] are the starting chaotic sample and the control parameter of the LM, respectively. The pseudo-random properties of the chaotic LM are demonstrated in Fig. 1b.
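The two chaotic sequence generators can be sketched as follows, assuming the common textbook forms of the sine-map and logistic-map recurrences (the paper's displayed equations are not reproduced here, so these forms are an assumption consistent with the stated parameter ranges).

```python
import math


def sine_map_sequence(y0: float, alpha: float, n: int) -> list:
    """Iterate the sine map y_(k+1) = (alpha/4) * sin(pi * y_k)."""
    seq, y = [], y0
    for _ in range(n):
        y = (alpha / 4.0) * math.sin(math.pi * y)
        seq.append(y)
    return seq


def logistic_map_sequence(y0: float, beta: float, n: int) -> list:
    """Iterate the logistic map y_(k+1) = beta * y_k * (1 - y_k)."""
    seq, y = [], y0
    for _ in range(n):
        y = beta * y * (1.0 - y)
        seq.append(y)
    return seq


# In the chaotic regime (alpha in (3.5, 4], beta close to 4) the orbits look
# pseudo-random and are highly sensitive to (y0, alpha/beta), which is why
# these initial values and control parameters can serve as the security key.
print(sine_map_sequence(0.3, 3.99, 5))
print(logistic_map_sequence(0.3, 3.99, 5))
```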
3 Proposed Method
The proposed hybrid encryption technique consists of: (i) chaotic map scrambling-
based confusion stage and (ii) DNA encoding and DNA operations based diffusion
stage. The overview of the proposed encryption technique is given in Fig. 2. The
comprehensive process is discussed below.
3.1 Encryption
Chaotic maps are applied as the nonlinear confusion step of image encryption. Four chaotic sine maps are generated with initial conditions and control parameters (y0i, αi), i = 1, 2, 3, 4. These maps scramble the cover image (img) into a confused image (conf_img) using Algorithm 1.
3.2 Decryption
This stage consists of the inverse operations of the encryption stage discussed in Sect. 3.1. The chaotic maps are regenerated with the same control values as those used in the confusion and diffusion processes of the encryption approach.
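One common way to realize chaotic index-dislocation scrambling, and its inverse for decryption, is to sort a chaotic sequence and use the sorting indices as a pixel permutation. This is a sketch of that idea, not the authors' exact Algorithm 1; the toy 4 × 4 image and key values are illustrative.

```python
import math

import numpy as np


def chaotic_permutation(n: int, y0: float, alpha: float) -> np.ndarray:
    """Derive a permutation of n indices from a sine-map chaotic sequence."""
    seq, y = np.empty(n), y0
    for i in range(n):
        y = (alpha / 4.0) * math.sin(math.pi * y)
        seq[i] = y
    return np.argsort(seq)  # sorting indices act as a pseudo-random permutation


def scramble(img: np.ndarray, perm: np.ndarray) -> np.ndarray:
    """Confusion stage: dislocate pixel locations according to the permutation."""
    flat = img.reshape(-1)
    return flat[perm].reshape(img.shape)


def unscramble(conf_img: np.ndarray, perm: np.ndarray) -> np.ndarray:
    """Decryption side: apply the inverse permutation to restore pixel locations."""
    flat = conf_img.reshape(-1)
    out = np.empty_like(flat)
    out[perm] = flat
    return out.reshape(conf_img.shape)


img = np.arange(16, dtype=np.uint8).reshape(4, 4)          # toy 4 x 4 "image"
perm = chaotic_permutation(img.size, y0=0.37, alpha=3.97)  # key = (y0, alpha)
conf_img = scramble(img, perm)
assert np.array_equal(unscramble(conf_img, perm), img)     # lossless round trip
```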
Fig. 3 a and b Cover images and histograms, respectively; c and d encrypted images and
histograms, respectively
Images and their histograms before and after encryption are shown in Fig. 3. The histograms of the original images show large peaks and troughs, whereas the histograms after encryption are flat for all images and markedly different from those before encryption. This justifies that the process can withstand external attacks, as eavesdroppers are unable to extract information through histogram analysis.
For correlation evaluation, 30,000 randomly selected pairs of adjacent pixels are taken from the cover and encrypted images for the examination of the horizontal, vertical, and diagonal correlation coefficients (Table 6). The results demonstrate that the cover images have high correlation, which decreases after the encryption process. Table 7 shows the comparative analysis of the correlation coefficients of our approach against those of other existing approaches for the same image; the correlation coefficients of the proposed approach are lower than or comparable with the other approaches.

Table 7 Correlation coefficient comparison with other existing approaches

Approach    Horizontal   Vertical    Diagonal
Ref. [15]   −0.0017      −0.0004     0.0028
Ref. [16]   0.0064       0.0003      0.0026
Ref. [17]   −0.0008      −0.0013     0.0018
Proposed    −0.0211      −0.0172     −0.0116
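The adjacent-pixel correlation test can be reproduced as follows; a uniformly random image stands in for an encrypted image (a well-encrypted cipher image should behave similarly, with coefficients near zero).

```python
import numpy as np


def adjacent_correlation(img: np.ndarray, n_pairs: int = 30000,
                         direction: str = "horizontal") -> float:
    """Correlation coefficient of randomly chosen adjacent pixel pairs."""
    rng = np.random.default_rng(0)
    h, w = img.shape
    dy, dx = {"horizontal": (0, 1), "vertical": (1, 0), "diagonal": (1, 1)}[direction]
    ys = rng.integers(0, h - dy, n_pairs)
    xs = rng.integers(0, w - dx, n_pairs)
    a = img[ys, xs].astype(float)
    b = img[ys + dy, xs + dx].astype(float)  # the neighboring pixel in that direction
    return float(np.corrcoef(a, b)[0, 1])


# A uniformly random image mimics a cipher image: correlation close to 0.
cipher_like = np.random.default_rng(1).integers(0, 256, (256, 256))
for d in ("horizontal", "vertical", "diagonal"):
    print(d, round(adjacent_correlation(cipher_like, direction=d), 4))
```

Running the same function on a natural cover image would yield coefficients close to 1, illustrating the drop reported in Tables 6 and 7.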
5 Conclusions
A hybrid, robust, and secure encryption methodology using chaotic theory and DNA-based encoding and operations is proposed. The pseudo-random features of chaotic maps and DNA operations are utilized to build the confusion and diffusion stages of the proposed approach. The chaotic map-based confusion mechanism scrambles the cover signal, and the scrambled signal is then passed to the diffusion stage, which applies DNA encoding and operations. The performance of the given approach is examined and evaluated using various metrics. Performance comparison with the latest state-of-the-art methods proves the data security capabilities of the proposed technique.
References
1. Kaur R, Singh B (2021) A novel approach for data hiding based on combined application of
discrete cosine transform and coupled chaotic map. Multimedia Tools Appl 80:14665–14691
2. Fang P, Liu H, Wu C, Liu M (2022) A survey of image encryption algorithms based on chaotic
systems. Vis Comput 1–29
3. Wang X, Zhao M (2021) An image encryption algorithm based on hyperchaotic system and
DNA coding. Opt Laser Technol 143
4. Mansouri A, Wang X (2021) A novel one-dimensional chaotic map generator and its application
in a new index representation-based image encryption scheme. Inf Sci 563:91–110
5. Lu Q, Zhu C, Deng X (2020) An efficient image encryption scheme based on the LSS chaotic
map and single S-box. IEEE Access 8:25664–25678
6. Wang X, Zhang M (2021) A new image encryption algorithm based on ladder transformation
and DNA coding. Multimedia Tools Appl 80(9):13339–13365
7. Kaur R, Singh B (2023) Robust image encryption algorithm in DWT domain. Multimedia Tools Appl. https://doi.org/10.1007/s11042-023-16985-4
8. Guesmi R, Farah MAB, Kachouri A, Samet M (2016) A novel chaos-based image encryption
using DNA sequence operation and secure hash algorithm SHA-2. Nonlinear Dyn 83(3):1123–
1136
9. Fridrich J (1998) Symmetric ciphers based on two-dimensional chaotic maps. Int J Bifurcat Chaos 8(6):1259–1284
10. Chen J, Zhu ZL, Zhang LB, Zhang Y, Yang BQ (2018) Exploiting self-adaptive permutation–
diffusion and DNA random encoding for secure and efficient image encryption. Signal Proc
142:340–353
11. Belazi A, Talha M, Kharbech S et al (2019) Novel medical image encryption scheme based on
chaos and DNA encoding. IEEE Access 7:36667–36681
12. Jarjar A (2022) Vigenere and genetic cross-over acting at the restricted ASCII code level for
color image encryption. Med Biol Eng Compu 60:2077–2093
13. Man Z, Li J, Di X, Sheng Y, Liu Z (2021) Double image encryption algorithm based on neural
network and chaos. Chaos Solitons Fractals 152
14. Xu Q, Sun K, Zhu C (2020) A visually secure asymmetric image encryption scheme based on
RSA algorithm and hyperchaotic map. Phys Scr 95(3):035223
15. Wang X, Guan N (2020) A novel chaotic image encryption algorithm based on extended zigzag confusion and RNA. Opt Laser Technol 131:106366
16. Xu Q, Sun K, He S, Zhu C (2020) An effective image encryption algorithm based on compressive sensing and 2D-SLIM. Opt Lasers Eng 134:106178
17. Cao W, Mao Y, Zhou Y (2020) Designing a 2D infinite collapse map for image encryption.
Signal Process 171:107457
An Empirical Study on Comparison
of Machine Learning Algorithms
for Eye-State Classification
Using EEG Data
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 113
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_10
114 N. Priyadharshini Jayadurga et al.
1 Introduction
In the contemporary era, the confluence of neuroscience, machine learning, and assistive technology has converged in the quest to enhance the lives of disabled persons. Assistive devices play a pivotal part in amplifying the lives and independence of persons with cognitive differences. These devices are developed to bridge the divide between a person's abilities and the physical challenges they face, and they enrich and empower disabled people by aiding their day-to-day tasks. BCI has impacted most industries positively [1, 2]. It has been identified that people with severe paralysis communicate through eye gestures and blinks or use their brain signals to register EEG insights [3]. The brain–computer interface (BCI) has emerged as a groundbreaking technology in the field of assistive technology, with the potential to transform brain signals into device control. BCI bridges the signals generated by the brain and the machines that translate these signals into actions [4]; BCI technology converts brain signals into useful commands for carrying out an action [5].
In this research work, the focus is on classifying eye-states as either
closed or open using machine learning algorithms. Machine learning algorithms
have the strength to bring out hidden patterns [6]. An empirical study has been
carried out using notable machine learning algorithms, namely logistic regression,
the ElasticNet classifier, and all kernels of the support vector machine (SVM), to identify the
best-performing algorithm for the given problem at hand. The primary objectives of
this research work are as follows:
1. Comparison of the performance of three distinct algorithms, namely logistic
regression, the ElasticNet classifier, and support vector machines, for eye-state
classification.
2. Evaluation of the algorithms based on key indicators of performance.
3. Identification of the benefits and limitations of each of the implemented algorithms
in the context of eye-state classification.
This research paper is organized as follows: Sect. 2 discusses the existing works in the
proposed area of research, Sect. 3 states the proposed methodology for the problem at
hand, and Sect. 4 portrays the results of the experimentation.
2 Literature Survey
The paper [7] uses a time-domain linear filtering technique. In this work, eye-blink
signals are obtained initially with a multichannel Wiener filter (MWF) and a subset of frontal
electrodes. The estimate of these signals is subtracted from the noisy EEG signal
using the principle of regression analysis. They also used independent component
analysis (ICA). It was found that the MWF-based method produced better results
in shorter time frames. A study [8] suggests random eye-state change detection
in real time using EEG signals. In this work, the last two seconds of the signals are
3 Proposed System
4 Experimental Results
The dataset was obtained from the Kaggle website. The results were obtained from a single,
continuous 117-second EEG recording conducted with the Emotiv EEG Neuroheadset.
During the EEG measurement in this data collection, the eye-state was
captured by a camera and manually annotated afterward for analysis. The dataset
comprises EEG signals from 14 electrodes (AF3, F7, F3, FC5, T7, P7, O1, O2, P8,
T8, FC6, F4, F8, and AF4). The target column specifies whether the eye condition
was closed (1) or open (0).
The dataset comprises the frequency ranges of the collected EEG data. The data
was first preprocessed by visualizing it as an EEG signal. Initially, the signal
was gathered using Eq. 1 at a sampling rate of 128 Hz.
Outliers were identified and replaced with NaN, and the statistics were recomputed
ignoring the NaN values to remove bias from the dataset. Independent component
analysis (ICA) was then applied to the data to separate the multivariate signals into
their additive subcomponents; this step was implemented to drop the
non-electrophysiological components. Once this was done, the signals were
reconstructed, with the bad components dropped, to obtain clean EEG. The data
between 8 and 12 Hz was then extracted using a band-pass filtering technique, and
the resulting alpha waves were retained for further investigation. The filtered data
was subjected to correlation testing, and highly correlated features were dropped
from the dataset.
The preprocessed data was then used to implement the stated algorithms: logistic
regression, the ElasticNet classifier, and support vector machines with various
kernels.
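The preprocessing chain above (outlier replacement, alpha-band filtering, correlation-based feature dropping) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and thresholds are assumed, and the ICA step is omitted for brevity.

```python
import numpy as np

def preprocess_eeg(X, fs=128.0, band=(8.0, 12.0), corr_thresh=0.9, z_cut=3.0):
    """Sketch of the described pipeline for an (n_samples, n_channels) array.

    The ICA decomposition/reconstruction step is omitted; in practice a
    library routine such as scikit-learn's FastICA would be used for it.
    """
    X = X.astype(float).copy()
    # Replace outliers (beyond z_cut standard deviations) with NaN, then
    # refill each channel with its NaN-ignoring mean to avoid biasing stats.
    mu, sd = np.nanmean(X, axis=0), np.nanstd(X, axis=0)
    X[np.abs(X - mu) > z_cut * sd] = np.nan
    X = np.where(np.isnan(X), np.nanmean(X, axis=0), X)
    # Band-pass 8-12 Hz (the alpha band) with a simple FFT mask at fs = 128 Hz.
    freqs = np.fft.rfftfreq(X.shape[0], d=1.0 / fs)
    spec = np.fft.rfft(X, axis=0)
    spec[(freqs < band[0]) | (freqs > band[1])] = 0.0
    X = np.fft.irfft(spec, n=X.shape[0], axis=0)
    # Drop one channel from every highly correlated pair.
    corr = np.corrcoef(X, rowvar=False)
    keep = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) < corr_thresh for k in keep):
            keep.append(j)
    return X[:, keep]
```

A duplicated channel, for instance, survives the filtering unchanged and is then removed by the correlation test.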
Logistic Regression Designing methods for artifact detection is essential and is
the need of the hour [15–17]. Logistic regression is a supervised machine learning
technique applicable to binary classification problems like the one at hand. The
data was split into training and testing sets in the ratio 70:30. Logistic regression
relates the features, i.e., the electrodes, to the probability of the outcome, i.e.,
whether the eye is closed or open. The logistic function is given in Eq. 2:

S(z) = 1 / (1 + e^(−z)) (2)
where
S(z) is the predicted probability of the eye being closed (1) and
e is the base of the natural logarithm, with

z = β0 + β1x1 + β2x2 + · · · + βpxp (3)

where β0 is the intercept term and β1, β2, . . . , βp are the weights of the features
(electrodes) x1, x2, . . . , xp. Logistic regression was applied to the dataset, and an accuracy of 57% was
obtained as a result.
obtained as a result. The precision, recall, and F1 score of logistic regression for
eye-state classification are 0.54, 0.32, and 0.40, respectively. The confusion matrix
for eye-state classification using logistic regression is shown in Fig. 2.
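As a hedged sketch of this step (synthetic data standing in for the Kaggle EEG features, since the real dataset is not reproduced here), the logistic function of Eq. 2 and a 70:30 train/test evaluation might look like:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

def sigmoid(z):
    # S(z) = 1 / (1 + e^(-z)): the predicted probability of eye closed (1).
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic stand-in for the 14-electrode feature matrix.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 14))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)

# 70:30 split, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
pred = clf.predict(X_te)
acc = accuracy_score(y_te, pred)
prec, rec, f1, _ = precision_recall_fscore_support(y_te, pred, average="binary")
```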
ElasticNet Classifier ElasticNet classifier is a regularization technique commonly
used for classification problems [2, 18]. It works by combining L1 (Lasso) and
L2 (Ridge) regularization terms. The objective function of ElasticNet classifier is
mathematically denoted in Eq. 4
J(β) = (1/2n) Σ_{i=1..n} (h_β(x^(i)) − y^(i))² + α [ ρ Σ_{j=1..p} |β_j| + ((1 − ρ)/2) Σ_{j=1..p} β_j² ] (4)
where
J(β) is the loss function minimized by the ElasticNet classifier,
β are the weights of the features (electrodes),
n is the number of samples (14,980),
p is the number of features (10 after preprocessing),
h_β(x^(i)) is the prediction for the i-th data point,
y^(i) is the actual target of the i-th data point,
α is the regularization parameter, and
ρ is the parameter that balances L1 and L2 regularization.
The algorithm works by minimizing the objective function stated in Eq. 4. EEG for
evaluation of BCI devices should be carried out carefully [19, 20]. When the
ElasticNet classifier was applied to the dataset, the accuracy for eye-state
classification was found to be 57.8%. The precision, recall, and F1 score of the
ElasticNet classifier for eye-state
classification are 0.56, 0.31, and 0.40, respectively. The confusion matrix depicting
the correct and incorrect predictions of eye-states is shown in Fig. 3.
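A minimal sketch of an ElasticNet-penalized classifier on synthetic stand-in data: scikit-learn's LogisticRegression with penalty='elasticnet' optimizes a logistic loss with the combined L1/L2 term of Eq. 4, where C plays the role of 1/α and l1_ratio the role of ρ (parameter values here are illustrative, not the study's).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 10))             # 10 features, as after preprocessing
y = (X @ rng.normal(size=10) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
# Combined L1 (Lasso) and L2 (Ridge) penalty: l1_ratio balances the two,
# mirroring rho in the objective function above.
enet = LogisticRegression(penalty="elasticnet", solver="saga",
                          l1_ratio=0.5, C=1.0, max_iter=5000)
enet.fit(X_tr, y_tr)
acc = accuracy_score(y_te, enet.predict(X_te))
```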
Support Vector Machines Support vector machine (SVM) is a machine learning
algorithm widely used for classification problems. It works by finding the optimal
hyperplane that best separates the two classes in case of binary classification [21].
Various kernel functions were applied to the data, and their performance was
evaluated. The SVM with the radial basis function (RBF) kernel was found to
outperform the other kernel functions.
The accuracy of SVM using the various kernels for eye-state classification is reported
in Table 1. The confusion matrix for each kernel function is shown in Fig. 4. The
F1 score, precision, and recall values for the various SVM kernels are given
in Table 2.
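The kernel comparison can be sketched as a loop over scikit-learn's SVC kernels on synthetic stand-in data (the radial decision rule below is invented for illustration, chosen so that a non-linear kernel has something to gain):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 10))
# A non-linear (radial) decision rule on the first two features.
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 2.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    pred = SVC(kernel=kernel).fit(X_tr, y_tr).predict(X_te)
    scores[kernel] = (accuracy_score(y_te, pred),
                      f1_score(y_te, pred, zero_division=0))

best = max(scores, key=lambda k: scores[k][0])
```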
Comparison of the Results Binary classification algorithms are evaluated using
recall (REC), specificity (SPEC), precision (PREC), F1-score, and area under the
curve (AUC) [22, 23]. The three algorithms were compared on the basis of accuracy,
precision, recall, F1 score, and confusion matrix. The accuracy, precision, recall, and
F1 score comparison for the chosen algorithms is depicted in Table 3. It was found
that the support vector machine (SVM) with the radial basis function (RBF) kernel
outperformed the other algorithms considered in the study. This suggests
that an SVM with an RBF kernel can provide the best insights for the problem
at hand.

Fig. 4 Confusion matrix of SVM using various kernel functions for eye-state classification
5 Conclusion
References
1. Maiseli B, Abdalla AT, Massawe LV, Mbise M, Mkocha K, Nassor NA, Ismail M, Michael J, Kimambo S (2023) Brain-computer interface: trend, challenges, and threats
2. Hassouneh A, Mutawa AM, Murugappan M (2020) Development of a real-time emotion recognition system using facial expressions and EEG based on machine learning and deep neural network methods. Inf Med Unlocked 20:100372
3. Sirvent Blasco JL, Iáñez E, Úbeda A, Azorín JM (2012) Visual evoked potential-based brain-machine interface applications to assist disabled people. Expert Syst Appl 39:7908–7918
4. Tiwari N, Edla DR, Dodia S, Bablani A (2018) Brain computer interface: a comprehensive survey. Biologically Inspired Cogn Archit 26:118–129
5. Mudgal SK, Sharma SK, Chaturvedi J, Sharma A (2020) Brain computer interface advancement in neurosciences: applications and issues
6. Radhika N, Bhavani KD (2020) K-means clustering using nature-inspired optimization algorithms-a comparative survey. Int J Adv Sci Technol 29(6s):2466–2472
7. Borowicz A (2018) Using a multichannel Wiener filter to remove eye-blink artifacts from EEG data. Biomed Signal Process Control 45:246–255
8. Saghafi A, Tsokos CP, Goudarzi M, Farhidzadeh H (2017) Random eye state change detection in real-time using EEG signals. Expert Syst Appl 72:42–48
9. Abromavičius V, Serackis A (2018) Eye and EEG activity markers for visual comfort level of images. Biocybernetics Biomed Eng 38:810–818
10. Abo-Zahhad M, Ahmed SM, Abbas SN (2016) A new multi-level approach to EEG based human authentication using eye blinking. Pattern Recogn Lett 82:216–225
11. Nikolaev AR, Meghanathan RN, van Leeuwen C (2016) Combining EEG and eye movement recording in free viewing: pitfalls and possibilities. Brain Cognition 107:55–83
12. Kang J, Han X, Song J, Niu Z, Li X (2020) The identification of children with autism spectrum disorder by SVM approach on EEG and eye-tracking data. Comput Biol Med 120:103722
13. Nkengfack LCD, Tchiotsop D, Atangana R, Tchinda BS, Louis-Door V, Wolf D (2021) A comparison study of polynomial-based PCA, KPCA, LDA and GDA feature extraction methods for epileptic and eye states EEG signals detection using kernel machines. Inf Med Unlocked 26:100721
14. Medhi K, Hoque N, Dutta SK, Hussain MI (2022) An efficient EEG signal classification technique for brain-computer interface using hybrid deep learning. Biomed Signal Process Control 78:104005
15. Wang M, Cui X, Wang T, Jiang T, Gao F, Cao J (2023) Eye blink artifact detection based on multi-dimensional EEG feature fusion and optimization. Biomed Signal Process Control 83:104657
16. Nilashi M, Abumalloh RA, Ahmadi H, Samad S, Alghamdi A, Alrizq M, Alyami S, Nayer FK (2023) Electroencephalography (EEG) eye state classification using learning vector quantization and bagged trees. Heliyon 9:e15258
17. Santamaría-Vázquez E, Martínez-Cagigal V, Pérez-Velasco S, Marcos-Martínez D, Hornero R (2022) Robust asynchronous control of ERP-based brain-computer interfaces using deep learning. Comput Methods Programs Biomed 215:106623
18. Alkatheiri MS (2022) Artificial intelligence assisted improved human-computer interactions for computer systems. Comput Electr Eng 101:107950
19. Yohanandan S, Kiral-Kornek I, Tang J, Mashford BS, Asif U, Harrer S (2018) A robust low-cost EEG motor imagery-based brain-computer interface
20. Aswiga RV, Karpagam M, Chandralekha M, Kumar CS, Selvi M, Deena S (2023) An automatic detection and classification of diabetes mellitus using CNN. Soft Comput 27(10):6869–6875
21. Mageshwari G, Chandralekha M, Chaudhary D (2023) Underwater image re-enhancement with blend of simplest color balance and contrast limited adaptive histogram equalization algorithm. In: 2023 international conference on advancement in computation & computer technologies (InCACCT), pp 501–508
22. Kamble A, Ghare P, Kumar V (2022) Machine-learning-enabled adaptive signal decomposition for a brain-computer interface using EEG. Biomed Signal Process Control 74:103526
23. Punsawad Y, Siribunyaphat N, Wongsawat Y (2021) Exploration of illusory visual motion stimuli: an EEG-based brain-computer interface for practical assistive communication systems. Heliyon 7:3
Decoding the UK’s Stance on AI: A Deep
Dive into Sentiment and Topics
in Regulations
D. N. Dwivedi (B)
SAS Middle East FZ-LLC, Dubai Media City-Business Central Towers, Dubai, UAE
e-mail: dwivedy@gmail.com
G. Mahanty
Department of Analytical and Applied Economics, Utkal University, Orissa, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 123
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_11
1 Introduction
The field of artificial intelligence (AI) has witnessed an unparalleled boom in the
last decade. The field has evolved from abstract concepts and basic implementations
to a fundamental cornerstone of contemporary technology environments. The rapid
increase in popularity can be credited to the combination of variables like improved
computer capabilities, access to extensive and varied datasets, and advancements in
machine learning methods. The utilization of machine learning, specifically deep
learning, has played a crucial role in empowering artificial intelligence systems
to analyze information in manners that were previously believed to be limited to
human cognition. The powers of AI have greatly multiplied. Advancements in natural
language processing have led to the development of advanced digital assistants,
while progress in computer vision has allowed for the implementation of real-world
applications such as facial recognition and driverless vehicles. It has been incorpo-
rated into various aspects of healthcare, such as diagnostics, financing, and supply
chain optimization, highlighting its potential to bring about significant changes. The
widespread availability of AI tools and platforms has sparked a surge of innovation,
enabling a larger group of developers and academics to contribute to its advancement.
In the early years of the twenty-first century, the combination of rapidly advancing
digital technology and large amounts of data led to a worldwide emphasis on using
cognitive methods to gain an advantage. The emphasis on this aspect accelerated the
development of what is now acknowledged as artificial intelligence, as theorized by
Nilsson [1]. John McCarthy is well recognized and appreciated as a pioneering figure
in the field of artificial intelligence (AI). He defined AI as the ‘craft and discipline
of creating intelligent machines’ [2].
AI customizes experiences for individual users, whether through curated recom-
mendations on media platforms or digital assistants on our phones. It has reori-
ented the technological environment to prioritize the user. Artificial intelligence (AI)
augmented technologies assist clinicians in detecting ailments at an earlier stage and
with greater precision. The technologies for autonomous vehicles present a vision
of roadways that are safer and more efficient. Intelligent learning systems utilize
artificial intelligence to tailor lessons to the specific requirements of each learner,
providing a more personalized learning experience. The rapid rise of artificial intel-
ligence (AI) across several industries underscores the urgent requirement for its
supervision. Artificial intelligence tools are increasingly prevalent in critical indus-
tries such as health care, finance, and transportation. Ensuring their trustworthiness,
impartiality, and security is crucial. Artificial intelligence systems have the potential
to unintentionally reflect the biases present in the data they are trained on if not
adequately monitored. There is a potential for the occurrence of unjust outcomes
that could negatively impact specific demographics. The intricate internal mechanisms
of certain AI systems are frequently referred to as 'black boxes', which prompts
inquiries regarding transparency. The concerns over the security of data and the
potential for AI to be misused in areas such as espionage, dissemination of false
information, and weapon systems highlight the critical need for government engage-
ment. Establishing regulations is not intended to impede advancement, but rather to
guarantee that AI develops in a secure and morally sound manner. It should uphold
public trust and prioritize the welfare of society as a whole. Nevertheless, it is not
solely about ease and efficiency. AI also presents ethical and societal challenges,
including worries over data privacy. The issue at hand involves the displacement of
jobs caused by automation, as well as the imperative for the appropriate develop-
ment of artificial intelligence. With the continuous growth and evolution of AI, it is
becoming increasingly imperative to establish clear guidelines to ensure its ethical
and prudent utilization. The UK, known for its progressive regulatory approach, plays
a crucial role in shaping the global discourse on AI. The country’s stance on AI not
only influences its domestic policy but also serves as a paradigm for other nations
to contemplate. Given the UK’s significant influence, it is crucial for researchers,
decision-makers, and industry specialists to understand its perspective on AI. The
objective of this study is to comprehensively comprehend and illuminate the senti-
ments and primary domains of interest in the UK regarding artificial intelligence.
By employing sophisticated methodologies for attitude analysis and issue identifi-
cation, we thoroughly examine regulatory documents to extract the UK’s approach,
aspirations, and concerns around artificial intelligence. Our objective is to provide
a clearer and more comprehensive understanding of the regulations governing AI,
enabling individuals to engage with AI in a more transparent and informed manner.
The main objective of this study is to examine and extract the attitudes expressed
in the UK’s proposed AI law document, as well as to perform topic modeling. By utilizing
the Naïve Bayes methodology, the research aims to identify both positive and negative
emotional tones present in the document. Companies often utilize sentiment analysis
tools to measure sentiment in social media material, assess brand perception, and
acquire insights into client emotions. These models generally classify information
based on polarity, such as positive, negative, or neutral. Additionally, they analyze
specific emotions, such as anger, happiness, and sadness, as well as urgency levels
and user intent. The purpose of this study is to assess the prevailing sentiment
inclination, namely whether most utterances express optimism or pessimism about the subject.
Simultaneously, the research also seeks to determine the main types and subdivisions
within these favorable and unfavorable emotions.
(a) What is the distribution of positive and negative sentiments within the UK’s
AI regulatory document?
(b) What are the predominant categories within these positive and negative
sentiments?
2 Literature Studies
Our literature review focuses on three main topics: the moral challenges and dangers
linked with AI, sentiment analysis using data from Twitter and other platforms, and
common techniques used in sentiment analysis.
AI Risk and Ethical Challenges: Dwivedi [3, 4] used Twitter data to pinpoint the
main worries about rising AI. Hagendorff [5] performed an in-depth comparison
of 22 ethical guides, highlighting both shared points and areas lacking attention.
This research also suggests ways to make AI ethical rules more actionable. Maas [6]
believed that AI systems can cause widespread, cascading mistakes. Box and Data [7]
examined how human prejudices might affect the machine learning journey. Martinho
et al. [8] combined both theory and real-world evidence to explore the
ethical choices made by AI systems. References [9–12] shared examples of various
AI models and how they can help manage AI bias and risk. Tamboli [2] pointed out the
evolving problems caused by changing data trends, underscoring the ‘concept drift’
issue. Bolander [13] expressed worries about the consequences and technical hurdles
of AI replacing human tasks. Holzinger et al. [14] emphasized the need for extensive,
top-notch data to address urgent medical problems, particularly by merging different
clinical, imaging, and molecular data to understand intricate illnesses. References
[15] and [16] shared examples of AI methods and risk in computer vision models.
Reference [17] shared examples of machine learning-based models and ESG and
the risk associated with it. Reference [18–21] shared various examples of machine
learning models and how to manage AI risk.
Sentiment Analytics Using Twitter and Other Data Sources: Gupta et al. [9]
contributed to the comprehension of textual context to find distinct attributes of
items or services that impact consumer emotions. The study conducted by Dwivedi
and Anand [22] employed sentiment analysis and topic modeling techniques to eval-
uate the responses of governments during the COVID-19 pandemic, specifically
comparing the reactions of the United Arab Emirates and Saudi Arabia. In their
study, Dwivedi et al. [10] employed Twitter data to conduct topic and sentiment
analysis with the objective of identifying primary concerns regarding the veracity
and integrity of data. Dwivedi et al. [3, 23] employed context analysis on Twitter
to assess the prevalence of positive and negative attitudes regarding the COVID-
19 vaccination, emphasizing prevalent concerns. In their study, Dwivedi et al. [22]
conducted a comparative analysis of medical research on COVID-19 in the United
Arab Emirates and the World Health Organization. The researchers focused on iden-
tifying the main topics addressed by both parties. In their study, Dwivedi et al. [4]
utilized context analysis on Twitter to categorize sentiments regarding the ethical
quandaries in AI, identifying key areas of concern.
Text Clustering Methods: Alghamdi and Alfalqi [24] highlighted the increasing
demand for novel techniques or instruments to efficiently manage, sort, and eval-
uate the ever-growing volume of electronic records and archives. Hofmann [25]
delineated two primary methodologies for analyzing such data: natural language
processing (NLP) and statistical techniques, such as topic modeling. While natural
language processing (NLP) focuses on the categorization of speech components and
the examination of grammar, statistical and topic models mostly utilize the ‘bag-of-
words’ (BoW) approach. This approach involves transforming texts into a matrix that
represents the frequency of each word occurrence in each document. Deerwester et al.
[26] were pioneers in the field of topic modeling. They introduced a highly influential
model that included latent semantic analysis (LSA) and singular value decomposition
(SVD). Asmussen and Moller [27] developed an innovative framework utilizing topic
modeling approaches to comprehensively analyze a diverse array of scholarly articles.
Their approach facilitated rapid, lucid, and replicable evaluations of extensive collec-
tions of documents employing the LDA technique. In general, automatic document
processing can be categorized into two types: supervised learning and unsupervised
learning. Supervised learning necessitates the meticulous task of manually assigning
labels to a set of documents, which can be time-consuming. In contrast, unsupervised
algorithms, like topic modeling, bypass this stage, hence accelerating the examina-
tion of large collections of documents. Gottipati et al. [28] utilized subject modeling
and visual data representation techniques to comprehend the feedback provided by
postgraduate students at the Singapore University of Management. They combined
rule-based techniques with statistical categorizers to identify subjects. Bagheri et al.
[29] developed a sentiment analysis framework that focused on extracting opinions
pertaining to specific themes. The LDA algorithm was employed to identify subjects,
while the ‘bag-of-words’ approach was utilized for sentiment analysis, which quan-
tifies emotions based on word frequency. Benedetto and Tedeschi [30] discussed
prevalent methodologies for sentiment analysis on social media, addressing perti-
nent concerns in cloud technology. Dwivedi and Pathak [23] introduced a technique
that operates at the phrase level. This method utilizes online latent semantic indexing
along with predefined rules to extract topics. Samuel and his colleagues [31] exam-
ined the positive and negative sentiments on the recovery of the US economy amid the
COVID-19 pandemic, including factors such as evolving circumstances, economic
downturns, and feelings of sorrow. Alonso [32] showed that when consumers exhibit a
robust response to unfavorable information, coupled with an escalation of unpleasant
emotions, it results in a negative perception of cattle production. Gupta et al. [33–
43] provided unsupervised and supervised machine learning methods for detecting
anomalies. Gupta et al. [44, 45] provided optimization methods and data quality
approaches for detection and optimization of money laundering scenarios. Dwivedi
and Vemareddy [46] performed sentiment analysis for crypto to understand the negative
sentiments. Reference [7] shared examples of how human bias can influence AI bias.
The UK has actively engaged in shaping the management of artificial intelligence (AI)
to guarantee its ethical and responsible use within its borders. The development
of the UK’s preliminary AI guidelines was a comprehensive endeavor, engaging
various parties and reflecting on the wider tech, economic, social, and moral aspects
of AI. For our research, we sourced the in-depth draft regulation directly from the
UK website to conduct topic modeling and sentiment analysis.
The process of text analysis involves the conversion of human language into a
format that can be processed and analyzed by machines. There are multiple crucial
stages involved in preparing the text for this, which include:
• Transforming the entire text to lowercase.
• Removing stop words, rare terminology, and specific words.
• Converting numerical values into their verbal representation or eliminating them
completely.
• Removing any spaces at the beginning or end of the text.
• Excluding punctuation marks and other unique symbols.
At first, our main objective was to eliminate any duplicate rows. Eliminating
superfluous data is essential to get precise results. Lowercasing all text ensures uniformity,
preventing variants such as ‘Drinking Water’ and ‘drinking water’ from being treated as separate terms.
To optimize the processing of textual data, it is typically necessary to decrease the
size of the dataset. An effective strategy is eliminating commonly occurring terms. We
have the option to either create a tailored compilation of these phrases or utilize pre-
existing libraries. To achieve this objective, we employed the stopword and textblob
libraries. Generally, we eliminate words that are commonly used by everyone, but
we may also exclude keywords that are peculiar to the situation and appear too
frequently. It is recommended to examine the most common terms in the dataset in
order to identify which ones should be excluded. Due to the widespread occurrence of
spelling errors and abbreviations in sentences, it is crucial to include a spell-checking
process to ensure consistency in word usage. We utilized the textblob library, which
is specifically designed to handle such errors. Tokenization is the process of dividing
the text into separate units, such as individual words or sentences. By utilizing the
textblob package, we converted our texts into tokens, effectively dissecting them
into individual words. Stemming is a technique that involves removing word suffixes
such as ‘ing,’ ‘ly,’ and ‘s.’ We utilized the Porter Stemmer algorithm from the NLTK
package for this. Nevertheless, lemmatization is frequently favored over stemming:
unlike simple word trimming, it determines the base or root form of a word by
considering vocabulary and morphological analysis, and is therefore generally
preferred for its accuracy.
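A minimal, dependency-free sketch of these preprocessing steps (the study used the stopword, textblob, and NLTK libraries; the tiny stopword list and crude suffix stripper below are illustrative stand-ins, not the real libraries):

```python
import re
import string

# Illustrative stopword list; real libraries ship far larger ones.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "are", "for", "on", "with", "that", "this", "it", "as"}

def preprocess(text):
    text = text.lower()                                     # uniform case
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\d+", "", text)                         # drop numbers
    return [t for t in text.split() if t not in STOPWORDS]  # tokenize + filter

def stem(token):
    # Crude suffix stripping in the spirit of (but much simpler than)
    # NLTK's PorterStemmer.
    for suffix in ("ing", "ly", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tokens = preprocess("The AI regulation is promoting trustworthy systems in 2023.")
stems = [stem(t) for t in tokens]
```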
Upon finishing these preprocessing stages, the subsequent stage entails extracting
features utilizing techniques derived from natural language processing.
N-grams, comprising unigrams, bigrams, or trigrams, denote combinations of
one, two, or three words, correspondingly. Although unigrams may not encom-
pass extensive context, bigrams and trigrams can provide more intricate language
patterns, revealing probable word sequences. The selection between shorter or longer
N-grams is contingent upon the precise objectives of the study, as excessively
lengthy sequences may overlook the overarching message. Part-of-speech tagging
(POS) categorizes words according to their grammatical role in a phrase, such as
whether they act as nouns, verbs, or adjectives, thus contributing to the contextual
understanding and significance of the text.
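Extracting N-grams is a simple sliding window over the token list; a dependency-free sketch (the token values are illustrative):

```python
def ngrams(tokens, n):
    # All runs of n consecutive tokens: n=1 unigrams, n=2 bigrams, n=3 trigrams.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = ["ai", "regulation", "supports", "uk", "innovation"]
bigrams = ngrams(words, 2)    # ("ai", "regulation"), ("regulation", "supports"), ...
trigrams = ngrams(words, 3)
```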
As illustrated in Fig. 1, our first step consisted of eliminating duplicate rows to
guarantee impartial results. In order to standardize words and prevent duplication,
we proceeded to transform all text to lowercase. This ensures that variations such as
‘Crypto Currency’ and ‘crypto currency’ are not considered as separate keywords.
Subsequently, punctuation was eliminated to simplify the dataset for more efficient
textual analysis. Stop words, which are frequently used terms, were eliminated using
the textblob package in Python. Due to the frequent occurrence of spelling problems
and abbreviations in the text, we utilized the textblob library to correct spelling.
After completing these stages, we proceeded to tokenize the data, which involved
dividing it into separate words or sentences. The process of stemming, which involves
truncating word ends such as ‘ing,’ ‘ly,’ and ‘s,’ was performed using Python’s Porter-
Stemmer from the NLTK module. Lemmatization is an alternative to stemming that
determines the basic form of a word. By utilizing vocabulary and morphological
analysis, lemmatization is frequently preferred over stemming due to its higher level
of precision. After completing the first preprocessing, we utilized various natural
language approaches to extract features. We employed N-grams, namely bigrams
and trigrams, to comprehend the contextual correlation among words. Although
unigrams supply only a little amount of information, bigrams and trigrams provide
more extensive linguistic insights. In addition, part-of-speech tagging was utilized
to assign functional roles, such as nouns, verbs, and adjectives, to words based on
their context.
Afterward, we proceeded with topic modeling, a method that detects themes or
subjects in a collection of texts. This procedure is essential in natural language
processing as it decreases the dimensionality of the dataset by concentrating on
relevant content rather than filtering through the entire text. For our investigation,
we employed the Latent Dirichlet Allocation (LDA) approach among numerous
alternatives. The LDA model is a statistical tool that identifies connections
between different documents. Based on the variational expectation maximization
(VEM) algorithm, this method detects the most likely topics present in a collection
of texts. Conventional approaches may only choose the most common terms, whereas
LDA goes beyond them by examining the semantic connections between words in a
document. The model rests on the idea that every document can be characterized by
a distribution of topics, and each topic can be delineated by a distribution of
words. This provides a comprehensive perspective on interconnected subjects,
allowing for more subtle categorizations of the body of text.
The following flowchart illustrates the complexities of this topic modeling approach.
4 Results
Overall Sentiment: The sentiment analysis of the document yields the following
results.
• The document exhibits a positive sentiment with an average polarity of 0.1317.
• This value ranges from −1 (most negative) to 1 (most positive). A value of 0.1317
suggests that the document has a slightly positive sentiment overall.
• The document has an overall subjectivity of 0.4215. This value ranges from 0 (most
objective) to 1 (most subjective). A value of 0.4215 indicates that the document
strikes a balance but leans toward being somewhat subjective.
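As an illustration of how such document-level scores arise, the sketch below averages per-word (polarity, subjectivity) pairs from a toy lexicon. The lexicon values are invented; libraries such as TextBlob, whose score ranges match those quoted above, aggregate per-phrase scores in a similar spirit:

```python
# Toy illustration of document-level polarity in [-1, 1] and
# subjectivity in [0, 1]. The lexicon values are made up; real tools
# aggregate per-phrase scores from much larger lexicons or models.
LEXICON = {
    "effective":  (0.6, 0.7),   # (polarity, subjectivity)
    "innovative": (0.5, 0.6),
    "risk":       (-0.3, 0.4),
    "framework":  (0.0, 0.0),
}

def document_scores(words):
    """Average polarity and subjectivity over words found in the lexicon."""
    scored = [LEXICON[w] for w in words if w in LEXICON]
    if not scored:
        return 0.0, 0.0
    polarity = sum(p for p, _ in scored) / len(scored)
    subjectivity = sum(s for _, s in scored) / len(scored)
    return polarity, subjectivity

pol, subj = document_scores(["effective", "risk", "framework"])
# pol = (0.6 - 0.3 + 0.0) / 3 = 0.1, a slightly positive document
```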
Coherence Score Plot:
• Coherence Score for Positive Sentences: 0.434
• Coherence Score for Negative Sentences: 0.594.
Positive Sentences Topics:
Topic 1: AI, regulatory framework, effective central function, organizations.
Topic 2: AI life cycle, regulatory approach, legal principles, responsibility.
Topic 3: AI regulators, regulatory principles, support guidance, ensure framework.
Topic 4: AI in the UK, risk, effective government systems, innovation.
Topic 5: AI approach, UK innovation, stakeholders, regulatory foundation.
In the realm of positive sentiments surrounding AI, several dominant themes
emerge. First, there is a focus on AI’s regulatory framework and the importance
of an effective central function to streamline its integration across various
organizations. Second, the AI life cycle, combined with a robust regulatory
approach and legal principles, signifies the emphasis on responsibility and
ethics. Furthermore, the role of AI regulators is highlighted, emphasizing the
need to ensure the right framework is in place. Additionally, there is a clear
indication of AI’s role in UK innovation, addressing risks, and promoting
effective government systems. Lastly,
Decoding the UK’s Stance on AI: A Deep Dive into Sentiment … 131
Similarly, the MDS projection for negative sentiments displays the topics in a
two-dimensional layout. Topics that are proximate have similar word distributions,
indicating potential overlaps or related concerns. On the other hand, isolated topics
might represent unique issues or criticisms. This map provides insights into the
clustering and distinctions among various negative sentiment topics (Fig. 9).
Termite Plot for Positive and Negative Topics:
For the positive sentiments, the plot showcases the word distribution across five topics,
with certain terms holding prominence in specific topics. This can aid in under-
standing the focal points of the positive sentiments within the document. The termite
plot further solidifies these findings, offering a clearer view of the most
influential terms for each positive sentiment topic. For the negative sentiments, it reveals
the distribution of terms across the negative sentiment topics. The darker regions
highlight terms that are crucial in the context of specific topics. The termite plot
136 D. N. Dwivedi and G. Mahanty
for negative sentiments complements the heat map (Fig. 10), which presents a
granular view of term significance across topics. This aids in deciphering the
primary concerns or themes associated with the negative sentiments in the document.
The UK’s framework for artificial intelligence (AI), known as the ‘A pro-innovation
approach to AI regulation,’ is notable for its comprehensive structure. The UK recog-
nizes the early stage of development of AI and its significant potential for social and
economic impact. However, the UK is also aware of the possible risks associated
with AI systems, such as the potential to exacerbate socioeconomic inequalities. The
UK is dedicated to creating strong standards and guidance for the design and
integration of AI, while closely monitoring these problems. The proposal establishes
essential criteria, effectively balancing the management of risks with the promotion
of the growth of this emerging technology.
According to our analysis, the majority of documents convey a positive sentiment,
focusing on the importance of responsibility, compliance with laws, punishment for
those who disregard standards, and compliance with privacy requirements. It is worth
mentioning the UK’s prompt intervention in this domain. Nevertheless, the fears
are as prominent as the assurances. The complexities of assessing risks,
supervising them, and guaranteeing adherence present a significant obstacle, as the
preliminary proposal openly acknowledges. This strategy ensures that significant AI
technologies undergo thorough risk assessments and implement preventive actions.
Nonetheless, the true assessment of the UK’s AI rules hinges on their implementation,
oversight, and capacity to adjust to the ever-changing AI environment. As artificial
intelligence advances, it will be crucial to review and improve existing policies in
order to effectively tackle new challenges and opportunities.
None of these methods can detect whether the picture is generated by a machine.
Distinguishing between GAN-generated images and manually created photos poses
distinct difficulties. The complexity of Generative Adversarial Networks (GANs) is
a major factor contributing to this challenge. GANs are specifically engineered to
imitate the artistic process of humans by producing visuals that closely resemble
authentic photographs. They achieve this by training on extensive datasets of
authentic photos, which allows them to generate exceedingly lifelike and visually
persuasive outcomes.
Overall, the combination of GANs’ capacity to generate extremely lifelike images,
their flexibility to avoid detection, and the lack of obvious artifacts poses a significant
difficulty in distinguishing GAN-generated images from manually created ones using
traditional approaches. The identification of content created by GANs necessitates
the creation of specific methodologies and ongoing progress in the domain of visual
forensics.
References
23. Dwivedi DN, Pathak S (2022) Sentiment analysis for COVID vaccinations using Twitter:
text clustering of positive and negative sentiments. In: Hassan SA, Mohamed AW, Alnowibet
KA (eds) Decision sciences for COVID-19. International series in operations research and
management science, vol 320. Springer, Cham. https://doi.org/10.1007/978-3-030-87019-5_
12
24. Alghamdi R, Alfalqi K (2015) A survey of topic modeling in text mining. Int J Adv Comput
Sci Appl 6(1):147–153. https://doi.org/10.14569/ijacsa.2015.060121
25. Hofmann T (2001) Unsupervised learning by probabilistic latent semantic analysis. Mach Learn
42(1–2):177–196. https://doi.org/10.1023/A:1007617005950
26. Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent
semantic analysis. J Am Soc Inf Sci 41(6):391–407
27. Asmussen CB, Møller C (2019) Smart literature review: a practical topic modeling approach
to exploratory literature review. J Big Data 6(1). https://doi.org/10.1186/s40537-019-0255-7
28. Gottipati S, Shankararaman V, Lin JR (2018) Text analytics approach to extract course improve-
ment suggestions from students feedback. Res Pract Technol Enhanced Learn 13(1). https://
doi.org/10.1186/s41039-018-0073-0
29. Bagheri E, Ensan F, Al-Obeidat F (2018) Neural word and entity embeddings for ad hoc
retrieval. Inf Process Manage 54(4):657–673
30. Benedetto F, Tedeschi A (2016) Big data sentiment analysis for brand monitoring in social
media streams by cloud computing. In: Studies in computational intelligence vol 639. https://
doi.org/10.1007/978-3-319-30319-2_14
31. Samuel J, Rahman MM, Ali GMN, Samuel Y, Pelaez A, Chong PH, Yakubov M (2020)
Feeling positive about reopening? New normal scenarios from COVID-19 US reopen sentiment
analytics. In: IEEE Access, vol 8, pp 142173–142190. Available at SSRN: https://ssrn.com/
abstract=3713652
32. Alonso ME, González-Montaña JR, Lomillos JM (2020) Consumers concerns and perceptions
of farm animal welfare. Anim: Open Access J MDPI 10(3). https://doi.org/10.3390/ani100
30385
33. Gupta A, Dwivedi DN, Shah J (2023) Financial crimes management and control in financial
institutions. In: Artificial intelligence applications in banking and financial services. Future of
business and finance. Springer, Singapore. https://doi.org/10.1007/978-981-99-2571-1_2
34. Gupta A, Dwivedi DN, Shah J (2023) Overview of technology solutions. In: Artificial intelli-
gence applications in banking and financial services. Future of business and finance. Springer,
Singapore. https://doi.org/10.1007/978-981-99-2571-1_3
35. Gupta A, Dwivedi DN, Shah J (2023) Data organization for an FCC unit. In: Artificial intelli-
gence applications in banking and financial services. Future of business and finance. Springer,
Singapore. https://doi.org/10.1007/978-981-99-2571-1_4
36. Gupta A, Dwivedi DN, Shah J (2023) Planning for AI in financial crimes. In: Artificial intelli-
gence applications in banking and financial services. Future of business and finance. Springer,
Singapore. https://doi.org/10.1007/978-981-99-2571-1_5
37. Gupta A, Dwivedi DN, Shah J (2023) Applying machine learning for effective customer risk
assessment. In: Artificial intelligence applications in banking and financial services. Future of
business and finance. Springer, Singapore. https://doi.org/10.1007/978-981-99-2571-1_6
38. Gupta A, Dwivedi DN, Shah J (2023) Artificial intelligence-driven effective financial transac-
tion monitoring. In: Artificial intelligence applications in banking and financial services. Future
of business and finance. Springer, Singapore. https://doi.org/10.1007/978-981-99-2571-1_7
39. Gupta A, Dwivedi DN, Shah J (2023) Machine learning-driven alert optimization. In: Artificial
intelligence applications in banking and financial services. Future of business and finance.
Springer, Singapore. https://doi.org/10.1007/978-981-99-2571-1_8
40. Gupta A, Dwivedi DN, Shah J (2023) Applying artificial intelligence on investigation. In:
Artificial intelligence applications in banking and financial services. Future of business and
finance. Springer, Singapore. https://doi.org/10.1007/978-981-99-2571-1_9
41. Gupta A, Dwivedi DN, Shah J (2023) Ethical challenges for AI-based applications. In: Artificial
intelligence applications in banking and financial services. Future of business and finance.
Springer, Singapore. https://doi.org/10.1007/978-981-99-2571-1_10
42. Gupta A, Dwivedi DN, Shah J (2023) Setting up a best-in-class AI-driven financial crime
control unit (FCCU). In: Artificial intelligence applications in banking and financial services.
Future of business and finance. Springer, Singapore. https://doi.org/10.1007/978-981-99-2571-
1_11
43. Gupta A, Dwivedi DN, Jain A (2021) Threshold fine-tuning of money laundering scenarios
through multi-dimensional optimization techniques. J Money Laundering Control. https://doi.
org/10.1108/JMLC-12-2020-0138
44. Gupta A, Dwivedi DN, Shah J, Jain A (2021) Data quality issues leading to suboptimal machine
learning for money laundering models. J Money Laundering Control. https://doi.org/10.1108/
JMLC-05-2021-0049
45. Dwivedi D, Vemareddy A (2023) Sentiment analytics for crypto pre and post covid: topic
modeling. In: Molla AR, Sharma G, Kumar P, Rawat S (eds) Distributed computing and intel-
ligent technology. ICDCIT 2023. Lecture notes in computer science, vol 13776. Springer,
Cham. https://doi.org/10.1007/978-3-031-24848-1_21
46. Dwivedi D, Patil G (2022) Lightweight convolutional neural network for land use image classi-
fication. J Adv Geospatial Sci Technol 2(1):31–48. Retrieved from https://jagst.utm.my/index.
php/jagst/article/view/31
Latest Trends on Satellite Image Segmentation
Abstract For the last few decades, satellite imaging technology has taken massive
strides towards higher spatial resolution, larger swath coverage and almost
real-time data delivery. Satellite imaging, or remote sensing, is extensively used
in optical imaging of the globe, military surveillance, deforestation detection,
etc. Multispectral imaging by satellite has enabled an exceptional understanding
of the earth by imaging beyond the realm of the visible spectrum. Against this
backdrop, image segmentation using machine learning, deep learning models and
image processing has prompted several new approaches and techniques for satellite
image segmentation. This survey provides a comprehensive review of the recent
literature, covering novel approaches to image segmentation using satellite
imagery as well as others. There is broad coverage of the U-Net, TSVM and Random
Walker segmentation algorithms. The survey investigates their strengths,
challenges and novel aspects, compares their precision, and deliberates on
potential research outlooks.
1 Introduction
Image segmentation is an intrinsic part of computer vision and has a wide array
of applications and methodologies. It works on the principle of subdividing
images, or frames in the case of videos, into different classes or segments.
Such classification can be done by using semantic segmentation where
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 141
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_12
142 S. Borkar et al.
Remote sensing and geospatial sciences involve imaging from orbital satellites or
aircraft. These images can be taken over swathes of a few kilometres to the entire
globe. These images are either captured over a single pass of an orbit or multiple
passes for added resolution and application of interferometry. The images can be
taken using sensors that capture visible light, infrared, thermal infrared and radar,
thus extending much beyond what the eye can see. Remote sensing is the science
of extracting useful data from the processed images and creating Geographic Infor-
mation Systems (GIS). The advancements in remote sensing in the last few decades
have led to the generation of extremely high spatial resolution multispectral and
hyperspectral images. This has enabled us to keep up with the global demand for
real-time high-resolution geospatial data that needs to be delivered to collaborators or
stakeholders like local and national governments, corporations and conglomerates.
Satellite visible-spectrum images are often masked by cloud cover and foliage
shadows, hence requiring processing and noise removal [9].
Multispectral satellite images can be acquired either as raw data or partially
processed data; extensive preprocessing is required to work on these images.
These steps can be grouped into categories of image correction, which addresses
phase errors, distortions due to the curvature of the earth, aberrations, etc.
These sorts of fundamental corrections are done using radiometric correction and
orthorectification. Several noise reduction and removal algorithms, such as
Goldstein phase filtering, are extensively used to eliminate the noise added by
foliage cover or construction. The raw data, generally distributed in processing
levels, is passed through noise-removal stages to obtain an interferogram, which
is further converted to an elevation map (Fig. 1). The elevation map, coupled
with the RGB image, is crucial for Synthetic Aperture Radar (SAR) imaging and
applications.
The sensors on board the satellites, be they optical, radar or infrared, are
designed specifically to acquire data in different frequency ranges along the
electromagnetic spectrum. Each band captures a different aspect of information
about the earth’s surface, enabling analysis of various features such as land
use, urban zones, thermal mapping, water body analysis, vegetation cover, mining
feasibility and much more. For example, the near infrared is useful in analysing
vegetation cover, as healthier plants reflect back more near-infrared light than
dying ones. Thermal infrared imaging can be used to identify local heating zones
and global-warming hot spots in the oceans.
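The near-infrared observation above underlies the classic Normalized Difference Vegetation Index (NDVI): healthy vegetation reflects strongly in NIR and absorbs red, so NDVI = (NIR − Red)/(NIR + Red) approaches 1 over dense, healthy cover. A minimal sketch with NumPy follows; the band arrays are synthetic stand-ins, not real satellite rasters:

```python
# Per-pixel NDVI from near-infrared and red bands; values lie in
# [-1, 1], with high values indicating healthy vegetation. The tiny
# 2x2 bands below are synthetic stand-ins for satellite rasters.
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Compute NDVI per pixel; eps guards against division by zero."""
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)

# First pixel vegetated (high NIR vs red), last pixel bare (NIR == red).
nir_band = np.array([[0.8, 0.6], [0.4, 0.2]])
red_band = np.array([[0.1, 0.2], [0.3, 0.2]])
print(ndvi(nir_band, red_band))
```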
Satellites contain one or many imaging instruments capable of capturing specific
bands. Usually, optical imaging instruments capture
bands of red, green and blue (the RGB visible range) and also the infrared
(Table 1), using the same imaging sensor or multiple sensors fed by splitting the
incoming light. These bands cover different frequency ranges. Radar satellites
use specific band names for specific ranges, such as P, L, X and Ku, and are
employed in Synthetic Aperture Radar (SAR) imaging (Table 2). Radar bands are
used to create elevation or topographic maps via interferometry of the radar
beams and can be used to identify landslides [10, 11] and oil spills [12].
Multispectral remote sensing is the acquisition of visible, radar, near-infrared,
shortwave-infrared and thermal-infrared images in various broad wavelength bands.
Different elements and compounds absorb and reflect wavelengths of
electromagnetic radiation differently. Thus, by examining the reflected
electromagnetic spectrum signature, we can identify the element or compound. This
capability for identifying materials is non-existent or very limited in images
captured using only visible wavelengths.
In multispectral imaging, 8–10 images are captured at the same instant by
different sensors or by separation of the incoming light into different
wavelength bands (Fig. 2). This is achieved by using different sensors calibrated
for specific bands or by using spectroscopy. In this method, we obtain optical
intensity as a discrete function of wavelength; the discrete values are chosen
beforehand so as to extract the most out of the imaged spectrum. Multispectral
imaging is widely used in medical sciences and biology [13], food safety and
quality assurance [14] and, most prominently, remote sensing.
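Material identification from a discrete spectral signature, as described above, can be sketched by comparing a pixel’s band values against reference signatures. The reference values below are invented for illustration:

```python
# Sketch of material identification from a discrete spectral signature:
# compare a pixel's reflectance at the chosen band wavelengths against
# reference signatures (the values here are invented stand-ins).
import numpy as np

REFERENCE = {
    "water":      np.array([0.05, 0.03, 0.02, 0.01]),
    "vegetation": np.array([0.04, 0.10, 0.45, 0.50]),
    "bare_soil":  np.array([0.15, 0.20, 0.25, 0.30]),
}

def identify(pixel_spectrum):
    """Return the reference material with the closest spectrum (L2 distance)."""
    return min(REFERENCE, key=lambda m: np.linalg.norm(REFERENCE[m] - pixel_spectrum))

print(identify(np.array([0.05, 0.11, 0.40, 0.48])))
```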
Another methodology similar to multispectral imaging is hyperspectral imaging
[15], which, instead of a discrete function of wavelength, gives out a spatial
map or a continuous function (Fig. 3) [16]. This method gives more accurate and
precise control over identifying and quantifying molecular-level absorption,
made possible by the greater amount of information available per pixel of the
image. Hyperspectral imaging, albeit requiring much more advanced equipment, is
crucial for making fine observations. In practice, a hyperspectral imaging
apparatus captures images in more than 100 bands. Hyperspectral imaging is
extensively used in medicinal science [17] and food analysis, and in remote
sensing prominently for agricultural or military applications.
All machine learning and deep learning models need significant training and
testing datasets. In satellite imaging, the raw data is enormous and extensive:
raw L0 Synthetic Aperture Radar data for a single image can have a size upwards
of 5 GB. Processing such data to get workable image files demands considerable
technical aptitude and computational resources. Another great challenge in the
creation of datasets for satellite image segmentation is the annotation of the
processed images themselves, which is a completely manual and labour-intensive
task. The annotations differ with the type of segmentation used. Semantic
segmentation is the simplest to annotate, as it just needs a mask. Panoptic
segmentation is a very time-consuming and repetitive process due to the presence
of multiple classes that need to be individually annotated. There is a lack of
software available to streamline this process of dataset creation for satellite
panoptic image segmentation [18]. Much of the literature surveyed made a
compelling argument for creating a new personalised dataset for more accurate
results. Ghassemi et al. [19] propose a novel convolutional encoder-decoder
network capable of learning visual representations using heterogeneous datasets.
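The difference in annotation effort between semantic and panoptic masks can be made concrete with a tiny synthetic example. The 4 × 4 masks and the packed encoding below are illustrative assumptions, not a dataset format used by the papers surveyed:

```python
# Why panoptic annotation is costlier than semantic: a semantic mask
# stores one class id per pixel, while a panoptic mask additionally
# needs a distinct instance id per object. Synthetic 4x4 example with
# class 0 = background and class 1 = "building".
import numpy as np

semantic = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
])  # two separate "building" blobs share class 1

instance = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
])  # each blob gets its own instance id

# Pack (class, instance) into one integer per pixel (a common encoding).
panoptic = semantic * 1000 + instance

# Number of separate objects the annotator had to delineate for class 1.
n_instances = len(np.unique(instance[semantic == 1]))
```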
Fig. 2 Multispectral imaging
Comprehensive literature has been presented over the last few decades. Saini
et al. [20] conducted an extensive analysis of the different image segmentation
techniques, focusing mostly on detecting discontinuity and detecting similarity.
The study also covered edge-based segmentation, region-based segmentation and
watershed transformation. Yuan et al. [21] proposed a deep-learning-based
multispectral satellite image segmentation approach for the identification of
water bodies in advanced urban hydrological studies. It introduced a novel
multichannel water body detection network comprising a multichannel fusion
module, an Enhanced Atrous Spatial Pyramid Pooling module, and Space-to-Depth/
Depth-to-Space operations. Jia et al. [22] proposed using RGB histogram-
Fig. 3 Hyperspectral imaging
based image segmentation using Masi entropy as the objective function, concluding
that the multi-strategy emperor penguin optimiser achieved significant
enhancement and exceptional performance.
Boaro et al. [23] applied image segmentation to identify gold exploration areas
in the Amazon River basin using the U-Net algorithm. This was achieved by
utilising the hyperspectral nature of the imaging to separate the wavelength
bands associated with gold and other mining materials; the study achieved high
precision, recall and accuracy percentages. Raghavendra et al. [24] reviewed the
use of image processing and image segmentation techniques to detect plant leaf
diseases. The review explores the need for automation to streamline the delivery
of detections; automation is essential for end-to-end image segmentation
solutions. Yu et al. [25]
Fig. 5 U-Net
3.2 U-Net
For high-resolution image segmentation, Random Walker has been proposed by
multiple researchers. Random Walker produces a segmentation of an image starting
from several highlighted markers [35]. These markers are obtained through a
step-by-step process: a marker is assigned to an unknown pixel or data point with
reference to already defined markers or labelled data points. Using grey values,
noise is determined and removed. Random Walker achieves this by randomly
traversing a graph in such a way that the probability in each situation remains
the same (Fig. 6). The probability is calculated on the basis of the current node
V, the node visited before the current node T, and the target node. Accordingly,
a weight will be calculated whose
value between 0 and 1 determines the increase or decrease in the probability.
However, according to research, using this model for high-resolution 3D datasets
of increasing size becomes tedious. Superpixel features, which group multiple
pixels together based on their visual semantics, help in reducing or narrowing
down features and in labelling data [35]. Superpixel features do provide good
results in the case of low-density separation and TSVM models but show some
difference in the case of Random Walker.
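A minimal sketch of the grey-value edge weighting commonly used in Random Walker segmentation (following Grady’s formulation; the β value is an arbitrary choice for illustration): neighbouring pixels with similar grey values receive weights near 1, so the walker tends to stay within homogeneous regions.

```python
# Edge weighting for Random Walker segmentation: adjacent pixels with
# similar grey values get weights near 1, strong edges get weights near
# 0, steering the walker away from crossing region boundaries. beta is
# a free parameter (chosen arbitrarily here).
import math

def edge_weight(g_i, g_j, beta=10.0):
    """Weight in (0, 1] from the grey-value difference of adjacent pixels."""
    return math.exp(-beta * (g_i - g_j) ** 2)

w_flat = edge_weight(0.50, 0.52)  # similar pixels: weight close to 1
w_edge = edge_weight(0.10, 0.90)  # strong edge: weight close to 0
```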
5 Conclusion
were most frequently used for satellite image segmentation. They were shown to be
the most effective, as they can handle high-resolution 3D images, which is not
possible in other models. However, there is considerable scope for improvement,
as minor computational setbacks remain in the already proposed methods. Most
models are implemented using the superpixel feature to reduce the number of
features and group the data. There are myriad applications to work on, including
image segmentation for rural land use, detection of subsiding water bodies and
detection of illegal mining zones, in which cases the Random Walker technique can
be implemented. Image segmentation in these applications can allow a more
comprehensive study of various natural calamities, support relief work in
affected areas, and enable more preventative measures.
References
14. Qin J, Chao K, Kim MS, Lu R, Burks TF (2013) Hyperspectral and multispectral imaging
for evaluating food safety and quality. J Food Eng 118(2):157–171. https://doi.org/10.1016/j.
jfoodeng.2013.04.001
15. Nalepa J et al (2021) Towards on-board hyperspectral satellite image segmentation: under-
standing robustness of deep learning through simulating acquisition conditions. Remote Sens
13(8):1532. https://doi.org/10.3390/rs13081532
16. ElMasry G, Sun D-W (2010) Principles of hyperspectral imaging technology.
In: Hyperspectral imaging for food quality analysis and control, pp 3–43. https://
doi.org/10.1016/b978-0-12-374753-2.10001-2
17. Rehman AU, Qureshi SA (2021) A review of the medical hyperspectral imaging systems and
unmixing algorithms’ in biological tissues. Photodiagn Photodyn Ther 33:102165. https://doi.
org/10.1016/j.pdpdt.2020.102165
18. de Carvalho OLF et al (2022) Panoptic segmentation meets remote sensing. Remote Sens
14(4):965. https://doi.org/10.3390/rs14040965
19. Ghassemi S, Fiandrotti A, Francini G, Magli E (2019) Learning and adapting robust features
for satellite image segmentation on heterogeneous data sets. IEEE Trans Geosci Remote Sens
57(9):6517–6529. https://doi.org/10.1109/TGRS.2019.2906689
20. Saini S, Arora K (2014) A study analysis on the different image segmentation techniques. Int
J Inf Comput Technol 4(14):1445–1452
21. Yuan K, Zhuang X, Schaefer G, Feng J, Guan L, Fang H (2021) Deep-learning-based multi-
spectral satellite image segmentation for water body detection. IEEE J Select Top Appl Earth
Observations Remote Sens 14:7422–7434. https://doi.org/10.1109/jstars.2021.3098678
22. Jia H, Sun K, Song W, Peng X, Lang C, Li Y (2019) Multi-strategy emperor penguin optimizer
for RGB histogram-based color satellite image segmentation using Masi entropy. IEEE Access
7:134448–134474. https://doi.org/10.1109/access.2019.2942064
23. Boaro JMC, dos Santos PTC, Serra A, Rego VG, Martins CV, Júnior GB (2021) Satellite image
segmentation of gold exploration areas in the Amazon rainforest using U-Net. In: IEEE inter-
national humanitarian technology conference (IHTC). United Kingdom, pp 1–8. https://doi.
org/10.1109/IHTC53077.2021.9698927
24. SSK, Raghavendra BK (2019) Diseases detection of various plant leaf using image process-
ing techniques: a review. In: 2019 5th international conference on advanced computing and
communication systems (ICACCS). https://doi.org/10.1109/icaccs.2019.8728325
25. Yu Y et al (2023) Techniques and challenges of image segmentation: a review. Electronics
12(5):1199. https://doi.org/10.3390/electronics12051199
26. Zhu S, Xia X, Zhang Q, Belloulata K (2007) An image segmentation algorithm in image
processing based on threshold segmentation. In: 2007 Third international IEEE conference on
signal-image technologies and internet-based system. https://doi.org/10.1109/sitis.2007.116
27. Sivakumar V, Murugesh V (2014) A brief study of image segmentation using thresholding
technique on a noisy image. International conference on information communication and
embedded systems (ICICES2014), Chennai, India, pp 1–6. https://doi.org/10.1109/ICICES.
2014.7034056
28. Fernandes K, Cardoso JS (2018) Ordinal image segmentation using deep neural networks.
In: 2018 international joint conference on neural networks (IJCNN). https://doi.org/10.1109/
ijcnn.2018.8489527
29. Di Ruberto C, Rodriguez G, Vitulano S (1999) Image segmentation by texture analysis. In:
Proceedings 10th international conference on image analysis and processing. https://doi.org/
10.1109/iciap.1999.797624
30. Pritt M, Chern G (2017) Satellite image classification with deep learning. In: 2017 IEEE applied
imagery pattern recognition workshop (AIPR). https://doi.org/10.1109/aipr.2017.8457969
31. Jia H, Lang C, Oliva D, Song W, Peng X (2019) Dynamic harris hawks optimization with
mutation mechanism for satellite image segmentation. Remote Sens 11(12):1421. https://doi.
org/10.3390/rs11121421
32. Artan Y (2011) Interactive image segmentation using machine learning techniques. In: 2011
Canadian conference on computer and robot vision. https://doi.org/10.1109/crv.2011.42
33. Chen M-S, Ho T-Y, Huang D-Y (2012) Online transductive support vector machines for clas-
sification. In: International conference on information security and intelligent control. https://
doi.org/10.1109/isic.2012.6449755
34. Siddique N, Paheding S, Elkin CP, Devabhaktuni V (2021) U-Net and its variants for medical
image segmentation: a review of theory and applications. IEEE Access 9:82031–82057. https://
doi.org/10.1109/access.2021.3086020
35. Haoming K, Chunming L, Zhang K (2023) Satellite image parcel segmentation and extraction
based on U-Net convolutional neural network model. In: IEEE international conference on
control, electronics and computer technology (ICCECT). https://doi.org/10.1109/ICCECT57938.2023.
10141307
Landmark Detection Using Convolutional Neural Network: A Review
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 157
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_13
158 D. Bharti et al.
2 Review Process
This stage involves compiling a variety of landmark photos and using them as a
dataset, with annotation-assisted preprocessing of the dataset. The annotated
landmarks are derived from other websites or from the ground locations of the
landmarks shown in the pictures.
Next, a CNN design appropriate for landmark detection is created; the optimal
architecture can be found by trial and error or by combining various CNN examples
and principles. Convolutional, pooling, nonlinear activation and fully connected
layers make up the CNN’s main structure [1].
The image undergoes preprocessing prior to being fed into the neural network via
the input layer. Several alternating layers of pooling and convolution are
applied to the image as part of this processing. By reducing the number of
connections in the convolutional layer and mitigating the layer’s extreme
sensitivity to position, the pooling layer [2] lessens the computational cost.
After that, the fully connected layer classifies the image. A CNN can
alternatively be viewed as a neural network built from many simple neurons, the
fundamental processing units of an artificial neural network, as in a Multilayer
Perceptron (MLP). An MLP consists of an input layer, hidden layers and an output
layer, made up of basic unit neurons that communicate with one another by
layer-by-layer conduction.
The structure of the MLP, as represented in Fig. 1, contains k hidden units in
addition to n input values and m output values. The input values x_1, …, x_n are
delivered in the direction indicated by the arrows. Each hidden unit h_k receives
the input values from the layer before it. The output units are y_1, …, y_m, and
y*_m denotes the actual value.
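The layer-by-layer conduction described above can be sketched as a forward pass with NumPy. The weights below are random placeholders, since a trained network would learn them from the annotated landmarks:

```python
# Forward pass of the MLP sketched above: n inputs, k hidden units,
# m outputs, propagated layer by layer. Weights are random placeholders
# standing in for learned parameters.
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 4, 3, 2                 # input, hidden, output sizes

W1 = rng.normal(size=(k, n))      # input -> hidden weights
W2 = rng.normal(size=(m, k))      # hidden -> output weights

def forward(x):
    h = np.tanh(W1 @ x)           # hidden units h_1..h_k
    y = W2 @ h                    # output units y_1..y_m
    return y

y = forward(np.array([0.1, 0.2, 0.3, 0.4]))
```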
This process involves forwarding the image through the network, extracting features,
and applying regression or classification techniques to locate the landmarks. A new
input image passes through the trained CNN, which outputs the locations of the
landmarks. Figure 2 represents the design of the implemented CNN. It consists of a
total of 37 layers for the localization. The size of the input layer was kept at 320 × 240
× 3 (width in pixels, height in pixels, and color channels: red, green, and blue).
Within the 37 layers of the CNN, there are ten sets of convolutional layers, batch
normalization, and ReLU. In Fig. 2, these three layers are shown together as a single
green layer. The sizes and numbers of the filters implemented in the convolutional
layers vary considerably: the earliest filter size was the largest at 5 × 5, and the filter
size decreases as the depth of the network increases. The overall training of the CNN
used 48 epochs with a batch size of 64.
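The way alternating convolution and pooling shrink the 320 × 240 input can be checked with the standard output-size formula. The strides and padding below are assumptions for illustration; the paper does not state them:

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output spatial size of a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

# Trace a hypothetical stack: an odd-sized conv with "same" padding (keeps size),
# then a 2x2 max-pool with stride 2 (halves size), with filter sizes shrinking
# as depth grows, as described in the text.
w, h = 320, 240
for kernel in (5, 3, 3):
    w, h = conv_out(w, kernel, pad=kernel // 2), conv_out(h, kernel, pad=kernel // 2)  # conv keeps size
    w, h = conv_out(w, 2, stride=2), conv_out(h, 2, stride=2)                          # pool halves size
    print(w, h)
# 160 120
# 80 60
# 40 30
```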
2.5 Post-preprocessing
3 Background Study
This section aims to provide an overview of existing research and highlight key
studies, approaches, and findings in this domain.
Early studies relied heavily on hand-crafted representations of appearance, style, and
content. By automating feature learning end to end, convolutional neural networks
(CNNs) laid the foundation for continued development in computer vision. CNNs
have overcome many of the limitations associated with previous techniques and
ushered in a new era of landmark detection.
In these research areas, the choice of CNN architecture is often influenced by the
specific requirements of the task, the size of the dataset, and the computational
resources available. Researchers have exploited each architecture’s unique character-
istics and capabilities to address problems such as scale, viewpoint, and lighting
changes in landmark imagery. Before training CNN models on landmark data, it is
good practice to let the model reuse features learned from large datasets such as
ImageNet and transfer them to the specific detection task.
Data augmentation plays an important role in ensuring that CNN-based detection
models can handle diverse and complex real-world situations. Augmentations help
prevent overfitting by exposing the model to a variety of variations during training
and make the model more reliable at detecting landmarks in unseen data. The choice
of specific augmentation and preprocessing methods depends on the characteristics
of the dataset and the needs of the landmark detection project.
The impact of an augmentation strategy varies across research findings depending
on the task, dataset, and specific benchmark. Different augmentation and training
strategies are often combined to find the best mix, resulting in accurate and robust
detectors. The choice of augmentation often depends on the specific nature and
characteristics of the landmark detection problem to be solved.
4 Literature Review
A landmark is a distinctive visual indication used to identify a specific object.
Different machine learning techniques are used in landmark classification to identify
it. Essentially, it is an extended form of image classification carried out using reliable
methods. The convolutional neural network (CNN) is a useful tool for categorizing
these images. It is made up of various interconnected layers that process the supplied
data. The first layer, known as the convolutional layer, applies several filters to the
input data in order to find patterns. Subsequent additional convolutional and pooling
layers shrink the size of the feature maps, which makes the model more computa-
tionally efficient.
The final CNN softmax layer generates a probability distribution across all
potential class labels for the input data.
Classification algorithms require a lot of computing power, which can be reduced
or optimized by combining convolutional neural networks (CNNs) with deep learning
techniques. Deep learning is an innovative branch of machine learning that uses
algorithms built on artificial neural network topologies. Among all machine learning
techniques, neural networks have been shown to be more effective and reliable for
identifying photos.
In order to create such models, we must generally follow these extensive steps:
• Select a dataset: setting up a dataset on which to base our models is the first and
most important step. There are several websites where we may obtain one, but
164 D. Bharti et al.
Kaggle is one of the most well known. It offers a sizable database of datasets that
can be used simply by downloading them into a directory.
• After selecting a dataset, the next step is to prepare it for training. Before
deploying any machine learning algorithm, we must first validate it, which takes
place in two steps; training datasets are used to determine the model’s accuracy.
• The following stage is to create training data and apply labels to it; we then feed
the training data and labels to the CNN. The labels are applied to categorical data
by transforming the categorical class labels into one-hot encoded vectors.
• Define and train the CNN model.
• Test the model’s accuracy.
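The one-hot encoding mentioned in the labeling step can be sketched in plain Python; the class names here are hypothetical:

```python
def one_hot_encode(labels, classes):
    """Map each categorical class label to a one-hot vector."""
    index = {c: i for i, c in enumerate(classes)}
    vectors = []
    for label in labels:
        v = [0] * len(classes)
        v[index[label]] = 1  # a single 1 at the position of the label's class
        vectors.append(v)
    return vectors

classes = ["temple", "monument", "bridge"]  # hypothetical landmark classes
print(one_hot_encode(["bridge", "temple"], classes))
# [[0, 0, 1], [1, 0, 0]]
```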
As was previously said, classical classification techniques have poor processing
accuracy and need a lot of computing resources. When classifying images, aspects
such as picture variety, image size, and hardware are taken into account. The more
accurate a model is, the better it is regarded. Using deep learning techniques is one
strategy to increase the accuracy of a classification model; however, none of these
deep learning optimization methods can guarantee the best accuracy or efficiency in
terms of time.
was determined to be the most successful for landmark detection. However, it has
certain drawbacks as well, such as reduced accuracy when spatial information is lost
during the intermediate stages.
Heatmap-based regression models: Heatmap regression is one method for the task
of detecting landmarks. To apply it, we must first produce heatmaps; a particular
network is then selected and trained to produce heatmaps for each input image.
Yang et al. [2] presented an efficient technique for predicting heatmaps that consists
of a two-part network: a supervised transformation used to standardize faces and a
stacked hourglass network. Valle et al. [2] introduced a straightforward CNN in their
work on heatmap prediction. A number of factors need to be taken into account
when recognizing landmarks, such as accuracy and shape variations. Sun et al.
devised an algorithm called HRNet [5] that is highly accurate at detecting facial
landmarks and at many other computer vision tasks. Iranmanesh et al. addressed the
problem of shape variations in detection by introducing an algorithm that aggregates
a collection of photos and records reliable landmarks. Gaussian heatmap vectors [6],
a notion presented by Xiong et al., yield a heatmap used for facial landmark point
recognition and are a highly preferred type of heatmap.
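As a concrete illustration of what these networks are trained to produce, a Gaussian heatmap for a single landmark can be generated as follows; the image size, landmark position, and σ are arbitrary choices for the example:

```python
import math

def gaussian_heatmap(width, height, cx, cy, sigma):
    """2-D Gaussian heatmap peaking at the landmark location (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

hm = gaussian_heatmap(8, 8, cx=3, cy=5, sigma=1.0)
# The predicted landmark is recovered as the argmax of the heatmap.
peak = max((v, x, y) for y, row in enumerate(hm) for x, v in enumerate(row))
print(peak)  # (1.0, 3, 5)
```

A regression network is trained so that its output map matches such a target; the landmark coordinate is then read off as the location of the maximum.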
LeNet: This algorithm was developed for the purpose of reading handwritten char-
acters. It was first presented in the late 1990s by ref. [2]. Its components are three
CNN layers and a softmax classifier. It was an early application of deep learning to
computer vision and has been extensively used to read numbers from grayscale input
photos of checks.
AlexNet: With this effective algorithm, the previous state-of-the-art error rate
dropped from 26 to 15%. The algorithm was put forward by researchers at the
University of Toronto [2]. It is particularly effective at showcasing deep learning
techniques on conventional computer vision tasks.
ResNet: ResNet, which stands for Residual Neural Network, is a member of the
deep CNN family. It is made up of residual blocks, each of which is a stack of
convolutional layers together with a shortcut (skip) connection that can bypass those
layers: the original input is added directly to the convolutional layers’ output before
the final activation function. ResNet is a streamlined design that relies on learning
residual functions relative to the block input. Its residual blocks have made it simpler
to train very deep networks with many layers, which has helped to solve the vanishing
gradient problem.
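The skip connection can be shown in miniature. The sketch below stands in for the convolutional stack with a simple function F; when F(x) learns to output roughly zero, the block passes its input through almost unchanged, which is what keeps very deep stacks trainable:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def residual_block(x, layer):
    """y = ReLU(F(x) + x): the shortcut adds the input to the layers' output."""
    fx = layer(x)                                # the convolutional stack F(x) (toy stand-in)
    return relu([a + b for a, b in zip(fx, x)])  # skip connection, then activation

# Toy stand-in: F(x) = 0, so the block reduces to ReLU of the identity.
layer = lambda v: [0.0 for _ in v]
print(residual_block([1.0, -2.0, 3.0], layer))  # [1.0, 0.0, 3.0]
```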
Every model should offer optimized accuracy while using the fewest computational
resources possible. The outcomes generated by these models should also have the
highest possible efficiency. To attain these goals, a number of optimization strategies
can be taken into consideration. Some of them are listed below:
AdaMax: AdaMax is an advanced modification of the Adam optimizer that uses first-
order gradient-based optimization. In this technique, the learning rate adapts to the
input data, so this type of optimizer is highly recommended in situations where the
process changes over time. According to [7], a model was applied to the Google
Landmark Dataset, and various optimization strategies were employed to measure
accuracy and efficiency and how they changed as the size of the training dataset
increased or decreased. The models in that study used 150 epochs and a batch size
of 300. With the ResNet50 architecture and the AdaMax optimizer, an accuracy of
95% was reported; other architectures such as MobileNet and VGG16 were also
used, but ResNet50 was determined to have the highest accuracy.
Adam Optimizer: The standard method for optimizing CNN models was Stochastic
Gradient Descent (SGD), but as the field advanced, new variations and improve-
ments were made. One such method is the Adam optimizer, an enhanced variant of
SGD. Natural language processing, computer vision, and other deep learning
applications all make extensive use of it. It was first presented in 2014 and takes its
name from Adaptive Moment Estimation: it estimates the first and second moments
of the gradient when calculating the learning rate of each weight in the neural network.
Before its introduction, a number of other techniques were in use, including RMSProp
and AdaGrad, which perform better than plain SGD. Despite their many benefits,
RMSProp and AdaGrad also have drawbacks; for instance, SGD generalizes better
than AdaGrad and RMSProp. Thus, the need for new optimization approaches was
felt, and Adam was consequently introduced. Among Adam’s own drawbacks, its
generalization performance can suffer on various deep learning tasks.
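The moment estimates described above can be written out directly. A minimal single-parameter sketch of the Adam update, using the default hyperparameters from the 2014 paper:

```python
import math

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a single weight w with gradient g at step t."""
    m = b1 * m + (1 - b1) * g          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)          # bias correction for the warm-up phase
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Minimize f(w) = w^2 (gradient 2w); Adam should move w toward the minimum at 0.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.01)
print(w)  # close to the optimum at 0
```

Note how the per-weight step is scaled by the square root of the second-moment estimate, which is what makes the effective learning rate adaptive.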
Landmark Detection Using Convolutional Neural Network: A Review 167
4.4 Applications
learning techniques are used to perform this; essentially, when a CNN is integrated
with deep learning, the resulting outcome is robust performance. The following are
applications of CNNs to landmark detection.
Both non-medical and medical image processing can greatly benefit from land-
mark point detection. Landmarks can be visual points based on facial features: the
landmarks related to the face are first extracted, and then a CNN is applied to detect
the particular face. This has been widely used to identify unfamiliar images on any
site. Apart from recognizing the human face, there are several other applications too:
landmark points can be obtained from X-ray images, which helps in detecting the
sagittal cervical spine, and body joint tracking can also be performed.
Non-medical applications include the classification of ancient monuments and
more. For example, research on classifying ancient temples was conducted in
Indonesia using a CNN together with SGD [6]; that model reached an aggregate
accuracy of 86.28% when trained for 100 epochs over 6 classes. It was also noted
that at epoch 50 the results were optimal, with a training accuracy of 98.99% and a
validation accuracy of 85.57%. All these implementations used the AlexNet
architecture.
5 Result Analysis
The model selected for coordinate forecasting achieved a weighted mean error of
1.96 ± 1.62 mm on the test set [8]. A weighted curve-to-curve prediction error of
1.82 ± 0.70 mm was also reported [8]. Figure 3 depicts the distribution of coordinate
errors over the test set for a better understanding.
Figure 3 uses box plots to illustrate the coordinate prediction errors observed
in each of the 19 examinations in the test set.
Top: in-plane errors are shown, where the measure takes into consideration
forecasts for every single plane and time period.
Bottom: curve-to-curve errors are shown, with each frame accounting for a
different error value.
The green triangles on the plot show the average error for each examination, while
the boxes show the median error and quartiles. Whiskers extend to 1.5 times the
interquartile range. To preserve clarity and keep the emphasis on the primary issues,
outliers are omitted from the charts.
Also, a surgical view prediction error of 3.28 ± 2.92° and a relative perimeter
error of 5.8 ± 4.8% were also attained by the model. The metrics for predicting
coordinates are given in Table 1.
Fig. 3 Box plots depicting coordinate prediction discrepancies across 19 test examinations [8]
Out of 128 planes, the test set’s weighted mean error of the anatomical orientation
predictions was 9.7 ± 15.8 degrees, or 3.5 ± 5.6 plane indices. Additionally, the
median prediction error was 5.6 degrees, or 2 plane indices. Table 1 provides a
summary [8], and Fig. 4 provides a visual representation of the results for each
examination in the form of a box plot.
The outcomes of the anatomical orientation prediction are shown in two ways in
Table 2:
• First row: errors in rotational degrees.
• Second row: the number of plane indices used to measure the error. This error
can also be described as a fraction of the 128 rotation planes.
Table 2 Anatomical orientation prediction results [8]

            Weighted mean     Median
Degrees     9.7° ± 15.7°      5.6°
Planes      3.5 ± 5.6         2 planes
In this section, we first evaluate the precision of PoseNet, our recently introduced
network design. We then assess the accuracy of the suggested loss function and
conduct a thorough analysis of the performance of the PoseNet model.
PoseNet evaluation: We begin by comparing the precision of our PoseNet model,
trained with the common L2 loss function, to two well-known baseline models:
MobileNetV2 and ResNet50. Additionally, we include the mnv2-hm and res50-hm
encoder–decoder models to ensure a fair comparison. The encoders for these models
are MobileNetV2 and ResNet50, and the decoders consist of three sets of DeConv2D
layers with filter sizes of 256, kernel sizes of 3, and strides of 2, each followed by a
ReLU activation layer and batch normalization.
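The spatial upsampling performed by the three stride-2 DeConv2D layers can be traced with the transposed-convolution output-size formula. The padding values below are assumptions chosen so that each layer exactly doubles the resolution; they are not given in the text:

```python
def deconv_out(size, kernel=3, stride=2, pad=1, out_pad=1):
    """Output spatial size of a transposed convolution (DeConv2D)."""
    return (size - 1) * stride - 2 * pad + kernel + out_pad

# Three stride-2 DeConv2D layers, as in the decoder described above.
s = 8  # hypothetical encoder output resolution
for _ in range(3):
    s = deconv_out(s)
    print(s)
# 16
# 32
# 64
```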
As given in Table 3, the heatmap-based models [9] used for detecting landmark
points on the sagittal cervical spine are much more accurate than coordinate-based
regression models. Notably, the Normalized Mean Errors (NME) of mnv2-hm on
the LC, LCE, and LCF subsets are 5.60%, 6.42%, and 10.56%, respectively. With
res50-hm, these values decrease to 4.79% (about a 0.81% reduction), 5.20% (about
a 1.22% reduction), and 7.68% (about a 2.88% reduction). PoseNet further reduces
the NME to 4.75%, 5.21%, and 7.48%, respectively, representing reductions of
around 0.85%, 1.21%, and 3.08% compared to mnv2-hm on the same subsets [9].
Table 3 Examination of the differences between various models trained using L2 loss, including
normalized mean error (NME), failure rate (FR), and area under the curve (AUC) [9]
Model      NME (↓)               FR (↓)                 AUC (↑)
           LC     LCE    LCF     LC     LCE    LCF      LC       LCE      LCF
mnv2       6.47   7.35   11.04   7.49   9.16   41.96    0.6100   0.5270   0.2069
res50      6.12   6.88   10.75   7.02   8.41   39.60    0.6375   0.5721   0.2192
mnv2-hm    5.60   6.42   10.56   3.84   6.66   36.60    0.6971   0.6227   0.2490
res50-hm   4.79   5.20   7.68    2.92   2.5    16.07    0.7375   0.7527   0.4862
PoseNet    4.75   5.21   7.48    2.77   3.33   9.82     0.7602   0.7385   0.5067
PoseNet is more accurate than res50-hm on both the LC and LCF subsets, but
marginally less accurate on the LCE subset. However, it is important to highlight
that, compared to res50-hm, PoseNet has far fewer model parameters and fewer
floating-point operations (FLOPs), as seen in Table 4.
Assessment of IC-Loss: We trained the PoseNet model with three distinct loss func-
tions (L1, L2, and our recently developed IC-loss function) in order to evaluate the
effectiveness of the suggested loss function. As given in Table 5, the NME of the
model trained with the L2 loss function is 4.75%, 5.21%, and 7.48% for the LC,
LCE, and LCF subsets, respectively [9]. When the L1 loss function is used, a minor
improvement is observed, decreasing the NME to 4.69%, 5.20%, and 7.25% for the
same subsets, respectively.
Table 4 Comparative analysis of model parameter and FLOP counts [9]

Model      #Parameters   #FLOPs
mnv2-hm    6,398,045     683,025,408
res50-hm   29,497,245    3,066,262,528
PoseNet    23,226,269    596,375,552

PoseNet and mnv2-hm have fewer model parameters and floating-point operations (FLOPs)
than res50-hm
Table 5 Analysis of PoseNet trained using various loss functions in terms of normalized mean
error (NME), failure rate (FR), and area under the curve (AUC) [9]
Model      NME (↓)               FR (↓)                 AUC (↑)
           LC     LCE    LCF     LC     LCE    LCF      LC       LCE      LCF
L2         4.75   5.21   7.48    2.77   3.33   9.82     0.7602   0.7385   0.5067
L1         4.96   5.20   7.25    2.61   3.33   12.50    0.7654   0.7395   0.5343
IC-loss    4.38   4.76   6.50    2.51   3.33   6.25     0.7882   0.7760   0.6034
When the model is trained using our suggested IC-loss function, the model’s
accuracy improves the most noticeably. With this method, the NME is significantly
decreased for the LC, LCE, and LCF subsets, respectively, to 4.38%, 4.76%, and
6.50%. This displays the IC-loss function’s higher performance in contrast to the
L1 and L2 loss functions, highlighting its efficiency in raising the PoseNet model’s
accuracy.
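The Normalized Mean Error used throughout Tables 3 and 5 can be computed as follows. The normalizing length here is a hypothetical choice; the cited papers normalize by their own reference distances:

```python
import math

def nme(pred, truth, norm):
    """Normalized Mean Error: mean landmark distance divided by a normalizing length."""
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / (len(dists) * norm)

# Toy example: two predicted landmarks vs. ground truth, normalized by a
# reference length of 100 pixels (e.g. an inter-landmark distance).
pred = [(10.0, 10.0), (50.0, 43.0)]
truth = [(13.0, 14.0), (50.0, 40.0)]
print(nme(pred, truth, norm=100.0))  # (5 + 3) / (2 * 100) = 0.04
```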
References
1. Cristinacce D, Cootes TF (2006) Feature detection and tracking with constrained local models.
In: Proceedings of the British machine vision conference, vol 3
2. Yang J, Liu Q, Zhang K (2017) Stacked hourglass network for robust facial landmark
localisation. In: Proceedings of the IEEE conference on computer vision and pattern recognition
workshops (CVPRW), pp 79–87
3. Ruder S (2016) An overview of gradient descent optimization algorithms, pp 1–14
4. Martins P, Caseiro R, Batista J (2013) Generative face alignment through 2.5D active appearance
models. Comput Vis Image Understand 117(3):250–268
5. Feng Z-H, Kittler J, Awais M, Huber P, Wu X-J (2018) Wing loss for robust facial landmark
localisation with convolutional neural networks. In: Proceedings of the IEEE/CVF conference
on computer vision and pattern recognition, pp 2235–2245
6. Trigeorgis G, Snape P, Nicolaou MA, Antonakos E, Zafeiriou S (2016) Mnemonic descent
method: a recurrent process applied for end-to-end face alignment. In: Proceedings of the IEEE
conference on computer vision and pattern recognition (CVPR), pp 4177–4187
7. Landmark classification service using convolutional neural network and Kubernetes, p 2820
8. Mitral annulus segmentation and anatomical orientation detection in TEE images using periodic
3D CNN. IEEE J Mag, IEEE Xplore
9. Sagittal cervical spine landmark point detection in X-ray using deep convolutional neural
networks. IEEE J Mag, IEEE Xplore
An Efficient Illumination Invariant Tiger
Detection Framework for Wildlife
Surveillance
1 Introduction
Tigers are iconic symbols of the rich biodiversity of Asia with most of their popula-
tions being primarily concentrated in countries like India, Thailand, and Indonesia
[1]. These magnificent cats are apex predators and one of the vital indicators of
the health of an ecosystem. However, there are multiple challenges faced in their
conservation due to habitat loss, illegal trade, poaching, and extensive tourism [2].
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 173
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_14
174 G. Pendharkar et al.
In the early twentieth century, there were more than 100,000 wild tigers in Asia
which has drastically reduced to fewer than 4000 wild tigers. This precipitous decline
is mainly due to a 93% reduction in tiger habitats as a result of deforestation and
agricultural expansion. Furthermore, the poaching of tigers for the illegal trade of
tiger specimens and retaliatory attacks during human-wildlife conflicts are the other
reasons behind the sudden decline in the population [3]. Conservationists have made
commendable progress in tiger conservation, but with the advent of artificial intelli-
gence, many of these protocols can be efficiently automated through the implemen-
tation of wildlife conservation systems [4].
Wildlife surveillance systems are a crucial tool for the conservation of endan-
gered species, especially tigers which are characterized by elusive behavior. In order
to automate tiger detection in wildlife surveillance, accurate object detection plays
a vital role. The change in lighting is one of the major challenges for efficient object
detection in wildlife surveillance. Low Illumination Images (LIIs) tend to lack clarity
since they are sampled and quantized in low-light environmental conditions. Tradi-
tional non-deep-learning methods do not adapt to the pixel distribution of the image
and can over-enhance the illumination. Many traditional deep learning approaches
require pairwise supervision to train models based on the Retinex theory [5]. How-
ever, a few approaches implemented with Generative Adversarial Networks (GANs)
improve illumination by learning unsupervised features. Therefore, in this paper,
EnlightenGAN [6] is used to provide illumination enhancement to the images,
followed by object detection with the YOLOv8 model [7].
2 Related Work
Tiger Detection: A fast yet efficient approach for real-time tiger detection was
proposed by Kupyn and Pranchuk [8]. Their TigerNet model is based on the Feature
Pyramid Network (FPN) and uses Depthwise Separable Convolutions (DSC) with a
lightweight FD-MobileNet backbone [9]. The lightweight backbone improves the
detection speed, which is commendable, but also slightly reduces the accuracy.
Tan et al. constructed a dataset of video clips taken by infrared cameras installed in
the Northeast Tiger and Leopard National Park, covering 17 species [10]. The main
aim of the paper was to compare the performance of three mainstream object
detection models: FCOS ResNet101, YOLOv5, and Cascade R-CNN HRNet32.
YOLOv5 showed the most consistent results among the three, obtaining 88.8%,
89.6%, and 89.5% accuracy at various thresholds. The overall high detection
accuracy is commendable, but the model suffers from data imbalance and therefore
has significant variance in species-wise performance. The models nevertheless
performed relatively well for Amur tigers compared to other animals.
An Efficient Illumination Invariant Tiger Detection Framework … 175
3 Methodology
namely Global and Local Discriminators. Hence, in this paper, the EnlightenGAN is
applied to handle illumination variation followed by object detection with YOLOv8.
The framework of the proposed architecture is shown in Fig. 1.
3.1 EnlightenGAN
B = (A' ⊗ I) ⊕ A (1)
The final reconstructed image (B) is given as input to the global discriminator
and some randomly cropped patches are given as input to the local discriminator.
Finally, both the discriminators return a true or false. The loss function used to train
the network is shown in Eq. 2, in which L_SFP^Global and L_SFP^Local denote the
self-feature preserving loss for the global and local discriminator, respectively, and
L_G^Global and L_G^Local denote the loss for the global and local discriminators.
The sample images illuminated by EnlightenGAN are shown in Fig. 3.

Loss = L_SFP^Global + L_SFP^Local + L_G^Global + L_G^Local (2)
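Read per pixel, Eq. 1 combines the attention map and the learned residual with the input. The sketch below treats ⊗ and ⊕ as elementwise multiply and add, which is an assumption about the notation:

```python
def enlighten_pixel(a, a_res, attn):
    """Per-pixel form of Eq. 1: B = (A' ⊗ I) ⊕ A, with ⊗/⊕ read elementwise.
    a: input pixel, a_res: generator residual A', attn: attention value I."""
    return a_res * attn + a

# A dark pixel (0.1) with a strong residual and high attention is brightened;
# an already-bright pixel (0.8) with low attention is left nearly untouched.
print(round(enlighten_pixel(0.1, 0.5, 0.9), 3))   # 0.55
print(round(enlighten_pixel(0.8, 0.5, 0.05), 3))  # 0.825
```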
3.2 YOLOv8
“YOLO”, an acronym for “You Only Look Once”, is a highly effective algorithm for
real-time object detection, renowned for its exceptional balance between accuracy
and speed. YOLOv8, the latest iteration of the YOLO series, was created by
Ultralytics. It improves on the performance of previous YOLO versions through the
following changes:
• introduction of anchor-free detection, by which the model directly estimates the
center of an object instead of relying on an offset from a known anchor box;
• alterations to the convolutions in the backbone network;
• closing mosaic augmentation before training is completed.
Figure 4 shows the general architecture of the YOLOv8 model which has three
components namely the backbone, neck, and head. Relevant features are extracted
from the input image by the backbone network. The neck connects the backbone
network and the head network. It also reduces the dimensions of the feature map and
improves the resolution of the features. Finally, the head network comprises three
detection networks to detect small, medium, and large objects. The sample results
for the YOLOv8 model are portrayed in Fig. 5.
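The anchor-free head predicts an object center and size directly; converting such a prediction to corner coordinates is straightforward (a sketch, not Ultralytics’ actual decoding code):

```python
def center_to_corners(cx, cy, w, h):
    """Convert a predicted (center, size) box to (x1, y1, x2, y2) corners."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# A detection centered at (100, 60) with a 40x30 box:
print(center_to_corners(100, 60, 40, 30))  # (80.0, 45.0, 120.0, 75.0)
```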
The experiment is performed with the ATRW dataset, which covers three computer
vision tasks: tiger detection, pose estimation, and tiger re-identification. The detec-
tion subset comprises 9496 bounding boxes across 4434 images [21]. The efficiency
of the framework is evaluated using Mean Average Precision (mAP). For an object
detection task, each prediction has a confidence score and the dimensions of the
bounding box (B_p). A detection is said to be correct if it satisfies the IoU threshold
(t), as shown in Eq. 4. IoU stands for Intersection over Union, the ratio of the area of
intersection to the area of union, as given in Eq. 3. Finally, over the correct predic-
tions, the mAP is computed as per the formula in Eq. 5 [22]; the average precision,
denoted AP, is the area under the precision-recall curve. The state-of-the-art perfor-
mance of the proposed methodology is shown in Table 1 and pictorially depicted in
Fig. 6.
IoU(B_p, B_gt) = area(B_p ∩ B_gt) / area(B_p ∪ B_gt) (3)

T(B_p, B_gt) = correct if IoU(B_p, B_gt) > t, incorrect otherwise (4)

mAP = (1/N) Σ_{i=1}^{N} AP_i (5)
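Equations 3–5 can be implemented directly; a minimal sketch with a toy prediction and ground-truth box:

```python
def iou(a, b):
    """Intersection over Union of boxes given as (x1, y1, x2, y2) corners (Eq. 3)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def mean_ap(ap_values):
    """Eq. 5: mAP is the mean of the per-class average precisions."""
    return sum(ap_values) / len(ap_values)

pred = (0, 0, 10, 10)
gt = (5, 0, 15, 10)
print(iou(pred, gt))              # 0.3333333333333333 (50 / 150)
print(iou(pred, gt) > 0.5)        # False: rejected at threshold t = 0.5 (Eq. 4)
print(mean_ap([0.6, 0.7, 0.55]))  # average precision over three classes
```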
5 Conclusion
Recently, wildlife surveillance has been automated with various computer vision
methodologies. Illumination variation is a persistent challenge in wildlife surveil-
lance and hampers efficient object detection. This paper focuses on a novel frame-
work that addresses illumination variation and provides efficient tiger detection
using EnlightenGAN and YOLOv8. The attention-guided unit and the global and
local discriminators in EnlightenGAN provide illumination-enhanced images, which
are then fed into the YOLOv8 object detector. The experiment is performed with
the ATRW tiger dataset. The proposed model outperforms the SOTA with an mAP
of 0.617. In the future, the tiger detection model can be expanded to multi-class
wildlife surveillance.
References
1. Gray TN, Rosenbaum R, Jiang G, Izquierdo P, Yongchao JIN, Kesaro L, Chapman S (2023)
Restoring Asia’s roar: opportunities for tiger recovery across the historic range. Front Conserv
Sci 4:1124340
2. Rana AK, Kumar N (2023) Current wildlife crime (Indian scenario): major challenges and
prevention approaches. Biodivers Conserv 32(5):1473–1491
3. Nittu G, Shameer TT, Nishanthini NK, Sanil R (2023) The tide of tiger poaching in India
is rising! An investigation of the intertwined facts with a focus on conservation. GeoJournal
88(1):753–766
4. Isabelle DA, Westerlund M (2022) A review and categorization of artificial intelligence-based
opportunities in wildlife, ocean and land conservation. Sustainability 14(4):1979
5. Pan X, Li C, Pan Z, Yan J, Tang S, Yin X (2022) Low-light image enhancement method based
on retinex theory by improving illumination map. Applied Sciences 12(10):5257
6. Jiang Y, Gong X, Liu D, Cheng Y, Fang C, Shen X, Wang Z (2021) Enlightengan: deep light
enhancement without paired supervision. IEEE Trans Image Process 30:2340–2349
7. Terven J, Cordova-Esparza D (2023) A comprehensive review of YOLO: From YOLOv1 to
YOLOv8 and beyond. arXiv preprint arXiv:2304.00501
8. Kupyn O, Pranchuk D (2019) Fast and efficient model for real-time tiger detection in the wild.
In: Proceedings of the IEEE/CVF international conference on computer vision workshops
9. Qin Z, Zhang Z, Chen X, Wang C, Peng Y (2018) Fd-mobilenet: improved mobilenet with a
fast downsampling strategy. In: 2018 25th IEEE international conference on image processing
(ICIP). IEEE, pp 1363–1367
10. Tan M, Chao W, Cheng JK, Zhou M, Ma Y, Jiang X, Feng L (2022) Animal detection and clas-
sification from camera trap images using different mainstream object detection architectures.
Animals 12(15):1976
11. Liu B, Qu Z (2023) AF-TigerNet: a lightweight anchor-free network for real-time Amur tiger
(Panthera tigris altaica) detection. Wildlife Letters 1(1):32–41
12. Wang CY, Liao HYM, Wu YH, Chen PY, Hsieh JW, Yeh IH (2020) CSPNet: a new backbone
that can enhance learning capability of CNN. In: Proceedings of the IEEE/CVF conference on
computer vision and pattern recognition workshops, pp 390–391
13. Dertien JS, Negi H, Dinerstein E, Krishnamurthy R, Negi HS, Gopal R, Baldwin RF (2023)
Mitigating human-wildlife conflict and monitoring endangered tigers using a real-time camera-
based alert system. BioScience 73(10):748–757
14. Al Sobbahi R, Tekli J (2022) Comparing deep learning models for low-light natural scene
image enhancement and their impact on object detection and classification: Overview, empirical
evaluation, and challenges. Signal Process Image Commun 116848
15. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio
Y (2014) Generative adversarial nets. Adv Neural Inf Process Syst 27
16. Wang W, Wei C, Yang W, Liu J (2018) Gladnet: low-light enhancement network with global
awareness. In: 2018 13th IEEE international conference on automatic face and gesture recog-
nition (FG 2018). IEEE, pp 751–755
17. Choudhury S, Saikia N, Rajbongshi SC, Das A (2022) Employing generative adversarial net-
work in low-light animal detection. In: Proceedings of international conference on communi-
cation and computational technologies: ICCCT 2022. Springer Nature Singapore, Singapore,
pp 989–1002
18. Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv preprint
arXiv:1804.02767
19. Wang J, Yang P, Liu Y, Shang D, Hui X, Song J, Chen X (2023) Research on improved yolov5
for low-light environment object detection. Electronics 12(14):3089
20. Guo C, Li C, Guo J, Loy CC, Hou J, Kwong S, Cong R (2020) Zero-reference deep curve
estimation for low-light image enhancement. In: Proceedings of the IEEE/CVF conference on
computer vision and pattern recognition, pp 1780–1789
21. Li S, Li J, Tang H, Qian R, Lin W (2019) ATRW: a benchmark for Amur tiger re-identification
in the wild. arXiv preprint arXiv:1906.05586
22. Padilla R, Netto SL, Da Silva EA (2020) A survey on performance metrics for object-detection
algorithms. In: 2020 international conference on systems, signals and image processing (IWS-
SIP). IEEE, pp 237–242
An Innovative Frequency-Limited
Interval Gramians-Based Model Order
Reduction Method Using Singular Value
Decomposition
1 Introduction
Large complex systems [1] that possess high-dimensional characteristics are chal-
lenging to evaluate and implement. Such models can therefore be approximated by
a feasible lower order that can be handled. An effective reduced-order model (ROM)
approximation solves the challenge of evaluating and developing such complex
frameworks. By using model order reduction (MOR), a small,
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 183
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_15
184 V. Sharma and D. Kumar
computationally simple model that shares the same properties as the original model is produced. MOR has an impact on a number of fields, including:
• Computational science.
• Aerospace applications.
• Data integration.
• Medical technology.
Balanced truncation (BT), also known as balanced realization (BR), has played a crucial role in the application of control-system theory to model reduction. It maintains stability while providing an explicit bound on the frequency-response error. Ideally, the reduction error between the real system and the ROM should be small at all frequencies. When a low-order model is used in a feedback-control framework, however, the reduction error may be higher in some frequency bands than in others. When an accurate depiction of the full-order system is necessary in a particular region, such as the crossover region, the problem is known as the frequency-weighted model reduction problem. This motivated the introduction of frequency weighting in model reduction; when it is used to reduce a controller's order, it is referred to as the controller reduction problem. Extending the BT of Moore [2], Enns [3] augmented the initial stable model with frequency weightings. This strategy may call for realizations of input, output, and double-sided weightings. However, Enns' method [3] can produce unstable ROMs when double-sided weighting is applied. Lin and Chiu (LC) [5] proposed a method that produces stable models for dual-sided weightings to address this instability. Furthermore, Sreeram et al. [6] expanded the generalization of Anderson and Liu [4] to incorporate appropriate weights. Varga and Anderson (VA) [7] pointed out that the technique of Sreeram et al. [6] is not suitable for controller reduction applications because it lacks pole-zero cancellation. VA [7] modified the technique to address this problem, but the outcome of their improved approach was consistent with Enns' [3] method, in particular with respect to controller reduction applications. Wang et al. [8] made another improvement to Enns' [3] method, which produced a straightforward and remarkable error bound as well as guaranteed stability for both-sided weightings. According to Sreeram [9], the aforementioned methods and their modifications are realization dependent and produce different models for different realizations of the same system; as a result, there can be large approximation errors and error bounds. Moreover, Kumar and Sreeram [10] proposed a MOR technique based on the optimal Hankel norm of frequency weights, generating new augmented system realizations from the initial system using various factorizations of fictitious matrices. Kumar et al. [16] presented a methodology in which the developed Gramians were used in model reduction algorithms for linear time-invariant continuous-time single-input single-output (SISO) systems. In addition, Sharma and Kumar [17] suggested an SVD-based methodology for a better frequency-weighted approximation.
An Innovative Frequency-Limited Interval Gramians-Based Model … 185
This paper employs the singular value decomposition (SVD), which is beneficial in matrix factorization, to present a new frequency-limited MOR strategy for linear time-invariant continuous-time systems, with the objective of minimizing the approximation error. The primary contributions of this work are as follows:
i. The limited Gramians and Lyapunov equations help to form new intermediary
matrices, also known as fictitious matrices.
ii. The proposed method offers stable ROMs within the stated frequency interval.
2 Methodology
Let us consider an LTI stable system with the transfer function (TF)

G_og(s) = C_og(sI − A_og)^{−1} B_og + D_og, (1)

where {A_og, B_og, C_og, D_og} is its m̂th-order stable and minimal realization with û inputs and v̂ outputs, A_og ∈ ℝ^{m̂×m̂}, B_og ∈ ℝ^{m̂×û}, C_og ∈ ℝ^{v̂×m̂}, and D_og ∈ ℝ^{v̂×û}. The proposed work aims to obtain an efficient reduced framework with the TF

G_red(s) = C_red(sI − A_red)^{−1} B_red + D_red, (2)

which approximates the initial system in the given frequency interval (FI) [ω₁, ω₂] (ω₂ > ω₁), with A_red ∈ ℝ^{r̂×r̂}, B_red ∈ ℝ^{r̂×û}, C_red ∈ ℝ^{v̂×r̂}, and r̂ < m̂.
Let the controllability Gramian (CG) and observability Gramian (OG), i.e., P_ct and Q_ob, respectively, be the solutions of the Lyapunov equations (LE)

A_og P_ct + P_ct A_og^T + X̂_dv = 0, (3)

A_og^T Q_ob + Q_ob A_og + Ŷ_dv = 0, (4)

where

X̂_dv = (S(ω₂) − S(ω₁)) B_og B_og^T + B_og B_og^T (S*(ω₂) − S*(ω₁)), (5)

Ŷ_dv = (S*(ω₂) − S*(ω₁)) C_og^T C_og + C_og^T C_og (S(ω₂) − S(ω₁)), (6)

S(ω) = (j/(2π)) ln((jωI + A_og)(−jωI + A_og)^{−1}). (7)
The symmetric matrices X̂_dv and Ŷ_dv admit the eigenvalue decompositions

X̂_dv = U_v S_d U_v^T, (8)

Ŷ_dv = V_v R_d V_v^T, (9)

where

S_d = diag(s₁, s₂, …, s_m̂) (10a)

and

R_d = diag(r₁, r₂, …, r_m̂). (10b)
Now, B_pv and C_pv are the proposed fictitious input and output matrices, respectively, following [18], given as

B_pv = U_v (|sin(S_d)| − S_d)^{1/2} for s_m̂ < 0 and B_pv = U_v S_d^{1/2} for s_m̂ ≥ 0, (11a)

C_pv = (|sin(R_d)| − R_d)^{1/2} V_v^T for r_m̂ < 0 and C_pv = R_d^{1/2} V_v^T for r_m̂ ≥ 0, (11b)

where s_m̂ and r_m̂ are the smallest (last) diagonal entries of S_d and R_d, respectively. The parameters U_v, S_d, V_v, and R_d are established by the EVDs of the suggested matrices, which was inspired by IG [15], i.e.,

B_pv B_pv^T = U_v S_d U_v^T (12a)

and

C_pv^T C_pv = V_v R_d V_v^T. (12b)

The proposed matrices B_pv and C_pv are produced by exerting a similar influence on each eigenvalue of the symmetric matrices X̂_dv and Ŷ_dv.
The following are the suggested frequency-limited CG and OG:

P_pv(ω) = (1/(2π)) ∫_{−ω}^{+ω} (jνI − A_og)^{−1} B_og B_og^T (−jνI − A_og^T)^{−1} dν, (14)

Q_pv(ω) = (1/(2π)) ∫_{−ω}^{+ω} (−jνI − A_og^T)^{−1} C_og^T C_og (jνI − A_og)^{−1} dν. (15)

The proposed Gramians P̂_pv and Q̂_pv then satisfy the Lyapunov equations

A_og P̂_pv + P̂_pv A_og^T + B_pv B_pv^T = 0, (16)

A_og^T Q̂_pv + Q̂_pv A_og + C_pv^T C_pv = 0. (17)
Balancing the realization with the transformation T_pv obtained from the proposed Gramians and partitioning gives

A_t̂ = T_pv^{−1} A_og T_pv = [A_og11 A_og12; A_og21 A_og22], B_t̂ = T_pv^{−1} B_og = [B_og1; B_og2], C_t̂ = C_og T_pv = [C_og1 C_og2], D_t̂ = D_og. (19)
The resulting ROM satisfies the error bound

‖G_og(s) − G_red(s)‖_∞ ≤ 2 ‖L_pv‖ ‖K_pv‖ Σ_{j=r+1}^{m̂} ε_j, (21)
where
L_pv = C_og V_v [|sin(R_d)| − R_d]^{−1/2} for r_m̂ < 0 and L_pv = C_og V_v R_d^{−1/2} for r_m̂ ≥ 0 (22)

and

K_pv = [|sin(S_d)| − S_d]^{−1/2} U_v^T B_og for s_m̂ < 0 and K_pv = S_d^{−1/2} U_v^T B_og for s_m̂ ≥ 0. (23)
Proof Since rank [B_pv B_og] = rank B_pv and rank [C_pv^T C_og^T] = rank C_pv, the relationships B_og = B_pv K_pv and C_og = L_pv C_pv hold.
Partitioning B_pv = [B_pv1; B_pv2], C_pv = [C_pv1 C_pv2] and substituting B_og1 = B_pv1 K_pv and C_og1 = L_pv C_pv1, respectively, gives

‖G_og(s) − G_red(s)‖_∞ = ‖C_og(sI − A_og)^{−1} B_og − C_og1(sI − A_og11)^{−1} B_og1‖_∞ (24)

= ‖L_pv C_pv(sI − A_og)^{−1} B_pv K_pv − L_pv C_pv1(sI − A_og11)^{−1} B_pv1 K_pv‖_∞ (25)

= ‖L_pv [C_pv(sI − A_og)^{−1} B_pv − C_pv1(sI − A_og11)^{−1} B_pv1] K_pv‖_∞ (26)

≤ ‖L_pv‖ ‖C_pv(sI − A_og)^{−1} B_pv − C_pv1(sI − A_og11)^{−1} B_pv1‖_∞ ‖K_pv‖. (27)

If {A_og11, B_pv1, C_pv1} is the rth-order ROM obtained by partitioning the balanced realization {A_og, B_pv, C_pv}, where A_og11 ∈ ℝ^{r×r}, then by Moore [2]

‖C_pv(sI − A_og)^{−1} B_pv − C_pv1(sI − A_og11)^{−1} B_pv1‖_∞ ≤ 2 Σ_{j=r+1}^{m̂} ε_j. (28)

Therefore,

‖G_og(s) − G_red(s)‖_∞ ≤ 2 ‖L_pv‖ ‖K_pv‖ Σ_{j=r+1}^{m̂} ε_j. (29)
2.1 Algorithm
Given G_og(s) and the required frequency range [ω₁, ω₂], the following sequence is used to compute the proposed ROM:
• Using A_og, obtain S(ω) from (7).
• Compute X̂_dv and Ŷ_dv using (5)–(6).
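A numerical sketch of the full procedure, including the balancing and truncation steps that complete the algorithm, is given below. This is an illustrative Python/NumPy reconstruction, not the authors' code: the eigenvalue handling of (11a)–(11b) is simplified to absolute values, and the balancing step uses the standard square-root method.

```python
import numpy as np
from scipy.linalg import logm, solve_continuous_lyapunov

def s_omega(A, w):
    # S(w) = (j / 2*pi) ln((jwI + A)(-jwI + A)^{-1}), cf. Eq. (7)
    I = np.eye(A.shape[0])
    return (1j / (2 * np.pi)) * logm((1j * w * I + A) @ np.linalg.inv(-1j * w * I + A))

def fl_svd_rom(A, B, C, w1, w2, r):
    n = A.shape[0]
    dS = s_omega(A, w2) - s_omega(A, w1)
    # Intermediary (fictitious) symmetric matrices, cf. Eqs. (5)-(6)
    Xdv = np.real(dS @ (B @ B.T) + (B @ B.T) @ dS.conj().T)
    Ydv = np.real(dS.conj().T @ (C.T @ C) + (C.T @ C) @ dS)
    # Fictitious input/output matrices via EVD, cf. Eqs. (8)-(11);
    # negative eigenvalues are handled here by taking absolute values.
    s, U = np.linalg.eigh((Xdv + Xdv.T) / 2)
    Bpv = U @ np.diag(np.sqrt(np.abs(s)))
    rr, V = np.linalg.eigh((Ydv + Ydv.T) / 2)
    Cpv = np.diag(np.sqrt(np.abs(rr))) @ V.T
    # Proposed Gramians, cf. Eqs. (16)-(17): A P + P A^T + Bpv Bpv^T = 0, etc.
    P = solve_continuous_lyapunov(A, -Bpv @ Bpv.T)
    Q = solve_continuous_lyapunov(A.T, -Cpv.T @ Cpv)
    # Square-root balancing and truncation, cf. Eq. (19)
    Lp = np.linalg.cholesky((P + P.T) / 2 + 1e-12 * np.eye(n))
    Lq = np.linalg.cholesky((Q + Q.T) / 2 + 1e-12 * np.eye(n))
    Uu, sv, Vt = np.linalg.svd(Lq.T @ Lp)
    T = Lp @ Vt.T @ np.diag(sv ** -0.5)
    Ti = np.diag(sv ** -0.5) @ Uu.T @ Lq.T
    At, Bt, Ct = Ti @ A @ T, Ti @ B, C @ T
    return At[:r, :r], Bt[:r, :], Ct[:, :r]
```

For a stable A, both Lyapunov right-hand sides are negative semidefinite, so P and Q are positive semidefinite and the Cholesky factors exist after the small symmetrizing jitter.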
Remark 1 When the symmetric matrices satisfy X̂_dv ≥ 0 and Ŷ_dv ≥ 0, the CG defined for GJ [11] (P̂_g) is the same as P_pv and, likewise, the OG for GJ [11] satisfies Q̂_g = Q_pv; otherwise, P̂_g < P_pv and Q̂_g < Q_pv. Furthermore, the Hankel singular values (HSV) over a frequency interval fulfill (λ_j(P̂_g Q̂_g))^{1/2} ≤ (λ_j(P_pv Q_pv))^{1/2}.
Example 1 Consider the stable 6th-order three-mass mechanical system with the desired frequency interval [ω₁, ω₂] = [1, 5]. A system is stable if all its poles lie in the left half of the s-plane, unstable if any pole lies in the right half, and marginally stable if poles lie on the imaginary axis. The ROMs by the suggested and prevailing techniques GJ [11], GA [13], GS [14], and IG [15] are obtained for this system considering the FI [ω₁, ω₂] = [1, 5] rad/s. Table 1 displays the locations of the ROM poles, and it is evident that GJ [11] provides an unstable 1st-order model, as some of its poles are positive, i.e., located in the right half of the s-plane, whereas GA [13], GS [14], IG [15], and the proposed approach produce stable models, as all their poles are negative and hence lie in the left half of the s-plane. As shown in Table 2, the proposed work produces the lowest approximation error within the targeted FI compared to the other strategies. It follows that the proposed method gives superior efficacy compared to the alternative methods. The error plots of the 3rd-order ROMs are displayed in Fig. 1 for the interval [ω₁, ω₂] = [1, 5].
Example 2 Consider the 8th-order system (31):

A = [ 0.6490   −5.3691    0         0         0   0       0       0
      5.3691    0         0         0         0   3.8730  0       0
      0         36.5054  −0.2688   −12.9391   0   0       3.8730  0
      0         12.9391   0         0         0   0       0       3.8730
     −3.8730    0         0         0         0   0       0       0
      0        −3.8730    0         0         0   0       0       0
      0         0        −3.8730    0         0   0       0       0
      0         0         0        −3.8730    0   0       0       0 ],

B = [14 0 0 0 0 0 0 0]^T, C = [0 0 0 0.0136 0 0 0 0], D = 0. (31)
The 4th-order ROMs by the proposed and existing techniques GJ [11], GA [13], GS [14], and IG [15] are obtained for the above case considering the frequency interval [ω₁, ω₂] = [3, 5] rad/s. The ROM pole locations are shown in Table 3 for [3, 5] rad/s, and it is clearly depicted that the approach suggested by GJ [11] offers an unstable 1st-order model, as it has a pole in the right half of the s-plane. In contrast, the recommended method yields a stable model, as all of its poles lie in the left half of the s-plane. Table 4 compares the approximation errors of GJ [11], GA [13], GS [14], IG [15], and the proposed approach. Additionally, Fig. 2 shows the singular value plots of the error function G_og(s) − G_red(s) within the interval [3, 5] rad/s, where the 4th-order ROM G_red(s) is derived by utilizing GJ [11], GA [13], GS [14], IG [15], as well as the proposed strategy, for Example 2. For certain existing methods, a significant variation in the eigenvalues results in a large approximation error, as shown in Fig. 2. The tabular and pictorial representations make it evident that the suggested strategy outperforms the current methods.
4 Conclusion
References
1. Jiang YL, Qi Z, Yang P (2019) Model order reduction of linear systems via the cross Gramian
and SVD. IEEE Trans Circ Syst II 66(2):422–426
2. Moore BC (1981) Principal component analysis in linear systems: controllability, observability,
and model reduction. IEEE Trans Autom Control 26(1):17–32
3. Enns DF (1984) Model reduction with balanced realizations: an error bound and a frequency
weighted generalization. In: Proceedings of conference on decision and control, vol 3, pp
127–132
4. Anderson BDO, Liu Y (1989) Controller reduction: concepts and approaches. IEEE Trans
Autom Control 34:802–812
5. Lin C, Chiu TY (1992) Frequency weighted balanced realization. Control-Theory Adv Technol
1(2):341–351
6. Sreeram V, Anderson BDO, Madievski AG (1995) New results on frequency weighted balanced
reduction technique. In: Proceedings of American control conference, vol 6, pp 4004–4009
7. Varga A, Anderson BDO (2001) Accuracy enhancing methods for the frequency-weighted
balancing related model reduction. In: IEEE conference on decision and control, vol 4, pp
3659–3664
8. Wang G, Sreeram V, Liu WQ (1999) A new frequency weighted balanced truncation method
and an error bound. IEEE Trans Autom Control 4(9):1734–1737
9. Sreeram V (2005) An improved frequency weighted balancing related technique with error
bounds. In: Proceedings of IEEE conference on decision and control, vol 8, pp 3084–3089
10. Kumar D, Sreeram V (2020) Factorization-based frequency-weighted optimal Hankel-norm
model reduction. Asian J Control 22(5):2106–2118
11. Gawronski W, Juang JN (1990) Model reduction in limited time and frequency intervals. Int J
Syst Sci 21(2):349–376
12. Aghaee PK, Zilouchian A, Nike-Ravesh SK, Zadegan AH (2003) Principle of frequency-
domain balanced structure in linear systems and model reduction. Comput Electr Eng
29(3):463–477
13. Gugercin S, Antoulas AC (2003) A time limited balanced reduction method. IEEE Conf Proc
Control 77(8):5250–5253
14. Ghafoor A, Sreeram V (2006) Frequency interval Gramians based model reduction. In: Asia
Pacific conference on circuits and systems, vol 4, no 3. IEEE, pp 2000–2003
15. Imran M, Ghafoor A (2015) A frequency limited interval Gramians-based model reduction
technique with error bounds. Circ Syst Signal Process 34(11):3505–3519
16. Kumar D, Sreeram V, Du X (2018) Model reduction using parameterized limited frequency
interval Gramians for 1-D and 2-D separable denominator discrete-time systems. IEEE Trans
Circ Syst I 65(8):2571–2580
17. Sharma V, Kumar D (2022) SVD-based frequency weighted model order reduction of
continuous-time systems. In: IEEE International conference on power electronics, drives and
energy systems (PEDES), pp 1–4
18. Cheng W, Li Z, Li Y, Chu Y, Xie H, Li B, Tang P, Peng D (2023) A model order reduction method
considering the delay feature of wind power when participating in frequency regulation. In:
IEEE 6th information technology, networking, electronic and automation control conference,
vol 6, pp 737–743
19. Sharma V, Kumar D (2023) Confined frequency-interval Gramian framework-based balanced
model reduction. IETE J Res 1–8
Contrast Enhancement of Medical
Images Using Otsu’s Double Threshold
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 195
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_16
196 R. Vinay et al.
1 Introduction
Digital image processing is essential for extracting useful details from images. Image contrast enhancement is a critical pre-processing step in the field of imaging: it sharpens edges, borders, brightness, and gray-level distributions, and thereby contributes to an image's visual quality. Figure 1 depicts the effect of image enhancement.
Histogram equalization (HE), a common technique for enhancing image contrast, is efficient and easy to use. This technique flattens the probability distribution and widens the dynamic range of gray levels, thereby enhancing an image's contrast. HE uses the cumulative distribution function (CDF) as a transformation function to remap an input image's gray levels. It shifts an image's mean brightness toward the center of the dynamic range. As a result, the enhanced image can suffer from intensity saturation, mean brightness shift, and over-enhancement. In particular, HE emphasizes the high-frequency histogram bins while suppressing the low-frequency bins, producing washed-out effects. These drawbacks render the method inappropriate for several application fields, including microscopic imaging, fingerprint recognition, face recognition, medical imaging, voice recognition, satellite imaging, and aerial imaging [1]. This motivated the authors to review the state of the art in this field and identify remedies for the drawbacks mentioned. The remainder of the document is laid out as follows. The state of the art and a comparative analysis of several image-enhancement techniques are presented in Sect. 2. The proposed approach for the improvement of low-contrast medical images is illustrated in Sect. 3. The discussion is presented in Sect. 4. The work's conclusion is provided in Sect. 5.
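The classical HE transformation described above, remapping gray levels through the image's CDF, can be sketched in a few lines (an illustrative NumPy version for 8-bit images):

```python
import numpy as np

def histogram_equalization(img):
    # Map each gray level through the empirical CDF of the image.
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist) / img.size
    lut = np.round(255.0 * cdf).astype(np.uint8)
    return lut[img]
```

Because the mapping follows the CDF, frequent gray levels are spread apart while rare ones are merged together, which is exactly the behavior behind the mean brightness shift and washed-out effects discussed above.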
A number of HE variants address the mean brightness shift issue by altering the histogram. Khan
et al. [5] proposed a technique segmenting the histogram based on mean or median
intensity values, efficiently improving image quality through HE and normaliza-
tion. However, it lacks color identification. In [6], researchers develop a method
incorporating CDF, PWD, and AGC for video sequence enhancement, maintaining
brightness and contrast balance. Srivastava and Rawat [7] focus on smoothing
with a Gaussian filter, while [8] introduces a technique involving Otsu’s method,
range stretching, HE, smoothing, and normalization. Tiwari et al. [9] addressed
the mean shift problem. Baby and Karunakaran [10] combine bi-level weighted
histogram equalization and adaptive gamma correction for brightness preservation
and contrast improvement. Building on [10], the work in [11] introduces an improved AGC-based approach combining RLBHE and AGC procedures. Qadar et al. [12] proposed a
compromise approach with RLBHE and AGC. Xu et al. [13] used Otsu’s twofold
threshold for histogram division. Sharma and Garg [14] tackle contrast enhancement
in hazy images with an entropy-based algorithm. Khan et al. [15] rectified irregular
intensity expansions. Chen and Chen [16] extended ESIHE, while [17] introduced
AGCCPF. Dhal et al. [18] focused on over-enhancement, employing ROOBHE and
ROEBHE. Wu et al. [19] employed an advanced sparrow search algorithm for medical
image enhancement. Sangeeta et al. [20] advocated for segmented images using the
GrabCut technique. Soujanya et al. [21] proposed a hybrid method combining Otsu
thresholding and CLAHE. In [22], accurate segmentation of retinal vessel trees is
addressed using CLAHE and Otsu thresholding. Thakur et al. [22] introduced a
cuckoo search algorithm-optimized method that addresses glaucoma diagnosis with
contrast-limited adaptive histogram equalization and normalized Otsu thresholding.
Sundaram et al. [23] proposed a method using entropy Otsu thresholding, adap-
tive gamma correction, and CLAHE. Despite these methodologies, challenges like
excessive enhancement, computational complexity, inefficiency with complex back-
grounds, uneven illumination, and poor information retrieval persist. The authors
propose an efficient model incorporating RLBHE, WD, AGC, and homomorphic filtering (HF) for enhancing poor-contrast medical images with complex backgrounds and numerous elements.
3 Methodology
This section presents the block diagram and detailed description of the proposed
approach: “Range Limited Double Threshold HE with Adaptive Gamma Correction
and Homomorphic Filtering” (RLDTWHE). Figure 2 represents the block diagram
of RLDTWHE.
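Among these blocks, only HF operates in the frequency domain. The paper does not specify its filter settings, so the sketch below assumes a standard Gaussian high-emphasis homomorphic filter; gamma_l, gamma_h, and d0 are illustrative values, not the authors' parameters:

```python
import numpy as np

def homomorphic_filter(img, gamma_l=0.5, gamma_h=1.5, d0=30.0):
    # Work on log intensities so illumination (low frequency) and
    # reflectance (high frequency) become additive components.
    x = np.log1p(img.astype(float))
    F = np.fft.fftshift(np.fft.fft2(x))
    r, c = img.shape
    u, v = np.meshgrid(np.arange(c) - c / 2.0, np.arange(r) - r / 2.0)
    d2 = u ** 2 + v ** 2
    # Gaussian high-emphasis transfer function: attenuate illumination,
    # boost reflectance (edges and detail).
    H = (gamma_h - gamma_l) * (1.0 - np.exp(-d2 / (2.0 * d0 ** 2))) + gamma_l
    y = np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))
    return np.clip(np.expm1(y), 0, 255).astype(np.uint8)
```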
Otsu’s double threshold method categorizes the foreground, background, and target
region in an input image. It calculates threshold values for each region, ensuring
high intraclass variance. Otsu’s method establishes the lower and upper bounds for
histogram equalization (HE) to preserve maximum brightness post-segmentation.
Equation 1 defines global thresholds g(T 1 , T 2 ) for foreground, background, and
target regions, based on intraclass variance maximization [24]. L 1 represents the
highest image intensity (255), and T 1 , T 2 are in the 0 to 255 range. W L , W U , W V
are PDFs, and E(I L ), E(I U ), E(I V ) are overall brightness for subdivisions I L , I U , I V .
E(I) is the average luminance of the entire input image [13, 21].
g(T₁, T₂) = W_L(E(I_L) − E(I))² + W_U(E(I_U) − E(I))² + W_V(E(I_V) − E(I))² (1)
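The exhaustive search implied by Eq. (1) can be sketched directly (an illustrative NumPy version; cumulative sums keep the double loop over candidate thresholds cheap):

```python
import numpy as np

def otsu_double_threshold(img):
    # Histogram-based probabilities of the 8-bit input image.
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    k = np.arange(256)
    cp = np.cumsum(p)          # cumulative probability
    cm = np.cumsum(k * p)      # cumulative intensity mass
    mu = cm[-1]                # global mean E(I)
    best, best_t = -1.0, (0, 0)
    for t1 in range(1, 255):
        for t2 in range(t1 + 1, 256):
            # Class weights for [0..t1], (t1..t2], (t2..255].
            w = np.array([cp[t1], cp[t2] - cp[t1], 1.0 - cp[t2]])
            if np.any(w <= 0):
                continue
            m = np.array([cm[t1], cm[t2] - cm[t1], cm[-1] - cm[t2]]) / w
            score = np.sum(w * (m - mu) ** 2)   # Eq. (1)
            if score > best:
                best, best_t = score, (t1, t2)
    return best_t
```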
Three sub-histograms result from segmentation, with intensity value ranges [l_i, m_i]. The weighted distribution model minimizes the gap between the uniform and gray-level distributions by adjusting the estimated likelihoods. This approach assigns higher weights to less frequent intensities and lower weights to more frequent ones. Equation (2) provides the maximum probability (P_max), while Eq. (3) gives the minimum probability (P_min) of the input image sub-histogram; P(k) represents the approximate likelihood of the kth gray level, and L − 1 is the largest gray level in the input grayscale image:

P_max = max_{l_i ≤ k ≤ m_i} P(k), (2)

P_min = min_{l_i ≤ k ≤ m_i} P(k). (3)

Contrast Enhancement of Medical Images Using Otsu's Double Threshold 199
The cumulative probability density of the ith sub-histogram, b_i, is given in Eq. (4), where l_i and m_i are the lowest and highest intensity values of the ith sub-histogram, P(k) is the probability of the kth gray level, and L − 1 is the input image's highest intensity value:

b_i = Σ_{k=l_i}^{m_i} P(k), 0 ≤ k ≤ L − 1. (4)
The probabilities can be modified using the formula in Eq. (5), which gives less weight to gray levels that occur more frequently and more weight to gray levels that occur less frequently [25]. Here, P(k) is the probability of the kth gray level and P_w(k) is the weighted probability of the kth gray level. P_min and P_max are the minimum and maximum probabilities of an input image sub-histogram, b_i is the cumulative probability density of the ith sub-histogram, and l_i and m_i are its lowest and highest intensities, respectively [22]. Modifying the probabilities helps achieve an output histogram with a uniform intensity distribution.

P_w(k) = P_max ((P(k) − P_min)/(P_max − P_min))^{b_i}, l_i ≤ k ≤ m_i. (5)
The sum of probabilities is always unity. But the work in [25] claims that the sum
of weighted probabilities is not always unity. Thus, there is a need to normalize the
resultant weighted probabilities Pw (k).
Equation (6) gives the formula to normalize the weighted probabilities, where P_w(k) is the weighted probability of the input image histogram and P_wn(k) is the normalized weighted probability:

P_wn(k) = P_w(k) / Σ_{k=0}^{L−1} P_w(k). (6)
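Equations (2)–(6) can be sketched for one sub-histogram as follows (an illustrative NumPy version; it assumes P_max > P_min within the segment and, for brevity, normalizes over the segment rather than the full histogram):

```python
import numpy as np

def weighted_distribution(p, l, m):
    # p: PDF of the full image; [l, m]: range of one sub-histogram.
    seg = p[l:m + 1]
    pmax, pmin = seg.max(), seg.min()                 # Eqs. (2)-(3)
    b = seg.sum()                                     # Eq. (4)
    pw = pmax * ((seg - pmin) / (pmax - pmin)) ** b   # Eq. (5)
    return pw / pw.sum()                              # Eq. (6)
```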
At this stage, the system equalizes each sub-histogram separately. The transfer function f converts the existing intensity distribution into a uniform one, so that intensity levels are distributed evenly over the entire range [26]. This enhances the contrast of the given input image. Equations (7), (8), and (9) define the transformation functions f_L(X_k), f_U(X_k), and f_V(X_k) for the individual sub-histograms of the foreground, background, and target, respectively. Here, T₁ and T₂ are Otsu's double thresholds; C_wL(X_k), C_wU(X_k), and C_wV(X_k) are the cumulative distribution functions (CDF) of the foreground, background, and target sub-histograms, respectively; X₀′ is the minimum intensity value; and the maximum intensity value is L − 1.
Equation (10) gives the formula to calculate the normalized CDF C_wn(k), where P_wn(j) is the normalized weighted probability density of the input image histogram:

C_wn(k) = Σ_{j=0}^{k} P_wn(j). (10)

The equalized output image is the union of the three equalized sub-histograms:

Y = f_L(X_k) ∪ f_U(X_k) ∪ f_V(X_k). (11)
The weighted normalized CDF C_wn(k) at the kth intensity level modifies the value of γ using the formula given in Eq. (13), which, following the AGCWD formulation [6], can be written as γ = 1 − C_wn(k).
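Assuming the AGCWD-style correction of [6], with γ = 1 − C_wn(k) applied through a per-level lookup table, the AGC stage can be sketched as:

```python
import numpy as np

def adaptive_gamma_correction(img, pwn):
    # pwn: normalized weighted PDF; its running sum is Cwn(k), Eq. (10).
    cwn = np.cumsum(pwn)
    gamma = 1.0 - cwn              # assumed Eq. (13), as in AGCWD [6]
    k = np.arange(256) / 255.0
    lut = np.round(255.0 * k ** gamma).astype(np.uint8)
    return lut[img]
```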
Fig. 3 Comparison of visual quality of MRI image of spinal. a Original, b GHE, c BBHE, d DSIHE,
e AGCWD, f RLBHE, g RLDTMHE, h EASHE, and i RLDTWHE
4 Experimental Results
This section presents the experimental setup, dataset, image quality metrics, and
results for evaluating the RLDTWHE technique.
Data Set: The authors use an openly accessible dataset [19] comprising 50 low-
contrast medical MRI images, including liver, brain, skull, and spine examples shown
in Figs. 3 and 4. These 256 × 256 grayscale images exhibit diverse levels of brightness
and contrast, resulting in various forms of degradation.
Metrics for Evaluation of Image Quality: To assess RLDTWHE-applied image
quality, the authors employ the following quantitative metrics.
Entropy: Entropy, in bits, measures the information content of an image; a higher value indicates more extractable information, and high entropy signifies reduced intensity saturation effects. Image entropy is computed using Eq. (14), with L − 1 as the highest intensity value, P(X_k) as the PDF of the kth intensity level, and E(X_k) as the image entropy [14]:

E(X_k) = −Σ_{k=0}^{L−1} P(X_k) log₂ P(X_k) bits. (14)

Fig. 4 Comparison of visual quality of MRI image of brain. a Original, b GHE, c BBHE, d DSIHE, e AGCWD, f RLBHE, g RLDTMHE, h EASHE, and i RLDTWHE
A higher PSNR value suggests less noise and higher reconstruction quality.
Contrast: The image’s contrast is defined by the normal intensity range and its
variation around a center pixel. Higher contrast values indicate a broader dynamic
range and stronger enhancement. Equation (17) describes the contrast function, where
r and c denote the width and length of a processed picture [18, 27]. I enh (i, j) represents
the pixel intensity at 2D position (i, j). Formula (18) illustrates the conversion of
contrast (C) to decibels (DB).
| |2
| |
1 ∑∑ ∑r ∑
r c
| 1
c
|
Ccontrast = |
Ienh (i, j ) − |
2
Ienh (i, j )|| (17)
r c i=1 j=1 | r c i=1 j=1 |
∗
Ccontrast = 10Ccontrast (18)
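The three metrics can be sketched as follows (illustrative NumPy for 8-bit images; the PSNR definition uses the standard 255-peak form, since Eqs. (15)–(16) are not reproduced here):

```python
import numpy as np

def entropy_bits(img):
    # Shannon entropy of the gray-level PDF, Eq. (14).
    p = np.bincount(img.ravel(), minlength=256) / img.size
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def psnr_db(ref, test):
    # Standard 8-bit PSNR (assumed form; Eqs. (15)-(16) not shown above).
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)

def contrast_db(img):
    x = img.astype(float)
    c = np.mean(x ** 2) - np.mean(x) ** 2   # Eq. (17), variance form
    return 10.0 * np.log10(c)               # Eq. (18), on a dB scale
```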
5 Comparative Analysis
The authors compare RLDTWHE with existing techniques: GHE [1, 2], BBHE [3, 4], DSIHE [5, 6], AGCWD [6], RLBHE [24], RLDTMHE [13], and EASHE [20]. MATLAB tests assess RLDTWHE's performance, with Figs. 3 and 4 used for visual quality and Figs. 5, 6, and 7 for quantitative evaluation against previous methodologies.
Fig. 6 Comparison of peak signal-to-noise ratio values

Fig. 7 Comparison of contrast values
This section assesses visual quality, checking for unnatural appearance, saturation artifacts, over-enhancement, and undesired artifacts. Figure 3a–i shows the simulation results of applying the contrast improvement strategies to a spinal MRI image. The output images from GHE, AGCWD, RLBHE, and RLDTMHE (Fig. 3b, e, f, g) reveal over-enhancement and high intensities. The results from BBHE, DSIHE, and EASHE (Fig. 3c, d, h) appear dark due to insufficient contrast improvement. Figure 3i displays RLDTWHE's final image, revealing a natural appearance achieved with a well-chosen threshold.
Figure 4b–i compares the brain MRI results; black noise appears in Fig. 4c, d, g, and h. Despite the superior quality of Fig. 4b, e, and f, they show noise, over-enhancement, and washed-out effects. RLDTWHE (Fig. 4i) produces an optimally enhanced brain MRI, preserving crucial information.
This section assesses the statistical measures of contrast improvement, entropy, and brightness preservation. Figures 5, 6, and 7 demonstrate the effectiveness of the contrast enhancement techniques relative to RLDTWHE. Entropy, crucial for assessing disease severity in medical MRI images, is maximized by RLDTWHE, as shown in Fig. 5. The method outperforms current techniques, avoiding intensity saturation artifacts and over-enhancement and thereby minimizing information loss in processed images.
Figure 6’s experimental results show that the suggested method, RLDTWHE,
outperforms more traditional contrast enhancement methods in terms of PSNR value.
As a result, RLDTWHE does not overstate the level of noise during the contrast
improvement procedure. Additionally, it successfully controls the improvement pace
while maintaining the realistic quality of an image.
The results of the visual contrast indicator are shown in Fig. 7. The authors' investigations show that the suggested technique RLDTWHE produces images with the best contrast and a smooth texture, even in non-homogeneous regions, when compared with other contrast enhancement methods.
Table 2 compares the computational complexity for the low-contrast test images. The results show that RLDTWHE outperforms the state-of-the-art methods on all low-contrast test images while taking less time than the "EASHE" technique.
The proposed method, RLDTWHE, outperforms six HE-based contrast enhance-
ment schemes in both qualitative and quantitative analyses. It preserves the
highest brightness and information, delivering a substantial contrast improvement
while effectively maintaining the natural aspect of an image through controlled
augmentation.
Table 2 Comparison of computational complexities (in s) for standard low-contrast test images

Images/technique  GHE     BBHE    DSIHE   AGCWD  RLBHE  RLDTMHE  EASHE  RLDTWHE
MRI Brain         0.0625  0.0723  0.1023  1.234  3.867  5.39     8.367  4.367
MRI Spinal        0.0578  0.0712  0.1345  1.45   3.678  6.345    8.489  4.239
MRI Skull         0.0645  0.0856  0.987   1.367  2.389  5.829    9.672  3.659
MRI Liver         0.0615  0.0867  0.129   1.456  3.59   6.278    9.367  4.498
6 Conclusion
References
5. Khan MF, Khan E, Abbasi ZA (2013) Segment dependent dynamic multi-histogram equalization for image contrast enhancement. Digital Signal Process 25:198–223
6. Huang SC, Cheng FC, Chiu YS (2013) Efficient contrast enhancement using adaptive gamma
correction with weighting distribution. IEEE Trans Image Process 22(3):1032–1041
7. Srivastava G, Rawat TK (2013) Histogram equalization: a comparative analysis and a segmented approach to process digital images. In: Sixth international conference on contemporary computing (IC3), pp 81–85
8. Huynh TT, Le B, Lee S, Le-Tien T, Yoon Y (2014) Using weighted dynamic range for histogram
equalization to improve the image contrast. EURASIP J Image Video Process 44(1):1–17
9. Tiwari M, Gupta B, Srivastava M (2014) High-speed quantile-based histogram equalization for brightness preservation and contrast enhancement. Image Process (IET) 9(1):80–89. https://doi.org/10.1049/ietipr.2013.0778
10. Baby J, Karunakaran V (2014) Bi level weighted histogram equalization with adaptive gamma
correction. Int J Comput Eng Res (IJCER) 4(3):25–30
11. Gautam C, Tiwari N (2015) Efficient color image contrast enhancement using range limited
bi-histogram equalization with adaptive gamma correction. In: IEEE International conference
on industrial instrumentation and control (ICIC), pp 175–180
12. Qadar MA, Zhaowen Y, Rehman A, Alvi MA (2015) Recursive weighted multi-plateau
histogram equalization for image enhancement. Optic—Int J Light Electron Opt 126(24):5890–
5898
13. Xu H, Chen Q, Zuo C, Yang C, Liu N (2015) Range limited double threshold multi histogram
equalization for image contrast enhancement. Opt Rev 22(2):246–255
14. Sharma P, Garg G (2015) Entropy based optimized weighted histogram equalization for Hazy
images. In: IEEE 9th International conference on industrial and information systems (ICIIS),
pp 1–6
15. Khan MF, Khan E, Abbasi ZA (2015) Image Contrast enhancement using normalized histogram
equalization. Optic-Int J Light Electron Optics 126(24):4868–4875
16. Chen YY, Chen SA (2015) Exposure based weighted dynamic histogram equalization for image
contrast enhancement. Int J Autom Smart Technol 5(1):27–38
17. Gupta B, Tiwari M (2015) Minimum mean brightness error contrast enhancement of color
image using adaptive gamma correction with color preserving framework. Int J Image Process
(IJIP) 9(4):241–253
18. Dhal KG, Sen S, Sarkar K, Das S (2016) Entropy based range optimized brightness preserved
histogram equalization for image contrast enhancement. Int J Comput Vision Image Process
6(1):59–72
19. Wu H, Huang Q, Cheung Y, Xu L, Tang S (2020) Reversible contrast enhancement for medical images with background segmentation. IET Image Process 14. https://doi.org/10.1049/iet-ipr.2019.0423
20. Sangeeta K, Divya M, Divyajyothi B (2023) Contrast enhancement of medical images using
Otsu thresholding. In: Sharma H, Shrivastava V, Bharti KK, Wang L (eds) Communication and
intelligent systems. ICCIS 2022. Lecture Notes in Networks and Systems, vol 686. Springer,
Singapore. https://doi.org/10.1007/978-981-99-2100-3_47
21. Soujanya TM, Prasad Babu K (2023) Implementation of CLAHE contrast enhancement & Otsu thresholding in retinal image processing. IJETMS 7(1):138–153. https://doi.org/10.46647/ijetms.2023.v07i01.022
22. Thakur N, Khan NU, Datt Sharma S (2022) Cuckoo search optimized histogram equalization
for low contrast image enhancement. In: 2022 Seventh International conference on parallel,
distributed and grid computing (PDGC). Solan, Himachal Pradesh, India, pp 727–732. https://
doi.org/10.1109/PDGC56933.2022.10053265
23. Sundaram R, Jayaraman P, Rangarajan R, Rengasri R, Rajeshwari C, Ravichandran KS (2019)
Automated optic papilla segmentation approach using normalized Otsu thresholding. J Med
Imaging Health Inf 9(7):1346–1353
24. Zuo C, Chen Q, Sui X (2013) Range limited bi-histogram equalization for image contrast
enhancement. Optik—Int J Light Electron Optics 124(5):425–431
208 R. Vinay et al.
25. Gupta B, Agarwal TK (2017) Linearly quantile separated weighted dynamic histogram
equalization for contrast enhancement. Comput Electr Eng 62:360–374
26. Qiu J, Li HH, Zhang T, Ma F, Yang D (2017) Automatic X ray image contrast enhancement
based on parameter auto optimization. J Appl Clin Med Phys 18(6):218–223
27. Yao Z, Zhou Q, Yang X, Yang C, Lai Z (2016) Quadrants histogram equalization with a
clipping limit for image enhancement. In: IEEE 8th International conference on wireless
communication & signal processing (WCSP)
28. Chowdhury AMS, Rahman MS (2016) Image contrast enhancement using tri-histogram equal-
ization based on minimum and maximum intensity occurrence. Current Trends Technol Sci
6(2):609–614
Recognizing Hate Speech on Twitter
with Feature Combo
Abstract The issue of hate speech directed toward women is widespread and has
gained increased attention in recent times. Despite the impressive performance of
machine learning-based models that incorporate textual, user-specific, and social
network features, there is still potential for improvement given the variety of feature
combinations used with the ensemble learning (EL) approach. To fill this gap,
the researchers in this study have generated a unique set of stance and similarity
features and combined them with machine learning (ML) and EL algorithms to
recognize hate speech in Twitter data and assess the model’s effectiveness. The
proposed novel feature combo of stance and similarity features achieved the highest
accuracy, 93.53%, with the ensemble algorithm Extreme Gradient Boosting
(XGBoost), while the Support Vector Machine (SVM) algorithm of ML showed the
lowest accuracy of 75.67%.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 209
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_17
210 J. R. Saini and S. Vaidya
Increased awareness of hate speech or racism and its detrimental impacts on the lives
of women has emerged in recent years, sparking movements like the hashtag ‘MeToo’
and demands for changes to social and legal policies. Nonetheless, misogyny is still
pervasive in a number of settings, such as online forums, the workplace, and politics
[1]. Preserving a civil and secure online environment has made it imperative and
crucial to recognize and remove such offensive content.
Researchers have made efforts to develop automated systems with ML techniques
to recognize hate speech and further categorize it into different classes such as
aggressive and non-aggressive, hateful and non-hateful, offensive and inoffensive,
violent and non-violent, and so on [3]. The primary requirement is
data collection in the form of tweets, posts, or articles from social media sites such as
Twitter, Facebook, and Sina Weibo using Application Programming Interface (API).
Upon data collection, it is mandatory to perform pre-processing which cleans the
data to remove non-essential elements from it such as punctuations, stop words,
hashtags, and so on. Further, to build automated ML models, hand-crafted features
are generated based on contents, user profiles and behaviors, and social media graph
structure. Additionally, supervised, unsupervised, or semi-supervised models are
developed depending on the availability of labeled and unlabeled dataset [4, 5].
The performance of ML models for hate speech detection solely depends on the
feature extraction and selection process. According to the literature, authors have
extracted several textual, social media, and user-specific features [2]. For example,
authors have used n-grams, gender, location, and length for hate speech detection
[6], while others have identified hateful terms from the text [7]. In other research,
authors have used Term Frequency and Inverse Document Frequency (TF-IDF) [8],
while others have focused on sentimental, syntactic, and grammatical features [9].
However, it was noticed that the state-of-the-art research does not consider stance
features, which may help identify the viewpoint of the user while generating hateful
messages. Another observation is that similarity-based features are largely
overlooked in the area of hate speech detection.
The user stance on Twitter depicts the opinion and evaluation of the claim. The
stance property requires a specific target toward which the stance can be measured.
Stance can also be measured for textual contents. It can be categorized into agree and
disagree, support and denial, or favor and against, and so on [10]. Stance detection is
related to other areas of Natural Language Processing (NLP) such as text
categorization, sentiment analysis, and pragmatic analysis. However, sentiment and
stance are often conflated, leading to the misuse of sentiment features for stance
detection, which requires contextual information [11].
The similarity features compute the resemblance between facts and newly arriving
contents. These features are produced using cutting-edge similarity metrics like
Cosine, Jaccard, Euclidean, and Chebyshev. These features are widely used in the
research for detecting and fact-checking false information. For example, authors
have incorporated similarity measures to compute similarity between true and false
information using factual data [12]. In another research, ‘Content Similarity Measure
(CSM)’ algorithm was proposed to perform automated fact-checking of healthcare
web URLs with an aim to detect misinformation [13].
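To make the metrics concrete, the sketch below computes Jaccard, Cosine, and Euclidean measures over term-frequency representations of two short texts. It is a minimal, hypothetical illustration: the example strings and the whitespace tokenization are ours, not from the cited studies.

```python
import math
from collections import Counter

def tokens(text):
    # Lowercase whitespace tokenization; real systems use richer tokenizers.
    return text.lower().split()

def jaccard(a, b):
    # |intersection| / |union| over the two token sets.
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def cosine(a, b):
    # Cosine of the angle between term-frequency vectors.
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def euclidean(a, b):
    # Euclidean distance between term-frequency vectors (lower = more similar).
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    return math.sqrt(sum((ca[t] - cb[t]) ** 2 for t in set(ca) | set(cb)))

t1 = "women deserve respect online"
t2 = "women deserve safety online"
print(round(jaccard(t1, t2), 2))   # 3 shared tokens out of 5 distinct -> 0.6
```

Chebyshev distance, also mentioned above, would replace the squared sum with the maximum per-term difference.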
The authors of this study have suggested a novel method for detecting hate speech
that combines stance and similarity features. The research objective is to produce
a supervised learning-based model that uses a combination of stance and similarity
features to classify tweets as hateful or not. The following are the paper’s research
contributions:
1. Extracted new stance and similarity-based features for hate speech detection.
2. Developed ML and ensemble models based on newly extracted features.
A review of the literature is covered in Sect. 2 of the work, methodology is covered
in Sect. 3, experimental results and discussion are presented in Sect. 4, and the work
is concluded and recommendations for future improvements are made in Sect. 5.
2 Literature Survey
In this section, the authors discuss the literature in three aspects: (a) hate speech, (b)
stance features, and (c) similarity features. They are as follows.
Hate speech is an area of concern that shows a blatant intent to cause harm or incite
hatred toward other people [14]. The state-of-the-art research developed models
with ML classifiers by extracting various features such as TF-IDF [3], n-grams,
lexicons, bag-of-words, distance metrics, and meta-information [15]. Combining
feature sets to build models has shown the best results in the area of hate speech
detection [14]. Further, several ensemble approaches were also adopted by the
researchers to detect hate speech. For example, ensembles of recurrent neural
networks and ensembles of LSTM-based models have been successfully implemented.
However, approaches such as stacking, voting, bagging, and boosting with several
combinations of ML classifiers are largely overlooked.
Stance features depict the user’s viewpoint about the tweet speech. Stance is based
on the user’s social activity as well as the written contents. Although stance-based
features have been completely overlooked in the hate speech literature, they have been
widely used in the stance detection and veracity assessment problems of NLP [16].
In these problems, the stance features are categorized into content, lexicon, and
dialogue-act features [17]. The content features capture syntactic, grammatical, and
lexical properties, while lexicon-based approaches form word cues. For example, authors in research
have developed five categories of word cues, namely belief, denial, doubt, report, and
knowledge. These features have certainly improved model performance [18], and
thus, the authors of this research have considered utilizing lexicon and content
features to detect hate speech.
The similarity-based features compute the similarity between two entities. For
example, in one study the authors used Cosine similarity to find the resemblance
between an input word and a vector space; apart from this work, the authors did not
find any study that has used distance metrics [15]. Such metrics can significantly
benefit hate speech detection by computing the similarity score between newly
arriving tweets and existing verified tweets. A similar approach has been exploited
in misinformation detection problems [19].
Following are the research gaps identified in the domain of hate speech detection.
1. Features such as stance and similarity measure have been largely overlooked in
the literature.
2. Rich combination of features is lacking.
3. Very few models are employed with ensemble algorithms.
3 Methodology
This section describes the dataset employed in the model development, pre-
processing techniques performed for cleaning the data, feature extraction process,
and model development with ML and ensemble algorithms.
In this research, authors have used the gold standard dataset from the literature [20,
21]. The dataset is made up of tweets that were gathered from the Twitter network
in order to identify hate speech against women. It consists of 135,556 tweets, with
30 features comprising textual contents and other related attributes such as
sentiments, age, and gender. For the ease of execution, the authors extracted
5000 tweets and performed pre-processing. The pre-processing steps include filtering
URL links, Twitter usernames, special words, emoticons, and dealing with missing
values. Further, the tokenization step resulted in removal of punctuations, spaces, and
stop words. Thus, 2000 tweets consisting of 1047 hateful tweets and 953 non-hateful
tweets were used for training the model, while 3000 tweets were used for testing
consisting of 1714 hateful tweets and 1286 non-hateful tweets.
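The cleaning steps above can be sketched as a small pipeline. This is an assumed, simplified implementation: the regular expressions and the stop-word list are illustrative stand-ins, not the authors' exact code.

```python
import re
import string

# Illustrative stop-word subset; real pipelines use a full list (e.g., NLTK's).
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of"}

def preprocess(tweet):
    # Filter URL links and Twitter usernames.
    tweet = re.sub(r"https?://\S+", "", tweet)
    tweet = re.sub(r"@\w+", "", tweet)
    # Drop the hashtag marker but keep the word itself.
    tweet = tweet.replace("#", "")
    # Strip non-ASCII symbols such as emoticons, then punctuation.
    tweet = tweet.encode("ascii", "ignore").decode()
    tweet = tweet.translate(str.maketrans("", "", string.punctuation))
    # Tokenize, lowercase, and drop stop words and extra spaces.
    return [w.lower() for w in tweet.split() if w.lower() not in STOP_WORDS]

print(preprocess("@user Check this out: https://t.co/abc #Hate is everywhere!!"))
# -> ['check', 'this', 'out', 'hate', 'everywhere']
```

Handling missing values, also mentioned above, would happen at the dataframe level before this per-tweet cleaning.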
This section describes the feature set generated for women-targeted hate speech detection.
Authors have identified stance features such as supportive and denial-based words
belonging to categories, namely certainty and approximation collected from authentic
sources from the web. For similarity features, authors have used Jaccard coefficient,
Cosine similarity, and Euclidean distance measures from the literature and computed
distance between the input tweets and the verified tweets.
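A minimal sketch of this feature-extraction idea follows. The certainty/approximation cue lists here are tiny illustrative placeholders (the paper's actual lexicons were collected from web sources and are not reproduced), and the verified corpus is hypothetical.

```python
# Illustrative cue lists; the study's real lexicons are larger and web-sourced.
CERTAINTY = {"definitely", "certainly", "always", "never", "undoubtedly"}
APPROXIMATION = {"maybe", "perhaps", "probably", "roughly", "about"}

def stance_features(tokens):
    # Count cue words signalling certainty vs. approximation in a tweet.
    return {
        "certainty": sum(t in CERTAINTY for t in tokens),
        "approximation": sum(t in APPROXIMATION for t in tokens),
    }

def jaccard(a, b):
    # Set-overlap similarity between two token lists.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def similarity_features(tokens, verified_corpus):
    # Map the input tweet against each verified tweet; keep the best match.
    return {"max_jaccard": max(jaccard(tokens, v) for v in verified_corpus)}

tweet = ["she", "is", "probably", "never", "right"]
verified = [["she", "is", "right"], ["totally", "wrong"]]
print(stance_features(tweet))   # -> {'certainty': 1, 'approximation': 1}
print(similarity_features(tweet, verified))
```

Cosine and Euclidean scores against the verified tweets would be added to the feature vector in the same map-and-aggregate fashion.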
This section presents and examines the experimental results obtained from the
developed models. Figure 2 displays the average count of stance features in hateful
and non-hateful tweets. It can be seen that in hateful speeches there
are more approximations, indicating uncertainty, while non-hateful tweets contain
more certainty words. This means that hateful speeches are more ambiguous, doubtful,
and negative in stance. Table 3 displays the performance of the ML and EL models
across four parameters, namely accuracy (a), precision (p), recall (r), and F1-score
(f1). A general comparison shows that ensemble
Fig. 2 Average count of certainty and approximation stance features in hateful and non-hateful tweets
performance ranging between 71 and 78% across Jaccard, Cosine, and Euclidean
measures. Individual stance features showed better performance than the combina-
tion of similarity features with an accuracy of 85.67%. Individual similarity features
showed poor performance in terms of p, r, and f1, having values ranging between 69–
74, 65–75, and 67–74%, respectively. The similarity combination showed improved
performance by 8, 4, and 6% in p, r, and f1, respectively. Figure 3 shows the word
cloud of hateful speeches against women on Twitter. It can be observed that words
such as hate, violence, harassment, and misogyny appear more often in hateful speeches.
Thus, it can be concluded that ensemble models contribute widely in categorization
of hateful and non-hateful speeches. Further, the stance and similarity features are
efficient in showing better performance of the models. The authors could not compare
the performance of the proposed approach with state-of-the-art techniques, as there
are no existing studies using stance and similarity features individually or in combination.
In this research, the authors have generated two novel feature sets, namely stance and
similarity. The stance features consist of certainty and approximation cues which
convey the viewpoint of the users. The similarity features consist of the Jaccard
coefficient, Cosine similarity, and Euclidean distance. These feature values are
computed by mapping the input against existing verified tweets. Further, ML and
ensemble models are built, and performance is evaluated in terms of a, p, r, and f1.
It was concluded that
ensemble models have performed better than ML models, with the XGBoost
algorithm showing the highest accuracy of 93.53%, followed by the AdaBoost
algorithm with 91.67% accuracy. Among the ensemble models, RF showed the
poorest performance. Among the ML algorithms, kNN showed the highest accuracy
of 86.33%, while SVM showed the lowest accuracy of 75.67%. It is also concluded
that the feature combo of stance and similarity yields better performance than
individual features. In the future, the authors
want to test the performance of the proposed feature combo on other datasets. In
addition, more categories of stance features can be identified, and novel similarity
algorithms can also be incorporated.
References
1. Rathod RG, Barve Y, Saini JR, Rathod S. From data pre-processing to hate speech detection:
an interdisciplinary study on women-targeted online abuse
2. Subramanian M, Easwaramoorthy Sathiskumar V, Deepalakshmi G, Cho J, Manikandan G
(2023) A survey on hate speech detection and sentiment analysis using machine learning and
deep learning models. Alexandria Eng J 80:110–121
3. Gite S et al (2023) Textual feature extraction using ant colony optimization for hate speech
classification. Big Data Cogn Comput 7(1)
4. Jahan MS, Oussalah M (2023) A systematic review of hate speech automatic detection using
natural language processing. Neurocomputing 546
5. Alkomah F, Ma X (2022) A literature review of textual hate speech detection methods and
datasets. Information (Switzerland) 13(6). MDPI
6. Waseem Z, Hovy D, Hateful symbols or hateful people? Predictive features for hate speech
detection on Twitter
7. Grimminger L, Klinger R (2021) Hate towards the political opponent: a Twitter Corpus study
of the 2020 US elections on the basis of offensive speech and stance detection
8. Alkomah F, Salati S, Ma X, A new hate speech detection system based on textual and
psychological features [Online]. Available: www.ijacsa.thesai.org
9. Firmino AA, de Souza Baptista C, de Paiva AC (2024) Improving hate speech detection using
cross-lingual learning. Expert Syst Appl 235. https://doi.org/10.1016/j.eswa.2023.121115
10. Hardalov M, Arora A, Nakov P, Augenstein I (2022) A survey on stance detection for mis-
and disinformation identification. In: Findings of the association for computational linguistics:
NAACL 2022. Findings, Association for Computational Linguistics (ACL), pp 1259–1277
11. ALDayel A, Magdy W (2021) Stance detection on social media: state of the art and trends. Inf
Process Manag 58(4)
12. Barve Y, Saini JR (2021) Healthcare misinformation detection and fact-checking: a novel
approach. Int J Adv Comput Sci Appl (IJACSA) 12(10):295–303
13. Barve Y, Saini JR (2022) Detecting and classifying online health misinformation with ‘content
similarity measure (CSM)’ algorithm: an automated fact-checking based approach, pp 1–28
14. Nascimento FRS, Cavalcanti GDC, Da Costa-Abreu M (2023) Exploring automatic hate speech
detection on social media: a focus on content-based analysis. Sage Open 13(2)
15. Mossie Z, Wang J-H (2020) Vulnerable community identification using hate speech detection
on social media. Inf Process Manag 57(3)
16. Alsaif HF, Aldossari HD (2023) Review of stance detection for rumor verification in social
media. Eng Appl Artif Intell 119
17. Pamungkas EW, Basile V, Patti V (2019) Stance classification for rumour analysis in Twitter:
exploiting affective information and conversation structure. In: CEUR workshop proceedings.
CEUR-WS
18. Islam MR, Muthiah S, Ramakrishnan N (2019) Rumorsleuth: joint detection of rumor veracity
and user stance. In: Spezzano F, Chen W, Xiao X (eds) Proceedings of the 2019 IEEE/ACM
International conference on advances in social networks analysis and mining, ASONAM 2019.
Association for Computing Machinery, Inc., pp 131–136
19. Barve Y, Saini JR, Kotecha K, Gaikwad H (2022) Detecting and fact-checking misinformation
using ‘veracity scanning model’, vol 13, no 2, pp 201–209
20. Kennedy CJ, Bacon G, Sahn A, von Vacano C (2020) Constructing interval variables via faceted
Rasch measurement and multi task deep learning: a hate speech application
21. Ma J, Gao W, Wong K-F (2018) Detect rumor and stance jointly by neural multi-task learning.
In: The web conference 2018—companion of the World Wide Web conference, WWW 2018.
Association for Computing Machinery, Inc., pp 585–593
Agriculture Yield Forecasting
via Regression and Deep Learning
with Machine Learning Techniques
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 219
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_18
220 A. V. Kadu and K. T. V. Reddy
1 Introduction
Humans started actively managing their land and plants in the prehistoric period
known as the New Stone Age, which began around 15,000 years ago. India’s economy
relies heavily on agriculture, which meets most of its food needs [1]. Due to the
considerable weather changes and the country’s rapidly increasing population, it
becomes imperative to maintain a balance between the demand for food and its
supply. Various scientific approaches have been integrated into agriculture to harmo-
nize food supply and demand [2]. The significant environmental variances make it
difficult for farmers to develop adaptable and sustainable strategies.
In India, agriculture accounts for about eighty percent of water usage. In 2019, over
70% of the Indian workforce was engaged in agriculture, contributing significantly
to the country’s G.D.P. (20–21%) [3]. The total land area in India has remained
relatively constant at about 536,879 thousand hectares between 2000 and 2023. The
agricultural sector in India is expected to reach a value of USD 26 billion by 2026,
with 80% of sales occurring in the retail sector; India is the world’s fifth-largest food
and grocery market.
According to such data, consumer appetite for food products has increased across
the nation in rural and urban areas, and the income level is increasing. As a result, the
agricultural industry is seeing the launch of various online farming technologies and
a rise in cutting-edge technology like drones, blockchain, remote sensing, and
geographic information systems (G.I.S.) [4, 5]. Crop production in India follows
a seasonal pattern, with the Kharif season being the most productive and winter
being the least [6]. Crop yield estimation is a valuable tool for farmers to optimize
their production, considering factors like farming practices, pesticide use, weather,
and market prices. Recent advancements in machine learning have notably impacted
agriculture, enhancing crop yield predictions and improving farming practices [7, 8].
A variety of machine learning methods have been used, such as Support Vector
Machines (S.V.M.), Artificial Neural Networks (A.N.N.), deep learning (D.L.), and
Decision Trees (D.T.). Illustrated in Fig. 1 is a representation of
India’s crop production data, covering the years from 2000 to 2023 [9]. These data
show that most of the country’s agricultural land is wheat and rice. This allocation
played a pivotal role in contributing to more than 84% of the staple grain production
within the nation. India plays a substantial role in the global rice trade, accounting
for approximately 50% of Basmati and Non-Basmati rice exports and conducting
trade with over 150 countries. According to data from the commerce ministry, rice
shipments rose by 13% to 517 thousand metric tons during the initial quarter of
the financial year 2022–2023 [10]. The data representation emphasizes that rice
has the highest production and land use rates. Over the previous year, the agriculture
industry has seen strong export growth; in the fiscal year 2022, total rice exports,
including Basmati and Non-Basmati types, totaled USD 7.14 billion. As agriculture
evolved over the centuries, ancient practices involving celestial observations and
animal sacrifices gave way to increasingly research-driven methods.
India is primarily an agrarian nation, with nearly half of its population engaged in
agriculture. One of the most concerning issues in the country is the distressing rate of
farmer suicides. According to data from the Indian National Crime Records Bureau
(N.C.R.B.), between 1995 and 2020, over 200,000 Indian farmers tragically lost their
lives. This situation is frequently attributed to their challenges in repaying loans,
which they typically obtain from financial institutions and private lenders. Inade-
quate knowledge about crop cultivation, including which crops are best suited for
each season, exacerbates these problems. This study addresses a significant problem
by using historical agricultural production data to establish a dependable method for
forecasting crop yields at an early stage. Forecasting agricultural yields is challenging
because it depends on many variables, including rainfall, wind speed, soil character-
istics, climate conditions, humidity, and temperature. Complicating matters further,
data for these variables must be sourced from multiple, diverse sources. Despite
numerous studies on this topic, there is still room for improvement. The study lever-
ages ML-DL techniques to provide a reliable method for agricultural yield predic-
tion. This process means using government data and mathematical models to make
more accurate predictions and assess potential losses. This research offers valu-
able insights to researchers, enabling them to understand better the problem and its
potential solutions in the agricultural domain.
The main aim of this research is to predict agricultural crop yields. To
comprehensively cover the existing literature, Sect. 2 explores existing theories and
research on predicting crop yields to support our research goals and bridge knowledge
gaps. Moving on, Sect. 3 delves into specifics about the study.
In detail, it examines the location, data sources, technology, research approach, and
various methods for predicting crop yields. It also elucidates the inter-connections
among different features within the dataset. Section 4 shows the discussion related
to crop production. Finally, Sect. 5 serves as the concluding segment of this study,
summarizing the work and putting forth suggestions for future research efforts in
this field.
2 Literature Review
Applying machine learning approaches has enabled the estimation of crop yields
[14]. This endeavor used a dataset encompassing variables such as total cultivated
area, average maximum temperature, water sources, and canal length for forecasting
agricultural output under irrigation. Notably, the computational model developed in
this study outperformed models created using Shallow Neural Network (S.N.N.),
Random Forest (R.F.), Lasso, and Deep Neural Network (D.N.N.) approaches. For
dataset validation using projected weather data, the Root Mean Square Error
(R.M.S.E.) was equivalent to fifty percent of the average yield and 13% of the
standard deviation. This underscores the efficacy of machine learning techniques in
improving crop yield predictions. Between 1998 and 2002,
specifically during the Kharif season, a study achieved an impressive accuracy rate
of 97.5%. This success was achieved by analyzing various factors, including rainfall,
crop production, crop yield, and area information. In light of the substantial impact
of rainfall on Kharif crop production, the researchers in the study initiated their work
by utilizing a modular Artificial Neural Network (ANN) for predicting rainfall. After
obtaining the rainfall forecasts, they used S.V.M. to calculate yield using area and
rainfall data for analysis. Combining these two approaches proved highly effective in
enhancing crop yield during the Kharif season. Another study investigated the ability
of the Artificial Neural Network to predict yields of two crops, (i) soybean and
(ii) corn, especially in unfavorable environmental conditions; assessed the method’s
performance in estimating localized, regional, and state data; evaluated how the
ANN model performed with different parameters; and compared the evolved ANN
model’s performance with other models, such as multiple linear regression. Artificial
Neural Networks were used in a different study to assess rice production in several
towns of Maharashtra, India. Data from open government records were
collected for all 27 districts in Maharashtra, highlighting the flexibility of Artificial
Neural Networks in predicting crop yields in varied geographical regions. This study
aimed to achieve superior crop yield estimation by utilizing various ML methods,
including K-Nearest Neighbor, i.e., K.N.N., Artificial Neural Network, i.e., ANN,
Random Forest, i.e., R.F., and Support Vector Regression, i.e., S.V.R. The dataset used
for this research consisted of 745 examples. Seventy percent of these examples were
randomly chosen and utilized to train the model, with the remaining thirty percent
reserved for testing and performance evaluation. The final analysis revealed that R.F.
demonstrated the highest level of accuracy among the methods employed. A separate
research endeavor, focusing on southern Brazil, introduced a unique model for
forecasting soybean yield using L.S.T.M. and satellite data. This innovative approach
leverages advanced technology to enhance soybean yield predictions
in the region. This research compares the effectiveness of L.S.T.M., N.N., R.F., and
multiple Linear Regression (L.R.) [15]. This analysis uses independent variables such
as surface temperature, rainfall, and plant measures to forecast soybean yields. A
secondary objective is determining how early these models can confidently forecast
crop yields. The
proposed methodology predicts the crop grown in each location within the relevant
database [16]. The experimental results underscore the approach’s significant
potential in accurately forecasting agricultural productivity, as validated through
real-time data and stakeholder interactions. Several machine learning (ML) methods,
such as Decision Trees, Lasso, and linear regression, have been used to forecast
agricultural production; most of these ML techniques performed worse than the
Decision Tree (D.T.) approach. A yield prediction system was also implemented
using a K.N.N. algorithm. It is important
to note that yield predictions for farmers should consider multiple factors influencing
crop production and quality.
Crop output depends on timing, crop type, and region. To anticipate yield, data
like calendar year, product, region, location, and period are vital. Accurate crop yield
history is essential for managing agricultural risks. In a study, Decision Tree (D.T.),
Random Forest (R.F.), and K-Nearest Neighbors (K.N.N.) Classifiers were evaluated
with Gini and entropy metrics. Random Forest produced the most accurate outcomes.
3 Proposed Work
India’s land area encompasses a vast 536,879 thousand hectares between 2000 and
2023, extending from the north’s snow-capped Himalayan range to the south’s
tropical forests. In this expanse, 310,721 thousand hectares (between 1980 and 2022)
are designated for agricultural use. To conduct this study, the researchers have
selected ten key Indian crops predominantly cultivated in the region. These crops
include wheat, cotton, jowar, maize, jute, rice, bajra, and ragi, representing a
comprehensive examination of major agricultural products in India.
The data are gathered from several publicly accessible government websites,
including https://kaggle.com and https://data.gov.in, which are sources for several
data compilations. The dataset includes important fields such as production, district
name, crop type, state name, yield, crop year, wind speed, seasons,
area under irrigation, rainfall, humidity, and total area from 1980 to 2022.
Agricultural yields for these years are shown visually in Fig. 1. This study uses a
variety of prediction models, such as Long Short-Term Memory Network (L.S.T.M.),
Decision Tree (D.T.), XGBoost Regression, Random Forest (R.F.), and Convolutional
Neural Network (CNN) to forecast agricultural production for India. To ensure
prediction accuracy, models are assessed using metrics such as root mean squared
error, test loss, accuracy, and standard deviation. The study methodology, depicted
in Fig. 2, involves K-fold cross-validation applied to the training set to evaluate the
efficacy of the trained models [17].
3.3 Methods
In cross-validation, the dataset is divided into k subsets; in each round, k − 1 of the
subsets are utilized as the model’s training data, and the remaining subset is held out
for validation. This procedure involves the calculation of a performance statistic, often
accuracy, at each iteration. It is a valuable approach, particularly when working with
limited input data. Figure 3 visually represents the flowchart of the method adopted,
demonstrating the steps involved in this process.
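The k-fold procedure described above can be sketched as a plain index split, a minimal stand-in for library routines such as scikit-learn's KFold (the fold layout here is an illustrative choice, not the study's exact setup).

```python
def kfold_indices(n_samples, k):
    # Split sample indices into k contiguous folds; each fold serves once
    # as the validation set while the other k-1 folds form the training set.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i in range(k):
        val = folds[i]
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, val

# Each iteration would fit the model on `train` and score it on `val`,
# averaging the per-fold metric (e.g., accuracy) over all k rounds.
for train, val in kfold_indices(10, 5):
    print(len(train), len(val))   # 8 2 on every round
```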
A Decision Tree accomplishes two key tasks: First, it classifies the pertinent elements
for each decision point and then decides the best course of action based on these
selected features [18]. Figure 4 shows the Decision Tree algorithm which handles
both regression and classification simultaneously by assigning a probability distribu-
tion to plausible choices. Each node in the tree represents a feature, branches corre-
spond to selections, and leaf nodes signify outcomes. Tree construction begins by
choosing a single feature as the root node. Data splitting is a crucial step in
completing the Decision Tree build.
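The data-splitting step can be illustrated with a Gini-impurity calculation, one of the split criteria evaluated later in this chapter. This is a schematic sketch with made-up feature values, not the study's implementation.

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def split_gain(feature_values, labels, threshold):
    # Weighted impurity reduction obtained by splitting at `threshold`.
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

rainfall = [200, 850, 900, 150]              # hypothetical feature values (mm)
yield_class = ["low", "high", "high", "low"]
print(split_gain(rainfall, yield_class, 500))  # perfect split: gain 0.5
```

The tree builder would evaluate such gains over candidate features and thresholds, choosing the best one at each node.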
Figure 5 illustrates how Random Forest (R.F.) enhances the bagging method to
create an uncorrelated forest of Decision Trees by incorporating feature selection.
It employs random feature selection, also known as feature randomization or the
random subspace technique. R.F. considers only a subset of the attributes at each
split, allowing
it to exploit random feature selection. Each Decision Tree in the ensemble is built
using a bootstrap sample from the training set, with the remaining third reserved for
testing purposes.
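The bootstrap sampling behind this can be shown in a few lines: when n samples are drawn with replacement, roughly a third of the training set (about e⁻¹ ≈ 36.8%) is never drawn and is left "out-of-bag", which is where the "remaining third" comes from. The sample size and seed below are arbitrary illustrative choices:

```python
import random

def bootstrap_sample(n, rng):
    """Draw n indices with replacement; indices never drawn are 'out-of-bag'."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = set(range(n)) - set(in_bag)
    return in_bag, oob

rng = random.Random(0)
n = 10_000
_, oob = bootstrap_sample(n, rng)
oob_fraction = len(oob) / n  # tends to 1/e ≈ 0.368 for large n
```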
3.3.4 XGBoost
In this CNN implementation, there are seven layers. The first layer is a one-
dimensional convolutional layer with 64 filters and a kernel size of 3. The second
layer employs MaxPooling1D with a pool size of two. A dropout mechanism is
applied in the third layer. The fourth layer applies the ReLU activation function,
and Layer 5 flattens its output. Layer 6 is a hidden neural network layer comprising
330 neurons. The final layer, Layer 7, employs the SoftMax function with 11 neurons
corresponding to the different output types. Figure 7 depicts the CNN architecture
of this model.
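The layer-by-layer tensor shapes can be checked with simple arithmetic; the input length of 24 below is an illustrative assumption, since the text does not state the model's input size:

```python
# Walk through the shapes of the seven-layer 1-D CNN described above.
# Input length 24 is an assumption for illustration only.
def conv1d_out(length, kernel_size):   # 'valid' convolution, stride 1
    return length - kernel_size + 1

def maxpool1d_out(length, pool_size):  # non-overlapping pooling
    return length // pool_size

length, channels = 24, 1
length = conv1d_out(length, 3)         # layer 1: Conv1D, 64 filters -> (22, 64)
channels = 64
length = maxpool1d_out(length, 2)      # layer 2: MaxPooling1D      -> (11, 64)
                                       # layer 3: dropout (shape unchanged)
                                       # layer 4: ReLU (shape unchanged)
flat = length * channels               # layer 5: flatten            -> 704
hidden = 330                           # layer 6: dense hidden layer
outputs = 11                           # layer 7: softmax over 11 output types
```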
Fig. 6 XGBoost
4 Discussion
Random Forest produces the most accurate prediction of crop output for India, with
a 98.96% accuracy rate, 1.97 mean absolute error (MAE), 2.45 root mean square
error (RMSE), and 1.23 standard deviation (SD) (see Table 1). On the other hand,
the accuracy rates for XGBoost and Decision Tree are 86.46% and 89.78%,
respectively, with mean absolute errors of 4.58 and 6.31, root mean square errors of
5.86 and 7.89, and corresponding standard deviations of 2.75 and 3.54 [18].
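The metrics quoted above can be computed from prediction residuals as follows; the sample values are invented for illustration and are not the study's data:

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def std_dev(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

# Illustrative yields (y_true) and model predictions (y_pred)
y_true = [10.0, 12.0, 9.0, 11.0]
y_pred = [11.0, 10.0, 9.0, 12.0]
errors = [t - p for t, p in zip(y_true, y_pred)]
```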
The accuracy ratings of 89.78% and 86.46% for Decision Tree and XGBoost
Regression, respectively, are much lower than those of Random Forest, which received a
score of 98.96% [19]. This shows that, in the context of this study, when machine
learning techniques are used to anticipate India’s agricultural production, Random
Forest performs better than the other three regression approaches.
The reason why machine learning is sometimes called a “black box” technology is
that its predictions are not readily interpretable. The exceptional performance of Random
Forest highlights its usefulness in this particular application.
Additionally, changing the number of training epochs significantly affects the mean
absolute error. It is clear from this that Long Short-Term Memory (LSTM) models
are inferior to Convolutional Neural Networks (CNN), as using the CNN results in
a reduced loss [20].
5 Conclusion
The increasing population has made managing food demand and supply more
challenging. Experts have been diligently working to predict agricultural yield
production. This study focuses on forecasting crop yields in India using a spectrum of ML
and deep learning methods, emphasizing the benefits of advanced methods. Small-
scale farmers can particularly benefit from these predictions, as they can use them
to estimate future crop production and plan their planting accordingly. The study
applies five ML and DL methods to analyze the dataset: Long Short-Term Memory
Networks (LSTM), Decision Tree (DT), Convolutional Neural Network (CNN),
Random Forest (RF), and XGBoost Regression. However, more extensive yearly
crop data, together with precise environmental and weather information, is needed.
In terms of performance, CNN outperforms LSTM, as the loss is lower with CNN.
Further exploration with deep learning models
is required to identify the most effective method. Integrating remote sensing data with
district-level statistical data is suggested to enhance the accuracy of crop production
predictions. Using satellite imagery for land cover or image classification can lead
to more accurate predictions.
References
1. Kumar CMS et al (2023) Solar energy: a promising renewable source for meeting energy
demand in Indian agriculture applications. 55:102905
2. Lezoche M et al (2020) Agri-food 4.0: a survey of the supply chains and technologies for the
future agriculture. 117:103187
3. Baragde DB, Jadhav AU (2021) Impact of COVID-19 on Indian SMEs and survival strategies.
In: Handbook of research on strategies and interventions to mitigate COVID-19 impact on
SMEs. IGI Global, pp 280–298
4. Martos V et al (2021) Ensuring agricultural sustainability through remote sensing in the era of
agriculture 5.0. 11(13):5911
5. Goel RK et al (2021) Smart agriculture–urgent need of the day in developing countries.
30:100512
6. Sambasivam VP et al (2020) Selection of winter season crop pattern for environmental-friendly
agricultural practices in India. 12(11):4562
7. Vyas S et al (2022) Integration of artificial intelligence and blockchain technology in healthcare
and agriculture
8. Sharma A et al (2020) Machine learning applications for precision agriculture: a comprehensive
review 9:4843–4873
9. Saeed I et al (2020) Basmati rice cluster feasibility and transformation study. 131:434
10. Mukundan A et al (2023) The Dvaraka initiative: mars’s first permanent human settlement
capable of self-sustenance 10(3):265
11. Basso B, Liu L (2019) Seasonal crop yield forecast: methods, applications, and accuracies.
154:201–255
12. Lu Y, Young S (2020) A survey of public datasets for computer vision tasks in precision
agriculture. 178:105760
13. Zhang Q et al (2020) Applications of deep learning for dense scenes analysis in agriculture: a
review. 20(5):1520
14. Rashid M et al (2021) A comprehensive review of crop yield prediction using machine learning
approaches with special emphasis on palm oil yield prediction 9:63406–63439
15. Alquthami T et al (2022) A performance comparison of machine learning algorithms for load
forecasting in smart grid 10:48419–48433
16. Hussain N, Sarfraz S, Javed S (2021) A systematic review on crop-yield prediction through
unmanned aerial vehicles. In: 2021 16th international conference on emerging technologies
(ICET), IEEE
17. Saud S et al (2020) Performance improvement of empirical models for estimation of global
solar radiation in India: a k-fold cross-validation approach 40:100768
18. Sharma P et al (2023) Predicting agriculture yields based on machine learning using regression
and deep learning
19. Landi F et al (2021) Working memory connections for LSTM 144:334–341
20. Khan ZA et al (2020) Towards efficient electricity forecasting in residential and commercial
buildings: A novel hybrid CNN with a LSTM-AE based framework 20(5):1399
Performance Comparison of M-ary
Phase Shift Keying and M-ary
Quadrature Amplitude Modulation
Techniques Under Fading Channels
Abstract Modulation techniques and channel conditions play a major role in the
development of wireless communication systems. Information can be sent easily
and with little to no error if the right channel conditions and modulation technique
are used. To reduce errors during data transmission, an effective communication
system must be developed. Most of the research has been done on either M-ary
phase shift keying (MPSK) or M-ary quadrature amplitude modulation (MQAM)
under Rician or Rayleigh fading channels. However, works that consider M-ary phase
shift keying (MPSK) and M-ary quadrature amplitude modulation (MQAM) under
Rician, Rayleigh, Nakagami-m, and AWGN fading channels are rarely investigated.
This research examines two fundamental types of digital modulation techniques: M-
PSK (M = 2, 8, 16) and M-QAM (M = 4, 8, 16, 64). It also discusses the different
types of channels that are used between transmitter and receiver, such as AWGN,
Rayleigh, Rician, and Nakagami fading channels, and evaluates the receiver’s BER
performance characteristics for each of these channels. The MATLAB simulation
results demonstrated that, for the same degree of signal-to-noise ratio, the bit error
rate for digital modulations (M-PSK, M-QAM) under an AWGN channel is lower
than that achieved over a Rayleigh fading channel. When using the M-PSK modula-
tion technique, the Rician fading channel has more communication impairment than
AWGN. When using the M-PSK modulation technique, the Nakagami-m fading
channel outperformed the Rayleigh fading channel with the same SNR value.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 235
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_19
236 T. A. Abose et al.
1 Introduction
In the last ten years, the field of wireless communication has advanced due to the
increasing demand for voice and multimedia services on mobile wireless networks.
Digital modulation is one of the main underlying technologies that enables digital
data to be transported or transmitted across analog radio frequency (RF) channels,
as presented by Dai et al. [1].
By improving the wireless network’s capacity, speed, and quality, digital modu-
lation techniques help to accelerate the growth of mobile wireless communications.
There is work focusing on various modulation algorithms that encode digital data
using a finite number of phases. PSK is a type of phase modulation in which the
input bits cause the carrier to phase shift to one of a limited number of potential
phases. The QAM technique is aimed at increasing the spacing between symbols
in a constellation by modulating the carrier’s amplitude and phase, as presented by
Xiong [2].
Compared to analog modulation methods, digital modulation schemes are better
able to transmit an enormous amount of information. Examining the relative bit
error rate performance of various modulation schemes in AWGN, Rayleigh, Rician,
and Nakagami fading channels is necessary to achieve optimal results when taking
reflectors and obstructions in wireless propagation channels into account. The trans-
mitted signal is impacted by AWGN noise as it travels along the channel. Over a
specific frequency range, it has a continuous, uniform frequency spectrum. Rayleigh
Fading, which is the total of all the reflected and dispersed waves, is the signal that is
received at the receiver when there is no LOS path—only an indirect path—between
the transmitter and the receiver, as presented by Rajesh et al. and Kumar et al. [3, 4].
A stochastic model for radio propagation variance that is impacted by a radio
signal’s partial dissolution on its own is called a Rician fading channel. The signal
travels over multiple pathways before arriving at the receiver, at least one of which
is changing or fluctuating. When the broadcast signal can follow a leading LOS or
straight path to the receiver, it is helpful for showcasing mobile wireless communica-
tion systems. The bit error rate, or BER, is calculated by dividing the total number of
bits transmitted, received, or processed within a specified time frame by the number
of bits with errors, as presented by Abose et al., Rajesh et al. and Kumar et al. [3–5].
Most of the research has been done on either M-ary phase shift keying (MPSK) or
M-ary quadrature amplitude modulation (MQAM) under Rician or Rayleigh fading
channels. However, works that consider M-ary phase shift keying (MPSK) and M-ary
quadrature amplitude modulation (MQAM) under Rician, Rayleigh, Nakagami-m,
and AWGN fading channels are rarely investigated.
The rest of the paper is organized as follows: Sect. 2 presents the system model,
and Sect. 3 presents results and discussion. Section 4 concludes the paper.
2 Proposed Method
The BER of digital modulations over various fading channels is presented in this
section. The bit error rate is the ratio of the total number of erroneous bits to the
total number of transmitted bits; it is a key measure of transmission quality and is
frequently expressed as a percentage. Bit errors can be caused by bit synchronization
issues, distortion, interference, or noise.
\[ \mathrm{BER} = \frac{\text{Total number of error bits}}{\text{Total number of transmitted bits}}. \quad (1) \]
In this subsection, the BER analysis of MQAM and MPSK under AWGN will be
presented.
For a large energy-per-bit to noise power spectral density ratio (Eb/N0) and M > 4,
the BER expression can be written as [6]:

\[ \mathrm{BER}_{M\text{-PSK}} = \frac{2}{\log_2 M}\, Q\!\left(\sqrt{\frac{2 E_b \log_2 M}{N_0}}\ \sin\frac{\pi}{M}\right), \quad (2) \]

\[ \mathrm{BER}_{M\text{-PSK}} = \frac{1}{\log_2 M}\operatorname{erfc}\!\left(\sqrt{\frac{E_b \log_2 M}{N_0}}\ \sin\frac{\pi}{M}\right) = \frac{1}{m}\operatorname{erfc}\!\left(\sqrt{\frac{m E_b}{N_0}}\ \sin\frac{\pi}{M}\right), \quad m = \log_2 M, \quad (3) \]

where E_b is the energy per bit, N_0 is the noise power spectral density, Q is the
Q-function, and erfc is the complementary error function.
The BER for a rectangular M-QAM (4-QAM, 8-QAM, 16-QAM, and 64-QAM)
is given as:

\[ \mathrm{BER}_{M\text{-QAM}} = \frac{2}{\log_2 M}\left(1-\frac{1}{\sqrt{M}}\right)\operatorname{erfc}\!\left(\sqrt{\frac{3 E_b \log_2 M}{2(M-1) N_0}}\right), \quad (4) \]

\[ = \frac{2}{m}\left(1-\frac{1}{\sqrt{M}}\right)\operatorname{erfc}\!\left(\sqrt{\frac{3 m E_b}{2(M-1) N_0}}\right). \quad (5) \]
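Equations (3) and (4) can be evaluated directly with the standard library's complementary error function; the Eb/N0 operating points used below are arbitrary illustrative choices:

```python
import math

def ber_mpsk_awgn(m_order, ebno_db):
    """Approximate BER of M-PSK over AWGN, per Eq. (3) (valid for M > 4)."""
    m = math.log2(m_order)
    ebno = 10 ** (ebno_db / 10)   # dB -> linear
    return (1 / m) * math.erfc(math.sqrt(m * ebno) * math.sin(math.pi / m_order))

def ber_mqam_awgn(m_order, ebno_db):
    """Approximate BER of rectangular M-QAM over AWGN, per Eq. (4)."""
    m = math.log2(m_order)
    ebno = 10 ** (ebno_db / 10)
    return (2 / m) * (1 - 1 / math.sqrt(m_order)) * \
        math.erfc(math.sqrt(3 * m * ebno / (2 * (m_order - 1))))
```

At a fixed SNR the computed BER rises with M, matching the behavior reported in the results section.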
In this subsection, the BER analysis of MQAM and MPSK under the Rayleigh fading
channel will be presented.
The BER of BPSK modulation under the Rayleigh fading channel can be
expressed as [7]:

\[ \mathrm{BER}_{\mathrm{BPSK,\,Rayleigh}} = \frac{1}{2}\left(1-\sqrt{\frac{\bar{\gamma}}{1+\bar{\gamma}}}\right), \quad (6) \]

where \bar{\gamma} is the average SNR per bit. The BER of M-QAM over the Rayleigh
fading channel is approximately

\[ \mathrm{BER}_{M\text{-QAM,\,Rayleigh}} \approx \frac{2}{\log_2 M}\left(1-\frac{1}{\sqrt{M}}\right)\sum_{i=1}^{\sqrt{M}/2}\left(1-\sqrt{\frac{1.5\,(2i-1)^2\,\bar{\gamma}\log_2 M}{M-1+1.5\,(2i-1)^2\,\bar{\gamma}\log_2 M}}\right), \quad (8) \]

\[ \mathrm{BER}_{M\text{-QAM,\,Rayleigh}} \approx \frac{2}{m}\left(1-\frac{1}{\sqrt{M}}\right)\sum_{i=1}^{\sqrt{M}/2}\left(1-\sqrt{\frac{1.5\,(2i-1)^2\,(m E_b/N_0)}{M-1+1.5\,(2i-1)^2\,(m E_b/N_0)}}\right). \quad (9) \]
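Equation (6) is straightforward to evaluate numerically; the following sketch compares it against the AWGN BPSK BER, 0.5·erfc(√(Eb/N0)), with illustrative SNR points:

```python
import math

def ber_bpsk_rayleigh(ebno_db):
    """BER of BPSK over a Rayleigh fading channel, per Eq. (6)."""
    g = 10 ** (ebno_db / 10)     # average SNR per bit, linear scale
    return 0.5 * (1 - math.sqrt(g / (1 + g)))

def ber_bpsk_awgn(ebno_db):
    """BER of BPSK over AWGN, for comparison: Q(sqrt(2*Eb/N0))."""
    g = 10 ** (ebno_db / 10)
    return 0.5 * math.erfc(math.sqrt(g))
```

For any positive SNR the Rayleigh BER sits well above the AWGN BER, which is the gap the simulation figures later illustrate.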
In this subsection, the BER analysis of MQAM and MPSK under the Nakagami
fading channel will be presented.
For a noisy phase reference in a Nakagami-m fading channel, the estimated
average BER expression of MPSK is expressed as [8]:
\[ \mathrm{BER}_{\mathrm{MPSK}} \cong \frac{1}{2\sqrt{\pi}\,\log_2 M}\sum_{n=0}^{M/2-1}\frac{\sqrt{\frac{\bar{\gamma}\log_2 M}{m}\,\frac{(2n+1)\pi}{M}}}{\left(1+\frac{\bar{\gamma}\log_2 M}{m}\,\frac{(2n+1)\pi}{M}\right)^{m+\frac{1}{2}}}\cdot\frac{\Gamma\!\left(m+\frac{1}{2}\right)}{m+1}\cdot{}_2F_1\!\left(1,\,m+\frac{1}{2};\,m+1;\,\frac{m}{m+\frac{\bar{\gamma}\log_2 M}{m}\,\frac{(2n+1)\pi}{M}}\right), \quad (10) \]

where {}_2F_1(\cdot) is the Gaussian hypergeometric function, \bar{\gamma} is the average SNR per bit,
and m is the Nakagami-m fading parameter.
The expression of the average BER of M-QAM over the Nakagami fading channel
is [8]:
\[ \mathrm{BER}_{\mathrm{MQAM}} = \frac{1}{\log_2\sqrt{M}}\sum_{k=1}^{\log_2\sqrt{M}} P_b(k), \quad (11) \]

\[ P_b(k) = \frac{1}{\sqrt{M}}\sum_{i=0}^{(1-2^{-k})\sqrt{M}-1}(-1)^{\left\lfloor \frac{i\,2^{k-1}}{\sqrt{M}} \right\rfloor}\left(2^{k-1}-\left\lfloor \frac{i\,2^{k-1}}{\sqrt{M}}+\frac{1}{2} \right\rfloor\right)\left\{\frac{1}{2}+\frac{1}{2\,\Gamma(m)\,\Omega^{m}\sqrt{\pi}}\sqrt{\frac{(2i+1)^2\,3\log_2 M\,\bar{\gamma}_o}{2m(M-1)}}-G^{1,2}_{2,2}\!\left(\frac{2(M-1)m}{(2i+1)^2\,3\log_2(M)\,\bar{\gamma}_o}\,\middle|\,\begin{matrix}1-m,\ \frac{1}{2}-m\\ 0,\ -m\end{matrix}\right)\right\}, \quad (12) \]

where G^{1,2}_{2,2}(\cdot) is the Meijer G-function, \Omega is the Nakagami fading power, and
\bar{\gamma}_o is the average SNR.
In this subsection, the BER analysis of MQAM and MPSK under the Rician fading
channel will be presented.
With Rician parameter K and diversity N, the probability of symbol error for
MPSK across Rician fading channels can be stated as [9, 10]:
\[ p_s(E) = \frac{1}{\pi}\int_{-\pi/2}^{\frac{\pi}{2}-\frac{\pi}{M}}\left(\frac{(N+K)\cos^2\theta}{(N+K)\cos^2\theta+\bar{\gamma}\sin^2\frac{\pi}{M}}\right)^{N}\exp\!\left(-\frac{N K\,\bar{\gamma}\sin^2\frac{\pi}{M}}{(N+K)\cos^2\theta+\bar{\gamma}\sin^2\frac{\pi}{M}}\right)d\theta. \quad (13) \]
With diversity N, mean symbol SNR, Rician parameter K, and M-QAM over
Rician fading channels, the probability of symbol error is as follows [9, 10]:
\[ p(\bar{\gamma}) \approx 0.2\,\log_2 M\left[\frac{(N+K)(M-1)}{(N+K)(M-1)+1.5\,\bar{\gamma}}\right]\exp\!\left[-\frac{K\,1.5\,\bar{\gamma}}{(N+K)(M-1)+1.5\,\bar{\gamma}}\right]. \quad (14) \]
3 Results and Discussion
In this section, the simulation results of the bit error rate (BER) performance of
different types of digital modulation schemes are presented and discussed.
MATLAB 2021 is used for simulating and generating the results.
3.1 Results
AWGN Channel
Figure 1 shows the bit error rate curve for the BPSK, 8-PSK, 16-PSK, 4-QAM, 8-
QAM, 16-QAM, and 64-QAM modulation schemes using MATLAB software under
the AWGN channel. Comparing the figures, it can be said that for M-ary modulation,
when the value of M increases, the BER performance of the system will be degraded.
The signal-to-noise ratio for different modulation schemes is also shown in Table 1.
Rayleigh Fading Channel
Bit error rate curves for BPSK, 4-QAM, 8-QAM, 16-QAM, and 64-QAM modulation
using MATLAB software under the Rayleigh fading channel are displayed in Fig. 2. It
illustrates how, with M-QAM, the bit error rate increases as the value of M increases.
The system’s performance suffered as a result. In addition, the system’s performance
will decrease in the same order as the value of M grows. To reduce BER, raise the
signal-to-noise ratio for all M-QAM and BPSK modulations. The signal-to-noise
ratio for different modulation schemes for Rayleigh fading channel is also shown in
Table 2.
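The qualitative behavior of these curves can be spot-checked with a small Monte Carlo simulation; the bit count, seed, and the 8 dB operating point below are arbitrary illustrative choices, not the paper's simulation settings:

```python
import math
import random

def simulate_bpsk_ber(ebno_db, n_bits, fading, rng):
    """Count bit errors for BPSK (+1/-1 symbols, unit energy) with coherent
    detection, over AWGN or a Rayleigh flat-fading channel."""
    ebno = 10 ** (ebno_db / 10)
    sigma = math.sqrt(1 / (2 * ebno))     # noise std dev per real dimension
    errors = 0
    for _ in range(n_bits):
        bit = rng.choice((-1.0, 1.0))
        # Rayleigh envelope: magnitude of a complex Gaussian with unit mean power
        h = math.hypot(rng.gauss(0, math.sqrt(0.5)),
                       rng.gauss(0, math.sqrt(0.5))) if fading else 1.0
        received = h * bit + rng.gauss(0, sigma)
        if (received >= 0) != (bit > 0):  # sign decision (h is known and >= 0)
            errors += 1
    return errors / n_bits

rng = random.Random(1)
ber_awgn = simulate_bpsk_ber(8.0, 100_000, fading=False, rng=rng)
ber_rayleigh = simulate_bpsk_ber(8.0, 100_000, fading=True, rng=rng)
```

At the same SNR, the simulated Rayleigh BER comes out far above the AWGN BER, consistent with the figures.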
Rician Fading Channel
The M-PSK modulation scheme’s BER performance curve is shown in Fig. 3 in
relation to the SNR value in a Rician fading environment, considering additive white
Gaussian noise (AWGN). The graph shows that when the signal-to-noise ratio (SNR)
rises, the BER value falls and tends to zero. The signal-to-noise ratio and BER for
Rician fading and AWGN channels are also shown in Table 3.
Nakagami-M Fading Channel
In the context of a Nakagami fading environment, Fig. 4 displays the BER perfor-
mance curve of the M-PSK modulation scheme in relation to the SNR value for both
the Rayleigh fading channel and additive white Gaussian noise (AWGN). The graph
shows that the value of BER drops and tends to zero as the SNR value rises. The
signal-to-noise ratio and BER for Rician fading and AWGN channels are also shown
in Table 4.
3.2 Discussion
Both BPSK and 4-QAM have roughly the same BER for the AWGN channel. It is
observed that for M-PSK and M-QAM, the bit error rate increases as the value of M
increases. The system’s performance suffered as a result. We can conclude that 8-ary
QAM is superior to higher-order QAMs and that 8-PSK is superior to 16-PSK. In
addition, M-QAM outperforms M-PSK in the same order. Based on a comparison
with Fig. 1, it can be concluded that for M-ary modulation, the system’s performance
will deteriorate as M rises. To reduce BER, raise the signal-to-noise ratio for all M-
QAM and M-PSK modulations. Figure 1 shows that the BER for BPSK and 4-QAM
is around the same. Therefore, 8-PSK, 16-PSK, 8-QAM, 16-QAM, and 64-QAM
perform worse than 4-QAM and BPSK modulations. Approximately the same BER
is found for BPSK and 4-QAM in the Rayleigh fading channel. It can be observed
that for M-QAM, the bit error rate increases as the value of M increases. The system’s
performance suffered as a result. We can conclude that 8-QAM outperforms higher-
order QAMs. In addition, the system’s performance will decline in the same order as
the value of M grows. To reduce BER, raise the signal-to-noise ratio for all M-QAM
and BPSK modulations. The BER of BPSK and 4-QAM is about equal. As a result,
the performance of 4-QAM and BPSK modulation is superior to that of 8-QAM,
16-QAM, and 64-QAM. Comparing BER under the Rician fading channel to the
AWGN channel, the AWGN’s BER value decreased with the minimum SNR value;
that is, the Rician fading channel causes the highest communication impairment
when compared to the AWGN under the M-PSK modulation scheme. In the case of
the Nakagami fading channel, the BER value tends to zero as the SNR value rises,
i.e., a higher signal-to-noise ratio results in better performance. According to the
graph, as compared to the Rayleigh fading distribution and the Nakagami fading
distribution channel, the AWGN’s BER value decreases with the lowest SNR value.
In the graph under the M-PSK modulation scheme, the Nakagami fading channel
has fewer BERs than the Rayleigh fading channel with the same value of SNR.
4 Conclusion
This study examines bit error rate (BER) performance for AWGN, Rayleigh, Rician,
and Nakagami fading channels for M-PSK and M-QAM signals. In terms of bit
error rate, it is evident that the best modulation techniques are 4-QAM and BPSK.
According to the simulation data, the bit error rate for M-PSK and M-QAM modulations
increases as the number of bits per symbol grows, i.e., as M increases from 2 to 64.
Comparing additive white Gaussian noise to the Rayleigh fading channel, the former
performs reasonably well. Additionally, for the same value of signal-to-noise ratio,
the bit error rate for digital modulations (M-PSK, M-QAM) under an AWGN channel
is lower than that achieved over a Rayleigh fading channel. Based on the simulation
findings, it is observed that under the M-PSK modulation scheme, the Rician fading
channel has more communication impairment than AWGN. With the same SNR value
in the graph under the M-PSK modulation scheme, the Nakagami fading channel
performs better than the Rayleigh fading channel. The future work of this research
will extend the work to include other digital modulation schemes, fading channels,
such as Weibull, and different comparison characteristics, such as power efficiency
and bandwidth efficiency.
References
1. Dai JY, Tang W, Yang LX, Li X, Chen MZ, Ke JC, Cui TJ (2019) Realization of multi-
modulation schemes for wireless communication by time-domain digital coding metasurface.
IEEE Trans Ant Propagat 68(3):1618–1627
2. Xiong F (2006) Digital modulation techniques. London
3. Abose TA, Olwal TO, Hassen MR, Bekele ES (2022) Performance analysis and comparisons
of hybrid precoding scheme for multi-user MMWAVE massive MIMO system. In: 2022 3rd
international conference for emerging technology (INCET). IEEE, pp 1–6
4. Rajesh V, Rajak AA (2020) Channel estimation for image restoration using OFDM with various
digital modulation schemes. In: Journal of physics: conference series (vol 1706, No 1, pp
012076), IOP Publishing
5. Kumar S, Anjaria K, Sadhwani D (2021) Performance analysis of efficient digital modulation
schemes over various fading channels. AEU-Int J Electron Commun 141:153963
6. Sharma N, Jain D, Bhatt K, Themalil MT (2021) Performance comparison of various digital
modulation schemes based on bit error rate under AWGN channel. In: 2021 5th international
conference on computing methodologies and communication (ICCMC). IEEE, pp 619–623
7. Farzamnia A, Hlaing NW, Mariappan M, Haldar MK (2018) BER comparison of OFDM with
M-QAM modulation scheme of AWGN and Rayleigh fading channels. In: 2018 9th IEEE
control and system graduate research colloquium (ICSGRC). IEEE, pp 54–58
8. Bahuguna AS, Kumar K, Pundir YP, Alaknanda V, Bijalwan V (2021) A review of various
digital modulation schemes used in wireless communications. Proceed Int Intell Enable Netw
Comput IIENC 2020:561–570
9. Karim HK, Shenger AE, Zerek AR (2019) BER performance evaluation of different phase shift
keying modulation schemes. In: 2019 19th international conference on sciences and techniques
of automatic control and computer engineering (STA), IEEE, pp 632–636
10. Patra T, Sil S (2017) Bit error rate performance evaluation of different digital modulation
and coding techniques with varying channels. In: 2017 8th annual industrial automation and
electromechanical engineering conference (IEMECON). IEEE, pp 4–10
Conception of Indian Monsoon
Prediction Methods
Abstract India is the largest economy of South Asia and a rising economy in the
world. More than 20% of its economy rests on the agriculture sector, which is
greatly impacted by the Indian monsoon. The monsoon plays a crucial role in the
growth of several crops and in water resources, and it determines many natural
calamities, such as floods and droughts, that can affect human beings severely.
Therefore, the monsoon has been a subject of research for centuries. With
gradually rising temperatures and
changing monsoon patterns, India is experiencing anomalies in precipitation occur-
rences. Intermittent intensified rain can cause engulfed floods, landslides, loss of
farmer’s harvests, damaged roads and commuting problems that affect the common
man every day. Similarly, many states faced water shortages leading to crop failure,
starvation and spread of many diseases. Both excess and scarcity of rainfall can bring
famine and result in a bad economy. All these consequences can be prevented if the
onset of the monsoon can be estimated before its arrival. Therefore, this study is
intended toward understanding the nature of monsoon and classification of method-
ologies implemented for its early prediction so far. Following an analysis of every
model currently in use, it was discovered that deep learning models outperform all
other models. However, as the monsoon is a complicated system of atmospheric and
oceanic connection, it remains a matter of research to identify the points at which
predictability weakens.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 247
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_20
248 N. Goyal et al.
1 Introduction
The word monsoon literally means a seasonal reversal of winds. The name was
given by the Arabic sailors who benefited from the reversal of wind systems
in crossing the sea while exploring sea routes to India. It was noted that in the
winter the wind blew from the northeast, while in the summer it blew from the
southwest. This mechanism brings heavy rainfall to the affected area in the months
of June, July, August, and September. The period from June to September has since
been termed the monsoon, or southwest monsoon, and is often called the rainy
season. Likewise, the winter monsoon, which occurs from October to March, is also
known as the northeast monsoon and is comparatively less well-known than the summer
monsoon. Rainfall refills aquifers, rivers, ponds, reservoirs and other water bodies
that are necessary for all living forms as well as for several industries. This year [1]
most regions of India have experienced the heaviest rainfall in decades because of
monsoon surges and western disturbances. This downpour has caused neighboring
rivers to overflow, washed away buildings in floods and landslides, destroyed bridges
and roads, and interfered with power and energy supplies. Therefore, it is vital to constantly be
aware of the rainfall pattern to make important decisions, such as on food production
and water resource management, that affect both socioeconomic and scientific issues.
A multitude of unpredictable elements influence the Indian monsoon, including
Himalayas, ENSO Cycle, SST anomalies, Indian Ocean dipole, ocean currents, dry
spells and western disturbances.
The climate of India and the onset of the monsoons are significantly influenced
by the Indian Ocean and by the Himalayan mountain range, which blocks the
very cold winds coming from the north. Sir Henry Blanford [2], who issued the
first seasonal monsoon forecasts in 1886, had explained in 1876 how Himalayan
snow cover can affect the climatic conditions of the plains of north-western India,
causing heavy rain and snow on the Indian side but drought in Tibet. The
ENSO cycle [3], which is brought on by the discrepancy between air pressure and
sea surface temperature, weakens India’s monsoon. Even rising CO2 [4], aerosols
and dust particles [5, 6] are other considerable factors which have been proven to
affect monsoon rainfall. The monsoon is significantly influenced by each of these
regulating factors.
Machine learning and deep learning models are capable of self-learning and can
thus make decisions with better accuracy. Many existing climate models perform
only averagely, since there are several uncertainties in the climate.
This study aims to comprehend the monsoon and the forecasting techniques used
since ancient times. The remaining sections of the paper are structured as follows.
The relevant previous work is described in Sect. 2. Section 3 elucidates the
categorization of weather prediction approaches based on the methodology employed
and the forecast length. Section 4 presents a tabular description of the comparative
study of the approaches mentioned in Sect. 3. Section 5 concludes and explores
the future scope.
2 Related Work
In [17], prediction of the Indian summer monsoon (ISM) has been defined as a
function of sea surface temperature anomalies using ANN techniques. The root
mean square (RMS) error was relatively lower with the ANN technique than with
the regression technique. Likewise, [18] put effort into improving forecasting skill
using correlation analysis of SST indices from different Nino regions on the Indian
summer monsoon rainfall index (ISMRI) with a lag period of one to eight seasons.
A comparison of the findings showed that the ANN model has superior prediction
ability to all the linear regression models examined, implying that the link between
the Nino indices and the ISMRI is essentially nonlinear in character.
In [19] artificial feed-forward neural networks along with backpropagation prin-
ciples have been utilized to analyze the meteorological data. Another article [20]
proposed a prediction model by joint clustering of monsoon years and predictions
using random forest regression algorithm. In [21] an activation function named Tanh
axon has been taken along with ANNs to estimate monsoon rainfall. Several nonpara-
metric tests were run in [22] to analyze rainfall trend by recognizing the abrupt change
point in time using ANN multilayer perceptron model. Even the Internet of Things
was used in [23] to create a local weather prediction system with ANN. Using deep
learning, [24] proposed a model which produced results four times higher in
resolution than linear interpolation. A model based on long short-term memory has
been proposed to forecast rainfall in Jimma, Ethiopia, using six parameters [25].
The work in [26] aimed at creating a drought vulnerability map (DVM) for West
Bengal using deep learning.
A brief report has been presented on changes in monsoon variability in [27] with
respect to observed characteristics of rainfall pattern, role of anthropogenic forcing
and extreme events. To predict northeast monsoon rainfall, several techniques such as
linear regression, ANN, and extreme learning machines were used in [28],
and it was found that the extreme learning machine estimated the monsoon rainfall
with minimal error. In [29] it has been described that the error prediction capability
of statistical methods is independent of dynamical situations using Lorenz-63 model.
Empirical orthogonal functions have been used in [30] to analyze the rainfall across
India on a regional basis. It was found that 5D data may be reduced to 1D with 80%
accuracy and 2D with 90% accuracy. The study in [31] explored whether it would be
possible in the near future for deep learning models to completely replace numerical weather
prediction models. A study has been presented in [32] which examines weather
forecasts on a dataset of London by developing a TensorFlow framework with deep
learning algorithms.
After carefully examining the relevant research conducted recently, it is evident
that weather predicting methods have been improving steadily from the past to the
present. But there are many gaps and difficulties in the complicated and constantly
changing subject of weather prediction. Academicians are actively attempting to fill
these gaps, for example:
1. Weather forecasting is still difficult for time spans longer than two weeks. Sub-
seasonal and seasonal forecasts have an impact on agriculture, water resource
In ancient times there was no scientific method to predict weather. It was strongly
dependent on visual observations in which the prediction of monsoon was made based
on the appearance of various environmental phenomena such as clouds, moon shape,
color, humidity, direction of winds and rainbow as shown in Fig. 1. All the climatic
variations related to air, ocean, and atmospheric pressure forming an environment
were collected and drawn on a paper known as weather maps or synoptic charts.
These maps were one of the oldest tools to forecast weather and were very helpful
to sailors at that time. The visualization-based approach to predicting weather was
a speculative and slow process.
Fig. 1 Traditional forecasting methodology (visual indicators such as humidity, wind direction, clouds, lunar patterns, ocean waves, temperature, and synoptic charts)

Fig. 2 Categorization of modern weather prediction techniques (numerical models such as Demeter, MITgcm, POM, and MOM; data-driven machine learning and deep learning models)
Broadly there are two ways to categorize the modern weather prediction
techniques as shown in Fig. 2.
Numerical Weather Prediction:
Numerical prediction techniques treat the atmosphere as a collection of various gases
and are based on the principles of fluid mechanics and physics. All the data related
to the atmosphere and ocean is collected using radars and satellites. A set of
mathematical equations depicting the atmosphere and ocean, as in Eq. (1), is then
applied to the current data and converted into computer code to be solved on
supercomputers at atmospheric science laboratories. The workflow of numerical
methods is shown in Fig. 3.
P̂(ISM) = Φ(PDE) (1)

where PDE denotes the set of governing partial differential equations of the atmosphere and ocean.

Fig. 3 Workflow of numerical methods: data collection → model initialization → model integration → preprocessing → post-interpretation and analysis → visualization → communication → verification → continuous improvement
Several assumptions are made to satisfy climatic constraints; if these introduce even a
small error, the error multiplies over many iterations and affects the final forecast.
Therefore, the forecast accuracy lasts only for a few days.
Numerical methods require deep knowledge of meteorology and atmospheric
science and are mainly implemented by meteorologists. Data scientists can help
meteorologists by analyzing large datasets to increase forecast accuracy using a
data-driven approach, also known as statistical methods.
Statistical Weather Prediction
Statistical methods forecast the climate on the basis of past datasets using machine
learning and neural networks, especially deep learning techniques, which provide
efficient, user-friendly libraries and massive computational power to better understand
the complexities of the problem. A huge volume of rainfall data is collected every
year, but analyzing this bulk data using traditional methods was a complex process.
The emergence of machine learning techniques accelerated this task by automating
pattern detection and learning from data.

A statistical model makes a future prediction F̂ about the likely outcome of a system S
based on data D extracted from the past time series of S, under the influence of a
vector of predictors P that can affect the forecast, as described in Eq. (2):

F̂(S) ≈ φ_P( Σ_{t=m}^{t=n} D_t(S) ), where n > m (2)

The value of D(S) depends on the empirical values of the weather-affecting predictors P,
such as the ENSO cycle and SST indices. The general working of any statistical algorithm
is shown in Fig. 4.
The accuracy of the system S is calculated by finding the root mean square error
between the actual observed value V and the predicted value V̂ at each ith observation,
as given in Eq. (3):

RMSE = √( (1/N) Σ_{i=1}^{N} (V(i) − V̂(i))² ) (3)
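Equation (3) translates directly into code; a minimal sketch in Python (the sample rainfall values are purely illustrative):

```python
import math

def rmse(v_obs, v_pred):
    """Root mean square error between observed and predicted values (Eq. 3)."""
    assert len(v_obs) == len(v_pred), "series must have equal length"
    n = len(v_obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(v_obs, v_pred)) / n)

# Hypothetical observed vs. predicted seasonal rainfall (mm)
print(rmse([880, 910, 850], [900, 905, 840]))
```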
The generic workflow of statistical models starts with a problem statement, for
example, long-range monsoon forecasting. To initialize the model, a big dataset
of annual and sub-divisional rainfall of past years is required. This dataset is available
at various data sources such as IMD, Kaggle and data.gov. After acquisition, the data must
be loaded and preprocessed to handle missing values. Finally, an appropriate model
with respect to the nature of the problem is applied and trained using past data as
input. The trained model is then evaluated using measures such as confusion
matrix, accuracy, root mean square error, precision, sensitivity, specificity and F1
score. Evaluation is followed by model validation, which checks model performance
on an unseen dataset. Results of the trained model are then analyzed to learn about
the stated problem.
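The steps above can be sketched end to end; this is an illustrative toy pipeline with hypothetical rainfall values and a naive climatology baseline, not the models used in the cited studies:

```python
# Minimal sketch of the generic statistical workflow:
# load -> preprocess -> train -> evaluate on held-out data.

def preprocess(values):
    """Replace missing values (None) with the mean of observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def train_climatology(train):
    """'Train' a baseline model: predict the long-term mean every year."""
    return sum(train) / len(train)

def evaluate(model_mean, test):
    """Root mean square error of the baseline on held-out years."""
    n = len(test)
    return (sum((v - model_mean) ** 2 for v in test) / n) ** 0.5

rainfall = [870, None, 905, 840, 910, 880, None, 895]   # mm, hypothetical
clean = preprocess(rainfall)
train, test = clean[:6], clean[6:]
model = train_climatology(train)
print(round(evaluate(model, test), 2))
```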
Machine Learning Algorithms: Machine learning is an area of artificial intelli-
gence (AI) that concentrates on algorithm development that allows computers to learn
and make predictions or judgments without human intervention. Machine learning
utilizes a diverse set of algorithms as given below:
1. Linear Regression: In this model, the output variable (y) is a function of only
one independent variable (x). For monsoon prediction, future values of the time
series are calculated as a linear function of previous values. It is preferred
for a continuous spectrum of output. A basic linear regression equation with one
independent variable is shown in Eq. (4); with n independent variables it
generalizes to multiple linear regression, Eq. (5).

y = mx + b (4)

y = b0 + b1·x1 + b2·x2 + · · · + bn·xn (5)
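A least-squares fit of Eq. (4) can be written in a few lines of Python; the lag-1 rainfall series below is hypothetical:

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = m*x + b (Eq. 4)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - m * mean_x
    return m, b

# Hypothetical example: predict this year's rainfall from last year's
# (a lag-1 autoregression expressed as a linear function).
prev = [800, 850, 900, 950]
curr = [820, 860, 910, 955]
m, b = fit_simple_linear(prev, curr)
print(m, b)
```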
RFr = (1/m) Σ_{i=1}^{m} yᵢ (7)
RFc is the random forest's predicted result for classification problems, and RFr is
its predicted result for regression problems; yᵢ is the prediction of the ith decision tree.
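Equation (7) is a plain average of the trees' outputs; for classification (Eq. (6) is not reproduced above) the standard random-forest rule is a majority vote. A minimal sketch with illustrative tree outputs:

```python
from collections import Counter

def rf_regression(tree_preds):
    """Eq. (7): average of the m decision trees' numeric predictions."""
    return sum(tree_preds) / len(tree_preds)

def rf_classification(tree_preds):
    """Majority vote over the trees' predicted classes (standard RF rule)."""
    return Counter(tree_preds).most_common(1)[0][0]

print(rf_regression([870.0, 905.0, 880.0]))          # mean rainfall estimate
print(rf_classification(["rain", "rain", "no rain"]))
```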
4. Support Vector Machine (SVM): It is a supervised algorithm. It distinguishes
between two classes of data by finding the hyperplane that best divides the
data into distinct classes and is effective in high-dimensional domains. It is also
suitable for outlier detection. Bioinformatics, image classification, text categorization,
and other fields make extensive use of SVM. In the case of monsoon prediction,
it is good at classifying whether rainfall will occur or not. Mathematically, the
SVM method can be described as in Eq. (8):

y(x) = w · x + b; if y(x) > 0, then rain will occur (8)
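The decision rule of Eq. (8) can be sketched as follows; the weights and predictor values are purely illustrative, not taken from a trained monsoon model:

```python
def svm_decision(w, x, b):
    """Eq. (8): y(x) = w·x + b; predict rain if y(x) > 0."""
    y = sum(wi * xi for wi, xi in zip(w, x)) + b
    return y, "rain" if y > 0 else "no rain"

# Hypothetical weights for two predictors (e.g. an SST index and a
# humidity anomaly); the numbers are illustrative only.
w = [0.8, -0.3]
b = -0.1
print(svm_decision(w, [0.5, 0.2], b))
```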
where P(cᵢ|F) is the revised (posterior) probability of class cᵢ given the feature set F,
P(F|cᵢ) is the probability of the feature set F for a certain class cᵢ, P(cᵢ) is the prior
probability of class cᵢ, and P(F) is the probability of observing the feature set F.
Typically, it is employed for categorization jobs and might not be the first option
for time series prediction, as in the case of monsoon forecasting. Problems such as
sentiment analysis and song recommendation are better handled by the Naïve Bayes
algorithm.
6. Time Series Analysis: It refers to a set of statistical techniques for analyzing and
forecasting time-dependent data. It finds utility in areas such as stock market
prediction, weather forecasting and census analysis. Commonly used models
for time series analysis are the autoregressive integrated moving average
(ARIMA) and the seasonal autoregressive integrated moving average (SARIMA).
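Full ARIMA/SARIMA models are usually fitted with dedicated libraries; as a minimal illustration of the autoregressive idea alone, here is a pure-Python AR(1) fit and forecast (the anomaly series is hypothetical):

```python
def fit_ar1(series):
    """Least-squares AR(1) coefficient phi for x_t ≈ phi * x_{t-1}."""
    num = sum(series[t] * series[t - 1] for t in range(1, len(series)))
    den = sum(x * x for x in series[:-1])
    return num / den

def forecast_ar1(series, phi, steps):
    """Iterate the AR(1) recurrence forward from the last observation."""
    out, last = [], series[-1]
    for _ in range(steps):
        last = phi * last
        out.append(last)
    return out

x = [1.0, 0.9, 0.81, 0.729]     # hypothetical anomaly series
phi = fit_ar1(x)
print(round(phi, 3), forecast_ar1(x, phi, 2))
```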
Deep Learning Algorithms: Deep learning is a subset of machine learning.
Inspired by the working of brain neurons, data is processed through a network of
interconnected artificial neurons that receive and process signals. Such networks are
known as artificial neural networks (ANN). Each connection between neurons carries
a weight. The output of one neuron goes as input to neurons of the next layer based
on a learning function and weight adjustment. Deep learning models can address
complicated topics such as weather forecasting and can improve performance as data
availability increases.
There are several neural network topologies, each intended for a particular set of
applications and data kinds as given below:
1. Feed-forward Neural Networks (FNNs): These are artificial neural networks in
which information flows unidirectionally from the input layer to the hidden layer and
then to the output layer. They are utilized for a variety of applications, including
image and text categorization.
2. Convolutional Neural Networks (CNNs): These are neural networks designed for
processing data with a grid-like structure, especially images and video. They have
several layers, including convolution layers, pooling layers and fully connected layers.
3. Recurrent Neural Networks (RNNs): They operate by processing data sequentially
in steps. The hidden states of RNNs serve as memory; at each time step, the memory
is updated with the input data and the prior hidden state. They are suitable
for applications such as time series prediction, speech recognition and natural
language processing.
4. Long Short-Term Memory (LSTM): It is based on the RNN architecture. Because
these networks can learn long-term connections between time steps of the data, they
may be used to learn, analyze and categorize sequential data. The purpose of LSTMs
is to extract both long-term and short-term dependencies from the data.
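The weighted, layer-to-layer signal flow described above can be illustrated with a single forward pass through a tiny one-hidden-layer network (all weights are arbitrary illustrative values, not a trained model):

```python
import math

def forward(x, w1, b1, w2, b2):
    """One-hidden-layer feed-forward pass: input -> hidden (sigmoid) -> output."""
    hidden = [1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(row, x)) + b)))
              for row, b in zip(w1, b1)]
    return sum(wi * hi for wi, hi in zip(w2, hidden)) + b2

# Tiny illustrative network: 2 inputs, 2 hidden units, 1 output.
x  = [0.5, -0.2]
w1 = [[0.4, 0.1], [-0.3, 0.8]]   # hidden-layer weights (hypothetical)
b1 = [0.0, 0.1]
w2 = [0.7, -0.5]                 # output-layer weights (hypothetical)
b2 = 0.05
print(forward(x, w1, b1, w2, b2))
```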
Based on the forecast lead time, the abovementioned monsoon prediction methods
can be divided into three categories: short-range, medium-range and long-range
forecasting. These precipitation trends can be used to predict the monsoon drift.
258 N. Goyal et al.
Short-range forecasts have a time horizon of less than 72 h. Compared with
medium-range and long-range forecasts, they are more accurate. Short-term
forecasts are extremely valuable to the public, pilots, farmers and navigators.
Medium-range forecasts are valid for a maximum of 10 days. They can assist farmers in
scheduling their agricultural operations and people in planning travel. This range is
also known as subseasonal forecasting. Ensemble forecasting methods are widely used
in medium-range forecasting.
The time window of a long-range forecast varies from a month to a season or a year.
Its accuracy is much lower than that of short- and medium-range forecasts,
but it is very helpful to economists and scientists. The accuracy of these forecasts
can further be used to calculate monsoon drifts on the same scale.
For long-range forecasting of the southwest monsoon, [33] explains a novel statistical
ensemble forecasting method. [34–36] examined long-term patterns in annual and
seasonal precipitation at 16 stations in the upper Nile River Basin, and both long-term
and short-term trends in several districts of Odisha and in Malaysia, respectively. It has
been described in [37] that even human activities and land-usage patterns can be used to
predict long-term precipitation trends.
4 Comparative Analysis
Table 2 lists the categorization of all weather prediction techniques based on the
length of time for which they forecast. Each approach has a distinct accuracy and
application area depending on the duration.
Table 3 lists the prototype, strengths, weaknesses and application areas of various
machine learning algorithms. The usefulness of an algorithm is contingent upon
several elements, including the magnitude of the dataset, the type of meteorolog-
ical information, the availability of computing power and the forecasting objec-
tive, encompassing short-term, long-term, regional and worldwide forecasting. A
combination of these algorithms is used by modern weather forecasting systems to
maximize their benefits and minimize their drawbacks.
5 Conclusion
Given its influence on almost half of the world’s population, seasonal prediction
of Indian rainfall is an important topic of research. From ancient methodologies to
current statistical and numerical methods for rainfall prediction, there has been a
great deal of progress in the field of forecasting. While numerical methods simulate
the actual atmosphere, they still give only average results due to the multiplication of
error in each iteration; statistical methods, which learn predictions from data using
machine learning, can supplement the numerical methods. In statistical approaches,
the use of ANN models is encouraged over regression models, since multiple study
findings have demonstrated that the association between Indian summer monsoon
rainfall (ISMR) and environmental characteristics is better stated and connected by
nonlinear techniques.
Improving weather prediction’s precision and accuracy is still the major objec-
tive. Numerous worldwide elements, like the SST, ENSO effect and others have a
substantial impact on it. Therefore, sensitivity analysis of the variables influencing
the monsoon should be carried out for a comprehensive study of monsoon prediction.
References
1. Hindustan Times (2023) Why North India is facing unusually heavy rains, explained
2. Blanford HF (1884) II. On the connexion of the Himalaya snowfall with dry winds and seasons
of drought in India. Proceed Royal Soc London 37(232–234):3–22
3. https://mausamjournal.imd.gov.in/index.php/MAUSAM/article/view/5932
4. Goswami BB, An SI (2023) An assessment of the ENSO-monsoon teleconnection in a warming
climate. NPJ Clim Atmosph Sci 6(1):82
5. Asutosh A, Vinoj V, Wang H, Landu K, Yoon JH (2022) Response of Indian summer monsoon
rainfall to remote carbonaceous aerosols at short time scales: Teleconnections and feedbacks.
Environ Res 214:113898
6. Debnath S, Govardhan G, Saha SK, Hazra A, Pohkrel S, Jena C, Ghude SD (2023) Impact
of dust aerosols on the Indian summer monsoon rainfall on intra-seasonal time-scale.
Atmos Environ 305:119802
7. Wiston M, Mphale KM (2018) Weather forecasting: from the early weather wizards to modern-
day weather predictions. J Climatol Weather Forecast 6(2):1–9
8. Risiro J, Mashoko D, Tshuma DT, Rurinda E (2012) Weather forecasting and indigenous
knowledge systems in Chimanimani District of Manicaland, Zimbabwe. J Emerg Trends Educ
Res Policy Stud 3(4):561–566
9. Balehegn M, Balehey S, Fu C, Liang W (2019) Indigenous weather and climate forecasting
knowledge among Afar pastoralists of north eastern Ethiopia: role in adaptation to weather and
climate variability. Pastoralism 9(1):1–14
10. Palmer TN, Alessandri A, Andersen U, Cantelaube P, Davey M, Delécluse P, Thomson MC
(2004) Development of a European multimodel ensemble system for seasonal-to-interannual
prediction (DEMETER). Bull Am Meteorol Soc 85(6):853–872
11. Adcroft A, Hill C, Campin JM, Marshall J, Heimbach P (2004) Overview of the formulation
and numerics of the MIT GCM. In: Proceedings of the ECMWF seminar series on numerical
methods, recent developments in numerical methods for atmosphere and ocean modelling, pp
139–149
12. Mellor GL (1998) Users guide for a three dimensional, primitive equation, numerical ocean
model program in atmospheric and oceanic sciences. Princeton University Princeton, NJ
13. Pacanowski RC, Dixon K, Rosati A (1993) The GFDL modular ocean model users guide.
GFDL Ocean Group Tech Rep 2(46):08542–10308
14. DelSole T, Shukla J (2002) Linear prediction of Indian monsoon rainfall. J Clim 15(24):3645–
3658
15. Tripathi KC, Agarwal R, Hrisheekesha PN (1997) Global prediction algorithms and
predictability of anomalous points in a time series
16. Liyew CM, Melese HA (2021) Machine learning techniques to predict daily rainfall amount.
J Big Data 8:1–11
17. Tripathi KC, Rai S, Pandey AC, Das IML (2008) Southern Indian Ocean SST indices as early
predictors of Indian summer monsoon
18. Shukla RP, Tripathi KC, Pandey AC, Das IML (2011) Prediction of Indian summer monsoon
rainfall using Niño indices: a neural network approach. Atmospheric Res 102(1–2):99–109
19. Abhishek K, Singh MP, Ghosh S, Anand A (2012) Weather forecasting model using artificial
neural network. Procedia Technol 4:311–318
20. Saha M, Chakraborty A, Mitra P (2016) Predictor-year subspace clustering based ensemble
prediction of Indian summer monsoon. Adv Meteorol
21. Singh BP, Pravendra K, Tripti S, Singh VK (2017) Estimation of monsoon season rainfall and
sensitivity analysis using artificial neural networks. Indian J Ecol 44:317–322
22. Praveen B, Talukdar S, Shahfahad, Mahato S, Mondal J, Sharma P, Rahman A (2020)
Analyzing trend and forecasting of rainfall changes in India using non-parametrical and
machine learning approaches. Sci Rep 10(1):10342
23. Najib F, Mustika IW (2022) Weather forecasting using artificial neural network for rice farming
in Delanggu village. In: IOP conference series: earth and environmental science (vol 1030, no
1). IOP Publishing, p 012002
24. Kumar B, Chattopadhyay R, Singh M, Chaudhari N, Kodari K, Barve A (2021) Deep learning–
based downscaling of summer monsoon rainfall data over Indian region. Theoret Appl Climatol
143:1145–1156
25. Endalie D, Haile G, Taye W (2022) Deep learning model for daily rainfall prediction: case
study of Jimma Ethiopia. Water Supply 22(3):3448–3461
26. Saha S, Kundu B, Saha A, Mukherjee K, Pradhan B (2023) Manifesting deep learning algo-
rithms for developing drought vulnerability index in monsoon climate dominant region of West
Bengal India. Theoretic Appl Climatol 151(1–2):891–913
27. Singh D, Ghosh S, Roxy MK, McDermid S (2019) Indian summer monsoon: extreme events,
historical changes, and role of anthropogenic forcings. Wiley Interdisciplin Rev Clim Change
10(2):e571
28. Dash Y, Mishra SK, Panigrahi BK (2019) Predictability assessment of northeast monsoon
rainfall in India using sea surface temperature anomaly through statistical and machine learning
techniques. Environmetrics 30(4):e2533
29. Mittal AK, Singh UP, Tiwari A, Dwivedi S, Joshi MK, Tripathi KC (2015) Short-term predic-
tions by statistical methods in regions of varying dynamical error growth in a chaotic system.
Meteorol Atmos Phys 127:457–465
30. Tripathi KC, Mishra P (2019) Empirical orthogonal functions analysis of the regional Indian
rainfall. In: Innovations in computer science and engineering: proceedings of the sixth ICICSE
2018. Springer Singapore, pp 127–134
31. Schultz MG, Betancourt C, Gong B, Kleinert F, Langguth M, Leufen LH, Stadtler S (2021)
Can deep learning beat numerical weather prediction? Philosophic Transact Royal Soc A
379(2194):20200097
32. Zenkner G, Navarro-Martinez S (2023) A flexible and lightweight deep learning weather
forecasting model. Appl Intell 53(21):24991–25002
33. Kumar A, Pai DS, Singh JV, Singh R, Sikka DR (2012) Statistical models for long-range
forecasting of southwest monsoon rainfall over India using stepwise regression and neural
network
34. Tabari H, Taye MT, Willems P (2015) Statistical assessment of precipitation trends in the upper
Blue Nile River basin. Stoch Env Res Risk Assess 29:1751–1761
35. Panda A, Sahu N (2019) Trend analysis of seasonal rainfall and temperature pattern in
Kalahandi, Bolangir and Koraput districts of Odisha, India. Atmosph Sci Lett 20(10):e932
36. Ridwan WM, Sapitang M, Aziz A, Kushiar KF, Ahmed AN, El-Shafie A (2021) Rainfall
forecasting model using machine learning methods: case study Terengganu Malaysia. Ain
Shams Eng J 12(2):1651–1663
37. Falga R, Wang C (2022) The rise of Indian summer monsoon precipitation extremes and its
correlation with long-term changes of climate and anthropogenic factors. Sci Rep 12(1):11985
AI-Integrated Smart Toy for Enhancing
Cognitive, Emotional, and Motor Skills
in Toddlers
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 265
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_21
1 Introduction
The cerebral and behavioral development of toddlers from the age of 1 to 4 years old
within daycare and preschool environments, coupled with the integration of AI and
IoT technologies, presents a multifaceted landscape of opportunities and challenges
[1]. Caregivers and teachers at preschools and daycares use many educational apps
and digital books, and attempts to use the Internet of Toys in education are in progress.
Toys with integrated sensors that invite toddlers to play with them help track their
movements more easily [2]. Facilities provided at preschools and daycares play a
crucial role in fostering a child’s growth by providing a structured setting for cogni-
tive, social, and emotional development. The interaction with parents, peers and
caregivers, exposure to various activities, all contribute to shaping a child’s holistic
growth trajectory [3]. However, it is imperative to acknowledge the intricate inter-
play of factors like sleep patterns, social interactions, and engaging activities, which
can significantly influence an infant’s development during daycare years [4]. As the
growth of every child is different, a more personalized approach will help in providing
a better educational system. The advent of AI and IoT has resulted in innovative
solutions to monitor and enhance infants’ development in daycare settings. These
technologies offer real-time data collection, analysis, and personalized suggestions
for optimal growth. AI-driven analytics can identify patterns in a child’s behavior,
enabling parents and caregivers to tailor activities that align with individual develop-
mental needs. Nevertheless, challenges arise concerning privacy, data security, and
potential overreliance on technology [5]. Striking a balance between human inter-
action and technological intervention remains crucial to ensure infants receive the
holistic care and nurturing environment they require. This paper proposes a concept
for this evolving landscape that leverages AI and IoT judiciously, integrating their
potential benefits while upholding caregivers' essential role in shaping infants' cognitive
and emotional development within daycare settings, using design processes
and methods.
2 Literature Review
Through various online search tools, a thorough literature review has been conducted
on various keywords, such as early childhood development, artificial intelligence use
in the development of technology, cognitive development, emotional development,
user experience, etc.
During infancy, a remarkable phase of rapid growth and learning unfolds. The devel-
oping brain exhibits its highest adaptability, forming the bedrock of cognitive abili-
ties [1]. Infants embark on a journey from ground zero, acquiring skills like walking,
talking, object categorization, and adept manipulation. Simultaneously, they grasp
the art of emotional regulation and social interaction. The constraints of infant percep-
tion and cognition, along with attention and responsiveness, delineate conceptual,
and practical boundaries [3].
In early development, the desired environment includes the necessary variety of auditory
tones for language, visual stimuli for sight, and the essential emotional support
and caregiver familiarity [6]. Five out of six notable interactions revealed a startlingly
consistent pattern that supports the idea that maternal and child-care factors interact
to shape children’s attachment. Understanding and promoting factors that facili-
tate healthy brain development and optimal cognitive growth across these domains
during early childhood is crucial [5]. Furthermore, there is a growing acknowledge-
ment of physical activity’s significance as a determinant of cognitive and neural
functioning in middle childhood and adulthood, in addition to its physiological and
psychosocial benefits. Systematic reviews and meta-analyses suggest that increased
physical activity levels can enhance cognitive functioning and academic achieve-
ment in school-aged children and reduce the risk of age-related cognitive decline,
dementia, and Alzheimer’s disease in adults [7].
Numerous studies have explored the profound impact of diverse play, musical, and
creative activities on the holistic development of children aged 1–4 years. These
early formative years represent a critical period for cognitive, social, emotional,
and physical growth [6]. Research highlights the importance of imaginative play in
nurturing cognitive abilities and language proficiency, as youngsters participate in
symbolic expression and problem-solving through endeavors such as make-believe
games [8]. Exposure to music can enhance auditory processing, rhythm perception,
and socio-emotional skills. Furthermore, the creative arts have been shown to nurture
self-expression, fine motor skills, and emotional regulation [9]. This literature review
synthesizes existing research to elucidate the multifaceted benefits of play, music,
and creative activities on the development of young children, shedding light on their
pivotal role in shaping the future cognitive and emotional well-being of this age
group [7].
3 Design Methodology
An online user study using a survey was conducted with 20 users, including parents
and day care providers between the ages of 25 and 50, to learn their opinions on the
significance of early childhood development and its current state at daycare centers
and preschools. The questions revolved around understanding the challenges faced in
tracking the child’s development during the early stage, devices and technology used
at the daycares and kindergartens currently, and parents' requirements and expectations.
The research revealed that 60% of respondents were not satisfied with the activities
conducted at the daycare. In total, 85% of respondents felt that activities held at
daycares or preschools can be helpful in the cognitive development of their children,
and 60% felt the same about their emotional development.
A competitor study was carried out (shown in Table 1). Competitors are the apps/systems
that help track the child's activities and care for the baby.
ProCare is the only app that includes most of the features, but it does not offer
personalized course suggestions. The only feature common to the current applications
is regular communication with parents.
Contextual inquiry models such as flow and sequence were explored to understand
the user groups, their mental models and their behavior. The cultural model (Fig. 1)
helps in understanding the values of the user groups and the factors that influence
their work and decisions. The sequence model (Fig. 2) was made to understand the
steps associated with the user's triggers, intents, and pain points.
Fig. 1 Cultural model showing various stakeholders and how they affect the primary user
The ideation phase included brainstorming, mind mapping, and affinity mapping.
The proposed solution that followed is a smart toy integrated with AI. It will interact
with the kids based on the inputs, collect the responses, analyze the data, and present
it to parents/caregivers through tables and visualizations.
The concept focuses on features such as AI analytics for analyzing the collected data,
smart toys integrated with face detection cameras for recognizing the child, natural
language processing (NLP) for speech analysis, smart toys connected to a dash-
board using technology like data transmission through Wi-Fi/Bluetooth to a cloud
platform, alerts, personalization and detailed information for tracking the child’s
development. Figure 3 shows a pictorial representation of the smart toy features. The
system is primarily designed for daycares and preschools where personalized atten-
tion and tracking the development of each child can be difficult. It keeps in mind the
accessibility of all the users, and the toy is made considering the toddler’s physique.
Figure 4 shows high-fidelity prototypes of the dashboard.
Primary Users of the Dashboard:
• Parents (for tracking the system to check their child’s growth)
• Caregivers (for inputting the required data and keeping track of all children’s
progress and requirements)
Smart Toy Features
• Light alert: when the child comes close to the toy or moves far away from it.
• Speakers for conducting the activities and giving instructions based on the inputs.
• A screen for displaying visuals during activities.
• Microphone with voice detection sensors for capturing the child's responses.
• Camera with face detection sensor for recognizing the child.
Dashboard Features
• It will show the parents/caregivers the child’s progress in terms of cognitive,
emotional, and motor skill development.
• It will show the record of the child’s attendance.
• It will show all the activities the child participated in and the progress made during
it.
• It will have AI-generated feedback for the parents and suggest to them about what
the child is good at.
• It will have personalized features for alerts and notifications.
The proposed concept (Fig. 5) shows the flow of the smart toy collecting information,
which is then reflected on the dashboard.
The proposed design consists of two parts: (1) a smart toy and (2) a dashboard.
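As an illustration of this flow, the sketch below models toy events being ingested and aggregated by the dashboard; all class names, fields, and values are hypothetical, not part of the proposed implementation:

```python
# Illustrative sketch: the toy captures an interaction event, forwards it
# (e.g. over Wi-Fi/Bluetooth to a cloud store), and the dashboard
# aggregates per-child progress by skill area.
from dataclasses import dataclass, field

@dataclass
class ActivityEvent:
    child_id: str
    activity: str          # e.g. "sound-recognition"
    skill: str             # "cognitive" | "emotional" | "motor"
    score: float           # normalized 0..1 result from the toy's sensors

@dataclass
class Dashboard:
    events: list = field(default_factory=list)

    def ingest(self, event):
        """Receive an event forwarded from the toy."""
        self.events.append(event)

    def progress(self, child_id, skill):
        """Average score for one child in one skill area, or None."""
        scores = [e.score for e in self.events
                  if e.child_id == child_id and e.skill == skill]
        return sum(scores) / len(scores) if scores else None

dash = Dashboard()
dash.ingest(ActivityEvent("c01", "sound-recognition", "cognitive", 0.8))
dash.ingest(ActivityEvent("c01", "shape-sorting", "motor", 0.6))
print(dash.progress("c01", "cognitive"))
```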
The dashboard was tested with five users using the System Usability Scale (SUS) method.
A prompt was provided to the users, who were asked to perform the task and then rate
it on a 5-point scale. The score was calculated using the formula. Table 2 shows
the results and the calculated SUS score.
Prompt: Log in to the dashboard, check musical activities under the activities
category, and check the details of the cognitive progress made last month in the
sound-recognizing activity.
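The SUS formula itself is not reproduced in the paper; assuming the standard 10-item SUS questionnaire with responses on a 1–5 scale, the score is computed as sketched below:

```python
def sus_score(responses):
    """Standard SUS: odd items contribute (r - 1), even items (5 - r);
    the sum is multiplied by 2.5, giving a score in [0, 100].

    `responses` is one participant's ten answers (items 1..10, scale 1-5).
    """
    assert len(responses) == 10
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Hypothetical single participant's answers
print(sus_score([5, 1, 5, 2, 4, 1, 5, 1, 4, 2]))
```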
Early childhood education helps in the development of children. This paper proposes
a design concept to help track toddlers’ activities at daycares and kindergartens and
analyze their cognitive, emotional, and motor skill development. Their brain devel-
opment is highly receptive to learning and skill acquisition during this period. No
current app or system helps track the child’s motor, cognitive and emotional devel-
opment. The research results were useful in identifying issues with early childhood
development. The user study helped understand the parent’s and caregiver’s prob-
lems. It was found that the toddlers’ cognitive and motor development required a
more personalized approach. Research also revealed the many kinds of technology
that can be applied to give parents and children aged 1–4 years of enhanced learning
opportunities. A competitor study was carried out to know more about the features
used in the current system and what is lacking in it. Conceptualization was done to
propose a design concept with the help of current technological trends such as AI
and IoT sensors. A smart toy design has been proposed with a flow of technological
use for collecting and processing the information. A dashboard will be connected to
the smart toy, displaying the processed information about the child's development to
the parents or teachers. User testing helped assess whether the usability of the
dashboard is adequate.
The overall design system proposal will help provide a better training experi-
ence for toddlers by conducting various activities. This learning will help them set
the stage for future academic success and well-being. It will provide a more mean-
ingful experience to the toddlers at preschools and daycares. Parents can leave their
kids and go to work carefree without worrying about their child’s growth. With all
the personalized features, they will feel connected to their kids, even if they aren’t
always physically available. As this is a learning solution, it can be implemented
where toddlers come together and have a chance to learn, for example, daycares and
preschools.
Based on the primary and secondary research, it was observed that there is a need
for technological intervention in tracking the development of toddlers. Toddlers
develop very rapidly until the age of 4, which is when they need an appropriate
learning environment and personal attention depending on their growth and needs.
But keeping track of every child's activity and progress in preschools
and daycares is difficult. It was observed that even though applications for tracking
a child’s activities exist, they don’t entirely focus on the development factor. The
competitor’s analysis highlights the current industry trends and already-established
applications/systems. Once basic training is provided to the teachers/caregivers about
how each activity helps the child and how the smart toy can be used for tracking the
responses, it would be easier for them to give the inputs to the system. The feedback
received during the usability testing helped improve the design, and accordingly,
iterations were made. The SUS score after calculation was 83, which comes under
the acceptable range on the scale and shows that the system’s usability is good.
Thumbnail Personalization in Movie
Recommender System
1 Introduction
In recent years, Over-the-Top (OTT) platforms have revolutionized the way viewers
engage with entertainment. Today, the streaming industry is larger than ever before
with dozens of platforms including Netflix, Amazon Prime, Hulu, Disney+, etc.,
competing for customer attention and revenue. Each of these services uses a
subscription-based system. To retain users and offer personalized experiences, these
platforms rely on a recommendation system. A recommendation system as in [1]
analyzes prior user behavior and movie preferences to recommend movies that are
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 277
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_22
278 M. B. Baikadolla et al.
most relevant to a user. Recommender systems are of three broad types: content-
based, collaborative, and hybrid [2]. Content-based approaches focus more on
comparing items to other items in order to find similarities to those already liked,
while collaborative approaches focus more on comparing users to each other in order
to find similar interests and tastes. Hybrid approaches combine these two methods
to leverage their strengths [3, 4].
Thumbnails are static images that represent the recommended movie on streaming
platforms. These serve as a preview to a piece of media, through which the user may
gauge whether it is of interest to them. Traditionally, OTT platforms have used a
single thumbnail, often the original poster. However, services such as Netflix have
begun to personalize not only their recommendations, but also personalize their
thumbnails through dynamic thumbnails that vary in the actors, settings, or other
features according to the user. This underutilized feature can enable even greater
personalization of the user experience and drive more user engagement. The main
objective of the hybrid system is to propose a movie recommendation system which
uses dynamic thumbnails to personalize the user experience further. The proposed
system employs a hybrid method that fuses collaborative filtering using triangle
similarity and content-based filtering using cosine similarity to recommend movies.
Additionally, the proposed system selects custom thumbnails [5] using the prefer-
ences of user and global interests. The main aim of the proposed work is to introduce
a dynamic thumbnail selection algorithm that can be accessible to both large and
small-scale movie recommender systems.
Section 1 discussed the importance of hybrid recommender systems and the impact of thumbnails. Section 2 presents a detailed literature review on collaborative filtering in recommender systems and the effect of personalized thumbnails. Section 3 explains the proposed hybrid intelligent recommender system with personalized thumbnails. Section 4 explores the results, Sect. 5 presents the conclusions, and Sect. 6 outlines future work.
2 Literature Review
Recommender systems are often biased towards popular items. The authors in [6] introduced a personalized recommender system that manages this popularity-bias problem effectively by ranking less popular items higher. This re-ranking method is applied as a post-processing step in the recommender system. Authors
in [7] proposed a personalized movie recommender system using collaborative
filtering. This system is a user-to-user collaborative filtering approach in recom-
mending movies and calculating the similarity between the users using Euclidean
distance. The similarity is taken on the basis of demographic data of the user such as
gender, age, occupation, and area of living. Euclidean distance has several limitations
when used on larger datasets which the authors in [8] resolve by eliminating redun-
dant data and achieving dimensionality reduction using proposed similarity measure
[9]. In [10], a fuzzy similarity measure is proposed which supports huge amounts
of data and does not compromise the similarity. Euclidean distance cannot capture the
similarity between optimistic and pessimistic users even if they have similar tastes.
The proposed measure by authors in [11] found the exact similarity between two
instances which could not be achieved by using Euclidean distance. It is also not
resilient to outliers, which can skew recommendations and lead to biased results.
Additionally, the study dataset has a sparsity of over 90%, which can further compli-
cate the use of Euclidean distance. The study also evaluates the impact of time on
user preferences and found that incorporating the time of rating has resulted in a 3%
increase in precision and a 4% increase in recall.
An effective recommender system is proposed in [12] using cuckoo search. In their
work, the authors propose a movie recommendation system that utilizes a collabo-
rative filtering approach along with a clustering method to group users into clusters
using their movie ratings. The proposed system then employs a metaheuristic opti-
mization technique called cuckoo search algorithm to optimize the weights of the
recommendation algorithm and improve the accuracy of movie recommendations.
The cuckoo search algorithm mimics the brood-parasitic breeding behavior of cuckoo birds and is known for its effectiveness in optimization problems. By combining this algorithm with the collabo-
rative filter approach and clustering method, the proposed hybrid system achieves
a minimum RMSE of 1.23104 and MAE of 0.697293, with a total of 68 clusters.
Authors in [13] proposed a hybrid system for recommending movies using an intel-
ligent system. The hybrid approach of this proposed recommender system combines
both collaborative and content-based filters. The collaborative filter module uses
Singular Value Decomposition (SVD) to give a predicted score for each movie.
However, the proposed system has a strong bias toward the genre of a movie, which
doubles the predicted score of all movies with a genre explicitly liked by the user
and removes any movies with a genre explicitly disliked by the user. Such a heavy
focus on a single feature may lead to biased and homogeneous results. The proposed
system also employs a content-based filter that uses the cosine measure to revise the
ranking of recommended movies. Additionally, the system implements an intelligent
system as the final filter for results. This proposed expert system uses fuzzy logic
as a series of 144 IF-ELSE statements to determine the importance of each movie
based on several factors.
A comparison of collaborative and hybrid approaches [3] for recommendation of
movies is done in [14]. The first approach utilizes a purely collaborative filter with
an adjusted cosine similarity measure, while the second approach was a system
that combines collaborative-based with a content-based filter using the TF-IDF
method. The authors observed that the collaborative approach outperformed the
hybrid approach, despite the hybrid system including movie data alongside user
data. The content-based filter factored in only the synopsis and title of the movie, which may have limited its effectiveness, as it did not consider other relevant features such as genre, actors, and directors. Applying the TF-IDF technique to the synopsis could be beneficial, but the reliability of the results could be impacted if the synopses are sourced from non-standardized sources. The technique would be safer to apply to objective features such as genre, cast, or a movie's production values. Authors in [15] highlighted a study in which participants were presented
with titles that had either familiar or unfamiliar artwork and then asking them to rate
their interest in watching the title and their perceived genre of the title based on the
artwork. The study shows that users prefer familiar artwork, which can increase their
intention to watch a particular title. In contrast, unfamiliar artwork can create uncer-
tainty about the genre and decrease users’ interest in watching a title. The authors
suggest that this has significant implications for Netflix’s personalization and user
engagement strategies. By using custom artwork that is tailored to a user’s prefer-
ences and viewing history, Netflix can increase user engagement and improve their
overall viewing experience.
Authors in [16] propose a method to use user clicks on recommended items as
a means of conveying user preferences to the recommendation system. The authors
suggest that clicks on recommended items not only represent an explicit positive
feedback but also convey a more nuanced message about the user’s interests. This
approach captures user preferences and interests more accurately than traditional
explicit feedback methods like rating or liking. The system also reduces user effort
as it only requires a simple click to convey a message rather than rating or providing
feedback. However, there are limitations to this approach. Firstly, it assumes that
users will only click on items that they are interested in, which may not always be
the case. Users may also click on items for reasons other than interest, such as to add
it to a watchlist or to view later. Secondly, the system relies on the recommended
items being visible to the user. If a user is interested in a particular item that is not
recommended, they may not click on any of the items, and hence this system will
not receive any feedback on their preferences.
The usage of custom thumbnails is proposed in [17] as an effective strategy for
improving personalization and enhancing user engagement on online video plat-
forms, particularly on Netflix. The authors argue that custom thumbnails can have
a significant impact on user behavior and viewing choices on the platform. Netflix
uses various data-driven methods to select and test custom thumbnails for its titles.
These methods include A/B testing, machine learning, and human evaluation. By
analyzing viewing patterns and user feedback, Netflix aims to identify the most effec-
tive thumbnail for each title. The authors suggest that custom thumbnails are a key
aspect of Netflix’s overall strategy to provide personalized and engaging content to its
users. The authors argue that the use of custom thumbnails reflects Netflix’s commit-
ment to enhancing user experience and personalization on the platform. A novel hybrid system, EntreeC, is introduced in [18], which fuses collaborative filtering with knowledge-based recommendation to improve both suggestions and performance.
The efficiency of collaborative methods is enhanced by including semantic ratings
obtained from the knowledge-based recommendations. The recommender system
performance depends on the similarity measure [19]. In [19], an in-depth experimental review of different similarity measures in collaborative filtering recommender systems is conducted on datasets such as Jester, MovieLens1M, and MovieLens100k. The author found that the AMI correlation measure best suits the item-based collaborative approach and concluded that a measure's performance can be improved by taking dataset density, sparsity, cold-start situations, and data quality into account and by integrating user preferences. Deep learning algorithms were
Thumbnails play a critical role in the success of OTT platforms such as Netflix. As
users scroll through the vast selection of titles, thumbnails must capture their attention
quickly to entice them to click and start watching [22]. As Netflix estimates they only
have up to 90 s to grab a user’s attention, selecting the right thumbnail is essential for
user engagement and can make or break their business. By displaying a snapshot of the
title, users get a sense of what the show or movie is about, and it can help them decide
whether to watch it or not. However, selecting the right thumbnail from millions of
frames can be challenging, which is why Netflix employs a combination of computing
and human efforts [23]. Using AVA or Aesthetic Visual Analysis [24], a combination
of tools and algorithms, the system can filter through millions of frames to identify
potential thumbnail candidates. Objective factors such as brightness, stillness, and focus help to determine the best candidates, along with other factors like the actors featured, maturity filters, and frame diversity. Once potential thumbnail candidates
are identified, they are sent to a creative team for finishing touches and the addition
of other data like the title logo. After the team has developed a couple of thumbnails,
A/B testing is conducted to determine which one has the highest click-through rate.
Authors in [25] discussed some of the open problems at Netflix involving different machine learning algorithms, as well as various issues in designing and interpreting A/B tests. Netflix finds that thumbnails with expressive facial emotions that convey the
tone of the title perform particularly well.
A personalized thumbnail selection method is proposed in [26], built on a recommendation framework combining image quality assessment, image accessibility analysis, and video analysis. The key frames of the video shots are extracted based on these measures and are called thumbnail candidates. An SVR model is then used to rank the thumbnail candidates. The thumbnails predicted by the proposed method matched users' preferences and enhanced their personal experience.
Authors in [27] have developed an embedding model for the automatic selection of video thumbnails by computing a similarity value between the query and the thumbnail with side semantic information, called the thumbnail query relevance score. The selected thumbnail both represents the video content properly and has the highest query relevance score. The proposed method applies only to query-dependent thumbnail selection, i.e., personalized thumbnails, and improves the users' search experience. It can also be used for video search re-ranking [28], video tag localization [29], and mobile video search [30].
A recommender system in [31] is implemented using a graph database which deals
with complex relations. The movie node recommendation degree is expressed as size
of node and thickness of edge. In [32] the authors found that most of the research is
focused on accuracy improvement in recommender systems. The recommendations
according to standard measures may not be useful to users. The authors proposed user-centric, informal arguments for evaluating recommendations beyond accuracy. The informal metrics proposed in [33] are intra-list similarity and the leave-n-out methodology [34, 35]. A content-based system in [36] is proposed for
movie recommendations which captures user temporal preferences as a Dirichlet
Mixture Process Model. The proposed system is a user-centric framework which
includes content attributes of user rated movies. The proposed system can be extended
to give recommendations based on content for new movies. This proposed system
motivated us to use a hybrid recommender system which fuses item-based collab-
orative approach along with content-based approach. This motivation aims for a
hybrid system to improve the serendipity and diversity for movie recommendations.
Authors in [37] proposed the ITR similarity measure, derived from the commonly used triangle similarity, to improve recommendations using collaborative filtering. Triangle similarity focuses only on the items rated in common by a pair of users; the authors extended it with ITR, which also considers the items rated by only one user of the pair. The proposed measure additionally considers User Rating Preferences (UPR), i.e., the behavior of users while giving ratings. This improved similarity measure motivated the use of such measures for hybrid systems.
3 Proposed System
The proposed hybrid system architecture comprises three major modules: (1)
Information Collector, (2) Movie Recommender Module, and (3) Thumbnail Selector
as shown in Fig. 1. The Information Collector is responsible for data preparation
and preprocessing. It enhances the existing dataset by leveraging the Open Movie
Database Application Programming Interface (OMDb API), a free-to-use RESTful
API that offers a comprehensive database of movie-related information. The Movie
Recommender Module is the central module that incorporates both the collabora-
tive and content-based components. The content-based filter predicts movie ratings
based on similarities among movies, while the collaborative filter predicts ratings
based on similarities among users. The combination of these two components outper-
forms either one individually. The thumbnail selector contains a thumbnail mapper
component that chooses the most suitable thumbnail to display for each movie. This
component uses The Movie Database Application Programming Interface (TMDB
API), another RESTful API that provides up-to-date movie-related information.
The proposed system is built on the MovieLens 100k dataset, a popular benchmark
dataset in the field of recommender systems as shown in Table 1a. This dataset was
created by the research lab GroupLens with 943 users rating 1682 movies. It also
includes five pre-built training and testing sets which are 80–20 splits on the original
dataset. The OMDb API, developed by Brian Fritz, is used to augment the dataset
with additional information for each movie, including its genre, list of actors, list
of directors, year of release, runtime, censorship rating, and language as shown in
Table 1b.
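The augmentation step can be sketched as a small parser that flattens an OMDb JSON response into one dataset row. The response keys (`Title`, `Year`, `Genre`, `Actors`, `Director`, `Language`) are genuine OMDb API fields; the helper name and row layout are illustrative, and the sample payload is hardcoded here rather than fetched over the network.

```python
import json

def omdb_to_row(movie_id: int, payload: str) -> dict:
    """Flatten an OMDb API JSON payload into one augmented-dataset row.

    The field names (Title, Year, Genre, Actors, Director, Language)
    are keys returned by the OMDb API; the row layout mirrors Table 1b.
    """
    data = json.loads(payload)
    return {
        "movie_id": movie_id,
        "title": data["Title"],
        "year": int(data["Year"]),
        "genres": [g.strip() for g in data["Genre"].split(",")],
        "actors": [a.strip() for a in data["Actors"].split(",")],
        "directors": [d.strip() for d in data["Director"].split(",")],
        "languages": [l.strip() for l in data["Language"].split(",")],
    }

# Example payload shaped like an OMDb response for Toy Story (1995).
sample = json.dumps({
    "Title": "Toy Story", "Year": "1995",
    "Genre": "Animation, Adventure, Comedy",
    "Actors": "Tom Hanks, Tim Allen, Don Rickles",
    "Director": "John Lasseter", "Language": "English",
})
row = omdb_to_row(1, sample)
```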
The proposed system uses a content-based approach to suggest movies to users based on similarity. The underlying premise of content-based filters in recommender
systems is that users tend to prefer items that share similarities with the ones they
have previously enjoyed. Unlike collaborative filtering systems, the content-based
filtering systems are focused solely on movie features. To calculate the similarity
between movies, we consider several relevant features of each movie, including its
genre, actors, directors, language, and plot synopsis. Initially, we also included the
runtime and censorship ratings of the movies as relevant features but found that
this resulted in decreased performance and subsequently removed them. The item-item similarity is computed using the commonly used cosine similarity measure [38]. The item-item similarity matrix, an N × N matrix that captures the similarity between any two movies (A, B) in the dataset, where N is the total number of movies, can be computed using Eq. (1).
Cosine similarity(A, B) = (A · B) / (|A||B|). (1)
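A minimal pure-Python sketch of Eq. (1) applied pairwise to build the item-item matrix; the binary feature vectors below are illustrative stand-ins for an encoding of genre, actors, directors, and language.

```python
import math

def cosine_similarity(a, b):
    """Eq. (1): cos(A, B) = (A . B) / (|A| |B|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_matrix(features):
    """N x N item-item matrix over the movies' feature vectors."""
    return [[cosine_similarity(a, b) for b in features] for a in features]

# Toy 3-movie example with binary genre/actor features (illustrative).
F = [[1, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 0, 1, 1]]
S = similarity_matrix(F)
```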
To compute each movie's score for a given user, the movie is first compared to all other movies rated by that user, and a weighted average of the user's ratings is calculated, weighting each rating by the similarity between the rated movie and the movie in question. The score of a movie M for a user U is calculated using Eq. (2).
Predicted(M, U) = (∑_{i=1}^{n} Rating(U, N_i) × Similarity(M, N_i)) / (∑_{i=1}^{n} Similarity(M, N_i)), (2)
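Eq. (2) transcribes directly into code; the list-based inputs below are an assumed representation of the user's rated movies.

```python
def predict_content_score(target_sims, user_ratings):
    """Eq. (2): similarity-weighted average of the user's ratings.

    target_sims[i]  -- Similarity(M, N_i): similarity between the movie
                       in question M and the i-th movie rated by the user
    user_ratings[i] -- Rating(U, N_i): the user's rating of that movie
    """
    numerator = sum(s * r for s, r in zip(target_sims, user_ratings))
    denominator = sum(target_sims)
    return numerator / denominator if denominator else 0.0

# The candidate movie is very similar to a film the user rated 4.0 and
# barely similar to one rated 2.0, so the score lands near 4.
score = predict_content_score([0.9, 0.1], [4.0, 2.0])
```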
(b) Sample rows of the dataset after augmentation with the OMDb API:

ID | Title      | Year | Genre                        | Actors                                       | Director(s)                                          | Language(s)
1  | Toy Story  | 1995 | Animation, adventure, comedy | Tom Hanks, Tim Allen, Don Rickles            | John Lasseter                                        | English
2  | GoldenEye  | 1995 | Action, adventure, thriller  | Pierce Brosnan, Sean Bean, Izabella Scorupco | Martin Campbell                                      | English, Russian, Spanish
3  | Four Rooms | 1995 | Comedy                       | Tim Roth, Antonio Banderas, Sammi Davis      | Allison Anders, Alexandre Rockwell, Robert Rodriguez | English
4  | Get Shorty | 1995 | Comedy, crime, thriller      | Gene Hackman, Rene Russo, Danny DeVito       | Barry Sonnenfeld                                     | English
its own advantages and limitations. The similarity scores between users are stored
in a similarity matrix that facilitates the recommendation process. The proposed
recommender system uses triangle similarity metric [40] to compute the user’s
similarity. The user’s similarity between users m and n is calculated using Eq. (3).
Triangle Sim(m, n) = 1 − √(∑_{i∈I} (r_{m,i} − r_{n,i})²) / (√(∑_{i∈I} r_{m,i}²) + √(∑_{i∈I} r_{n,i}²)), (3)
where I represents the set of items rated by either user m or n, and r_{m,i} and r_{n,i} are the ratings of item i by users m and n, respectively.
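A direct transcription of Eq. (3), assuming each user's ratings are held in a dict keyed by item ID; treating an item unrated by one user of the pair as a 0 rating is an assumption of this sketch.

```python
import math

def triangle_similarity(r_m: dict, r_n: dict) -> float:
    """Eq. (3): triangle similarity over I, the items rated by either
    user; items unrated by one user contribute a 0 rating here."""
    items = set(r_m) | set(r_n)
    diff = math.sqrt(sum((r_m.get(i, 0) - r_n.get(i, 0)) ** 2 for i in items))
    norm_m = math.sqrt(sum(r_m.get(i, 0) ** 2 for i in items))
    norm_n = math.sqrt(sum(r_n.get(i, 0) ** 2 for i in items))
    denom = norm_m + norm_n
    return 1.0 - diff / denom if denom else 1.0

# Identical rating vectors give similarity 1; disjoint ones score low.
sim_same = triangle_similarity({"m1": 4, "m2": 5}, {"m1": 4, "m2": 5})
sim_diff = triangle_similarity({"m1": 5}, {"m2": 5})
```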
While predicting potential ratings, a neighborhood selection process is used, which involves choosing similar users as neighbors. Two commonly used approaches are top-k and correlation-based thresholding. In the top-k approach, the k users most similar to the user in question are considered. In the correlation-based threshold approach, all users who exceed a baseline threshold are considered. The proposed recommendation system adopts the top-k neighborhood selection algorithm. The final rating prediction is performed by aggregating the ratings from the selected neighbors for a particular movie, using a weighted sum in which each neighbor's similarity to the user serves as the weight.
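The top-k weighted-sum prediction described above can be sketched as follows; the dict-based inputs are an illustrative representation, not the paper's data structures.

```python
def predict_collaborative(user_sims, neighbor_ratings, k=3):
    """Top-k neighborhood prediction with a similarity-weighted sum.

    user_sims        -- {neighbor_id: Triangle Sim(m, neighbor)}
    neighbor_ratings -- {neighbor_id: that neighbor's rating of the movie}
    """
    # Keep only neighbors who rated the movie, most similar first.
    rated = [(s, neighbor_ratings[u]) for u, s in user_sims.items()
             if u in neighbor_ratings]
    rated.sort(key=lambda t: t[0], reverse=True)
    top = rated[:k]
    denom = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / denom if denom else 0.0

# With k=2, only the two most similar neighbors contribute.
pred = predict_collaborative(
    {"u1": 0.9, "u2": 0.5, "u3": 0.1},
    {"u1": 5.0, "u2": 3.0, "u3": 1.0},
    k=2,
)
```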
The proposed system aims to enhance user engagement by personalizing the thumb-
nail artwork displayed for each recommended movie [41]. To achieve this, an algo-
rithm is proposed which selects a personalized thumbnail based on the actors featured
in the artwork and their estimated relevance to the user. The proposed approach can
be a starting point for thumbnail personalization when A/B testing is unavailable
to the system. The proposed algorithm takes three main factors into consideration when selecting a thumbnail featuring particular actors: negative exposure, frequency in user-viewed films, and global recognition. Actors with a negative-exposure value below 3 are penalized in the selection process. Frequency in user-viewed films is measured by how many of the films the user has seen feature the actor. Finally, the global recognition of an
actor is determined by whether or not they appear in the top 10,000 rated celebrities
according to IMDB star-meter. The thumbnail mapper algorithm selects a thumb-
nail by considering the combined score of all actors featured. The final thumbnail is
chosen based on a weighted probability where each thumbnail’s probability of being
selected is proportional to its score relative to the sum of scores of all candidate
thumbnails.
Input:
movie_id: ID of a movie
movie_thumbnails: dictionary mapping movie IDs to lists of thumbnail
images
popular_actors: list of IDs of popular actors
positive_actors: list of IDs of actors with positive exposure
Step 1: Start
Step 2: Get the list of thumbnail images for the given movie
Step 3: Set POPULARITY_WEIGHT and POSITIVE_EXPOSURE_
WEIGHT
a. Initialize an empty list thumbnail_weights
b. For each thumbnail in the list of thumbnails
i. Initialize an empty list weights
ii. For each person in the thumbnail image
– If the person is in the popular_actors list, then append
POPULARITY_WEIGHT to the weights list
– If the person is in the positive_actors list, then append
POSITIVE_EXPOSURE_WEIGHT to the weights list
– If the person is not in either list, then append 0.25 to the
weights list
c. Sort the weights list in descending order
d. Initialize the variable adjusted_weights to 0
e. For each weight in the weights list
i. Divide the weight by the index of the weight in the sorted weights
list plus 1
ii. Add the result to adjusted_weights
f. Append adjusted_weights to the thumbnail_weights list
Step 4: Return the thumbnail_weights list
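The pseudocode above translates almost line for line into Python. The two weight constants are placeholders, since their values are not given in the text, and each thumbnail is assumed to be represented by the list of actor IDs it features; as in the pseudocode, an actor present in both lists contributes both weights.

```python
# Placeholder values; the algorithm sets these but their magnitudes
# are not specified in the text.
POPULARITY_WEIGHT = 1.0
POSITIVE_EXPOSURE_WEIGHT = 0.5

def thumbnail_weights(movie_id, movie_thumbnails, popular_actors, positive_actors):
    """Score each thumbnail of a movie from the actors it features,
    discounting each successive weight by its rank: weight / (rank + 1)."""
    results = []
    for thumbnail in movie_thumbnails[movie_id]:  # list of actor IDs shown
        weights = []
        for person in thumbnail:
            if person in popular_actors:
                weights.append(POPULARITY_WEIGHT)
            if person in positive_actors:
                weights.append(POSITIVE_EXPOSURE_WEIGHT)
            if person not in popular_actors and person not in positive_actors:
                weights.append(0.25)
        weights.sort(reverse=True)
        adjusted = sum(w / (i + 1) for i, w in enumerate(weights))
        results.append(adjusted)
    return results

# Thumbnail 1 features popular actor a1; thumbnail 2 features unknown a2.
ws = thumbnail_weights(
    42,
    {42: [["a1"], ["a2"]]},
    popular_actors={"a1"},
    positive_actors=set(),
)
```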
Step 1: Start
Step 2: If the actor shows a negative exposure measure of less than 3, then the
thumbnail is removed from consideration without evaluating any other features
Step 3: If the actor has appeared in multiple films seen by the user, they are
awarded 0.1 points for each additional movie they have appeared in
Step 4: If the actor is present in the top 10,000 rated celebrities list, they are
awarded 1 point
Step 5: Stop
Step 1: Start
Step 2: If the thumbnail features a single actor, then the score of the actor is
taken as the score of the thumbnail
Step 3: If the thumbnail features multiple actors, then
a. Arrange the list of actors in descending order of their scores
b. For each actor
i. Divide their score by their position in the order, i.e., divide the first position's score by 1, the second by 2, and so on
c. Take the sum of these scores as the final score for the thumbnail
Step 4: Stop
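The actor-scoring and thumbnail-scoring steps above, together with the weighted-probability draw described earlier, can be sketched as follows. Reading "each additional movie" as the count beyond the first is an interpretation, as the text does not pin this down, and the example inputs are illustrative.

```python
import random

def actor_score(negative_exposure, films_seen, in_top_10000):
    """Score one actor per the steps above. Returns None when
    negative_exposure < 3, which removes the whole thumbnail from
    consideration without evaluating any other feature."""
    if negative_exposure < 3:
        return None
    score = 0.1 * max(films_seen - 1, 0)  # 0.1 per additional viewed film
    if in_top_10000:
        score += 1.0  # global recognition bonus
    return score

def thumbnail_score(actor_scores):
    """Single actor: that actor's score. Multiple actors: sort the
    scores descending and divide each by its 1-based position."""
    if any(s is None for s in actor_scores):
        return None
    ordered = sorted(actor_scores, reverse=True)
    return sum(s / (rank + 1) for rank, s in enumerate(ordered))

def pick_thumbnail(scores, rng=random):
    """Weighted-probability draw: each surviving thumbnail is chosen
    with probability proportional to its score."""
    candidates = [(i, s) for i, s in enumerate(scores) if s is not None]
    indices, weights = zip(*candidates)
    return rng.choices(indices, weights=weights, k=1)[0]

# A recognized actor seen in 3 user films, an unknown actor, and an
# actor disqualified by negative exposure.
scores = [
    thumbnail_score([actor_score(5, 3, True)]),
    thumbnail_score([actor_score(5, 1, False)]),
    thumbnail_score([actor_score(1, 9, True)]),
]
```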
The quality of the proposed system performance can be assessed using the metrics
Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) [42] of the
predicted scores against the actual scores of the movie given by the user. The RMSE
Table 2 Top 40 movie recommendations for user 12 using the proposed hybrid recommender
system
S. No. Movie id Movie title Predicted rating
1 1368 Mina Tannenbaum (1994) 4.7744
2 814 Great Day in Harlem, A (1994) 4.7479
3 1463 Boys, Les (1997) 4.7453
4 1358 The Deadly Cure (1996) 4.7442
5 1189 Prefontaine (1997) 4.7402
6 1467 Saint of Fort Washington, The (1993) 4.7335
7 1500 Santa with Muscles (1996) 4.7226
8 1643 Angel Baby (1995) 4.7109
9 1599 Someone Else’s America (1995) 4.6981
10 1201 Marlene Dietrich: Shadow and Light (1996) 4.6873
11 1367 Faust (1994) 4.6864
12 1302 Late Bloomers (1996) 4.6817
13 1389 Mondo (1996) 4.6668
14 1122 They Made Me a Criminal (1939) 4.6296
15 515 Boot, Das (1981) 4.5836
16 114 Wallace & Gromit: The Best of Aardman Animation 4.5095
(1996)
17 1143 Hard Eight (1996) 4.5046
18 1449 Pather Panchali (1955) 4.4943
19 64 The Shawshank Redemption (1994) 4.4876
20 483 Casablanca (1942) 4.47
21 408 Close Shave, A (1995) 4.4682
22 1398 Anna (1996) 4.4673
23 427 To Kill a Mockingbird (1962) 4.4568
24 251 Shall We Dance? (1996) 4.4501
25 1431 Legal Deceit (1997) 4.4479
26 1007 Waiting for Guffman (1996) 4.4419
27 900 Kundun (1997) 4.4346
28 169 The Wrong Trousers (1993) 4.4273
29 1594 Everest (1998) 4.4237
30 1064 Crossfire (1947) 4.417
31 119 Maya Lin: A Strong Clear Vision (1994) 4.4118
32 1313 Palmetto (1998) 4.4049
33 1125 Innocents, The (1961) 4.3779
34 1138 Best Men (1997) 4.3672
35 113 The Horseman on the Roof (Hussard sur le toit, Le) 4.363
(1995)
36 272 Good Will Hunting (1997) 4.3626
37 178 12 Angry Men (1957) 4.3611
38 511 Lawrence of Arabia (1962) 4.3575
39 613 My Man Godfrey (1936) 4.3501
40 12 Usual Suspects, The (1995) 4.3483
and MAE can be calculated using Eqs. (4) and (5), respectively. RMSE and MAE are commonly used metrics for recommender system evaluation [43]. Such metrics
quantify the average magnitude of error between the predicted and actual ratings,
making them useful for assessing the system’s predictions.
RMSE = √( (1/N) ∑_{i=1}^{N} (Predicted_i − Actual_i)² ), (4)

MAE = (1/N) ∑_{i=1}^{N} |Predicted_i − Actual_i|, (5)
where N is the total number of items.
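Eqs. (4) and (5) in code, as a minimal sketch with illustrative prediction and rating lists:

```python
import math

def rmse(predicted, actual):
    """Eq. (4): root of the mean squared prediction error."""
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

def mae(predicted, actual):
    """Eq. (5): mean absolute prediction error."""
    n = len(predicted)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / n

p = [4.0, 3.0, 5.0]  # predicted ratings
a = [3.0, 3.0, 3.0]  # actual ratings
err_rmse = rmse(p, a)
err_mae = mae(p, a)
```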
Precision and recall are also important metrics for validating recommender
systems [44, 45]. However, these metrics can be challenging to use in sparse datasets
like the MovieLens dataset, where not every movie has been rated by every user,
making it impossible to know the true value for every recommendation. Thus, these
metrics may not be well-suited to evaluating the performance of the system without
conducting proper and thorough surveys. During the testing phase, several similarity
measures were applied, and the error was measured across the testing sets. The
RMSE and MAE values were computed from the predicted ratings obtained from
both the content-based filter using cosine similarity and collaborative filter using
triangle similarity. To combine the algorithms, a weighted average approach was used
with different weight combinations of 25–75%, 50–50% and 75–25%. Additionally,
the performance of the combined algorithms was compared against each individual
algorithm to quantify the value of the combination approach. Based on the analysis
from Fig. 2, it was found that the 50–50% average of the ratings obtained from the
content-based filter and collaborative filter showed the best results, with RMSE of
0.9489 and MAE of 0.7956.
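The weighted-average combination evaluated at the 25–75%, 50–50%, and 75–25% splits is simply a convex blend of the two filters' predicted ratings; a minimal sketch with illustrative scores:

```python
def hybrid_prediction(content_score, collab_score, w_content=0.5):
    """Weighted average of the content-based and collaborative
    predictions; the 50-50 split gave the best RMSE/MAE above."""
    return w_content * content_score + (1 - w_content) * collab_score

# Blend a content-based score of 4.0 with a collaborative score of 3.0
# at the three weight combinations tried in the experiments.
blends = {w: hybrid_prediction(4.0, 3.0, w) for w in (0.25, 0.5, 0.75)}
```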
Table 3 illustrates the results of different recommendation models on the MovieLens 100k dataset. The RMSE of the proposed hybrid system is 0.948, which is lower than that of the machine learning algorithms (1.042, 1.024, 0.983, 0.995, 0.956, 1.006, 0.954) and the deep learning algorithms (0.990, 0.957), except for the deep learning method of collaborative recommender system (DLCRS) model. These results show that the proposed hybrid model outperforms the recommender systems in [20, 21].
Fig. 2 Proposed hybrid recommender system results with weighted average approach
5 Conclusion
User engagement is a key feature in the success of OTT platforms, and person-
alized experiences are essential for fostering engagement. Recommender systems
are a powerful tool for achieving this personalization by recommending films that
align with users' preferences, and thumbnails play a critical role in attracting users'
attention and encouraging them to engage with the platform. The proposed hybrid
6 Future Work
As part of future work of the proposed system, the movie recommender and thumb-
nail selector modules can be explored. The movie recommender module can be
explored with other similarity measures and with different ways of integrating the content-based and collaborative approaches. The weighted average has limitations in that its results
are confined between the results obtained from the individual components. Other
approaches which tightly integrate the two techniques may give results unobtain-
able by simple weighted average. Another area of expansion is the inclusion of user
profiles in determining similarity between users. Features such as user age, gender,
occupation, languages spoken, and area of living can allow the system to provide
tailored recommendations for specific user demographics. The thumbnail mapper
of the thumbnail selection algorithm of the proposed recommender system can be
refined. While actor preference and exposure were chosen for being a highly visual
and familiar aspect of a film, numerous other factors, such as the nature, visual
style, and ambience of the scene portrayed, can be considered as well to add more
nuance to the selection process. Furthermore, the proposed recommender system
can be improved by conducting additional research on the impact of these dynamic
thumbnails, and their effect on user experience, engagement, and click-through rate
on OTT platforms through hands-on research and experimentation.
294 M. B. Baikadolla et al.
Comparative Analysis of Large
Language Models for Question
Answering from Financial Documents
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 297
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_23
298 S. Panwar et al.
The rest of the paper is organized as follows. Section 2 outlines prior work related to question answering using deep learning models. Section 3 provides the architectural details of the large language models compared in this paper. Experimental details and results are discussed in Sect. 5. Finally, we conclude the paper in Sect. 6.
2 Related Work
Question answering is an effective way for performing information retrieval [1, 14,
19] and entity extraction [2, 4, 27].
Various deep learning-based models have been used and proved to be very effective in different types of question answering problems [6]. Lei et al. [7] used Convolutional Neural Networks (CNNs) for sentence classification, which is one of the crucial steps in intelligent question answering systems. Tan et al. [20] proposed an LSTM-based QA model, QA-LSTM. Word embeddings of all the sentences of the questions and of the text containing the answers are fed into a BiLSTM network, which provides a fixed-size representation of each sentence. Relevant text is later retrieved on the basis of cosine similarity between the sentences of the questions and the text. Wang et al. [23] also proposed an LSTM-based model for the QA task on the SQuAD dataset [18].
In 2017, researchers at Google published the paper introducing the transformer architecture [22], which acted as a catalyst for future transformer models like the Generative Pretrained Transformer (GPT) [16], Bidirectional Encoder Representations from Transformers (BERT) [5], and RoBERTa [10]. Different variants of BERT have been proposed to extract information from different domains. Nguyen et al. [12] proposed BERTweet, a BERT-based model trained on Twitter data that can be used to analyze social media data. CT-BERT [11] is a transformer model pre-trained on COVID-19 tweets that can be used to analyze data related to the pandemic.
Pearce et al. [15] present a comparative analysis of large language models for the extractive question answering task. Their work is similar to ours; however, in this paper we compare two large language models for financial question answering, which is not extractive in nature. In order to answer a question from a financial document, complex reasoning and mathematical calculations are required.
3 Architecture of LLMs
At their core, LLMs rely on the transformer architecture, which was introduced by Vaswani et al. in 2017 [22]. The basic components and mechanisms within the transformer architecture are explained in the subsequent subsections. Figure 1 is taken from the original transformer paper [22] and shows the major components of the transformer architecture.
3.1 Input
At the heart of the Transformer is the attention mechanism, which enables the model
to weigh the importance of different parts of the input sequence when generating an
output. This mechanism replaces the traditional recurrent neural networks (RNNs)
and convolutional neural networks (CNNs) used in sequence-to-sequence tasks.
Self-Attention: In self-attention, the model can focus on different positions in the
input sequence to varying degrees. This allows it to capture long-range dependencies
and contextual information efficiently.
Multi-Head Attention: Transformers often use multiple attention heads in parallel.
Each head learns different aspects of the input, enhancing the model’s ability to
capture diverse patterns and relationships.
4.1 Overview
This section explains in detail the complete methodology of the proposed question answering system for financial documents using LLaMA and ChatGPT.
Fig. 2 A flowchart showing the complete overview of the question answering system for financial documents
Since the core components of both these LLMs are the same, the steps involved in getting the relevant information from financial documents using a QA system are also the same. The overall pipeline of the proposed system is shown in Fig. 2. First, we explain the prompt and prompt engineering, which is a powerful way of giving instructions to LLMs. The retriever-reader architecture is discussed next, which is an integral part of the working of LLMs on large documents.
4.2 Prompt
The input given to the LLM is called a prompt. A typical prompt for a question answering task consists of two main components: (1) the question, which is the query asked by the user, and (2) the context or document, which is the source of information from which the answer is obtained. LLMs are trained on huge amounts of generic text and therefore may not perform well on domain-specific tasks. In such scenarios, connecting LLMs to a domain-specific data source gives the desired results [9]. We call this the context, and it comprises the text and tabular data of financial documents. In this paper, we have used instruction prompts and performed zero-shot and one-shot inference to analyze their impact on the performance of the LLMs. An instruction prompt gives the model an explicit instruction to perform a specific task. In zero-shot inference, the context and question are provided and the model generates the output. In one-shot inference, the prompt is designed so that it contains one example of a context, question, and the corresponding answer, which generally gives the model an understanding of the expected responses.
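As a concrete illustration, the two prompting styles can be assembled as follows. The instruction wording and field labels here are our own assumptions, not the exact prompts used in the experiments:

```python
def build_prompt(context, question, example=None):
    """Build an instruction prompt for financial QA.

    With `example` (a (context, question, answer) triple) the prompt
    becomes one-shot; without it, zero-shot.
    """
    instruction = ("Answer the question using only the given financial "
                   "context. Reply with a number or yes/no.")
    parts = [instruction]
    if example is not None:
        ex_context, ex_question, ex_answer = example
        parts.append(f"Context: {ex_context}\n"
                     f"Question: {ex_question}\n"
                     f"Answer: {ex_answer}")
    # The actual query always comes last, with an open "Answer:" slot.
    parts.append(f"Context: {context}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)


zero_shot = build_prompt("Revenue rose from $10M to $12M.",
                         "What was the growth in millions?")
one_shot = build_prompt("Revenue rose from $10M to $12M.",
                        "What was the growth in millions?",
                        example=("Costs fell from $5M to $4M.",
                                 "What was the change in millions?", "-1"))
```

The one-shot variant simply prepends a worked (context, question, answer) example before the real query, which is what gives the model a pattern of the expected response format.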
Financial documents are very long. Additionally, the number of tokens that can be passed in a prompt is limited; both ChatGPT and LLaMA 2 have a maximum limit of 4096 tokens. To handle this limitation, the document is divided into multiple chunks. In our experiments, we have created chunks of 2000 tokens. When a question is asked, the relevant chunks have to be identified and selected. To handle this in an efficient manner, modern QA systems are generally based on the retriever-reader architecture, which we explain in the subsequent subsections.
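The chunking step can be sketched as follows. The overlap between consecutive chunks is our assumption (the paper only states the 2000-token chunk size); overlapping reduces the risk of splitting an answer span across a chunk boundary:

```python
def chunk_tokens(tokens, chunk_size=2000, overlap=200):
    """Split a token sequence into chunks that fit within the prompt limit."""
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the document
    return chunks


document = list(range(5000))  # stand-in for a tokenized financial document
chunks = chunk_tokens(document)
```

Each chunk then fits comfortably inside the 4096-token prompt limit together with the instruction and question.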
Retriever Retrievers are used to fetch the relevant chunks for a specific question.
They can be broadly categorized as sparse and dense. Sparse retrievers utilize word
frequencies to create a sparse vector representation for each document and query.
The degree of relevance between a query and a document is subsequently determined
by calculating the inner product of these vectors. Dense retrievers, on the other hand,
employ encoders such as transformers to represent both the query and document
as contextualized embeddings, which are dense vectors. These embeddings capture
semantic meaning and empower dense retrievers to enhance search accuracy by
comprehending the query’s content.
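A minimal sparse retriever along these lines can be sketched with raw term-frequency vectors and an inner product. Real systems use TF-IDF or BM25 weighting and proper tokenization; this bare-counts version is for illustration only:

```python
from collections import Counter


def tf_vector(text):
    """Sparse term-frequency vector, represented as a token -> count map."""
    return Counter(text.lower().split())


def inner_product(query_vec, doc_vec):
    """Relevance score: inner product over the terms the vectors share."""
    return sum(count * doc_vec[token]
               for token, count in query_vec.items() if token in doc_vec)


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks ranked by inner-product relevance."""
    query_vec = tf_vector(query)
    ranked = sorted(chunks,
                    key=lambda chunk: inner_product(query_vec, tf_vector(chunk)),
                    reverse=True)
    return ranked[:top_k]


chunks = ["goodwill was allocated to the segment",
          "net revenue grew and revenue margins improved",
          "the board met in november"]
top = retrieve("revenue growth in millions", chunks, top_k=1)
```

A dense retriever would replace `tf_vector` with a transformer encoder producing embeddings, and the inner product (or cosine similarity) would then compare those dense vectors instead.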
Reader The reader is responsible for obtaining an answer from the documents retrieved by the retriever.
In addition to the reader and retriever, there can be other components that perform post-processing of the chunks extracted by the retriever or of the answers obtained by the reader. For example, the chunks extracted by the retriever may need re-ranking to remove noise or irrelevant chunks. Similarly, post-processing of the reader's answers is required when the correct answer is fetched from different chunks of a long document.
5 Experiments
All the models are implemented using PyTorch and the transformers library from Hugging Face [25]. The experiments are conducted on an AWS g4dn.xlarge instance with 4 vCPUs, 16 GB RAM, and an NVIDIA T4 Tensor Core GPU. The following paragraphs describe important aspects of the experimental evaluation and the obtained results.
We use the FinQA dataset [3] for our work. The dataset consists of 8231 financial QA pairs based on publicly available earnings reports of S&P 500 companies from 1999 to 2019. An earnings report is a PDF file that contains information regarding the financials of a company, in the form of text and tables. In this work, we use 1428 examples from the dataset for testing both of the LLMs. Figure 3 shows the distribution of questions that begin with a few common starting words. The answer to most of these questions is either a numerical value with/without mathematical units or 'Yes' or 'No'. Table 1 shows an example of a record of the FinQA dataset. Each record consists of pre-text, table, post-text, question, and answer. Many important financial details are present in tabular format, in the table column. The pre-text and post-text consist of the text present before and after the tabular data, respectively. All three collectively form the context, on the basis of which questions are asked and answers are given. Unlike the SQuAD dataset, the answers are not directly present in the context and are obtained after applying complex reasoning.
Fig. 3 Distribution of questions in our test set that begin with a few common starting words
The answers in the FinQA dataset mostly consist of either numerical values with mathematical units or string values with the single word 'yes' or 'no'. For the current experiments we have excluded those examples where the answer is in the form of a sentence. Therefore, we perform exact matching between the LLM output and the original answer. The performance of the models is evaluated using the accuracy metric, which is defined as follows:
follows:
Number of correct answers
. Accuracy = × 100
Total number of questions
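The metric can be computed directly from paired predictions and reference answers. The whitespace and case normalization here is our simplification of "exact matching":

```python
def exact_match_accuracy(predictions, references):
    """Percentage of predictions that exactly match the reference answer
    after trimming whitespace and lowercasing."""
    correct = sum(pred.strip().lower() == ref.strip().lower()
                  for pred, ref in zip(predictions, references))
    return 100 * correct / len(references)


# Two of three hypothetical answers match exactly.
accuracy = exact_match_accuracy(["7.4", "yes", "21.48"], ["7.4", "Yes", "21.5"])
```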
Table 1 An example of a record of the FinQA dataset, showing the context (consisting of pre_text, post_text, and table), the question, and the answer

Pre_text: ['american tower corporation and subsidiaries notes to consolidated financial statements (3) consists of customer-related intangibles of approximately $ 75.0 million and network location intangibles of approximately $ 72.7 million.', 'the customer-related intangibles and network location intangibles are being amortized on a straight-line basis over periods of up to 20 years.', '(4) the company expects that the goodwill recorded will be deductible for tax purposes.', 'the goodwill was allocated to the company’s international rental and management segment.', 'on September 12, 2012, the company entered into a definitive agreement to purchase up to approximately 348 additional communications sites from telefónica mexico.', 'on September 27, 2012 and December 14, 2012, the company completed the purchase of 279 and 2 communications sites, for an aggregate purchase price of $ 63.5 million (including value added tax of $ 8.8 million).', 'the following table summarizes the preliminary allocation of the aggregate purchase consideration paid and the amounts of assets acquired and liabilities assumed based upon their estimated fair value at the date of acquisition (in thousands): preliminary purchase price allocation.']

Post_text: ['(1) consists of customer-related intangibles of approximately $ 10.7 million and network location intangibles of approximately $ 10.4 million.', 'the customer-related intangibles and network location intangibles are being amortized on a straight-line basis over periods of up to 20 years.', '(2) the company expects that the goodwill recorded will be deductible for tax purposes.', 'the goodwill was allocated to the company’s international rental and management segment', 'on November 16, 2012, the company entered into an agreement to purchase up to 198 additional communications sites from telefónica mexico.', 'on December 14, 2012, the company completed the purchase of 188 communications sites, for an aggregate purchase price of $ 64.2 million (including value added tax of $ 8.9 million).']

Table: [['', 'preliminary purchase price allocation'], ['current assets', '$ 8763'], ['non-current assets', '2332'], ['property and equipment', '26711'], ['intangible assets (1)', '21079'], ['other non-current liabilities', '−1349 (1349)'], ['fair value of net assets acquired', '$ 57536'], ['goodwill (2)', '5998']]

Question: For acquired customer-related and network location intangibles, what is the expected annual amortization expenses, in millions?

Answer: 7.4
Table 2 Performance of LLaMA-2 and ChatGPT on question answering on the FinQA dataset
Large language model Accuracy (in percentage)
ChatGPT with zero-shot inferencing 65.91
LLaMA-2 with zero-shot inferencing 61.07
ChatGPT with one-shot inferencing 61.85
LLaMA-2 with one-shot inferencing 53.36
Table 3 shows the output of LLaMA-2 and ChatGPT on a few of the example questions from the FinQA dataset. The first two rows show correct answers by both LLaMA-2 and ChatGPT, while the next two rows show a small error. Please note that the output is obtained after applying complex reasoning and mathematical calculations; the exact answer was not present in the respective contexts. The results of the overall performance of both the models are presented in Table 2. The following observations can be made on the basis of these results:
6 Conclusion
This paper presented a comparison of two popular large language models for question answering from financial documents. The paper also presented a complete pipeline for using these models to extract financial insights from large documents.
Table 3 Output given by LLaMA-2 and ChatGPT on a few of the examples from the FinQA dataset

S. no. | Question from FinQA dataset | Actual answer | LLaMA output | ChatGPT output
1 | What percentage of total net revenues in 2012 were due to equity securities (excluding icbc) revenues? | 41 | 41 | 41
2 | What was the change in non-trade receivables, which are included in the consolidated balance sheets in other current assets, between September 24, 2005 and September 25, 2004, in millions? | 141 | 141 | 141
3 | What is the average cash provided by the operating activities during 2018 and 2019? | 2758.55 | 2758.05 | 2758.55
4 | What is the roi of an investment in s&p500 in 2004 and sold in 2006? | 21.5 | 21.48 | 21.48
The results show that ChatGPT performs better than the LLaMA-2 7B model. In the future, we will attempt to compare the results after fine-tuning LLaMA-2 and other large language models.
References
1. Abbasiantaeb Z, Momtazi S (2021) Text-based question answering from information retrieval and
deep neural network perspectives: a survey. Wiley Interdiscipl Rev Data Min Knowl Discov
11(6):e1412
2. Ali I, Yadav D, Sharma AK (2022) Question answering system for semantic web: a review. Int
J Adv Intell Paradigm 22(1–2):114–147
3. Chen Z, Chen W, Smiley C, Shah S, Borova I, Langdon D, Moussa R, Beane M, Huang
T-H, Routledge B et al (2021) Finqa: a dataset of numerical reasoning over financial data.
arXiv:2109.00122
4. Datta S, Roberts K (2022) Fine-grained spatial information extraction in radiology as two-turn
question answering. Int J Med Inform 158:104628
5. Devlin J, Chang M-W, Lee K, Toutanova K (2018) Bert: pre-training of deep bidirectional
transformers for language understanding. arXiv:1810.04805
6. Huang Z, Xu S, Hu M, Wang X, Qiu J, Fu Y, Zhao Y, Peng Y, Wang C (2020) Recent trends in
deep learning based open-domain textual question answering systems. IEEE Access 8:94341–
94356
7. Lei T, Shi Z, Liu D, Yang L, Zhu F (2018) A novel cnn-based method for question classification
in intelligent question answering. In: Proceedings of the 2018 international conference on
algorithms, computing and artificial intelligence, pp 1–6
8. Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Stoyanov V, Zettlemoyer
L (2019) Bart: denoising sequence-to-sequence pre-training for natural language generation,
translation, and comprehension. arXiv:1910.13461
9. Lewis P, Perez E, Piktus A, Petroni F, Karpukhin V, Goyal N, Küttler H, Lewis M, Yih
W-t, Rocktäschel T et al (2020) Retrieval-augmented generation for knowledge-intensive nlp tasks.
Adv Neural Inform Process Syst 33:9459–9474
10. Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V
(2019) Roberta: a robustly optimized bert pretraining approach. arXiv:1907.11692
11. Müller M, Salathé M, Kummervold PE (2023) Covid-twitter-bert: a natural language processing
model to analyse Covid-19 content on twitter. Front Artif Intell 6:1023281
12. Nguyen DQ, Vu T, Nguyen AT (2020) Bertweet: a pre-trained language model for english
tweets. arXiv:2005.10200
13. OpenAI (2021) ChatGPT: A Large-Scale Language Model for Conversational AI. https://
openai.com/research/chatgpt. Accessed 6 Oct 2023
14. Otegi A, San Vicente I, Saralegi X, Peñas A, Lozano B, Agirre E (2022) Information retrieval
and question answering: a case study on Covid-19 scientific literature. Knowl Based Syst
240:108072
15. Pearce K, Zhan T, Komanduri A, Zhan J (2021) A comparative study of transformer-based
language models on extractive question answering. arXiv:2110.03142
16. Radford A, Narasimhan K, Salimans T, Sutskever I (2018) Improving language understanding
by generative pre-training
17. Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2020)
Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn
Res 21(1):5485–5551
18. Rajpurkar P, Zhang J, Lopyrev K, Liang P (2016) Squad: 100,000+ questions for machine
comprehension of text. arXiv:1606.05250
19. Sakata W, Shibata T, Tanaka R, Kurohashi S (2019) Faq retrieval using query-question similarity
and bert-based query-answer relevance. In: Proceedings of the 42nd international ACM SIGIR
conference on research and development in information retrieval, pp 1113–1116
20. Tan M, Dos Santos C, Xiang B, Zhou B (2016) Improved representation learning for question
answer matching. In: Proceedings of the 54th annual meeting of the association for computa-
tional linguistics, vol 1: Long Papers, pp 464–473
21. Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, Bashlykov N, Batra S, Bhargava
P, Bhosale S et al (2023) Llama 2: open foundation and fine-tuned chat models. arXiv preprint
arXiv:2307.09288
22. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I
(2017) Attention is all you need. Adv Neural Inform Process Syst 30
23. Wang S, Jiang J (2016) Machine comprehension using match-lstm and answer pointer.
arXiv:1608.07905
24. Wei J, Wang X, Schuurmans D, Bosma M, Xia F, Chi E, Le QV, Zhou D et al (2022) Chain-
of-thought prompting elicits reasoning in large language models. Adv Neural Inform Process
Syst 35:24824–24837
25. Wolf T, Debut L, Sanh V, Chaumond J, Delangue C, Moi A, Cistac P, Rault T, Louf R, Funtowicz
M et al (2019) Huggingface’s transformers: state-of-the-art natural language processing. arXiv
preprint arXiv:1910.03771
26. Yao S, Zhao J, Yu D, Du N, Shafran I, Narasimhan K, Cao Y (2022) React: synergizing
reasoning and acting in language models. arXiv preprint arXiv:2210.03629
27. Didi Y, Siyuan C, Boxu P, Qiao Y, Zhao W, Wang D (2022) Chinese named entity recognition
based on knowledge based question answering system. Appl Sci 12(11):5373
Multilingual Meeting Management
with NLP: Automated Minutes,
Transcription, and Translation
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 309
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_24
310 G. Mehendale et al.
2 Literature Review
Sound source separation is a method to separate a mixture into isolated sounds from
individual sources. In Li et al. [1], three networks are trained using different param-
eters for separating two random sound sources from recording. In this, proximity
principle is explored by experiments. This paper does not extensively evaluate the
performance on a wide range of languages or datasets. Asteroid, a PyTorch-based
audio is an amazing toolkit for researchers. Pariente et al. [2] describe asteroid in
the paper and implements Kaldi-style recipes on common audio source separation
datasets. It follows an encoder-masker-decoder approach. This research has not yet
explored multilingual speakers’ transcription.
Diarization categorizes audio recordings using unsupervised techniques to group audio belonging to individual speakers. Khoma et al. [3] use the open-source pyannote framework to improve accuracy and computational efficiency. In this work, four tests were undertaken for speaker identification and to optimize diarization pipeline components. The goal of the paper is to get rid of false or corrupt data when the audio sequence is converted to text for further analysis. Using a generalized model, Barkovska et al. [4] proposed incoming audio data summarization and conducted a thorough study of different methods.
Transcription is the process of converting an audio or video recording into usable
text. One such example is given by Chen et al. [5], in which they used the Microsoft Azure Speech SDK for transcription. This system supports multiple people speaking
simultaneously and boasts an accuracy of greater than 90%. It provides a user-friendly
interface, making it easy to annotate and edit. However, a drawback is its limitation
in handling multilingual speakers. Similarly, Dewan et al. [6] offer an end-to-end
solution for creating fully automated conference meeting transcripts. Their system
employs speech-to-text and machine translation components. Evaluation metrics
such as BLEU and WER were used, and indexing was done using Elasticsearch.
Despite errors in the produced text, their method outperformed others in terms of
speed and convenience.
The paper by Majeed et al. [7] aims to create a model for extractive text summarization using text ranking algorithms and sentence ranking. It focuses on extracting
high-scoring sentences to generate high-quality summaries. However, a research gap
exists as the sentence sequences may not be entirely suitable for easy user reading.
The paper successfully addresses this gap by implementing appropriate similarity
matrices, enhancing semantic relatedness between words. Furthermore, the paper
on text summarization and translation across multiple languages by Banu et al. [8]
centers on creating effective summaries while preserving the original context. It uti-
lizes Hugging Face Transformers for multilingual text summarization. The paper
employs the MarianMTModel for language translation and subsequently uses T5 for
summarization.
Pham et al. [9] explore pretrained models like wav2vec 2.0 for audio and MBART50 for text to enhance multilingual speech recognition, and also use adaptive weight techniques on the CommonVoice and Europarl test sets. However, the paper does not extensively evaluate the performance on a varied range of languages or datasets. To deal with this issue, this paper uses the MarianMTModel after effectively fine-tuning and tokenizing the model, which has enabled easy translation of the text into more than 50 languages. Stanik et al. [10] conduct a comparative study using traditional machine learning and deep learning approaches. The English and Italian data are classified into problem reports, inquiries, and irrelevant data, automating the tedious, time-consuming process of manual analysis of user feedback.
An accurate dataset is a must to acquire optimal results on any model. To enhance multimodal and multilingual learning, the Wikipedia-based Image Text (WIT) dataset has been introduced by Srinivasan et al. [11], which consists of large, entity-rich image and text samples. WIT has been beneficial for pretraining multimodal models, fine-tuning image-text retrieval models, and curating multilingual representations of the same text data. SummarizeAI, a web-based app for summarizing podcasts with Large Language Models, for text-to-speech translation and audio summarization
3 Proposed Methodology
This study introduces a comprehensive three-step process for audio file management:
initially transcribing the spoken content, then distilling the transcription into minutes,
and finally facilitating multilingual translation, ensuring that the content is both
accessible and comprehensible across diverse linguistic audiences.
3.1 Transcription
over time. The DPTNet then applies local attention to these spectrograms, focusing
on short segments to capture the fine details of each sound source. Concurrently,
its globally recurrent mechanism ensures that while it is pinpointing specific audio
elements, it is also considering the broader audio context, enhancing its differentiation ability. This dual-path approach enables the effective separation of overlapping
speech and background noise. After separation, minor post-processing refines the
audio, eliminating potential artifacts. The result is clear, individual audio streams
from the input, optimizing them for tasks like transcription.
Diarization Diarization, essential for managing multilingual meetings, segments audio based on speaker identity. This project utilizes the pyannote toolkit for effective speaker diarization using deep learning. The process begins with feature extraction, where aspects like Mel-frequency cepstral coefficients (MFCCs), pitch, and energy are derived from the audio signal. These features create a low-dimensional speaker embedding, capturing each speaker's unique vocal characteristics. A neural network generates these embeddings, which are then clustered by similarity using algorithms like k-means or hierarchical clustering. The resulting clusters are mapped to different speakers by examining temporal overlap and the uniformity of the speaker embeddings. This step leads to a segmented audio signal where each segment is attributed to a specific speaker. Such careful segmentation is crucial for accurate transcription and translation in scenarios involving multilingual meetings. By systematically dividing the audio signal into consistent portions based on speaker identity, our study effectively leverages the functionalities of the pyannote toolkit, ensuring well-organized and coherent management of multilingual meetings and demonstrating the toolkit's adeptness in both academic and industrial spheres.
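The clustering step can be illustrated with a simplified, dependency-free stand-in for what pyannote performs internally: a greedy cosine-similarity assignment rather than full k-means or hierarchical clustering. The similarity threshold is an arbitrary assumption:

```python
import math


def cosine(u, v):
    """Cosine similarity between two speaker-embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def assign_speakers(embeddings, threshold=0.8):
    """Greedily assign each segment embedding to the most similar existing
    cluster, or open a new cluster (a new speaker) when no cluster clears
    the threshold. Returns one speaker label per segment."""
    centroids, labels = [], []
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))
            best = len(centroids) - 1
        labels.append(best)
    return labels


# Four segment embeddings from two distinct voices (toy 2-D vectors).
segments = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0), (0.05, 0.99)]
speakers = assign_speakers(segments)
```

Real speaker embeddings are high-dimensional and the clustering is more robust, but the principle is the same: segments whose embeddings are mutually similar are attributed to the same speaker.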
Speech to Text The process of converting spoken words into written form is known as
speech-to-text conversion. The Python library SpeechRecognition connects to many voice recognition engines, including Microsoft Bing Voice Recognition and the Google Cloud Speech API, to enable this capability. Modern speech recognition systems rely
on audio sources as input. This source might originate from pre-recorded audio files
or be derived from real-time audio streams. To enhance the clarity and accuracy of this
input, several preprocessing methods are applied. These techniques, which include
filtering, normalization, and augmentation, are crucial in reducing background noise
and extraneous non-speech sounds, thus refining the input quality. The versatility
of today’s speech recognition frameworks is further demonstrated by their capacity
to select and interface with different voice recognition engines or services, tailored
314 G. Mehendale et al.
Fig. 1 Transcription process using DPTNet-based speech separation method, deep learning-based
speaker diarization process, and speech-to-text conversion using the SpeechRecognition module
to specific application needs. Once the audio has been appropriately preprocessed
and the optimal recognition engine chosen, the actual process of speech-to-text con-
version begins. At its core, this conversion relies on sophisticated algorithms and
methodologies. Contemporary models predominantly employ hidden Markov mod-
els and deep neural networks to decode and transcribe spoken language into written
text with impressive accuracy. Additionally, the model groups the speakers accord-
ing to gender in order to improve context and the user experience. For example,
odd speaker numbers (Speaker 1) indicate female speakers, whereas even speaker
numbers (Speaker 0) indicate male voices. This distinction provides a more lucid
transcription background and facilitates comprehension of dialogue dynamics. The
process’s output is arranged and formatted into a text file or string that may be used
in real-world applications. After preparation, this output can be easily processed or
analyzed further. The SpeechRecognition module makes it easy to incorporate voice
recognition features into the system, which is essential for efficiently running mul-
tilingual meetings. Here, Fig. 1 represents the summary of the three methodologies
used for transcription.
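As a small illustration of the normalization step mentioned above, the following sketch applies peak normalization to a list of raw audio samples. It is a simplified stand-in for the filtering and augmentation a full pipeline would perform, and the sample values are hypothetical.

```python
def peak_normalize(samples, target_peak=1.0):
    """Scale audio samples so the loudest sample reaches target_peak.
    This evens out quiet recordings before they reach the recognizer."""
    peak = max(abs(s) for s in samples)
    if peak == 0:  # silent input: nothing to scale
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]

quiet = [0.0, 0.05, -0.1, 0.02]   # hypothetical low-amplitude samples
loudened = peak_normalize(quiet)  # loudest sample is scaled to 1.0
```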
Fig. 2 Text summarization process using the Summa package and TextRank algorithm
Once the textual data, for instance from “text.txt”, is loaded, it is segmented into discrete sentences. These
fragments are then refined by eliminating stopwords and preserving only alphanu-
meric terms. Concurrently, word embeddings, particularly from the GloVe model,
enhance the representation of the sentences. Assuming a 100-dimensional variant of
GloVe, these embeddings are stored systematically for rapid access. Every sentence is
transformed into a vector by averaging the embeddings of its words. The subsequent step
involves determining a similarity matrix for the sentences, predominantly via cosine
similarity. The matrix then lays the groundwork for crafting a graph where nodes sym-
bolize sentences and their connecting edges denote semantic relatedness. With the
application of TextRank, sentences are accorded a ranking based on their relevance
and interconnectedness within the graph. This iterative ranking ensures that pivotal
sentences influence the prominence of others in close proximity. The culmination of
this meticulous procedure is the extraction of the top N sentences, embodying the
essence of the original content, offering a streamlined overview. As shown in Fig. 2,
a concise summary is generated using the TextRank algorithm. This approach is instrumental
in synthesizing coherent summaries, making it indispensable for overseeing
intricate, multilingual discussions. To further refine the summarization process, the
output from TextRank is then fed into a BART model, a pretrained model from Face-
book, designed for abstractive summarization. This ensures not only the extraction
of the most relevant sentences but also the generation of a coherent and contextually
accurate summary of the content.
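The ranking procedure described above can be sketched end to end. For brevity, this illustration substitutes toy bag-of-words vectors for GloVe averages (an assumption, not the configuration used in the study); the cosine-similarity graph and PageRank-style iteration follow the same scheme.

```python
import math

def sentence_vector(sentence, vocab):
    """Toy bag-of-words vector; a real pipeline would average GloVe embeddings."""
    words = sentence.lower().split()
    return [words.count(w) for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def textrank(sentences, d=0.85, iters=50):
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    vecs = [sentence_vector(s, vocab) for s in sentences]
    n = len(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    scores = [1.0] * n
    for _ in range(iters):  # PageRank-style update over the similarity graph
        scores = [
            (1 - d) + d * sum(sim[j][i] / sum(sim[j]) * scores[j]
                              for j in range(n) if sum(sim[j]) > 0)
            for i in range(n)
        ]
    return sorted(range(n), key=lambda i: -scores[i])

docs = [
    "speaker diarization segments the meeting audio",
    "the meeting audio is transcribed into text",
    "lunch was served at noon",
]
ranking = textrank(docs)  # sentence indices ordered by importance
```

The weakly connected third sentence receives the lowest score, illustrating how relatedness within the graph drives prominence.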
4 Results
In this study, leveraging the DPTNet technique for sound separation, an initial audio
file in .wav format was examined, revealing a diarization result of two distinct
speakers, with a portion of the 250 generated segments displayed. Progressing further,
a second audio file was analyzed, which, after processing, presented a more intricate
diarization encompassing multiple speakers, with a portion of the 226 generated
segments displayed, as seen in Fig. 4 for both audio files.
Multilingual Meeting Management with NLP: Automated Minutes … 317
It is evident that DPTNet is adept not just at identifying individual sounds, but
also at discerning overlapping voices, thus enhancing the precision of transcrip-
tions in complex multilingual contexts. Following the process of diarization, the
segments are transcribed with the SpeechRecognition module. The method of tran-
scribing effectively distinguishes between male and female voices, hence assuring
enhanced clarity and contextual understanding. After importing libraries and load-
ing data from “text.txt”, sentences are processed, stopwords removed, and GloVe
embeddings applied. A similarity matrix is created, forming the basis for a graph
where nodes represent sentences and edges indicate relatedness. Using TextRank,
sentences are ranked by relevance. Fig. 5 presents the first transcription result
of an audio file, Fig. 6 the concise summary generated by the TextRank
algorithm, and Fig. 7 the abstractive summary produced by the BART model.
In the evaluation, this study calculated the precision, recall, and F1-score to
assess the quality of machine-generated summaries in comparison with the anno-
tated summaries based on human evaluation. The study employed two distinct
models for summarization: BART and TextRank. The ROUGE metrics, including
ROUGE-1 (unigram overlap), ROUGE-2 (bigram overlap), and ROUGE-L (longest
common subsequence overlap), were used to quantify the similarity between the gen-
erated summaries and the human-annotated ones in Table 1. Notably, BART consis-
tently outperformed TextRank across all ROUGE metrics (ROUGE-1, ROUGE-2,
and ROUGE-L), indicating its superior performance in capturing both unigram and
bigram overlaps, as well as longer common sequences in the summaries, thereby
demonstrating its effectiveness in generating high-quality and abstractive summa-
rization.
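ROUGE overlaps of the kind reported in Table 1 can be reproduced in a few lines. The sketch below computes ROUGE-N precision, recall, and F1 from clipped n-gram overlap counts; it omits the stemming that standard ROUGE tooling applies, so scores may differ slightly.

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision/recall/F1 from clipped n-gram overlap counts."""
    def ngrams(text, n):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # clipped overlap counts
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical candidate vs. human-annotated reference summary.
p, r, f = rouge_n("the meeting covered budget items",
                  "the meeting covered the budget")
```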
In a multilingual context, real-time capture of live audio is performed on a single
individual as seen in Fig. 8, exemplified by their speech in the Spanish language.
Sophisticated speech recognition tools subsequently transcribe the provided input,
transforming the spoken words into written text.
Furthermore, upon receiving an audio file, the system possesses the capabil-
ity to transcribe and translate the conversation language (source) into 21 different
languages (target) as represented in Fig. 9 which is rigorously tested and verified.
Whether the spoken content is in Spanish, French, or Hindi, the technology can
seamlessly render it into the desired target language, ensuring clarity and cultural
nuance. Table 2 displays the ROUGE-1, ROUGE-2, and ROUGE-L scores for trans-
lations from English into four target languages: Hindi (H), Italian (It), Spanish (Es),
and French (Fr).
5 Conclusion
6 Future Scope
References
1. Li H, Chen K, Wang L, Liu J, Wan B, Zhou B (2022) Sound source separation mechanisms of
different deep networks explained from the perspective of auditory perception. Appl Sci 12(2)
2. Pariente M et al (2020) Asteroid: the PyTorch-based audio source separation toolkit for
researchers. In: Interspeech 2020, pp 2637–2641
3. Khoma V, Khoma Y, Brydinskyi V, Konovalov A (2023) Development of supervised speaker
diarization system based on the pyannote audio processing library. Sensors 23(4)
4. Barkovska O (2022) Research into speech-to-text transformation module in the proposed model
of a speaker’s automatic speech annotation. Innovative Technol Sci Solutions Ind 5–13
5. Chen X, Li S, Liu S, Fowler R, Wang X (2023) Meetscript: designing transcript-based interac-
tions to support active participation in group video meetings. Proc ACM Hum-Comput Interact
7(CSCW2):1–32
6. Dewan A, Ziemski M, Meylan H, Concina L, Pouliquen B (2023) Developing automatic verba-
tim transcripts for international multilingual meetings: an end-to-end solution. arXiv preprint
arXiv:2309.15609
7. Majeed M, Kala MT (2023) Comparative study on extractive summarization using sentence
ranking algorithm and text ranking algorithm. In: 2023 International conference on power,
instrumentation, control and computing (PICC), pp 1–5
8. Banu S, Ummayhani S (2023) Text summarisation and translation across multiple languages.
J Sci Res Technol 242–247
9. Pham N-Q, Waibel A, Niehues J (2022) Adaptive multilingual speech recognition with pre-
trained models. arXiv preprint arXiv:2205.12304
10. Stanik C, Haering M, Maalej W (2019) Classifying multilingual user feedback using traditional
machine learning and deep learning
11. Srinivasan K, Raman K, Chen J, Bendersky M, Najork M (2021) Wit: Wikipedia-based image
text dataset for multimodal multilingual machine learning. In: Proceedings of the 44th inter-
national ACM SIGIR conference on research and development in information retrieval, pp
2443–2449
12. Khanna D, Bhushan R, Goel K, Juneja S (2023) Summarizeai-summarization of the podcasts.
Available at SSRN 4628657
13. Faria FTJ, Moin MB, Wase AA, Ahmmed M, Sani MR, Muhammad T (2023) Vashantor: a
large-scale multilingual benchmark dataset for automated translation of bangla regional dialects
to bangla language. arXiv preprint arXiv:2311.11142
14. Posey J, Aiken M (2015) Large-scale, distributed, multilingual, electronic meetings: a pilot
study of usability and comprehension. Int J Comput Technol 14:5578–5585
15. Wairagala EP, Mukiibi J, Tusubira JF, Babirye C, Nakatumba-Nabende J, Katumba A,
Ssenkungu I (2022) Gender bias evaluation in Luganda-English machine translation. In: Asso-
ciation for machine translation in the americas, pp 274–286
Exploring Comprehensive Privacy
Solutions for Enhancing Recommender
System Security and Utility
Abstract Nowadays, recommender systems have gained considerable attention and have
become highly efficient tools for categorizing and personalizing content to the varied
requirements of online users. Recommender systems are driven by the evolving
preferences of computer users and the increasing accessibility of the internet. Though
they can provide precise recommendations, modern recommender systems face
numerous constraints and challenges, such as the cold-start problem, sparsity, scalability,
privacy concerns, and optimization issues. Diverse types of techniques are available,
which in turn complicates the selection of an appropriate technique when
building application-focused recommender systems. Every technique possesses
its own unique set of features, with its own advantages and disadvantages, which
creates the need for a comprehensive investigation of the complexities
involved. This research work aims to conduct a systematic assessment of current
contributions in the field of recommender systems, with the objective of gaining a
thorough understanding of the advancements, identifying areas that require further
attention, and elucidating the unresolved questions and concerns associated with
different techniques. By synthesizing the findings of this review, valuable insights
can be obtained to guide future research and advancement efforts in the
realm of recommender systems.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 321
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_25
322 E. Gupta and S. Shinde
1 Introduction
Data has become the determining factor in everything, and its volume is growing
exponentially. Despite having the second-largest internet user base in the world, India had
about 749 million online users in June 2020 and is expected to reach 900 million by
2025. Compared to commercial hubs like the USA (266 million, 84%) and France
(54 million, 81%), the penetration of web-based and e-commerce business is modest,
but it is growing at a record-breaking rate, adding almost 6 million new users on
average every month. The standard database management system is unable to manage
such large datasets effectively. Traditional databases are unable to store and process
information in the form of semi-structured, quasi-structured, and unstructured data
including video, images, audio, web logs, JSON documents, and search trends, among
others [1]. As a result, the idea of big data was developed. Roughly 90% of the data
on earth today was generated in the past two years alone, with online users
producing about 2.5 quintillion bytes of data each day [2].
This information is gathered from a variety of sources, including social media
postings, videos, and images, as well as transaction records of both e-commerce and
non-e-commerce platforms. This is referred to as big data. Big data encompasses high-volume,
high-velocity, complex, and variable data, necessitating advanced techniques and
technologies to effectively collect, store, distribute, manage, and analyze the infor-
mation [3]. Such assistance provides a simple means of identifying the best option
without having to sift through the many alternatives on the market. Nowadays, a
recommendation system (RS) is an application that filters personalized information
and provides a method for understanding a user’s preferences and making appropriate
suggestions by considering patterns in their likes and ratings of various items
[4, 5], as shown in Fig. 1.
Protection of users’ information has become one of the most substantial chal-
lenges [6–8]. There are various privacy threats that include service providers or their
employees gaining unauthorized access to users [9], illegal data disclosure, intrusion
by third parties to purchase users’ information [10], or various other incidents of
Fig. 1 A recommendation system suggesting personalized content (e.g., music, video, camera) to a user via a smartphone
hacking [11]. Hence, ensuring the privacy of user information is crucial, achieved
through the creation and implementation of various privacy-preserving techniques that
guarantee robust protection of users’ data. In this research work, we conduct an
extensive systematic survey of the literature on techniques used for privacy
protection in recommendation systems, identify trends in the use of
privacy-preserving methods within secure recommendation systems, and highlight
future research directions for enhancing recommendation systems.
Here, ε represents the privacy budget, governing the balance between privacy and
accuracy. Typically, ε is assigned a small positive value: smaller values yield stronger
privacy but lower accuracy, while larger values have the opposite effect.
Differential privacy techniques are being used by Apple for preserving the users’
private data while still allowing for accurate aggregation.
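As a minimal illustration of how ε enters a differentially private computation, the sketch below adds Laplace noise of scale sensitivity/ε to a count query (the standard Laplace mechanism; the parameter values are illustrative).

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) by inverting its CDF on a uniform sample."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(true_count, epsilon, rng):
    """A count query has sensitivity 1, so the noise scale is 1 / epsilon."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
noisy = private_count(100, epsilon=0.5, rng=rng)  # smaller epsilon -> more noise
```

Averaging many such noisy releases recovers the true count in expectation, which is what permits accurate aggregation.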
Randomization techniques involve introducing randomness into the data or its
presentation to prevent precise inferences about users. User data undergoes pertur-
bation by adding a random value drawn from a predefined distribution to each of
their ratings. Any unknown ratings are replaced with the mean rating. This can make
it more difficult for attackers to link specific data points to individual users.
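The perturbation scheme described above can be sketched as follows; the ratings are hypothetical, and the choice of a Gaussian as the predefined distribution is an assumption made for illustration.

```python
import random

def perturb_ratings(ratings, sigma=0.5, seed=7):
    """Fill unknown (None) ratings with the mean rating, then add random
    noise drawn from N(0, sigma) to every rating."""
    rng = random.Random(seed)
    known = [r for r in ratings if r is not None]
    mean = sum(known) / len(known)
    filled = [mean if r is None else r for r in ratings]
    return [r + rng.gauss(0, sigma) for r in filled]

# Hypothetical user ratings; None marks unrated items.
noisy = perturb_ratings([5, None, 3, 4, None])
```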
Bucketization and binning entail the categorization of data into predetermined inter-
vals or bins. Rather than employing exact values, the data is symbolized in terms
of these intervals, aiding in the concealment of distinct user details. This can be
expressed mathematically as:
Given a set of data points D = {x1, x2, …, xn}, the bucketization process assigns
each data point to a specific interval or bin based on defined boundaries. If there
are m bins with boundaries [b1, b2], [b2, b3], …, [bm−1, bm], the data point xi is
assigned to bin j such that bj ≤ xi < bj+1.
Exploring Comprehensive Privacy Solutions for Enhancing … 325
This representation in terms of bins helps mask the exact values of individual data
points, thus enhancing privacy.
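The bin-assignment rule above translates directly into code; the age boundaries below are illustrative.

```python
def bucketize(x, boundaries):
    """Return the index j of the bin [b_j, b_{j+1}) containing x,
    or None if x falls outside every bin."""
    for j in range(len(boundaries) - 1):
        if boundaries[j] <= x < boundaries[j + 1]:
            return j
    return None

ages = [17, 23, 35, 61]
bins = [0, 18, 30, 45, 65]  # four bins: [0,18), [18,30), [30,45), [45,65)
masked = [bucketize(a, bins) for a in ages]  # ages released only as bin indices
```

Only the bin indices are released, so exact ages remain concealed.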
Federated Learning is like a teamwork approach for training models. Imagine
different devices or servers working together to improve a shared model, but they
keep their private user data to themselves. They only share summaries of what they
have learned, not individual data, which makes it less likely for personal information
to be exposed [17, 18].
Each device holds its own data (D1, D2, …, Dn). The model improves by training
locally on each device’s data: device i nudges the shared weights w by a small step,
wi ← w − η∇Li(w; Di), where η is a small learning rate and Li measures how wrong
the model is on device i’s data. All the device updates are then aggregated to improve
the shared model, w ← (1/n) Σi wi, i.e., the model receives another update equal to
the average of the local updates. This way, the model improves without anyone
needing to share their private data directly.
Netflix and Google are prime examples of companies using federated learning techniques
to train their machine learning models without consolidating all data in a centralized
location.
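The local-update-then-average loop can be sketched for a one-parameter model fitted by gradient descent, with hypothetical per-device datasets and a plain (unweighted) FedAvg-style average.

```python
def local_update(w, data, eta=0.1, steps=10):
    """One device: a few gradient steps on squared error; data stays local."""
    for _ in range(steps):
        grad = sum(2 * (w - x) for x in data) / len(data)
        w -= eta * grad
    return w

def federated_round(w_global, device_datasets):
    """Server collects the locally updated weights and averages them."""
    local_weights = [local_update(w_global, d) for d in device_datasets]
    return sum(local_weights) / len(local_weights)

datasets = [[1.0, 2.0], [3.0], [2.0, 4.0]]  # hypothetical private per-device data
w = 0.0
for _ in range(20):
    w = federated_round(w, datasets)
# w converges toward 2.5, the average of the devices' local optima
```

Only the scalar weight leaves each device; the raw data lists never do.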
Let us explore a few of the cryptographic techniques used to preserve user privacy
in recommendation systems:
Homomorphic encryption (HE) constitutes a cryptographic system that enables
mathematical operations to be performed on encrypted data without revealing the
actual values, resulting in a ciphertext that, when decrypted, yields the correct
result [18, 28, 29]. Homomorphic encryption can fall into different categories: fully
homomorphic encryption (FHE), somewhat homomorphic encryption (SWHE), or
partially homomorphic encryption (PHE) [24–27].
Mathematically:
• Let m1 and m2 be plaintext messages.
• Let E(m) represent the encryption of message m.
• Let ⊕ denote an operation (e.g., addition or multiplication) on ciphertexts.
• Let D(E(m)) represent the decryption of the encrypted message E(m).
Then, for FHE, SWHE, and PHE:
FHE allows both addition (⊕) and multiplication (⊗) operations on encrypted
data:

D(E(m1) ⊗ E(m2)) = m1 ∗ m2. (4)
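Property (4) can be checked concretely with a partially homomorphic scheme: textbook RSA is homomorphic under multiplication. The toy key sizes below are for illustration only, and unpadded RSA is insecure in practice.

```python
# Toy RSA keypair: n = 61 * 53, e public, d private (insecure demo sizes).
n, e, d = 3233, 17, 2753

def E(m):  # encrypt: c = m^e mod n
    return pow(m, e, n)

def D(c):  # decrypt: m = c^d mod n
    return pow(c, d, n)

m1, m2 = 7, 11
product_cipher = (E(m1) * E(m2)) % n   # multiply ciphertexts only
assert D(product_cipher) == m1 * m2    # decrypts to m1 * m2 = 77
```

The server never sees 7 or 11, yet the decrypted result equals their product, which is exactly the homomorphic property exploited in privacy-preserving recommendation.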
original secret. This technique is handy for maintaining data privacy and enabling
secure collaborative operations.
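A minimal sketch of the reconstruction idea uses additive secret sharing over a modulus (chosen here for brevity; schemes such as Shamir’s use polynomial interpolation instead).

```python
import random

MOD = 2**31 - 1  # toy prime modulus

def split(secret, n, rng):
    """Split secret into n additive shares that sum to it mod MOD."""
    shares = [rng.randrange(MOD) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

rng = random.Random(3)
shares = split(12345, 4, rng)
assert reconstruct(shares) == 12345  # all shares together recover the secret
```

Any strict subset of shares is statistically independent of the secret, which is what enables secure collaborative computation.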
Attribute-based encryption (ABE) is a form of encryption that distinguishes each
user through a set of attributes. Functions of these attributes are used to decide
the user’s capability to decrypt a ciphertext. In this approach, a user’s private key
and the ciphertext are both influenced by attributes. For successful decryption, the
attributes associated with the user and those tied to the ciphertext need to match.
ABE finds application in cryptographic systems for access control and the nuanced
sharing of encrypted data. ABE is categorized into two types: key-policy ABE and
ciphertext-policy ABE [22].
In essence, ABE tailors access to encrypted data based on specific attributes,
enhancing the control and granularity of data sharing while maintaining crypto-
graphic security.
Zero-knowledge proof (ZKP) serves as a technique for verifying the authenticity of
entities. This protocol allows proving certain statements without revealing anything
beyond the accuracy of those statements. It involves two participants: a prover aiming
to demonstrate a statement’s validity and a verifier seeking to authenticate the state-
ment in a specific manner. ZKP is a type of interactive proof in which the verifier’s
view can be efficiently simulated without interaction with an honest prover. The core concept behind
zero-knowledge proofs is to compel a user to demonstrate compliance with a specified
protocol, promoting honest behavior while preserving privacy [23].
Consider an interactive proof system denoted as ⟨P|V ⟩ for a language L and an
input x. The “view” of the verifier V regarding input x encompasses all messages
transmitted from prover P to verifier V and all the random bits utilized by V
throughout the protocol’s execution on input x. This is represented as viewV (x).
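A classic instance of this idea is a Schnorr-style proof of knowledge of a discrete logarithm, sketched below with toy parameters (p = 23, subgroup order q = 11; real deployments use much larger primes).

```python
import random

# Public parameters: subgroup of prime order q = 11 in Z_23*, generator g = 2.
p, q, g = 23, 11, 2
x = 7                 # prover's secret exponent
y = pow(g, x, p)      # public key: y = g^x mod p

def prove_and_verify(claimed_secret, rng):
    """One round: prover commits, verifier challenges, prover responds."""
    r = rng.randrange(q)              # prover's random commitment exponent
    t = pow(g, r, p)                  # commitment t = g^r
    c = 1 + rng.randrange(q - 1)      # verifier's nonzero challenge
    s = (r + c * claimed_secret) % q  # response; r masks the secret
    return pow(g, s, p) == (t * pow(y, c, p)) % p  # check g^s = t * y^c

rng = random.Random(1)
honest = prove_and_verify(7, rng)   # prover really knows x: check passes
cheater = prove_and_verify(3, rng)  # wrong secret: check fails
```

The transcript (t, c, s) reveals nothing about x beyond the fact that the prover knows it, since r masks the secret in every response.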
Non-cryptographic privacy techniques are generally more scalable and involve lower
computational and communication costs, but their security is more difficult to estab-
lish. This is because their effectiveness hinges on the randomness and anonymity
of data [35], factors that could potentially be exploited by sophisticated inference
attacks capable of re-identifying individuals within a dataset. Another significant
challenge is finding the right balance between maintaining privacy and preserving
data utility. The introduction of noise into data, while enhancing privacy, can also
result in the loss of critical information that contributes to generating more precise
recommendations. Thus, there exists a necessity for a meticulous and thoughtful
selection of a threshold that harmonizes privacy and data utility, avoiding the pitfall
of sacrificing one for the other.
Using cryptographic-based privacy techniques is a secure approach to safe-
guarding user privacy within recommendation systems. These techniques effectively
manage privacy concerns without sacrificing accuracy [19]. However, it is impor-
tant to note that cryptographic methods tend to require more computational power
and communication resources. This can pose challenges, especially on devices with
limited capabilities like handheld devices.
To make cryptographic techniques practical for privacy protection, it is bene-
ficial to adopt lightweight cryptographic schemes that reduce computational and
communication overhead. An avenue worth exploring is integrating machine learning
technology.
Despite the various advantages cryptographic techniques offer in terms of
privacy and accurate recommendations, scalability remains a challenge. Recom-
mendation systems that implement cryptographic-based approaches may struggle
with handling massive volumes of data while maintaining privacy, particularly in
fast-paced online settings. The scalability issue of managing extensive data while
preserving privacy within a short response time is a concern that requires further
exploration.
The discourse surrounding the potential risks of personal data exposure through
recommender systems naturally steers us toward the "defender" stance—that is, how
can recommenders uphold user privacy while maintaining recommendation quality?
This review delves into three distinct categories of approaches that can effectively
tackle privacy concerns within recommender systems:
Architectural and Platform Solutions
This first category encompasses architectures, platforms, and standards designed to
minimize the threat of data leakage. These measures include various protocols and
certificates that assure users of adherence to privacy-preserving practices. By doing
so, these approaches restrict external parties’ capacity to access or infer unauthorized
data. Additionally, distributed architectures are included in this category, as they
eliminate the vulnerability associated with centralized recommenders.
Algorithmic Techniques for Data Protection
The second category revolves around algorithmic techniques that safeguard data.
Within this category, various methods can be categorized. Some involve modifying
data, either by anonymizing user identities or transforming rating data. Others exploit
differential privacy frameworks or cryptographic tools for data protection. The core
concept underlying these techniques is that even if personal data leaks, adversaries
would only possess modified or encrypted information, rendering recovery of the
original data difficult.
6 Conclusion
• The trade-off between privacy and accuracy: Sometimes, maintaining privacy can
affect accuracy, obscuring useful information for recommendations.
• The potential for discrimination: Privacy-preserving techniques may introduce
bias by concealing information about specific group members.
• The need for transparency: Enhancing transparency about user data usage
fosters user trust and encourages the sharing of accurate information, ultimately
improving personalized recommendations.
References
1. Esteban A, Zafra A, Romero C (2020) Helping university students to choose elective courses by
using a hybrid multi-criteria recommendation system with genetic optimization. Knowl-Based
Syst 194:105385
2. Mondal S, Basu A, Mukherjee N (2020) Building a trust-based doctor recommendation system
on top of a multilayer graph database. J Biomed Inform 110:103549
3. Dhelim S, Ning H, Aung N, Huang R, Ma J (2021) Personality-aware product recommendation
system based on user interests mining and meta path discovery. IEEE Trans Computer Soc Syst.
8:86–98
4. Bhalse N, Thakur R (2021) Algorithm for movie recommendation system using collaborative
filtering. Mater Today: Proc. https://doi.org/10.1016/j.matpr.2021.01.235
5. Ke G, Du HL, Chen YC (2021) Cross-platform dynamic goods recommendation system based
on reinforcement learning and social networks. Appl Soft Comput 104:107
6. Mohallick I, Özgöbek Ö (2017) Exploring privacy concerns in news recommendation systems.
In: Proceedings of the international conference on web intelligence (WI’17). ACM, New York,
pp 1054–1061
7. Mehmood A, Natgunanathan I, Xiang Y, Hua G, Guo S (2016) Protection of big data privacy.
IEEE Access 4:1821–1834
8. Isinkaye FO, Folajimi YO, Ojokoh BA (2015) Recommendation systems: principles, methods,
and evaluation. Egypt Inform J 16:261–273
9. Tang Q, Wang J (2016) Privacy-preserving friendship-based recommendation systems. IEEE
Trans Dependable Secure Comput 5971:1
10. Huang W, Liu B, Tang H (2019) Privacy protection for recommendation system: a survey. J
Phys Conf Ser 1325:012087
11. Al-Nazzawi TS, Alotaibi RM, Hamza N (2018) Toward privacy protection for location-based
recommendation systems: a survey of the state-of-the-art. In: The 1st IEEE international
conference on computer applications & information security (ICCAIS), pp 1–7
12. Saleem Y, Rehmani MH, Crespi N, Minerva R (2021) Parking recommender system privacy
preservation through anonymization and differential privacy. Eng. Rep. 3(2):12297
13. Luo Z, Chen S, Li A (2013) A distributed anonymization scheme for privacy-preserving recom-
mendation systems. IEEE 4th international conference on software engineering and service
science, pp 491–494
14. Machanavajjhala A, Gehrke J, Kifer D, Venkitasubramaniam M (2006) L-diversity: privacy
beyond k-anonymity. Proc Int Conf Data Eng, p 24
15. Li N, Li T, Venkatasubramanian S (2007) T-closeness: privacy beyond k-anonymity and l-
diversity. In: Paper presented at: proceedings of the IEEE 23rd international conference on
data engineering, pp 106–115
16. Ogunseyi T, Avoussoukpo C, Jiang Y (2021) A systematic review of privacy techniques in
recommendation systems. Int J Inform Secur 1–14. https://doi.org/10.1007/s10207-023-00710-1
17. Li Q, Wen Z, Wu Z, Hu S, Wang N, Li Y et al (2021) A survey on federated learning systems:
vision, hype, and reality for data privacy and protection. IEEE Trans Knowl Data Eng
18. Zhang C, Xie Y, Bai H, Yu B, Li W, Gao Y (2021) A survey on federated learning. Knowl
Based Syst 216:106775
19. Elnabarawy I, Jiang W, Wunsch DC (2020) Survey of privacy-preserving collaborative filtering.
arXiv preprint. arXiv:2003.08343
20. Yousuf H, Lahzi M, Salloum SA, Shaalan K (2021) Systematic review on fully homomorphic
encryption scheme and its application. Recent Adv Intell Syst Smart Appl 537–551
21. Harn L, Hsu C, Zhang M, He T, Zhang M (2016) Realizing secret sharing with general access
structure. Inf Sci 367:209–220
22. Zhang Y, Deng RH, Xu S, Sun J, Li Q, Zheng D (2020) Attribute-based encryption for cloud
computing access control: a survey. ACM Comput Surv 53(4):1–41
23. Bouland A, Chen L, Holden D, Thaler J, Vasudevan PN (2017) On the power of statistical
zero-knowledge. Annu Symp Found Comput Sci Proc 140:708–719
24. Zhang M, Chen Y, Lin J (2021) A privacy-preserving optimization of neighborhood-based
recommendation for medical-aided diagnosis and treatment. IEEE Internet of Things J
8(13):10830–10842
25. Cui L, Wang X, Gu T (2023) A generic data synthesis framework for privacy-preserving point-
of-interest recommender systems. ACM, ISBN 979-8-4007-0228-0/23/08, RACS’23, August
6–10
26. Wang Y, Ma W, Zhang M, Liu Y, Ma S (2023) A survey on the fairness of recommender
systems. ACM Trans Inf Syst 41(3), Article 52
27. Asad M, Shaukat S, Javanmardi E, Nakazato J, Tsukada M (2023) A comprehensive survey on
privacy-preserving techniques in federated recommendation systems. Appl Sci 13(10):6201
28. Amarsingh Feroz C, Lakshmi Narayanan K, Kannan A, Santhana Krishnan R, Harold Robinson
Y, Precila K (2022) Enhancement of data between devices in Wi-Fi networks using security
key. In: Majhi S, Prado RPD, Dasanapura Nanjundaiah C (eds) Distributed computing and
optimization techniques. Lecture Notes in Electrical Engineering, vol 903. Springer, Singapore.
https://doi.org/10.1007/978-981-19-2281-7_42
29. Peyvandi A, Majidi B, Peyvandi S, Patra JC (2022) Privacy-preserving federated learning for
scalable and high data quality computational-intelligence-as-a-service in Society 5.0. Multimed
Tools Appl 81:25029–25050
30. Ribeiro SL, Nakamura ET (2019) Privacy protection with pseudonymization and anonymiza-
tion in a health IoT system: results from Ocariot. In: Proceedings of the 2019 IEEE 19th
international conference on bioinformatics and bioengineering (BIBE), Athens, Greece, pp
904–908
31. Khalfoun B, Ben Mokhtar S, Bouchenak S, Nitu V (2021) EDEN: enforcing location privacy
through re-identification risk assessment: a federated learning approach. Proc ACM Interact
Mob Wearable Ubiquitous Technol 5:1–25
32. Choudhury A, Sun C, Dekker A, Dumontier M, van Soest J (2022) Privacy-preserving federated
data analysis: data sharing, protection, and bioethics in healthcare. In: Machine and deep
learning in oncology, medical physics and radiology. Springer, Cham, pp 135–172
33. Röhrig R (2021) A federated record linkage algorithm for secure medical data sharing. In:
Proceedings of the German medical data sciences: bringing data to life: proceedings of the joint
annual meeting of the German Association of Medical Informatics, Biometry and Epidemiology
(GMDS EV) and the Central European Network-International Biometric Society (CEN-IBS),
Berlin, Germany, 6–11; IOS Press, Amsterdam, vol 278, p 142
34. Pramod D (2023) Privacy-preserving techniques in recommender systems: state-of-the-art
review and future research agenda. Data Technol Appl 57(1):32–55. https://doi.org/10.1108/
DTA-02-2022-0083
35. Sanchez OR, Torre I, He Y, Knijnenburg BP (2020) A recommendation approach for user
privacy preferences in the fitness domain. User Model User-Adap Inter 30(3):513–565. https://
doi.org/10.1007/s11257-019-09246-3
36. Beg S, Anjum A, Ahmad M, Hussain S, Ahmad G, Khan S, Choo KKR (2021) A privacy-
preserving protocol for continuous and dynamic data collection in iot enabled mobile app
recommendation system (mars). J Netw Comput Appl 174:102874 https://doi.org/10.1016/j.
jnca.2020.102874
37. Liu X, Gao B, Suleiman B, You H, Ma Z, Liu Y, Anaissi A (2023) Privacy-preserving person-
alized fitness recommender system (P3FitRec): a multi-level deep learning approach. arXiv:
2203.12200v1[cs.AI]
Attribute-Based Encryption
for the Internet of Things: A Review
K. D. More (B)
Department of Computer Science, MVP’s K. T. H. M. College, Nashik, India
e-mail: kirtimore@kthmcollege.ac.in
D. Pramod
Department of Computer Studies, Symbiosis Centre for Information Technology, Symbiosis
International (Deemed) University, Pune, India
e-mail: dhanya@scit.edu
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 335
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_26
336 K. D. More and D. Pramod
1 Introduction
A revolutionary idea known as the Internet of Things (IoT) enables
computers, physical devices, or equipment to be connected to the Internet and
communicate, gather data, and interact with other systems. IoT includes a wide
spectrum of intelligent gadgets, from basic sensors and actuators to sophisticated
appliances and machinery. Automation, data-driven decision-making, and real-time
monitoring, in a variety of sectors, are made possible by these interconnected devices
collecting and exchanging data. The IoT ecosystem makes it possible for devices
to communicate and exchange data seamlessly, which improves productivity, auto-
mates processes, and makes everyday life and functioning of numerous industries
more convenient. The phrase “Internet of Things” (IoT) is taking over the word
“Internet,” promising yet another decade of astonishing developments. However, the
IoT is extending its reach to your car, office, home, and all the devices within them,
including utility meters, streetlights, sprinkler systems, bathroom scales, and even
walls. This interconnectedness will lead to various enhancements, such as adjusting
your home’s heating based on weather forecasts, automatically watering your garden
only when necessary, and providing immediate assistance while on the road. These
advancements aim to simplify our lives and optimize the use of natural resources [1].
The capabilities, usefulness, and influence of IoT solutions are significantly
shaped by the fundamental characteristics of the Internet of Things as depicted in
Fig. 1. Recognizing and focusing on these basic characteristics is important for
building effective and productive Internet of Things solutions across multiple
domains. These key features make it possible to build intelligent, networked systems
domains. These key features make it possible to build intelligent, networked systems
that improve productivity, allow for data-driven decision-making, and have favorable
effects on the environment, society, and economy.
Real-time data collection through connected devices, and its analysis, creates novel
opportunities for efficiency and an enhanced quality of life.
Despite the numerous benefits IoT offers, it also introduces significant security chal-
lenges due to its vast and diverse landscape of connected devices. Possible challenges
can be device vulnerabilities, insecure communication, data privacy, authentication
and authorization, firmware and software updates, Distributed Denial-of-Service
(DDoS) attacks, physical security, supply chain security, regulatory compliance,
lack of standardization, etc. Many IoT devices have inadequate security measures
along with limited resources, which make them vulnerable to attacks. Inadequate
encryption and authentication mechanisms during data transmission can lead to data
interception and manipulation. IoT devices often collect and process sensitive data,
making data privacy a major concern. Unauthorized access to such data can lead to
serious privacy breaches. Weak or non-existent authentication mechanisms can result
in unauthorized access to IoT devices or systems. Lack of timely updates and secu-
rity patches for IoT devices can leave them vulnerable to known threats. Distributed
Denial-of-Service (DDoS) attacks can be launched via large-scale botnets created
by compromising IoT devices, causing disruptions to networks and services. Addi-
tional security concerns are created by physical access to IoT devices, which can
result in sensitive data being altered or stolen. Backdoors or vulnerabilities in IoT
devices may result from supply chain vulnerabilities. IoT devices must adhere to
data protection laws and guidelines while collecting private information or sensitive
data. Security flaws may develop as a result of inconsistent security procedures and
standards or protocols among Internet of Things systems and devices. The Internet
of Things is transforming industries by enabling improved automation, surveillance,
and control, which eventually culminates in increased productivity and efficiency. It
can be a useful resource for studying the function of industrial markets using Internet
of Things [2]. A multifaceted strategy which includes end users, policymakers, devel-
opers, and manufacturers is necessary to address these security issues. To create a
safe and secure Internet of Things ecosystem, it is crucial to provide robust authen-
tication, encryption, secure communication protocols, access control mechanisms,
and regular security updates. As IoT evolves rapidly, addressing security issues will
remain a primary priority in order to realize this game-changing technology's full
potential.
Several recent surveys examine the security problems raised by the rapid growth of
IoT devices. They draw attention to the particular difficulties created by distributed
and heterogeneous IoT ecosystems, and conduct a thorough analysis of the security
techniques and approaches that have been presented to address Internet of Things
security challenges. This comprises secure communication protocols, data encryp-
tion, access control, and authentication. They address the benefits and drawbacks
tion, access control, and authentication. They address the benefits and drawbacks
of these methods and provide information on how well they are suited for various
IoT scenarios. The article also explores open research problems about IoT security,
highlighting areas that require more research. These lingering issues cover a variety
of subjects, including safe device onboarding, privacy-preserving systems, intrusion
detection, and assuring end-to-end security in IoT environments. The overview of
the IoT security architecture which currently exists, the issues that IoT security faces,
and potential solutions to these issues are addressed in [5]. The authors list several
risks that exist such as data breaches due to unauthorized access, and potential IoT
device manipulation, while identifying the challenges involved. Additionally, they
go through the threat of Distributed Denial-of-Service (DDoS) attacks, the absence
of established security protocols, and the challenge of maintaining security in devices
with limited resources. The study suggests the need for robust authentication mech-
anisms, secure communication protocols, and proper encryption techniques to safe-
guard data transmission and storage. The authors also advocate for continuous moni-
toring and updating of IoT devices' security measures and emphasize the importance
of educating both manufacturers and users about IoT security best practices [5].
Overall, access control is a foundational aspect of IoT security. By effec-
tively implementing access control mechanisms, IoT stakeholders can significantly
enhance the security and trustworthiness of their deployments, mitigating potential
risks and enabling the full potential of IoT technology.
Public key cryptography is one of the fundamental techniques for data security. The
sender encrypts the data using the recipient's public key, and the recipient decrypts it
using the corresponding private key. As long as the recipient keeps the private key
secret, the communication is bound to that specific user, ensuring that only the
intended recipient can decipher the message. However, applications such as social
networks and cloud storage enable communication among groups of people who share
similar interests or characteristics, where only users with certain characteristics
should be allowed to decode and read the message. Since the recipients' identities
cannot be known beforehand, traditional public key cryptography cannot be employed
directly [6]. Using the advanced cryptographic
approach known as Attribute-Based Encryption, owners of data can encrypt data
and specify access policies according to particular attributes. These attributes can
be related to user attributes, device properties, or any other relevant characteristics.
Through the implementation of ABE, fine-grained access control is made possible,
allowing only users or devices with attributes that meet the requirements of the access
policy to decrypt and access encrypted data. When used in the context of IoT, ABE
enables content owners to encrypt IoT data with access policies depending on the
attributes held by users or devices in the IoT ecosystem. This granular control over
data access is especially valuable in diverse and dynamic IoT environments, where
various entities may require access to specific data based on specific conditions.
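To make the idea of fine-grained, attribute-driven access concrete, the sketch below evaluates a nested AND/OR access policy against a user's attribute set. It is illustrative only: the attribute names and the policy format are invented, and a real ABE scheme enforces this check cryptographically rather than in application code.

```python
# Illustrative only: models how an ABE access policy over attributes gates
# decryption. Real ABE enforces this cryptographically; here it is a plain
# Python check. Attribute names and the policy tuple format are hypothetical.

def satisfies(user_attrs: set[str], policy) -> bool:
    """Evaluate a nested AND/OR policy tree against a user's attribute set."""
    op, *operands = policy
    if op == "ATTR":
        return operands[0] in user_attrs
    results = (satisfies(user_attrs, p) for p in operands)
    return all(results) if op == "AND" else any(results)

# Policy: (doctor AND cardiology) OR admin
policy = ("OR",
          ("AND", ("ATTR", "role:doctor"), ("ATTR", "dept:cardiology")),
          ("ATTR", "role:admin"))

print(satisfies({"role:doctor", "dept:cardiology"}, policy))  # True
print(satisfies({"role:doctor", "dept:oncology"}, policy))    # False
```

In a real CP-ABE deployment, a key holder whose attributes fail this policy simply cannot reconstruct the decryption key material; the policy is not a runtime check that could be bypassed.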
Given the substantial volume and significance of information hosted on online plat-
forms, there is a growing concern regarding the potential compromise of personal
data. This unease is exacerbated by a surge in recent cyber-attacks and the legal
pressures confronting such services. One potential solution to these challenges is the
adoption of data encryption, which would limit information loss in case of a breach.
However, encrypting data has drawbacks, particularly in terms of the user's ability to
selectively exchange encrypted data at a detailed level. This issue is exemplified by a
scenario where a user aims to grant decryption access for specific Internet traffic logs
to a party based on specific conditions. Current options either entail decrypting entries
individually or sharing the private decryption key, both of which have downsides. A
significant context where these challenges manifest is audit logs. The work of Sahai
and Waters [7] introduces an approach to mitigate this issue through Attribute-Based
Encryption. In an ABE system, users' keys and ciphertexts are associated with
descriptive attributes, and a key can decrypt a ciphertext only if their attributes
match. Sahai and Waters' cryptosystem enables decryption when a
certain number of overlapping attributes exist between a private key and ciphertext.
Limitations in expressibility appear to hinder its suitability for larger systems, while
useful for error-tolerant encryption with biometrics [8]. The article [9] discusses
Attribute-Based Encryption (ABE) schemes with constant-size ciphertexts. Tradi-
tional ABE schemes often produce ciphertexts that grow in size with the number of
attributes, which can be inefficient. The article explores advanced ABE techniques
that maintain constant ciphertext size, regardless of the number of attributes involved.
These schemes are particularly useful for applications where efficient and compact
data encryption is essential, such as secure data sharing in resource-constrained
environments or cloud computing. ABE has two important subtypes: Cipher-Policy
Attribute-Based Encryption (CP-ABE) and Key Policy Attribute-Based Encryption
(KP-ABE). In KP-ABE, each encrypted data piece or ciphertext is associated with a
set of descriptive attributes, while every user's private key is associated with an
access policy or structure specifying the attributes the key needs in order to decrypt
a given ciphertext. This access policy, referred to as the
“key policy,” defines the conditions under which a user can access specific encrypted
data. In CP-ABE, each ciphertext is associated with a policy that defines the attributes
required for a user’s private key to decrypt the ciphertext. Frequently, this policy is
referred to as a “cipher policy.” Users are given private keys that have been associated
with certain attributes, and they can only decrypt ciphertext if their attributes fulfill
the cipher policy of the ciphertext. In other words, encrypted data can be decrypted
and accessed by a user only if their attributes meet the cipher policy. The fundamental
difference between CP-ABE and other Attribute-Based Encryption methods such as
Key-Policy Attribute-Based Encryption lies in where the access policy is attached:
in CP-ABE the policy is bound to the ciphertext and attributes to the key, whereas in
KP-ABE the policy is bound to the key and attributes to the ciphertext.
(1) Setup() → (PK, MK): This randomized algorithm accepts only implicit security
parameters as input. Its outputs are the master key (MK) and the public
parameters (PK).
(2) Encryption (PK, m, AS) → CT: The inputs to the encryption algorithm are the
public parameters (PK), a message (m), and an access structure (AS) over a set
of user attributes. The algorithm encrypts the message m and creates a
ciphertext (CT) such that only users whose attributes satisfy the access structure
can decrypt m. AS is assumed to be contained implicitly in the ciphertext.
(3) Key Generation (MK, S) → D: The master key (MK) and the set of attributes
(S) that describe the key are the inputs to the key generation algorithm. Its
output is a private key (D).
(4) Decrypt (PK, CT, D) → m: The inputs to the decryption algorithm are the public
parameters (PK), the ciphertext (CT) with its access policy (AS), and the
private key (D) for attribute set (S). The algorithm decrypts the ciphertext and
outputs the message (m) if S satisfies AS [10].
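The four algorithms above can be mirrored in a toy Python sketch. It reproduces the Setup/Encryption/Key Generation/Decrypt interface but replaces the pairing-based cryptography with a plain attribute-superset check (an AND-only access structure), so it only illustrates the data flow between the algorithms; all names and values are hypothetical and the code is not secure.

```python
# Toy sketch of the four CP-ABE algorithms. NOT cryptographically secure:
# the pairing-based math is replaced by a plain attribute check, modeling an
# AND-only access structure as "key attributes must be a superset of AS".
import secrets

def setup():
    """Setup() -> (PK, MK): public parameters and master key."""
    mk = secrets.token_hex(16)
    pk = {"scheme": "toy-cp-abe"}
    return pk, mk

def encrypt(pk, message, access_structure):
    """Encrypt(PK, m, AS) -> CT; AS is embedded in the ciphertext."""
    return {"AS": frozenset(access_structure), "payload": message}

def keygen(mk, attributes):
    """KeyGen(MK, S) -> D: a private key for attribute set S."""
    return {"attrs": frozenset(attributes), "tag": secrets.token_hex(8)}

def decrypt(pk, ct, d):
    """Decrypt(PK, CT, D) -> m, only if S satisfies AS (here: superset)."""
    if ct["AS"] <= d["attrs"]:          # S satisfies AS
        return ct["payload"]
    raise PermissionError("attributes do not satisfy the access structure")

pk, mk = setup()
ct = encrypt(pk, "sensor reading 42", {"role:nurse", "ward:icu"})
key_ok = keygen(mk, {"role:nurse", "ward:icu", "shift:night"})
print(decrypt(pk, ct, key_ok))  # sensor reading 42
```

In a real scheme the access structure can be a richer boolean formula or threshold gate, and decryption fails cryptographically rather than via an explicit check.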
Moreover, ABE-based systems can revoke access to particular devices or specific
users by altering the values of the relevant attributes. This feature is critical in IoT
environments, where access privileges may need to be revoked due to changes in
user roles, device ownership, or security incidents [22]. The authors of [23]
effectively address the growing need for
secure and efficient data sharing in eHealth environments, where multiple parties
collaborate and share sensitive medical information. The paper introduces the cryp-
tographic framework CESCR as a solution to the challenges associated with ensuring
data confidentiality, access control, and user revocation.
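One common way to realize revocation by "altering the values of the relevant attributes" is attribute versioning: the authority tracks a version per attribute, and bumping a version invalidates keys issued under the old value. A minimal sketch with hypothetical names, assuming a trusted authority performs the validity check (real schemes push this into key-update or re-encryption material):

```python
# Sketch of revocation via attribute versioning. Illustrative only: in real
# ABE schemes the version change is enforced cryptographically (key update /
# ciphertext re-encryption), not by an application-level check like this one.

class AttributeAuthority:
    def __init__(self):
        self.versions = {}                      # attribute -> current version

    def issue_key(self, attrs):
        # A key records the version of each attribute at issuance time.
        return {a: self.versions.setdefault(a, 1) for a in attrs}

    def revoke(self, attr):
        # Bumping the version invalidates all keys holding the old version.
        self.versions[attr] = self.versions.get(attr, 1) + 1

    def key_valid_for(self, key, required_attrs):
        return all(key.get(a) == self.versions.get(a) for a in required_attrs)

aa = AttributeAuthority()
key = aa.issue_key({"role:technician"})
print(aa.key_valid_for(key, {"role:technician"}))  # True
aa.revoke("role:technician")
print(aa.key_valid_for(key, {"role:technician"}))  # False
```

Users whose attributes still hold simply receive keys under the new version, so revocation targets only the affected devices or users.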
Secure Data Sharing and Collaboration: In IoT ecosystems, data sharing and
collaboration among devices and users are common. ABE facilitates secure data
sharing by allowing data to be encrypted with access policies based on the required
attributes, ensuring that only authorized entities can share and access the data.
The CP-ABE system designed by the authors of [13] for secure data sharing
within smart cities allows data owners to define access policies based on attributes,
ensuring that only authorized users who satisfy the attributes can decrypt and
access the shared data. The system addresses privacy concerns in
smart city environments by allowing data sharing without exposing sensitive infor-
mation about the data owner or users. The authors introduce a practical solution
that enhances security and confidentiality in the context of data sharing within smart
cities [13]. The key contribution of [24] employs the technique of hidden
policies, which further enhances privacy by concealing the specific access policies
associated with the shared data. This approach aims to ensure secure data sharing
while protecting sensitive information about access rights and policies within the
smart grid infrastructure [24]. The paper [23] introduces a novel Ciphertext-Policy
Attribute-Based Encryption (CP-ABE) scheme named CESCR, designed for effi-
cient and secure data sharing in collaborative eHealth environments. CESCR supports
attribute-based access control, allowing authorized users to decrypt shared data based
on their attributes. Notably, the scheme also incorporates revocation mechanisms to
manage changes in user access rights over time. Unlike some prior schemes, CESCR
eliminates the need for dummy attributes, enhancing efficiency in the decryption
process while maintaining strong security measures for collaborative data sharing in
the eHealth sector [23]. The authors [25] discuss various Attribute-Based Encryp-
tion techniques and their applicability to industrial contexts, where sensitive data
needs to be shared among authorized users while maintaining confidentiality and
access control. The paper surveys different methodologies and technologies that can
be employed to establish secure data sharing frameworks in the industrial domain,
contributing to the development of robust data protection mechanisms for industrial
applications [25].
Scalability and Resource Efficiency: ABE can be appropriate for situations
where devices operate with restricted memory and processing capabilities because it
can be implemented in IoT devices with limited resources [26]. Point-to-multipoint
communication plays an important role in cloud computing contexts, and the authors
[27] deal with the difficulties of access control and data confidentiality in a scattered
and scalable cloud environment. The novel ABE system introduced particularly for
point-to-multipoint communication is the paper’s main contribution. Enhancing the
security and scalability of IoT systems is the goal of the work in [19]. The authors
address the issue of protecting Internet of Things (IoT) content while taking into
account the resource limitations of IoT devices by merging lightweight crypto-
graphic approaches and ABE. Table 1 summarizes the literature on how ABE's
computational and storage overhead affects constrained devices, and on the
practicality of the proposed security models in real-world IoT scenarios.
Secure Device Management: Secure device management in IoT contexts can be
accomplished with ABE. Access to device management operations can be regulated
by associating attributes to devices, making sure that only authorized individuals
or systems can handle IoT devices safely. An innovative Attribute-Based Encryp-
tion (ABE) system is presented in article [33] with the goal of improving access
control in a blockchain-based IoT context. The authors identified the difficulties
in implementing effective and secure access control in the context of IoT devices
integrated by a blockchain architecture. They suggest an ABE-based approach that
incorporates blockchain technology to control access rights to deal with this. They
have developed a system that enables data owners to specify access controls based
on attributes, ensuring that only authorized users with appropriate attributes can
decrypt and access the information. The suggested approach intends to improve
the secrecy and integrity of Internet of Things content while retaining effective access
control by utilizing the safety features of ABE and the distributed architecture of
blockchain. The study offers a potential method for enhancing the data integration
between ABE and blockchain for the safety of IoT devices [33].
Adaptability to Changing Environments: Dynamic IoT ecosystems experi-
ence frequent changes, such as devices joining or leaving the network and users'
attributes evolving over time. ABE provides adaptability to the changing environ-
ment by allowing access policies to be modified dynamically. This ensures that access
privileges remain up-to-date and responsive to changes in the IoT environment [13].
The Internet of Things (IoT) presents significant access control and data privacy
challenges due to the variety of IoT devices and heterogeneous distributed networks.
For the IoT environment, many security architectures and models have recently been
presented. To actively defend against new breaches and insider attacks, authors in [34]
suggested a novel data-centric security technique called Ciphertext-Policy Attribute-
based Encryption (CP-ABER-LWE) which protects data at rest as well as data in
transit.
Table 1 (continued)

[31] Proposed work: Constrained devices' memory may not be enough to store
CP-ABE decryption keys, and in CP-ABE the key size depends on the number of
attributes. The article proposes a CP-ABE technique with an AND-gates access
structure that is proven secure; the analysis demonstrates that the decryption key
can be stored on lightweight devices.
Technique: A CP-ABE technique with an AND-gates access structure that provides
constant-size decryption keys (672 bits at 80-bit security) storable on lightweight
devices.

[32] Proposed work: The computational burden on the host and the user is
minimized by a model in which end-to-end cryptographic operations are
outsourced; the scheme is implemented for constrained devices such as mobile
devices.
Technique: Two semi-trusted proxies are used, one for outsourcing
computationally demanding encryption and the other for outsourcing decryption.
only authorized parties can decrypt and access the information, showcasing its
resilience to potential threats.
3. Industrial IoT (IIoT): In industrial IoT applications, ABE can be applied to
protect intellectual property, sensitive production data, and access to critical
industrial control systems. Authors [39] address the challenges of secure data
access control in cloud-based industrial Internet of Things (IIoT) environments.
Recognizing the critical need for data security in these contexts, the authors
propose a novel solution that focuses on two key aspects: auditability and time-
limited access control. The proposed solution offers benefits such as enhanced
security, accountability, and controlled data sharing. Recognizing the imperative
of safeguarding sensitive information while enabling efficient data exchange, the
authors [13] explore attribute-based approaches as a solution. These approaches
leverage attributes to define access policies, granting permissions based on
specific criteria. By addressing challenges like access management in complex
industrial networks, the article contributes to the field’s understanding of secure
data sharing.
4. Smart Cities: ABE is relevant for securing data sharing in smart cities among
various stakeholders, including government authorities, service providers, and
citizens. ABE offers safe data sharing by encrypting content with access poli-
cies based on significant attributes, guaranteeing that only authorized systems or
users can access important city information [40]. The authors in [25] put forward an
innovative strategy that makes use of ABE to protect data privacy while facilitating
effective sharing. They highlight the benefits of their strategy, such
as improved privacy, granular access control, and customized data sharing. The
authors emphasize the importance of their research in securing users' privacy and
enabling data-driven developments in urban areas. The authors [41] give a thor-
ough survey that explores numerous facets of smart city technologies in recogni-
tion of the growing importance of smart city projects. In order to construct smart
cities, various protocols for communication and architectural frameworks are
methodically explored in this article. It offers insights on the development of
edge-centric, distributed, and centralized smart city architectures, highlighting
their advantages and disadvantages. The writers also discuss communication
protocols like IoT, LoRaWAN, and 5G, emphasizing how they help with
the effective interchange of content or information in scenarios like smart cities.
5. Vehicular IoT (VIoT): ABE can be used in vehicular IoT to secure information
transfer and access control in vehicles that are connected. Access control based
on vehicle features can be used to encrypt vehicle-related data, assuring that
only authorized cars, other vehicles, or authorities have access to particular data,
like traffic data or records of maintenance [42]. For Vehicular Ad Hoc Networks
(VANETs), a novel framework for guaranteeing safe communication is presented
in [43]. The suggested system uses Attribute-Based Encryption to create safe
channels for exchanges among infrastructure, vehicles, and other systems. The
system imposes access policies that allow authorized users to decode and view the
sent information by associating attributes to users. While applying ABE, access
to vehicle information can be restricted using factors like vehicle ID, permitted
systems have limitations due to the absence of customized options and oversight
of the skills and credibility of teachers. The article [48] suggests an attribute-based
recommendation system for education services that protects privacy. This scheme
allows users to set customized requirements by using attribute-based searchable
encryption for keyword searches and fine-grained access control. Teachers and the
attribute authority cooperate to generate keys anonymously, which guarantees the
security of teachers' keys. The education platform can select the best
teacher for a task. Education sector systems where Attribute-Based Encryption
could be applied include access control for learning platforms, secure data sharing,
personalized learning platforms, student information system security, and more.
The effectiveness of ABE security models in IoT depends on factors like the
chosen ABE scheme’s cryptographic strength, implementation quality, and the
specific IoT use case. Additionally, other factors like computational efficiency,
resource constraints in IoT devices, and the ability to handle dynamic attribute
changes also impact their overall effectiveness.
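Since computational overhead on constrained devices is one of the deciding factors, a simple micro-benchmark can help compare candidate schemes. The sketch below times a placeholder `toy_encrypt` workload over repeated runs; the workload is a stand-in invented here, and a real ABE library call would be substituted to measure actual cost as the attribute count grows.

```python
# Hypothetical micro-benchmark sketch for weighing ABE overhead on a
# constrained device: time a candidate encrypt function over repeated runs.
# `toy_encrypt` is a placeholder workload, not a real ABE scheme.
import hashlib
import time

def toy_encrypt(message: bytes, attributes: list[str]) -> bytes:
    # Placeholder: hash the message together with each attribute in turn,
    # so the cost scales with the number of attributes (as in many ABE schemes).
    digest = message
    for attr in attributes:
        digest = hashlib.sha256(digest + attr.encode()).digest()
    return digest

def mean_latency_ms(fn, runs: int = 100) -> float:
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) * 1000 / runs

attrs = [f"attr{i}" for i in range(10)]
ms = mean_latency_ms(lambda: toy_encrypt(b"sensor-payload", attrs))
print(f"avg encryption latency: {ms:.4f} ms for {len(attrs)} attributes")
```

Running such a harness at several attribute counts makes the linear (or constant, for schemes like [31]) growth of cost directly visible before committing to a library.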
To date, several Attribute-Based Encryption (ABE) libraries, frameworks, and tools
have been developed, some of which are suitable for IoT environments. Table 3 below
is an overview of some existing ABE libraries, frameworks, and tools, along with
their source references.
Cryptographic frameworks and libraries are evolving continuously. The ABE
strategy needed depends on the context of your use case: the programming language
used in your IoT environment, the resource limitations of IoT devices, performance
overhead, and community support should all be taken into account when choosing
an ABE implementation. Furthermore, stay watchful of security issues and ensure
that the selected implementation meets the desired security properties for your IoT
environment.
Table 3 Overview of existing ABE libraries, frameworks, or tools suitable for IoT environments

Charm-Crypto (https://github.com/JHUISI/charm): Easy-to-use, Python-based;
supports multiple ABE schemes like CP-ABE and KP-ABE. Cons: may have
performance overhead for resource-constrained IoT devices.

ABY framework (https://github.com/encryptogroup/ABY): Suitable for
privacy-preserving computations in IoT scenarios. Cons: may have higher
computation and communication overhead.

OpenABE (https://github.com/zeutro/openabe): Open-source C++ library; supports
CP-ABE, KP-ABE, and Ciphertext-Policy Hierarchical ABE (CP-HABE); actively
maintained. Cons: may face challenges in resource-constrained IoT environments.

CryptID (https://github.com/cryptid-org): Supports different ABE schemes;
efficient, written in C. Cons: may require more effort to integrate with IoT
applications.
References
1. Hersent O, Boswarthick D, Elloumi O (2011) The internet of things: key applications and protocols.
https://doi.org/10.1002/9781119958352
2. Perera C, Liu CH, Jayawardena S, Chen M (2015) A survey on the internet of things from an
industrial market perspective. IEEE Access 2:1660–1679. https://doi.org/10.1109/ACCESS.
2015.2389854
3. Nasiraee H, Ashouri-Talouki M (2020) Anonymous decentralized attribute-based access
control for cloud-assisted IoT. Futur Gener Comput Syst 110:45–56. https://doi.org/10.1016/
j.future.2020.04.011
4. Görmüş S, Aydın H, Ulutaş G (2018) Security for the internet of things: a survey of existing
mechanisms, protocols and open research issues. J Fac Eng Architect Gazi Univ 33(4):1247–
1272
5. Mahmoud R, Yousuf T, Aloul F, Zualkernan I (2016) Internet of things (IoT) security: current
status, challenges and prospective measures. In: 2015 10th international conference for internet
technology and secured transactions (ICITST 2015), pp 336–341. https://doi.org/10.1109/ICITST.
2015.7412116
6. Han Y (2019) Attribute-based encryption with adaptive policy. Soft Comput 23:4009–4017.
https://doi.org/10.1007/s00500-018-3370-z
7. Sahai A, Waters B (2005) Fuzzy identity based encryption. In: Advances in cryptology—
Eurocrypt, vol 3494 of LNCS, Springer, pp 457–473
8. Vipul G, Pandey O, Sahai A, Waters B (2006) Attribute-based encryption for fine-grained
access control of encrypted data. In: Proceedings of the ACM conference on computer and
communications security, pp 89–98. https://doi.org/10.1145/1180405.1180418
27. Tamizharasi GS, Balamurugan B, Aarthy SL (2016) Scalable and efficient attribute based
encryption scheme for point to multipoint communication in cloud computing. In: Proceedings
of the international conference on inventive computation technologies, ICICT 2016, pp 1.
https://doi.org/10.1109/INVENTIVE.2016.7823292
28. Taha MB, Talhi C, Ould-Slimane H (2019) Performance evaluation of cp-abe schemes under
constrained devices. Procedia Comput Sci 155:425–432. https://doi.org/10.1016/j.procs.2019.
08.059
29. Ambrosin M, Anzanpour A, Conti M, Dargahi T, Moosavi SR, Rahmani AM, Liljeberg P
(2016) On the feasibility of attribute-based encryption on internet of things devices. IEEE
Micro 36(6):25–35. https://doi.org/10.1109/MM.2016.101
30. Wang X, Zhang J, Schooler EM, Ion M (2014) Performance evaluation of attribute-based
encryption: toward data privacy in the IoT. In: 2014 IEEE international conference
on communications, ICC 2014, pp 725–730. https://doi.org/10.1109/ICC.2014.6883405
31. Guo F, Mu Y, Susilo W, Wong DS, Varadharajan V (2014) CP-ABE with constant-size keys for
lightweight devices. IEEE Trans Inf Forensics Secur 9(5):763–771. https://doi.org/10.1109/
TIFS.2014.2309858
32. Asim M, Milan P, Tanya I (2014) Attribute-based encryption with encryption and decryption
outsourcing. In: Proceedings of 12th Australian information security management conference,
AISM 2014, pp 21–28. https://doi.org/10.4225/75/57b65cc3343d0
33. Zhang J, Xin Y, Gao Y, Lei X, Yang Y (2021) Secure ABE scheme for access management in
blockchain-based IoT. IEEE Access 9:54840–54849. https://doi.org/10.1109/ACCESS.2021.
3071031
34. Fun TS, Samsudin A (2017) Attribute based encryption—a data centric approach for securing
A Short Survey Work for Lung Cancer Diagnosis Model: Algorithms Utilized, Challenging Issues, and Future Research Trends
Abstract Lung cancer is generally identified in suspected individuals because of the systemic side effects of the presence of a tumor or because of abnormal results obtained from chest radiography. The diagnosis approach can be altered on the basis of the lung cancer type. Factors like the location, the presence of metastasis, the size of the tumor, and the cancer type all influence the overall diagnosis and detection process. Staging lung cancers into their types is an efficient step in detecting them. The best approaches for detecting lung cancer are utilized in diagnosis procedures in order to enhance the disease detection sensitivity and to avoid unnecessary invasive techniques. Since lung cancer causes more deaths in both men and women than other cancers, an efficient diagnosis method for detecting it has to be known. To overcome the limitations of conventional lung cancer detection and diagnosis frameworks, a detailed review of traditional lung cancer diagnosis systems is carried out in this work. In the primary section, the basic steps and procedures involved in conventional lung cancer detection and diagnosis frameworks are provided. Following this, a detailed survey of conventional lung cancer diagnosis systems is given. A short chronological evaluation is then performed to trace the timeline of lung cancer diagnosis systems. Using the literature survey, the methodologies used in conventional works are identified and grouped. Subsequently, the datasets adopted for training and testing these systems are studied. The performance measures used in analyzing the classical lung cancer diagnosis systems are then investigated. The limitations and advantages of conventional lung cancer diagnosis systems are then classified. Finally, the research gaps and futuristic directions are given.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 359
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_27
360 N. Shaikh and P. Shah
1 Introduction
Lung cancer is the leading cause of cancer-related deaths in individuals, according to the World Cancer Report [1]. When compared to all other cancer forms, lung cancer is different since it has the highest fatality and incidence rates [2]. Sadly, the detection and diagnosis of lung cancer are usually made at a very late stage, which has an impact on the effectiveness of the treatment provided. The proportion of lung cancer patients who survive five years following diagnosis is low, and this percentage could rise to 49% in cases where early detection and diagnosis are carried out. Discovering pulmonary nodules in the early stages and permitting early treatment can increase the likelihood of survival of the affected individual [3]. "Computed tomography (CT)" is the screening technique most frequently utilized for detecting lung cancer [4]. The CT scan is thought to be a more sensitive imaging technique for finding lung cancer than other methods. When compared to other techniques, CT has a higher sensitivity than chest X-rays and is substantially cheaper than "Magnetic Resonance Imaging (MRI)" and "Positron Emission Tomography (PET)" [5]. CT scans work on the principle of combining X-ray images taken from different directions, giving a clear 3D view of the internal structures of the lungs and making it easier to determine a pulmonary nodule's size, location, shape, and volume. However, because of factors like dose-effectiveness, the capacity to detect previously undetected pathologic abnormalities, cost-effectiveness, and ease of clinical availability, chest radiography continues to be the most widely utilized imaging modality for chest disorders [6].
Radiologists are overburdened by the volume of images they must examine as a result of rapid screening by CT [7]. In order to automate the initial screening of pulmonary nodules by identifying dangerous lesions, "computer-aided detection (CAD)" systems have been developed [8]. These allow radiologists to distinguish probable abnormalities, which can significantly increase the accuracy of pulmonary nodule identification while effectively reducing the burden on radiologists [9]. The existence of pulmonary nodules within the ribcage can thus be readily identified by radiologists. However, a trustworthy CAD method that can quickly and precisely automate the detection of pulmonary nodules is required [10]. Factors such as the segmentation region, local features, spatial and space-oriented features, as well as positions, are crucial for the efficient operation of a CAD system [11]. Even though CAD solutions are designed to support radiologists in the precise identification of pulmonary nodules, these factors affect the overall performance of the system. Although the detection of lung cancer is made easier through a CAD system, the essential features of such a system are difficult to compute [12]. Enhanced data handling capacity and efficient computation have made deep learning techniques prominent and effective in medical image evaluation [13]. However, training with a limited amount of data and insufficient optimization of the network parameters can make such methods insensitive to pulmonary nodules, so that they detect fewer lung cancer cases than conventional methods.
The main contributions of this survey work are listed below.
• To review the deep learning-based lung cancer diagnosis framework that has
been developed in the timeline ranging from 2014 to 2023 and to compare their
performance with conventional works.
• To evaluate the various datasets and algorithms used by the existing works and to
categorize them accordingly.
• To analyze the limitations and merits of these existing works and to investigate
the performance measures used to validate them.
• To examine the research gaps in the existing models and to suggest possible future
directions in developing an efficient lung cancer diagnosis framework.
The outline of this survey is provided as follows. In the second section, a short literature survey on conventionally developed lung cancer diagnosis models, along with chronological and dataset analyses, is given. In the third section, the categorization of algorithms used in lung cancer diagnosis models and an analysis of their performance metrics are provided. In the fourth section, the research gaps and future works in lung cancer diagnosis models are discussed. Finally, the survey is summarized in the fifth section.
In 2017, Wang et al. [16] have suggested the fusion of hand-crafted and deep features for the accurate detection of lung cancer with a lower "false positive rate (FPR)." Experimentation with this suggested model on a public dataset showed that the accuracy, sensitivity, and specificity offered by this deep feature fusion-based model in aiding the CAD-based lung cancer detection system were high, with few false positives. The utilization of these deep fused features aided the CAD system in accurately detecting the pulmonary nodule from the provided image input.
In 2018, Gu et al. [17] have developed a detection system for identifying the
presence of pulmonary nodules using a “multi-scale three-dimensional convolu-
tional neural network (3D-CNN).” The CT images from the “Lung Nodule Analysis
2016 (LUNA16)” dataset were gathered. The segmentation of these lung images
was carried out using a comprehensive approach. The deep feature extraction was
done using the 3D-CNN model. The detection of the pulmonary nodule was done
using the cube clustering and prediction method. The nodules of even smaller sizes
were also detected by this multi-scale approach. The tenfold validation on the imple-
mented model suggested that the sensitivity and the “competition performance metric
(CPM)” of the implemented pulmonary nodule detection model were higher, and the
FPR of the detection outcomes was lower.
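The tenfold validation mentioned above partitions the data into ten folds and rotates the test fold. As a minimal, illustrative sketch (not the authors' implementation), the fold-splitting logic can be written as:

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Split sample indices into k roughly equal folds for cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]  # round-robin fold assignment

def cross_validate(n_samples, k=10):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = k_fold_indices(n_samples, k)
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Each sample appears in exactly one test fold across the k rounds, so every image contributes to both training and evaluation.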
In 2018, Zhang et al. [18] have implemented combined deep learning and classical techniques for detecting pulmonary nodules. The developed model was named NODULe. The presence of possible pulmonary nodules was detected through size and shape constraints with the aid of a "multi-scale Laplacian of Gaussian (MS-LoG)" filter. The genuine pulmonary nodules were then detected using the "3D Deep CNN (3D-DCNN)" model. The model was trained on the LUNA16 dataset. Experimentation proved that both the presence of a pulmonary nodule and its diameter were determined accurately using this approach, and the detection score obtained from this model was high.
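A Laplacian of Gaussian filter responds strongly to blob-like structures of a size matched to its scale, which is why applying it at multiple scales flags nodule candidates of varying diameters. The sketch below is an illustrative pure-NumPy 2D version of that idea, not the NODULe implementation:

```python
import numpy as np

def log_kernel(sigma, size=None):
    """Laplacian-of-Gaussian kernel; its negated response peaks on
    bright blobs of radius roughly sigma * sqrt(2)."""
    if size is None:
        size = int(6 * sigma) | 1          # odd width covering +/- 3 sigma
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    k = (r2 - 2 * sigma ** 2) / sigma ** 4 * np.exp(-r2 / (2 * sigma ** 2))
    return k - k.mean()                    # zero-mean: flat regions give 0

def multiscale_log_response(image, sigmas=(1.0, 2.0, 4.0)):
    """Max of scale-normalized, negated LoG responses over several scales,
    so bright nodule-like spots of different sizes all score highly."""
    best = np.full(image.shape, -np.inf)
    H, W = image.shape
    for s in sigmas:
        k = log_kernel(s)
        pad = k.shape[0] // 2
        padded = np.pad(image, pad, mode="reflect")
        resp = np.zeros((H, W), dtype=float)
        for i in range(H):                 # naive convolution, clarity over speed
            for j in range(W):
                resp[i, j] = -(padded[i:i + k.shape[0],
                                      j:j + k.shape[0]] * k).sum()
        best = np.maximum(best, s ** 2 * resp)
    return best
```

Thresholding the response map yields candidate locations that a downstream classifier (such as the 3D-DCNN above) would then confirm or reject.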
In 2019, Hoang et al. [19] have suggested a two-step lung cancer detection frame-
work using deep learning techniques where non-carcinoma area was eliminated in the
first step. The detection of lung cancer was carried out in the next step using another
deep learning classifier. On experimentation on a huge number of CT images, it was
confirmed that the errors in the detection of lung cancer in the lymph nodes were
minimized, along with enhanced specificity and sensitivity.
In 2019, Lakshmanaprabu et al. [20] have executed "linear discriminant analysis (LDA)" along with an "optimal deep neural network (ODNN)" for classifying lung cancer. The dimension of the features extracted from CT images was minimized using LDA. The classification of CT images as benign or malignant was done by the DNN model, in which the parameters were tuned with the aid of the "modified gravitational search algorithm (MGSA)." Experimental validation showed that the classification outcomes from the implemented model were highly accurate, sensitive, and specific.
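The LDA step above reduces the feature dimension before classification. A minimal two-class Fisher LDA sketch, written from the textbook definition rather than the authors' code, looks like:

```python
import numpy as np

def fisher_lda_direction(X, y):
    """Two-class Fisher LDA: find the single projection direction that
    best separates class means relative to within-class scatter."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter (sample covariances), regularized for invertibility
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    Sw += 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, m1 - m0)       # w = Sw^-1 (m1 - m0)
    return w / np.linalg.norm(w)

def lda_reduce(X, y):
    """Reduce an (n_samples, n_features) matrix to one discriminant score."""
    return X @ fisher_lda_direction(X, y)
```

The resulting one-dimensional scores would then be fed to the downstream classifier in place of the raw high-dimensional features.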
In 2020, Huang et al. [21] have executed a lung cancer diagnosis framework by
combining “extreme learning machine (ELM)” with “deep transfer CNN (DTCNN).”
The DTCNN was primarily utilized for mining the essential attributes from “First
Affiliated Hospital of Guangzhou Medical University in China (FAH-GMU)” and
“Lung Image Database Consortium and Image Database Resource Initiative (LIDC-
IDRI)” CT image datasets. The classification of malignant and benign was done by
the ELM. Experimental verification suggested that the performance offered by the
fusion model was higher than conventional approaches.
In 2020, Moitra and Mandal [22] have implemented a grading and staging method for "non-small cell lung cancer (NSCLC)" using a "one-dimensional CNN (1D-CNN)." Images from "The Cancer Imaging Archive (TCIA)" were gathered. The deep features were extracted using the "Maximally Stable Extremal Regions-Speeded-Up Robust Feature (MSER-SURF)" model. The concatenated "Tumor Node Metastasis (TNM)" staging information, along with the extracted features, was fed to the 1D-CNN system for lung cancer stage detection. Simulation outcomes showcased the enhanced accuracy and "area under the receiver operating characteristic curve (ROC-AUC)" scores of the implemented model.
In 2020, Li et al. [23] have suggested a lung cancer identification framework. The features from the "Japanese Society of Radiological Technology (JSRT)" dataset were extracted using a multi-resolution convolution model. The classification of lung cancer was carried out using a fusion network. The FPR, CPM, and AUC of the suggested fusion model were much better than those of existing lung cancer detection models.
In 2020, Toğaçar et al. [24] have executed a lung cancer detection framework by
utilizing distinct transfer learning approaches like VGG-16, LeNet, and AlexNet. The
features from the collected CT images were extracted using CNN. Image augmenta-
tion was adopted to enhance the detection rate. The extracted features were optimally
selected using the “minimum redundancy maximum relevance (mRMR)” method.
The features obtained were given to several machine learning classifiers. Out of
several models, the KNN model provided better classification accuracy in detecting
lung cancer from CT images.
In 2021, Silva et al. [25] investigated the correlation between various attributes in order to examine the "epidermal growth factor receptor (EGFR)" with the aid of 2D "regions of interest (ROI)" from CT images. The CT images were reconstructed using a convolutional autoencoder, and the features were extracted using the encoder unit. The examination of the EGFR was carried out by placing the classifier on top of the feature extractor. The suggested model attained a better prediction rate by identifying the essential biomarkers from the CT images.
In 2021, Sori et al. [26] have suggested a denoising-detection model for lung
cancer in an automatic manner. The preprocessing of gathered images was carried
out using “residual denoising network (DR-Net).” The preprocessed images were
provided as input to the two paths CNN. The concatenation of the global and the
local features was carried out by the two paths in the developed model, and the
detection was carried out using CNN. A discriminant correlation validation was
carried out, which proved that the suggested denoising-detection model was capable
of eliminating the noise from the images, used even for inconsistent nodules, and
balanced the overall receptive field.
In 2023, Pradhan et al. [27] have utilized the healthcare records of patients to predict lung cancer using a "recurrent neural network (RNN)." Two datasets were used to collect the CT image data. The features from the CT images were extracted using "t-distributed stochastic neighbor embedding (t-SNE)" and "principal component analysis (PCA)." The classification was done using the RNN, in which the parameters were tuned with the "Self-Adaptive Sea Lion optimization (SA-SLnO)" algorithm. The experimental results showed that the performance of the suggested SA-SLnO-RNN was much better than that of conventional approaches in classifying lung cancer, with a minimum mean square error (MSE).
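PCA, one of the two feature extractors named above, projects feature vectors onto the directions of greatest variance. A minimal from-scratch sketch via the SVD of the centered data (an illustration, not the authors' pipeline; the t-SNE step is omitted) is:

```python
import numpy as np

def pca_reduce(X, n_components=2):
    """Project (n_samples, n_features) rows onto the top principal
    components, returning the scores in component space."""
    Xc = X - X.mean(axis=0)                         # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T                 # rows of Vt = components
```

The first returned column always carries at least as much variance as the second, which is what makes truncating to a few components a sensible dimensionality reduction before the RNN classifier.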
In 2023, Navaneethakrishnan et al. [28] have generated a lung cancer diag-
nosis framework using “Bat Deer Hunting Optimization Algorithm-based DCNN
(BDHOA-DCNN).” Initially, the segmentation of the CT images was carried out.
Then classification of the nodule and the lobes was executed by the BDHOA-based
DCNN model. The performance provided by this BDHOA-based DCNN model was
higher than the conventional diagnosis systems.
In 2023, Tasmin et al. [29] resolved the interpretability issues of machine learning models for detecting lung cancer using the "explainable machine learning (XML)" technique. The "random oversampling (ROS)" technique was used to balance the classes in the gathered dataset. The prediction outcome was explained using the "SHapley Additive exPlanation (SHAP)" technique. An outstanding performance with higher transparency in detecting lung cancer was provided by the implemented XML. The efficiency of the developed XML was demonstrated by implementing a mobile application.
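Random oversampling (ROS) is conventionally a class-balancing step: minority-class samples are duplicated at random until every class matches the largest one. A minimal sketch of that balancing step (illustrative, with hypothetical sample/label lists; not the authors' code) is:

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    matches the size of the largest class (classic ROS balancing)."""
    rng = random.Random(seed)
    by_class = {}
    for s, l in zip(samples, labels):
        by_class.setdefault(l, []).append(s)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for l, group in by_class.items():
        extras = [rng.choice(group) for _ in range(target - len(group))]
        for s in group + extras:
            out_samples.append(s)
            out_labels.append(l)
    return out_samples, out_labels
```

Balancing before training matters for lung cancer data because cancerous cases are typically a small minority, and an unbalanced set biases the classifier toward the negative class.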
In 2023, Wankhade and Vigneshwari [30] implemented a lung cancer detection
framework using a “hybrid neural network (HNN).” The features from the CT images
gathered from the “LIDC-IDRI” dataset were extracted using the DNN model. The
diagnosis of lung cancer as malignant and benign was executed using the 3D-CNN
model. Accurate diagnosis of lung cancer at the beginning stage was possible with
this technique.
In 2023, Mithun et al. [31] have developed three models, namely "bidirectional long short-term memory (Bi-LSTM)," "Bi-LSTM with dropout," and "bidirectional encoder representations from transformers (BERT)," to detect lung cancer from clinical cohorts. The "Medical Information Mart for Intensive Care (MIMIC)-III" dataset was used for analyzing the working of the suggested models. With minor oversampling, the performance of all three models was enhanced.
In 2023, Ali et al. [32] utilized an "Ensemble 2D CNN (E-2D-CNN)" model to detect the presence of lung cancer from LUNA16 CT images. The gathered images were given as input to the E-2D-CNN, which was made by combining two or more CNN structures. The classification of the CT images into cancerous and non-cancerous types was done using this E-2D-CNN with high accuracy.
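One common way such an ensemble combines its member networks is soft voting: average the per-class probabilities emitted by each base CNN and take the argmax. The paper does not specify its combination rule, so the sketch below is only an assumed illustration of that scheme:

```python
import numpy as np

def ensemble_soft_vote(prob_list):
    """Average the (n_samples, n_classes) probability outputs of several
    base models and return the winning class index per sample."""
    return np.mean(prob_list, axis=0).argmax(axis=1)
```

With two members disagreeing, the member that is more confident pulls the averaged prediction its way, which is the practical difference between soft voting and a simple majority vote.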
In 2023, Alsadoon et al. [33] examined various research articles to obtain the
conditions for evaluating the real-time deep learning-based lung cancer detection
framework. Then, a “Data, Feature Selection, Classification, and View (DFCV)”-
based deep learning model was implemented for detecting lung cancer on a real-time
basis.
The year-wise contribution of deep learning-based lung cancer diagnosis frameworks is provided in Fig. 1. The figure shows that far more works have contributed toward deep learning-based lung cancer diagnosis frameworks in recent years than in the past decade, and these recent papers utilize advanced deep learning approaches for image analysis in order to diagnose lung cancer.
The data sources from which the necessary CT images are gathered for diagnosing lung cancer using deep learning methods are provided in Table 1. From analyzing the table, it is observed that most of the works have adopted the LUNA16 dataset since it is an open-source database containing a large number of CT images.
The techniques used for developing an efficient pulmonary nodule detection and lung
cancer diagnosis system are grouped and are shown in Fig. 2.
Table 1 Dataset used in the traditional deep learning-based lung cancer diagnosis framework
Author Dataset
Martins et al. [14] LIDC dataset
Bhuvaneswari and Therese [15] Public dataset
Wang et al. [16] JSRT dataset
Gu et al. [17] LUNA16
Zhang et al. [18] LUNA16
Hoang et al. [19] Data from Nagasaki University and Kameda General Hospital
Lakshmanaprabu et al. [20] 50 low-dosage CT images
Huang et al. [21] LIDC-IDRI, FAH-GMU
Moitra and Mandal [22] TCIA
Li et al. [23] JSRT
Toğaçar et al. [24] TCIA
Silva et al. [25] LIDC-IDRI, NSCLC-Radiogenomic dataset
Sori et al. [26] Kaggle Data Science Bowl dataset
Tasmin et al. [29] Lung cancer dataset from Kaggle
Wankhade and Vigneshwari [30] LIDC-IDRI, FAH-GMU
Mithun et al. [31] MIMIC-III dataset
Ali et al. [32] LUNA16
CNN: CNN approaches are utilized due to their simplicity and their ability to handle huge volumes of data in less time. The most widely used CNN types include CNN [19, 21, 23, 26, 28], 3D-CNN [17, 18, 30], 1D-CNN [22], AlexNet [24], and 2D-CNN [32].
Machine learning: The commonly used machine learning approaches for lung cancer detection are SVM [14] and KNN [15, 24].
RNN: In order to handle sequential data, RNNs are adopted. The models such as
RNN [27] and Bi-LSTM [31] are used for lung cancer diagnosis purposes.
Miscellaneous: The other techniques rarely used in lung cancer detection are
feature fusion [16], DNN [20], ELM [21], autoencoder [25], XML [29], HNN [30],
BERT [31], and DL [33]. Various feature extraction methods for efficiently extracting
the attributes from the CT images are listed as follows: Gaussian mixture model
(GMM) [14], LDA [20], SURF [22], and PCA [23, 27, 29]. These techniques are
adopted in lung cancer detection and pulmonary nodule segmentation works. The
manner in which the algorithms are classified is provided in Fig. 2.
The performance measures used to validate the traditional lung cancer diagnosis
frameworks that have been developed using deep learning techniques are shown and
categorized as given in Table 2. The accuracy, sensitivity, specificity, F1-score, and
FPR are the most crucial performance measures that are utilized for evaluating the
working of the implemented lung cancer diagnosis framework.
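The core measures named above all derive from the binary confusion matrix (with 1 = cancerous). As a minimal sketch of those definitions, independent of any particular framework:

```python
def diagnosis_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity (TPR), specificity, FPR, and
    F1-score from binary ground-truth and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0   # sensitivity / TPR
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": recall,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "f1": (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0),
    }
```

Sensitivity and FPR matter most in this domain: a missed nodule (low sensitivity) delays treatment, while a high FPR sends healthy patients to unnecessary follow-up.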
The advantages and disadvantages of the existing lung cancer diagnosis framework
using deep learning techniques are listed below in Table 3.
Table 2 Performance metrics used for evaluating the traditional deep learning-based lung cancer diagnosis frameworks
Author Accuracy Specificity Sensitivity FPR TPR F1-score Precision CPM MSE AUC Miscellaneous measures
Martins et al. [14] ✔ ✔ ✔ ✔ ✔ – – – – – –
Bhuvaneswari and Therese [15] ✔ – – – – – – – – – Execution time
Wang et al. [16] ✔ ✔ ✔ ✔ – ✔ – – – – –
Gu et al. [17] – – ✔ ✔ – – – ✔ – – –
Zhang et al. [18] ✔ – – – ✔ ✔ – ✔ – – Detection score
Hoang et al. [19] – ✔ ✔ ✔ – – – – – – –
Lakshmanaprabu et al. [20] ✔ ✔ ✔ – – – – – – – PPV and NPV
Huang et al. [21] ✔ ✔ ✔ – – – – – – ✔ Testing time
Moitra and Mandal [22] ✔ – – – – – – – – ✔ ROC
Li et al. [23] – – – ✔ – – – ✔ – ✔ –
Toğaçar et al. [24] ✔ ✔ ✔ – – ✔ ✔ – – – Precision
Silva et al. [25] – – – – – – – – ✔ – Mean, standard deviation, and MSE
Sori et al. [26] ✔ ✔ ✔ – – – – – – – –
Pradhan et al. [27] ✔ – ✔ ✔ – ✔ ✔ – ✔ – FNR
Navaneethakrishnan et al. [28] ✔ ✔ ✔ – – – – – – – –
Tasmin et al. [29] ✔ – ✔ – – ✔ ✔ – – – Error rate
Mithun et al. [31] – – – – – ✔ – – – – –
Ali et al. [32] ✔ – ✔ – – – ✔ – – – –
Table 3 Advantages and limitations of existing deep learning-based lung cancer diagnosis frameworks

Martins et al. [14] (SVM, Gaussian mixture model, Tsallis entropy). Advantage: pulmonary nodules of even smaller sizes can be detected accurately by this technique. Limitation: juxta-pleural nodules cannot be detected by this approach.

Bhuvaneswari and Therese [15] (G-KNN). Advantage: this approach assists in the early detection of lung cancer. Limitation: more training CT images of patients are required for the effective operation of this model.

Wang et al. [16] (CAD, feature fusion). Advantage: this method is suitable for detecting lung cancer using small datasets. Limitation: this technique is limited by the clavicle.

Gu et al. [17] (multi-scale 3D-CNN). Advantage: this approach requires very few parameters for providing accurate detection outcomes. Limitation: the false positives generated by this model during classification are more.

Zhang et al. [18] (MS-LoG, 3D-DCNN). Advantage: this method is more accurate in detecting lung cancer. Limitation: only the detection of lung cancer from CT images is possible; the method cannot detect lung cancer from clinical reports.

Hoang et al. [19] (CNN). Advantage: efficient detection of lung cancer from the lymph tissues is possible by this method. Limitation: the implementation software adopted and the smaller dataset used degrade the performance of the developed model.

Lakshmanaprabu et al. [20] (DNN, LDA, MGSA). Advantage: these techniques are faster in classifying lung cancer, the classification outcomes are accurate, and the overall classification process is simple and inexpensive. Limitation: only the classification of low-dosage CT images is carried out effectively by this model.

Huang et al. [21] (ELM, DTCNN). Advantage: the overall cost required by this classification model is low. Limitation: the accuracy and robustness of this model are not satisfactory.

Moitra and Mandal [22] (MSER-SURF, 1D-CNN). Advantage: this technique is lightweight, needs less computational time, and requires fewer resources. Limitation: this method needs supervision and more labeled data for training.

Li et al. [23] (multi-resolution patch CNN, PCA). Advantage: this model generates accurate results when the false positives of the images are maintained within 0.2, is robust in nature, and can be implemented for real-time applications. Limitation: this model fails when provided with large CT image datasets.

Toğaçar et al. [24] (AlexNet, KNN). Advantage: the time required by this approach is much less, and the obtained detection outcomes are more accurate. Limitation: this approach is not generalized.

Silva et al. [25] (convolutional autoencoder). Advantage: the essential information for detecting the pulmonary nodule can be determined by this technique. Limitation: the interpretability of the space cannot be preserved by this method.

Sori et al. [26] (CNN, DR-Net). Advantage: this approach can be implemented for any disease detection task with unlabeled data, and its time requirement is lower due to the utilization of a "graphical processing unit (GPU)" instead of a "central processing unit (CPU)". Limitation: this model needs more CT images for training.

Pradhan et al. [27] (SLnO-RNN, PCA, t-SNE). Advantage: sequential CT images can be handled by this approach, and only the essential and crucial features are highlighted, making the entire process more effective and less time-consuming. Limitation: this method is affected by vanishing gradients, and the training process is highly complicated.

Navaneethakrishnan et al. [28] (BDHOA-DCNN). Advantage: a huge volume of data can be handled without much complexity. Limitation: the positional and directional details of the pulmonary nodule cannot be determined using this method.

Tasmin et al. [29] (XML, PCA). Advantage: this approach helps in interpreting the machine learning model more effectively and provides transparency of the classification. Limitation: deep learning models are not incorporated, which degrades the classification accuracy slightly.

Wankhade and Vigneshwari [30] (HNN, 3D-CNN). Advantage: the need for additional hyperparameters is less, and this technique is efficient in handling 3D images. Limitation: the memory requirement of this method is higher.

Mithun et al. [31] (Bi-LSTM, Bi-LSTM with dropout, BERT). Advantage: this approach can efficiently detect the occurrence of lung cancer even from unbalanced clinical reports with minor oversampling. Limitation: reports with thoracic regions are not detected by this model.

Ali et al. [32] (E-2D-CNN). Advantage: the accuracy of lung cancer detection offered by this technique is higher. Limitation: this method is not applicable to 3D spatial information detection.

Alsadoon et al. [33] (DFCV-DL). Advantage: the attributes can be automatically learned by this technique. Limitation: this method is affected by overfitting issues.
Chest X-ray is an inexpensive screening approach; however, a malignancy becomes visible in X-rays only as it grows, so an X-ray alone is not a reliable approach. The stress faced by radiologists is lessened by the introduction of CAD-based screening approaches. The CAD-based method is employed to generate good and essential attributes for detecting lung cancer. However, it is not competitive, and a trustworthy CAD technique capable of automatically and precisely diagnosing pulmonary nodules is still under development. The CAD system faces disadvantages like the separation of juxta-pleural nodules when providing segmentation outcomes, the utilization of a sliding window for pulmonary nodule detection, the reliance on hand-crafted features, and the elimination of inter-slice relationships between the features. As mid- and high-level representations of images can be learned more effectively by deep learning approaches, they have attracted a lot of attention for feature computing tasks in recent years.
Machine learning techniques have been utilized in many complicated tasks lately, and an efficient lung cancer detection and diagnosis framework can be implemented using them. However, the attributes required by these approaches still have to be hand-crafted, and this hand-crafting requires expert radiologist knowledge. Hence, an entirely automated lung cancer detection framework is still needed.
5 Conclusion
In this survey, the conventional lung cancer diagnosis frameworks, along with the steps, datasets, and methodologies they adopt, were discussed. Further, the diverse performance metrics utilized for validating the suggested lung cancer detection models were listed. Then, the research gaps and limitations of the implemented approaches were addressed, which led to considerations for future works. From this survey, it was identified that RNN and CNN models are effective in detecting lung cancer even from unbalanced data; thus, variants of RNN are suggested for building effective lung cancer diagnosis frameworks in future work. Machine learning approaches, even though they are capable of detecting smaller-sized lung nodules, are not recommended for futuristic lung cancer detection frameworks because of their requirement for manually extracted features. So, advanced deep learning models utilizing CNN or variants of RNN are recommended for futuristic lung cancer detection and diagnosis models.
Influence of Music on Brainwave-Based
Stress Management
1 Introduction
Stress is a normal reaction that the body experiences when changes occur, producing physical, emotional, and intellectual responses. Any thought or event that makes one angry, frustrated, or nervous can cause stress. It is the body's reaction to challenges or demands, and it can be both positive and negative. When stress helps us avoid danger, focus on studies, make a decision, or meet a deadline, it is considered positive. When stress is experienced for a long time, however, it can be harmful to the body.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 377
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_28
378 N. Dave and S. Dave
Brainwaves can be used to understand a person's state of mind, and even to decipher what a person is thinking [11]. Technology is now advancing to the point where brainwaves can be used to control machines [12].
Our study focuses on the detection of the presence of beta waves, measuring stress
levels from the brainwaves, and further reducing stress.
1.3 Motivation
According to researchers, stress is the sense of pressure and strain in the human body. Acute stress can be constructive, as it may inspire us to accomplish our work on time, but chronic stress may cause depression. If left unaddressed, it can seriously affect the body [13, 14]. When a person is stressed, the body releases the hormones cortisol and adrenaline, which can cause the body to tighten up, blood pressure to rise, and breathing difficulties. Chronic stress can severely damage the immune, reproductive, and digestive systems, can cause strokes, and can accelerate the aging process [15, 16]. An alarming increase in heart rate and lack of sleep are symptoms of stress [17, 18]. Many steps can be taken to reduce stress, such as listening to music, reading books, yoga, exercise, and self-care.
2 Methodology
This study uses the EEG Click board, developed by MikroElektronika, to measure brainwaves. The layout of the board can be seen in Fig. 1. The board consists of a high-sensitivity circuit that amplifies minute electrical signals from the brain, which the host MCU can then sample. It is compatible with multiple MCUs, such as Arduino boards and ARM, PIC, and Kinetis processors.
The board's INA114 instrumentation amplifier provides a gain of up to 10^5, a laser-trimmed offset voltage, low noise, and a very good common-mode rejection ratio. The gain on the signals is set to about 12. An MCP609 op-amp provides further amplification and filtering. The board is compact and offers an easier and cheaper solution than comparable products on the market.
In this research, we use an Arduino Uno as the host MCU. Figure 1 shows the EEG Click board.
Data Collection Steps
Step 1: Connect the Click board to the Arduino Uno using the Arduino shield.
Step 2: Attach the USB cable to the system and start MATLAB.
Step 3: Execute the code to start plotting the brainwaves in real time.
Step 4: Compute the PSD of the collected EEG data to determine the dominant frequency of the signal.
Step 5: Once the frequency is calculated, observe whether the subject is stressed or not.
Step 6: Save the data for future reference as a text or CSV file using MATLAB.
Step 7: Based on the stress detected, play music on Spotify using a Python script.
Step 8: Once the song has played, measure the brainwaves again using the above steps.
Step 9: Calculate the percentage of stress reduced.
Figure 2 depicts the flowchart for the stress detection technique that has been
used.
Stress detection has been done using neural networks [19]. To train the system, we collected data from 60 subjects, including college students and housewives. The dataset included fields such as age, music preference, and the users' perception of their current state.
Collaborative filtering has been used to predict songs for the user [8, 20]. Parameters considered for song prediction include user age and preference [21].
The frequency of the brainwaves is obtained by calculating the power spectral density (PSD) of the signal; MATLAB provides the "pwelch" function to compute the power spectral density of a signal.
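A minimal Python sketch of the same computation (assuming a 1-D array of EEG samples and a known sampling rate; the segment averaging mirrors what MATLAB's pwelch does by default, and the band boundaries follow the usual EEG conventions, with 13 Hz and above treated as beta, i.e., stress, as in this study):

```python
import numpy as np

def welch_psd(x, fs, nperseg=256):
    # Average the periodograms of 50%-overlapping, Hann-windowed
    # segments: the core of Welch's method (cf. MATLAB's pwelch).
    step = nperseg // 2
    win = np.hanning(nperseg)
    segs = [x[i:i + nperseg] for i in range(0, len(x) - nperseg + 1, step)]
    psd = np.mean([np.abs(np.fft.rfft(win * s)) ** 2 for s in segs], axis=0)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, psd

def dominant_band(x, fs):
    # Read the dominant frequency off the PSD and map it to a band.
    freqs, psd = welch_psd(x, fs)
    peak = freqs[np.argmax(psd)]
    if peak < 4:
        return peak, "delta"
    if peak < 8:
        return peak, "theta"   # drowsy / sleeping
    if peak < 13:
        return peak, "alpha"   # relaxed
    return peak, "beta"        # stressed

# Synthetic check: a pure 20 Hz signal should register as beta (stress)
fs = 256
t = np.arange(0, 4, 1.0 / fs)
peak, band = dominant_band(np.sin(2 * np.pi * 20 * t), fs)
```

Real EEG would of course contain a mixture of bands; in practice the peak of the PSD within the 4–30 Hz range is what matters here.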
3 Experimental Results
For our experiment, we have collected data from 25 subjects for testing purposes.
As shown in Fig. 3, the electrodes are connected to the EEG board, which is mounted on the Arduino Uno board. The electrodes are placed at three points on the subject's forehead, as shown in the figure.
In the first part of the experiment, brainwaves were collected from 25 students who were about to take exams. Readings were taken before their exams, after their exams, and after listening to music. The students' age and preferred type of music were also noted and used for music prediction [11, 22].
Table 1 depicts the brain frequency of students captured before and after they
took the viva exam. Their preferences for the type of music were captured, and brain
waves were again noted after the music was played.
The following observations were noted down:
1. Playing dance-genre music did not reduce stress; in fact, stress increased in some students.
2. Playing the romance genre decreased stress by a significant amount. Beta waves were converted to alpha waves, leaving the students relaxed.
Figure 4 shows the change in frequency observed for students based on whether dance or romance music was played.
To calculate the percentage change in frequency, the following formula was used:

% change in frequency = ((f_before exam − f_after music) / f_before exam) × 100

Using this formula, the average percentage change in frequency after listening to the DANCE genre was 3.01%, and after listening to the ROMANCE genre it was calculated to be about 15.4%.
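As a consistency check, the reported romance-genre average can be reproduced from the before-exam and after-music columns of Table 1, taking the percentage change relative to the before-exam frequency:

```python
def pct_change(before, after_music):
    # Positive values mean the dominant brainwave frequency dropped,
    # i.e., stress was reduced.
    return (before - after_music) / before * 100

# (before exam, after music) pairs for the ROMANCE rows of Table 1
romance = [(12.45, 10.07), (14.32, 11.01), (14.59, 12.62), (11.26, 9.26),
           (13.39, 13.34), (13.42, 10.98), (13.54, 11.88), (13.66, 11.74),
           (13.83, 11.82), (14.25, 11.52), (13.73, 11.34)]

avg = sum(pct_change(b, a) for b, a in romance) / len(romance)
# avg ≈ 15.4%, matching the reported romance-genre average
```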
Figures 5 and 6 depict the graphs to indicate the stress level changes based on
the kind of music played. It can be observed from the figures that the romance genre
reduced stress at greater levels compared to the dance genre.
The following figures show the brainwaves collected from the subjects.
Figure 7 showcases theta waves, indicating the subject is sleeping. Figure 8 shows a subject with beta waves, which means the person is stressed, and Fig. 9 shows a person with alpha waves, which indicates the subject is not stressed.
In the next stage, the brainwaves of housewives were measured and analyzed. Here, the percentage decrease in stress, computed with the same formula as before, was around 9.57% after listening to music.
Figure 10 shows a graph of the stress levels of housewives before and after listening to music. The blue line indicates stress before, and the orange line indicates stress after the music was played.
Table 1 Frequency of the students (Hz) captured before and after the exam
Sr. No. Age Preference Before exam After exam After playing music % change in freq
1 22 Romance 12.45 12 10.07 19.11646586
2 21 Dance 15.34 13.23 13.45 12.32073012
3 22 Dance 12.28 14.58 12.13 1.221498371
4 21 Romance 14.32 12.39 11.01 23.11452514
5 20 Romance 14.59 13.53 12.62 13.5023989
6 19 Dance 10.35 11.25 11.23 −8.502415459
7 20 Romance 11.26 12.83 9.26 17.76198934
8 21 Dance 11.46 10.45 12.27 −7.068062827
9 22 Dance 12.7 10.78 13.31 −4.803149606
10 21 Dance 15.32 13.36 13.56 11.48825065
11 20 Romance 13.39 13.95 13.34 0.373412995
12 20 Romance 13.42 11.71 10.98 18.18181818
13 19 Dance 12.37 12.82 12.81 −3.556992724
14 20 Dance 13.68 13.23 13.52 1.169590643
15 20 Romance 13.54 12.96 11.88 12.25997046
16 21 Dance 14.27 12.47 13.36 6.377014716
17 20 Dance 13.34 13.72 12.58 5.697151424
18 21 Romance 13.66 13.85 11.74 14.0556369
19 20 Dance 12.09 11.03 11.41 5.624483044
20 19 Dance 12.57 12.18 12.36 1.670644391
21 19 Romance 13.83 14.59 11.82 14.53362256
22 20 Romance 14.25 13.54 11.52 19.15789474
23 20 Dance 14.09 13.97 13.47 4.400283889
24 20 Dance 13.56 12.81 12.43 8.333333333
25 20 Romance 13.73 13.04 11.34 17.40713765
[Bar chart: % change in frequency]
4 Conclusions
In this paper, we propose a system to detect stress using EEG signals. The brainwave
frequency is detected using PSD which is a simple and robust technique.
It has been observed that stress levels among students were higher than those among housewives. The average brainwave frequency among stressed students was calculated to be around 13.4564 Hz, while that among housewives was around 12.12355 Hz.
A decrease in stress was observed in both students and housewives after listening
to slow songs, i.e., romantic music. Further, we also observed that the frequency of
brainwaves increases when listening to fast music such as dance songs, leading to
an increase in stress for some people.
It can be concluded that in this study we successfully built a system to detect human stress and reduce it to a great extent. Using our system, stress can be reduced without the need for medicines.
References
1. Raj A, Jaisakthi SM (2018) Analysis of brain wave due to stimulus using EEG. In: 2018 international conference on computer, communication, and signal processing (ICCCSP)
3. Anuradha R, Rathi G, Rm. Krishnappa M, Suresh Kumar MS, Kalpana M (2022) Detecting stress level of students using brain waves and reducing it using yoga therapy. In: 2022 IEEE world conference on applied intelligence and computing
4. Zhang Y, Wang Q, Chin ZY, Ang K (2020) Investigating different stress-relief methods using
Electroencephalogram (EEG). In: 2020 42nd annual international conference of the IEEE
engineering in medicine & biology society (EMBC)
5. Wen TY, Mohd Aris SA (2022) Hybrid approach of EEG stress level classification using K-means clustering and support vector machine. IEEE Access 10:18370–18379
6. Sengupta K (2021) Stress detection: a predictive analysis. In: 2021 Asian conference on innovative technologies (ASIANCON), pp 1–6. https://doi.org/10.1109/ASIANCON51346.2021.9544609
7. Crowley OV, McKinley PS, Burg MM, Schwartz JE, Ryff CD
8. Deepika RC, Kumbhar MS, Chavan RR (2016) The human stress recognition of brain, using
music therapy. ICCPEIC
9. Hurless N, Mekic A et al (2013) Music genre preference and tempo alter alpha and beta waves
in human non-musicians. Impulse Prem Undergrad Neurosci J
10. Chung-Yen L, Rung-Ching C, Shao-Kuo T (2018) Emotion stress detection using EEG signal
and deep learning technologies. IEEE Int Conferen Appl Syst Invent
11. Munkhbat K, Ryu KH (2020) Classifying songs to relieve stress using machine learning
algorithms. Springer
12. Christos P et al (2013) Using brain waves to control computers and machines. Advanced Human-Computer Interaction, vol 2013. Hindawi, New York, pp 1–2
13. Weinstein M et al (2011) The interactive effect of change in perceived stress and trait anxiety on
vagal recovery from cognitive challenge. Int J Psychophysiol Offic J Int Organiz Psychophysiol
82:225–232
14. Shubhangi G, Bhavna A, Afzal AS (2019) Human stress detection and relief using music
therapy. IJRECE
15. Kofman O, Meiran N, Greenberg E, Balas M, Cohen H (2006) Enhanced performance on exec-
utive functions associated with examination stress: evidence from task-switching and stroop
paradigms. Cogn Emot 20:577–595
16. Han C, Yang Y, Sun X, Qin Y (2018) Complexity analysis of EEG signals for fatigue driving
based on sample entropy. In: 2018 11th international congress on image and signal processing,
biomedical engineering and informatics (CISP-BMEI)
17. Bobade P, Vani M (2020) Stress detection with machine learning and deep learning using multimodal physiological data. In: 2020 second international conference on inventive research in computing applications (ICIRCA), pp 51–57. https://doi.org/10.1109/ICIRCA48905.2020.9183244
18. Malviya L, Mal S, Lalwani P (2021) EEG data analysis for stress detection. In: 2021 10th IEEE
international conference on communication systems and network technologies (CSNT)
19. Devendran K, Thangarasu SK, Keerthika P et al (2020) Music prediction for music therapy
using random forest. Int J Control Automat
20. Ramdinmawii E, Vinay Kumar M (2017) The effect of music on the human mind: a study using
brainwaves and binaural beats. IEEE
21. Devendran K, Thangarasu SK, Keerthika P et al (2021) Effective prediction on music therapy
using hybrid SVMANN approach. ITM Web of Conferen
22. Ambica G, Sujata B (2015) Study and application of brainwaves. IJCSMC
Potential Exoplanet Detection Using
Feature Selection, Multilayer Perceptron,
and Supervised Machine Learning
Abstract Since the discovery of the first exoplanet in 1992, advancements in tech-
nology have enabled the identification of numerous additional exoplanets. The
recently launched James Webb telescope, succeeding the Hubble telescope, is set
to enhance our understanding by scrutinizing exoplanet surroundings. Exoplanet
detection, traditionally labor-intensive and reliant on experts, is now undergoing a
transformation. Leveraging the wealth of data from the NASA Exoplanet Archive
at Caltech, we employ techniques such as forward feature selection, Information
Gain, and machine learning models like Logistic Regression and ensemble learning.
Dimensionality reduction aids in selecting the most crucial features among the 49
available. The performance of these models is then rigorously evaluated through
various metrics and visualizations, aiming to streamline the certification process for
prospective exoplanets.
1 Introduction
Astronomy was one of the first natural sciences pursued by humans. Astrophysics has always influenced humanity, guided explorers, and sparked introspective inquiries about the fundamental nature of our existence and of events in space.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 387
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_29
388 K. Sairam et al.
The century-old search for other Earth-like planets has reignited a fervent interest in whether any other planets are suited to supporting life. Answers to this question have been pondered and researched for a very long time. Finding exoplanets outside of our solar system has yielded some groundbreaking discoveries.
A new era of exoplanetary exploration began with the discovery of one of the initial exoplanets, as described in [1]. A planet that lies outside our planetary system and does not orbit our sun is referred to as an exoplanet. While the majority orbit other stars, some merely drift through space unattached. Because exoplanets are so far from the solar system, it is not possible to launch a spacecraft to examine them with existing technology. Even for experts, searching for exoplanets using images from terrestrial observatories and satellite-based telescopes like Hubble is a difficult operation. We are still trying to understand which kinds of stars provide the long-lasting, stable conditions that could give life a chance to take hold and evolve as it did on Earth.
For the first time, a distant planet has been found by the NASA/ESA Hubble Space Telescope using direct visible-light imaging. The planet is separated from its parent star by a vast belt of gas and dust, and it may even have rings that rival Saturn's in size. To conduct many exoplanet observations, we need to observe a phenomenon known as a transit. One must first determine the exoplanet's orbit, so as to know when it will pass between the observation site and the star it is orbiting, and then time the observation to coincide with the exoplanet's approach to the star. Hubble is aimed in the right direction in space using this knowledge.
A planet blocks some of the starlight when it moves in front of its star. However, part of that light travels through the outer rim of the planet's atmosphere on its way to the Hubble observatory. Whatever is in the exoplanet's atmosphere absorbs some of that light at very specific frequencies characteristic of the atoms and molecules there. After the Hubble telescope separates the captured light into its individual colors, or wavelengths, a spectrograph can be used to determine which of those wavelengths have been absorbed. From the spectroscopic structure, we can infer what elements and substances are present in the planet's atmosphere. Using transit studies of exoplanets, Hubble has discovered elements such as sodium and hydrogen, and even indications of methane and water vapor. Hubble has also analyzed these compositions as well as the height of the atmosphere, which helps determine how dense the atmosphere is. The transit technique, which Hubble pioneered, is now being used by other observatories to examine the atmospheres of exoplanets.
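The photometric signature of a transit is easy to quantify. As a sketch (not part of this paper's pipeline), the fractional dip in starlight depends only on the planet-to-star radius ratio:

```python
def transit_depth(r_planet, r_star):
    # Fraction of the stellar disc blocked by the transiting planet:
    # depth = (Rp / Rs) ** 2
    return (r_planet / r_star) ** 2

# A Jupiter-sized planet (~0.1 solar radii) crossing a Sun-like star
depth = transit_depth(0.1, 1.0)   # 0.01, i.e., a 1% drop in brightness
```

This is why transit surveys such as Kepler favor large planets on orbits that happen to cross our line of sight.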
This work proposes a robust algorithm that identifies false positives and gives a quick description of the key characteristics required to establish an observation as an exoplanet. Along with traditional machine learning approaches, including Logistic Regression, Decision Tree, Random Forest, Naive Bayes, and XGBoost, we employed a Multilayer Perceptron (MLP)/Artificial Neural Network (ANN). Once the crucial parameters required to train a given model have been filtered, we use those chosen features to calculate accuracy scores. We then use the most accurate model to predict whether an observation is a candidate for a potential exoplanet.
2 Literature Survey
In this section, we discuss the literature survey conducted by the authors and the work
that exists in this area of Exoplanet classification. The related work is summarized
in the paragraphs given below.
The authors of [1] postulate, based on the discovery of the moon-sized innermost planet in the PSR 1257 + 12 system and recent developments in the understanding of accretion onto magnetized stars, that the pulsar was born with a magnetic moment and rotation frequency roughly equivalent to today's values. This suggests how magnetic fields are created and develop in neutron stars, as well as how planets emerge around pulsars.
In [2], the authors developed a novel method for missing-data imputation, a hybrid of single- and multiple-imputation techniques, because erroneous imputation of missing values can result in inaccurate predictions. According to the authors' experimental findings, their approach outperforms rivals with comparable execution times, achieving a 20% higher F-measure for the imputation of binary data and an 11% lower error for numeric data imputation.
The effectiveness of feature selection is influenced by a variety of factors, including the data, the classifiers, and other elements. The research in [3] investigates experiments on a diabetes-diagnosis dataset (768 × 9) from the UCI database using existing wrapper and filter approaches. It suggests that filters, owing to their low computing cost on massive datasets, are a viable alternative to wrapper approaches for resolving some of the issues that arise.
When an orbiting planet passes in front of a star, blocking some of its brightness, the transit technique uses photometric observations to track the variation in the brightness of the star's light. The research in [4] suggests a method that efficiently discovers high-volume planets independent of the distance between planet and star. Additionally, advances in photometry have made it possible for missions such as the Kepler space observatory to find a greater variety of exoplanets. Doing so requires algorithms and techniques for feature extraction, classification, and regression.
The ASTRONET deep learning model [5] classifies whether or not a previously
unknown planet is habitable in order to detect astronomical abnormalities. This study
introduces the ASTRONET model for finding habitable exoplanets using information
about the planets’ eccentricity, mass, radius, and other characteristics.
To determine the likelihood that a sighting is an exoplanet, [6] applies a variety of classification techniques to the exoplanet dataset and compares their accuracy.
3 Methodology
In this paper, the authors use multilayered perceptron and other machine learning
approaches, namely Random Forest, Logistic Regression, Decision Tree, and Naive
Bayes, to detect if an exoplanet is a candidate for a potential exoplanet or not.
The data, comprising 49 features extracted from 9564 observations of Kepler data, were gathered from the NASA Exoplanet Archive at Caltech. The NASA Exoplanet Archive is an online collection of astronomical data on exoplanets and their host stars, together with tools for working with these data. The data include stellar characteristics, planetary parameters, and information on discovery and characterization.
After analyzing the variance in the raw data, the missing values were imputed and feature scaling was carried out. Regression imputation, as described in [2], was used to fill in the missing values of float-type features: the missing value is replaced by the value predicted by a linear regression of the missing item on the observed items. For categorical data, manual imputation was done to ensure a uniform distribution of all categories after imputation.
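A minimal sketch of regression imputation for a single float-type feature (the arrays below are hypothetical; the paper applies the idea across the Kepler features):

```python
import numpy as np

def regression_impute(x_obs, y):
    # Fit y = a*x + b on the rows where y is observed,
    # then replace the NaNs in y with the fitted values.
    mask = ~np.isnan(y)
    a, b = np.polyfit(x_obs[mask], y[mask], deg=1)
    filled = y.copy()
    filled[~mask] = a * x_obs[~mask] + b
    return filled

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # fully observed feature
y = np.array([2.1, np.nan, 6.1, 8.0, np.nan])  # feature with gaps
filled = regression_impute(x, y)
```

Multivariate variants regress each incomplete feature on several observed ones, but the single-regressor form shows the mechanism.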
To increase the effectiveness of the model's training, we selected only the features that have the biggest influence on the predictions, so supervised techniques were applied. From a taxonomic perspective, we applied both the filter and wrapper methods of feature selection, as stated in [3]. Wrapper approaches evaluate the utility of a subset of features by building a model on it, whereas filter techniques evaluate the value of features based on their association with the dependent variable. In the end, we settled on the top eight features to lessen duplication and boost the model's predictive ability. A few of the subset-selection techniques follow.
3.3 Quasi-constant
Features that have one dominant value for the majority of the samples are referred to as quasi-constant features; making predictions with them is unproductive. The cutoff point is set at 99.9%. Orbital period, transit epoch, and stellar surface gravity, which have little to no effect on the outcomes, are among the five quasi-constant features removed.
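The quasi-constant screen can be sketched as follows (toy matrix; the paper applies the 99.9% cutoff to the full Kepler feature set):

```python
import numpy as np

def quasi_constant_mask(X, threshold=0.999):
    # True for columns whose single most frequent value covers more
    # than `threshold` of the samples (the text uses a 99.9% cutoff).
    flags = []
    for col in X.T:
        _, counts = np.unique(col, return_counts=True)
        flags.append(counts.max() / len(col) > threshold)
    return np.array(flags)

# Column 0 is constant (ratio 1.0 > 0.999) and would be dropped;
# column 1 varies and is kept.
X = np.column_stack([np.zeros(10), np.arange(10.0)])
mask = quasi_constant_mask(X)
```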
Information Gain measures the amount of information a feature imparts about a class. Each variable's entropy is calculated with respect to the disposition class; therefore, the TCE delivery feature, which has a low score, can be disregarded. Information Gain is typically used to establish the importance of a non-categorical feature. As "Transit Duration" is a non-categorical variable with an Information Gain of less than 0.025, we eliminate it. Figure 1 depicts the Information Gain obtained for every significant feature in the dataset.
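An Information Gain computation can be sketched with toy feature and label arrays (the paper scores each Kepler feature against the disposition class):

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of the class distribution, in bits
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    # Class entropy minus the entropy left after splitting on the feature
    n = len(labels)
    remainder = 0.0
    for v in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# A feature that perfectly separates two balanced classes gains 1 bit
ig = information_gain(["a", "a", "b", "b"], [0, 0, 1, 1])
```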
Fig. 1 Multilayer Perceptron topology
The correlation technique is applied to ascertain the relationships between the various features of the collected data. Features with strong correlations to the KOI disposition variable are preserved. The feature specifying the equilibrium temperature of the object of interest is eliminated because it shows a correlation of more than 0.85 with another feature.
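The correlation screen can be sketched with NumPy (the feature names below are hypothetical; the paper drops the equilibrium-temperature feature at the 0.85 cutoff):

```python
import numpy as np

def high_corr_pairs(X, names, cutoff=0.85):
    # List feature pairs whose absolute Pearson correlation exceeds
    # the cutoff; one member of each pair is a candidate for removal.
    corr = np.corrcoef(X, rowvar=False)
    return [(names[i], names[j])
            for i in range(len(names))
            for j in range(i + 1, len(names))
            if abs(corr[i, j]) > cutoff]

# "radius" here is a linear function of "period", so that pair is flagged
X = np.array([[1.0, 5.0, 2.0],
              [2.0, 3.0, 4.0],
              [3.0, 9.0, 6.0],
              [4.0, 1.0, 8.0]])
pairs = high_corr_pairs(X, ["period", "noise", "radius"])
```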
The wrapper feature selection approach is used to extract the final features from the initial collection. The accuracy of the dependent-variable prediction is calculated for each of the remaining candidates, and the best-performing features are kept. The selection is trained and validated using both the Random Forest and Logistic Regression models, and the features chosen by each are compared, leading to the conclusion that forward feature selection with Logistic Regression gives the best features.
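Forward feature selection can be sketched generically (the paper scores candidate subsets with Logistic Regression; the toy scorer below uses a least-squares R² instead, so only the greedy loop is faithful to the method):

```python
import numpy as np

def forward_select(X, y, k, score):
    # Greedy forward selection: repeatedly add the feature whose
    # inclusion maximizes score(X[:, subset], y).
    chosen, remaining = [], list(range(X.shape[1]))
    while len(chosen) < k and remaining:
        best = max(remaining, key=lambda j: score(X[:, chosen + [j]], y))
        chosen.append(best)
        remaining.remove(best)
    return chosen

def r2(Xs, y):
    # Toy scorer: R^2 of an ordinary least-squares fit on the subset
    A = np.column_stack([Xs, np.ones(len(y))])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    return 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 2] + 0.1 * rng.normal(size=200)   # only feature 2 matters
picked = forward_select(X, y, k=1, score=r2)
```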
4 Algorithm—Feature Selection
Table 1 shows the optimal features considered for the potential detection of exoplanets
taken from the dataset.
Table 2 describes the features taken into consideration for the detection of potential
exoplanets taken from the dataset.
4.1 Models
Logistic Regression
Logistic Regression is a prominent machine learning technique used within the Supervised Learning methodology. Working with both continuous and discrete datasets, it can produce probabilities and categorize fresh data. In this model, a logistic function is used to express the likelihood of the possible outcomes of a particular experiment [16]. From a collection of independent factors it predicts a dependent variable that is categorical in nature, so the output must be categorical or discrete.
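The logistic function at the heart of the model can be illustrated directly (the weights below are hypothetical, not the trained model from this paper):

```python
import math

def sigmoid(z):
    # Logistic function: maps any real-valued score into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b, threshold=0.5):
    # Linear score, squashed to a probability, then thresholded
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return p, int(p >= threshold)

p, label = predict([1.0, 2.0], w=[0.8, -0.3], b=0.1)
# z = 0.8 - 0.6 + 0.1 = 0.3, so p ≈ 0.574 and the label is 1
```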
Random Forest
Random Forest is a Supervised Machine Learning algorithm that is frequently used in classification and regression problems. It builds Decision Trees from several samples, using the majority vote for classification and the average for regression [17]. The algorithm can deal with datasets containing both continuous and categorical variables.
4.2 Architecture
5 Experimental Results
The MLP and Logistic Regression models are trained using Google Colab or Jupyter Notebook with Python 3.7 or above. Preferably 8 GB or more of RAM and an Intel i5/Ryzen 3 processor or better are recommended, with at least 15 GB of internal storage (HDD or SSD).
5.2 Dataset
The training and testing data are obtained from the Kepler Exoplanet data archives
and are loaded into a data frame before training and testing.
The following parameters are utilized in evaluating the performance of the models
built to perform potential exoplanet classification.
Mean Absolute Error (MAE)
The mean absolute error (MAE) in statistics is a measure of errors between paired
observations representing the same phenomenon, where $x_i$ is the true value and
$y_i$ is the prediction:

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - x_i|.$$
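As a concrete illustration (not code from the paper), the MAE formula above can be computed in a few lines:

```python
import numpy as np

def mean_absolute_error(y_true, y_pred):
    """MAE = (1/n) * sum of |y_i - x_i| over n paired observations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true)))

# Hypothetical labels: 0 = non-candidate, 1 = exoplanet candidate
x = [1, 0, 1, 1]   # true values
y = [1, 0, 0, 1]   # predictions; one of four is wrong
print(mean_absolute_error(x, y))  # 0.25
```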
F1-Score
Recall
Precision
ROC Curve (receiver operating characteristic curve)
AUC (Area under the ROC curve)
Confusion Matrix
Cross-Validation
Figure 3 shows the plot of Mean Absolute Error (MAE) against the number of
folds for 20-fold cross-validation for Logistic Regression.
Cross-validation makes it possible to compare several machine learning
approaches and obtain an idea of how well they will perform in practice. In a cross-
validation cycle, a sample of data is divided into complementary subsets; the analysis
is run on one subset (the training set), and the results are then validated on the other
subset (the validation or testing set). Most strategies use many cross-validation
rounds with different divisions to reduce variability, and the validation results are
aggregated (e.g., averaged) over the rounds to estimate the model's predictive
performance. To assess the model, unweighted scores and the confusion matrix are
generated. To check for overfitting, we perform 20-fold cross-validation and plot the
mean absolute error against the number of folds for both the training and testing
datasets.
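A sketch of this procedure, assuming scikit-learn and a synthetic stand-in for the Kepler features (`make_classification` replaces the real archive data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the Kepler features; the study uses 20 folds.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

model = LogisticRegression(max_iter=1000)
# sklearn maximizes scores, so MAE is exposed as its negative; flip the sign.
scores = -cross_val_score(model, X, y, cv=20, scoring="neg_mean_absolute_error")
print(len(scores), scores.mean())
```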
Figure 4 depicts the basic deployment architecture of the model consisting of the
back end, front end, and all related processes that take place between them.
We construct a web application using React.js and a Flask API to fully automate
the predictions. React is a JavaScript library that allows us to design reusable
user interface components. We design a form with Material UI and use POST and
GET requests to send the form data to the server. Material UI is a collection of
components that may be used to design a wide range of user interfaces. React hooks
let us use state and other React capabilities without needing to write a class.
Flask is a Python web framework for developing web applications. We created a
virtual environment in Python 3.8 and installed all
the necessary packages. The models that have been trained using exoplanet data
will be compared, and the best model will be selected to make predictions for the
online application. The server will make predictions and then return the results to
the frontend for display.
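The prediction-serving flow described above can be sketched, purely for illustration, as a minimal WSGI application; a WSGI app is just a Python callable with a fixed signature (the route names and response body here are hypothetical, not the paper's actual Flask app):

```python
# Minimal WSGI callable — the interface a server such as Gunicorn expects.
def app(environ, start_response):
    """Return a plain-text placeholder where a real app would return a prediction."""
    body = b"exoplanet-predictor: ok"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

# For illustration, invoke the callable directly instead of through a server.
captured = {}
def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = headers

result = b"".join(app({}, fake_start_response))
print(captured["status"], result)
```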
Subsequently, in order to deploy the application, we configured Gunicorn.
Application deployment (also known as software deployment) is the act of installing,
configuring, and enabling a specific application or set of applications at a specified
URL on a server. Once the deployment procedure is complete, the application(s) become
publicly available through the URL. Many developers use Gunicorn, a Python WSGI HTTP
server, to deploy their Python applications. The Web Server Gateway Interface (WSGI)
is required because typical web servers do not know how to run Python programs
directly. A WSGI server allows Python programs to be deployed reliably, and, if the
program requires it, multiple worker threads can be configured to serve it.
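As a deployment-command sketch (module and app names are hypothetical; the real names depend on how the Flask app is structured), Gunicorn is typically invoked like this:

```
# Install the WSGI server, then serve the callable `app` from app.py.
# -w sets worker processes, --threads sets threads per worker.
pip install gunicorn
gunicorn --bind 0.0.0.0:8000 -w 4 --threads 2 app:app
```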
6 Comparative Analysis
In this section, we compare the pre-existing techniques from [6], namely SVM, KNN,
and Random Forest, with the accuracy and other scores of the models that the authors
have proposed, namely MLP, Logistic Regression, Random Forest, Decision Tree,
Naive Bayes, and XGBoost. The comparison of the scores is shown in Table 3.
Table 3 shows all metrics considered (accuracy, precision, recall, F1-score, area
under curve) and compares the results of all the already existing models with the
proposed model.
used to create a dynamic web application using React.js and Flask API. The model
parameters are saved in a pickle file which is then loaded to perform predictions and
display on the interactive user interface developed using Material UI. There is scope
for using light curves to predict whether an exoplanet is a potential exoplanet candidate.
The transit method can be combined with the optimal features selected through the
proposed methods to train a neural network.
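The pickle-based model persistence mentioned above can be sketched as follows, using a stand-in logistic regression in place of whichever trained model is selected as best:

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a stand-in model (synthetic data replaces the exoplanet features).
X, y = make_classification(n_samples=200, n_features=6, random_state=1)
model = LogisticRegression(max_iter=500).fit(X, y)

# Persist the fitted model to a pickle file, then reload it for serving.
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("model.pkl", "rb") as f:
    served = pickle.load(f)

# The reloaded model reproduces the original's predictions exactly.
print((served.predict(X) == model.predict(X)).all())  # True
```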
References
1. Miller MC, Hamilton DP, Implications of the PSR 1257+12 planetary system for isolated
millisecond pulsars
2. Khan SI, Hoque ASML, SICE: an improved missing data imputation technique. J Big Data
3. Nnamoko NA, Arshad FN, England D, Vora J, Norman J, Evaluation of filter and wrapper
methods for feature selection in supervised machine learning
4. Bugueno M, Mena F, Araya M (2018) Refining exoplanet detection using supervised learning
and feature engineering
5. Jagtap R, Inamdar U, Dere S, Fatima M, Shardoor NB (2021) Habitability of exoplanets
using deep learning. In: 2021 IEEE international IOT, electronics and mechatronics conference
(IEMTRONICS)
6. Sturrock GC, Manry B, Rafiqi S (2019) Machine learning pipeline for exoplanet classification
7. Malik A, Monster BP, Obermeier C (2021) Exoplanet detection using machine learning
8. Shallue CJ, Vanderburg A (2018) Identifying exoplanets with deep learning: a five-planet
resonant chain around Kepler-80 and an eighth planet around Kepler-90. The Astron J
9. Thompson SE, Mullally F, Coughlin J, Christiansen JL, Henze CE, Haas MR, Burke CJ (2015)
A machine learning technique to identify transit shaped signals. Astrophys J 812:46
10. Schanche N, Cameron AC, Hébrard G, Nielsen L, Triaud AHMJ, Almenara JM, Alsubai KA,
Anderson DR, Armstrong DJ, Barros SCC, Bouchy F, Boumis P, Brown DJA, Faedi F, Hay K,
Hebb L, Kiefer F, Mancini L, Maxted PFL, Palle E, Pollacco DL, Queloz D, Smalley B, Udry
S, West R, Wheatley PJ (2018) Machine-learning approaches to exoplanet transit detection and
candidate validation in wide-field ground-based surveys. Monthly Notices Royal Astron Soc
11. Pearson KA, Palafox L, Griffith CA (2018) Searching for exoplanets using artificial intelligence.
Mon Not R Astron Soc 474:478–491
12. Chintarungruangchai P, Jiang IG (2019) Detecting exoplanet transits through machine-learning
techniques with convolutional neural networks. Publications of the Astronomical Society of
the Pacific
13. Khan MA, Dixit MA, Discovering exoplanet in deep space using deep learning algorithms
14. Priyadarshini I, Puri V (2021) A convolutional neural network (CNN) based ensemble model
for exoplanet detection. Earth Sci Inform 14:735–747. https://doi.org/10.1007/s12145-021-00579-5
15. Birkby J (2021) Spectroscopic direct detection of exoplanets. In: Deeg HJ, Belmonte JA (eds).
Springer, pp 1485–1508
16. Doe J, Smith A, Johnson C (2020) Exoplanet detection methods: a comprehensive review.
Astrophys J
17. Brown M, White K, Miller S (2018) Advancements in radial velocity techniques for exoplanet
detection. Monthly Notices Royal Astron Soc
18. Garcia R, Martinez E, Lee J (2021) Transit photometry: unraveling the mysteries of exoplanets.
Ann Rev Astron Astrophys
19. Rodriguez A, Kim B, Patel S (2022) The Role of the James Webb space telescope in exoplanet
characterization. Space Sci Rev
20. Wang L, Chen X, Zhang Y (2019) Machine learning approaches to exoplanet detection in
Kepler data. The Astron J
An Empirical Study on ML Models
with Glass Classification Dataset
1 Introduction
Glass classification is an important task in industry, and ML can make a significant
impact. The dataset used for our machine learning study classifies glass
Shreyas Visweshwaran, M. Anbazhagan and K. Ganesh contributed equally to this work.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 403
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_30
404 S. Visweshwaran et al.
2 Literature Review
Our dataset, obtained from kaggle.com, was fed into our self-prepared models and
performed well on almost all algorithms except the Naive Bayes classifier. The dataset
used in our paper contains 214 rows and 9 input features. The class labels range from
1 to 7, but label 4 has no data to be predicted.
Wang [6] used clustering methods to derive class labels from the proportions of
elements. The dataset used there is congruent to ours, but the features, dependent
variables, and methodology of working over the dataset differ substantially. The
Random Forest-Genetic Algorithm (RF-GA)
model in this paper is utilized to evaluate the importance of variables in classifying
glass samples. This model exhibited a high level of accuracy, precision, recall, and F1
score, indicating effectiveness in classification. The clustering analysis delineated the
data into 8 categories, with significant differences observed in variables like silicon
dioxide (SiO2), potassium oxide (K2O), and others. This paper, part of the proceedings
of the 2023 IEEE 3rd International Conference on Electronic Technology, Communication
and Information (ICETCI), also conducted a similar comparative study in which the
effectiveness of each algorithm was gauged using metrics like accuracy, precision,
recall, and F1 score.
Another recent paper that caught our attention was Martin and Chai [4] from the
University of Malaysia Sarawak, which presents a comparative study of three machine
learning algorithms, namely K-Nearest Neighbor (KNN), Random Forest, and Extreme
Gradient Boosting (XGBoost), in predicting landslide susceptibility in Kota Kinabalu,
Malaysia; there, KNN outperforms XGBoost by a large margin of around 10%. The aim is
to identify the most accurate algorithm to develop a landslide susceptibility model,
addressing a significant
natural hazard in Malaysia. This paper gave us a gist on the type of datasets KNN
outperforms XGBoost. The findings revealed that KNN had the highest prediction
accuracy with an AUC score of 87.52%, followed by Random Forest (84.34%) and
XGBoost (78.07%).
The third paper chosen focused mainly on text classification, with attention then
shifting to the underperformance of the Naive Bayes algorithm. Qi [5] served as a
catalyst in arriving at our results by showing that XGBoost once again outperforms
Naive Bayes and other models on that particular dataset. After data preprocessing, a
total of 2621 pre-processed theft crime data entries are utilized for the
study, with the data being divided into training and test sets. Overall, the study posits
that the text classification of theft crime data using TF-IDF and XGBoost algorithm
achieves accurate and efficient classification, laying a solid foundation for further
analysis and research on various theft crimes from a criminal practice perspective.
The research concludes that the classified theft crime data from 2009 to 2019 through
XGBoost algorithm can serve as a base data for predicting various types of crimes.
Our study was notably influenced by recent insights from Zhao [8], which utilized
decision tree algorithms and a multi-layer perceptron for classifying high-potassium
and lead-barium glass based on chemical composition. Their approach,
particularly the use of MLP and cross-validation for robustness, informed our
methodology. By integrating similar strategies, we achieved significant accuracy
in our models, especially with K-Nearest Neighbors and XGBoost, demonstrating
the effectiveness of ensemble methods and the importance of feature selection in
glass classification. The insights from previous research have been instrumental in
shaping our approach to glass classification using machine learning.
A number of other papers, such as Ji [3], Zhang [7], and Bao [1], were of immense
interest, as the topics of glass classification, KNN, and XGBoost were in our
spotlight. The motive of this paper is to offer a fair comparison between two very
successful algorithms and also to apply other ML algorithms in order to conclude
on which datasets certain algorithms succeed. Covering the other aspect of
deploying other ML algorithms, Ertekin [2] helped us understand the reason for the
limited performance of SVM on our particular dataset: linear separability is poor,
with very close feature values for classes 1 and 3.
3 Methodology
The glass classification dataset was first pre-processed, with features scaled using
StandardScaler for normalization. We then optimized the parameters through
RandomizedSearchCV and GridSearchCV, evaluating combinations of 'weights' (uniform,
distance) and 'metric' (euclidean, manhattan, minkowski) over numerous iterations.
The best model parameters were selected based on accuracy scores obtained during
cross-validation. Finally, the model's performance is evaluated using metrics like
accuracy, confusion matrix and classification report, providing a detailed analysis
of its effectiveness in classifying glass types.
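A minimal sketch of this search, assuming scikit-learn and using the wine dataset as a stand-in for the glass data (the parameter grid mirrors the one described above):

```python
from sklearn.datasets import load_wine  # stand-in for the glass dataset
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Scale features first, then fit KNN, exactly as in the described pipeline.
pipe = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
grid = {
    "knn__weights": ["uniform", "distance"],
    "knn__metric": ["euclidean", "manhattan", "minkowski"],
}
search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy").fit(X, y)
print(search.best_params_)
```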
In our experimental setup, we began by dividing the dataset into two main com-
ponents: the input features, which constitute the independent variables, and a single
output class variable that represents our dependent or target variable. To carry out
most of our operations, we utilized a suite of libraries from the Python ecosystem,
namely sklearn (Scikit-learn), numpy, pandas and tensorflow. These libraries provide
a comprehensive set of tools for data manipulation, model training and evaluation.
The dataset comprises a range of features essential for distinguishing different
glass types. These include the Refractive Index (RI), indicative of optical proper-
ties; concentrations of Sodium (Na), Magnesium (Mg), Aluminum (Al), Silicon (Si),
Potassium (K), Calcium (Ca), Barium (Ba) and Iron (Fe), each contributing uniquely
to the glass’s physical and chemical characteristics. The dataset also includes a cat-
egorical ’Type’ variable, classifying glass into various categories like window glass,
containers and tableware. Each row represents a unique glass sample, allowing for a
nuanced application of machine learning techniques to predict glass type based on
its compositional attributes.
It categorizes glass into seven types, although type 4 (non-float-processed vehicle
windows) is notably absent. The categories range from float-processed building
windows (type 1) to vehicle headlamps (type 7), covering glass types used in various
industries. Our study's uniqueness lies in applying a wide
spectrum of machine learning algorithms, like SVM, Random Forest, KNN, MLP
and XGBoost, to this dataset, emphasizing feature optimization’s critical role. Our
selected approach not only enhanced the precision of glass classification but also
contributed significantly to the industry’s quality control and product differentiation
efforts, underscoring the potential of machine learning in advancing sustainable and
innovative material use. To ensure that our models had a robust training foundation
and a fair evaluation platform, we opted for an 80:20 split ratio. This means that 80%
of the data was used for training our algorithms, while the remaining 20% was set
aside for testing their performance. This ratio is widely adopted in machine learning
practices as it typically provides a good balance, allowing models to learn from a
substantial amount of data while still having a meaningful chunk of unseen data for
evaluation. We then evaluated a range of ML models, which are shown in Table 1.
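The split can be reproduced with scikit-learn's `train_test_split`; the dataset here is a stand-in, and `stratify` (our addition, not stated in the text) keeps class proportions similar across the two halves:

```python
from sklearn.datasets import load_wine  # stand-in for the 214-row glass dataset
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
# 80:20 split, the same ratio as in the study.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)
print(len(X_train), len(X_test))
```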
The Glass Type dataset is originally sourced from the UCI Machine Learning
Repository, a collection of databases utilized by the machine learning commu-
nity for empirical studies on various algorithms. The dataset was contributed
classes 2, 3 and 4, suggesting the need for model refinement to distinguish class
features more clearly.
XGBoost and KNN have been the pillars for this dataset so far; alongside them,
several other algorithms were tried, and the results are tabulated in Table 2.
Our analysis of multiple machine learning algorithms applied to the glass classifi-
cation dataset revealed a spectrum of accuracy scores. The performance varied, with
the lowest accuracy being 55.81% from the Naive Bayes model, and the highest at
83.72%, achieved by both K-Nearest Neighbors and XGBoost. In Fig. 3, we can see
that XGBoost's misclassifications occur when the predicted type is 1 and the true
type is 3, showing a small failure in this model. Similarly, in KNN this happens to
a larger extent when the predicted type is 0 and the actual type is 1. The validation
scores for our results are also discussed. In Tables 3 and 4, we see that both the
K-Nearest Neighbors (KNN) and XGBoost models have lower precision scores for glass
types 1 and 2. This also validates the ROC scores obtained in Fig. 1. Also, looking at
the glass types 6 and 7, barium was a distinguishing component for both the types,
ultimately achieving an 83.72% accuracy rate. This indicates that our dataset likely
has clear patterns or structures that these algorithms can leverage effectively. KNN’s
success implies the presence of identifiable clusters in the data, as it relies on the
closest ‘k’ data points for prediction. Meanwhile, XGBoost’s effectiveness points
to the suitability of ensemble methods, especially those that improve performance
through iterative boosting, for our dataset.
The Random Forest and MLP models showed strong performance, achieving
accuracy rates of 76.62% and 76.74%, respectively. Random Forest’s success sug-
gests that our dataset is suitable for decision tree-based analysis, benefiting from
multiple trees’ collective insights. The MLP’s effectiveness points to the presence
of non-linear relationships in the data, which neural networks can capture. In a
lower performance tier, Decision Trees and SVM recorded accuracies of 70.15%
and 67.83%, respectively. The slightly lesser performance of Decision Trees com-
pared to Random Forest indicates the advantages of ensemble approaches. SVM's
results imply near-linear data separation, hinting at potential class overlaps or
suboptimal settings, suggesting the need for parameter tuning through methods like
GridSearchCV. Logistic Regression achieved 62.41% accuracy in our study, indicating
complexities beyond linear correlations in our data. Naive Bayes was less effective
at 55.81%, likely due to its assumption of feature independence. The frequent use of
GridSearchCV, and occasionally RandomizedSearchCV, for hyperparameter optimization
was key.
6 Conclusion
In our research, the XGBoost and K-Nearest Neighbors (KNN) algorithms showed
strikingly similar performance levels when applied to our chosen dataset. Both mod-
els displayed comparable accuracy and predictive efficiency, indicating that selecting
either model might not significantly affect overall performance in this particular sce-
nario. Our comprehensive evaluation of various machine learning models on this
dataset revealed a wide range of performance outcomes. K-Nearest Neighbors and
XGBoost stood out for their high accuracy, but other models like Random Forest and
Multi-Layer Perceptron also delivered noteworthy results, highlighting the complexity
and multifaceted nature of the dataset. The application of hyperparameter tuning,
primarily using GridSearchCV and RandomizedSearchCV, further demonstrates the
importance of customizing and fine-tuning these algorithms to align with the unique
attributes of the data.
7 Future Work
Looking ahead, we have ambitious plans to broaden and deepen our research in sev-
eral critical areas. One key focus will be on advanced feature engineering, aiming
to boost the performance of our models. We’re also keen on exploring the advan-
tages of ensemble methods more thoroughly, particularly how leveraging the col-
lective predictions of multiple models can enhance both accuracy and robustness.
Our future endeavors will include comprehensive comparative analyses of various
machine learning algorithms, scrutinizing their effectiveness across different glass
datasets. A major emphasis will be placed on improving the interpretability and
explainability of our models, recognizing the importance of these aspects for
practical industrial use. To bridge the theoretical-practical divide, we plan to apply
our models in real-world industrial environments, potentially in collaboration with
industry partners. Such real-world applications will provide invaluable feedback on
the performance of our models and highlight areas for further refinement, ensuring
that our research contributes directly to practical advancements in the field.
In future research, we aim to tackle the challenges of imbalanced datasets iden-
tified in our current study. To develop advanced systems for glass classification,
we will implement innovative technologies and conduct studies on environmental
impact, focusing on enhancing the efficiency and sustainability of glass recycling.
Our goal is to address current limitations and pave the way for advanced machine
learning applications in glass classification.
References
1. Bao J (2020) Multi-features based arrhythmia diagnosis algorithm using xgboost. In: 2020
International conference on computing and data science (CDS), pp 454–457
2. Ertekin S, Bottou L, Giles CL (2011) Nonconvex online support vector machines. IEEE Trans
Pattern Anal Mach Intell 33(2):368–381
3. Ji Z (2023) Glass classification and identification based on logistic regression model and k-means
clustering analysis. Highlight Sci Eng Technol 40:64–71
4. Martin D, Chai SS (2022) A study on performance comparisons between knn, random forest
and xgboost in prediction of landslide susceptibility in Kota Kinabalu, Malaysia. In: 2022 IEEE
13th control and system graduate research colloquium (ICSGRC), pp 159–164
5. Qi Z (2020) The text classification of theft crime based on tf-idf and xgboost model. In: 2020
IEEE international conference on artificial intelligence and computer applications (ICAICA),
pp 1241–1246
6. Wang S, Wang Z, Ji T (2023) Glass classification based on random forest-genetic algorithm and k-
means clustering analysis. In: 2023 IEEE 3rd international conference on electronic technology,
communication and information (ICETCI), pp 1527–1532
7. Zhang Z, Wang T, Wang X (2023) Glass classification and identification based on decision tree
and random forest classification models. Highlights Sci Eng Technol 39:475–481
8. Zhao J, Zheng Z, Fang C, Huang Y, Zhang B (2023) Research on glass relics based on machine
learning. In: 2023 IEEE 2nd international conference on electrical engineering, big data and
algorithms (EEBDA), pp 1939–1942
Design Novel Detection of Exudates
Using Wavelets Filter and Classification
of Diabetic Maculopathy
Abstract Diabetic maculopathy is generally classified by scientists as a pathological
illness and is one of the most serious effects of diabetes. High blood sugar levels
in diabetes patients affect several bodily components, including the retina. In the
present research we detect a diabetic maculopathy lesion, namely exudates, using the
Symlet4 and Haar wavelets and compare which wavelet gives better results. We obtained
better results with the Haar wavelet, and using a support vector machine classifier
we achieved a 95.7% result.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 415
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_31
416 C. Pattebahadur et al.
help to detect maculopathy at its earliest stage and lower the likelihood of serious
vision loss. A significant number of retinal images are produced as a result of digital
screening for maculopathy [3]. Exudates appear as yellow or white formations in the
retina, and if not found early, a person may lose his or her vision totally, because
the macula is responsible for the central vision of the retina.
2 Methodology
The detection of diabetic maculopathy through the wavelet filter needs a strong
standard database, which is STARE [4]. These image databases are publicly available
as open source. After that, digital image processing was used. The wavelet filter was
used for feature extraction, and then the SVM classification technique was used for
grading of maculopathy. The retinal images in the STARE database are captured by a
fundus camera and cannot give 100% results directly [5]: sometimes the captured
images are poorly shaped, or the regions of interest are not clear, so improving the
image quality through preprocessing is very important [6]. Figure 1 indicates the
workflow of the present study, where the fundus image database is first given as
input, from which the green channel is extracted. On these extracted images the
intensity transformation function is applied, where a threshold is obtained, one from
the Sym4 wavelet and the other from the Haar wavelet. Lastly, the exudates are
detected from these wavelets.
2.1 Preprocessing
In image processing, the green channel is commonly used. The red and blue channels
are not as feature-rich as the green channel. The following formulas give the RGB
color separation [7] for the red, green, and blue channels.
1. R channel

$$r = \frac{R}{R + G + B} \tag{1}$$

This is the normalized red channel, where R, G, and B stand for red, green, and
blue, respectively.

2. G channel

$$g = \frac{G}{R + G + B} \tag{2}$$
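The channel normalization in Eqs. (1)–(2) can be sketched with NumPy (a toy image stands in for a STARE fundus image):

```python
import numpy as np

def normalized_channels(img):
    """Chromaticity channels: r = R/(R+G+B), g = G/(R+G+B), b = B/(R+G+B)."""
    img = img.astype(float)
    total = img.sum(axis=-1, keepdims=True)
    total[total == 0] = 1.0          # avoid division by zero on black pixels
    return img / total               # stacked r, g, b planes

# 1x2 toy image: one pure-green pixel, one grey pixel
img = np.array([[[0, 255, 0], [100, 100, 100]]], dtype=np.uint8)
rgb = normalized_channels(img)
green = rgb[..., 1]                  # the feature-rich green channel
print(green)
```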
The intensity transformation function can highlight the brighter component of the
lesion, i.e., the exudates:

$$s = T(r) \tag{5}$$
2.3 Thresholding
The wavelet toolbox in MATLAB offers a large selection. We can create new wavelets,
add them to existing wavelet families, and utilize existing families. Wavelets
decompose a signal into a compact representation and expose signal information via a
mathematical procedure similar to image compression. Wavelets support several
different operations, including data compression and noise reduction. The
transforming function is the basis function ψ(t), also known as the mother wavelet [9].
Here, we employ the Symlet4 and Haar wavelets, which are part of the orthogonal
wavelet families, for feature extraction. Symlet wavelets are sometimes known as
Daubechies least asymmetric wavelets since they are created in the same way as
Daubechies wavelets, whereas the Daubechies wavelets have maximal phase. The
Symlets have minimal phase [10], with wavelet coefficients (Symlet, n), where n can
be any positive even number. The size of the generated filters is denoted by n. The
vanishing moments of the Symlet wavelet of size n are n/2 [10]. The Haar transform,
based on the Haar wavelet, is the simplest compression approach. A Haar wavelet analysis
is done using the DWT2 tool for postprocessing. It computes the approximation
coefficients matrix as well as the details coefficients matrices (horizontal, vertical,
and diagonal). In this case, the inverse wavelet, or reconstructed image, produces an
excellent outcome. The approximation coefficients matrix X of the reconstructed or
inverted wavelet is based on the approximation matrix CA and the details matrices
CH, CV, and CD [13–15].
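The paper uses MATLAB's DWT2 tool; purely as an illustration, one level of the 2D Haar transform and its inverse can be written directly in NumPy, producing the same CA/CH/CV/CD matrices described above:

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2D Haar transform: approximation (CA) plus
    horizontal (CH), vertical (CV) and diagonal (CD) detail matrices."""
    a = img[0::2, 0::2].astype(float)   # corners of each 2x2 block
    b = img[0::2, 1::2].astype(float)
    c = img[1::2, 0::2].astype(float)
    d = img[1::2, 1::2].astype(float)
    ca = (a + b + c + d) / 2.0          # approximation
    ch = (a + b - c - d) / 2.0          # horizontal detail
    cv = (a - b + c - d) / 2.0          # vertical detail
    cd = (a - b - c + d) / 2.0          # diagonal detail
    return ca, ch, cv, cd

def haar_idwt2(ca, ch, cv, cd):
    """Invert one Haar level, reconstructing the original image exactly."""
    rows, cols = ca.shape
    out = np.zeros((rows * 2, cols * 2))
    out[0::2, 0::2] = (ca + ch + cv + cd) / 2.0
    out[0::2, 1::2] = (ca + ch - cv - cd) / 2.0
    out[1::2, 0::2] = (ca - ch + cv - cd) / 2.0
    out[1::2, 1::2] = (ca - ch - cv + cd) / 2.0
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
ca, ch, cv, cd = haar_dwt2(img)
recon = haar_idwt2(ca, ch, cv, cd)
print(np.allclose(recon, img))  # True
```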
$$H(\kappa) = w^{T} x + b = \sum_{i=1}^{n} \omega_i \kappa_i + b \tag{6}$$
In the present research the openly available STARE database has been used for
extracting the exudates with the Symlet4 and Haar wavelets, comparing which wavelet
gives better results. We then count the exudates near the macula and, from this
count, decide the severity of the disease (normal or abnormal) using a support
vector machine (SVM) classifier: if the exudate count near the macula is 0, the
image is normal; otherwise it is abnormal. Using the Symlet4 and Haar wavelets we
detect diabetic maculopathy exudates and compare the two wavelets using a statistical
technique to check which wavelet gives the better result [13–15].
The detection of exudates is seen in the middle column of Fig. 3, where the first
column shows the Symlet4 and Haar wavelets and the last column shows the exudate
detection on the original image.
Table 1 contains statistical parameters for calculating the correlation between Symlet
wavelet 4 and Haar wavelet, where x represents Symlet wavelet 4 parameters and y
represents Haar wavelet parameters.
where
x: exudate counts from the Symlet4 wavelet,
y: exudate counts from the Haar wavelet.

$$\text{Mean}(x) = \frac{4825.98}{30} = 160.86, \qquad \text{Mean}(y) = \frac{1686.15}{30} = 56.20$$

$$\text{Variance}(x) = \frac{\sum (x - \bar{X})}{N} = \frac{4665.12}{30} = 155.50, \qquad \text{Variance}(y) = \frac{\sum (y - \bar{Y})}{N} = \frac{1629.25}{30} = 54.33$$

$$\text{Standard Deviation}(x) = \sqrt{\text{Variance}(x)} = \sqrt{155.50} = 12.46$$

$$\text{Standard Deviation}(y) = \sqrt{\text{Variance}(y)} = \sqrt{54.33} = 7.37$$

$$\text{Correlation: } r = \frac{\sum (x - \bar{X}) \cdot \sum (y - \bar{Y})}{\sqrt{\sum (x - \bar{X})^2 \cdot \sum (y - \bar{Y})^2}}$$

where

$$\sum (x - \bar{X}) = 4665.12, \qquad \sum (y - \bar{Y}) = 1629.25,$$
$$\sum (x - \bar{X})^2 = 21{,}763{,}344.61, \qquad \sum (y - \bar{Y})^2 = 2{,}654{,}455.56$$

$$r = \frac{4665.12 \times 1629.25}{\sqrt{21{,}763{,}344.61 \times 2{,}654{,}455.56}} = \frac{7{,}600{,}646.76}{7{,}600{,}646.74} = 1$$
The correlation of the exudates is positive. This indicates that the Haar wavelet
gives a better result than the Symlet4 wavelet.
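For illustration (with hypothetical exudate counts, not the paper's 30-image data), the Pearson correlation between the two wavelets' counts can be computed as:

```python
import numpy as np

# Hypothetical exudate counts for a few images from each wavelet.
sym4_counts = np.array([150, 162, 171, 158, 165], dtype=float)
haar_counts = np.array([52, 56, 60, 55, 58], dtype=float)

def pearson_r(x, y):
    """Pearson correlation: covariance normalized by both standard deviations."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

r = pearson_r(sym4_counts, haar_counts)
print(round(r, 3))
```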
In Table 2, random images from the dataset have been taken and the count of exudates
for diabetic maculopathy classification has been recorded. If the exudates count is
zero, the image is classified as normal; otherwise it is classified as abnormal.
For the classification and grading of diabetic maculopathy exudates, a support vector
machine classifier has been used. The images have been classified into normal and
abnormal grades, and a 95.7% result was obtained. In Fig. 4, the blue color indicates
abnormal images and the orange color indicates normal images.
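A minimal sketch of such a classifier, assuming scikit-learn and hypothetical exudate counts (0 means normal, any positive count means abnormal):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical feature: exudate count near the macula per image.
counts = np.array([[0], [0], [0], [3], [7], [12], [1], [0], [5], [2]])
labels = (counts.ravel() > 0).astype(int)   # 0 = normal, 1 = abnormal

# Linear SVM separates zero counts from positive counts.
clf = SVC(kernel="linear").fit(counts, labels)
print(clf.predict([[0], [4]]))  # → [0 1]
```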
4 Conclusion
References
1. Pattebahadur C, Manza R, Kamble A (2019) Design a novel detection for maculopathy using
weightage KNN classification. https://doi.org/10.1007/978-981-13-9184-2_32
2. American Diabetes Association: American Diabetes Association Copyright 1995–2018
[Internet]. http://www.diabetes.org/diabetes-basics/type-1/
3. Noronha K, Nayak KP, Automated diagnosis of diabetes maculopathy: a survey
4. Structured Analysis of the Retina. http://cecas.clemson.edu/~ahoover/stare
5. Pattebahadur C, Manza R, Kamble A, Varma P (2020) Detection and counting of microa-
neurysm for early diagnosis of maculopathy
6. Analytics Vidhya—Getting Started with Image Processing Using OpenCV. https://www.analyticsvidhya.com/blog/2023/03/getting-started-with-image-processing-using-opencv/. Accessed 5 July 2023
7. Rajput YM, Manza RR, Patwari MB, Deshpande N (2013) Retinal optic disc detection using
speed up robust features. In: National conference on computer and management science [CMS-13],
Apr 25–26, 2013, Radhai Mahavidyalaya, Aurangabad-431003 (MS, India)
8. Deshmukh P, Chavan S, Rodrigues W, Shinde A, Comparison of techniques for diabetic
retinopathy detection using image processing. Int J Adv Res Ideas Innov Technol. ISSN:
2454-132X
9. Xu L, Luo S (2010) A novel method for blood vessel detection from retinal images. BioMed
Eng Online 9:14 http://www.biomedical-engineering-online.com/content/9/1/14
10. maplesoft.com, ‘Discrete Transforms Wavelets’ [Online]. Available: https://www.maplesoft.
com/support/help/maple/view.aspx?path=DiscreteTransforms%2FWavelets. Accessed 5 July
2023
11. Ladicky L, Torr P (2011) Linear support vector machines 985–992
12. Srivastava D, Bhambhu L (2010) Data classification using support vector machine. J Theor
Appl Inf Technol 12:1–7
13. Kamble A, Hannan SA, Jain A, Manza R (2021) Prediction of prediabetes, no diabetes and
diabetes mellitus-2 using pattern recognition
14. Kamble AK, Manza RR, Rajput YM, Hannan SA (2017) Association redetection of regular
insulin and NPH insulin using statistical features. In: Proceedings of the 5th International
conference on system modeling and advancement in research trends, SMART, pp 59–62,
7894490
15. Kamble AK, Manza RR, Rajput YM (2016) Classification of insulin dependent diabetes
mellitus by K-Means. In: ICIIECS’16 Proceedings, pp 902–904. ISBN 978-1-4673-8207-6
An Optimized Neural Network Model
to Classify Lung Nodules from CT-Scan
Images
Abstract The early identification of lung nodules in chest CT scans is vital for human life and can prevent health emergencies. Manual prediction of lung nodules is inconsistent, and at the early stages of lung cancer they cannot be predicted reliably, so an artificial intelligence system is required to identify lung nodules at an early stage. Many researchers have worked on lung nodule prediction and classification by machine learning and deep learning, but the models implemented lack robustness and consistency. So, we have proposed a novel approach to detect lung nodules early using a customized CNN, which can easily segment the small nodules for classification, and we used a kernel regularizer to avoid overfitting. This model was implemented on the LIDC-IDRI dataset from Kaggle with 25,000 samples. Finally, we obtained an accuracy of 0.951, with calculated precision, recall, and F1-score. With this, we can confirm that our model performs consistently.
1 Introduction
Asiya (B)
CSE Department, Noorul Islam Center for Higher Education, Kumarakovil, Thukalay, Tamil
Nadu, India
e-mail: syedasiya14@gmail.com
N. Sugitha
Saveetha Engineering College, Thandalam, Chennai, India
e-mail: sugithan@saveetha.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 425
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_32
treatable stage, which can significantly improve patient life [4, 5]. Early detection
allows for less invasive interventions and more effective treatment possibilities, eventually
increasing the chances of survival. In addition, monitoring lung nodules over time
can help manage various pulmonary conditions [7, 13].
The advancements in technology and image processing, such as computed tomog-
raphy (CT) scans and artificial intelligence, have revolutionized the field of lung
nodule detection [14, 15]. These technological improvements have made it possible
to detect and characterize lung nodules with high accuracy, reducing the risk of misdiagnosis
and unnecessary invasive procedures. In this age of rapidly evolving healthcare, the
focus on lung nodule detection is essential, and ongoing research and innovation in
this field continue to enhance our ability to identify and manage these lung abnor-
malities, ultimately improving patient care and outcomes [16, 17]. This introduction
sets the stage for exploring the various aspects of lung nodule detection, including
its methods, challenges, and vital role in modern medicine.
Several datasets are presently available for lung nodule detection, especially in
medical organizations. The most popular and widely used dataset is LIDC-IDRI,
which contains thousands of annotated CT scans [18, 19]. Another dataset, LUNA,
is a subset of LIDC-IDRI and is also part of valuable research on lung nodule
detection. The NIH Chest X-ray Dataset, JSRT, Shenzhen Hospital CT Images, and
Montgomery County X-ray Set offer chest X-ray images with lung nodule cases.
Researchers have also created LIDC-IDRI-like datasets by collecting and annotating
their own CT scans. These datasets are essential for developing and evaluating algo-
rithms for early lung nodule detection, but ethical and privacy considerations should
be followed when working with patient data [20, 21].
Machine learning algorithms are crucial for lung nodule detection due to their
ability to provide early and accurate identification of potential cancerous growths in
medical images like CT scans. SVM, random forest, KNN, etc., are the most imple-
mented algorithms for lung nodule detection. Keshani et al. [11] have proposed a
model for lung nodule detection, segmentation, and recognition using CT images.
The approach involves a sequence of steps, including lung area segmentation through
active contour modeling and masking techniques, SVM classification for nodule
detection utilizing 2D and 3D features, contour extraction for precise nodule delin-
eation, and the classification of lung tissues into four categories. The method is eval-
uated using clinical CT images and public datasets, achieving an accuracy of 89%.
Nada et al. [12] focused on the early detection and localization of lung nodules,
crucial for diagnosing lung cancer. Machine learning with the random forest algorithm is implemented to study the effect of feature groups on classification accuracy. After experimentation on a dataset from the LIDC database, the approach achieved 96.20% accuracy.
Lung nodule detection in medical imaging relies on neural network models. Region-based CNNs such as Faster R-CNN and SSD are useful for localizing nodules, while transfer learning accelerates model training using pre-trained architectures. Other CNN models such as VGG16, VGG19, ResNet, AlexNet, etc., are also preferred by various researchers [21–23].
In our research, we proposed an optimized CNN model to classify lung nodules with kernel regularization to avoid overfitting. We used the LIDC-IDRI dataset from
Kaggle with 25,000 samples. We selected 15,000 samples from the existing dataset
that included three classes. The proposed optimized CNN model performed the classification in a well-defined structure and attained an accuracy of 95.1% on the
selected dataset. This proposed model has the potential to significantly impact the
field of medical diagnostics, improving early detection and patient outcomes.
The remainder of the paper is organized as follows: Sect. 2 reviews existing studies relevant to lung nodule detection, Sect. 3 presents the methodology of the proposed work, its architecture, and the dataset, Sect. 4 analyzes and compares the results, and Sect. 5 concludes the paper.
2 Related Work
Weihua et al. [1] introduced PiaNet, a CNN-based model that detects ground-glass opacity (GGO) nodules in 3D CT images. PiaNet comprises pyramid multi-scale source connections, an MRFCB for improved feature capture, and a classifier for GGO nodule identification at different scales. The results on the LIDC-
IDRI dataset show that PiaNet achieved a sensitivity of 93.6%. Tenescu et al. [2]
conducted research to improve lung nodule detection in CT scans, a challenging task
in radiology. They applied a weight-averaging ensemble technique that was initially
designed for natural image classification to enhance model performance. Using a
dataset of 1050 patients, the researchers fine-tuned models under various configura-
tions. As a result, they improved the FROC score from 0.872 to
0.886.
David et al. [3] introduced a methodology for automatically identifying and classifying interstitial lung abnormality (ILA) patterns in CT images. These ILA patterns have clinical consequences, as
they are associated with increased risk in smokers, even before the development of
interstitial lung disease. The methodology employs an ensemble of CNN, including
2D, 2.5D, and 3D architectures, to enhance feature detection accuracy. Using a
substantial dataset of 37,424 radiographic tissue samples from 208 CT scans, the
ensemble model achieved an impressive average sensitivity of 91.41%. Bin Hu et al.
[4] proposed an ensemble multi-view 3D CNN model that significantly advances lung
cancer diagnosis and risk stratification. The model achieves remarkable performance
by leveraging advanced deep learning algorithms and a large dataset of 1075 lung
nodules, with AUC scores of 91.3% for distinguishing benign/malignant nodules and
92.9% for identifying pre-invasive/invasive nodules.
Hailun et al. [5] discussed the utilization of deep learning methods for the classi-
fication of lung nodule malignancy. A systematic literature search identified sixteen
relevant studies, employing techniques such as CNN, autoencoders (AE), and deep
belief networks (DBN) to diagnose and predict lung nodule malignancy. Notably,
these deep learning models consistently achieved a high accuracy of 91%. Seifedine
et al. [6] proposed research aiming to use a CNN for segmenting lung nodules in CT
scans. Test images are sourced from the TCIA database. The study indicates that
3 Methodology
We used the LIDC-IDRI dataset from Kaggle with 25,000 samples [24]. We selected
15,000 samples that include lung adenocarcinoma (lung aca), lung squamous cell
carcinoma (lung scc), and lung nodules (lung_n) in equal proportions, which is a
reasonable approach to ensure a balanced dataset.
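Such an equal-proportion selection can be sketched as follows; the class sizes, file names, and the `select_balanced` helper are hypothetical illustrations, not the authors' code.

```python
import random

def select_balanced(samples_by_class, per_class, seed=50):
    """Draw the same number of samples from each class (hypothetical helper)."""
    rng = random.Random(seed)
    selected = []
    for label, samples in samples_by_class.items():
        for sample in rng.sample(samples, per_class):
            selected.append((sample, label))
    return selected

# Hypothetical file lists standing in for the Kaggle LIDC-IDRI images
dataset = {
    "lung_aca": [f"aca_{i}.jpg" for i in range(8000)],
    "lung_scc": [f"scc_{i}.jpg" for i in range(8000)],
    "lung_n": [f"n_{i}.jpg" for i in range(9000)],
}
# 5,000 samples per class -> 15,000 balanced samples, as in the paper
subset = select_balanced(dataset, per_class=5000)
```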
Resizing all the samples to a common size of 224 × 224 pixels is a common
practice in image classification tasks and ensures uniformity. This step simplifies the
input data for our CNN model.
Splitting our data into train, test, and validation sets with a 70:20:10 ratio is a
standard partitioning strategy. Using a random state of 50 for the split helps ensure
reproducibility, as others can reproduce the same data split using the same random
state. Figure 2 illustrates sample images after resizing.
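A minimal sketch of this 70:20:10 split with scikit-learn, applying `train_test_split` twice; the tiny placeholder arrays below stand in for the 15,000 resized 224 × 224 images.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the resized 224 x 224 images and labels
X = np.zeros((100, 8, 8, 3), dtype=np.float32)
y = np.tile([0, 1, 2], 34)[:100]

# First carve off 30%, then split that 30% into 20% test and 10% validation
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, random_state=50)
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=1 / 3, random_state=50)
```

Fixing `random_state=50` in both calls makes the split reproducible, as the text notes.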
In our CNN model design for lung nodule classification, we have implemented a
robust architecture with eight Conv2D layers interspersed with average pooling,
followed by four fully connected layers. This design is well-suited for image classifi-
cation tasks. We have employed L1 regularization on bias terms and L2 regularization
on kernel weights, both with a regularization strength of 0.006, essential for controlling model complexity and preventing overfitting. Additionally, we have incorporated a 25% dropout rate to randomly deactivate units during training, a valuable
technique for reducing overfitting. To improve convergence, we have implemented
learning rate scheduling by dynamically adjusting the learning rate during training,
starting at 0.001. ReLU activation functions in convolutional layers introduce nonlin-
earity, and softmax activation at the output layer efficiently handles multi-class clas-
sification. Utilizing the Adam optimizer, an adaptive learning-rate algorithm, further
enhances model training. Lastly, using early stopping demonstrates our commitment
to avoiding overfitting and ensuring optimal model performance. With this compre-
hensive model and training approach, the network can effectively classify different lung nodule types. We fine-tuned hyperparameters and monitored training while documenting our findings for transparency and reproducibility. We calculated the cross-entropy loss between
actual and predicted labels (1).
\[ \mathrm{loss}\left(y_{\mathrm{actual}}, y_{\mathrm{predicted}}\right) = -\frac{1}{m}\sum_{c=1}^{C} y_c \log(p_c) \tag{1} \]
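Equation (1) can be sketched directly in NumPy; the epsilon clip is an added safeguard against log(0), not part of the original formula.

```python
import numpy as np

def cross_entropy(y_actual, y_predicted, eps=1e-12):
    """Mean categorical cross-entropy over m samples, as in Eq. (1)."""
    p = np.clip(y_predicted, eps, 1.0)  # guard against log(0)
    m = y_actual.shape[0]
    return -np.sum(y_actual * np.log(p)) / m

# One-hot labels for m = 2 samples over C = 3 classes
y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
loss = cross_entropy(y_true, y_pred)  # -(ln 0.7 + ln 0.8) / 2
```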
4 Result Analysis
During the 20-epoch training process of our neural network for lung nodule classifi-
cation, our exploration of different batch sizes, particularly switching from an initial
batch size of 24 to 16, yielded crucial insights. The observed overfitting when using
a batch size of 24 (as depicted in Fig. 3) underlined the significance of batch size
in model generalization. Overfitting happens when a model becomes too specialized to the training data, significantly hindering performance on new, unseen data. We addressed the overfitting issue by reducing the batch size to 16, allowing our model
to better generalize to diverse instances.
Moreover, our adoption of L1 and L2 regularization techniques was pivotal in
maintaining the proximity of training and testing errors, as illustrated in Fig. 4.
This convergence of errors suggests that regularizers effectively prevented the model
from overfitting by restraining its capacity to capture noise in the training data.
This equilibrium in error rates is an encouraging sign of a well-generalized model.
Our careful parameter adjustments, vigilance in recognizing overfitting, and thoughtful inclusion of regularization techniques signify a thorough and meticulous approach to building a robust lung nodule classification model. Figure 5 illustrates the confusion matrix for the proposed model with batch size 24, and Fig. 6 shows the confusion matrix for the proposed model with batch size 16.
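The penalty terms discussed above can be sketched as follows; the toy weight and bias arrays are hypothetical, while the 0.006 strengths match those stated in Sect. 3.

```python
import numpy as np

def regularized_loss(base_loss, kernels, biases, l2=0.006, l1=0.006):
    """Add the L2 kernel penalty and L1 bias penalty to the base loss."""
    l2_term = l2 * sum(np.sum(W ** 2) for W in kernels)
    l1_term = l1 * sum(np.sum(np.abs(b)) for b in biases)
    return base_loss + l1_term + l2_term

kernels = [np.array([[0.5, -0.5]])]  # toy kernel weights
biases = [np.array([0.1, -0.2])]     # toy biases
total = regularized_loss(1.0, kernels, biases)
# 1.0 + 0.006 * (0.25 + 0.25) + 0.006 * (0.1 + 0.2) = 1.0048
```

Because both penalties grow with the magnitude of the weights, minimizing the total loss pushes the kernels and biases toward smaller values, which is what keeps the training and testing errors close together.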
We predicted the target variable (y) from the test data (x) and subsequently calculated accuracy, precision, and recall using Eqs. (2), (3), and (4), which is standard practice for evaluating the performance of a classification model. Table 1,
Fig. 3 Training and validation loss for our model with batch size 24; here the learning rate is updated epoch by epoch, showing the best results at epoch 5. After epoch 5, training and validation accuracy diverge
Fig. 4 Training and validation loss for our model with batch size 16; here the learning rate is updated epoch by epoch, showing the best results at epoch 10
as we have indicated, provides a clear summary of the final percentages for various support counts. The support count refers to the number of instances or samples associated with each class in our classification task. The table presents the evaluation metrics for each class, demonstrating how well our model performed in classifying each group. The results in Table 1 are vital for assessing the model's effectiveness in distinguishing between different classes, and they help to gain insight into its performance across the dataset. The graph of the true positive rate against the false positive rate for all classes is shown in Fig. 7.
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2} \]
\[ \text{Precision} = \frac{TP}{TP + FP} \tag{3} \]
\[ \text{Recall} = \frac{TP}{TP + FN} \tag{4} \]
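Equations (2)-(4) can be computed directly from the confusion-matrix counts; the counts below are hypothetical, not taken from Table 1.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, and recall as in Eqs. (2)-(4), plus the F1-score."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for one class of the confusion matrix
acc, prec, rec, f1 = classification_metrics(tp=90, tn=95, fp=5, fn=10)
```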
Randomly splitting our data into a training set and a testing set is a common prac-
tice in machine learning for assessing model performance, and calculating accuracy
is a valid way to measure a model’s performance. The approach considers whether
our model overfits and selects the best model from those we have experimented with.
5 Conclusion
In this paper, we implemented an optimized CNN model that avoids overfitting through kernel regularization. Our model performed well in terms of accuracy, precision, recall, and F1-score, achieving an accuracy of 0.951. We also compared this accuracy with that of other published models, and our model performs better: most of those models do not show consistency and tend to overfit, whereas our approach overcomes this.
References
1. Liu W, Liu X, Luo X, Wang M, Han G, Zhao X, Zhu Z (2023) A pyramid input augmented
multi-scale CNN for GGO detection in 3D lung CT images. Pattern Recogn 136:109261
2. Tenescu A, Bercean BA, Avramescu C, Marcu M (2023) Averaging model weights boosts
automated lung nodule detection on computed tomography. In: Proceedings of the 2023 13th
international conference on bioscience, biochemistry and bioinformatics, pp 59–62
3. Bermejo-Peláez D, Ash SY, Washko GR, Estépar RSJ, Ledesma-Carbayo MJ (2020) Classifi-
cation of interstitial lung abnormality patterns with an ensemble of deep convolutional neural
networks. Sci Rep 10(1):338
4. Zhou J, Hu B, Feng W, Zhang Z, Fu X, Shao H, Wang H, Jin L, Ai S, Ji Y (2023) An ensemble
deep learning model for risk stratification of invasive lung adenocarcinoma using thin-slice
CT. NPJ Digital Med 6(1):119
5. Liang H, Hu M, Ma Y, Yang L, Chen J, Lou L, Chen C, Xiao Y (2023) Performance of
deep-learning solutions on lung nodule malignancy classification: a systematic review. Life
13(9):1911
6. Kadry S, Herrera-Viedma E, Crespo RG, Krishnamoorthy S, Rajinikanth V (2023) Automatic
detection of lung nodules in CT scan slices using CNN segmentation schemes: a study. Procedia
Comput Sci 218:2786–2794
7. Gugulothu VK, Balaji S (2023) A novel deep learning approach for the detection and
classification of lung nodules from CT images. Multimedia Tools Appl 1–24
8. Annavarapu CSR, Parisapogu SAB, Keetha NV, Donta PK, Rajita G (2023) A Bi-FPN-based
encoder–decoder model for lung nodule image segmentation. Diagnostics 13(8):1406
9. Naseer I, Akram S, Masood T, Rashid M, Jaffar A (2023) Lung cancer classification using
modified u-net based lobe segmentation and nodule detection. IEEE Access
10. Halder A, Dey D (2023) Atrous convolution aided an integrated framework for lung nodule
segmentation and classification. Biomed Signal Process Control 82:104527
11. Keshani M, Azimifar Z, Tajeripour F, Boostani R (2013) Lung nodule segmentation and recog-
nition using SVM classifier and active contour modeling: a complete intelligent system. Comput
Biol Med 43(4):287–300
12. El-Askary NS, Salem MA, Roushdy MI (2022) Features processing for random forest
optimization in lung nodule localization. Expert Syst Appl 193:116489
13. Cao H, Liu H, Song E, Ma G, Xu X, Jin R, Liu T, Hung CC (2020) A two-stage convolutional
neural network for lung nodule detection. IEEE J Biomed Health Inf 24(7):2006–2015
14. Gu Y, Lu X, Yang L, Zhang B, Yu D, Zhao Y, Gao L, Wu L, Zhou T (2018) Automatic lung
nodule detection using a 3D deep convolutional neural network combined with a multi-scale
prediction strategy in chest CTs. Comput Biol Med 103:220–231
15. Zhao C, Han J, Jia Y, Gou F (2018) Lung nodule detection via 3D U-Net and contextual
convolutional neural network. In: 2018 International conference on networking and network
applications (NaNA). IEEE, pp 356–361
16. Xie H, Yang D, Sun N, Chen Z, Zhang Y (2019) Automated pulmonary nodule detection in
CT images using deep convolutional neural networks. Pattern Recogn 85:109–119
17. Zheng S, Guo J, Cui X, Veldhuis RNJ, Oudkerk M, Van Ooijen PMA (2019) Automatic
pulmonary nodule detection in CT scans using convolutional neural networks based on
maximum intensity projection. IEEE Trans Med Imaging 39(3):797–805
18. Tang H, Kim DR, Xie X (2018) Automated pulmonary nodule detection using 3D deep convo-
lutional neural networks. In: 2018 IEEE 15th international symposium on biomedical imaging
(ISBI 2018). IEEE pp 523–526
19. Zhang J, Xia Y, Cui H, Zhang Y (2018) Pulmonary nodule detection in medical images: a
survey. Biomed Signal Process Control 43:138–147
20. Schultheiss M, Schober SA, Lodde M, Bodden J, Aichele J, Mueller-Leisse C, Renger B,
Pfeiffer F, Pfeiffer D (2020) A robust convolutional neural network for lung nodule detection
in the presence of foreign bodies. Sci Rep 10(1):12987
21. Jin H, Li Z, Tong R, Lin L (2018) A deep 3D residual CNN for false-positive reduction in
pulmonary nodule detection. Med Phys 45(5):2097–2107
22. Manickavasagam R, Selvan S, Selvan M (2022) CAD system for lung nodule detection using
deep learning with CNN. Med Biol Eng Comput 60(1):221–228
23. Asiya, Sugitha N, Automatically segmenting and classifying the lung nodules from CT images. ISSN: 2147-6799
24. https://www.kaggle.com/datasets/zhangweiled/lidcidri
Fake Product Review Monitoring System
Using Machine Learning
Abstract As facts and statistics on the web are growing rapidly, online reviews have
been a true revolution in the way people purchase products and services. Nowadays,
a wide range of e-commerce sites allow the clients to write their reviews about the
product that they purchased from that website as these reviews help the brand to
understand the customer requirements and the shortcomings in the product. These
brands try their best to get good reviews by improving the quality of product from
customers as bad reviews affect their business. Often customers need a review of a
product before investing in it as it impacts their decision for purchasing it. However,
many of the customers are not satisfied after buying the product from a particular
website and feel that the reviews are misleading and fake, casting doubt on all reviewers, even those who actually try to give genuine reviews. On some online platforms, some of the reviews are planted by frauds who are either hired by an organization or belong to it and who try to reduce the value of competitors' products by giving them negative reviews, while often providing good reviews for products designed by their own company. So, we have used sentiment anal-
ysis to analyze reviews online and compared the results for two algorithms which are
the SVM classifier and Naive Bayes classifier so that the user can determine whether
the reviews are genuine or not.
1 Introduction
In this era, there are several ways a person can obtain an item or product: either by physically going to the point of sale or by just clicking on a few required products. In the traditional style, where a customer goes to the shopkeeper to purchase
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 437
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_33
438 P. Rajput and P. K. Sethi
items, several features are highlighted in front of the customer to gain his attention
and make him aware of the usefulness of the item. There is no definite method in this case to judge whether the shopkeeper is lying or stating facts about the product; the customer eventually has to carefully evaluate the product and decide whom to trust.
On the other hand, the source of getting an item or product varies these days when
compared with traditional methods. Today, it is common practice for people to read online reviews for various purposes, such as reading books, renting a car, and going on a vacation, before
going shopping. People can buy products from various brands of online stores. Here,
customers need to order the original product without seeing and checking. They read
the reviews and buy the products. When they get a lot of positive reviews, they are
more likely to buy the product and when they find more negative reviews about the
product, they will not buy it. Therefore, they rely on reviews about the product. The
only possible method when buying online to get authentic items is through honest
feedback on that item. However, negative fake reviews can damage reputation and
cause monetary loss [1].
A. Objective: The identified challenges encourage providing solutions to all the
issues raised in the problem stated before. The proposed methodology and objec-
tives of this application are as follows: Using two different algorithms to get a
better representation of fake review detection. When a fraudster posts a review after an online purchase, most consumers spend their time reading such reviews when they are available and develop a false notion about the product. Hence, today's youth and even adults are increasingly
relying on the reviews available online. This means that people make their own purchasing decisions by analyzing the opinions expressed about those products [2].
B. Sentiment Analysis: Sentiment analysis is a process of analyzing texts online to
determine the emotional tone, whether it is positive, negative or neutral. Simply
put, the analysis of emotions helps to understand the attitude of the author toward
the subject. Opinion mining identifies the tone behind a written text and is based on natural language processing. Information is collected from several sites containing informal texts [3].
C. Types of Sentiment Analysis: Sentiment analysis that provides polarity by dividing text into graded categories, usually from best to worst, is known as fine-grained sentiment analysis. Another type of opinion mining performs aspect-based analysis, which collects well-defined positive or negative portions. For example, "the product is heavy" is a review given by a customer that does not make the whole product faulty; instead, it tells about the weight, which causes inconvenience to the customer.
Emotion detection identifies specific emotions rather than just positive and negative polarity. Analysis that recognizes the intention behind a text, beyond a point of view, is intent-based. Sentiment analysis is most helpful when it is tied to a specific attribute or feature described in the text. The process of extracting these attributes or features and their associated sentiment is called aspect-based sentiment analysis, or ABSA.
(1) The text is divided into sections.
(2) The sentences that carry emotion are identified for analysis.
(3) Sentiment scores in the range −1 to 1 are given to all the words used.
(4) Multiple layers are then combined for the sentiment analysis.
(1) The rule-based system: This gives a method to count the number of positive and negative words in a given database, though it requires tokenization and other preprocessing methods. If the number of words expressing happiness is greater than the number of opposing words, the sentiment is positive, and vice versa. However, this system has a few disadvantages. As the name suggests, maintaining a rule for every combination of words requires a lot of work and is very tiring, and these days people use so much slang that it is almost impossible to maintain such a rule set.
(2) Automatic method: This method works using a machine learning program. First, datasets are used to train a model, and predictive analysis is performed on new text. This classification can be done using machine learning techniques such as Naive Bayes, linear regression, and support vector machines.
(3) Hybrid approach: A combination of both of the above methods, namely the rule-based and automatic methods. Its advantage is that its accuracy is high compared with the other two methods. Figure 1 illustrates how the meaning of a comment can be understood to a great extent when machine learning is combined with NLP to identify statements [4].
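The rule-based and automatic methods above can be sketched minimally as follows; the polar word lists and the four-review corpus are hypothetical toy data, and a real system would train on thousands of labeled reviews.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# (1) Rule-based: count polar words from hand-made (hypothetical) lists
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "awful"}

def rule_based_sentiment(text):
    """Positive if happy words outnumber opposing words, and vice versa."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# (2) Automatic: train a Naive Bayes classifier on a labeled toy corpus
reviews = ["good great product", "love this excellent item",
           "bad terrible product", "awful poor item"]
labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(reviews, labels)
prediction = model.predict(["great excellent product"])[0]
```

The same pipeline with an SVM classifier swapped in for `MultinomialNB` would give the second algorithm compared in this paper.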
2 Literature Review
Recently, the World Wide Web has revolutionized the information technology
industry. Online reviews are found on many sites, such as feedback pages, tweets, posts, survey sites, news forums, company-owned sites and social networking sites. Liu et al. provided one of the first studies to identify spam reviews. They dealt with fake or near-fake reviews, using logistic regression to classify reviews as true or false with an accuracy of up to 78% [5]. In paper [6], the authors
worked on calculating the probability of spam by modeling the distribution of spam keywords and the small differences from non-spam reviews using the LDA algorithm.
Ignatova et al. worked on a dataset to identify fake customers, mostly people from organizations who give fake feedback, and put them into a particular
category [7]. The authors used sequential minimal optimization to classify reviews
with 81% accuracy in terms of F1 scores; furthermore they were able to improve the
performance of the model to 84% [8]. Hozhabri et al. have provided a novel-based
graph for finding spam in the reviews which achieved 93% accuracy. The method
involves calculating an average of reviews and then giving weight and multiplying
both [9].
Since Amazon is a world-renowned shopping app, people always trust items by looking at product reviews. But reviews on "Amazon" do not cover a single topic; they are a combination of item reviews and item service reviews. "Amazon" presents this mixed feedback together, so the user gets a mistaken general perception (rating level) because no distinction is made between service reviews and item reviews. The model proposed by Patel et al. makes a satisfactory distinction between the two [10].
Sivagangadhar et al. used language features such as unigram presence, unigram frequency, bigram presence, bigram frequency and review length to design a model for finding false reviews. However, the main problems are the lack of data and the need for both linguistic and behavioral characteristics [11]. Tidke et al. worked
to find spam reviewers who try to convert ratings into bad for other target products
[12].
Karami et al. proposed using lexical semantic categories and language features to detect online spam reviews [13]. Product reviews play a key role in deciding whether a particular product sells in e-commerce websites or applications such as "Amazon", "Snapdeal", "OLX", "Myntra" and "Flipkart"; in sentiment analysis, the goal is to capture customer feedback. A spam dictionary is used to identify spam words in reviews. The first step is to use a decision tree to decide whether the review is relevant to a particular product or not. Sinha et al. used several text-mining algorithms and gave direct results based on these algorithms [14]. Zhang et al.'s review demonstrated three types of
new traits, including concentration, meaning and emotion. The authors provided a
model algorithm for each feature structure. They concluded that the proposed models, calculations and features were productive in the fake review detection process [15].
Angelpreethi et al. built a feature-based opinion miner. The miner's main task is to analyze reviews and determine a product's key features by creating an evaluation profile of each product that the user can consult [16].
In 2018, Manickam et al. published a paper titled "Fake News on Social Media: A Brief Review on Detection Techniques". The paper gives researchers an idea of the techniques available at that time for identifying misinformation, highlighting several social contexts, policies and mechanisms [17].
As e-commerce grows and becomes more and more popular day by day, the
number of comments coming from customers about any product is rapidly increasing.
These days, people rely heavily on reviews before buying anything. This can lead many people to write unnecessary spam and fake reviews about other related products or services. Some companies in the marketplace employ experts to add false reviews to promote their own product or to discredit the quality of an emerging or existing competitor.
Sonawane et al. have developed a method of detecting and recording false
reviews. The proposed method automatically categorizes user feedback into “unreal”,
“explicit” and “blur” categories through step-by-step processing. The blur category repeatedly reveals ambiguous or unclear elements. This results in better recognition
and benefits for both the business and the customer. Sales organizations can monitor
sales of their products by analyzing and understanding what customers say about
their products. It helps customers buy valuable products and spend their money on
quality products. Finally, end users view each individual test with a polarity score
and a reliability score.
Gangireddy et al. expanded the reviewer graph to find covert groups of users working together on spam. Karahalios et al. gave a clear vision of how the reviews of early users can affect the opinions of the rest and create an impact on the outcome of brand building. Their work showed how negative feedback can be detected at a low level before it affects the brand. Various studies and experiments show how such feedback can be detected using various classification algorithms [18].
Saad et al. argued that industrial advances are the main reason people lean on
existing reviews when deciding on already-reviewed products. Luciano et al. discussed
how various scientific and general advancements have led to an exponential
increase in fraudulent feedback. For a human it is quite an exhausting task to detect
whether a piece of text was written by an authentic user, by various programs, or by an
automated process. With the widespread use of online mediums and websites it has become
very easy to spread false feedback about a product.
A hired fraudster can move from one platform to another posting the same reviews,
creating false impressions among real customers. Many organizations follow these
methods to promote their product or to inflate their popularity and apparent
genuineness by spreading wrong information, a practice that should be
halted at the earliest. Iglesias et al. performed tests on a dataset where the
442 P. Rajput and P. K. Sethi
algorithms learned from the data itself. For their research they used various
attributes, such as the degree of emotion in the feedback and an analysis of the
user's activity, for example login patterns and feedback left on other products.
Through their experiment they achieved 82% accuracy [19]. Gupta et al. found
during their research that some people are assigned the task of either promoting
a brand with reviews designed to be trusted or posting degrading comments on all
the products of a rival brand. These organizations evidently have enough
resources to pay a group of people for this task, so it is important to identify
such groups rather than looking for individuals one by one, which was the purpose
of the research. A methodology was therefore adopted in which customers who gave
either the best or the worst reviews to all products of the same brand were
grouped together and flagged as fake on account of that behavior.
3 Methodology
First, detailed research was done into the procedures followed previously, and
some flaws and drawbacks were found in the existing approaches. A dataset was
then prepared from various websites and put into a uniform format; the model was
trained on 80% of the data and tested on the remaining 20%.
In the next step the reviews were manually labeled as spam and non-spam.
Various techniques were applied during preprocessing to handle inconsistency
and noise in the text.
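The data preparation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code; the sample reviews and their spam/non-spam labels are invented:

```python
import random

def train_test_split(samples, train_frac=0.8, seed=42):
    """Shuffle the labelled reviews and split them 80/20, as in the paper."""
    rng = random.Random(seed)
    data = list(samples)
    rng.shuffle(data)
    cut = int(len(data) * train_frac)
    return data[:cut], data[cut:]

# Hypothetical manually labelled reviews: (text, label), label in {"spam", "non-spam"}.
reviews = [("great product, works as described", "non-spam"),
           ("BEST BEST BEST buy now!!!", "spam")] * 10

train, test = train_test_split(reviews)
print(len(train), len(test))  # 16 4
```

In practice the split would follow the manual labeling and preprocessing passes, so the same cleaning is applied to both partitions.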
The text in the data was tokenized, which let us consider the individual words in
each sentence, and uninformative words such as pronouns were removed. The number
of words used is then tracked. Furthermore, a bag-of-words NLP model was used in
which each word is given a score; because this model extracts features from a
sentence, machine learning algorithms can operate on the text.
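The tokenization, stop-word removal and bag-of-words scoring steps can be sketched as follows; the stop-word list and sample sentences are illustrative only, not the corpus used in the paper:

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (the paper removes pronouns and similar words).
STOPWORDS = {"i", "it", "he", "she", "they", "the", "a", "an", "is", "was"}

def tokenize(review):
    """Lowercase, split on non-letters, and drop stop words such as pronouns."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [t for t in tokens if t not in STOPWORDS]

def bag_of_words(reviews):
    """Score each word by its count across the corpus (the bag-of-words model)."""
    counts = Counter()
    for review in reviews:
        counts.update(tokenize(review))
    return counts

scores = bag_of_words(["It is a great phone", "The phone is great, she said"])
print(scores["great"], scores["phone"])  # 2 2
```

Each review can then be represented as a vector of these per-word scores, which is the feature representation the classifiers consume.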
The classifiers then work on the words produced by the bag-of-words model. For
modeling, the reviews are converted into vectors. A cleaning operation is
performed after loading, and words that are not in the vocabulary are removed.
The remaining tokens are joined into a single line for encoding, and lists of
positive and negative reviews are generated. The reviews are then analyzed using
both an SVM classifier and a Naïve Bayes classifier. The resulting system helps
identify fake product feedback and shows only the latest comment made by each
user. Here, Naïve Bayes and support vector machine classifiers have been used; we
examine which classifier is more accurate, and words are labeled as positive or
negative. Figure 2 represents the methodology followed for the above process.
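Of the two classifiers, Naïve Bayes is the easier to reproduce from scratch. Below is a minimal multinomial Naïve Bayes with Laplace smoothing on a toy spam/non-spam corpus — the training reviews are invented for illustration, and an SVM counterpart would in practice come from a library such as scikit-learn rather than being hand-written:

```python
import math
import re
from collections import Counter, defaultdict

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Minimal multinomial Naïve Bayes with Laplace (add-one) smoothing."""

    def fit(self, samples):
        self.word_counts = defaultdict(Counter)   # class -> word frequencies
        self.class_counts = Counter()             # class -> number of documents
        self.vocab = set()
        for text, label in samples:
            self.class_counts[label] += 1
            for w in tokens(text):
                self.word_counts[label][w] += 1
                self.vocab.add(w)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best, best_lp = None, float("-inf")
        for label in self.class_counts:
            # log prior + sum of smoothed log likelihoods
            lp = math.log(self.class_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in tokens(text):
                lp += math.log((self.word_counts[label][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

# Hypothetical labelled training reviews.
train = [("genuine product fast delivery", "non-spam"),
         ("honest review good quality", "non-spam"),
         ("buy now amazing best best deal", "spam"),
         ("best deal click now free", "spam")]
nb = NaiveBayes().fit(train)
print(nb.predict("best free deal now"))  # spam
```

The classifier picks the class whose prior and word likelihoods jointly maximize the log probability of the review, which is exactly the "chance of that data falling into the category" decision described below.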
Algorithms used: Depending on the kind of data, either Naïve Bayes or a support
vector machine can be the better choice. SVMs require more computation power to
train on the dataset, so it is crucial to choose the algorithm that fulfills the
requirements and purpose of the research. Though both algorithms are supervised,
Naïve Bayes decides which class a newly seen item belongs to based on the
probability of the item falling into each category, whereas SVM is useful where
decision boundaries are required and divides the data according to the category
or class it resides in. Experiments have also shown that SVM can be preferable
when the dataset is very large. Conversely, if the data to be classified is very
small, Naïve Bayes can be used; it is also easier to implement, and the decisions
it produces are probabilistic. SVM creates a nonlinear partition between the
categories, whereas Naïve Bayes is used in cases
where the presence of one feature is independent of the others. In our
experiment, we compare the performance of both classifiers.
4 Result
After running both classifiers on the review dataset, the results are as follows.
Figure 3 shows that the Naïve Bayes classifier achieves 96% accuracy, 99.4%
precision, 91.97% recall and a 94.46% F1 score, whereas Fig. 4 shows that the SVM
classifier achieves 82.50% accuracy, 82.59% precision, 82.46% recall and an
82.47% F1 score. Comparison analysis: Fig. 5 displays the comparison between the
two classifiers and their applications.
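The accuracy, precision, recall and F1 figures above follow the standard definitions; a small helper, shown here on a made-up prediction run rather than the paper's dataset, computes them as:

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Accuracy, precision, recall and F1 for a binary spam/non-spam run."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == p == positive for t, p in pairs)              # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Hypothetical ground truth and predictions for four reviews.
y_true = ["spam", "spam", "non-spam", "non-spam"]
y_pred = ["spam", "non-spam", "non-spam", "non-spam"]
print(classification_metrics(y_true, y_pred))  # accuracy 0.75, precision 1.0, recall 0.5
```

F1 is the harmonic mean of precision and recall, which is why the SVM's four scores above cluster together while Naïve Bayes trades a little recall for very high precision.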
(1) Naïve Bayes Classifier
One of its most common applications is identifying facial features. News can be
classified into categories such as geopolitics, sports and more. In medicine it
is useful for detecting which disease a patient is at high risk of.
(2) Support Vector Machine Classifier
It is useful in the field of bioinformatics to detect the diseases that patients can
have on the basis of gene classification.
Handwritten characters can be identified, and the layered structure of the
planet can be determined.
5 Conclusion
This paper aims to help customers judge the authenticity of a product. Two models
were therefore developed so that the results obtained from the two classifiers
could be compared. Emotional analysis assisted greatly in identifying and
segregating fake reviews from real ones. This will help businesses that want to
work on their brand building and take it to a global level, as that requires true
customer reviews. It gives people with vision a chance to work on product quality
and features rather than losing focus on their goals because of fraud. After
performing these comparisons, it was found that Naïve Bayes outperformed the
other classifier in authenticity detection.
The Naïve Bayes classification was found to be more accurate than the SVM's.
Major challenges lie in analyzing emotions and evolving the model with
time, as the language and slang people use keep changing, which plays an
important role in better differentiation. In the coming years, the accuracy of
these analyses will increase further with improvements in the models, and better
algorithms can be used to analyze spam and give customers a better overview of
the product.
References
1. Jindal N, Liu B (2008) Opinion spam and analysis. In: Proceedings of the 2008 international
conference on web search and data mining. Palo Alto, California, USA, pp 219–230
2. Lau RY, Liao SY, Kwok RC, Xu K, Xia Y, Li Y (2011) Text mining and probabilistic language
modeling for online review spam detection. ACM Trans Manag Inf Syst (TMIS) 2(4):1–30
3. Allahbakhsh M, Ignjatovic A, Benatallah B, Beheshti SMR, Foo N, Bertino E (2012) Detecting,
representing and querying collusion in online rating systems, https://arxiv.org/abs/1211.0963
4. Shojaee S, Murad MAA, Azman AB, Sharef NM, Nadali S (2013) Detecting deceptive reviews
using lexical and syntactic features. In: 2013 13th international conference on intelligent
systems design and applications (ISDA). Salangor, Malaysia, pp 53–58
5. Noekhah S, Fouladfar E, Salim N, Ghorashi SH, Hozhabri AA (2014) A novel approach for
opinion spam detection in ecommerce. In: Proceedings of the 8th IEEE international conference
on E-commerce with focus on E-trust. Mashhad, Iran
6. Bhatt A, Patel A, Chheda H, Gawande K (2015) Amazon review classification and sentiment
analysis. Int J Comput Sci Inf Technol (IJCSIT) 6(6)
7. Shivagangadhar K, Sagar H, Sathyan S, Vanipriya CH (2015) Fraud detection in online reviews
using machine learning techniques. Int J Comput Eng Res (IJCER) 5(5):52–56
8. Kokate S, Tidke B (2015) Fake review and brand spam detection using J48 classifier. IJCSIT
Int J Comput Sci Inf Technol 6(4):3523–3526
9. Karami A, Zhou B (2015) Online review spam detection by new linguistic features. In:
iConference 2015 proceedings
10. Sinha A, Arora N, Singh S, Cheema M, Nazir A (2018) Fake product review monitoring using
opinion mining. Int J Pure Appl Math 119(122018)
11. Li Y, Feng X, Zhang S (2016) Detecting fake reviews utilizing semantic and emotion model. In:
2016 3rd International conference on information science and control engineering (ICISCE).
Beijing, pp 317–320
12. Angelpreethi A, Kumar SBR (2017) An enhanced architecture for feature based opinion mining
from product reviews. In: 2017 World congress on computing and communication technologies
(WCCCT). Tiruchirappalli, pp. 89–92
13. Mahid, Manickam S, Karuppayah S (2018) Fake news on social media: brief review on detection
techniques. In: 2018 Fourth international conference on advances in computing, communication
automation (ICACCA), pp 1–5. https://doi.org/10.1109/ICACCAF.2018.8776689
14. Patel D, Kapoor A, Sonawane S (2018) Fake review detection using opinion mining. Int Res
J Eng Technol (IRJET) 05(12), e-ISSN: 2395-0056; Dhawan S, Gangireddy SCR, Kumar S,
Chakraborty T (2019) Spotting collusive behavior of online fraud groups in customer reviews.
CoRR abs/1905.13649:191–200
15. Gilbert E, Karahalios K (2010) Understanding deja reviewers. In: CSCW, pp 225–228. [Online]
16. Ahmed H, Traore I, Saad S (2018) Detecting opinion spams and fake news using text
classification. Security Privacy 1(1):e9
17. Floridi L, Chiriatti M (2020) GPT-3: its nature, scope, limits, and consequences. Minds Mach
1–14:2020
18. Barbado R, Araque O, Iglesias CA (2019) A framework for fake review detection in online
consumer electronics retailers. Inf Process Manage 56(4):1234–1244
19. Gupta V, Aggarwal A, Chakraborty T (2020) Detecting and characterizing extremist reviewer
groups in online product reviews. IEEE Trans Comput Soc Syst 7(3):741–750
Perception to Control: End-to-End
Autonomous Driving Systems
Yoshita
Department of Computer Science and Engineering, Amity University, Gurugram, Haryana, India
A. Jatain
Department of Computer Science and Technology, Manav Rachna University, Faridabad, India
Manju
Department of Computer Science and Engineering and Information Technology, Jaypee Institute
of Information Technology, Noida, India
S. Kumar (B)
Department of Computer Science and Engineering, School of Engineering and Technology,
CHRIST (Deemed to Be University), Kengeri Campus, Bangalore, Karnataka 560074, India
e-mail: sandeepkumar@christuniversity.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024 447
H. Sharma et al. (eds.), Communication and Intelligent Systems, Lecture Notes in
Networks and Systems 968, https://doi.org/10.1007/978-981-97-2079-8_34
448 Yoshita et al.
1 Introduction
2 Literature Review
Several researchers have proposed different methods for autonomous driving using
deep learning techniques. Chen et al. [2] introduced a method that uses a convolu-
tional neural network (CNN) to learn road affordances and make steering predictions
based on raw sensor input. A dataset of front-facing camera images and corresponding
steering commands was used to train the CNN, which achieved a mean absolute
error of 4.0 degrees in steering angle prediction. Dosovitskiy et al. [3] used a large
video dataset of over 72 h of footage and corresponding steering commands to train
a CNN for end-to-end self-driving. Their CNN model had a mean absolute error
of 3.3° on an independent test dataset. Wang et al. [4] presented a self-supervised
learning technique that used monocular camera-based ego-motion estimation to train
a CNN for steering angle prediction, achieving a mean absolute error of 4.6° on
a test set. Kim et al. [5] proposed a method that used deep reinforcement learning
and cameras to construct a driving policy based on a reward function, achieving
an 84.2% success rate on a separate test set. Li et al. [6] introduced a real-time
deep learning method for semantic instance segmentation in autonomous driving,
achieving a mean intersection over union (IoU) of 68.6% on the Cityscapes dataset.
Godard et al. [7] presented an unsupervised learning method for monocular depth
estimation from a single image, achieving cutting-edge performance on the KITTI
test. Bojarski et al. [8] introduced a neural network that translates raw camera pixels
to steering commands without human feature engineering, using a mix of CNNs and
RNNs. However, the method does not include additional sensor data, which could
potentially improve performance. Bansal et al. [9] proposed a curriculum learning
strategy where a neural network is trained on simple driving scenarios and grad-
ually exposed to more complex situations. However, the approach has limitations
in terms of covering a restricted set of scenarios and assuming human drivers are
already driving safely in all potential scenarios. Wadekar et al. [10] proposed a deep
learning model for autonomous racing vehicles that anticipates steering and throttle
instructions from sensor data. However, the model does not account for other crucial
elements such as braking, and it requires a substantial quantity of high-quality
training data for good performance. Table 1 highlights the main outcomes and
limitations of each of the papers mentioned above.
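The mean absolute error used to score these steering models is simply the average absolute difference, in degrees, between predicted and ground-truth angles. A small sketch with hypothetical angle values:

```python
def mean_absolute_error(true_angles, predicted_angles):
    """Average |prediction - ground truth|, the metric quoted above (in degrees)."""
    diffs = [abs(t - p) for t, p in zip(true_angles, predicted_angles)]
    return sum(diffs) / len(diffs)

# Hypothetical steering angles (degrees) for five camera frames.
ground_truth = [0.0, -5.0, 12.0, 3.5, -8.0]
predictions  = [1.0, -3.0,  9.0, 4.5, -8.0]
print(mean_absolute_error(ground_truth, predictions))  # 1.4
```

A model reporting an MAE of 3.3° is therefore, on average, that many degrees away from the recorded human steering command on each frame.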
In recent research, multiple authors have proposed multi-modal fusion systems for
autonomous driving that incorporate vision, Radar, and LiDAR sensors. Zhang et al.
[11] developed a system that incorporated vision, Radar, and LiDAR sensors, and
demonstrated improved driving performance compared with single sensor modality
models. Chen et al. [12] introduced a neural network architecture that combined
LiDAR and vision inputs, and showed superior performance compared with using
only one sensor modality. Wang et al. [13] presented a multi-modal data fusion
approach using LiDAR, camera, and Radar data, which outperformed single-sensor
modality models. Fan et al. [14] proposed a two-stage network-based technique for
integrating LiDAR and vision data, and Sun et al. [15] suggested a multi-modal fusion
strategy incorporating LiDAR, camera, and Radar data, both showing improved
driving performance. An end-to-end driving model was proposed by Zhou et al.
[16] and was trained using a huge database of driving videos, which outperformed
handcrafted characteristic methods. Zhijian Liu and Alexander Amini [17] proposed
a neural network architecture that used only LiDAR sensor data and achieved compa-
rable performance to models combining LiDAR and camera data. Marc Uecker and
Tobias Fleck [18] investigated deep learning representations for LiDAR point cloud
data, finding that the learned features were related to the physical properties of the
environment. Table 2 highlights the main outcomes and limitations of each of the
papers mentioned above.
3 Methodology
End-to-end autonomous driving systems are always evolving; therefore, it’s impor-
tant to stay up-to-date. Data collection, data pre-processing, model selection, model
training, model validation, and model deployment are the general phases involved.
These phases are illustrated in Fig. 2.
During data collection, large amounts of data are gathered from various sources,
including cameras, LiDAR, and other sensors. This data is then pre-processed in the
data pre-processing stage to remove any noise, outliers, or irrelevant information,
ensuring that it is ready for training and validation. Next, in the model selection stage,
various deep learning models and techniques, such as CNNs, RNNs, and generative
adversarial networks (GANs) are considered, and the optimal model is chosen based
on the type of data and system requirements. Once the model is selected, it is trained
using the pre-processed data in the model training stage. This involves teaching the
model how to detect various objects, such as automobiles, people, and traffic signs,
and how to make driving judgements using supervised learning approaches. After
training, the model’s accuracy and performance are validated on a separate set of
data in the model validation stage. If problems or flaws are found, this allows for
adjustments to be made to the model. Finally, in the deployment stage, the trained
and validated model is put into use in the autonomous vehicle to control it and make
real-time driving decisions. To ensure the efficiency and security of an end-to-end
autonomous driving system, it is crucial to keep up with the newest developments in
the industry.
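The six phases above form a linear pipeline. The stub below is a deliberately simplified sketch of that flow — every function body is a hypothetical placeholder (the sample data, the "CNN" choice, and the validation check are all invented), not a real perception or control implementation:

```python
# Stubs for the six phases; a real system wraps sensor drivers and a deep
# learning framework around the same skeleton.
def collect_data():
    # Data collection: frames plus recorded steering from cameras/LiDAR logs.
    return [{"frame": i, "steering": 0.1 * i} for i in range(10)]

def preprocess(samples):
    # Pre-processing: drop noisy/outlier samples before training.
    return [s for s in samples if abs(s["steering"]) < 1.0]

def select_model():
    # Model selection: CNN vs RNN vs GAN, chosen per data and requirements.
    return {"type": "CNN"}

def train(model, data):
    # Model training: supervised learning on the pre-processed data.
    model["trained_on"] = len(data)
    return model

def validate(model, data):
    # Model validation: placeholder check standing in for held-out evaluation.
    return model["trained_on"] > 0

def deploy(model):
    # Deployment: the validated model controls the vehicle in real time.
    return f"{model['type']} model deployed"

data = preprocess(collect_data())
model = train(select_model(), data)
assert validate(model, data)
print(deploy(model))  # CNN model deployed
```

The value of sketching it this way is that each phase has a single input and output, so any stage (say, the model architecture) can be swapped without disturbing the rest of the pipeline.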
Finally, this paper summarises the recent progress in comprehensive deep learning
strategies for autonomous vehicles. These approaches have shown great promise
in predicting steering and other driving commands from raw sensor input without
the need for manual feature engineering. However, some papers do not consider all
possible scenarios or sensor data, and there is a need for more adaptable and safe
models to be deployed in the real world. Future research areas include the integra-
tion of LiDAR or Radar sensors to improve performance and reliability, improving
the resilience and adaptability of autonomous driving systems to changing environ-
ments, developing interpretable deep learning models, ensuring safety and ethics
in self-driving systems, studying real-time deep learning models, developing high-
generalisability deep learning models, and focusing on human-friendly autonomous
driving systems. Overall, this review provides valuable insights into the current state
of deep learning for autonomous driving and suggests several research directions to
make these systems more robust, adaptable, and deployable in the real world in a
safe manner.
References
1. https://www.researchgate.net/figure/Traditional-driving-systems-compared-to-end-to-end-dri
ving-system_fig1_351856725
2. Chen J, Liu X, Li W, Wei Y (2015) DeepDriving: learning affordance for direct perception in
autonomous driving. In: Proceedings of the IEEE international conference on computer vision
(ICCV), pp 2722–2730
3. Dosovitskiy A, Ros G, Codevilla F, Lopez A, Koltun V (2017) End-to-end learning of driving
models from large-scale video datasets. In: Proceedings of the IEEE conference on computer
vision and pattern recognition (CVPR), pp 2174–2182
4. Wang C, Xu D, Zhu Y (2018) Self-supervised learning for camera-based driving models. In:
Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp
3379–3388
5. Kim S, Kim H, Lee S, Choi J, Kim S (2018) Camera-based end-to-end autonomous driving
using deep reinforcement learning. In: Proceedings of the IEEE international conference on
robotics and automation (ICRA), pp 1–8
6. Li X, Huang W, Liang X, Wang L (2019) Real-time joint semantic-instance segmentation for
autonomous driving. In: Proceedings of the IEEE conference on computer vision and pattern
recognition (CVPR), pp 9219–9228
7. Godard C, Mac Aodha O, Brostow GJ (2017) Unsupervised Monocular depth estimation with
left-right consistency. In: Proceedings of the IEEE conference on computer vision and pattern
recognition (CVPR), pp 6602–6611
8. Bojarski M, Testa DD, Dworakowski D, Firner B, Flepp B, Goyal P, Jackel LD, Monfort M,
Muller U, Zhang J, Zhang X, Zhao J, Zieba K (2016) End to end learning for self-driving cars.
arXiv:1604.07316
9. Bansal M, Krizhevsky A, Ogale A (2018) ChauffeurNet: learning to drive by imitating the best
and synthesizing the worst. In: Robotics: science and systems (RSS)
10. Wadekar SN, Schwartz BJ, Kannan SS, Mar M, Manna RK, Chellapandi V, Gonzalez DJ,
Gamal AE (2020) Towards end-to-end deep learning for autonomous racing: on data collection
and a unified architecture for steering and throttle prediction. arXiv:2010.06412
11. Zhang B, Li W, Chen J (2021) Multi-modal fusion for end-to-end autonomous driving. arXiv
preprint arXiv:2101.02280
12. Chen Y, Yang B, Li M, Urtasun R (2021) Integrating lidar and vision for end-to-end autonomous
driving. arXiv preprint arXiv:2102.02331
13. Wang X et al (2021) End-to-end autonomous driving with multi-modal data fusion. arXiv
preprint arXiv:2108.11249
14. Fan L et al (2021) Fusing LiDAR and vision for end-to-end autonomous driving. arXiv preprint
arXiv:2109.02970
15. Sun L et al (2021) Multi-modal fusion for end-to-end autonomous driving based on deep
learning. IEEE Access 9:119196–119208
16. Zhou B et al (2018) End-to-end learning of driving models from large-scale video datasets.
IEEE Trans Intell Transp Syst 20(4):1276–1289
17. Liu Z, Amini A (2021) Efficient and robust LiDAR-based end-to-end navigation. arXiv preprint
arXiv:2109.02004
18. Uecker M, Fleck T (2021) Analyzing deep learning representations of point clouds for real-
time in-vehicle LiDAR perception. In: Proceedings of the IEEE international conference on
intelligent transportation systems (ITSC)
Author Index
A E
Aditya Bhaskar, 71 Esmita Gupta, 321
Aishwarya V. Kadu, 219
Aman Jatain, 447
Anbazhagan, M., 403 F
Ancy Micheal, A., 173 Farhana Zareen, 297
Anukriti Bansal, 297
Anupriya Kamble, 415
Aparajita Sinha, 195, 387 G
Aparna N. Mahajan, 247 Ganesh, K., 403
Ashok Pal, 15 Ganesh Prasad Pal, 61
Ashwin Raiyani, 1 Gaurav Pendharkar, 173
Asiya, 425 Gautam Mehendale, 309
Ayon Tarafdar, 95 Geeta Rani, 195
Ghanashyama Mahanty, 123
Gyanendra Kumar Gaur, 95
B
Bharti Joshi, 71
Butta Singh, 103
H
Heli Nandani, 83
C Hemant H. Patel, 1
Chandan Kumar Deb, 95 Hetvi Shah, 309
Chandralekha, M., 113 Himali Sarangal, 103
Chetan Pattebahadur, 415 Himanshu Goswami, 309
Chinmayee Kale, 309
I
D Indresh Kumar Verma, 265
Deepak Kumar, 183
Devanshi Shah, 83
Dhanya Pramod, 335 J
Din, Der-Rong, 47 Jason Misquitta, 173
Diriba C. Kejela, 235 Jaspreet Kaur, 15
Drishti Bharti, 157 Jatinderkumar R. Saini, 209
Dwijendra Nath Dwivedi, 123 Jayakumar Kaliappan, 31
K R
Kadam, A. B., 415 Rachit Shah, 83
Kartik Verma, 103 Raju Pal, 61
Kashif Saleem, 113 Ramesh Manza, 415
Kavya Muktha, 277 Ranjeesh Kaippada, 173
Kehali A. Jember, 235 Reddy, K T V, 219
Keshav Sairam, 387 Riya Raj, 31
Ketema T. Megersa, 235
Kirti Dinkar More, 335
Krishna Chidrawar, 141 S
Kumari Priyanshi, 157 Sahil Borkar, 141
Sakshi Naik, 141
Samuel T. Daka, 235
Sandeep Kumar, 447
M Sandip R. Panchal, 1
Madhav Ajwalia, 83 Sara Bansod, 265
Manjit Singh, 103 Satveer Kour, 103
Manju, 447 Shail Shah, 83
Mathura Bai Baikadolla, 277 Shilpa Shinde, 321
Mayur M. Jani, 1 Shivam Panwar, 297
Md. Ashraful Haque, 95 Shraddha Vaidya, 209
Mohanvenkat Patta, 277 Shreya Dave, 377
Monika Agarwal, 195, 387 Shreyas Visweshwaran, 403
Moti B. Dinagde, 235 Srirachana Narasu Baditha, 277
Mousami P. Turuk, 141 Sudeep Marwaha, 95
Sudhir Bagul, 309
Sugitha, N., 425
N Suvarna Bhoj, 95
Namita Goyal, 247
Neelum Dave, 377
Nishat Shaikh, 359 T
Tadele A. Abose, 235
Tripathi, K. C., 247
Triveni Dutt, 95
P
Pankaj Kumar Sethi, 437
Parth Shah, 83, 359 V
Prabhjot Kaur, 157 Vaibhav B. Vaijapurkar, 141
Pradeep, K., 387 Vinay, R., 195
Pragya Rajput, 437 Vineet Sharma, 183
Pranita Ranade, 265
Preksha Khatri, 309
Priteshkumar Prajapati, 83 Y
Priyadharshini Jayadurga, N., 113 Yoshita, 447