
Lecture Notes on Data Engineering

and Communications Technologies 101

D. Jude Hemanth
Danilo Pelusi
Chandrasekar Vuppalapati   Editors

Intelligent Data
Communication
Technologies
and Internet
of Things
Proceedings of ICICI 2021
Lecture Notes on Data Engineering
and Communications Technologies

Volume 101

Series Editor
Fatos Xhafa, Technical University of Catalonia, Barcelona, Spain
The aim of the book series is to present cutting-edge engineering approaches to data
technologies and communications. It will publish the latest advances on the engineering
task of building and deploying distributed, scalable and reliable data infrastructures
and communication systems.
The series will have a prominent applied focus on data technologies and
communications with the aim of promoting the bridging from fundamental research on
data science and networking to data engineering and communications that lead to
industry products, business knowledge and standardisation.
Indexed by SCOPUS, INSPEC, EI Compendex.
All books published in the series are submitted for consideration in Web of Science.

More information about this series at https://link.springer.com/bookseries/15362


D. Jude Hemanth · Danilo Pelusi ·
Chandrasekar Vuppalapati
Editors

Intelligent Data
Communication
Technologies and Internet
of Things
Proceedings of ICICI 2021
Editors
D. Jude Hemanth
Department of Electronics and Communication Engineering
Karunya Institute of Technology and Sciences
Coimbatore, India

Danilo Pelusi
Faculty of Communication Sciences
University of Teramo
Teramo, Italy

Chandrasekar Vuppalapati
Department of Computer Engineering
San Jose State University
San Jose, CA, USA

ISSN 2367-4512 ISSN 2367-4520 (electronic)


Lecture Notes on Data Engineering and Communications Technologies
ISBN 978-981-16-7609-3 ISBN 978-981-16-7610-9 (eBook)
https://doi.org/10.1007/978-981-16-7610-9

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
We are honored to dedicate the proceedings
of ICICI 2021 to all the participants and
editors of ICICI 2021.
Foreword

This conference proceedings volume contains the written versions of most of
the contributions presented during ICICI 2021. The conference provided a setting
for discussing recent developments in a wide variety of topics, including data
communication, computer networking, communication technologies, wireless and
ad hoc networks, cryptography, big data, cloud computing, IoT, and healthcare
informatics. The conference has been a good opportunity for participants coming
from various destinations to present and discuss topics in their respective
research areas.
The ICICI 2021 conference aims to collect the latest research results and applications
on intelligent data communication technologies and the Internet of Things. It includes a
selection of 77 papers from the 275 papers submitted to the conference by universities
and industries all over the world. All the accepted papers were subjected to strict
peer review by 2–4 expert referees. The papers have been selected for this volume
because of their quality and relevance to the conference.
ICICI 2021 would like to express sincere appreciation to all authors for their
contributions to this book. We would like to extend our thanks to all the referees for
their constructive comments on all papers; in particular, we would like to thank Guest
Editors Dr. D. Jude Hemanth, Professor, Department of ECE, Karunya Institute
of Technology and Sciences, India; Dr. Danilo Pelusi, Faculty of Communication
Sciences, University of Teramo, Italy; and Dr. Chandrasekar Vuppalapati, Professor,
San Jose State University, California, USA, for their hard work. Finally, we would
like to thank Springer for producing this volume.

Dr. K. Geetha
Conference Chair—ICICI 2021

Preface

It is with deep satisfaction that I write this preface to the proceedings of ICICI
2021, held at JCT College of Engineering and Technology, Coimbatore, Tamil Nadu,
from August 27 to 28, 2021.
This conference brought together researchers, academics, and professionals
from all over the world, experts in data communication technologies and the Internet
of Things.
This conference encouraged research students and developing academics to
interact with the more established academic community in an informal setting to
present and discuss new and current work. The papers contributed the most up-to-date
scientific knowledge in the fields of data communication and computer networking,
communication technologies, and their applications such as IoT, big data, and cloud
computing. Their contributions aided in making the conference as successful as it
has been. The members of the local organizing committee and their assistants have
put in a lot of time and effort to ensure that the meeting ran smoothly on a daily
basis.
We hope that this program will stimulate further research in intelligent data
communication technologies and the Internet of Things, as well as provide prac-
titioners with improved techniques, algorithms, and deployment tools. Through this
exciting program, we feel honored and privileged to bring you the most recent devel-
opments in the field of intelligent data communication technologies and the Internet
of Things.
We thank all authors and participants for their contributions.

Coimbatore, India Dr. D. Jude Hemanth


Teramo, Italy Dr. Danilo Pelusi
San Jose, USA Dr. Chandrasekar Vuppalapati

Acknowledgements

ICICI 2021 would like to acknowledge the excellent work of our conference organizing
committee and the keynote speakers for their presentations on August 27–28, 2021.
The organizers also wish to acknowledge publicly the valuable services provided by
the reviewers.
On behalf of the editors, organizers, authors, and readers of this conference, we
wish to thank the keynote speakers and the reviewers for their time, hard work,
and dedication to this conference. The organizers wish to acknowledge Dr. D. Jude
Hemanth and Dr. K. Geetha for their discussions, suggestions, and cooperation in
organizing the keynote speakers of this conference. The organizers also wish to
acknowledge the speakers and participants who attended this conference. Many thanks
to all who helped and supported this conference. ICICI 2021 would like to acknowl-
edge the contributions made to the organization by its many volunteers. Members
contribute their time, energy, and knowledge at local, regional, and international
levels.
We also thank all the chair persons and conference committee members for their
support.

Contents

An Optimized Convolutional Neural Network Model for Wild
Animals Detection Using Filtering Techniques and Different
Opacity Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Pavan Nageswar Reddy Bodavarapu, T. Ashish Narayan,
and P. V. V. S. Srinivas
A Study on Current Research and Challenges in Attribute-based
Access Control Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
K. Vijayalakshmi and V. Jayalakshmi
Audio Denoising Using Deep Neural Networks . . . . . . . . . . . . . . . . . . . . . . 33
S. Jassem Mohammed and N. Radhika
Concept and Development of Triple Encryption Lock System . . . . . . . . . 49
A. Fayaz Ahamed, R. Prathiksha, M. Keerthana, and D. Mohana Priya
Partially Supervised Image Captioning Model for Urban Road
Views . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
K. Srihari and O. K. Sikha
Ease and Handy Household Water Management System . . . . . . . . . . . . . 75
K. Priyadharsini, S. K. Dhanushmathi, M. Dharaniga,
R. Dharsheeni, and J. R. Dinesh Kumar
Novel Intelligent System for Medical Diagnostic Applications
Using Artificial Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
T. P. Anithaashri, P. Selvi Rajendran, and G. Ravichandran
Extracting Purposes from an Application to Enable Purpose
Based Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Amruta Jain and Sunil Mane
Cotton Price Prediction and Cotton Disease Detection Using
Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Priya Tanwar, Rashi Shah, Jaini Shah, and Unik Lokhande

Acute Leukemia Subtype Prediction Using EODClassifier . . . . . . . . . . . . 129
S. K. Abdullah, S. K. Rohit Hasan, and Ayatullah Faruk Mollah
Intrusion Detection System Intensive on Securing IoT
Networking Environment Based on Machine Learning Strategy . . . . . . 139
D. V. Jeyanthi and B. Indrani
Optimization of Patch Antenna with Koch Fractal DGS Using
PSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
Sanoj Viswasom and S. Santhosh Kumar
Artificial Intelligence-Based Phonocardiogram: Classification
Using Cepstral Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
A. Saritha Haridas, Arun T. Nair, K. S. Haritha,
and Kesavan Namboothiri
Severity Classification of Diabetic Retinopathy Using
Customized CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
Shital N. Firke and Ranjan Bala Jain
Study on Class Imbalance Problem with Modified KNN
for Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
R. Sasirekha, B. Kanisha, and S. Kaliraj
Analysis of (IoT)-Based Healthcare Framework System Using
Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
B. Lalithadevi and S. Krishnaveni
Hand Gesture Recognition for Disabled Person with Speech
Using CNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
E. P. Shadiya Febin and Arun T. Nair
Coronavirus Pandemic: A Review of Different Machine
Learning Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
Bhupinder Singh and Ritu Agarwal
High Spectrum and Efficiency Improved Structured
Compressive Sensing-Based Channel Estimation Scheme
for Massive MIMO Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
V. Baranidharan, C. Raju, S. Naveen Kumar, S. N. Keerthivasan,
and S. Isaac Samson
A Survey on Image Steganography Techniques Using Least
Significant Bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
Y. Bhavani, P. Kamakshi, E. Kavya Sri, and Y. Sindhu Sai
Efficient Multi-platform Honeypot for Capturing Real-time
Cyber Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
S. Sivamohan, S. S. Sridhar, and S. Krishnaveni
A Gender Recognition System from Human Face Images Using
VGG16 with SVM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
S. Mandara and N. Manohar
Deep Learning Approach for RPL Wormhole Attack . . . . . . . . . . . . . . . . 321
T. Thiyagu, S. Krishnaveni, and R. Arthi
Precision Agriculture Farming by Monitoring and Controlling
Irrigation System Using Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
Badri Deva Kumar, M. Sobhana, Jahnavi Duvvuru,
Chalasani Nikhil, and Gopisetti Sridhar
Autonomous Driving Vehicle System Using LiDAR Sensor . . . . . . . . . . . . 345
Saiful Islam, Md Shahnewaz Tanvir, Md. Rawshan Habib,
Tahsina Tashrif Shawmee, Md Apu Ahmed, Tafannum Ferdous,
Md. Rashedul Arefin, and Sanim Alam
Multiple Face Detection Tracking and Recognition from Video
Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
M. Athira, Arun T. Nair, Kesavan Namboothiri, K. S. Haritha,
and Nimitha Gopinath
Review Analysis Using Ensemble Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 373
V. Baby Shalini, M. Iswarya, S. Ramya Sri, and M. S. Anu Keerthika
A Blockchain-Based Expectation Solution for the Internet
of Bogus Media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
Rishi Raj Singh, Manish Thakral, Sunil Kaushik, Ayur Jain,
and Gunjan Chhabra
Countering Blackhole Attacks in Mobile Adhoc Networks
by Establishing Trust Among Participating Nodes . . . . . . . . . . . . . . . . . . . 399
Mukul Shukla and Brijendra Kumar Joshi
Identification of Gene Communities in Liver Hepatocellular
Carcinoma: An OffsetNMF-Based Integrative Technique . . . . . . . . . . . . 411
Sk Md Mosaddek Hossain and Aanzil Akram Halsana
Machine Learning Based Approach for Therapeutic Outcome
Prediction of Autism Children . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
C. S. KanimozhiSelvi, K. S. Kalaivani, M. Namritha,
S. K. Niveetha, and K. Pavithra
An Efficient Implementation of ARIMA Technique for Air
Quality Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441
Rudragoud Patil, Gayatri Bedekar, Parimal Tergundi,
and R. H. Goudar
A Survey on Image Emotion Analysis for Online Reviews . . . . . . . . . . . . 453
G. N. Ambika and Yeresime Suresh
An Efficient QOS Aware Routing Using Improved Sensor
Modality-based Butterfly Optimization with Packet Scheduling
for MANET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
S. Arivarasan, S. Prakash, and S. Surendran
IoT Based Electricity Theft Monitoring System . . . . . . . . . . . . . . . . . . . . . . 477
S. Saadhavi, R. Bindu, S. Ram. Sadhana, N. S. Srilalitha,
K. S. Rekha, and H. D. Phaneendra
An Exploration of Attack Patterns and Protection Approaches
Using Penetration Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
Kousik Barik, Karabi Konar, Archita Banerjee, Saptarshi Das,
and A. Abirami
Intrusion Detection System Using Homomorphic Encryption . . . . . . . . . 505
Aakash Singh, Parth Kitawat, Shubham Kejriwal,
and Swapnali Kurhade
Reversible Data Hiding Using LSB Scheme and DHE for Secured
Data Transfer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
D. N. V. S. L. S. Indira, Y. K. Viswanadham,
J. N. V. R. Swarup Kumar, Ch. Suresh Babu,
and Ch. Venkateswara Rao
Prediction of Solar Power Using Machine Learning Algorithm . . . . . . . . 529
M. Rupesh, J. Swathi Chandana, A. Aishwarya, C. Anusha,
and B. Meghana
Prediction of Carcinoma Cancer Type Using Deep
Reinforcement Learning Technique from Gene Expression Data . . . . . . 541
A. Prathik, M. Vinodhini, N. Karthik, and V. Ebenezer
Multi-variant Classification of Depression Severity Using Social
Media Networks Based on Time Stamp . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553
M. Yohapriyaa and M. Uma
Identification of Workflow Patterns in the Education System:
A Multi-faceted Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565
Ganeshayya Shidaganti, M. Laxmi, S. Prakash, and G. Shivamurthy
Detection of COVID-19 Using Segmented Chest X-ray . . . . . . . . . . . . . . . 585
P. A. Shamna and Arun T. Nair
A Dynamic Threshold-Based Technique for Cooperative
Blackhole Attack Detection in VANET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
P. Remya Krishnan and P. Arun Raj Kumar
Detecting Fake News Using Machine Learning . . . . . . . . . . . . . . . . . . . . . . 613
Ritik H. Patel, Rutvik Patel, Sandip Patel, and Nehal Patel
Predicting NCOVID-19 Probability Factor with Severity Index . . . . . . . 627
Ankush Pandit, Soumalya Bose, and Anindya Sen
Differentially Evolved RBFNN for FNAB-Based Detection
of Breast Cancer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 643
Sunil Prasad Gadige, K. Manjunathachari, and Manoj Kumar Singh
A Real-Time Face Mask Detection-Based Attendance System
Using MobileNetV2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659
Kishansinh Rathod, Zeel Punjabi, Vivek Patel,
and Mohammed Husain Bohara
A New Coded Diversity Combining Scheme for High Microwave
Throughput . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 671
Yousra Lamrani, Imane Benchaib, Kamal Ghoumid,
and El Miloud Ar-Reyouchi
Extractive Text Summarization of Kannada Text Documents
Using Page Ranking Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683
C. P. Chandrika and Jagadish S. Kallimani
Destructive Outcomes of Digitalization (Credit Card), a Machine
Learning Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 697
Yashashree Patel, Panth Shah, Mohammed Husain Bohara,
and Amit Nayak
Impact of Blockchain Technology in the Healthcare Systems . . . . . . . . . . 709
Garima Anand, Ashwin Prajeeth, Binav Gautam, Rahul, and Monika
A Comparison of Machine Learning Techniques
for Categorization of Blood Donors Having Chronic
Hepatitis C Infection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 731
Sukhada Bhingarkar
Monitoring the Soil Parameters Using IoT for Smart Agriculture . . . . . 743
K. Gayathri and S. Thangavelu
NRP-APP: Robust Seamless Data Capturing and Visualization
System for Routine Immunization Sessions . . . . . . . . . . . . . . . . . . . . . . . . . 759
Kanchana Rajaram, Pankaj Kumar Sharma, and S. Selvakumar
Methodologies to Ensure Security and Privacy of an Enterprise
Healthcare Data Warehouse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 777
Joseph George and M. K. Jeyakumar
Comparative Analysis of Open-Source Vulnerability Scanners
for IoT Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 785
Christopher deRito and Sajal Bhatia
Emotion and Collaborative-Based Music Recommendation
System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 801
R. Aparna, C. L. Chandana, H. N. Jayashree, Suchetha G. Hegde,
and N. Vijetha
Cricket Commentary Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 825
A. Siva Balaji, N. Gunadeep Vignan, D. S. V. N. S. S. Anudeep,
Md. Tayyab, and K. S. Vijaya Lakshmi
Performance Comparison of Weather Monitoring System
by Using IoT Techniques and Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 837
Naveen S. Talegaon, Girish R. Deshpande, B. Naveen,
Manjunath Channavar, and T. C. Santhosh
A Study on Surface Electromyography in Sports Applications
Using IoT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 855
N. Nithya, G. Nallavan, and V. Sriabirami
Detection of IoT Botnet Using Recurrent Neural Network . . . . . . . . . . . . 869
P. Tulasi Ratnakar, N. Uday Vishal, P. Sai Siddharth, and S. Saravanan
Biomass Energy for Rural India: A Sustainable Source . . . . . . . . . . . . . . 885
Namra Joshi
Constructive Approach for Text Summarization Using
Advanced Techniques of Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 895
Shruti J. Sapra, Shruti A. Thakur, and Avinash S. Kapse
Lane Vehicle Detection and Tracking Algorithm Based
on Sliding Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 905
R. Rajakumar, M. Charan, R. Pandian, T. Prem Jacob, A. Pravin,
and P. Indumathi
A Survey on Automated Text Summarization System for Indian
Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 921
P. Kadam Vaishali, B. Khandale Kalpana, and C. Namrata Mahender
A Dynamic Packet Scheduling Algorithm Based on Active Flows
for Enhancing the Performance of Internet Traffic . . . . . . . . . . . . . . . . . . . 943
Y. Suresh, J. Senthilkumar, and V. Mohanraj
Automated Evaluation of Short Answers: a Systematic Review . . . . . . . 953
Shweta Patil and Krishnakant P. Adhiya
Interactive Agricultural Chatbot Based on Deep Learning . . . . . . . . . . . . 965
S. Suman and Jalesh Kumar
Analytical Study of YOLO and Its Various Versions in Crowd
Counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 975
Ruchika, Ravindra Kumar Purwar, and Shailesh Verma
IoT Enabled Elderly Monitoring System and the Role of Privacy
Preservation Frameworks in e-health Applications . . . . . . . . . . . . . . . . . . 991
Vidyadhar Jinnappa Aski, Vijaypal Singh Dhaka, Sunil Kumar,
and Anubha Parashar
Hybrid Beamforming for Massive MIMO Antennas Under 6
GHz Mid-Band . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1007
Kavita Bhagat and Ashish Suri
Multi-Class Detection of Skin Disease: Detection Using HOG
and CNN Hybrid Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1025
K. Babna, Arun T. Nair, and K. S. Haritha
DeepFake Creation and Detection Using LSTM, ResNext . . . . . . . . . . . . 1039
Dhruti Patel, Juhie Motiani, Anjali Patel,
and Mohammed Husain Bohara
Classification of Plant Seedling Using Deep Learning Techniques . . . . . 1053
K. S. Kalaivani, C. S. Kanimozhiselvi, N. Priyadharshini,
S. Nivedhashri, and R. Nandhini
A Robust Authentication and Authorization System Powered
by Deep Learning and Incorporating Hand Signals . . . . . . . . . . . . . . . . . . 1061
Suresh Palarimath, N. R. Wilfred Blessing, T. Sujatha,
M. Pyingkodi, Bernard H. Ugalde, and Roopa Devi Palarimath

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1073


About the Editors

Dr. D. Jude Hemanth received his B.E. degree in ECE from Bharathiar University
in 2002, M.E. degree in communication systems from Anna University in 2006 and
Ph.D. from Karunya University in 2013. His research areas include Computational
Intelligence and Image processing. He has authored more than 120 research papers
in reputed SCIE indexed International Journals and Scopus indexed International
Conferences. His Cumulative Impact Factor is more than 150. He has published 33
edited books with reputed publishers such as Elsevier, Springer and IET.

Dr. Danilo Pelusi received the Ph.D. degree in Computational Astrophysics from the
University of Teramo, Italy. Associate Professor at the Faculty of Communication
Sciences, University of Teramo, he is an Associate Editor of IEEE Transactions on
Emerging Topics in Computational Intelligence, IEEE Access, International Journal
of Machine Learning and Cybernetics (Springer) and Array (Elsevier). Guest editor
for Elsevier, Springer and Inderscience journals, he served as program member of
many conferences and as editorial board member of many journals. His research inter-
ests include Fuzzy Logic, Neural Networks, Information Theory and Evolutionary
Algorithms.

Dr. Chandrasekar Vuppalapati is a Software IT Executive with diverse experience
in Software Technologies, Enterprise Software Architectures, Cloud Computing,
Big Data Business Analytics, Internet of Things (IoT), and Software Product and
Program Management. Chandra held engineering and Product leadership roles at
GE Healthcare, Cisco Systems, Samsung, Deloitte, St. Jude Medical, and Lucent
Technologies, Bell Laboratories Company. Chandra teaches Software Engineering,
Mobile Computing, Cloud Technologies, and Web and Data Mining for the Master’s
program at San Jose State University. Additionally, Chandra has held market research,
strategy and technology architecture advisory roles in Cisco Systems, Lam Research
and performed Principal Investigator role for Valley School of Nursing where
he connected Nursing Educators and Students with Virtual Reality technologies.
Chandra has functioned as Chair in numerous technology and advanced computing
conferences such as: IEEE Oxford, UK, IEEE Big Data Services 2017, San Francisco
USA and Future of Information and Communication Conference 2018, Singapore.
Chandra graduated from San Jose State University Master’s Program, specializing in
Software Engineering, and completed his Master of Business Administration from
Santa Clara University, Santa Clara, California, USA.
An Optimized Convolutional Neural
Network Model for Wild Animals
Detection Using Filtering Techniques
and Different Opacity Levels

Pavan Nageswar Reddy Bodavarapu, T. Ashish Narayan, and P. V. V. S. Srinivas

Abstract Although there are numerous techniques for object identification, these
techniques under-perform in real-world conditions such as heavy rain and fog at
night. As a result, this research work has devised a new convolutional neural
network for identifying animals in low-light environments. In the proposed
system, images of different animals (containing both domestic and wild animals)
are collected from various resources in the form of images and videos. The overall
number of samples in the dataset is 2300; however, because convolutional neural
networks require more samples for training, a few data augmentation techniques
are employed to raise the number of samples in the dataset to 6700. The data
augmentation techniques are horizontal flip, rotation, and padding. The proposed
model has achieved an accuracy of 0.72 on the testing set and 0.88 on the training
set without applying edge detection techniques. After applying the Canny edge
detection technique on the animal dataset, the proposed model achieved 0.81
accuracy, outperforming the state-of-the-art models ResNet-50 and EfficientNet-B7.

Keywords Object detection · Edge detection · Convolutional neural network ·
Deep learning · Animal detection

1 Introduction

There is a lot of research happening in the field of object detection. Deep learning
and computer vision have now produced incredible results for detecting various
classes of objects in a given image. Recent advances in this domain assisted us
in creating bounding boxes around the objects. Future developments in this sector
may benefit visually impaired people [1, 2]. The most frequent strategy used in
computer vision techniques for object detection is to transform all color images into
grayscale format and subsequently into a binary image. Later, region-based convolutional
neural networks were constructed that outperform the standard computer vision algorithms
in terms of accuracy [3, 4].

P. N. R. Bodavarapu (B) · T. A. Narayan · P. V. V. S. Srinivas
Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation,
Guntur, India

Object detection in remote sensing images is quite challenging. The major difference
between a natural image and a remote sensing image is the size of the object, which is
small in remote sensing images when compared to the background of a natural image.
This makes it hard to detect objects in remote sensing images. A further challenge is
that remote sensing images offer only a top-down view [5].
The feature pyramids can generate segment proposals for object detection. The
applications that can be combined with feature pyramids for object detection are: (1)
region proposal network and (2) fast R-CNN [6, 7]. It is difficult to detect items
in images that do not include labeled detection data. However, YOLO9000 assists
in detecting item types that do not contain label data. YOLO9000 can function in a
real-time environment while performing these difficult tasks [8]. Unmanned aircraft
systems play an important role in wild animal survey for managing a better ecosystem.
Going through all these aerial photographs for manual detection of wild animals is
a difficult task. Employing deep learning methods helps to detect wild animals more
accurately with reduced time consumption. The steps involved for detecting wild
animals in aerial images are: (1) image collection, (2) image splitting, (3) image
labeling, (4) dividing data into train and test sets, (5) training and validation, and (6)
testing [9]. The advantages of using the ReLU activation function are that the computations
are cheaper and convergence is faster. The vanishing gradient problem can also
be addressed with the ReLU activation function. The error percentage of a deep
convolutional neural network (DCNN) with the ReLU activation function is 0.8% on
the MNIST dataset, which is better than the sigmoid and tanh activation functions,
whose error percentages are 1.15% and 1.12%, respectively [10, 11].
Vehicle collisions with animals cause traffic accidents, resulting in injuries and
deaths for both humans and animals. When the vehicle’s speed exceeds 35 kmph,
the driver has a more difficult time avoiding a collision with an animal since the
distance between the car and the animal is shorter. Human factors contribute to nearly
92% of road accidents. Vehicle collisions with animals can be categorized into
direct and indirect collisions [12]. The performance of a convolutional neural network
can be significantly affected (increased or decreased) by various techniques, namely (1) weighted residual
connections, (2) cross stage partial connections, (3) cross mini-batch normalization,
(4) self-adversarial training, (5) mish activation, (6) mosaic data augmentation, and
(7) DropBlock regularization. The training of the object detection can be improved
by (1) activation functions, (2) data augmentation, (3) regularization methods, and (4)
skip connections [13, 14]. The steps involved in classifying the wild animals in video
footage are: (1) input video, (2) background subtraction, (3) difference clearing, (4)
calculate energy, (5) average variation, and (6) classification [15, 16].
The important contributions made in this research paper can be outlined as: (1)
devised a novel convolutional neural network, (2) applied various edge detection tech-
niques, (3) experimented on different opacity levels, and (4) compared all the results
and provided valid conclusion. This research is based on animal detection in an image
that has been taken in low-light conditions, and we collected different animals (both
domestic and wild animals) from many resources in the form of images and videos. These
videos include several animals; we split the recordings into frames using a Python
script, and relevant images were chosen and grouped into their respective directories.
The size of the dataset is 2300 samples; since convolutional neural networks
need a larger number of samples for training, a few data augmentation techniques
are used to increase the size of the dataset to 6700 samples. The data augmentation
techniques used here are horizontal flip, rotation, and padding. The proposed model
contains four convolutional layers, two batch normalization layers, two maxpooling
layers, and two dropout layers. The activation function used at convolutional layers is
rectified linear unit (ReLU), and the activation function that has been used in output
layer is softmax. The learning rate and weight decay used in this work are 0.0001 and
1e−4, respectively. The proposed model is then trained for 100 epochs with batch
size 32.
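The architecture and training configuration described above can be summarized in a short Keras sketch. This is only an illustrative reconstruction under stated assumptions, not the authors' released code: the filter counts, kernel sizes, dropout rates, and loss function are assumed, while the layer counts, ReLU and softmax activations, learning rate, epoch count, and batch size follow the text (the stated 1e−4 weight decay would additionally require an AdamW-style optimizer or kernel regularizers, which are omitted here).

import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(num_classes, input_shape=(48, 48, 1)):
    # Four convolutional layers (ReLU), two batch normalization layers,
    # two max-pooling layers, and two dropout layers, as stated in the text.
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                      input_shape=input_shape),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),   # softmax output layer
    ])
    # Learning rate 0.0001 as stated; the 1e-4 weight decay is not modelled here.
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="sparse_categorical_crossentropy",   # assumes integer class labels
                  metrics=["accuracy"])
    return model

# Training with the stated settings (100 epochs, batch size 32):
# model = build_model(num_classes=2)
# model.fit(x_train, y_train, epochs=100, batch_size=32,
#           validation_data=(x_test, y_test))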

2 Related Work

Fu et al. [17] proposed a framework “deepside” to integrate the convolutional neural
network features and also proposed a new fusion method to integrate different outputs
from the network; this helps to obtain high-precision borderline details of objects.
The deepside framework contains a VGG-16 backbone and deep side structures.
The learning rate used here is 1e−9 with a batch size of 1 in this research. The
proposed framework is evaluated on various datasets. The inference time for linear
deepside framework is 0.08 s. Similarly, the inference time for nonlinear deepside
framework is 0.07 s. Hou et al. [18] have proposed a novel saliency method for
salient object detection. This framework is designed by including short connections
within the holistically nested edge detector. This approach is tested and evaluated
on five various salient object detection benchmarks. The learning rate and weight
decay used in this approach are 1e−8 and 0.0005, respectively. The total time taken
to train this model is 8 h. The processing time for each image is 0.08 s. The data
augmentation approach has helped to increase the performance of novel saliency
method by 0.5%. Jia et al. [19] have proposed a salient object detection method,
which can pass back global information much efficiently. The proposed method has
obtained state-of-the-art results. VOC dataset and ImageNet are mixed to form a new
dataset for salient object detection. The proposed method has obtained an F-measure
of 0.87 on PASCAL-S dataset.
Ren et al. [20] have proposed a network, which is a fully convolutional network
to estimate object bounds and scores at every part. The frame rate of the proposed
method on GPU is 5 frames per second (fps). Liu et al. [21] have proposed a model
for region detection in order to increase the region detection performance. The author
has employed center saliency, background saliency, and foreground cues, which are
combined in this method to make it more efficient. The runtime of the proposed model
on the MSRA-1000 dataset is 0.2 s/image. This model with 20 color superpixels can
detect the important objects in an image even though they touch the image boundary.
Yu et al. [22] have proposed an algorithm for detecting the moving objects. The
classifier used in this algorithm is Haar cascade classifier. The frame rate of the
proposed method before adding the recognition algorithm is 43 fps, and the frame rate
after adding the recognition algorithm is 36 fps. Othman et al. [23] have proposed a
system for object detection in real time, which can run at high frames per second (fps).
The author has used the MobileNet architecture combined with a single shot detector,
which is trained on the Caffe framework. To implement this model, a Raspberry Pi 3 is
used together with a Movidius Neural Compute Stick for obtaining high frames per second.
Data augmentation is used, since convolutional neural networks need a large
number of samples for training. This method on the Raspberry Pi 3 CPU has obtained
0.5 frames per second. Gasparovsky et al. [24] have discussed the importance
of outdoor lighting and the factors affecting it. The outdoor lighting depends on
various conditions, namely (1) season, (2) time of day, and (3) no. of buildings and
population of the area.
Guo et al. [25] have proposed a neural network for two sub-tasks: (1) region
proposals and (2) object classification. RefineNet is included after the region proposal
network in the region proposal branch to obtain the best region suggestions. The
proposed technique is tested on the PASCAL VOC dataset. After analyzing the results,
the author has explained that the fully connected layer with softmax layer must
be fine-tuned. The proposed model on PASCAL VOC dataset has achieved 71.6%
mAP. The state-of-the-art model R-CNN has obtained 66.0% mAP on PASCAL VOC
dataset. The results clearly indicate that the proposed method performs significantly
better than the R-CNN. Guo et al. [26] have proposed a convolutional neural network
for object detection, which does not use region proposals for object detection. For
detecting the objects, DarkNet is transferred to a fully convolutional network, and
later, it is fine-tuned. The region proposal system is not effective in real time, since
they take more run time. The proposed model has obtained 73.2% mAP, while fast
R-CNN and faster R-CNN obtained 68.4% and 70.4% mAP, respectively.

3 Proposed Work

3.1 Dataset Description

This study is focused on detecting animals in images shot in low-light settings. We
collected numerous animals (both domestic and wild) from various resources
in the form of images and videos. These videos include several animals; we split the
recordings into frames using a Python script, and relevant images were chosen and
grouped into their corresponding folders. The size of the dataset is 2300 samples;
since convolutional neural networks need a larger number of samples for training,
we used a few data augmentation techniques to increase the size of the dataset
to 6700 samples. The data augmentation techniques used are horizontal flip, rotation,
and padding. After data augmentation, the dataset is divided in the ratio of 80:20,
where 80% (5360 images) are used for training and 20% (1340 images) are used for
evaluating (testing).
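A minimal sketch of the dataset-preparation steps described above (splitting videos into frames and augmenting with horizontal flip, rotation, and padding), assuming OpenCV; the file paths, frame-sampling rate, rotation angle, and padding size are illustrative assumptions rather than values reported in the paper.

import os
import cv2

def video_to_frames(video_path, out_dir, every_n=10):
    # Save every n-th frame of a video as a JPG image (videos -> frames).
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()

def augment(image):
    # Produce the three augmented variants mentioned above:
    # horizontal flip, rotation, and padding.
    h, w = image.shape[:2]
    flipped = cv2.flip(image, 1)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), 15, 1.0)   # assumed 15-degree rotation
    rotated = cv2.warpAffine(image, m, (w, h))
    padded = cv2.copyMakeBorder(image, 10, 10, 10, 10,
                                cv2.BORDER_CONSTANT, value=0)  # assumed 10-pixel padding
    return [flipped, rotated, padded]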

3.2 Edge Detection Techniques

The technique of finding boundaries of an image is called edge detection. This tech-
nique can be employed in various real-world applications like autonomous cars and
unmanned aerial vehicles. The edge detection techniques help us to decrease the
computation and processing time of data while training the deep learning model.
There are several edge detection approaches; in this study, we utilize Canny edge,
Laplacian edge, Sobel edge, and Prewitt edge detection.

3.2.1 Canny Edge Detection

Canny edge detection is used for finding many edges in images. The edges detected
in an image will generally correspond to local maxima of the gradient magnitude. This
technique decreases the probability of not finding an edge in the image. The steps
involved in this technique are, namely (1) smoothing, (2) find gradients, (3) non-max
suppression, (4) thresholding, (5) edge tracking, and (6) output.
Canny edge detection equation:

Edge gradient: $G = \sqrt{G_x^2 + G_y^2}$   (1)

Angle: $\theta = \tan^{-1}\left(\dfrac{G_y}{G_x}\right)$
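A minimal OpenCV sketch of this procedure; the input file name and the two hysteresis thresholds are assumptions, since the paper does not report them.

import cv2

gray = cv2.imread("animal.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input image
blurred = cv2.GaussianBlur(gray, (5, 5), 0)              # step 1: smoothing
edges = cv2.Canny(blurred, 100, 200)                     # gradients, non-max suppression,
                                                          # thresholding, and edge tracking
cv2.imwrite("animal_canny.jpg", edges)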

3.2.2 Laplacian Edge Detection

The Laplacian is a second-derivative mask, which is susceptible to noise. If an image
contains noise, Laplacian edge detection is not preferred, which is a major drawback
of this technique. In order to use this technique, we need to follow certain steps if
the image is containing noise. The first step is to cancel the noise by using denoising
filters and then applying Laplacian filter to the corresponding image.
Laplacian edge detection equation:

$\nabla^2 f = \dfrac{\partial^2 f}{\partial x^2} + \dfrac{\partial^2 f}{\partial y^2}$   (2)
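A sketch of the denoise-then-Laplacian procedure just described, assuming OpenCV; the input file name, Gaussian kernel size, and output depth are assumptions.

import cv2

gray = cv2.imread("animal.jpg", cv2.IMREAD_GRAYSCALE)     # hypothetical input image
denoised = cv2.GaussianBlur(gray, (3, 3), 0)               # remove noise before the 2nd-derivative filter
laplacian = cv2.Laplacian(denoised, cv2.CV_64F)             # apply the Laplacian operator
edges = cv2.convertScaleAbs(laplacian)                      # convert back to 8-bit for saving
cv2.imwrite("animal_laplacian.jpg", edges)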

3.2.3 Sobel Edge Detection

Sobel edge detection finds edges where the gradient of the image is very high. Unlike
Canny edge detection, Sobel edge detection does not generate smooth edges, and
the number of edges produced by Sobel edge detection is less than that of Canny edge
detection.
Sobel edge detection equation:

$M = \sqrt{S_x^2 + S_y^2}$   (3)

where $S_x = (a_2 + c\,a_3 + a_4) - (a_0 + c\,a_7 + a_6)$ and $S_y = (a_0 + c\,a_1 + a_2) - (a_6 + c\,a_5 + a_4)$,
with constant $c = 2$.
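A sketch of the Sobel gradient magnitude in Eq. (3) using OpenCV; the input file name and kernel size are assumptions.

import cv2
import numpy as np

gray = cv2.imread("animal.jpg", cv2.IMREAD_GRAYSCALE)      # hypothetical input image
sx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)             # horizontal gradient S_x
sy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)             # vertical gradient S_y
magnitude = np.sqrt(sx ** 2 + sy ** 2)                      # M = sqrt(S_x^2 + S_y^2)
cv2.imwrite("animal_sobel.jpg", cv2.convertScaleAbs(magnitude))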

3.2.4 Prewitt Edge Detection

Prewitt edge detection is used for finding vertical and horizontal edges in images.
The Prewitt edge detection technique is fast when compared to Canny edge and Sobel
edge techniques. For determining the magnitude and edge detection, it is considered
as one of the best techniques.
Prewitt edge detection equation:

$M = \sqrt{S_x^2 + S_y^2}$   (4)

where $S_x = (a_2 + c\,a_3 + a_4) - (a_0 + c\,a_7 + a_6)$ and $S_y = (a_0 + c\,a_1 + a_2) - (a_6 + c\,a_5 + a_4)$,
with constant $c = 1$.
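OpenCV has no built-in Prewitt operator, so a common workaround (assumed here, not stated in the paper) is to apply the Prewitt kernels with c = 1 via cv2.filter2D:

import cv2
import numpy as np

gray = cv2.imread("animal.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float64)
kx = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=np.float64)   # Prewitt kernel, c = 1
ky = np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]], dtype=np.float64)
px = cv2.filter2D(gray, -1, kx)                             # S_x response
py = cv2.filter2D(gray, -1, ky)                             # S_y response
edges = cv2.convertScaleAbs(np.sqrt(px ** 2 + py ** 2))     # M = sqrt(S_x^2 + S_y^2)
cv2.imwrite("animal_prewitt.jpg", edges)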

3.3 Algorithm

Step 1: Input the animal dataset containing different animal images.


Step 2: At first, convert all the images into JPG format and then the RGB color images
are converted into grayscale format.
Step 3: Secondly, resize all the corresponding grayscale images to 48×48 pixels.
Step 4: Now, all the corresponding images are selected, and further, Canny edge
detection technique is applied.

Step 5: Then, the same process is repeated with Laplacian, Sobel, and Prewitt edge
detection techniques.
Step 6: Later, all the datasets are divided in the ratio of 80:20 for training and
evaluation.
Step 7: The proposed model and different state-of-the-art models are trained on the
train set and tested on the test set, respectively.
Step 8: Lastly, the performance metrics of different models are displayed based on
the train and test sets.
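A hedged sketch of Steps 2–6 of the above algorithm; the directory layout, class names, Canny thresholds, and random seed are illustrative assumptions, not values from the paper.

import glob
import cv2
import numpy as np
from sklearn.model_selection import train_test_split

def load_dataset(root_dir, classes, use_canny=True):
    images, labels = [], []
    for label, cls in enumerate(classes):
        for path in glob.glob(f"{root_dir}/{cls}/*.jpg"):
            gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)   # Step 2: grayscale
            gray = cv2.resize(gray, (48, 48))                # Step 3: resize to 48x48
            if use_canny:
                gray = cv2.Canny(gray, 100, 200)             # Step 4: edge detection
            images.append(gray / 255.0)
            labels.append(label)
    x = np.array(images)[..., np.newaxis]
    y = np.array(labels)
    # Step 6: 80:20 split for training and evaluation
    return train_test_split(x, y, test_size=0.2, random_state=42)

# x_train, x_test, y_train, y_test = load_dataset("animal_dataset", ["domestic", "wild"])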

4 Experimental Results

4.1 Performance of Various Models on Wild Animal Dataset

See Table 1 and Figs. 1, 2, and 3.


Table 1 shows the accuracy and loss comparison of different deep learning models
with proposed model on animal dataset. The proposed model has achieved an accu-
racy of 0.72 on the testing set and 0.88 on training set. The state-of-the-art models
ResNet-50 and EfficientNet-B7 are also trained and tested on the same dataset. The
ResNet-50 model has achieved 0.56 accuracy, and EfficientNet-B7 achieved 0.64
accuracy on the test sets, respectively. The train loss of the proposed model is 0.26,
and the test loss is 0.64, whereas the train and test losses of ResNet-50 are 1.48
and 2.69, respectively. The train and test losses of EfficientNet-B7 are 1.31 and
1.92, respectively. Here, we can clearly say that the proposed model is performing
better than ResNet-50 and EfficientNet-B7 in terms of accuracy and loss of train
set and test set. The train accuracy of proposed model is 0.88, which indicates that
the proposed model is extracting important features for detecting animals in images,
when compared to ResNet-50 and EfficientNet-B7, whose train accuracy is 0.70
and 0.76, respectively. EfficientNet-B7 performs better than ResNet-50; its train and
test accuracies are significantly higher than those of ResNet-50 but lower than those of
the proposed model. After analyzing all the results, the proposed model outperforms
the state-of-the-art models ResNet-50 and EfficientNet-B7. The proposed model is able
to achieve higher accuracy than the other two models because it can detect animals in
low-light conditions, such as images taken during night

Table 1 Outline of accuracy and loss of different models


S. no. Model name Train accuracy Test accuracy Train loss Test loss
1 ResNet-50 0.70 0.56 1.48 2.69
2 EfficientNet-B7 0.76 0.64 1.31 1.92
3 Proposed model 0.88 0.72 0.26 0.64

Fig. 1 Accuracy and loss of ResNet-50

or during heavy fog. Below are the sample images that the proposed model is able
to detect the animals in night and fog conditions (Fig. 4).

4.2 Performance of Proposed Model After Applying Different
Edge Detection Techniques

See Table 2 and Figs. 5, 6, 7, and 8.


The edge detection techniques help us to decrease the computation and processing
time of data during training the deep learning model. There are different edge
detecting techniques, and we use Canny edge detection, Laplacian edge detection,
Sobel edge detection, and Prewitt edge detection in this work. The proposed model
achieved 0.81 accuracy after using Canny edge detection technique on animal dataset.
Similarly, the proposed model achieved 0.68, 0.68, and 0.65 accuracies when Lapla-
cian, Sobel, and Prewitt edge detection techniques are applied, respectively. The
results show that Canny edge technique is performing better than the remaining
techniques on animal dataset. The train accuracy of proposed model after applying
Canny edge detection is 0.92. The proposed model achieved 0.72 accuracy on the animal
dataset without any edge detection techniques, but after using the Canny edge technique,
the proposed model achieved an accuracy of 0.81 on the animal dataset.

Fig. 2 Accuracy and loss of EfficientNet-B7

There is a signif-
icant increase in train accuracy and test accuracy of proposed model, when Canny
edge detection is applied. The model seems to perform better when Canny edge detec-
tion is applied on animal dataset. The proposed model achieved less accuracy when
Prewitt edge detection is used. When all the above four edge detection techniques
are compared, Canny edge detection is the best, since it achieves higher accuracy
than the other edge detection techniques. The results clearly suggest that the proposed
model’s accuracy on both train and test sets significantly improved when Canny edge
detection is used.

Fig. 3 Accuracy and loss of proposed model

Fig. 4 Animals detected by proposed model during night and fog conditions

Table 2 Outline of accuracy and loss of proposed model after applying edge detection techniques
S. no. Technique Train accuracy Test accuracy Train loss Test loss
1 Canny 0.92 0.81 0.26 1.05
2 Laplacian 0.88 0.68 0.35 1.16
3 Sobel 0.90 0.68 0.31 1.60
4 Prewitt 0.81 0.65 1.13 2.01

Fig. 5 Accuracy and loss using Canny edge

Fig. 6 Accuracy and loss using Laplacian edge

Fig. 7 Accuracy and loss using Sobel edge

Fig. 8 Accuracy and loss using Prewitt edge

Table 3 Accuracy of proposed model for various opacity levels on animal dataset
S. no. Opacity level No. of actual animals No. of detected animals Accuracy
1 1.0 7 7 100
2 0.9 7 6 85.7
3 0.7 7 5 71.4
4 0.5 7 3 42.8
5 0.3 7 0 0
6 0.1 7 0 0

4.3 Performance of Proposed Model on Different Opacity Levels

Table 3 illustrates the accuracy of proposed model at different opacity levels. When
the opacity level is 1, the proposed model has detected every object in the image and
obtained 100% accuracy. We next reduced the opacity level to 0.9, and the suggested
model obtained 85.7% accuracy, detecting 6 items out of 7 correctly. When the
opacity level is adjusted to 0.5, the proposed model’s accuracy is 42.8%, which means
it detected only three items out of seven. When the opacity levels are 0.3 and 0.1, the
accuracy of the model is 0, that is, it did not detect any of the 7 objects in the image. Here,
we can see that the accuracy of the model decreases as the opacity level decreases.
This shows that light is a very important factor in an image for object detection. The
future work of this research is to develop a system that can work better at opacity
levels less than 0.5. The drawback of the traditional models and the proposed model
is that they do not perform well below the 0.5 opacity level. Below is a sample of the
objects detected in an image by the proposed model when the opacity level is 1.0.
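One way to reproduce such opacity levels is to blend an image toward black; whether the authors generated their test images this way is not stated, so the following OpenCV sketch is only an assumption about the procedure, with a hypothetical input file name.

import cv2
import numpy as np

def set_opacity(image, level):
    # Blend the image with a black background; level = 1.0 leaves it unchanged.
    black = np.zeros_like(image)
    return cv2.addWeighted(image, level, black, 1.0 - level, 0)

img = cv2.imread("animals_scene.jpg")                       # hypothetical test image with 7 animals
for level in [1.0, 0.9, 0.7, 0.5, 0.3, 0.1]:
    cv2.imwrite(f"scene_opacity_{level}.jpg", set_opacity(img, level))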

5 Conclusion

In the proposed system, images of various kinds of animals (containing both domestic
and wild animals) are collected from many resources in the form of images and videos.
All the videos are then divided into frames using Python script, and appropriate
images are selected. The size of the dataset is 2300 samples; since convolutional
neural networks require a larger number of samples for training, we used a few data
augmentation techniques to increase the size of the dataset to 6700 samples.
The data augmentation techniques used are horizontal flip, rotation, and padding.
After the data augmentation, the dataset is divided in the ratio of 80:20, where 80%
(5360 images) are used for training and 20% (1340 images) are used for evaluating
(testing). The proposed model achieves an accuracy of 0.72 on the testing set and
0.88 on the training set without applying edge detection techniques. The
proposed model achieved 0.81 accuracy after using Canny edge technique on animal
dataset. Similarly, the proposed model achieved 0.68, 0.68, and 0.65 accuracies when
Laplacian, Sobel, and Prewitt edge detection techniques are applied, respectively. The
results clearly suggest that the proposed model’s accuracy on both train and test sets
significantly improved when Canny edge detection is used, and it is outperforming
the state-of-the-art models ResNet-50 and EfficientNet-B7.

References

1. Nasreen J, Arif W, Shaikh AA, Muhammad Y, Abdullah M (2019) Object detection and narrator
for visually impaired people. In: 2019 IEEE 6th international conference on engineering
technologies and applied sciences (ICETAS). IEEE, pp 1–4
2. Mandhala VN, Bhattacharyya D, Vamsi B, Thirupathi Rao N (2020) Object detection using
machine learning for visually ımpaired people. Int J Curr Res Rev 12(20):157–167
3. Zou X (2019) A review of object detection techniques. In: 2019 International conference on
smart grid and electrical automation (ICSGEA). IEEE, pp 251–254
4. Gullapelly A, Banik BG (2020) Exploring the techniques for object detection, classification,
and tracking in video surveillance for crowd analysis
5. Chen Z, Zhang T, Ouyang C (2018) End-to-end airplane detection using transfer learning in
remote sensing images. Remote Sens 10(1):139
6. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks
for object detection. In: Proceedings of the IEEE conference on computer vision and pattern
recognition, pp 2117–2125
7. Kumar KR, Prakash VB, Shyam V, Kumar MA (2016) Texture and shape based object detection
strategies. Indian J Sci Technol 9(30):1–4
8. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE
conference on computer vision and pattern recognition, pp 7263–7271
9. Peng J, Wang D, Liao X, Shao Q, Sun Z, Yue H, Ye H (2020) Wild animal survey using UAS
imagery and deep learning: modified Faster R-CNN for kiang detection in Tibetan Plateau.
ISPRS J Photogramm Remote Sens 169:364–376
10. Ding B, Qian H, Zhou J (2018) Activation functions and their characteristics in deep neural
networks. In: 2018 Chinese control and decision conference (CCDC). https://doi.org/10.1109/
ccdc.2018.8407425
11. NarasingaRao MR, Venkatesh Prasad V, Sai Teja P, Zindavali Md, Phanindra Reddy O (2018)
A survey on prevention of overfitting in convolution neural networks using machine learning
techniques. Int J Eng Technol 7(2.32):177–180
12. Sharma SU, Shah DJ (2016) A practical animal detection and collision avoidance system using
computer vision technique. IEEE Access 5:347–358
13. Bochkovskiy A, Wang CY, Liao HY (2020) Yolov4: Optimal speed and accuracy of object
detection. arXiv preprint arXiv:2004.10934
14. Krishnaveni G, Bhavani BL, Lakshmi NV (2019) An enhanced approach for object detection
using wavelet based neural network. J Phys Conf Ser 1228(1):012032. IOP Publishing
15. Chen R, Little R, Mihaylova L, Delahay R, Cox R (2019) Wildlife surveillance using deep
learning methods. Ecol Evol 9(17):9453–9466
16. Chowdary MK, Babu SS, Babu SS, Khan H (2013) FPGA implementation of moving
object detection in frames by using background subtraction algorithm. In: 2013 International
conference on communication and signal processing. IEEE, pp 1032–1036
17. Fu K, Zhao Q, Gu IY, Yang J (2019) Deepside: a general deep framework for salient object
detection. Neurocomputing 356:69–82
18. Hou Q, Cheng MM, Hu X, Borji A, Tu Z, Torr PH (2017) Deeply supervised salient object
detection with short connections. In: Proceedings of the IEEE conference on computer vision
and pattern recognition, pp 3203–3212
19. Jia S, Bruce NDB (2019) Richer and deeper supervision network for salient object detection.
arXiv preprint arXiv:1901.02425
20. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with
region proposal networks. arXiv preprint arXiv:1506.01497
21. Liu G-H, Yang J-Y (2018) Exploiting color volume and color difference for salient region
detection. IEEE Trans Image Process 28(1):6–16
22. Yu L, Sun W, Wang H, Wang Q, Liu C (2018) The design of single moving object detection and
recognition system based on OpenCV. In: 2018 IEEE international conference on mechatronics
and automation (ICMA). IEEE, pp 1163–1168
23. Othman NA, Aydin I (2018) A new deep learning application based on movidius ncs for
embedded object detection and recognition. In: 2018 2nd international symposium on
multidisciplinary studies and innovative technologies (ISMSIT). IEEE, pp 1–5
24. Gasparovsky D (2018) Directions of research and standardization in the field of outdoor
lighting. In: 2018 VII. lighting conference of the Visegrad Countries (Lumen V4). IEEE,
pp 1–7
25. Guo Y, Guo X, Jiang Z, Zhou Y (2017) Cascaded convolutional neural networks for object
detection. In: 2017 IEEE visual communications and image processing (VCIP). IEEE, pp 1–4
26. Guo Y, Guo X, Jiang Z, Men A, Zhou Y (2017) Real-time object detection by a multi-feature
fully convolutional network. In: 2017 IEEE international conference on image processing
(ICIP). IEEE, pp 670–674
A Study on Current Research
and Challenges in Attribute-based Access
Control Model

K. Vijayalakshmi and V. Jayalakshmi

Abstract Access control models are used to identify and detect anonymous users or
attacks when sharing big data or other resources in distributed environments such as
cloud, edge, and fog computing. The attribute-based access control model (ABAC) is
a promising model used in intrusion detection systems. Compared with the primary
access control models, namely the discretionary access control model (DAC), the
mandatory access control model (MAC), and the role-based access control model,
ABAC has gained attention in current research due to its flexibility, efficiency, and
granularity. Although ABAC performs well in addressing the security requirements of
today’s computing technologies, there are open challenges such as policy errors, scalability, delegations, and
policy representation with heterogeneous datasets. This paper presents the funda-
mental concepts of ABAC and a review of current research works toward framing
efficient ABAC models. This paper identifies and discusses the current challenges in
ABAC based on the study and analysis of the surveyed works.

Keywords Access control models · Attribute-based access control model · Cloud
computing · Big data · DAC · Intrusion detection system · MAC · RBAC

1 Introduction

The intrusion detection system (IDS) is a software and protection mechanism used
in the security system to monitor, identify, and detect anonymous users or attacks. The
primary roles of IDS are monitoring all incidents, watching logging information, and
reporting illegal attempts [1]. The increased quantity of malicious software gives

K. Vijayalakshmi (B)
Vels Institute of Science, Technology and Advanced Studies, Chennai, India
Arignar Anna Govt. Arts College, Cheyyar, India
V. Jayalakshmi
School of Computing Sciences, Vels Institute of Science, Technology and Advanced Studies,
VISTAS, Chennai, India
e-mail: jayasekar.scs@velsuniv.ac.in


The increased quantity of malicious software poses dangerous challenges for researchers in designing an efficient IDS. In addition, there are more security threats such as denial of service, data loss, data leakage, and loss of data confidentiality in connected information technology. Hence, security is an important issue, and the design of an efficient IDS is a challenging task [2]. The
formal definition of IDS was introduced in 1980. IDS is mainly classified into misuse-IDS and anomaly-based IDS. The misuse-IDS uses recognized patterns to detect illegal access: possible harmful and suspicious activities are stored as patterns in a database, and based on these recognized patterns the misuse-IDS monitors and detects illegal activities. The anomaly-based IDS uses network behavior as the key to detect anonymous users or attacks; if the network behavior conforms to the predefined behavior, access is granted, otherwise the anomaly-based IDS generates alerts [3]. IDS uses access control models as the agent and the analysis engine to monitor and identify signs of intrusion [4]. Traditional IDS fails to address the security attacks of today's computing technologies such as cloud computing, edge computing, fog computing, and the Internet of Things (IoT). With the development of the Internet and the usage of social networks, resources and users are increasing exponentially, and attacks and security threats also increase day by day. Developing an IDS that meets all security needs is a big challenge [5].
IDS implements an access control model to monitor and detect malicious intru-
sion. The implementation of a flexible and efficient access control model is an impor-
tant task for addressing today's complex security needs [6]. The access control model is a software function established with a set of security policies. The three main
operations of the access control model are authentication, authorization, and account-
ability. Authentication is the process of identifying the legal users based on the proof
of identity. The function of authorization is deciding whether to allow or deny the
request of the user. Accountability is the task of monitoring users’ activity within
the system and logging the information of these activities for further auditing [7].
Thus, the access control model allows or denies the request of the user based on the security policies. Many access control models have been proposed; some have achieved great success in addressing security needs, while others have failed [8]. The discretionary
access control model (DAC) uses the access control list (ACL) for each shared
resource that specifies the access rights for the users [9]. DAC is an owner's discretionary model; thus, it allows the owner of the resource to create the ACL for his
resource. The mandatory access control model (MAC) uses security labels for both
user and the resource. MAC identifies the legal access or user based on the security
labels [10]. Both DAC and MAC give better performance when the number of users
and resources is limited. They fail to address the security issues of today's complex computing technologies. The role-based access control model (RBAC) is
proposed to address security attacks in large-scale applications [11, 12]. RBAC estab-
lishes two mappings: permissions-role and role-user. RBAC first assigns all feasible
access rights (permissions) to the role (job) of the user and then it assigns the role to
the user. Hence, the user can get the access rights up to the limit of his role. Many
versions of RBAC with new technical concepts have been proposed to refine and improve the efficiency of the model [13, 14]. Although RBAC performs well, there
are some limitations like poor expressive power of policies and inability to address

the dynamic and complex security needs of today’s computing technologies. The
attribute-based access control model (ABAC) is promising in addressing sophisticated and complex security attacks and threats [15]. The most challenging security attacks
are denial of service, account hijacking, data breach, and data loss [16, 17].
ABAC identifies and allows legal activities based on security policies. The secu-
rity policy is a set of security rules, and a rule is constructed with the attributes of
the subject (who requests the access), resource, and environmental conditions [18,
19]. Although ABAC meets complex security needs, some open challenges affect the performance and efficiency of the model [20, 21]. In this paper, we describe the basic concepts of ABAC and present a review of current research works toward framing efficient ABAC models. We also identify and analyze the important challenges of policy errors, scalability, delegation, and policy representation with heterogeneous datasets. Section 2 presents the related research works toward developing the ABAC model. Section 3 describes the fundamental concepts of the ABAC model and presents a review of ABAC models. Section 4 categorizes and discusses the current research works in ABAC. Section 5 discusses the current open challenges in designing an efficient ABAC model. Finally, Sect. 6 concludes the paper.

2 Literature Survey

Heitor Henrique and his team designed and implemented an access control model to overcome the security problems in federated clouds (interconnected clouds); they experimented with bioinformatics applications [22]. Muhammad Umar Aftab proposed a hybrid access control model by combining the strengths and features of ABAC and RBAC and removing the limitations of these two models. This hybrid model has the policy-structuring ability of ABAC and the high-security power of RBAC [23]. Jian Shu and Lianghong Shi proposed an extended access control model
by introducing action based on ABAC. The usage of multi-attributes with complex
structures is avoided in this model, and also, this model resolves the issues in dynamic
authorization and changes of access rights [24]. Bai and Zheng have done a survey on
access control models and provided a detailed analysis of the access control models
through the research on access control matrix, access control list, and policies [25].
Xin Jin and Ram Krishnan have proposed a hybrid ABAC model called ABACα, which can easily be configured to the other dominating access control models DAC, MAC, and RBAC. Thus, ABACα combines the strengths and features of DAC, MAC, RBAC, and ABAC [26]. Canh Ngo and his team proposed a new version of ABAC by
incorporating complex security mechanisms for multi-tenant cloud services. They
extended their model for inter-cloud platforms [27]. Riaz Ahmed Shaikh proposed a
data classification method for ABAC policy validation. He proposed the algorithms
for detecting and resolving rule inconsistency and rule redundancy anomalies [28].
Daniel Servos and Sylvia L. Osborn gave a detailed review of ABAC and discussed
the current challenges in ABAC [29]. Maryem Ait El Hadj and his team proposed
a cluster-based approach for detecting and resolving anomalies in ABAC security

policies [30]. Harsha S. Gardiyawasam Pussewalage and Vladimir A. Oleshchuk


proposed a flexible, fine-grained ABAC model by incorporating the access delegation
features for e-health care platforms [31]. Majid Afshar and his team proposed a
framework for ABAC for health care policies [32]. Xingbing Fu proposed a new ABAC scheme with large universe attributes and the feature of efficient encryption
for addressing the security requirements of the cloud storage system [33]. Table 1
illustrates the analysis of the current research on the ABAC model. Figure 1 shows
the flow diagram of the literature survey.
Edelberto Franco Silva and his team proposed an extended framework called
ACROSS for authentication and authorization. They developed this framework based
on the policies and attributes of virtual organizations. Thus, this framework is devel-
oped for addressing the security issues in virtual organization platforms [34]. Hui Qi
and his team proposed a hybrid model called role and attribute-based access control
(RABAC) by incorporating the efficiencies of both RBAC and ABAC. RABAC
has the capability of static relationships of RBAC (permission-role and role-user
mappings) and the dynamic ABAC policies [6]. Maryem Ait El Hadj proposed an
approach for clustering ABAC policies and algorithms for detecting and resolving
policy errors [35]. Youcef Imine and his team proposed a flexible and fine-grained
ABAC scheme to improve the security level of the model. This novel ABAC scheme
also gave the solution for revocation problems like removing the users or some
attributes in the system and preventing the user from getting access [36]. Mahendra
Pratap Singh and his team proposed an approach and gave solutions for converging,
specifying, enforcing, and managing other access control models’ security policies
with ABAC policies [37]. Charles Morisset proposed an extended ABAC frame-
work for evaluating missing information in the ABAC security policies with the
use of binary decision data structures [38]. In our previous research, we proposed
a priority-based clustering approach to cluster the ABAC policies before the policy
validation [39]. We have also reviewed access control models and analyzed them based on the study of previous research [40, 41].

3 Attribute-based Access Control Model

3.1 Background of ABAC

The ABAC model is a software protection mechanism to monitor and identify the intrusion of malicious users [42]. ABAC is established with a set of security policies, and the decision on a user's request is made based on the specified policy set. Each ABAC policy is a set of security rules, and ABAC allows or denies a request to access a shared resource based on these rules. Thus, the ABAC model allows only legitimate users by checking their identity at two gates [43]. The first gate
is a traditional authentication process that verifies the common identities of the users
like username, password, and date of birth. The second gate is the ABAC model that

Table 1 Analysis of the research on ABAC model

References | Technique | Efficiency | Limitations
Heitor Henrique et al. [22] | Access control model for federated clouds | Improved security for interconnected clouds | Increases the complexity of developing an efficient access control model
Muhammad et al. [23] | Hybrid access control model | Combined features of ABAC and RBAC | Poor expressive power of policies; increases the complexity of managing attributes dynamically
Shu et al. [24] | Extended ABAC model | Introduced action-based ABAC; avoids the complexity of attributes | Degrades the efficiency of the authorization process
Jin et al. [26] | Hybrid access control model ABACα | Easily configured with the primary access control models DAC, MAC, and RBAC | Fails to address the inefficiencies of the configured access control models
Ngo et al. [27] | A new version of ABAC with complex security mechanisms | Efficient in multi-tenant clouds and inter-cloud platforms | Implementing the model with complex security requirements is a difficult task
Shaikh et al. [28] | ABAC policy validation using a data classification method | Identified and resolved the rule inconsistency and rule redundancy anomalies | Not concentrated on all anomalies, such as rule conflict-demand and rule discrepancy
Ait et al. [30] | Policy validation using a cluster-based approach | Policy validation with reduced computation time | Generates more clusters and increases complexity and cost
Pussewalage et al. [31] | Fine-grained ABAC with a delegation feature | Efficient in health care platforms | The delegation feature may cause critical security issues
Afshar et al. [32] | A framework for ABAC | Efficient in expressing health care policies | The scope is limited to the health care platform
Fu et al. [33] | ABAC model with large universe attributes | Implemented efficient encryption | Increased complexity of managing large universe attributes
Edelberto et al. [34] | ACROSS, an extended framework based on virtual organizations | Efficient authorization and authentication features | Increased complexity in implementing the model
Qi et al. [6] | A hybrid model with the features of RBAC and ABAC | Efficient in managing static relationships and dynamic ABAC policies | High computation time and complex implementation

Fig. 1 Flow diagram of the literature survey (articles related to ABAC models were searched in databases such as Springer, Elsevier, IEEE, and Google Scholar; 500 articles were downloaded, 100 articles unrelated to ABAC models were excluded, and 52 articles related to ABAC research and current challenges were taken for the review)

verifies the users with more attributes such as department name, designation, resource name, resource type, and time. The common terms used in ABAC models are as follows:
Subject: The user who requests access to a shared resource is called the subject. The
subject may be a person (user), process, application, or organization.
Subject attributes {S1, S2, …, Sn}: The important properties or characteristics used to describe the subject are referred to as subject attributes.
Example: {S1, S2, S3} = {Department, Designation, Grade}.

Subject attribute values {VS1, VS2, …, VSn}: The possible set of values (domain) assigned to the subject attributes {S1, S2, …, Sn}, such that VSk = {skv1, skv2, …, skvn} is the value domain for attribute Sk, and Sk = {values ∈ VSk}.
Example: VDepartment = {Cardiology, Hematology, Urology, Neurology}.
Subject attribute value assignment: The values of the subject attribute are assigned as Sk = {values ∈ VSk}.
Example: Department = {Hematology, Urology} ∧ Designation = {Nurse, Doctor}.
Object: The shared resource is called the object.
Object attributes {O1, O2, …, On}: The important properties or characteristics used to describe the object are referred to as object attributes.
Example: {O1, O2, O3} = {ResourceName, ResourceType, LastUpdatedOn}.
Object attribute values {VO1, VO2, …, VOn}: The possible set of values (domain) assigned to the object attributes {O1, O2, …, On}, such that VOk = {okv1, okv2, …, okvn} is the value domain for attribute Ok, and Ok = {values ∈ VOk}.
Example: VResourceName = {Pat_007_Blood_Report, Pat_435_CBC_Report}.
Object attribute value assignment: The values of the object attribute are assigned as Ok = {values ∈ VOk}.
Example: ResourceName = {Pat_435_CBC_Report} ∧ ResourceType = {DataFile}.
Environmental condition: This category specifies information about the environmental conditions.
Environmental condition attributes {E1, E2, …, En}: The characteristics used to describe the environment.
Example: {E1, E2, E3} = {DateOfRequest, Time, PurposeOfRequest}.
Environmental condition attribute values {VE1, VE2, …, VEn}: The possible set of values (domain) assigned to the environment attributes {E1, E2, …, En}, such that VEk = {ekv1, ekv2, …, ekvn} is the value domain for attribute Ek, and Ek = {values ∈ VEk}.
Example: VTime = {07:12, 12:05, 08:16}.
Environmental attribute value assignment: The values of the environmental attribute are assigned as Ek = {values ∈ VEk}.
Example: Time = {07:12}.
An ABAC rule is expressed as R = {Xop | A1 ∈ VA1, A2 ∈ VA2, …, An ∈ VAn}, where X is the decision (allow or deny) for the requested operation op (read, write, print, etc.), {A1, A2, …, An} is the list of attributes belonging to the categories {subject, object, environmental conditions}, and VA1, VA2, …, VAn are the sets of permitted values of the attributes {A1, A2, …, An}, respectively. The decision is made based on the attributes specified in the ABAC rule. An ABAC rule can be written as follows:

R1 = {allow_read | Designation = {Surgeon, Chief doctor}, Department = {Cardiology}, FileName = {Pat_567_CBC_Report}}

The above rule, R1, states that persons working as a surgeon or chief doctor in the department of cardiology can read the file Pat_567_CBC_Report.
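To make the rule semantics concrete, the following is a minimal, illustrative Python sketch (not taken from any surveyed implementation) of how a single ABAC rule such as R1 could be checked against the attributes of an incoming request. The dictionary encoding, the function name, and the request attributes are assumptions made only for illustration.

# Illustrative sketch only: evaluate one ABAC rule against a request's attributes.
def evaluate_rule(rule, request):
    """Return the rule's decision if every rule attribute is satisfied, else None."""
    for attribute, permitted_values in rule["attributes"].items():
        if request.get(attribute) not in permitted_values:
            return None              # the rule does not apply to this request
    return rule["decision"]

# Rule R1 from the text, written as a dictionary (hypothetical encoding).
R1 = {
    "decision": "allow_read",
    "attributes": {
        "Designation": {"Surgeon", "Chief doctor"},
        "Department": {"Cardiology"},
        "FileName": {"Pat_567_CBC_Report"},
    },
}

request = {"Designation": "Surgeon", "Department": "Cardiology",
           "FileName": "Pat_567_CBC_Report", "Time": "07:12"}
print(evaluate_rule(R1, request))    # prints: allow_read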

3.2 Policy Expression

ABAC policies can be written using access control policy languages. Most ABAC
implementations use the extensible access control markup language (XACML) to
express the ABAC policies [44]. Organization for the Advancement of Struc-
tured Information Standards (OASIS) created a standard for XACML based on
XML concepts in 2002. OASIS also developed security assertion markup language
(SAML) in 2005 for the specification of security policies [45]. The security policy
set of the ABAC model can also be expressed by JavaScript Object Notation (JSON).
In XACML, each attribute is expressed with the pair (attribute’s name, attribute’s
value) using a markup language. The ABAC policy set can be expressed by XACML
as follows:

<PolicySet>
  <Policy PolicyID="P1">
    <Rule RuleID="R1" Decision="Allow">
      <Operation>
        <Operation-1>read</Operation-1>
        <Operation-2>write</Operation-2>
      </Operation>
      <Subject>
        <Department>dermatology</Department>
        <Designation>chief doctor</Designation>
      </Subject>
      <Object>
        <ResourceName>PatID_005_CBC_Report</ResourceName>
      </Object>
      <EnvironmentalCondition>
        <Duration>07:12</Duration>
      </EnvironmentalCondition>
    </Rule> <!-- more rules can be specified -->
  </Policy> <!-- more policies can be specified -->
</PolicySet>

In the above example, the rule R1 states that the security policy allows the chief doctor in the department of dermatology to read and write the file "PatID_005_CBC_Report" during the time 07:12 h.
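Since the policy set can also be expressed in JSON, the following is a minimal sketch, written in Python, of one possible JSON encoding of the same rule; the field names are illustrative assumptions and do not follow any particular standard or library.

import json

# Hypothetical JSON encoding of the rule shown above in XACML (field names assumed).
policy_set = {
    "PolicySet": [{
        "PolicyID": "P1",
        "Rules": [{
            "RuleID": "R1",
            "Decision": "Allow",
            "Operation": ["read", "write"],
            "Subject": {"Department": "dermatology", "Designation": "chief doctor"},
            "Object": {"ResourceName": "PatID_005_CBC_Report"},
            "EnvironmentalCondition": {"Duration": "07:12"},
        }],
    }],
}
print(json.dumps(policy_set, indent=2))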

4 Taxonomy of ABAC Research

The current study on ABAC research is classified based on model, implementation, policy, and attributes. The research on each category can be divided into subcategories specific to the area or domain of the research. Figure 2 shows the taxonomy of ABAC research.

4.1 ABAC Models

The research on designing ABAC models, either original models or hybrid models, is getting great attention. An original model is a purely new attribute-based access control model, not an extension of any previous access control model. Original access control models may be general or domain-specific. Hybrid models are designed by combining the features or strengths of two or more existing models. Table 2 shows the types of ABAC models.

4.2 ABAC Implementation

Research toward the implementation of ABAC models also has great impact and interest in today's communication technology. In terms of research volume, implementation comes next after the design of ABAC models. The framework for the implementation of an ABAC model comprises several functional components, such as the representation of ABAC policies in one of the access control languages (XACML, SAML, ORACLE, MySQL, or others), establishing security policies, storing and managing policies and metadata, and testing and maintenance of the framework.

Fig. 2 Taxonomy of ABAC research (model-related, implementation-related, policy-related, and attribute-related research)



Table 2 Taxonomy of ABAC research in designing models

ABAC model | Techniques
Original model |
• General models
- Logic-based ABAC: designing is mainly concentrated on the consistency, representation, and validation of ABAC policies [46]
- ABACα: designing the model by incorporating the features of DAC, MAC, and RBAC [26]
- Attribute-based access matrix model: implementing an ABAC matrix called ABACM, in which each row represents a subject attribute and value pair, each column represents an object attribute and value pair, and each cell specifies the access right [47]
• Domain-specific models
- Cloud computing: designing the model for the domain of cloud computing [48]
- Grid computing: designing the model for the domain of grid computing
- Real-time systems: designing the model for the domain of real-time systems
Hybrid model |
• RABAC: designing the model with the combined features of RBAC and ABAC
• PRBAC: designing the model with the combined features of parameterized RBAC and ABAC
• Attribute-based and role assignment: a model with attribute-based and role-assignment policies

4.3 ABAC Policies

Research toward the development, testing, and validation of ABAC security policies is also getting great attention; researchers show an interest in policy-related tasks equivalent to that in the implementation of the ABAC model. The previous and current research contributions on policies include preserving the consistency and confidentiality of the policies, flexible and efficient policymaking, testing policies, detecting anomalies, and validating policies.

4.4 ABAC Attributes

The literature review shows that there are also many research contributions on determining and specifying attributes in the policies. The research on policy attributes involves preserving confidentiality, adding more attributes to improve the security level, flexible attribute specification, and storing and managing attributes. Figure 3 shows the evolution of ABAC research, and Fig. 4 shows the research rate of each category of ABAC.

Fig. 3 Evolution of ABAC research

Fig. 4 Research rate of each category of ABAC

5 Challenges in ABAC

5.1 Policy Errors

The most critical issues are anomalies or conflicts in the security policies. Policy errors cause dangerous security issues such as denial of service, data loss, or data breach. The primary policy errors are rule redundancy and rule discrepancy [49]. Rule redundancy errors consume high storage space and increase the complexity of updating security policies [50]. A rule discrepancy error causes confusion in granting permissions to users; this error can cause unavailability of the shared resource or illegal access.
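As a simple illustration of what a redundancy check involves (the detection methods of the surveyed works are more elaborate), the following Python sketch flags a rule as redundant when another rule with the same decision already covers every request it matches; the rule encoding is the same hypothetical one used earlier.

# Illustrative sketch: 'narrower' is redundant with respect to 'broader' if it has the
# same decision and every request matched by 'narrower' is also matched by 'broader'.
def is_redundant(broader, narrower):
    if broader["decision"] != narrower["decision"]:
        return False
    for attribute, broad_values in broader["attributes"].items():
        narrow_values = narrower["attributes"].get(attribute)
        if narrow_values is None or not narrow_values <= broad_values:
            return False
    return True

rule_a = {"decision": "allow_read",
          "attributes": {"Designation": {"Surgeon", "Chief doctor"},
                         "Department": {"Cardiology", "Hematology"}}}
rule_b = {"decision": "allow_read",
          "attributes": {"Designation": {"Surgeon"},
                         "Department": {"Cardiology"}}}
print(is_redundant(rule_a, rule_b))   # prints: True, rule_b never grants anything new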

5.2 Scalability

An important challenge in implementing or adopting an ABAC framework is the scalability of the model. The traditional access control models DAC and MAC proved their scalability in small-scale applications [51], and RBAC also performs well in large-scale applications. ABAC has to meet complex security requirements and manage millions of subject and object attributes. ABAC solutions still require many case studies to prove their scalability.

5.3 Delegations

One of the most essential features of access control models is delegation. The delegation feature allows one subject to grant (delegate) certain permissions (access rights) to other subjects. Due to frequent and dynamic changes of attributes and policies, achieving dynamic delegation is complex [52]. Delegation requires constant policies with constant attributes and role-user assignments, and researchers are still struggling to fulfill the requirement of dynamic delegation.

5.4 Auditability

Another important and necessary aspect of all security systems and access control models is auditing. The term auditing refers to the ability to determine how many subjects have been granted particular access rights (read, write, or share) for a certain object, or for how many objects a particular subject has been granted access rights. ABAC never maintains the identity of the users [42]; the users are unknown, and they get the access rights if their attributes satisfy the predefined ABAC policies. Thus, it is more difficult to determine the number of users allowed for a particular object and the number of objects a particular user is allowed to access.

6 Conclusion

With the help of the Internet, communication, and information technology, the number of users and resources is growing rapidly. Hence, security is an essential, critical, and challenging concept. Many access control models play a vital role in addressing security threats and attacks such as denial of service, account hijacking, and data loss. ABAC is getting more attention from researchers due to its flexibility and efficiency. This paper has presented the fundamental concepts of the ABAC model and the taxonomy of research in ABAC, categorized and described each category of ABAC research, and discussed the challenges in ABAC models. This review work may help researchers and practitioners attain knowledge of ABAC models, their implementation, policies, attributes, and the challenges in ABAC.

References

1. Kumar A, Maurya HC, Misra R (2013) A research paper on hybrid intrusion detection
system. Int J Eng Adv Technol 2(4):294–297
2. Khraisat A, Gondal I, Vamplew P, Kamruzzaman J (2019) Survey of intrusion detection
systems: techniques, datasets and challenges. Cybersecurity 2(1). https://doi.org/10.1186/s42
400-019-0038-7
3. Hydro C et al (2013) We are IntechOpen, the world ’ s leading publisher of Open Access books
Built by scientists, for scientists TOP 1 %. INTECH 32(July):137–144
4. Liang C et al (2020) Intrusion detection system for the internet of things based on blockchain
and multi-agent systems. Electronics 9(7):1–27. https://doi.org/10.3390/electronics9071120
5. Varal AS, Wagh SK (2018) Misuse and anomaly intrusion detection system using ensemble
learning model. In: International conference on recent innovations in electrical, electronics &
communication engineering ICRIEECE 2018, pp. 1722–1727. https://doi.org/10.1109/ICRIEE
CE44171.2018.9009147
6. Qi H, Di X, Li J (2018) Formal definition and analysis of access control model based on role
and attribute. J Inf Secur Appl 43:53–60. https://doi.org/10.1016/j.jisa.2018.09.001
7. Suhendra V (2011) A survey on access control deployment. In: Communication in computer and
information science, vol 259 CCIS, pp 11–20. https://doi.org/10.1007/978-3-642-27189-2_2
8. Sahafizadeh E (2010) Survey on access control models, pp 1–3
9. Conrad E, Misenar S, Feldman J (2016) Domain 5: identity and access management (Control-
ling Access And Managing Identity). In: CISSP Study Guid, pp 293–327. https://doi.org/10.
1016/b978-0-12-802437-9.00006-0
10. Xu L, Zhang H, Du X, Wang C (2009) Research on mandatory access control model for
application system. In: Proceedings of international conference on networks security, wireless
communications and trusted computing NSWCTC 2009, vol 2, no 1, pp 159–163. https://doi.
org/10.1109/NSWCTC.2009.322
11. Sandhu RS et al (1996) Role based access control models. IEEE 6(2):21–29. https://doi.org/
10.1016/S1363-4127(01)00204-7
12. Sandhu R, Bhamidipati V, Munawer Q (1999) The ARBAC97 model for role-based admin-
istration of roles. ACM Trans Inf Syst Secur 2(1):105–135. https://doi.org/10.1145/300830.
300839
13. Sandhu R, Munawer Q (1999) The ARBAC99 model for administration of roles. In: Proceed-
ings 15th annual computer security applications conference, vol Part F1334, pp 229–238.
https://doi.org/10.1109/CSAC.1999.816032

14. Hutchison D (2011) Data and applications security and privacy XXV. In: Lecture notes
computer science, vol 1, pp 3–18. https://doi.org/10.1007/978-3-319-20810-7
15. Crampton J, Morisset C (2014) Monotonicity and completeness in attribute-based access
control. In: LNCS 8743,Springer International Publication, pp 33–34
16. Prakash C, Dasgupta S (2016) Cloud computing security analysis: challenges and possible
solutions. In: International conference on electrical, electronics, and optimization techniques
ICEEOT 2016, pp 54–57. https://doi.org/10.1109/ICEEOT.2016.7755626
17. Markandey A, Dhamdhere P, Gajmal Y (2019) Data access security in cloud computing:
a review. In: 2018 International conference on computing, power and communication
technologies GUCON 2018, pp 633–636. https://doi.org/10.1109/GUCON.2018.8675033
18. Que Nguyet Tran Thi TKD, Si TT (2017) Fine grained attribute based access control model
for privacy protection. Springer International Publication A, vol 10018, pp 141–150. https://
doi.org/10.1007/978-3-319-48057-2
19. Vijayalakshmi K, Jayalakshmi V (2021) Analysis on data deduplication techniques of storage of
big data in cloud. In: Proceedings of 5th international conference on computing methodologies
and communication ICCMC 2021. IEEE, pp 976–983
20. Vijayalakshmi K, Jayalakshmi V (2021) Identifying considerable anomalies and conflicts
in ABAC security policies. In: Proceedings of 5th international conference on intelligent
computing and control systems ICICCS 2021. IEEE, pp 1286–1293
21. Vijayalakshmi K, Jayalakshmi V (2021) A similarity value measure of ABAC security rules.
In: Proceedings of 5th international conference on trends electronics and informatics ICOEI
2021, IEEE
22. Costa HH, de Araújo AP, Gondim JJ, de Holanda MT, Walter ME (2017) Attribute based
access control in federated clouds: A case study in bionformatics. In: Iberian conference on
information systems and technologies CIST. https://doi.org/10.23919/CISTI.2017.7975855
23. Aftab MU, Habib MA, Mehmood N, Aslam M, Irfan M (2016) Attributed role based access
control model. In: Proceedings of 2015 conference on information assurance and cyber security
CIACS 2015, pp 83–89. https://doi.org/10.1109/CIACS.2015.7395571
24. Shu J, Shi L, Xia B, Liu L (2009) Study on action and attribute-based access control model for
web services. In: 2nd International symposium on information science and engineering ISISE
2009, pp 213–216. https://doi.org/10.1109/ISISE.2009.80
25. Bai QH, Zheng Y (2011) Study on the access control model in information security. In: Proceed-
ings of 2011 cross strait quad-regional radio science wireless technology conference CSQRWC
2011, vol 1, pp 830–834. https://doi.org/10.1109/CSQRWC.2011.6037079
26. Jin X, Krishnan R, Sandhu R (2012) A unified attribute-based access control model covering
DAC, MAC and RBAC BT. In: Lecture notes in computer science, vol 7371, pp 41–55
27. Ngo C, Demchenko Y, De Laat C (2015) Multi-tenant attribute-based access control for cloud
infrastructure services. https://doi.org/10.1016/j.jisa.2015.11.005
28. Shaikh RA, Adi K, Logrippo L (2017) A data classification method for inconsistency and
incompleteness detection in access control policy sets. Int J Inf Secur 16(1):91–113. https://
doi.org/10.1007/s10207-016-0317-1
29. Servos D, Osborn SL (2017) Current research and open problems in attribute-based access
control. ACM Comput Surv (CSUR) 49(4):1–45. https://doi.org/10.1145/3007204
30. El Hadj MA, Ayache M, Benkaouz Y, Khoumsi A, Erradi M (2017) Clustering-based approach
for anomaly detection in xacml policies. In: ICETE 2017—proceedings of 14th international
joint conference on E-business telecommunication, vol 4, no Icete, pp 548–553. https://doi.
org/10.5220/0006471205480553
31. Pussewalage HSG, Oleshchuk VA (2017) Attribute based access control scheme with controlled
access delegation for collaborative E-health environments. J Inf Secur Appl 37:50–64. https://
doi.org/10.1016/j.jisa.2017.10.004
32. Afshar M, Samet S, Hu T (2018) An attribute based access control framework for healthcare
system. J Phys Conf Ser 933(1). https://doi.org/10.1088/1742-6596/933/1/012020
33. Fu X, Nie X, Wu T, Li F (2018) Large universe attribute based access control with efficient
decryption in cloud storage system. J Syst Softw 135:157–164. https://doi.org/10.1016/j.jss.
2017.10.020

34. Franco E, Muchaluat-saade DC (2018) ACROSS: a generic framework for attribute-based


access control with distributed policies for virtual organizations. Futur Gener Comput Syst
78:1–17. https://doi.org/10.1016/j.future.2017.07.049
35. Ait El Hadj M, Khoumsi A, Benkaouz Y, Erradi M (2018) Formal approach to detect and resolve
anomalies while clustering ABAC policies. ICST Trans Secur Saf 5(16):156003. https://doi.
org/10.4108/eai.13-7-2018.156003
36. Imine Y, Lounis A, Bouabdallah A (2018) AC SC. https://doi.org/10.1016/j.jnca.2018.08.008
37. Pratap M, Sural S, Vaidya J (2019) Managing attribute-based access control policies in a
unified framework using data warehousing and in-memory database. Comput Secur 86:183–
205. https://doi.org/10.1016/j.cose.2019.06.001
38. Morisset C, Willemse TAC, Zannone N (2019) A framework for the extended evaluation of
ABAC policies. Cybersecurity 2(1). https://doi.org/10.1186/s42400-019-0024-0
39. Vijayalakshmi K, Jayalakshmi V (2020) A priority-based approach for detection of anomalies
in ABAC policies using clustering technique. In: Iccmc, pp 897–903. https://doi.org/10.1109/
iccmc48092.2020.iccmc-000166
40. Vijayalakshmi K, Jayalakshmi V (2021) Shared access control models for big data: a perspective
study and analysis. Springer, pp 397–410. https://doi.org/10.1007/978-981-15-8443-5_33
41. Vijayalakshmi K, Jayalakshmi V (2021) Improving performance of ABAC security policies
validation using a novel clustering approach. Int J Adv Comput Sci Appl 12(5):245–257
42. Hu VC et al (2014) Guide to attribute based access control (abac) definition and considerations.
NIST Spec Publ 800:162. https://doi.org/10.6028/NIST.SP.800-162
43. Cavoukian A, Chibba M, Williamson G, Ferguson A (2015) The importance of ABAC: attribute-
based access control to big data: privacy and context. In: Private Big Data Institute, p 21
44. Deng F et al (2019) Establishment of rule dictionary for efficient XACML policy management.
Knowl-Based Syst 175:26–35. https://doi.org/10.1016/j.knosys.2019.03.015
45. OASIS (2008) SAML v2.0. Language (Baltim)
46. Dovier A, Piazza C, Pontelli E, Rossi G (2000) Sets and constraint logic programming. ACM
Trans Program Lang Syst 22(5):861–931. https://doi.org/10.1145/365151.365169
47. Zhang X, Li Y, Nalla D (2005) An attribute-based access matrix model. In: Proceedings of the
2005 ACM symposium on applied computing, vol 1, pp 359–363. https://doi.org/10.1145/106
6677.1066760
48. Ahuja R, Mohanty SK, Sakurai K (2016) A scalable attribute-set-based access control with
both sharing and full-fledged delegation of access privileges in cloud computing. Comput Electr
Eng, pp 1–16. https://doi.org/10.1016/j.compeleceng.2016.11.028
49. Vijayalakshmi K, Jayalakshmi V (2021) Resolving rule redundancy error in ABAC policies
using individual domain and subset detection method. In: Proceedings of 6th international
conference on communication and electronics systems. ICCES 2021, IEEE
50. Ait M, Hadj E, Erradi M, Khoumsi A (2018) Validation and correction of large security policies
: a clustering and access log based approach. In: 2018 IEEE international conference on big
Data (Big Data), no 1, pp 5330–5332. https://doi.org/10.1109/BigData.2018.8622610
51. Fugkeaw S, Sato H (2018) Scalable and secure access control policy update for outsourced big
data. 79:364–373. https://doi.org/10.1016/j.future.2017.06.014
52. Servos D, Mohammed S, Fiaidhi J, Kim TH (2013) Extensions to ciphertext-policy attribute-
based encryption to support distributed environments. Int J Comput Appl Technol 47(2–3):215–
226. https://doi.org/10.1504/IJCAT.2013.05435
Audio Denoising Using Deep Neural
Networks

S. Jassem Mohammed and N. Radhika

Abstract Improving speech quality is becoming a basic requirement with the increasing interest in speech processing applications. Many speech enhancement techniques have been developed to reduce or completely remove listener fatigue on various devices such as smartphones and in online communication applications. Background noise often interrupts communication, and this has traditionally been solved using a physical hardware device that emits the negative of the incoming noise signal to cancel out the noise. Deep learning has recently made a breakthrough in the speech enhancement process. This paper proposes an audio denoising model built on a deep neural network architecture based on spectrograms (a hybrid between the frequency domain and the time domain). The proposed deep neural network model effectively predicts the negative noise frequencies for a given noisy input audio file. After prediction, the predicted values are removed from the original noisy audio file to create the denoised audio output.

Keywords Deep neural network · Spectrogram · Transfer learning · Activation function · Sampling rate · Audio synthesizing · Preprocessing

1 Introduction

Speech signals are transmitted, recorded, played back, analyzed, or synthesized by electronic systems in the context of audio communication. Noise influences must be
carefully considered when building a system for any of these reasons. Different types
of noise and distortion can be identified, and there are a variety of signal processing
principles that can help mitigate their impact. One of the most researched problems
in audio signal processing is denoising. Noise is an inevitable and, in most cases, an

S. J. Mohammed · N. Radhika (B)


Department of Computer Science and Engineering, Amrita School of Engineering,
Amrita Vishwa Vidyapeetham, Coimbatore, India
e-mail: n_radhika@cb.amrita.edu
S. J. Mohammed
e-mail: cb.en.p2aid19016@cb.students.amrita.edu


Fig. 1 Working of AI-based audio denoising model

undesired component of audio recordings, necessitating the use of a denoising stage in signal processing pipelines for applications such as music transcription, sound categorization, and voice recognition. The goal of audio denoising is to reduce noise
while preserving the underlying signals. There are several applications, including
music and speech restoration. Figure 1 shows the overall basic working of an audio
denoising artificial intelligence model.
Figure 1 shows the basic overall working of a trained audio denoising model. The
trained audio denoising model predicts, for an incoming noisy speech file, the noise values present in the input speech. When these noise values are subtracted from the incoming audio file, a clean or denoised audio file is obtained.
For decades, researchers have been working on speech enhancement techniques
that predict clear speech based on statistical assumptions over how speech and noise
behave. With the development of deep learning-based approaches, a new age of voice
augmentation has begun. By training a deep neural network, these strategies learn
the mapping function that transfers noisy speech to clean speech without making
any statistical assumptions. This deep neural network is fed a lot of data for training
in the form of clean and noisy speech pairings, and it updates its parameters during
the supervised learning process to produce the best prediction for the target clean
speech. This paper has seven sections in total. Section 2 presents the background literature survey, Sects. 3 and 4 elucidate the design and implementation of the proposed model, and Sects. 5 and 6 discuss the results and inferences. The conclusion and future scope are described at the end in Sect. 7. An appendix contains important snippets of code that were used for the implementation of this project.

2 Background

In this section, some of the recent techniques that have been used for denoising
speech using artificial intelligence models and algorithms have been surveyed.
The deep autoencoder model (DAE) described in this paper [1] has been utilized to
perform dimensionality reduction, face recognition, and natural language processing.
The author studied utilizing a linear regression function to create the DDAE model’s
decoder (termed DAELD) in this research and evaluated the DAELD model on
two speech enhancement tasks (Aurora-4 and TIMIT datasets). The encoder and
decoder layers of the DAELD model are used to transform speech signals to high-
dimensional feature representations and then back to speech signals. The encoder
consists of nonlinear transformation, and the decoder consists of the linear regression
algorithm. The author showed that, by utilizing linear regression in the decoder part, improved performance was obtained in terms of PESQ and STOI scores.
In this paper [2], the author used an ideal binary mask (IBM) as a binary classification function with deep neural networks (DNNs) for voice improvement in complex noisy situations. During training, the IBM is employed as a target function, and
trained DNNs are used to estimate IBM during the augmentation stage. The target
speech is then created by applying the predicted target function to the complex noisy
mixtures. The author had proved that a deep neural network model with four hidden
layers and with the mean square error as its loss function provides an average seven
percent improvement in the speech quality.
This paper [3] provides a deep learning-based method for improving voice
denoising in real-world audio situations without the need for clean speech signals in
a self-supervised manner. Two noisy realizations of the same speech signal, one as the input and the other as the output, are used to train a fully convolutional neural network. The author used LeakyReLU as the activation function, noting that it helps speed up the training process; for this reason, LeakyReLU was also selected as the activation function for the model proposed in this paper.
In this paper [4], in the spectro-temporal modulation domain, the author presented
a simple mean squared error (MSE)-based loss function for supervised voice enhance-
ment. Because of its tight relationship to the template-based STMI (spectro-temporal modulation index), which correlates well with speech intelligibility, this loss is termed the spectro-temporal modulation error (STME). For the training and test sets, the author used a small-scale dataset with 9.4 h and 35 min of noisy speech, respectively. The author used the Interspeech 2020 deep noise suppression (DNS) dataset
for the large-scale dataset. The author’s model consists of four fully connected layers
connected with two stacked gated recurrent units between the first and the second
layer. Unlike the proposed model of this paper, the author of paper [4] had built a
speech enhancement model on the modulation domain.

In this paper [5], deep neural networks had been developed by the author to classify
spoken words or environmental sounds from audio. After that, the author trained
an audio transform to convert noisy speech to an audio waveform that minimized
the recognition network’s “perceptual” losses. For training his recognition network
with perceptual loss as the loss function, the author utilized several wave UNet
architectures and obtained a PESQ score of 1.585 and a STOI score of 0.773 as the highest scores reached among the various proposed architectures, which are similar in structure to the UNet model.
This Stanford paper [6] by the author Mike Kayser has come up with two different
approaches for audio denoising, and the first method is to provide the noisy spectro-
gram to a convolutional neural network and obtain a clean output spectrogram. The
clean spectrogram is used to generate mel-frequency cepstral coefficient (MFCC). In
the second method proposed by the author, the noisy spectrogram is given as an input
to the multilayer perceptron network which is in turn connected to a convolutional
neural network. This combined network learns and predicts the MFCC features. The
author has also concluded from his experiments that, across various architectures, the tanh activation function gives better results when training on audio spectrograms than rectified linear units.
In this paper [7], the author proposes Conv-TasNet, a fully convolutional time-domain audio separation network, as a deep learning system for end-to-end time-domain speech separation. Conv-TasNet generates a representation
of the speech waveform that is optimized for separating distinct speakers using a
linear encoder. The encoder output is subjected to a collection of weighting functions
(masks) to achieve speaker separation.
Thus, in order to propose a deep neural network model and to improve the perfor-
mance of the designed model, the above literature survey was done. Paper [2] shows that the presence of hidden layers can improve the performance of the model. Utilizing LeakyReLU, as mentioned in paper [3], reduces the training time of the proposed model, and paper [6] shows that the tanh activation function can improve the performance of the denoising model. Paper [5] shows how the UNet model architecture can be used for building an audio denoising model. Combining the information extracted from the literature survey, a deep neural network model which is a hybrid of the UNet model and dense layers has been proposed and is explained in the next section.

3 Methodology

In this section, a description of the dataset and a detailed explanation of the model
architecture are provided.

3.1 Dataset

The datasets chosen for the project are as follows. Vassil Panayotov worked with Daniel Povey to create LibriSpeech [8], a corpus
of around 1000 h of 16 kHz read English speech. The information comes from the
LibriVox project’s read audiobooks, which have been carefully separated and aligned.
The ESC-50 dataset [9] is a tagged collection of 2000 ambient audio recordings
that can be used to compare sound categorization systems. The dataset is made up
of 5-s recordings that are divided into 50 semantic classes (each with 40 examples)
and informally sorted into five major categories:
1. Animals
2. Natural soundscapes and water sounds
3. Human, non-speech sounds
4. Interior/domestic sounds
5. Exterior/urban noises.
The dataset of noisy inputs has been synthesized by randomly combining noise audio from the ESC-50 dataset with the LibriSpeech dataset.
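The following is a minimal sketch, assuming the librosa library, of how one noisy/clean training pair could be synthesized by mixing a random window of an ESC-50 noise clip into a LibriSpeech utterance at a random intensity; the file names, the intensity range, and the helper name are assumptions made for illustration.

import random
import numpy as np
import librosa

def make_noisy_pair(speech_path, noise_path, sr=8000, intensity=(0.2, 0.8)):
    # Load both files at the chosen sampling rate.
    speech, _ = librosa.load(speech_path, sr=sr)
    noise, _ = librosa.load(noise_path, sr=sr)
    # Repeat the noise if it is shorter than the speech, then pick a random window.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    start = random.randint(0, len(noise) - len(speech))
    noise = noise[start:start + len(speech)]
    # Blend with a random intensity so the network sees varied noise levels.
    alpha = random.uniform(*intensity)
    return speech, speech + alpha * noise

clean, noisy = make_noisy_pair("librispeech_utterance.flac", "esc50_rain.wav")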

3.2 Model Design

The proposed neural network structure was constructed with the UNet model as the base; this architecture was modified to work with spectrograms, and its last five layers are dense layers. The overall working of the entire system is shown in the block diagram in Fig. 2.
The deep neural network model is similar to the UNet architecture. The UNet architecture has been chosen for this application because UNet architectures are normally used in image segmentation problems, which are similar to the audio denoising task in that the network has to identify and segment out the clean audio from the incoming noisy audio file. The constructed model has two major portions: the first is known as the contracting portion, and the second is called the expansive portion. The expansive portion has five dense layers at the end of the architecture, as shown in Fig. 3.
Figure 3 shows the architecture diagram of the proposed model. The output from the proposed model gives the predicted noise values, which are then subtracted from the noisy speech audio spectrogram to produce the denoised audio file.
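A minimal sketch of this kind of architecture, assuming TensorFlow/Keras, is given below; the filter counts, network depth, activations, and the 128 x 128 spectrogram patch size are illustrative assumptions, since the exact layer configuration is not listed here.

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_denoiser(input_shape=(128, 128, 1)):
    inp = layers.Input(shape=input_shape)

    # Contracting portion (assumed depth and filter counts).
    c1 = layers.Conv2D(32, 3, padding="same", activation=tf.nn.leaky_relu)(inp)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = layers.Conv2D(64, 3, padding="same", activation=tf.nn.leaky_relu)(p1)
    p2 = layers.MaxPooling2D(2)(c2)

    b = layers.Conv2D(128, 3, padding="same", activation=tf.nn.leaky_relu)(p2)

    # Expansive portion with skip connections, as in a UNet.
    u2 = layers.UpSampling2D(2)(b)
    u2 = layers.Concatenate()([u2, c2])
    c3 = layers.Conv2D(64, 3, padding="same", activation=tf.nn.leaky_relu)(u2)
    u1 = layers.UpSampling2D(2)(c3)
    u1 = layers.Concatenate()([u1, c1])
    c4 = layers.Conv2D(32, 3, padding="same", activation=tf.nn.leaky_relu)(u1)

    # Five dense layers applied along the channel axis, ending with one output
    # channel: the predicted noise value for every time-frequency bin.
    d = layers.Dense(64, activation="tanh")(c4)
    d = layers.Dense(32, activation="tanh")(d)
    d = layers.Dense(16, activation="tanh")(d)
    d = layers.Dense(8, activation="tanh")(d)
    out = layers.Dense(1)(d)

    model = Model(inp, out)
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model

model = build_denoiser()
model.summary()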

Fig. 2 Block diagram of entire model

Fig. 3 Architecture model diagram

4 Experimental Setup

Dataset Preprocessing. The audio files cannot be used as such for training, as the noisy speech data must first be synthesized by randomly combining the ESC-50 dataset and the LibriSpeech dataset. The audio files are first converted into NumPy arrays, which are then converted into spectrogram matrices. For converting an audio file into a NumPy matrix, the following parameters were initially set:
1. sampling rate = 8000
2. frame length = 255
3. hop frame length = 63
4. minimum duration = 1.
Here, the sampling rate is defined as the number of samples extracted per second from a time series. The standard value of 8000 Hz has been used, and during later experimentation this value was increased to 16,000 Hz to check the performance of the model. For training a model to recognize the noise present in the audio, the model should be trained on the noisy speech audio and also on the noise audio, so that it can learn the features of the noise. Audio captured at 8 kHz and windows somewhat longer than 1 s were used to form the datasets for training, validation, and testing. For the environmental noises, data augmentation has been done by changing the window position randomly at different times to create different noise windows. Noises have been merged with clean voices at randomized intensities between 20 and 80. A single noise audio file has been created as training data, into which audio files from the ESC-50 dataset have been randomly merged.
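The following is a minimal sketch, assuming the librosa library, of the conversion of an audio file into a magnitude spectrogram with the parameter values listed above; the function name is an assumption made for illustration.

import numpy as np
import librosa

SAMPLING_RATE = 8000   # increased to 16,000 Hz in the later experiments
FRAME_LENGTH = 255
HOP_LENGTH = 63
MIN_DURATION = 1.0     # seconds

def audio_to_spectrogram(path):
    # Load the audio as a NumPy array at the chosen sampling rate.
    y, _ = librosa.load(path, sr=SAMPLING_RATE)
    if len(y) < MIN_DURATION * SAMPLING_RATE:
        raise ValueError("clip is shorter than the minimum duration")
    # Short-time Fourier transform with the frame and hop lengths listed above.
    stft = librosa.stft(y, n_fft=FRAME_LENGTH, hop_length=HOP_LENGTH)
    magnitude, phase = np.abs(stft), np.angle(stft)
    # The phase is kept so the denoised magnitude can be turned back into audio.
    return magnitude, phase

magnitude, phase = audio_to_spectrogram("noisy_speech.wav")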

4.1 Evaluation Metrics

This sub-section explains the evaluation metrics utilized for evaluating the perfor-
mance of the proposed model. In the literature survey, various authors [1–5] have utilized the same evaluation metrics for the evaluation of denoised audio.
PESQ Score. PESQ [10] refers to the perceptual evaluation of speech quality, which is defined by International Telecommunication Union recommendation P.862. The score ranges from −0.5 to 4.5, where a greater value indicates better audio quality. The PESQ algorithm is normally used in the telecommunications industry for objective voice quality testing by phone manufacturers and telecom operators. For obtaining the PESQ score of the model, both the clean speech audio file and the degraded (noisy or denoised) speech audio file are required. The pesq library has been used for this purpose.
STOI Score. STOI [11] refers to short-term objective intelligibility, a metric used for predicting the intelligibility of noisy speech. It does not evaluate speech quality (speech quality is normally evaluated in silence); instead, it returns a value between 0 and 1, where 1 is the highest score, indicating that the noisy speech can be understood easily. The pystoi library has been used for obtaining the STOI scores of the models, and, similar to the PESQ metric, the STOI computation also requires both the clean speech audio file and the noisy speech audio file.
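The following is a minimal sketch, assuming the pesq and pystoi packages mentioned above together with librosa for loading the files, of how the two scores could be computed for a clean/denoised pair; the file names are placeholders.

import librosa
from pesq import pesq
from pystoi import stoi

SR = 16000
clean, _ = librosa.load("clean_speech.wav", sr=SR)
denoised, _ = librosa.load("denoised_speech.wav", sr=SR)

# Both signals must have the same length for the metrics.
n = min(len(clean), len(denoised))
clean, denoised = clean[:n], denoised[:n]

# PESQ: 'wb' (wide-band) mode for 16 kHz audio; use 'nb' for 8 kHz audio.
pesq_score = pesq(SR, clean, denoised, 'wb')
# STOI: returns a value between 0 and 1 (higher means more intelligible speech).
stoi_score = stoi(clean, denoised, SR, extended=False)
print(f"PESQ: {pesq_score:.3f}  STOI: {stoi_score:.3f}")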

5 Implementation and Results

The synthesized noisy speech is converted into a spectrogram, along with its paired clean speech audio. These spectrograms are then given as input training data to the proposed model. The input dataset is split into 80% training data and 20% testing data for training the model. The Adam optimizer is used, and the mean squared error loss function is used in the proposed model. The model stops training once the validation loss starts to increase; this is necessary in order to avoid overfitting of the model. Once the model is trained, an input noisy speech audio in the form of its spectrogram matrix is given to the model, and the predicted output values are obtained. The predicted output values are then subtracted from the noisy speech audio spectrogram in order to obtain the clean speech audio spectrogram, which is then converted into an audio file.
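A minimal sketch of this training and denoising procedure is given below, assuming Keras and a model compiled with the Adam optimizer and MSE loss, such as the build_denoiser() sketch from Sect. 3.2; the array file names, batch size, and patch shape are illustrative assumptions.

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import EarlyStopping

# Hypothetical arrays of spectrogram patches: the noisy speech (model input) and the
# noise the model must predict (training target), shaped (num_examples, 128, 128, 1).
noisy_specs = np.load("noisy_spectrograms.npy")
noise_specs = np.load("noise_spectrograms.npy")

x_train, x_val, y_train, y_val = train_test_split(
    noisy_specs, noise_specs, test_size=0.2, random_state=42)

model = build_denoiser()     # compiled with the Adam optimizer and MSE loss
model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          epochs=10, batch_size=32,
          callbacks=[EarlyStopping(monitor="val_loss", patience=1,
                                   restore_best_weights=True)])

# Denoising: predict the noise and subtract it from the noisy spectrogram.
predicted_noise = model.predict(x_val)
denoised_specs = x_val - predicted_noise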

5.1 Implementation of UNet Model

The base UNet model [12] has been implemented on the same dataset for comparison purposes. Figure 4 shows the training graph for the UNet model. The UNet model [12] has been chosen because its contracting path and symmetrical expanding path help in precise localization while constructing the model with less training data. The UNet model performs best when it comes to image segmentation problems. The audio files are converted into spectrograms [13], which are later converted into NumPy arrays at the time of training the model. This process is similar to the way image files are handled, where the images are converted into NumPy arrays when fed as training data into the model.
From Fig. 4, there is a slight increase in validation loss after the third epoch. This shows that the model has stopped learning. On increasing the number of epochs above 4, the model starts overfitting, as the validation loss starts increasing. The UNet model implementation was done using standard hyperparameters.

Fig. 4 Learning curve of the UNet model

Fig. 5 Frequency–time graph of the input noisy audio file

Fig. 6 Spectrogram of the input noisy audio file

For this implementation, a different external noisy speech audio file was generated and used for obtaining the evaluation metrics of the model. Figures 5 and
6 show the graphical representation of the input audio file.
Results. The following are the test results obtained when testing the UNet model on the above noisy voice file. Figures 7 and 8 show the graphical representation of the output audio file. The output spectrogram in Fig. 8 shows fewer and cleaner spikes of red with a deeper blue background compared to the input file's spectrogram in Fig. 6. This visually shows the absence of noise in the output spectrogram.

Fig. 7 Frequency–time graph of the output audio file



Fig. 8 Spectrogram of the output audio file

The evaluation metrics obtained for the implementation of the UNet model are:
1. STOI score is 0.738.
2. PESQ score is 1.472.

5.2 Implementation of the Proposed Model

Initially, the proposed model has been trained on default hyperparameter values. The
number of epochs has been set to 4 because if the number of epochs is increased, the
proposed model starts overfitting, and the validation loss starts increasing, denoting
that the model has stopped learning. LeakyReLU has been utilized as the activation
function based on the results obtained from this paper [14] for the ESC-50 dataset
[9]. The mean squared error loss function is utilized, since this is a prediction problem and not a classification problem, along with the Adam [15] optimizer. The hyperparameter values are:
1. number of epochs = 4
2. activation function = LeakyReLU
3. optimizer = Adam
4. loss = mean squared error
5. sampling rate = 8000.
Results. For the above hyperparameter values, the model was trained, and the following evaluation metrics were obtained:
1. STOI score is 0.727.
2. PESQ score is 1.681.
The obtained evaluation metrics do not show a drastic change compared to the values obtained from the UNet model implementation used for comparison. The dense layers in the proposed architecture did not bring a change in the evaluation metrics except for a slight increase in the PESQ score. In order to boost the performance of the proposed model, tuning of the hyperparameter values must be done.

Fig. 9 Frequency–time graph of the input audio file

Fig. 10 Spectrogram of the input audio file

Hyperparameter tuning. In his paper [6], "Denoising convolutional autoencoders for noisy speech recognition," Mike Kayser showed that the tanh activation function yields better results when a deep learning model is trained on audio data; hence, the activation function of the proposed architecture was changed accordingly. The visual representation of the input noisy speech audio file is shown in Figs. 9 and 10.
In order to obtain improved results from implementing the proposed model, the
sampling rate of the audio file had been increased from 8000 to 16,000 Hz during
the training of the proposed model, as sampling rate defines the number of samples
per second taken from a continuous signal to make it a discrete or digital signal.
Thus, on increasing the sampling rate, the number of samples utilized for training the proposed model increases, which helps in learning the features of the audio file. The proposed model was trained for 3 epochs with the following hyperparameter values:
1. number of epochs = 3
2. activation function = LeakyReLU, tanh
3. optimizer = Adam
4. loss = mean squared error
5. sampling rate = 16,000.
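The effect of the higher sampling rate can be seen directly when loading the data: at 16,000 Hz there are twice as many samples per second available for the network to learn from. The small illustrative check below uses librosa's resampling on load; the file name is assumed.

import librosa

# The same clip loaded at the two rates compared in this experiment
y_8k, _ = librosa.load("noisy_speech.wav", sr=8000)    # hypothetical file
y_16k, _ = librosa.load("noisy_speech.wav", sr=16000)

print(len(y_8k), len(y_16k))  # the 16 kHz version contains twice as many samples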
For the above hyperparameter values, Fig. 11 shows the learning curve diagram.

Fig. 11 Learning curve of the proposed model

From the training graph in Fig. 11, it is observed that the validation
loss increases slightly after the second epoch, while the training loss continues to decrease
at a faster rate. This shows that the model had slowly started to overfit
the given training data.
If the model is trained beyond 3 epochs, it starts overfitting; this can be
seen visually when the training loss keeps decreasing but the validation loss
saturates. The following results were obtained by testing the model on a random
noisy voice file. Figures 12 and 13 show the graphical representation of the output
audio file.

Fig. 12 Frequency–time graph of the output audio file

Fig. 13 Spectrogram of the output audio file



Table 1 Summary of results obtained for various implementations

S. no. | Model                   | PESQ score | STOI score
1      | UNet model              | 1.472      | 0.738
2      | Proposed model          | 1.681      | 0.727
3      | Proposed model (tuned)  | 1.905      | 0.756
4      | WaveNet model [5]       | 1.374      | 0.738
5      | A1 + W1 [5]             | 1.585      | 0.767

Results. On visual comparison between the input and output spectrograms,
the brightness of red is slightly reduced in areas where noise is present,
but it is not completely removed. From this, it is inferred that the denoised audio file
still contains noise, but at a lower magnitude than in the original
noisy file. The evaluation metrics obtained for this experiment are,
1. STOI score is 0.756.
2. PESQ score is 1.905.
From the obtained evaluation metrics, it is observed that increasing the sampling
rate of the audio file during training and changing the activation
function to tanh increased the performance of the proposed model. Comparing
the spectrograms in Figs. 13 and 10, there is a drastic visual difference in the
shade of red, showing that the magnitude of the noise has been reduced still further.
Table 1 shows the condensed form of all results obtained, and from it we can
clearly see that improved results can be obtained by changing the hyperparameter
values. The table also compares the existing WaveNet and A1 + W1 models
from [5]. It is observed that the proposed model has better evaluation metrics than
the existing models and the UNet model variation.

6 Inference

From Table 1, we can infer that the proposed deep neural network model works
better than the standard UNet architecture due to the dense layers present
in the proposed model. Dense layers are normally utilized to identify unlabeled or
untagged features, whereas a standard convolutional layer accurately learns the
marked or highlighted features. Moreover, using the tanh activation function in the
proposed model increases its performance. Further, the performance of the model
can still be increased by raising the sampling rate of the audio file. An increased
sampling rate means more samples are obtained for each second of audio; thus, more
detailed features are available for the proposed model to learn, and hence its
performance increases.

7 Conclusion

In this project, a deep neural network model has been proposed and experimented
with, which enhances speech and removes multiple kinds of noise present in
any given audio file. The proposed model shows significant improvement in terms of
PESQ and STOI when spectrograms of clean speech audio files and synthesized
noisy speech audio files are used as training data. The experimental results,
a STOI value of 0.756 and a PESQ score of 1.905, show how the
presence of dense layers with the tanh activation function and the increased sampling
rate (from 8000 to 16,000 Hz) during training can significantly improve the results
of the proposed model.

References

1. Zezario RE, Hussain T, Lu X, Wang H-M, Tsao Y (2020) Self-supervised denoising autoen-
coder with linear regression decoder for speech enhancement. In: ICASSP 2020—2020 IEEE
international conference on acoustics, speech and signal processing (ICASSP), pp 6669–6673
https://doi.org/10.1109/ICASSP40776.2020.9053925
2. Saleem N, Khattak MI (2019) Deep neural networks for speech enhancement in complex-noisy
environments. Int J Interact Multimed Artif Intell InPress, p 1. https://doi.org/10.9781/ijimai.
2019.06.001
3. Alamdari N, Azarang A, Kehtarnavaz N (2020) Improving deep speech denoising by
noisy2noisy signal mapping. Appl Acoust (IF 2.440) Pub Date 16 Sept 2020. https://doi.org/
10.1016/j.apacoust.2020.107631
4. Vuong T, Xia Y, Stern RM (2021) A modulation-domain loss for neural-network-based real-
time speech enhancement. In: ICASSP 2021—2021 IEEE international conference on acous-
tics, speech and signal processing (ICASSP), pp 6643–6647. https://doi.org/10.1109/ICASSP
39728.2021.9414965
5. Saddler M, Francl A, Feather J., Kaizhi A, Zhang Y, McDermott J (2020). Deep network
perceptual losses for speech denoising
6. Kayser M, Zhong V (2015) Denoising convolutional autoencoders for noisy speech recognition.
CS231 Stanford Reports, 2015—cs231n.stanford.edu
7. Luo Y, Mesgarani N (2019) Conv-tasnet: Surpassing idealtime–frequency magnitude masking
for speech separation. IEEE/ACM Trans Audio Speech Lang Process 27(8):1256–1266
8. Panayotov V, Chen G, Povey D, Khudanpur S (2015) Librispeech: an ASR corpus based on
public domain audio books. In: 2015 IEEE international conference on acoustics, speech and
signal processing (ICASSP), pp 5206–5210. https://doi.org/10.1109/ICASSP.2015.7178964
9. Piczak KJ (2015) ESC: dataset for environmental sound classification. https://doi.org/10.7910/
DVN/YDEPUT, Harvard Dataverse, V2
10. Rix A (2003) Comparison between subjective listening quality and P.862 PESQ score
11. Taal CH, Hendriks RC, Heusdens R, Jensen J (2010) A short-time objective intelligibility
measure for time-frequency weighted noisy speech. In: ICASSP, IEEE international conference
on acoustics, speech and signal processing—proceedings, pp 4214–4217. https://doi.org/10.
1109/ICASSP.2010.5495701
12. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical image
segmentation. LNCS 9351:234–241. https://doi.org/10.1007/978-3-319-24574-4_28
13. French M, Handy R (2007) Spectrograms: turning signals into pictures. J Eng Technol 24:32–35
14. Zhang X, Zou Y, Shi W (2017) Dilated convolution neural network with LeakyReLU for
environmental sound classification, pp 1–5. https://doi.org/10.1109/ICDSP.2017.8096153

15. Kherdekar S (2021) Speech recognition of mathematical words using deep learning. In: Recent
trends in image processing and pattern recognition. Springer Singapore, pp 356–362
16. Pandey A, Wang DL (2019) A new framework for cnn-based speech enhancement in the time
domain. IEEE/ACM Trans Audio Speech Lang Process 27(7):1179–1188
17. Zhao Y, Xu B, Giri R, Zhang T (2018) Perceptually guided speech enhancement using deep
neural networks. In: 2018 IEEE international conference on acoustics, speech and signal
processing (ICASSP), IEEE, Calgary, AB, pp 5074–5078
18. Martin-Donas JM, Gomez AM, Gonzalez JA, Peinado AM (2018) A deep learning loss function
based on the perceptual evaluation of the speech quality. IEEE Signal Process Lett 25(11):1680–
1684
19. Mohanapriya SP, Sumesh EP, Karthika R (2014) Environmental sound recognition using
Gaussian mixture model and neural network classifier. In: International conference on green
computing communication and electrical engineering (ICGCCEE)
20. Kathirvel P, Manikandan MS, Senthilkumar S, Soman KP (2011) Noise robust zerocrossing
rate computation for audio signal classification. In: TISC 2011—proceedings of the 3rd
international conference on trendz in information sciences and computing, Chennai, pp 65–69
21. Manoj C, Magesh S, Sankaran AS, Manikandan MS (2011) Novel approach for detecting
applause in continuous meeting speech. In: ICECT 2011—2011 3rd international conference
on electronics computer technology, Kanyakumari, vol 3, pp 182–186
22. Bhaskar J, Sruthi K, Nedungadi P (2015) Hybrid approach for emotion classification of audio
conversation based on text and speech mining. In: Proceedings of the international conference
on information and communication technologies (ICICT), Procedia Computer Science
23. Raj JS (2020) Improved response time and energy management for mobile cloud computing
using computational offloading. J ISMAC 2(1):38–49
24. Suma V, Wang H (2020) Optimal key handover management for enhancing security in mobile
network. J Trends Comput Sci Smart Technol (TCSST) 2(4):181–187
Concept and Development of Triple
Encryption Lock System

A. Fayaz Ahamed, R. Prathiksha, M. Keerthana, and D. Mohana Priya

Abstract The main aim of the triple encryption lock system is to strengthen security,
eliminate threats and allow higher authorities to authorise
the concerned person to access restricted areas. The issue of accessing highly
authorised areas is paramount in all places. This system is suitable for server rooms,
examination cells, home security and highly secured places. It is designed in such a
way that the door has three encryptions—DTMF, password security and fingerprint
sensing. The circuit is initially in an OFF condition.
The user sends a signal through the audio jack frequency, the relay is triggered, and the
system moves to the other two encryptions—keypad and fingerprint sensing. A key feature of
our encryption system is that the microcontroller is turned on only when the signal
is sent by the user, so that 24 h heating issues are resolved. The real benefit is that
it brings significant change to accessing highly authorised areas and to the
security system.

Keywords DTMF—dual-tone multiple frequency · Door lock · Triple encryption lock · Keypad · Microcontroller · Fingerprint sensor

1 Introduction

Security being the main intent of the project, the most important application is to
provide security in home, examination cells, manufacturing units etc. Shruti Jalapur,
Afsha Maniyar published an article on “DOOR LOCK SYSTEM USING CRYP-
TOGRAPHIC ALGORITHMS BASED ON IOT” in which the secured locking is
achieved with AES-128 (Advanced Encryption Standards) and SHA-512 (Secure
Hashing Algorithm). Hardware such as an Arduino, servo motor, Wi-Fi module
and keypad has been used to obtain the proposed locking system [1]. Neelam
Majgaonkar et al. proposed a door lock system based on Bluetooth technology, but

A. F. Ahamed · R. Prathiksha (B) · M. Keerthana · D. M. Priya


Department of Electrical and Electronics Engineering, R.M.K Engineering College, Kavaraipettai
601206, India
e-mail: prat18220.ee@rmkec.ac.in


the communication range of a Bluetooth module is too low in comparison with Wi-Fi
or GSM communication [2]. Harshada B. More et al. made a deep technical survey
on face detection, face recognition and door locking system. Eigenfaces algorithm
is one of the algorithms that is mainly used for face recognition. The recognised
face is compared with the prefetched face in order to lock and unlock the door [3].
Varasiddhi Jayasuryaa Govindraj et al. proposed a smart door using biometric NFC
band and OTP-based methods in which Arduino, biometric, OTP, NFC, RFID and
GSM module are used. In this, NFC band has been used as one of the methods for
registered members and OTP technology is used for the guest user [4]. We have
analysed and studied the locks of various manufacturing companies such as Godrej
and Yale. On the basis of overall study, microcontrollers in all the systems have
been switched on for 24 h which results in heating issues and reduces the lifetime of
the system. So we are suggesting an efficient development over the current locking
system with high security without heating issues. As the name defines the meaning,
triple encryption lock system, primarily has three encryptions, i.e., DTMF module,
password security and fingerprint sensor. As it is a secure and safe lock system, it
consists of an electronic control assembly which ensures that safety remains only in the
hands of the authorities. Two requirements in authenticated places are to provide
security and to allow easy access for unlocking the door, so that it can be accessed only
by the specific person under user control. A dual-tone multiple frequency module,
password security and a fingerprint sensor are attached to the door to achieve the proposed system.
This paper will give a vivid idea about the mechanism of each encryption, flow chart
of the system and its working.

2 Objective

The main objective of the paper is to bring out the study about the three encryptions
(dual-tone multiple frequency module, password security and fingerprint sensor) in
a generalised perspective. Authorisation is the process of verifying the credentials of
a person and granting permission to access. In such cases, our system is able
to grant authorisation to a higher degree. This system not only provides security for
homes but also for other authenticated places. The report supplies information about
the techniques and working in each encryption.

3 Methodology

The report is structured by identifying the importance of and demand for security in door
locking and unlocking. This system involves electrical work to achieve our idea. The
design methodology of the system consists of various steps, and a single operator can use
this system in minutes. First, the user's security problem is analysed and the desired
system is planned. The problems in the existing system are analysed, and then the essential
electrical method is identified, from which the workflow and functional block diagram are obtained.
To integrate the system into any existing design, the microcontroller and motor
are selected accordingly. The code is then tested, and the final prototype is developed
to effectively target access to highly authenticated places. This enables the user to
enter highly authenticated areas (Figs. 1 and 2).

Fig. 1 Workflow of designing a triple encryption lock system

4 Major Components Required

See Table 1.

Fig. 2 Functional block diagram

Table 1 Components used in the system

S. No. | Name                      | Qty
1      | Arduino Uno               | 1
2      | MT8870 DTMF decoder       | 1
3      | AS608 fingerprint sensor  | 1
4      | 4 × 4 matrix keypad       | 1
5      | Servomotor                | 1
6      | Relay                     | 1
7      | Lock                      | 1
8      | 12 V battery              | 1

5 Encryptions of the System

5.1 First Encryption

This project uses DTMF technology for opening and closing the door. The positive
terminal of the LED is connected to the output pin of the decoder, and the negative terminal
of the LED is connected to the ground of the decoder. Similarly, the mobile phone is connected
to the DTMF decoder by an auxiliary cable. Every numeric button on the keypad of
the mobile phone generates a unique frequency when pressed. The user presses the
keys, and the signal is sent via the audio jack of the mobile. The DTMF decoder decodes
the audio signal. When the signal comes from the mobile, the corresponding frequency
value selects its function and performs it. The positive and negative
terminals of the 9 V battery are connected to Vcc and ground, respectively. When the
number "1" is pressed by the authorised person from a faraway spot, the
mobile connected to the locking system receives the frequency and the microcontroller is turned on (Fig. 3).

Fig. 3 Signal sent from user to audio jack frequency

5.2 Second Encryption

The keypad consists of a set of push buttons. The encryption is made in such a
way that the entered pin number is compared with the preprogramed pin number. The
keypad lock works with the 3-digit code 888. Once the correct combination is entered
on the keypad, the door gets unlocked. The door will remain closed upon
entering the wrong pin number (Fig. 4).

Fig. 4 Keypad encryption



5.3 Third Encryption

The Vcc of the fingerprint sensor is connected to 5 V of the Arduino, and the ground
of the fingerprint sensor is connected to the ground of the Arduino. Similarly, the Rx of the
fingerprint sensor is connected to pin 2 of the Arduino and the Tx of the fingerprint sensor is
connected to pin 3 of the Arduino. The door gets unlocked when the user scans the right
fingerprint which is recorded in the system (Fig. 5).

6 Proposed System and Workflow

This project is proposed to provide access to the system far away from the locked
spot. Sometimes in the examination cells and home, it is necessary to provide access
even if the authorised person is not present. The solution proposed is triple encryption
lock system. The proposed system is designed in such a way that the door has three
encryptions—DTMF (dual-tone multi frequency), password security and fingerprint
sensing. In this project, the circuit is initially in the OFF condition. The user sends a
signal through the audio jack frequency of the mobile to turn on the microcontroller;
the relay supplies power through the DTMF module, and the microcontroller, keypad and
fingerprint sensor are turned on. If the password verification is true, the system moves to
fingerprint sensing. If both inputs result in a true output, the motor runs forward and the
door lock is opened. Once the task is completed, Relay 2 is triggered to lock the door.
A key feature of our encryption system is that the microcontroller is turned on only when
the signal is sent by the user, so that 24 h heating issues are resolved. This can bring a great
change in the security system. In certain cases, it is difficult for higher authorities to
give authorization to the concerned person to access the restricted areas. The project
is designed in such a way that the access is given at that instant and triple encryption
lock system can be made affordable to the people around. In today’s fast growing
world, the proposed system has high security and gives convenient access with three
encryptions (Fig. 6).

Fig. 5 Fingerprint
encryption

Fig. 6 Flowchart of the system

The user has to successfully cross the three verification points as below.

6.1 Step 1: DTMF—Triggering

Authorised person who is far away from the locked spot has to dial up to the mobile
that is connected to the locking system and enters number “1” from his/her dial
pad. So that the DTMF decodes the audio frequency and powers the microcontroller,
keypad and fingerprint sensor. Power to the microcontroller and other equipment are
turned ON only if the relay is closed.

Fig. 7 Triple encrypted lock system

6.2 Step 2: Pin Code Verification

The user has to enter a number on the keypad, the entered pin is verified with the
preprogramed pin. If it is verified true, it moves on to the next fingerprint encryption.
If either of the input is false, the process gets stopped.

6.3 Step 3: Fingerprint Verification

Third encryption is the fingerprint sensor. The user has to place the finger on the
fingerprint sensor, the captured fingerprint is verified with the recorded fingerprint.
If it results true, then the motor runs forward and the door gets unlocked (Fig. 7).
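Taken together, the three steps form a simple gating logic: nothing is powered until the DTMF trigger arrives, and the lock opens only if both the PIN and the fingerprint checks pass. The following Python snippet is an illustrative simulation of that logic only; the function names, the example PIN and the enrolled fingerprint IDs are hypothetical, and this is not the firmware running on the microcontroller.

STORED_PIN = "888"  # preprogramed PIN from the keypad encryption

def dtmf_trigger(key: str) -> bool:
    # Step 1: the remote authorised person dials "1" to power the controller
    return key == "1"

def pin_ok(entered_pin: str) -> bool:
    # Step 2: the keyed-in PIN must match the stored PIN
    return entered_pin == STORED_PIN

def fingerprint_ok(scan_id: int, enrolled_ids) -> bool:
    # Step 3: the scanned fingerprint must match an enrolled template ID
    return scan_id in enrolled_ids

def unlock_door(dial_key, entered_pin, scan_id, enrolled_ids=(1, 2)):
    if not dtmf_trigger(dial_key):
        return "system off"          # controller never powered up
    if not (pin_ok(entered_pin) and fingerprint_ok(scan_id, enrolled_ids)):
        return "door locked"         # any failed check keeps the door shut
    return "door unlocked"           # motor runs forward, lock opens

print(unlock_door("1", "888", 1))    # door unlocked
print(unlock_door("1", "123", 1))    # door locked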

7 Conclusion

With the developments enumerated, we have gained expertise in the design, development,
performance and modelling of lock systems. This will be
pivotal in ensuring access to the concerned areas. The designed system not only
provides easy access to the user, it also resolves 24 h heating issues. Thus, access
can be given to the concerned person in highly authorised areas. A comparable
lock available in the market costs close to ₹9000–₹14,000, whereas

our triple encrypted lock system is modest with high security and suited for examina-
tion cell, server room, home security and highly authenticated areas. Therefore, when
we need a solution for easy access in highly concerned areas with a high security,
our triple encrypted lock system will be the solution.

References

1. Jalapur S, Maniyar A (2020) Door lock system using cryptographic algorithms based on IOT.
Int Res J Eng Technol 7(7)
2. Majkaongar N, Hodekar R, Bandagale P (2016) Automatic door locking system. IJEDR 4(1)
3. More HB, Bodkhe AR (2017) Survey paper on door level security using face recognition. Int
J Adv Res Comput Commun Eng 6(3)
4. Govindraj VJ, Yashwanth PV, Bhat SV, Ramesh TK (2019) Smart door using biometric NFC
band and OTP based methods. JETIR 6(6)
5. Nehete PR, Chaudhari J, Pachpande S, Rane K (2016) Literature survey on door lock security
systems. Int J Comput Appl 153:13–18
6. Delaney R (2019) The best smart locks for 2019. In: PCMag
7. Automatic door lock system using pin on android phone (2018)
8. Verma GK,Tripathi P (2010) A digital security system with door lock system using RFID
technology. Int J Comput Appl (IJCA) (09758887) 5:6–8
9. Hassan H, Bakar RA, Mokhtar ATF (2012) Face recognition based on auto-switching
magnetic door lock system using microcontroller. In: 2012 International conference on system
engineering and technology (ICSET), pp 1–6
10. Jagdale R, Koli S, Kadam S, Gurav S (2016) Review on intelligent locker system based on
cryptography wireless & embedded technology. Int J Tech Res Appl pp 75–77
11. Johnson J, Dow C (2017) Intelligent door lock system with encryption. Google Patents
Partially Supervised Image Captioning
Model for Urban Road Views

K. Srihari and O. K. Sikha

Abstract Automatically generating a natural language description of an image
has attracted interest owing to its significance in practical applications and
because it connects two major artificial intelligence fields: natural language
processing and computer vision. This paper proposes a partially supervised model
for generating image descriptions based on instance segmentation labels. Instance
segmentation, a combined approach of object detection and semantic segmenta-
tion is used for generating instance level labels which is then used for generating
natural language descriptions for the image. The instance segmentation model uses
MRCNN framework with feature pyramid networks and region proposal networks
for object detection, and fully convolution layer for semantic segmentation. Informa-
tion obtained from different local region proposals are used to generate region wise
captions. Important aspects of the caption include distance, color and region calcula-
tions based on the results obtained from the instance segmentation layers. This paper
uses instance segmentation layer information such as ROIs, class labels, probability
scores and segmentation values for generating effective captions for the image. The
proposed model is evaluated on Cityscape dataset where the primary objective is
to provide semantic scene understanding based on the instances available in urban
areas.

Keywords Instance segmentation · Partially supervised model · MRCNN (Mask region based convolution neural networks) framework

1 Introduction

Generating meaningful natural language descriptions of an image has been promisingly
used by many applications over the past few years. Image captioning
applications use both computer vision and natural language processing tasks. Most of

K. Srihari · O. K. Sikha (B)


Department of Computer Science and Engineering, Amrita School of Engineering, Amrita
Vishwa Vidyapeetham, Coimbatore, India
e-mail: ok_sikha@cb.amrita.edu


the state-of-the-art image captioning algorithms [1] have different sets of preprocessing
techniques for the image and the output text separately. They use different types
of sequence modeling to map the input image to the output text. Supervised image
captioning has a stronger impact but has a few important limitations and issues. Applications
of supervised image captioning such as super-captioning [2] use a two-dimensional
word embedding while mapping the image to its corresponding natural language
description. Partially supervised image captioning [3] has applied its approach of
captioning to existing neural captioning models using the COCO dataset. This approach
uses weakly annotated data which is available in the object detection datasets. The
primary objective of this paper is to derive set of natural language captions with the
partially available supervised data retrieved from the Instance segmentation network
trained on Cityscapes dataset. Object detection and semantic segmentation results
were involved in creating the partially supervised data. Major contributions of this
work include:
• Training end-to-end MRCNN with U-NET model for instance segmentation to
get semantic information.
• Object classification and localization based on images from urban street view.
• Develop an inference level captioning module without sequence modeling for
generating meaningful captions based on the information produced from the
instance segmentation layers.
The rest of the paper is organized as follows. Related works are explained in
Sect. 2; Sect. 3 describes the dataset attributes, image details and proposed model
in brief; The captioning results obtained from the proposed model are detailed in
Sect. 4. Finally, the paper concludes with Sect. 5.

2 Related Works

Object detection and semantic segmentation have become key research
problems in computer vision since most of the high-end vision-based tasks such
as indoor navigation [4], autonomous driving [5], facial part object detection [6],
human computer interaction require accurate and efficient segmentation. With the
advent of deep learning models in the past few decades’ semantic segmentation
problem also witnessed great progress especially using deep Convolutional Neural
Networks. Mask RCNN (MRCNN) [7] is an important breakthrough in the instance
segmentation domain. Mask R-CNN uses Feature Pyramid Networks [8] and Region
Proposal Networks as in FASTER-RCNN [9] for object detection and uses fully
convolution layers for semantic segmentation. Ronneberger et al. developed the U-NET
[10] model which was an inspiring semantic segmentation algorithm majorly used
in medical AI applications. Brabander et al. [11] proposed a discriminative loss
function based semantic instance segmentation model for autonomous driving appli-
cation. Recurrent neural networks for semantic instance segmentation proposed by
Salvador et al. [12] uses CNN, RNN and LSTM for semantic segmentation, object

detection and classification, respectively. Developments in the instance segmentation


domain also put forward new research directions in image captioning [13]. Image
captioning with object detection and localization [14] by Yang et al. uses a sequence
to sequence encoder decoder model with attention mechanism to generate image
captions. Densecap fully convolution localization networks for dense captioning by
Justin et al. [15] used localization values as ground truth while generating captions.
Pedersoli et al. [16] proposed areas of attention for image captioning, where object
localization and proposals are used for generating captions. The major drawback of
sequence modeling based approach is that training end to end sequence modeling
is computationally expensive. Most of the state-of-the-art image captioning models
fails to generate captions with semantic information of the image under consideration
and they tend to generate identical captions for similar images. Anderson proposed a
partially supervised image captioning model [3], where labeled objects and detection
results are used to generate finer sentences as captions. A semi-supervised framework
for image captioning proposed by Chen et al. [17] detects visual concepts from the image
and uses reviewer-decoder with attention mechanism for caption generation. Liu et al.
[18] proposed an image captioning model which uses partially labeled data proposed
as Image captioning by self-retrieval with partially labeled data, but implemented
using reinforcement learning algorithm. Kamel and his team proposed a Tenancy
status identification in parking slots [19] which was based on mobile net classifier.
Image processing techniques are always of greater significance and used in most
of the computer vision applications. 3D image processing using Machine learning
[20] was developed by Sungeetha and team which is based on input processing for
Man–Machine Interaction.

3 Proposed Image Captioning System

This section describes the proposed partially supervised image captioning model in
detail. An improved Mask-RCNN model [21] with UNET architecture proposed in
our previous work is used for generating instance segmentation labels. The bounding
box and pixel-wise semantic information obtained from the hybrid M-RCNN—U-
NET model is used as the initial input to the image captioning model.
The annotations, masked outputs, localization results and object level labels
obtained from the instance segmentation model are used for generating meaningful
captions. Figure 1 shows the MRCNN-UNET [21] hybrid model architecture for
instance segmentation, and Figure 2 shows the proposed image captioning model.
The instance segmentation labels obtained from the MRCNN-UNET hybrid model
is shown in Fig. 4, which has pixel level annotation and corresponding confidence
score.
The region proposal network is used to generate proposals for object detection in
Faster R-CNN. RPNs do this by learning from feature maps obtained from a base
network (VGG16, ResNet, etc.). The RPN informs the R-CNN where to look. The
input given to the RPN is the convolution feature map obtained from a backbone

Fig. 1 Architecture of MRCNN-UNET hybrid instance segmentation model

Fig. 2 Architecture of the proposed image captioning model

network. The primary function of the RPN is to generate anchor boxes based on scale
and aspect ratio. Five scales and three aspect ratios are initialized, creating
15 anchor boxes around each location in the feature map. The next task
of the RPN is to classify each box as foreground or background
based on the IOU value of each anchor box compared with the ground truth. The
metrics used at this level are the rpn_cls_score and rpn_bbox_pred values. In anchor
target generation, we calculate the IOU of the ground-truth boxes with the anchor boxes to
check whether each anchor is foreground or background, and then the differences in the
coordinates are calculated as targets to be learned by the regressor. These targets are then used as input for the cross

entropy loss and smooth l1 loss. These final proposals are propagated forward through
ROI pooling layer and fully connected layers.
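Since anchor target assignment hinges on the IOU between anchor boxes and ground-truth boxes, a small helper clarifies the computation. The function below is an illustrative sketch (not taken from the authors' code) using [x1, y1, x2, y2] box coordinates.

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    # Coordinates of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# An anchor overlapping a ground-truth box by more than roughly 0.7 IOU would
# typically be labeled foreground, and below roughly 0.3 background.
print(iou([10, 10, 50, 50], [30, 30, 70, 70]))  # approximately 0.143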
Feature pyramid network (FPN) is a feature extractor which generates multiple
feature map layers (multi-scale feature maps) with better quality information than
the regular feature pyramid for object detection. With more high-level structures
detected, the semantic value for each layer increases. FPN provides a top-down
pathway to construct higher resolution layers from a semantic rich layer. FPN extracts
feature maps and later feeds into a detector, says RPN, for object detection. RPN
applies a sliding window over the feature maps to make predictions on the objectness
(has an object or not) and the object boundary box at each location.
U-NET is a fully convolutional neural network which is mainly used for training
end to end image processing algorithms where the set of input images can be of
any domain, but the corresponding output images are masked images of the primary
objects present in the input image. The size of input and output images are same.
U-NET model is nothing but a Convolutional AutoEncoder which maps input images
to masked output images. One important modification in U-Net is that there are a
large number of feature channels in the upsampling part, which allow the network
to propagate context information to higher resolution layers. As a consequence,
the expansive path is more or less symmetric to the contracting part, and yields a u-
shaped architecture. The network only uses the valid part of each convolution without
any fully connected layers. The output images are of binary images where only the
primary objects will be in the form of masked objects. The U-NET model consists
of 10 layers, with the first 5 layers in the contractive phase and the last 5 layers in the
expansive phase. The loss function used is binary cross entropy and the optimizer
used is Adam. The metrics used for validating the FCN model are the cross entropy loss
value and the mean IOU with 2 classes, one class for the background and one class for
the foreground objects. There were 1,940,817 trainable parameters. Figure 3 shows the
sample image and its corresponding output mask.

Fig. 3 a Sample input image to UNET. b Output mask
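The U-NET described above is, in essence, a convolutional encoder-decoder with skip connections. The sketch below is a reduced illustrative Keras version; it uses two contracting and two expanding levels instead of the five described above, and the filter counts and input size are assumptions, so it should be read as a sketch of the skip-connection idea rather than the authors' implementation.

import tensorflow as tf
from tensorflow.keras import layers, models

def tiny_unet(input_shape=(128, 128, 3)):
    """Reduced U-Net-style encoder-decoder with skip connections."""
    inputs = layers.Input(shape=input_shape)

    # Contracting path
    c1 = layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, activation="relu", padding="same")(p1)
    p2 = layers.MaxPooling2D()(c2)

    # Bottleneck
    b = layers.Conv2D(64, 3, activation="relu", padding="same")(p2)

    # Expansive path: skip connections carry context to higher-resolution layers
    u2 = layers.UpSampling2D()(b)
    u2 = layers.Concatenate()([u2, c2])
    c3 = layers.Conv2D(32, 3, activation="relu", padding="same")(u2)
    u1 = layers.UpSampling2D()(c3)
    u1 = layers.Concatenate()([u1, c1])
    c4 = layers.Conv2D(16, 3, activation="relu", padding="same")(u1)

    # One-channel sigmoid output: a binary foreground/background mask
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return models.Model(inputs, outputs)

model = tiny_unet()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# The paper validates with the binary cross entropy loss and the mean IOU over the
# background/foreground classes, computed on thresholded masks.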



Fig. 4 Sample output from MRCNN-UNET instance segmentation model

Table 1 Quantitative evaluation metrics analysis of MRCNN-UNET instance segmentation model

Metric name                   | Value
Box-classifier loss           | 0.3874
Box-localization loss         | 0.3232
Box-mask loss                 | 2.5810
RPN localization loss         | 1.7640
RPN classification loss       | 1.8240
Total loss                    | 7.0620
Mean Average Precision (mAP)  | 0.0341
mAP at 0.5 IOU                | 0.0401
mAP at 0.75 IOU               | 0.0001
mAP (small)                   | 0.0029
mAP (medium)                  | 0.0017
mAP (large)                   | 0.0094
Average Recall (AR)           | 0.0017
AR (small)                    | 0.0012
AR (medium)                   | 0.0362
AR (large)                    | 0.0197

Table 1 summarizes various evaluation metrics of MRCNN-UNET instance


segmentation model [21]. Box classification loss metric shows the measure of correct-
ness of object classification during object detection phase and the value was 0.3874
after 50,000 epochs. Box localization loss metric shows how tight the bounding box
values are predicted. It is a 4 K variable compared with the ground truth values and
the value was 0.3232 after 50,000 epochs. RPN loss is the metric obtained during
region proposal network trying to match the proposals to the ground truth values and
the value was 1.824 after the end of training. Box-classifier mask loss is the metric
used to compare the ground truth masked object. The instance segmentation mask
obtained from the MRCNN-UNET hybrid model is fed into the image caption module
as shown in Fig. 2. The information available at the end of instance segmentation
layers includes region of interests for the detected objects, their probability scores,
and their class labels and their corresponding binary labeled pixel wise segmented
masks which are shown in Fig. 5. The captioning module does not have ground
truth captions for training. The information obtained from the instance level labels

Fig. 5 Information available at the instance segmentation layer

were fit into meaningful NLP descriptions including semantic information such as
object color, location and distance between the objects. The skeleton structure of
the output captions are fixed for all the images but distance, color and region values
of the objects differ based on the 3 different captioning modules in every image.
The captioning modules include estimating size and distance based on reference
object method, color detection using k-means clustering and HSI color calculations,
and image-region wise captioning. This combination of instance segmentation labels
and inference captioning modules is a novelty approach for image captioning. This
approach does not include any sequence modeling for generating captions, making
the inference part computationally simple and effective. First level of caption lists the
important objects present in the image based on class label information. Second level
of caption is based on the estimated distance between vehicles or distance between
a vehicle and traffic signal, with respect to a real world reference object. Colors

of the contour detected objects are found based on the color detection captioning
module. Finally, the object’s location in the image is found using image-region wise
captioning module.
Instance segmentation is used as it provides the best scene understanding in real-time
applications. The issues in existing traditional image captioning systems are
a limited ability to generate captions and the generation of identical sentences for similar
images. Sequence modeling is computationally expensive and is not used in the
proposed model. Localization and segmentation results are used in the proposed
approach, which is lacking in most of the state-of-the-art image captioning approaches.
Therefore, the proposed model overcomes these research issues carried over
by the usual image captioning algorithms and provides good results by using the
inference-level captioning modules.

3.1 Dataset Used

The Cityscapes Dataset is used for evaluating the proposed model. The dataset
contains images from 50 cities during several months (spring, summer and fall)
and the images are augmented with fog and rain, making it diverse. It has manu-
ally selected frames which include large number of dynamic objects, scene layout
and varying background. It covers various vehicles like car, truck, bus, motor cycle,
bicycle, caravan and also classifies human person walking in a side walk or rider
riding on road. Table 2 shows the set of classes present in the dataset and Fig. 6 are
few sample images.
The Cityscapes dataset contains 25,000 images across 30 classes of objects
covering 50 different cities. From each class, around 500–600 instances across
different images are taken into consideration for the training process. The TensorFlow
object detection and instance segmentation Mask-RCNN approach is used. The
training and validation records used for modeling are created from the 30 classes by
parsing the annotations into a single JSON file in which all the image and object
details are present as the ground truth. Horizontal and

Table 2 Dataset instances

Group        | Objects
Flat         | Road, sidewalk, parking, rail track
Human        | Person, rider
Vehicle      | Car, truck, bus, on rails, motorcycle, bicycle, caravan
Construction | Building, wall, fence, guard rail, bridge, tunnel
Object       | Pole, traffic sign, traffic light
Nature       | Vegetation, terrain
Others       | Ground, sky

Fig. 6 Cityscape dataset sample images

vertical flips are majorly used image augmentation and preprocessing techniques.
Contrast, saturation, brightness and hue image processing attributes are also used
in the augmentation process. Around 500 first stage max proposals are given to
each ROI for best detection purpose during training and dropout, weight decay and
l2 regularization techniques are also used. Pipeline configuration files are used to
fine tune the hyper parameters for the MRCNN model. The training is connected to
tensorboard where real-time graph values of all metrics can be seen and analyzed.
After the training and validations are completed, the saved hybrid MRCNN-UNET
model is generated using tensorflow inference graph session-based mechanisms.

3.2 Estimation of Size and Distance Using a Reference Object

The captions generated by the proposed model have information regarding the
distance between vehicle instances, or the distance between a vehicle and a traffic signal. A
reference-object-based algorithm is used for calculating the distance, which requires
a reference object whose original size is known. The pixel-wise vehicle mask
obtained in the segmentation result is mapped with the original size to get a relative
pixel-wise size. The reference ratio is then calculated by dividing the original size by the
pixel-wise size. Whether the corresponding length or the corresponding width is taken, the
reference ratio will be approximately the same. Our objective is to find the original distance of any
object or the original distance between any 2 vehicles. The pixel-wise distance between
two vehicles is calculated using the Euclidean formula, which is then multiplied by
the reference ratio to obtain the actual distance as shown in Eqs. 1 and 2. Table 3
shows a sample vehicle mask obtained from the instance segmentation module and the
corresponding distance calculated based on the reference object algorithm.

Reference Ratio = Original height of car / Calculated height of car
                = Original_Distance_between_cars / Calculated_Difference_between_cars        (1)

Table 3 Sample results of distance calculation

Pixel-wise vehicle mask obtained from instance segmentation module | Calculated distance
(mask image)                                                       | 11 m
(mask image)                                                       | 2.45 m

Original_Distance_Between_Cars = Reference_Ratio * Calculated_Difference_Between_Cars        (2)

The basic information needed for calculating the distance between 2 cars is region
of interests box detections of the 2 cars. Consider the reference object in the image
is another car where the original height and width of the car is known. The reference
object is also detected by the model and its corresponding ROI box coordinates are
known. So we will be getting the original height of the reference object and the box
coordinate calculated machine result height of the reference object. Reference ratio
is calculated by dividing original height by machine result height of the reference
object. This reference ratio is same and common for all the objects present in the
image. Whereas the reference ratio value changes with image to image as each
image has different orientation and zooming attributes. The objective is to find the
original distance between the cars. Since the ROIs of the 2 cars is known, the center
coordinates of both the cars is also calculated. By using Euclidean distance, machine
result of pixel wise distance between the 2 center coordinates of the car is calculated.
When this distance value is multiplied with the reference ratio, the original distance
between the cars can be calculated. Using this algorithm, the distance between any
2 objects can be calculated provided the object is present in the training data and is
well trained.
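As a concrete illustration of Eqs. 1 and 2, the snippet below computes the reference ratio from a known car height and then scales the pixel distance between two ROI centres. The box coordinates and the 1.5 m reference height are made-up example values, not measurements from the paper.

import math

def box_center(box):
    """Center (x, y) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def distance_between_cars(box_ref, box_other, real_height_m):
    """Real-world distance estimate based on a reference object (Eqs. 1 and 2)."""
    # Pixel height of the reference car from its ROI
    pixel_height = box_ref[3] - box_ref[1]
    reference_ratio = real_height_m / pixel_height           # Eq. 1

    # Euclidean pixel distance between the two ROI centers
    (cx1, cy1), (cx2, cy2) = box_center(box_ref), box_center(box_other)
    pixel_distance = math.hypot(cx2 - cx1, cy2 - cy1)

    return reference_ratio * pixel_distance                  # Eq. 2

# Hypothetical ROIs (x1, y1, x2, y2) and an assumed real car height of 1.5 m
print(distance_between_cars((400, 500, 520, 620), (800, 480, 940, 630), 1.5))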

Fig. 7 Image-region wise


captioning

Table 4 Regions and their corresponding center coordinates

Region       | Center coordinates
Top-left     | (256, 512)
Top-right    | (256, 1536)
Center       | (512, 1024)
Bottom-left  | (768, 512)
Bottom-right | (768, 1536)

3.3 Image-Region Wise Captioning

The bounding box coordinates of the objects present in the image are further processed
to calculate their location. The entire image of size 1024 × 2048 is divided into 5
regions, namely top-left, top-right, bottom-left, bottom-right and center, as in Fig. 7.
The center pixel coordinates of each region were calculated as tabulated in Table 4.
Euclidean distances are calculated between the center coordinates of the object and
each region's center coordinates to find the exact location. In Fig. 7, the object of
interest, i.e., the car, is located in the center part of the image.
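A minimal sketch of this nearest-region-centre assignment is shown below; the ROI is assumed to be in (y1, x1, y2, x2) pixel coordinates and the example box is made up.

import math

# Region centres for a 1024 x 2048 image, given as (row, col) as in Table 4
REGION_CENTERS = {
    "top-left": (256, 512),
    "top-right": (256, 1536),
    "center": (512, 1024),
    "bottom-left": (768, 512),
    "bottom-right": (768, 1536),
}

def assign_region(box):
    """Assign an ROI (y1, x1, y2, x2) to the region with the nearest centre."""
    cy, cx = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    return min(REGION_CENTERS,
               key=lambda r: math.hypot(REGION_CENTERS[r][0] - cy,
                                        REGION_CENTERS[r][1] - cx))

print(assign_region((400, 850, 650, 1200)))  # -> "center" for this example box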

4 Results and Discussion

The output set of captions derived from our image captioning algorithm provides
a clear understanding of the instances available in the image. Semantic details like
the distance between vehicles, the distance between the traffic signal and the vehicles, the region
in which the instances are present and the color of the instances are captured in the output
captions. A few example images and their sample output captions are illustrated as
follows.

Sample 1:

Output Caption: The objects present in this image are: a building, 2 cars on the
road and 2 cars in the parking, 2 traffic sign boards, 3 poles. A tree is present in the
top right part of the image. The 2 cars are present in the top left part of the image.
The distance between black car and white car is 11 m.
Sample 2:

Output Caption: The objects present in this image are: 3 cars, 3 bicycles, 1
motorcycle, 1 traffic light, 2 persons and 1 tree. The traffic light shows green. The
distance between white car and traffic light is 6 m. The 3 cars are present in the top
right part of the image. The distance between black car and traffic light is 2 m.
Figures 8 and 9 describe the module wise output captions in detail. The initial level
of captions tells the list of objects present in the image. Distance between truck and
car is calculated in module-1 based on reference object distance calculation. Color

Fig. 8 Module wise output captions—Sample 1 a Input image. b Generated instance segmentation
mask

Fig. 9 Module wise output captions—Sample 2 a Input image. b Generated instance segmentation
mask

of the truck and car is obtained in the second module based on K-means and CIE
L*a*b space values. Object locations are found in the image-region wise captioning
part as the third module.
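The colour module described above can be approximated with standard tooling: cluster the pixels of a masked object and report the dominant cluster. The sketch below is illustrative only (scikit-image and scikit-learn are assumed to be available, and the toy image is made up); it clusters object pixels with k-means in CIE L*a*b space, as the module does.

import numpy as np
from skimage import color
from sklearn.cluster import KMeans

def dominant_lab_color(image_rgb, mask, k=3):
    """Dominant colour of the masked object via k-means in CIE L*a*b space.

    image_rgb: HxWx3 float array in [0, 1]; mask: HxW boolean object mask.
    """
    lab = color.rgb2lab(image_rgb)           # convert the whole image to L*a*b
    pixels = lab[mask]                       # keep only the object's pixels
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    counts = np.bincount(km.labels_)
    return km.cluster_centers_[counts.argmax()]   # centre of the largest cluster

# Toy example: a 4x4 "image" that is mostly dark, with a white object patch
img = np.zeros((4, 4, 3))
img[1:3, 1:3] = 1.0
msk = np.zeros((4, 4), dtype=bool)
msk[1:3, 1:3] = True
print(dominant_lab_color(img, msk, k=1))  # roughly [100, 0, 0], i.e. white in L*a*b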

5 Conclusion

This paper proposes an image captioning model for the Cityscapes dataset using instance
segmentation labels as the input. Mask-RCNN-UNET is used as the instance segmen-
tation algorithm where bounding box prediction values and pixel segmented values
are available as partially supervised output data. The proposed image captioning
system generates semantic descriptions including distance between vehicles by refer-
ence object distance method, colors of objects present in the image using k-means
clustering and LAB color space values. The generated captions are meaningful and
can be applied to many real world applications. Captions which are generated from
the urban city images will have detailed information about traffic control and
pedestrian safety, which can be useful for autonomous driving. More indications or
alerts can be issued for pedestrians crossing the road and for vehicles which disobey the traffic
rules, based on the output of the model. When it comes to real world applications, the
captions can be given as audio through a hearing aid for the blind. This model can be used for auto-
mated captions in YouTube for videos containing urban street views. While driving,
some unusual behavior on the roads can be given as an instruction in order to avoid
accidents. In case of road accidents, an exact set of reports can be collected instantly.

References

1. Sanjay SP et al (2015) AMRITA-CEN@ FIRE2015: automated story illustration using word


embedding. In: Fire workshops
2. Sun B et al (2019) Supercaptioning: image captioning using two-dimensional word embedding.
arXiv preprint arXiv:1905.10515
3. Anderson P, Gould S, Johnson M (2018)Partially-supervised image captioning. arXiv preprint
arXiv:1806.06004
4. Sanjay Kumar KKR, Subramani G, Thangavel S, Parameswaran L (2021) A mobile-based
framework for detecting objects using SSD-MobileNet in indoor environment. In: Peter J,
Fernandes S, Alavi A (eds) Intelligence in big data technologies—beyond the hype. advances
in intelligent systems and computing, vol 1167. Springer, Singapore. https://doi.org/10.1007/
978-981-15-5285-4_6
5. Deepika N, SajithVariyar VV (2017) Obstacle classification and detection for vision based navi-
gation for autonomous driving. In: 2017 International conference on advances in computing,
communications and informatics (ICACCI). IEEE
6. Vikram K, Padmavathi S (2017) Facial parts detection using Viola Jones algorithm. In: 2017
4th International conference on advanced computing and communication systems (ICACCS),
Coimbatore, pp 1–4. https://doi.org/10.1109/ICACCS.2017.8014636
7. He K et al (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer
vision
8. Lin T-Y et al (2017) Feature pyramid networks for object detection. In: Proceedings of the
IEEE conference on computer vision and pattern recognition
9. Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with
region proposal networks. NIPS
10. Ronneberger O, Fischer P, Brox T (2015) U-Net: convolutional networks for biomedical
image segmentation. BIOSS Centre for Biological Signalling Studies, University of Freiburg,
Germany

11. De Brabandere B, Neven D, Gool LC (2017) Semantic instance segmentation with a


discriminative loss function. arXiv preprint arXiv:1708.02551
12. Salvador A et al (2017) Recurrent neural networks for semantic instance segmentation. arXiv
preprint arXiv:1712.00617
13. Weeks AR, Hague GE (1997) Color segmentation in the HSI color space using the k-means
algorithm. In: Nonlinear image processing VIII. Vol. 3026. International Society for Optics
and Photonics
14. Yang Z et al. (2017) Image captioning with object detection and localization. In: International
conference on image and graphics. Springer, Cham
15. Johnson J, Karpathy A, Fei-Fei L (2016) Densecap: Fully convolutional localization networks
for dense captioning. In: Proceedings of the IEEE conference on computer vision and pattern
recognition
16. Pedersoli M et al (2017) Areas of attention for image captioning. In: Proceedings of the IEEE
international conference on computer vision
17. Chen W, Lucchi A, Hofmann T (2016) A semi-supervised framework for image captioning.
arXiv preprint arXiv:1611.05321
18. Liu X et al (2018) Show, tell and discriminate: Image captioning by self-retrieval with partially
labeled data. In: Proceedings of the European conference on computer vision (ECCV)
19. Kamel K, Smys S, Bashar A (2020) Tenancy status identification of parking slots using mobile
net binary classifier. J Artif Intell 2(3):146–154
20. Sungheetha A, Sharma R (2021) 3D image processing using machine learning based input
processing for man-machine interaction. J Innov Image Process (JIIP) 3(01):1–6
21. Srihari K, Sikha OK (2021) An improved MRCNN model for instance segmentation. Pattern
Recogn Lett
Ease and Handy Household Water
Management System

K. Priyadharsini, S. K. Dhanushmathi, M. Dharaniga, R. Dharsheeni, and J. R. Dinesh Kumar

Abstract Managing waste water is one of the important tasks directly
connected to the entire water chain, and so it is essential to manage water utilisation.
This is the right time to start saving water, as the population increases drastically
and the need for water increases along with it. For instance, about 85 L/day of water
is wasted on average by a family. The water we save today will serve tomorrow.
Although there are many technologies for reducing water wastage, this project
is about narrowing down these technologies into an IoT application
which not only makes water management at home easy but also makes it handy,
helping the user to monitor and access the household water even without being
physically present.

Keywords IOT · Water management · Handy device · Portable

1 Introduction

Here, we present a simple IoT device which helps the user to manage
household water in an efficient way. This handy and easily portable device will be
very helpful even to elderly people in keeping water from being wasted [1]. An auto-
matic water management system does not require a person's contribution to mainte-
nance. All automated water systems are embedded with electronic appliances and
specific sensors. The automatic irrigation system senses the soil moisture, and
submersible pumps are then switched on or off using relays; as a result, this system
functions without the presence of the householder. The main advantage
of using this irrigation system is to reduce human interference and ensure proper
irrigation [2]. The use of automatic controllers in faucets and water storage tanks

K. Priyadharsini (B) · S. K. Dhanushmathi · M. Dharaniga · R. Dharsheeni · J. R. Dinesh Kumar


Department of Electronics and Communication Engineering, Sri Krishna College of Engineering
and Technology, Coimbatore, India
e-mail: priyadharsinik@skcet.ac.in
S. K. Dhanushmathi
e-mail: 18euec040@skcet.ac.in


help save a large amount of water being wasted. This project shows a system where
domestic use of water can be controlled using IoT devices from 200 m away. House-
hold consumption of water is majorly in gardening, water taps in kitchen and other
areas, refilling of over water tanks [3]. It is also used to know the water level in the
tank and also to control overflow by switching off the pump when the tank is filled.

2 Existing System

In general, in agriculture, a drip irrigation system [4] is used to maintain the moisture
but broadly there is no control in measuring the moisture content already present
in the soil. The person by their experience water the plants if moisture content is
already present in the soil the water goes waste, and before watering if the soil goes
dry losing the moisture watering after is of no use [5]. The same is happening in
sprinkler systems also, some areas are watered more, and some are less. There is
a lot of wastage during usage of water in houses, while operating taps. Taps must
be closed tight to avoid spillage of water, or it will drain the water from the tank.
If a person opens the tap to fill a bucket of water, there is always a chance that the
bucket will overflow before the tap is closed. Water tanks are not monitored mainly
because of their location. One knows that there is no water when the taps run dry
after then only the motor is turned ON to fill the water tank [6]. Now, while the tank
is being filled with water the motor should be switched off before it overflows, but if
we forget the water will flow out and go waste to drain. There are systems to monitor
the above causes but not together. If a tap is half closed and drains the water from the
tank means, while the water level indicator senses the level and turns ON the motor,
the entire storage of water will be wasted.
Based on recent research, IoT-based household water management is only used
for managing tank water level by using application [7].

3 Proposed System

The proposed methodology introduces an application with IoT options which controls
the water usage of the house. The proposed system overcomes the drawback of
mono-purpose usage, such as supervising only the tank water level, by offering
multi-purpose usage, controlling three main areas of water usage [8]: gardening,
taps in the house and the water tank. In the application, the first option is for gardening,
where the soil moisture is sensed; if the moisture content is less than the required amount,
we can switch on the motor for gardening. This can be done directly in the application
through the mobile phone. The taps in the house are sensor based, so the water flows
when hands are placed under them. If the user feels that a tap may be leaky and he is not
in the house, he can close the water flow in all the taps through the application. During
all this, there should always be water in the tank; if the water level is low, the application
sends out a notification to switch ON the motor to fill the tank. When the tank is fully filled, it
will automatically switch OFF the motor without letting the water overflow. The
retrieved data is sent to the cloud through the IoT module, and the user manages it
through the application [9].

3.1 Advantages

• Usage of smart IoT devices simplifies work and helps to plan efficiently.
• Minimum amount of water is used to satisfy the daily needs.
• Wastage of water can be controlled from anywhere through IoT.

4 Block Diagram

The working model of the system is briefly described in the block diagram. All modules and sensors, including the moisture and temperature sensors, are directly connected to the Arduino, which serves as the controlling system [10]. These modules are then connected via the cloud to the user's mobile phone, which runs the application (Fig. 1).

5 Application Software Flowchart

The application ‘E-Water’ aims at conserving water for future generations, starting from every individual home. It also helps to keep the ecosystem balanced by giving the plants, herbs and trees grown at home a sufficient amount of water (Fig. 2).
Initially, this application has control over the taps and the tank of the home. The following flowcharts represent the working procedure of the individual hardware setups.

5.1 Automatic Faucet System

Here, the tap is made smart through automation with a servo motor, so that it turns on when a user is present and off when the user is absent, detected by measuring a specific range of distance with an ultrasonic sensor [11] (Fig. 3).
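To make the faucet behaviour above concrete, the following is a minimal sketch of the decision logic in plain Python; the paper's actual implementation is an Arduino circuit simulated in Tinkercad, so the 15 cm hand-detection range and the simulated readings are assumptions used only for illustration.

```python
# Minimal sketch of the smart-faucet decision logic (illustrative values only;
# the real setup drives a servo from an Arduino based on an ultrasonic sensor).
HAND_RANGE_CM = 15  # assumed distance below which a hand is considered present

def faucet_command(distance_cm: float) -> str:
    """Open the tap while a hand is within range, close it otherwise."""
    return "OPEN" if distance_cm <= HAND_RANGE_CM else "CLOSE"

# Simulated ultrasonic readings (cm) taken at regular intervals
for reading in (60.0, 12.5, 9.8, 11.0, 40.0, 75.0):
    print(f"distance={reading:5.1f} cm -> servo {faucet_command(reading)}")
```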

Fig. 1 Block diagram of the water management system model

Fig. 2 Flowchart of the water management system model



Fig. 3 Flowchart of smart faucet

5.2 Smart Tank System

In addition, the water tank at home is made smart: the setup is mounted on the lid of the water tank so that an ultrasonic sensor measures the distance from the lid to the water surface. The resulting status is updated instantly on a liquid crystal display (Fig. 4).
If the water is at the bottom, the LCD displays ‘very low’, which in turn makes the IC L293D direct the servo motor to turn ‘ON’; as subsequent levels of water rise, the corresponding notification is updated on the LCD. When the water level reaches the top of the tank, the IC again changes the servo motor position so as to turn the motor ‘OFF’ [12].
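The tank behaviour described above can be sketched as follows; the tank depth, the fill-ratio thresholds and the function names are assumptions chosen for illustration, since the paper implements this on an Arduino with an LCD and an L293D driver.

```python
# Hedged sketch of the tank-level logic: classify the lid-to-surface distance
# into a status string and switch the pump accordingly (values are assumed).
TANK_DEPTH_CM = 100.0        # assumed distance from lid to tank bottom

def level_status(distance_from_lid_cm: float) -> str:
    """Classify the water level from the lid-to-surface distance."""
    fill_ratio = 1.0 - (distance_from_lid_cm / TANK_DEPTH_CM)
    if fill_ratio < 0.10:
        return "very low"
    if fill_ratio < 0.50:
        return "low"
    if fill_ratio < 0.95:
        return "normal"
    return "full"

def control_pump(distance_from_lid_cm: float, pump_on: bool) -> bool:
    """Return the new pump state: start on 'very low', stop on 'full'."""
    status = level_status(distance_from_lid_cm)
    if status == "very low":
        return True               # LCD shows 'very low', motor turned ON
    if status == "full":
        return False              # tank about to overflow, motor turned OFF
    return pump_on                # otherwise keep the previous state

print(level_status(92.0))                 # -> 'very low'
print(control_pump(3.0, pump_on=True))    # -> False (tank full, stop pump)
```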

Fig. 4 Flowchart of smart tank

5.3 Smart Irrigation System

To irrigate the plants, the soil moisture is tracked; if it drops below the threshold value set by the potentiometer, an LED glows, an LCD (liquid crystal display) shows the notification ‘water irrigation’ and the motor is triggered to the ‘ON’ position [13]. Once the soil moisture is restored, the motor is turned ‘OFF’ (Fig. 5).
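A minimal sketch of this irrigation decision is given below, assuming illustrative moisture thresholds (the paper sets the threshold with a potentiometer) and adding a small hysteresis band, which the paper does not mention, so the motor does not toggle rapidly around the threshold.

```python
# Minimal sketch, under assumed threshold values, of the irrigation control:
# compare the soil-moisture reading with a threshold and toggle the pump.
MOISTURE_ON_THRESHOLD = 30    # assumed % moisture below which irrigation starts
MOISTURE_OFF_THRESHOLD = 45   # assumed % moisture above which irrigation stops

def irrigation_state(moisture_percent: float, pump_on: bool) -> bool:
    """Decide whether the garden pump should run for the current reading."""
    if moisture_percent < MOISTURE_ON_THRESHOLD:
        return True                      # soil too dry: LED + 'water irrigation'
    if moisture_percent > MOISTURE_OFF_THRESHOLD:
        return False                     # moisture restored: motor OFF
    return pump_on                       # inside the band: keep current state

# Example: readings taken once per minute
readings = [50, 40, 28, 33, 46, 60]
pump = False
for m in readings:
    pump = irrigation_state(m, pump)
    print(f"moisture={m}% -> pump {'ON' if pump else 'OFF'}")
```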

Fig. 5 Flowchart of smart irrigation

6 Concept Implementation

6.1 Sensors

The Arduino is connected to ultrasonic sensors to detect the water level in the tank and to sense a hand in order to open the faucet, and to a soil moisture sensor to measure the moisture in the garden for irrigation.

6.2 IoT Module

The interface between the sensors and the cloud is handled by the IoT module. The data collected from the hardware is stored in cloud memory (a NoSQL big database) through IoT [14].

6.3 Application

An if-then-else approach is used to keep the application logic simple. The user receives the collected data in the application and can access and manage household water through it (Fig. 6).
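As a rough illustration of this if-then-else flow, the sketch below maps the menu options described later in Sect. 8 (garden, tap, tank, status, exit) to responses; the handler function and the cloud_state dictionary are hypothetical, not part of the paper's implementation.

```python
# Hedged sketch of the application's if-then-else menu logic (field names in
# cloud_state are made up; only the menu options come from the paper).
def handle_selection(option: str, cloud_state: dict) -> str:
    """Map a menu selection (garden/tap/tank/status/exit) to a response."""
    if option == "garden":
        moisture = cloud_state["soil_moisture_percent"]
        note = " - watering recommended" if moisture < 30 else ""
        return f"Soil moisture is {moisture}%{note}"
    elif option == "tap":
        if cloud_state["leak_suspected"]:
            return "Leak suspected: all taps closed remotely"
        return "No tap leakage detected"
    elif option == "tank":
        return "Tank level: " + cloud_state["tank_level"]
    elif option == "status":
        return "Motor is ON" if cloud_state["motor_on"] else "Motor is OFF"
    elif option == "exit":
        return "Exiting application"
    return "Unknown option"

state = {"soil_moisture_percent": 22, "leak_suspected": False,
         "tank_level": "low", "motor_on": True}
for choice in ("garden", "tank", "status"):
    print(handle_selection(choice, state))
```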

Fig. 6 Workflow of proposed system



Fig. 7 Simulation of garden irrigation by measuring moisture in soil

7 Simulation

The household water management system comprises three implementations that ensure a reduction in the wastage of water. The working of these implementations, which include the smart irrigation system, the smart faucet system and the water tank, has been simulated and verified as follows. The simulations are done in Tinkercad [15]. All these simulations are further connected to the application (Figs. 7, 8 and 9).

8 Function of Application

With the entire application implemented as hardware cum software with three main uses, the ultimate aim is to preserve water through the individual circuit integrations. The proposed project integrates tank water level indication, automatic on/off of the water tap (by sensing hands) and smart water irrigation for home gardens [16].
When a user enters this application, the options garden, tap, tank, status and exit are shown at the bottom. If the user chooses garden, the application checks the moisture of the soil, notifies the user about irrigation when the moisture level is low, and also shows the moisture level in the soil. Likewise, for tap, the application manages tap leakages: water flow is activated in the hand-washing area by sensing the hands, and as soon as the hands leave the sensing area the tap closes completely, preventing any wastage [17, 18]. The automation of the faucet can thus be monitored, and it can be implemented not only in newly constructed buildings but also in older homes with faulty, leaky pipes. If the user chooses the tank option, the application checks the water level in the water tank

Fig. 8 Simulation of tank water level indication by buzzer

Fig. 9 Simulation of automatic opening and closing of tap by sensing hands



Fig. 10 Selection of garden

and indicates it using different colors. When the tank is being filled or emptied, a buzzer alerts the user, and the user is also notified through the application. The status option shows whether the motor is turned ON or not. Choosing the exit option exits the application. This household water management system can be used even when the user is not at home, using automatic device technology [19]. The system even works on older homes' water systems and older leaking faucets (Figs. 10, 11, 12, 13, 14 and 15).

8.1 Working Principle

The soil moisture sensor uses capacitance to measure the dielectric permittivity of the surrounding medium. The voltage produced by the sensor is proportional to the dielectric permittivity and hence to the water content of the soil. Based on the comparison with the threshold value, the motor is turned on or off, respectively. The IC L293D can then change the direction and speed of the servo motor, so the faucet can toggle its position between on and off. Based on the threshold result, the whole setup works to achieve effective usage of water at the taps.
The tank level monitoring and filling of the tank, along with notifying the target user, employs the sensor probe in the arrangement. The probes used here

Fig. 11 Showing of moisture level

are triggered to send information to the control panel to indicate either emptiness or complete filling before an overflow.
Thus, the application aims at controlling water wastage according to the user's choice; when one of the three options is selected, the respective principle behind that choice is accessed and the corresponding operation commences.

9 Result

Using the E-Water application, we examined the following water wastage situations. A drinking water tap can waste up to 75 L a day due to leakage; of the total water usage, 15% is wasted through leakage per day in the absence of occupants (source: Google). After installing our product at home, this wastage can be avoided by shutting off the flow from the tank remotely whenever a drop in the tank water level is observed. It is estimated that 7% of the water supplied is wasted during the refilling of water tanks because of overflow (source: Google). With our application, refilling of the water tanks can be monitored and the pump can be turned off before the tank overflows by continuously monitoring the water level. This prevents the overflow spillage and

Fig. 12 Selection of tap option

Fig. 13 Showing of water leakage

Fig. 14 Selection of water tank

reduces the wastage of water to nil. Usually, during gardening, twice the required amount of water is given to the plants, which leads to major water wastage in a household. This can be controlled by checking the moisture content with the application and watering the plants only when required (Fig. 16).
The graph in Fig. 16 compares the use of water before and after the implementation of the proposed system. The survey gives information about the water usage in a home over a period of approximately two months. The axis of abscissa is the time period, where one unit equals one week; the axis of ordinate is the amount of water used in kilo-liters, where one unit equals one kilo-liter. The existing-system curve shows the larger amount of water used before implementation of the proposed idea, while the proposed-system curve shows lower water usage, since the wastage is controlled with the help of the application. The variation from one week to another is due to the situations handled at home (say, occasional events, functional days, or malfunctioning/repair of the user's mobile). Hence, the conclusion is that when the application is used in a home, water is preserved compared with before (Table 1).

Fig. 15 Selection of motor status

Fig. 16 Graph analysis: usage of water (in kilo-liters, y-axis) over weeks for the existing system versus the proposed system



Table 1 Comparison between before and after implementation of the proposed system

Wastage of water in household: A flush of the toilet uses 6 L of water. On an average, a person uses about 0–45 L of water per day for flushing; to put it in perspective, this is about 30% of the water requirement per person per day, and the wasted water amounts to 125 million liters per day.
After installing the application: Earlier this could only be checked manually; now water leakage can be checked and controlled through the handy IoT-embedded application, and it can be monitored and controlled from anywhere.

Wastage of water in household: A drinking water tap can waste up to 75 L a day due to leakage. Of the total usage of water, 15% is wasted through leakage per day.
After installing the application: At least 98% of this total wastage can be reduced after installing the product at home.

Wastage of water in household: It is estimated that 7% of the water supplied is wasted during refilling of water tanks because of overflow.
After installing the application: With the application, refilling of the water tank can be monitored and the pump can be turned off before the tank overflows, preventing overflow spillage and reducing the wastage of water to nil.

Wastage of water in household: Usually, during gardening, twice the required amount of water is given to the plants, which leads to major water wastage in a household.
After installing the application: This can be controlled by checking the moisture content and watering the plants only when required; the soil moisture is monitored and the application gives a notification when it is low.

10 Conclusion

The household water management system connects the devices via IoT and brings them into a single handy application that can be used effortlessly, bringing the entire household water management into one place. Preserving water in the modern world is the need of the hour; starting the implementation in homes and then extending it across the country can make us a role model for other countries to begin saving water. On an average, a family of 3 members could save 40% of its water, and a larger family could save up to 50%. When the system is used in densely populated places like hotels, hostels and halls, India could escape from water scarcity. This is an effective way to save water and prevent its wastage. The final outcome of the project is a single handy application controlling the IoT-connected devices placed to manage household water. The proposed system helps to save water in the upcoming busy world.

References

1. Robles T, Alcarria R, Martín D, Morales A (2014) An internet of things based model for smart water management. In: Proceedings of the 8th international conference on advanced information networking and applications workshops (WAINA), Victoria, Canada. IEEE, pp 821–826
2. Kumar S (2014) Ubiquitous smart home system using android application. Int J Comput Netw

Commun 6(1)
3. Perumal T, Sulaiman M, Leon CY (2019) Internet of Things (IoT) enable water monitoring
system. In: IEEE 4th Global conference consumer electronics, (GCCE)
4. Dinesh Kumar JR, Ganesh Babu C, Priyadharsini K (2021) An experimental investigation to
spotting the weeds in rice field using deepnet. Mater Today Proc. ISSN 2214-7853. https://doi.
org/10.1016/j.matpr.2021.01.086; Dinesh Kumar JR, Dakshinavarthini N (2015) Analysis and
elegance of double tail dynamic comparator in analog to digital converter. IJPCSC 7(2)
5. Rawal S (2017) IOT based smart irrigation system. Int J Comput Appl 159(8):1–5
6. Kansara K, Zaveri V, Shah S, Delwadkar S, Jani K (2015) Sensor based automated irrigation system with IOT: a technical review. IJCSIT 6
7. Maqbool S, Chandra N, Real time wireless monitoring and control of water systems using Zigbee 802.15.4
8. Durham R, Fountain W (2003) Water management within the house landscape Retrieved day,
2011
9. Kumar A, Rathod N, Jain P, Verma P, Towards an IoT based water management system for a
campus. Department of Electronic System Engineering Indian Institute of Science Bangalore
10. Pandian AP, Smys S (2020) Effective fragmentation minimization by cloud enabledback up
storage. J Ubiquit Comput Commun Technol (UCCT) 2(1): 1–9
11. Dhaya R (2021) Analysis of adaptive image retrieval by transition Kalman Filter approach
based on intensity parameter. J Innov Image Process (JIIP) 3(01):7–20
12. Parvin JR, Kumar SG, Elakya A, Priyadharsini K, Sowmya R (2020) Nickel material based
battery life and vehicle safety management system for automobiles. Mater Sci 2214:7853
13. Dinesh Kumar JR, Priyadharsini K, Srinithi K, Samprtiha RV, Ganesh Babu C (2021) An experimental analysis of lifi and deployment on localization based services & smart building. In: 2021 International conference on emerging smart computing and informatics (ESCI), pp 92–97. https://doi.org/10.1109/ESCI50559.2021.9396889
14. Priyadharsini K et al (2021) IOP Conf Ser Mater Sci Eng 1059:012071
15. Priyadharsini K, Kumar JD, Rao NU, Yogarajalakshmi S (2021) AI-ML based approach in plough to enhance the productivity. In: 2021 Third international conference on intelligent communication technologies and virtual mobile networks (ICICV), pp 1237–1243. https://doi.org/10.1109/ICICV50876.2021.9388634
16. Priyadharsini K, Kumar JD, Naren S, Ashwin M, Preethi S, Ahamed SB (2021) Intuitive and impulsive pet (IIP) feeder system for monitoring the farm using WoT. In: Proceedings of international conference on sustainable expert systems: ICSES 2020, vol 176. Springer Nature, p 125
17. Nanthini N, Soundari DV, Priyadharsini K (2018) Accident detection and alert system using
arduino. J Adv Res Dyn Control Syst 10(12)
18. Kumar JD, Priyadharsini K, Vickram T, Ashwin S, Raja EG, Yogesh B, Babu CG (2021) A systematic ML based approach for quality analysis of fruits impudent. In: 2021 Third international conference on intelligent communication technologies and virtual mobile networks (ICICV). IEEE, pp 1–10
19. Priyadharsini K, Nanthini N, Soundari DV, Manikandan R (2018) Design and implementation
of cardiac pacemaker using CMOS technology. J Adv Res Dyn Control Syst 10(12). ISSN
1943-023X
Novel Intelligent System for Medical
Diagnostic Applications Using Artificial
Neural Network

T. P. Anithaashri, P. Selvi Rajendran, and G. Ravichandran

Abstract In recent years, the recognition of images with feature extraction in medical applications has been a big challenge. It is a tough task for doctors to diagnose diseases through image recognition with scanned images or x-ray images. To enhance image recognition with feature extraction for medical applications, a novel intelligent system has been developed using an artificial neural network. It gives higher efficiency in recognizing images with feature extraction compared to a fuzzy logic system. The artificial neural network algorithm was used for feature extraction from the scanned images of patients. The implementation was carried out with the help of TensorFlow and PyTorch. The algorithm was tested over 200 sets of scanned images used for the classification and prediction of trained dataset images. The analysis on the data set and test cases was performed successfully, and 81% accuracy was obtained for image recognition using the artificial neural network algorithm. With the level of significance (p < 0.005), the resultant data depicts reliability in independent sample t tests. For predicting the accuracy of image recognition, the ANN gives significantly better performance than the fuzzy logic system.

Keywords Artificial neural network · Novel diagnostic system · Fuzzy logic system · Feature extraction · Image recognition

T. P. Anithaashri (B)
Institute of CSE, Saveetha School of Engineering, Saveetha Institute of Medical and Technical
Sciences, Chennai 602105, India
e-mail: anithaashritp.sse@saveetha.com
P. S. Rajendran
Department of CSE, Hindustan Institute of Technology and Science, Chennai, India
e-mail: selvir@hindustanuniv.ac.in
G. Ravichandran
AMET University, Chennai, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 93
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_7

1 Introduction

The lack of a highly efficient detection system for image recognition of chronic diseases such as cardiovascular diseases, cancers, respiratory diseases, pulmonary diseases, asthma and diabetes [1] becomes fatal in the later stages. The exigent factors for these types of chronic diseases are continuous treatment, pharmaceutical requirements and the medical electronic equipment needed to diagnose the stage [2] of damage in the organ in a periodical manner. Diagnosing these types of diseases and taking measures for treatment is becoming more challenging [3]. The analyses of various kinds of chronic diseases, using factors such as periodical observation through tests and monitoring the adequacy of glucose, sucrose [4], etc., are tedious processes. Thus, diagnosing a chronic disease just by symptoms is less efficient. Hence, artificial intelligence techniques [3] can be used for better detection with high resolution and accuracy to overcome disease diagnosis errors between the scanned images [2] and x-ray images.

2 Existing System

Image analysis for chronic diseases using existing artificial intelligence techniques gives less efficiency in image recognition. Using a convolutional neural network, the diagnosis of diseased images gives only an approximation [5] in image recognition. The major functions of a convolutional neural network return the feature map for image identification with various parameters in the recognition of images [6]. Comparing the use of convolutional neural networks with other artificial intelligence techniques for image analysis [7], a recurrent convolutional neural network provides more accuracy than a convolutional neural network, but the time constraints are higher. The convolutional neural network algorithm [2] can be used to perform image classification to predict and differentiate diseased images from normal images. So, in order to reduce the errors and the time-consuming process, the artificial neural network algorithm is used with chest x-ray images [8] of patients for identifying the disease.
In many sectors, the use of emerging trends in artificial intelligence has paved the way to enhance existing systems in terms of image recognition, image analysis, image classification, etc. It became an industrial revolution [6] in terms of automation for image recognition. In the medical field, many AI algorithms [9] and techniques are being implemented. Disease prediction has always been a challenge to doctors, and it is time consuming. To overcome these drawbacks, automation of disease prediction using AI techniques [2] can make the process simple and feasible. With the help of AI algorithms, the existing implementation of a smart medical diagnosing [10] system for diagnosing [11] chronic diseases through image recognition [12] is less efficient in diagnosis. The use of a neural network helps in analyzing the trained data sets with validation, but gives less accuracy in identifying the disease through image analysis.

3 Proposed System

Applying artificial intelligence techniques to any field enhances the efficiency of automation in emerging technology. The use of AI in medical applications [7] has been tremendous in the automation of manual work for real-time applications. Image recognition for the analysis of various diseases [6] has become a big challenge to the doctors' community in handling various parameters such as the time consumed for diagnosis, feature extraction from scanned images, etc. To overcome these drawbacks [8], AI techniques can be used to enhance the system for the recognition of scanned images. In this novel system, the evaluation is done through test procedures and classification [13] of normal and abnormal scanned images of the patients obtained through various clinical observations. The artificial intelligence algorithm ANN [14] and the fuzzy logic system are compared on their prediction performance. In the fuzzy logic system, the weights and biases [14] are assigned from layer to connected layer, but in the ANN algorithm, the first and last layers are connected, and the last layer is considered the output layer. To address the problem [4] of diagnosing the disease through image recognition, a novel system has been proposed. The overview of the proposed system is depicted in Fig. 1.
In this framework, the processing of input data through a cloud application takes place. The use of neural architecture search automates the working of the artificial neural network: it explores the search space and helps to evaluate the ANN for the specific task. The processing of images in the neural network algorithm helps in the extraction of image features. It starts with the identification of data sets; once the data sets are identified, the process of image classification is
Fig. 1 Novel framework for diagnosing chronic diseases through artificial neural network

carried out by the artificial neural network. This paves the way to extract the image features accurately for diagnosing the diseases through image processing. The study setting of the proposed work is Saveetha University. Two groups were identified: group 1 is the fuzzy logic system, and group 2 is the artificial neural network algorithm. The artificial neural network and the fuzzy logic system were each iterated a number of times with a sample size of 200.

3.1 Fuzzy Logic System

In this system, an image is treated as frames, and with data augmentation of the image pixels, complicated images are classified as trained. An image that looks clear and precise to the human eye may not be accurate or detailed enough for analysis. Analyzing the scanned images through a different permeation for each layer provides clarity in the analysis and gives time efficiency, but in terms of accuracy it is not efficient.
Step 1: Start.
Step 2: Load the datasets path through cloud application.
Step 3: Read images and resize them.
Step 4: Convert to grayscale.
Step 5: Train and test the images.
Step 6: Repeat the process for analysis.
Step 7: Predict the accuracy of the extracted images.
Step 8: Stop.
After this process, the number of samples for each class is found, and the test images are classified against the trained data to predict the image recognition effectively. The data intensification process helps to enhance the performance of the algorithm in classifying the scanned images. After data intensification [15], the quality images and the classified images [5] are saved in some random order. Classifying the patients' scanned images and modifying the dataset was a difficult process and hence provides the lower accuracy.
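A minimal sketch of steps 2–5 above (load, resize, convert to grayscale, train/test split) is given below; the folder-per-class layout, the 224 × 224 size and the use of OpenCV and scikit-learn are assumptions, since the paper does not specify them.

```python
# Minimal sketch of the image loading/resizing/grayscale/train-test steps,
# assuming one sub-folder per class under "dataset" (paths are hypothetical).
import os
import cv2                      # OpenCV for image reading/resizing
import numpy as np
from sklearn.model_selection import train_test_split

def load_dataset(root="dataset", size=(224, 224)):
    images, labels = [], []
    for label in sorted(os.listdir(root)):                  # one folder per class
        for name in os.listdir(os.path.join(root, label)):
            img = cv2.imread(os.path.join(root, label, name))
            if img is None:
                continue                                     # skip unreadable files
            img = cv2.cvtColor(cv2.resize(img, size), cv2.COLOR_BGR2GRAY)
            images.append(img.astype(np.float32) / 255.0)    # normalize to [0, 1]
            labels.append(label)
    return np.array(images), np.array(labels)

X, y = load_dataset()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```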

3.2 Artificial Neural Network Algorithm

The artificial neural network is used to identify the feature values of samples of external data. It processes the inputs and analyzes the images to extract the features of an image. The feature extraction of scanned images through image analysis from the classified data provides clarity and thus helps to predict the accuracy. The peak signal ratio, noise or disturbances that form part of the image during classification are called loss. The use of neural architecture search improves the application of the algorithm and thus provides the novelty of this proposed system. The neural architecture search paves the way for better processing of the images through automation in three layers, namely the input layer, the hidden layer and the output layer. Hence, this novel artificial neural network architecture is a better model because of its smaller number of parameters and the reusability of the assigned weights, and thus it gives time efficiency with high accuracy.
Step 1: Start
Step 2: Read the number of data points: for i in range(1, n), read xi, yi
Step 3: Assign the weights to the inputs
Step 4: Add a bias to every input xi, yi
Step 5: Compute the weighted sum and apply the activation function
Step 6: Stop
The simulation tools TensorFlow and Keras were used for execution of the project code. A command prompt within the Python environment provides easy access to the code and its execution, and helps to manage and access various kinds of files. The main tools that need to be installed in the Python environment are Keras and TensorFlow. A minimum of 4 GB RAM is required to compile and execute the project code; the preferred operating systems are Windows and Ubuntu. The Anaconda Navigator software and the Anaconda prompt help to install the necessary modules and tools. By testing various kinds of scanned images [4] for classification with the number of epochs set to 10, the efficiency increased, with less execution time and higher accuracy of image extraction. Reducing the data size for image classification helped to improve the accuracy and the efficiency in terms of execution time when using the novel diagnostic system. To check the reliability of the data and the accuracy, SPSS is used with a level of significance of 0.05.
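For illustration, a hedged sketch of a small fully connected network in Keras is shown below, consistent with the tooling named above; the layer sizes are assumptions (the paper does not give the exact architecture), while the 10 epochs and the batch size of 22 are the values reported in Sect. 4.

```python
# Hedged Keras sketch of a small ANN for binary (diseased vs normal) image
# classification; layer sizes are assumed, epochs=10 and batch_size=22 are
# the values reported in Sect. 4, and X_train/y_train are assumed to be the
# grayscale arrays and 0/1 labels prepared in the earlier preprocessing step.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(224, 224)),   # flatten the grayscale image
    keras.layers.Dense(128, activation="relu"),     # hidden layer
    keras.layers.Dense(1, activation="sigmoid"),    # diseased vs normal
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# history = model.fit(X_train, y_train, epochs=10, batch_size=22,
#                     validation_data=(X_test, y_test))
# test_loss, test_acc = model.evaluate(X_test, y_test)
```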

4 Results and Discussions

For the image analysis, the predicted data sets are trained for 10 epochs on a total sample of 200 chest scanned images. A total of 10 epochs and a batch size of 22 are used in the model, and the results are tabulated per epoch stage as shown

Table 1 Analysis of training accuracy (0.9147) and loss (0.3262) of the image data for the different epoch stages (10), with the model trained on various categories of scanned images of diseased patients
Epoch stage Training accuracy Training loss Validation accuracy Validation loss
1 0.61 0.58 0.71 0.63
2 0.82 0.62 0.65 0.71
3 0.81 0.51 0.62 0.54
4 0.79 0.51 0.60 0.36
5 0.87 0.68 0.66 0.45
6 0.79 0.31 0.52 0.71
7 0.69 0.48 0.63 0.67
8 0.72 0.52 0.58 0.74
9 0.93 0.54 0.58 0.62
10 0.72 0.61 0.55 0.31

in Table 1. Thus, training with 10 epochs and the specified batches provides an accuracy of 81% in disease prediction through the proposed system. In Table 1, the data training is carried out by the novel diagnostic system; after the classification of the training data, the system is trained to categorize the different kinds of virus affecting the human body.
Figure 2 represents the variation between accuracy and loss when analyzing the trained data sets, achieving an accuracy of 0.93, which reflects the improvement obtained through the artificial neural network.
In Table 2, F refers to the F statistic, which is calculated by dividing the mean square regression by the mean square residual. T refers to the t score and depicts

Fig. 2 Accuracy scores of image extraction for the different epoch stages on the major axis, with the minor axis showing the range for accuracy (0.93) and loss (0.31), respectively
Table 2 SPSS statistics depicting data reliability for the artificial neural network and the fuzzy logic system with an independent-samples t test; a 95% confidence interval and a 0.05 level of significance were used to analyze the data sets for both algorithms, with higher accuracy achieved for the artificial neural network than for fuzzy logic

Columns: F, Sig., t, df, Sig. (2-tailed), Mean difference, Std. error difference, 95% CI of the difference (lower), 95% CI of the difference (upper)
Accuracy (equal variances assumed): 1.8, 0.265, 2.61, 18.0, 0.002, 0.19, 0.51, 0.075, 0.28
Accuracy (equal variances not assumed): –, –, 2.61, 16.0, 0.003, 0.18, 0.050, 0.078, 0.32
Loss (equal variances assumed): 4.5, 0.040, 0.97, 18.0, 0.360, 16.31, 17.0, −19.30, 53.04
Loss (equal variances not assumed): –, –, 0.97, 10.0, 0.36, 16.31, 17.0, −23.14, 54.88

the population variance; when the t value exceeds the critical value, the means are different. It is calculated by dividing the difference between the sample mean and the given number by the standard error. Sig (2-tailed) is the significance, which is compared with 0.05; it should be within the level of significance. The graphical representation in Fig. 2 depicts the accuracy and loss for the algorithms compared. Compared to the fuzzy logic system, the artificial neural network algorithm shows higher accuracy in the image recognition analysis.
The scanned images of the chest are considered for data classification. After classification, the trained data is tested and validated with 10 epochs, and the validation results are obtained. The graphical representation of the loss and accuracy for the artificial neural network gives 81% accuracy with an assumed variance of 0.18, obtained with the help of SPSS. The reliability of the data for the artificial neural network, with a mean difference between assumed and non-assumed variances of 0.02, provides higher accuracy than the fuzzy logic algorithm with its mean accuracy, and thus high accuracy is obtained in the extraction of images.

5 Conclusion

The validation results are obtained by classifying the images with the trained data over 10 epochs. The implementation results show an improved accuracy of 81% in image recognition with the extraction of images. By using the artificial neural network, the connections between the layers help to acquire more accuracy from the classification of images. With the novel diagnostic system, grouping the images of affected and unaffected people helps to classify and train models to diagnose the presence of disease through image recognition in a significant manner. The proposed system has considered scanned images, which is a limitation that can be overcome by using radiology images for higher accuracy. The use of a web application with bot interaction and AI tools for disease prediction and treatment would be a future scope of this system.

References

1. Harmon SA, Sanford TH, Xu S, Turkbey EB, Roth H, Xu Z, Yang D et al (2020) Artificial intel-
ligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets.
Nat Commun
2. Kılıc MC, Bayrakdar IS, Çelik Ö, Bilgir E, Orhan K, Aydın OB, Kaplan FA et al (2021) Artifi-
cial intelligence system for automatic deciduous tooth detection and numbering in panoramic
radiographs. Dento Maxillo Facial Radiol
3. Grampurohit S, Sagarnal C (2020) Disease prediction using machine learning algorithms.
https://doi.org/10.1109/incet49848.2020.9154130
4. Livingston MA, Garrett CR, Ai Z (2011) Image processing for human understanding in low-
visibility. https://doi.org/10.21236/ada609988

5. Ponnulakshmi R, Shyamaladevi B, Vijayalakshmi P, Selvaraj J (2019) In silico and in vivo analysis to identify the antidiabetic activity of beta sitosterol in adipose tissue of high fat diet and sucrose induced type-2 diabetic experimental rats. Toxicol Mech Methods 29(4):276–290
6. Fouad F, King Abdul Aziz University, and Saudi Arabia Kingdom (2019) The fourth industrial
revolution is the AI revolution an academy prospective. Int J Inf Syst Comput Sci. https://doi.
org/10.30534/ijiscs/2019/01852019
7. Girija AS, Shankar EM, Larsson M (2020) Could SARS-CoV-2-induced hyperinflammation
magnify the severity of coronavirus disease (COVID-19) leading to acute respiratory distress
syndrome? Front Immunol
8. Greenspan RH, Sagel SS (1970) Timed expiratory chest scanneds in diagnosis of pulmonary
disease. Invest Radiol. https://doi.org/10.1097/00004424-197007000-00014
9. Ramesh A, Varghese S, Jayakumar ND, Malaiappan S (2018) Comparative estimation of sulfire-
doxin levels between periodontitis and healthy patients—a case-control study. J Periodontol
89(10):1241–1248
10. Rahman T, Chowdhury MEH, Khandakar A, Islam KR, Islam KF, Mahbub ZB, Kadir MA, Kashem S (2020) Transfer learning with deep convolutional neural network (CNN) for pneumonia detection using chest X-ray. Appl Sci. https://doi.org/10.3390/app10093233
11. Anithaashri TP, Ravichandran G, Kavuru S, Haribabu S (2018) Secure data access through
electronic devices using artificial intelligence, IEEE Xplore. In: 3rd International conference
on communication and electronics systems (ICCES). https://doi.org/10.1109/CESYS.2018.
8724060
12. Manjunath KN, Rajaram C, Hegde G, Kulkarni A, Kurady R, Manuel K (2021) A systematic
approach of data collection and analysis in medical imaging research. Asian Pac J Cancer
Prevent APJCP 22(2):537–546
13. Knok Ž, Pap K, Hrnčić M (2019) Implementation of intelligent model for scanned detection.
Tehnički Glasnik. https://doi.org/10.31803/tg-20191023102807
14. Liu C, Ye G (2010) Application of AI for CT image identification. In: 2010 3rd International
congress on image and signal processing. https://doi.org/10.1109/cisp.2010.5646291
15. Rahman T, Chowdhury MEH, Khandakar A, Islam KR, Islam KF, Mahbub ZB, Kadir MA,
Kashem S (2020) Transfer learning with deep convolutional neural network (CNN) for
pneumonia detection using chest X-Ray. Appl Sci. https://doi.org/10.3390/app10093233
Extracting Purposes from an Application
to Enable Purpose Based Processing

Amruta Jain and Sunil Mane

Abstract In general, applications are built to serve certain business purposes. For example, a bank invests in application development to offer its customers online services such as online shopping, FD services, utility bill payments, etc. Within the application, every screen has its own purpose: the login page is used to authenticate a customer, and the dashboard screen gives a high-level view of the activities done by a customer. Similarly, every field appearing on the screen has a purpose too; the username field of the login screen enables the customer to supply the username given to the customer by the bank. Screens of a given enterprise application are not always developed keeping exact purposes in mind; often, a screen may serve multiple purposes. While a multi-purpose screen and an application with fewer screens suit an enterprise in terms of reduced expenditure on development and subsequent maintenance activities, they are at loggerheads with privacy laws that demand purpose-based processing. Therefore, it is necessary to first build a repository of the purposes that a given application can serve; subsequent refactoring of the application can then be done (if required) to comply with privacy laws. To find out the purposes, we use a keyword extraction method and K-means clustering on the text data of the application to get keywords and their respective purposes.

Keywords K-means · Crawljax · html parser jsoup · Purpose repository · Keyword extraction · tf-idf · Similarity · Web data

A. Jain (B) · S. Mane
Department of Computer Engineering and Information Technology, College of Engineering Pune,
Pune, Maharashtra, India
e-mail: jainas19.comp@coep.ac.in
S. Mane
e-mail: sunilbmane.comp@coep.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 103
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_8

1 Introduction

As data scientists say, "data gives information only when we process it; to make data confess anything, we need to torture it until we get our result." This is especially true for business organizations that want to make more profit from their current systems by analyzing historical data sets and using the knowledge gained to take better, more efficient business decisions. Nowadays, organizations are also trying to make their applications and existing systems more secure and privacy aware, to gain customer trust by protecting personal data and taking customers' consent. Governments of all countries are also framing rules and laws regarding data privacy, so every organization has to develop or update its systems according to the new rules made by the government of its country.
As we know, most applications today are not developed keeping the exact purpose in mind. A screen of a given enterprise application often serves multiple purposes, but so far this is not taken into consideration or given any importance, for several reasons. Chief among them is that it reduces the development time (by creating a multi-purpose screen) and hence the resources subsequently required for testing and maintaining the application. This, however, is at loggerheads with privacy laws that demand purpose-based processing.
That is where this research comes in. Data privacy and data security are often used interchangeably, but there is a difference between them: data privacy regulates how data is collected, shared for any business purpose and used after sharing, and it is a part of data security. Data security safeguards data from intruders and malicious insiders. The main concern of data privacy is how exactly it deals with data consent, governing obligations and notice. More specifically, practical data privacy concerns: (1) exactly how, or whether, information is shared with third parties; and (2) how information is legally collected or stored. In the digital age, PHI (personal health information) and PII (personally identifiable information) are nothing but the application of the data privacy concept to critical personal information. This can include medical and health records, SSNs (Social Security numbers), financial information such as bank account and credit card numbers, and even basic but still sensitive information like addresses, full names and birth dates.
The system architecture developed for this research work is shown in Fig. 1, with the major focus on how the inference engine is developed for purpose-based processing. For that, we first need to clarify: what is an inference engine? How does it work? Why is it important, and how does it make the system an expert system?
Inference engines: to derive new facts and relationships from data repositories or knowledge graphs, these units of an AI (Artificial Intelligence) methodology apply logical rules. The process of inferring relationships between units using ML (machine learning), NLP (natural language processing) and MV (machine vision) has expanded the scale and value of relational databases and knowledge graphs exponentially in the past few years. There are two ways of building an inference engine: backward chaining and forward chaining.

Fig. 1 System architecture

Here, to find the purposes for given application data, we create a system that we call the inference engine. It mainly performs the last four to five steps of the concrete system design (see Fig. 2).
Regarding the data used in this research, we consider web data, i.e., a set of websites from different domains; a detailed explanation follows in the next sections. The purpose repository used here is built manually by taking into consideration the keywords and their purposes for the respective data domains. When data from a new domain is added, the purpose repository needs to be updated manually, which is somewhat time consuming. Finally, we get the purposes related to the keywords that match the keywords placed in the purpose repository: wherever keywords or combinations of keywords match, the purpose is extracted from the repository and presented in the output.

2 Related Work

As we know, most earlier research work is related to identity management and, more recently, to privacy, such as privacy policy enforcement and privacy obligations. Marco Casassa Mont and Robert Thyne present work that mainly focuses on how to automate the enforcement of privacy within enterprises in a systemic way, in particular privacy-aware access to personal data and enforcement of privacy obligations within an identity management system [1–3]. Similarly, IBM, with their Hippocratic database, manages to preserve the privacy of data by taking into consideration the roles of users, as explained in [4]; for example, in an RBAC (role-based access control) database system, tasks are assigned according to role with respect to the tables, tuples and columns one may delete, update, insert, etc. Agrawal et al. explain the Hippocratic database with a proper architecture in their research [4].
Other authors focus on various approaches to knowledge or information inferencing, i.e., combining multiple data sources and extraction techniques to verify existing knowledge and obtain new knowledge [5]. To make data private or hidden from some people, one may use data masking or screen masking; Goldsteen et al. describe a notable hybrid approach to screen masking in which the low overhead and flexibility of masking at the network layer are merged with the theme available at the presentation layers [6].
To extract data from online web pages, many approaches are available; one may use a hybrid approach based on integrating hand-crafted rule methods with automatic extraction of rules, as described in [7]. Parvathi and Jyothis describe approaches to find out which words in text documents are important to describe the class they are associated with; their proposed technique uses a CNN (convolutional neural network) with DL (deep learning), and the DL is used to predict the classes correctly [8]. Ravindranath et al. present another algorithm to extract meaning and structure from documents by producing semi-structured documents, a statistical model that uses a Gibbs sampling algorithm [9]. Singh et al. did a survey of various inference engines and their comparative study, explained in [10].
Many different techniques exist for keyphrase extraction; among them is a graph-based ranking technique used by Yind et al. Since a document contains many topics, the extracted keywords or keyphrases should cover all the main topics contained in the document, and the authors therefore take a topic model into consideration; a detailed explanation is given in [11], and a similar type of approach is used in our research. Many different approaches exist for keyword extraction; Beliga et al. [12] present a survey of keyword extraction covering supervised and unsupervised methods, graph-based methods, simple statistical methods, etc. A selectivity-based keyword extraction method is proposed as a new unsupervised graph-based keyword extraction method which extracts nodes from a complex network as keyword candidates. Yan et al. [13] present their own approach to keyphrase extraction: the task is usually conducted in two steps, (1) extracting a bunch of words serving as candidate keyphrases and (2) determining the correct keyphrases using unsupervised or supervised approaches.
Some research has been done on the classification of websites based on their functional purposes by Gali et al. [14]. In their research, they propose a novel method to classify websites based on their functional purposes, attempting to classify a website as either a single service, a brand or a service directory. For web data extraction, existing approaches use decoupled strategies, attempting to do data record detection and attribute labeling in two separate phases; here, the authors Zhu et al. propose a model in which both tasks are done simultaneously, as described in their paper [15].
The rest of this paper is structured as follows: Sect. 3 describes our methodology, experimental results are presented in Sect. 4, and Sect. 5 concludes the paper and describes future work related to this research.

3 Methodology

This section first presents the structure of our model, i.e., the system architecture shown in the figure below, and then describes the workflow of the architecture and what is performed in each phase.
As shown in Fig. 2, we first gather the data from the web applications for which we have to find purposes, using Crawljax [16]. Crawljax is a Java-based web crawler used to extract the whole web data in the form of HTML states, DOMs, a result .json file containing the JSON data and states, and screenshots of the web pages, all placed in one output folder that is used in the next phase of the system.
Crawljax is an open-source tool, generally called a web crawler. Since it is open source, we can get its JAR file, code, or Maven artifact, which needs to run in the specified framework. The JAR file can be run from the command prompt of the operating system by providing the parameters mentioned in its readme file, which is obtained along with the JAR files. A number of options can be provided as needed, such as states -s, depth -d, -waitAfterReload, -waitAfterEvent, override -o, etc.; initial values for these options are mentioned in the readme file. The compulsory parameters are the URL of the page or website to be crawled and the path of the output folder where the result is stored.
The extracted files are parsed with the help of the Java-based parser named "Jsoup: Java HTML parser", or alternatively the BeautifulSoup HTML parser. From this, we get the text content of the application and save it into a text file; this is the text data from which the actual work of the inference engine starts. Before moving further, some basics of the Jsoup parser: as the name indicates, it is a parser used for converting data from one form to another; here we require data in text format for finding keywords or keyphrases from any document or file. That is obtained with the help of the Document Object Model (DOM) object and its various features, i.e., functions or methods like getElementById(), getElementsByTag(), etc.
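As an illustration of this parsing step, the sketch below uses the BeautifulSoup alternative mentioned above to pull the title, body and form text out of one crawled HTML state; the file names are hypothetical placeholders.

```python
# Minimal sketch of text extraction from one crawled HTML state using
# BeautifulSoup (the tag choices mirror the title/body/form tags used later
# in Sect. 4; "state_1.html" and "application_text.txt" are made-up names).
from bs4 import BeautifulSoup

def extract_text(html_path: str) -> str:
    with open(html_path, encoding="utf-8") as f:
        soup = BeautifulSoup(f.read(), "html.parser")
    parts = []
    if soup.title and soup.title.string:
        parts.append(soup.title.string)                  # <title> text
    if soup.body:
        parts.append(soup.body.get_text(separator=" "))  # visible body text
    for form in soup.find_all("form"):
        parts.append(form.get_text(separator=" "))       # form labels/fields
    return " ".join(parts)

with open("application_text.txt", "a", encoding="utf-8") as out:
    out.write(extract_text("state_1.html") + "\n")
```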
The text data we get needs to be pre-processed. Note that we get a large number of files from an application, one for each state or web page. The text data is pre-processed using libraries and APIs; many NLP (natural language processing) libraries are available for pre-processing text data. In pre-processing, we perform the following tasks:

Fig. 2 Concrete system design

• Lowercase all letters present in the text.
• Remove punctuation and any other symbols present in the text.
• Remove extra white space and join the text content.
• Remove stopwords from the text, using the English stopword list available in the library's corpus package.
• Apply a word-tokenize API for tokenization of the text data.
• If necessary, remove duplicate words from the text using the set() function.
• If necessary, apply word stemming and lemmatization to the text data obtained in the above steps.
Some of these steps are implemented in the model; other pre-processing tasks exist, but in this research the above steps are used for cleaning the text data, as illustrated in the sketch below.
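A possible realization of these steps is shown below using NLTK; the paper does not name the exact library, so NLTK is an assumption.

```python
# Hedged sketch of the pre-processing steps listed above using NLTK.
import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

def preprocess(text: str) -> list:
    text = text.lower()                                   # lowercase all letters
    text = re.sub(r"[^a-z\s]", " ", text)                 # drop punctuation/symbols
    text = re.sub(r"\s+", " ", text).strip()              # collapse white space
    tokens = word_tokenize(text)                          # tokenize
    stop = set(stopwords.words("english"))
    return [t for t in tokens if t not in stop]           # remove stopwords

print(preprocess("Login to your Savings Account, then pay the utility bill!"))
```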
After that, we perform clustering and find the keywords related to each cluster (here we mostly use the K-means clustering algorithm or an unsupervised graph-based keyword extraction technique). We also use some other keyword extraction algorithms, such as TextRank, Rake, Yake, the Gensim summarization package, LDA, TF-IDF, etc. Using the keywords, we try to find the purposes related to them with the help of the purpose repository, by applying a matching algorithm such as the flashtext API or regular expressions [17]. We get the purpose related to each keyword and combination of keywords present in the purpose repository, which is created manually.

3.1 Clustering Technique

As we know, there are many clustering techniques and algorithms, but in this model we use the K-means clustering algorithm, which is based on the partitioning method of clustering and is unsupervised [11]. After this operation, we obtain several clusters, each containing words that are similar with respect to the centroid word of the cluster in the document. We then select keyphrases as the "n" words nearest (i.e., most similar) to the centroid of each cluster, and provide a value for the variable "k" as the number of clusters to be made; the elbow method can also be used to find an optimal number of clusters. From this we find the top "n" keyphrases for every cluster. The value of "k" can be decided by taking the length of the document into consideration and by performing a number of experiments (on a trial basis; the default value is 2).
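The clustering step can be sketched as follows, assuming TF-IDF vectors as input to K-means; the toy page texts are illustrative only and not taken from the collected dataset.

```python
# Minimal sketch: vectorize page text with TF-IDF, cluster with K-means, and
# report the terms closest to each centroid as candidate keywords.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

pages = [
    "login username password authenticate account",       # toy page texts
    "transfer funds fixed deposit interest rate",
    "pay electricity bill utility payment receipt",
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(pages)

k = 2                                                      # default value from the text
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

terms = vectorizer.get_feature_names_out()
for c in range(k):
    top = km.cluster_centers_[c].argsort()[::-1][:5]       # 5 terms nearest the centroid
    print(f"cluster {c}:", [terms[i] for i in top])
```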

3.2 Graph Based Keyphrase Extraction

This method is generally used for unsupervised data. According to [11], the approach creates graphs from words and sentences by evaluating the similarity between them. Three graphs are constructed: sentence-to-sentence (s-s), word-to-word (w-w) and sentence-to-word (s-w) graphs. All graphs are built using the similarity between their elements, e.g., the cosine similarity between sentences and words.

• For the s-s graph, every sentence consists of several words, so we construct a word set for each sentence; the cosine similarity between the two sentence vectors gives the similarity between sentences and is used as the weight of the edge, with the sentences as the nodes of the s-s graph.

• For the w-w graph, to find the similarity between words, each word is converted into a numerical vector using a word embedding; here we use fastText word embeddings. fastText is a library built by Facebook's AI Research lab, used for text classification as well as for learning word embeddings; the model allows supervised or unsupervised learning algorithms to obtain vector representations of words.
• For the s-w graph, both of the graphs above are taken into consideration, and the third graph is constructed using the word frequency and the inverse sentence frequency, with a formula similar to the TF-IDF (Term Frequency-Inverse Document Frequency) method. This yields the weight matrix of the s-w graph.
For keyword extraction, some algorithms find keywords automatically, such as TextRank, Rake and TF-IDF.
• TextRank generally works faster on small datasets than the other two algorithms. It gives proper keywords, as it is based on a graph ranking criterion similar to the PageRank algorithm created by Google to rank websites.
• Rake stands for Rapid Automatic Keyword Extraction. It finds keyphrases in a document without considering any other context. It produces more keyphrases, which can be complicated and carry more information than single words.
• TF-IDF stands for Term Frequency-Inverse Document Frequency. Most of the time, this algorithm is used with a large number of documents in the dataset. As the name indicates, the inverse document frequency means that, for any given word in one document, the number of other documents containing that word is also considered when calculating the word score or rank with respect to a particular document. The other term, term frequency, is the count of occurrences of the word in the particular document. Together they give a single value that shows how important the word is in that document, i.e., the rank of the word with respect to the document; a small scoring sketch is given below.
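The small scoring sketch referred to above, using a plain smoothed formulation of TF-IDF (not the exact variant used by any particular library):

```python
# Worked sketch of TF-IDF scoring over a toy corpus of tokenized "documents".
import math

docs = [
    ["login", "username", "password", "login"],
    ["transfer", "funds", "password"],
    ["pay", "bill", "login"],
]

def tf_idf(term: str, doc: list, corpus: list) -> float:
    tf = doc.count(term) / len(doc)                        # term frequency
    df = sum(1 for d in corpus if term in d)               # document frequency
    idf = math.log(len(corpus) / (1 + df)) + 1             # smoothed inverse df
    return tf * idf

for term in ("login", "password", "funds"):
    print(term, round(tf_idf(term, docs[0], docs), 3))     # 0.5, 0.25, 0.0
```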

4 Results

As the research topic is very large, we restrict our data to web applications, namely websites from three different domains. In total, we collected 530+ web pages from websites in the banking, education, and hospital and healthcare domains. A web page contains much data, such as text, images, audio, video, advertisements, etc.; from this we extract only the text data necessary for our work, such as the text from the body tag, title tag, form tags, etc. We collect all that data, make a text document of it, and also convert the text data into spreadsheet format using Excel.
To find the purposes of a web page, we first require the purpose repository, which was created manually for this work with three columns (Domain, Keywords, Purpose). According to our data, we enter the keywords and their purposes. If new domains or web pages are added, the purpose repository needs to be updated manually.

The next phase is the keyword extraction task. Before this, we first apply the clustering algorithm so that similar words fall into the same cluster, and then obtain the top 8–10 keywords for each cluster. For keyword extraction, many different algorithms exist, some of them automatic keyword extraction algorithms and others graph-based, statistical, unsupervised or supervised; we try some of them and find the keywords and their purposes using the purpose repository.
The last phase finds the purposes related to the keywords from the purpose repository using a matching technique: manual or hand-crafted rules, or similarity measurement, i.e., comparison between the previously obtained keywords and the keywords with their purposes present in the purpose repository. Another method uses VLOOKUP in an Excel sheet. We also use the flashtext library, which is mainly intended for search-and-replace tasks; in our case we adapt it to search and match keywords with their purposes. We use it because, between regular expressions and flashtext, flashtext works faster than regular expressions [17].
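A hedged sketch of this flashtext-based matching is shown below; the keyword-to-purpose mapping is a made-up fragment, not the actual purpose repository used in the work.

```python
# Minimal sketch of keyword-to-purpose matching with flashtext: each repository
# keyword is registered with its purpose as the "clean name", so extraction
# returns the purposes matched in the page text.
from flashtext import KeywordProcessor

purpose_repository = {                      # illustrative fragment only
    "username": "customer authentication",
    "password": "customer authentication",
    "fixed deposit": "FD services",
    "electricity bill": "utility bill payment",
}

kp = KeywordProcessor(case_sensitive=False)
for keyword, purpose in purpose_repository.items():
    kp.add_keyword(keyword, purpose)        # map each keyword to its purpose

page_text = "Enter username and password, then pay your electricity bill online."
print(set(kp.extract_keywords(page_text)))
# -> {'customer authentication', 'utility bill payment'}
```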
We can also generate the word cloud and word frequency plots shown in the following images; the final result is given at the end (Figs. 3 and 4).

Fig. 3 Word frequency graph and word cloud

Fig. 4 Final result from sample data



5 Conclusion

Nowadays, privacy management plays an increasingly important role for enterprises. Its main objective is to address customers' privacy by considering customers' preferences and rights. It is important to consider the data subject's consent and the data requester's purpose when a specific person wants information from any organization.
In this research work, we conducted a survey of existing techniques and tools for text analysis and studied the drawbacks and limitations of the existing techniques and tools. To understand the purposes related to each field or screen of given application data, we implemented a model which, for now, fulfils our requirement: it addresses the existing challenges of purpose extraction to enable purpose-based processing with the help of keyphrases, words or keywords, and it simultaneously provides an efficient purpose repository that makes finding the relationship between keywords and their purposes easier. We also try to get all matched purposes from the application screens to enable purpose-based processing. We used clustering algorithms to find similar words or keywords according to the topics in a document, and then, by combining them, we obtained their purposes too, as shown in the final result image in the result section.

5.1 Future Work

There is a lot of scope for future research in this area. As of now, we did not find much related research work on this specific topic. Anyone who wants to do further research in this area may extend its scope by considering more and different types of data (the corresponding purposes would then need to be added to the purpose repository). Another task would be to make the purpose repository automatic in nature, to come up with new approaches for the extraction, matching and selection of keywords and their purposes for an application, and to make the system more scalable to various types of applications (we currently consider webpages from a few websites, i.e., the web application type); further application types include desktop applications, mobile applications, gaming applications, etc., which amounts to adding data from more domains. We would also like to encourage new researchers: research ideas arise from reading more of the relevant work, in this domain or any other, and the future scope above is offered with that purpose in mind.

References

1. Mont MC, Thyne R (2006) A systemic approach to automate privacy policy enforcement in
enterprises. In: Privacy enhancing technologies 6th international workshop, PET 2006, Cam-
bridge, UK, 28–30 June 2006. Revised Selected papers
2. Mont MC, Thyne R, Bramhall P (2005) Privacy enforcement with HP select access for regula-
tory compliance. Technical report, Technical Report HPL-2005-10, HP Laboratories Bristol,
Bristol, UK
3. Mont MC (2004) Dealing with privacy obligations in enterprises. In: ISSE 2004-securing
electronic business processes. Springer, pp 198–208
4. Agrawal R, Kiernan J, Srikant R, Xu Y (2002) Hippocratic databases. In: VLDB’02: proceed-
ings of the 28th international conference on very large databases. Elsevier, pp 143–154
5. Barbosa D, Wang H, Yu C (2015) Inferencing in information extraction: techniques and appli-
cations. In: 2015 IEEE 31st international conference on data engineering. IEEE, pp 1534–1537
6. Goldsteen A, Kveler K, Domany T, Gokhman I, Rozenberg B, Farkash A (2015) Application-
screen masking: a hybrid approach. IEEE Softw 32(4):40–45
7. Kaddu MR, Kulkarni RB (2016) To extract informative content from online web pages by using
hybrid approach. In: 2016 International conference on electrical, electronics, and optimization
techniques (ICEEOT). IEEE, pp 972–977
8. Parvathi P, Jyothis TS (2018) Identifying relevant text from text document using deep learning.
In: 2018 International conference on circuits and systems in digital enterprise technology
(ICCSDET). IEEE, pp 1–4
9. Ravindranath VK, Deshpande D, Girish KV, Patel D, Jambhekar N, Singh V (2019) Infer-
ring structure and meaning of semi-structured documents by using a gibbs sampling based
approach. In: 2019 International conference on document analysis and recognition workshops
(ICDARW), vol 5. IEEE, pp 169–174
10. Singh S, Karwayun R (2010) A comparative study of inference engines. In: 2010 Seventh
international conference on information technology: new generations. IEEE, pp 53–57
11. Yan Y, Tan Q, Xie Q, Zeng P, Li P (2017) A graph-based approach of automatic keyphrase
extraction. Procedia Comput Sci 107:248–255
12. Beliga S, Meštrović A, Martinčić-Ipšić S (2015) An overview of graph-based keyword extrac-
tion methods and approaches. J Inf Organ Sci 39(1):1–20
13. Ying Y et al (2017) A graph-based approach of automatic keyphrase extraction. Procedia
Comput Sci 107:248–255
14. Gali N, Mariescu Istodor R, Fränti P (2017) Functional classification of websites. In: Proceedings of the eighth international symposium on information and communication technology
15. Zhu J et al (2006) Simultaneous record detection and attribute labeling in web data extraction.
In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery
and data mining
16. Mesbah A, Van Deursen A, Lenselink S (2012) Crawling ajax-based web applications through
dynamic analysis of user interface state changes. ACM Trans Web (TWEB) 6(1):1–30
17. analyticsvidhya. https://www.analyticsvidhya.com/blog/2017/11/flashtext-a-library-faster-
than-regular-expressions/. Last accessed 7 Dec 2017
18. Zhao Y, Li J (2009) Domain ontology learning from websites. In: 2009 Ninth annual interna-
tional symposium on applications and the internet. IEEE, pp 129–132
19. Gao R, Shah C (2020) Toward creating a fairer ranking in search engine results. Inf Process
Manage 57(1):102138
20. Lindemann C, Littig L (2007) Classifying web sites. In: Proceedings of the 16th international
conference on World Wide Web
21. Qi X, Davison Brian D (2009) Web page classification: features and algorithms. ACM Comput
Surv (CSUR) 41(2):1–31
Cotton Price Prediction and Cotton
Disease Detection Using Machine
Learning

Priya Tanwar, Rashi Shah, Jaini Shah, and Unik Lokhande

Abstract Agricultural productivity is something on which the economy extensively depends. This is one reason why price prediction and plant disease detection play an important role in agriculture. The proposed system is an effort to predict the price of cotton and to classify the crop as fresh or diseased as accurately as possible, for the benefit of all those who depend on it as a source of income, be they farmers or traders. The main goal of the created system is the prevention of losses and a boost to the economy. For this purpose, we have proposed a website in which both price prediction using the LSTM algorithm and disease detection by the CNN algorithm are implemented.

Keywords Machine learning · LSTM · CNN · Classification · Disease detection · Price prediction · Agriculture

1 Introduction

Agriculture, the main occupation in India, accounts for about 70% of business activity when the primary and secondary businesses that depend completely on agriculture are included. The market arrival time of any crop plays a prominent role in the crop price obtained by farmers. Speaking of the cotton crop specifically, a large number of people depend on cotton through one or another of the processes involved in its cultivation. The increase in demand for cotton at the domestic as well as the international level has inclined productivity towards mission-oriented purposes in recent times [1]. However, it is not easy to predict the price and the leaf diseases of cotton because of high fluctuation caused by various factors such as weather conditions, soil type, rainfall, etc. This shows the necessity of price prediction and disease detection for cotton crops. Implementing such systems adds to the revenue of the farmers as

P. Tanwar · R. Shah · J. Shah · U. Lokhande (B)


Department of Information Technology, Fr. Conceicao Rodrigues College of Engineering,
Mumbai, India
e-mail: unik.lokhande@fragnel.edu.in


well as the country. Thus, having a robust automated solution, especially in devel-
oping countries such as India, not only aids the government in taking decisions in a
timely manner but also helps in positively affecting the large demographics.
Various methodologies have been brought into use to forecast the retail/wholesale value of agricultural commodities, such as regression, time series and machine learning methods. Among the regression methods, the auto-regressive method and the vector auto-regressive moving average model predict agricultural commodity prices by taking into consideration the various factors affecting them [2]. The study
of changes in agricultural product prices is both intriguing and significant from the
standpoint of the government.
We know that manually recorded data is prone to human-caused errors, such as no
or incorrect data reported on a specific day. With new pricing data entering every day
for ML/DL-based models, updating the models may generate stability concerns due
to crop price data quality issues. The data for price prediction in the price prediction
module is of the continuous type, hence it falls under a regression model. The prices
can be determined by recognizing different patterns of the training dataset which is
then passed as the input to the algorithm.
Diseases have severe effects on plants and are, in the final analysis, a natural factor; for example, they reduce overall productivity. The identification and accurate classification of leaf diseases are essential for preventing agricultural losses.
Different leaves of plants have different diseases. Viruses, fungi and bacteria are the
prominent categories for leaf disease [3]. In accordance with this, plants are being
isolated from their normal environment and grown in unique settings. Many impor-
tant crops and plants are highly susceptible to illness. Plant diseases have an impact
on plant development and yield, which has a socio-biological and monetary impact
on agriculture. Plant diseases are one of the ecological elements that contribute to
the coexistence of live plants and animals. Plant cells primarily strengthen their
defences against animals, insects, and pathogens via signalling pathways contained
within them. With careful care, humans have selected and cultivated plants for food,
medicine, clothing, shelter, fibre and beauty for thousands of years. As a result,
monitoring regional crop diseases is critical for improving food security.
Cotton is a drought-resistant crop that delivers a consistent income to farmers who
cultivate in climate-change-affected areas. To detect these cotton leaf diseases appro-
priately, the prior knowledge and utilization of several image processing methods
and machine learning techniques are helpful. In leaf disease detection, we focus on
the number of predictions which can be classified correctly and thus, it falls under
the classification model. There are various classification models that can be used for
detection of disease in a leaf. Classification models are evaluated on the basis of the
results. Thus, this system will become handy for the farmers cultivating cotton to
know the diseased cotton plant and also the price trends for the same.

2 Literature Review

2.1 Price Prediction

In this work [4], the approach focuses on the development of a precise forecasting
model for wheat production using LSTM-NN, which is very precise when it comes to
the forecasting of time series. A comparison is also provided between the proposed
mechanism and some existing models in the literature. The R value obtained for
LSTM is 0.81. The results show that the system can achieve better forecasts and that, whilst grain production will accelerate over a decade, the output ratio will keep decelerating and pose a threat to the economy as a whole.
In this paper [5], the authors presented a comparative study of LSTM, SARIMA and the seasonal Holt-Winters method for predicting arecanut prices. Arecanut price data on a monthly basis for 14 districts of Kerala was taken from the Department of Economics and Statistics of Kerala. The RMSE value of LSTM for non-stationary data was 146.86, and for stationary data it was 7.278; the SARIMA value was 16.5, and the Holt-Winters value was 18.059. It was concluded that the LSTM neural network was the model that best fit the data.
In this article [6], the main aim of the researchers here is to help farmers by
focussing on profitable growing of vegetables by developing an Android application
in Sri Lanka. The collected data set is divided into 3 parts, in the ratio of 8:1:1
which implies 80% data was used for training, 10% was kept for testing and the
remaining 10% as validation. The model is then created using LSTM RNN for
vegetable forecasting and ARIMA for price forecasting.
In this study [7], the researcher suggests a prediction model for the price of
vegetables that uses the pre-processing method of season-trend-loess (STL) and
long-short term memory (LSTM). In order to predict monthly vegetable prices, the
model used vegetable price data, meteorological data from major producing districts,
and other data. For this system, the model was applied to Chinese cabbage and radish
on the Korean agricultural market. From the performance measurement results, it was observed that the suggested vegetable price forecasting model achieved a prediction accuracy of 92.06% and 88.74% for Chinese cabbage and radish, respectively.
In this article [8], the researchers suggest the STL-ATTLSTM model, which
complements the decomposition of seasonal trends that uses the Loess Pre-Treatment
Method (STL) and (LSTM). In this system, STL-ATTLSTM model is used for
predicting vegetable prices based on monthly data using different forms of data. The
LSTM attention model has improved predictive accuracy by about 4–5% compared
to the LSTM model. The combination of the LSTM and STL (STL-LSTM) has
reached predictive accuracy of 12% higher than the LSTM attention model. The
STL-ATTLSTM model outperforms other models, having 380 as the RMSE value
and MAPE as 7%.
In this paper [9], the authors have presented an artificial intelligence based solu-
tion to predict future market trends based on the time series data of cotton prices
collected since 1972. The datasets are evaluated using various models like moving
average, KNN, auto-arima, prophet and LSTM. After comparison, LSTM model was
concluded to be the best fit with RMSE value of 0.017 and an accuracy of 97%.
In paper [10], the authors presented a user-friendly interface to predict crop prices and forecast prices for the next 12 months. Data containing the wholesale price index and rainfall of various Kharif and Rabi crops like wheat, barley, cotton, paddy, etc. was collected and trained on 6 different algorithms, out of which the supervised machine learning algorithm called Decision Tree Regressor was the most accurate after the comparison, with an RMSE value of 3.8.
In this paper [11], the researchers presented a comparative survey of different
machine learning algorithms to predict crop prices. The data consisting of prices
of fruits, vegetables and cereals was collected from the website of the Agricul-
tural Department of India. Random Forest Regressor was concluded as the optimal
algorithm with an accuracy of 92% as compared to other algorithms like Linear
Regression, Decision Tree Regressor and Support Vector Machine.
In paper [12], the researchers have proposed a web-based automated system to
predict agricultural commodity price. In the two series experiments, machine learning
algorithms such as ARIMA, SVR, Prophet, XGBoost and LSTM have been compared
with large historical datasets in Malaysia and the most optimal algorithm, LSTM
model with an average of 0.304 mean square error has been selected as the prediction
engine of the proposed system.
In paper [13], the authors present techniques to build robust crop price predic-
tion models considering various features such as historical price and market arrival
quantity of crops, historical weather data that influence crop production and trans-
portation, data quality-related features obtained by performing statistical analysis
using time series models, ARIMA, SARIMA and Prophet approaches.
In paper [14], the researchers have proposed a model that is enhanced by applying
deep learning techniques and along with the prediction of crop. The objective of
the researchers is to present a python-based system that uses strategies smartly to
anticipate the most productive reap in given conditions with less expenses. In this
paper, SVM is executed as machine learning algorithm, whilst LSTM and RNN are
used as Deep Learning algorithms, and the accuracy is calculated as 97%.

2.2 Disease Detection

Prajapati et al. [15] presented a survey for detecting and classifying diseases present
in cotton assisted with image processing and machine learning methodologies. They
also investigated segmentation and background removal techniques and found that
RGB to HSV colour space conversion is effective for background removal. They
also concluded that the thresholding technique is better to work with than the other
background removal techniques. The data set included about 190 pictures of various
types of diseases spotted clicked by Anand Agricultural University for classifying
and detecting the type of infection. Performing colour segmentation with masking
the green pixels in the image removed from the background, the otsu threshold on
the fetched masked image in order to obtain a binary image was applied. It was
concluded from the results that SVM provides quite good accuracy.
In Rothe et al. [16], a system that identifies and classifies the diseases the cotton crop commonly suffers from, such as Alternaria, bacterial leaf disease and Myrothecium, was presented. The images were obtained from cotton fields in the Buldhana and Wardha districts and from ICRC Nagpur. The active contour model (snake segmentation algorithm) is used for image segmentation. The images of cotton leaves detected with disease were classified using a back-propagation neural network trained on seven invariant moments extracted from three types of diseased leaf images. The mean classification accuracy was 85.52%.
In this article [17], the author has developed an advanced processing system
capable of identifying the infected portion of leaf spot on a cotton plant by imple-
menting the image analysis method. The digital images were obtained with the help
of a digital camera of a mobile and enhanced after segmentation of the colour images
using edge detection technologies such as Sobel and Canny. After thorough study,
homogeneous pixel counting technique was used for the image analysis and disease
classification of Cotton Disease Detection Algorithm.
In this article [18], the researchers carried out detection of leaf diseases assisted
with a neural network classifier. Various kinds of diseases like, target leaf spot,
cotton and tomato leaf fungal diseases and bacterial spot diseases were detected.
The segmentation procedure is performed by k-means grouping. Various character-
istics were extracted and provided as inputs to the ANN. The average accuracy of
classification for four types of diseases is 92.5%.
In this work [19], researchers have an approach to accurate disease detection,
diagnosis and timely management to avoid severe losses of crops. In this proposal,
the input image pre-processing by using histographic equalization is initially applied
to increase contrast in the low-contrast image, the K-means grouping algorithm
that is used for segmentation, it is used to classify the objects depending upon a
characteristic set into number of K classes and then classification occurs through
the Neural-Network. Imaging techniques are then used to detect diseases in cotton
leaves quickly and accurately.
In paper [20], the authors have compared various deep learning algorithms such
as SVM, KNN, NFC, ANN, CNN and realized that CNN is 25% more precise in
comparison to the rest after which they compared the two models of CNN which
were GoogleNet and Resnet50 for examining the lesions on the cotton leaves. They
finally concluded Resnet50 to have an edge over GoogleNet proving it to be more
reliable.
In paper [21], the authors conduct cotton leaf disease detection and also suggest a suitable pesticide for preventing the detected disease. The proposed system implemented the CNN algorithm and, with the use of a Keras model and appropriate processing layers, built a precise system for disease detection.
Paper [22] is an extensive comparative analysis for the detection of organic and nonorganic cotton diseases. It contains information about various diseases and an advisable method to detect each disease at its initial stage. A survey of different algorithms is also presented, along with their efficiencies and their pros and cons, in order to identify the most apt one. In short, it is an in-depth analysis and comparison of quite a lot of techniques.

3 Price Prediction

3.1 Dataset

This system is based on statistical data that has been obtained from the data released
by the Agriculture Department, Government of India almost every year from their
website data.gov.in [23]. The daily market prices of cotton include information about
the state, district, market in that district, variety of cotton grown, the arrival date of
cotton produce, minimum price, maximum price and modal price of cotton in the
market.

3.2 Long Short Term Memory (LSTM) Algorithm

LSTM is a deep learning model requiring a large data set. The architecture of the LSTM model is well suited for prediction systems because it can cope with lags of unknown duration between important events in a time series. A unit cell of the LSTM model has an input gate, an output gate and a forget gate. The input gate controls the amount of information allowed to flow into the current cell state, with the help of point-wise multiplication of the sigmoid and tanh activations, in that order. The output gate takes charge of deciding which information needs to be passed to the following hidden state. The information from the previous cell state that need not be remembered is discarded, as decided by the forget gate.
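For reference, a standard formulation of these gates (the usual textbook LSTM equations, not notation taken from this paper) is, with $x_t$ the current input, $h_{t-1}$ the previous hidden state and $c_{t-1}$ the previous cell state:

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &&\text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &&\text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &&\text{(output gate)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$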

3.3 Methodology

Firstly, in this system, the data set is loaded. Then the pre-processing of the data is done, where the necessary filtration is carried out. Boxplots of the min, max and modal prices are plotted in order to understand the outliers present in the data, and, to avoid data inconsistency, values lying in the outlier regions are dropped. The Sklearn module performs the pre-processing of the data. The price columns are taken into consideration, and the arrival date column is converted to datetime format using the pandas framework and used as the index of the data. The dataset is then arranged in ascending order of the arrival date. This step is followed by visualizing the data set.
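A compressed sketch of this preprocessing is shown below; the file name and the exact column names ('Arrival_Date', 'Min Price', 'Max Price', 'Modal Price') are assumptions about the downloaded CSV, while the outlier removal, datetime indexing and sorting follow the steps just described:

import pandas as pd

df = pd.read_csv("cotton_prices.csv")                       # file name is a placeholder

# Drop rows whose modal price lies outside the boxplot whiskers (outliers)
q1, q3 = df["Modal Price"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[(df["Modal Price"] >= q1 - 1.5 * iqr) & (df["Modal Price"] <= q3 + 1.5 * iqr)]

# Convert the arrival date to datetime, use it as the index and sort ascending
df["Arrival_Date"] = pd.to_datetime(df["Arrival_Date"], dayfirst=True)
df = df.set_index("Arrival_Date").sort_index()

prices = df[["Min Price", "Max Price", "Modal Price"]]      # columns used for modelling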

This model is trained over the dataset, which is further divided into training and testing data in the ratio of 80:20. The train_test_split function of the sklearn module is used for splitting the dataset, and the data is scaled using the MinMaxScaler. Five hidden layers are used in the process of training the model. The data set is divided into small batches, and the error is calculated per epoch. The Keras sequential model is used for evaluation. For this system, the model is trained for 200 epochs, and the batch size taken is 32. The optimizers used for the system are the Adam optimizer, RMSprop and the AdaDelta optimizer. The objective here is to predict the prices of cotton crops, so the results of the various optimizers are compared in order to decide which one fetches the best output. The training and validation graph is plotted for the data, and then, for the prediction model, the price prediction graph is plotted. The graphs are plotted to increase ease of understanding; the matplotlib library is used for visualization. Mean squared error is used as the loss function whilst dealing with Keras. Further, error metrics are calculated in order to understand the performance of the model. For calculating the error metrics, the math library and the mean_squared_error, mean_absolute_error, max_error, r2_score, explained_variance_score and median_absolute_error functions are imported, and each of these errors is calculated between the testing data and the predicted data.
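A condensed sketch of this training and evaluation pipeline is given below. The 80:20 split, MinMaxScaler, Keras sequential model, mean squared error loss, 200 epochs and batch size of 32 follow the description above; the window length, number of LSTM units and layer count are illustrative assumptions (the actual system uses five hidden layers), and prices is taken from the preprocessing sketch:

import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

scaler = MinMaxScaler()
scaled = scaler.fit_transform(prices[["Modal Price"]].values)

# Build fixed-length input windows (window size of 10 is an assumption)
window = 10
X = np.array([scaled[i:i + window] for i in range(len(scaled) - window)])
y = scaled[window:]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(window, 1)),
    LSTM(50),
    Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")   # swap in "rmsprop"/"adadelta" to compare
history = model.fit(X_train, y_train, epochs=200, batch_size=32,
                    validation_data=(X_test, y_test), verbose=0)

pred = model.predict(X_test)
print("RMSE:", np.sqrt(mean_squared_error(y_test, pred)))
print("MAE :", mean_absolute_error(y_test, pred))
print("R2  :", r2_score(y_test, pred))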

3.4 Results

Figure 1 describes the results obtained on testing the dataset model for the modal price prediction of cotton with the Adam optimizer, for which the batch size was taken as 32 and 200 epochs were used. The green line represents the actual price of the cotton crop, and the red line indicates the predicted price. It can be seen that the prediction follows the actual trend throughout.

Fig. 1 LSTM model prediction using Adam optimizer



Accuracy. Figure 2 shows the curve for the training loss versus the validation loss
graph using the Adam optimizer for LSTM model. The training data is represented
by blue line, and the line graph in red is for the validation data. From the graph, it
can be seen that the values converge during training. The data is neither overfitting
nor underfitting for this model.
In Table 1, we can view the performance by taking into consideration various
accuracy parameters for the LSTM model for this system. We have taken 3 different
optimizers in order to check the one that gives the best result. The values obtained

Fig. 2 Training loss versus validation loss graph using Adam optimizer

Table 1 Accuracy metrics of the LSTM model with different optimizers

Accuracy metric            ADAM (optimizer)   RMSPROP (optimizer)   AdaDelta (optimizer)
Root mean square error     184.52             209.12                323.74
R2 score                   0.8304             0.7822                0.4780
Explained variance score   0.8304             0.8181                0.6428
Max error                  1771.18            1804.04               1753.84
Mean absolute error        104.04             154.442               277.47
Mean squared error         34,047.95          43,731.64             104,805.51
Median absolute error      49.30              146.14                290.33
Mean squared log error     0.0016             0.0019                0.0045

for each accuracy measure are compared, and the best values are considered. These values were calculated after the training and testing process of the models on the same dataset. It can be inferred from the comparison that the LSTM that uses the Adam optimizer outperforms the LSTM models that use the other optimizers for all the values obtained. Thus, it can be said that the LSTM model with the Adam optimizer is better suited for the price prediction of the cotton crop in this system.

4 Disease Detection

4.1 Dataset

The initial step is to collect data from the public database, considering images as input. Images in the most popular formats were acquired, so any format, for example .bmp, .jpg or .gif, can be used as batch input. The dataset comprises 1951 images for training, 106 for testing and 253 for validation. The dataset has four categories of images: diseased cotton leaf, diseased cotton plant, fresh cotton leaf and fresh cotton plant.

4.2 Convolutional Neural Network (CNN)

A convolutional neural network (ConvNet/CNN) is a deep learning system that accepts an input image and assigns importance (learnable weights and biases) to different characteristics/objects in the image, allowing them to be distinguished from one another. In comparison with other classification techniques, a ConvNet requires very little pre-processing. Whilst filters are designed by hand in primitive approaches, ConvNets have the potential to learn these filters/features with enough training. The architecture of a ConvNet is inspired by that of the visual cortex and is comparable to the neuron connectivity pattern of the human brain, in which individual neurons respond to stimuli in a unique way.
Essentially, a CNN works by conducting various convolutions in the network's various layers. This results in various representations of the learning data, beginning with the most generic in the initial layers and progressing to the most detailed in the deeper ones. Owing to the shrinking size of the convolutional layers, they operate as a form of attribute extractor, and the dimensionality of the input data is reduced as it passes through the layers.

4.3 Methodology

There are different types of diseases such as:


• Bacterial diseases: “Bacterial leaf spot” is the common name for a bacterial illness. It starts as little yellow-green lesions on young leaves, which appear warped and twisted, or as dark, damp, greasy-looking lesions on older foliage.
• Viral diseases: The most evident symptoms of virus-infected plants are seen on the leaves, but symptoms can also appear on the fruits and roots. The sickness is caused by a virus that is difficult to diagnose. Due to the virus, the leaves appear wrinkled and curled, and growth may be stunted.
• Fungal diseases: With the help of wind and water, fungal infections can spread through contaminated seed, soil, yield, weeds and propagation material. The infection first appears as grey-green, water-soaked spots and is easily recognizable at the bottom of the leaf or as it matures; it causes the leaf's surface to turn yellow as it spreads inward [24].
Gathering data and separating it into training, testing and validation datasets is the first step. The training data set should account for roughly 80% of the total labelled data; this data is used to train the model to recognize the various types of photos. The validation data set contains roughly 20% of the total labelled data and is used to see how well the system identifies known labelled data. The remaining unlabelled data makes up the testing data set, which is used to see how well the system classifies data it has never seen before.
After importing the necessary libraries, we move on to setting the dimensions of the images and found 224 × 224 pixels best suited for our system. All image pixels are then converted to their equivalent numpy arrays and stored for further use. We then define the path where all the images are stored.
We start by defining the epochs and batch size for our machine; this is a critical phase for neural networks. The epochs were set to 7, and the batch size was set at 50. The VGG16 model chosen for our system must then be loaded. This involves importing the convolutional neural network's transfer learning component. Transfer learning is simple to utilize because it already provides trained networks and other important components that we would otherwise have to develop. Various transfer learning models exist; VGG16 was chosen since it has only 13 convolutional layers and is simple to use. We then used VGG16 to set the weights and characteristics.
After that, the process of building the CNN model begins. The first step is to define the model using the sequential API. We then flatten the data and add three more hidden layers. We experimented with a variety of models with various dropouts, hidden layers and activations. Because the data is labelled with multiple classes, the final activation must be softmax. After that, we fit our training and validation data to the model using the requirements specified previously. Finally, we create an evaluation phase to compare the accuracy of the model on the training set to that on the validation set. We then evaluated the classification metrics and created the confusion matrix; to use the classification metrics, we converted our testing data into a numpy array.
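A minimal sketch of this transfer-learning setup with Keras is shown below; the directory paths, dense-layer sizes and dropout rate are assumptions, whereas the 224 × 224 input size, the frozen VGG16 base, 7 epochs, a batch size of 50 and the softmax output over the four classes follow the description above:

from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout
from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE, BATCH, EPOCHS = (224, 224), 50, 7

gen = ImageDataGenerator(rescale=1.0 / 255)
train = gen.flow_from_directory("data/train", target_size=IMG_SIZE,
                                batch_size=BATCH, class_mode="categorical")
val = gen.flow_from_directory("data/val", target_size=IMG_SIZE,
                              batch_size=BATCH, class_mode="categorical")

# Pre-trained VGG16 base used purely as a frozen feature extractor
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

model = Sequential([
    base,
    Flatten(),
    Dense(256, activation="relu"),
    Dense(128, activation="relu"),
    Dropout(0.3),
    Dense(4, activation="softmax"),     # diseased/fresh cotton leaf and plant
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(train, validation_data=val, epochs=EPOCHS)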

Table 2 Classification accuracy metrics

Category                 Precision (%)   Recall (%)   f1-score (%)
Diseased cotton leaf     100             80           89
Diseased cotton plant    96              89           93
Fresh cotton leaf        81              100          90
Fresh cotton plant       93              96           95

4.4 Results

The classification accuracy metrics report for the CNN algorithm was generated with the values shown in Table 2. To find the accuracy metrics, we first convert the testing data into a numpy array. A confusion matrix works best with a dataframe, so the created numpy array is then converted into a dataframe. A normalized confusion matrix was also computed, indicating well-computed accuracies, as shown in Fig. 3.
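This evaluation step can be sketched as follows, assuming the trained model from the previous sketch, an array test_images of test inputs and the corresponding true labels y_true (all three are assumptions here); the class names mirror the four categories of Table 2:

import numpy as np
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix

classes = ["diseased cotton leaf", "diseased cotton plant",
           "fresh cotton leaf", "fresh cotton plant"]

y_pred = np.argmax(model.predict(test_images), axis=1)      # predicted class indices

print(classification_report(y_true, y_pred, target_names=classes))

# Normalized confusion matrix, presented as a DataFrame as described above
cm = confusion_matrix(y_true, y_pred, normalize="true")
print(pd.DataFrame(cm, index=classes, columns=classes).round(2))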
The training accuracy versus validation accuracy and training loss versus validation loss graphs were also plotted, as shown in Figs. 4 and 5, respectively; they depict neither underfitting nor overfitting of the data and show optimal results.
The plots represent a good fit for the following reasons:
• The training loss plot decreases until a point of stability.
• The validation loss plot decreases to a point of stability and keeps only a small gap with the training loss plot.

Fig. 3 Confusion matrix



Fig. 4 Training versus validation accuracy

Fig. 5 Training versus validation loss

5 Conclusion

Agriculture contributes about 20% to India's GDP and plays an important role in India's economy and employment, so we need to make sure that this segment does not suffer losses. Hence, a machine learning based system consisting of price prediction and disease detection modules was created with the ambition of benefiting society to the best possible capacity. The novelty of the proposed system is that it is an integrated one, consisting of both price prediction and disease detection, which, based on the research done, does not exist at present. Such a system is of immense utility and benefit to the actual users.
The price prediction module was implemented using the LSTM algorithm and
different optimizers were used to compare the results. LSTM with ADAM is the best
optimizer with RMSE value of 184.52, whilst RMSProp and AdaDelta have RMSE
values of 209.12 and 323.74, respectively.

Also, the disease detection module used the CNN algorithm to classify the cotton plants and leaves as fresh or diseased, achieving an accuracy of 91.5%.

References

1. Batmavady S, Samundeeswari S (2019) Detection of cotton leaf diseases using image processing. Int J Rec Technol Eng (IJRTE) 8(2S4). ISSN: 2277-3878
2. Weng Y, Wang X, Hua J, Wang H, Kang M, Wang F-Y (2019) Forecasting horticultural products
price using ARIMA model and neural network based on a large-scale data set collected by web
crawler. IEEE Trans Comput Soc Syst 1–7, 6(3)
3. Weizheng S, Yachun W, Zhanliang C, Hongda W (2008) Grading method of leaf spot based
on image processing. In: Proceeding of the 2008 international conference on computer science
and software engineering (CSSE). Washington, DC, pp 491–494
4. Haider SA, Naqvi SR, Akram T, Umar GA, Shahzad A, Sial MR, Khaliq S, Kamran M (2019)
LSTM neural network based forecasting model for wheat production in Pakistan. Agronomy
5. Sabu KM, Manoj Kumar TK (2019) Predictive analytics in agriculture: forecasting prices
of arecanuts in Kerala. In: Third international conference on computing and network
communications (CoCoNet)
6. Selvanayagam T, Suganya S, Palendrarajah P, Manogarathash MP, Gamage A, Kasthurirathna
D (2019) Agro-genius: crop prediction using machine learning. Int J Innov Sci Res Technol
4(10). ISSN No: 2456-2165
7. Jin D, Gu Y, Yin H, Yoo SJ (2019) Forecasting of vegetable prices using STL-LSTM method.
In: 6th International conference on systems and informatics (ICSAI 2019), p 48
8. Yin H, Jin D, Gu YH, Park CJ, Han SK, Yoo SJ, STL-ATTLSTM: vegetable price forecasting
using STL and attention mechanism-based LSTM. Agriculture 10(12):612
9. Gayathri G, Niranjana PV, Velvadivu S, Sathya C (2021) Cotton price prediction. Int Res J
Modernization Eng Technol Sci 3(4)
10. Dhanapal R, AjanRaj A, Balavinayagapragathish S, Balaji J (2021) Crop price prediction using
supervised machine learning algorithms. ICCCEBS 2021, J Phys Conf Ser
11. Gangasagar HL, Dsouza J, Yargal BB, Arun Kumar SV, Badage A (2020) Crop price prediction
using machine learning algorithms. Int J Innov Res Sci Eng Technol (IJIRSET) 9(10)
12. Chen Z, Goh HS, Sin KL, Lim K, Chung NK, Liew XY (2021) Automated agriculture
commodity price prediction system with machine learning techniques. Adv Sci Technol Eng
Syst J 6(2):XX–YY
13. Jain A, Marvaniya S, Godbole S, Munigala V (2020) A framework for crop price forecasting
in emerging economies by analyzing the quality of time-series data. arXiv:2009.04171v1
[stat.AP]. 9 Sept 2020
14. Agarwal S, Tarar S (2020) A hybrid approach for crop yield prediction using machine learning
and deep learning algorithms. J Phys Conf Ser 1714. In: 2nd International conference on smart
and intelligent learning for information optimization (CONSILIO), 24–25 Oct 2020, Goa, India
15. Prajapati BS, Dabhi VK, Prajapati HB (2016) A survey on detection and classification of cotton
leaf diseases. 978-1-4673-9939-5/16/$31.00 ©2016 IEEE
16. Rothe PR, Kshirsagar RV (2015) Cotton leaf disease identification using pattern recognition
techniques. In: 2015 International conference on pervasive computing (ICPC). 978-1-4799-
6272-3/15/$31.00(c)2015 IEEE
17. Revathi P, Hemalatha M (2012) Classification of cotton leaf spot diseases using image
processing edge detection techniques. In: International conference on emerging trends in
science, engineering and technology. IEEE. ISBN: 978-1-4673-5144-7/12/$31.00
18. Kumari CU, Jeevan Prasad S, Mounika G (2019) Leaf disease detection: feature extraction
with K-means clustering and classification with ANN. IEEE. https://doi.org/10.1109/ICCMC.
2019.8819750

19. Warne PP, Ganorkar SR (2015) Detection of diseases on cotton leaves using K-mean clustering
method. Int Res J Eng Technol (IRJET) 2(4). e-ISSN: 2395-0056
20. Caldeira RF, Santiago WE, Teruel B (2021) Identification of cotton leaf lesions using deep
learning techniques. Sensors 21(9):3169
21. Suryawanshi V, Bhamare Y, Badgujar R, Chaudhary K, Nandwalkar B (2020) Disease detection
of cotton leaf. Int J Creat Res Thoughts (IJCRT) 8(11)
22. Kumar S, Jain A, Shukla AP, Singh S, Raja R, Rani S, Harshitha G, AlZain MA, Masud M
(2021) A comparative analysis of machine learning algorithms for detection of organic and
nonorganic cotton diseases. Hindawi Math Probl Eng 2021, Article ID 1790171
23. https://data.gov.in/
24. Saradhambal G, Dhivya R, Latha S, Rajesh R (2018) Plant disease detection and its solution
using image classification. Int J Pure Appl Math 119(14):879–884. ISSN: 1314-3395
Acute Leukemia Subtype Prediction
Using EODClassifier

S. K. Abdullah, S. K. Rohit Hasan, and Ayatullah Faruk Mollah

Abstract Leukemia is a type of blood cancer having two major subtypes: acute lymphoblastic leukemia and acute myeloid leukemia. A possible cause of leukemia lies in the genetic factors of a person. Machine learning techniques are being increas-
ingly applied in analyzing the relation between gene expression and genetic diseases
such as leukemia. In this paper, we report prediction of leukemia subtypes from
microarray gene expression samples using a recently reported ensemble classifier
called EODClassifier. Across multiple cross-validation experiments, classification
accuracy of over 96% is obtained which reveals consistent performance and robust-
ness. It is also demonstrated that like other popular classifiers, the EODClassifier is
also performing well in leukemia prediction.

Keywords Data classification · Leukemia gene expression · Feature selection · Ensemble approach · EODClassifier

1 Introduction

Leukemia is a group of cancers related to blood cells. Acute leukemia is of two types, i.e., acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML).
Genetic factor is believed to be a possible cause of leukemia [1]. Hence, besides hema-
tological diagnosis, investigation with microarray gene expression data samples of
such subjects is also being carried out in recent times. Maria et al. [2] have reported
five machine learning algorithms for diagnosis of leukemia, i.e., support vector
machines (SVM), neural networks (NN), k-nearest neighbors (KNN), naïve Bayes
(NB) and deep learning. The performance of these algorithms has been compared, and their merits and demerits have been pointed out. Joshi et al. [3] have worked

S. K. Abdullah (B) · A. F. Mollah


Department of Computer Science and Engineering, Aliah University, IIA/27 New Town, Kolkata
700160, India
S. K. R. Hasan
Infosys Ltd, Kharagpur, West Bengal 721305, India


with blood slide images for detection of leukemia employing feature selection and
classification. They reported 93% accuracy with KNN. Subhan et al. [4] applied a
similar approach where they segment blood cell images, extract features and clas-
sify. Visual cues inspired feature extraction approach from segmented cells is also
followed in [5–7]. Neural network-based classification is also followed in similar
experiments [8, 9]. A comparative study of such classification algorithms is made
in [10]. Recently, deep learning approaches are also being explored. Sahlol et al.
[11] have presented a hybrid approach for leukemia classification from white blood
cells by using VGGNet and bio-inspired salp swarm algorithm. Unlike blood cell
images, bone marrow images have also been used in convolutional neural network
for leukemia prediction [12].
Since the introduction of microarray gene expression leukemia dataset by Golub
et al. [13], investigation into genetic root of leukemia is being increasingly studied.
Besides prediction of leukemia subtypes from a subject, identification of the
associated gene(s) has become a prime interest. A recently developed classifier
called EODClassifier [14] integrates discriminant feature selection and classifica-
tion following an ensemble approach. Using this classifier, one can select top n
discriminant features for training and prediction. In this paper, two acute leukemia
subtypes, i.e., ALL and AML are predicted using EODClassifier from microarray
gene expression [13]. Multi-fold experiments revealed consistently high performance
and robustness in leukemia prediction.

2 Overview of EODClassifier

In this section, a brief introduction to the EODClassifier is presented. Conceptualized by Hasan et al. [14], it applies an ensemble approach to predict a recall sample: every individual feature, each having a fitness value, makes a decision, and these decisions are combined to make the final prediction. Besides that, it provides an option to select the top n discriminant features for training and prediction. Thus, it integrates feature selection with pattern classification, which may be preferred in certain applications. On the other hand, it offers a faster way of classification, as it compares the expressions for likelihood instead of computing probabilities to make a decision. Code installation and guidelines for using this classifier are available at [15]. Below, we briefly discuss how it works.
EODClassifier has two parameters, i.e., p = degree and nof = number of features. According to the fitness values, the top nof features are taken for training and prediction. If one prefers to use all the features, this can be done by setting nof = ‘all’. Here, p is a parameter subject to tuning.

eod = EODClassifier(nof=5, p=2)

By default, nof = ‘all’ and p = 1. Given a training set of samples X_train with
labels y_train, training can be done as

eod.fit(X_train,y_train)

and test samples X_test can be predicted as

y_pred = eod.predict(X_test)

where y_pred is an array of predicted classes for the test samples. Subsequently,
standard evaluation measures can be applied to quantize the classification perfor-
mance. As of now, it supports binary classification which works for two classes only.
Multi-class support is not available.
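Putting these calls together, a minimal end-to-end usage sketch might look as follows; the import path is an assumption (the actual installation is described in [15]), the feature matrix X and label vector y are assumed to be already loaded, and the train/test split and accuracy computation use scikit-learn:

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from eodclassifier import EODClassifier        # import path assumed; see [15]

# X: gene-expression matrix (samples x features), y: binary class labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    stratify=y, random_state=0)

eod = EODClassifier(nof='all', p=5)
eod.fit(X_train, y_train)
y_pred = eod.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))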

3 Methodology

The presented system applies a supervised approach to leukemia subtype prediction.
It requires a collection of AML type and ALL type samples for training. Hence, a
dataset needs to be divided into training and test sets in some ratio. The training
samples are passed to any suitable pattern classifier such as the EODClassifier for
training and the test samples without their class labels are passed to the trained model
of that classifier for prediction. Later, the class labels of the test samples are used to
quantify the prediction performance. Working methodology of the system is shown
in Fig. 1.

3.1 Leukemia Gene Expression Dataset

A brief introduction to the leukemia gene expression dataset [13] employed in this
work is presented here. There are 72 gene expression samples of leukemia patients.
Each of these samples contains the measured and quantified expression levels of 7129 genes. Gene expression levels of the samples are visually shown in
Fig. 2. It may be noted that some genes are negatively expressed.

3.2 Leukemia Subtype Prediction

Usually, gene expression datasets such as the present leukemia dataset [13] contain
limited number of samples and high number of features. Moreover, it may be real-
ized from Fig. 2 that the gene expression levels of AML and ALL types are not very
distinct. Figure 3 shows the distributions of four sample genes for all the 72 samples.
It reflects that there are no distinct decision boundaries for most of the features.

Fig. 1 Block diagram of the leukemia subtype prediction method (prediction is done on the basis
of microarray gene expression data of different subjects)

Fig. 2 Expression levels of 7129 genes for 72 samples (The first 25 samples are of AML type and
the remaining 47 samples are of ALL type)

Hence, ensemble approaches such as the one followed in EODClassifier are suit-
able for prediction of high-dimensional samples. It may be noted that this classifier
predicts the final class based on the decisions of each individual features and their
fitness measures. Thus, in this classifier, a discriminating feature contributes more
in determining the final class of a recall sample.

Fig. 3 Expression levels of four sample genes for 25 AML (first class) and 47 ALL (second class)
samples

4 Results and Discussion

Experiments have been carried out on the leukemia gene expression dataset [13], which contains 72 instances and 7129 attributes. All attributes have numerical values, and the outcome or class takes the binary values ‘1’ or ‘0’. Class 1 signifies that the subject has acute lymphocytic leukemia, and class 0 signifies that the subject has acute myelocytic leukemia. The experimental setup is discussed in Sect. 4.1. Prediction performance, along with a comparative performance analysis with respect to other classifiers, is presented in Sect. 4.2. Finally, some observations are discussed in Sect. 4.3.

4.1 Experimental Setup

Classification is done using the EODClassifier discussed in Sect. 2 for different folds of cross-validation. Cross-validation is a well-accepted practice in pattern classification problems since it gives a stronger picture of a classification model than a model built in a single pass. Therefore, in order to measure the performance of the presented system, a cross-validation strategy is followed. Moreover, as there are only a limited number of samples in high dimensions, a leave-one-out cross-validation strategy is also adopted. Besides the experiments with the EODClassifier, similar experiments have been conducted with other well-known classifiers for a comparative study. In order to report the obtained results, standard evaluation metrics such as recall, precision, f-score and accuracy have been adopted.
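A sketch of this evaluation protocol with scikit-learn is given below; loading the feature matrix X and labels y from the dataset [13] is assumed, and the SVM shown is only a placeholder for any of the classifiers compared in this work:

from sklearn.model_selection import StratifiedKFold, LeaveOneOut, cross_val_score
from sklearn.svm import SVC

clf = SVC(kernel='linear', gamma='auto', C=1)    # placeholder model

# k-fold cross-validation for k = 3, 5, 10 and 20
for k in (3, 5, 10, 20):
    scores = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=k))
    print(f"{k}-fold mean accuracy: {scores.mean():.4f}")

# Leave-one-out cross-validation (72 folds for the 72 samples)
loo_scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print("LOO accuracy:", loo_scores.mean())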

4.2 Prediction Performance

Classification models are trained with mostly default parameters; only a few parameter changes are required for these models, and the parameters used are presented in Table 1. As leukemia subtype prediction is demonstrated with the EODClassifier, the confusion matrices obtained for the different cross-validations are also shown in Fig. 4. It may be observed that the classification performance of the said classifier is reasonably good and the misclassification rate is nominal.
The mean precision, recall, f-score, accuracy and RMSE over all folds are reported in Table 2 for threefold, fivefold, tenfold, 20-fold and leave-one-out (LOO) cross-validation experiments with multiple classifiers, along with the present classifier of interest, i.e., the EODClassifier. Default parameter values, as available in scikit-learn, are taken for naïve Bayes. In KNN, the number of neighbors, i.e., k, is 3. In SVM, a linear kernel with gamma = ‘auto’ and C = 1 is employed. For the multilayer perceptron (MLP), 100 neurons in the hidden layer with the ‘relu’ activation function are taken. In the random forest (RF) classifier, n_estimators = 10 and random_state = 0. Finally, for the EODClassifier, we have taken the parameters as nof = ‘all’ and p = 5.
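For reproducibility, the scikit-learn counterparts of these settings can be written down directly as a sketch (the EODClassifier import path is again an assumption, per [15]):

from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from eodclassifier import EODClassifier        # import path assumed; see [15]

models = {
    "NB":  GaussianNB(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "SVM": SVC(kernel='linear', gamma='auto', C=1),
    "MLP": MLPClassifier(hidden_layer_sizes=(100,), activation='relu',
                         random_state=41, max_iter=200),
    "RF":  RandomForestClassifier(n_estimators=10, random_state=0),
    "EOD": EODClassifier(nof='all', p=5),
}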

Table 1 Parameters of different classifiers for training and prediction of acute leukemia subtypes

Classifier       Parameters
GNB              priors = None
KNN              n_neighbors = 3, weights = ‘uniform’, p = 2, metric = ‘minkowski’
SVM              kernel = ‘linear’, gamma = ‘auto’, C = 1
MLP              random_state = 41, hidden_layer_sizes = 100, activation = ‘relu’, solver = ‘adam’, alpha = 0.0001, batch_size = ‘auto’, learning_rate = ‘constant’, learning_rate_init = 0.001, power_t = 0.5, max_iter = 200
Random forest    n_estimators = 10, random_state = 0, criterion = ‘gini’, max_depth = None, min_samples_split = 2
EODClassifier    nof = ‘all’, p = 5

Fig. 4 Confusion matrices obtained for different folds of cross-validation with the EODClassifier.
Misclassification rate is very less (as reflected in the non-diagonal positions)

4.3 Discussion

As evident from Table 2, the EODClassifier achieves over 96% accuracy in all cross-validation experiments. The accuracies of the other classifiers are sometimes lower than and sometimes close to that of the EODClassifier. Each method has its own merits and demerits, and it is important to note that no single classifier can be identified as the best for all problems: a classifier that struggles on one dataset may yield great results on another. However, it cannot be denied that consistency is important. In that respect, one may observe that the performance of the EODClassifier has been consistent in all the experiments conducted in the present work, which reflects its robustness besides its high classification performance.

Table 2 Acute leukemia subtype prediction performance with multiple classifiers for threefold,
fivefold, tenfold, 20-fold and LOO cross-validation
#Fold Classifier P R F-score Accuracy RMSE
3 NB 0.9267 1.0 0.9610 0.9473 0.1846
KNN 0.8571 0.9743 0.9116 0.8771 0.3487
SVM 0.9743 0.9777 0.9733 0.9649 0.1529
MLP 0.8898 0.8944 0.9398 0.9122 0.2406
RF 0.9440 0.9583 0.9311 0.9123 0.229
EOD 0.9696 0.9761 0.9761 0.9649 0.1529
5 NB 0.975 1.0 0.9866 0.9818 0.0603
KNN 0.8683 1.0 0.9276 0.8954 0.2849
SVM 0.9666 0.975 0.9633 0.9500 0.1180
MLP 0.8955 0.9777 0.9411 0.9121 0.2199
RF 0.95 0.9355 0.9447 0.9303 0.2022
EOD 0.9666 0.9714 0.9664 0.9636 0.1206
10 NB 0.9800 1.0 0.9888 0.9833 0.0408
KNN 0.89 1.0 0.9377 0.9100 0.2119
SVM 0.9800 0.975 0.9746 0.9666 0.0816
MLP 0.9400 0.975 0.9638 0.9433 0.1040
RF 0.9600 0.9550 0.9292 0.9133 0.2080
EOD 0.975 0.975 0.9714 0.9666 0.0816
20 NB 0.9833 1.0 0.99 0.9833 0.0288
KNN 0.9083 0.975 0.9433 0.9083 0.1508
SVM 0.9666 0.975 0.9633 0.9500 0.0860
MLP 0.9416 1.0 0.9633 0.9416 0.0930
RF 0.9666 0.9166 0.9800 0.9666 0.0577
EOD 0.975 0.975 0.9666 0.9666 0.0577
72 NB 0.6491 0.6527 0.6491 0.9824 0.0175
(LOO) KNN 0.6491 0.6250 0.6491 0.9122 0.0877
SVM 0.6315 0.6388 0.6315 0.9473 0.0563
MLP 0.6491 0.6527 0.6491 0.8596 0.1403
RF 0.5789 0.6250 0.5789 0.8771 0.1280
EOD 0.6315 0.6315 0.6315 0.9649 0.3508
Bold values denote the performance of the EODClassifier

5 Conclusion

In this paper, prediction of acute lymphocytic leukemia and acute myelocytic leukemia from microarray gene expression samples using a recently reported ensemble classifier called EODClassifier is presented. Over 96% classification accuracy is obtained in multiple cross-validation experiments. Like some other popular classifiers, the EODClassifier is found to be high performing. Additionally, the performance of this classifier is found to be consistent, which reflects its robustness. Possible scope for future work includes prediction using a limited number of features having relatively high fitness values.

References

1. Bullinger L, Dohner K, Dohner H (2017) Genomics of acute myeloid leukemia diagnosis and
pathways. J Clin Oncol 35(9):934–946
2. Maria IJ, Devi T, Ravi D (2020) Machine learning algorithms for diagnosis of Leukemia. Int
J Sci Technol Res 9(1):267–270
3. Joshi MD, Karode AH, Suralkar SR (2013) White blood cells segmentation and classification
to detect acute leukemia. Int J Emerg Trends Technol Comput Sci 2(3):147–151
4. Subhan MS, Kaur MP (2015) Significant analysis of leukemic cells extraction and detection
using KNN and Hough transform algorithm. Int J Comput Sci Trends Technol 3(1):27–33
5. Laosai J, Chamnongthai K (2014) Acute leukemia classification by using SVM and K-Means
clustering. In: Proceedings of the international electrical engineering congress, pp 1–4
6. Supardi NZ, Mashor MY, Harun NH, Bakri FA, Hassan R (2012) Classification of blasts in
acute leukemia blood samples using k-nearest neighbor. In: International colloquium on signal
processing and its applications. IEEE, pp 461–465
7. Adjouadi M, Ayala M, Cabrerizo M, Zong N, Lizarraga G, Rossman M (2010) Classification
of Leukemia blood samples using neural networks. Ann Biomed Eng 38(4):1473–1482
8. Sewak MS, Reddy NP, Duan ZH (2009) Gene expression based leukemia sub-classification
using committee neural networks. Bioinform Biol Insights 3:BBI-S2908
9. Zong N, Adjouadi M, Ayala M (2006) Optimizing the classification of acute lymphoblastic
leukemia and acute myeloid leukemia samples using artificial neural networks. Biomed Sci
Instrum 42:261–266
10. Bakas J, Mahalat MH, Mollah AF (2016) A comparative study of various classifiers for
character recognition on multi-script databases. Int J Comput Appl 155(3):1–5
11. Sahlol AT, Kollmannsberger P, Ewees AA (2020) Efficient classification of white blood cell
leukemia with improved swarm optimization of deep features. Sci Rep 10(2536):1–11
12. Rehman A, Abbas N, Saba T, Rahman SIU, Mehmood Z, Kolivand H (2018) Classification of
acute lymphoblastic leukemia using deep learning. Microsc Res Tech 81(11):1310–1317
13. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh
ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES (1999) Molecular classifica-
tion of cancer: class discovery and class prediction by gene expression monitoring. Science
286(5439):531–537
14. Hasan SR, Mollah AF (2021) An ensemble approach to feature selection and pattern classifi-
cation. In: Proceedings of international conference on contemporary issues on engineering and
technology, pp 72–76
15. EODClassifier (2021) https://github.com/iilabau/EODClassifier. Accessed 15 June 2021
Intrusion Detection System Intensive
on Securing IoT Networking
Environment Based on Machine
Learning Strategy

D. V. Jeyanthi and B. Indrani

Abstract The Internet of Things (IoT) is a technology that is expanding rapidly in day-to-day life, from the home to large industrial environments. IoT connects various applications and services via the internet to make the environment more convenient. The way the devices communicate, however, leaves the network vulnerable to various attacks. To protect against the security vulnerabilities of IoT, Intrusion Detection Systems (IDS) are employed in the network layer. The network packets from the interconnected IoT applications and services are stored on a Linux server at the end nodes, and a crawler retrieves the packets from the server into the network layer for attack prediction. Thus, the main objective of this work is to identify and detect intrusions in the IoT environment based on machine learning (ML) using the benchmark dataset NSL-KDD. The NSL-KDD dataset is pre-processed to sanitize the null values and eliminate duplicate and unwanted columns. The cleaned dataset is then assessed to construct novel custom features together with the basic features for attack detection, which form the feature vector. The novel features are constructed to reduce the learning confusion of the machine learning algorithm. The feature vector with the novel and basic features is then processed with the LASSO feature selection strategy to obtain the significant features and increase the prediction accuracy. Owing to the strong performance of ensembled machine learning algorithms, HSDTKNN (Hybrid Stacking Decision Tree with KNN), HSDTSVM (Hybrid Stacking Decision Tree with SVM) and TCB (Tuned CatBoost) are used for classification. The Tuned CatBoost (TCB) technique remarkably predicts the attacks occurring among the packets and generates an alarm. The experimental outcomes establish the suitability of the proposed model for the IoT IDS environment, with an accuracy of 97.8313%, an error rate of 0.021687, a sensitivity of 97.1001% and a specificity of 98.7052%.

D. V. Jeyanthi (B)
Department of Computer Science, Sourashtra College, Madurai, Tamilnadu, India
B. Indrani
Department of Computer Science, DDE, MKU, Madurai, Tamilnadu, India


Keywords IDS · NSL-KDD · IoT network environment · Custom novel features ·


PSO · LASSO · Machine learning · HSDTSVM · HSDTKNN · TCB

1 Introduction

An IDS acts as a shield against attacks on a computer system. Since attackers use various techniques to compromise the security and stability of networks connected to the internet, an IDS is an essential part of the network security configuration. Generally, IDS can be classified into two categories: anomaly-based detection and signature-based detection. An anomaly-based detection system builds a database of normal behavior and generates an alarm whenever behavior deviates from that baseline. A signature-based detection system maintains a database of the known patterns of attacks [1, 2]; it verifies whether similar patterns or data exist in the current traffic and indicates whether or not an attack is present.
IoT communication relies on the network layer, which is responsible for moving data packets among hosts. In the IoT architecture, the network layer is a vulnerable and heterogeneous phase that is exposed to various security concerns. The main reason for this vulnerability is that the IoT contains a large number of linked nodes, so the compromise of a single node may lead to the failure of the entire system. These architectural flaws lead to attacks such as DDoS, remote recording, botnets, data leakage and ransomware. Deploying a firewall is a primary security measure against IoT vulnerabilities, but it is not a complete solution because of the variety of issues in the IoT architecture.
This work proposes a framework that employs machine learning techniques to predict security anomalies in the IoT environment. The work therefore focuses on detecting intrusions against IoT-connected devices using machine learning. It adopts the NSL-KDD dataset for attack prediction and creates novel custom features from the given dataset to increase prediction accuracy and reduce training time. The proposed framework thus yields better results on NSL-KDD_CF than when feature selection is applied to the original NSL-KDD dataset alone. The following sections describe the attack prediction process.

2 Review of Literature

The scheme suggested by Soni et al. [3] makes use of two methodologies, C5.0 and ANN. To classify the data based on the performance of C5.0 and ANN, a set of significant features must be selected. To detect unknown attacks, Somwang et al. [4] developed a hybrid clustering model combining PCA and FART. Using hierarchical clustering and SVM, Su et al. [5] improved detection accuracy by uniting the IDS with both techniques; the KDD 99 dataset was utilized to conduct the experiments, and enhanced outcomes were observed for DoS and probe attacks. Ei Boujnouni and Jedra [6] proposed an anomaly-detection-based network intrusion identification system; this scheme encompasses data conversion, normalization, relevant feature selection, and novelty discovery models based on a classification and decision scheme composed of SPPVSSVDs and SMDs for deciding whether traffic is normal or intrusive. Bhumgara et al. [7] proposed hybrid approaches merging the J48 Decision Tree, SVM and NB to discern different varieties of attacks, reporting different accuracies for the respective algorithms. Sree Kala and Christy [8] proposed an anomaly-based IDS built on the OPSO-PNN model.
In [9], an IDS with a minimal set of features is proposed that employs a random forest classifier as the supervised machine learning method; these manually selected features assist training and detect intrusions in the IoT environment using a minimal but relevant feature set. The work in [10] proposes constructing an accurate model by employing various data pre-processing techniques that allow the machine learning algorithm to classify the possible attacks precisely on a cybersecurity dataset.
The work in [11] identifies various kinds of IoT threats using a deep-learning-based IDS for the IoT environment, evaluated on five benchmark datasets. The main objective of [12] was to compare KDDCup99 and NSL-KDD through the performance evaluation of various machine learning techniques using a large set of classification metrics. The work in [13] focuses on IoT threats by detecting and localizing infected IoT devices and generating an alarm. The research in [14] proposes an architectural design and implementation of a hybrid strategy based on multi-agent systems and blockchain using deep learning algorithms. The framework in [15] inspects and labels suspected packet headers and payloads to increase prediction accuracy. The paper [16] proposed a system to monitor soldiers who are wounded or lost on the front line by tracking data from sensors. The paper [17] proposed a blockchain-based system for sustainable smart farming in wireless networks and evaluated its performance. The paper [18] designed a system to control devices that are far from the control system by sending their status using sensors.

3 Proposed Scheme

This section describes the proposed IDS scheme for the IoT environment with NSL-KDD. The proposed architecture depicts attack identification and recognition based on the derived basic and novel custom features, which are fed to the ML models for attack prediction. The detailed architecture for the proposed scheme is shown in Fig. 1. The architecture handles packet information, missing value imputation, duplicate detection, best feature selection and classification.

Fig. 1 Proposed architecture

The proposed work mainly focuses on generating novel features to solve the learning confusion problem for classifiers, and it helps the analyst to understand the features. The proposed work comprises five layers: (i) Data Collection Layer, (ii) Pre-Processing Layer, (iii) Construction Layer, (iv) Feature Selection Layer and (v) Detection Layer to detect the attacks.
Table 1 shows the parameters used in this work.

Table 1 Used parameters


Parameter Description
D Dataset
DN Novel dataset
DC Cleaned dataset
ACC Accuracy
ER Error rate
SE Sensitivity
SP Specificity
DB Best feature selected dataset
MR Miss rate
FO Fall out
Xn Training set
Yn Testing set
PT True positive
NT True negative
PF False positive
NF False negative

3.1 Data Collection Layer

3.1.1 Dataset

The NSL-KDD dataset has 41 features, grouped into basic, content and traffic features. In comparison to the KDD-Cup dataset, this refined form of NSL-KDD does not suffer from KDD-Cup's shortcomings [17]. In addition, the NSL-KDD (D) training set contains a reasonable number of records. Owing to this benefit, it is possible to execute the experiments on the entire dataset without manually choosing a small part. The dataset D includes various attack groups in the ratio DOS (79%), PROBING (1%), R2L (0.70%), U2R (0.30%) and Normal (19%).

3.2 Pre-processing Layer

The pre-processing layer of this work cleans the raw dataset and prepares it for deriving the novel custom features. Pre-processing the dataset (D) eliminates duplicate columns, missing values and redundant columns, which reduces its size for further processing.
Figure 2 illustrates the missing values for the given dataset; the figure shows that the dataset does not contain any missing values.

Fig. 2 Missing value

Table 2 Encoding values


Features Values Encoded value
Service Http, Telnet, etc. 0–70
Flag SF, REJ, etc. 0–11
Protocol TCP/UDP/ICMP 0/1/2
Class Normal/Attack 0/1

In this phase, the features in the dataset are encoded into a uniform format for processing. The fields in the dataset are in various formats, which makes it complex to compute the custom features, so the work encodes the fields into a uniform format with encoding values. Sample encoding values for some of the features are shown in Table 2.
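The encoding step can be illustrated with a short Python sketch (a minimal example, assuming the NSL-KDD records are loaded into a pandas DataFrame with the column names used below; the names and the use of LabelEncoder are illustrative assumptions, not the authors' exact implementation):

```python
# Illustrative sketch: encode categorical NSL-KDD fields into numeric codes (cf. Table 2).
# Column names ("protocol_type", "service", "flag", "class") are assumed.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def encode_fields(df: pd.DataFrame) -> pd.DataFrame:
    encoded = df.copy()
    for col in ["protocol_type", "service", "flag", "class"]:
        encoded[col] = LabelEncoder().fit_transform(encoded[col].astype(str))
    return encoded

# Example usage (file name and column list assumed):
# df = pd.read_csv("KDDTrain+.txt", header=None, names=nsl_kdd_columns)
# df_encoded = encode_fields(df)
```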

3.3 Construction Layer

This construction layer builds the proposed novel features, which are derived from the dataset (D). These proposed novel features (DN) are extracted from the fields of the dataset and employed in the prediction to increase accuracy; they also help to avoid learning confusion for the ML techniques. A short illustrative sketch of these derivations is given after the feature definitions below.
Total Bytes:
The sum of the total number of source and destination bytes among the packets
transaction is integrated to derive the custom feature Total Bytes.

Total Bytes = Source Bytes + Destination Bytes

Byte Counter:
This custom feature evaluates the Total Bytes of a connection relative to its count.

Byte Counter = Total Bytes/Count

Interval Counter:
This custom feature evaluates the connection duration relative to its count.

Interval Counter = Duration/Count

Unique ID:
The custom feature Unique ID is derived with the concatenation of the service, flag
and protocol type of the captured packet.

UID = ProtocolType + Service + Flag

Average SYN Error:


SYN Error Rate and Destination Host SYN Error Rate of captured packets are
integrated and averaged to obtain the average synchronize error rate.

Average SYN Error = (SYN Error Rate + Destination Host SYN Error Rate)/2

Total Service Rate:


To obtain the Total Service Rate, the same service rate of the packets and different
service rates of the packets are integrated.

TSR = Same Service Rate + Different Service Rate

Nominal of Same Service Rate:


This custom feature is derived to evaluate the nominal of same service of the packets
with respect to the total service rate.

Nominal of Same Service Rate = Same Service Rate/TSR



REJ Error Mean:


The rejection (REJ) error rate and the destination host REJ error rate are averaged to obtain the REJ Error Mean.

REJ Error Mean = (REJ Error Rate + Destination host REJ Error Rate)/2

Login State:
This login state feature identifies whether the host login is enabled or not. It is derived from the logged-in feature.

LState = if (LoggedIn = False) then 'F' else { if (LoggedIn = 'Host') then 'H' else 'G' }

Nominal of Different Service Rate:


To evaluate the nominal different service rate of the captured packets, the different service rate is divided by the TSR.

Nominal of Different Service Rate = Different Service Rate/TSR
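The custom features above amount to straightforward column arithmetic. The sketch below illustrates how NSLKDD_CF could be built; the column names (src_bytes, dst_bytes, count, duration, serror_rate, dst_host_serror_rate, same_srv_rate, diff_srv_rate, rerror_rate, dst_host_rerror_rate, logged_in) are assumed standard NSL-KDD names and may differ from the authors' exact naming:

```python
# Illustrative derivation of the proposed custom (novel) features from NSL-KDD columns.
import numpy as np
import pandas as pd

def build_custom_features(df: pd.DataFrame) -> pd.DataFrame:
    cf = pd.DataFrame(index=df.index)
    count = df["count"].replace(0, np.nan)                 # guard against division by zero
    cf["total_bytes"] = df["src_bytes"] + df["dst_bytes"]
    cf["byte_counter"] = (cf["total_bytes"] / count).fillna(0)
    cf["interval_counter"] = (df["duration"] / count).fillna(0)
    cf["uid"] = (df["protocol_type"].astype(str) + df["service"].astype(str)
                 + df["flag"].astype(str))
    cf["avg_syn_error"] = (df["serror_rate"] + df["dst_host_serror_rate"]) / 2
    cf["total_service_rate"] = df["same_srv_rate"] + df["diff_srv_rate"]
    tsr = cf["total_service_rate"].replace(0, np.nan)
    cf["nominal_same_srv"] = (df["same_srv_rate"] / tsr).fillna(0)
    cf["nominal_diff_srv"] = (df["diff_srv_rate"] / tsr).fillna(0)
    cf["rej_error_mean"] = (df["rerror_rate"] + df["dst_host_rerror_rate"]) / 2
    # Simplified two-state login flag; the paper's LState also has a third 'G' case.
    cf["login_state"] = np.where(df["logged_in"] == 0, "F", "H")
    return cf
```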

3.4 Selection Layer

The purpose of this layer is to identify significant features among the derived features
of the given dataset in order to increase the accuracy of the prediction. This work
presents the following techniques for selecting the best features from the cleaned NSL-KDD (DC). The selected best features help to improve the accuracy of the classifier.

3.4.1 PSO

In PSO, the particles work together as a group: individual experiences are learned and then consolidated across the swarm. Each particle remembers the best solution it has found so far, called its personal best (pbest). By analyzing its own experience and its interactions with others, each particle in the search space looks for a better solution. The best fitness value achieved by any particle in the group is denoted as the global best (gbest). Each particle has an associated velocity that accelerates it toward its pbest and the gbest. The basic idea of PSO is to attain the global optimal solution by moving each particle toward pbest and gbest with random weights at every step. Particle swarms are randomly generated

and then progress through the search space or primary space until they identify the
optimal set of features by keeping track of their position and velocity. As a result,
the particle’s current position (p) and velocity (v) are described as follows:

$$p_i = \{p_{i1}, p_{i2}, \ldots, p_{iD}\}, \qquad v_i = \{v_{i1}, v_{i2}, \ldots, v_{iD}\}$$

where D is dimension of the search space.


The following equation is used to calculate the position and velocity of the particle
i,

$$p_{iD}^{k+1} = p_{iD}^{k} + v_{iD}^{k+1}$$

$$v_{iD}^{k+1} = w\, v_{iD}^{k} + a_1 r_1 \left(p_{id} - p_{iD}^{k}\right) + a_2 r_2 \left(p_{gd} - p_{iD}^{k}\right)$$

where k denotes the iteration index and d ∈ D represents the dth dimension of the search space. The inertia weight w regulates the influence of the previous velocity on the present one. r1 and r2 are random values uniformly distributed in [0, 1], and a1 and a2 are acceleration constants. pid and pgd are the elements of pbest and gbest in dimension d. Particle positions and velocity values are updated continuously until the stopping criterion is met, which can be either a maximum number of iterations or a suitable fitness value.
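A compact, generic version of this update loop for binary feature selection is sketched below. It is only an illustration with a placeholder fitness function (e.g., cross-validated classifier accuracy on the selected columns), not the authors' exact implementation:

```python
# Minimal PSO sketch for binary feature selection (illustrative; maximizes the fitness).
import numpy as np

def pso_feature_select(fitness, n_features, n_particles=20, iters=50,
                       w=0.7, a1=1.5, a2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.random((n_particles, n_features))           # continuous particle positions
    vel = rng.uniform(-1, 1, (n_particles, n_features))
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p > 0.5) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + a1 * r1 * (pbest - pos) + a2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0, 1)
        fit = np.array([fitness(p > 0.5) for p in pos])
        better = fit > pbest_fit
        pbest[better], pbest_fit[better] = pos[better], fit[better]
        gbest = pbest[pbest_fit.argmax()].copy()
    return gbest > 0.5                                     # boolean mask of selected features
```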

3.4.2 LASSO Feature Selection

LASSO feature selection shrinks the regression coefficients and drops many of them to zero; this regularizes the model parameters. After shrinkage, the model is refit using only the features with non-zero coefficients. In statistical models, this technique minimizes the prediction error. LASSO models offer a good accuracy trade-off: shrinking the coefficients reduces variance while keeping bias low. The method relies heavily on the parameter "λ", the tuning factor that controls the amount of shrinkage. The larger "λ" becomes, the more coefficients are forced to zero. It is also useful for removing variables that are not correlated with the response variable. In linear regression (LR), the algorithm therefore constrains the error by placing an upper bound on the coefficients. The "λ" value and this upper bound are inversely related: when the upper bound increases, λ decreases, and whenever the upper bound is decreased, "λ" grows.
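As a rough illustration (using scikit-learn's Lasso; the alpha value standing in for "λ" is a placeholder, not the value used in this work), LASSO-based selection could be realized as:

```python
# Illustrative LASSO feature selection; alpha (the shrinkage factor "lambda") is a placeholder.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

def lasso_select(X, y, alpha=0.01):
    """Return a boolean mask of features whose LASSO coefficients are non-zero."""
    Xs = StandardScaler().fit_transform(X)
    lasso = Lasso(alpha=alpha, max_iter=10000).fit(Xs, y)
    return np.abs(lasso.coef_) > 1e-8

# Usage: X_best = X[:, lasso_select(X, y)]
```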

3.5 Detection Layer

The detection layer comprises machine learning algorithms that detect attacks using classification techniques. The ML models are employed for attack prediction in the intrusion detection system. The best features (S B ) obtained from the feature selection phase are used as the training (X n ) and testing (yn ) sets for the prediction model. The prediction models employed are as follows:

3.5.1 HSDTKNN

Stacking is one of the ensemble-based machine learning algorithms. Its advantage is that it can harness the abilities of a range of well-performing models on a classification or regression job and produce predictions that outperform any single model in the ensemble. The proposed algorithm is entitled "Hybrid Stacking Decision Tree with K-Neighbors Classifier." Tree-based models are a class of nonparametric methods that partition the feature space into smaller regions with similar response values using a set of splitting rules; predictions are made by fitting a simpler model in every region. Given training data X n = {t 1 , …, t n }, where each instance t i carries the attributes {T 1 , T 2 , …, T n } and each attribute takes values {T 1i , T 2i , …, T ni }, every instance of the input specifies a record for a network packet. Each instance in the training data X n has a specific class tag "yn ", which is the observed output of the record. The algorithm first searches for multiple copies of the same instance in the training data X n .
Stacking with a Decision Tree (DT) is employed to predict attacks in the network. The ensemble classifier is designed by stacking the DT (meta learner) and KNN (base learner) together; the DT classifier is integrated with the KNN to enhance the overall training-time performance. From the significant feature set SFS, the training set X n is constructed and split into n parts, each assigned a different fold. The DT model is then fitted to n − 1 portions of the setup, while predictions are produced on the nth part of the stack. To cover the entire set X n , the same process is repeated for every part X n (i) of the training set. The stacked classifier KNN is fitted to both yn and X n . There are two sets for training: a training set and a validation set; the validation set is used to evaluate the new model against yn .
The stacking model of Meta learners is very much like trying to find the best
combination of base learners. In this classifier (HSDTKNN), (Table 3) the Base
Learner is KNN, followed by the Meta Learner, Decision Tree (DT). The present
algorithm begins by specifying the number of base algorithms. This algorithm uses a
single-base algorithm called “KNN.” There are specific parameters associated with
KNN, such as ten neighbors, the KD-Tree algorithm, and the Euclidean distance metric.
Table 3 Parameters of HSDTKNN

HSDTKNN—Parameters   Value(s)
Base learner K Neighbor Classifier (KNN)
Meta learner Decision Tree (DT)
Cross validation 5
Max depth (DT) 5
Random state (DT) None
Max leaf node (DT) 20
K-Neighbors (KNN) 10
Algorithm (KNN) KD-Tree
Distance metric (KNN) Euclidean

The DT meta learner has further parameters, including a maximum depth of five, no random state, and 20 maximum leaf nodes. Next, the algorithm performs k-fold cross-validation with the value "5" to obtain predictions from the base algorithm. Having received the predictions from the base learner, the meta learner then generates the ensemble predictions.
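Under the assumption that the stacking described above can be realized with scikit-learn's StackingClassifier (one possible implementation, not necessarily the authors' exact code), the HSDTKNN configuration of Table 3 could be expressed as:

```python
# Illustrative HSDTKNN: KNN base learner stacked under a Decision Tree meta learner (Table 3).
from sklearn.ensemble import StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def build_hsdtknn():
    base = [("knn", KNeighborsClassifier(n_neighbors=10, algorithm="kd_tree",
                                         metric="euclidean"))]
    meta = DecisionTreeClassifier(max_depth=5, max_leaf_nodes=20, random_state=None)
    return StackingClassifier(estimators=base, final_estimator=meta, cv=5)

# model = build_hsdtknn(); model.fit(X_train, y_train); y_pred = model.predict(X_test)
```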

3.5.2 HSDTSVM

The Decision Tree is a tree structure in which internal nodes represent tests on
attributes, branches represent outcomes, and leaf nodes represent class labels.
Subtrees rooted at new nodes are then created using the same procedure as above.
An algorithm based on “Hybrid Stacking Decision Trees Using Support Vector
Machines” is proposed in this work. SVMs are essentially binary classifiers that
divide classes by boundaries. SVM is capable of reducing the empirical classification error and increasing class separability by applying several transformations simultaneously. As the margin reaches its maximum, the separation between classes is maximized. Assume that "yn = {x i , yi }" is a training sample containing two classes yi = 1/0, and each sample is composed of attributes x i , where i = 1, …, m. Compared with single decision-making or learning models, DT and SVM are more performant as a stack. Although the SVM is an accurate classification method, its slow processing makes training very slow when dealing with large datasets; the training phase is the SVM's critical flaw. To train enormous datasets, an efficient data selection method based on decision trees and support vector classification is needed. During the training phase of the proposed technique, the training dataset for the SVM is reduced by using a decision tree. This addresses the issue of selecting and constructing features by reducing the number of dataset dimensions. An SVM can then be trained on the disjoint regions uncovered by the decision tree. A smaller dataset thus corresponds to a more specific region than the one obtained from the entire training set, and the complexity of the decision trees is reduced with small learning datasets, even though the decision rules may be more complex.

Table 4 Parameters of HSDTSVM

HSDTSVM—Parameters   Value(s)
Base learner Support Vector Machine (SVM)
Meta learner Decision Tree (DT)
Cross validation 5
Max depth (DT) 5
Random state (DT) None
Max leaf node (DT) 20
Kernel Sigmoid
Co-efficient Sigmoid
Verbose True

In this classifier (HSDTSVM) (Table 4), the base learner is the Support Vector Machine (SVM) and the meta learner is the Decision Tree (DT). The workflow of the algorithm begins with specifying the number of base algorithms. Here, the base algorithm (SVM) has its own parameters, such as a sigmoid kernel, a sigmoid coefficient, and verbose set to true. The meta learner DT also has its own parameters, such as a maximum depth of five, no random state, and twenty maximum leaf nodes. Next, it performs k-fold cross-validation with the value "5" to predict the value from the base algorithm. After getting the predictions from the base learner, the meta learner generates the ensemble predicted output.
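A corresponding sketch of the HSDTSVM stack (again assuming scikit-learn's StackingClassifier and SVC; the sigmoid coefficient is left at its default because its exact value is not stated):

```python
# Illustrative HSDTSVM: sigmoid-kernel SVM base learner with a Decision Tree meta learner (Table 4).
from sklearn.ensemble import StackingClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def build_hsdtsvm():
    base = [("svm", SVC(kernel="sigmoid", verbose=True))]
    meta = DecisionTreeClassifier(max_depth=5, max_leaf_nodes=20, random_state=None)
    return StackingClassifier(estimators=base, final_estimator=meta, cv=5)

# model = build_hsdtsvm(); model.fit(X_train, y_train)
```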

3.5.3 Tuned CatBoost (TCB)

CatBoost is an implementation of the Boosted Decision Trees (BDT) algorithm that combats the prediction shift found in other solutions for certain kinds of distributions and provides faster support for categorical features. The CatBoost algorithm likewise relies on the ordered boosting technique, an improved form of the gradient boosting algorithm; prediction shift happens due to different kinds of target leakage. For every new split of the current tree, CatBoost uses a greedy approach. Except for the first split, every subsequent split considers every combination of the categorical features already in the current tree along with the categorical features of the dataset. In this procedure, for each model built after any number of trees, every training example being evaluated is assigned a gradient value. To ensure that the assigned gradient value is unbiased, the model should be trained without that specific training example. For overfitting detection, the training dataset is divided and a small segment is utilized for testing. Assume a dataset with samples D = {(X j , yj )}, where j = 1, 2, 3, …, m, X j is a feature vector, and the response feature "yj ∈ R" can be numeric (0 or 1). There are a significant number of parameters in the model that can be adjusted to achieve better performance. It is likewise important to cross-validate the model to see whether it generalizes on the testing data as it should and prevents overfitting, offering the chance to train the model with a pool of parameters and pick the ones that generalize better on the testing data.

Table 5 Tuning parameters of TCB


TCB—Parameters Tuned value-1 Tuned value-2 Tuned value-3
Iterations 1000 1200 1500
Learning rate 0.01 0.03 0.1
Loss function, verbose Cross entropy, 0 and Cross entropy, 0 and Cross entropy, 0 and
and task type GPU GPU GPU
Depth 6 8 10
Leaf regularization 3 6 9
Classifier accuracy (%) ≤87 ≤93 ≥97
Error rate (%) 13 7 3

The parameter tuning for the TCB classifier in this work is summarized in Table 5. The table lists the tuned values of parameters such as iterations, learning rate, loss function and depth. With these tuned parameters implemented, the proposed classifier increases the classification accuracy and decreases the error rate during attack prediction.
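The tuned configuration (the Tuned value-3 column of Table 5) maps naturally onto the catboost package. The sketch below assumes that l2_leaf_reg is what the table calls "leaf regularization" and comments out the GPU task type so it also runs on a CPU machine:

```python
# Illustrative Tuned CatBoost (TCB) classifier using the Tuned value-3 settings of Table 5.
from catboost import CatBoostClassifier

def build_tcb():
    return CatBoostClassifier(
        iterations=1500,
        learning_rate=0.1,
        depth=10,
        l2_leaf_reg=9,                  # "leaf regularization" in Table 5 (assumed mapping)
        loss_function="CrossEntropy",
        verbose=0,
        # task_type="GPU",              # enable on a GPU machine, as in the experimental setup
    )

# model = build_tcb(); model.fit(X_train, y_train); proba = model.predict_proba(X_test)
```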

3.6 Evaluation Result

The proposed IDS architecture is implemented in Python with the Anaconda environment on a 64-bit Ubuntu Linux rack server with an Intel Xeon E5-2600, 16 GB RAM, a GTX 1050Ti 4 GB graphics card and a 6 TB hard disk. The training and testing sets contain the attack and normal packet information collected from the NSL-KDD dataset. The experiment begins with the cleaning and encoding module, which extracts non-duplicate and useful features and stores them in the feature vector. To reduce the machine learning algorithms' learning confusion (zero value) problem, the novel features (NSLKDD_CF) are constructed. Feature selection (PSO, LASSO) is applied to the NSL-KDD dataset to discard undesirable features and select only the best ones. In the next step, classification (HSDTSVM, HSDTKNN and TCB) is performed using the training and test sets. The results are then compared between NSLKDD and NSLKDD_CF (the existing dataset features and the proposed custom features) in terms of detection accuracy. This section presents the evaluation results for the feature selection and classification strategies. The evaluation results are computed for the datasets NSL-KDD and NSL-KDD_CF (novel features) and depicted with illustrations. The elapsed time (in seconds) for the feature selection strategies is displayed in Fig. 3, which shows that the proposed method LASSO consumes less elapsed time than PSO.

Fig. 3 Feature selection time (elapsed time in seconds for PSO and LASSO)

Figure 4 illustrates the accuracy and error rate of the classifiers for NSLKDD and NSLKDD_CF. The proposed method TCB gives higher accuracy and a lower error rate than the other methods. Table 6 shows the TP, TN, FP and FN values for the classifiers HSDTKNN, HSDTSVM and TCB on the datasets NSLKDD and NSLKDD_CF.
Fig. 4 Accuracy and error rate for NSLKDD and NSLKDD_CF

Table 6 TP, TN, FP, and FN values for the classifiers


Dataset Algorithm(s) TP (PT ) TN (N T ) FP (PF ) FN (N F )
NSLKDD HSDTKNN 62,755 57,800 4587 830
HSDTSVM 64,983 54,503 2359 4127
TCB 66,599 56,641 743 1989
NSLKDD_CF HSDTKNN 56,028 36,971 2836 467
HSDTSVM 53,286 33,094 5578 4344
TCB 58,477 36,645 387 793

Table 7 compares the classification metrics of the three classifiers HSDTKNN, HSDTSVM and TCB on the two datasets NSLKDD and NSLKDD_CF, including accuracy, error rate, sensitivity, miss rate, specificity and fall out.
(a) Accuracy (ACC) and Error Rate (ER):
One way to measure a machine learning algorithm's performance is to determine how many data points it classifies correctly. Accuracy is the proportion of all data points that are predicted correctly.

$$\text{ACC} = \frac{P_T + N_T}{P_T + P_F + N_T + N_F}$$

$$\text{ACC}_{\text{HSDTKNN}} = \frac{62{,}755 + 57{,}800}{62{,}755 + 57{,}800 + 4587 + 830} = \frac{120{,}555}{125{,}972} = 0.9569984$$

The error rate (ERR) is calculated as the number of all incorrect predictions
divided by the number of data points that were analyzed. Error rates are best at 0.0
and worst at 1.0.

ER = 1 − ACC

ERHSDTKNN = 1 − 0.9569984 = 0.0430016

Figure 4 shows the accuracy and error rate for the NSLKDD and NSLKDD_CF.
TCB has more accuracy and low error rate.
(b) Sensitivity and Miss Rate:
The sensitivity (SE) is calculated as the number of correct positive predictions divided by the total number of positives. It is also known as recall (REC) or the true positive rate (TPR). Sensitivity is best at 1.0 and worst at 0.0.

$$\text{SE(TPR)} = \frac{P_T}{P_T + N_F}, \qquad \text{SE}_{\text{HSDTKNN}} = \frac{62{,}755}{62{,}755 + 830} = 0.9869466$$

The miss rate, or false negative rate (FNR), is calculated by dividing the number of false negative predictions by the total number of true positives and false negatives. In terms of false negative rates, the best rate is 0.0, and the worst rate is 1.0.

MR(FNR) = 1 − SE(TPR)

Table 7 Classification metrics


Dataset Algorithm(s) Accuracy Error Rate Sensitivity (TPR) Miss Rate (FNR) Specificity (TNR) Fall Out (FPR)
NSLKDD HSDTKNN 0.9569984 0.0430016 0.9869466 0.0130534 0.9264751 0.0735249
HSDTSVM 0.9485124 0.0514876 0.9402836 0.0597164 0.9585136 0.0414864
TCB 0.9783126 0.0216874 0.9710008 0.0289992 0.9870521 0.0129479
NSLKDD_CF HSDTKNN 0.9657016 0.0342984 0.9917338 0.0082662 0.9287562 0.0712438
HSDTSVM 0.8969699 0.1030301 0.9246226 0.0753774 0.8557613 0.1442387
TCB 0.9877469 0.0122531 0.9866206 0.0133794 0.9895496 0.0104504

Fig. 5 Sensitivity and miss rate for NSLKDD and NSLKDD_CF

MRHSDTKNN = 1 − 0.9869466 = 0.0130534

The sensitivity and miss rate for the three classifiers and the two datasets are shown in Fig. 5. The proposed method TCB holds high sensitivity and a low miss rate for both the NSLKDD and NSLKDD_CF datasets.
(c) Specificity and Fall Out:
Specificity (SP) is calculated as the number of correct negative predictions divided by the total number of negatives. The true negative rate (TNR) is another name for this ratio. Specificity is best at 1.0 and worst at 0.0.

$$\text{SP(TNR)} = \frac{N_T}{N_T + P_F}, \qquad \text{SP}_{\text{HSDTKNN}} = \frac{57{,}800}{4587 + 57{,}800} = 0.9264751$$

The false positive rate (FPR) is calculated by dividing the number of incorrect positive predictions by the total number of negatives. 0.0 is the best false positive rate, while 1.0 is the worst.

FO(FPR) = 1 − SP(TNR)

FOHSDTKNN = 1 − 0.9264751 = 0.0735249
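All of the metrics in Table 7 follow directly from the four confusion-matrix counts of Table 6. A small helper that reproduces the formulas above (illustrative, not the authors' code) is:

```python
# Compute the Table 7 metrics from the TP/TN/FP/FN counts of Table 6.
def classification_metrics(tp, tn, fp, fn):
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn)          # sensitivity / TPR
    spec = tn / (tn + fp)          # specificity / TNR
    return {"accuracy": acc, "error_rate": 1 - acc,
            "sensitivity": sens, "miss_rate": 1 - sens,
            "specificity": spec, "fall_out": 1 - spec}

# Example: HSDTKNN on NSLKDD (Table 6) -> accuracy ~ 0.9569984
print(classification_metrics(62755, 57800, 4587, 830))
```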



Fig. 6 Specificity and fall out for NSLKDD and NSLKDD_CF

The specificity (SP) and fall out (FO) for both NSLKDD and NSLKDD_CF are shown in Fig. 6; the proposed method TCB provides higher specificity and lower fall out than the other classifiers.

4 Conclusion

The IDS for the IoT networking environment is implemented using machine learning techniques by constructing the custom feature dataset NSLKDD_CF. The custom features are constructed with the motive of diminishing the prediction time and increasing the accuracy of attack identification, which the results show is achieved. PSO and LASSO are used for feature selection to discard undesirable features. The ensembled hybrid machine learning classification algorithms HSDTSVM, HSDTKNN and TCB classified the attacks in the two datasets NSLKDD and NSLKDD_CF, and their performance was measured. TCB with tuned parameters outperformed the others on both datasets. This work is limited to two benchmark datasets; it can be extended to various real-time IoT datasets and implemented and deployed as a product.

References

1. Devaraju S, Ramakrishnan S (2014) Performance comparison for intrusion detection system


using neural network with KDD dataset. ICTACT J Soft Comput 4(3):743–752
2. Phadke A, Kulkarni M, Bhawalkar P, Bhattad R (2019) A review of machine learning
methodologies for network intrusion detection. In: Third national conference on computing
methodologies and communication (ICCMC 2019), pp 272–275

3. Soni P, Sharma P (2014) An intrusion detection system based on KDD-99 data using data
mining techniques and feature selection. Int J Soft Comput Eng (IJSCE) 4(3):1–8
4. Somwang P, Lilakiatsakun W (2012) Intrusion detection technique by using fuzzy ART on
computer network security. In: IEEE—7th IEEE conference on ındustrial electronics and
applications (ICIEA)
5. Horng S-J, Su M-Y, Chen Y-H, Kao T-W, Chen R-J, Lai J-L, Perkasa CD (2011) A novel
intrusion detection system based on hierarchical clustering and support vector machines. Exp
Syst Appl 38(1):306–313
6. Ei Boujnouni M, Jedra M (2018) New intrusion detection system based on support vector
domain description with information metric. Int J Network Secur pp 25–34
7. Bhumgara A, Pitale A (2019) Detection of network intrusions using hybrid intelligent system.
In: International conference on advances in information technology, pp 500–506
8. Sree Kala T, Christy A (2019) An intrusion detection system using opposition based particle
swarm optimization algorithm and PNN. In: International conference on machine learning, big
data, cloud and parallel computing, pp 184–188
9. Rani D, Kaushal NC (2020) Supervised machine learning based network intrusion detec-
tion system for internet of things. In: 2020 11th international conference on computing,
communication and networking technologies (ICCCNT)
10. Larriva-Novo X, Villagrá VA, Vega-Barbas M, Rivera D, Sanz Rodrigo M (2021) An IoT-
focused intrusion detection system approach based on preprocessing characterization for
cybersecurity datasets. Sensors 21:656. https://doi.org/10.3390/s21020656
11. Islam N, Farhin F, Sultana I, Kaiser MS, Rahman MS et al (2021) Towards machine learning
based intrusion detection in IoT networks. CMC-Comput Mater Continua 69(2):1801–1821
12. Sapre S, Ahmadi P, Islam K (2019) A robust comparison of the KDDCup99 and NSL-KDD
IoT network intrusion detection datasets through various machine learning algorithms
13. Houichi M, Jaidi F, Bouhoula A (2021) A systematic approach for IoT cyber-attacks detection
in smart cities using machine learning techniques. In: Barolli L, Woungang I, Enokido T (eds)
Advanced information networking and applications. AINA 2021. Lecture notes in networks
and systems, vol 226. Springer, Cham. https://doi.org/10.1007/978-3-030-75075-6_17
14. Liang C, Shanmugam B, Azam S (2020) Intrusion detection system for the internet of things
based on blockchain and multi-agent systems. Electronics 9(1120):1–27
15. Urmila TS, Balasubramanian R (2019) Dynamic multi-layered intrusion identification and
recognition using artificial intelligence framework. Int J Comput Sci Inf Secur (IJCSIS)
17(2):137–147
16. Rahimunnisa K (2020) LoRa-IoT focused system of defense for equipped troops [LIFE]. J
Ubiquitous Comput Commun Technol 2(3):153–177
17. Sivaganesan D (2021) Performance estimation of sustainable smart farming with blockchain
technology. IRO J Sustain Wireless Syst 3(2):97–106. https://doi.org/10.36548/jsws.2021.
2.004
18. Dr PK (2020) A sensor based IoT monitoring system for electrical devices using Blynk
framework. J Electron Inform 2(3):182–187
Optimization of Patch Antenna
with Koch Fractal DGS Using PSO

Sanoj Viswasom and S. Santhosh Kumar

Abstract An edge-fed patch antenna is designed to operate in the Wi-Fi band of 5.2
GHz. For improving the bandwidth and achieving multiband operation, a Koch
fractal DGS structure was incorporated into the ground plane. On introduction of
fractal DGS, the antenna exhibits dual-band operation at 3.9 and 6.8 GHz. In order to
obtain the originally designed frequency of 5.2 GHz, the antenna structure was opti-
mized using Particle Swarm Optimization (PSO). The optimized antenna resonates
at 5.2 GHz and also at 3.5 GHz, which is the proposed frequency for 5G operations.
So our optimized antenna exhibits dual-band operation and is suitable for Wi-Fi and
5G applications. Also, it provides good gain in the operating frequency bands. This
novel antenna design approach provides dual-band operation with enhanced band-
width in compact size. The antenna structure was simulated and its performance
parameters evaluated using OpenEMS.

Keywords Microstrip antenna · Defected Ground Structure (DGS) ·


Koch Snowflake fractal · Particle Swarm Optimization (PSO)

1 Introduction

Microstrip patch antennas find wide application in wireless communication systems


because of their meritorious features like small size, light weight and easy fabrication
using printed circuit technology [1]. The major drawback of the microstrip antenna
includes narrow frequency bandwidth, spurious feed radiation, low power handling
capability and single-band operation [2]. A number of approaches that can be adopted for designing dual-band antennas are proposed in [3]. All the antenna configurations proposed in that paper are based on Euclidean geometry.
Of late, a considerable amount of interest has been directed to develop antennas
based on fractal geometry [4]. Fractal antenna engineering is an emerging area in

S. Viswasom (B) · S. Santhosh Kumar


Department of ECE, College of Engineering, Trivandrum, Kerala, India
e-mail: sanojv@cet.ac.in


antenna design that utilizes fractal shapes to design new antennas with improved
features. Two of the most important properties of fractal antenna are Space Filling and
Self Similarity properties. Fractals exhibit self-similarity as they consist of multiple
scaled down versions of itself at various iterations. Hence, a fractal antenna can
resonate at large number of resonant frequencies and exhibit multiband behavior
[5]. The space-filling property of fractals can be used to pack electrically large antennas into small areas, leading to miniaturization of the antenna structures [4]. Fractal shapes have been used to improve the features of patch antennas [6–9].
Defected Ground Structures (DGS) refers to some compact geometry that is etched
out as a single defect or as a periodic structure on the ground plane of a microwave
printed circuit board. The DGS slots have a resonant nature. They can be of different
shapes and sizes. Also, their frequency responses can vary with different equivalent
circuit parameters [7]. The presence of DGS is found to exhibit a slow wave effect,
which increases the overall effective length of the antenna, thereby reducing its
resonant frequency leading to antenna miniaturization [10]. To achieve maximum
slow wave effect, fractal structures can be etched to the ground plane. In [11], Koch
curve fractal DGS structure has been etched in the ground plane of a circularly
polarized patch antenna, which resulted in considerable improvement in terms of
better radiation efficiency, optimal return loss bandwidth and size reduction. In [12]
Sierpenski carpet fractal DGS structure has been incorporated into a microstrip patch
to improve its performance and the structure optimized using PSO to achieve the
desired performance characteristics.
A microstrip patch antenna has been designed for an operating frequency of
5.2 GHz. The substrate material used for this design is FR4 glass epoxy having a
relative permittivity of 4.4. A Koch Snowflake fractal structure has been introduced
to the ground plane of the designed antenna for multiband operation and wide-
band behavior. The modified antenna with the DGS structure resonates at two new
frequencies—3.9 and 6.8 GHz. The antenna structure was further optimized using
Particle Swarm Optimization (PSO) and optimized antenna resonates at 3.5 and 5.2
GHz, with reasonably good gain. OpenEMS [13] and Octave software were used for
the antenna simulation and analysis.

2 Antenna Design

This section discusses about the methodology adopted to design the antenna. In this
proposed work, an edge-fed patch antenna has been designed to operate in the Wi-Fi
band of 5.2 GHz. The introduction of a Koch Snowflake fractal DGS structure in the
ground plane results in improved performance in terms of multiband behavior. How-
ever, the frequency of operation now shifts from the originally designed frequency
of 5.2 GHz. By using Particle Swarm Optimization, the patch antenna dimensions
and the DGS structure are optimized, so that the optimized antenna now operates at
5.2 GHz and a second resonant frequency of 3.5 GHz.

2.1 Patch Antenna

The patch antenna is a hugely popular antenna used for a wide array of applications,
like in satellite communication, mobile communication and aerospace application.
A basic rectangular patch antenna can be designed by the following equations [1]:
$$W = \frac{c}{2 f_o \sqrt{\dfrac{\varepsilon_r + 1}{2}}} \quad (1)$$

$$\varepsilon_{\text{reff}} = \frac{\varepsilon_r + 1}{2} + \frac{\varepsilon_r - 1}{2}\left(1 + 12\frac{h}{W}\right)^{-\frac{1}{2}} \quad (2)$$

$$\Delta L = 0.412\,h\,\frac{\left(\varepsilon_{\text{reff}} + 0.3\right)\left(\frac{W}{h} + 0.264\right)}{\left(\varepsilon_{\text{reff}} - 0.258\right)\left(\frac{W}{h} + 0.8\right)} \quad (3)$$

$$L = \frac{c}{2 f_o \sqrt{\varepsilon_{\text{reff}}}} - 2\Delta L \quad (4)$$

where W is the width of the patch antenna, εr is the dielectric constant of the substrate, εreff is the effective dielectric constant, L is the length of the patch, h is the substrate height and ΔL is the length extension.
Using the above design equations, a microstrip patch antenna was designed and
its dimensions obtained as shown in Fig. 1.
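As a quick numerical check of Eqs. (1)–(4), the short Python sketch below evaluates them for f_o = 5.2 GHz and ε_r = 4.4; the substrate height h = 1.6 mm is an assumed typical FR4 value, since it is not stated explicitly here:

```python
# Evaluate the rectangular patch design equations (1)-(4); h = 1.6 mm is an assumed value.
from math import sqrt

def design_patch(f0=5.2e9, eps_r=4.4, h=1.6e-3, c=3e8):
    W = c / (2 * f0 * sqrt((eps_r + 1) / 2))                                # Eq. (1)
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 * (1 + 12 * h / W) ** -0.5  # Eq. (2)
    dL = 0.412 * h * ((eps_eff + 0.3) * (W / h + 0.264)
                      / ((eps_eff - 0.258) * (W / h + 0.8)))                # Eq. (3)
    L = c / (2 * f0 * sqrt(eps_eff)) - 2 * dL                               # Eq. (4)
    return W, L

W, L = design_patch()
print(f"W = {W*1e3:.2f} mm, L = {L*1e3:.2f} mm")   # W ~ 17.5 mm, close to the value in Table 1
```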
The microstrip antenna is fed using a microstrip line of characteristic impedance 50 Ω. The patch antenna has an input impedance of 200 Ω. To facilitate the impedance matching between the patch antenna and the feed line, a quarter wave transformer [14] is introduced between the antenna and the feed line.

2.2 Koch Snowflake Structure

As a means of obtaining multiband operation, a Koch snowflake DGS structure is


introduced in the ground plane of the edge fed microstrip antenna. A Koch curve is
obtained by replacing the middle portion of a straight line section, by a bend section.
In the succeeding iteration, each edge is further divided into three equal parts by
replacing the middle section by a bend section. This process is continued for every
other iteration. The iterative steps in the design of Koch Snowflake fractal is as shown
in Fig. 2.
A Koch snowflake fractal of third iteration has been etched to the ground plane
of the proposed antenna as shown in Fig. 3.

Fig. 1 Dimensions of the patch antenna

Fig. 2 Iterative steps in the fractal design of koch curve

3 Results and Discussion

The microstrip patch antenna was designed and simulated using openEMS software
using Octave interface. OpenEMS is a free and open source electromagnetic field
solver which utilizes the FDTD (Finite-Difference time-domain) technique. It sup-
ports both cylindrical and Cartesian co-ordinate system. Octave provides a flexible
scripting tool for OpenEMS.

Fig. 3 Koch snowflake fractal DGS structure of iteration 3

3.1 Patch Antenna

The reflection coefficient (S11 ), 2D and 3D radiation pattern of the patch antenna are
as shown in Figs. 4, 5 and 6.

Fig. 4 S11 of patch antenna



Fig. 5 2D pattern of patch antenna

Fig. 6 Patch antenna gain



The antenna resonates at 5.2 GHz and its reflection coefficient is −18 dB as shown
in Fig. 4. Its 2D and 3D pattern are as shown in Figs. 5 and 6. The gain of the antenna
is 5.14 as shown in Fig. 6.

3.2 Patch Antenna with Fractal DGS

For improving the performance of the patch antenna, a Koch Snowflake fractal DGS
structure was introduced to the ground plane. Now the antenna operates at two
frequencies—3.9 and 6.8 GHz. However the operating frequency of the antenna has
shifted from its originally designed resonant frequency of 5.2 GHz. As a means of
obtaining the original operating frequency of 5.2 GHz, Particle Swarm Optimization
has been applied to the antenna structure. Using PSO, the dimensions of the fractal DGS and the patch have been optimized so that the antenna still resonates at 5.2 GHz, along with a new operating frequency of 3.5 GHz.
The reflection coefficient (S11 ), 2D and 3D pattern of the patch antenna with
fractal DGS are as shown in Figs. 7, 8 and 9.
The antenna now resonates at two frequencies 4 and 6.8 GHz and its reflection
coefficient values are −15 dB and −10 dB as show in Fig. 7. Its 2D and 3D pattern
are as shown in Figs. 8 and 9, respectively.
The antenna gain is 3.376 as shown in Fig. 9.

Fig. 7 S11 of patch antenna with fractal DGS



Fig. 8 2D pattern of patch antenna with fractal DGS

Fig. 9 Gain of the patch antenna with fractal DGS



Fig. 10 PSO Implementation using OpenEMS

3.3 PSO Using OpenEMS

Particle Swarm Optimization is a population-based stochastic optimization algorithm


that was motivated by the collective intelligent behavior exhibited by certain animals
like a swarm of bees or a flock of birds. The PSO algorithm builds on the flocking models of Reynolds and Heppner and was formulated and simulated by Kennedy and Eberhart in 1995. PSO is computationally more efficient when compared to the Genetic Algorithm.
The block diagram explaining the PSO implementation in OpenEMS is as shown
in Fig. 10.
The substrate parameters along with the desired operating frequency are given as input to OpenEMS, and the structure is modeled using the CSXCAD format. We set minimum and maximum values for the dimensions of the antenna (patch and fractal dimensions). We also set a seed value to initiate the PSO algorithm. A suitable fitness function is formed using the S-parameters, as given below:

$$F(w, l, L) = \frac{1}{\text{Gain}(f)} + \lambda \times S_{11}(f) \quad (5)$$

where, l, w—Length and Width of the patch antenna


L—Dimensions of the fractal DGS
Gain (f )—Gain at the designed frequency of 5.2 GHz
S11 ( f )—Magnitude of the reflection coefficient at 5.2 GHz
λ—Lagrange Multiplier.
The dimensions that best satisfy the fitness function are taken as the optimized dimensions. The attractive feature of this technique is that the antenna can be designed for any desired frequency (Table 1).
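The fitness evaluation of Eq. (5) is straightforward once the gain and S11 at the target frequency are extracted from a simulation run. A hedged sketch is given below; the λ value and the way gain/S11 would be obtained from an OpenEMS run are placeholders, not the authors' implementation:

```python
# Illustrative PSO fitness for the antenna (Eq. 5); lam and the simulation hook are placeholders.
def antenna_fitness(gain_at_f, s11_mag_at_f, lam=10.0):
    """Lower is better: high gain (small 1/gain) and small |S11| at the design frequency."""
    return 1.0 / gain_at_f + lam * s11_mag_at_f

# In the optimization loop, each candidate (l, w, L_dgs) would be meshed and simulated
# (e.g., an OpenEMS run driven from Octave or Python), and gain_at_f / s11_mag_at_f
# would be read back at 5.2 GHz before calling antenna_fitness().
```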
The reflection coefficient (S11 ), 2D and 3D pattern of the optimized patch antenna
with fractal DGS are as shown in Figs. 11, 12 and 13.
The optimized antenna operates at two frequencies—3.5 and 5.2 GHz and its
reflection coefficient values are −23 dB and −16 dB as shown in Fig. 11. Its 2D and
3D pattern are as shown in Figs. 12 and 13, respectively.
The antenna gain is 2.275 as shown in Fig. 13.
A summary of the results is given in Table 2.

Table 1 Optimized dimensions


Dimensions (in mm) l w L
Original 12.56 17.56 10
Optimized 12 15.835 21.725

Fig. 11 S11 of the optimized antenna



Fig. 12 2D pattern of the optimized antenna

Fig. 13 Gain of the optimized antenna



Table 2 Result summary


Ant. Struc. Res. Freq. S11 value BW at 5.2 GHz
Patch Ant. 5.2 GHz −18 dB 50 MHz
Patch Ant. with DGS 3.9 & 6.8 GHz −15 & −10 dB –
Optimized Ant. 3.5 & 5.2 GHz −23 & −16 dB 100 MHz

4 Conclusion

The design and simulation of a microstrip patch antenna for an operating frequency
of 5.2 GHz are presented in this paper. A Koch Snowflake fractal DGS structure was
incorporated into the ground plane for improving the antenna performance. However,
its resonating frequency deviated from its originally designed frequency of 5.2 GHz.
So the antenna structure was optimized using PSO. The optimized antenna resonates at 3.5 GHz and 5.2 GHz and shows a marked improvement in bandwidth, as shown in Table 2. The antenna can be used for Wi-Fi and 5G mobile applications.

References

1. Balanis CA (2005) Antenna theory: analysis and design, 3rd edn. Wiley, New York
2. Pozar DM (1992) Microstrip antennas. Proc IEEE 80(1):79–81
3. Maci S, Biffi Gentili G (1997) Dual-frequency patch antennas. IEEE Antennas Propag Maga
39(6):13–20
4. Werner DH, Ganguly S (2003) An overview of fractal antenna engineering research. IEEE
Antennas Propag Maga 45(1)
5. Sindou M, Ablart G, Sourdois C (1999) Multiband and wideband properties of printed fractal
branched antennas. Electron Lett 35:181–182
6. Petko JS, Werner DH (2004) Miniature reconfigurable three-dimensional fractal tree antennas.
IEEE Antennas Propag Maga 52(8):1945–1956
7. Masroor I, Ansari JA, Saroj AK (2020) Inset-fed cantor set fractal multiband antenna design for
wireless applications. International Conference for Emerging Technology (INCET) 2020:1–4
8. Yu Z, Yu J, Ran X (2017) An improved koch snowflake fractal multiband antenna. In: 2017 IEEE
28th Annual international symposium on Personal, Indoor, and Mobile Radio Communications
(PIMRC), 2017, pp 1–5
9. Tiwari R (2019) A multiband fractal antenna for major wireless communication bands. In: 2019
IEEE International Conference on Electrical, Computer and Communication Technologies
(ICECCT), 2019, pp 1–6
10. Guha D, Antar YMM (2011) Microstrip and printed antennas–new trends, techniques and
applications, 1st edn. Wiley, UK
11. Ratilal PP, Krishna MGG, Patnaik A (2015) Design and testing of a compact circularly
polarised microstrip antenna with fractal defected ground structure for L-band applications.
IET Microwaves Antenna Propag 9(11):1179–1185

12. Kakkara S, Ranib S (2013) A novel antenna design with fractal-shaped DGS using PSO for
emergency management. Int J Electron Lett 1(3):108–117
13. Liebig T, Rennings A, Erni D (2012) OpenEMS a free and open source Cartesian and cylin-
drical EC-FDTD simulation platform supporting multi-pole drude/lorentz dispersive material
models for plasmonic nanostructures. In: 8th Workshop on numerical methods for optical
nanostructures
14. Pozar DM (2012) Microwave engineering, 4th edn. Wiley, New York
Artificial Intelligence-Based
Phonocardiogram: Classification Using
Cepstral Features

A. Saritha Haridas, Arun T. Nair, K. S. Haritha, and Kesavan Namboothiri

Abstract When cardiovascular issues arise in a cardiac patient, it is essential to diagnose them as early as possible, since monitoring and treatment are far less difficult then than at an older age. Paediatric cardiologists have a difficult time keeping track of their patients' cardiovascular condition. To address this, a phonocardiogram (PCG) device was created in combination with MATLAB software based on artificial intelligence (AI) for automatically classifying the heart state as normal or pathological. Owing to the safety concerns associated with COVID-19, testing on school-aged children is currently being explored. Using PCG analysis and machine learning methods, the goal of this work is to detect cardiac conditions whilst operating on a limited amount of computing resources. This makes it possible for anybody, including non-medical professionals, to screen for cardiac issues. To put it simply, the existing system consists of a separate portable electronic stethoscope, headphones linked to the stethoscope, a sound-processing computer, and specifically developed software for capturing and analysing heart sounds; however, this is more difficult and time-consuming, and the accuracy is lowered as a result. According to statistical studies, even expert cardiologists only achieve an accuracy of approximately 80%, whereas primary care doctors and medical students usually attain an accuracy of between 20 and 40%. Owing to the nonstationary nature of heart sounds and PCG's superior ability to model and analyse them even in the presence of noise, PCG recordings provide valuable information regarding heart diseases. Spectral characteristics of the PCG are used to characterise heart sounds in order to diagnose cardiac conditions. Prompted by the effectiveness of cepstral features in speech signal classification, we categorise normal and abnormal PCG waves using cepstral coefficients for fast and effective identification. On the basis of their statistical properties, we suggest a new feature set for

A. Saritha Haridas (B)


Electronics and Communication Engineering, KMCT College of Engineering, Kozhikode, Kerala,
India
A. T. Nair · K. Namboothiri
KMCT College of Engineering, Kozhikode, Kerala, India
K. S. Haritha
Government Engineering College, Kannur, India


cepstral coefficients. The PhysioNet PCG training dataset is used in the experiments. The work compares KNN with SVM classifiers, indicating that KNN is more accurate. Furthermore, the results indicate that statistical features derived from PCG Mel-frequency cepstral coefficients outperform both frequently used wavelet-based features and conventional cepstral coefficients, including MFCCs.

Keywords Phonocardiogram · AI · Health care · Cardiovascular disorders

1 Introduction

Worldwide, cardiopulmonary disease is the major cause of mortality. Congenital


heart illness may not exhibit symptoms until later in life, at which time treatment
becomes difficult. As a result, it is important to do research on childhood cardiac
disorders. Automatic detection of heart sound waves (phonocardiogram) through
artificial intelligence algorithms is one straightforward approach in this field. This
configuration enables non-medical persons to do cardiac tests. Centuries ago, physi-
cians utilised auscultation to make cardiovascular diagnoses (CVD). The subsequent
stethoscope was meant to provide a more pleasant auscultation experience for the
patient, and the same device is still widely used in modern medicine to diagnose
cardiovascular illness. Auscultation is a technique used to diagnose CVD that requires
significant training and expertise in recognising abnormal heart sounds. According to
statistical studies, even expert cardiologists achieve only approximately 80% accu-
racy, whereas primary care doctors and medical students usually reach a level of
accuracy of around 20–40%.
The phonocardiogram has been shown to be very successful and risk-free in
detecting heart abnormalities early. Additionally, PCG measuring equipment is
straightforward, simple to use and cost-effective. A homemade electronic phonocardiograph is used to record and analyse the heart sound from the chest wall in order to identify whether the heart activity is normal or pathological. The patients may be referred to a specialist for additional evaluation and treatment based on the results. Due to the immense promise in this field, much research is being conducted to develop an automated method for detecting cardiac problems using the PCG signal. It is possible to categorise different cardiac illnesses based on the nonlinear variations of the heart sound, which form a dataset of physiological data. For automated cardiac auscultation, we selected wavelet-based feature extraction and the support vector machine (SVM) because of their better capacity to model and assess sequential data in the presence of noise. The dataset utilised for training and testing was obtained from the 2016 PhysioNet Computing in Cardiology Challenge [1]; the phonocardiograms were recorded using sensors placed at four typical areas of the human body: the pulmonic area, the aortic area, the mitral area and the tricuspid area. The data were analysed using these phonocardiograms.
It is anticipated that the results of this research will have significant ramifications
for the early identification of cardiac disease in school-age children in India. This
research work summarises our effort’s idea, including the hardware and software
phases, as well as the works that inspired us. We were able to solve the problem of
heart sound analysis by developing a low-cost and effective biotechnological system
for capturing and processing PCG signals.

2 Literature Review

Luisada et al. [2] performed a clinical and graphic study on 500 children
of school age, using the phonocardiogram technique to collect data. Three examiners
conducted clinical auscultation, and a phonocardiographic test revealed 114 (22.8%)
abnormal occurrences. There was no correlation between the heart sound and the
child’s height/weight in the research. Tae H. Joo et al. examined phonocardiograms
(PCGs) of aortic heart valves [3]. They identified frequency domain characteris-
tics by using a parametric signal modelling method that was designed specifically
for sound wave categorization purposes. According to the model, a high-resolution
spectral estimate is provided, from which the frequency domain characteristics may
be deduced. PCGs were classified using two stages of classification: feature selection
first, followed by classification. The classifiers are trained on the locations of the two
maximum spectral peaks.
The classifier successfully identified 17 patients out of a total of 20 cases in
the training set. A method for assessing children for murmur and adults for valve
abnormalities was established by Lukkarinen et al. [4] for situations where ultrasonog-
raphy exams are not readily accessible. The equipment comprises a stand-alone
electronic stethoscope, stethoscope-mounted headphones, a sound-capable personal
computer, and software applications for capturing and analysing heart sounds. The
technology created makes it possible to perform a number of operations and studies
on heartbeats and murmurs in the 20 Hz to 22 kHz frequency range. Emmanuel [5]
highlighted the essential phases involved in the generation and interpretation of
PCG signals. This article discusses how to filter and extract characteristics
from PCG signals using wavelet transformations. Additionally, the authors high-
light the gaps that exist between existing methods of heart sound signal processing
and their clinical application. The essay highlights the limitations of current diag-
nostic methods, namely their complexity and expense. Additionally, it addresses the
requirement for systems capable of correctly obtaining, analysing and interpreting
heart sound data to aid in clinical diagnosis. B. Techniques Artificial Intelligence-
based Shino et al. [6] present a technique for automatically categorising the phono-
cardiogram using an Artificial Neural Network (ANN). A national phonocardiogram
screening of Japanese students was used to validate the method. 44 systolic murmurs,
61 innocent murmurs and 36 normal data are highlighted in the test findings.
The melodic murmur was effectively isolated from the potentially dangerous
systolic murmur via the use of frequency analysis. When it comes to making the
final decision, the procedure is very beneficial for medical professionals. Strunic et al.
[7] created an alternative method for PCG classification by using Artificial Neural

Networks (ANN) as a detector and classifier of heart murmurs, in addition to the


techniques previously described. Heart sounds were categorised into three categories:
normal, aortic stenosis and aortic regurgitation, using both generated and real patient
heart sounds. They were able to categorise with an accuracy of up to 85.74 per
cent. It has been shown that the precision of a team of health students is significantly
associated with the precision of the ANN system when simulated sounds are present.
Using convolutional neural networks, Sinam Singh et al. [8] developed an efficient
technique for identifying PCG. They created two-dimensional scalogram pictures
using a pre-trained AlexNet and sound data from the PhysioNet2016 competition,
in addition to the continuous wavelet transform (CWT). Scalogram images were
utilised in conjunction with deep learning to construct a convolutional neural network.
The proposed approach achieved favourable results by minimising segmentation
complexity.

3 Materials and Methods

Phonocardiography (PCG) is a method for capturing and visualising the sounds
produced by the human heart during a cardiac cycle [9]. It is used to aid in the
diagnosis of heart conditions. The recording is carried out using a device called a
phonocardiograph. Various dynamic processes happening within the circulatory
system, such as the relaxation of the atria and ventricles, valve motion and blood flow
resulted in the production of this sound. When it comes to screening and detecting
heart sounds in healthcare settings, the well-known stethoscope method has long been
the gold standard. Auscultation of the heart is the study of determining the acoustic
properties of heart beats and murmurs, including their frequency, intensity, number
of sounds and murmurs, length of time and quality.
One significant disadvantage of conventional auscultation is that it relies on
subjective judgement on the part of the physician, which may lead to mistakes in
sound perception and interpretation, thus impairing the accuracy of the diagnosis. The
creation of four distinct heart sounds occurs throughout a cardiac cycle. During the
first heartbeat of systole, the first cardiac sound, often abbreviated S1, is generated by
the turbulence induced by the mitral and tricuspid valves closing simultaneously. The
aortic and pulmonic valves close, resulting in the production of the second cardiac
sound, which is represented by the term “dub”. When a stethoscope is put on the
chest, as doctors do, the first and second heart sounds are easily distinguishable in a
healthy heart, as are the third and fourth heart sounds (Fig. 1).
The low-frequency third heart sounds (S3) is usually produced by the ventricular
walls vibrating in response to the abrupt distention caused by the pressure difference
between the ventricles and the atria, which causes the ventricular walls to vibrate. It is
only heard in youngsters and individuals suffering from heart problems or ventricular
dilatation under normal circumstances [10]. S4 is very seldom heard in a normal
heart sound because it is produced by vibrations in expanding ventricles caused by
contracting atria, which makes it difficult to detect. Each of the four heart sounds has a

Fig. 1 A typical phonocardiograph having the S1, S2, S3 and S4 pulses on it

distinct frequency range, with the first (S1) being [50–150] Hz, the second [50–200]
Hz, the third (S3) [50–90] Hz, and the fourth (S4) being [50–80] Hz. Moreover, the
S3 phase starts 120–180 ms after the S2 phase and the S4 phase begins 90 ms before
the S1 phase.

4 Existing System

The well-known stethoscope technique is the usual method of screening and diag-
nosing heart sounds in primary health care settings. Auscultation of the heart is the
study of determining the acoustic properties of heart sounds and murmurs, including
their frequency, intensity, number of sounds and murmurs, length of time and quality.
One significant disadvantage of this technique of auscultation is that it relies on
subjective judgement on the part of the physician, which may lead to mistakes in
sound perception, thus impairing the validity of the diagnostic information. Figure 2
depicts a block diagram representation of a common phonocardiogram configuration.

Fig. 2 Schematic representation of the phonocardiogram setup



It is necessary to detect sound waves using a sensor, which is most often a high-fidelity
microphone. After the observed signal has been processed using a signal conditioner
such as a pre-filter or amplifier, the signal is shown or saved on a personal computer.
The CZN-15E electret microphone, two NE5534P type amplifiers, a transducer
block and connector (3.5 jack) for signal transmission to the computer, as well as a
12 V DC power supply, are the main hardware components of this workstation’s hard-
ware architecture. This particular sensor is an electret microphone, which does not
need an external power source to polarise the voltage since it is self-contained. Using
the CZN15E electret microphone in this system is something that’s being explored
[11, 12]. Table 1 has a listing of the suitable electret microphone (CZN15E) as well
as the microphone’s physical characteristics. In response to the heart’s vibrations,
vibrating air particles are sent to the diaphragm, which then regulates the distance
between the plates as a result of the vibrations transmitted to it. As air passes through
the condenser, the electret material slides over the rear plate, causing a voltage to be
generated. The voltage produced is very low, and it is necessary to provide it to the
amplifier in order for it to operate at its best. The amplifier needed for high-speed
audio should have a low-noise floor and use a minimal amount of electrical power.
A special operational amplifier, the NE5534P, was developed specifically for this
purpose.

5 Proposed System

This project’s hardware component is broken. Only the simulation part, which
consists of a few simple stages, is used, for example, to determine the cepstral proper-
ties. To accurately analyse heart sounds that are non-stationary in nature, the wavelet
transform is the most appropriate technique for the task at hand. A wavelet trans-
form is a representation of data that is based on time and frequency. Using cepstrum
analysis has many advantages. First, the cepstrum is a representation used in homo-
morphic signal processing to convert convolutionally mixed signals (such as a source
and filter) into sums of their cepstra, which can then be used to linearly separate the
signals. The power cepstrum is a feature vector that may be used to model audio
signals and is very helpful. The method to feature extraction is the most impor-
tant component of the pattern recognition process since it is the most accurate. To
measure the features of each cardiac cycle, the complete cycle must be analysed using
cepstral coefficients, which efficiently determine the log-spectral distance between
two frames. In this part, we compare the two models, K-Nearest Neighbour (KNN)
and Support Vector Machines (SVM) [13], to discover which is the better choice for
binary classification problems. KNN shows exceptional accuracy.

Table 1 Review on conventional methods

Author (Citation) | Methodology | Features | Challenges
Tae H. Joo et al. | PCG | Frequency domain features; high-resolution spectral estimate | The denoising method cannot remove noises coming from children crying and moving during the recording
Lukkarinen S et al. | PCG | A system for screening murmurs of children and valve defects of adults; frequency range from 20 Hz to 22 kHz | Mild, moderate and severe murmurs were not graded
Hideaki Shino et al. | ANN | Nationally verified; simulated and recorded patient heart sounds were tested and classified | The sample size is relatively low
S. L. Strunic et al. | ANN | Implemented as a detector and classifier of heart murmurs; classified with up to 85.74% accuracy | The method may not work well if a nonprofessional volunteer records the PCG signal
F. Rios-Gutierrez et al. | Jack-knife method | An iterative process; one sample was left out each time | The predicted value is quantified to 0 (≤0.5) or 1 (>0.5) by a threshold of 0.5
Sinam Ajitkumar Singh et al. | CNN, SVM | Effective way of classifying PCG; more accurate than ANN | The accuracy rates were closely correlated
Iga Grzegorczyk et al. | Hidden Markov model | The segmentation of the PCG signals is performed | The best overall score achieved in the official phase of the PhysioNet challenge is 0.79, with specificity 0.76 and sensitivity 0.81
Mawloud Guermoui et al. | EGG, PCG, SVM | Characteristic features extracted from PCGs | A low-dimensional feature space tested on a relatively big dataset
James H. and Robert S. Lees et al. | Parametric signal modelling method | Frequency domain features suitable for the classification of the valve state can be derived | –

5.1 Cepstral Coefficients

Initially, frequency cepstral coefficients were proposed to aid in the recognition of


monosyllabic syllables in continuously uttered phrases, but not in the identification
of the speaker. It is feasible to simulate the human hearing system artificially by
computing cepstral coefficients. This is based on the assumption that the human ear
is a very accurate speaker recognizer, which is supported by research. It is based on
the well-known disparity between the critical bandwidths of the human ear and the
critical bandwidths of computers that cepstral characteristics are developed. In order
to preserve the phonetically significant features of the speech stream, linearly spaced
frequency filters were used at low frequencies and logarithmically spaced frequency
filters were used at high frequencies, respectively. Most voice transmissions consist
of tones with changing frequencies, each tone having an actual frequency, f
(Hz), and a subjective pitch computed using the Mel scale. The Mel-frequency scale
is approximately linear below 1000 Hz and logarithmic above 1000 Hz. A 1 kHz tone
played at 40 dB above the perceptual hearing threshold is defined to have a pitch of
1000 mels, and this is used as a reference. Using a filter bank,
it is possible to calculate the FCC coefficients of a signal by decomposing it and then
analysing its frequency content. In terms of short-term energy, this yields a discrete cosine
transform (DCT) of the real logarithm of the spectrum on the Mel-frequency scale. To identify their
contents, frequency cepstral coefficients are used to identify the contents of flight
reservations, phone numbers spoken into a phone, and voice recognition systems used
for security. A number of modifications to the basic method have been suggested in
order to boost resilience, including increasing the log-mel-amplitudes to a suitable
power (about 2 or 3) prior to applying the DCT and minimising the impact of the
low-energy parts.
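To make the feature-extraction step concrete, the following is a minimal illustrative sketch in Python using librosa and NumPy (the authors used MATLAB, so this is not their implementation); the file name heart_sound.wav, the number of coefficients and the choice of summary statistics are assumptions.

import numpy as np
import librosa

# Load a PCG recording (the file name is hypothetical); sr=None keeps the native sampling rate.
signal, sr = librosa.load("heart_sound.wav", sr=None)

# Mel-frequency cepstral coefficients: one 13-dimensional vector per analysis frame.
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)

# Summarise each coefficient over the whole recording with simple statistics,
# giving a fixed-length feature vector suitable for KNN or SVM classifiers.
features = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
print(features.shape)   # (26,)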

5.2 KNN Classifier

The K-Nearest Neighbour algorithm is a fundamental component of Machine


Learning. It is founded on the technique of Supervised Learning. To maximise
accuracy, this approach makes an assumption about the similarity between the new
case/data and the existing cases and assigns the new instance to the category that is
most similar to the existing categories. The K-NN method keeps all available data and
classifies new data points according to their similarity to previously classified data.
This means that as new data is generated, it may be rapidly categorised into one of
the suitable categories using the K-NN method. Although the K-NN technique may
be used to solve regression as well as classification problems, it is most commonly
employed to solve classification difficulties. K-NN is a non-parametric technique,
which means that it makes no assumptions regarding the data used as a baseline. It

Fig. 3 Diagram of KNN

is commonly referred to as a lazy learner algorithm since it does not immediately


begin learning from the training set but instead stores it and then performs an action
on it when the time comes to categorise the data. During the training phase, the KNN
algorithm simply saves the dataset and classifies new data into a category that is
highly close to the dataset used for training.
Consider the following fictitious scenario: There are two categories, A and B, and
we have a new data point x1 that we would want to allocate to one of them. This type
of difficulty necessitates the deployment of a K-NN algorithm. We can quickly and
simply discover the category or class of a given dataset by utilising K-NN methods.
Consider the following example (Fig. 3):
The following algorithm can be used to illustrate the operation of the K-NN
network:
Step-1: Choose the number K of neighbours.
Step-2: Compute the Euclidean distance to the neighbouring points.
Step-3: Take the K nearest neighbours based on the computed Euclidean distance.
Step-4: Count the number of data points in each category amongst the K nearest
neighbours.
Step-5: Allocate the new data point to the category with the most neighbours.
Step-6: The KNN model is ready.
Assume, we have a new data point and need to assign it to the appropriate category.
Consider the following illustration (Fig. 4):
• To begin, we will always select k = 5 as the number of nearest neighbours.
• Then, we’ll calculate the Euclidean distance between the points. The Euclidean
distance between two points (x1, y1) and (x2, y2) is calculated in the following
manner: d = √((x2 − x1)² + (y2 − y1)²).
• We determined the nearest neighbours by computing the Euclidean distance, with
three in category A and two in category B. Consider the following example:

Fig. 4 Diagram of
classification 1

• As we can see, the three nearest neighbours all belong to category A, indicating
that this new data point must as well (Fig. 5).
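The steps above can be reproduced with scikit-learn, as in the hedged sketch below; the feature matrix, labels and the choice of k = 5 are illustrative placeholders rather than the configuration used in this work.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(48, 26))      # placeholder cepstral-based feature vectors
y = rng.integers(0, 2, size=48)    # placeholder labels: 0 = normal, 1 = abnormal

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# k = 5 nearest neighbours with the Euclidean distance, as described in the steps above.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print("KNN accuracy:", accuracy_score(y_test, knn.predict(X_test)))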

5.3 SVM Algorithm

When dealing with Classification and Regression issues that require Supervised
Learning, the Support Vector Machine, or SVM, is a frequently used method. In
Machine Learning, on the other hand, it is mostly utilised to tackle categorization
issues. The goal of the SVM method is to find the optimal line or decision boundary

Fig. 5 Diagram of classification 2



Fig. 6 Diagram of
classification 1

that divides n-dimensional space into classes, enabling future data points to be cate-
gorised with ease. A hyperplane is the mathematical term for this optimal choice
boundary.
The SVM algorithm determines the extreme points/vectors that define the hyperplane. These
extreme cases are referred to as support vectors, and the technique is therefore called a
Support Vector Machine. Consider the diagram below, which illustrates two distinct
categories divided by a decision boundary or hyperplane (Fig. 6).
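For comparison with KNN, a minimal scikit-learn sketch of an SVM classifier on the same kind of placeholder features is shown below; the RBF kernel and regularisation constant are assumptions, since the exact SVM settings are not specified here.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(48, 26))      # placeholder feature vectors
y = rng.integers(0, 2, size=48)    # placeholder labels: 0 = normal, 1 = abnormal

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The SVM searches for the maximum-margin hyperplane separating the two classes.
svm = SVC(kernel="rbf", C=1.0)
svm.fit(X_train, y_train)
print("SVM accuracy:", accuracy_score(y_test, svm.predict(X_test)))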

6 Experimental Setup

(Flow diagram of the experimental setup, showing the training pipeline and the testing pipeline.)

6.1 Hardware Requirements

A software requirements specification (SRS) is a comprehensive description of the


software system that will be developed, including both functional and nonfunctional
requirements. It is used in the development of software systems. The SRS is created
in accordance with the agreement reached between the client and the contractors.
Depending on the software system, it may include demonstrations of how the user
will interact with it. There are no requirements that cannot be met by the document
providing the software requirement specification. To build a software system, we
must first have a comprehensive understanding of the system under consideration. In
order to guarantee that these requirements are fulfilled, continuous communication
with consumers is required. When written well, a software system interaction speci-
fication (SRS) defines how a software system will interact with all internal modules,
hardware, other programmes and humans under a wide range of real-world circum-
stances. It is critical for testers to comprehend all of the information given in this
article in order to prevent making errors in test cases and the desired results of those
tests. It is highly suggested that SRS papers be thoroughly examined or tested before
developing test cases or developing a testing strategy. The importance of MATLAB
is due to its capabilities for matrix computation. Today, we require an environment in
which computation, modelling and visual representations involving numbers can be
examined. As a result, we require a fourth-generation language that allows high-level
programming. MATLAB was created by MathWorks. MATLAB enables the
manipulation of matrices; computation; plotting of data and functions; algorithm
development; and user interface design. It enables the integration of programs written
in other languages, including FORTRAN, C++, Java and C, and it also enables the
analysis of data and the creation of custom applications and models. It provides a large
number of built-in commands and mathematical functions that support scientific
projects, plot generation and practical numerical methods. It is a very useful tool for
numerical computations (Table 2).
There are several critical features of MATLAB:
• It is used for numerical computation, application development and analysis.
• It provides an interactive environment for problem solving, design and detailed
exploration.
• Its library includes mathematical functions for statistics, differentiation, numerical
integration, linear algebra, ordinary differential equations and optimisation, along
with built-in tools for graphical data visualisation and custom plots.

Table 2 Hardware requirement

Processor | PC with a Core i3 processor (Recommended)
RAM | 4 GB (Recommended)
Hard disk | 320 GB (Recommended)

Table 3 Common MATLAB commands

Command | Purpose
clc | Clears the command window
clear | Removes variables from memory
exist | Checks for existence of a file or variable
global | Declares variables to be global
help | Searches for a help topic
lookfor | Searches help entries for a keyword
quit | Stops MATLAB
who | Lists current variables
whos | Lists current variables (long display)

• It is a tremendously powerful tool for improving the quality of code and enhancing
the presentation of the interface; it offers facilities for building graphical user
interfaces.
• Additionally, it offers tools for connecting non-MATLAB programs such as
Microsoft Excel, .Net, Java and C with MATLAB computations.
MATLAB provides an interactive environment that enables users to manipulate numbers
and view information. Using the prompt “>>” in the command window, commands can
be entered directly. Users commonly employ a few basic commands.
A table listing such commands is given below (Table 3).
Numeric vectors are one-dimensional arrays. In MATLAB, vectors are divided into two
types. Row vectors: the elements are enclosed in square brackets and separated by
commas or spaces. Column vectors: the elements are enclosed in square brackets and
separated by semicolons.
Plotting
To create a graph in MATLAB, we must follow these steps:
1. Define the range of the x variable and the function f(x) to be plotted.
2. Similarly, the function y is defined.
3. Use the plot(x, y) command to draw the MATLAB graph.
Additional modifications to this design include adding a title, labelling the x- and y-
axes, adding grid lines to the plot area, and adjusting the graph's axes.

7 Experimental Results

7.1 Training

Preprocess the dataset prior to initiating the training method. Through the use of
randomised augmentation, it augments the training dataset. Additionally, augmenting
enables the training of networks to be insensitive to picture data abnormalities.
Resizing and grayscale conversion are included in the pre-processing. Individuals
were classified into two groups in this section: “Normal” and “Abnormal”. One
can determine the progress of training by tracking several factors. When the “Plots”
option in trainingOptions is set to “training-progress” and the network is trained,
trainNetwork generates a figure and shows training metrics for each iteration. Each cycle
determines the gradient and adjusts the parameters of the network. If the training options
include validation data, the figure displays validation metrics each time trainNetwork
validates the network. It provides information on training accuracy,
validation accuracy and train loss (Fig. 7).
Confusion matrix and receiver operating characteristic curves illustrate the
system’s performance. Confusion matrix constructs a Confusion Matrix Chart object
from a confusion matrix chart that includes both true and anticipated labels. The
confusion matrix’s rows correspond to the real class, and its columns to the predicted
class. Diagonal and off-diagonal cells represent properly categorised observations
and erroneously classified observations, respectively (Fig. 8).

Fig. 7 Signal analyser



Fig. 8 Confusion matrix

In this binary classification, the first green cell denotes the positives (abnormal),
whereas the second green cell denotes the negatives (normal). We picked
a total of 19 abnormal signals and 29 normal signals for testing. For the abnormal signals,
the true positives (TP) are 19 (green in the confusion matrix), while the false negatives (FN)
are zero (pink colour). That is, all 19 abnormal signals in this case are correctly
predicted as abnormal; since the false negatives are zero, the sensitivity for the
abnormal class is one hundred per cent. For the 29 normal signals, five false positives
and twenty-four true negatives are displayed. That is, 5 normal signals are incorrectly
classified as abnormal, but 24 normal signals are correctly classified as normal. This
equates to an accuracy of 82.8% for the normal class. Overall, the accuracy is
approximately 90% (see Table 4).
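The per-class figures quoted above can be reproduced from the stated counts. The short scikit-learn sketch below builds label vectors matching the description (19 abnormal, all predicted correctly; 29 normal, 5 misclassified) and is purely illustrative.

import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# 1 = abnormal (positive), 0 = normal (negative), following the description above.
y_true = np.array([1] * 19 + [0] * 29)
# All 19 abnormal signals predicted correctly; 5 of the 29 normal signals predicted as abnormal.
y_pred = np.array([1] * 19 + [1] * 5 + [0] * 24)

print(confusion_matrix(y_true, y_pred))               # rows: true class, columns: predicted class
print("normal-class accuracy:", 24 / 29)              # about 0.828
print("overall accuracy:", accuracy_score(y_true, y_pred))   # about 0.896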
The receiver operating characteristic curve (ROC curve) is a graph that illustrates the perfor-
mance of a classification model over all classification thresholds. The true positive
rate (y-axis) and the false positive rate (x-axis) are shown on this curve. The
term “True Positive Rate” is another name for “recall”.
It is defined as follows:

TPR = TP / (TP + FN)

False Positive Rate (FPR) is defined as follows:

FPR = FP / (FP + TN)

The receiver operating characteristic (ROC) curve depicts the connection between
TPR and FPR over a range of classification criteria. Reduce the threshold for positive
classification, and more items are labelled as positive, increasing both False Positives
and True Positives. A typical receiver operating characteristic (ROC) curve is seen in
the accompanying figure. To compute the points on a ROC curve, we might analyse
a logistic regression model several times with varied classification criteria. However,
this would be wasteful. Fortunately, there is a quick, sorting-based approach known
as AUC that can provide this information. In this situation, the curve formed is
nonlinear (Fig. 9).
AUC is an abbreviation for the Area Under the Receiver Operating Characteristic
Curve (“Area Under the ROC Curve”). That is, AUC measures the whole two-dimensional
area beneath the entire receiver operating characteristic curve (consider integral
calculus) from (0,0) to (1,1).
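A hedged scikit-learn sketch of how the ROC curve and AUC can be obtained from a classifier's scores is shown below; the label and score vectors here are random placeholders, not the actual classifier output from this work.

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=48)   # placeholder ground-truth labels
y_score = rng.random(48)               # placeholder predicted probabilities for the positive class

# Sweep the classification threshold and collect (FPR, TPR) pairs for the ROC curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# AUC: the area under the ROC curve, computed from the sorted scores.
print("AUC:", roc_auc_score(y_true, y_score))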
Dataset Testing Results:
Abnormal Case
See Fig. 10.
Normal Case
See Fig. 11 and Table 4.

Fig. 9 ROC Curve



Fig. 10 PCG Classification result (Abnormal)

Fig. 11 PCG Classification result (Normal)



Table 4 Features of the ROC curve

Metric | Value
Accuracy | 0.899
Sensitivity | 1
Specificity | 0.79167
Precision | 0.82759
Recall | 1
F-measure | 0.90566
G-mean | 0.88976

8 Future Scope

Similarly, electrocardiogram (ECG) data should be analysed using the artificial intel-
ligence (AI) approach. Rather than just classifying cardiovascular diseases as normal
or abnormal, this future endeavour will assign them specific names. Finally, it is
recommended that the PCG and ECG techniques be integrated and applied to heart
disease diagnostics in order to enhance the prediction of coronary artery disease.

9 Conclusion

This research is divided into three stages: the first includes gathering phonocardiogram
data; the second involves creating an artificial intelligence-based computer system for
automatically distinguishing normal from pathological heart sounds; and the third is
cardiovascular disease screening of a limited group of PCG recordings as part of social
responsibility. The procedure has been finished in its entirety. After acquiring the PCG
signal, features were
extracted using the cepstral coefficient and classification conducted using the KNN
and Support Vector Machine (SVM) techniques. KNN is shown to be the better choice.
We used PCG data from the well-known PhysioNet online service to conduct
training and testing. The training procedure is significantly faster than previous
feature extraction techniques.

References

1. Moody B, Li-wei H, Johnson I (2016) Classification of normal/abnormal heart sound record-


ings: the PhysioNet/computing in cardiology challenge 2016. In: International conference on
computing in cardiology (CinC), Vancouver, BC, Canada
2. Luisada A, Haring OM, Aravanis C (1958) Murmurs in children: a clinical and graphic study
in 500 children of school age. Brit Heart J 48:597–615
3. Joo TH, James H, Lees RS (1983) Pole-zero modeling and classification of phonocardiograms.
IEEE Trans Biomed Eng BME-30:110–118

4. Lukkarinen S, Noponen AL, Skio K (1997) A new phonocardiographic recording system. J


Comput Cardiol 24:117–120
5. Emmanuel BS (2012) A review of signal processing techniques for heart sound analysis in
clinical diagnosis. J Med Eng Technol 36:303–307
6. Shino H, Yoshida H, Sudoh J (1996) Detection and classification of systolic Murmur for
phonocardiogram screening. In: Proceedings of the 18th Annual international conference of
the IEEE Engineering in Medicine and Biology Society, Amsterdam, pp 123–125
7. Strunic SL, Rios-Gutierrez F, Alba-Flores R (2007) Detection and classification of cardiac
Murmurs using segmentation techniques and artificial neural networks. In: IEEE symposium
on computational intelligence and data mining, Honolulu, HI, USA
8. Singh SA, Majumder S, Mishra M (2019) Classification of short un-segmented heart sound
based on deep learning. In: IEEE international instrumentation and measurement technology
conference, Auckland, New Zealand 2019.
9. Nair AT, Muthuvel K (2021) Automated screening of diabetic retinopathy with optimized deep
convolutional neural network: enhanced moth flame model. J Mech Med Biol 21(1):2150005
(29 p). https://doi.org/10.1142/S0219519421500056
10. Nair AT, Muthuvel K (2020) Blood vessel segmentation and diabetic retinopathy recognition:
an intelligent approach. In: Computer methods in biomechanics and biomedical engineering:
imaging & visualization. https://doi.org/10.1080/21681163.2019.1647459
11. Nair AT, Muthuvel K, Haritha KS (2020) Effectual evaluation on diabetic retinopathy.
Publication in Lecture Notes, Springer, Berlin
12. Nair AT, Muthuvel K, Haritha KS (2021) Blood vessel segmentation for diabetic retinopathy.
In: Publication in the IOP: Journal of Physics Conference Series (JPCS), Web of Science
13. Nair AT, Muthuvel K (2020) Research contributions with algorithmic comparison on the diag-
nosis of diabetic retinopathy. Int J Image Graphics 20(4):2050030(29 p). https://doi.org/10.
1142/S0219467820500308
14. Guermoui M, Mekhalfi ML, Ferroudji K (2013) Heart sounds analysis using wavelets responses
and support vector machines. In: 8th International Workshop on Systems, Signal Processing
and their Applications (WoSSPA)
15. Grzegorczyk I, Perka A, Rymko J (2016) PCG classification using a neural network approach.
In: Computing in cardiology conference, Vancouver, BC, Canada
16. Abbas AK, Bassam R (2009) Phonocardiography signal processing: IEEE synthesis lectures
on biomedical engineering. Aachen University of Applied Science
Severity Classification of Diabetic
Retinopathy Using Customized CNN

Shital N. Firke and Ranjan Bala Jain

Abstract Diabetic retinopathy is a condition that impacts the eyes as a result of
diabetes. The problem is caused by damage to the blood vessels in the light-sensitive
tissue of the eyeball. It is becoming extremely crucial to diagnose it early in order
to save many lives. In this work, patients with diabetic retinopathy are identified
and the severity of the disease is classified. A convolutional neural network has been
developed using the K-Fold cross-validation technique to make the above diagnosis
and to give highly accurate results. The image is passed through convolution and
max-pooling layers activated with the ReLU function before being categorized. The
softmax function is then utilized to complete the process by activating the neurons in
the dense layers. As the system learns, the accuracy improves and, at the same time,
the loss is reduced. Image augmentation is used before training the algorithm to
reduce overfitting. The proposed convolutional neural network gave a total validation
accuracy of 89.14%, recall of 82%, precision of 83%, and F1-Score of 81%.

Keywords Convolutional neural network · Cross validation · Diabetic
retinopathy · Deep learning · K-Fold

1 Introduction

Diabetes is a long-term condition in which blood sugar levels rise due to a lack of
insulin [1]. It impacts 425 million adults globally. Diabetes influences the retina,
nerves, heart, and kidneys [2].
Diabetic retinopathy (DR) is the common cause of eyesight loss. DR will impact
191 million individuals worldwide by 2030 [3]. It happens when diabetes harms

S. N. Firke (B) · R. B. Jain


Electronics and Telecommunication Engineering, Vivekanand Education Society’s Institute of
Technology, University of Mumbai, Mumbai, India
e-mail: 2018.shital.firke@ves.ac.in
R. B. Jain
e-mail: ranjanbala.jain@ves.ac.in


Fig. 1 Steps of DR

the blood vessels of the retina. It can cause blind spots and blurred vision. It can
impact a person with diabetes, and it often affects both eyes.
DR is a progressive procedure, and thus the medical experts recommend that
people with diabetes should be examined at least twice a year for indications of the
sickness [4]. There are four steps of DR such as Mild Nonproliferative DR (Mild
NPDR), Moderate Nonproliferative DR (Moderate NPDR), Severe Nonproliferative
DR (Severe NPDR), and Proliferative DR (PDR). Figure 1 shows steps of DR.
In past times, researchers have worked on improving the efficiency of DR
screening, which detects lesions such as microaneurysms, hemorrhages, and
exudates, and have established a range of models to do so. All the strategies presented
thus far are established on DR prediction and extraction of features.
Convolutional Neural Networks (CNN) or Deep CNN (DCNN) have been
frequently utilized to extract and classify information from fundus images. Recently,
Deep Learning has been widely used in DR detection and classification. It can
successfully learn the features of input data even when many heterogeneous sources
are integrated [5].
This paper’s main contribution is the establishment of an automatic DR detection
technique that depends on a short dataset. With the goal of achieving end-to-end
real-time classification from input images to patient conditions, we are working
on classifying fundus imagery based on the severity of DR. To extract multiple
significant features and then classify them into their corresponding categories, image
data preprocessing approaches are applied. We assess the model’s Accuracy, Recall,
Precision, and F1-Score.

The rest of the article is summarized as below: Section 2 discusses some relevant
research. We explain our proposed methods in Sect. 3. Section 4 outlines the exper-
imental findings and assesses the system’s performance. The conclusion is provided
in the end section.

2 Related Work

In the developing world, DR is the major source of disability. Depending on their


research area and field of research, a variety of work has been performed in the field
of DR. Here are some citations relevant to our paper.
Chandrakumar and Kathirvel [6] have proposed a method for classifying DR using
deep learning techniques. The authors used Kaggle, DRIVE, and STARE datasets
in this work. Various image preprocessing and augmentation methods are employed
here. Finally, they achieved accuracy ranging from 94 to 96%.
Chen et al. [7] developed a neural network-based technique for identifying DR
and categorized fundus images into 5 stages. They used APTOS blindness detection
datasets that were freely accessible to the public. Finally, the authors obtained an
accuracy rate of 80% and a kappa rate of 0.64.
Lands et al. [8] have considered the pre-trained ResNet and DenseNet architecture
to differentiate DR into 23,302 images of the APTOS database. Images are divided
into 256 × 256, and Adam was used as an optimizer. Image enlargement has been
used to overcome the problem of overfitting. The authors compared three pre-trained
structures (ResNet 50, DenseNet 121, and DenseNet 169) for multi-stage layouts.
The best build was DenseNet 169 with an average accuracy of 95%.
Xiaoliang et al. [9] have used transfer learning of the Alexnet, VGG16, and Incep-
tion v3 to train a CNN. The authors drew inspiration for their work from the Kaggle
dataset, which included 166 images. The images for each architecture were cropped to
227 × 227 for AlexNet, 224 × 224 for VGG16, and 299 × 299 for InceptionV3. They
used the network’s efficiency as the evaluation parameter, and they cross-validated
it using K-Fold validation with K = 5. The best-reported accuracy was 63.2% for
the InceptionV3 architecture.
Shaban et al. [10] detected the DR stages of the Kaggle dataset using a CNN
architecture. The images were resized to a standard size of 224 × 224 before being
delivered to the CNN. The authors used SGD as an optimizer with a 0.001 learning
rate. They obtained a training accuracy of 91% for fivefold and 92% for tenfold.
Many techniques for identifying and classifying DR stages have been used or
proposed in the literature, but there are some drawbacks. Most studies ignored prepro-
cessing steps, although noise and low contrast affect the categorization accuracy.
Some studies proposed diagnosis of DR grades, but these models were conservative
and were not applicable in real settings because of limited and imbalanced datasets;
besides, they tend to overfit. In previous works, the authors used lengthy
models, so they required large computation power and time to make the model

more realistic. For that purpose, the customized CNN model with the K-Fold cross-
validation (K-Fold CV) technique is developed here. CNNs are simpler to train and
require many fewer parameters than fully connected networks with the same number
of hidden units. The K-Fold CV method can balance out the predicted classes if
one is dealing with an unbalanced dataset. This prevents the proposed model from
overfitting the training dataset.

3 Proposed Work

The deep learning CNN strategy is one of the most important and adaptive strate-
gies in detection activities. It works especially well in image data classification, partic-
ularly in the study of retinal fundus images. CNN can extract useful information from
images, obviating the need for time-consuming human image processing. Its huge
popularity is due to architecture, which eliminates the necessity for feature extrac-
tion. These features of CNN motivate us to use customized CNN models for our
work. Figure 2 shows the general workflow of the proposed system.
It consists of four main steps: preprocessing, data augmentation, model training,
and testing. The preprocessing step includes image resizing, image normalizing, and
label encoder. To reduce overfitting issues, the image augmentation step is used.
The model is trained using the CNN with the K-Fold cross-validation technique.
Finally, measures such as Precision, Recall, Accuracy, and F1-score are calculated
to evaluate the findings and compare them to commonly used methodologies.

3.1 Dataset

The collection of data is an important aspect of the experiments for the proposed
technique’s analysis. The dataset must be chosen carefully since it must contain a
diverse collection of images. For this work, we have obtained a dataset from APTOS

Fig. 2 Proposed system



Fig. 3 Number of images in each class

blindness detection 2019 [11]. We used a publicly accessible DR detection dataset of


fundus images in Kaggle, which has 3699 images taken under a number of imaging
circumstances.
The images are divided into five folders, each corresponding to a different class.
The folders are numbered from 0 to 4, where 0 indicating No DR, 1 indicating mild,
2 indicating moderate, 3 indicating severe, and 4 indicating proliferative DR. There
are 1974 images labeled as No DR, 315 images labeled as Mild, 959 images labeled
as Moderate, 168 images labeled as Severe, and 283 images labeled as Proliferative.
Figure 3 shows the number of images in each class.

3.2 Preprocessing

The fundus photographs used in this method are collected from Kaggle in various
forms. So it is necessary to apply preprocessing stages. Here, we apply various
image preprocessing stages such as Image Resizing, Normalizing Image, and Label
Encoder.

3.2.1 Image Resizing

Images are resized to 128 × 128 pixels to be made ready as input to the system.

3.2.2 Image Normalizing

The technique of adjusting a collection of pixel values to make an image more visible
or perceptually standard is known as image normalization. It is used to reduce
noise in images. By dividing by 255, the pixel values are rescaled to the range 0 to 1.

3.2.3 Label Encoder

The label encoder substitutes a numeric value ranging from zero to the number of
classes minus one for each categorical label; since there are five distinct classes here,
the encoded labels are 0, 1, 2, 3 and 4.
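A minimal Python sketch of these three preprocessing steps (resizing to 128 × 128, rescaling to 0–1 and label encoding) is shown below; the file handling and example class names are assumptions for illustration only.

import numpy as np
from PIL import Image
from sklearn.preprocessing import LabelEncoder

def preprocess_image(path):
    # Resize the fundus image to 128 x 128 and rescale pixel values to the 0-1 range.
    img = Image.open(path).convert("RGB").resize((128, 128))
    return np.asarray(img, dtype=np.float32) / 255.0

# Example per-image class names; LabelEncoder maps them to integers 0..4
# (note that it assigns the integers in alphabetical order of the names).
names = ["No DR", "Mild", "Moderate", "Severe", "Proliferative", "No DR"]
y = LabelEncoder().fit_transform(names)
print(y)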

3.3 Dataset Splitting

Here, the dataset is categorized into two phases: a training phase and a testing phase.
80% of data is adopted for training and 20% of data is used for testing. The training
phase comprises a known output, and the model learns from it in order to obtain new
data in future. To evaluate our model’s prediction on this subset, we have the testing
dataset. The full dataset consists of 3699 fundus images [11], which are divided into
2959 training, and 740 testing images.
In the training set, a total of 2959 images of which 1590 images are labeled as
NO DR, 259 images are labeled as Mild, 751 images are labeled as Moderate, 132
images are labeled as Severe and 227 images are labeled as Proliferative.
In the testing set, a total of 740 images of which 384 images are labeled as NO
DR, 56 images are labeled as Mild, 208 images are labeled as Moderate, 36 images
are labeled as Severe and 56 images are labeled as Proliferative. Figure 4 shows the
number of images in each class for training and testing sets.
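The 80/20 split described above can be reproduced with scikit-learn as sketched below; the placeholder arrays and the stratified option (so that every DR grade appears in both subsets) are assumptions.

import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((100, 128, 128, 3), dtype=np.float32)   # placeholder preprocessed images
y = rng.integers(0, 5, size=100)                       # placeholder DR grades (0-4)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
print(X_train.shape[0], "training images,", X_test.shape[0], "testing images")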

3.4 Data Augmentation

CNN (deep learning) models are used to split the DR images. The efficiency of the
algorithms can be improved by data augmentation. Data augmentation is applied to
training data, so as to improve its quality, size, and adeptness. The parameters used
in data augmentation are shown in Table 1.

Fig. 4 Number of images in each class for training and testing sets

Table 1 Data augmentation techniques used

Technique | Setting
Zoom | 0.2
Rotation | 50
Width shift range | 0.2
Horizontal flip | True
Fill mode | Nearest
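Assuming the Keras ImageDataGenerator is used (the paper uses Keras with a TensorFlow back-end), the settings in Table 1 translate roughly as in the illustrative sketch below; this is not necessarily the authors' exact code.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings taken from Table 1; applied to the training images only.
train_datagen = ImageDataGenerator(
    zoom_range=0.2,
    rotation_range=50,
    width_shift_range=0.2,
    horizontal_flip=True,
    fill_mode="nearest",
)

# X_train / y_train are the preprocessed training images and integer labels (assumed to exist):
# train_generator = train_datagen.flow(X_train, y_train, batch_size=32)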

3.5 K-Fold CV

To partition the training dataset, the K-Fold CV resampling method is used. This aids
in evaluating the CNN model's capability. Here, the training dataset is divided
into 5 independent folds, with K−1 (i.e. four) folds utilized to train the algorithm
and the remaining fold saved for validation. This procedure is repeated until each of the
folds has been used exactly once as a validation set. 5-Fold CV is depicted in Fig. 5.
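A minimal scikit-learn sketch of this 5-fold split is given below; the data are placeholders and only the fold indices are shown.

import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)   # placeholder training samples
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

# Each iteration uses 4 folds for training and the remaining fold for validation.
for fold, (train_idx, val_idx) in enumerate(kfold.split(X), start=1):
    print(f"fold {fold}: train on {len(train_idx)} samples, validate on {len(val_idx)} samples")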

3.6 CNN Classification

The CNN algorithm is a well-known and commonly utilized deep learning method [12]. It is


among the very well-known methods for recognizing and categorizing images. CNN
requires less preprocessing when contrasted to other classification methods. CNN
architectural design is shown in Fig. 6.

Fig. 5 Process of 5-fold cross validation

Fig. 6 Architecture of CNN

3.6.1 Convolutional Layer

The first and most significant layer in a CNN is the convolution layer. It is often called
the feature extraction layer because it is where the image's features are extracted. The
image is supplied to the filters in the convolution. Convolution combines two functions
to produce a third function. A convolved feature is a matrix created by applying a
kernel to an image and computing the convolution operation.
In the example illustrated in Fig. 7, there is a 5 × 5 input image with pixel values
of 0 or 1. A 3 × 3 filter matrix is also given. The filter matrix slides over the image
and computes the dot product to produce the convolved feature matrix.
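The sliding-window computation of Fig. 7 can be written out directly, as in the NumPy sketch below; the 5 × 5 pixel values and the 3 × 3 kernel are illustrative placeholders, since the exact values of the figure are not reproduced here.

import numpy as np

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])    # illustrative 5 x 5 binary image
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])         # illustrative 3 x 3 filter

# Slide the kernel over the image and take the element-wise product-sum at each position
# (this is the cross-correlation actually computed by convolutional layers).
out = np.zeros((3, 3), dtype=int)
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel)
print(out)                             # the 3 x 3 convolved feature map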

3.6.2 Pooling Layer

A pooling layer is utilized to reduce the spatial dimensions of the input volume; the
depth of the input is not reduced by this layer. It is often used to reduce
the image's dimensionality, minimizing the processing power needed to process it.

Fig. 7 Example of convolution layer

Fig. 8 Example of pooling


layer

In max pooling, the highest value present in a specified kernel window is kept, while
all other values are discarded. In average pooling, the mean of all of the
values in the kernel window is stored.
In the example, the max-pooling operation is used on a 4 × 4 input, which is
divided into separate 2 × 2 regions. The output is 2 × 2, and each output value is
simply the maximum value from the corresponding region. In average pooling, each
output value is instead the average of the values in the corresponding region. Figure 8
illustrates the pooling operation.
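The 4 × 4 to 2 × 2 pooling step can be sketched the same way; the input values below are illustrative, not those of Fig. 8.

import numpy as np

x = np.array([[1, 3, 2, 9],
              [5, 6, 1, 7],
              [4, 2, 8, 6],
              [3, 1, 2, 4]], dtype=float)   # illustrative 4 x 4 input

max_pool = np.zeros((2, 2))
avg_pool = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        window = x[2 * i:2 * i + 2, 2 * j:2 * j + 2]   # non-overlapping 2 x 2 region
        max_pool[i, j] = window.max()                  # max pooling keeps the largest value
        avg_pool[i, j] = window.mean()                 # average pooling keeps the mean
print(max_pool)
print(avg_pool)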

3.6.3 Fully Connected (FC) Layer

The final output of the pooling and convolution layers are flat and coupled to one
or more fully linked layers. It is also termed the Dense layer. When the layers are
fully integrated, all neurons in the preceding layer are coupled with all neurons in

Fig. 9 Example of fully connected layer

the subsequent layer. Layers that are fully connected, often known as linear layers
[13]. Figure 9 depicts the FC Layer.

4 Results and Discussion

The result obtained from the proposed method is compared with the existing
method. The experiments are performed on Jupyter Notebook. Keras is used with
the TensorFlow machine learning back-end library.
In the deep learning process, optimizers are crucial. Here, Adam Optimizer is
utilized. It is efficient and takes minimal memory. In this work, the learning rate
and the number of epochs or iterations are set to be 0.0001 and 20, respectively.
The batch size is considered as 32 images per epoch. With all these parameters, the
Neural Network has been designed to learn. The algorithm measures performance on
different parameters like Accuracy, Confusion Matrix, Precision, Recall, F1 Score.

4.1 Hardware and Software Requirement

For contrast adjustment, color balance adjustment, rotation, or cropping, the image
editing tool is employed. The NumPy package is used for image resizing during the

preprocessing stage. Theano library is used to implement CNN architecture. The


hardware Intel (R) Core (TM) i5-8265U CPU @ 1.60 GHz 8 GB RAM is used.

4.2 Performance Evaluation

Various evaluation measures are used to assess the quality of a model. We evaluated
our proposed model using Accuracy, Precision, Recall, and F1 Score, where TP =
True Positive, TN = True Negative, FP = False Positive, FN = False Negative.

4.2.1 Accuracy

To calculate accuracy, positive and negative classes are utilized.

Accuracy = (TP + TN) / (TP + FP + FN + TN)    (1)

4.2.2 Precision

It is expressed as a percentage of accurately predicted positive findings to total


expected positive findings [14].

Precision = TP / (TP + FP)    (2)

4.2.3 Recall (Sensitivity)

The percentage of accurately predicted positive findings to all findings in the actual
class is computed as a recall [15].

Recall = TP / (TP + FN)    (3)

4.2.4 F1-Score

It is the harmonic mean of precision and recall [15].



F1-Score = 2 × (Precision × Recall) / (Precision + Recall)
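These four measures can be computed directly with scikit-learn, as in the hedged sketch below; the label vectors are random placeholders, not the model's actual predictions.

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

rng = np.random.default_rng(0)
y_true = rng.integers(0, 5, size=740)   # placeholder ground-truth DR grades (0-4)
y_pred = rng.integers(0, 5, size=740)   # placeholder model predictions

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average=None, labels=[0, 1, 2, 3, 4])
print("accuracy:", accuracy_score(y_true, y_pred))
print("per-class precision:", precision.round(2))
print("per-class recall:   ", recall.round(2))
print("per-class F1-score: ", f1.round(2))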

4.3 CNN Architecture

The CNN architecture implementation is described in Table 2. This architecture


consists of 2 convolutional layers, 3 fully connected layers along with Max pool
layers. This architecture is trained with the APTOS 2019 Blindness Detection dataset
images.
Table 3 depicts the results obtained from the Confusion Matrix. In this matrix, all
diagonal values show the number of correct predictions out of available images, for
each class.
Table 4 shows the performance evaluation reports.

Table 2 CNN architecture


Layer number Type Maps and neurons Kernel size
0 Input 3 × 128 × 128 –
1 Conv1 32 × 126 × 126 3×3
2 ReLu 32 × 126 × 126 –
3 Pool1 32 × 63 × 63 2×2
4 Conv2 64 × 61 × 61 3×3
5 ReLu 64 × 61 × 61 –
6 Pool2 64 × 30 × 30 2 × 2
7 FC3 256 –
8 FC4 128 –
9 Softmax 5 –
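The layer shapes in Table 2 correspond to the Keras/TensorFlow sketch below; the loss function and ReLU/softmax activations are assumptions based on the text, while the Adam optimizer with a learning rate of 0.0001 comes from the training settings described above.

from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 3)),  # Conv1 -> 32 x 126 x 126
    layers.MaxPooling2D((2, 2)),                                              # Pool1 -> 32 x 63 x 63
    layers.Conv2D(64, (3, 3), activation="relu"),                             # Conv2 -> 64 x 61 x 61
    layers.MaxPooling2D((2, 2)),                                              # Pool2 -> 64 x 30 x 30
    layers.Flatten(),
    layers.Dense(256, activation="relu"),                                     # FC3
    layers.Dense(128, activation="relu"),                                     # FC4
    layers.Dense(5, activation="softmax"),                                    # one output per DR grade
])

model.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",   # integer-encoded labels (0-4), assumed
              metrics=["accuracy"])
model.summary()
# model.fit(...) would then be called for 20 epochs with a batch size of 32, as stated in the text.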

Table 3 Confusion matrix results (rows: actual class, columns: predicted class)

Actual \ Predicted | Class 0 | Class 1 | Class 2 | Class 3 | Class 4
Class 0 | 366 | 4 | 13 | 0 | 1
Class 1 | 6 | 40 | 5 | 1 | 4
Class 2 | 27 | 26 | 151 | 1 | 3
Class 3 | 3 | 3 | 12 | 18 | 0
Class 4 | 7 | 10 | 10 | 0 | 29

Table 4 Performance evaluation report by stage

Stage | Precision | Recall | F1-Score
Class 0 | 0.89 | 0.95 | 0.92
Class 1 | 0.48 | 0.71 | 0.58
Class 2 | 0.79 | 0.73 | 0.76
Class 3 | 0.90 | 0.50 | 0.64
Class 4 | 0.78 | 0.52 | 0.62

Table 5 Comparison with recent works

Authors | Method | Number of classes | Training accuracy | Average CV accuracy (%)
Xiaoliang et al. [9] | AlexNet | 5 | – | 37.43
Xiaoliang et al. [9] | VGG 16 | 5 | – | 50.03
Xiaoliang et al. [9] | InceptionNet V3 | 5 | – | 63.23
Shaban et al. [10] | Modified VGG 19 | 3 | 91% | 88
Proposed method | CNN | 5 | 93.39% | 89.14

4.4 Comparison with Previous Works

The findings of the developed CNN for the APTOS dataset were compared to two
recent studies that employed the same K-Fold CV Method. Table 5 shows a contrast of
the performance measures. The constructed CNN model generated the highest perfor-
mance metrics for detecting the five stages of DR, as depicted in this table. To make
the comparison, we looked at four separate indicators:
(i) Method, (ii) Number of classes, (iii) Training Accuracy, and
(iv) Average CV Accuracy. In comparison with existing methodologies, the proposed
customized CNN strategy gives 93.39% training accuracy and 89.14% average CV
accuracy. This improvement is obtained due to the use of two convolutional layers
for feature extraction and three fully connected layers for classification.

5 Conclusion

Diabetes is an incurable disease that has spread throughout the world. The only
method to fix this problem is to detect the disease early and take preventative action
to reduce the disease impact. In this study, a customized CNN model is established

for DR classification with the K-Fold CV technique. A customized CNN model has
5 layers, including 2 convolutional layers for feature extraction and 3 fully connected
layers for classification. The customized model shows more promising results than
pre-trained models. Experimental results show an average CV accuracy of 89.14%
by the K-Fold CV technique.

References

1. Taylor R, Batey D (2012) Handbook of retinal screening in diabetes: diagnosis and manage-
ment, 2nd edn. Wiley-Blackwell
2. International diabetes federation—what is diabetes. Available online at https://www.idf.org/
aboutdiabetes/what-is-diabetes.html. Accessed on Aug 2020
3. Chen XW, Lin X (2014) Big data deep learning: challenges and perspectives. IEEE Access pp
514–525
4. Sungheetha A, Sharma R (2021) Design an early detection and classification for diabetic
retinopathy by deep feature extraction based convolution neural network. J Trends Comput Sci
Smart Technol (TCSST) 3(02):81–94
5. Zheng Y et al (2012) The worldwide epidemic of diabetic retinopathy. Indian J Ophthalmol pp
428–431
6. Chandrakumar T, Kathirvel R (2016) Classifying diabetic retinopathy using deep learning
architecture. Int J Eng Res Technol (IJERT), pp 19–24
7. Chen H et al (2018) Detection of DR using deep neural network. In: IEEE twenty third
international conference on digital signal processing (ICDSP)
8. Lands A et al (2020) Implementation of deep learning based algorithms for DR classification
from fundus images. In: Fourth international conference on trends in electronics and informatics
(ICTEI), pp 1028–1032
9. Xiaoliang et al (2018) Diabetic retinopathy stage classification using convolutional neural
networks. In: International conference on information reuse and integration for data science
(ICIRIDS), pp 465–471
10. Shaban et al (2020) A CNN for the screening and staging of diabetic retinopathy. Open Access
Research Article, pp 1–13
11. APTOS 2019 Blindness detection dataset. Available online at https://www.kaggle.com/c/apt
os2019-blindness-detection. Accessed on Feb 2021
12. Alzubaidi L et al (2021) Review of deep learning: CNN architectures, concepts, applications,
challenges, future directions. Open Access J Big Data, pp 2–74
13. FC layer. Available online at https://docs.nvidia.com/deeplearning/performance/dl-perfor
mance-fully-connected/index.html. Accessed on Mar 2021
14. Precision and recall. Available online at https://blog.exsilio.com/all/accuracy-precision-recall-
f1-score-interpretation-of-performance-measures/. Accessed on Mar 2021
15. F1-Score. Available online at https://deepai.org/machine-learning-glossary-and-terms/f-score.
Accessed on Mar 2021
Study on Class Imbalance Problem
with Modified KNN for Classification

R. Sasirekha, B. Kanisha, and S. Kaliraj

Abstract Identification of data imbalance is a very challenging task in the modern
era. A data warehouse holds a vast amount of data, but managing that data and
sustaining its balanced state is very difficult in any type of sector. Data imbalance
occurs when specimens are classified based on their behaviour. In this paper, the
imbalanced state of data is analysed and machine learning techniques are studied
carefully to choose the best technique for handling data imbalance problems. A wide
analysis of the k-nearest neighbour (KNN) algorithm is carried out to keep the
classification of specimens grouped equally.

Keywords Imbalance · KNN algorithm · Classification specimens

1 Introduction

Data warehouse is a vast environment where we can get an enormous amount of data.
Data mining is a good environment for the data scientist to get needed data from the
source of the warehouse. Data mining environments are widely used to perform the
evaluation needed to produce good results. Hence, data imbalance may
cause severe effects in any kind of sector. In this paper, the analysis of the imbalance
problem and the suitable machine learning technique to solve the imbalance issue
are considered for a brief study. The flow chart (Fig. 1) shows the classification of
test and training data in supervised and unsupervised learning. Before going to the
imbalance problem, studying classification and clustering gives a broad base of
knowledge for bringing out the idea of the several handling mechanisms
for imbalanced data.
Before examining the handling mechanisms, it is essential to understand what the imbalance problem is and how it occurs; this is clarified in the related work section.

R. Sasirekha (B) · B. Kanisha · S. Kaliraj


SRM Institute of Science and Technology, Chennai, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 207
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_15

Fig. 1 Flow chart for imbalanced classification in supervised and unsupervised environment

First, the occurrence of imbalance problems is studied. Second, what is regarded here as an imbalance problem is defined. Finally, the ways in which the imbalance problem can be handled are described. In many contexts, KNN is one of the best algorithms for handling imbalance problems. The upcoming sections discuss various modified KNN algorithms and examine in detail how well they suit classification tasks (Fig. 1).

2 Related Works

There are many classifiers used to classify supervised and unsupervised data. Random forest, naive Bayes and KNN are the classifiers most widely used for solving the class imbalance problem. The related work in this paper is based on studies of the imbalance problem, the handling methods that have been proposed and how they work.

2.1 Curriculum Learning

When discussing data imbalance problems, the core curriculum is learning about supervised and unsupervised environments; once enough knowledge about these environments has been gathered, handling imbalance problems becomes easier. Supervised learning: labelled data comes under supervised learning, and every classification algorithm belongs to it. Classification algorithms are used to classify data according to its nature, and the decision vector plays a vital role in classification-based approaches. A strong supervised learning environment gives well-predicted results. In the case of weakly supervised learning [1], more intensive care is needed across semi-supervised learning, domain adaptation, multi-instance learning and label-noise learning. Unsupervised learning: clustering algorithms come under the unsupervised environment, and studying it is more difficult than studying a supervised environment. Recently, unsupervised learning has been used to extract information accurately from heterogeneous specimens, but most of the methods used consume a great deal of time [2]. Clusters are formed from the characteristics of the data points: a centroid is identified, and the data points falling within a particular range of it form a group. The k-means algorithm is broadly used for clustering imbalanced data. Data that is imbalanced is very hard to handle, but such data can be identified easily by means of the k-means algorithm. The centroids in k-means are identified using the Euclidean distance metric: the distance of each data point within a particular range is computed, and the cluster is then formed easily from the points closest to the centroid. Many people are not aware of what the KNN and k-means algorithms are, and some think they are the same. In this paper, the KNN classifier is studied carefully, and the major differences between KNN and k-means are described in the upcoming sections (Fig. 2).

Fig. 2 Sample experiment in weka tool for supervised learning-soybean dataset using naive bayes
classifier
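To make the clustering idea above concrete, the following minimal Python sketch shows how k-means assigns each point to the centroid with the smallest Euclidean distance. It is an illustration only: the paper itself works in the Weka tool, and the two-dimensional toy data, the choice of two clusters and the use of scikit-learn are assumptions of this sketch.

import numpy as np
from sklearn.cluster import KMeans

# Toy 2-D data: a large (majority-like) group and a small (minority-like) group.
rng = np.random.RandomState(0)
majority = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
minority = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(5, 2))
X = np.vstack([majority, minority])

# k-means places the centroids and assigns each point to its nearest centroid.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Centroids:\n", kmeans.cluster_centers_)

# Euclidean distance from one point to each centroid (the assignment rule).
point = X[0]
distances = np.linalg.norm(kmeans.cluster_centers_ - point, axis=1)
print("Distances:", distances, "-> assigned cluster:", distances.argmin())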

2.2 Clarification About Imbalance Problem

Before entering into any kind of research work, a thorough analysis of the topic is needed. First, clarity about where the imbalance problem occurs: when specimens are classified into several classes based on their behaviour, an imbalance problem occurs [3] if the number of specimens in one group is not equal to the number in another. Second, clarity about what is considered an imbalance problem: data in a group that is not sufficient to produce an exact result for the evaluation is considered imbalanced data, and a situation that cannot be handled at that moment is considered an imbalance problem. Finally, clarity about the ways to handle imbalance problems: an imbalance problem can be handled by various approaches, the most widely used being resampling, ensembles, clustering and evaluation metrics. These approaches are discussed in depth in the upcoming sections, which cover the different methods for handling imbalanced classification.

3 Resampling

Resampling techniques are the most widely used approaches for handling data imbalance problems. Several resampling techniques exist for solving imbalance issues, and they work by resampling the training data.

3.1 Data Imbalance Problem Can Be Handled by Oversampling

Oversampling is a widely used resampling method for handling data imbalance problems. When the specimens of a class are divided into separate groups, they may be split unequally: the group with the greater number of specimens is called the majority group, and the group with the lower number is called the minority group. Oversampling addresses the issue by enlarging the minority group, either by adding copies of its specimens or by generating new synthetic ones.
Widely used oversampling techniques:
In the modern era, a few oversampling techniques are widely used to resample the training data.
1. SMOTE: the synthetic minority oversampling technique is widely used to overcome the imbalance problem by generating new synthetic specimens between existing minority specimens and their nearest neighbours, rather than simply copying them [4]. It is very simple and more effective than many other oversampling techniques (a code sketch follows this list).
2. MC-SMOTE: minority clustering SMOTE [5] handles imbalanced data by first clustering the minority class and then taking specimens from those clusters.
3. Adaptive synthetic sampling: generates synthetic data without copying the same data of the minority class, which is why adaptive synthetic sampling is popular among data balancing strategies.
4. Augmentation [6]: oversampling is applied within cross-validation, only after the split into training and test sets, and only the training data is augmented.
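The following sketch illustrates SMOTE oversampling. The use of the imbalanced-learn package and the synthetic 90/10 dataset are assumptions of this example, not part of the original study.

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

# Illustrative imbalanced dataset: roughly 90% majority and 10% minority class.
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.9, 0.1],
                           random_state=42)
print("Before SMOTE:", Counter(y))

# SMOTE interpolates between a minority sample and its k nearest minority
# neighbours to create new synthetic samples (it does not simply copy rows).
X_res, y_res = SMOTE(k_neighbors=5, random_state=42).fit_resample(X, y)
print("After SMOTE:", Counter(y_res))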

3.2 Data Imbalance Problem Can Be Handled by Undersampling

Undersampling [7] can be a risky method, because instances are eliminated from the majority class, and deleting important instances from that class can have severe effects [8].
RUS: the random undersampling method [4] is the most widely used undersampling approach for handling an imbalanced class, but it must be applied with care. It concentrates on the majority class and solves the imbalance problem by removing instances while trying not to harm the class being sampled.
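A minimal sketch of random undersampling follows, again assuming the imbalanced-learn package and an illustrative synthetic dataset; majority-class rows are discarded until the classes are balanced, which is exactly why the method must be handled with care.

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(n_samples=1000, n_features=8, weights=[0.9, 0.1],
                           random_state=0)
print("Before RUS:", Counter(y))

# sampling_strategy=1.0 keeps every minority sample and randomly drops
# majority samples until both classes are the same size.
rus = RandomUnderSampler(sampling_strategy=1.0, random_state=0)
X_res, y_res = rus.fit_resample(X, y)
print("After RUS:", Counter(y_res))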

4 Ensemble

In an ensemble approach [9], the specimens of the majority group are divided into several equal sections to solve the imbalance problem. For example, suppose group A has 40,000 specimens but group B has only 4,000. The working strategy of the ensemble is to divide the 40,000 specimens into 10 sets of 4,000 each. After the large group has been divided into subsets equal in size to the minority group, the training data can be resampled subset by subset to handle the imbalance problem; a sketch of this idea is given below.
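The sketch below illustrates this strategy on a scaled-down dataset (about 4,000 majority and 400 minority specimens, ten subsets); the sizes, the decision-tree base learner and the scikit-learn/NumPy implementation are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Roughly 10:1 imbalanced toy data (class 0 is the majority group).
X, y = make_classification(n_samples=4400, weights=[0.9], random_state=1)
maj_idx, min_idx = np.where(y == 0)[0], np.where(y == 1)[0]

# Split the majority class into ten equal chunks, pair each chunk with the
# full minority class, and train one model per balanced subset.
models = []
for chunk in np.array_split(np.random.RandomState(1).permutation(maj_idx), 10):
    idx = np.concatenate([chunk, min_idx])
    models.append(DecisionTreeClassifier(random_state=1).fit(X[idx], y[idx]))

# Majority vote of the ten models for one new sample.
votes = np.array([m.predict(X[:1])[0] for m in models])
print("Ensemble vote:", int(round(votes.mean())))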

4.1 Bagging

Bootstrap aggregation (bagging) [10] is an effective ensemble approach. It combines several machine learning models to make predictions and is widely used to reduce the variance of algorithms that have a high variance value [11]. Decision trees are a typical high-variance algorithm: they are very sensitive to the training data, and changes in the training data result in different predictions. Assume we have a sample dataset of 20,000 occurrences and are employing the CART technique, which has a high variance. The CART algorithm would be bagged as follows: make a large number (e.g., 2,000) of random sub-samples of the dataset with replacement, train a CART model on each sample and, given new data, combine the forecasts from all the models. For example, if 7 bagged decision trees predicted the classes read, write, read, write, write, read and read for a given input sample, we would predict read as the most frequent class. Bagging does not tend to overfit the training data.
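The bagged CART procedure described above can be sketched with scikit-learn's BaggingClassifier, whose default base learner is a decision tree; the dataset size and the number of bootstrap estimators used here are illustrative assumptions rather than the 20,000/2,000 figures in the text.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=7)

# Each estimator is a decision tree trained on a bootstrap sample (rows drawn
# with replacement); predictions are combined by majority vote.
bag = BaggingClassifier(n_estimators=50, bootstrap=True,
                        random_state=7).fit(X_tr, y_tr)
print("Bagged test accuracy:", bag.score(X_te, y_te))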

5 Clustering Versus Classification

Clustering can also be used for undersampling data: points that lie far from a cluster centroid are removed. K-means [12] is an algorithm readily applied by data scientists to handle imbalance problems. KNN, in contrast, comes under supervised learning. KNN predicts the behaviour of a sample from its neighbours: it examines samples with similar characteristics and assigns the sample to the category to which most of its neighbours belong.

5.1 K-NN for Imbalance Problem

When KNN is used in an application, balanced data is needed for it to produce exact results. Nowadays, choosing the best classifier is essential for producing good accuracy. KNN is a user-friendly algorithm that predicts from the nearest points in the data, and it is widely used in the medical field and in engineering technology. In the KNN algorithm, K indicates the number of nearest neighbours considered. KNN is an excellent algorithm for classification and for prediction based on nearest neighbours [13] (Table 1).
Sample application of KNN in the Weka tool: the centroid is used to cluster items having similar behaviour. As a sample application of KNN on a chosen training dataset, the weather dataset has been selected (Fig. 3). While pre-processing the training dataset in the Weka tool, a nominal dataset must be chosen in order to apply the KNN algorithm, since classification is implemented on nominal datasets only; hence, the weather nominal dataset has been chosen.

Table 1 Description of sample binary datasets with imbalance ratio


Dataset | Data count | Number of attributes | Class distribution | Classes | Imbalance ratio
Heart | 270 | 14 | 120, 150 | 2 | 1.250
Diabetes | 768 | 9 | 268, 500 | 2 | 1.866
Climate-model-simulation-crashes | 540 | 20 | 46, 494 | 2 | 10.739

Fig. 3 Preprocess the trained dataset—weather

In this example, the k value is assigned as 4; hence, only the 4 nearest neighbours are used to form a group based on the similarity of the data, and the distances between points are calculated by means of the Euclidean distance (a code sketch is given below). Now let us discuss the various modified KNN algorithms. Fuzzy K-NN: a theoretical analysis of KNN is the main route to understanding fuzzy K-NN. The main rule of fuzzy K-NN [14] is the grouping of fuzzy sets, which makes the analysis of class instances more flexible in the context of assigning a degree of class membership to each data point. Fuzzy set theory is used rather than concentrating only on the distances of points to centroids. Bayes' decision rule fails when the rule of fuzzy K-NN is not satisfied. D-KNN: D-KNN is used to improve K-nearest neighbour search over distributed storage, with the distributed calculations carried out efficiently. It reduces the main-memory storage requirement by providing distributed storage nodes [15]. This algorithm is used in cyber-physical social networks.
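The same experiment can be sketched outside Weka with scikit-learn's KNeighborsClassifier, using k = 4 and the Euclidean distance as above; the numeric toy data stands in for the weather nominal dataset and is an assumption of this sketch.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative imbalanced two-class data in place of the weather dataset.
X, y = make_classification(n_samples=200, n_features=4, weights=[0.8, 0.2],
                           random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=3)

knn = KNeighborsClassifier(n_neighbors=4, metric="euclidean").fit(X_tr, y_tr)
print("Test accuracy:", knn.score(X_te, y_te))

# The 4 nearest training neighbours (and their Euclidean distances) of the
# first test point, i.e. the group that decides its predicted class.
dist, idx = knn.kneighbors(X_te[:1], n_neighbors=4)
print("Neighbour distances:", dist[0])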

6 Discussion

In this section, the important elements for handling imbalance problems are discussed. Imbalance problems may occur when trying to classify specimens, and a suitable classifier has to be chosen to cope with them, since imbalanced data may cause severe effects in any sector. Widely used classifiers are naive Bayes, random forest and KNN. Obtaining details about neighbouring points can be a demanding task, but it can be solved by means of the K-NN classifier. The difference between the K-NN and k-means algorithms is very important in the classification of specimens; KNN has been discussed in an earlier section.

Let us now discuss k-means-based methods for handling imbalance problems.
FSCL: frequency-sensitive competitive learning is a k-means-based algorithm that improves the basic k-means framework [16]. It is widely used to draw data points nearer to the incoming input: a frequency-based pressure is applied to the seed points so that they move towards the incoming data points, and the centroids are then calculated so that the data points are distributed more evenly across the clusters. The uniform effect is identified as a drawback of this method: because of the frequency weighting, a uniform effect occurs across classes, which may itself cause severe imbalance problems.
RPCL: rival penalized competitive learning has been used to improve FSCL [8]. Using the rival penalization method, the imbalance problem can be handled carefully; under the effect of rival penalization, the rival seed points travel in the opposite direction. The disadvantage of this method is that there is no knowledge of the number of clusters affected by the rival penalization mechanism.
MPCA: the multi-prototype clustering algorithm [8] uses a multiple-prototype implementation mechanism to overcome the uniform effect. Its disadvantage is that the prototype count is user defined; hence, if the number of imbalanced data clusters is larger than the pre-defined number of prototypes, the framework may fail.
Euclidean metrics [17] can be used for clustering purposes: based on the distances, data points are chosen to form a cluster. Outlier analysis [18] is a simple way to find the data that does not belong to the same group. An outlier is like an odd man out; in general, points that lie outside the region of a group are said to be outliers, and they must be handled carefully.
NMOTe [19], the navo minority oversampling technique, is a novel oversampling approach that has been used to find an efficient and consistent solution.

7 Background Study on Classification Metrics

As per the study, the evaluation can be done with the key classification metrics: accuracy, recall, precision and F1-score. Recall and precision may differ in certain cases (Fig. 4).
Decision thresholds are examined with the receiver operating characteristic (ROC) curve: whether the ROC curve is suitable is judged by looking at the AUC (area under the curve), while the other parameters are derived from the confusion matrix. A confusion matrix is a data table used to describe the performance of a classification model on test data for which the true values are already known. Except for AUC, all the measures are calculated from the four parameters defined below.
The correctly predicted observations are the true positives and true negatives; reducing the false positives and false negatives is the goal.
True positives (TP): correctly predicted positive values, that is, the value of the original class is yes and the value of the predicted class is also yes.

Fig. 4 Sample experiment for classification metrics using trained dataset-weather

True negatives (TN): correctly predicted negative values, that is, the value of the original class is no and the value of the predicted class is also no.
False positives (FP): the value of the original class is no but the predicted class is yes (Table 2).
False negatives (FN): the value of the original class is yes but the predicted class is no.
Understanding these four parameters is important for calculating accuracy, precision, recall and F1 score.
1. Accuracy is the ratio of correctly predicted specimens to the total number of specimens. Accuracy = (TP + TN)/(TP + FP + FN + TN)
2. Precision is the ratio of correctly predicted positive specimens to the total predicted positive specimens. Precision = TP/(TP + FP)
3. Recall is the ratio of correctly predicted positive specimens to all specimens whose original class is yes. Recall = TP/(TP + FN)
4. F1 score is the weighted harmonic mean of precision and recall; it takes both false positives and false negatives into account. F1 score = 2 * (Recall * Precision)/(Recall + Precision). A code sketch computing these metrics is given below.
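As a minimal sketch (assuming scikit-learn and illustrative label vectors), the four metrics above can be computed directly from the confusion matrix:

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]   # 1 = "yes", 0 = "no"
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# For binary labels, ravel() returns the cells in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("Accuracy :", accuracy_score(y_true, y_pred))    # (TP + TN)/(TP + TN + FP + FN)
print("Precision:", precision_score(y_true, y_pred))   # TP/(TP + FP)
print("Recall   :", recall_score(y_true, y_pred))      # TP/(TP + FN)
print("F1 score :", f1_score(y_true, y_pred))          # 2*P*R/(P + R)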

Table 2 Positive and negative observations of the actual class with the predictions
Actual class: yes | Predicted class: yes | TP
Actual class: yes | Predicted class: no | FN
Actual class: no | Predicted class: yes | FP
Actual class: no | Predicted class: no | TN

8 Conclusion

In this paper, a proper study of supervised and unsupervised learning has been carried out, and mechanisms for handling the imbalance problem have been discussed. In this survey, several modified KNN classifiers are studied together with outlier analysis, and the classification metrics are discussed in detail using training samples so that each specimen can be represented by choosing a suitable K-nearest-neighbour-based classifier; an intensive study of classification metrics has also been carried out.

9 Future Work

Future work can be carried out by selecting suitable oversampling mechanisms to solve imbalance problems. According to the survey, the undersampling method is not well suited to handling imbalance problems. In future work, a suitable modified K-nearest neighbour algorithm can be used with Euclidean metrics, accompanied by outlier analysis, to handle the imbalance problem with due care. Very few studies examine different distance metrics and their effect on the performance of KNN; hence, future work can proceed by testing a large number of distance metrics on a dataset and identifying those that are least affected by added noise.

References

1. Li Y-F, Guo L-Z, Zhou Z-H (2021) Towards safe weakly supervised learning. IEEE Trans
Pattern Anal Mach Intell 43(1):334–346
2. Xiang L, Zhao G, Li Q, Hao W, Li F (2018) TUMK-ELM: a fast unsupervised heterogeneous
data learning approach. IEEE Access 6:35305–35315
3. Lu Y, Cheung Y, Tang YY (2020) Bayes imbalance impact index: a measure of class imbalanced
data set for classification problem. IEEE Trans Neural Networks Learn Syst 31(9):2020
4. Lin W-C (2017) Clustering-based undersampling in class-imbalanced data. Inf Sci 409:17–26
5. Yi H (2020) Imbalanced classification based on minority clustering smote with wind turbine
fault detection application. IEEE Trans Ind Inform 1551–3203
6. Zhang L, Zhang C, Quan S, Xiao H, Kuang G, Liu L (2020) A class imbalance loss for
imbalanced object recognition. IEEE J Sel Top Appl Earth Observ Rem Sens 13:2778–2792
7. Ng WW, Xu S, Zhang J, Tian X, Rong T, Kwong S (2020) Hashing-based undersampling
ensemble for imbalanced pattern classification problems. IEEE Trans Cybern
8. Lu Y, Cheung YM (2021) Self-adaptive multiprototype-based competitive learning approach: a
k-means-type algorithm for imbalanced data clustering. IEEE Trans Cybern 51(3):1598–1612
9. Yang Y, Jiang J (2016) Hybrid sampling-based clustering ensemble with global and local
constitutions. IEEE Trans Neural Networks Learn Syst 27(5):952–965
10. Chakraborty S, Phukan J, Roy M, Chaudhuri BB (2020) Handling the class imbalance in
land-cover classification using bagging-based semisupervised neural approach. IEEE Geosci
Remote Sens Lett 17(9):1493–1497
11. Yang W, Nam W (2020) Brainwave classification using covariance-based data augmentation.
IEEE Access 8:211714–211722

12. Zhang T (2019) Interval type-2 fuzzy local enhancement based rough k-means clustering
imbalanced clusters. IEEE Trans Fuzzy Syst 28(9)
13. Zhuang L, Gao S, Tang J, Wang J, Lin Z, Ma Y, Yu N (2015) Constructing a nonnegative low-
rank and sparse graph with data-adaptive features. IEEE Trans Image Process 24(11):3717–
3728
14. Banerjee I, Mullick SS, Das S (2019) On convergence of the class membership estimator in
fuzzy nearest neighbor classifier. IEEE Trans Fuzzy Syst 27(6):1226–1236
15. Zhang W, Chen X, Liu Y, Xi Q (2020) A distributed storage and computation k-nearest
neighbor algorithm based cloud-edge computing for cyber-physical-social systems. IEEE
Access 8:50118–50130
16. Chen D, Jacobs R, Morgan D, Booske J (2021) Impact of nonuniform thermionic emission on
the transition behavior between temperature-and space-charge-limited emission. IEEE Trans
Electron Devices 68(7):3576–3581
17. Ma H, Gou J, Wang X, Ke J, Zeng S (2017) Sparse coefficient-based k-nearest neighbor
classification. IEEE Access 5:16618–16634
18. Bezdek JC, Keller JM (2020) Streaming data analysis: clustering or classification. IEEE Trans
Syst Man Cybern: Syst
19. Chakrabarty N, Biswas S (2020) Navo minority over-sampling technique (NMOTe): a
consistent performance booster on imbalanced datasets. J Electron Inform 02(02):96–136
Analysis of (IoT)-Based Healthcare
Framework System Using Machine
Learning

B. Lalithadevi and S. Krishnaveni

Abstract In recent years, the Internet of things (IoT) has been applied in several fields such as smart healthcare, smart cities and smart agriculture, and IoT-based applications are growing day by day. In the healthcare industry, wearable sensor devices are widely used to track patients' health status and mobility. In this paper, IoT-based frameworks for healthcare using suitable machine learning algorithms are analysed in depth. Transmission of data using various standards is reviewed, the secure storage and retrieval of medical data in various ways are discussed, and machine learning techniques and storage mechanisms are analysed to ensure quality of service in patient care.

Keywords Internet of Things (IoT) · Wearable sensors · Machine learning · Cloud


computing · Fog computing · Security

1 Introduction

Internet of Things technology provides interfaces and smart gadgets that make human life easier: it gathers sensor data, processes it and sends it through the Internet. Many organizations have predicted the expansion of IoT over the years, Cisco Systems among them; a Cisco Systems report states that IoT will be an operational domain of over 50 billion devices by 2023 [1]. Moreover, IoT provides numerous advantages in daily life, efficiently and smoothly monitoring the working and functionality of devices and gadgets with limited resources. The smart healthcare industry covers areas such as telehealth, assisted living, hospital asset management, drug management, hygiene compliance, disease management and rehabilitation.

B. Lalithadevi (B)
Department of Computer Science and Engineering, SRM Institute of Science and Technology,
Chennai, India
S. Krishnaveni
Department of Software Engineering, SRM Institute of Science and Technology, Chennai, India
e-mail: krishnas4@srmist.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 219
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_16

This variety of management services makes the patient's life easier and assists medical experts as well as hospitals in managing and delivering services within short time intervals [2]. IoT is a dynamic community infrastructure that interconnects different sensor networks via the Internet, accumulates sensor data and transmits and receives records for further processing. Given the structure and strategies followed in an IoT environment, security challenges arise in keeping a person's clinical records and hospital records confidential.

1.1 Contributions

The main contributions of this paper are as follows:
• A deep analysis of smart healthcare frameworks based on the Internet of Things is presented.
• The architectural design of cloud computing as well as fog computing for data storage and retrieval in the healthcare field is discussed.
• In addition, the impact of blockchain on data security is examined, particularly the security challenges of managing medical data records.
• The influence of the Internet of Things in making the medical care industry a smart environment is presented.
• Existing frameworks for the healthcare industry are presented, with a focus on cloud computing in general and on sensors for gathering health parameters.
• The machine learning algorithms used for disease prediction are discussed.
This paper is organized as follows. Section 2 discusses the background of IoT in the healthcare field, addressing the related technologies and the monitoring of remote patients through wearable sensors. Section 3 lists different IoT architectures, the various healthcare services available nowadays and applications for patients, hospitals and doctors. Section 4 describes machine learning techniques and various algorithms. Section 5 discusses the data communication standards used for deploying IoT-based healthcare environments. Section 6 describes enabling cloud computing in IoT for data transmission. Section 7 gives an overview of fog computing in smart environments. Section 8 describes the security challenges that arise in IoT because of global data access. Section 9 discusses the research challenges and limitations in designing IoT-based healthcare frameworks. Finally, Sect. 10 presents the concluding remarks.

2 Background

IoT has attracted many research ideas over the years, and the areas where it can be implemented successfully have been studied rigorously. Elderly people can track their health levels through the use of IoT, which reduces the stress on, and people's dependency on, the healthcare system [2].

2.1 m-Health Things (m-IoT)

Mobile IoT (m-IoT) has emerged in recent years from the fusion of contemporary digital health standards with LTE connectivity. It has been combined with high-speed 4G networks to raise the standard further, and 6LoWPAN is integrated with the 4G protocols.

2.2 Wearable Medical Sensors

The health-related dataset includes attributes such as lung disease, severe headache, kidney disorder, liver disorder, LDL, TC, DBP, HDL, obesity, BG and HR [3]. Any one of these attributes can be a major cause of hypertension [4]. Figure 1 gives an overview of the healthcare framework model. In the healthcare industry, the basic building blocks of an IoT system are the various wearable sensors that collect the relevant data from patients remotely.
These data are transmitted to a cloud server and stored on it. An artificial intelligence or machine learning based prediction and detection model is incorporated to predict the risk values, and an alert or warning message is sent to medical experts via the cloud server. In turn, the respective physician can prescribe the appropriate medicine and take the necessary action to protect persons in a critical situation.

Fig. 1 Overview of healthcare framework



Role of Datasets and Acquisition Methods A dataset is a collection of information used to train a model through machine learning or deep learning techniques. There are several ways to acquire data from end nodes, such as data collection, data conversion and data sharing. Table 1 demonstrates the role of the patient dataset and the acquisition methods of wearable body sensors.
Pulse Sensors The accurate measurement of the pulse depends on a variety of factors
but it mostly depends on the body part from where it is measured. The chest and
wrist are common areas. It is measured from fingertips and earlobes as well [1].
Respiratory Rate Sensors It is based on the fact that the exhaled air is warmer
than the intrinsic temperature. The number of breaths taken is also considered into
account. Often advanced signals are used to make it more precise.
Body Temperature Sensors The importance of body temperature sensors in wearable devices is noteworthy as well; fever, high body temperature and other ailments can be measured. Figure 2 shows a collection of various IoT-based healthcare sensors. Negative temperature coefficient (NTC) and positive temperature coefficient (PTC) thermistors are common temperature sensors [1]. Heat stroke can be detected through this sensor [4], and hypothermia can be identified with a body temperature sensor embedded in a wearable device.
Blood Pressure Sensor Hypertension is a significant risk factor for cardiovascular disease, including heart attack. Pulse transit time (PTT) is defined as the time taken between the beat at the heart and the pulse at another location, such as the earlobe or the radial artery [4].
Glucose Sensor A sensor device for monitoring blood glucose levels in diabetic patients has been proposed earlier. This device requires patients to manually check blood sugar levels at regular time intervals [5].
ECG Sensor ECG sensors are based on the electrocardiogram, and their function is to monitor cardiac activity. Smart sensors and microcontrollers are used in these systems, with a smartphone connected through Bluetooth handling the clinical data [6].

Table 1 Patient medical report


Patient data | Wearable body sensors | Sensed data | Disease symptoms
Patient name: xyz; Unique ID No.: 1234; Address: ABC | Pulse sensors, respiratory rate sensors, body temperature sensors, blood pressure sensor, glucose sensor, ECG sensor, pulse oximetry sensors | Acquired data from various wearable body sensors | Fever, headache, heart beat rate, inconsistent pulse rate and breath rate, abnormal glucose level and pressure level

Fig. 2 Representation of various IoT healthcare sensors

Pulse Oximetry Sensors These estimate the oxygen level in the blood; the degree of oxygen saturation is determined. This is not an essential element in designing a medical wearable device but can certainly provide an edge in certain cases [1].
The accuracy and precision of sensed data from wearable devices may be corrupted by malicious intruders who change it into erroneous data, which misguides the end user of the decision support system during treatment. To build a smart IoT-based healthcare environment, a highly secure framework and an efficient model are therefore needed to maintain the privacy and confidentiality of patient medical data.

3 IoT Architecture, Healthcare Services and Application

IoT architecture describes the flow of data from edge devices to the cloud server through an interconnected network for data analysis, storage and retrieval; all sensed data is then processed further through the prediction model. Figure 3 represents the evolution of IoT architectures. Regarding IoT healthcare services and their applications, different fields play a significant part in the management of personal
Fig. 3 Evolution of IoT architectures



health and wellness, paediatric care, the management of chronic diseases and the care of elderly patients, among others [2].

3.1 Real-Time Significance of Proposed Model

The proposed model is able to transmit data from wearable medical sensors to the cloud server via standard data transmission and communication protocols, as shown in Fig. 1. A smartphone acts as an intermediate agent between the sensors and the web apps. A request-response protocol (XMPP) is applied to send the data payload and data length code from the sensors to the Android listening port; finally, the web interface layer enables the physician, patients and hospital to respond to the requested action in parallel. Figure 4 lists the healthcare services in the medical field. A robotic assistant is used for tracking the status of senior citizens; this robot uses a ZigBee sensor device to uniquely identify the individuals it is tracking [7].

Fig. 4 List of healthcare services



3.2 IoT for Patients

Doctors cannot attend to all patients all the time, so patients can check the progress of their health by themselves through wearable IoT devices; the healthcare system makes this very efficient for doctors. The basic research challenge here is to monitor multiple reports of various patients at a time through their mobile phones and to give appropriate medication on time without any delay.

3.3 IoT for Physicians

The various fitness measurements are useful for monitoring the overall condition of the patient. One of the research challenges is associated with the doctor: preventive medication should be given, and patients suffer less if the disease is diagnosed at an early stage.

3.4 IoT for Hospitals

IoT devices tagged with sensors are used for observing the real-time location of clinical equipment such as wheelchairs, nebulizers, oxygen pumps and so on. Appointment fees and travel costs can be reduced by IoT devices [8]. An Azure web application is used to store the data, perform analysis and predict health conditions within the expected time [9]. The most important advantages and challenges of IoT in healthcare include:
• Cost and error reduction
• Improved and proactive Treatment
• Faster Disease Diagnosis
• Drugs and Equipment Management.

4 Machine Learning Techniques for Disease Prediction

The medical field has experienced innovation in disease diagnosis and medical data analysis since its research work began to incorporate machine learning [10]. The patient data generated every day is huge and cannot be surveyed by simple methods, so this statistical data is given to a trained model. The model might not achieve the highest possible accuracy but can come close to it.

4.1 Naive Bayes

To understand the naive Bayes theorem, let us first understand the Bayes theorem, which is based on conditional probability. There are two types of events, namely dependent and independent events.

P(X|Y) = P(X ∩ Y)/P(Y), if P(Y) ≠ 0 (1)

P(Y ∩ X) = P(X ∩ Y) (2)

P(X ∩ Y) = P(X|Y)P(Y) = P(Y|X)P(X) (3)

P(X|Y) = P(Y|X)P(X)/P(Y), if P(Y) ≠ 0 (4)

P(X|Y): the conditional probability, the probability of event X occurring given that Y is true.
P(Y|X): the likelihood, the probability of event Y occurring given that X is true.
This idea is applied to classification datasets in which there is a categorical column such as YES and NO or TRUE and FALSE. The categorical data can be binary or multi-class [11].
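As a minimal sketch of how this rule is used for classification, the following example applies scikit-learn's CategoricalNB to a tiny encoded yes/no dataset; both the choice of library class and the data are assumptions of this illustration.

import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Two categorical features encoded as integers; target 1 = "yes", 0 = "no".
X = np.array([[0, 1], [1, 1], [0, 0], [1, 0], [1, 1], [0, 0]])
y = np.array([1, 1, 0, 0, 1, 0])

model = CategoricalNB().fit(X, y)
# The posterior P(class | features) is computed from the per-feature
# likelihoods P(feature | class) and the class priors P(class), as in Eq. (4).
print("P(y | x = [1, 1]):", model.predict_proba([[1, 1]]))
print("Predicted class  :", model.predict([[1, 1]]))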

4.2 Artificial Neural Networks

Machine learning has found its use in various fields, and medical science is one of them. Machine learning and deep learning models need very little human assistance to solve a problem.
Figure 5 shows an overview of an artificial neural network model. A machine learning model uses feature selection techniques to find the probable outcome.

Fig. 5 Overview of artificial neural network



Fig. 6 Representation of
support vector machine

Certain limitations hinder the performance of a machine learning model: the algorithm is bound to the features given to it as input, and if any foreign feature comes in, it might predict wrongly.

4.3 Support Vector Machine

There are three kinds of learning techniques in artificial intelligence: supervised, unsupervised and reinforcement learning. SVM lies under supervised learning and is used for classification and regression analysis. Figure 6 shows the support vector machine classifier. Classification datasets contain categorical data as target variables, whereas regression datasets contain a continuous variable as the target [12]. SVM is a supervised learning technique that works on labelled data [12]. If there is a dataset consisting of circles and quadrilaterals, it can predict whether new data is a circle or a quadrilateral: SVM creates a boundary between the two classes, and this separating plane is the decision boundary that helps the model predict.
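A minimal sketch of an SVM classifier on labelled data follows, using scikit-learn's SVC; the synthetic two-class dataset and the linear kernel are illustrative assumptions standing in for the circle/quadrilateral example.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=2)

# The fitted SVM learns the decision boundary separating the two classes.
svm = SVC(kernel="linear", C=1.0).fit(X_tr, y_tr)
print("Test accuracy:", svm.score(X_te, y_te))
print("Predicted class of first test sample:", svm.predict(X_te[:1]))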

4.4 Random Forest

Ensemble techniques in machine learning are methods that combine multiple models into one model. There are two types of ensemble technique, bagging and boosting, and random forest is a bagging technique. In bagging, many base models are created [13], and a new random set of data is given as a sample to each model; this method is also known as row sampling with replacement. Figure 7 represents the random forest classifier model. For a particular test instance, the outputs of the different models are observed; since they may not all be the same, a voting classifier is used. All the votes are combined, and the output that has the

Fig. 7 Overview of random forest classifier model

highest frequency, that is, the majority vote, is taken as the prediction [5]. Multiple decision trees are used in a random forest. Decision trees have low bias and high variance; the various decision trees of the different models are aggregated, and the final output achieved after majority voting has low variance [5].
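A minimal sketch of a random forest built from many decision trees trained on bootstrap row samples and combined by majority vote is given below; the scikit-learn implementation, dataset and parameters are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=12, random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=5)

forest = RandomForestClassifier(n_estimators=100,  # number of decision trees
                                bootstrap=True,    # row sampling with replacement
                                random_state=5).fit(X_tr, y_tr)
print("Test accuracy:", forest.score(X_te, y_te))
# Aggregated (majority-vote) prediction for one test sample.
print("Predicted class:", forest.predict(X_te[:1]))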

5 Data Communication Standards

To understand data communication in wireless devices, we need to understand the body area network. EEG sensors [14], body temperature sensors, heart rate sensors, etc., are interconnected with the cloud through handheld devices.

5.1 Short-Range Communication Techniques

The short-range communication techniques presented in this paper are ZigBee and Bluetooth, both of which come under the home network. ZigBee is a technology created for control and sensor networks. The various layers in ZigBee [15] are the application layer, security layer, networking layer, media access control (MAC) layer and physical layer. Table 2 lists a few communication standards for short-distance coverage.

Table 2 Short-range communication standards


Features | Bluetooth | Infrared | ZigBee | References
Band | 2.4 GHz | 430 THz to 300 GHz | 2.4 GHz | [15]
Data rate | 1 Mbps | 9.6-115.2 Kbps | 20-250 Kbps | [15]
Range | 150 m | Less than 1.5 m | 10-100 m | [16]
Security | P-128-AES encryption | Very low bit error rate | S-128-AES encryption | [7]
Topology | Star | Point-to-point | Mesh | [7]

5.2 Long-Range Communication Techniques

A network that allows long-distance data communication or transmission is known as a wide area network; it is suitable for large geographical areas, and the range of a WAN can extend beyond 100 km. Table 3 shows the communication standards for long-range coverage. The major challenge when choosing a data communication standard is selecting the coverage range according to the feasibility of dataset collection and the problem statement for further processing.

Table 3 Long range communication standards


Features | LoRaWAN | 6LoWPAN | SigFox
Band [15] | 125 kHz and 250 kHz (868 MHz and 780 MHz bands), 125 kHz and 500 kHz (915 MHz band) | 5 MHz (2.4 GHz band), 2 MHz (915 MHz band), 600 kHz (868.3 MHz band) | 100 Hz-1.2 kHz
Data rate [15] | 980 bit/s to 21.9 kbit/s (915 MHz band) | 250 kbit/s (2.4 GHz band), 40 kbit/s (915 MHz band), 20 kbit/s (868.3 MHz band) | 100 to 600 bit/s
Range [15] | 5-15 km | 10 to 100 m | 10 to 50 km
Security [15] | NwkSKey (128 bits) ensures data integrity; AppSKey (128 bits) provides data confidentiality | Handled at the link layer, which includes secure and non-secure modes | Encryption mechanism
Payload [16] | Between 19 and 250 bytes | Header (6 bytes) and session data unit (127 bytes) | Between 0 and 12 bytes

Fig. 8 Representation of cloud computing for healthcare

6 Cloud Computing in Healthcare

Cloud computing is a technology that emerged when the amount of data generated daily became impossible to store and handle locally. Patient data needs security and privacy. Resources such as storage and computational power, which are limited on our own computers, are managed by the cloud services according to the data [15]. Figure 8 shows the impact of cloud computing in the healthcare field for instant data transmission.
The storage access control layer is the spine of the cloud-enabled environment; it accesses medical services by utilizing sensors such as blood glucose meters and sphygmomanometers in everyday activities [16]. The data annotation layer resolves the heterogeneity problems that normally occur during data processing. The data testing layer analyses the medical records saved on the cloud platform. Maintaining the portability and integrity of data transferred from end devices to the cloud server is a challenging task in an IoT healthcare platform.

7 Influence of Fog Computing in Healthcare

Previously, in 2017, capabilities for medical services were realized by utilizing fog computing [16]. A healthcare system called Health Fog was launched in 2016 [15]. Figure 9 illustrates recent fog computing studies in the healthcare industry using machine learning and deep learning techniques. Later, to improve system reliability, cloud-based security functionality was included in Health Fog [15].
The benefits of edge computing for home and hospital control systems have been exploited by current architectures, and health technologies have been moving from cloud computing to fog computing in recent years [17]. Similar work has been performed by the authors of [18] as a four-layered healthcare model comprising a sensation layer, classification layer, mining layer and application layer.
The sensation layer obtains data from various sensors located in the office room. In the classification layer, classification is done into five categories: data about health, data about the environment, data about meals, data about physical posture and

Fig. 9 Representation of recent fog based healthcare studies

data about behaviour. The mining layer is used to extract information from a cloud database. Finally, the application layer provides various services to the end user, such as a personal health recommender system, a remote medical care monitoring system and a personal health maintenance system [19].

7.1 Fog Computing Architecture

Within cloud computing science, the fog computing architecture presents its own challenges and security considerations. The basic architecture of fog computing is separated into three main layers, as shown in Fig. 10. The device layer is the layer nearest to the end users or devices; it comprises hardware such as mobile devices and sensors, which are spread globally and are responsible for sensing the physical objects and communicating that knowledge to the upper layer for analysis and storage. The fog layer is the second layer, at the edge of the network, and incorporates a large number of fog nodes [20]. The cloud layer is responsible for permanent management and the comprehensive computational processing of data [21]. Fog nodes and the cloud platform must address the following challenges:
• Retrieve data from IoT devices using any protocol.
• Monitor IoT-enabled applications for the control and analysis of real-time applications, with minimum response time.
• Send periodic data to the cloud for further processing.
• Aggregate the data received from many fog nodes.

Fig. 10 Overview of Fog computing architecture

• Analysis can be done on the IoT data to get business insight.

8 Security Challenges in IoT

The primary goals of IoT security are to safeguard consumer privacy, data confidentiality, availability and the transport infrastructure of an IoT platform. Blockchain innovation improves accountability between patients and doctors [23]. DDoS attacks are perhaps one of the best examples of the issues that arise from shipping gadgets with default passwords and not telling customers to change them as soon as they receive them [16]. As the number of IoT-related gadgets continues to rise, a wide assortment of malware and ransomware is used to exploit them [24]. An IP camera can capture sensitive information about a huge variety of places, such as your private home, your work office or even the nearby gas station [25]. There are various security problems in IoT; most commonly, the passwords of IoT devices are weak or hard-coded [26]. The various security challenges in IoT are:
• Insufficient testing and updating
• Brute-forcing and the issue of default passwords
• IoT malware and ransomware
• Data security, privacy and untrustworthy communication.
Investigators have applied various machine learning and deep learning models in healthcare frameworks for the detection of different diseases. Table 4 summarizes the various methodologies used for disease prediction and detection.
Table 4 Summary of data analysis using machine learning and deep learning algorithms in healthcare
Reference | Attainment | Objective | Type of data | Methodology
[27] | Better decision support using the XG boost classifier for further treatment | Early prediction of treatment response and survival through MRI images for breast cancer | Clinical data (MRI) | Logistic regression, support vector machine, XG boost, linear discriminant analysis
[28] | High-accuracy prediction model for the survival rate of HCC disease | Development of a deep learning system to predict carcinoma and its risk level | Clinical data | DNN, KNN, support vector machine
[29] | Efficient feature selection model for classification of T2DM | Implementation of a semi-automated framework based on genotype-phenotype association for diabetes | Electronic health records | Random forest, decision trees, KNN, Bayes classifier
[30] | Temporal prediction model for high-risk factors of emergency admission | Comparative analysis of traditional and machine learning models for early prediction of emergency cases | Electronic health records | Random forest, gradient boost, Cox model
[31] | SVM-based multi-class classification of brain tumors | Analysis of pattern classification models for a variety of brain tumors | Clinical data (MRI) | Recursive feature elimination, linear discriminant analysis, KNN
[32] | Monitoring early movement of infants through wearable sensors | Monitoring early and delayed movement of infants based on a kinematics analysis model | Wearable sensor data | Ada boost, support vector machine, logistic regression
[33] | DBN-based activity recognition through PCA and LDA approaches | Human activity recognition using a DBN model and dimensionality reduction through kernel principal component analysis | Wearable body sensors | Deep belief network, feature extraction
[34] | Progression and severity analysis of Parkinson's disease symptoms using ML | Detection and progressive analysis of Parkinson's disease using machine learning | Sensor data collected from hands, thighs and arms | CNN, random forest
[35] | Hybrid deep learning model for efficient emotion classification | Deep-learning-based emotion classification through physiological, environmental and location-based signals | Sensor data from wearable devices through smartphones | CNN, LSTM
[36] | Autoencoder model for a high prediction rate of proteins associated with FLT3-ITD | Severity detection of proteins associated with acute myeloid leukemia patients | Protein data | Stacked autoencoder (SAE)
Performance metrics reported across these studies include accuracy, precision, recall, F1 score, AUC, ROC, confusion matrix, sensitivity, specificity, confidence interval, standard deviation, t-test and error rate.

9 Discussion

In this section, the research limitations and challenges of IoT-based healthcare frameworks are discussed. Flexible wearable sensors are required to monitor the patient's health status. A secure framework must be designed for data transmission from the edge device to the control device and then to the cloud server, since various intruders may try to modify the data and break its confidentiality. Signal analysis should be carried out for ECG and EEG monitoring using ML. An energy-efficient optimization algorithm is needed to control consumption and reduce the usage level. Data privacy is especially important in the healthcare domain, and it can be achieved through cryptographic models and standards.

10 Conclusion

IoT-based healthcare technologies offer different architectures and platforms for healthcare networks that support connectivity to the IoT backbone and enable the transmission and reception of medical data. By studying the various fields of healthcare in IoT, cloud computing and fog computing, this research review is valuable for researchers. The Internet of Things has developed the healthcare sector, enhancing performance, reducing costs and concentrating on quality care for patients. It provides a complete healthcare network in which IoT is associated with cloud and fog computing, serves as a backbone for cloud computing applications and provides a framework for sharing medical data between medical devices and remote servers.

References

1. Baker SB, Xiang W, Atkinson I (2017) Internet of things for smart healthcare: technologies,
challenges, and opportunities. Institute of Electrical and Electronics Engineers Inc., vol 5, pp
26521–26544, Nov. 29, 2017. IEEE Access. https://doi.org/10.1109/ACCESS.2017.2775180
2. Carnaz GJF, Nogueira V (2019) An overview of IoT and health- care question answering
systems in medical and healthcare domain view project NanoSen AQM view project Vitor
Nogueira Universidade de E´vora An Overview of IoT and Healthcare. Available: https://
www.researchgate.net/publication/330933788
3. Hussain S, Huh E, Kang BH, Lee S (2015) GUDM: automatic generation of unified datasets for
learning and reasoning in healthcare, pp 15772–15798. https://doi.org/10.3390/s150715772
4. Majumder AJA, Elsaadany YA, Young R, Ucci DR (2019) An energy efficient wearable smart
IoT system to predict cardiac arrest. Adv Hum-Comput Interact vol 2019. https://doi.org/10.
1155/2019/1507465
5. Ani R, Krishna S, Anju N, Sona AM, Deepa OS (2017) IoT based patient monitoring and
diagnostic prediction tool using ensemble classifier. In: 2017 International Conference on
Advanced Computing and Communication Informatics, ICACCI 2017, vol 2017-January, pp
1588–1593. https://doi.org/10.1109/ICACCI.2017.8126068

6. Joyia GJ, Liaqat RM, Farooq A, Rehman S (2017) Internet of Medical Things (IOMT): applica-
tions, benefits and future challenges in healthcare domain, May 2018. https://doi.org/10.12720/
jcm.12.4.240-247
7. Konstantinidis EI, Antoniou PE, Bamparopoulos G, Bamidis PD (2015) A lightweight frame-
work for transparent cross platform communication of controller data in ambient assisted living
environments. Inf Sci (NY) 300(1):124–139. https://doi.org/10.1016/j.ins.2014.10.070
8. Saba T, Haseeb K, Ahmed I, Rehman A (2020) Journal of Infection and Public Health Secure
and energy-efficient framework using Internet of Medical Things for e-healthcare. J Infect
Public Health 13(10):1567–1575. https://doi.org/10.1016/j.jiph.2020.06.027
9. Krishnaveni S, Prabakaran S, Sivamohan S (2016) Automated vulnerability detection and
prediction by security testing for cloud SAAS. Indian J Sci Technol 9(S1). https://doi.org/10.
17485/ijst/2016/v9is1/112288
10. Yang X, Wang X, Li X, Gu D, Liang C, Li K (2020) Exploring emerging IoT technologies in
smart health research: a knowledge graph analysis 9:1–12
11. Nashif S, Raihan MR, Islam MR, Imam MH (2018) Heart disease detection by using machine
learning algorithms and a real-time cardiovascular health monitoring system. World J Eng
Technol 06(04):854–873. https://doi.org/10.4236/wjet.2018.64057
12. Krishnaveni S, Vigneshwar P, Kishore S, Jothi B, Sivamohan S (2020) Anomaly-based intrusion
detection system using support vector machine. In: Artificial intelligence and evolutionary
computations in engineering systems, pp 723–731
13. Ram SS, Apduhan B, Shiratori N (2019) A machine learning framework for edge computing to
improve prediction accuracy in mobile health monitoring. In: Lecture notes in computer science
(including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics),
July 2019, vol 11621 LNCS, pp 417–431. https://doi.org/10.1007/978-3-030-24302-930
14. Umar S, Alsulaiman M, Muhammad G (2019) Deep learning for EEG motor imagery classi-
fication based on multi-layer CNNs feature fusion. Futur Gener Comput Syst 101:542–554.
https://doi.org/10.1016/j.future.2019.06.027
15. Minh Dang L, Piran MJ, Han D, Min K, Moon H (2019) A survey on internet of things and
cloud computing for healthcare. Electronics 8(7). https://doi.org/10.3390/electronics8070768
16. Dewangan K, Mishra M (2018) Internet of things for healthcare: a review. Researchgate.Net
8(Iii):526–534. Available: http://ijamtes.org/
17. Sood SK, Mahajan I (2019) IoT-Fog-based healthcare framework to identify and control hyper-
tension attack. IEEE Internet Things J 6(2):1920–1927. https://doi.org/10.1109/JIOT.2018.287
1630
18. Bhatia M, Sood SK (2019) Exploring temporal analytics in fog-cloud architecture for smart
office healthcare. Mob Networks Appl 24(4):1392–1410. https://doi.org/10.1007/s11036-018-
0991-5
19. Raj JS (2021) Security enhanced blockchain based unmanned aerial vehicle health monitoring
system. J ISMAC 3(02):121–131
20. Nandyala CS, Kim HK (2016) From cloud to fog and IoT-based real-time U- healthcare moni-
toring for smart homes and hospitals. Int J Smart Home 10(2):187–196. https://doi.org/10.
14257/ijsh.2016.10.2.18
21. Dubey H, Yang J, Constant N, Amiri AM, Yang Q, Makodiya K (2016) Fog data: enhancing
Telehealth big data through fog computing. In: ACM international conference on proceeding
series, vol 07–09-October-2015. May 2016. https://doi.org/10.1145/2818869.2818889
22. He W, Yan G, Da Xu L, Member S (2017) Developing vehicular data cloud services in the IoT
environment. https://doi.org/10.1109/TII.2014.2299233
23. Suma V (2021) Wearable IoT based distributed framework for ubiquitous computing. J
Ubiquitous Comput Commun Technol (UCCT) 3(01):23–32
24. Hariharakrishnan J, Bhalaji N (2021) Adaptability analysis of 6LoWPAN and RPL for
healthcare applications of internet-of-things. J ISMAC 3(02):69–81
25. Pazienza A, Polimeno G, Vitulano F (2019) Towards a digital future: an innovative semantic
IoT integrated platform for Industry 4.0. In: Healthcare, and territorial control

26. Aceto G, Persico V, Pescap´e A (2018) The role of Information and Communication Tech-
nologies in healthcare: taxonomies, perspectives, and challenges. J Netw Comput Appl
107:125–154. https://doi.org/10.1016/j.jnca.2018.02.008
27. Tahmassebi A et al (2019) Impact of machine learning with multiparametric magnetic resonance
imaging of the breast for early prediction of response to neo adjuvant chemotherapy and survival
outcomes in breast cancer patients. Invest Radiol 54(2):110–117. https://doi.org/10.1097/RLI.
0000000000000518
28. Kayal CK, Bagchi S, Dhar D, Maitra T, Chatterjee S (2019) Hepatocellular carcinoma
survival prediction using deep neural network. In: Proceedings of international ethical hacking
conference 2018, pp 349–358
29. Zheng T et al (2017) A machine learning-based framework to identify type 2 diabetes through
electronic health records. Int J Med Inform 97:120–127. https://doi.org/10.1016/j.ijmedinf.
2016.09.014
30. Rahimian F et al (2018) Predicting the risk of emergency admission with machine
learning: development and validation using linked electronic health records. PLoS Med
15(11):e1002695. https://doi.org/10.1371/journal.pmed.1002695
31. Zacharaki EI et al (2009) Classification of brain tumor type and grade using MRI texture and
shape in a machine learning scheme. Magn Reson Med 62(6):1609–1618. https://doi.org/10.
1002/mrm.22147
32. Goodfellow D, Zhi R, Funke R, Pulido JC, Mataric M, Smith BA (2018) Predicting infant
motor development status using day long movement data from wearable sensors. Available:
http://arxiv.org/abs/1807.02617
33. Hassan MM, Huda S, Uddin MZ, Almogren A, Alrubaian M (2018) Human activity recognition
from body sensor data using deep learning. J Med Syst 42(6):99. https://doi.org/10.1007/s10
916-018-0948-z
34. Lonini L et al (2018) Wearable sensors for Parkinson’s disease: which data are worth collecting
for training symptom detection models. npj Digit Med 1(1). https://doi.org/10.1038/s41746-
018-0071-z
35. Kanjo E, Younis EMG, Ang CS (2019) Deep learning analysis of mobile physiological, envi-
ronmental and location sensor data for emotion detection. Inf Fusion 49:46–56. https://doi.org/
10.1016/j.inffus.2018.09.001
36. Liang CA, Chen L, Wahed A, Nguyen AND (2019) Proteomics analysis of FLT3-ITD mutation
in acute myeloid leukemia using deep learning neural network. Ann Clin Lab Sci 49(1):119–
126. https://doi.org/10.1093/ajcp/aqx121.148
Hand Gesture Recognition for Disabled
Person with Speech Using CNN

E. P. Shadiya Febin and Arun T. Nair

Abstract Because handicapped people account for a large percentage of our community, we should make an effort to interact with them in order to exchange
knowledge, perspectives, and ideas. To that aim, we wish to establish a means of
contact. Individuals who are deaf or hard of hearing can communicate with one
another using sign language. A handicapped person can communicate without using
acoustic noises when they use sign language. The objective of this article is to
explain the design and development of a hand gesture-based sign language recog-
nition system. To aid handicapped individuals, particularly those who are unable to
communicate verbally, sign language is translated into text and subsequently into
speech. The solution is based on a web camera as the major component, which is
used to record a live stream video using a proprietary MATLAB algorithm. Recogni-
tion of hand movements is possible with the technology. Recognizing hand gestures
is a straightforward technique of providing a meaningful, highly flexible interaction
between robots and their users. There is no physical communication between the
user and the devices. A deep learning system that is efficient at picture recognition
is used to locate the dynamically recorded hand movements. Convolutional neural
networks are used to optimize performance. A static image of a hand gesture is used
to train the model. Without relying on a pre-trained model, the CNN is constructed.

Keywords Human–computer interaction · Gesture recognition · Web camera · CNN · MATLAB

E. P. Shadiya Febin (B)


Electronics and Communication Engineering, KMCT College of Engineering, Kozhikode, Kerala,
India
A. T. Nair
KMCT College of Engineering, Kozhikode, Kerala, India


1 Introduction

Technology has been ingrained in human existence in today's modern civilization. It penetrates many facets of our lives, from work to education, health, communication, and security [1]. Human–computer interaction (HCI) is one of a plethora of disciplines undergoing rapid evolution. This method helps people to improve their understanding of technology and to develop innovative ideas and algorithms that benefit society. Humans will be able to converse with technology in the same manner they do with each other. Machine recognition of hand gestures was one of the early solutions tested [2].
In the proposed system, we use hand gestures for communication by disabled people, recognized using a CNN. Gesture detection is particularly essential for real-time applications. Numerous researchers are studying the applications of gesture recognition, owing to the ubiquitous availability of digital cameras. Due to the intricacy of gesture recognition, several challenges remain. As a consequence, this obstacle is overcome through the use of convolutional neural networks and deep learning. Deep learning outperforms classical machine learning when it comes to image identification. This work makes use of the ASL dataset containing the hand gestures for the numbers 1–5. By and large, a picture is preprocessed to the extent that it facilitates the extraction of gestures from static images (i.e., background subtraction, image binarization). The features are then extracted from all of the images following binarization. Convolutional neural networks are constructed using neurons that have learnable weights and biases. Each neuron gets several inputs and computes their weighted total. Following that, an activation function processes it and provides an output.
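As a minimal numeric illustration of this weighted-sum-plus-activation computation (the input values, weights, and bias below are made up purely for illustration):

```python
import numpy as np

# Toy illustration of a single neuron: weighted sum of inputs followed by
# an activation function (ReLU here). All numbers are illustrative only.
inputs = np.array([0.2, 0.7, 0.1])      # e.g. three pixel-derived features
weights = np.array([0.5, -0.3, 0.8])    # learnable weights
bias = 0.05                             # learnable bias

weighted_sum = np.dot(inputs, weights) + bias   # 0.1 - 0.21 + 0.08 + 0.05
output = np.maximum(0.0, weighted_sum)          # ReLU activation

print(weighted_sum, output)                     # both approximately 0.02
```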

2 Literature Survey

“A Deep Learning Method for Detecting Non-Small Cell Lung Cancer.” The
researchers [1] conducted a series of experiments in order to build a statistical model
enabling deaf individuals to translate speech to sign language. Additionally, they
created a system for automating speech recognition using ASR using an animated
presentation and a statistical translation module for a variety of sign sets. They trans-
lated the text using state-transducer and phrase-based system methods. Various evaluation metrics were utilized during the review process: WER, BLEU, and finally NIST. This article walks through the process of speech translation using an
automatic recognizer in each of the three configurations. The research produced a
result for the output of ASR employing a finite type state transducer with a word
error rate of between 28.21 and 29.27%.
A review on deaf-mute communication interpretation [2]: This article will
examine the several deaf-mute communication translator systems in use today. Wear-
able communication devices and online learning systems are the two major commu-
nication techniques used by deaf-mute persons. Wearable communication systems
are classified into three categories: glove-based systems, keypad-based systems, and
handicom touch-screen systems. All three of the aforementioned technologies make
use of a number of sensors, an accelerometer, a suitable microcontroller, a text to
voice conversion module, a keypad, and a touch-screen. The second method, namely
an online learning system, obviates the need for external equipment to understand
messages between deaf and hearing-impaired individuals. The online learning system
makes use of a number of instructional techniques. The five subdivision techniques
are as follows: SLIM module, TESSA, Wi-See Technology, SWI PELE System, and
Web-Sign Technology.
An efficient framework for recognizing Indian sign language implementation of
the wavelet transform [3]: The suggested ISLR system is a technique for pattern
recognition that entails two key modules: feature extraction and classification. To
recognize sign language, a combination of feature extraction using the Discrete
Wavelet Transform (DWT) and closest neighbor classification is utilized. According
to the experimental results, the suggested hand gesture recognition system achieves
a maximum classification accuracy of 99.23% when the cosine distance classifier is
utilized.
Hand gesture recognition using principal component analysis in [4]: The authors
proposed a database-driven hand gesture recognition technique that is useful for
human robots and related other applications. It is based on a skin color model
approach and thresholding approach, as well as an effective template matching
strategy. To begin, the hand area is divided into segments using the YCbCr color
space skin color model. The subsequent stage makes use of thresholding to differ-
entiate foreground from background. Finally, a recognition technique based on template matching is created using Principal Component Analysis (PCA).
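A rough sketch of this kind of YCbCr skin-colour segmentation and thresholding is given below; it is only illustrative, and the Cb/Cr bounds and input file name are assumptions rather than the values used in [4]:

```python
import cv2
import numpy as np

# Sketch of skin-colour segmentation in the YCbCr space followed by
# thresholding. The bounds below are commonly quoted values, not the
# parameters of the cited work; "hand.jpg" is a hypothetical input image.
frame = cv2.imread("hand.jpg")
ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)     # OpenCV uses Y, Cr, Cb ordering

lower = np.array([0, 133, 77], dtype=np.uint8)       # (Y, Cr, Cb) lower bound
upper = np.array([255, 173, 127], dtype=np.uint8)    # (Y, Cr, Cb) upper bound
skin_mask = cv2.inRange(ycrcb, lower, upper)         # binary mask of skin pixels

# Keep only the (presumed) hand region; the rest of the frame becomes black
hand_only = cv2.bitwise_and(frame, frame, mask=skin_mask)
```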
Hand gesture recognition system for dumb people [5]: The authors presented a
static hand gesture recognition system using digital image processing. The SIFT
technique is used to construct the vector representing the hand gestures. At the
edges, SIFT features that are invariant to scaling, rotation, and noise addition have
been calculated.
An automated system for recognizing Indian sign language in [6]: This article
discusses an approach to automatic sign identification that is based on shape-based
characteristics. The hand region is separated from the pictures using Otsu’s thresh-
olding approach, which calculates the optimal threshold for minimizing the variance
of thresholded black and white pixels within classes. Hu’s invariant moments are
utilized to determine the segmented hand region’s characteristics, which are then clas-
sified using an Artificial Neural Network. The performance of a system is determined
by its accuracy, sensitivity, and specificity.
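The following sketch illustrates the Otsu thresholding and Hu-moment feature steps described above using OpenCV; the input file name is hypothetical and the snippet is not taken from the cited work:

```python
import cv2

# Sketch of the segmentation/feature pipeline described above: Otsu's method
# chooses the binarization threshold automatically, and Hu's seven invariant
# moments are computed from the thresholded hand region.
gray = cv2.imread("sign.jpg", cv2.IMREAD_GRAYSCALE)

# Otsu's thresholding minimises the within-class variance of the two classes
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Hu's seven invariant moments of the segmented region
moments = cv2.moments(binary)
hu_features = cv2.HuMoments(moments).flatten()   # 7-element feature vector

# hu_features would then be fed to a classifier such as an ANN
print(hu_features)
```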
Recognition of hand gestures for sign language recognition: A review in [7]: The
authors examined a variety of previous scholarly proposals for hand gesture and
sign language recognition. The sole method of communication accessible to deaf
and dumb persons is sign language. These physically handicapped persons use sign
language to express their feelings and thoughts to others.
The design issues and proposed implementation of a deaf and dumb persons' communication aid in [8]: The author developed a technique to help deaf and dumb
individuals in interacting with hearing people using Indian sign language (ISL), in
which suitable hand gestures are converted to text messages. The major objective
is to build an algorithm capable of instantly translating dynamic motions to text.
Following completion of testing, the system will be incorporated into the Android
platform and made accessible as a mobile application for smartphones and tablet
computers.
Real-time detection and identification of Indian and American Sign Language using SIFT in [9]: The author demonstrated a real-time vision-based system for hand
gesture detection that may be used in a range of human–computer interaction appli-
cations. The system is capable of recognizing 35 distinct hand gestures used in
Indian and American Sign Language, or ISL and ASL. An RGB-to-GRAY segmen-
tation method was used to reduce the chance of incorrect detection. The authors
demonstrated how to extract features using an improvised Scale Invariant Feature
Transform (SIFT). The system is modeled in MATLAB. A graphical user interface
(GUI) concept was created to produce an efficient and user-friendly hand gesture
recognition system.
A Review of the Extraction of Indian and American Sign Language Features in
[10]: This article examined the current state of sign language study and development,
which is focused on manual communication and body language. The three steps of
sign language recognition are generally as follows: pre-processing, feature extraction,
and classification. Neural Networks (NN), Support Vector Machines (SVM), Hidden
Markov Models (HMM), and Scale Invariant Feature Transforms (SIFT) are just a
few of the classification methods available (Table 1).
The existing technique makes use of the orientation histogram, which has a number of disadvantages, including the following: comparable gestures may have distinct orientation histograms, while dissimilar gestures may have similar orientation histograms. Additionally, such a technique responds to any object that takes up the majority of the image, even if it is not a hand gesture.

3 Proposed System

This article makes use of deep learning techniques to identify the hand motion. To train the recommended system, static image datasets are employed. The network is built using convolutional neural networks rather than pre-trained models. The proposed vision-based solution does not require external hardware and is not restricted by dress code constraints. The deep learning algorithm CNN is used to convert hand motions to numbers. A camera is utilized to capture a gesture, which is subsequently used as input for the motion recognition system. The conversion of sign language to numerical data and speech in real time consists, more precisely, of the following steps: (1) recognize male and female sign gestures; (2) create a model for image-to-text translation using a machine-learning approach; (3) form words; (4) compose sentences; (5) compose the complete text; (6) convert the text to audio. Figure 1 depicts the steps necessary to accomplish the project's objectives.

Table 1 Review on types of hand gestures used


Author | Methods | Features | Challenges
Viraj Shinde [11] | Electronics based | Use of electronic hardware | Lot of noise in transition
Maria Eugenia [12] | Glove based | Requires use of hand gloves | Complex to use
Sayem Muhammed [13] | Marker based | Use of markers on fingers or wrist | Complex to use multiple markers
Shneiderman [14] | HCI | Describes practical techniques | Current HCI design methods
Lawrence DO [15] | HCI | Usability, higher education | User should be prepared to use the system
Chu, Ju, Jung [16] | EMG | Differential mechanism | Minimize the weight of the prosthetic hand
O. Marques [17] | MATLAB | More than 30 tutorials | Accessible to all people
A. McAndrew [18] | MATLAB | Information about DSP using MATLAB | Understandable to everyone
Khan, T. M. [19] | CED | Accurate orientation | Noisy real-life test data
Nishad PM [20] | Algorithm | Different color conversion | Convert one space to other

Fig. 1 The system block diagram

Gestures are captured by the web camera. An OpenCV video stream captures the whole signing period. Frames are extracted from the stream and transformed to grayscale pictures with a resolution of 50 × 50 pixels. Because the entire dataset uses the same size, this dimension is consistent throughout the project. In the gathered frames, hand movements are recognized. This is a preprocessing phase that occurs before submitting the picture to the model for prediction. The regions containing the gestures are highlighted, which effectively doubles the chance of a correct prediction. The preprocessed images are fed into the Keras CNN model. The trained model generates the anticipated label. Each gesture label is associated with a probability, and the label with the highest probability is taken as the prediction. The model transforms recognized movements into text. The pyttsx3 package is used to convert recognized words to their corresponding speech. Although the text-to-speech output is a simple workaround, it is beneficial since it replicates a verbal dialog. The system architecture is depicted in Fig. 2.
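A minimal sketch of this capture, preprocessing, prediction, and speech loop is shown below. The model file name and the label list are placeholders, and the snippet assumes a Keras model trained on 50 × 50 grayscale images rather than reproducing the authors' exact code:

```python
import cv2
import numpy as np
import pyttsx3
from tensorflow.keras.models import load_model

# Hypothetical trained gesture model and label set (assumptions, not the
# authors' files); the model is assumed to take 50x50 grayscale inputs.
model = load_model("gesture_cnn.h5")
labels = ["one", "two", "three", "four", "five"]
engine = pyttsx3.init()                       # text-to-speech engine

cap = cv2.VideoCapture(0)                     # web camera stream
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    roi = cv2.resize(gray, (50, 50)) / 255.0  # match the training resolution
    probs = model.predict(roi.reshape(1, 50, 50, 1), verbose=0)[0]
    word = labels[int(np.argmax(probs))]      # most probable gesture label
    engine.say(word)                          # speak the recognized gesture
    engine.runAndWait()
    cv2.imshow("frame", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```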

Fig. 2 System architecture

3.1 Convolutional Neural Network (CNN)

Convolutional Neural Networks (CNNs) are used for detection. CNNs are a special form of neural network that is highly effective for solving computer vision issues. They took inspiration from the way images are perceived in the visual cortex of our brain. They utilize a filter/kernel to scan the entire image's pixel values and conduct computations by assigning suitable weights to allow feature detection [25, 26]. A CNN is made up of multiple layers, including a convolution layer, a max pooling layer, a flatten layer, a dense layer, and a dropout layer. When combined, these layers form a very powerful tool for detecting characteristics in images. The early layers detect low-level features and move gradually to higher-level features. AlexNet is a widely used deep learning architecture of this kind, which uses pictures, video, text, and sound to do classification tasks. CNNs excel in recognizing patterns in pictures, allowing for the detection of hand movements, faces, and other objects. The benefit of a CNN is that training the model does not need hand-crafted feature extraction, and CNNs are largely invariant to scale and rotation. In the proposed system, AlexNet is used for object detection. AlexNet has eight layers with learnable parameters: five convolutional layers combined with max pooling, followed by three fully connected layers, with ReLU activation in each of these layers except the output layer.
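The following is an illustrative Keras definition of a small CNN built from the layer types listed above (convolution, max pooling, flatten, dense, dropout). It is a sketch assuming 50 × 50 grayscale inputs and five gesture classes, not the exact network or AlexNet configuration used by the authors:

```python
from tensorflow.keras import layers, models

# Small illustrative CNN: conv + max-pool feature extraction, then a dense
# head with dropout and a softmax over the five gesture classes.
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(50, 50, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.4),                       # guards against over-fitting
    layers.Dense(5, activation="softmax"),     # one probability per gesture
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```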

3.2 MATLAB

MATLAB is a programming environment for signal processing and analysis that is


often used. MATLAB is a computer language for the creation and manipulation of
discrete-time signals. Individual expressions may be directly entered into the text
window of the MATLAB interpreter. You may store text files or scripts (with .m
extensions) that include collections of commands and then execute them from the
command line. Users can also create MATLAB routines. MATLAB's matrix algebra procedures have been optimized [27]; explicit loops typically take longer to complete than vectorized, straight-line code. Its functions can be written as C executables to
increase efficiency (though you must have the compiler). Additionally, you may
utilize class structures to organize your code and create apps with intricate graphical
user interfaces. MATLAB comes pre-loaded with various functions for importing and
exporting audio files. MATLAB’s audioread and audiowrite functions allow you to
read and write data to and from a variety of different types of audio files. The sound
(unnormalized) or soundsc (normalized) functions in most versions of MATLAB
may send signals to the computer’s audio hardware.

3.3 Recognition of Numbers

To identify the bounding boxes of moving objects, we used Gaussian background subtraction, a technique that models each background pixel using a mixture of K Gaussian distributions (K varies from 3 to 5). The colors associated with the presumed background are those that remain over a longer length of time and are thus more static. Around the changing pixels, we build a rectangular bounding box. Following the collection of all gesture and background photos, a Convolutional Neural Network model was built to disentangle the gesture signs from their background. Its feature maps illustrate how the CNN can grasp the shared underlying structures associated with several of the training gesture markers and so discriminate between a gesture and the background. The numbers linked with the hand movements are depicted in Fig. 3, and the training is done in the CNN. After training, an input image is given by capturing it from a webcam. The given image is then tested for recognizing the gesture.
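A possible OpenCV realization of this Gaussian-mixture background subtraction and bounding-box step is sketched below; the MOG2 parameters and the largest-contour heuristic are assumptions made for illustration (OpenCV 4.x is assumed):

```python
import cv2

# Mixture-of-Gaussians background model: each pixel is modelled by a mixture
# of Gaussian distributions, and pixels that deviate from it are foreground.
subtractor = cv2.createBackgroundSubtractorMOG2(history=300, varThreshold=25)

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                      # moving (hand) pixels
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        largest = max(contours, key=cv2.contourArea)    # assume the hand is the largest blob
        x, y, w, h = cv2.boundingRect(largest)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("gesture", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```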

3.4 Results and Discussions

The dataset for sign language-based numerals is carefully assembled into two distinct partitions for training and test data. The model was trained using the Adam optimizer over a 20-epoch period, providing accuracy, validation accuracy, loss, and validation loss for each epoch, as shown in Table 2, which indicates a progressive rise in the training accuracy. As demonstrated in Fig. 4, accurate categorization requires a minimum of twenty epochs. The accuracy value obtained at the most recent epoch indicates the overall accuracy on the training dataset. The categorical cross-entropy loss function is used to determine the overall system performance. Between training and testing, the performance of the CNN algorithm [28, 29] is compared using a range of parameters, including execution time, the amount of time necessary

Fig. 3 The gesture symbols for numbers that will be in the training data

Table 2 Training on single CPU, initializing input data normalization


Epoch | Iteration | Time elapsed (hh:mm:ss) | Mini-batch accuracy (%) | Mini-batch loss | Base learning rate
1 | 1 | 00:00:07 | 28.13 | 2.2446 | 0.0010
10 | 50 | 00:04:42 | 100.00 | 0.0001 | 0.0010
20 | 100 | 00:09:23 | 100.00 | 2.1477e–06 | 0.0010

Fig. 4 Confusion matrix: alexnet



for the program to accomplish the task. Sensitivity measures the fraction of actual positives that are properly classified, while specificity measures the fraction of actual negatives that are correctly rejected. The graph displays the ROC curve for class 5. To obtain the best results, investigations are done both visually and in real time. The CNN algorithm is advantageous in this task for a variety of reasons. To begin with, a CNN is capable of extracting image characteristics without requiring human intervention, and it learns from pictures or videos better than a plain ANN, although it executes more slowly than an ANN. The proposed system is implemented using MATLAB 2019. The system can also be used in real time with additional cameras, which are used to address the complex background problem and improve the robustness of hand detection (Table 2).
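For reference, sensitivity and specificity can be computed per class from the confusion matrix as in the following sketch; the label arrays are dummy stand-ins for the real test labels and CNN predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Dummy labels/predictions for five gesture classes (0..4); replace with the
# real test-set labels and the CNN's predicted labels.
y_true = np.array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])
y_pred = np.array([0, 1, 2, 3, 4, 0, 2, 2, 3, 4])

cm = confusion_matrix(y_true, y_pred)
for c in range(cm.shape[0]):
    tp = cm[c, c]
    fn = cm[c, :].sum() - tp
    fp = cm[:, c].sum() - tp
    tn = cm.sum() - tp - fn - fp
    sensitivity = tp / (tp + fn)     # fraction of actual positives found
    specificity = tn / (tn + fp)     # fraction of actual negatives rejected
    print(f"class {c}: sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```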

3.5 Future Enhancement

Hand gesture recognition was created and developed in accordance with current
design and development methodologies and scopes. This system is very flexible,
allowing for simple maintenance and modifications in response to changing surround-
ings and requirements simply by adding more information. Additional modifications
to bring assessment tools up to date are possible. This section may be reorganized if
required.

4 Conclusion

The major goal of the system was to advance hand gesture recognition. A prototype
of the system has been constructed and tested, with promising results reported. The
device is capable of recognizing and generating audio depending on hand motions.
The approach combines image capture and image processing to enhance and identify the image using built-in MATLAB techniques. The project is built using
MATLAB. This language selection is based on the user’s needs statement and an
evaluation of the existing system, which includes space for future expansions.

References

1. Hegde B, Dayananda P, Hegde M, Chetan C (2019) Deep learning technique for detecting
NSCLC. Int J Recent Technol Eng (IJRTE) 8(3):7841–7843
2. Sunitha KA, Anitha Saraswathi P, Aarthi M, Jayapriya K, Sunny L (2016) Deaf mute
communication interpreter—a review. Int J Appl Eng Res 11:290–296
3. Anand MS, Kumar NM, Kumaresan A (2016) An efficient framework for Indian sign language
recognition using wavelet transform. Circuits Syst 7:1874–1883

4. Ahuja MK, Singh A (2015) Hand gesture recognition using PCA. Int J Comput Sci Eng Technol
(IJCSET) 5(7):267–27
5. More SP, Sattar A, Hand gesture recognition system for dumb people. Int J Sci Res (IJSR)
6. Kaur C, Gill N, An automated system for Indian sign language recognition. Int J Adv Res
Comput Sci Software Eng
7. Pandey P, Jain V (2015) Hand gesture recognition for sign language recognition: a review. Int
J Sci Eng Technol Res (IJSETR) 4(3)
8. Nagpal N, Mitra A, Agrawal P (2019) Design issue and proposed implementation of commu-
nication Aid for Deaf & Dumb People. Int J Recent Innov Trends Comput Commun
3(5):147–149
9. Gilorkar NK, Ingle MM (2015) Real time detection and recognition of Indian and American
sign language using sift. Int J Electron Commun Eng Technol (IJECET) 5(5):11–18
10. Shinde V, Bacchav T, Pawar J, Sanap M (2014) Hand gesture recognition system using camera
03(01)
11. Gebrera ME (2016) Glove-based gesture recognition system. In: IEEE international conference
on robotics and biomimetics 2016
12. Siam SM, Sakel JA (2016) Human computer interaction using marker based hand gesture
recognition
13. Shneiderman B, Plaisant C, Cohen M, Jacobs S, Elmqvist N, Diakopoulos N (2016) Designing
the user interface: strategies for effective human-computer interaction Pearson
14. Lawrence DO, Ashleigh M (2019) Impact of Human-Computer Interaction (HCI) on users
in higher educational system: Southampton University as a case study. Int J Manage Technol
6(3):1–12
15. Chu JU, Jung DH, Lee YJ (2008) Design and control of a multifunction myoelectric hand with
new adaptive grasping and self-locking mechanisms. In: 2008 IEEE international conference
on robotics and automation, pp 743–748, May 2008
16. Marques O (2011) Practical image and video processing using MATLAB. Wiley
17. McAndrew A (2004) An introduction to digital image processing with Matlab notes for
SCM2511 image processing, p 264
18. Khan TM, Bailey DG, Khan MA, Kong Y (2017) Efficient hardware implementation for
fingerprint image enhancement using anisotropic Gaussian filter. IEEE Trans Image Process
26(5):2116–2126
19. Nishad PM (2013) Various colour spaces and colour space conversion. J Global Res Comput
Sci 4(1):44–48
20. Abhishek B, Krishi K, Meghana M, Daaniyaal M, Anupama HS (2019) Hand gesture recog-
nition using machine learning algorithms. Int J Recent Technol Eng (IJRTE) 8(1) ISSN:
2277-3878
21. Ankita W, Parteek K (2020) Deep learning-based sign language recognition system for static
signs. Neural Comput Appl
22. Khan A, Sohail A, Zahoora U, Qureshi AS (2020) A survey of the recent architectures of deep
convolutional neural networks. Comput Vis Pattern Recogn
23. Chuan CH, Regina E, Guardino C (2014) American sign language recognition using leap motion
sensor. In: 13th International Conference on Machine Learning and Applications (ICMLA),
pp 541–544
24. Devineau G, Moutarde F, Xi W, Yang J (2018) Deep learning for hand gesture recognition on
skeletal data. In: 13th IEEE International conference on automatic face & gesture recognition,
pp 106–113
25. Nair AT, Muthuvel K (2021) Automated screening of diabetic retinopathy with optimized deep
convolutional neural network: enhanced moth flame model. J Mech Med Biol 21(1):2150005
(29 p). https://doi.org/10.1142/S0219519421500056
26. Nair AT, Muthuvel K (2019) Blood vessel segmentation and diabetic retinopathy recognition:
an intelligent approach. Comput Methods Biomech Biomed Eng: Imaging Visual. https://doi.
org/10.1080/21681163.2019.1647459

27. Nair AT, Muthuvel K (2020) Research contributions with algorithmic comparison on the diag-
nosis of diabetic retinopathy. Int J Image Graphics 20(4):2050030(29 p). https://doi.org/10.
1142/S0219467820500308
28. Nair AT, Muthuvel K, Haritha KS (2020) Effectual evaluation on diabetic retinopathy. In:
Publication in lecture notes. Springer, Berlin
29. Nair AT, Muthuvel K, Haritha KS (2021) Blood vessel segmentation for diabetic retinopathy.
In: Publication in the IOP: Journal of Physics Conference Series (JPCS)
Coronavirus Pandemic: A Review
of Different Machine Learning
Approaches

Bhupinder Singh and Ritu Agarwal

Abstract Millions of individuals have been affected by coronavirus illness. The coronavirus epidemic poses a significant medical danger to a wide range of the population. The COVID-19 disease outbreak and subsequent control strategies have created a global syndrome that has impacted all aspects of human life. Initial-stage detection of COVID-19 has become a difficult task for all researchers and scientists. There exist various ML and deep learning techniques to detect COVID-19 disease. There are various stages of COVID-19: initially, it was spread by people who travelled from countries which were severely affected by coronavirus; after some time, it entered the community transmission phase. The virus has a different impact on different individuals, and there is no known cure for this disease. The virus shows an immediate effect on certain individuals, whereas for others it takes a few days to weeks for the symptoms to show, and some people do not show any symptoms at all. The most common symptoms are dry cough, fever, lung infection, etc. This paper provides information about the several tests available for the detection of COVID-19 and a detailed comparison among the deep learning (DL) and AI (artificial intelligence) based techniques which are used to detect COVID-19 disease.

Keywords COVID-19 disease · CXR images · AI · Stages and symptoms

1 Introduction

COVID-19 is a viral infection which began spreading in December 2019. This epidemic has affected every part of the globe in a very short period of time. The World Health Organization declared it a pandemic on March 11, 2020 [1]. The coronavirus is very contagious and can easily transfer from one person to another

B. Singh · R. Agarwal (B)


Delhi Technological University, New Delhi, India


person. This virus, also called severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), affects the respiratory system. The virus shows different reactions in different bodies [2]. The basic structure of the COVID-19 virus is shown in Fig. 1. The spread of COVID-19 disease is divided into different stages [4]: STAGE-1, STAGE-2, STAGE-3, and STAGE-4.
STAGE-1 occurs when infected people travel from one country to another. In particular, people who have travelled overseas are found to be infected with a respiratory ailment. During that time, the illness is not spreading domestically. During STAGE-2, localized transmissions occur and the source, i.e., the sick individual who may have traveled to other nations that have already been affected, is recognized and can be tracked down. STAGE-3 is community transmission. At this stage, the virus spreads from one person to another or one cluster to another, and the source of an infected person's exposure is hard to trace.
The most serious stage of a contagious disease propagating inside a country is STAGE-4. During this stage, there are multiple spots of illness in various sections of the population, and the disease has taken on pandemic proportions. Because of the virus's quick and extensive spread, the World Health Organization classified coronavirus disease 2019 as a pandemic on March 11, 2020. The outbreak started in China, with early cases confirmed in December 2019 in Wuhan, Hubei province [4]. The etiological agent of COVID-19 was isolated and identified as a novel coronavirus, initially dubbed 2019-nCoV [5]. The viral genome was eventually sequenced, and the virus was named SARS-CoV-2 by the International Committee on Taxonomy of Viruses because it was genetically related to the coronavirus that triggered the SARS outbreak in 2003.

Fig. 1 Structure of COVID-19 [3]



Different tests are available for COVID-19 disease detection, such as the PCR (Polymerase Chain Reaction) test [5], the COVID-19 antigen test [6], and the COVID-19 antibody test. COVID-19 infection has resulted in a slew of devastating illnesses with striking signs and symptoms [7]. Nausea, clogged nose, tiredness, and breathing problems were among the flu-like symptoms mentioned by patients. The composition of the COVID-19 viral infection is tough to grasp for any researcher. COVID-19 has several symptoms: certain signs are frequent, some are less common, and some are unusual.
COVID-19 affects human health as well as mental status. The COVID-19 epidemic has caused many workers to lose their jobs, and people go through issues of anxiety and stress. The various mental issues and symptoms related to COVID-19 are depicted in Table 1.
Early detection is essential for COVID-19 identification, and it can improve patient outcomes. In the early stages of COVID-19, image processing is a vital approach for exploring and identifying the disease. However, manually analyzing a large volume of healthcare images can be time-consuming and monotonous, and it is susceptible to human error and bias [10]. Deep learning (DL) and AI (artificial intelligence) make it simple to distinguish between infected and non-infected patients. There are various aspects by which AI and ML become helpful for COVID-19 detection.
In clinical practice, AI approaches are used to diagnose disease and anticipate therapy outcomes. AI can give essential information regarding allocation of resources and decision making by prioritizing the requirement for mechanical ventilation and breathing assistance of ICU (Intensive Care Unit) patients on the basis of questionnaires, supporting documents, and clinical factors. AI can also be utilized to forecast recovery or mortality in COVID-19, as well as to offer regular reports, monitoring and predictive analytics, and therapy tracking. AI is used to classify individuals into low, medium, and serious categories depending on their symptoms, predisposition, and clinical reports so that appropriate actions can be implemented to treat individuals as quickly and efficiently as possible [11].
In medical applications, deep learning has achieved significant advances. Deep learning can find patterns in very massive, complicated datasets, and it has been recognized as a viable tool for analyzing COVID-19-infected individuals. Neural network-based COVID-19 identification models use 2-dimensional and 3-dimensional features extracted from volumetric lung CT scans to

Table 1 Various issues/problems due to COVID-19


S. No. Mental Issues [9] Symptoms
1 Common mental illness Fear, Depression, Burnout, Anxiety
2 Less common mental illness Sadness, Sleep shortness, Energy shortness,
Dizziness
3 Rare issues Self-harm addiction, Domestic abuse, Loneliness,
Suicide, Social isolation

distinguish between COVID-19 and community-acquired pneumonia. The use of DL on COVID-19 medical images minimizes false-positive and false-negative errors in the monitoring and classification of COVID-19 illness, providing a one-of-a-kind chance to give patients quick, low-cost, and reliable disease management [12].
This paper is divided into different sub sections. The first section is the introduc-
tion section which provides the introduction about COVID-19. In Sect. 2, several
existing methods are reviewed. In Sect. 3, the various approaches for COVID-19
disease detection are discussed. The research challenges and limitations of COVID-
19 detection techniques are discussed in Sect. 4. In the last Sect. 5, overall conclusion
and future scope are mentioned.

2 Literature Review

For the analysis and detection of CORONA virus, Jain [13] designed a novel approach
which was based on the DL concept. The testing and training of the DL based model
were done with CXR images. The images of infected and non-infected persons were
utilized for the training purpose of various DL models. The images of chest x-ray
were filtered out and data augmentation was applied to them. The three-DL based
approaches ResNeXt, Inception V3, and Xception were examined based on their
accuracy of COVID-19 detection. The collected dataset had 6432 images of CXR
which were collected from Kaggle site. The collection of 5467 images was used for
the training purpose of models and 965 used for testing purposes. The Xception model
provided the highest accuracy among other models. The performances of the models
were examined on three parameters: precision rate, f1-score, and recall rate. The CNN approach has provided state-of-the-art results in the medical field. For efficient results with deep convolutional models, Kamal et al. [14] provided an evaluation of pre-trained prototypes for the COVID-19 classification of CXR frames. The
Neural Architecture Search Network (NASNet), resnet, mobilenet, DenseNet, VGG-
19, and InceptionV3 pre-trained models were examined. The comparison outcomes
of pre-trained models show that three class classifications had achieved the highest
accuracy. Ibrahim et al. [15] proposed a system that can classify three different classes
of COVID-19. The AlexNet pre-trained model was implemented for the classifica-
tion of patients. The model was used to predict the type of COVID-19 class as well as
predict the infected patient or non-infected patient. The CXR medical images were
composed from public datasets. The database of images contained bacterial pneu-
monia, COVID-19, pneumonia viral infected, and healthy or CXR normal images.
The classification outcomes of the proposed model were based on two-way clas-
sification, three ways, and four-way classification. In a two-way classification of
non-infected or normal and viral pneumonia images, the proposed model provided
94.43% of accuracy. In normal and bacterial pneumonia classification, the model
provided 91.43% of accuracy. The model got 93.42% of accuracy in the four-way
classification of images. Annavarapu et al. [16] have proposed a COVID detec-


tion technique that is based on DL. In the proposed system, a pre-trained feature
extractor was used for efficient results. The pre-trained model used by the authors
was the ResNet-50 model that enhanced the learning. The model was based on the
COVID CXR dataset, which contains 2905 chest X-ray images of COVID-19, pneumonia, and normal cases. The model performance was examined with
AU-PR, AU-ROC, and Jaccard Index. The model achieved standard results: 95%
of accuracy, 95% f1-score, and 97% specificity. Ozturk et al. [17] designed a novel technique for automatically detecting COVID-19 infection by utilizing raw chest X-ray images. This technique provided correct diagnostics both for binary classification, which compared COVID against no-findings, and for MC (multi-class) classification, which compared COVID infection, no-findings, and pneumonia. For the binary classes, this approach had a classification accuracy of 98 percent, while in multi-class scenarios it had an accuracy of 87 percent. In this strategy, the DarkNet architecture was used as a classifier; 17 convolutional layers were executed and separate filtering on every stage was used. Based on a deep learning pre-trained model, Al-
antari et al. [18] implemented a system that identifies the patients of CORONA. The
pre-trained YOLO model was used with a computer-aided diagnosis system. The
purposed system was used for multiple classifications of respiratory diseases. The
system provided the differentiation between eight different types of diseases related
to respiratory. The performance of the system was examined on two datasets: one was
the CXR 8 data collection, and the second was the dataset of COVID-19. Using two
separate datasets of CXR images, the planned system was evaluated using fivefold
tests for the MC prediction issue. For the training purpose, 50,490 images were used
and achieved 96.31% detection accuracy with 97.40% classification accuracy. The
CAD system works like a real-time system and can predict at a rate of 108 frames
per second (FPS) with 0.0093 s only. Table 2 shows the various COVID-19 detection
methods presently in use.
The graphical representation of the results of the existing COVID-19 detection methods is shown in Fig. 2.
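Several of the reviewed works follow a common transfer-learning recipe: a CNN backbone pre-trained on ImageNet is reused as a feature extractor and a new classification head is trained on chest X-ray images. The sketch below illustrates this recipe with a ResNet-50 backbone in Keras; the class count and hyper-parameters are assumptions and do not reproduce any particular paper:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

# Pre-trained ResNet-50 backbone used as a frozen feature extractor; a new
# head predicts three illustrative CXR classes (COVID-19 / pneumonia / normal).
backbone = ResNet50(weights="imagenet", include_top=False,
                    input_shape=(224, 224, 3), pooling="avg")
backbone.trainable = False                     # freeze the pre-trained features

model = models.Sequential([
    backbone,
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),     # three-class CXR prediction
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, validation_data=..., epochs=...)
```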

3 Various Approaches for Disease Detection of COVID-19

For the detection of COVID-19, there are two approaches: AI based and DL based.
Due to artificial intelligence fast-tracking technologies, AI is helpful in lowering
doctors’ stress, because it can analyze radiographic findings using DL (deep learning)
and ML (machine learning) systems. AI's fast-tracking platforms encourage cost-effective and time-efficient operations by swiftly assessing a huge number of images, leading to better patient care.

Table 2 COVID-19 detection existing methods


Author Name, Year | Proposed methods | Problem/Gaps | Dataset used | Performance metrics
Jain et al. [13] | ResNeXt, Inception V3, and Xception | Need of large dataset for validation of the model | Dataset of 6432 images of chest X-ray | Accuracy = 97.97%
Kamal et al. [14] | The deep convolutional model with a pre-trained model | Works only on specific datasets | Dataset of 760 images | Accuracy = 98.69%
Ibrahim et al. [15] | AlexNet pre-trained model with deep learning | For better performance, CNN models must collaborate with SVR (support vector regression) and SVM (support vector machine) | Chest X-ray images collected from public datasets | Accuracy = 99.62%
Annavarapu et al. [16] | Transfer learning with ResNet-50 pre-trained model | Hard to implement | Publicly available dataset of 2905 images | Accuracy = 95%
Ozturk et al. [17] | Deep learning-based DarkNet model with pre-trained YOLO model | Fewer images for model validation | Collection of 1125 images | Accuracy = 98.08%
Al-antari et al. [18] | Pre-trained YOLO model with computer-aided diagnosis system | Need to collect more digital X-ray and CT images for validation purpose | ChestX-ray8 dataset and COVID-19 dataset | Accuracy = 97.40%

3.1 AI (Artificial Intelligence) Based Approaches

Driven by recent advances in computerized information collection, predictive analytics, and computational technologies, artificial intelligence technologies are expanding into fields that were traditionally thought to be in the domain of human intelligence. Medical practice is being influenced by machine learning, although it is still difficult to develop prediction systems that can reliably predict and identify such viruses. AI methods, also known as classification methods, take in information, analyze it statistically, and make predictions based on the learned statistical features. Many of these techniques have a variety of uses, including image processing, facial recognition, estimation, recommender systems, and so on. ML seems to have a great deal of potential in diagnosis and treatment as well as in the advancement of machine systems, and healthcare professionals have encouraged the use of machine learning techniques in diagnosis and treatment in the healthcare profession as a resource to facilitate their work [19]. Figure 3 depicts an artificial intelligence-based approach to COVID-19 detection.

Fig. 2 Comparative analysis of existing approaches

Fig. 3 AI based approach for COVID detection [19]



To control the impacts of the illness, AI techniques are used in a variety of domains. Available treatments, image classification connected to COVID-19, pharmacology investigations, and epidemiology are among the application areas.

3.1.1 Types of AI Approaches

Machine learning: Machine learning methods are employed to analyze medical conditions in order to diagnose COVID-19 individuals. Individuals, as fundamental participants, are encouraged to report their conditions. An artificial intelligence system is utilized to detect COVID-19 using information from continuous medical assessments. In a number of studies, a variety of ML algorithms were used to identify the condition of coronavirus patients; multiple machine learning algorithms are employed to analyze patient information to evaluate the COVID-19 instances: Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and Decision Tree (DT) [20].
SVM model for the detection of COVID-19: SVM is one of the simplest ways to classify binary classes of data. SVM is a supervised machine learning algorithm that helps in the classification as well as regression of CXR images of COVID-19.
LR based COVID-19 detection model: Logistic regression is used to identify the category of disease in a COVID-19 identification system. It provides probabilistic measures for the classification of CXR images.
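A minimal scikit-learn sketch of the SVM and logistic-regression classifiers described above is given below; the random feature matrix stands in for real CXR-derived features and COVID/non-COVID labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Dummy data standing in for flattened CXR features and binary labels
# (1 = COVID-19 positive, 0 = negative); replace with real extracted features.
X = np.random.rand(200, 2500)          # e.g. 50x50 images flattened
y = np.random.randint(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

svm_clf = SVC(kernel="rbf", probability=True).fit(X_train, y_train)
lr_clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("SVM accuracy:", svm_clf.score(X_test, y_test))
print("LR  accuracy:", lr_clf.score(X_test, y_test))
# Logistic regression exposes the probabilistic measure directly:
print("LR probability of class 1:", lr_clf.predict_proba(X_test[:1])[0, 1])
```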
Artificial telemedicine: AI-driven telemedicine services are especially valuable throughout an epidemic because they allow people to obtain the care they require from the comfort of their own homes, thereby limiting virus transmission. Telehealth algorithms have been developed using AI methods in several studies; the authors describe a unique AI-based technique for determining the risk of COVID-19 transmission over broadband links. AI types with their advantages and disadvantages are depicted in Table 3.

Table 3 AI-based approaches with their advantages and disadvantages

Type | Model type | Advantage | Disadvantage
Generic machine learning | XGBoost model [21] | Optimized methodology for early-stage detection | False detection of positive cases
Ensemble machine learning | SVM, Decision tree, KNN, Naive Bayes [22] | Multi-class identification of diseases | High computational time
Artificial telemedicine | NLP based [23] | Remote and less time consuming | Hard to implement
Machine learning | SVM [24] | Multi-class classification | False positive results

3.2 Deep Learning-Based Approaches

When applied to the interpretation of multimodal images, DL-based models have the potential to provide an effective and precise strategy for diagnosing and classifying COVID-19 disease, with significant gains from high image resolution. Deep learning methods have advanced significantly in the recent two decades, providing enormous prospects for application in a variety of sectors and services. Figures 4 and 5 depict the core architecture-based approach for detecting COVID-19. Deep learning-based methods with keypoints, layers, advantages, and limitations are shown in Table 4. Several types of deep learning techniques are:
• Convolutional Neural Network (CNN): Convolutional Neural Networks have grown in popularity as a result of their improved image classification performance. The activation functions of the network, in conjunction with classifiers, aid in the retrieval of temporal and spatial features from frames. In the layers, a weight-sharing system is implemented, which considerably reduces computing time [27].
• Recurrent Neural Network (RNN): Due to its internal memory, the RNN (Recurrent Neural Network) was one of the first algorithms able to remember its previous inputs, making it well suited for machine-learning problems involving sequential information, such as voice and text [28].

Fig. 4 Deep learning-based approach [26]

Fig. 5 Different types of approaches based on deep learning

Table 4 Approaches based on deep learning with keypoints, layers, advantages and limitations
Method | Keypoints | Layers | Advantages | Limitations
CNN based [13] | Works only on limited images | Multilayer | High accuracy | Data scarcity
DCNN [14] | Three-class differentiation | 50 layers | Low computational cost | Implementation is hard
ResNet-50 pre-trained model [16] | Provided highly accurate results | 50 layers | Less false results | High computational cost
Deep learning-based YOLO model [17] | More suitable for binary classification | Multilayer | Highly accurate results | Small dataset

The COVID-19 dataset collections and their descriptions are depicted in Table 5.

Table 5 Datasets of COVID-19 with description [28]


S. No. | Dataset | Description
1 | ImmPort | ImmPort is supported by the National Institutes of Health, the National Institute of Allergy and Infectious Diseases, and the Defense Advanced Research Projects Agency. NIH-funded programs, other research institutions, as well as scientific organizations have contributed information to ImmPort, ensuring that these findings remain available for research consideration
2 | N3C (National COVID Cohort Collaborative) | The National Institutes of Health (NIH) has developed a centralized, secure enclave to collect and manage substantial quantities of patient records from people across the nation who have been identified as having coronavirus illness. There are 35 collaborative hubs around the United States
3 | OpenSAFELY | OpenSAFELY is a secure and trustworthy platform of the National Health Service (NHS) designed for electronic medical record analysis. It was designed to provide instant feedback amid the ongoing COVID-19 crisis
4 | Vivli | The Vivli platform includes clinical trials on COVID-19; Johns Hopkins is a part of the organization

4 Research Challenges and Limitations

Regulations, limited resources, the inaccessibility of huge training samples, heavy noise in the data, speculative reporting, restricted knowledge at the junction between medicine and computer science, privacy and confidentiality issues, inconsistent accessibility of textual information, and many other factors generally pose challenges to artificial intelligence (machine learning and deep learning) implementations in COVID-19 investigation. The various research challenges and limitations of COVID-19 techniques are as follows:
• Huge training samples are scarce and unavailable.
• Several intelligence-based deep learning approaches rely on huge training samples, such as diagnostic imaging as well as various environmental variables. However, because of the quick growth of COVID-19, only insufficient samples are available for training AI models.
• In practice, evaluating training datasets takes a long time or may require the
assistance of qualified healthcare workers.
• There is a gap at the junction of medicine and computer science.
• Data that is structurally inaccurate, as well as unstructured information such as text, images, and numerical data [29].

5 Conclusion and Future Scope

Coronavirus illness is a global pandemic. In the fight against COVID-19, smart image processing has been crucial. CT scans and X-rays, as well as PCR data, are used by professionals to realistically assess the condition; besides COVID-19, PCR analysis can also identify a number of other respiratory diseases, like bacterial pneumonia. This paper provides a brief introduction to COVID-19, and the multiple stages of COVID-19 are all discussed, together with the several tests for COVID-19 such as the PCR, antigen, and antibody tests. The multiple signs and symptoms of COVID-19 are also discussed, as are the various concerns and problems caused by COVID-19. COVID detection can be done by using a variety of techniques, including ML and DL techniques. A comparative analysis of existing methods with the help of a graphical representation is depicted. Reference [15] provided the highest accuracy rate for the detection of COVID-19.
The accuracy of the techniques depends upon the training samples. In future, more advanced techniques will be studied and compared for a better analysis of the techniques.

References

1. Sungheetha A (2021) COVID-19 risk minimization decision making strategy using data-driven
model. J Inf Technol 3(01):57–66
2. Pereira RM, Bertolini D, Teixeira LO, Silla Jr CN, Costa YM (2020) COVID-19 identification in
chest X-ray images on flat and hierarchical classification scenarios. Comput Methods Programs
Biomed 194:105532
3. Haque SM, Ashwaq O, Sarief A, Azad John Mohamed AK (2020) A comprehensive review
about SARS-CoV-2. Future Virol 15(9):625–648
4. COVID-19: The 4 Stages Of Disease Transmission Explained (2021). Retrieved 24 June
2021, from https://www.netmeds.com/health-library/post/covid-19-the-4-stages-of-disease-
transmission-explained
5. Cai Q, Du SY, Gao S, Huang GL, Zhang Z, Li S, Wang X, Li PL, Lv P, Hou G, Zhang LN
(2020) A model based on CT radiomic features for predicting RT-PCR becoming negative in
coronavirus disease 2019 (COVID-19) patients. BMC Med İmaging 20(1):1–10
6. Mohanty A, Kabi A, Kumar S, Hada V (2020) Role of rapid antigen test in the diagnosis of
COVID-19 in India. J Adv Med Med Res 77–80
7. Coronavirus disease (COVID-19)—World Health Organization. (2021). Retrieved 9 June
2021, from https://www.who.int/emergencies/diseases/novel-coronavirus-2019?gclid=Cj0
KCQjwzYGGBhCTARIsAHdMTQwyiiQqt3qEn89y0AL5wCEdGwk1bBViX2aoqA__F7M
aGeQEiuahTI4aAh4uEALw_wcB
8. Larsen JR, Martin MR, Martin JD, Kuhn P, Hicks JB (2020) Modeling the onset of symptoms
of COVID-19. Front Public Health 8:473
9. Shastri S, Singh K, Kumar S, Kour P, Mansotra V (2021) Deep-LSTM ensemble framework
to forecast Covid-19: an insight to the global pandemic. Int J Inform Technol, 1–11
10. Huang S, Yang J, Fong S, Zhao Q (2021) Artificial intelligence in the diagnosis of COVID-19:
challenges and perspectives. Int J Biol Sci 17(6):1581
11. Arora N, Banerjee AK, Narasu ML (2020) The role of artificial intelligence in tackling COVID-
19
12. Nayak J, Naik B, Dinesh P, Vakula K, Dash PB, Pelusi D (2021) Significance of deep learning
for Covid-19: state-of-the-art review. Res Biomed Eng, 1–24
13. Jain R, Gupta M, Taneja S, Hemanth DJ (2020) Deep learning-based detection and analysis of
COVID-19 on chest X-ray images. Appl Intell 51(3):1690–1700
14. Kamal KC, Yin Z, Wu M, Wu Z (2021) Evaluation of deep learning-based approaches for
COVID-19 classification based on chest X-ray images. Sign Image Video Process, 1–8
15. Ibrahim AU, Ozsoz M, Serte S, Al-Turjman F, Yakoi PS (2020) Pneumonia classification using
deep learning from chest X-ray images during COVID-19. Cogn Comput, 1–13
16. Annavarapu CSR (2021) Deep learning-based improved snapshot ensemble technique for
COVID-19 chest X-ray classification. Appl Intell, 1–17
17. Ozturk T, Talo M, Yildirim EA, Baloglu UB, Yildirim O, Acharya UR (2021) Automated
detection of COVID-19 cases using deep neural networks with X-ray images. Comput Biol
Med 121:103792
18. Al-antari MA, Hua CH, Bang J, Lee S (2020) Fast deep learning computer-aided diagnosis of
COVID-19 based on digital chest x-ray images. Appl Intell, 1–18
19. Eljamassi DF, Maghari AY (2020) COVID-19 detection from chest X-ray scans using machine
learning. In: 2020 International Conference on Promising Electronic Technologies (ICPET),
pp 1–4
20. Tayarani-N MH (2020) Applications of artificial intelligence in battling against Covid-19: a
literature review. Chaos, Solitons Fractals 110338
21. Feng C, Huang Z, Wang L, Chen X, Zhai Y, Chen H, Wang Y, Su X, Huang S, Zhu W, Sun W
(2020) A novel triage tool of artificial intelligence assisted diagnosis aid system for suspected
COVID-19 pneumonia in fever clinics. MedRxiv
22. Annavarapu CSR (2021) Deep learning-based improved snapshot ensemble technique for
COVID-19 chest X-ray classification. Appl Intell 51(5):3104–3120

23. Bharti U, Bajaj D, Batra H, Lalit S, Lalit S, Gangwani A (2020) Medbot: conversational
artificial intelligence powered Chatbot for delivering tele-health after Covid-19. In: 2020 5th
International Conference on Communication and Electronics Systems (ICCES), pp 870–875
24. de Moraes Batista AF, Miraglia JL, Donato THR, Chiavegatto Filho ADP (2020) COVID-19
diagnosis prediction in emergency care patients: a machine learning approach. medRxiv
25. Mukhtar AH, Hamdan A (2021) Artificial intelligence and coronavirus COVID-19: applica-
tions, impact and future implications. The importance of new technologies and entrepreneurship
in business development: in the context of economic diversity in developing countries, vol 194,
p 830
26. Burugupalli M (2020) Image classification using transfer learning and convolution neural
networks
27. Ganatra N, Patel A (2018) A Comprehensive study of deep learning architectures, applications
and tools. Int J Comput Sci Eng 6:701–705
28. Chen JIZ (2021) Design of accurate classification of COVID-19 disease in X-ray images using
deep learning approach. J ISMAC 3(02):132–148
29. Welch Medical Library Guides: Finding Datasets for Secondary Analysis: COVID-19 Datasets
(2021). Retrieved 30 July 2021, from https://browse.welch.jhmi.edu/datasets/Covid19
30. Aishwarya T, Kumar VR (2021) Machine learning and deep learning approaches to analyze
and detect COVID-19: a review. SN Comput Sci 2(3):1–9
High Spectrum and Efficiency Improved
Structured Compressive Sensing-Based
Channel Estimation Scheme for Massive
MIMO Systems

V. Baranidharan, C. Raju, S. Naveen Kumar, S. N. Keerthivasan, and S. Isaac Samson

Abstract Due to its high spectrum and energy efficiency, massive MIMO is among the most promising techniques for future 5G communications. Accurate channel estimation is essential to realize this potential performance gain. The pilot overhead in conventional channel estimation schemes grows with the enormous number of antennas used at the base station (BS), and this becomes too expensive; for frequency division duplex (FDD) massive MIMO it is practically unaffordable. We introduce a structured compressive sensing (SCS)-based spatiotemporal joint channel estimation scheme which reduces the required pilot overhead by leveraging the spatiotemporal common sparsity of delay-domain MIMO channels. Accurate channel estimation is required to fully exploit the massive array gain, which depends on the channel state information available at the transmitter side. However, FDD downlink channel estimation always requires more training and computation than the TDD mode, since the uplink and downlink channels are not straightforwardly reciprocal and the number of antennas at the base station is massive. At the base station, we first introduce non-orthogonal pilots under the framework of compressive sensing theory to reduce the required pilot overhead. Then, a structured compressive sensing (SCS) algorithm is introduced to jointly estimate the channels associated with multiple OFDM symbols from the inadequate number of pilots, whereby the spatiotemporal common sparsity of massive MIMO channels is exploited to improve the channel estimation accuracy. Furthermore, we recommend a space-time adaptive pilot scheme to further decrease the pilot overhead by making use of the spatiotemporal channel correlation. Additionally, we discuss the proposed channel estimation scheme in the multi-cell scenario. The spatial correlation of wireless channels is exploited for outdoor communication scenarios, where, compared with the long signal transmission distance, the scale of the transmit antenna array is negligible. By utilizing the greater number of spatial degrees of freedom, massive MIMO can raise the system capacity and energy efficiency by orders of magnitude. Simulation results show that the proposed scheme outperforms all the other existing schemes.

V. Baranidharan (B) · C. Raju · S. Naveen Kumar · S. N. Keerthivasan · S. Isaac Samson


Department of Electronics and Communication Engineering, Bannari Amman Institute of
Technology, Sathyamangalam, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 265
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_19

Keywords Frequency division duplexing · Massive MIMO · Structured compressive sensing · Pilot overhead

1 Introduction

Multiple input multiple output (MIMO) systems are built with a small number of antennas of different types at the base station (BS); massive MIMO systems, in contrast, employ a very large number of antennas at the BS (more than 100). Massive MIMO can increase the system capacity, energy efficiency, and spectral efficiency by orders of magnitude, and it has therefore been identified as a significant enabler of energy-efficient, high-spectrum 5G communications. In massive MIMO, the information and properties of the channel between the transmitter and the receiver, known as the channel state information (CSI), are important for signal detection, resource allocation, and beamforming.
With hundreds of transmit antennas, the channel from each transmit antenna to each user is different, which increases the pilot overhead. Because the number of transmit antenna arrays at the base station is large, channel estimation in frequency division duplex (FDD) massive MIMO systems is difficult; until today, massive MIMO research has concentrated on time division duplex (TDD), and the FDD case has been avoided. For FDD-based massive MIMO systems, exploiting the temporal correlation and delay-domain sparsity has been proposed to reduce the pilot overhead; however, when the number of transmit antennas is high, cancelling the interference between the training sequences of distinct antennas is complicated.
Compressive sensing techniques exploit the spatial correlation of the channel. The spatial–temporal common sparsity of the delay-domain channel vectors is exploited under the structured compressive sensing framework to reduce the pilot overhead accumulated in FDD massive MIMO systems at the BS, using non-orthogonal pilots that are completely different from the orthogonal pilots designed under the Nyquist sampling framework. Both orthogonal and non-orthogonal pilot schemes are aimed at decreasing the pilot overhead of channel estimation. An adaptive structured subspace pursuit (ASSP) algorithm for the spatial–temporal common sparse delay-domain MIMO channels maximizes the accuracy of channel estimation with a small number of pilots by exploiting the temporal correlation, thereby reducing the overhead while preserving estimation accuracy. The required pilot signals are designed considering the antenna array at the BS and the mobility of the users, from the single-cell to the multi-cell case.

2 Related Work

Massive MIMO boosts the spectral efficiency and multiplexing gain by employing hundreds of antennas at the base station [1]. Channel state information is required at the transmitter side for accuracy, and acquiring it consumes a large amount of downlink channel estimation, especially in FDD systems. To overcome this problem, distributed compressive sensing (DCS) is introduced. The slow variation of the channel statistics is fully exploited, and in the frequency domain the sparsity is common across multiple sub-channels. Hybrid training is proposed, in which the channel matrices of previous frames are used to represent the downlink channel by three components; these components are obtained through the uplink, so fast tracking of the real-time channel state information is achieved. This technique is widely used to optimize the convergence of the channel estimation and to keep the complexity low.
In massive MIMO systems, accurate channel estimation is important to ensure good performance [2]. Compared with time division duplex (TDD), FDD offers higher data rates and wider area coverage. However, FDD requires heavier training and computation than TDD, because the uplink and downlink channels are not straightforwardly reciprocal and the number of antennas is large. Channel variation makes real-time estimation more difficult in FDD massive MIMO, and downlink channel estimation is considered in this work.
In FDD systems, uplink and downlink estimation with feedback and cascaded pre-coding have been used to reduce the pilot overhead [3]. With cascaded pre-coding techniques, the low-dimensional channel can be predicted accurately and the feedback is also estimated. A parametric model is used for downlink channel estimation in massive MIMO: the path delays are first estimated through the dedicated forward link and then quantized and fed back to the base station. The downlink and uplink share identical path delays, although parametric models can lead to data-fitting errors.
The high spectral and energy efficiency of massive MIMO makes it the most promising and developing technology for wireless communications [4]. In FDD, downlink channel estimation becomes unaffordable due to the large number of base station antennas. Perfect channel recovery with a minimum number of pilot symbols is pursued by modeling each channel vector with a Gaussian mixture distribution under a general channel model, and the pilot symbols are designed by maximizing a weighted sum of the Shannon mutual information between the users and their corresponding channels on the Grassmannian manifold for FDD. The NMSE level, however, is not particularly good in multi-user scenarios. The least squares (LS) method and distributed compressive sensing (DCS) are combined in the DCS technique for better estimation: among the different subcarriers, the channel vectors in the angular domain are estimated in two parts. The overall problems are that the computational complexity is high and the channel estimation is not accurate, although the pilot overhead is reduced.
To obtain the channel state information accurately at the transmitter side, the multiplexing and array gains of multiple input multiple output (MIMO) systems have to be exploited and improved [5]. Due to the overwhelming pilot and feedback overhead, conventional channel estimation is not suitable for FDD. Compressive channel estimation is introduced to reduce the pilot overhead in FDD massive MIMO systems. The beamspace sparsity is exploited in beam-blocked massive MIMO, and the pilot overhead of the downlink training can be reduced through a beam-blocked compressive channel estimation scheme. For acquiring reliable CSIT, a block-based orthogonal matching pursuit algorithm is proposed; at the limit of the pilot overhead, an effective channel matrix representation based on the amplitude and phase of the received signal is developed to reduce the feedback load.
The major problems of uplink and downlink channel estimation in FDD-based massive MIMO systems are discussed in [6]. To reduce the uplink/downlink pilots, a codebook- and dictionary-based channel model is presented for channel estimation, and a robust channel representation is obtained by exploiting the reciprocity of the AoA/AoD calculated for the uplink/downlink data transmissions. The downlink training overhead, which is a bottleneck of FDD massive MIMO systems, can thus be reduced by utilizing the information from simple uplink training.
A parametric channel estimation scheme has been proposed for massive MIMO [7]. The spatial correlation of the wireless channel is exploited: the channel is sparse, and the scale of the antenna array is negligible compared with the long signal transmission distance, so the CIRs associated with different transmit antennas usually share similar path delays. The parametric channel estimation method exploits this spatial common sparsity of massive MIMO channels to reduce the pilot overhead significantly. The accuracy of the channel estimate increases gradually with the number of antennas, or the same accuracy can be acquired with a reduced number of pilots; the limitation is that the scheme does not support low-dimensional CSI.

2.1 Spatiotemporal Common Sparsity and Delay Domain

Extensive experimental studies have shown that, in 5G wireless communication, broadband channels exhibit sparsity in the delay domain: because some paths arrive with large delays, the channel delay spread is much larger than the delay of the earliest path. For the mth transmit antenna placed at the base station, the channel impulse response (CIR) in the rth OFDM symbol is expressed as

$$h_{m,r} = \left[h_{m,r}[1],\, h_{m,r}[2],\, \ldots,\, h_{m,r}[L]\right]^{T}, \quad 1 \le m \le M,$$

where r is the OFDM symbol index in the delay domain and L indicates the equivalent channel length. The support set of $h_{m,r}$ is $D_{m,r} = \operatorname{supp}\{h_{m,r}\} = \{l : |h_{m,r}[l]| > p_{th},\ 1 \le l \le L\}$, where $p_{th}$ denotes the noise floor of the wireless channel. The sparsity level can be expressed as $P_{m,r} = |D_{m,r}|_{c}$, and $P_{m,r} \ll L$ because the delay-domain channel is sparse in nature. The overlap between the sparsity patterns of the CIRs of different transmit antennas is large; in massive MIMO, where M is very large, the CIRs share a common sparse pattern:

$$D_{1,r} = D_{2,r} = \cdots = D_{M,r}.$$

This is referred to as the spatial common sparsity of wireless massive MIMO channels. Consider, for example, an LTE-Advanced system with carrier frequency fc = 2 GHz and bandwidth fs = 10 MHz. For the uniform linear array (ULA) at the BS, the maximal difference in time of arrival across the array is 8λ/(2c) = 4/fc = 0.002 µs, which is negligible compared with the system sample period Ts = 1/fs = 0.1 µs, where λ and c denote the wavelength and the speed of light, respectively. It should be pointed out that the path gains of different transmit–receive antenna pairs associated with the same scatterers can still be different or even uncorrelated due to non-isotropic antennas. Moreover, because the path delays vary much more slowly than the path gains during the channel coherence time, the support of the delay-domain channel remains unchanged over R adjacent OFDM symbols:

$$D_{m,r} = D_{m,r+1} = \cdots = D_{m,r+R-1}, \quad 1 \le m \le M.$$

This temporal common sparsity reflects the temporal correlation of the wireless channel. Together, the spatial and temporal properties constitute the spatiotemporal common sparsity of delay-domain massive MIMO channels. Existing channel estimation schemes usually do not exploit this property to overcome the challenge of channel estimation in FDD massive MIMO.

2.2 Proposed SCS-Based Spatiotemporal Joint Channel


Estimation

For reliable channel estimation, an SCS-based algorithm is used at the user side. To further reduce the pilot overhead, a space–time adaptive pilot scheme is proposed. The extension of the proposed channel estimation scheme to the multi-cell scenario is also discussed.

2.2.1 Non-orthogonal Pilot Scheme

In conventional methods, the design of orthogonal pilots is completely based on the classical Nyquist sampling framework, and such pilots are used in all existing MIMO systems: the pilots of different transmit antennas occupy different subcarriers. In the proposed non-orthogonal pilot scheme, by contrast, the pilots of different transmit antennas occupy exactly the same subcarriers; this is made possible by CS theory and leaves more resources for effective data transmission. The sparse nature of the channel is leveraged to reduce the pilot overhead substantially.
Consider the MIMO channel estimation for one OFDM symbol. In the proposed non-orthogonal pilot scheme, the pilot subcarrier index set is denoted by ξ, a subset of {1, 2, …, N} that is identical for all transmit antennas, and Np = |ξ|c is the number of pilot subcarriers in the OFDM symbol, where N denotes the total number of subcarriers. The pilot sequence of the mth transmit antenna is denoted by $p_m \in \mathbb{C}^{N_p\times 1}$.

2.2.2 SCS-Based Channel Estimation

After guard interval removal and the discrete Fourier transform (DFT), the received pilot sequence $y_r \in \mathbb{C}^{N_p\times 1}$ of the rth OFDM symbol can be expressed as

$$y_r = \sum_{m=1}^{M} \operatorname{diag}\{p_m\}\, F|_{\xi} \begin{bmatrix} h_{m,r} \\ \mathbf{0}_{(N-L)\times 1} \end{bmatrix} + w_r = \sum_{m=1}^{M} P_m\, F_L|_{\xi}\, h_{m,r} + w_r = \sum_{m=1}^{M} \Phi_m h_{m,r} + w_r,$$

where $P_m = \operatorname{diag}\{p_m\}$, $F \in \mathbb{C}^{N\times N}$ is the DFT matrix, $F_L \in \mathbb{C}^{N\times L}$ is the partial DFT matrix consisting of the first L columns of F, and $F|_{\xi} \in \mathbb{C}^{N_p\times N}$ and $F_L|_{\xi} \in \mathbb{C}^{N_p\times L}$ are the sub-matrices whose rows are selected from F and $F_L$ according to ξ. The additive white Gaussian noise (AWGN) vector of the rth OFDM symbol is denoted by $w_r \in \mathbb{C}^{N_p\times 1}$. This can be written compactly as

$$y_r = \Phi \bar{h}_r + w_r, \qquad \Phi = [\Phi_1, \Phi_2, \ldots, \Phi_M],$$

where the aggregate CIR vector is

$$\bar{h}_r = \left[h_{1,r}^{T},\, h_{2,r}^{T},\, \ldots,\, h_{M,r}^{T}\right]^{T} \in \mathbb{C}^{ML\times 1}.$$

In massive MIMO channels, $N_p \ll ML$, since M (the number of transmit antennas) is large while $N_p$ (the number of pilots) is limited. Conventional channel estimation therefore cannot recover $\bar{h}_r$ from $y_r$, because the above equation is under-determined. However, $\bar{h}_r$ is a sparse signal owing to the sparsity of $\{h_{m,r}\}_{m=1}^{M}$, so under the framework of CS theory the high-dimensional sparse signal $\bar{h}_r$ can be estimated from the low-dimensional received pilot sequence $y_r$. The inherent common sparsity of the MIMO channel further enhances the channel estimation performance.

To exploit this structure, the aggregate CIR vector $\bar{h}_r$ is rearranged into the equivalent CIR vector $\tilde{d}_r$, where $d_{l,r}$ collects the lth delay-domain tap of all M transmit antennas and Ψ is the correspondingly rearranged sensing matrix:

$$\tilde{d}_r = \left[d_{1,r}^{T},\, d_{2,r}^{T},\, \ldots,\, d_{L,r}^{T}\right]^{T} \in \mathbb{C}^{ML\times 1}, \qquad \Psi = [\Psi_1, \Psi_2, \ldots, \Psi_L] \in \mathbb{C}^{N_p\times ML}, \qquad y_r = \Psi \tilde{d}_r + w_r.$$

The structured sparsity of the equivalent CIR vector $\tilde{d}_r$ is equivalent to the spatial common sparsity of the massive MIMO channels. Moreover, because the path delays remain virtually unchanged over R adjacent OFDM symbols within the coherence time, the MIMO channel exhibits spatiotemporal common sparsity over these R successive OFDM symbols, which share the same pilot pattern. Hence,

$$Y = \Psi D + W,$$

where

$$Y = \left[y_r,\, y_{r+1},\, \ldots,\, y_{r+R-1}\right] \in \mathbb{C}^{N_p\times R}, \qquad D = \left[\tilde{d}_r,\, \tilde{d}_{r+1},\, \ldots,\, \tilde{d}_{r+R-1}\right] \in \mathbb{C}^{ML\times R},$$

$W = [w_r, w_{r+1}, \ldots, w_{r+R-1}]$, and $D = \left[D_1^{T}, D_2^{T}, \ldots, D_L^{T}\right]^{T}$, where each $D_l$, $1 \le l \le L$, has size M × R. The element in the mth row and rth column of $D_l$ is the gain of the lth delay path of the mth transmit antenna in the rth OFDM symbol. The equivalent CIR matrix D therefore exhibits structured sparsity, and the proposed channel estimation exploits this intrinsic structured sparsity of D to achieve the best performance.
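The measurement model above can be made concrete with a short sketch. The following Python/NumPy code (illustrative only, not part of the paper; all dimensions and variable names are chosen for the example) builds the shared pilot subcarrier set ξ, random-phase pilot sequences, and the per-antenna sub-matrices Φ_m, then draws delay-domain CIRs with a common support across antennas and R adjacent symbols and forms the noisy per-symbol observations y_r = Φ h̄_r + w_r (equivalently Y = ΨD + W after the tap-major rearrangement).

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, M, Np, R, P = 512, 32, 8, 64, 4, 6   # illustrative sizes, much smaller than the paper's

# Uniformly spaced pilot subcarrier set, shared by all transmit antennas
xi = np.arange(0, N, N // Np)[:Np]

# Partial DFT matrix: first L columns of the N-point DFT, rows restricted to pilot subcarriers
F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(L)) / N)
F_L_xi = F[xi, :]                                        # Np x L

# Random-phase pilot sequences p_m with unit-modulus entries
pilots = np.exp(1j * 2 * np.pi * rng.random((M, Np)))

# Phi_m = diag(p_m) F_L|xi ; stack as Phi = [Phi_1 ... Phi_M] (Np x ML)
Phi = np.hstack([pilots[m][:, None] * F_L_xi for m in range(M)])

# Delay-domain CIRs sharing a common support of P taps across antennas and R symbols
support = rng.choice(L, size=P, replace=False)
H = np.zeros((M, L, R), dtype=complex)
H[:, support, :] = (rng.standard_normal((M, P, R)) +
                    1j * rng.standard_normal((M, P, R))) / np.sqrt(2)

# Aggregate CIR vectors h_bar_r = [h_1r^T ... h_Mr^T]^T and noisy observations
H_bar = H.reshape(M * L, R)                              # antenna-major stacking matches Phi
noise = 0.01 * (rng.standard_normal((Np, R)) + 1j * rng.standard_normal((Np, R)))
Y = Phi @ H_bar + noise
print(Y.shape, "observations,", P, "active taps shared by all antennas and symbols")
```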

2.3 Channel Estimation in Multi-cell Massive MIMO

This subsection demonstrates the proposed channel estimation scheme from a single-
cell scenario to a multi-cell scenario. To solve the pilot contamination from the
interfering cells, the frequency division multiplexing (FDM) scheme can be utilized;
i.e., in the frequency domain, pilots of adjacent cells are orthogonal. However, the
channel estimation performance of users in the target cells may be degraded by
downlink pre-coded data from adjacent cells. Thus, we can conclude that the TDM
scheme can be considered as the suitable approach to mitigate pilot contamination in
multi-cell FDD massive MIMO systems due to slight performance loss in the FDM
scheme and reduction in pilot overhead.

2.4 Non-orthogonal Pilot Design Based on CS Theory

In CS theory, the design of the sensing matrix Ψ determines how effectively and reliably the high-dimensional sparse signal D can be compressed. Designing Ψ reduces to designing the pilot placement ξ and the pilot sequences $\{p_m\}_{m=1}^{M}$, since Ψ is fully determined by these parameters. Reliable sparse-signal recovery in CS theory requires small cross-correlation between the columns of Ψ, so ξ and $\{p_m\}_{m=1}^{M}$ must be designed appropriately. For the pilot-sequence design, the cross-correlation between columns of Ψ associated with the same delay index l is determined only by $\{p_m\}_{m=1}^{M}$. Therefore,

$$(\psi_{m_1,l})^{H}\psi_{m_2,l} = \left(\phi_{m_1}^{(l)}\right)^{H}\phi_{m_2}^{(l)} = \left(p_{m_1}\circ f_{P}^{(l)}\right)^{H}\left(p_{m_2}\circ f_{P}^{(l)}\right) = (p_{m_1})^{H}\, p_{m_2},$$

where $F_P = F_L|_{\xi}$, $f_P^{(l)}$ denotes the lth column of $F_P$, $\circ$ denotes the element-wise (Hadamard) product, and $1 \le m_1, m_2 \le M$ with $m_1 \ne m_2$.


To make $|(\psi_{m_1,l})^{H}\psi_{m_2,l}|$ small, the elements of the proposed pilot sequence $p_m \in \mathbb{C}^{N_p\times 1}$ are chosen as $e^{j\theta_{k,m}}$, where the phases $\theta_{k,m}$ are independent and identically distributed, uniformly over $[0, 2\pi)$. Every element then has unit modulus, so the $\ell_2$-norm of each column of Ψ is constant:

$$\left\|\psi_{m,l}\right\|_{2} = \sqrt{N_p},$$

$$\lim_{N_p\to\infty} \frac{\left|(\psi_{m_1,l})^{H}\psi_{m_2,l}\right|}{\left\|\psi_{m_1,l}\right\|_{2}\left\|\psi_{m_2,l}\right\|_{2}} = \lim_{N_p\to\infty} \frac{\left|(p_{m_1})^{H} p_{m_2}\right|}{N_p} = 0,$$

which follows from random matrix theory (RMT). Hence, even though $N_p$ is limited in practice, the proposed pilot sequences achieve good (small) cross-correlation between any two columns of Ψ sharing the same delay index l.
For the proposed $\{p_m\}_{m=1}^{M}$, we further investigate the cross-correlation between $\psi_{m_1,l_1}$ and $\psi_{m_2,l_2}$ with $l_1 \ne l_2$, where achieving a small $|(\psi_{m_1,l_1})^{H}\psi_{m_2,l_2}|$ depends on the design of ξ. We consider $N_p > L$, which is typical in massive MIMO for two reasons: the pilot overhead $N_p$ associated with the transmit antennas is at least 64, while with a maximum delay spread of 3–5 µs and a typical system bandwidth of 10 MHz (as in LTE-Advanced), the equivalent channel length L is well below 64. Under the condition $N_p > L$, we propose uniformly spaced pilots: ξ is chosen from the set {1, 2, …, N} with equal interval $\lceil N/N_p \rceil$, which yields a small $|(\psi_{m_1,l_1})^{H}\psi_{m_2,l_2}|$. The inner product of $\psi_{m_1,l_1}$ and $\psi_{m_2,l_2}$ can then be expressed as

$$\lim_{N_p\to\infty} \frac{(\psi_{m_1,l_1})^{H}\psi_{m_2,l_2}}{N_p} = \lim_{N_p\to\infty} \frac{\sum_{k=1}^{N_p}\exp(j\tilde{\theta}_k)}{N_p} = 0.$$

Hence, the proposed ξ and $\{p_m\}_{m=1}^{M}$ achieve good cross-correlation between any two columns of Ψ, even though $N_p$ is limited in practice.
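A quick numerical check of this argument (an illustrative sketch, not from the paper) confirms that the normalized inner product of two random-phase pilot sequences, which by the derivation above equals the column coherence of Ψ for a common delay index, decays as Np grows:

```python
import numpy as np

rng = np.random.default_rng(1)

def avg_column_coherence(Np, trials=200):
    """Average |p_m1^H p_m2| / Np over random unit-modulus pilot pairs."""
    vals = []
    for _ in range(trials):
        p1 = np.exp(1j * 2 * np.pi * rng.random(Np))
        p2 = np.exp(1j * 2 * np.pi * rng.random(Np))
        vals.append(abs(np.vdot(p1, p2)) / Np)
    return float(np.mean(vals))

for Np in (16, 64, 256, 1024):
    print(Np, round(avg_column_coherence(Np), 4))
# The printed coherence decays roughly like 1/sqrt(Np), consistent with the RMT limit of 0.
```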

2.4.1 Convergence Analysis of Proposed ASSP Algorithm

We first analyze the convergence of the proposed ASSP algorithm when the sparsity level s equals the true sparsity level P; for the case s ≠ P, the convergence of the proposed stopping criteria is then considered. Whereas the conventional SP algorithm and the model-based SP algorithm analyze the reconstruction of sparse vectors, here we provide the convergence analysis for the reconstruction of the structured sparse matrix.

Theorem 1 Consider Y = ΨD + W and the ASSP algorithm with sparsity level s = P. As long as the residue satisfies $\|R^{k}\|_F > c_P'\,\|W\|_F$, the estimation error contracts over the iterations, i.e., $\|D - D^{k}\|_F < c_P''\,\|D - D^{k-1}\|_F$, and when the algorithm stops the estimate $\hat{D}$ satisfies $\|D - \hat{D}\|_F < c_P\,\|W\|_F$; thus D can be estimated with s = P. Here $c_P$, $c_P'$, and $c_P''$ are constants determined by the structured restricted isometry property (SRIP) constants $\delta_P$, $\delta_{2P}$, and $\delta_{3P}$.

To investigate the convergence when the sparsity level s differs from P, we can write $D = D_{>s} + (D - D_{>s})$, where $D_{>s}$ denotes the matrix that retains the s sub-matrices $\{D_l\}_{l=1}^{L}$ with the largest F-norms and sets the remaining sub-matrices to zero, so that

$$Y = \Psi D_{>s} + \Psi(D - D_{>s}) + W = \Psi D_{>s} + \tilde{W}, \qquad \tilde{W} = \Psi(D - D_{>s}) + W.$$

In this case, D is the P-sparse signal while the algorithm estimates an s-sparse signal $D_{>s}$. Applying the SRIP-based theorem to this s-sparse matrix, the acquired support estimate is at least partially correct, i.e., $\hat{\Omega}_s \cap \Omega_T \ne \varnothing$, where $\hat{\Omega}_s$ is the estimated support of the s-sparse matrix, $\Omega_T$ is the true support of D, and $\varnothing$ denotes the empty set. This partially correct support then serves as prior information for the first iteration at sparsity level s + 1, which reduces the number of iterations required for convergence at level s + 1, as pointed out in the proof of the theorem.

2.5 Computation Complexity of ASSP Algorithm

Computational complexity counts the operations performed in each iteration of the algorithm; for the proposed ASSP algorithm, $M_G$ denotes the number of transmit antennas in each group of the space–time adaptive pilot scheme. The correlation operation is the major contributor to the complexity, followed by the support merger, the $\pi_3(\cdot)$ operation, and the norm operation, and the update of the Moore–Penrose matrix inversion relates to these operations with the parameters $2.3 \times 10^{-2}$, $1.7 \times 10^{-6}$, $5.7 \times 10^{-5}$, and $2.3 \times 10^{-2}$, respectively. The Moore–Penrose matrix inversion is therefore the main source of computational complexity of the ASSP algorithm, which per iteration is of order

$$\mathcal{O}\!\left(2 N_p (M_G s)^{2} + (M_G s)^{3}\right).$$
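As a rough illustration of how the structured sparsity can be exploited in recovery, the sketch below implements a simplified structured greedy recovery: whole delay taps (column blocks spanning all M antennas) are selected against the joint residual of all R symbols, followed by a joint least-squares fit. This is only in the spirit of ASSP; the adaptive sparsity-level estimation, support pruning, and stopping criteria of the actual algorithm are omitted, and the function name and interface are ours.

```python
import numpy as np

def structured_greedy_recovery(Y, Phi, M, L, num_taps):
    """Recover a tap-block-sparse channel from Y = Phi @ H_bar + W.

    Phi has antenna-major column grouping: column m*L + l corresponds to tap l
    of antenna m.  The common-support assumption lets us score whole taps
    (all M antennas at once) against the residual of all R symbols jointly.
    """
    Np, R = Y.shape
    tap_cols = [np.arange(M) * L + l for l in range(L)]   # column indices of tap l
    residual = Y.copy()
    support = []
    for _ in range(num_taps):
        scores = np.full(L, -np.inf)
        for l in range(L):
            if l in support:
                continue
            B = Phi[:, tap_cols[l]]                       # Np x M block for tap l
            scores[l] = np.linalg.norm(B.conj().T @ residual)
        support.append(int(np.argmax(scores)))
        cols = np.concatenate([tap_cols[l] for l in support])
        A = Phi[:, cols]
        coeffs, *_ = np.linalg.lstsq(A, Y, rcond=None)    # joint LS over all R symbols
        residual = Y - A @ coeffs
    H_hat = np.zeros((M * L, R), dtype=complex)
    H_hat[cols, :] = coeffs
    return sorted(support), H_hat

# Example (reusing Y, Phi, M, L, P from the measurement-model sketch above):
# taps, H_hat = structured_greedy_recovery(Y, Phi, M, L, num_taps=P)
```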

3 Simulation Results and Discussion

In this section, we give a detailed description of the simulation study performed to assess the performance of the proposed channel estimation scheme for FDD massive MIMO systems. The simulation parameters were established as follows:

Table 1 Initial simulation parameters

Simulation parameters                      Values
OFDM symbols (R)                           1
Fluctuation of the ASSP algorithm (SNR)    10 dB
ITU-VA channel model (P)                   6
System carrier (fc)                        2 GHz
System bandwidth (fs)                      10 MHz
Guard interval (Ng)                        64
Pilot overhead ratio (ηp)                  5–6

DFT size N = 4096, system carrier fc = 2 GHz, guard interval length Ng = 64, and system bandwidth fs = 10 MHz, which can accommodate a maximum delay spread of 6.4 µs. We assume a 4 × 16 planar antenna array (M = 64), and MG = 32 is considered to guarantee the spatial common sparsity of the channels in each antenna group. For SNR = 10 to 30 dB, the pth value is set to 0.1, 0.08, 0.06, 0.05, and 0.04, respectively (Table 1).
From the simulations, it is clear that the ASSP algorithm outperforms the oracle
ASSP algorithm for ηp > 19.04%, and its performance is even better than the perfor-
mance bound obtained by the oracle LS algorithm with Np_avg > 2P at SNR =
10 dB. This is because the ASSP algorithm adaptively acquires the effective channel
sparsity level, which was denoted by Peff , instead of using P to obtain better channel
estimation performance. Considering ηp = 17.09% at SNR = 10 dB as an example,
we can find that Peff = 5 with high probability for the proposed ASSP algorithm.
Therefore, the average pilot overhead obtained for each transmit antenna Np_avg
= Np/MG = 10.9 is still larger than 2Peff = 10. From the analysis, we can conclude
that, when Np is insufficient to estimate channels with sparsity level P, the proposed ASSP algorithm can still be utilized to estimate sparse channels with an effective sparsity level Peff < P, where the path gains accounting for the majority of the channel energy are determined while those with small energy are discarded as noise.
tuation of the ASSP algorithm at SNR = 10 dB is because Peff increases from 5 to
6 when ηp increases, which leads some strong noise to be obtained at the channel
paths and thus leads to the degradation of the performance of MSE (Table 2).
The channel sparsity level of the proposed ASSP algorithm against SNR and
pilot overhead ratio is depicted in the simulations, where the vertical axis and the
horizontal axis represent the used pilot overhead ratio and the adaptively estimated
channel sparsity level, respectively, and the chroma represents the probability of the
estimated channel sparsity level. We consider R = 1 and fp = 1 without exploiting
the temporal channel correlation in the simulations. Comparisons between the MSE
performance of the introduced pilot placement scheme and conventional random
pilot placement scheme are made where the introduced ASSP algorithm and the
oracle LS algorithm are exploited (Fig. 1).
We consider R = 1, fp = 1, and ηp = 19.53% in the simulations. It is clear that
both the schemes yield a very similar performance. The proposed uniformly spaced

Table 2 Compared with OMP and proposed CS-based JCE algorithms


Parameters SNR OMP Proposed CS-based JCE algorithms
Min 5 0.02586 0.02407
Max 20 0.1158 0.068
Mean 12.5 0.06335 0.04332
Median 12.5 0.05897 0.04152
Mode 5 0.02586 0.02407
Standard deviation 4.761 0.02828 0.01412
Range 15 0.08991 0.04393

Fig. 1 Sparsity

pilot placement scheme can be more easily implemented in practical systems due
to the regular pilot placement. Hence, uniformly spaced pilot placement scheme is
used in LTE-Advanced systems to facilitate massive MIMO to be compatible with
current cellular networks.
The MSE of the proposed ASSP algorithm with (R = 4) and without (R = 1) exploiting the temporal correlation is compared for time-varying channels of massive MIMO systems; with a smaller number of pilots the SCS algorithm does not function perfectly. The downlink bit error rate (BER) performance and the average achievable throughput per user are shown in the simulations, where the BS using zero-forcing (ZF) pre-coding is assumed to know the estimated downlink channels. The BS with M = 64 antennas simultaneously serves K = 8 users using 16-QAM in the simulations, and the ZF pre-coding is based on the estimated channels under the same setup. It can be noted that the proposed channel estimation scheme performs better than its counterparts (Table 3).
Comparisons between the average achievable throughput per user of different pilot decontamination schemes are made. We consider a multi-cell massive MIMO system with L = 7 cells, M = 64 antennas, and K = 8 users sharing the same bandwidth; the

Table 3 Comparison between SP and joint channel estimation

Parameters            SNR      SP         Joint channel estimation
Min 5 0.01326 0.008641
Max 20 0.2019 0.0991
Mean 12.5 0.08267 0.02937
Median 12.5 0.06776 0.01292
Mode 5 0.01326 0.008641
Standard deviation 4.761 0.05938 0.03541
Range 15 0.1886 0.09046

average achievable throughput per user in the central target cell suffering from the
pilot contamination is analyzed. Meanwhile, we consider R = 1, fd = 7, the path
loss factor is 3.8 dB/km, the cell radius is 1 km, the distance D between the BS and
its users can be from 100 m to 1 km, the SNR (the power of the unpre-coded signal
from the BS is considered in SNR) for cell-edge user is 10 dB, the mobile speed of
users is 3 km/h. The BSs using zero-forcing (ZF) pre-coding is assumed to know
the estimated downlink channels achieved by the proposed ASSP algorithm. For the
FDM scheme, pilots of L = 7 cells are orthogonal in the frequency domain (Fig. 2).
Pilots of L = 7 cells in TDM are transmitted in L = 7 successive different time
slots. In TDM scheme, the channel estimation of users in central target cells suffers
from the pre-coded downlink data transmission of other cells, where two cases are
considered. The “cell-edge” case indicates that when users in the central target cell
estimate the channels, the pre-coded downlink data transmission in other cells can
guarantee SNR = 10 dB for their cell-edge users. While the “ergodic” case indicates

Fig. 2 SNR versus MSE



that when users in the central target cell estimate the channels, the pre-coded downlink
data transmission in other cells can guarantee SNR = 10 dB for their users with the
ergodic distance D from 100 m to 1 km.

4 Conclusion

In this paper, we have introduced a new SCS-based spatial–temporal joint channel estimation scheme for FDD massive MIMO systems. The spatial–temporal common sparsity of wireless MIMO channels is exploited to decrease the pilot overhead: with the non-orthogonal pilot scheme at the BS and the ASSP algorithm, the users can easily estimate the channels with a reduced number of pilots. The space–time adaptive pilot scheme further reduces the pilot overhead according to the mobility of the users. Additionally, we discussed the non-orthogonal pilot design and the proposed ASSP algorithm, which achieve accurate channel estimation under the framework of compressive sensing theory. The simulation results show that the proposed SCS-based spatiotemporal channel estimation scheme gives better results than the existing channel estimation schemes.

References

1. Zhang R, Zhao H, Zhang J (2018) Distributed compressed sensing aided sparse channel esti-
mation in FDD massive MIMO system. IEEE Access 6:18383–18397. https://doi.org/10.1109/
ACCESS.2018.2818281
2. Peng W, Li W, Wang W, Wei X, Jiang T (2019) Downlink channel prediction for time-varying
FDD massive MIMO systems. IEEE J Sel Top Sign Process 13(5):1090–1102. https://doi.org/
10.1109/JSTSP.2019.2931671
3. Liu K, Tao C, Liu L, Lu Y, Zhou T, Qiu J (2018) Analysis of downlink channel estimation based
on parametric model in massive MIMO systems. In: 2018 12th International Symposium on
Antennas, Propagation and EM Theory (ISAPE), Hangzhou, China, 2018, pp 1–4. https://doi.
org/10.1109/ISAPE.2018.8634083
4. Gu Y, Zhang YD (2019) Information-theoretic pilot design for downlink channel estimation in
FDD massive MIMO systems. IEEE Trans Sign Process 67(9):2334–2346. https://doi.org/10.
1109/TSP.2019.2904018
5. Huang W, Huang Y, Xu W, Yang L (2017) Beam-blocked channel estimation for FDD massive
MIMO with compressed feedback. IEEE Access 5:11791–11804. https://doi.org/10.1109/ACC
ESS.2017.2715984
6. Chen J, Zhang X, Zhang P (2020) DDL-based sparse channel representation and estimation for
downlink FDD massive MIMO systems. In: ICC 2020—2020 IEEE International Conference
on Communications (ICC), Dublin, Ireland, 2020, pp 1–6. https://doi.org/10.1109/ICC40277.
2020.9148996

7. Gao Z, Zhang C, Dai C, Han Q (2014) Spectrum-efficiency parametric channel estimation scheme
for massive MIMO systems. In: 2014 IEEE international symposium on broadband multimedia
systems and broadcasting, Beijing, China, 2014, pp 1–4. https://doi.org/10.1109/BMSB.2014.
6873562
8. Gao Z, Dai L, Dai W, Shim B, Wang Z (2016) Structured compressive sensing-based spatio-
temporal joint channel estimation for FDD massive MIMO. IEEE Trans Commun 64(2):601–
617. https://doi.org/10.1109/TCOMM.2015.2508809
A Survey on Image Steganography
Techniques Using Least Significant Bit

Y. Bhavani , P. Kamakshi, E. Kavya Sri, and Y. Sindhu Sai

Abstract Steganography is the technique in which the information is hidden within


the objects so that the viewer cannot detect it and only the intended recipient
will be able to see it. The data can be concealed in different mediums such as text,
audio and video files. Hiding the information in image or picture files is called
image steganography. This steganography method helps in protecting the data from
malicious attacks. The image chosen for steganography is known as cover image
and the acquired image as stego image. A digital image can be described using pixel
values, and those values will be modified using least significant bit (LSB) technique.
To increase the security, various LSB techniques had been proposed. We made a
comparison on various image steganography techniques based on the parameters
like robustness, imperceptibility, capacity and security. In this paper, based on these comparisons, we have suggested a few image steganography algorithms.

Keywords Steganography · Spatial domain · Least significant bit (LSB) · Cover


image · Stego image

1 Introduction

Internet technology gives numerous advantages to humans, especially in communi-


cation. In the generation of data communication, the security and privacy of data are
an area that should be mostly considered. To solve these security issues, different
methods like steganography, cryptography, watermarking and digital signatures were
used. Steganography and cryptography [1] are used to conceal the data, watermarking
is used to save copyright, and digital signatures are used to authenticate the data [2, 3].

Y. Bhavani (B) · P. Kamakshi · E. Kavya Sri · Y. Sindhu Sai


Kakatiya Institute of Technology & Science, Warangal, India
E. Kavya Sri
e-mail: b18it030@kitsw.ac.in
Y. Sindhu Sai
e-mail: b18it027@kitsw.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 281
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_20

Steganography (stegos—to cover, graphia—writing) [4] is the study of invis-


ible communication. It protects the confidentiality of two communicating parties. In
image steganography, secrecy is accomplished by inserting information into cover
image and creating a stego image. Spatial domain and frequency domain are the
two different domains used for hiding data in the image. In frequency domain [5],
the message is hidden by transforming the cover image. Transformations that are
commonly used include discrete cosine transform (DCT), discrete wavelet trans-
form (DWT) and singular value decomposition (SVD). In spatial domain [6], the
secret image is directly inserted by changing the pixel value of the cover image using
the techniques least significant bit (LSB) and most significant bit (MSB).
The softwares used in image steganography are
• Quick stego: It is a software which hides the message in images using AES algo-
rithm [7]. This software is very simple, fast and executes even complex security
processes. It can be used to secure more than one file and encrypts all types of file
formats such as audio, video, image and document.
• Hide In Picture (HIP): It is a software in which any type of file can be hidden
inside the bitmap pictures [7] using Blowfish algorithm. The users can use pass-
words to hide their files in pictures. So that only the people who knows the
password can access the file hidden in the pictures. The user can also make a
specific colour transparent, where nothing will be stored.
• Chameleon: This software uses LSB algorithm [4, 8] in which LSB of pixel values
of image will be replaced with data bits of message which is to be hidden. It uses
an encryption algorithm which enhances the use of hiding space in a particular
cover image.
As technology advances, more research is being conducted to develop a better
technique for steganography and cryptography to provide more security for the data.
The different applications [2] for steganography are
• E-Commerce
• Media
• Database systems
• Digital watermarking
• Secret data storing
• Access control system for digital content distribution.

2 Related Work

Fridrich [6] this paper proposes a high-precision steganographic technique which


will estimate the length of a hidden message inserted within the LSB technique. The
text image is split into groups of n consecutive or disjoint pixels during this method.
The changed pixel values are employed in this method to work out the content of the
hidden message. This method provides advantages like more stability and accuracy
for a wide range of natural images.

Dumitrescu et al. [5] introduced another steganalysis procedure for distinguishing


LSB steganography in computerized signals like image and audio. This method
depends on the statistical analysis of sample pairs. The length of a secret message
inserted using LSB steganography can be assessed with more accuracy using this
method. This detection algorithm is very simple and fast compared to other algo-
rithms. Bounds on estimating errors are created to assess the robustness of the
proposed steganalytic approach. In addition, the vulnerability to potential attacks
is examined, and countermeasures are offered.
Ker [8] the histogram characteristic function (HCF) introduced by Harmsen is
used in this paper to find steganography in colour images. But this function cannot
be used in grayscale images. The HCF is applied in two innovative ways: the output
is modified using a selected image and the adjacency histogram is computed instead
of the normal histogram. The results of this approach reveal that the new detectors
are far more dependable than previously known detectors. The adjacency histogram
was not helpful in the secrecy, and it may lead to the detection of secret message by
attackers.
Yang et al. [9] proposed flexible LSB substitution method in the image steganog-
raphy. This method focuses more on noise-sensitive area of the stego image such that
it may obtain more visual quality. The proposed method distinguishes and utilizes
normal text and edge area for insertion. This approach calculates the number of k-bit
LSB for inserting the data to be hidden. The k value is high in the non-sensitive
area of the image and modest in the sensitive image area to equilibrate the image’s
overall quality of visibility. The high-order bits of the image calculate the LSB’s (k)
for insertion. This approach also employs the pixel correction method to improve
stego image quality. But this process will be done only on the limited data set.
Joshi et al. [10] proposed different steganographic methods in spatial domain
which mostly use LSB techniques and perform XOR operations on different bits of
the pixel values of a particular cover image. Data will be embedded by performing
two XOR operations, first XOR is on first and eighth bits and the second XOR is on
second and seventh bits. The obtained value will then be compared and it is used as a
rule for embedding the data into the image. A grayscale image is used as cover image,
and three different message images were used with different sizes. After completing
the total process, PSNR value will be obtained with the largest message length.
Irawan et al. [11] combined the steganography and cryptography techniques. In
this approach, before inserting a message on the LSB, it should be encrypted using
the OTP method. Inserting the data into images will be done at the corner or edge
area of the image to improve the undetectability and security for the data. This type
of insertion at the corner is named as canny method. This method also calculates
quality of stego image using a histogram.
Swain [12] proposed two different techniques in spatial domain of digital
steganography. He categorized two different groups where the bits have equal length
and pixel values of both the groups were exchanged. This is mainly used to conceal
the data. One of his techniques uses single bit to conceal data while the other uses
two bits. During this process of replacement, change in the value of pixel will not

be exceeded by two. These techniques increase security compared to PVD schemes


and LSB methods. But the security is still found to be increased after evaluation.
Islam et al. [7] approach uses a variant of LSB technique with a status bit, to
provide productive filtering and also AES algorithm for providing more security.
Bitmap image is used for LSB technique and the hidden data will be encoded first
and this encoded data will be inserted into image. This method will be having more
embedding limit than normal LSB calculation because of using status bit for enquiring
encoding and extraction of hidden messages. Since PSNR values are high, it results
in high quality of stego image.
Chinnalli and Jadhav [13] suggested an image hiding technique using LSB. To
conceal data, common pattern bits (stego-key) are employed. Based on those pattern
bits and the hidden data bits, the LSBs of the pixel are rearranged. Pattern bits are
made up of M x N rows and columns (of a block) with a random key value. During the
inserting process, each pattern bit should be checked with a data bit and if it matches
successfully, the second LSB bits of the cover image are rearranged. If they won’t
match, then they will remain the same. This method provides more security in hiding
the data using a common pattern key. However, this technique has low hiding capacity, since a single hidden data bit requires a block of (M × N) pixels.
Dhaya [14] in his proposed method used Kalman filter function in extracting
the message image, which performs the process with more accuracy. This approach
decreases the complexities in extraction process and maintains more intensity of the
images.
Manoharan et al. [15, 16] proposed a technique which uses contourlet transform
to maintain robustness in the medical images as they contain sensitive data. In this
paper, PSNR and correlation coefficient were also calculated to measure accuracy. He
also proposed watermarking method [16] that uses contourlet transform, the singular
value decomposition and discrete cosine transform to increase the robustness.
Astuti et al. [2] proposed a method to hide messages using LSB of pixel values in
an image. Steganography and cryptography were combined in which image steganog-
raphy uses LSB algorithm and contents of the messages were changed through cryp-
tography by performing XOR operations on the three most significant bits. LSB
method is the mostly used and simple method in the image steganography. Using
LSB technique in hiding the data will not affect the visible properties of the image.
To increase the security, the XOR operation is performed three times in the process
of encrypting the message before it is inserted on the LSB and the three MSB bits
were served as keys to facilitate message encryption and decryption.
The combination of steganography and cryptography techniques will provide
more security for the data and more stability in the transmission of data. The PSNR
value is above 50 dB. In this method, there will be two main processes, embedding
process and extraction process.

2.1 Embedding Process

In this process as shown in Fig. 1, a cover image and a message in the form of binary
image will be taken as input. The output of this process will be stego image.
• First cover image and message image should be read.
• The pixel values of the images should be converted into binary format.
• XOR operation should be performed between seventh and sixth bits of the binary
format of cover image.
• Once again, XOR operation is performed between the result obtained by the above
operation and eighth bit of the binary format of the cover image.
• Now XOR operations should be performed on the message image bits with the
three MSB bits, i.e. eighth, seventh and sixth bits.
• The obtained result is saved in the message bits. By converting this result into uint8, the pixel value of the stego image is obtained.

Fig. 1 Embedding process (Source ICOIACT, pp. 191–195)
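The steps above can be condensed into a few lines of array code. The sketch below (Python/NumPy, not from the surveyed paper) follows one plausible reading of the scheme: the message bit is XORed with the three MSBs (bits 8, 7 and 6) of the cover pixel and the result is written into the pixel's LSB.

```python
import numpy as np

def embed(cover: np.ndarray, message_bits: np.ndarray) -> np.ndarray:
    """Embed one message bit per cover pixel (uint8 grayscale, flattened order)."""
    flat = cover.flatten().astype(np.uint8)
    bits = message_bits.flatten().astype(np.uint8)
    assert bits.size <= flat.size, "message longer than cover capacity"
    px = flat[:bits.size]
    b8 = (px >> 7) & 1                      # most significant bit
    b7 = (px >> 6) & 1
    b6 = (px >> 5) & 1
    cipher = bits ^ b8 ^ b7 ^ b6            # triple XOR with the three MSBs
    px = (px & 0xFE) | cipher               # overwrite the LSB only
    stego = flat.copy()
    stego[:bits.size] = px
    return stego.reshape(cover.shape)
```

Because only the LSB of each pixel is modified, the three MSBs used as keys remain unchanged in the stego image, which is what makes the extraction step below possible.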



2.2 Extraction Process

In this process as shown in Fig. 2, the input is stego image and the output is recovered
message image.
• First the stego image should be read.
• The pixel values of the image should be converted into binary format.
• XOR operation should be performed between seventh bit and sixth bit of the
binary format of the stego image.
• Once again, the XOR operation is performed between the result obtained by the
above operation and eighth bit of the binary format of the stego image.

Fig. 2 Extraction process (Source ICOIACT, pp. 191–195)



• Now XOR operations should be performed on the LSB with the three MSB bits,
i.e. eighth, seventh and sixth bits.
• The obtained result is saved on the LSB. By converting this result into uint8, the pixel value of the message image is obtained.
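Using the same bit conventions as the embedding sketch above, extraction re-derives the keys from the unchanged MSBs of each stego pixel and XORs them with the LSB; since XOR is its own inverse, the original message bit is recovered. Again this is an illustrative reading, not the authors' exact code.

```python
import numpy as np

def extract(stego: np.ndarray, num_bits: int) -> np.ndarray:
    """Recover num_bits message bits from a stego image produced by embed()."""
    px = stego.flatten().astype(np.uint8)[:num_bits]
    b8 = (px >> 7) & 1
    b7 = (px >> 6) & 1
    b6 = (px >> 5) & 1
    lsb = px & 1
    return lsb ^ b8 ^ b7 ^ b6               # undo the triple XOR

# Round trip:  extract(embed(cover, bits), bits.size) == bits
```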
This technique [2] is very safe and simple, and it gives high PSNR and low MSE values, so the hidden information is undetectable. The process is completed quickly and easily using the XOR operation, and secrecy is maintained strictly because the inserted bits cannot be detected directly from the XOR operator. Furthermore, the XOR operation is performed three times, using three keys. The stego file keeps the same size because the key is embedded in the cover image itself, which eliminates the need for key distribution to the recipient and increases the speed of communication without changing the size of the file.

3 Critical Analysis

Performance metrics of image steganography technique are peak signal-to-noise


ratio (PSNR) and mean square error (MSE). PSNR is mainly used to measure the
robustness of the image, and MSE is used to measure accuracy of the technique. The
PSNR and MSE values calculated using Eqs. 1 and 2 for some of the techniques are
given in Table 1.
 
$$\mathrm{PSNR} = 10\,\log_{10}\!\left(\frac{(256-1)^{2}}{\mathrm{MSE}}\right) \qquad (1)$$

Table 1 Performance metrics of image steganography techniques

Literature references   Technique                                            PSNR value (in dB)   MSE value
Yang et al. [9]         Texture, brightness and edge-based detective LSB     40.62                0.04756
Joshi et al. [10]       Using XOR operation                                  75.2833              0.0019
Irawan et al. [11]      Uses OTP encryption to hide on edge areas of image   80.5553              0.0006
Swain [12]              Digital image steganography                          51.63                0.0011
Islam et al. [7]        Using status bit along with AES cryptography         60.737               0.054
Astuti et al. [2]       Using LSB and triple XOR on MSB                      54.616               0.225
Bhardwaj et al. [17]    Inverted bit LSB substitution                        59.0945              0.0647
Bhuiyan et al. [4]      LSB replacement through XOR substitution             70.8560              0.0053

$$\mathrm{MSE} = \frac{1}{H\,G}\sum_{h=1}^{H-1}\sum_{g=1}^{G-1}\left(A_f(h,g) - S_f(h,g)\right)^{2} \qquad (2)$$

where $A_f$ and $S_f$ denote the cover and stego images of size H × G.
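As a concrete illustration (not from the surveyed papers), both metrics can be computed directly from the cover and stego arrays for 8-bit grayscale images:

```python
import numpy as np

def mse(cover: np.ndarray, stego: np.ndarray) -> float:
    diff = cover.astype(np.float64) - stego.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(cover: np.ndarray, stego: np.ndarray) -> float:
    m = mse(cover, stego)
    if m == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10((255.0 ** 2) / m)
```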

The different types of algorithms in image steganography as shown in Table 2 are


compared based on characteristics
• Robustness—Maintenance of data consistency after converting cover image to
stego image.

Table 2 Content from image steganography techniques (R = robustness, I = imperceptibility, C = capacity, S = security)

Literature references       Domain      Technique                                                R   I   C   S
Fridrich [6]                Spatial     Estimation of secret message length                      Y   N   Y   Y
Dumitrescu et al. [5]       Frequency   Detection via sample pair analysis                       Y   Y   N   Y
Ker [8]                     Spatial     Steganalysis of LSB matching                             Y   N   N   Y
Yang et al. [9]             Spatial     Texture, brightness and edge-based detective LSB         N   Y   Y   N
Joshi et al. [10]           Spatial     Using XOR operation                                      Y   Y   N   Y
Irawan et al. [11]          Spatial     Uses OTP encryption to hide on edge areas of image       N   Y   Y   N
Swain [12]                  Spatial     Digital image steganography                              N   N   N   Y
Islam et al. [7]            Spatial     Using status bit along with AES cryptography             N   Y   Y   Y
Chinnalli and Jadhav [13]   Spatial     Combine pattern bits (stego-key) with secret message     Y   N   N   N
                                        using LSB
Astuti et al. [2]           Spatial     Using LSB and triple XOR on MSB                          Y   Y   N   Y

• Imperceptibility—The property preserves the quality of image after embedding


process.
• Capacity—Size of data inserted into an image.
• Security—Confidentiality of data.

4 Conclusion

In this paper, different image steganography techniques are analysed on the basis of the characteristics of the image. The methods currently used in image steganography are highly secure, as they do not allow an unauthorized party to detect the presence of the hidden message or retrieve it. The combination of steganographic and cryptographic techniques results in an accurate process for maintaining the secrecy of information. The majority of these techniques use the LSB algorithm to maintain confidentiality and image quality, since it is simple and provides imperceptibility and robustness for the data.

References

1. Ardy RD, Indriani OR, Sari CA, Setiadi DRIM, Rachmawanto EH (2017) Digital image signa-
ture using triple protection cryptosystem (RSA, Vigenere, and MD5). In: IEEE International
conference on smart cities, automation & intelligent computing systems (ICON-SONICS), pp
87–92
2. Astuti YP, Setiadi DRIM, Rachmawanto EH, Sari CA (2018) Simple and secure image
steganography using LSB and triple XOR operation on MSB. In: International conference
on information and communications technology (ICOIACT), pp 191–195
3. Bhavani Y, Sai Srikar P, Spoorthy Shivani P, Kavya Sri K, Anvitha K (2020) Image segmentation
based hybrid watermarking algorithm for copyright protection. In: 11th IEEE international
conference on computing, communication and networking technologies (ICCCNT)
4. Bhuiyan T, Sarower AH, Karim R, Hassan M (2019) An image steganography algorithm using
LSB replacement through XOR substitution. In: IEEE international conference on information
and communications technology (ICOIACT), pp 44–49
5. Dumitrescu S, Wu X, Wang Z (2003) Detection of LSB steganography via sample pair analysis.
IEEE Trans Sign Process 51(7):1995–2007
6. Fridrich J, Goljan M (2004) On estimation of secret message length in LSB steganography
in spatial domain. In: Delp EJ, Wong PW (eds) IS&T/SPIE electronic imaging: security,
steganography, and watermarking of multimedia contents VI. SPIE, San Jose, pp 23–34
7. Islam MR, Siddiqa A, Uddin MP, Mandal AK, Hossain MD (2014) An efficient filtering based
approach improving LSB image steganography using status bit along with AES cryptography.
In: IEEE international conference on informatics, electronics & vision (ICIEV), pp 1–6
8. Ker AD (2005) Steganalysis of LSB matching in grayscale images. IEEE Sign Process Lett
12(6):441–444
9. Yang H, Sun X, Sun G (2009) A high-capacity image data hiding scheme using adaptive LSB
substitution. J. Radio Eng 18:509–516
10. Joshi K, Dhankhar P, Yadav R (2015) A new image steganography method in spatial domain
using XOR. In: Annual IEEE India conference (INDICON), pp 1–6, New Delhi

11. Irawan C, Setiadi DRIMC, Sari A, Rachmawanto EH (2017) Hiding and securing message on
edge areas of image using LSB steganography and OTP encryption. In: International conference
on informatics and computational sciences (ICICoS), Semarang
12. Swain G (2016) Digital image steganography using variable length group of bits substitution.
Proc Comput Sci 85:31–38
13. Channalli S, Jadhav A (2009) Steganography an art of hiding data. J Int J Comput Sci Eng
(IJCSE) 1(3)
14. Dhaya R (2021) Analysis of adaptive image retrieval by transition Kalman filter approach based
on intensity parameter. J Innov Image Process (JIIP), pp 7–20
15. Manoharan JS (2016) Enhancing robustness of embedded medical images with a 4 level
Contourlet transform. Int J Sci Res Sci Eng Technol pp 149–154
16. Mathew N, Manoharan JS (2012) A hybrid transform for robustness enhancement of
watermarking in medical images. Int J Digital Image Process 4(18):989–993
17. Bhardwaj R, Sharma V (2016) Image steganography based on complemented message and
inverted bit LSB substitution. Proc Comput Sci 93:832–838
Efficient Multi-platform Honeypot
for Capturing Real-time Cyber Attacks

S. Sivamohan, S. S. Sridhar, and S. Krishnaveni

Abstract In today’s world, cyber-attacks are becoming highly complicated. The


hacker intends to expose sensitive information or potentially change the operation
of the targeted machine. Cybersecurity has become a major bottleneck to the growth of on-demand services, since they are widely accessible to hackers for any type of attack. Traditional intrusion detection systems are proving unreliable due to heavy traffic and its dynamic nature. A honeypot is a device that exposes a server
or network that has vulnerabilities to the internet and collects attack information
by monitoring and researching the techniques used by attackers. In this paper, we
setup an effective active protection architecture by integrating the usage of Docker
container-based technologies with an enhanced honeynet-based IDS. T-Pot platform
will be used to host a honeynet of different honeypots in the real-time AWS cloud
environment. The development of this honeynet methodology is essential to recover
threat identification and securing the cloud environment. Moreover, the experiment
results reveal that this defending mechanism may detect and log an attacker’s behavior
which can expose the new attack techniques and even zero-day exploits.

Keywords Cyber security · Intrusion detection system · Honeynet · AWS cloud ·


Docker

S. Sivamohan (B) · S. S. Sridhar · S. Krishnaveni


Departments of Computer Science and Engineering, SRMIST, Kattankulathur, Chennai, India
e-mail: ss3983@srmist.edu.in
S. S. Sridhar
e-mail: sridhars@srmist.edu.in
S. Krishnaveni
e-mail: krishnas4@srmist.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 291
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_21

1 Introduction

Nowadays, the number of cyber-attacks is growing at a rapid pace, and the existing detection techniques are becoming increasingly ineffective, demanding the development of more relevant detection systems [1]. Statistics show
that security vulnerabilities in the virtual network layer of cloud computing have
increased dramatically in recent years. Security breaches have become more diffi-
cult and pervasive as a result of the massive increase in network traffic. With the
use of traditional network-based intrusion detection systems (IDS), combating these
assaults has proven ineffective. In most cases, an intrusion detection system (IDS)
is used in conjunction with a firewall to provide a complete security solution. One
of the main challenges in securing cloud networks is in the area of the appropriate
design and use of an intrusion detection system, which can track network traffic
and, hopefully, detect network breaches [2]. Despite many investigations over recent decades, the cybersecurity challenge remains unsolved. One factor behind this is attackers' access to more processing power and resources, which allows them to perform more complicated attacks [3].
Intrusion detection systems (IDS) and firewall systems are widely used for the detection and prevention of malicious threats [4]. Deep knowledge about malicious code and its target destination is required for improving security, and honeypots were established to capture and store this information. Public datasets such as KDD 99, ISCX, DARPA, and CAIDA have constraints, such as obsolete traffic, simulated traffic that does not match real-world network traffic, and a lack of typical normal data [5]. The general architecture
of a honeypot system is depicted in Fig. 1.

Fig. 1 The general architecture of honeypot system



In this study, to learn more about attackers, their motivations, and techniques,
a honeynet system was used. These systems let attackers engage with them, while
monitoring attacks by posing as actual machines with sensitive information [6]. We
set up an effective active protection architecture by integrating the usage of Docker
container-based technologies with an enhanced honeynet-based IDS. The T-Pot platform
is used to host a honeynet of different honeypots in the real-time AWS cloud
environment [7].
A network attack benchmark should include a wide range of cyber-attacks created
by various tools and methodologies, in addition to representative normal data. The
outcomes of building and assessing detection models using representative data are
realistic, and this can bridge the gap between machine learning-based identification
algorithms and their actual deployment in real-world cloud networks. Integrating the
information collected by honeypots with a firewall and the IDS can reduce the
occurrence of false positives and improve security. There are two types of honeypots,
namely research and production honeypots. A research honeypot collects information
relating to blackhat activity [8]; this is done by giving full access to the system without
any filtration. Production honeypots act as a filter between the protected systems and
the blackhat and are used for preventing malicious attacks [9]. Honeypots are
characterized based on their design, deployment, and deception technology.
Figure 2 illustrates the various types of honeypots.
Fig. 2 Various types of honeypots

The complexity of attacks and changes in attack patterns and techniques are all factors
that should be considered in the cloud environment. The inability to resolve these
security breaches has always had serious impacts and has made the environment more
susceptible [10]. From the attacker's perspective, there has been an increase in cyber-
related events, as well as greater complexity [11, 32]. To address these concerns, the
study presented in this paper suggests the use of honeypots to analyze anomalies
and patterns based on honeypot data. This study intends to create a prototype as
a proof of concept to find appropriate attack detection approaches via honeypots.
The objectives of this work will be leveraged to further intrusion attack detection
analytical tools and techniques. As a result, the following are the main objectives of
this work:
• To detect attacks in a cloud environment by developing an intrusion detection
system based on honeypots.
• To develop a prototype as a proof of concept.
• To learn from the attacker’s actions in a virtual environment.
• To evaluate and interpret cyber-attacks.
The main goal of this paper is to deploy multiple honeypots in a cloud environment
that capture attacker patterns and then to evaluate the collected data for intrusion
detection functionality. This paper presents attack detection approaches based on the
use of honeypots in a cloud environment to create an intrusion detection system. The
following are the key contributions:
• Improved honeynet-based IDS that will be used to identify attacks and anomalies
in a cloud environment. It provides the multi honeypots platform for analyzing
and understanding the behavior of attackers.
• Identified the abnormalities and intrusion attempts by using anomaly detection.
• Analyzed and recognized anomalies in attacks in a cloud environment by learning
from an attacker’s behaviors.
• The development of a honeynet data collection system is the major contribution
of this study.
• A rapidly deployable pre-configured network of honeypots, capable of detecting
active threats in the public cloud, is a unique component of this
system.
The rest of the paper is structured as follows: an overview of relevant
honeypot work for intrusion detection systems is included in Sect. 2. Section 3
describes the proposed framework for detecting intrusion attacks in the cloud and
offers a methodology for data collection, and Sect. 4 presents the findings of the data
analysis and the experiments. Finally, Sect. 5 concludes the paper.

2 Related Works

This section presents the relevant honeypot work for intrusion detection systems.
Honeypots are a type of network traffic and malware investigation tool that has
been widely utilized. Lance Spitzner [12], the Honeynet Project's creator, defines a
honeypot as a "security resource" whose usefulness is contingent on it being targeted
or compromised.

Majithia et al. [13] used a model of running three types of honeypots on a Docker
server, with a logging management mechanism built on top of the ELK framework,
and discussed the issues and security concerns associated with each honeypot. The
honeypots used were HoneySMB7, Honey WEB-SQLi (an HTTP protocol honeypot
that includes an SQL injection vulnerability), and HoneyDB (a honeypot built for
MySQL database vulnerabilities). The work presented an analysis of the attacks using
unique IPs and their distribution among the honeypots.
Adufu et al. [14] investigated and compared running the molecular modeling simulation
software AutoDock on container-based and hypervisor-based virtualization technology
systems. They concluded that the container-based systems managed memory resources
efficiently even when the memory allocated to instances was higher than the physical
resources, in addition to reducing the total execution time for multiple containers
running in parallel.
Seamus et al. [15] built a research honeypot aimed at attackers of Zigbee devices, which
are typically used in MANETs. Since IoT devices are becoming more extensively used,
their risks are becoming more generally recognized, motivating the development of this
honeypot; a risk evaluation of these devices is therefore critical. They used the honeypot
in their implementation for this purpose.
To capture hackers' unethical behavior, Jiang et al. [16] used an open-source
honeynet setup. During the study, nearly 200,000 hits were discovered.
This test explored the ways intruders pursue their targets, such as a web server,
FTP server, or database server.
Sokol et al. [24] created a distributed honeypot and honeynet solution based on OS-level
virtualization, a method that was largely unexplored in research at the time. The
research's most significant contribution is the automation of honeypots, with their
technique for generating and evaluating their honeynet solution being remarkable.
According to the study, OS-level virtualization has very little performance or
maintenance overhead when compared to other virtualization technologies or bare-metal
systems. They also point out that utilizing containers to disguise honeypots adds an
extra element of obfuscation: even though they are confined environments sharing
the kernel of a legitimate operating system, when fingerprinted they are more likely
to appear as a valid system [17, 30].
Alexander et al. [25] investigated the usage of Linux containers as an alternative to
virtualization in order to circumvent a variety of virtual environment and monitoring
tool detection methods. The goal was to see whether container environments could be
used to host honeypots without being identified by malware in the long run [18].
Chin et al. [26] proposed a system called HoneyLab, a public infrastructure for hosting
and monitoring honeypots with a distributed computing resource structure. Its
development was prompted by the discovery that combining data collected from
honeypots in diverse deployment scenarios allows attack data to be correlated, enabling
the detection of expanding outbreaks of related attacks. This system collects data from a
huge number of honeypots throughout the globe in order to identify attack occurrences.
Their approach, on the other hand, is based on two low-interaction honeypots, which
restrict the amount of data acquired from the attack [20]. In order to gain a better
knowledge of attacker motivations and techniques, an improved system would be able
to gather more data on attack occurrences [28].

Table 1 summarizes the comparative analysis of five different honeypots in tabular form.

Table 1 Comparison of various honeypots

                    ManTrap       BOF   Spector   Honeynet      Honeynet
Interaction level   High          Low   High      Low           High
Freely available    No            No    No        Yes           Yes
Open source         No            No    No        Yes           Yes
Log file support    Yes           No    Yes       Yes           Yes
OS emulation        Yes           No    Yes       Yes           Yes
Supported service   Unrestricted  7     13        Unrestricted  Unrestricted

3 Methodology

This work proposes a new honeynet-based intelligent system for detecting cyber-
attacks in the cloud environment. It demonstrates the system configuration of
container-based multiple honeypots that have the ability to investigate and discover
attacks on a cloud system. This section covers the complete implementation of all
honeypots created and deployed throughout the investigation, as well as a centralized
logging and monitoring server based on the Elasticsearch, Logstash, and Kibana (ELK)
stack. This tracking system is also capable of monitoring live traffic. Elasticsearch, a
scalable and distributed search engine, was chosen because it can provide quick search
results by searching an index rather than searching the text directly [19]. Kibana is a
freely available client-side analytics and search dashboard that visualizes data for easier
understanding; it is used to display the logs from compromised honeypots [21].
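As an illustration of how honeypot events can be pushed into such an ELK pipeline, the minimal Python sketch below indexes a single Cowrie-style log record into Elasticsearch so that Kibana can later aggregate it. The index name, field names, and host address are assumptions made for this sketch, not the exact configuration used in this work, and a recent version of the official Elasticsearch Python client is assumed.

```python
from datetime import datetime, timezone
from elasticsearch import Elasticsearch  # official Elasticsearch Python client

# Connect to the Elasticsearch node of the ELK stack (address is illustrative).
es = Elasticsearch("http://localhost:9200")

# One honeypot event; the field names loosely mirror Cowrie-style logs but are
# only an assumed schema for this sketch.
event = {
    "@timestamp": datetime.now(timezone.utc).isoformat(),
    "honeypot": "cowrie",
    "src_ip": "37.49.231.70",
    "src_port": 51122,
    "dst_port": 22,
    "username": "root",
    "password": "12345",
    "event_type": "login.attempt",
}

# Index the event so the Kibana dashboards can aggregate it by country, port, username, etc.
es.index(index="honeypot-logs", document=event)
```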
The information was acquired over a period of a month, during which time all
of the honeypots were placed in various locations across the globe. The honeypots'
capabilities can considerably assist in attaining the recommended approach to reducing
threats to critical service infrastructures, and many of these tasks can be provided by
containers. The simplicity with which identically configured environments may be
deployed is one of the major advantages of container technologies. Container
technologies, on the other hand, cannot provide the same simplicity of deployment for a
fully networked system [18]. This motivated the development of a deployment
mechanism for the whole system, allowing it to be rebuilt in a limited span of time.
Figure 3 shows the detailed system overview of the model framework.

Fig. 3 A detailed system overview of the deployed honeynet framework

Several existing techniques, in both research and development, have made significant
contributions to such a solution; however, no single method provides a viable, workable
way to deploy active network defense as a single networked deployable unit [22].
The preliminary development approach used a network design that would help
the researchers achieve their main goal of creating a flexible honeynet in a Cloud
scenario, which can manage any illegal entry or untrusted connections and open a
separate Docker container for each attacker’s remote IP address. Figure 4 describes
the dataflow of the proposed solution. This proposed system was designed to be
scalable and adaptable, allowing new features to be added rapidly and the platform
to adapt to the unique requirements of a given infrastructure. It is made up of three
primary components that were created independently using the three-tier application
approach as follows:
• DCM: It is a data collection module that gathers essential information from a
variety of physical and virtual data sources.

Fig. 4 Data flow for the honeynet-based attack detection framework



• DAM: It is a data analysis module that provides the user with a set of advanced
analyses to produce physical, cyber, or mixed intelligence information (for
example, cyber threat evaluation, facility classification by criticality, and pattern
detection from social interactions) by processing the stored raw data.
• DVM: It is a data visualization module that provides true awareness of the physical
and cyber environments through a combined and geospatial representation of the
security information.
This experimental design tested two types of containers (SSH and HTTP). The Suricata
container was added as an example of the different types of honeypots that were utilized
in the model. In order to overcome the limits and the difficult setup of the honeypot
network, virtualized systems on cloud infrastructures were used, and the AWS cloud
provider was chosen for this purpose. The architecture of the honeypot system is
illustrated in Fig. 5, and the route of the attackers in the attack scenario is depicted in
Fig. 6. Within the set-up, the following attack scenario was carried out: using SSH or
Telnet, an attacker was able to obtain access; any root credentials would be granted
access to the SSH session when prompted to log in. The attacker would then try to find
further weaknesses on the machine and, when satisfied, would try to download and run
malicious programs on it. In this approach, a deliberate vulnerability is presented, the
goal of which is to fool the attacker into thinking the system has a flaw, essentially
allowing us to study the attacker's path and attack tactics.
The experimental setup comprises five honeypots and a supplementary system
for collecting the generated logs [23]. This experimental design tested two
types of scenarios (SSH and HTTP).

Fig. 5 Architecture of the honeypots system



Fig. 6 Attacker’s path in the attack scenario

Testing Scenarios
In this experiment, we applied three test case scenarios for the purpose of verification
of the functionality of the model.
SSH Scenario
In the SSH scenario, we created an SSH connection from a simulated attacker and
observed the following:
• An instance of the Kippo container was created, and the traffic was forwarded to
it as shown in Fig. 7.
• The attacker was able to navigate through the Kippo interactive terminal with fake
file-system observed in Fig. 8.
• A fingerprinting attack easily detected a well-known fingerprint indicator for
Kippo honeypot using the command ‘vi’.
• Kippo honeypot container logs were saved and forwarded to syslog for recording
all the interactive communication with the attacking session.

Fig. 7 SSH container established session



Fig. 8 Client browsing through Kippo fake file-system

Fig. 9 HTTP container established session

HTTP Scenario
Creating an HTTP request to the reverse proxy address would result in the following:
• Create an http honeypot using Glastopf Docker image with the specified naming
convention shown in Fig. 9.
• Attacker browsing a fake web server page where he can apply different attacks
trying to authenticate shown in Fig. 10.
• Container logs collected and sent to syslog highlighting the source IP of the
original attacker.
The honeynet behaved as expected, creating a container per attacking session
(unique IP), with a naming convention that combines the image name with the IP of the
originating source of the attack to make the container exclusive to that session. The
attacker was directed to a fake website to apply different attacks, which were recorded
and contained inside the dedicated Glastopf container [24].
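The per-session container behaviour described above can be sketched with the Docker SDK for Python. The snippet below is only a minimal illustration under the assumption that a Glastopf image is available locally and that the attacker's remote IP has already been extracted from the incoming connection; it is not the orchestration code of the deployed system.

```python
import docker  # Docker SDK for Python

client = docker.from_env()

def spawn_honeypot_for(attacker_ip: str):
    """Start a dedicated Glastopf container for one attacking source IP.

    The container name embeds the honeypot image name and the attacker IP,
    following the naming convention described above, so each session stays isolated.
    """
    name = "glastopf_" + attacker_ip.replace(".", "-")
    return client.containers.run(
        "honeynet/glastopf",          # assumed image name, for illustration only
        name=name,
        detach=True,                  # run in the background
        ports={"80/tcp": None},       # let Docker choose a free host port to proxy to
        labels={"attacker_ip": attacker_ip},
    )

# Example: a new untrusted connection from 37.49.231.70 triggers a fresh container.
container = spawn_honeypot_for("37.49.231.70")
print(container.name, container.status)
```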

Fig. 10 Client browsing through Glastopf fake web server

4 Data Analysis and Results Discussion

The experimental study was performed over a period of six months, during which over
5,195,499 log entries from attackers were acquired for further analysis. The real-time
data were gathered from August 19, 2020, to February 19, 2021, and the findings were
compiled from the dataset using the Kibana interface, which allows for data aggregation
across multiple fields of the whole database. The main task is to find answers
to particular investigation queries, such as the source, target, and attack technique.
The observations confirm the correct operation of the honeynet system. A honeynet
with a flexible and dynamic transition between honeypots can reveal some of the future
and potential attacks on a cloud environment by allowing attackers to strike a fraudulent
system with the same potential vulnerabilities [29]. The intrusion data was investigated
using counts, ratios, and the statistical Chi-Squared (χ²) test with a P-value ≤ 0.0001 for
each honeypot. A Chi-Squared statistical test was used in this study to determine the
statistical significance of the results. It is one of the most well-known statistical
measures and analyzes the relationship between two variables. The Chi-Squared (CS)
formula is as follows [26]:

$$\chi^2 = \sum_{ij} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \tag{1}$$

where O represents the observed value, E represents the expected value, χ² represents
the Chi-Squared value, and i and j index the two variables. The results identify the top
ten source IP addresses, the ten most prominent attacks, the top ten attacking countries,
the ten most prominent daily attacks, the ten most-used passwords, the ten most-used
usernames, and the ten most-used source ports. Figure 11 shows the honeypot attack
map visualization in Kibana, which maps all attacking countries across the world and
shows that the Netherlands produced the highest share of attacks (23.68%) while
Canada produced the lowest (1.96%).
The Netherlands had the highest prevalence of attacks on the honeypots: approximately
5,195,499 attacker hits were discovered in the period from August 19, 2020, to
February 19, 2021.

Fig. 11 Visualization of honeypots attacks in different regions of the world map
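For illustration, the χ² statistic of Eq. (1) can be computed over the per-country attack counts reported in Table 2 with SciPy, as in the sketch below. A uniform expected distribution across the ten countries is assumed here purely for demonstration, since the exact expected frequencies used by the authors are not stated.

```python
from scipy.stats import chisquare

# Observed attack counts for the top 10 countries (Table 2).
observed = [177_615, 148_206, 143_807, 75_022, 74_481,
            47_314, 29_959, 23_951, 15_064, 14_681]

# Assume a uniform expected distribution across the 10 countries (illustrative only).
expected = [sum(observed) / len(observed)] * len(observed)

# chisquare computes sum((O - E)^2 / E), i.e. Eq. (1).
stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-squared = {stat:.3f}, p-value = {p_value:.3g}")
```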
Table 2 displays the total counts and percentages of honeypot attacks from the top 10
regions. The statistical χ² test value of 75,010.689 for examining the independence of
the measures of those attacks across these countries is significant at the 0.0001 level.
The results show an excessive prevalence of attacks from the Netherlands. Figure 5 in
Chap. 10 depicts the attacks on honeypots from the leading ten countries. Overall, the
Netherlands produced the most attacks (23.68%) and Canada generated the least traffic
(1.96%) on the honeypots, so the Netherlands had the highest prevalence of attacks. In
total, 177,615 hits were

Table 2 Top 10 attacks from different countries

Country Count Percentage
Netherland 177,615 23.68
China 148,206 19.76
US 143,807 19.17
Germany 75,022 10
Russia 74,481 9.93
Republic of Korea 47,314 6.31
France 29,959 3.99
Australia 23,951 3.19
Brazil 15,064 2.01
Canada 14,681 1.96

Table 3 Top 10 source IP


Source IP CNT Percentage
37.49.231.70 125,624 92.8
185.222.56.196 57,820 5.88
58.218.207.85 51,365 0.31
167.99.39.142 36,802 0.26
37.49.231.40 35,628 0.23
222.239.10.135 25,435 0.19
116.31.116.5 25,101 0.15
52.187.253.72 23,785 0.09
98.18.169.34 23,167 0.08
119.32.3.66 16,704 0.02

revealed from the Netherlands, and the least, 14,681 hits, were revealed from
Canada. The results clearly show an excessive prevalence of attacks from the
Netherlands, while Canada generated the least traffic on the honeypots.
Table 3 summarizes the overall counts and percentages of threats across all honey-
pots based on the top ten most visible source IP addresses. For analyzing the indepen-
dence of the measurements of those attacks from these IP addresses, the statistical χ 2
test value of 42,143.1345 is significant at the 0.0001 level. The results show a large
number of cyber-attacks originating from the IP address 37.49.231.70, while the IP
address 119.32.3.66 was the source of the fewest attempts.
Figure 12(6) depicts the honeypot cyber-attacks from the top 10 source IP addresses.
The IP address 37.49.231.70 generated the most attacks (92.8%), and the IP address
119.32.3.66 produced the least (0.02%). The results clearly show that the IP address
37.49.231.70 had the highest prevalence of attacks on the honeypots across different
countries, such as the US, Germany, Russia, Republic of Korea, France, Australia, the
Netherlands, and Canada. For investigating the independence of the consequences of
the attacks from these IP addresses, the statistical χ² test value of 42,143.1789 is
significant at the 0.0001 level. Attacks from the IP address 37.49.231.70 have had a
huge impact, as seen in Fig. 10.
Table 4 summarizes the total counts and ratios of the attacks on all the honeypots from
the top ten source ports used by intruders from different countries. The statistical χ²
test value of 11,322.923 for studying the independence of the measures of those attacks
across source ports is significant at the 0.0001 level. The results show an excessive
prevalence of attacks using port 5900.
Figure 12(1) shows the trend of honeypot attacks from the top ten source ports. Clearly,
port 5900 was used for the most attacks (22.38%), and port 7070, with a total of 3177
attempts, was used for the fewest attacks (2.78%) from the United States. Attacks on
the honeypots from China were quite common on port 5900.

Fig. 12 Attack’s analysis results on different honeypots

Table 4 Top 10 source port


Port Country count Percentage
5000 Netherland 26,214 20.87
5038 Russia 3146 12.56
8000 Germany 2618 3.77
7000 Netherland 24,496 10.5
7070 US 3177 2.78
5900 China 28,524 22.38
2222 China 5961 8.03
2223 Russia 5781 7.42
445 China 4992 6.42
5060 US 8320 5.27

Table 5 Top 10 username


Username Count Percentage
Root 187,615 24.68
Admin 138,206 17.76
Enable 123,806 16.17
User 75,022 10
Shell 74,481 9.93
Default 47,314 6.31
Support 29,959 3.99
Quest 23,951 3.19
Operator 15,064 2.01
Super user 14,681 1.96

Figure 12(2–4) visualizes the usernames and passwords attempted against the Cowrie
honeypot from different countries; noticeably, root was tried the most on Cowrie, while
admin was tried second most. Root therefore had the highest occurrence of attacks on
the honeypots. Table 5 gives details on the overall counts and attack ratios across all
honeypots for the top ten usernames. Among the most repeatedly used usernames, such
as root, admin, enable, user, and shell, root was the most often used.
For analyzing the similarity of the totals of usernames used in the attacks, the
statistical χ² test result of 73,009.956 is significant at the 0.0001 level. The results show
an excessive prevalence of attacks using the username root. Figure 12(2) depicts the
attacks on the honeypots from the top ten usernames. Clearly, root was used the most
(24.68%) and super user the least (1.96%), so root accounted for the highest incidence
of attacks. Table 6 displays the total counts and ratios of the attacks on all the honeypots
from the leading ten passwords. The statistical χ² test value of 75,010.896 for examining

Table 6 Top 10 password


Password Count Percentage
sh 177,615 23.68
12345 148,206 19.76
system 143,807 19.17
123456 75,022 10
admin 74,481 9.93
1234 47,314 6.31
Password 29,959 3.99
Pass 23,951 3.19
root 15,064 2.01
user 14,681 1.96

the independence of the distributions of passwords used in the attacks is significant at
the 0.0001 level. The findings reveal an excessive occurrence of attacks with the
password sh.
Figure 12(5) visualizes the attacks on the honeypots from the top ten passwords.
Evidently, sh was used the most (23.68%) and user the least (1.96%); sh therefore had
the highest prevalence of attacks on the honeypots. A total of 177,615 hits were
revealed for the password sh, and the fewest, 14,681 hits, were revealed for user.
Figure 12(6) visualizes the attacks on the honeypots from the top ten IP addresses.
Honeypot source IP reputations are segregated into known attacker, bad reputation,
anonymizer, malware, form spammer, bot, crawler, mass scanner, bitcoin node, and
mining node. Attackers use various OS distributions such as Windows 7, Linux 3.11,
Linux 3.x, etc. Figure 11 depicts the per-day attack analysis on the different honeypots.

5 Conclusion

This study proposed a honeynet-based multi-platform honeypot for detecting cyber-
attacks. The development and analysis aid us in identifying the methods and tools
utilized by malicious attackers, and even some of their tactics of system manipulation.
This article demonstrates the containerized honeynet system in a cloud environment
for detecting cyber-attacks, as well as how it can be created using open-source
technologies. We built a pre-configured network of honeypots that can be quickly
configured and provides active threat detection in cloud networking infrastructures,
which is a unique component of this system. The development of this honeynet
methodology is essential to improve threat identification and secure the cloud
environment. Moreover, the experimental results reveal that this defending mechanism
can detect and log an attacker's behavior, which can expose new attack techniques
and even zero-day exploits.

References

1. Grance T, Mell P (2009) The NIST definition of cloud computing. National Institute of
Standards & Technology (NIST). http://www.nist.gov/itl/cloud/upload/cloud-def-v15.pdf
2. Roschke S, Cheng F, Meinel C (2009) Intrusion detection in the cloud. Dependable, autonomic
and secure computing. In: IEEE international symposium on cloud computing, pp 729–734
3. Hoque MS, Bikas MA (2012) An implementation of intrusion detection system using genetic
algorithm. Int J Netw Sec Appl (IJNSA)
4. DTAG Community Honeypot Project (2016) T-Pot 16.10—multi-honeypot platform redefined.
http://dtag-dev-sec.github.io/mediator/feature/2016/10/31/t-pot-16.10.html. Accessed 2 June
2018
5. Amazon Web Services, Inc. (2018) What is AWS?—Amazon web services. https://aws.ama
zon.com/what-is-aws/. Accessed on 5 Apr 2018

6. Mohallel AA, Bass JM, Dehghantaha A (2016) Experimenting with Docker: linux container and
base OS attack surfaces. In: 2016 international conference on information society (i-Society),
pp 17–21
7. Docker Inc. (2018) Docker security—docker documentation. https://docs.docker.com/engine/
security/security/. Accessed on 21 Apr 2018
8. Docker Inc. (2018) Docker hub. https://hub.docker.com/. Accessed on 06 May 2018
9. Elasticsearch BV (2018) Heap: sizing and swapping—elasticsearch: the definitive guide
[2.x]—elastic. https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html.
Accessed on 13 Apr 2018
10. Anicas M (2017) How to install elasticsearch, Logstash, and Kibana (ELK Stack) on
Ubuntu14.04—DigitalOcean. https://www.digitalocean.com/community/tutorials/how-to-ins
tall-elasticsearch-logstash-and-kibana-elk-stack-on-ubuntu-14-04. Accessed on 15 Apr 2018
11. Amazon Web Services, Inc. (2018) AWS IP address ranges—Amazon web services. https://
docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html. Accessed on 16 Apr 2018
12. Spitzner L (2003) Honeypots: catching the insider threat. In: Proceedings of the 19th annual
computer security applications conference, ACSAC ’03. IEEE Computer Society, Washington,
DC, p 170
13. Majithia N (2017) Honey-system: design, implementation & attack analysis. PhD thesis, Indian
Institute of Technology, Kanpur
14. Adufu T, Choi J, Kim Y (2015) Is container-based technology a winner for high performance
scientific applications. In: Network operations and management symposium (APNOMS).
IEEE, pp 507–510
15. Dowling S, Schukat, Melvin M (2017) A zigbee honeypot to assess IoT cyberattack behaviour.
In: 28th Irish signals and systems conference (ISSC), pp 1–6
16. Jiang X, Xu D, Wang Y-M (2006) Collapsar: a vm-based honeyfarm and reverse honeyfarm
architecture for network attack capture and detention. J Parallel Distrib Comput 66(9):1165–
1180
17. Han W, Zhao Z, Doupé A, Ahn G-J (2016) Honeymix: towards sdn-based intelligent honeynet.
In: Proceedings of the 2016 ACM international workshop on security in software defined
networks & network function virtualization. ACM, pp 1–6
18. Kaur T, Malhotra V, Singh DD (2014) Comparison of network security tools—firewall,
intrusion detection system and honeypot, p 202
19. Krawetz N (2004) Anti-honeypot technology, IEEE Security & Privacy, pp 76–79
20. Sahu N, Richhariya V (2012) Honeypot: a survey. Int J Comput Sci Technol
21. Vasilomanolakis E, Karuppayah S, Kikiras P, Mühlhäuser M (2015) A honeypot-driven cyber
incident monitor: lessons learned and steps ahead. In: Proceedings of the 8th international
conference on security of information and networks, SIN ’15. ACM, New York, NY, pp. 158–
164
22. Leung A, Spyker A, Bozarth T (2018) Titus: introducing containers to the netflix cloud.
Commun ACM 61:38–45
23. Combe T, Martin A, Pietro RD (2016) To docker or not to docker: a security perspective. IEEE
Cloud Comput 3:54–62
24. Pisarcik P, Sokol P (2014) Framework for distributed virtual honeynets. In: Proceedings of the
7th international conference on security of information and networks, SIN ’14. ACM, New
York, NY, pp 324:324–324:329
25. Kedrowitsch A, Yao DD, Wang G, Cameron K (2017) A first look: using linux containers for
deceptive honeypots. In: Proceedings of the 2017 workshop on automated decision making for
active cyber defense, SafeConfig@CCS 2017. Dallas, TX, USA, October 30–November 03,
2017, pp 15–22
26. Chin WY, Markatos EP, Antonatos S, Ioannidis S (2009) HoneyLab: large-scale honeypot
deployment and resource sharing. In: 2009 third international conference on network and
system security, pp 381–388
27. Kyung S, Han W, Tiwari N, Dixit VH, Srinivas L, Zhao Z, Doupé A, Ahn G-J (2017)
Honeyproxy: design and implementation of next-generation honeynet via SDN. In: IEEE
conference, on communications and network security (CNS)

28. Sokol P, Míšek J, Husák M (2017) Honeypots and honeynets: issues of privacy. EURASIP J
Inf Sec (1):4
29. Krishnaveni S, Prabakaran S, Sivamohan S (2018) A survey on honeypot and honeynet systems
for intrusion detection in cloud environment, American scientific publishers all rights reserved
printed in the United States of America. J Comput Theoret Nanosci 10(15):2956–2960
30. Sivaganesan D (2021) A data driven trust mechanism based on blockchain in IoT sensor
networks for detection and mitigation of attacks. J Trends Comput Sci Smart Technol (TCSST)
3(01):59–69
31. Samuel MJ (2021) A novel user layer cloud security model based on chaotic Arnold
transformation using fingerprint biometric traits. J Innov Image Process (JIIP) 3(01):36–51
32. Shakya S, Pulchowk LN, Smys S (2020) Anomalies detection in fog computing architectures
using deep learning. J Trends Comput Sci Smart Technol 1:46–55
A Gender Recognition System
from Human Face Images Using VGG16
with SVM

S. Mandara and N. Manohar

Abstract Human beings have many distinct attributes, and facial features are a
significant part of them. Facial features help in distinguishing people. An automatic
gender recognition (AGR) system recognizes a person's gender based on these distinct
features, much as humans do through highly developed cognitive skills built through
sufficient training. This paper proposes a system which combines a convolutional neural
network and SVM to classify human gender. It aims at achieving efficiency on a larger
dataset which includes face images of all human life stages. Initially, human face
images are trained using a pre-trained CNN model, i.e., VGG16. The extracted features
are then loaded into the SVM classifier for classification, which identifies the gender
class and labels the input image as male or female. The system performance is
evaluated on a larger dataset, and the proposed method achieves good classification
results on it.

Keywords Convolutional neural network (CNN) · Support vector machine (SVM) · Visual geometric group (VGG16)

S. Mandara (B) · N. Manohar
Department of Computer Science, Amrita School of Arts & Sciences, Amrita Vishwa Vidyapeetham, Mysuru Campus, Mysuru, India

1 Introduction

The recent technological advancement and increased popularity of areas like artificial
intelligence, machine learning, and data analysis have opened a wide range of
opportunities for research. Image processing is also one of the popular research areas.
It is used for data manipulation and the study of image datasets, which helps in image
analysis and experimentation.
Face recognition technology has nowadays gained a wider base, and gender
identification from facial images plays a big role in many of these areas. In previous
years, many techniques have been developed for classifying gender from images.
Gender identification or classification is a binary classification technique which
classifies gender into male and female classes. This identity performs
an essential function in social interactions, as languages have distinct greetings and
grammar rules for men and women. Regardless of the task being performed, these rules
play a crucial role in our everyday lives. Much research has been carried out in this
field to make the identification system automatic and to associate the classification
technology with real-time applications used by ordinary people.
Our paper reviews the work carried out in gender classification and also presents the
work carried out using VGG16 along with SVM. VGG16, a pre-trained CNN model, is
used in the training phase of the system, which helps the system train well on a very
large face dataset. The CNN-based model is exceptionally successful in reducing the
number of parameters without degrading the quality of the model. This trained model
extracts the feature maps from the images. The extracted feature matrix is then sent to
the SVM classifier for classification. SVM distinctly separates the data points, i.e.,
groups them into two classes based on their similarity, and gives strong results on
two-class classification. The rest of this paper is organized as follows: a brief overview
of current models and methods is presented in Sect. 2. Section 3 explains the proposed
strategy in detail. The dataset used and the results obtained are presented in Sect. 4, and
the work is concluded in Sect. 5.

2 Related Work

Prior to presenting the proposed technique, we review related work on gender
classification and give a brief outline of existing work related to our area of research.
The problem of extracting gender-related features from facial images has recently
received considerable attention, and several techniques have been proposed for it [1].
Although feature extraction and the final gender classification can be treated as separate
tasks [2], the overview below includes techniques intended for either undertaking.
Early gender identification methods rely on computing ratios between different
measurements of facial features. Once facial features (such as eyes, nose, mouth, and
chin) are localized and their sizes and distances measured, ratios between them are
calculated and used to classify the face into a gender category according to hand-crafted
rules. Later strategies [3] use an equivalent approach to model gender-related
measurements of the subjects in each gender class. As these techniques require
localization of facial features, a difficult problem by itself, they are unsuitable for
in-the-wild images of the kind found on social platforms. Alternative approaches model
facial appearance as a subspace [4] or with more complex representations [5]. A
disadvantage of these approaches is that they require the input images to be near-frontal;
such systems therefore report preliminary results only on constrained datasets of
near-frontal images (for example, MORPH [6]). As a result, these methods again
struggle with unconstrained images.

As mentioned above, a different strategy is to represent and match images using local
appearance features. In [7], a Gaussian mixture model (GMM) [8] is used to represent
the distribution of facial patches. In [9], a GMM is again used, this time to model the
distribution of local facial measurements, using robust descriptors instead of raw pixel
patches. Finally, in [10] hidden Markov model super-vectors [9] are used in place of
GMMs. An alternative to local image intensity patches are robust image descriptors:
Gabor image descriptors [8] are used in combination with a fuzzy LDA classifier in [7],
which treats a facial image as possibly belonging to more than one gender class. In [4],
biologically inspired features (BIF) [9] and various manifold learning methods are
combined to assess gender. Gabor features [11] and local binary pattern (LBP) features
[12] are used in [13] along with a hierarchical gender classifier built from SVMs [5] to
assign the input image to a gender class, followed by support vector regression [14] to
refine the gender estimate.
Finally, [9] proposed improvements and adaptations of relevant component analysis [8]
and locality preserving projections [10]. These methods use active appearance models
[15] as image features, which are used for distance learning and dimensionality
reduction. These methodologies have proven effective on small or constrained
benchmarks for gender identification; the most effective method was evaluated on the
Group Photos benchmark [9]. In [4], state-of-the-art performance on that benchmark is
reported using a combination of LBP descriptors [14] and an SVM classifier, and the
reported results indicate that the problem remains challenging. One of the earliest
methods for gender recognition [13] used a neural network trained on near-frontal face
images. In [3], 3D head structure (obtained by a laser scanner) and image intensities are
combined to determine gender. Image intensities have also been used directly [14],
with SVM [11] and AdaBoost applied for a similar purpose. Finally, a
viewpoint-invariant approach to gender classification was presented by [5].
More recent strategies [12] use Weber's local texture descriptor [10] for gender
recognition, demonstrating near-perfect performance on the FERET benchmark [5]. In
[15], intensity, shape, and texture features are used together with mutual information,
again obtaining near-perfect results on the FERET benchmark. The strategies analyzed
so far used the FERET benchmark [5] both to develop the proposed systems and to
evaluate performance. Since these images were taken under highly controlled
conditions, they are far less variable than facial images captured in the wild; moreover,
the results obtained on this benchmark appear saturated, so it is not suitable for
evaluating current systems, and the comparative advantages of these methods are
therefore difficult to assess. For this reason, [9] experimented on the popular Labeled
Faces in the Wild (LFW) benchmark [14], which is mainly used for face verification,
combining LBP features with an AdaBoost classifier. In contrast, the gender
classification task and dataset we rely on contain more test images than LFW and use a
more general protocol for reporting performance over all the data.

3 Proposed Work

The methodology proposed in this work aims to improve and enhance the system's
performance and efficiency on a larger dataset, outperforming earlier works. Previously,
machine learning methods used traditional hand-crafted features such as local binary
patterns, color histograms, and SIFT [16]. However, the literature shows that CNN
performance is better than these earlier approaches. Figure 1 shows the architecture
diagram of our proposed system, which includes two stages: training and testing. The
collected images are loaded into the pre-trained VGG16 network model. The CNN
automatically learns image attributes in a hierarchical structure; this network provides
rich feature representations learned from a wide set of images. The lower layers detect
corners, edges, and attributes such as color and shape, while the upper layers represent
objects in the image [17]. Therefore, CNNs are well suited for image classification
tasks. Training an entire network from scratch is not practical with insufficient data;
therefore, we use a pre-trained CNN for training the system. A pre-trained network has
been trained on more than a million images and can classify images into 1000 object
categories. A few of the most popular networks are VGG16, AlexNet, VGG19,
GoogLeNet, etc. A larger dataset of face images is used for training with the VGG16
model. From the initial input layer to the final max-pooling layer (marked as
7 × 7 × 512), the feature-extraction part is retained, and the remaining layers, i.e., the
last fully connected layers, are removed.
We use SVM in this model for the purpose of classification. As SVM provides good
results for two-class classification problems, merging SVM with CNN can yield better
results and also works well on a huge dataset. So, the feature matrix extracted from the
CNN is then fed into this classifier for the prediction of the gender class labels (male or
female), as shown in Fig. 2.

Fig. 1 Proposed system architecture

Fig. 2 Proposed VGG16 architecture with SVM classification

Fig. 3 General VGG16 architecture [18]

3.1 Visual Geometric Group (VGG16) Configuration

In this section, the generalized architecture of the VGG16 model (Fig. 3), described by
Andrew Zisserman and Karen Simonyan [18], is presented. This model has a total of
13 convolutional layers and three fully connected layers. The initial two layers are
convolution layers with 3 × 3 filters and use 64 filters each; since same-padding
convolution is used, the volume is 224 × 224 × 64. The filters are 3 × 3 with a stride of
one. A max-pooling layer with a 2 × 2 pool size and a stride of two then reduces both
the height and the width of the volume from 224 × 224 × 64 to 112 × 112 × 64, and a
two-layer convolution with 128 filters follows, giving a new size of 112 × 112 × 128.
After a further max-pooling layer, the volume is 56 × 56 × 128, and two more
convolutional layers with 256 filters are added, followed by a max-pooling layer
reducing the size to 28 × 28 × 256. The remaining two stacks consist of three
convolutional layers each, separated by max-pooling layers. After the final pooling
layer, the volume (7 × 7 × 512) is flattened into fully connected (FC) layers with
4096 channels and a softmax layer with 1000 outputs [19].
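The layer configuration described above corresponds to the standard VGG16 model shipped with common deep learning libraries. As a minimal sketch (assuming Keras/TensorFlow and the default 224 × 224 × 3 input), the convolutional base up to the final 7 × 7 × 512 max-pooling output can be obtained as follows:

```python
from tensorflow.keras.applications import VGG16

# Convolutional base only: include_top=False drops the three fully connected layers,
# leaving the 13 convolutional layers whose final max-pooling output is 7 x 7 x 512.
conv_base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
conv_base.summary()  # lists the five convolution/pooling blocks described above
```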

3.2 SVM for Gender Identification

SVM is considered a discriminant-based classifier and is regarded as one of the most
powerful classifiers in the literature [20]. Gender classification labels images into two
categories only; therefore, using SVM, which is best suited for binary classification,
gives good output results and can be recommended for labeling unknown face
images [21].
In the training process, the SVM classifier uses "N" samples for training, the number of
features is "k," and the support vectors associated with each category are stored in the
knowledge database. In the testing phase, query samples with the same "k" features are
projected onto a hyperplane. This process decides to which of the available classes the
query sample belongs [22].
The steps carried out with the SVM are:
• Preparing the model from the dataset.
• Verifying the features extracted from the test input against the features of the dataset.
• Classifying the gender based on the mapping of the classified features (a minimal sketch of this pipeline is given below).
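The following sketch illustrates this feature-extraction and classification pipeline, assuming Keras/TensorFlow and scikit-learn. The 4096-dimensional features are taken from the first fully connected layer of VGG16, the image paths and labels are placeholders, and the 224 × 224 input size follows the standard VGG16 convention; these details are illustrative assumptions rather than the exact implementation used in the paper.

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import image
from sklearn.svm import SVC

# 4096-d features from the first fully connected layer ("fc1") of VGG16.
base = VGG16(weights="imagenet", include_top=True)
extractor = Model(inputs=base.input, outputs=base.get_layer("fc1").output)

def extract_features(paths):
    """Load face images, preprocess them for VGG16, and return 4096-d feature vectors."""
    feats = []
    for p in paths:
        img = image.load_img(p, target_size=(224, 224))  # VGG16 expects 224 x 224 inputs
        x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
        feats.append(extractor.predict(x, verbose=0)[0])
    return np.array(feats)

# Placeholder file lists and labels (0 = female, 1 = male); replace with the real dataset.
train_paths, train_labels = ["female_0001.jpg", "male_0001.jpg"], [0, 1]

svm = SVC(kernel="linear")                 # linear kernel, as in the training setup
svm.fit(extract_features(train_paths), train_labels)

# Predict the gender class of an unlabeled face image.
print(svm.predict(extract_features(["unknown_face.jpg"])))
```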

4 Experimental Details

This section briefly elaborates on the dataset that was created and collected to test and
evaluate our proposed method.

4.1 Dataset

The dataset for our experimentation comprises 24,000 face images with two classes.
Since convolutional neural networks work effectively only when the dataset is large, we
created our own dataset for the experiment: some images were captured, some were
collected from individuals, and others were downloaded from online sources such as
celebrity images, group photos of people, and publicly available image sets. There are
12,000 images in each class, i.e., the male and female classes, comprising human faces
of all age groups. For analyzing the effectiveness of the proposed methodology, we
considered images of children's faces and old-age human faces, where gender prediction
is difficult because the facial features vary and analyzing them based on pre-defined
learning becomes challenging. Some problems, such as multi-faceted images with
different lighting, occlusion, faces captured in different positions, and different views,
are common. Figure 4 shows some randomly selected example images from the dataset
(Table 1).

Fig. 4 Sample images of human face



Table 1 Experimental dataset description


Entries Description
Dataset content Consists of 24,000 images of male and female class each consisting of 12,000
face images comprising of individual, group images of young, middle, and old
age faces
Dataset source Captured images, downloaded images from publically available dataset such
as celebA dataset and UTKFace dataset

4.2 Experimental Setup

This implementation runs on the above-mentioned dataset, with training and testing
phases. For these steps, the input image is resized to 227 × 227. The implementation
was done on a 1 GB Radeon HD 6470M GPU processor. In the training phase, VGG16
pre-trained weights are used for training the images, and 4096 features are extracted.
The testing phase uses the SVM classifier on the same set of features extracted from the
unlabeled face image. The SVM utilizes a linear kernel for training.

5 Results

Two significant conclusions can be drawn from our results. First, CNN can be used to
obtain good gender classification results even as the size of the image set grows.
Second, the effectiveness of our model shows that, using pre-trained weights, more
elaborate architectures can improve results and accuracy. Here, SVM is used as the
classifier for the human face recognition problem. Figures 5, 6, and 7 show some image
samples used for gender classification.
These show that a significant part of the mistakes made by our system results from
difficult viewing conditions in the reference images of the dataset. Most conspicuous are
errors caused by blur or low resolution and occlusions (particularly from heavy
make-up). Gender estimation mistakes regularly occur with pictures of newborns or
small children, where clear gender attributes are not yet evident. Our system has tried to
overcome this issue by using varied images covering face images of all age groups.
The proposed model is evaluated using standard verification criteria such as precision,
recall, and F-measure. These scores are calculated based on the confusion matrix
obtained when classifying human gender images. To evaluate the performance of the
proposed system, we conducted experiments with our dataset and obtained the results
by training and testing on different sets of samples.
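For reference, such scores can be derived from a confusion matrix with scikit-learn, as in the short sketch below; the label arrays are placeholders, and weighted averaging over the two classes is an assumption made for illustration.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

# Placeholder ground-truth and predicted labels (0 = female, 1 = male).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

print(confusion_matrix(y_true, y_pred))          # 2 x 2 confusion matrix
accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted")          # weighted average over both classes
print(f"accuracy={accuracy:.4f}  precision={precision:.4f}  "
      f"recall={recall:.4f}  F-measure={f1:.4f}")
```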

Fig. 5 Gender classification using SVM

In the first trial, out of 24,000 samples, we considered 7200 images (30% of 12,000
male and 30% of 12,000 female class samples) for training and 16,800 samples for
testing to evaluate the performance.
In the second trial, out of 24,000 samples, we considered 12,000 images (50%
of 12,000 male and 50% of 12,000 female class samples) for training and 12,000
samples for testing to evaluate the performance.
In the final trial, out of 24,000 samples, we considered 16,800 images (70% of
12,000 male and 70% of 12,000 female class samples) for training and 7200 samples
for testing to evaluate the performance.
The accuracy and the performance of the system for each trial are shown in Table 2.

Fig. 6 Gender classification using SVM

6 Conclusion

In this project, along with deep learning CNN model VGG16, we have combined
SVM classifier to classify human gender. Initially, the images are fed into the pre-
trained system, i.e., VGG16 for feature extraction. From the extracted features, SVM
classifies the gender. Despite the fact that several previous strategies have addressed
the issue of gender identification, much of this study has recently focused on images
taken in gender classification. For experimentation, we created and also gathered
images from different sources which consist of 24,000 face images in the dataset
that includes samples of all age groups and with two classes. By the analysis of
the gender classification using deep learning approach with SVM, we obtained good
accuracy of 92.34% on a larger dataset. The system classifies and performs well even
when varied and different images of face are used.

Fig. 7 Gender classification using SVM

Table 2 Classification results in terms of percentage


Train-test (%) Accuracy (%) Precision (%) Recall (%) F-measure (%)
30–70 86.52 86.98 86.98 86.98
50–50 90 90.08 90.08 90.08
70–30 92.3 92.34 92.34 92.34

References

1. Fu Y, Guo G, Huang TS (2010) Gender classes synthesis and estimation via faces: a survey.
Trans Pattern Anal Mach Intell 32(11):1955–1976
2. Fu Y, Huang TS (2008) Human gender classes estimation with regression on discriminative
aging manifold. Int Conf Multimed 10(4):578–584
3. Gao F, Ai H (2009) Face age, gender classification on consumer images with gabor feature and
fuzzy LDA method. In: Advances in biometrics. Springer, pp 132–141
4. Eidinger RE, Hassner T (2014) Gender estimation of unfiltered faces. Trans Inform Forensics
Security
5. Cortes VV (1995) Support-vector networks. Mach Learn 20(3):273–297
6. Gallagher AC, Chen T (2009) Understanding images of groups of people. In: Proceedings
conference on computer vision pattern recognition. IEEE, pp 256–263
7. Geng X, Zhou ZH, Smith-Miles K (2007) Automatic gender classes estimation based on facial
aging patterns. Trans Pattern Anal Mach Intell 29(12):2234–2240
8. Hillel B, Hertz T, Shental N, Weinshall D (2003) Learning distance functions using equivalence
relations. Int Conf Mach Learn 3
9. Chao WL, Liu JZ, Ding JJ (2013) Facial gender classes estimation based on label sensitive
learning and gender classes oriented regression. Pattern Recogn 46(3):628–641
10. Chen J, Shan S, He C, Zhao G, Pietikainen M, Chen X, Gao W (2010) Wld: a robust local
image descriptor. Trans Pattern Anal Mach Intell 32(9):1705–1720

11. Baluja S, Rowley HA (2007) Boosting gender identification performance. Int J Comput Vision
71(1):111–119
12. Ahonen T, Hadid A, Pietikainen M (2006) Face description with local binary patterns:
application to face recognition. Trans Pattern Anal Mach Intell 28(12):2037–2041
13. Choi SE, Lee YJ, Lee SJ, Park KR, Kim J (2011) Gender classes estimation using a hierarchical
classifier based on global and local facial features. Pattern Recogn 44(6):1262–1281
14. Chatfield K, Simonyan K, Vedaldi A, Zisserman A (2014) Return of the devil in the details:
delving deep into convolutional nets. arXiv:1405.3531
15. Cootes TF, Edwards GJ, Taylor CJ (1998) Active appearance models. In: European conference
on computer vision. Springer, pp 484–498
16. Manohar N, Pranav MA, Akshay S, Mytravarun TK (2020) Classification of satellite images,
information and communication technology for intelligent systems. In: ICTIS 2020. Smart
innovation, systems and technologies, vol 195. Springer
17. Golomb BA, Lawrence DT, Sejnowski TJ (2000) Sexnet: a neural network identifies sex from
human faces. Neural Inform Process Syst
18. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional
neural networks. In: Proceedings of the 25th international conference on neural information
processing systems, vol 1, pp 1097–1105
19. Rafique I, Hamid A, Naseer S, Asad M (2019) Age and gender prediction using deep
convolutional neural networks. In: International conference on innovative computing
20. Akshay S, Apoorva P (2018) Segmentation and classification of FMM compressed retinal
images using watershed and canny segmentation and support vector machine. In: (ICCSP)
2017 international conference on communication and signal processing
21. Manohar N, Kumar YHS, Kumar GH (2020) Supervised and unsupervised learning in animal
classification. In: International conference on advances in computing, communications and
informatics (ICACCI). Jaipur, pp 156–161
22. Fei L, Yajie W, Hongkun Q, Linlin W (2014) Gender identification using SVM based on human
face images. In: International conference on virtual reality and visualization
Deep Learning Approach for RPL
Wormhole Attack

T. Thiyagu, S. Krishnaveni, and R. Arthi

Abstract The network of smart devices and gadgets forms the Internet of things
(IoT). IoT technology implemented in our day-to-day devices has shown many
advantages to users. With this, the use of IoT devices has also increased, which
increases the network traffic, and the increase in network traffic has attracted many
hackers to inject more network attacks. The more the usage, the more vulnerable it is to
attacks. One such IoT attack is the RPL protocol wormhole attack. Thus, there is a need
for an intrusion detection system (IDS) to protect the network data. The proposed work
concentrates on generating real-time wormhole attacks in the Cooja simulator and on
using a recurrent neural network (RNN) deep learning model to detect and classify the
wormhole attack data from the normal data in the IoT network traffic. The proposed
work produced an accuracy of 96% and an F1 score of 96%.

Keywords IoT · RPL · Wormhole · RNN · Cooja simulator

1 Introduction

The increased usage of IoT devices in our day-to-day life has raised the importance of
research on the security of these devices [1]. This work records the dataset generation
of the RPL wormhole attack in the Cooja simulator; the generated dataset is then
detected and classified using the RNN model.
Almost every aspect of our lives is now continuously tracked through data. The
Internet of things (IoT) is produced by routing occurrences among the surrounding
devices, regulators, and

T. Thiyagu (B) · S. Krishnaveni · R. Arthi


Department of Computer Science and Engineering, SRM IST, Kattangulathur, Chennai 600203,
India
e-mail: tt0595@srmist.edu.in
S. Krishnaveni
e-mail: krishnas4@srmist.edu.in
R. Arthi
e-mail: arthir4@srmist.edu.in


selectors. Several tools exist to model the IoT domain, and hence big data techniques
are used to analyze these technologies more easily [2]. The most common emulators are
Cooja, GNS-3, notify, and MATLAB. Unfortunately, developing and maintaining
effective IoT communication is a challenging job.
As the amount of data produced grows, data protection has become increasingly
important [3]. In particular, the safety of sensitive records needs to follow the principles
of data security (confidentiality, integrity, and availability) [4]. Since there is a lack of
stable routing rules, there are several occurrences of IoT attacks, such as car hacking,
DDoS, and other physical attacks.

1.1 Problem Statement

The security of IoT devices has become a major concern for users with the increase
in IoT network traffic. Users are affected by novel intrusions day by day. The devices
are vulnerable to denial of service, flooding, worm, and many other attacks. RPL
protocol attacks pose a major challenge in detection and mitigation. Models built on
existing datasets report high performance; however, they fail to perform well when
implemented practically. Thus, a new dataset is required which is generated in a
real-time IoT environment for every routing attack. Also, a suitable deep learning
model is required to efficiently detect and classify these attacks from the normal
data traffic.

1.2 Contribution

The following are the contributions shown in this work:


• Generation of network traffic with many nodes in the Cooja simulator and establishment
of communication between them.
• Injection of the RPL wormhole attack into the generated network traffic.
• Recording of the network traffic containing normal and malicious data using
Wireshark as .pcap and .csv files.
• Detection and classification of malicious data from normal data using the RNN
deep learning model.
• Performance evaluation of the proposed model.

1.3 Paper Organization

In this paper, Sect. 2 presents the background and related works about RPL attacks
and deep learning models. Section 3 shows the proposed methodology. The results are
discussed in Sect. 4, and the conclusion is given in Sect. 5. Future work is presented
in Sect. 6.

1.4 Background

Cooja Simulator:
Among the available IoT network simulators such as iFogSim, CloudSim,
OMNeT++, and many others, the Cooja simulator suits the given
problem statement best [5]. The Cooja simulator provides wide scope to create a large
number of nodes in a wireless network [6]. Figure 1 shows the Cooja interface.
Network traffic generation in this simulator is easy.

2 Related Works

Much research on RPL attacks has been done in the IoT industry. The different
categories of RPL attacks, such as wormhole, blackhole, flooding, and sinkhole
attacks, are analyzed extensively in [7], which also surveys the effects of RPL
wormhole attacks and their detection methods. The authors of [8] give detailed
research findings about wormhole attacks, where attack detection is done using
packet leashes; temporal leashes and geographical leashes are the two main
categories considered. That work provides both detection and mitigation of the attack.
The work in [9] represents the method of generating dataset in the Cooja simulator
and injecting attacks in the normal traffic. The work explains the method of capturing
the generated dataset through Wireshark. The proposed methodology in [10] shows
the deep learning approach for the generated dataset. The work uses ANN for RPL

Fig. 1 Sink–sender architecture with Cooja interface



rank attack detection and classifies it from the normal data packets in the generated
network traffic.

3 Methodology

Figure 2 explains the methodology used to generate the RPL wormhole attack dataset
injected in normal traffic through the Cooja simulator. Then, the generated dataset
is applied to the RNN model to classify them into malicious and normal data.

3.1 Components Used

Simulation: The Cooja simulator is used to generate and record network traffic in
WSN. In this research, the simulator is used to set up 1000s of nodes and establish
communication between them. Then, the RPL wormhole attack is injected into it.

3.1.1 .pcap and Wireshark

Wireshark is the most popular application that uses .pcap files for traffic
monitoring [11]. Wireshark can be used on Windows, Mac OS X, and Linux. As long
as the appropriate programs are installed, these .pcap files can be accessed. Some

Fig. 2 Proposed methodology



frequently used tools generate pcap files, such as Wireshark, WinDump, tcpdump,
Packet Square-Capedit, and Ethereal.

3.1.2 .CSV File

A .csv file is a simple text file that contains a set of records separated
by commas. Such files are often used to transfer data between applications [8]. In
this proposed work, the network traffic generated from the Cooja simulator is captured by
Wireshark and stored as .pcap and .csv files.

3.1.3 Feature Extraction

The captured network traffic is sent for feature extraction. Feature extraction is the
process of determining the number of properties needed to represent a large amount
of data. Several deep learning practitioners claim that properly optimized feature
extraction is the key to building successful models. It is a method for identifying
essential information components. Pattern identification and recognizing common
patterns across a large number of documents are two examples of this approach; spam
detection is another [12]. It is an effective data preprocessing
technique designed to reduce feature dimensionality and
improve the efficiency of deep learning in implementation.

3.1.4 Preprocessed Dataset

Data preprocessing is a data mining method that involves translating raw data into
an understandable format. Real data are often missing values, unreliable, and deficient
in specific behaviors and patterns. They may also contain numerous errors.
Preprocessing the data is a proven way of addressing such problems.
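To make this step concrete, the following minimal Python sketch shows one way the Wireshark-exported .csv capture could be loaded and preprocessed before classification. The file name, column names, and label encoding are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical preprocessing sketch for the Wireshark-exported traffic (.csv).
# File name, column names, and label values are illustrative assumptions.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def load_and_preprocess(csv_path="cooja_capture.csv"):
    df = pd.read_csv(csv_path)

    # Drop rows with missing values and duplicate packets.
    df = df.dropna().drop_duplicates()

    # Encode non-numeric fields (e.g., protocol names) as integer codes.
    for col in df.select_dtypes(include="object").columns:
        if col != "label":
            df[col] = df[col].astype("category").cat.codes

    # Separate features from the label column (assumed 1 = wormhole, 0 = normal).
    y = df["label"].values
    X = df.drop(columns=["label"]).values

    # Scale features to [0, 1] so the RNN trains more stably.
    X = MinMaxScaler().fit_transform(X)
    return X, y
```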

3.1.5 RNN Classifier

The network traffic after data preprocessing is sent to the recurrent neural network
(RNN). RNN is a supervised learning model which processes data in sequence. As
in Fig. 3, the three stages of RNN working are
1. First, the data are moved to the hidden layer which predicts an output.
2. Then, the predicted value is compared with the actual value. The difference is
recorded as a loss function. The less the loss function value, the better the RNN
prediction performance.

Fig. 3 Working of RNN model

3. Finally, depending on the loss value, the error is propagated back toward the
input layer through back-propagation, and the node weights are adjusted so that the
predictions match the actual values.
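A minimal Keras sketch of such a recurrent classifier is shown below. The layer sizes, training settings, and the reshaping of each packet record into a one-step sequence are assumptions made for illustration, not the authors' exact architecture.

```python
# Minimal RNN classifier sketch (Keras); layer sizes and epochs are assumptions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

def build_and_train(X, y, epochs=20):
    # Reshape each record into a sequence of one time step: (samples, steps, features).
    X_seq = X.reshape((X.shape[0], 1, X.shape[1]))

    model = Sequential([
        SimpleRNN(64, input_shape=(1, X.shape[1])),  # hidden recurrent layer
        Dense(32, activation="relu"),
        Dense(1, activation="sigmoid"),              # wormhole vs. normal
    ])
    # Binary cross-entropy is the loss compared against the actual labels;
    # back-propagation (through time) adjusts the weights to reduce it.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X_seq, y, epochs=epochs, batch_size=32, validation_split=0.2)
    return model
```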

4 Results and Discussion

This paper mainly focuses on the performance of the deep learning RNN model in
the classification of network traffic packets as RPL wormhole attack data and normal
data. The network traffic captured from the Cooja simulator forms the base for the
performance evaluation. Figures 4 and 5 show the output of traffic generated in the
Cooja simulator.

4.1 Confusion Matrix

The performance of the deep learning model is effectively evaluated using a confusion matrix,
which is shown in Fig. 6. The true positive (TP) and true negative (TN) counts say that the
predicted value matches the actual value, that is, the model predicts positive for an actual
positive and negative for an actual negative. On the other hand, the false positive (FP) and
false negative (FN) counts say that the predicted value does not match the actual value. The
higher the TP and TN values, the higher the accuracy of the detection model (Fig. 6).

Fig. 4 Generated traffic without wormhole attack

Fig. 5 Generated traffic with wormhole attack

4.1.1 Evaluation Metrics

In this work, the positive class corresponds to wormhole attack data and the negative
class to normal data.
Table 1 gives the confusion matrix of the work.
Accuracy:
Accuracy is given by the number of correct predictions divided by the total number
of predictions.

Fig. 6 Confusion matrix

Table 1 Confusion matrix

                      Actual positive    Actual negative
Predicted positive    128                8
Predicted negative    0                  23

Accuracy = (TP + TN)/(TP + TN + FN + FP) = 0.94.

Precision:
Precision is given by the number of correctly predicted positive values divided by the
total number of predicted positive values.

Precision = TP/(TP + FP) = 0.94.

Recall:
The recall is given by the number of correctly predicted positive values divided by the
total number of actual positive values.

Recall = TP/(TP + FN) = 1.

F1 Score:
The F1 score gives the harmonic mean of recall and precision value obtained.

F1 Score = (2 × Precision × Recall)/(Precision + Recall) = 0.96.

The F1 score of the RNN model is high, which shows that the proposed RNN
detection model performs well for RPL wormhole attacks in IoT network traffic.
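For reference, the short Python snippet below computes the four metrics directly from the confusion matrix values in Table 1; the values computed this way may differ slightly from the rounded figures quoted above.

```python
# Metric computation from the confusion matrix in Table 1.
# TP, FP, FN, TN follow the convention used above (positive = wormhole attack).
TP, FP, FN, TN = 128, 8, 0, 23

accuracy  = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f1_score  = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.2f}")   # ~0.95
print(f"Precision: {precision:.2f}")  # ~0.94
print(f"Recall:    {recall:.2f}")     # 1.00
print(f"F1 score:  {f1_score:.2f}")   # ~0.97
```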

5 Conclusion

The work shows the importance of research on RPL wormhole attacks and their
severity in IoT networks. First, the network traffic is generated in the Cooja simulator
with normal traffic and wormhole attack which is captured through Wireshark. This
data are sent to the RNN classifier which classifies the dataset to normal data and
malicious data. The performance of the deep learning model is evaluated through
a confusion matrix. The F1 score achieved is 0.96 which shows that the proposed
method performs well for the classification of RPL wormhole attacks.

6 Future Work

The work has generated output for RNN deep learning classification for RPL worm-
hole attacks. Similar work can be extended to various other RPL attacks such
as blackhole attacks, sinkhole attacks, DoS attacks, flooding attacks, rank attacks,
version attacks, and other novel RPL attacks. Also, the work concentrates on detection
techniques. This can be extended by applying mitigation techniques in the network
traffic.

References

1. Pongle P, Chavan G (2015) Real-time intrusion and wormhole attack detection in internet of
things. Int J Comput Appl 121(9)
2. Tahboush M, Agoyi M (2021) A hybrid wormhole attack detection in mobile ad-hoc network
(MANET). IEEE Access 9:11872–11883
3. Cakir S, Toklu S, Yalcin N (2020) RPL attack detection and prevention in the Internet of Things
networks using a GRU based deep learning. IEEE Access 8:183678–183689
4. Morales-Molina CD, Hernandez-Suarez A, Sanchez-Perez G, Toscano-Medina LK, Perez-
Meana H, Olivares-Mercado J, Portillo-Portillo J, Sanchez V, Garcia-Villalba LJ (2021) A
dense neural network approach for detecting clone id attacks on the rpl protocol of the iot.
Sensors 21(9):3173
5. Mahmud A, Hossain F, Choity TA, Juhin F (2020) Simulation and comparison of RPL,
6LoWPAN, and CoAP protocols using Cooja simulator. In: Proceedings of international joint
conference on computational intelligence. Springer, Singapore, pp 317–326
6. Rana AK, Sharma S (2021) Contiki Cooja security solution (CCSS) with IPv6 routing protocol
for low-power and lossy networks (RPL) in internet of things applications. In: Mobile radio
communications and 5G networks. Springer, Singapore, pp 251–259
7. Dutta N, Singh MM (2019) Wormhole attack in wireless sensor networks: a critical review.
Adv Comput Commun Technol 147–161

8. Hu Y-C, Perrig A, Johnson DB (2006) Wormhole attacks in wireless networks. IEEE J Sel
Areas Commun 24(2):370–380
9. Malik M, Dutta M (2017) Contiki-based mitigation of UDP flooding attacks in the internet
of things. In: 2017 international conference on computing, communication and automation
(ICCCA). IEEE, pp 1296–1300
10. Choukri W, Lamaazi H, Benamar N (2020) RPL rank attack detection using deep learning. In:
2020 international conference on innovation and intelligence for informatics, computing and
technologies (3ICT). IEEE, pp 1–6
11. Singh U, Samvatsar M, Sharma A, Jain AK (2016) Detection and avoidance of unified attacks
on MANET using trusted secure AODV routing protocol. In: 2016 symposium on colossal data
analysis and networking (CDAN). IEEE, pp 1–6
12. Tun Z, Maw AH (2008) Wormhole attack detection in wireless sensor networks. World
Academy of Science, Engineering and Technology 46
13. Sivaganesan D (2021) A data driven trust mechanism based on blockchain in IoT sensor
networks for detection and mitigation of attacks. J Trends Comput Sci Smart Technol (TCSST)
3(01):59–69
Precision Agriculture Farming
by Monitoring and Controlling Irrigation
System Using Sensors

Badri Deva Kumar, M. Sobhana, Jahnavi Duvvuru, Chalasani Nikhil,


and Gopisetti Sridhar

Abstract IoT facilitates the authorization of things and device activities that are
connected remotely across the cloud network interface. It has a very significant
contribution toward revolutionary farming methods. This paper describes an
autonomous crop irrigation system. The ability to control and monitor plant irrigation,
which not only reduces human intervention but also senses and records
the system status in real time, makes our system more unique and
simplified than any existing system.

Keywords Soil moisture sensor · Temperature sensor · Automated irrigation ·


Precision agriculture

1 Introduction

Agriculture is undoubtedly the key development in the rise of human civilization and the
largest means of living in India. While agriculture supplies food materials for livelihood,
smart agriculture gives employment opportunities to the vast population of
India. The global demand for food has been increasing with the population rise over
the past six decades. In short, the significance of agriculture is so extensive that it
is essential to sustaining life. In such circumstances, we found it enthralling
to highlight its importance. Modern agriculture plays a vital role. As time passed,
scientific and technical advancements emerged in agriculture. Precision farming
is a determining and decisive part of the third phase of the modern agro-revolution.
The outworn techniques are being replaced with self-regulating, semi-automatic,
and preprogrammed techniques. Prior to the usage of technology and
automation in the cultivation process, little information was available on the farm
and the crop. On this account, farmers had no means of knowing how much

B. D. Kumar (B) · M. Sobhana · J. Duvvuru · C. Nikhil · G. Sridhar


CSE Department, VR Siddhartha Engineering College, Kanuru, India
M. Sobhana
e-mail: sobhana@vrsiddhartha.ac.in


of their crop they had lost. So, on account of scientific and technical innovation, the
future of farming is poised to be pushed to new heights.
In recent times, with high-speed, secure, and emerging IoT technology,
everyone can use their smart gadgets to monitor the devices present in
the field and gain real-time statistics on their farms. Such a system can save both water
and electricity. With the advent of open-source Arduino boards along with low-cost
sensors, it is viable to create a device that can monitor soil wetness and irrigate
the fields or the landscape accordingly [1]. This technique, in point of fact, minimizes
the amount of extra manpower needed for day-to-day operations.
Compared to existing systems, we wanted to improve the accuracy of
the system's working to an extent that provides sufficient and healthy growth of
the crop. A practically available soil humidity sensor can only sense to a depth of
around 20–30 cm, depending on the texture of the soil. So, to overcome this,
we use multiple sensors in the field, making the system capable of reading a
number of values from these different sensors, calculating the average values, and
taking the decision based on the instructions. The threshold value given in the system
is also not defined as static or fixed. This threshold value is determined by the type
of soil used and the temperature values. The type of soil is set by the user while
installing the device in a particular field.

1.1 Modern Techniques

There are so many techniques used for irrigation, but the basic and most widely used
techniques in the modern days are drip irrigation system and sprinkler irrigation
system.

1.1.1 Drip Irrigation Method

In this type of irrigation system, the crop roots are irrigated by carrying water directly
to the root nodes using an array of pipes and valves. It is easy to liquid fertilize the
roots of the plants directly which would do a great work in saving both irrigating
water as well as the fertilizers. Figure 1 shows the drip irrigation method used in
recent times [2].

1.1.2 Sprinkler Irrigation Method

Sprinklers are a great method for irrigating large fields and can even save a lot of
water. Small gardens can mostly use normal sprinklers, but larger sprinklers with
plenty of coverage need to be used if you have several acres of lands to irrigate.
One of the best advantages about this method is that it can be automated on a timer.

Fig. 1 Drip irrigation system

Fig. 2 Sprinkler irrigation system

One can set the timer to when the sprinkler system must turn on every single day.
Additionally, they can also allot it to run for a certain amount of time. Figure 2 shows
the modern sprinkler system [2].

2 Existing System

Jani et al. proposed to support aggressive water management for agricultural land
by implementing a smart irrigation system using IoT [3]. Bhanu et al. have designed a
system that considers a few parameters for data analysis using IoT cloud platform [4].

Bajwa et al. developed a smart solution for irrigating plants by using three different
modules which are different sensors [5]. Rawal et al. proposed to build a monitoring
system for determining the humidity [1]. These systems even after being utilized to
the extent to save irrigation and reduce the human labor, they significantly are hard
to implement and would require considerate amount of time to be established.
Koduru et al. proposed a framework-based cloud application for smart irriga-
tion by utilizing excessive rain water [6]. Krishnan et al. have implemented a smart
system using global system for mobile (GSM) communications to provide notifica-
tion messages about job’s statuses such as dampness level of soil [7]. Laksiri et al.
have proposed a smart irrigation system that was said to provide an effective method
to irrigate farmer’s cultivation by implementing both remote and manual irrigation
system and uploading the stats online via Internet [8]. Singh et al. developed a system
by using different sensors to analyze and respond to the different soil conditions [9].
Chapungo et al. have developed a system using specific sensing technologies and
sensor deployment using satellites and drones which irrigates and sprays pesticides
[10]. Amin et al. have developed a system that specifically uses drip irrigation to
maintain the soil moisture within a specific range on potted wheat plants proposing
it to be useful for the wheat production especially in drought areas [11]. Hadi et al.
have designed a system that used the application of IoT for irrigating the gardens
remotely by the owner that allows to both measure and detect the soil moisture [12].
Laabidi et al. have proposed a smart grid irrigation system by dividing the field into
grids and each grid acting as a separate group for the IoT application [13]. These
systems require a larger number of sensors and have a high initial purchase cost.
Karar et al. have presented a design of a water pump control for the development
for smart irrigation in their system [14]. Karpagam et al. have proposed an IoT
enabled watering system for water management and distribution [15]. Rohith et al.
have designed a smart irrigation system using basic sensors to control the water
[16]. Stojanović et al. have presented an application of digital technologies in the
field of agriculture for supporting smart computers [17]. Murlidharan et al. have
developed an application of precision agriculture using IoT and ML on the basis of
the existing technology [18]. Present-day technology allows us to do things
imaginable in every possible aspect, but the side effects are excessive
investment or cost and a lack of adequate knowledge of the complexity involved.

3 Proposed System

In this section, the functionality and working of the device model are explained.
Considering IoT as the main principle of the system's working, an Arduino UNO micro-
controller is used for further applications like automated control of the system
using the parameters obtained from different payloads, each having its own unique
characteristics.

Fig. 3 Arduino UNO

3.1 Materials Used

The materials used in our project are Arduino UNO, temperature sensor, soil moisture
sensor, relay module, irrigation motor, power adapter.

3.1.1 Arduino UNO

Arduino UNO is the basic processor used here for the functioning of the system like
reading the inputs and providing the output as designed. Figure 3 shows the image
of Arduino [4].

3.1.2 Relay Module

The function of this module is to connect the terminals based on the input condition
provided to the module. Based on the switching condition, either of the two terminals
will be powered up which are notated as NC and NO, respectively. Figure 4 shows
the relay module [16].

3.1.3 Irrigation Pump

The irrigation motor is a device which pumps water by pressurizing it from a water
source either pond or groundwater from a well. In other words, this is the heart of
irrigation. The HP of the pump used here can differ based on the capacity of the field
to be irrigated. This pump is controlled by the Arduino which regulates when to turn
on and off based on the conditions met by the program code. Figure 5 is a sample
irrigation pump used for prototype.

Fig. 4 Relay module

Fig. 5 Irrigation pump

3.1.4 Temperature Sensor

This sensor calculates the temperature around the soil to study and understand the
environmental conditions and to determine the amount of irrigation required for the
plants in that specific condition comparing it to the room temperature. The sensor
module used here is LM35. Figure 6 shows the image of LM35 temperature sensor.

3.1.5 Soil Moisture Sensor

This sensor, when placed in the soil, detects the humidity present around its area
on the basis of the percentage of electron flow, which depends on its sensitivity. It returns the
percentage value as output. Figure 7 shows a sample soil moisture sensor [4].

Fig. 6 Temperature sensor

Fig. 7 Soil moisture sensor

3.2 Working Principle

The execution of the system starts by first reading the temperature and humidity
levels from the respective sensors. After the values are recorded, the Arduino does the
work by analyzing and performing the designated function. The function performed
here is to power the motor to irrigate the land, thus making this system completely
autonomous. Figure 8 describes the basic flow of the system explaining principle of
working.

Fig. 8 Working principle of smart irrigation system

Fig. 9 Basic I/O flow diagram

3.3 Basic I/O Flow

The basic process flow structure of the algorithm can be explained in brief as follows.


The temperature sensor is connected to one of the analog pins like A0, and the
moisture sensor is connected to another analog pin, say A1 from the pins on the
Arduino board. The algorithm of the system can be given as
1. Initialize variables to Arduino pin numbers, threshold to a value and set average
temp.
2. Read the temperature and moisture sensor value into the variables.
3. Map the reading of the moisture sensor to usable values for easier calculations.
4. Convert the temperature sensor values to degree Celsius.
5. Compare temperature values with average value and increase/decrease the
threshold value accordingly.
6. Compare the threshold value to moisture level and turn on the pump if needed.
7. Repeat the process again after some delay.
Figure 9 shows the layout of the I/O characteristics of the system.
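The sketch below mirrors this decision logic in Python for illustration only; on the actual hardware the same logic runs as an Arduino C/C++ sketch, and the threshold values, temperature coefficient, and simulated sensor helpers used here are assumptions.

```python
# Hypothetical simulation of the control loop described in steps 1-7.
# Thresholds, the temperature coefficient, and the simulated readings are
# illustrative assumptions; the real system reads A0/A1 on an Arduino UNO
# and drives the relay module that powers the irrigation pump.
import random
import time

AVERAGE_TEMP_C = 25.0      # assumed reference temperature (step 1)
BASE_THRESHOLD = 40.0      # assumed base moisture threshold in %, set per soil type

def read_temperature_c():
    return random.uniform(20.0, 40.0)   # stands in for the LM35 on pin A0

def read_moisture_pct():
    return random.uniform(10.0, 80.0)   # stands in for the soil moisture sensor on A1

def set_pump(on):
    print("Pump", "ON" if on else "OFF")  # stands in for switching the relay

for _ in range(3):                        # the device repeats this indefinitely
    temp = read_temperature_c()           # step 2: read both sensors
    moisture = read_moisture_pct()

    # Steps 4-5: hotter conditions raise the moisture threshold, cooler lower it.
    threshold = BASE_THRESHOLD + 0.5 * (temp - AVERAGE_TEMP_C)

    # Step 6: irrigate only when the soil is drier than the threshold.
    set_pump(moisture < threshold)

    time.sleep(1)                         # step 7: in practice a delay of 1-2 h
```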

3.4 Implementation

The system functioning can be simply put as the Arduino reading the inputs from the
installed sensors and controlling the relay module which drives the power to the irri-
gation motor. The sensor outputs go through a screening process where the Arduino
analyzes the readings with the threshold values input by the user. These threshold

Fig. 10 Working prototype

values are set based on the crop planted, and the current season which decides whether
the crop requires more irrigation or not. The reading from the temperature sensor is
used to either increase or decrease the threshold value. The higher the temperature value,
the more humidity will be required for a healthy crop. The sensor values are transmitted
to the Arduino as digital binary signals, high and low. We used an Arduino UNO
board which covers most of the processing done by the system. There is an option
to manually override the functioning of the system by activating the payload
manually using a switch rather than through the Arduino board. The process of evalua-
tion and analysis is usually repeated every 1–2 h. The practical implementation
of our system is shown in Fig. 10.

4 Analysis and Experimental Results

Using the parameters and values defined, the system analyzes the practical readings,
which are saved by the Arduino so that the output function can be traced back in a
runtime instance. The sample recorded readings can be observed from the serial port.
The snaps in Figs. 11 and 12 are the recorded outputs from the sensors by the
Arduino. Based on these values, the Arduino does the decision making to do the
designated job which, here, is the operation of irrigation motor.

Fig. 11 Output recording in dry soil condition

Fig. 12 Output recording in irrigated soil condition

5 Conclusion

This designed system works efficiently, using IoT on the basis of precision agricul-
ture to detect the dampness in the soil and analyze all the practical values from the
sensors in order to control the irrigation flow when required. The system is completely
autonomous, making it easier for the farmer to reduce extra human effort and save

extra expenses. Our system uses multiple sensors at different parts of the field consid-
ering the fact that a moisture sensor can only work efficiently for a range of around
20–30 cm depending on the soil texture. The system analyzes multiple readings,
processes them, and then makes a decision to determine the necessity of water,
thus making the system dynamic to temperature, soil conditions, and texture varia-
tions. This dynamic calculation of threshold value exclusively by the system itself
not only saves time but also makes it portable and realistic for the environmental
changes and is economically feasible.

References

1. Rawal S (2017) IoT based smart irrigation system. In: Proceedings of the international journal
of computer applications, vol 159
2. Prakash BR, Kulkarni SS (2020) Super smart irrigation system using internet of things. In:
2020 7th international conference on smart structures and systems (ICSSS), pp 1–5
3. Kansara K, Zaveri V, Shah S, Delwadkar S, Jani K (2015) Sensor based automated irrigation
system with IOT: a technical review. In: proceedings of the international journal of computer
science and information technologies, vol 6, pp 5331–5333
4. Bhanu KN, Mahadevaswamy HS, Jasmine HJ (2020) IoT based smart system for enhanced
irrigation in agriculture. In: 2020 international conference on electronics and sustainable
communication systems (ICESC), pp 760–765
5. Safdar Munir M, Bajwa IS, Ashraf A, Anwar W, Rashid R (2021) Intelligent and smart irriga-
tion system using edge computing and IoT. In: Proceedings of the innovative and intelligent
technology-based services for smart environments
6. Koduru S et al (2018) Smart irrigation system using cloud and internet of things. In: Proceedings
of 2nd ınternational conference on communication, computing and networking, pp 195–203
7. Krishnan RS et al (2020) Fuzzy logic based smart irrigation system using internet of things. J
Clean Prod 252:119902
8. Laksiri HGCR, Dharmagunawardhana HAC, Wijayakulasooriya JV (2019) Design and opti-
mization of IoT based smart irrigation system in Sri Lanka. In: 2019 14th conference on
industrial and information systems (ICIIS), pp 198–202
9. Singh R, Srivastava S, Mishra R (2020) AI and IoT based monitoring system for increasing
the yield in crop production. In: Proceedings of the international conference on electrical and
electronics engineering, pp 301–305
10. Chapungo NJ, Postolache O (2021) Sensors and communication protocols for precision agricul-
ture. In: Proceedings of the 2021 12th international symposium on advanced topics in electrical
engineering (ATEE)
11. Amin AB, Dubois GO, Thurel S, Danyluk J, Boukadoum M, Diallo AB (2021) Wireless sensor
network and irrigation system to monitor wheat growth under drought stress. In: 2021 IEEE
international symposium on circuits and systems (ISCAS), pp 1–4
12. Hadi MS, Nugraha PA, Wirawan IM, Zaeni IAE, Mizar MA, Irvan M (2020) IoT based smart
garden irrigation system. In: Proceedings of the 4th international conference on vocational
education and training
13. Laabidi K, Khayyat M, Almohamadi T (2021) Smart grid irrigation. In: Proceedings of the
innovative and intelligent technology-based services for smart environments
14. Karar ME et al (2020) IoT and neural network-based water pumping control system for smart
irrigation. In: Proceedings of the arXiv:2005.04158, pp 107–112
15. Karpagam J et al (2020) Smart irrigation system using IoT. In: Proceedings of the 2020 6th
international conference on advanced computing and communication systems (ICACCS), vol
6 (15). IEEE

16. Rohith M, Sainivedhana R, Sabiyath Fatima N (2021) IoT enabled smart farming and irrigation
system. In: Proceedings of the 2021 5th international conference on intelligent computing and
control systems (ICICCS), pp 545–552
17. Stojanović R, Maraš V, Radonjić S, Martić A, Durković J, Pavićević K, Mirović V, Cvetković
M (2021) A feasible IoT-based system for precision agriculture. In: Proceedings of the 2021
10th mediterranean conference on embedded computing (MECO)
18. Murlidharan S, Shukla VK, Chaubey A (2021) Application of machine learning in precision
agriculture using IoT. In: 2021 2nd international conference on intelligent engineering and
management (ICIEM), pp 34–39
Autonomous Driving Vehicle System
Using LiDAR Sensor

Saiful Islam, Md Shahnewaz Tanvir, Md. Rawshan Habib,


Tahsina Tashrif Shawmee, Md Apu Ahmed, Tafannum Ferdous,
Md. Rashedul Arefin, and Sanim Alam

Abstract An overview of light detection and ranging (LiDAR) sensor technology
for autonomous vehicles is presented in this paper. The LiDAR sensor is a key
component of upcoming-generation autonomous driving and driver assistance functions.
LiDAR technology is discussed, including its characteristics, a technical overview,
and its prospects as well as limitations in relation to other sensors
available in the industry. Comparisons and comments on sensor quality are based
on factory parameters. The basic components of a LiDAR system, from the laser
transmitter to the beam scanning mechanism, are explained.

Keywords LiDAR sensor · Velodyne HDL-64E · Laser · Pulse · Sensor

1 Introduction

Every year, around 1.35 million people die because of vehicle crashes throughout
the world. Among those people, over half are pedestrians, cyclists, and motor-
cyclists, and the toll goes beyond fatalities: each year nearly 50
million people are injured in vehicle crashes worldwide [1]. The great majority
of these accidents have a common thread, which is human error and inattention.
Additionally, there are several factors including speeding, distraction, drowsiness,
and alcohol consumption. Autonomous vehicles can assist in reducing risky behav-
iors and accidents. Autonomous driving vehicles, known as driverless vehicles,
combine sensors and control software to navigate themselves. They depend
on their perception systems and ability to gain information from the nearby envi-
ronment. For proper self-driving, it is important to identify the presence of different

S. Islam (B) · Md S. Tanvir · Md A. Ahmed


Technische Universität Chemnitz, Chemnitz, Germany
Md. R. Habib
Murdoch University, Murdoch, Australia
Md S. Tanvir · T. T. Shawmee · T. Ferdous · Md. R. Arefin · S. Alam
Ahsanullah University of Science & Technology, Dhaka, Bangladesh


Fig. 1 Autonomous vehicle Volvo XC90 using HDL-64E LiDAR [3]

vehicles, pedestrians, and other significant objects. LiDAR is an abbreviation of light
detection and ranging. LiDAR was developed in the early 1960s after the invention
of the laser. The wavelengths of these laser waves usually range between 600 and
1000 nm [2]. LiDAR sensors work by sending a focused light beam to a distant object
and then receiving the reflected light back. The distance to the remote object and its
reflectivity are determined from the returned beam and its intensity. To handle obstacle
situations smoothly on the road and identify roadblocks, a rotating device is installed
on top of the autonomous vehicle. Sometimes these devices are
mounted above the bonnet; this device, called LiDAR, is shown in Fig. 1.
LiDAR serves as the self-driving vehicle's eye, giving a 360° view of its
surroundings to ensure safe driving. Every second, thousands of laser pulses are
sent out by a continuously rotating system; these pulses collide with different objects
surrounding the vehicle, which reflect the signals. These reflections of light make up a 3D
point cloud, and the computer records every laser reflection point and converts them into
an animated 3D representation. The 3D positions help to monitor the distance to
vehicles that pass by. This helps to control the brakes in order to slow
down or stop, and when the road ahead is clear it can also allow the speed to rise. The
LiDAR is also incorporated with pre-scan technology that helps to achieve collision-free
riding. Not only does LiDAR provide more comfortable driving, but it also enhances
safety. Data are captured by sensors and processed by perception algorithms based
on the information represented in the environment. In self-driving vehicles, perception
systems are comprised of active and passive sensors, such as cameras, radars, and
LiDARs. LiDAR is an active sensor that emits laser light into the surroundings;
the reflected beams returning from objects are then processed.
Compared with other technologies, LiDAR has a significantly higher resolution,
making it very sensitive to objects that might disrupt its path. Distances between

the vehicle and nearby objects can be accurately calculated with a LiDAR sensor.
For interpreting the environment, LiDAR data can easily be transformed into 3D
maps. In low light conditions, LiDAR performs well, regardless of ambient light
variations. Direct measurement of distance is enabled by LiDAR data which does not
require decoding or interpretation, therefore enabling faster processing. In LiDAR,
a large number of measurements can be made instantaneously and can be precise to
a centimeter.

2 Working Principle

Electronic distance measuring instruments (EDMIs) and LiDARs work based on
similar principles. The system consists of a transmitter, receiver, and reflector (object),
as shown in Fig. 2. The distance between the transmitter and the reflector is measured
from the travel time. Reflectors can be either physical objects or artificial
devices, such as prisms. The distance between the LiDAR sensor and the object can be
calculated as follows:
S = c ∗ t/2 (1)

where S is the covered distance, c is the speed of light (3 × 10^8 m/s), and t is the time
taken by the laser beam between emission and reception. Various factors contribute to the
range of a pulse-based laser system, whose equation is as follows:

Range = (P ∗ A ∗ Ta ∗ To )/(Ds ∗ π ∗ B) (2)

where P is the laser power, A is the Rx optics area, Ta is the atmospheric transmission,
To is the transmittance of the optics, Ds is the detector sensitivity, and B is the beam
divergence in radians.
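As a quick numerical illustration of Eqs. (1) and (2), the Python sketch below computes the target distance from a measured round-trip time and evaluates the pulse-based range expression; the sample parameter values are assumptions chosen only for illustration.

```python
# Numerical illustration of Eqs. (1) and (2); parameter values are assumed.
import math

C = 3.0e8  # speed of light in m/s

def distance_from_tof(t_round_trip_s):
    # Eq. (1): S = c * t / 2, since the pulse travels to the target and back.
    return C * t_round_trip_s / 2.0

def pulse_range_figure(P, A, Ta, To, Ds, B):
    # Eq. (2): Range = (P * A * Ta * To) / (Ds * pi * B)
    return (P * A * Ta * To) / (Ds * math.pi * B)

# Example: a 400 ns round trip corresponds to a target about 60 m away.
print(distance_from_tof(400e-9))  # 60.0

# Illustrative (assumed) parameter values plugged into Eq. (2).
print(pulse_range_figure(P=75.0, A=1e-3, Ta=0.9, To=0.9, Ds=1e-6, B=1e-3))
```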

Fig. 2 Laser range measurement [4]



LiDAR consists of three components which are embedded in its package [3]. The
first one is a transmitter, which is usually a laser; the term laser stands for light
amplification by stimulated emission of radiation. The laser is characterized by the
rapid sending of light pulses, normally emitting 150,000 pulses/s. It is classified
according to its wavelength, between 600 and 1000 nm, toward the shorter wavelength
range, and it is not visible to the human eye, as can be seen in the electromagnetic
spectrum in Fig. 3. Methods based on both continuous and pulsed lasers are used.
Another component is the receiver, shown in Fig. 4; the travel time is determined
by measuring the incoming light beam. The light reflected from objects in the
surroundings is used to generate 3D coordinates. Low-intensity light demands a
highly sensitive receiver. The following apparatus is used for low light detection:
• Silicon PIN detector
• Silicon avalanche photodiode (APD)
• Photomultiplier tube (PMT).

Fig. 3 Electromagnetic spectrum [5]

Fig. 4 Travel time measurement [6]



The last component is the position and navigation system. This navigation system
is used for determining the position and angle with respect to the current posi-
tion. The object’s position is determined using the global positioning system (GPS),
which measures an object’s longitude, latitude, and altitude. In contrast, the inertial
measurement unit (IMU) allows us to precisely measure an object’s angle [7]. It is
especially used in airborne sensors.

3 Setup and Construction

An active LiDAR sensor can always be divided into two sections. Figure 5 shows the
basic configuration of a LiDAR system. A transmitter sends the signal in the form of
a laser beam, while a receiver captures the reflected radiation with an optical detector
in the form of a photodiode whose electrical signal is analyzed in a computer. A beam
expander can be included in the sender unit to decrease the divergence of the light beam
before it is emitted into the atmosphere. At the receiving end, the photons reflected
throughout the atmosphere are captured by a geometrical optical structure. This is
followed by an optical analysis method that depends on the specific application and
selects the wavelength from the accumulated light. The detector receives the specified
optical wavelength and converts it into an electrical signal. The signal timing is defined
by the time that elapses after the laser pulse is emitted. The object's distance is determined
by using electronic time measurements and data stored in the computer. Figure 6 depicts
the three-dimensional configuration of the LiDAR sensor. To perform scanning over at
minimum one layer, these systems focus on a single emitter pair combined with a
moving mirror. This mirror reflects the emitted light from the diode and also
reflects the returning light to the detector. This device can swiftly measure the surface area, and

Fig. 5 Structure of LiDAR sensor [7]



Fig. 6 Principle of mechanical spinning LiDAR [6]

the rate is more than 150 kHz. The wavelengths of LiDAR depend on the application and
range from 250 nm to 11 µm. LiDAR uses several beams to minimize
the movement mechanism. For example, the Velodyne series uses an array of laser diodes to
enhance point cloud density.
In automotive LiDAR scanning systems, the most popular solution is a spinning
mechanism [5]. In general, two types of systems are used: polygonal mirror systems
and nodding mirror systems, for instance, as shown in Fig. 6, a mechanical spinning
mechanism. The lasers are tilted by an integrated nodding mirror system to create
a vertical field of view. The LiDAR base is rotated to achieve a 360° horizontal field of
view (FOV). State-of-the-art LiDARs use multiple beams to decrease the
moving mechanism. Mechanical spinning offers a number of advantages, such as a large
FOV. The Velodyne VLP series, for example, increases point cloud density with
arrays of lasers and photodiodes. A rotating mechanism is bulky for implemen-
tation inside the vehicle and is vulnerable to extreme circumstances like vibration,
which is ubiquitous in automotive applications. FOV is the angle that is captured
by a sensor. When using a camera with a LiDAR sensor, it is better to select the FOV
carefully so that the LiDAR outputs match the region covered by aerial photographs.

4 LiDAR Technologies

LiDAR operates with many laser beams scanning the field of view. This is accom-
plished by a delicately constructed beam steering system. An amplitude-pulsed
laser diode emitting at an infrared frequency generates the laser beam. The surround-
ings reflect the laser beam, which returns to the scanner. A photodetector receives the
returning signal. The signal is filtered by fast electronics, which measure the time differ-
ence between the transmitted and received signals, which is proportional to distance. This
difference is used to calculate the range from the sensor model. Almost all 3D points and

Fig. 7 Time of flight (ToF) laser rangefinder [8]

intensities corresponding to the reflected laser energy are included in the LiDAR outputs.
Figure 7 shows a conceptual presentation of the operating principle.
The LiDAR system is divided into two parts: the laser rangefinder and the scanning system.
The laser rangefinder uses the modulated rays of a laser transmitter to illuminate the target.
LiDAR system is divided into two parts: laser rangefinder and scanning system.
Laser rangefinder uses the modulated rays of a laser transmitter to illuminate target.
After optical analysis and photoelectric transformation, the photodiode creates an
electrical signal from scattered photons. Laser beams are usually steered by scan-
ning systems. Different horizontal and vertical angels are shown by ∅i , θ i , where
i indicates an index which specifies in direction that is pointed in the beam. This
portion discusses basic principles of the rangefinder to understand the assessment
process and constraints. In the following, it explains how scanning systems determine
the sensor field of view.

4.1 Principle of Laser Rangefinder

An object-ranging device that measures distance with a laser beam is called a laser
rangefinder. The operation depends on the shape of modulation that is used on the
laser beam. Direct detection rangefinder uses pulsed lasers so that their time of flight
(ToF) can be determined. A frequency-modulated continuous-wave (FMCW) system works
indirectly, with velocity and distance measured using the Doppler effect [9]. The term
coherent refers to these types of structures.

4.2 Laser Reception and Transmission

In order to receive the reflected signals, the laser signals must be generated, emitted,
and received by the receiver electronics. Additionally, the rangefinder's performance
and cost are determined by the reflected signals. The ToF LiDAR sensor requires a

pulsed (amplitude modulated) laser signal. A fiber laser diode or pulsed laser is used
for generating this type of signal. The laser diode oscillates as an electric current
flows through the diode junction. There are two types of diode lasers: vertical-cavity
surface-emitting lasers (VCSELs) and edge-emitting lasers (EELs). In the telecommu-
nications industry, the EEL has been used for a long time. The output beam of a VCSEL
is circular; in contrast, an EEL emits an elliptical laser beam and needs extra optics.
In automotive applications, pulsed laser diodes are hybrid devices: a capacitor is
mounted on the laser chip, and it is activated by a MOSFET transistor [10]. Photode-
tector: The photoelectric effect transforms light energy into electrical energy in a
photodetector. One of the most important features is photosensitivity, which speci-
fies how a photodetector reacts when it receives photons. The photosensitivity depends
on the wavelength of the laser beam. As a result, it is very important to consider
the selection of laser wavelength when choosing a LiDAR system’s photodetector.

4.3 APD and SPAD

APD stands for avalanche photodiode, whereas SPAD means single-photon


avalanche diode. An APD multiplies the photocurrent by applying a reverse voltage, using
the avalanche effect. Signals are multiplied by the APD, reducing the effect of noise.
Compared with a PIN photodiode, it has a greater internal gain (around 100) and a better SNR. As a
result, APDs are broadly used in modern LiDAR systems. On the other hand, the
performance of the SPAD is much higher than that of the APD, with a gain of 10^6. This
feature allows the SPAD to detect very weak light from a large distance. Additionally,
CMOS technology can be used in SPAD fabrication, allowing the photodiodes to be
integrated on a single chip. This is required in order to enhance LiDAR
accuracy while reducing costs and power usage.

4.4 Scanning System

A scanning system permits the lasers to transmit rapidly in a vast area. Mechanical
spinning or solid-state scanning are the two most common scanning technologies.
A rotating mirror system, such as the HDL64 from Velodyne, was typical
in the early stages of autonomous driving history. The automotive industry
prefers designs without large moving parts, where the term solid state refers to the scanning system only.
Figure 8 shows an example of a common product: Velodyne’s HDL64. Although
the Velodyne HDL-64E is a relatively expensive sensor, it is often used in the auto-
mobile sector. It offers a high-resolution picture and 3D information about the envi-
ronment. This sensor has 64 lasers in four groups, each with 16 laser emitters,
and detectors in two groups, each with 32 detectors. It is mounted on the car's
roof and spins constantly at a speed of 5–20 Hz. It possesses a 360° horizontal and 26.8°
vertical field of view. A very fine angular resolution allows

Fig. 8 Velodyne’s HDL64 [3]

for a very clear and detailed view. A distinction of 0.08° can be made between very
small objects. A sampling rate of up to 2.2 Mpoints/s is also available [11].

5 LiDAR Perception System

A vehicle’s perception system translates the perceived surroundings in hierar-


chical terms by analyzing the perception sensor outputs, and we can obtain object
descriptions (physical, semantic, intention awareness) from map data and localiza-
tion. Object detection, tracking, motion prediction, and identification are the four
processes in the typical LiDAR data processing pipeline, as shown in Fig. 9. The
emergence of deep learning technologies has modified this classic flow, which we
will describe after the classic approaches. As Velodyne LiDARs are popular in the
research community, the existing data processing methods are applicable primarily to
mechanical spinning LiDAR.

5.1 Object Detection and Recognition

An object detection algorithm extracts and estimates the physical characteristics of


object candidates: the detected objects' positions and shapes. This algorithm also
includes clustering and ground filtering methods. Ground filtering labels each point in a
point cloud as either ground or not. After that, clustering processes are used to group non-
ground points into various objects. The spherical processing of LiDAR signals (r,

Fig. 9 LiDAR perception system [8]

ϕ, θ ) provides a better method. Through distance and other criteria, the rest of the
non-ground objects can be grouped easily [8]. On the other hand, the object recog-
nition method, based on a machine learning approach, provides semantic information
(e.g., object types such as pedestrian, vehicle, truck, plant, or building). In the recognition
procedure, feature extraction is employed to calculate compact object descriptors,
and a modeling step classifies objects using pre-trained classifiers. Another way to
acquire generic shape features is through principal component analysis (PCA) of 3D
objects. By evaluating the eigenvalues generated by PCA, three salient characteris-
tics (surfaceness, linearness, and scatterness) can be acquired [12]. The classification
step that follows feature extraction is an example of supervised machine learning:
the class of an input is predicted by a statistical model trained
on a ground truth dataset. A number of well-known datasets are available, and
KITTI provides an abundance of resources. Plenty of machine learning
(ML) tools are available, and algorithms such as naive Bayes,
support vector machine (SVM), KNN, and so on may be applied [13]. The SVM
with radial basis function (RBF) kernel is the most popular method due to its efficiency
and speed. Figure 6 illustrates the results of our implementation (SVM with RBF
kernel) on the identified on-road items. Neural networks are also used to classify LiDAR
objects. Often in practice, classes are unknown, so a method that can
handle this situation is needed.
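The sketch below illustrates these two ideas, eigenvalue-based shape features from PCA and an SVM with RBF kernel, in Python. The particular feature normalization, the synthetic training data, and the class labels are assumptions for illustration only.

```python
# Eigenvalue-based shape features for a 3D point cluster, plus an SVM-RBF classifier.
# Feature definitions follow one common convention; training data are synthetic.
import numpy as np
from sklearn.svm import SVC

def shape_features(points):
    # points: (N, 3) array of x, y, z for one segmented cluster.
    cov = np.cov(points.T)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]   # lambda1 >= lambda2 >= lambda3
    l1, l2, l3 = eigvals
    linearness = l1 - l2      # large for pole-like clusters
    surfaceness = l2 - l3     # large for planar clusters
    scatterness = l3          # large for volumetric scatter
    return np.array([linearness, surfaceness, scatterness]) / l1

# Hypothetical training data: feature vectors of already-labelled clusters
# (0 and 1 stand for two assumed object classes).
rng = np.random.default_rng(0)
X_train = rng.random((20, 3))
y_train = np.array([0, 1] * 10)

clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X_train, y_train)

# Classify one flat-ish synthetic cluster.
cluster = rng.normal(size=(100, 3)) * np.array([2.0, 1.0, 0.1])
print(clf.predict([shape_features(cluster)]))
```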

5.2 Object Tracking

Objects are tracked using multiple algorithms which associate and locate the objects
through information obtained from spatial–temporal consistency. Using a model of
movement in state space, a single object tracker estimates the movement based on
Bayesian filters. By extending the single dynamic model to several motion models,
the interacting multiple model filter can deal with complicated cases. The particle filter
(PF) is another frequent technique that is meant for broader scenarios that do not
fulfill the Gaussian linear assumption. Radar-based multiple object tracking (MOT)
typically models all detectable objects as points, while LiDAR-based MOT is
distinguished by tracking both the shape and the number
of detected targets. A sophisticated method uses multiple shape models: poly-
gons, lines, L-shapes, and points. The form of a moving item changes with varia-
tions in posture and sensor perspective when tracking it. A tracking method has been
developed that estimates the states of both pose and movement simultaneously, with 2D
polylines representing shapes. LiDAR thus detects objects with extent, as opposed to radar,
which represents detections as points [14]. The distinctive feature of shape-based MOT
is that the detections can be tracked along with their shapes.
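As an illustration of the Bayesian filtering idea behind such trackers, the sketch below implements a constant-velocity Kalman filter for a single object's 2D centroid. The noise parameters, time step, and example detections are assumptions; a complete LiDAR MOT pipeline would add data association and shape estimation on top of this.

```python
# Minimal constant-velocity Kalman filter for one tracked object (2D position).
# Noise levels, time step, and example detections are illustrative assumptions.
import numpy as np

dt = 0.1                                             # time between LiDAR scans (s)
F = np.array([[1, 0, dt, 0],                         # state transition: x, y, vx, vy
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],                          # only the position is observed
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 0.01                                 # process noise
R = np.eye(2) * 0.1                                  # measurement noise

x = np.zeros(4)                                      # initial state
P = np.eye(4)                                        # initial covariance

def kalman_step(x, P, z):
    # Predict with the constant-velocity motion model.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with the detected object centroid z = (x, y).
    y = z - H @ x_pred                               # innovation
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)              # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

for z in [np.array([1.0, 0.5]), np.array([1.2, 0.6]), np.array([1.4, 0.7])]:
    x, P = kalman_step(x, P, z)
print(x)  # estimated position and velocity after three detections
```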

5.3 Deep Learning Methods Emerging

Waves of deep learning (DL) followed its enormous success in computer vision and
speech recognition, and the same applies to LiDAR data processing. A deep learning
algorithm is part of the machine learning field and works with multilayer
neural networks. Just as traditional machine learning methods such as SVM work
with features extracted from raw input, DL systems are able to extract such features
efficiently on their own. DL has also been applied to object tracking, and a deep
structured model has been proposed as an alternative to traditional tracking algorithms [15].

6 LiDAR and Similar Technologies

RADAR and SONAR are two related technologies that exploit the same
phenomenon of generating pulses and receiving signals back, as LiDAR does. Radio detec-
tion and ranging is an acronym for RADAR. It employs longer-wavelength radio
waves. The range of frequency used for detecting an object in front of it is 3 Hz–
3000 GHz. In comparison with RADAR, LiDAR provides more accurate results. The
differences between LiDAR and RADAR are shown in Table 1. Sound navigation
and ranging is an acronym for SONAR. It detects objects using sound waves. The
frequency range is extended from 10 kHz to 1 GHz. The differences between LiDAR
and SONAR are shown in Table 2.

Table 1 Difference between LiDAR and RADAR

Properties                  LiDAR                                RADAR
Beam type                   Laser waves                          Radio waves
Accuracy                    Possible and high                    Not possible
Image surface               Provides 3D image                    Unable to provide exact 3D image
Transmission and reception  Uses CCD optics                      Uses antennae
Performance                 Degraded in poor weather conditions  Operates in all weather conditions

Table 2 Difference between LiDAR and SONAR

Properties           LiDAR                  SONAR
Beam type            Laser waves            Sound waves
Accuracy             High                   Better than RADAR
Cost                 High                   Cheap
Sensor range         <= 120 m               <= 4
Working environment  Optimistic atmosphere  Clear weather not required

In comparison with RADAR and SONAR, LiDAR is a relatively new technology.


At present, various challenges are being faced by LiDAR such as the following:
• High cost: The cost of LiDAR is high because it can provide three-dimensional
data. Therefore, it is relatively expensive to deal with multidimensional data
as compared to other technologies.
• False returns: In adverse atmospheric circumstances such as heavy rain, fog,
and clouds, LiDAR cannot be utilized properly. Raindrops diffuse
and refract the light, sometimes leading to false information.
• Complex data: In most cases, the sensors deliver point clouds containing x-, y-,
and z-coordinates, which do not have the same visual impact as aerial photographs.
Satellite imaging may also be used in place of aerial photographs [16].
• Unsafe: There are some risks associated with using LiDAR. It can be diffi-
cult to decide who is responsible if an accident occurs, whether it is the car
manufacturer, the programmer, or the sensor firm.
• Total reflection: Illumination of an object by a focused light beam is affected by
certain surfaces, such as flat, reflective ones.
A comparison with the LiDAR sensor has been included to show the benefits and
drawbacks of sensors with a similar purpose. The next generation of autonomous
systems will appear more secure due to technological advancement. The automotive
industry is increasing its budget for autonomous driving.

• Airborne LiDAR: Airborne LiDAR has made the most progress in recent years by
processing and delivering 3D point clouds. The drone industry is also developing
lightweight sensors and autonomous drones.
• Agriculture: Farming technologies (AgTech) can use LiDAR to identify areas that
get the best exposure to sunshine. A machine learning system can also use the
data to identify crops that need water and fertilizer.
• Robotics: LiDAR technology is utilized to provide mapping and navigation capa-
bilities to robots. This technology is used for self-driving cars, so that the vehicle
can detect the distance between itself and other objects in its surroundings [17].
• Exploration for Oil and Gas: LiDAR can detect tiny molecules in the atmosphere
since it has a smaller wavelength than other technologies. Gas and oil deposits
can be traced using differentiable absorption LiDAR (DIAL).
• Land Management: Organizations that manage land resources can monitor them
in real time, which allows for an increased level of efficiency in mapping and less
time spent conducting aerial surveys.
• Renewable Energy: In order to harness solar energy properly, LiDAR can be
adapted to determine optimal panel positions. In addition to calculating wind
direction and speed, it can also be used to place wind turbines at wind farms [18].
• Military and law enforcement: In the military, LiDAR technology is used to identify
targets; for example, image processing is used for tanks and missiles, making digital
maps of the terrain and the different objects in their path. The same principles have
been applied to law enforcement of speed limits within cities. A laser speed gun is
used to accomplish this, and a camera is used to capture the images, with the
time of flight used to calculate the speed.

7 Conclusion

In this paper, we have provided a review of LiDAR sensor technology, which future
road safety may rely on as a companion. LiDAR is generally more precise than camera
or radar in terms of measuring distance. As a result, LiDAR-based algorithms are
highly reliable in evaluating physical information (object positions, headings, shapes,
etc.). It should also be noted that LiDAR-based detection systems for autonomous vehicles
can be compromised by adversaries. Developing deep learning for 3D data from
LiDARs will be one of the most important future directions.

References

1. World Health Organization (WHO) (2018) Global status report on road safety 2018. https://
www.who.int/violence_injury_prevention/road_safety_status/2018/en/externalicon. Accessed
28 Oct 2020
2. Liu J, Sun Q, Fan Z, Jia Y (2018) TOF LiDAR development in autonomous vehicle. In: 2018
IEEE 3rd optoelectronics global conference. Shenzhen, pp 185–190

3. High Definition LiDAR Sensor for 3D Application, Velodyne’s HDL-64E, White Paper/Oct
2007
4. Fujii T, Fukuchi T (2005) Laser remote sensing. CRC Press, ISBN 10:0-8247-4256-7
5. Warrian P (2018) Mining: the inversion of industry 4.0, CDO conference, Vancouver
6. Wenzl K, Ruser H, Kargel C (2012) Decentralized multitarget-tracking using a LIDAR sensor
network, Graz
7. Weitkamp C (ed) (2006) LiDAR: range-resolved optical remote sensing of the atmosphere,
102, Springer Science & Business
8. Li Y, Ibanez-Guzman J (2020) LiDAR for autonomous driving: the principles, challenges, and
trends for automotive LiDAR and perception systems. IEEE Signal Process Mag 37:50–61
9. Horaud R et al (2016) An overview of depth cameras and range scanners based on time-of-flight
technologies. Mach Vis Appl 27:1005–1020
10. Baker WE et al (2014) LiDAR-measured wind profiles: the missing link in the global observing
system. B Am Meteorol Soc 95:543–564
11. Velodyne LIDAR Key features. Available: https://velodyneLiDAR.com/products/hdl-64e/.
Accessed: 22 June 2021
12. Zermas D et al (2017) Fast segmentation of 3D point clouds: a paradigm on LiDAR data for
autonomous vehicle applications. In: IEEE international conference on robotics and automation
(ICRA), Singapore
13. Li Y, Ruichek Y (2012) Moving objects detection and recognition using sparse spatial information in urban environments. In: 2012 IEEE intelligent vehicles symposium. Madrid, pp
1060–1065
14. Chiu C, Fei L, Liu J, Wu M (2015) National airborne LiDAR mapping and examples for
applications in deep-seated landslides in Taiwan. In: 2015 IEEE international geoscience and
remote sensing symposium (IGARSS). Milan, pp 4688–4691
15. Rasshofer RH, Gresser K (2005) Automotive radar and LiDAR systems for next generation
driver assistance functions. Adv Radio Sci 3:205–209
16. Baras N et al (2019) Autonomous obstacle avoidance vehicle using LIDAR and an embedded
system. In: 2019 8th international conference on modern circuits and systems technologies
(MOCAST). Thessaloniki, pp 1–4
17. Kim JK et al (2015) Experimental studies of autonomous driving of a vehicle on the road using
LiDAR and DGPS. In: 2015 15th international conference on control, automation and systems.
Busan, pp 1366–1369
18. Duong HV et al (2012) The electronically steerable flash LiDAR: a full waveform scanning
system for topographic and ecosystem structure applications. IEEE Trans Geosci Remote Sens
50:4809–4820
Multiple Face Detection Tracking
and Recognition from Video Sequence

M. Athira, Arun T. Nair, Kesavan Namboothiri, K. S. Haritha,


and Nimitha Gopinath

Abstract Face detection and recognition is a cutting-edge biometric technology. Numerous approaches and systems have been extensively studied in this subject. Face recognition is gaining popularity, and the majority of us use it without even noticing. This work detects multiple faces from a video sequence, which are then tracked and identified. Face detection is performed here using the Viola–Jones technique and neural networking. By integrating the Viola–Jones approach with a neural network, it is possible to determine the computing time and develop a resilient algorithm: the system adapts to and learns new data very quickly, becomes more robust and capable of detecting more faces, and reduces false positives. Neural networking is an image-based method for detecting faces. For image tracking, the KLT tracking algorithm is employed, and for recognition, the eigenface approach is used. The purpose of this study is to detect multiple faces in a video sequence, follow each face and recognize it.

Keywords KLT tracking · Viola–Jones algorithm · Eigenface method · Haar features · Neural network · Multiple face recognition · Multiple face detection · Multiple face tracking · Region of interest (ROI)

M. Athira (B)
Electronics and Communication Engineering, KMCT College of Engineering, Kozhikode, Kerala,
India
A. T. Nair · K. Namboothiri
KMCT College of Engineering, Kozhikode, Kerala, India
K. S. Haritha
Government Engineering College, Kannur, India
N. Gopinath
Rajadhani Institute of Science and Technology, Mankara, India


1 Introduction

Faces are critical components of human interactions. Nowadays, the face is employed
as a biometric identifier in a variety of commercial applications, including access
control for security, criminal identification and surveillance. Face detection is a tech-
nique for extracting the human facial region from photographs. Face detection is a
prerequisite for all of the above-mentioned applications and other face-related appli-
cations. There are numerous methods for detecting and recognizing faces. While
face recognition has received much attention in the literature, there have been few
attempts at visual detection of many faces concurrently in videos, which could have
practical uses in video monitoring.
The purpose of this paper is to discuss the detection, recognition and tracking of several faces in videos. Multiple faces are detected in video, and the tasks of multiple face detection, multiple face recognition (MFR) and multiple face tracking (MFT) are performed concurrently. The Viola–Jones method and a neural network are used
in this example to detect several faces in a video stream. The KLT tracking algorithm
is used to track the detected faces, and the faces are recognized using the eigenface
approach.

2 Literature Survey

2.1 Face Detection

Face detection methods based solely on the Viola–Jones technique and the code
"vision.CascadeObjectDetector" demonstrated a lower success rate. The failures
were attributed to the following factors: insufficient lighting, partial occlusion of
the face and a high false detection rate [1]. The proposed strategy for increasing
the success rate includes the development of an algorithm that solves several of the
disadvantages of the Viola–Jones algorithm. It significantly minimizes the rate of
erroneous detection. The success rate was increased to 90%.

2.2 Face Detection Algorithm

Face detection algorithms come in a variety of flavours. These algorithms are roughly
categorized into two kinds [1]: feature-based techniques and learning-based methods. Faces are detected using feature-based approaches based on a few simple
traits found in the facial regions [2, 3]. They make no allowance for the effect of
ambient light, rotation, or position. The skin colour model is one of the most exten-
sively used approaches in this category. Statistical models and machine learning algo-
rithms underpin learning-based techniques [4]. These techniques are more resilient

and take more time to compute than feature-based methods. They produce excellent
results in a variety of rotational stances, even in low-light situations. As a result,
these methods are chosen over feature-based methods.

2.2.1 Viola–Jones Algorithm

This is an object detection technique that is based on learning. Here, we’ll use it to
detect faces. It identifies things using Haar characteristics and a cascade of classifiers
[2, 5]. By utilizing the integral image, the Haar characteristics are computed. The
adaptive boost (AdaBoost) method is used to select the best features [1]. This process
occurs at each stage, and there is a cascade of these stages. At each stage, faces that
are wrongly detected are discarded. Thus, the more phases, the more accurate the
face detection.
Haar features are obtained by dividing the entire image into small windows or
rectangular sections of size M × M. Each window's features are determined indepen-
dently [6]. Haar-like features are rapidly generated using an image’s intermediate
representation—the integral image.
For each window, a huge number of Haar characteristics are computed (approx-
imately 180,000). The majority of these features are superfluous. The AdaBoost
algorithm is used to minimize redundancy. The AdaBoost algorithm is a classifica-
tion function that is used to eliminate redundant characteristics and condense a big
number of features [7]. It is, in essence, a classifier constructed from a weighted
mixture of weak classifiers.
After determining which windows contain the best features, we must now deter-
mine which of these windows contain faces. On average, less than 0.01% of all
windows in an image are positive, containing faces. The first recognized faces must
travel through a series of cascaded stages in order to locate the positive windows.
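As a rough illustration of the cascade just described, OpenCV ships a pretrained Viola–Jones-style Haar cascade that can be applied to a frame. The following is a minimal Python sketch, not the classifier trained in this work; the cascade file name and the parameter values are illustrative assumptions.

```python
import cv2

# Load a pretrained frontal-face Haar cascade bundled with OpenCV
# (assumed location; not the custom-trained detector described in this paper).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

frame = cv2.imread("frame.jpg")                    # one video frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)     # Haar features are computed on grayscale

# scaleFactor and minNeighbors act like the detection threshold discussed above:
# raising minNeighbors discards windows with too few overlapping detections,
# which lowers the false-positive rate.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:                         # one bounding box per detected face
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```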

2.2.2 Neural Network

The detection of objects in images is one of the most frequently utilized applications
of neural networks [1]. It does it through the usage of unsupervised neural networks.
The image is used as the input in this case. This network’s job is to locate the required
object(s) within the image (x-coordinate, y-coordinate, width, height of the rectangle
around the object).
The Computer Vision Systems Toolbox is often used in MATLAB to do object
detection. It includes several cascade object detectors based on the Viola–Jones
method. MATLAB has object detectors for identifying faces, eyes, mouths and noses,
among other things [8, 9]. Additionally, one can design a customized detector to
identify additional things [10]. A neural network must be trained to develop such
a detector. To accomplish this, the MATLAB package includes the function “train
Cascade Object Detector”. The network must be fed both positive and negative
pictures during the training process. Positive photographs contain the required object,
while negative images do not [11]. Positive images have a border around the desired
thing. This highlighted area is referred to as the region of interest (ROI) [12]. After
the network has been successfully trained, an .xml file is generated. This.xml file
is used by the Cascade Object Detector system object to detect objects in the test
images.

2.3 Face Tracking

The term “object tracking” refers to the process of maintaining a record of a certain
type of object. Due to the primary focus on the face in this research, we monitor human
faces using the input features [5]. Continuous tracking teaches us to disregard issues
like lighting, position fluctuation and so on. Here, human face tracking is performed
on a video sequence [13].

2.3.1 KLT Algorithm

The Kanade–Lucas–Tomasi algorithm is used to track features. It is really popular.


Lucas and Kanade introduced the KLT algorithm, which was later extended by
Tomasi and Kanade [10]. This approach is used to locate scattered feature points that
have sufficient texture to allow for accurate tracking of the needed points [2]. The
Kanade–Lucas–Tomasi (KLT) algorithm is used to constantly track human faces in a
video frame. They do this by determining the parameters that allow for a reduction in
dissimilarity measurements between feature points associated with the original trans-
lational model. To begin, this algorithm calculates the displacement of the tracked
points between frames [5]. It is straightforward to calculate the head’s movement
using this displacement computation. The optical flow tracker is used to track the
feature points on a human face [5, 14]. The KLT tracking algorithm tracks the face
in two easy steps: first, it identifies the traceable feature points in the first frame and
then uses the estimated displacement to track the discovered features in subsequent
frames.
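The two tracking steps just described (selecting traceable points in the first frame, then estimating their displacement in later frames) can be sketched with OpenCV's corner detector and pyramidal Lucas–Kanade tracker. This is a minimal Python illustration, not the exact implementation used in this work, and all parameter values are assumptions.

```python
import cv2

cap = cv2.VideoCapture("video.mp4")
ok, first = cap.read()
prev_gray = cv2.cvtColor(first, cv2.COLOR_BGR2GRAY)

# Step 1: pick traceable feature points (Harris corners) in the first frame.
prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01,
                                   minDistance=7, useHarrisDetector=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Step 2: estimate the displacement of each point with pyramidal Lucas-Kanade.
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, prev_pts, None)
    good_new = next_pts[status.ravel() == 1]

    for x, y in good_new.reshape(-1, 2):           # draw the tracked points
        cv2.circle(frame, (int(x), int(y)), 3, (0, 255, 0), -1)

    prev_gray, prev_pts = gray, good_new.reshape(-1, 1, 2)
```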

2.4 Eigenface Method

The so-called eigenface technique is one of the easiest and most direct PCA methods
used in face recognition systems [1]. This technique reduces faces to a tiny
collection of key properties called eigenfaces, which form the basis of the first set
of learning images (training set). Recognition is accomplished by projecting a new
image into the eigenface [15] subspace, and then classifying the person by matching
its location in eigenface space to the positions of previously recognized individuals
[8, 16]. Table 1 summarizes the results of the literature survey.

Table 1 Review on face detection tracking and recognition


Author Methodology Features Limitation
[1] Gurlov Singh Eigenface method + DIP-based detect and Accuracy less than 90%
Amit Kumar Goel, PCA recognize face
June 2020
[16] T. Sudheer Genetic algorithm + Pre-processing Less accuracy
Kumar, N. Eigenface classification
Vishwanath, K localization
Karthik, January 2020
[10] Ranganathas, Y. Corner measure Viola–Jones algorithm Tracts only one face
P. Gouramma, IEEE algorithm + KLT KLT point tracking
2016 tracking Greater accuracy than
KLT
[2] Swati Nigam, W_HOG Efficient facial Time-consuming
Rajiv Singh and A. K. expression recognition
Misra, May 2018
[8] Monali Nitin Viola–Jones Increase accuracy
Chaudhari, Mrinal algorithm 78.4–90%, eyes
Deshmukh, Gayatri, detection, glasses
Ramrakhiani, detection
Rakshita, Parvatikar,
IEEE 2018
[12] Agnihotram LBPH + CNN ORL, FEI and the Different dataset gives
Venkata Sripriya, dataset created a different accuracy rate
Mungi Geethika, CNN performed better
Vaddi Radheshyam, compared to LBPH
June 2020
[13] Xiuzhuang Identity specific MFR, MFT Less accuracy
Zhoua, Kai Jina, Qian metric learning
Chenb, September method
2017
[3] S. D. Lathika, K. Micro expression Entropy loss function
K. Thyagarajan, 2017 based deep learning centre loss function
classifier recognize false faces
[4] Shivam Tanvar, Processing image Helpful face
Pronica, Chawla, with special parts recognition features
Rosy Maadam, Preet
Bhadrana, August
2020

3 Proposed Method

3.1 Face Detection - Viola–Jones

To begin, the faces in an image are detected using the system’s built-in object detector.
This technique is based on the concept of the integral image and makes use of Haar-like
features. The threshold is modified to decrease the rate of false detection.

3.2 Neural Network for Face Detection

To minimize false detection, we trained an artificial neural network to obtain faces.


MATLAB’s Cascade Object Detector tool was used to conduct the training. To
train the neural network, a collection of photographs was used, which included
photographs of faces from numerous perspectives and with varying backgrounds and
lighting. The region of interest [17, 18] (ROI) on these photographs was defined using
MATLAB’s Training Image Labeler App. The dataset included 100 photographs
taken in a variety of lighting situations. Following training, a “.xml” file was created
and used to recognize faces in the test video. The region that was detected is classed
as a face.

3.3 Tracking of Faces

To begin, this method recognizes Harris corners in the initial frame. After that, it
proceeds to detect points utilizing optical flow by computing the velocity of the pixels
in a picture. The optical flow of the image is computed for each translation motion.
Harris corners are recognized by connecting successive frames’ motion vectors to
create a track for each Harris point. To ensure that we do not lose track of the video
sequence, we apply a Harris detector every ten to fifteen frames. This is nothing more
than verifying the frames on a periodic basis. This allows for the tracking of new
and existing Harris points. In this paper, we will consider solely two-dimensional
motion, specifically translation movement.
Assuming that the initial position of the corner is (x, y). Then, if it is displaced by
certain variable vector (b1 , b2 , … bn ) in the subsequent frame, the displaced corner
point of the frame is equal to the sum of the initial point and the displaced vector.
The new point’s coordinates will be x’ = x + b1 and y’ = y + b2 . As a result,
the displacement should now be calculated in relation to each coordinate. This is
accomplished through the usage of the warp function, which is a function that takes
coordinates and a parameter. It is referred to as Eq. 1,

W(x; p) = (x + b1, y + b2)   (1)

The transformation is estimated using the warp function. The initially identified points are
used as a template image in the first frame. The difference between the displacement
and the preceding point is used to calculate the subsequent tracking points in the
following stages. Alignment is determined by Eq. 2,

∑x [I(W(x; p)) − T(x)]²   (2)

where p denotes the displacement. Assume that the initial estimate of p is a known parameter, and then determine the increment Δp by Eq. 3,

∑x [I(W(x; p + Δp)) − T(x)]²   (3)

The displacement is calculated by expanding this expression in a Taylor series and then differentiating it with respect to Δp, as in Eq. 4,

Δp = H⁻¹ ∑x [∇I · ∂W/∂p]ᵀ [T(x) − I(W(x; p))]   (4)

where H denotes the Hessian matrix. This is how the displacement Δp is estimated and the next traceable point is located.

3.4 Face Recognition - Eigenface Method

To begin recognizing faces, we built and loaded the dataset; the detected faces cropped, resized and saved into a folder at runtime are loaded as well. Following that, a random index was constructed using the random function, both for the database images and for the detected faces. Using this random index sequence, photographs from the database are loaded into a separate variable. We then compute the mean of all of the photographs and subtract it from each image. These mean-subtracted photographs were used to calculate the eigenvectors. After obtaining the eigenvectors, a matrix was built in which each row holds the signature of an individual image. This means that we now have the eigenvectors and each image's signature to identify it. For the image to be recognized, its mean value was likewise subtracted and the result multiplied by the eigenvectors. Finally, recognition is decided by the discrepancy between the current picture signature and the known face signatures.
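The mean subtraction, eigenvector computation and signature matching described above can be summarised in a short NumPy sketch. It is a simplified illustration of the eigenface idea, assuming faces are supplied as flattened grayscale vectors; it is not the MATLAB implementation used in this work.

```python
import numpy as np

def train_eigenfaces(train, num_components=20):
    """train: (n_images, n_pixels) matrix, one flattened grayscale face per row."""
    mean_face = train.mean(axis=0)
    centered = train - mean_face                    # subtract the mean image
    # SVD of the centered data gives the principal directions (eigenfaces).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = vt[:num_components]                # each row is one eigenface
    signatures = centered @ eigenfaces.T            # per-image signature (projection weights)
    return mean_face, eigenfaces, signatures

def recognize(face, mean_face, eigenfaces, signatures):
    """Return the index of the closest known face for a flattened query face."""
    weights = (face - mean_face) @ eigenfaces.T     # project the query into eigenface space
    distances = np.linalg.norm(signatures - weights, axis=1)
    return int(np.argmin(distances))                # smallest signature discrepancy wins
```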

3.5 Proposed Algorithm Steps

• Train the neural network and generate the .xml file


• Read the video and detect faces using Viola–Jones algorithm
• Resize and crop detected faces
• Save cropped faces in a folder during run time
• Load database and by using random function generated a random index of detected
faces and for images in database
• Using random index all images in database are loaded
• Calculated mean of all images and subtracted mean from them and find the
eigenvectors
• Created a matrix in which each row contains the signature of an individual image
• Subtract the mean value from the randomly indexed detected images and multiply with the eigenvectors
• In this way recognition is done
• In the same way, facial features are extracted from the detected faces
• Track the detected faces with those facial features using the KLT tracking algorithm.

3.6 Result and Discussion

The flowchart of the proposed system is shown in Fig. 1.


Recognize the faces. The location of faces in a video frame is determined using an .xml file and the "vision.CascadeObjectDetector" System object. The cascade
object detector detects objects using the Viola–Jones algorithm and a trained clas-
sification model. While the detector is configured by default to detect faces, it can
also detect other types of objects. By changing the threshold, erroneous and missed
detections are minimized. Figure 2 shows detected faces with bounding box from
video frame.
Cropping the recognized faces and resizing the images to the dimensions of the
photographs in the database (i.e. 92 and 122) as well as converting RGB images
[19, 20] to grayscale images and saving them to the folder. Figure 3 illustrates the
cropped photographs.
The Kanade–Lucas–Tomasi (KLT) algorithm is used to follow the face throughout
time. While using the cascade object detector on each frame is viable, it is compu-
tationally expensive. This algorithm recognizes the face only once and then tracks
it across the video frames using the KLT algorithm. The KLT algorithm traverses
the video frames by tracking a set of feature points. Once the face is detected, the
following step in the example is to identify feature points that can be tracked reliably.
After identifying the feature points, you can now track them using the vision point
tracker system object. The point tracker attempts to locate the appropriate point in
the current frame for each point in the previous frame. Then, using the estimate
geometric transform function, the translation, rotation and scale between the old and
new points are estimated. This transformation is applied to the faces' bounding box. Figure 4 depicts the facial features that were detected and tracked.

Fig. 1 Algorithm of the proposed system

Fig. 2 Detected faces

Fig. 3 Cropped detected faces
Simultaneously, the cropped faces saved in a folder during run time are read,
the random index of the photographs is determined, and the eigenvalue is calculated,
yielding a matrix of all image signatures. Subtracted the average value from the image
to be recognized. Then it was multiplied by the eigenvector. Finally, based on the
difference between existing picture signatures and the recognized facial signature.
Figure 5 illustrates the eigenface approach for face detection.
It achieves a 96% accuracy in face detection alone due to the usage of a neural
network and the Viola–Jones algorithm, but drops to roughly 92% when combined
with recognition and tracking. That indicates that out of every 100 trials, it may make
eight errors.

Fig. 4 Features extraction and tracking

Fig. 5 Recognized faces

3.7 Dataset Description

The dataset is collected mainly from two Web sites for face detection, tracking and recognition: "https://www.mathworks.com/matlabcentral/fileexchange/47105-detect-and-track-multiple-faces" and "https://www.nzfaruqui.com/face-recognition-using-matlab-implementation-and-code/". The face database is created by including the needed faces, and further changes are made to the dataset for multiple face detection, tracking and recognition.

4 Conclusion

The combination of the Viola–Jones algorithm and the neural network results in a
higher level of accuracy in face detection than the Viola–Jones algorithm alone. It is
more than 90% accurate. Multiple faces are discovered with the help of Viola–Jones
and the neural network. The facial traits are recognized, and the Kanade–Lucas–
Tomasi tracking system tracks the faces using those features. Additionally, it may
occasionally fail to recognize the face when the person rotates or tilts his head.
Face recognition is accomplished using the eigenface approach and is demonstrated in isolation in the figures above. Face recognition may occasionally fail due to lighting conditions, although the error probability is much lower.
Its accuracy rate is 92% in this case.

References

1. Singh G, Goel AK (2020) Face detection and recognition system using digital image processing. IEEE Xplore
2. Nigam S, Singh R, Misra AK (2017) Efficient facial expression recognition using histogram
of oriented gradients in wavelet domain, Springer Science
3. Lalitha SD, Thyagarajan KK (2018) Microfacial expression recognition based on deep rooted
learning algorithm
4. Tanvar S, Chawla P, Maadam R, Bhadrana P (2020) Authentication of face using MATLAB.
IEEE Xplore, ISBN: 978-1-7281-5371-1
5. Boda R, Jasmine Pemeena Priyadarsini M (2016) Face detection and tracking using KLT and
Viola Jones, School Electronics and Communication Engineering. ARPN J Eng Appl Sci
6. Singh N, Daniel N, Chaturvedi P (2017) Template matching for detection & recognition of
frontal view of human face through matlab, ICICES
7. Boda R, Jasmine Pemeena Priyadarsini M (2016) School of Electronics and Communication
Engineering, Face detection and tracking using KLT and Viola Jones. ARPN J Eng Appl Sci
11(23)
8. Chaudhari MN, Ramrakhiani G (2018) Face detection using Viola Jones algorithm and neural
networks, 978-1-5386-5257-2/18/$31.00 ©IEEE
9. Rizwan SA, Kim K, Jalal A (2017) An accurate facial expression detector using multi-
landmarks selection and local transform features. IEEE
10. Ranganathas YP, Gouram (2016) A novel fused algorithm for human face tracking in video
sequence. In: International conference on computational system and information system for
sustainable solution
11. Suresh D, Rohit Kumar K, Subin S, Shanbhag S, Naveena N (2020) Int Res J Modern Eng
Technol Sci 02(04)
12. Sripriya AV, Geethika M, Radhesyam V (2020) Real time detection and recognition of human
faces, ICICCS IEEE Xplore Part Number:CFP20K74-ART; ISBN: 978-1-7281-4876-2
13. Zhou X, Jin K, Chen Q, Xu M, Shang Y (2017) Multiple face tracking and recognition with
identity-specific localized metric learning
14. Lalitha SD, Thyagharajan KK (2019) Micro-facial expression recognition in video based on
optimal convolutional neural network (MFEOCNN) algorithm, journal
15. Nair AT, Muthuvel K (2020) Research contributions with algorithmic comparison on the diag-
nosis of diabetic retinopathy. Int J Image Graph 20(4):2050030 (29 pages). World Scientific
Publishing Company. https://doi.org/10.1142/S0219467820500308

16. Sudheer Kumar T, Vishwanath N, Karthik K (2020) Face detection using matlab. IJSDR, vol 5
17. Nair AT, Muthuvel K (2021) Automated screening of diabetic retinopathy with optimized deep
convolutional neural network: enhanced moth flame model. J Mech Med Biol 21(1):2150005
(29 pages). World Scientific Publishing Company. https://doi.org/10.1142/S02195194215
00056
18. Nair AT, Muthuvel K Blood vessel segmentation and diabetic retinopathy recognition: an
intelligent approach. In: Computer methods in biomechanics and biomedical engineering:
imaging & visualization, Taylor & Francis. https://doi.org/10.1080/21681163.2019.1647459
19. Nair AT, Muthuvel K, Haritha KS (2020) Effectual evaluation on diabetic retinopathy, Lecture
Notes, Springer
20. Nair AT, Muthuvel K, Haritha KS (2021) Blood vessel segmentation for diabetic retinopathy,
publication in the IOP J Phys Conf Ser (JPCS)
Review Analysis Using Ensemble
Algorithm

V. Baby Shalini, M. Iswarya, S. Ramya Sri, and M. S. Anu Keerthika

Abstract Today digital reviews play a vital part in influencing the customer. E-
commerce companies provide a platform for consumers to share their thoughts and
comments, and thus, it provides an insight into the performance of the product to the company as well as to buyers. To make these reviews useful, classification of each review is needed. Opinion mining, also known as sentiment analysis, in general
is a process of extracting subjective information from the data collected. Machine
learning provides better insights by automatically analyzing the product review and
separating them into classes and labels. Opinion mining is an artificial intelligence
tool, and its research is very useful for determining the sentiment of comments. A
feed-forward neural network classifier is used to determine the sentiment tendency of
the comment. The proposed revamp of the sentiment analysis approach was compared with RNN and CNN approaches and shows higher precision; the result is displayed in the form of a chart. Thus, this technique is helpful for comment analysis.

Keywords Machine learning · Sentiment analysis · Reviews · Opinion mining ·


Classifier

V. Baby Shalini (B) · M. Iswarya · S. Ramya Sri · M. S. Anu Keerthika


Department of Information Technology, Kalasalingam Academy of Research and Education,
Krishnankoil, India
e-mail: v.babyshalini@klu.ac.in
M. Iswarya
e-mail: 9917008008@klu.ac.in
S. Ramya Sri
e-mail: 9917008019@klu.ac.in
M. S. Anu Keerthika
e-mail: 9917008028@klu.ac.in


1 Introduction

In recent years, a huge amount of data like reviews and opinions are collected through
web sites and social networking sites. Because of the rapid growth of the internet
and social media [1], an increasing number of people are beginning to openly share
their views on the Internet [2, 3]. It shows the importance of sentiment analysis in
different fields. Every day, a large amount of feedback is shared on the internet [4].
The proposed revamp of the sentiment analysis method is used to determine the
user’s opinion in a chunk of text.
Sentiment analysis aims at getting emotion or opinion-related knowledge espe-
cially when the data received is large in size. Sentiment analysis is a text analysis
tool, which involves natural language processing (NLP) [5], machine learning (ML),
data mining, knowledge retrieval, and other research areas. The sentiment analysis
of comments primarily focuses on the sentiment orientation of the comment corpus
[5]. Here the NLP is the best way to uncover and understand the emotion expressed
in the text.
The NLP is the preprocessing method that is performed before the actual imple-
mentation of the sentiment analysis model. The analysis of a review [6] indicates the user's emotions, which are classified as positive or negative. The authors of [7] set up a sentiment analysis by extracting a number of tweets with the help of a prototype, and the results categorized users' opinions expressed through tweets into positive and negative. We can perform this sentiment analysis using various machine learning [8] and deep learning [9] algorithms. These algorithms have been used by numerous researchers for image classification [10] and tweet classification [11]. Even our proposed method uses two algorithms to make the classification effective.
In this paper, the standard term frequency–inverse document frequency (TF-IDF)
algorithm incorporates the contribution of the word’s sentiment to text sentiment
classification and the weighted word vector is created. The word vector incorporates
a low dimension and holds semantic information of the word. However, distributed
word vectors do not contain sentiment information regarding words. The sentiment
analysis technique for comments using LSTM and naive bayes classification (NBC) [3] is proposed. In this research, the different sentiment analysis studies are reviewed, the experiments are carried out, and finally, the proposed system is summarized.
The rest of the paper is organized into four different sections. Section 2 covers
various previous systems and their drawbacks. Section 3 explains the proposed
method, and Sect. 4 includes results and discussion. Section 5 summarizes the review
of the work.

2 Related Work

Hongchengsoong et al. [12] input the data in various formats such as HTML, PDF, Word, and XML. The document specified that the corpus was converted into text
and preprocessed. One of the most crucial stages in the development of successful
classifiers is feature extraction. Sentences that contain subjective expression are
kept, while sentences containing objective communications are rejected. There are
three different types of sentiment classification: supervised, unsupervised, semi-
supervised. Supervised learning methods include support vector machine, artificial
neural network, random forest, decision tree, naive bayes, and K-nearest neighbor.
Unsupervised methods utilize a dictionary-based approach. Semi-supervised or
hybrid approaches are used to overcome the flaws of both supervised and unsu-
pervised methods. The drawback is that it does not determine which methods of
separation will produce the best results.
Surya et al. [13] used the amazon product review with 600 records. The method
used to split data is the naive bayes classifier. Prior and posterior probability is used
to classify the data. The data is separated into two categories: training and testing.
The purpose of test data is to analyze it. An R-analysis tool is utilized, and different
packages are installed. The pure command of corpus is used for processing. Before
proceeding, the sentence is divided into words and the consistency of each word is
examined. After this process, an algorithm is applied. Each word is considered for
both positive and negative likelihood. The class with the higher probability is the result of the classification. The output is in matrix form. Using the confusion matrix to evaluate accuracy, the accuracy obtained is 80%. The disadvantage is word similarity,
and the negation handling of words can be improved to get a better result.
Mohammed et al. [14] have developed a technique for analyzing emotions utilizing
both content and emoticon. Machine learning and deep learning are used. The
database is used for twitter-based flight updates. The dataset has 14,460 reviews.
The information was tokenized, and the stop words, URLs, digits were eliminated.
In the next section, the punctuation marks and emoticon are removed and the opinion
is analyzed. Next by combining content and emoticon information, the feelings were
considered utilizing ML and DL algorithms like SVM, NBC, RF, LSTM, and CNN.
The algorithm separates and highlights like TF-IDF, bag of words, n-gram, and
emoticons. In each case, apply ML and DL algorithms and record their scores. As a
result, with combined text and emoticon data, 89% accuracy is obtained.
Nithyashree et al. [15] are very focused on collecting data from tweets. These
tweets are downloaded using the Twitter API and the Java programming language.
In this paper, the author releases the text of a particular hotel from twitter and the
comments are converted into an information framework. The SVM machine learning
process is used to separate tweets. For classification and regression, the ML algorithm
is utilized. After collecting data, every piece of information will be labeled using an
unsupervised algorithm and the words are compared to text files (positive, negative)
if there is a similarity then the word is classified. As a result, the accuracy obtained
is 61.11%. The drawback is the accuracy of the result.

Erfianjunianto et al. [16] proposed a text mining model for emotion detection
that is applied using particle swarm optimization and naive bayes classifier. Here the
dataset is taken from Twitter. The input data is divided into three groups: The data is
preprocessed by the following techniques transforming the cases tokenization, stop
word removal, and stemming. After that, the vector creation is done by using the TF-
IDF. The weighted vectors get optimized by using the particle swarm optimization
approach, and then the naive bayes classifier method is used for the final step of
classification and the data get classified as anger, fear, joy, sadness. Nearly 7000
data have been used in this research methodology, and the output is represented as a
confusion matrix. The results of this classification method using PSO and NBC have
an accuracy of 66.54% but the drawback is that it takes a lot of time to perform PSO
and NBC.
Wan et al. [17] compared six classification methods with the proposed ensemble approach, which combines five particular algorithms into a multi-algorithm
ensemble-based classification including NBC, SVM, Bayesian network classifier,
decision tree, and random forest. All methodologies were trained and evaluated
on a similar dataset of 12,864 tweets, with the classifiers being validated using a
tenfold evaluation. The ensemble classifier achieves the highest accuracy of 84.2%
in the experiment with three classes in terms of precision, recall, and F-measure. The
ensemble classifier also has the maximum accuracy of 91.7%, in the test with just
two classes positive and negative which is more accurate. The drawback is real-time
data analysis is not done.
Suci et al. [18] have used a naive bayes algorithm in paper. The data was gathered
from a YouTube comment containing a KFC video salted egg and a tweet about
the KFC salted egg. The paper is separated into two sections, before and later than
affirmation where the information is collected for preprocessing, and at last, classifi-
cation is done and results represented as a confusion matrix. The methodology here is
including problem identification, information preprocessing, information handling,
and evaluation. The accuracy achieved in this paper was 86.48%.
Guixian et al. [19] elaborated that bidirectional long short-term memory
(BiLSTM) is used for analysis and they have used the TF-IDF algorithm for word
vectoring. It is used to generate the weighted word vector as an input. Then the
output from BiLSTM is utilized to represent the text. At last, the neural network and
softmax mapping are used to obtain the tendency of sentiment text. Here they have
collected 15,000 hotel comment text from web sites. The drawback of this paper is
the training period that takes a longer time.
Perera et al. [20] referred that opinion mining or sentiment analysis is such a
fine method to examine the comments, and it also classifies it polarity as positive,
negative, and neutral. Usually, opinion mining has three distinct degrees that are
document-based, sentence-based, and aspect-based. From the above three levels, the
author specifically focused on the aspect-based level. The proposed system holds
preprocessing, aspect extraction, dependence parser, and SentiWordNet. Here, the
author collected the data from the Zomato application, where reviews of 100 restaurants were gathered. The achievable accuracy of this system is 70%. In this paper, the evaluation results obtained from "testing manually" and "testing systematically" are compared. As future work, they state that the approach to discovering the opinion word associated with the aspect level could be improved.

3 Proposed System

Sentiment analysis is a technique for analyzing subjective data within the text. This
is the method for extracting useful information from people’s feelings, thoughts,
and emotions about entities, events, and their attributes. Customers use and make
decisions about online shopping and decide based on the views of others have an
enormous effect on the product. A method using BiLSTM and naive bayes to analyze
the sentiment of reviews is proposed. The proposed system as shown in Fig. 1 consists
of four parts – input, preprocessing, classification, and output. The input data are the
comments received from the users so a user interface is created, and the dataset for
training and testing the model is uploaded in the database before implementation of
this model.
The data provided through the user interface get stored in the database as well.
The preprocessing is done once the data get loaded into the system. In the prepro-
cessing, the stop words, punctuations, and other supportive words which have no
value will be removed and the preprocessed data get into the classification process.
This preprocessing is accompanied by the BiLSTM algorithm. The classification of
data is done using the NBC algorithm. The output of this model is illustrated in the
form of a pie chart.

Fig. 1 Proposed system (user interface → user comments → database → pre-processing and classification → admin interface → output)



3.1 Input Design

The relation between the device and the user is the input design. It entails creating
data preparation specifications and procedures, as well as the steps required to convert
transaction data into a functional format for processing, which can be accomplished
by entering the data directly into the system. The input is created in such a way that it
offers protection and convenience while maintaining privacy. The dataset documents
were collected from the web site that we have created. It is a data collection, and
it can be in any format like, for example, HTML, CSV OR XML. The data that is
collected should be divided into phrases or tokens. Tokenization is the process of
filtering unnecessary words. It converts the sentence into words. We also designed a
user interface web application using Django to get real-time input.

3.2 Preprocessing

In order to avoid overloading and storage issues, preprocessing or data cleanup is essential. During preprocessing, all the special characters or any other words which do not add any value to the analytics part are deleted. The stop words, such as is, was, the, it, and so on, also have to be removed from the data. The procedure of preparing documents prior to entering the main stage is carried out here. Preprocessing is the first and most essential step before classification.
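A minimal sketch of this cleanup step using NLTK is given below. It assumes the punkt tokenizer and the English stop word list have been downloaded (newer NLTK releases may also need the punkt_tab resource), and the example sentence is purely illustrative.

```python
import string
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# One-time downloads of the tokenizer model and the stop word list.
nltk.download("punkt")
nltk.download("stopwords")

stop_words = set(stopwords.words("english"))

def preprocess(comment):
    tokens = word_tokenize(comment.lower())        # split the sentence into words
    # Drop stop words and punctuation that add no value to the analysis.
    return [t for t in tokens
            if t not in stop_words and t not in string.punctuation]

print(preprocess("The product is good, it was worth the price!"))
# ['product', 'good', 'worth', 'price']
```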

3.3 Classification

Naive Bayes uses a language model to assign class labels, which can be represented using mathematical strategies [9]. Naive Bayes is a conditional probability model based on Bayes' theorem:

P(A|B) = P(B|A) P(A) / P(B)
• P(A|B) the probability of event A occurring, given event B has occurred
• P(B|A) the probability of event B occurring, given event A has occurred
• P(A) the probability of event A
• P(B) the probability of event B
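As a small worked illustration of the rule above (the numbers are made up for exposition and are not taken from the dataset): suppose 60% of reviews are positive, the word "good" appears in 30% of positive reviews and in 5% of negative reviews; then for a review containing "good":

```python
# Toy numbers only, to illustrate Bayes' rule; they are not taken from the dataset.
p_pos = 0.6                     # P(positive)
p_word_given_pos = 0.30         # P("good" | positive)
p_word_given_neg = 0.05         # P("good" | negative)

p_word = p_word_given_pos * p_pos + p_word_given_neg * (1 - p_pos)   # P("good")
p_pos_given_word = p_word_given_pos * p_pos / p_word                 # Bayes' rule

print(round(p_pos_given_word, 2))   # -> 0.9, so the review leans strongly positive
```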

Let us visualize the word cloud of sentences straightforwardly and negatively. We


consider that in a literal sense, there are words such as “good,” “best,” “very good,”
and so on. With negative emotions, we can see the words “bad,” “disappointing,”

“very bad,” “poor,” etc. Some special characters and invalid characters in the result
need to be deleted.
Before feeding our data to the learning algorithms, we need to process it in advance. In this database, we removed all punctuation marks, converted all characters into lowercase letters, and split the database into a train and test setup. The product review database is collected from the web site, and the database is intended for analysis purposes.
In this work, a method of analyzing the sentiment of the comments based on the naive bayes algorithm is proposed. LSTM is an effective neural network for sentence modeling because of its ability to capture long-term dependencies. BiLSTM uses a forward and a backward LSTM to process sequences.
After this stage, the information that was prepared for classification was created.
This stage utilizes the naive bayes classifier to calculate the probability value from
a report to decide its class. The prior procedure was supposed to make classification
easier while also improving accuracy. Now, this is where machine learning comes.
In this process, supervised learning method was applied. We can employ a lexicon
(pre-classified set of words dictionary) or a bag of words to model the information.
When it comes to classifying data, an algorithm is the most important. The super-
vised learning method is used for testing and classification of the data we collected
to attain good accuracy. The naive bayes classifier was employed in this study as the
classification algorithm [9]. As a classification technique based on Bayes' theorem, it is very easy to build and useful for very large datasets. The NBC is simple and effective.
The naive bayes classifier technique is a text document classification algorithm that
is often used. This technique is used to classify data, and it is simple and accurate.
After the preprocessing stage, the algorithm is applied.
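A minimal scikit-learn sketch of this TF-IDF plus naive bayes classification step is shown below; the tiny corpus and labels are illustrative assumptions, whereas the actual system is trained on the review dataset collected through the web site.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus; the real system uses the collected review dataset.
reviews = ["very good product", "best purchase ever", "works well, good value",
           "very bad quality", "poor and disappointing", "bad, do not buy"]
labels = [1, 1, 1, 0, 0, 0]                        # 1 = positive, 0 = negative

x_train, x_test, y_train, y_test = train_test_split(reviews, labels,
                                                    test_size=0.33, random_state=0)

# TF-IDF turns each review into a weighted word vector; MultinomialNB applies
# Bayes' theorem over those term weights to pick the most probable label.
model = make_pipeline(TfidfVectorizer(lowercase=True, stop_words="english"),
                      MultinomialNB())
model.fit(x_train, y_train)

print(accuracy_score(y_test, model.predict(x_test)))
print(model.predict(["good product, very satisfied"]))   # most likely [1] on this toy corpus
```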

3.4 Output Design

The best output is the one that meets all the prerequisites of the end client and presents the data clearly. In any framework, results are conveyed to the clients and to other frameworks through outputs. The item comments are gathered from the web site, and that data is considered for analysis.
In this work, the revamp of sentiment analysis using reviews based on BiLSTM
and naive bayes classifiers is proposed. LSTM is an effective neural network for
sentence modeling for its ability to capture long-term dependencies. BiLSTM uses
a forward and a backward LSTM to process the sequence. We use Django for the web application (models, forms, and application configuration), NLTK for natural language processing, and matplotlib for the charts. We try to predict the positive (label 1) or negative (label 0) sentiment of the sentence.
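A minimal Keras sketch of such a BiLSTM sentiment model is shown below; the vocabulary size, layer widths and training call are illustrative assumptions rather than the exact configuration used in this work.

```python
import tensorflow as tf

vocab_size = 10000                 # assumed vocabulary size after tokenization

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64),
    # Forward and backward LSTMs read the comment in both directions,
    # so each position sees its left and right context.
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # 1 = positive, 0 = negative
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, validation_split=0.1, epochs=5)
# where x_train holds padded integer word-index sequences and y_train the 0/1 labels.
```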
The output is represented as a pie chart indicating the positive and the negative labels, together with the percentage of each label. The analysis is performed in real time, which is the biggest advantage of this work: every comment given by a customer is added to the analysis dataset, so those comments are also counted in the final analysis. Nearly 60,000 comments were reviewed, including the training dataset. Table 1 shows the results of the analysis.

Table 1 Table of analysis

Topic               Total reviews   Positive   Negative
Product comments    60,666          33,158     27,508

4 Result and Discussion

Implementation is the phase of a project in which the theoretical design is transformed into a working program. The most critical part is delivering a successful system and giving the new users confidence that it works correctly and effectively. It begins with user comments through the user login, so first the
user module is created to get the comments from the user after that the admin module
along with the admin login setting and database is created and the comments from the
user login are the dataset or the input for the analysis. Before that the preprocessing
is done, and finally, the output is displayed.
In this paper, the input data is the real-time data from the user. Web site was
developed for collecting data from user. Our proposed method uses naive bayes
classifier and BiLSTM. One method is used for preprocessing, and the other method
is used to classify the reviews. The preprocessing is done to remove all the unwanted
data. It divides the paragraph into a sentence or sentence into a set of words. Special
characters and stop words are also removed. After the preprocessing stage, the data
is ready for classification.
In this paper, naive bayes classifier (NBC) is used for classification to get the
accurate result of the input data. As a result, the prediction is shown with three
representation graph charts (spline, pie chart, column chart) for better understanding.
Unlike the existing work, the analysis of the input data is performed in real time. Since this is an ensemble approach, it yields more accurate results.
The result of this paper is shown in Figs. 2, 3, and 4. The charts show the best
performance is achieved when using the naive bayes classifier and also can obtain
very high accuracy. BiLSTM looks for context information and can get a better repre-
sentation of text comments. With comparative testing of other traditional analysis
methods, by using NBC and BiLSTM we get a better result.
A spline chart is a line graph that connects the plotted points. In this chart, 0 on the x-axis indicates negative and 1 indicates positive, while the y-axis represents the number of comments. The higher point shows the positive comments and the lower point the negative comments.
The pie chart is a circular graph in which the blue segment represents the negative comments and the red segment the positive comments.

Fig. 2 Spline chart

Fig. 3 Pie chart

Fig. 4 Column chart



A column chart is helpful for a better understanding of the outcome. In this representation, positive (1) and negative (0) are plotted along the horizontal axis, and the number of comments along the vertical axis.
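As an illustration, the pie chart of Fig. 3 can be reproduced from the counts in Table 1 with a short matplotlib sketch; the colour assignment follows the description above, and this is not the exact plotting code of the system.

```python
import matplotlib.pyplot as plt

# Counts taken from Table 1 (positive and negative product comments).
labels = ["Positive (1)", "Negative (0)"]
counts = [33158, 27508]

plt.pie(counts, labels=labels, colors=["red", "blue"],
        autopct="%1.1f%%", startangle=90)
plt.title("Sentiment of product comments")
plt.axis("equal")                  # draw the pie as a circle
plt.show()
```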

5 Conclusion

In this paper, a method of analyzing emotions is suggested and used in the process
of analyzing ideas. Due to the lack of word representation in the current study, the
data collections of the combined data are integrated with the TF-IDF algorithm for
the calculation of term weight, and a new way of representing the word vector is
proposed based on term weight development. Also, the model fully analyzes contex-
tual information and can obtain a better textual representation of comments. Finally,
with a neural feed-forward network and a softmax map, a tendency to text sensi-
tivity is achieved. By using a comparative study of traditional methods of emotional
analysis, the accuracy of the proposed method of analysis is improved. However, the
comment analysis method takes longer in the training model. In the future, a way to
speed up the model training process can be studied.

References

1. Alattar F, Shaalan K (2021) Using artificial ıntelligence to understand what causes sentiment
changes on social media. IEEE
2. Amara S, Balaji K, Subramanian R, Akshith N, Murthy GN, Vikas M (2021) A survey on
sentiment analysis. IEEE
3. Long F, Zhou K, Ou W (2019) Sentiment analysis of text based on bidirectional LSTM with
multi-head attention. IEEE
4. Wijayanto UW, Sarno R (2018) An experimental study of supervised sentiment analysis using
Gaussian Naive Bayes. IEEE
5. Verma B, Thakur RS (2018) Sentiment analysis using lexicon and machine learning-based
approaches: a survey. Springer
6. Kumar KLS, Desai J, Majumdar J (2016) Opinion mining and sentiment analysis on online
customer review. IEEE
7. Kariya C, Khodke P (2020) Twitter sentiment analysis. IEEE
8. J. Ramakrishnan DM, Srinivasan K, Mubarakali A, Narmatha C, Malathi G (2020) Opinion
mining using machine learning approaches: a critical study. IEEE
9. Dhola K, Saradva M (2021) A comparative evaluation of traditional machine learning and deep
learning classification techniques for sentiment analysis. IEEE
10. Umer M, Sadiq S, Ahmad M, Ullah S, Choi GS, Mehmood A (2020) A novel stacked CNN
for malarial parasite detection in thin blood smear ımages. IEEE
11. Sadiq S, Mehmood A, Ullah S, Ahmad M, Choi GS, On BW (2021) Aggression detection
through deep neural model on twitter. IEEE
12. Soong HC, Rehanakbar, Norazirabint, Jalil IA, Ayyasamy R (2019) The essential of sentiment
analysis and opinion mining in social media. IEEE

13. Surya Prabha PM, Subbulakshmi B (2019) Sentiment analysis using Naive Bayes classifier.
IEEE
14. Ullah MA, Marium SM, Begum SA, Dipa ND (2020) An algorithm and method for sentiment
analysis using the text and emotion. Elsevier
15. Nithyashree T, Nirmala MB (2020) Analysis of the data from the twitter using machine learning.
IEEE
16. Erfianjunianto, Ranchman R (2019) Implementation of Text mining model to emotion detection
on social media comments using particle swarm optimization and Naive Bayes classifier. IEEE
17. Wan Y, Gao Q (2015) An ensemble sentiment classification system of twitter data for airline
services analysis. IEEE
18. Ramdhani SL, Andreswari R, Hasibuan MA (2018) Sentiment analysis of product reviews
using Naive Bayes algorithm: a case study. IEEE
19. Xu G, Meng Y, Qiu X, Yu Z, Wu X (2017) Sentiment analysis of comment texts based on
BiLSTM. IEEE
20. Perera IKCU, Caldera HA (2017) Aspect based opinion mining on restaurant reviews. IEEE
A Blockchain-Based Expectation
Solution for the Internet of Bogus Media

Rishi Raj Singh, Manish Thakral, Sunil Kaushik, Ayur Jain,


and Gunjan Chhabra

Abstract Fake media, also known as the Web of dishonest media, has emerged in a
variety of areas of digital culture, including politics, media, and social networks. Due
to the frequency with which the media’s credibility is threatened, radical measures
are required to prevent further deterioration. IoFMT is becoming more common
with today’s artificial intelligence and deep learning developments; however, such
concessions to learning may be severely limited. In order to define ownership and
integrity of all digital output, it is critical to present evidence of its authenticity.
Blockchain, a form of distributed ledger technology, has been proposed as a promising decentralized security platform to assist in dealing with the problem. In a data-driven environment, fake media's technical component is
crucial although several blockchain-based solutions for authentication have been
presented. However, the majority of existing studies are based on irrational post-
incident beliefs. This paper proposes a preventative approach for IoFMT utilizing a blockchain-based solution; the suggested approach also incorporates a weighted-ranking algorithm to identify the truthfulness of misinformation while providing
an incentive feature to encourage its dissemination. Although our approach focuses
on fake news, the platform can also be used to create other kinds of electronic
information. This position applies to a demonstration of the benefits of the solution
proposed.

Keywords Bogus media · Blockchain · Security and threat

R. R. Singh (B) · M. Thakral · S. Kaushik · A. Jain · G. Chhabra


School of Computer Science, University of Petroleum and Energy Studies, Dehradun, India
G. Chhabra
e-mail: gchhabra@ddn.upes.ac.in


1 Introduction

The phrase “fake information” was coined in 2001, and it is currently used online
365 percent more often than it was at the time of its creation. It is the responsibility of
media outlets to deal with misleading information, which is a sensitive and difficult
topic to address. The spread of fake news is becoming more prevalent in a number
of areas, including the economics, politics, and diplomacy, among other things, as
technology development continues to improve. This is something that does occur
from time to time, although it is rare. Thanks to the large number of free tools currently accessible, it is now simpler than ever for individuals to create and falsify fake data. A large proportion of the population may benefit from the applications and technologies created by Song, Kim, Hwang, and Lee since they are easily accessible to the public. Because of the proliferation of social media networks, individuals are more likely than ever before to spread false information more quickly, farther, and deeper than genuine news stories.
Although there are significant differences between countries, ethnic groups, and
individuals, the government is now attempting to limit the misuse of publicly
available information, both collectively and individually, as well as to prevent its
widespread distribution, despite the fact that there are significant differences between
countries, ethnic groups, and individuals. Human integrity cannot be compromised
under any circumstances, including in the face of erroneous information from news
media sources. The terms honesty, candor, and attentiveness come to mind when
thinking about the coming decade as well as the generation that will be more aware
of and cognizant of the issue as a whole. These first three words come to mind when
we think of the next decade and the generation that will be more alert and aware of the
issue than previous generations. Because of technological advancements, misleading
media contributes to the reversal of those objectives. Information may now travel
around the globe as fast as an infectious illness virus, which is a result of deceptive
media’s ability to mislead the public. In recent years, technological advancements
and human aspirations have joined forces to expand the scope of efforts to fight
the creation and dissemination of “fake news,” which has grown more common in
society. Recently, new techniques for producing and distributing misleading content,
such as text, have been developed and implemented on the internet in order to deceive
the public. In addition, there has been a rise in the use of visuals and motion pictures
to mislead the general public [1]. Fake news has been successfully combated with
the use of artificial intelligence, deep learning, and blockchain technologies, all of
which have been used in conjunction with one another to great effect. The area in
question has been the subject of some scientific investigation to date. Bitcoin has
emerged as one of the most innovative technologies of recent years, and it is expected
to continue to evolve.

Blockchain technology, in its most basic form, uses cryptographic methods to ensure
the integrity of transaction data after it has been recorded on a distributed ledger
network. As a result of this design, blockchain
technology is the most suitable technology for acting as the foundation for this
kind of business. As nodes in the network work together to create blocks, they are
also involved in block-related activities, both of which need consensus to be used
for the network to function properly. In a trustworthy environment, the suggested
blockchain-based approach for detecting and preventing false media content may be
successful in identifying and preventing fake media material. The implementation of
a consensus method to regulate bitcoin operations is suggested in order to preserve
openness while at the same time restricting the flow of cash. False information is
spread in a variety of ways, including via the use of false media. As part of our
assessment process, we are led by the game feature, which specifies what constitutes
good and bad behaviors in the context of a certain situation [2].
The following are the research’s major benefits:
It is more important to prevent fake news than only detect it.
• A solid evidence protocol specific to you
• Possible applications of blockchain technology outside the realm of finance [3].

2 Study on Definitions and Associated Topics

In this section, we provide an overview of the Web of fake news as defined in the
paper, as well as all related work that has been presented in the context of fake news
detection, specifically in the blockchain space [4].

2.1 Fake News Media on the Internet

When dealing with a significant quantity of information and services, it may be chal-
lenging to preserve privacy and security at the same time. A false piece of information
is any piece of information that is not true in any manner, shape, or form whatso-
ever. As the opposite of truthful statements, it makes claims that are partially or
completely false. Content shared on social media or promoted via advertising is a
common distribution channel for such material. It is possible that false media
information will cause physical harm, in addition to ambiguity, material correctness,
opinion influence, bad judgment, and voting habits. In the 2016 presidential election,
all of these variables were present, and they had all been anticipated to take place
before the election [5].

2.2 Additional Work

When false news began spreading, new skills and ideas for fighting it were created,
particularly in light of the effect fake news had on the political and military sectors,
which resulted in unprecedented occurrences all over the globe. Using machine
learning methods that are based on human linguistic processing, it is feasible to draw
attention to linguistic patterns and evenhandedness in a specific language. The end
result of the whole procedure is a machine learning vocabulary, created by combining
two different classifier models with another classifier model [2]. For detecting false news stories, it was essential
to create a technique of this kind. In news organizations and social media platforms
alike, fake news is quickly becoming the most often utilized kind of propaganda. It is
also gradually becoming the most extensively disseminated type of propaganda [6].
According to the Bitcoin Foundation, bitcoin blockchains have the potential to be
used to identify false information on the Internet. The systems that utilize blockchain
technology to trace the sources of news items are distinguished from the systems that
do not use blockchain technology at all, according to our research. This framework,
which incorporates a distributed structure as a component of its design, makes smart
agreements as well as agreement techniques available to the public. With the use
of blockchain technology and a consensus process, it is feasible to enhance data
tracking, which will make it easier to double-check information in the future [4]. If
media information chains are to be effective, a set of rules and regulations has to be
followed. The bulk of the study on this subject has focused on the
idea that social media platforms play a significant role in the dissemination of false
information, which is supported by the data.
However, there has been some debate over whether this is true. We will go into
more detail on the idea of decentralization, as well as the concept of Ethereum smart
contracts, later in this paper. Fake news spreads widely on social media sites such as
Facebook and Twitter. Auditor ratings are made public to the press and the media at
various periods throughout the year.
In addition to a user-accessible rating that indicates the correctness/authenticity of
specific news, the item will be validated using a weight-based validation technique,
which will be explained in more depth later on. The validity of validators who are
located in the same geographical area as the validity of validators who are not are
given the highest weight. A non-profit organization was formed in order to fight the
spread of false information. Smart contracts are being used to streamline the process
of registering and publishing news items, which will ultimately save time and money.
The editors and authors of newspapers have filed a formal appeal with the federal
government. After passing an initial check, the publisher is issued a public/private key
pair for use with public-key cryptography [7]. In order to evaluate whether or
not an author can be trusted, a credibility score is assigned to them at the moment of
publication. This score increases over time when more information is made available
about the author. The P2P network was used to distribute the breaking news [8].

3 Model of Architecture

Given the permissionless nature of cryptocurrency systems such as Bitcoin and
Ethereum, achieving consensus in the context of blockchain technology has been
highlighted as a subject of debate. Anyone with an Internet
connection may participate in the mining process and earn money as a miner, regard-
less of where they are physically situated. According to Nakamoto (2008), the most
common consensus mechanisms for distributed networks are proof of work and proof
of stake, although a number of other variants exist as well. Protocols grounded in
strong evidence are preferable to those that are not. Proof of work is a mathematical
argument based on hashing that is time-consuming to construct because it is
computationally costly to compute. It has therefore served as the backbone of every
major bitcoin system now in use, albeit at a high cost and with limited performance
scalability owing to the inherent constraints of the bitcoin protocol. Our situation
likewise necessitates the use of evidence, since it will assist in combating the spread
of false information.
Additionally, some proof-of-stake features are added, which reduces the need to spend
a large amount of energy verifying the blocks. For this, we use an algorithm to identify
the blockchain members who have the greatest interest in ensuring that the blocks in
question are checked, thus decreasing the energy spent on verification. Byzantine fault
tolerance (BFT), in turn, has given rise to a family of related consensus algorithms, and
it has been shown that fault-tolerant algorithms are feasible in certain situations. These
algorithms operate in rounds; in each round, a designated leader is in charge of
proposing the new blocks to be mined, which makes it possible to reach agreement on
important issues in a timely way. Because proof of authority (PoA) requires fewer
message exchanges than BFT, overall performance improves as a consequence of the
reduced communication overhead, as noted by Dinh and his associates. Despite this, it
is still uncertain what the real implications of such improved performance will be,
particularly in terms of cost. In a genuine, eventually synchronous network architecture
(such as the Internet), the reliability and consistency requirements are guaranteed. As
part of our network's development, we created our own variant of this principle, based
on the concept that participants who have a stake in the network have an obligation to
act in the network's best interests. Their most urgent issues have been addressed, and
as a result, they have grown more motivated over time to see the system through to
completion. Unlike conventional proof-of-stake methods, the stake in this case is of a
symbolic rather than monetary character. It is necessary to
know who performed the validation and when they finished it in order to complete the
validation process. Otherwise, the procedure will not be completed. “Identification”
will be used to refer to the presence of validators’ identities on a website where news
organizations may participate in a decentralized and dynamic manner in the capacity
of validators [4].

3.1 Overview of the Structure

A modified proof-of-authority protocol is proposed in this paper as the basis for an
architectural model for blockchain technology that is conceptually similar to that
proposed in previous work. We chose PoA over PoS and PoW not only because
it is much more cost-efficient than the other two techniques, since it does not require
heavy computational work, but also because it has a higher transaction rate than the other two
methods. A trustworthiness score is used to evaluate whether or not there is agreement
on a certain point of view [9]. The essential system components represented in Fig. 1
are further described as follows.

Fig. 1 Media organizational view

1. Media Organization: A list of all entities that have been given authorization to
register, publish, and participate in the validation of transactional data will be one
of the pieces of information included inside this entity. People who do not have
real-world news reporting experience will not be considered news organizations
since we are building together a well-established network of reporters who are
known for their dependability and professionalism. If their bids for inclusion in
the blockchain have been filed, news organizations such as CNN, the BBC, and
France24 will want to be included as well [4].
2. Authentication Data: When applying for registration, businesses will be asked
to provide certain information about themselves as well as confirmation that
they are news organizations. The kind of documentation that is utilized in this
scenario includes, for example, business licenses, which are samples of what is
required. In addition to newspapers and television stations, the media business
comprises radio stations and other types of broadcasting enterprises. Once the
content has been verified, the database will be made available for use by the
public. Participation by news organizations is strongly welcomed. As news orga-
nizations get authenticated, cryptocurrencies such as bitcoin and smart contracts
are being utilized to make the process more convenient for everyone involved.
The nodes that have been classified have been highlighted in yellow. It is critical
to keep this in mind.
3. Solid evidence Protocol: As previously indicated, the unique solid evidence
protocol serves as a consensus method for determining the credibility score of
False Press Things; it is included in the paper and described in detail.
The confirmation of a node or news organization gives them the opportunity to
request that their news be published. The ability to request news publication is
available to certain of these organizations, depending on their credibility score.
In the event that certain nodes are selected to join, they will be tasked with
determining whether or not a particular transaction is still in process. When
False Press Things are submitted for approval, they are subjected to a testing
procedure, during which validators evaluate if the transaction is real or not.
When distinguishing between transactions that are “genuine” and those that are
“fake,” and if they meet the Degree of Fakeness criteria, the transaction’s hash
on the chain is altered, allowing for more freedom in choosing what steps to
take next for publishing. The transaction will be labeled as such, however, if the
criteria are not fulfilled. This is because we are in the midst of creating an open
network for news outlets and will designate it as such if the requirements are
not met. Despite the fact that it is a “fake,” it is simple to track down the source
of the information.
4. Lying Media Things: False information may be spread via a variety of mediums,
including text, images, and videos, among others. Wherever feasible, the most
important characteristics of fake news items should be preserved and recorded.
In order to successfully fight the issue, it is necessary to
maintain information on the kind of fake being dealt with, as well as sensitive
information on the individuals involved. Finding news sources would become
much simpler in the future as a result of this development. Despite the fact that
we are implementing a preventative approach, there remains a backlog of risks
that must be addressed in the interim. In order to detect fraud and prevent it from
happening in the future, it is necessary to identify erroneous information that may
be utilized to recognize fraudulent activity [10].

3.2 Individualized Clear Evidence

A customized consensus algorithm in the manner of a dynamically scaled assessment
score is used in our solution (Fig. 2).
If you are participating in the PoA variation, confirmation is carried out in order
to establish identities for those who are participating in the variation, advancing in
their vocations, and/or possessing traits that define their classification. They may
be required to provide these documents in order to authenticate their identity if it
becomes essential, in addition to any other paperwork that gives them authorization
to work in the journalistic field. When attending a big sports event for which press
credentials are required, the participant may find himself or herself in a scenario
similar to the one described below. Following the completion of the first confirmation,
you will be permitted to go on to the next step of the process in the procedure. Each
of the ways will provide nodes with the opportunity to become validators, with the
likelihood that this will occur determined by the credibility score given to each of
them by the methods. To avoid any prejudice or interruption to the existing system,
the current value of r is selected for validator approval. Nodes that want to become

Fig. 2 Representation of validation process [4]



validators in the future will be required to honor this value if they arrive at an approval
value that is equal to or greater than the value already in use by a validator approval
node in the meanwhile. The findings of our investigation are explored in more depth
in the following sections, which describe the technique for calculating the credibility
score and the procedure for validating news items [4].

3.3 System for Calculating Credibility Scores

In addition, an evolvable credibility score is used as a component in reaching
consensus, extending what is currently available in blockchain architecture. Ensuring
accuracy in reporting is a key responsibility of news organizations when it comes to
the information they distribute.
One of the difficulties of dealing with such an issue via the use of methods is that
there is no predefined set of criteria to take into account when calculating a credi-
bility score. The credibility ranking score provided here is an effort to limit the
resulting computational energy requirements and to counter influences that might
invalidate the results [11].
Because we are building a preventive system rather than a detection system, and
rather than waiting for the results to be published, our evaluation will be based on
media submissions rather than actual outcomes, as explained above. The degree of
believability is determined via the use of a method that takes into consideration both
static and dynamic variables. If the cloud architecture and principles are followed, it
is possible to make changes inside the system with a fair degree of ease. However,
this is not always the case. When making a choice, it is important to take into account
both of these factors at the same time. They may have their interactions with the rest
of the system pre-programmed to meet their specific requirements if they so choose
[2].
The following are the static and dynamic factors; an illustrative sketch of how such
factors might be combined follows the list:
• Presenting accurate and detailed news: News organizations who submit news that
is considered to be true and real get a rise in their metric evaluations [12].
• Fake Media Reporting: Participants in the bitcoin system are responsible for helping
  to prevent the spread of disinformation. If reported news is found to be false, the
  reporter will be awarded a higher score. Conversely, if the reported news turns out
  to be accurate and true, the reporter's rating will be reduced [2].
• Idle time: Customers and writers that stay active will be rewarded with a boost in
  their credibility score. As a result, engaged users are rewarded for their continued
  contributions to the system.
• Spreading Fake Media: Spreading fake news is deemed bad behavior in the proposed
  bitcoin system.
Publishers of bogus news will see a significant drop in their credibility score.
Static factors:
• Geographical: If a news company is closer to the area of the concerned news, this
  could suggest that the news is more credible and that the company is more likely to
  have interacted with it directly. As a result, when other organizations rate news, the
  nearby evaluator's assessment carries a higher weight in the weighted average.
• Media Truthfulness: The contract status will rise or fall depending on the findings
of the algorithm employed to serve as a guide. To validate the truth of the provided
news, an agreement must be reached [13].
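To make the interplay of these factors concrete, the following minimal Python sketch
shows one possible way of updating an organization's credibility score from the dynamic
events listed above and of deriving a geography-based weight for its assessments. The
event names, increments, and weighting formula are illustrative assumptions, not values
prescribed by the proposed protocol.

def update_credibility(score, event, delta=1.0, penalty=5.0):
    # Dynamic factors: reward accurate publishing, correct fake-news reports
    # and continued activity; punish false reports and spreading fake news.
    if event == 'true_news':        # accurate and detailed news submitted
        score += delta
    elif event == 'reported_fake':  # reported news confirmed to be fake
        score += delta
    elif event == 'false_report':   # reported news turned out to be true
        score -= delta
    elif event == 'active_period':  # the user or writer stayed active
        score += 0.5 * delta
    elif event == 'spread_fake':    # the publisher spread fake news
        score -= penalty
    return max(score, 0.0)          # credibility never drops below zero

def assessment_weight(distance_km, w_geo=1.0):
    # Static factor: organizations closer to the location of the concerned
    # news receive a larger weight when their assessments are aggregated.
    return 1.0 + w_geo / (1.0 + distance_km)

# Example: a nearby publisher correctly reports a fake item.
score = update_credibility(10.0, 'reported_fake')
weight = assessment_weight(distance_km=3.0)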

3.4 Procedure of Validation

When evaluating validators, they are given a credibility score, which is used to
determine whether or not they are competent to serve in this capacity. Participants in
the validation process will be given an invitation, and they will have the opportunity
to accept or reject the request at their discretion. As soon as the choice of the desired
participant is obtained, the algorithm is put into action. Depending on the outcome,
the system will either elevate the user to the position of validator or continue to
welcome the next individual to the system. Then, as can be seen in the image, it
will do an upgrade. The first algorithm is as follows: The anticipated number of
invitations varies based on the overall reputation of the ecosystem, but it is always
in the neighborhood of 100. The use of a guarantee to ensure that the aggregate
credibility score of all validators exceeds 50% of the total rating is one such method.
For example, in order to become validators, the top X individuals are chosen based on
the ranking of their trust ratings across the network. If the individual to whom an
invitation has been extended declines it, an invitation will be extended to the next
person on the list, depending on their degree of trustworthiness. This method is
utilized to bring in new participants
to the event by inviting them through email. When the method is invoked, the inputs
sumValCred and reqSumCred are used to complete the task. It is possible to go to
additional variables after starting with the sumValcred variable, which reflects the
overall trustworthiness of all validators in the environment.
The function reqSumCred is in charge of calculating the total amount of validator
credit that is needed for the request. In accordance with the usage case, the project
manager will make choices. As long as the sumValCred value is less than the required
sum, the loop will continue, which indicates that the validators' group is still accepting
new participants; once the required sum is reached, the loop will exit. Upon each new
invitation, the most eligible individual on the non-validator list will be contacted, and
a variable called Decision will be used to keep track of the invited user's choice. If the
user accepts the offer, the system will assign the user to execute validation tasks, and
sumValCred will be increased by the new validator's credibility score, which is based
on his or her experience. After the algorithm has received
the choice from the most recently welcomed individual, the pointer will be moved
to the next person who has been invited. The loops will determine if it is required to
continue with the next least eligible user on the non-queue validator, based on the
results of the previous choice. Consider the following example: we have a total of 26
users in our ecosystem, denoted by the letters A through Z. User A has the greatest
credibility score, which is 26, followed by User B with 25 and User C with 24, and so
on, until User Z, who has the lowest credibility score of 1. As a result, the overall
credibility rating of the system is 351. Suppose we require the validators to collectively
hold a total credibility score that is more than 50% of the total trust in the whole
network, i.e., more than 175.5. Soon after the
first cycle begins, the offer to act as a validator is sent to the biggest websites based
on their trust rating, where it remains until the total credibility score of the authorized
users reaches 175.5, which is the required minimum. If the users at the top have a high
level of trustworthiness, the rest of the users will have a high level of trustworthiness
as well. Similarly, if a user declines the invitation to join the list of validators, the
invitation will be issued to the next candidate on the list until the minimum threshold
is met. Validators can also be removed from the validator set, as described next.
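A minimal Python sketch of this invitation loop (Algorithm 1) is given below. It assumes
each user is represented by a (name, credibility) pair and that an accepts() callback
models the invited user's decision; these names, and the way the 50% threshold is
handled, are illustrative rather than the authors' exact pseudocode.

def select_validators(users, req_fraction=0.5, accepts=lambda name: True):
    # users: list of (name, credibility) pairs for the whole ecosystem.
    total_cred = sum(cred for _, cred in users)
    req_sum_cred = req_fraction * total_cred        # e.g. 175.5 for 26 users
    # Candidates are considered from the most to the least credible.
    candidates = sorted(users, key=lambda uc: uc[1], reverse=True)

    validators, sum_val_cred = [], 0.0
    for name, cred in candidates:
        if sum_val_cred > req_sum_cred:             # required trust reached
            break
        if accepts(name):                           # the Decision variable
            validators.append((name, cred))
            sum_val_cred += cred                    # raise sumValCred
        # a declined invitation simply passes to the next candidate
    return validators, sum_val_cred

# The 26-user example: users A..Z with credibility scores 26..1 (total 351).
users = [(chr(ord('A') + i), 26 - i) for i in range(26)]
validators, cred = select_validators(users)         # stops once cred > 175.5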
There are a number of reasons why validators may be removed at any moment,
including a decrease in their credibility, for example because they spread fake
information or gave their assent to manufactured information. Removal may also be
triggered when the credibility of a non-validator keeps growing and ultimately exceeds
the credibility of an existing validator. In each of the aforementioned situations, a new
invitation will be sent out, as previously stated. When the system is refreshed or
updated, it will identify and approve or reject the eligible participants before sending
the updated membership along to the validators group for final approval. Using the method
outlined in Algorithm 1, Algorithm 2 shows how to choose new members who have
higher credibility scores and how to eliminate those who have lower credibility
scores [4]. Algorithm 2 validates the addition and removal of validators. It takes the
same set of inputs as Algorithm 1, including the required sum of credentials, and
produces the same kind of result. Its loop first checks whether any non-validator with
a higher credibility score than a current validator is available; if so, the algorithm
continues in the same way as Algorithm 1 until no remaining non-validator has a
higher score than the lowest-scoring validator. Algorithm 2 then repeats the process.
The first loop is followed by a second loop that performs an additional check to identify
whether there are any redundant validators, that is, whether sumValCred remains higher
than reqSumCred even after removing a validator. If this is the case, the validator's
identity will be deleted, and sumValCred will be reduced by that validator's credibility
score. As a foundation for this novel consensus approach, we build on the Clique
consensus method, as studied by De Angelis et al. Because of our consensus-based
approach, we can
improve efficiency while simultaneously reducing the amount of messages that are
delivered. When new news items are submitted, they will be sent to the relevant
recipients for distribution. Authenticity checks will be performed on transactions put
on the waiting list in order to verify that they are genuine. Following are the technical
specs for your information, which should be considered exhaustive [4, 13, 14].
Epoch: The consensus method follows a pattern of time epochs; each decision on
whether to broadcast or not is made by the validators of the corresponding epoch.
A customized transition block is sent to the system ahead of time to prepare the next
epoch's collection of validators.
Validators for each epoch are as follows: each epoch has a set of N validators, and the
validator with the highest credibility rating is regarded as the leader [15]. The credibility
score determines the likelihood of each validator being selected; those with a higher
level of trustworthiness are much more likely to be chosen. Each validator can introduce
a new block only once every 1 + N/2 blocks; that is, if a validator approves a piece of
news, it cannot decide on the next pieces of information until 1 + N/2 blocks have
passed. With this information, the selected auditors will be able to suggest a new
block [2].
Judgment at each epoch: The validators with the highest perceived credibility will
be given the content from the line waiting in the order of the delivery date. In other
terms, the very first news item received will be delivered to the leadership, who will in
turn hand the item to the next validator who will do so for the rest of the team. Every
epoch, a validator has the option of approving, rejecting, or quitting. If the news has
been validated [16], when the next block with the Information has been released,
everyone in the environment will be able to see the news. If, instead, the verifier denies
the proposed block, the news will be kept from the users. Because of this, the system
is designed to record each validator's choice in order to prevent malicious conduct,
such as a deliberate move against another news organization. If the validator doubts
the authenticity of the item, the item is returned to the top of the waiting queue to be
considered in the following epoch.
Fork: There is some delay because each ledger is physically separated from others.
A fork could occur during each epoch cycle. The rule that the verifier with the most
credibility has the highest priority on the chain, however, resolves this issue. As a result,
throughout each epoch, the majority of the verifiers will place the most trusted blocks
first. Some news organizations may engage in malevolent activities toward other news
organizations. Each validator's verdict is stored on the
blockchain. The users who benefit from bitcoin’s accountability can start a vote
against validators who are acting maliciously.

References

1. Douglas A (2006) News consumption and the new electronic media. Int J Press/Polit 11(1):29–
52
2. View at: Publisher Site|Google Scholar
3. Wong J (2016) Almost all the traffic to fake news sites is from Facebook, new data show
4. Thakral M, Singh RR, Jain A, Chhabra G (2021) Rigid wrap ATM debit card fraud detec-
tion using multistage detection. In: 2021 6th international conference on signal processing,
computing and control (ISPCC), 2021, pp 774–778, https://doi.org/10.1109/ISPCC53510.
2021.9609521
5. Lazer DMJ, Baum MA, Benkler Y et al (2018) The science of fake news. Science
359(6380):1094–1096
6. García SA, García GG, Prieto MS, Guerrero AJM, Jiménez CR (2020) The impact of the term
fake news on the scientific community’s scientific performance and mapping in the web of
science. Social Sci 9(5)
7. Holan AD (2016) 2016 lie of the year: fake news, politifact, Washington, DC
8. Kogan S, Moskowitz TJ, Niessner M (2019) Fake news: evidence from financial markets.
https://ssrn.com/abstract=3237763
9. Robb A (2017) Anatomy of a fake news scandal. Rolling Stone 1301:28–33
10. Soll J (2016) The long and brutal history of fake news. Politico Magazine 18(12)
11. Hua J, Shaw R (2020) Coronavirus (covid-19) “infodemic” and emerging issues through a data
lens: the case of China. Int J Environ Res Public Health 17(7):2309
12. Conroy NK, Rubin VL, Chen Y (2015) Automatic deception detection: methods for finding
fake news. Proc Assoc Inform Sci Technol 52(1):1–4
13. Shu K, Sliva A, Wang S, Tang J, Liu H (2017) Fake news detection on social media. ACM
SIGKDD Explor Newsl 19(1):22–36
14. Vosoughi S, Roy D, Aral S (2018) The spread of true and false news online. Science
359(6380):1146–1151
15. Allcott H, Gentzkow M (2017) Social media and fake news in the 2016 election. J Econ Persp
31(2):211–236
16. Rubin VL, Conroy N, Chen Y, Cornwell S (2016) Fake news or truth? using satirical cues to
detect potentially misleading news. In: Proceedings of the second workshop on computational
approaches to deception detection. San Diego, CA, pp 7–17
Countering Blackhole Attacks in Mobile
Adhoc Networks by Establishing Trust
Among Participating Nodes

Mukul Shukla and Brijendra Kumar Joshi

Abstract Mobile Adhoc network (MANET) is a self-configured, infrastructure-


less network of mobile devices connected without wires. It is used in various fields
for various purposes like military, local conferences, and information movability.
However, due to the lack of built-in security, safety is a significant concern in the
MANET. There are various attacks possible on MANET, one of which is the Black-
hole. It is an active attack in which a malicious node shows itself as the shortest route
and absorbs packets just like a black hole does in the universe. This paper consists of a
proposed trust-based mechanism that helps to improve the trust in a blackhole attack.
We have used an approach of trust AODV routing, which uses trust value to find the
Blackhole affected node, and it is seen that it returns better results. Specific evalu-
ation parameters have been used to evaluate the consequence like packet delivery,
end-to-end delay, and throughput; trusted AODV (TAODV) performs well and improves
the network's efficiency compared to Blackhole AODV (BAODV).

Keywords Mobile Ad hoc Network · Blackhole attack · Trust estimation · AODV
routing protocol · Packet Delivery Ratio · Throughput · End-to-End delay

1 Introduction

A mobile ad hoc network (MANET) is a network topology with multi-hops and a
collection of random nodes free to move as the network topology changes. Flexibility
means no strictness related to the network; random mobility means a node can change
the networks without administration. In MANET, these nodes transfer traffic to other
specified nodes and behave as a router and a base station that shows its autonomous

M. Shukla (B)
Department of Information Technology, Shri G. S. Institute of Technology & Science, Indore,
India
e-mail: mukul@sgsits.ac.in
B. K. Joshi
Electronics & Telecommunication and Computer Engineering, Military College of
Telecommunication Engineering, MHOW, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 399
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_29

behavior. It requires less human interference. MANET is used in military battlefields,
IT companies, law enforcement, campus networks (conferences), and construction
(architecture).
Even after its pros, there are some cons like lack of authorization facility and
security. Security [1] is a major concern for MANET, as there is no in-built secu-
rity available. These attacks are categorized as data and control traffic attacks. In
a data traffic attack, either packet will be dropped or end-to-end delay increases—
for instance, the Blackhole attack, gray hole attack, and jellyfish attack. But in the
case of a control traffic attack, it deals with controlling the data traffic—for instance,
wormhole attack, HELLO Flood attack, rushing attack, and man in the middle attack.
While discussing the best route and nodes, then in MANET, routing protocols
play an important role. These protocols are further classified as a proactive protocol
that keeps itself up to date by maintaining a table for every participating node.
Although this increases maintenance complexity and degrades performance, a reac-
tive protocol like AODV overcomes these problems by generating the routing table on
demand. It builds on two principles: the dynamic source routing (DSR) protocol and
the destination sequence distance vector routing (DSDV) protocol [2]. Another category
is the hybrid routing protocol, which behaves in a mixed manner: for nearby nodes it
maintains routing tables proactively, while for others it keeps a record of nodes on
demand, as in the zone routing protocol [3]. AODV uses three types of messages: route
request (RREQ), route reply (RREP), and route error (RERR). It gives priority to routes
for sending data packets based on the sequence number; however, this can increase
the overhead and sometimes misguide the node.

2 Related Works

Saddiki et al. [4] proposed a scheme known as neighbors trust-based and includes
neighbor nodes’ participation to detect misbehaves nodes. Also, they presented a
study of security issues related to routing in MANET, specifically in OLSR protocol
and its exposure for cooperative Blackhole attacks. The feasibility of the scheme is
tested through NS-2. As future work, it can be extended for other types of attacks.
Keerthika et al. [5] presented a hybrid weighted trust-based artificial bee colony
2-opt algorithm. They used AODV to secure MANET from a blackhole
attack. This algorithm finds a secure optimal path, and for improvement, they
used 2-opt as a local search, which uses current solutions to generate new solutions.
For evaluation of the performance, PDR, hop sink, and end-to-end delay were used.
Mehetre et al. [6] proposed a routing scheme that is secure and trusted. For
selecting nodes and securing the WSN data packets, they used a 2-step
security and dual assurance scheme. The cuckoo search algorithm plays an essential
role in this scheme as it provides a secure routing path for identified trusted paths.
Energy is used as a performance parameter.

Arulkumaran et al. [7] focused on MANET’s role in the military; for this, they
have focused on a fuzzy logic strategy to improve AODV performance. The certifi-
cate provided to only trusted nodes using fuzzy logic helps detect malicious nodes
in the proposed approach. As a future enhancement, it can be deployed for other
fields like emergency operations and PAN. Also, the improvement of throughput and
decrements of end-to-end delay value is expected from future work.
Singh et al. [8] use trust points of mobile along with the clustering technique.
For detection of the attack, the activities of clusters head are monitored by the trust
points, and in case of any Blackhole detection, it will generate an alert in the network.
As a future enhancement, it can be used to detect other attacks and improve accuracy.
Singh et al. [9] proposed a solution for attacks like a blackhole, wormhole, and
collaborative Blackhole. Trust values are used as a decision factor whose value lies
between 0 and 1. Value more than 0.5 will allow the node in the network else to block
the node. As a future enhancement, trust broadcasting and aggregation are needed.
Singh et al. [10] presented a solution that uses a trusted AODV routing protocol for
a collaborative Blackhole attack. Trust value, calculated using a hyperbolic tangent
function, is considered for finding the malicious node. As future work, trust value
can be used for finding other attacks also.
Hazra et al. [11] proposed a trust model with a different level of computations.
In the context of data forwarding, it will identify and isolate blackhole attackers.
This trust model can detect other types of attacks for the ad hoc network as a future
enhancement.

3 Blackhole Effect

This section includes the blackhole effect and the process of AODV in the network.
The blackhole attack is a type of data traffic attack in which one of the nodes behaves
as a suspicious node [12–14]. It works similarly to a black hole in the universe: just as
energy and matter disappear into a black hole, packets disappear when they follow the
route through this malicious node. The malicious node attracts packets by advertising
a route to their destination.
In Fig. 1, the network consists of 7 nodes (0, 1, 2, 3, 4, 5, and 7), of which the six
nodes 0–5 are genuine while node 7 is malicious. In this network, a minimum-length
path is required to deliver the packet to its destination. While passing through any of
the nodes from 0 to 5, the hop count is 2, but if the packet travels via node 7, the hop
count is one. Here, hop count means the number of hops a packet travels while moving
from source to destination.
When AODV receives multiple RREP replies, it gives preference to the route with the
maximum sequence number. With node 0 taken as the source and node 5 as the
destination, node 7 will advertise the maximum sequence number, as it is a malicious
node that pretends to be the destination. Hence, there is a need to detect it.
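The selection rule just described (prefer the reply with the highest destination sequence
number and, among those, the smallest hop count) is exactly what the blackhole node
exploits. The short Python sketch below illustrates this rule under the assumption that
each RREP is summarized by its sequence number and hop count; the field names are
illustrative.

def choose_route(route_replies):
    # route_replies: list of dicts like {'next_hop': 7, 'seq_no': 99, 'hop_count': 1}.
    # AODV prefers the highest sequence number, then the lowest hop count.
    return max(route_replies, key=lambda r: (r['seq_no'], -r['hop_count']))

# Example inspired by Fig. 1: malicious node 7 advertises a fresh-looking,
# one-hop route by forging a very high destination sequence number.
replies = [
    {'next_hop': 2, 'seq_no': 10, 'hop_count': 2},   # genuine route via node 2
    {'next_hop': 7, 'seq_no': 99, 'hop_count': 1},   # blackhole node
]
selected = choose_route(replies)   # the forged route wins and packets are dropped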

Fig. 1 Blackhole affected network

4 Proposed Scheme

This section presents a trusted AODV routing protocol that secures route selection
with the trust. In this, neighbor plays an essential role as the trust values adjustment
depends on the node’s experiences with its neighbor. The proposed approach is
divided into three algorithms dependent on each other and explains the approach’s
working.
A. Status of Trust
In the proposed algorithm, the AODV and trust estimate function are embedded
together. Trust between the nodes and cooperation are the key factors responsible for
communication in MANET.
Input: A network having mobile nodes.
Output: An efficient route search and detection of a possible blackhole attack in a
network scenario.
Procedure:
(1) Let us consider a network with several random nodes such as 20, 40, 60, etc.
(2) The RREQ request is generated and waits for the RREP reply to establish
communication between source and destination.
(3) Once it receives multiple replies, it will select the best communication route
based on the sequence number and hop count.
(4) Different nodes can be classified as UnTrusted, Trusted, and most trusted based
on neighbor’s trust and threshold values.

• UnTrusted: The UnTrusted node is the node having a low value of trust.
When a naive node arrives at the network scenario in a given scenario, its
association with other nodes is negligible. So, it is treated as an UnTrusted
node.
• Trusted: When the node receives some packets, then its trust level increases.
So, it is considered a Trusted node. Its trust value lies in between UnTrusted
and MostTrusted.
• MostTrusted: MostTrusted nodes have the highest trust value and are
considered the most reliable node. Here, high trust refers to the successful
transmission of packets with the neighbors.
(5) The trust estimation function returns the trust status of all the nodes based on
their reliable nature. The record of the status of a node is maintained in a table
known as a Trust Table.
(6) The Trust table works as follows—Whenever any node receives a packet, it
will refer to the table. When a naive node joins the network, it is initially
treated as an UnTrusted node, which increases the possibility of the attack.
A Trusted node will be referred to in the absence of a MostTrusted node, but
an UnTrusted node is never chosen as an option. A small illustrative sketch of
this classification and next-hop choice follows.
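As a simple illustration of the Trust Table logic described in the steps above, the Python
sketch below classifies a node from its trust value and picks the next hop accordingly.
The thresholds Tut and Tmt are placeholders for whatever values the designer chooses;
they are not specified by the paper.

def trust_status(trust_value, t_ut=0.3, t_mt=0.7):
    # Map a trust value in [0, 1] to the node's status in the Trust Table.
    if trust_value < t_ut:
        return 'UnTrusted'      # e.g. a naive node that has just joined
    if trust_value < t_mt:
        return 'Trusted'        # some successful exchanges with neighbours
    return 'MostTrusted'        # consistently reliable neighbour

def pick_next_hop(neighbours):
    # neighbours: list of (node_id, trust_value) pairs.
    # MostTrusted nodes are preferred, Trusted nodes are a fallback,
    # and UnTrusted nodes are never chosen.
    ranked = {'MostTrusted': 0, 'Trusted': 1}
    eligible = [(n, t) for n, t in neighbours
                if trust_status(t) != 'UnTrusted']
    if not eligible:
        return None
    return min(eligible,
               key=lambda nt: (ranked[trust_status(nt[1])], -nt[1]))[0]

# Example: node 4 (0.9, MostTrusted) wins over node 2 (0.5, Trusted);
# node 7 (0.1, UnTrusted) is never considered.
next_hop = pick_next_hop([(2, 0.5), (4, 0.9), (7, 0.1)])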

B. Trust Calculation
Trust value is a decision-maker in the network, which helps to identify nodes as reli-
able or not. For neighbors, different threshold values decide whether a node becomes
Trusted, UnTrusted, or MostTrusted. The threshold values for UnTrusted, Trusted, and
MostTrusted are Tut, Tt, and Tmt, respectively. The trust calculation is described below.
Input: As input, it requires values of Route Request Success Rate (RREQS), Route
Request Failure Rate (RREQF), Route Replay Success Rate (RREPS), Route Replay
Failure Rate (RREPF), Data Success (DATAS), and Data Failure (DATAF).
Output: Trust value
Procedure:
(1) Values of t1 and t2 will be calculated as

t1 = (Number of packets actually forwarded) / (Number of packets to be forwarded)

t2 = (Number of packets received from a node but originated from others) / (Number of packets received from it)

(2) Value for t 3 by using t 31 , t 32 , t 33

t3 = (t31 + t32 + t33 )/3



where
t 31 = (Route Request Success Rate − Route Request Failure Rate)/(Route Request
Success Rate + Route Request Failure Rate)
t 32 = (Route Replay Success Rate − Route Replay Failure Rate)/(Route Replay
Success Rate + Route Replay Failure Rate)
t 33 = (DATAS − DATAF)/(DATAS + DATAF)
(3) Intermediate values obtained in step 2 are used to calculate the final trust value
(FT).

FT = tanh(x)

Where

x = t1 + t2 + t3

tanh is the hyperbolic tangent function and FT is the final trust value.
Its value lies in the range of −1 to +1, but we consider only positive values between
0 and +1 in our experimental results.
(4) After that, at even intervals of time, RREP messages will be received from the
neighbors, and the threshold values will be used whenever an update is needed.
At last, the route will be decided accordingly. A minimal sketch of this trust
computation is given below.
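The computation above maps directly onto a few lines of code. The sketch below follows
the stated formulas for t1, t2, t31–t33 and FT = tanh(t1 + t2 + t3), keeping only the
positive range as the paper does; the counter names are descriptive stand-ins for the
statistics a node would actually maintain.

import math

def _ratio(success, failure):
    # (success - failure) / (success + failure), guarding against division by zero.
    total = success + failure
    return (success - failure) / total if total else 0.0

def final_trust(forwarded, to_forward,
                relayed_for_others, received_from_node,
                rreq_s, rreq_f, rrep_s, rrep_f, data_s, data_f):
    # t1: share of packets the node actually forwarded.
    t1 = forwarded / to_forward if to_forward else 0.0
    # t2: share of packets received from the node that originated elsewhere.
    t2 = relayed_for_others / received_from_node if received_from_node else 0.0
    # t3: mean of the RREQ, RREP and DATA success ratios (t31, t32, t33).
    t3 = (_ratio(rreq_s, rreq_f) + _ratio(rrep_s, rrep_f)
          + _ratio(data_s, data_f)) / 3
    ft = math.tanh(t1 + t2 + t3)      # FT = tanh(x), with x = t1 + t2 + t3
    return max(ft, 0.0)               # only values in [0, 1] are used

# Example: a largely cooperative neighbour.
ft = final_trust(forwarded=95, to_forward=100,
                 relayed_for_others=80, received_from_node=100,
                 rreq_s=40, rreq_f=2, rrep_s=38, rrep_f=1,
                 data_s=90, data_f=5)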
C. Proposed working flow chart (Fig. 2)

5 Implementation and Result

This section includes the implementation details of the proposed approach. We have
used the NS-2 simulator to check the results of the simulation. A detailed study of
NS-2 is given below.
A. Experimental Simulator
Network Simulator-2 (NS-2) is an object-oriented and readily available simulator. It
is an OTcl script interpreter whose main components are simulation event scheduler,
which plays an essential role in tracking simulation time. It is responsible for the
action associated with the packet, which is pointed by the event. Another component
is object libraries and module libraries for network setup. OTcl is also written in C++
to reduce the event’s packet and processing time. These two languages, i.e., C++,
and OTcl, are connected through the TclCL link [15]. In the NS2 simulation, we
have applied the AODV protocol for routing [16].
In this paper, we have to apply the proposed approach on network simulator 2,
hardware for simulation work like processor, Intel(R) Core(TM) i3-6006U CPU @
2.00 Ghz, Installed memory (RAM), 8.00 GB (7.89 GB usable), operating system,
Ubuntu 18.10, 64-bit. Minimum Required hardware, 100 GB for the experiment.

Fig. 2 Proposed trust-based model

B. Simulation Parameters
Table 1 gives a brief overview of the experimental parameters and corresponding values.
Three routing protocols are used in a 1200 m × 1500 m area, showing how the attacker
affects performance.
The network has been evaluated under five scenarios with 20, 40, 60, 80, and 100 nodes.
Our experimental simulation time is 100 ms. Initial energy, transmission power, reception
power, idle power, and sensing power of the nodes (in watts) are essential parameters in
the simulation. In this experiment, nodes 2, 4, 6, 8, and 10 act as malicious nodes while
1024-byte packets are transferred.

Table 1 Experimental parameters

Parameter            Values
Simulator name       NS 2.35
Protocol             AODV, BAODV, TAODV
Nodes                20, 40, 60, 80, 100
Time                 180 s
Type of traffic      TCP and UDP
Size of packet       1024 bytes
Pause time           16
Size of scenario     1200 × 1500
Speed (maximum)      18 m/s
Malicious node       3, 6, 9, 12, 15

5.1 Result

This section presents the results generated by the proposed algorithm. We validate our
results in terms of throughput, end-to-end delay, and PDR. In the NS2 simulator, we
have considered scenarios with a total of 45 nodes in the network.
The parameters are as follows:
Throughput: the data retrieval at the destination node in any time interval unit is
termed throughput [17]; it can be defined as Eq. 1.

Throughput = (received bytes ∗ 8)/(time of simulation ∗ 1024) kbps    (1)
Avg End-to-End Delay: The time a packet takes to travel from the source to the
destination is called the end-to-end delay [17]; it can be defined as Eq. 2, where Rn and
Sn are the receive and send times of packet n, and N is the number of delivered packets.

Avg EE delay (ms) = (1/N) Σ n=1..N (Rn − Sn)    (2)

Packet Delivery Ratio (PDR): The data packets’ ratio sent to the data packets
received is termed the PDR [17]. Mathematically, it can be defined as Eq. 3.

packets recieved
PDR(%) = (3)
packets sent
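For reference, Eqs. (1)–(3) can be computed from a simulation trace with a few helper
functions, assuming the trace has already been reduced to byte counts and per-packet
send/receive timestamps; the variable names below are illustrative.

def throughput_kbps(received_bytes, simulation_time_s):
    # Eq. (1): throughput in kbps.
    return (received_bytes * 8) / (simulation_time_s * 1024)

def avg_end_to_end_delay_ms(send_times, recv_times):
    # Eq. (2): mean of (Rn - Sn) over all delivered packets, in milliseconds.
    delays = [(r - s) * 1000.0 for s, r in zip(send_times, recv_times)]
    return sum(delays) / len(delays) if delays else 0.0

def pdr_percent(packets_received, packets_sent):
    # Eq. (3): packet delivery ratio as a percentage.
    return 100.0 * packets_received / packets_sent if packets_sent else 0.0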

A. Packet Delivery Ratio

PDR represents the ratio of packets received by the destination and generated packets
by the source.

Fig. 3 Packet delivery ratio

Figure 3 shows that the Blackhole-affected AODV network performs poorly, whereas
AODV and trusted AODV show satisfactory results. The packet delivery ratio of trusted
AODV improves by 40–50% compared to BAODV.
B. End-to-End Delay
End-to-end delay is when the packet takes a while it moves to cover the distance
from the source to destination.
Figure 4 shows that the delay increases in BAODV, whereas the delay of TAODV is
low, which means better performance. The end-to-end delay (in milliseconds) of trusted
AODV improves by 40–50% compared to BAODV.
C. Throughput
Throughput shows the number of packets successfully received per unit of time.
In Fig. 5, it has been shown that the value of throughput is almost similar for AODV
and TAODV, but in the case of BAODV, it shows poor performance. Throughput of
trusted AODV improves results by 40–50% compared to BAODV in terms of kbps.

Fig. 4 End-to-end delay



Fig. 5 Throughput

6 Conclusion

MANET plays an essential role in several fields, so its security is a priority. This
paper studies MANET security attacks like the Blackhole and resolves the
problem using a trust-based method. The performance of the approach was eval-
uated using parameters like packet delivery ratio (PDR), end-to-end delay, and
throughput. TAODV performs well and improves the network's efficiency compared
to BAODV.
In the future, this study can be extended to build a trust-based system for MANET
against other types of attacks, such as wormhole attacks.

References

1. Vinayagam J, Balaswamy C, Soundararajan K (2019) Certain investigation on MANET security
with routing and blackhole attacks detection. Procedia Comput Sci 165:196–208. https://doi.
org/10.1016/j.procs.2020.01.091
2. Singh A, Singh G, Singh M (2018) Comparatisve study of OLSR, DSDV, AODV, DSR and
ZRP routing protocols under blackhole attack in mobile ad hoc network. In: Singh R et al
(eds) Intelligent communication, control, and devices, advances in intelligent systems and
computing, vol 624. © Springer Nature Singapore Pte Ltd., pp 1–11
3. Tsiota A, Xenakis D, Passas N, Merakos L (2019) On jamming and black hole attacks in
heterogeneous wireless networks. IEEE Trans Veh Technol 68(11):10761–10774. https://doi.
org/10.1109/TVT.2019.2938405
4. Saddiki K, Boukli-Hacene S, Gilg M, Lorenz P Trust neighbors based to mitigate the cooper-
ative black hole attack in OLSR protocol. In: Thampi SM et al (eds) SSCC 2018, CCIS969,
2019, pp 117–131
5. Keerthika V, Malarvizhi N (2019) Mitigate black hole attack using hybrid bee optimized
weighted trust with 2-opt AODV in MANET. Wireless Personal Communications, Springer
Science + Bussiness Media, LLC, part of Springer Nature 2019, pp 1–12

6. Mehetre DC, Emalda Roslin S, · Wagh SJ (2018) Detection and prevention of blackhole and
selective forwarding attack in cluster ed WSN with active trust. ClusterComputing, Springer
Science + Bussiness Media, LLC, part of Springer Nature 2018, pp 1–16
7. Arulkumaran G, Gnanamurthy RK (2017) Fuzzy trust approach for detecting blackhole attack
in mobile ad hoc network. Mobile Netw Appl, Springer Science+Bussiness Media, LLC, part
of Springer Nature 2017, pp 1–8
8. Singh M, Singh P (2016) Blackhole attack in MANET using mobile trust points with clustering,
© Springer Nature Singapore Pte Ltd. 2016 A. In: Unal et al (eds) SmartCom 2016, CCIS 628,
2016, pp 565–572
9. Singh U, Samvatsar M, Sharma A, Jain AK (2016) Detection and avoidance of unified attacks
on MANET using trusted secure AODV routing protocol. In: Symposium on colossal data
analysis and networking (CDAN), pp 1–6
10. Singh S, Mishra A, Singh U (2016) Detecting and avoiding of collaborative black hole attack
on MANET using trusted AODV routing protocol. In: Symposium on colossal data analysis
and networking (CDAN), pp 1–6
11. Hazra S, Setua SK (2014) Blackhole attack defending trusted on demand routing in ad-hoc
network. Smart Innovation, Systems and Technologies 28, © Springer International Publishing
Switzerland, pp 1–8
12. Vo TT, Luong NT, Hoang D (2019) MLAMAN: a novel multi-level authentication model and
protocol for preventing blackhole attack in mobile ad hoc network. Wireless Netw 25:4115–
4132. https://doi.org/10.1007/s11276-018-1734-z
13. Arulkumaran G, Gnanamurthy RK (2019) Fuzzy trust approach for detecting black hole attack
in mobile adhoc network. Mobile Netw Appl 24:386–393. https://doi.org/10.1007/s11036-017-
0912-z
14. Cai RJ, Li XJ, Chong PHJ (2019) An evolutionary self-cooperative trust scheme against routing
disruptions in MANETs. IEEE Trans Mob Comput 18(1):42–55. https://doi.org/10.1109/TMC.
2018.2828814
15. Issariyakul T, Hossain E (2012) Introduction to network simulator NS2, Springer Science +
Business Media, LLC, pp 1–20
16. https://tools.ietf.org/html/rfc3561
17. Uddin M, Taha A, Alsaqour R, Saba T (2017) Energy-efficient multipath routing protocol for
a mobile ad-hoc network using the fitness function. IEEE Access, vol 5, pp 10369–10381.
https://doi.org/10.1109/ACCESS.2017.2707537
Identification of Gene Communities in
Liver Hepatocellular Carcinoma: An
OffsetNMF-Based Integrative Technique

Sk Md Mosaddek Hossain and Aanzil Akram Halsana

Abstract Liver hepatocellular carcinoma (LIHC) is the most common primary
malignancy of the liver and is one of the primary contributors to cancer-related
death worldwide. The present work proposed a computational framework to dis-
cover functional gene communities in LIHC by integrating RNASeq gene expres-
sion profiles with protein-protein interaction data to elucidate the inherent complex-
ities of biomolecular mechanisms in LIHC. Here, we have proposed an offsetNMF-
based module integration technique that incorporates characteristics of both gene
co-expression modules discovered through a refined WGCNA-based algorithm and
protein complexes predicted through a parameter-free greedy approximation algo-
rithm PC2P. Biological significance analysis of the integrated gene communities
discovers several highly LIHC-associated pathways.

Keywords Liver hepatocellular carcinoma · RNASeq · Protein-protein interaction
network · Gene communities · Non-negative matrix factorization

1 Introduction

Liver hepatocellular carcinoma (LIHC) is the most common primary malignancy of
the liver that commences in the liver cells. Typically, it occurs in people suffering
from chronic liver disease and cirrhosis. Despite significant advancements in diag-
nosis and treatment, the incidence and mortality of LIHC continue to rise, and it is one
of the primary contributors to cancer-related deaths worldwide. The vast majority
of LIHC cases occur in eastern and southern Asia, and it is more common in males
than females. The advancements of high-throughput technologies like DNA microarray

S. M. M. Hossain (B)
Computer Science and Engineering, Aliah University, Kolkata, West Begal 700160, India
A. A. Halsana
Computer Science and Engineering, Jadavpur University, Kolkata, West Begal 700032, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 411
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_30

and next-generation RNASeq technology give rise to the availability of a massive
amount of gene expression data of cancer-related patients. Many computational tech-
niques have been proposed for gene co-expression network (GCN) analysis to study
the progression characteristics of different diseases, identifying disease-related gene
communities and relevant biomarkers [15, 16, 18, 26, 27]. Hossain et al. discovered
the gene modules and associated genes in pancreatic ductal adenocarcinoma (PDAC)
from time-series gene expressions through the Dirichlet process Gaussian process
mixture model (DPGP) [12]. The progression characteristics of HIV-1 has been
investigated through different module preservation statistics by constructing gene
co-expression modules in [13]. Identification of gene co-expression modules has
been carried out using the weighted gene co-expression network analysis (WGCNA)
framework to study the relevant genes associated with LIHC in [5, 11]. Nevertheless,
the inherent complexities of biomolecular mechanisms necessitate computational
tools for analyzing biological networks with integrated data collected from multiple
data sources (“omes”) [17, 30, 31].
In this context, the present work proposes a computational framework to inves-
tigate the dynamic characteristics of biomolecular systems in liver hepatocellular
carcinoma by integrating gene expression data with protein-protein interaction (PPI)
data through a non-negative matrix factorization algorithm. Initially, we evaluated
the differential expression of genes through the edgeR [28] R/Bioconductor pack-
age on the RNASeq raw transcript abundance count data collected from The Cancer
Genome Atlas (TCGA). Subsequently, the regularized logarithm (rlog) normalized
gene expression profiles of the differentially expressed (DE) genes were utilized for
GCN construction. The identification of gene co-expression modules has been car-
ried out by CoExpNets [6] that incorporates a refinement in the widely used weighted
gene co-expression network analysis (WGCNA) [21] framework via k-means as an
additional processing step to discover more biologically meaningful communities.
We obtained the protein-protein interaction (PPI) data for the DE genes from the
STRING webserver. Identification of protein complexes from the constructed PPI
network (PPIN) was performed using the protein complexes from coherent partition
(PC2P) algorithm [25] that formalizes protein complexes as biclique spanned sub-
graphs, including both sparse and dense subgraphs. A cluster assignment matrix was
then prepared from the clustering solutions obtained through the above two cluster-
ing techniques, incorporating both gene expression (transcriptomic) and protein-protein interaction (proteomic) data. Subsequently, consensus gene communities were discovered through a non-negative matrix factorization algorithm. Finally, we have performed a gene ontology (GO) and pathway-based analysis to interpret the biological significance of the identified modules. Figure 1 shows a brief overview of the overall framework adopted in the present analysis.

Fig. 1 The figure shows the overall framework adopted in the present work

2 Method

2.1 Dataset Preparation

This work focuses on two biological data types: gene expression profiles and protein-
protein interaction (PPI) data. Liver hepatocellular carcinoma (LIHC) RNASeq
dataset was obtained from The Cancer Genome Atlas (TCGA). It comprises raw counts of 56,493 Ensembl genes across 421 samples, which are categorized into 371
tumor samples and 50 normal samples. Normalized gene expression was extracted
from the raw read counts using a regularized logarithm (rlog) transformation [1].
The edgeR [28] R/Bioconductor package was used here to analyze read counts from
RNASeq gene expression profiles to evaluate differential gene expression for its
efficient statistical negative binomial (NB) modeling on count data. 1964 differen-
tially expressed (DE) official genes were identified with adjusted p-value ≤ 0.01
by comparing the tumor and normal samples. The protein-protein interaction net-
work (PPIN) of the DE genes was obtained from the STRING webserver comprising
13,210 interactions. Normalized gene expression profiles of the DE genes along with
their PPI information were taken into account for further analysis.
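As an illustration of this selection step (and not the authors' code), the short Python/pandas sketch below filters an edgeR-style results table by the adjusted p-value threshold used here; the toy table, gene symbols and column names are placeholders.

```python
import pandas as pd

# Toy stand-in for an edgeR results table (log2 fold change, FDR per gene);
# in practice this would be exported from the edgeR analysis of the TCGA counts.
results = pd.DataFrame(
    {"logFC": [4.2, -3.8, 0.3, 2.9], "FDR": [1e-6, 3e-4, 0.4, 0.008]},
    index=["REG3G", "PGC", "ACTB", "LIN28B"],
)

de_genes = results[results["FDR"] <= 0.01]                  # adjusted p-value cut-off
de_genes = de_genes.reindex(
    de_genes["logFC"].abs().sort_values(ascending=False).index)  # rank by |log2FC|
print(de_genes)
```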

2.2 Construction of GCN and Identification of Gene Modules

2.2.1 GCN Construction

A gene co-expression network (GCN) specifies the connection strength among the
participating genes indicating the correlation of their expression profiles. Initially,
we constructed the network represented by a symmetric adjacency matrix $Adj_{d \times d}$, where each element $Adj_{ij}$ denotes the connection strength between genes $i$ and $j$, and $d$ is the number of participating genes. We have used the Pearson correlation of expression profiles for each pair of genes to compute their connection strength in the present work.
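A minimal sketch of this adjacency construction in Python, using random placeholder data in place of the rlog-normalized DE-gene profiles:

```python
import numpy as np

rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 200))        # placeholder: samples x genes (d = 200)

# Adj_ij = Pearson correlation between the expression profiles of genes i and j
adj = np.corrcoef(expr, rowvar=False)    # symmetric d x d adjacency matrix
np.fill_diagonal(adj, 0.0)               # ignore self-connections
print(adj.shape)
```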

2.2.2 Adjacency Matrix Transformations

Many real-world networks, including biological networks, approximately follow the scale-free property. Therefore, we utilized the following power transformation [32] with a fixed power $\tau = 18$ to make our initial GCN scale-free:

$A_{ij} = |Adj_{ij}|^{\tau}$ (1)

Later, the scale-free GCN was augmented through the topological overlap measure (TOM)-based similarity that defines the relative inter-connection strength between each pair of nodes considering their shared neighborhood [21].
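The following hedged sketch illustrates the power transformation of Eq. (1) together with an unsigned TOM-style similarity; the expression matrix is a random placeholder, and the TOM formula follows the usual WGCNA definition rather than any code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 200))            # placeholder (samples x genes)
adj = np.abs(np.corrcoef(expr, rowvar=False))

tau = 18
A = adj ** tau                               # Eq. (1): A_ij = |Adj_ij|^tau
np.fill_diagonal(A, 0.0)

k = A.sum(axis=1)                            # node connectivities
L = A @ A                                    # shared-neighbour term l_ij
TOM = (L + A) / (np.minimum.outer(k, k) + 1.0 - A)   # standard unsigned TOM
np.fill_diagonal(TOM, 1.0)
dissTOM = 1.0 - TOM                          # dissimilarity fed to clustering
```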

2.2.3 Identification of GCN Modules

The identification of gene co-expression modules has been carried out by CoExpNets
[6] that incorporates a refinement in the widely used weighted gene co-expression
network analysis (WGCNA) [21] framework via an additional processing step to
discover more biologically meaningful gene communities. In particular, average linkage hierarchical clustering via a dynamic tree cut method with the TOM-based dissimilarity $\theta_{ij}$ was initially utilized to group the genes based on their expression profiles for identifying GCN modules. Later, the expression profiles of all genes inside each GCN module were summarized using a module eigengene (ME), which is the first column of the right singular matrix obtained from the singular value decomposition of the module expression matrix. The dissimilarity between each pair of module eigengenes, $DisSim_{I,J}$, was then computed as follows:

$DisSim_{I,J} = \frac{1}{2}\,(1 - cor(ME_I, ME_J))$, (2)

where $cor$ indicates the Pearson correlation coefficient. The dissimilarities computed as above were then used to discover the GCN meta-modules by reapplying the average linkage hierarchical clustering [20].
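A small sketch of the eigengene summary and the dissimilarity of Eq. (2), assuming the module expression matrix is arranged as genes x samples and standardized gene-wise (an assumption, since the exact preprocessing is not spelled out here):

```python
import numpy as np

def module_eigengene(module_expr):
    """module_expr: (n_genes_in_module, n_samples); returns a per-sample ME."""
    X = module_expr - module_expr.mean(axis=1, keepdims=True)
    X = X / (X.std(axis=1, keepdims=True) + 1e-12)       # standardize each gene
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[0]                                          # first right singular vector

def eigengene_dissimilarity(me_i, me_j):
    # Eq. (2): DisSim_{I,J} = (1 - cor(ME_I, ME_J)) / 2
    return 0.5 * (1.0 - np.corrcoef(me_i, me_j)[0, 1])

rng = np.random.default_rng(1)
m1, m2 = rng.normal(size=(40, 25)), rng.normal(size=(60, 25))   # two toy modules
print(eigengene_dissimilarity(module_eigengene(m1), module_eigengene(m2)))
```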
Botía et al. proposed CoExpNets [6], which uses a signed eigengene-based connectivity $kME_J(i)$ between the expression profile of each gene $i$ and the $J$th eigengene $ME_J$, with $1 - kME_J(i)$ as a distance measure, to perform an additional k-means clustering that discovers refined modules from a GCN:

$kME_J(i) = \frac{1}{2}\,(1 + cor(Ex_i, ME_J))$, (3)

where $Ex_i$ refers to the expression profile of gene $i$. CoExpNets uses a k-means algorithm that initializes the $k$ cluster centroids with the module eigengenes (ME) obtained through the previous WGCNA-based framework described above. Genes are iteratively re-assigned to a new cluster to form improved gene modules until a stopping criterion (decrease in the number of misplaced genes) is met.
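The refinement step can be sketched as below; this is a simplified reading of the CoExpNets procedure (module centroids are updated with a plain mean instead of a recomputed eigengene), not the authors' implementation:

```python
import numpy as np

def refine_modules(expr, eigengenes, max_iter=20):
    """expr: (n_genes, n_samples); eigengenes: list of (n_samples,) arrays."""
    labels = None
    for _ in range(max_iter):
        ME = np.vstack(eigengenes)                         # (k, n_samples)
        # kME_J(i) = (1 + cor(Ex_i, ME_J)) / 2 ; distance = 1 - kME_J(i)
        cor = np.corrcoef(np.vstack([expr, ME]))[: len(expr), len(expr):]
        new_labels = np.argmin(1.0 - 0.5 * (1.0 + cor), axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                          # no genes moved: stop
        labels = new_labels
        eigengenes = [expr[labels == j].mean(axis=0) if np.any(labels == j) else ME[j]
                      for j in range(len(ME))]             # crude centroid update
    return labels

rng = np.random.default_rng(2)
expr = rng.normal(size=(100, 30))                          # placeholder gene profiles
init_me = [expr[i * 25:(i + 1) * 25].mean(axis=0) for i in range(4)]
print(np.bincount(refine_modules(expr, init_me), minlength=4))
```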

2.3 Identification of Protein-Complexes from PPI Data

Structural and functional information of distinct proteins forms the backbone of the cellular system. Analyzing protein complexes is extremely important for discovering the underlying mechanisms of biological processes [14]. We identified protein complexes (PPIN modules) from the constructed PPIN (collected from STRING) using a parameter-free network-based protein complex predictor, the Protein Complexes from Coherent Partition (PC2P) algorithm [25]. PC2P formulates protein complexes on the basis of key properties of biclique spanned subgraphs (dense and sparse). A biclique spanned graph $G \equiv \langle V, E \rangle$ is defined as a graph whose node set can be partitioned into two mutually exclusive but collectively exhaustive subsets, such that $E(G)$ contains all possible edges between the partitioned node sets. A graph $G$ is biclique spanned if its complement $\bar{G}$ is disconnected, i.e., the number of connected components of $\bar{G}$ is greater than 1. PC2P uses a coherent network partition $P = \{P_1, P_2, \ldots, P_k\}$, such that the complement of each $P_i$ is a disconnected subgraph, $\forall i \in [1, k]$. A partition $P$ is obtained by eliminating all edges that connect $P_i$ and $P_j$, $1 \le i \ne j \le k$.
The greedy approach in PC2P incorporates quantitative score determination for every node $n$ over its second-order neighborhood $NB_2(n)$, i.e., the nodes at distance at most 2 from $n$. The score $CNP$, with $\eta$ being the biclique spanned subgraph density, is defined as:

$\eta = \frac{2|E_{in}|}{n_i (n_i - 1)}$ (4)

$CNP = \frac{|E_{out}|}{|E_{in}|} \cdot \frac{1}{\eta}$, (5)

where $|E_{in}|$ denotes the number of edges inside subgraph $i$, $|E_{out}|$ denotes the number of edges going out from subgraph $i$ to the rest of the network, and $n_i$ represents the total number of nodes in the $i$th subgraph. The algorithm attempts to minimize the proposed score $CNP$ by investigating $NB_2(n)$ based on its connectivity. If the complement of $NB_2(n)$ is connected, it identifies the minimum number of node removals that disconnect the complemented subgraph; otherwise, it declares the node set a biclique spanned subgraph.
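For illustration only, the score of Eqs. (4)-(5) and the biclique spanned test can be written with networkx as follows; this sketches the score for a given candidate node set and is not the full greedy PC2P algorithm:

```python
import networkx as nx

def cnp_score(G, nodes):
    nodes = set(nodes)
    sub = G.subgraph(nodes)
    n_i = sub.number_of_nodes()
    e_in = sub.number_of_edges()                  # |E_in|
    e_out = nx.cut_size(G, nodes)                 # |E_out|
    if n_i < 2 or e_in == 0:
        return float("inf")
    eta = 2.0 * e_in / (n_i * (n_i - 1))          # Eq. (4): subgraph density
    return (e_out / e_in) * (1.0 / eta)           # Eq. (5): CNP

def is_biclique_spanned(G, nodes):
    # A node set is biclique spanned if the complement of its induced
    # subgraph is disconnected.
    comp = nx.complement(G.subgraph(nodes))
    return not nx.is_connected(comp)

# Tiny usage example on a complete bipartite (biclique) graph
G = nx.complete_bipartite_graph(2, 3)
print(is_biclique_spanned(G, G.nodes), cnp_score(G, G.nodes))
```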

2.4 Detection of Optimal Number of Modules

Determining the optimal number of modules within data samples is a fundamental


issue for a clustering solution, such as the specification of k in k-means clustering.
Visual inspection of an ideal set of modules is inefficient for data with more than 2–3
dimensions. The present work utilizes the TrCovW, TraceW and Friedman cluster
validity indices [7, 9, 10, 24] with TOM-based dissimilarity for identification of the optimal number of modules in LIHC gene expression data. These indices use the maximum value of differences among the index hierarchy levels.

2.4.1 TrCovW Index

Milligan et al. proposed the TrCovW index, which depicts the trace of the within-cluster pooled covariance matrix. The optimal solution is obtained based on the highest difference scores among the index hierarchy levels [24]. The TrCovW score ($sc.trcovw$) is defined using:

$sc.trcovw = trace(covariance(D_m))$, (6)

where $D_m$ is the matrix of within-group dispersion for data clustered into $m$ modules and is defined as:

$D_m = \sum_{k=1}^{m} \sum_{i \in C_k} (v_i - c_k)(v_i - c_k)^T$, (7)

where $v_i$ is the $i$th $d$-dimensional feature vector and $c_k$ is the centroid of module $C_k$.

2.4.2 TraceW Index

It has been among the most popular validity indices used in the clustering context. This criterion increases monotonically as solutions lead to a smaller number of clusters. Thus, the optimal number of modules is determined based on the maximum value of the second difference scores [9]. The TraceW score ($sc.tracew$) is computed by

$sc.tracew = trace(D_m)$ (8)

2.4.3 Friedman Index

This index was introduced by Friedman et al. to be used as the basis for a non-hierarchical clustering technique [10], and the Friedman score ($sc.Friedman$) is computed by

$sc.Friedman = trace(D_m^{-1} B_m)$, (9)

where $B_m$ is the inter-group dispersion matrix for an $m$-clustered dataset and is expressed as:

$B_m = \sum_{i=1}^{m} num_i\,(c_i - \bar{x})(c_i - \bar{x})^T$, (10)

where $num_i$ is the number of objects in module $C_i$ and $\bar{x}$ is the centroid of the overall data matrix.
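A hedged sketch of this model-selection idea, using the within-group dispersion of Eq. (7) and a TraceW-style second-difference rule; k-means on a random placeholder feature matrix stands in for the TOM-based clustering actually used:

```python
import numpy as np
from sklearn.cluster import KMeans

def within_dispersion(X, labels):
    # Eq. (7): D_m = sum over modules of (v_i - c_k)(v_i - c_k)^T
    D = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(labels):
        V = X[labels == c] - X[labels == c].mean(axis=0)
        D += V.T @ V
    return D

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))                   # placeholder feature matrix
ks = np.arange(2, 11)
tracew = np.array([
    np.trace(within_dispersion(X, KMeans(n_clusters=k, n_init=10,
                                         random_state=0).fit_predict(X)))
    for k in ks
])

# TraceW-style selection: take k at the largest second difference of the index
best_k = ks[1:-1][np.argmax(np.abs(np.diff(tracew, 2)))]
print("suggested number of modules:", best_k)
```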

2.5 Identification of Gene Communities by Integrating GCN


Modules and Protein Complexes

Non-negative matrix factorization (NMF) is a multivariate analysis algorithm in which a matrix $V$ is decomposed into two matrices $W$ and $H$ such that all elements of $V$, $W$ and $H$ are $\ge 0$, i.e., $V \approx WH$ where $V \in \mathbb{R}^{p \times k}$, $W \in \mathbb{R}^{p \times q}$ and $H \in \mathbb{R}^{q \times k}$. NMF has an inherent clustering property [8] that automatically forms clusters of the input data $V = \{v_1, v_2, v_3, \ldots, v_n\}$. Specifically, the approximation $V \approx WH$ is obtained by minimizing the error objective function $\|V - WH\|_F$ with constraints $W \ge 0$, $H \ge 0$.

In this work, we have used the offset NMF approach of Badea et al. [4] as a consensus clustering technique to integrate GCN modules and protein complexes. This method is a modified version of the standard NMF that uses simple multiplicative updates based on Euclidean distance to fit a model including an intercept. To execute the NMF algorithm, we initially prepared two module assignment matrices ($C_{m \times n}$), separately, one from the GCN modules (GCM) and the other from the PPIN modules (PCM):

$C_{pq} = \begin{cases} 1 & \text{if gene } q \in \text{module } p \\ 0 & \text{otherwise,} \end{cases}$ (11)

where $m$ denotes the number of modules (either in GCM or in PCM) and $n$ refers to the number of genes. We prepared a single combined module assignment matrix from the above two matrices by horizontal concatenation and used it as input to the offset NMF method to discover gene meta-communities that incorporate characteristics of both gene expression profiles and PPI information. The rank for
NMF decomposition was set to the optimal number of modules detected from the
gene co-expression network.
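A simplified sketch of the consensus step: the two binary assignment matrices are concatenated and factorized at rank 7, and each gene is assigned to the community with the largest weight. scikit-learn's standard NMF is used here as a stand-in for the offset NMF of Badea et al., and the matrices are random placeholders:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_genes = 1964
gcm = rng.integers(0, 2, size=(n_genes, 4))      # placeholder GCN module assignments
pcm = rng.integers(0, 2, size=(n_genes, 265))    # placeholder protein-complex assignments

V = np.hstack([gcm, pcm]).astype(float)          # combined assignment matrix
model = NMF(n_components=7, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(V)                       # genes x 7 community weights
H = model.components_

communities = W.argmax(axis=1)                   # consensus community per gene
print(np.bincount(communities, minlength=7))
```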

3 Result

3.1 Evaluation of Differential Expression

In the present work, we have selected the top 1964 DE official genes in LIHC following the method described in Sect. 2.1. Figure 2 presents the MA (ratio intensity) plot indicating significantly up- and down-regulated genes. We found that the genes ‘REG3G’, ‘PGC’, ‘REG3A’, ‘REG1B’, ‘LGALS14’, ‘REG1A’, ‘CLPS’, ‘LIN28B’, ‘PAEP’, ‘DCAF4L2’, ‘PAGE1’, ‘COL2A1’, ‘PRSS1’, ‘CPLX2’ and ‘MAGEB2’ were the top 15 DE genes in LIHC from our edgeR analysis.

3.2 Outcomes of GCM and PCM Identification

We have obtained 4 co-expressed gene meta-modules by using the approaches mentioned in Sect. 2.2, comprising 309, 882, 503 and 270 genes, respectively. Figure 3 shows the dendrogram obtained from the WGCNA-based dynamic tree cut average linkage hierarchical clustering algorithm incorporating the module eigengene-based dissimilarity among the initially obtained gene modules. The same color was assigned to the genes obtained in a particular gene meta-module. This clustering solution was then used in the k-means algorithm as discussed in Sect. 2.2.3, leading to the final 4 gene meta-modules.

Fig. 2 The figure shows the mean-difference plot (MA plot) picturizing significantly up- and down-regulated genes in LIHC with absolute fold change ≥ 2 (axes: log2 mean expression vs. log2 fold change)

Fig. 3 The figure shows the cluster dendrogram obtained through the dynamic tree cut and merged dynamic method

Fig. 4 The figure shows the cluster validity scores for finding the optimal number of clusters in LIHC using a TrCovW, b TraceW and c Friedman indices

Additionally, we obtained 265 complexes from the PPIN of the DE genes through
the PC2P algorithm (Sect. 2.3). We discovered that the optimal number of modules for
gene communities in LIHC RNASeq data of DE genes is 7 using the TrCovW, TraceW
and Friedman cluster validity indices. Figure 4 shows the scores of the TrCovW, TraceW and Friedman cluster validity indices.

3.3 Identification of Integrated Gene Communities

We have represented the identified GCM and PCM as two separate binary module assignment matrices with dimensions 1964 × 4 and 1964 × 265, respectively. Finally, we obtained 7 consensus modules by applying the offsetNMF-based module integration technique proposed in Sect. 2.5 that incorporates both gene expression and PPI
information in LIHC. The discovered clusters contained 296, 257, 469, 738, 24, 122
and 58 genes, respectively. Figure 5 shows the first four integrated gene communities
(IGCs) depicting significantly up-, down-regulated and hub genes based on maximal
clique centrality (MCC) scores.

4 Biological Significance Analysis

Functional enrichment analysis of the obtained integrated gene communities (IGCs) was performed to obtain their biological interpretation. We identified biological processes (gene ontology) [3], KEGG pathways and disease-gene associations (DisGeNET) of the modules through the Enrichr webserver. Figure 6 shows the top two significant biological processes (BP), KEGG pathways, and disease-gene association enrichments of the identified integrated gene communities (IGCs).

Fig. 5 The figure shows the discovered first four gene communities from the NMF-based integrated clustering approach

IGC1 contained the ‘CACNA1I’, ‘KCNJ6’, ‘KCNJ9’, ‘CACNA1S’ and ‘HCN2’ genes associated with the gonadotropin-releasing hormone (GnRH) secretion pathway. Studies show higher expression of GnRH in human hepatocellular carcinoma. IGC2 contained the ‘SLC9A3’, ‘MT2A’, ‘MT1F’, ‘MT1G’, ‘MT1X’, ‘MT1H’, ‘MT1HL1’ and ‘MT1E’ genes directly associated with the ‘JAK/STAT’ pathway, abnormal activation of which promotes angiogenesis, metastasis and tumor growth in hepatocellular carcinoma (HCC). Moreover, 41 genes in IGC2 are related to diabetes and diabetes mellitus.
The occurrence of HCC is 2–3 times higher in diabetic patients [22]. IGC4 contained
‘GABRA2’, ‘GRIN2A’, ‘CACNA1B’, ‘GABRE’, ‘GABRD’, ‘GRIN1’ involved in
the nicotine addiction pathway which is highly associated with HCC. ‘MMP13’,
IGC 1. KEGG pathways: GnRH secretion; MAPK signaling pathway. Biological processes: positive regulation of epithelial cell proliferation (GO:0050679); response to peptide hormone (GO:0043434). Disease-gene associations: eye color; skin pigmentation.
IGC 2. KEGG pathways: mineral absorption; JAK-STAT signaling pathway. Biological processes: cellular response to zinc ion (GO:0071294); cellular response to copper ion (GO:0071280). Disease-gene associations: coronary arteriosclerosis; diabetes/diabetes mellitus.
IGC 3. KEGG pathways: nicotine addiction; pentose and glucuronate interconversions. Biological processes: cell-cell adhesion mediated by cadherin (GO:0044331); adherens junction organization (GO:0034332). Disease-gene associations: Crigler-Najjar syndrome type 1; cystadenocarcinoma, mucinous.
IGC 4. KEGG pathways: nicotine addiction; IL-17 signaling pathway. Biological processes: skeletal system development (GO:0001501); striated muscle contraction (GO:0006941). Disease-gene associations: Finnish congenital nephrotic syndrome; defect of vertebral segmentation.
IGC 5. KEGG pathway: transcriptional misregulation in cancer. Biological processes: negative regulation of RNA biosynthetic process (GO:1902679); purine ribonucleoside monophosphate catabolic process (GO:0009169). Disease-gene associations: malignant neoplasm of testis; adenocarcinoma.
IGC 6. KEGG pathways: p53 signaling pathway; Fanconi anemia pathway. Biological processes: mitotic sister chromatid segregation (GO:0000070); mitotic spindle organization (GO:0007052). Disease-gene associations: carcinogenesis; liver carcinoma.
IGC 7. KEGG pathways: neuroactive ligand-receptor interaction; calcium signaling pathway. Biological processes: phospholipase C-activating G-protein coupled receptor signaling pathway (GO:0007200); positive regulation of cytosolic calcium ion concentration (GO:0007204). Disease-gene associations: neuroendocrine tumors; obesity.

Fig. 6 The figure shows the top two significant biological processes (BP), KEGG pathways, and disease-gene association enrichments of the identified integrated gene communities (IGCs), summarized above

‘MMP1’, ‘IL17F’, ‘MUC5B’, ‘IL17D’, ‘MUC5AC’, ‘IL17B’ genes in IGC4 con-


tribute to the interleukin-17 (IL-17) signaling pathway promoting alcohol-induced
HCC. ‘CCNB2’, ‘CCNB1’, ‘CCNE1’, ‘RRM2’, ‘CDK1’, ‘GTSE1’ genes in IGC6
are associated with the p53 signaling pathway whose alteration affects all stages of
HCC development [19]. Patients with Fanconi anemia are more prone to tumors in
liver [23]. IGC6 contained ‘FANC1’, ‘BLM’, ‘RAD51’, ‘UBE2T’ involved in the
Fanconi anemia pathway. More than 50 genes in IGC6 were directly associated with
liver carcinoma and carcinogenesis. Several studies show the involvement of neu-
roactive ligand-receptor interaction with HCC, and IGC7 contained 43 genes having
a significant contribution to that pathway. Certain neuroendocrine tumors (NET)
which are metastasized to the liver can feature hepatocellular carcinoma (HCC)
[2]. IGC7 comprises ‘GRP’, ‘LPAR3’, ‘GCG’, ‘TACR1’, ‘NTS’, ‘BRS3’, ‘SSTR5’,
‘GPRC6A’, ‘CXCL12’, ‘SST’, ‘PPY’, ‘GAST’, ‘NTSR2’ genes related to NET.
Obesity is an established risk factor for the development of HCC [29]. 27 genes in IGC7 are associated with the development of obesity. Furthermore, we discovered that 62, 128, 50, 85, 14, 109 and 28 genes in IGCs 1–7, respectively, are directly associated with liver carcinoma and adult liver carcinoma.

5 Conclusion

In our work, we have proposed a computational framework to discover gene commu-


nities by integrating RNASeq gene expression data with protein-protein interaction
(PPI) data through offset non-negative matrix factorization (offsetNMF) to gain better

insights into biological networks. We found that the identified integrated gene communities (IGCs) are highly associated with LIHC-related biological processes and pathways. This study may be further enriched by integrating other distinct omics data and other modern machine learning algorithms. LIHC is one of the major primary liver cancers worldwide, causing millions of premature deaths every year. Further experimental study of the gene communities is needed to develop an in-depth understanding of this deadly disease. Discovery of potential biomarkers and survival analysis of the identified genes inside the gene communities may offer more precise diagnosis and therapeutic remedies for LIHC at an early stage.

References

1. Anders S, Huber W (2010) Differential expression analysis for sequence count data. Nat Pre-
cedings 1–1
2. Arista-Nasr J, Fernández-Amador JA, Martínez-Benítez B, de Anda-González J, Bornstein-
Quevedo L (2010) Neuroendocrine metastatic tumors of the liver resembling hepatocellular
carcinoma. Annals Hepatol 9(2):186–191
3. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K,
Dwight SS, Eppig JT et al (2000) Gene ontology: tool for the unification of biology. Nat Genet
25(1):25–29
4. Badea L (2008) Extracting gene expression profiles common to colon and pancreatic ade-
nocarcinoma using simultaneous nonnegative matrix factorization. In: Biocomputing. World
Scientific, pp 267–278
5. Bai KH, He SY, Shu LL, Wang WD, Lin SY, Zhang QY, Li L, Cheng L, Dai YJ (2020)
Identification of cancer stem cell characteristics in liver hepatocellular carcinoma by WGCNA
analysis of transcriptome stemness index. Cancer Med 9(12):4290–4298. https://doi.org/10.
1002/cam4.3047
6. Botía JA, Vandrovcova J, Forabosco P, Guelfi S, Sa D, Hardy K, Lewis J, Ryten CM, Weale
M (2017) An additional k-means clustering step improves the biological features of WGCNA
gene co-expression networks. BMC Syst Biol 11(1):1–16
7. Charrad M, Ghazzali N, Boiteau V, Niknafs A (2014) NbClust: an R package for determining
the relevant number of clusters in a data set. J Stat Softw 61(6):1–36
8. Ding C, He X, Simon HD (2005) On the equivalence of nonnegative matrix factorization and
spectral clustering. In: Proceedings of the 2005 SIAM international conference on data mining.
SIAM, pp 606–610
9. Edwards AW, Cavalli-Sforza LL (1965) A method for cluster analysis. Biometrics 362–375
10. Friedman HP, Rubin J (1967) On some invariant criteria for grouping data. J Am Stat Assoc
62(320):1159–1178
11. Gu Y, Li J, Guo D, Chen B, Liu P, Xiao Y, Yang K, Liu Z, Liu Q (2020) Identification of 13
key genes correlated with progression and prognosis in hepatocellular carcinoma by weighted
gene co-expression network analysis. Front Genet 11:153. https://doi.org/10.3389/fgene.2020.
00153
12. Hossain SMM, Halsana AA, Khatun L, Ray S, Mukhopadhyay A (2021) Discovering key
transcriptomic regulators in pancreatic ductal adenocarcinoma using Dirichlet process Gaussian
mixture model. Sci Rep 11(1):7853. https://doi.org/10.1038/s41598-021-87234-7
13. Hossain SMM, Khatun L, Ray S, Mukhopadhyay A (2021) Identification of key immune
regulatory genes in hiv-1 progression. Gene 792:145735. https://doi.org/10.1016/j.gene.2021.
145735
14. Hossain SMM, Mahboob Z, Chowdhury R, Sohel A, Ray S (2016) Protein complex detection in
PPI network by identifying mutually exclusive protein-protein interactions. Procedia Comput
Sci 93:1054–1060. https://doi.org/10.1016/j.procs.2016.07.309
15. Hossain SMM, Ray S, Mukhopadhyay A (2019) Identification of hub genes and key modules
in stomach adenocarcinoma using nsnmf-based data integration technique. In: IEEE 2019
international conference on information technology (ICIT), pp 331–336
16. Hossain SMM, Ray S, Mukhopadhyay A (2017) Preservation affinity in consensus modules
among stages of HIV-1 progression. BMC Bioinform 18(1):181
17. Hossain SMM, Ray S, Mukhopadhyay A (2020) Detecting overlapping gene communities dur-
ing stomach adenocarcinoma: a discrete nmf-based integrative approach. In: 2020 IEEE inter-
national conference on advent trends in multidisciplinary research and innovation (ICATMRI),
pp 1–6. https://doi.org/10.1109/ICATMRI51801.2020.9398458
18. Hossain SMM, Ray S, Tannee TS, Mukhopadhyay A (2017) Analyzing prognosis characteris-
tics of Hepatitis C using a biclustering based approach. Procedia Comput Sci 115(Supplement
C):282 – 289
19. Krstic J, Galhuber M, Schulz TJ, Schupp M, Prokesch A (2018) p53 as a dichotomous regulator
of liver disease: the dose makes the medicine. Int J Mol Sci 19(3):921
20. Langfelder P, Horvath S (2007) Eigengene networks for studying the relationships between
co-expression modules. BMC Syst Biol 1(1):1–17
21. Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis.
BMC Bioinform 9(1):1–13
22. Li X, Wang X, Gao P (2017) Diabetes mellitus and risk of hepatocellular carcinoma. BioMed
Res Int
23. Masserot-Lureau C, Adoui N, Degos F, de Bazelaire C, Soulier J, Chevret S, Socié G, Leblanc T
(2012) Incidence of liver abnormalities in Fanconi anemia patients. Am J Hematol 87(5):547–
549
24. Milligan GW, Cooper MC (1985) An examination of procedures for determining the number
of clusters in a data set. Psychometrika 50(2):159–179
25. Omranian S, Angeleska A, Nikoloski Z (2021) PC2P: parameter-free network-based prediction
of protein complexes. Bioinformatics
26. Ray S, Hossain SMM, Khatun L (2016) Discovering preservation pattern from co-expression
modules in progression of HIV-1 disease: an eigengene based approach. In: 2016 IEEE interna-
tional conference on advances in computing communications and informatics, ICACCI 2016,
Jaipur, September 21–24, 2016. IEEE, pp 814–820
27. Ray S, Hossain SMM, Khatun L, Mukhopadhyay A (2017) A comprehensive analysis on
preservation patterns of gene co-expression networks during Alzheimer’s disease progression.
BMC Bioinform 18(1):579
28. Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential
expression analysis of digital gene expression data. Bioinformatics 26(1):139–140
29. Saitta C, Pollicino T, Raimondo G (2019) Obesity and liver cancer. Annals Hepatol 18(6):810–
815
30. Song E, Song W, Ren M, Xing L, Ni W, Li Y, Gong M, Zhao M, Ma X, Zhang X, An R (2018)
Identification of potential crucial genes associated with carcinogenesis of clear cell renal cell
carcinoma. J Cell Biochem 119(7):5163–5174. https://doi.org/10.1002/jcb.26543
31. Sun M, Song H, Wang S, Zhang C, Zheng L, Chen F, Shi D, Chen Y, Yang C, Xiang Z, Liu Q,
Wei C, Xiong B (2017) Integrated analysis identifies microrna-195 as a suppressor of hippo-
yap pathway in colorectal cancer. J Hematol Oncol 10(1):79. https://doi.org/10.1186/s13045-
017-0445-8
32. Zhang B, Horvath S (2005) A general framework for weighted gene co-expression network
analysis. Stat Appl Gene Mol Biol 4(1). https://doi.org/10.2202/1544-6115.1128
Machine Learning Based Approach
for Therapeutic Outcome Prediction
of Autism Children
C. S. KanimozhiSelvi, K. S. Kalaivani, M. Namritha, S. K. Niveetha,
and K. Pavithra

Abstract Autism spectrum disorder (ASD) refers to a complex neurodevelopmental condition that occurs within the early years of life. Early intervention may help improve neurodevelopmental disturbances arising from impoverished socio-emotional interactions in the first years of life. Machine learning helps build a system that automatically improves through progressive learning and predicts outcomes. Autism spectrum disorder affects the early development of a child and continues as a lifelong disorder, which can be improved with appropriate therapy. Each autistic child has varied signs and symptoms and so requires unique therapy. With only a small number of special teachers and therapists available, it is difficult to plan and monitor the therapy progress and outcome for every child. It will be very useful if an automated expert system is available to monitor the progress and outcome of the therapeutic plan for autistic children. A simple game based therapeutic tool, aimed at the early developmental period of an autistic child, is used to treat the autistic child. The behavior of the child is monitored progressively. The monitored data and the results of the therapeutic plan can be used to enhance or modify the therapeutic plan and also help suggest therapeutic plans for autistic children with similar signs and symptoms.

Keywords Autism spectrum disorder · Machine learning · Therapy · Game tool · Outcome prediction

1 Introduction

Autism spectrum disorder is a condition affecting the early development of the child and continuing as a lifelong disorder [1]. The condition cannot be cured but can improve with appropriate therapy. These therapies may help the child function and participate in the community by reducing symptoms and improving cognitive ability and daily living skills. A proper therapy plan is needed to accomplish success. The condition can be controlled if treated at an early stage [2]. Each autistic child varies

C. S. KanimozhiSelvi (B) · K. S. Kalaivani · M. Namritha · S. K. Niveetha · K. Pavithra


Department of Computer Science and Engineering, Kongu Engineering College, Perundurai,
Erode 638060, India


in signs and symptoms, requiring a unique therapy plan for every child. As a result, therapy plans are often interdisciplinary, include parent-mediated interventions, and are tailored to the child’s specific requirements.
Behavioral intervention plans have emphasized the development of social commu-
nication skills, especially at young ages when children are naturally learning these
abilities, as well as the elimination of restricted interests and repetitive and problem-
atic behaviors. Occupational and speech therapy, as well as social skills training and
medication, may be beneficial for some children. Depending on an individual’s age,
skills, challenges, and characteristics, the optimum treatment or intervention may
change.
Due to the limited availability of expert therapeutic decision systems, choosing the appropriate therapy for a particular child is always very difficult for the therapist. Often the therapeutic cost becomes high and unaffordable for parents with a weak economic background.
Governmental institutions do not have a sufficient number of therapists, special teachers and other experts to handle the increasing number of autistic children. It is always very difficult for the small number of available special therapists to plan the therapy and monitor the progress and outcome of the therapy for every child. Parent-mediated therapy has proven benefits and is a useful method for a highly populated country like India. Parent-mediated therapy becomes possible if the therapeutic system is user-friendly and usable with minimal education and expertise, even by the children themselves [3].
A game based therapeutic application may help to improve the skills of the autistic child. However, it cannot replace real-life interaction, and the child may find it difficult to apply the learned skills in real-life situations. Autistic children, who tend to become attached to visual sensory stimuli and to stick to routines, may become addicted to the therapeutic games themselves and find it difficult to move away from the games. Parent-mediated therapy may not be possible if the parents themselves suffer from any mental difficulty, or are working parents with little time to spend with their disabled child, etc.
A simple game based therapeutic tool available for the early developmental period of an autistic child will be of great help to the parents and will support skill development during the early developmental period rather than at school age [4]. The therapist’s burden can be reduced if an expert system is available that predicts and suggests the therapeutic plan, measures the therapy progress and outcome for every autistic child, and further predicts the possible therapeutic plan for a child presenting with a similar set of autistic features.

2 Literature Review

Augmented reality (AR) applications provide an environment that combines real-world attributes and digital information. S.K. Bhatt, N.I. De Leon and Adel Al-Jumaily in a journal article [5] proposed AR based games that help improve hand–eye coordination and social interaction in children affected by autism. Two types of game applications have been identified to improve motor skills in autistic children, thereby enhancing their social skills.
Mohammed E. Hoque, Joseph K. Lane, El Kaliouby, Matthew Goodwin and Rosalind W. Picard in an article [6] have identified a novel intervention, a game-type application which customizes speech. This application uses audio processing algorithms, and analysis of speech identifies areas of deficit and provides a practice environment to develop intelligible speech. This application helps enhance communicative skills in autistic children.
Diajeng Phytanza and Erick Burhaein have proposed aqua-based activities as a therapy for autistic children [7]. This aims at improving psychological and physical ability. It also helps improve social, emotional and communicative skills. The five different types of aquatic games proposed here concentrate on frequency, intensity, time and type, abbreviated as FITT. This therapy method proves to improve eye contact and perspective skills in autistic children. Louanne E. Boyd, Kathryn E. Ringland and Oliver L. Haimson evaluated an iPad game’s impact on social skills in autism spectrum disorder [8]. The participants involved in this game were autistic children between the third and fifth grades. The evaluation considered any technical challenges and the social behavior of the participants. The evaluation results were that the game facilitated membership, partnership and friendship without a human mediator.
Grossard, Grynspan, Serret, Jouen, Bailly and Cohen proposed designing serious games to teach social interaction and emotions to autistic children [9]. They used information and communication technologies (ICTs) in therapy. They studied the available games in some databases and found that, of a total of 31 serious games, 16 were specifically aimed at training emotion recognition and 15 at training social skills. On analyzing the game usage reports, they observed a significant correlation between current and earlier reports. The main limitations of this method are that the validation procedure was not up to medical standards and the game design seems to be complicated. The future requirements were more collaboration between game designers and clinical experts, and the development of serious games that consider children with lower intellectual ability.
The task of choosing an appropriate therapeutic plan and monitoring the therapy
progress is uncertain and time consuming. Pennisi et al. [10] identified that the use of social robots in ASD therapy can have a positive effect and can perform better than humans. They identified that during therapy sessions, reduced repetitive and stereotyped behaviors were observed, and the robots also managed to improve spontaneous language. It was also identified that researchers and therapists can connect easily with the help of robots.
Wieckowski and White [11] reviewed the existing technology-based interventions to improve social communication impairments. The technologies that can be used are mobile devices, computers, robotics and virtual reality. The authors also identified that research on the use of technology for investigating impairment in the reception of non-facial communication is limited.
To improve the learning and social interaction abilities of ASD children, Bharatharaj et al. [12] designed a parrot-inspired robot and an indirect teaching technique called AMRM. The effectiveness of the proposed robot was analyzed by using an emotion recognition system and a closed-format questionnaire. Both analyses showed positive results in the acceptance of the parrot-inspired robot and AMRM.
Linstead et al. [13] used the Applied Behavior Analysis (ABA) technique for the effective treatment of autism spectrum disorder (ASD) children. Using linear regression, they studied and evaluated the autistic children’s treatment intensity and the duration of the treatment period across various treatment domains such as social, cognitive, adaptive skills, academic, language, motor, executive and play skills. The results show that the academic and language domains are highly responsive to treatment intensity and duration. The domains are based on Applied Behavior Analysis (ABA).
Adam Mourad Chekroud et al. developed a machine learning based algorithm, built on clinically rated antidepressant data, for predicting symptomatic remission after a 12-week course of citalopram. Such a system can support the evaluation of disease risk, recurrence, or suitability of treatment. The model is most effective at identifying future responders among persons with depression, and it was validated on the improvement of depression level over the next 2 weeks. The model is trained on 25 identified features by using the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) and COMED trials, and validation is done with ten-fold cross-validation. This technique is not suitable for the prediction of non-responders to the medication, and the model’s outcome prediction is suitable only at a basic level.
Linstead et al. [14] demonstrated the benefits of machine learning techniques to predict the learning outcome of behavioral therapy for children with autism spectrum disorder (ASD). Applied Behavioral Analysis (ABA) therapies depend on children’s behavioral principles of learning, motivation, stimulus control, generalization and reinforcement. This system supports the selection of high-intensity or low-intensity ABA treatment. Among the neural network approaches, the linear regression model was found best suited to the learning outcome prediction dataset. This establishes the correlation between age, gender, treatment hours and intensity of the ASD treatment.
Dvornek et al. [15] focused on selecting the best therapy for a particular ASD-affected child and on avoiding ineffective interventions that result in a loss of money and intervention time. Pivotal Response Treatment (PRT) therapies support motivation and self-initiation, and they are time consuming for training ASD children and caretakers. Therefore, the machine learning techniques random forest and tree bagging are used to suggest the best therapy among the large number of available therapies and obtain a better outcome by using baseline fMRI images.

3 Flow Diagram

Figure 1 represents the flow diagram of predicting the outcome of the therapy given to the autistic children. There are different phases, namely collecting data, building the model, allocating the treatment plan and predicting the therapy improvement. Initially, the dataset, which is in the form of rows and columns gathered from special schools and autism centers, is converted to a CSV (Comma-Separated Values) file.

Fig. 1 Flow diagram of predicting the outcome of therapy

Then, clustering of the autistic children is done, with each cluster containing children with the most similar symptoms. This is done using the K-means clustering algorithm. Treatment is then allocated based on the clusters formed. Finally, improvement is predicted based on the scores obtained from the therapy using the Pearson correlation algorithm.

4 Module Description

The data have been collected for children with autism spectrum disorder. The data collected were the answers from a parent questionnaire. A total of 84 children participated, and based on the data collected from their parents, they were clustered into five groups. The clustering was done by the K-means algorithm, in order to separate children with the most similar symptoms into different categories. After the clustering of children into groups, they were suggested therapy in the form of games. A game is proposed that would measure and also improve the severity of symptoms in the children in their listening skills [16]. This game consists of three parts, which help with different issues that arise for ASD-affected children in the listening skill area. A child affected by ASD can lack what other typically developing children have, in specific ways that are mentioned in the parents’ questionnaire, with severity that ranges from mild to highly severe. The severity can be determined by the time it takes for the child to respond to name-calling and verbal commands and to look toward the source, by whether or not the child reacts to extraneous sounds, and by whether the child can coordinate other senses as well as having a response to listening. Based on the results from the game score database, the game scores are correlated with the scores from the initial parents’ questionnaire, which is then used to determine how far a child has progressed or regressed [17].

5 Data Collection

A real-time dataset has been collected from SSA schools. Figure 1 is the sample dataset. The source of the dataset is a parent questionnaire [18], which is to be filled in by the parents. This questionnaire contains 14 categories of questions with subdivisions in each category. The 14 categories are communicative skill, listening skill, social skill, non-verbal skill, imitation, use of objects and interest, emotional response, visual response, body movement and use, level of activity, senses, adaptation to change, fear and nervousness, and intellectual and functional language skills. These questions are specific to symptoms that are identified in autism-affected children and are categorized into the skills list accordingly. The answers to these questions are obtained in the form of a rating scale (1–4) which increases with increasing severity. Further, the result of the questionnaire is converted into the required dataset.

6 Clustering and Treatment Allocation

The dataset, which consists of the scores of the 84 children, is now clustered into five groups. The clustering is necessary to identify children with closely related symptoms, so that the prescription of therapy is made easier. Figure 2 is the visual

Fig. 2 Parent questionnaire dataset sample

representation of children separated into five different groups by using a machine learning algorithm called K-means, which clustered the children based on the given dataset. K-means clusters data based on how closely each point relates to the others and how far it is from the rest. By using the algorithm, the children were separated into five groups with related symptoms and severities. The clusters were evaluated and given suitable therapy for betterment and continuous observation. This model gives a silhouette coefficient of 0.506.
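A minimal sketch of this clustering step, assuming the questionnaire ratings are arranged as one row per child and one column per category (random placeholder ratings are used instead of the real data):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Placeholder matrix standing in for the questionnaire ratings of 84 children
# across the 14 skill categories (each item rated 1-4).
rng = np.random.default_rng(7)
X = rng.integers(1, 5, size=(84, 14)).astype(float)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print("cluster sizes:", np.bincount(labels))
print("silhouette coefficient:", round(silhouette_score(X, labels), 3))
```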

7 Game Application

Here, a simple game based therapeutic tool available for the early developmental period of an autistic child will be of great help to the parents and will help skill development at an early school age. On account of this, a game is developed to enhance the listening capability in children, which is a category in the parent questionnaire. It was designed to help literate or non-literate children with ASD aged between six [19] and twelve to increase their listening skill.
Three different types of game are developed, where each game maps to and improves a symptom that is questioned to identify the level of listening ability in the autistic child. The first game of the application has categories of things like vegetables, fruits, shapes, animals, colors and birds. Children can select any of the categories mentioned, and they will be directed to the next page, where a sound of an animal or other item will be played and a total of three pictures of the specified category will be displayed. After listening to the sound, children have to choose the appropriate picture. Here, the sound can be played any number of times. If the children choose the correct picture, the score will be added, but the score will be reduced based on the number of times the sound is played. Full score will be given only when the child finds the correct picture at the first try [Fig. 3].
The second game consists of audio that is mixed with noises that play in the
middle of smooth music. The participant has to identify if they were distracted by
the noise, for which the game would record a timestamp. Depending on the time, it

Fig. 3 Output of K-means clustering of autistic children

took for the participant to recognize that there is a noise, the severity of autism for
listening skill would be determined, and the score is recorded [Fig. 4].
The third game consists of a set of pictures and the corresponding sounds related to them. The audio consists of sounds that relate to the pictures given, in any order. The participant must choose the pictures in the same order as they hear the clips from the audio file. Finally, the score is recorded [Fig. 5].
Progress of treatment can be measured frequently using the scores gained by the children, and based on that, extra care will be given to the specified children. Finally, the game score will be calculated and converted to the format of the dataset that was obtained before the clustering of children and allocation of treatment. Using these two datasets, the outcome of the treatment will be predicted using covariance.

Fig. 4 Game 2 sample question

Fig. 5 Game 3 sample question

This game will be helpful both for treating the autistic children and for measuring the enhancements from the treatment given (Figs. 6–8).

8 Monitoring Therapy Progress

The activity of children in the game will be monitored, and the therapy progress will be measured as a numerical score. The numerical score obtained from the three different parts of the game is converted to values on the scale between 1 and 4, which is equivalent to the scale in the parent questionnaire test data obtained before involving the child in the therapy. The parameters considered in the score evaluation are the total number of attempts, the score of the child in the before-treatment dataset (Fig. 1) and the number of correct and wrong answers given by the child. On the report page, the highest and most recent score of each individual game is displayed, and next the severity level of the child as far as the listening skill is considered is displayed (Fig. 6).
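The exact conversion rule is not specified here, so the sketch below only illustrates one hypothetical way such a mapping to the 1–4 severity scale could look, combining the raw game score with an attempt penalty; the thresholds and the penalty are assumptions, not values from the paper:

```python
def to_severity_scale(raw_score, max_score, attempts):
    # Hypothetical mapping: better game performance -> lower severity (1 = mild).
    penalty = 0.05 * max(attempts - 1, 0)              # small penalty per retry (assumed)
    performance = max(raw_score / max_score - penalty, 0.0)
    if performance >= 0.75:
        return 1          # mild
    if performance >= 0.50:
        return 2
    if performance >= 0.25:
        return 3
    return 4              # highly severe

print(to_severity_scale(raw_score=8, max_score=10, attempts=2))  # -> 1
```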

Fig. 6 Game 4 sample question

Fig. 7 Game 5 sample question



Fig. 8 Report sample

9 Correlation Between Datasets

Pearson’s correlation coefficient gives an approach to assess how well two sets of data are related to one another, X versus Y on a graph. The linear relationship being sought involves a straight line (y = mx + b); however, a correlation coefficient can also be determined using various formulas, such as polynomials.
Data collected after involving the autistic child in games relevant to the skills in which the child is deficient are considered. The new dataset must be correlated [20] with the original dataset that was collected at the beginning of the therapy. Pearson’s correlation coefficient has been used to correlate the initial stage of the children with the progress that has been made after the game was suggested. The Pearson correlation algorithm is applied to the two datasets, and it creates the correlation coefficient for each of the data points as shown in Fig. 9. Here, a negative correlation coefficient indicates that the treatment plan has proven successful in treating the child [Fig. 10]. This is done to continually monitor the progress made by the children, and whether or not the plan has created an impact on them. If the current plan does not work, another plan would be suggested.
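A minimal sketch of this correlation step with pandas, assuming one row per child and one column per item in both datasets (random placeholder scores are used here):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
items = [f"Q{i}" for i in range(1, 15)]
before = pd.DataFrame(rng.integers(1, 5, size=(84, 14)), columns=items)
after = pd.DataFrame(rng.integers(1, 5, size=(84, 14)), columns=items)

# One Pearson coefficient per child, across the questionnaire/game items
coeffs = before.corrwith(after, axis=1, method="pearson")

improved = coeffs[coeffs < 0].index          # negative coefficient read as improvement
print(coeffs.head())
print("children flagged as improved:", len(improved))
```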

Fig. 9 Correlation output

10 Conclusion

Autism spectrum disorder is a condition affecting the early development of the child and continuing as a lifelong disorder. However, the condition cannot be fully cured but can improve with appropriate therapy. A proper therapy plan is needed to accomplish success. Each autistic child varies in signs and symptoms, requiring a unique therapy plan for every child. Due to the limited availability of expert therapeutic decision systems, choosing the appropriate therapy for a particular child is always very difficult for the therapist. Often the therapeutic cost becomes high and unaffordable for parents with a weak economic background. Governmental institutions do not have a sufficient number of therapists, special teachers and other experts to handle the increasing number of autistic children. It is always very difficult to plan the therapy and monitor the progress and outcome of the therapy for every child due to the small number of available special therapists.

Fig. 10 Therapy outcome

Parent-mediated therapy has proven benefits and is a useful method for a highly populated country like India. Parent-mediated therapy will be possible if the therapeutic system is available in a user-friendly form and is usable with minimal education and expertise, even by the children themselves. A simple game based therapeutic tool available for the early developmental period of an autistic child will be of great help to the parents and will help skill development during the early developmental period rather than at school age. The therapist’s burden can be reduced if an expert system is available to predict and suggest the therapeutic plan and to measure the therapy progress and outcome of the therapy for every autistic child, and further to predict the possible therapeutic plan for a child presenting with a similar set of autistic features.
In this paper, a game application has been developed considering only the listening skill. Further therapeutic game tools have to be developed for the other 13 categories of skills considered in the parent questionnaire. By allowing several autistic children to use the developed tool and by monitoring their progress, more statistical data on the outcome of the treatment will be collected, processed and used to train the machine learning model to suggest more accurate treatment plans in the future.

References

1. Bernardes M, Barros F, Simoes M, Castelo-Branco M (June 2015) A serious game with virtual
reality for travel training with autism spectrum disorder. In: 2015 International conference on
virtual rehabilitation (ICVR). IEEE, pp 127–128
2. De Urturi ZS, Zorrilla AM, Zapirain BG (July 2011) Serious game based on first aid education
for individuals with autism spectrum disorder (ASD) using android mobile devices. In: 2011
16th International conference on computer games (CGAMES). IEEE, pp 223–227
3. Hiniker A, Daniels JW, Williamson H (June 2013) Go go games: therapeutic video games for
children with autism spectrum disorders. In: Proceedings of the 12th international conference
on interaction design and children. pp 463–466
4. Malinverni L, Mora-Guiard J, Padillo V, Valero L, Hervás A, Pares N (2017) An inclusive design
approach for developing video games for children with autism spectrum disorder. Comput Hum
Behav 71:535–549
5. Bhatt SK, De Leon NI, Al-Jumaily A (2017) Augmented reality game therapy for children
with autism spectrum disorder. Int J Smart Sens Intell Syst 7(2)
6. Hoque ME, Lane JK, El Kaliouby R, Goodwin M, Picard RW (2009) Exploring speech therapy
games with children on the autism spectrum
7. Phytanza DTP, Burhaein E (2019) Aquatic activities as play therapy children autism spectrum
disorder. Int J Disabil Sports Health Sci 2(2):64–71
8. Boyd LE, Ringland KE, Haimson OL, Fernandez H, Bistarkey M, Hayes GR (2015) Evaluating
a collaborative iPad game’s impact on social relationships for children with autism spectrum
disorder. ACM Trans Accessible Comput (TACCESS) 7(1):1–18
9. Grossard C, Grynspan O, Serret S, Jouen AL, Bailly K, Cohen D (2017) Serious games to
teach social interactions and emotions to individuals with autism spectrum disorders (ASD).
Comput Educ 113:195–211
10. Pennisi P, Tonacci A, Tartarisco G, Billeci L, Ruta L, Gangemi S, Pioggia G (2016) Autism
and social robotics: a systematic review. Autism Res 9(2):165–183
11. Wieckowski AT, White SW (2017) Application of technology to social communication
impairment in childhood and adolescence. Neurosci Biobehav Rev 1(74):98–114
12. Bharatharaj J, Huang L, Mohan R, Al-Jumaily A, Krägeloh C (2017) Robot-assisted therapy
for learning and social interaction of children with autism spectrum disorder. Robotics 6(1):4
13. Linstead E et al (2017) An evaluation of the effects of intensity and duration on outcomes cross
treatment domains for children with autism spectrum disorder. Transl Psychiatry 7(9):e1234
14. Linstead E, et al. (2015) An application of neural networks to predicting mastery of learning
outcomes in the treatment of autism spectrum disorder. 2015 IEEE 14th International
conference on machine learning and applications (ICMLA). IEEE
15. Dvornek NC, et al. (2018) Prediction of autism treatment response from baseline fmri using
random forests and tree bagging. arXiv preprint arXiv:1805.09799
16. Kasari C, Gulsrud A, Freeman S, Paparella T, Hellemann G. Longitudinal follow-up of children
with autism receiving targeted interventions on joint attention and play. J Am Academy Child
Adolesc Psychiatry 51(5):12
17. Omar KS, Mondal P, Khan NS, Rizvi MRK, Islam MN (2019) A machine learning approach
to predict autism spectrum disorder. In: 2019 International conference on electrical, computer
and communication engineering (ECCE). pp 1–6. https://doi.org/10.1109/ECACE.2019.8679454
18. Fletcher-Watson S, Pain H, Hammond S, Humphry A, McConachie H (2016) Designing for
young children with autism spectrum disorder: a case study of an iPad app. Int J Child-Comput
Interact 7:1–14

19. Wei X, Wagner M, Christiano ER et al (2014) Special education services received by students
with autism spectrum disorders from preschool through high school. J Spec Educ 48:167–179
20. Usta MB, Karabekiroglu K, Sahin B, Aydin M, Bozkurt A, Karaosman T, Aral A, Cobanoglu C,
Kurt AD, Kesim N, Sahin İ (2019) Use of machine learning methods in prediction of short-term
outcome in autism spectrum disorders. Psychiatry Clin Psychopharmacol 29(3):320–325
An Efficient Implementation of ARIMA
Technique for Air Quality Prediction

Rudragoud Patil, Gayatri Bedekar, Parimal Tergundi, and R. H. Goudar

Abstract Among the natural resources required for the survival of living things, air is the most vital, so good air quality is essential for existence. Yet air is being polluted at an alarming rate, driven largely by increased industrialization and the heavy use of vehicles and machines. Air pollution disrupts natural life cycles as well as human life, and it causes both short-term and long-term health effects in all human beings. Air pollution is therefore one of the most pressing concerns, and it can be addressed through continued research and the efficient use of machine learning methods. In this work, the auto-regressive integrated moving average (ARIMA) model is used to predict air quality (air pollution levels) based on machine learning techniques. The ARIMA model is implemented in Python, and the results show that the implemented method performs well in predicting air quality.

Keywords Machine learning · Air pollution · Life cycle · ARIMA method

1 Introduction

Polluted air and its impacts are among the critical challenges faced by humanity as a result of globalization, accelerated industrialization, and urban development. Urbanization is one of the root causes of the growth in air pollution, which has a major impact on public health. The main air pollution metric is PM2.5; these small, light particles are

R. Patil · G. Bedekar (B) · P. Tergundi · R. H. Goudar


Department of Computer Science and Engineering, Gogte Institute of Technology, Belagavi,
Karnataka, India
R. Patil
e-mail: rspatil@git.edu
P. Tergundi
e-mail: pvtergundi@git.edu


behind a 4 to 8% increase in lung cancer and cardiopulmonary diseases [1]. Although this growing public health concern has been addressed through various policies and services developed by cities, their benefits vary. Air pollution forecasting has therefore been the focus of work in many different areas, such as computer science, statistics, and environmental science [2]. To predict future air emissions accurately, the usual approach is to use data covering long periods from past years to determine emission levels. A systematic analysis is needed, and it can be organized into four stages. In the first stage, the current physical system is examined and understood; here, either a new system is planned or an existing system is replaced. In the next stage, it is determined how the existing system can be physically implemented. After that comes the essential logical system, and the last stage is the development of the required system. The system is then prepared on the basis of this analysis.
The existing systems used for forecasting air quality have limitations such as largely manual data entry, no preprocessing of data in the encoding methods, no provision for forecasting at an improved level of accuracy, and loss of information. The proposed system, in contrast, builds a predictive model and uses the ARIMA algorithm for accurate air quality prediction. High performance, better accuracy, and ease of model building are the advantages of the proposed system [3]. Based on the C4.5 algorithm and on regression, deep learning, and recurrent neural network techniques, a quicker and more trustworthy air quality prediction model has been suggested [4]. Techniques with high detection accuracy are integrated into our air quality prediction method, and at the end of this paper a comparison with other machine learning techniques is made to validate the results [5].

2 Related Work

An extended long short-term memory neural network model, which implicitly examines spatiotemporal correlations, has been proposed for predicting the concentration of air pollutants; experiments and analysis of the results show that the long short-term memory extended (LSTME) model outperforms many other statistical approaches [6]. That model also offers a method of forecasting concentrations of air pollutants at numerous scales, and its output was well suited to long-term statistical analysis. In another study, five predictive models, support vector machine (SVM), k-nearest neighbors (KNN), Naive-Bayes classifier, random forest, and neural network, were established; the best neural network among the five machine learning models produced 99.56% accuracy with a log loss of 0.0543 [7]. A further study used a combination of classification algorithms to calculate the cumulative rolling eight-hour concentration of contaminants in a restricted area and to find sensor clusters showing similar patterns of pollutant concentration across the city over multiple years; one of the most common deep learning methods, the recurrent neural network (RNN), was used there to predict air quality [8]. Reducing the error between predicted and actual data is the aim of that model. The four advanced regression methods utilized

here for air quality and pollution prediction are random forest regression, decision tree regression, multi-layer perceptron regression, and gradient boosting regression. MAE and RMSE have been used as parameters for comparing the regression models with respect to processing time and sample size [9]. In addition, the best model was determined by considering the processing time and the lowest error rate for each form, with random forest regression performing best among the four approaches.
Algorithms such as decision tree, random forest, and support vector machine [10] have been used in these tests. As per the experimental tests, the accuracy increases with the rise in the number of attributes [11], and a combined model outperforms the individual models. Among the various classification and regression methods used for predicting the air quality index (AQI) of the main contaminants, including PM10, O3, PM2.5, NO2, CO, and SO2, ANNs and support vector regression are best suited for forecasting the air quality of New Delhi; mean absolute error, Rd2, and mean square error have been used to evaluate these methods [12]. BLSTM and IDW techniques have been proposed for spatiotemporal forecasts of air quality at numerous time granularities [13]. The LSTM network's forecast results are delivered at hourly, daily, and weekly granularity over numerous periods of time, and experimental results show that better predictive performance for PM2.5 concentration can be achieved with that model [14]. The WNN model developed for short-term forecasting of PM2.5 concentration has some strong features compared with other models [15, 16].
The objectives of the ARIMA model are listed here.
1. Examination of the factors affecting air quality and increasing air pollution.
2. Recommending and creating a procedure for measuring air pollution quickly and accurately.
3. Use of AI (the ARIMA model) to train the system so that air quality can be identified.
4. Recommending a system that suggests a method to decrease the variables degrading air quality.
5. Verification of the results obtained against certain machine learning techniques through analysis.
6. Planning a representation model for highly dense areas having confirmed or assumed but still low air quality.
7. Suggesting the most reliable information for assessing the population at risk from exposure to poor air quality.

3 System Architecture

A time series ARIMA prediction model with external feedback is utilized here as the machine learning technique for forecasting air quality. The code is written in Python, and the dataset used here belongs to the city of Delhi. Figure 1 below shows the overall basic system architecture [17].

Fig. 1 Basic system architecture

Data selection, loading, and preprocessing

In the data selection process, the dataset used for air quality forecasting is chosen; it includes attributes such as CO, PT08.S1, NMHC, and so on. Data preprocessing then eliminates unwanted data from the dataset, and an imputer library is used to remove null values such as missing readings.
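A minimal sketch of this step, assuming the dataset is available as a CSV file and using hypothetical column names such as CO(GT) and PT08.S1(CO):

```python
import pandas as pd
from sklearn.impute import SimpleImputer

# Load the air quality dataset (file name and column names are assumptions).
data = pd.read_csv("air_quality.csv", parse_dates=["Date"])

# Keep only the pollutant/sensor attributes used for forecasting.
features = ["CO(GT)", "PT08.S1(CO)", "NMHC(GT)"]
X = data[features]

# Replace null/missing readings with the column mean using an imputer.
imputer = SimpleImputer(strategy="mean")
X_clean = pd.DataFrame(imputer.fit_transform(X), columns=features)

print(X_clean.describe())
```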

Feature processing, target preparation, and model preparation

The target is set by combining hourly values and standardizing them over n-hour windows. In the next stage, model preparation, regression machine learning techniques and the ARIMA machine learning algorithm are applied.
Divide data into test data and train data
For cross-validation, the available data are split: one portion is used for developing the predictive model, whereas the other part is used for assessing the model's performance. Partitioning the data into training and testing sets is an important part of assessing a data mining model; most of the data are used for training, whereas a smaller part is used for testing [18].
Classification using ARIMA algorithm
ARIMA introduces the concept of integration (differencing) and generalizes the simpler auto-regressive moving average model. Time series data can also be expressed as a supervised learning problem: by using previous time steps as input and later time steps as the output variable, the data can be restructured as a prediction problem [19].
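A minimal sketch of the split-and-fit step, assuming a univariate pollutant series (column name hypothetical) and the statsmodels ARIMA implementation; the (2, 1, 2) order is illustrative only:

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Univariate series of one pollutant (file and column names are assumptions).
series = pd.read_csv("air_quality.csv", parse_dates=["Date"], index_col="Date")["CO(GT)"]

# Most of the data for training, a smaller part held out for testing.
split = int(len(series) * 0.8)
train, test = series.iloc[:split], series.iloc[split:]

# Fit an ARIMA(p, d, q) model; the order chosen here is purely illustrative.
model = ARIMA(train, order=(2, 1, 2))
fitted = model.fit()

# Forecast over the test horizon and compare against the held-out values.
forecast = fitted.forecast(steps=len(test))
print(pd.DataFrame({"actual": test.values, "predicted": forecast.values}).head())
```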
System testing and training
Prior to the operational phase of the system, testing is carried out to confirm its precision and efficacy; one way is to check the system against the specific aim for which it was designed [20], and it should be demonstrated that every function works completely. System implementation covers the actual installation of the package in its real environment, to the satisfaction of the intended users and of device operation. Users should first receive basic computer training and then be trained on the new application software so that they gain a working idea of the new program. Operational documentation is needed to give an overview of the entire operation of the system, and system maintenance is necessary to keep the system adaptable to change.

4 Implementation

A. Data acquisition and filtering:

Data retrieval was performed in this phase for neural network training. The data source is the US EPA, which provides the daily readings (mean, first max, max hourly, AQI) of four major pollutants: NO2, CO, SO2, and O3. US pollution data for 2000–2016 were reduced to 2010–16, since 16 years of data cannot be processed with the limited computational power available.

B. Build neural network:

A feed-forward ANN model is built to predict the AQI for a given region. A multi-layer perceptron (MLP) with one input layer, one hidden layer, and one output layer was constructed and executed. The single-hidden-layer architecture was chosen to reduce the complexity of the neural network and to improve computational efficiency [21]. The input parameters were three metrics per pollutant (the mean, first max hour, and first max value), and the output parameter was the AQI. Because a neural network model [22] compares inputs and receives feedback from outputs during training, precise dataset selection is vital [23]. (A minimal sketch combining this step with the split below is given after this list.)
C. Train neural network:
Java on the NetBeans IDE was used to execute the proposed model in this segment, and 70% of the data was used for training.
D. Validate neural network:
The remaining 30% of the data was used for the network's final validation.
E. Component prediction using ARIMA:
The ARIMA time series prediction model was used in this module to forecast future input values. These values were fed into the network in order to obtain the final output values of AQI [23] (Fig. 2).
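The authors implement this network in Java on NetBeans; purely as an illustration of the single-hidden-layer design and the 70/30 split (feature layout and placeholder data are assumptions of mine), an equivalent Python sketch could be:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Hypothetical feature matrix: mean, first max hour, first max value per pollutant.
rng = np.random.default_rng(0)
X = rng.random((500, 12))          # 4 pollutants x 3 metrics (placeholder data)
y = rng.random(500) * 500          # AQI target values (placeholder data)

# 70% for training, 30% kept aside for final validation, as described above.
X_train, X_val, y_train, y_val = train_test_split(X, y, train_size=0.7, random_state=0)

# Multi-layer perceptron with a single hidden layer to keep the network simple.
mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)

print("validation R^2:", mlp.score(X_val, y_val))
```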

5 Results and Discussion

(1) Pictorial representation of attribute values affecting air quality from dataset of
9350 records (Fig. 3).
(2) References used to implement air quality prediction (Fig. 4).
(3) The following reference breakpoints are used for implementing air quality prediction. The AQI categories are PM10 (24 h.), PM2.5 (24 h.), NO2 (24 h.), O3 (8 h.), CO (8 h.), SO2 (24 h.), NH3 (24 h.), and PB (24 h.) [24]. (A small categorization sketch based on these breakpoints is given at the end of this section.)
Ranges are as follows:
• For PM10 (24 h.)—range of 0 to 50 is good; 51 to 100 is satisfactory; 101
to 250 is moderately polluted; 251 to 350 is poor; 351 to 430 is very poor,
and 430 above is severe.
• For PM2.5 (24 h.)—range of 0 to 30 is good; 31 to 60 is satisfactory; 61 to
90 is moderately polluted; 91 to 120 is poor; 121 to 250 is very poor, and
250 above is severe.
• For NO2 (24 h.)—range of 0 to 40 is good; 41 to 80 is satisfactory; 81 to
180 is moderately polluted; 181 to 280 is poor; 281 to 400 is very poor, and
400 above is severe.
• For O3 (8 h.)—range 0 to 50 is good; 51 to 100 is satisfactory; 101 to 168 is
moderately polluted; 169 to 208 is poor; 209 to 748 is very poor, and 748
above is severe.

Fig. 2 Implementation
process

• For CO (8 h.)—range of 0 to 1 is good; 1.1 to 2 is satisfactory; 2.1 to 10 is


moderately polluted; 10 to 17 is poor; 17 to 34 is very poor, and 34 above
is severe.
• For SO2 (24 h.)—range of 0 to 40 is good; 41 to 80 is satisfactory; 81 to
380 is moderately polluted; 381 to 800 is poor; 801 to 1600 is very poor,
and 1600 above is severe.
• For NH3 (24 h.)—range of 0 to 200 is good; 201 to 400 is satisfactory; 401
to 800 is moderately polluted; 801 to 1200 is poor; 1200 to 1800 is very
poor, and 1800 above is severe.
• For PB (24 h.)—range of 0 to 0.5 is good; 0.5 to 1 is satisfactory; 1.1 to 2 is
moderately polluted, 2.1 to 3 is poor; 3.1 to 3.5 is very poor, and 3.5 above
is severe.

Health Impacts related to AQI:


• In range 0 to 50 (Good)—there are nominal impacts.
• In range 51 to 100 (Satisfactory)—there are minor breathing issues

Fig. 3 Pictorial representation of attribute values affecting air quality from dataset of 9350 records

Fig. 4 Time series dataset with different 15 attributes affecting air quality

Fig. 5 ARIMA model results

• In range 101 to 200 (Moderately polluted)—breathing uneasiness occurs in people having lung and heart disease.
• In range 201 to 300 (Poor)—breathing uneasiness occurs on extended exposure and in people having heart disease.
• In range 301 to 400 (Very poor)—there is an evident impact on people having lung, heart, and respiratory disease.
• In range 401 to 500 (Severe)—there are significant health issues in people having lung and heart disease, and respiratory problems are faced even by healthy people.
(4) ARIMA Model Result in Brief (Fig. 5).
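The PM2.5 breakpoints listed above can be turned into a small lookup; the function below is a hypothetical illustration of mine (only the ranges come from the text):

```python
def pm25_category(value_24h):
    """Map a 24-hour PM2.5 reading to the AQI category described above."""
    breakpoints = [
        (30, "Good"),
        (60, "Satisfactory"),
        (90, "Moderately polluted"),
        (120, "Poor"),
        (250, "Very poor"),
    ]
    for upper, label in breakpoints:
        if value_24h <= upper:
            return label
    return "Severe"

# Example: a 24-hour PM2.5 reading of 75 falls in the "Moderately polluted" band.
print(pm25_category(75))
```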

6 Conclusion

A detailed review of 24-day air quality predictions for Delhi, i.e., RH value predictions, is produced and improved through a machine learning process. The full dataset is used for checking and predicting the next 24 days of air quality. The ARIMA time series model is adapted with various features, and the model's forecast results are estimated within the machine learning process. According to the experimental tests, prediction precision improves as usability improves. All 15 dataset attributes are used during training, and the results from a large model trained on a big dataset are better than those from a model built on a limited dataset.

7 Future Scope

By combining the air quality input dataset with a human health dataset, we can forecast whether the environmental conditions are good or whether they will increase the danger of viral infections to the human body.

References

1. Saba A, Asghar MN (2017) Comparative analysis of machine learning techniques for predicting
air quality in smart cities. IEEE. https://doi.org/10.1109/ACCESS.2019.2925082
2. Bo Liu L (2016) Forecasting PM2.5 concentration using spatio-temporal extreme learning
machine. In: 2016 15th IEEE international conference on machine learning and applications.
Beijing, China
3. Pooja B (2019) Air quality prediction using machine learning algorithms. https://doi.org/10.
7753/IJCATR0809.1006
4. Kong T, Wang Y (2017) Air quality predictive modeling based on an improved decision tree
in a weather-smart grid. https://doi.org/10.1109/ACCESS
5. Weizhen L (2018) Air pollutant parameter forecasting using support vector machines. City
University of Hong Kong, Hong Kong, Department of Building and Construction
6. Li X (2017) Long short-term memory neural network for air pollutant concentration predictions:
method development and evaluation. Environ Pollut 231:997–1004
7. Amado TM (2018) Development of machine learning-based predictive. Proceedings of
TENCON 2018–2018 IEEE region 10 conference. pp 28–31
8. Ayyalasomayajula H (2016) Air quality simulations using big data programming models. In:
IEEE second international conference on big data computing service and applications 2016
9. Kuo J (2019) Deep learning-based approach for air quality forecasting by using recurrent
neural network with gaussian process in Taiwan. In: 2019 IEEE 6th international conference
on industrial applications and engineering
10. Li X (2017) Long short-term memory neural network for air pollutant concentration predictions:
method development and evaluation. Environ Pollut 231(Pt 1):997–1004
11. Song L (6–11 July, 2014) Spatio-temporal PM2.5 prediction by spatial data aided incremental
support vector regression. In: 2014 international joint conference on neural networks (IJCNN).
Beijing, China
12. Zhang S. Prediction of Urban PM2.5 concentration based on wavelet neural network. 978-1-
5386-1243-9/18/$31.00_c 2018 IEEE
13. Lee MH (2012) Seasonal ARIMA for forecasting air pollution index: a case study. Am J Appl
Sci 9(4):570–578
14. Salemdawod A (2017) Water and air quality in modern farms using neural network. ICET2017.
Antalya, Turkey
15. Qi Z, Deep air learning: interpolation, prediction, and feature analysis of fine-grained air quality.
IEEE Trans Knowl Data Eng
16. Soundari AG, Jeslin JG, Akshaya AC (2019) Indian air quality prediction and analysis using
machine learning. Int J Appl Eng Res 14(11):181–186. ISSN 0973-4562
17. Ip F (2010) Forecasting daily ambient air pollution based on least squares support vector
machines. In: Proceedings of the 2010 IEEE international conference on information and
automation. Harbin, China
18. Septiawan WM (2018) Suitable recurrent neural network for air quality prediction with
back propagation through time. In: 2018 2nd international conference on informatics and
computational sciences (ICICoS)

19. Zhang C (2017) Early air pollution forecasting as a service: an ensemble learning approach.
In: 2017 IEEE 24th international conference on web services. Beijing, China
20. Xia X (2015) A comprehensive evaluation of air pollution prediction improvement by a machine
learning method. In: 2015 IEEE international conference on service operations and logistics,
and informatics (SOLI). Beijing, China
21. Azzouni A, Pujolle G, NeuTM: a neural network-based framework for, LIP6/UPMC. Paris,
France
22. Tapale MT, Goudar RH, Birje MN et al (2020) Utility based load balancing using firefly
algorithm in cloud. J Data, Inf Manag 2:215–224. https://doi.org/10.1007/s42488-020-000
22-2
23. Rijal N (2018) Ensemble of deep neural networks for estimating particulate matter from images.
In: 2018 3rd IEEE international conference on image, vision and computing
24. Desai NS, IoT based air pollution monitoring and predictor system on Beagle bone black
A Survey on Image Emotion Analysis
for Online Reviews

G. N. Ambika and Yeresime Suresh

Abstract Emotions are sentiments, opinions, and feelings that the public expresses through text, images, and videos. Opinion analysis of Internet data is attracting growing research interest, as people provide feedback on products through reviews and images on platforms such as Instagram, Facebook, Twitter, and other online Websites. Most prior work has dealt with processing sentences, and only a finite amount of research focuses on analyzing the opinions carried by image information. Image emotion topics are often ANPs, i.e., adjective noun pairs, manually annotated tags for Internet imagery that help predict the opinions or emotions people convey through pictures. The main aim is to predict the emotions of images that are not labeled. To address this issue, deep learning methods are utilized for the opinion analysis of images, since deep learning techniques are capable of successfully understanding the behavior of images.

Keywords Convolutional neural network · Image processing · Deep learning


techniques · Image classification

1 Introduction

Currently, the public shares more and more information on social Websites through images about the places, products, or restaurants they visit every day, or depicts emotions in the form of emojis, pictures, and videos. Analyzing and processing such information from social Websites and photo-sharing networks like Flickr, Twitter, Instagram, Snapchat, etc., provides insight into the common emotion of the public. It would also be useful to know the emotion an image depicts in order to predict emotional tags, like happiness or sadness, since a post with visual information often

G. N. Ambika (B)
Department of CSE, BMSIT & M, Yelahanka 560064, India
e-mail: ambikagn@bmsit.in
Y. Suresh
Department of CSE, BITM, Ballari 583104, India


contains only a short textual explanation or no words at all. In such content, the visual characteristics express most of people's opinion or sentiment. Further, images can overcome language boundaries and are easier to interpret. Figure 1 shows several pictures collected from different social networks in which various emotions, such as happiness and sadness, are articulated. The study of image emotion analysis is still at an initial stage. Interpreting the emotion in an image is difficult for various reasons: image opinion analysis involves the ability to identify objects, scenes, actions, and people's judgments, and handcrafting characteristics from images to predict opinions requires a large amount of human effort and time. Deep learning models, in contrast, rely on a huge quantity of training data, and such image collections are complicated to gather. Deep learning is a subfield of machine learning that makes machines sufficiently intelligent that the computer is able to learn from experience and understand the world in terms of a hierarchy of concepts. Machines obtain information through real-world experience, without human help, to understand the entire state or to make decisions. The word "deep" refers to the number of hidden layers in the neural network. Such a representation is trained on a huge amount of labeled data and uses a neural network architecture in which the characteristics and parameters are taken directly from the provided data without any individual interference. Deep learning plays an important role in image opinion analysis by providing different methods: convolutional neural network (CNN), deep neural network (DNN), region neural network (RNN), and deep belief network (DBN).
Deep learning can be viewed as a framework that produces precise learning parameters for the classification of images. The main aim of this manuscript is to study and analyze various deep learning architectures, specifically the deep neural network (DNN), convolutional neural network (CNN), region neural network (RNN), and fast R-CNN. Section 2 of the manuscript describes the studies implemented so far for image opinion analysis using the above-mentioned methods. Section 3 of the

Fig. 1 Sample images as reviews for restaurant



manuscript analyzes the performance and restrictions of the methods described in Sect. 2, and Sect. 4 provides the conclusion.

2 Related Work

Several investigators have explored different methods for analyzing image emotions, and the results of machine learning algorithms are significant. Among the various machine learning methods, techniques that rely on deep learning are the best for image opinion examination. This section presents a few important studies carried out by investigators using deep learning methods, along with their results.

2.1 Deep Neural Network (DNN)

Deep neural networks can be applied both to image emotion analysis and to textual opinion analysis. A neural network has many layers: imagery is first given to the input layer and then processed to produce results via the output layer. Between these layers there are many hidden layers for further processing of the input image, and it is because of these many hidden layers that the network is called a deep neural network. The machine understands the image as rows and columns of pixels, each pixel holding some value that defines an activation. Every neuron is interconnected with other neurons, and the activations of one layer determine the activations of the next layer. The main aim is to link the pixels of the image into edges, the edges into sub-patterns, and finally to join the recognized patterns into an image representation for analyzing emotions.
The paper [1] presented techniques built around an image consistency method that resolves whether the image data and the text information are consistent with each other. The manuscript includes models such as the convolutional neural network, which is well suited for image emotion analysis, and deep belief networks are utilized for unlabeled data to overcome the limitation of unlabeled images. Their studies convey that deep learning methods perform better than the support vector machine (SVM).
The study [2] implemented a new BDMLA, which exploits bidirectional attention plus multilevel associations among image and text information for classification. The method focuses on the relevant, representative image regions and uses social network images for emotion analysis.

2.2 Convolutional Neural Network (CNN)

A convolutional neural network is a feed-forward neural network mainly applied in image processing, image classification, and image prediction; one of its key applications is image and visual analysis. A series of operations is carried out for image emotion analysis with a convolutional neural network: a convolutional layer is followed by a nonlinear layer, which is followed by a pooling layer and then a fully connected layer. The first layer for image classification is the convolutional layer, where the imagery is given as input. Reading of the image begins from the top left corner, and the image is converted into a matrix that is scanned by filters. There can be many convolutional layers, with the output of one layer serving as the input to the next. The second layer in the pipeline is the nonlinear layer, where an activation function gives the CNN its nonlinear behavior. After the nonlinear layer there is a pooling layer, which decreases the workload by reducing the size of the image characteristics when the provided image is large; characteristics already recognized by the model during an earlier convolution operation are not processed again for classification, and this process is called down-sampling or subsampling. After the pooling layer, if a result is still expected, a fully connected layer is added, and the results of the convolutional network are then considered (Fig. 2).
The paper [3] describes image–text consistency, which is determined using a multimodal emotion analysis technique; the technique identifies the relationship between the images and the accompanying sentences, and SentiBank is used for describing visual concepts.
In [4], deep multimodal attentive fusion (DMAF) is used; it captures the discriminative features and the internal relationship of the visual representation. The authors present a technique based on a convolutional neural network architecture for text and image emotion analysis in order to perform multimedia emotion analysis, using images from Twitter and Tumblr; the data contain both positive and negative reviews.

Fig. 2 Basic convolutional neural network



This study [5] implemented the opinion analysis foe text sentiment as well as
image. This model based on Facebook datasets. Dataset consists of both positive
as well as negative data. Technique implemented provides comparison of the text
between CNN and SVM. They concluded that convolutional neural network has
performed effectively best compare to the machine learning techniques.
This paper [6] implemented a model using the convolutional neural network to
extract some characteristics from image and classify image, according to the behavior
and features in proper class. Create various neural networks to train the model for
examination of pattern and experienced performance.

2.3 Regional Convolutional Neural Networks (R-CNNs)

The main aim of R-CNN is, given an input image, to provide a list of bounding boxes as results, where every bounding box contains an object together with the category of that object (e.g., car or pedestrian). R-CNN has since been extended to perform other computer vision tasks, and several versions of R-CNN have been implemented. Given an input image, R-CNN starts by using a mechanism known as selective search to fetch regions of interest (ROI), where each ROI is a bounding box that may correspond to the boundary of an object in the image.
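As an illustration only (none of the surveyed papers use this exact code), a pretrained region-based detector from torchvision can return such bounding boxes with per-region scores; the image path is a placeholder:

```python
import torch
import torchvision
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

# Pretrained Faster R-CNN detector; it returns boxes, labels, and scores per region.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Placeholder path to a review photo.
img = convert_image_dtype(read_image("review_photo.jpg"), torch.float)

with torch.no_grad():
    detections = model([img])[0]

# Keep only confident regions of interest.
for box, score in zip(detections["boxes"], detections["scores"]):
    if score > 0.8:
        print(box.tolist(), float(score))
```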
Table 1 compares the different techniques used for text and image classification; the deep learning techniques provide good accuracy for image opinion analysis.

3 Conclusion

Classifying images into different classes such as happy, sad, and neutral is a difficult task in which multiple factors must be considered. Many new and helpful image classification techniques are currently evolving, and investigators examine them in terms of classification accuracy and time efficiency. A main insight is that, for a particular image classification target, the goal is to choose the most suitable technique: various methods perform differently on different tasks. To identify the most optimized technique, investigators should choose the correct data type, data size, and expected outcome; in general, the classification system is designed based on the dataset chosen.
Table 1 Literature survey

1. Udit Doshi (2021), "Emotion detection and sentiment analysis of static images". Proposed work/algorithm used: a convolutional neural network model is adopted. Merits: implemented for the classification of images for social networks. Demerits: happiness, surprise, sadness, anger, and fear are not predicted. Observation: deep learning techniques can be used to predict and to improve the accuracy.

2. Zhao [4], "An image-text consistency driven multimodal sentiment analysis method for social media". Proposed work/algorithm used: a multimodal adaptive sentiment analysis method and conventional SentiBank methods are used. Merits: exploits an image consistency method to resolve whether the image data and the text information are consistent with each other. Demerits: does not describe the interior relationship between image and semantic text contents; tested only on social media input; the dataset considered is small. Observation: using deep learning methods, characteristics of image and text can be extracted.

3. Huang et al. [5], "Image–text sentiment analysis via deep multimodal attentive fusion" (2019). Proposed work/algorithm used: two separate unimodal models were proposed as classifiers for the visual and textual modalities, respectively. Merits: the model learned effective multimodal characteristics that are more effective for image–text sentiment exploration. Demerits: the model is not designed for learning the semantic characteristics. Observation: designing a further deep model for learning the multimodal characteristics is required, as is finding the fine-granularity relationship between image regions and text fragments.

4. Xu et al. [6], "Visual-textual sentiment classification with bidirectional multilevel attention networks". Proposed work/algorithm used: a new bidirectional multilevel attention (BDMLA) technique is proposed. Merits: achieves bidirectional attention plus multilevel associations between the image and textual information for emotion classification; the prototype focuses on representative image regions related to the corresponding text narrative, and a visual attention network is designed. Demerits: exploring the influence of social networks and social imagery on opinion analysis of public pictures is required. Observation: proposes the discriminative features and the inner relationship among images.

5. Ortis et al. [7] (2020), "Exploiting objective text description of images for visual sentiment analysis". Proposed work/algorithm used: addressed the problem of image opinion analysis, focusing on assessing the divergence of the sentiment suggested by the image, starting from a surrounding method that exploits both visual and textual structures. Merits: exploits the associations between visual and textual structures related to the images. Demerits: cannot handle images of different structures. Observation: deep visual representations are to be considered; the task of image sentiment prediction should also be examined by taking into account models and procedures that attempt to predict sentiment directly from pixels.

6. Ye [8], "Visual-textual sentiment analysis in product reviews". Proposed work/algorithm used: the Tucker fusion method is used. Merits: a new deep Tucker fusion technique addresses the difficulty of visual-textual sentiment analysis. Demerits: does not address all the discriminative characteristics. Observation: interconnected image-sentence data and four kinds of structures should be exploited.

7. Yang [9], "Sentiment analysis for e-commerce product reviews in Chinese based on sentiment lexicon and deep learning". Proposed work/algorithm used: proposed a new emotion analysis model, the SLCABG model. Merits: by analyzing customer feedback, merchants on e-commerce platforms can be assisted in acting on user feedback in time to improve their review quality and attract more customers. Demerits: can only divide emotion into positive and negative classes, which is not appropriate in areas with high requirements for sentiment refinement. Observation: study of fine-grained opinion classification of images is required.

8. Kausar [10], "A sentiment polarity categorization technique for online product reviews". Proposed work/algorithm used: proposed a sentiment polarity categorization technique. Merits: ability to process different kinds of textual data. Demerits: has a disadvantage in handling diverse styles such as cynicism. Observation: clarification of dissimilar aspects of consumer reviews regarding merchandise quality can be considered.

References

1. Doshi U, Barot V, Gavhane S (2020) Classification of images for social networks. In: 2020
IEEE international conference on convergence to digital world—quo vadis. ICCDW
2. Maas A, Daly RE, Pham PT, Huang D, Ng AY, Potts C (2011) Learning word vectors for
sentiment analysis. In: ACL
3. You Q, Cao L, Jin H, Luo J (2016) Robust visual-textual sentiment analysis: when attention
meets tree-structured recursive neural networks. In ACM MM
4. Zhao Z, Zhu H, Xue Z, Liu Z, Tian J, Chua MCH, Liu M (2019) An image-text consis-
tency driven multimodal sentiment analysis method for social media. Inf Process Manag
56(6):102097
5. Huang F, Zhang X, Zhao Z, Xu J, Li Z (2019) Image–text sentiment analysis via deep
multimodal attentive fusion. Knowl-Based Syst 167:26–37
6. Xu J, Huang F, Zhang X, Wang S, Li C, Li Z, He Y (2019) Visual-textual sentiment classification
with bi-directional multi-level attention networks. Knowl-Based Syst 178:61–73
7. Ortis A, Farinella GM, Torrisi G, Battiato S (2020) Exploiting objective text description of
images for visual sentiment analysis. Multimedia Tools and Appl 1–24
8. Ye J, Peng X, Qiao Y, Xing H, Li J, Ji R (2019) Visual-textual sentiment analysis in product
reviews. In: 2019 IEEE International conference on image processing (ICIP). IEEE, pp 869–873
9. Yang L, Li Y, Wang J, Sherratt RS (2020) Sentiment analysis for e-commerce product reviews
in Chinese based on sentiment lexicon and deep learning. IEEE Access 8:23522–23530
10. Kausar S, Huahu X, Shabir MY, Ahmad W (2019) A sentiment polarity categorization technique
for online product reviews. IEEE Access
11. Pang B, Lee L et al (2008) Opinion mining and sentiment analysis. Found Trends R Inf Retrieval
2(1–2):1–135
12. Bollen J, Mao H, Pepe A (2011) Modeling public mood and emotion: twitter sentiment and
socio-economic phenomena. In: ICWSM
13. Hu X, Tang J, Gao H, Liu H (2013) Unsupervised sentiment analysis with emotional signals.
In: WWW
14. Ren Y, Zhang Y, Zhang M, Ji D (2016) Context-sensitive twitter sentiment classification using
neural network. In: AAAI
15. D’Avanzo E, Pilato G (2015) Mining social network users’ opinions’ to aid buyers’ shopping
decisions. Comput Hum Behav 51:1284–1294
An Efficient QOS Aware Routing Using
Improved Sensor Modality-based
Butterfly Optimization with Packet
Scheduling for MANET

S. Arivarasan, S. Prakash, and S. Surendran

Abstract Mobile ad hoc networks (MANETs) have recently attracted many academicians' interest due to their relevance in allowing mobile wireless nodes to interact without the need for existing cables or predefined infrastructure. Owing to node mobility and the network's dynamic infrastructure, finding an ideal, trustworthy routing path between source and destination in a MANET is a difficult problem. Hence, better routing mechanisms are required to improve the reliability of data transmission and to provide load balancing among the nodes. A previous system designed a red deer algorithm-based energy-efficient QoS routing (RDA-EQR) for MANET; however, it has issues with packet collision and packet delivery rate. To address this issue, the suggested system devises an improved sensor modality-based butterfly optimization for quality of service aware routing with packet scheduling (ISMBOQAR-PS) to improve packet transmission performance in MANETs. In this work, optimal paths are selected using the improved sensor modality-based butterfly optimization (ISMBO) algorithm based on node parameters such as energy, reliability, bandwidth, static resource capacity, quality, and delay. Then, to minimize transmission collisions, achieve high levels of power conservation, and increase network lifetime, a modified time-division multiple access (MTDMA)-based packet transmission method is utilized. Experimental outcomes show that the introduced system achieves higher performance than the previous system in terms of throughput, packet delivery ratio, energy utilization, and end-to-end delay.

Keywords Mobile ad hoc network (MANET) · Time-division multiple access


(TDMA) · Improved sensor modality-based butterfly optimization (ISMBO) ·
ISMBOQAR-PS · QDAS · RDA-EQR · MD-CCH · Quality of service (QoS)

S. Arivarasan (B)
Sathyabama Institute of Science and Technology, Chennai 600119, Tamil Nadu, India
S. Prakash
Department of Electronics and Communication Engineering, Bharath Institute of Science and
Technology, BIHER (Deemed To Be University), Chennai 600073, Tamil Nadu, India
S. Surendran
Department of Computer Science and Engineering, Tagore Engineering College, Chennai 600127,
Tamil Nadu, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 463
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_34

1 Introduction

A Mobile ad hoc network is an infrastructure-free network of mobile nodes that


may alter their geographic positions freely, resulting in dynamic topologies made
up of bandwidth-constrained wireless links. These mobile devices are self-organized
to communicate with each other for sharing the data without any fixed controller.
And, nodes can move in any direction on their own [1, 2]. As a result, mobility is
a key feature of this network. The number of average linked routes is affected by
nodes’ mobility that in turn impacts routing algorithm’s performance. As a result,
it is necessary to devise a new strategy for effectively managing network mobility.
MANETs are being used in defense process, emergency search and rescue process,
meetings and conventions, and other situations where fast information sharing is
needed with no use of a fixed infrastructure [3, 4].
One of the most important factors impacting the energy dissipation rate in MANETs is routing. Owing to the highly dynamic topology, the lack of infrastructure for centralized administration, bandwidth-restricted wireless links, and energy-restricted nodes, the routing protocols used in conventional wired networks to find a data packet's route from the source to the destination node cannot be directly used in ad hoc wireless networks. A number of routing methods for ad hoc wireless networks have recently been suggested. Routing protocols fall into two types: proactive or table-driven routing protocols and reactive or on-demand routing protocols [5].
Proactive or table-driven protocols aim to identify paths on a constant basis and retain routing data at each and every node in the network, so that when information has to be transferred between two nodes, the route is already available; nodes must keep one or more tables up to date with consistent routing data. Protocols in this category include DSDV, WRP, CGSR, and others. In contrast, reactive routes are produced only when the source node requests them: when a node in the network needs a route to another node, it begins a route discovery procedure [7], all possibilities are considered, and the best path is chosen. The route is maintained by a route maintenance method until either the destination becomes unreachable or the route is no longer desirable. Reactive protocols include AODV, DSR, LMR, TORA, and ABR. When transmitting over MANETs, establishing a path from source to destination that meets quality of service (QoS) criteria such as E2E delay, throughput, and energy usage is critical.
One of the most difficult problems in MANET is preventing collisions caused by
several nodes sending data over the same channel at the same time. Transmissions
using TDMA-based channel access are energy-efficient and collision-free [8]. The TDMA channel is subdivided into frames, with each frame subdivided into time slots that can be given to different nodes, and all nodes must be permitted to send packets at least once in a single frame. This avoids the problem of contention and allows efficient spectrum reuse [9]. It is particularly well suited to traffic with a periodic transmission pattern and assured QoS requirements. As a result, a wide range of TDMA scheduling algorithms has lately been used to achieve a higher PDR with the shortest possible E2E delay.
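As a toy illustration of this frame/slot idea (the slot count and round-robin policy are assumptions of mine, not the MTDMA scheduler proposed later in the paper):

```python
def build_tdma_frame(node_ids, slots_per_frame):
    """Assign time slots round-robin so every node transmits at least once per frame."""
    if slots_per_frame < len(node_ids):
        raise ValueError("frame must contain at least one slot per node")
    frame = {}
    for slot in range(slots_per_frame):
        frame[slot] = node_ids[slot % len(node_ids)]   # one sender per slot, hence collision-free
    return frame

# Example: 4 nodes sharing a 6-slot frame.
print(build_tdma_frame(["n1", "n2", "n3", "n4"], 6))
# {0: 'n1', 1: 'n2', 2: 'n3', 3: 'n4', 4: 'n1', 5: 'n2'}
```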

The remaining part of the article is structured as follows: a literature review of routing mechanisms is presented in Sect. 2. The proposed improved sensor modality-based butterfly optimization for quality of service aware routing with packet scheduling (ISMBOQAR-PS) technique is explained in Sect. 3. The experimental assessment mechanism and performance investigation are presented in Sect. 4. Finally, the conclusion of the article is given in Sect. 5.

2 Literature Review

Energy aware ad hoc on-demand multipath distance vector routing protocol was
introduced by Kokilamani and Karthikeyan [10] as a novel technique for path selec-
tion strategy utilizing energy factor. The proposed system aims to address the afore-
mentioned issues by choosing energy aware nodes along the path. The node is only
evaluated during the route selection process if it has a value greater than less energy
threshold data. NS2 simulator is used to evaluate the developed model, and the results
are significant [10].
Chen et al. [11] created a topological change adaptive ad hoc on-demand multipath
distance vector routing protocol that can adjust to high-speed mobility node while
maintaining quality of service. A stable path selection method is planned in this
protocol that takes into account not only resources of node (residual energy, available
bandwidth, and queue length) but also link stability probability between nodes as
path selection parameters. Moreover, in order to adapt to fast topology changes,
the protocol includes a link interrupt prediction mechanism that modifies routing
strategy depending on periodic probabilistic estimations of stability of link. On the
NS2 platform, various scenarios with node speeds ranging from 10 to 50 m/s, data
rates ranging from 4 to 40 kbps, and node counts ranging from 10 to 100 are simulated.
The findings demonstrate that when the speed of the node is greater than 30 m/s,
the proposed protocol’s quality of service metrics (PDR, E2E delay, throughput)
are considerably improved, but they are greater when the speed of node is less than
30 m/s [11].
For MANET, Saravanan et al. [12] devised a PSO-based method built on expected transmission count (ETX) metrics. The technique illustrates that, when repeating computations related to network load, the sender should select the optimal path, since the unusually large number of transmissions otherwise has an important effect on the system's efficiency. The PSO method is used to improve path choice by integrating ETX metrics, and the ETX metric values are used in the PSO-based method to scale back the transmissions so that packets are delivered efficiently to the specified destinations. ETX considers only the total number of transmissions when PSO selects the best route. Simulation outcomes demonstrate that the PSO-ETX strategy performs better with regard to delivery ratio, time delay, and throughput [12].
In MANET, Kasthuribai and Sundararajan [13] developed a safe and QoS-based
energy aware multipath routing. Particle swarm optimization-gravitational search
method is proposed for multipath route selection. This method selects network’s

energy-efficient multipath routes. A route’s link quality may degrade after a certain
amount of transmissions. So, using the cuckoo search method, that is, dependent on
cuckoo behavior, an optimal path is chosen from the network’s existing paths. The
designed work’s performance measurements demonstrate that it improves energy
efficiency and network lifespan [13].
A unique dynamic time-division multiple access scheduling approach for
MANETs was proposed by Ye et al. [14]. To begin, a service priority-based dynamic
TDMA scheduling method is described, which uses service priority as a reference
parameter for slot assignment while also taking into account transmission throughput
and E2E delay performance. A MD-CCH approach is also provided to improve the
frame structure for better slot utilization across the system. The unique strategy is
created by combining the SP-DS and MD-CCH approaches. Simulation findings indi-
cate that suggested method performs better with regards to slot usage, slot allocation
competence, E2E delay, and transmission throughput [14].

3 Proposed Methodology

The proposed system designs an improved sensor modality-based butterfly optimization for quality of service aware routing with packet scheduling (ISMBOQAR-PS) to enhance packet transmission performance in MANET. The suggested system's flow diagram is depicted in Fig. 1.

Fig. 1 Flow diagram of the proposed work: node initialization → path selection using improved sensor modality-based butterfly optimization (based on energy, reliability, bandwidth, static resource capacity, quality, and delay) → modified time-division multiple access → packet transmission → performance evaluation



3.1 Network Model

Let us consider the problem space as a graph G = (V, E), where each node is represented by a vertex and all the nodes in the network form the set V. A link between two nodes is indicated by an edge, and the set of all links is indicated by E. When d ≤ r, there is a two-way link e (e ∈ E) between them. P indicates the set of paths from a source s (s ∈ V) to a destination D (D ∈ V). The collection of all edges and the collection of all nodes of a path p (p ∈ P) are represented as E(p) and N(p), respectively. TC indicates the quality of service of a path on T from s to the destination d.
Each radio link has its own cost, which includes factors such as energy, reliability, bandwidth, static resource capacity, quality, and delay. The MANET's mobile nodes are first configured in the chosen dynamic environment; mobile nodes serve as both host and router, and links among mobile nodes are established within the network region. The mobile nodes' coordinates are calculated, allowing their locations and velocities to be determined later. m is the number of nodes in the MANET, indexed as 1 < i < m. The second stage of the suggested multipath selection scheme is path discovery: the number of links among nodes connected over the relevant communication channel determines the path from the source to the destination mobile node. Assume P is the total number of paths connecting the source and destination nodes, indexed as 1 < j < P.
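Purely as an illustration of this graph model (the library choice and the attribute values are assumptions of mine), the nodes, the links with their per-link costs, and the candidate path set P could be represented as:

```python
import networkx as nx

# Undirected graph: vertices are mobile nodes, edges are bidirectional radio links.
G = nx.Graph()

# Each link carries its own cost attributes (illustrative values only).
G.add_edge(0, 1, energy=0.8, reliability=0.95, bandwidth=2.0, delay=0.01)
G.add_edge(1, 3, energy=0.7, reliability=0.90, bandwidth=1.5, delay=0.02)
G.add_edge(0, 2, energy=0.9, reliability=0.97, bandwidth=2.5, delay=0.015)
G.add_edge(2, 3, energy=0.6, reliability=0.92, bandwidth=1.0, delay=0.02)

source, destination = 0, 3

# P: the set of candidate paths between source s and destination D.
candidate_paths = list(nx.all_simple_paths(G, source, destination))
print(candidate_paths)   # e.g. [[0, 1, 3], [0, 2, 3]]
```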

3.2 Improved Sensor Modality-based Butterfly Optimization


(ISMBO)-based Optimal Routing with QOS

The QoS routing issue is modeled as an optimization problem whose prime goal is to determine optimal paths by taking into account energy, reliability, bandwidth, static resource capacity, quality, and delay.
Energy: Each node's energy is computed for each potential path in the MANET. Because MANET nodes are battery-powered, lower energy consumption results in higher performance, so for improved communication between mobile nodes the energy parameter should be set to its maximum value. The energy associated with the nodes also affects the network's lifetime. The energy function is defined as follows:

$R_{\text{energy}} = \frac{1}{mn \times p} \sum_{i=1}^{mn} \sum_{\substack{j=1 \\ i \in j}}^{P} \text{En}_{ij}$  (1)

Here, 1 < i < mn indexes the mobile nodes, 1 < j < p indexes the paths, and En_ij is each node's energy on the resulting route. The energy value is defined as
$\text{En}_{ij} = PT_{ip}^{j} \times \text{En}_{i}^{TX} + PR_{p}^{j} \times \text{En}^{RX}$  (2)

Here, the value of


$\text{En}_{iTp}^{j} = Tpt_{i} - TX_{ip}, \qquad \text{En}_{Rp}^{j} = Rpt - Rp_{pj}$  (3)

where Tp is the transmitted power, Rp is the receiver power, and P is the packet size.


Reliability: The source node utilizes the single path to transmit data to the sink if
the needed QoS can be fulfilled using a single path p, and the reliability on a single
path p is R p and may be computed as follows:

$R_{p} = \prod_{l=1}^{hop_{p}} R_{\text{link}_{l}}$  (4)

where Rlink is the link reliability and l is the total available links.
Bandwidth: The bandwidth is calculated as

BandWidth (R(s, d)) = min(bw(s)), s ∈ R(s, d) (5)

Static resource capacity (SRC): The packet queue size Pq (MB), the CPU speed P_CPU (GHz), the battery power Mb (mW), and the maximum available bandwidth BandWidth (kbps) together describe a node's static resource capacity (SRC). SRC is computed through

$SRC = \gamma \times P_{q} + \lambda \times P_{CPU} + \beta \times M_{b} + \alpha \times \text{Bandwidth}$  (6)

Here, γ, λ, β, and α are weights on the node characteristics, and their total is 1.
Quality: The following equation computes the link quality between node N and its 1-hop neighbors:

$quality = \frac{M1hopRec}{TM1hopSent}$  (7)

where M1hopRec is the number of Hello messages node N has received from its 1-hop neighbors and TM1hopSent is the total number of Hello messages the 1-hop neighbors have sent.
Delay: The delay of the nodes is computed as

$delay = \frac{\sum_{N=0}^{n} (\text{received time} - \text{sent time})}{N}$  (8)

Designing the fitness function from the QoS metrics: Depending on the connectivity of the node, the fitness function F(x) for the proposed model is calculated by (9):

$F(x) \rightarrow \text{Fitness} = \frac{1}{p} \sum_{i=0}^{p} TC_{i}$  (9)
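As a rough sketch of how the per-path metrics above could be aggregated into the fitness of Eq. (9) (the weights, the aggregation into TC, and the sign conventions are illustrative assumptions of mine, not the paper's exact procedure), continuing from the earlier graph sketch:

```python
def path_cost(graph, path, weights):
    """Aggregate the QoS metrics of one path into a single TC value (illustrative)."""
    links = list(zip(path[:-1], path[1:]))
    energy = sum(graph[u][v]["energy"] for u, v in links) / len(links)
    reliability = 1.0
    for u, v in links:
        reliability *= graph[u][v]["reliability"]                     # Eq. (4): product over links
    bandwidth = min(graph[u][v]["bandwidth"] for u, v in links)       # Eq. (5): bottleneck link
    delay = sum(graph[u][v]["delay"] for u, v in links)
    # Weighted combination; higher is better for energy/reliability/bandwidth, lower for delay.
    return (weights["energy"] * energy + weights["reliability"] * reliability
            + weights["bandwidth"] * bandwidth - weights["delay"] * delay)

def fitness(graph, paths, weights):
    """Eq. (9): average TC over the candidate paths."""
    return sum(path_cost(graph, p, weights) for p in paths) / len(paths)

# Usage with the networkx graph G and candidate_paths from the earlier sketch:
# w = {"energy": 0.25, "reliability": 0.25, "bandwidth": 0.25, "delay": 0.25}
# print(fitness(G, candidate_paths, w))
```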

A. Butterfly Optimization Algorithm.

Butterfly optimization algorithm is a new nature-inspired meta-heuristic which


replicates butterflies’ natural foraging and mating behavior. BOA’s approach is
formed on the fragrance produced by butterflies that aids other butterflies in their
search for prey and a mate. Butterflies are BOA’s search operators, which function
in an orderly manner to complete the task of finding the best solution in the search
area. Butterflies use their sense receptors to sense/smell fragrance sources in order to find their food. Scent sensors are spread over the butterfly's body parts, including the palps, antennae, legs, etc., and are used to detect scent. Chemoreceptors are nerve cells on the surface of a butterfly's body that act as these receptors [15, 16].
The butterfly optimization algorithm assumes that each butterfly generates a fragrance whose strength is linked to the butterfly's fitness, so that when a butterfly moves from one region to the next, its fitness (and hence its fragrance) changes accordingly. The scent spreads over time, and other butterflies can detect it; this is how butterflies share their own information with one another, forming a collective social learning method. When a butterfly detects the scent of another butterfly, it moves toward it, and this stage is referred to as global search. Otherwise, if a butterfly is unable to detect any fragrance, it moves arbitrarily, which is known as local search. In the butterfly optimization algorithm, each butterfly has its own distinct fragrance, and this is one of the guiding characteristics that sets the algorithm apart from other meta-heuristics. To understand how fragrance is modeled in BOA, we must first understand how a stimulus such as sound, light, or temperature is computed.
The whole idea of sensing and processing a stimulus is based on three key terms: stimulus intensity (I), sensory modality (c), and power exponent (a). The sensory modality refers to the kind of energy measured and how it is processed, while the stimulus itself is the raw data sensed by the receptors. Common modalities include light, sound, and temperature; in the butterfly optimization algorithm, the modality is the butterfly fragrance, while I is the intensity of the physical stimulus. In BOA, I corresponds to the fitness of the solution/butterfly. In the proposed research work, the nodes in the network are treated as the butterflies in the population, and energy, reliability, bandwidth, static resource capacity, quality, and delay form the objective function. This means that when a butterfly emits a highly noticeable amount of fragrance, other butterflies in the area can identify it and are drawn toward it. An increase in intensity reflects the quality of the butterfly or solution, while a is the exponent that accounts for the natural (nonlinear) response. Stimulus-estimation experiments on insects, animals, and people have suggested that as the stimulus magnitude increases, subjects become less sensitive to further changes in the environment. Using these ideas, the fragrance is computed in BOA as follows:

f = cI^a \quad (10)

where f_i is the perceived magnitude of the fragrance, that is, how strongly the fragrance is smelled by the ith butterfly, c is the sensory modality, I is the objective function value (node fitness), and a is the power exponent dependent on the modality, which accounts for the varying degree of absorption. Global search and local search are the two most important phases of the algorithm. In the global search phase, the fittest butterfly emits a fragrance that can be noticed from anywhere in the region, and the nodes take a step toward the fittest node g*, which may be depicted as
 
x_i^{t+1} = x_i^t + (r^2 \times g^* - x_i^t) \times f_i \quad (11)

where x_i^t is the solution vector x_i of the ith node in iteration number t, and g^* denotes the current best solution (node) identified among all solutions (all the nodes) at the present stage. The fragrance of the ith butterfly is denoted f_i, and r is a random value in [0, 1]. The neighborhood (local) search stage is formulated as
 
x_i^{t+1} = x_i^t + (r^2 \times x_j^t - x_k^t) \times f_i \quad (12)

where x_j^t and x_k^t are the jth and kth nodes selected from the search space. When x_j^t and x_k^t belong to the same swarm and r is a random value in [0, 1], then (12) becomes a local random walk. Butterflies may search for food and a mating partner at both the local and the global scale, so BOA uses a switch probability p to alternate between the standard global search and the local search.
Improved Sensor Modality-based Butterfly Optimization (ISMBO) Algorithm
In the conventional butterfly optimization algorithm, a static value of the sensory modality c is used throughout the search process, which does not perform well. Theoretically, a large value of c enables the butterflies to explore new regions of the search space, but it has an adverse effect on convergence toward the global optimum, whereas a small value of c yields poor results. This means that c has a great effect on the searching abilities of the butterflies, and if the value of c is adapted to the requirements of each stage of the optimization process, it benefits the performance of the algorithm. Hence, in this work, an improved adjusting strategy for the sensory modality is designed and used. In ISMBO, the value of c_new is calculated as
 
c_{new} = c_t + \frac{0.02}{c_t \times t_{Max}} \quad (13)

where t is the current iteration of the algorithm and t_Max is the maximum number of iterations. Using these concepts, in ISMBO the fragrance is updated as

f = c_{new} I^a \quad (14)

Algorithm 1: ISMBO Algorithm


Input: Number of nodes, energy, reliability, bandwidth, static resource capacity,
quality, and delay
Output: Optimal paths
1. Objective function f (x)
2. Initialize the nodes in the network
3. Compute node fitness function (energy, reliability, bandwidth, static resource
capacity, quality, and delay)
4. Describe sensor modality c, power exponent a, switch probability p
5. while ending condition not reached do
6. for every node in the network do
7. Compute fragrance for each butterfly (node) by Eq. (14)
8. end for
9. Identify the best butterfly (node)
10. for every node in the network
11. Create an arbitrary number r from [0, 1]
12. if r < p then
13. Move toward best solution by Eq. (11)
14. else
15. Move arbitrarily by Eq. (12)
16. end if
17. end for
18. Revise value of the power exponent a
19. end while
20. Output the best path in the network
In this way, the optimal route is selected from the available routes, and the technique chooses the best possible route based on the node quality parameters, resulting in reliable and energy-efficient routing.

3.3 Modified TDMA-based Packet Transmission Scheme

Once the paths are selected, the packets are transmitted through the selected path according to the allocated time slots. A modified TDMA-based packet transmission method for MANET is presented in this study in order to prevent transmission collisions, achieve high levels of power conservation, and extend network lifetime. In the conventional TDMA frame architecture, control packets are broadcast in the control phase, and the control slots in the control phase are not utilized for data packet transmission. To reduce control packet slot usage, the conventional TDMA frame structure is modified by removing the control phase from the TDMA frame.

Fig. 2 Structure model of modified TDMA frame

The MTDMA frame structure is shown in Fig. 2. It has a smaller control phase than a conventional TDMA frame. Assume that a control packet is broadcast during a data slot of the data phase; if the node gets the response control packet, this data slot is reserved, and data packets are then broadcast in the designated data slot. A TDMA frame in the MTDMA model consists only of a large number of data slots. Depending on the needs of the node, each data slot is utilized to broadcast data packets, control packets, or both, and when required a data packet may be broadcast along with the control packet. Data packet collisions during broadcasting can be avoided by using a specific slot reservation algorithm in the modified TDMA frame.
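Since the paper describes the MTDMA scheme only at the frame-structure level, the short Python sketch below illustrates one plausible slot-reservation handshake; the frame size, the request/acknowledge exchange, and the data structures are assumptions made for illustration only.

```python
# Sketch of the modified-TDMA idea: a frame of data slots only, where a slot
# is reserved by exchanging control packets inside a data slot before use.
# Frame size, message formats, and the handshake details are assumptions.

FRAME_SLOTS = 16

class MTDMANode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.reserved = {}  # slot index -> owner node id

    def request_slot(self, slot, neighbor):
        """Broadcast a control packet in a free data slot; reserve it on response."""
        if slot in self.reserved:
            return False
        if neighbor.acknowledge(slot, self.node_id):
            self.reserved[slot] = self.node_id
            return True
        return False

    def acknowledge(self, slot, requester_id):
        """Neighbor confirms the slot if it has not reserved it itself."""
        if slot in self.reserved:
            return False
        self.reserved[slot] = requester_id
        return True

    def send_data(self, slot, payload):
        """Data (and, when required, piggybacked control packets) go out only in
        the reserved slot, avoiding collisions with other transmissions."""
        if self.reserved.get(slot) == self.node_id:
            return f"node {self.node_id} -> slot {slot}: {payload}"
        return None
```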

4 Experimental Results

In this proposed research work, the simulation was conducted using the NS2 simulator. The simulation used 50 mobile nodes that were randomly placed in a 1000 × 1000 square area. The speed of each node varies from 5 to 30 m/s and follows a random waypoint model, and each node's transmission range is set at 250 m. The performance of the proposed improved sensor modality-based butterfly optimization for quality of service aware routing with packet scheduling (ISMBOQAR-PS) is compared with the previous QoS aware differential ant-stigmergy (QDAS) and red deer algorithm-based energy-efficient QoS routing (RDA-EQR) approaches with regard to throughput, PDR, energy usage, and E2E delay.
Four measures are considered for evaluating performance: (i) throughput, (ii) packet delivery ratio (PDR): the ratio of total packets received to total packets sent, (iii) energy consumption: the total energy consumed for completing the successful data transmission, and (iv) end-to-end (E2E) delay: the total time spent to deliver a data packet from source to destination.
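For clarity, the four evaluation metrics can be computed as in the brief sketch below; the definitions are straightforward, and the variable names are ours rather than taken from the simulation scripts.

```python
def throughput_kbps(bits_received, duration_s):
    """Throughput in kbps: bits delivered per second / 1000."""
    return bits_received / duration_s / 1000.0

def packet_delivery_ratio(packets_received, packets_sent):
    """PDR: total packets received / total packets sent."""
    return packets_received / packets_sent if packets_sent else 0.0

def energy_consumption(initial_energy, residual_energy):
    """Total energy (J) consumed across nodes during the transmission."""
    return sum(i - r for i, r in zip(initial_energy, residual_energy))

def end_to_end_delay(send_times, receive_times):
    """Average time (s) to deliver a packet from source to destination."""
    delays = [rx - tx for tx, rx in zip(send_times, receive_times)]
    return sum(delays) / len(delays) if delays else 0.0
```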

Fig. 3 Throughput versus number of nodes

4.1 Throughput (TP)

Figure 3 compares the throughput of the suggested ISMBOQAR-PS with the previous QDAS and RDA-EQR approaches. The number of nodes is plotted along the x-axis and throughput along the y-axis. It is noted that the introduced ISMBOQAR-PS attains a higher throughput rate of 0.96 kbps, whereas QDAS and RDA-EQR achieve 0.93 kbps and 0.95 kbps, respectively, for 50 nodes. The efficiency of the nodes with regard to throughput is observed to improve further as the total number of nodes increases. The reason is that the suggested work is capable of determining the best route using ISMBO, and ISMBOQAR-PS is designed to meet QoS requirements with MTDMA-based packet transmission.

4.2 Packet Delivery Ratio (PDR)

The suggested ISMBOQAR-PS performance is contrasted against the existing QDAS and RDA-EQR methods with regard to PDR. The number of nodes is considered along the x-axis and PDR along the y-axis. From the experimental outcomes, it can be concluded that the suggested system attains a high delivery ratio of 0.9, whereas the existing methods QDAS and RDA-EQR attain 0.85 and 0.871, respectively, for 50 nodes. As the graph shows, the delivery ratio also increases gradually as the number of nodes increases (Fig. 4).


Fig. 4 PDR versus number of nodes

4.3 Energy Consumption

The proposed ISMBOQAR-PS performance is compared with the previous QDAS and RDA-EQR approaches with regard to energy consumption; the energy consumption of the suggested and existing methods is shown in Fig. 5. The number of nodes is taken along the x-axis and energy consumption along the y-axis. In this proposed work, the modified TDMA-based packet transmission scheme is utilized to improve the energy consumption. The experimental result shows that the proposed system attains an energy consumption of 260 J, whereas the previous methods QDAS and RDA-EQR achieve 221 J and 251 J, respectively, for 50 nodes. The proposed system is designed to meet the QoS requirement and the stability of the path selection with better energy consumption.


Fig. 5 Energy consumption versus number of nodes



Fig. 6 E2E delay versus number of nodes

4.4 End-To-End (E2E) Delay

The graph indicates the E2E delay performance of the proposed ISMBOQAR-PS and of the QDAS and RDA-EQR approaches. The number of nodes is considered along the x-axis and E2E delay along the y-axis. The proposed system attains different E2E delays as node mobility in the network increases. From the experimental outcomes, it can be concluded that the suggested system attains a minimum delay of 3.2 s, whereas the other methods QDAS and RDA-EQR achieve 4 s and 3.7 s, respectively, for 50 nodes (Fig. 6).

5 Conclusion

MANETs are employed in a wide range of fields, such as commercial and military applications. Thus, establishing a path from source to target that meets the QoS requirements of the delivered data packets is essential. The proposed system designed an improved sensor modality-based butterfly optimization for quality of service aware routing with packet scheduling (ISMBOQAR-PS) which satisfies energy and delay constraints. In this work, the improved sensor modality-based butterfly optimization (ISMBO) algorithm is utilized for optimal path selection, and the node quality metrics are utilized as the fitness function. To achieve collision-free transmission, modified time-division multiple access (TDMA)-based packet transmission is performed. Experimental results show that the proposed system achieves high performance compared to the previous systems with regard to throughput, PDR, energy utilization, and E2E delay.

References

1. Kanellopoulos D, Sharma VK (2020) Survey on power-aware optimization solutions for


manets. Electronics 9(7):1129
2. Nazhad SHH, Shojafar M, Shamshirband S, Conti M (2018) An efficient routing protocol for
the QoS support of large-scale MANETs. Int J Commun Syst 31(1):e3384
3. Tilwari V, Dimyati K, Hindia MHD, Fattouh A, Amiri IS (2019) Mobility, residual energy,
and link quality aware multipath routing in MANETs with Q-learning algorithm. Appl Sci
9(8):1582
4. Sarkar S, Datta R (2017) An adaptive protocol for stable and energy-aware routing in MANETs.
IETE Tech Rev 34(4):353–365
5. Singal G, Laxmi V, Rao V, Todi S, Gaur MS (2017) Improved multicast routing in MANETs
using link stability and route stability. Int J Commun Syst 30(11):e3243
6. Nigar N, Azim MA (2018) Fairness comparison of TCP variants over proactive and reactive
routing protocol in MANET. Int J Elect Comput Eng 8(4):2199
7. Jain R, Kashyap I (2019) An QoS aware link defined OLSR (LD-OLSR) routing protocol for
MANETs. Wirel Pers Commun 108(3):1745–1758
8. Sivaram M, Porkodi V, Mohammed AS, Manikandan V, Yuvaraj N (2019) Retransmission
DBTMA protocol with fast retransmission strategy to improve the performance of MANETs.
IEEE Access 7:85098–85109
9. Selvakumar G, Ramesh KS, Chaudhari S, Jain M (2019) Throughput optimization methods
for TDMA-based tactical mobile ad hoc networks. In: Integrated intelligent computing,
communication and security. Springer, Singapore, pp 323–331
10. Kokilamani M, Karthikeyan E (2017) A novel optimal path selection strategy in MANET using
energy awareness. Int J Mobile Network Des Innov 7(3–4):129–139
11. Chen Z, Zhou W, Wu S, Cheng L (2020) An adaptive on-demand multipath routing protocol
with QoS support for high-speed MANET. IEEE Access 8:44760–44773
12. Saravanan N, Subramani A, Balamurugan P (2019) Optimal route selection in MANET based on
particle swarm optimization utilizing expected transmission count. Clust Comput 22(5):11647–
11658
13. Kasthuribai PT, Sundararajan M (2018) Secured and QoS based energy-aware multipath routing
in MANET. Wirel Pers Commun 101(4):2349–2364
14. Ye Y, Zhang X, Xie L, Qin K (2020) A dynamic TDMA scheduling strategy for MANETs
based on service priority. Sensors 20(24):7218
15. Arora S, Singh S (2019) Butterfly optimization algorithm: a novel approach for global
optimization. Soft Comput 23(3):715–734
16. Arora S, Singh S (2017) Node localization in wireless sensor networks using butterfly
optimization algorithm. Arabian J Sci Eng (Springer Sci Bus Media BV) 42(8)
IoT Based Electricity Theft Monitoring
System

S. Saadhavi, R. Bindu, S. Ram. Sadhana, N. S. Srilalitha, K. S. Rekha,


and H. D. Phaneendra

Abstract Electricity theft in public spaces is increasing day by day and has affected the regular electricity management process. It has been a tedious task for the electricity management to identify the theft of energy in public connections. As a result, a system that can identify power theft and make the necessary decisions in both normal and theft conditions is required. The idea presented here proposes the use of microcontrollers to monitor the electricity distribution system, which can be readily integrated with current electronic meters. The system continuously measures the amount of power delivered by the distribution unit as well as the amount of energy consumed at the consumer's location. By comparing the two values, we can discover energy tapped straight from an overhead distribution feeder and take appropriate action immediately. The nontechnical losses at the customer's site can likewise be detected. The proposed system communicates in two directions, namely with the consumers and the utility company. A web application is developed to provide consumers with data on their power consumption, and a desktop application provides real-time data monitoring for administration; the service can also be terminated on the basis of power theft and reconnected once specific measures are taken.

Keywords Power theft · Distribution system · Microcontroller · Internet of things


(IoT) · Arduino · NodeMCU · Tamper detection · Webserver

1 Introduction

The need for electricity has grown in response to the growing demand for modern equipment and the general public's desire for a more opulent lifestyle. The theft of electricity has resulted in power distribution problems. If electricity is illegally consumed, the country's economic situation will be severely disrupted. It is critical to control electrical power usage and make optimum use of it without wasting it.

S. Saadhavi (B) · R. Bindu · S. Ram. Sadhana · N. S. Srilalitha · K. S. Rekha · H. D. Phaneendra


The National Institute of Engineering, Mysuru 570008, India
e-mail: 4ni17cs073_b@nie.ac.in


It is easier to find out the power theft from the legal consumer's meter than to check the same with illegal consumers. There are various types of power theft, such as energy meter bypassing, removing the wires from the energy meters, and energy meter tampering. Electricity theft mainly occurs at two places: on the distribution line and on the energy meters attached at home. There is a need for an electricity management system to find the amount and the location of the power theft. The only possible way is to compare the consumer power consumption data with the power distribution transformer data.
In this paper, we have proposed a model to check for power theft and inform the authority. Monitoring the electricity distribution system helps in electricity management and in finding out the misuse of power. The hardware module consists of an Arduino board connected to sensors deployed for real-time monitoring of the power supply and theft-related aspects. The current sensors monitor the power-related data and send it to the Arduino board. The data collected from the electricity distribution system is stored in a GoDaddy database. A desktop application is developed to monitor the theft; it is able to administer the power theft and also take action by automatically disconnecting the power supply through the relay module.
A desktop application is developed for monitoring energy theft caused by meter tampering. Based on the tampering details of the customer, suitable action is taken to disconnect the power supply through the relay module. A web application is used at the customer site to provide information on energy consumption and to generate an email in case of power disconnection due to theft.

2 Related Work

In [1], R. E. Ogu and G. A. Chukwudebe have proposed the functional system for
detecting the electricity theft. They have suggested the preventive mechanism to
avoid the energy theft using IoT platform. The Arduino MKR1000 microcontroller
board is used to coordinate the overall functionalities. An Infrared sensor is used to
detect the theft when the humans try to open the sensitive part of the meter.
In [2], Prashant Choudhary and Jitendra Nath Bera have proposed an algorithm
to calculate the voltage drop across the power distribution network. A mechanism is
proposed to identify the theft locations and sends SMS about energy consumption
information.
In [3], Mohd. Uvais has implemented a detection system to find power theft in the distribution line. The system is implemented using MATLAB to detect the theft, and a controller is designed to read the voltage and current readings from the LT side of the distribution transformer.
In [4], Ms. Aishwarya P. Kamatagi et al., have proposed a system using IoT to
monitor the energy meter readings. The Raspberry Pi connected to the energy meter is used for monitoring the variations in the meter readings and printing the same.

In [5], Muhammad Badar Shahid et al., have designed a power theft detection
system. The system will monitor the theft using consumer load profiling, which may
happen at the customer place or at the power distribution line. A prevention algorithm
is designed to monitor the theft and take suitable action. When the power theft occurs,
measures will be taken to disconnect the legal consumers and to send a high voltage
pulse at the distribution line.
In [6], H. E. Amhenrior et al., have designed and implemented an automatic
tamper detect and reporting system. The system will identify the bypass internal
theft and external theft on the service cable fetched from the electric poles. The
system consists of a developed Single Phase Prepayment Energy Meter and the
supply authority Global System for Mobile Communications (GSM) capable device
platform. To detect the external bypass energy theft the wireless Current Transducers
were used.
In [7], M.M. Mohamed Mufassirin, et al., have designed and proposed a model to
detect the theft in electricity in Sri Lanka without human interaction. The system
implemented will detect the energy theft and sends the alert message to autho-
rized energy provider. The system is designed with Global System for Mobile
communication (GSM) technology.
In [8], Makarand Sudhakar Ballal et al. have presented a theft detection and prevention system based on logic control with a consumer care unit. The system identifies the pilferage locations and estimates the power stolen by illegal consumers. It maintains the voltage regulation of legal consumers and has contributed to proper revenue collection for electricity.
In [9], Jaya Deepthi B, et al., have proposed and designed an electricity theft moni-
toring and detection system. The system is built using Arduino and GSM modules.
The power usage data and amount of theft of electricity is displayed on the LCD. To
find the percentage of theft the difference between the power used and power stolen
is calculated. The status of the power theft information is sent to electricity board
and the consumer.
In [10], Kumar Nalinaksh et al. have proposed a power theft detection system based on a grid and power discoms. The proposed solution is used as an add-on to the existing system, which monitors and records the theft statistics without any human intervention.
In [11], Sukumar P. et al. have proposed and implemented a system to monitor power theft and also identify the location of the theft. The designed system consists of a microcontroller and a ZigBee module to check for electricity theft, and an alarm system sends an alert signal to the user. The software used includes AVR Studio and WinAVR compilers.
In [12], A. U. Kulkarni, et al., have proposed a solution to the electricity theft
problem using IoT. The system consists of Arduino Uno and Raspberry Pi3, which
monitors the lines connected in parallel. Once the theft is monitored, the theft location
is identified and reported to the admin through SMS.
In [13], Zahoor Ali Khan has proposed a technique based on supervised learning
technique to detect the electricity theft. The model is efficient compared to other

existing models as the preprocessing of the data is done using sigma rule and normal-
ization method. The Adasyn algorithm is proposed to overcome the class imbalance
problem.
In [14], Mahima Singh, et al., have proposed a smart energy meter to keep track
of the number of units used. The system is designed to monitor the theft using the
wireless sensor networks. The integrated technology is going to measure the normal
power consumption of the consumer and also identifies the theft occurred in the
distribution line.
In [15], Anish Jindal. et al., have discussed various data-driven techniques that can
be used for detection of electricity theft in smart grid infrastructures. The two types
of theft detections at the meter level and at the aggregate level were discussed. The
values at the meter level and at aggregate level can be used to identify the anomalous
measurements.
Problem Statement
Electricity pilfering constitutes the nontechnical losses in the power distribution system. There are no approved estimates of theft, but it is assumed to be around 30%. Due to power theft, true bills are not generated by the meter, and the electricity authority cannot recover the actual charges for electricity usage. Power theft has resulted in significant revenue losses for the respective authorities, creating a fund crisis for investment in the power system and demanding an expansion in generating capacity to cope with power losses. Yet, the attitude toward electricity theft control remains lax. Manually, it is difficult to identify the theft. Considering these circumstances, there is a need for real-time automatic theft detection systems that can also be easily incorporated into the existing infrastructure.

3 Existing System

The earlier energy meter was built around an aluminum disc and measured the voltage and current values in real time. However, since no safety measures were used, thieves could easily tamper with the system to misuse the power supply, and several techniques are employed to tamper with the meter. There is no mechanism in such a meter to detect and deal with the theft that has happened. To handle the misuse of power, electronic meters were designed to overcome issues such as meter data accuracy and power theft. The electronic meter displays the amount of power consumed on an LED display, and a technician is still needed to monitor the monthly billing process.

4 Proposed System

This project proposes the monitoring of the electricity distribution system using microcontrollers that can be easily plugged in with existing electronic meters. The system continuously analyzes data on energy distribution and sends it to the electricity providers over the Internet. It measures the amount of power delivered by the distribution unit as well as the amount of energy consumed at the customer's location. By comparing both values, we can detect and take suitable action instantly if energy is tapped directly from an overhead distribution feeder. The nontechnical losses at the customer's site can likewise be detected by analyzing the amount of power in the phase and neutral wires. The system is capable of effectively and accurately communicating theft information and location between the customers and the company.

5 Design

The power theft can be identified by the additional current flowing in the distribution line. The theft of the power supply is monitored, and once the usage crosses the expected range, immediate action is taken by the admin or the controller. Based on the amount of theft, the decision is taken to prevent external tapping on the distribution line. Similarly, if the difference between the amount of power in the phase and neutral wires exceeds a limit, meter tampering at the consumer's side can be detected.
The proposed system is designed to solve many of the existing issues of power theft. A wireless network is used to establish reliable and effective communication within the system. The system monitors the distribution line and detects power theft based on the data received at that instant. The power theft location can be determined based on the additional current consumed at a particular place. The electricity inspection linemen inform the electricity authority in case there is direct power theft from the distribution line; simultaneously, the controller sends a signal to the relay to block the electricity. If the theft is detected at the customer's site, an alert message is sent asking the customer to remove the theft load. The system is reinitialized after the rectification.
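The detection logic described above can be summarized in a short Python sketch. The threshold values and variable names are assumptions used only to illustrate the two comparisons (distribution line versus aggregated consumer readings, and phase versus neutral current); they are not values taken from the implemented system.

```python
# Illustrative detection rules; thresholds and names are assumed.

EXTERNAL_THRESHOLD = 0.5   # amps of unexplained line current
INTERNAL_THRESHOLD = 0.2   # amps of phase/neutral mismatch

def external_theft(distribution_current, consumer_currents):
    """Theft on the distribution line: supplied current exceeds the sum of
    all registered consumers' currents by more than the threshold."""
    unexplained = distribution_current - sum(consumer_currents)
    return unexplained > EXTERNAL_THRESHOLD

def internal_theft(phase_current, neutral_current):
    """Meter tampering at the consumer: phase and neutral currents differ
    by more than the threshold."""
    return abs(phase_current - neutral_current) > INTERNAL_THRESHOLD
```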
Figure 1 shows the block diagram of the components of the electricity monitoring system. The electricity monitoring system consists of the ARDUINO and NODEMCU microcontrollers, a relay module, and current sensors.

Fig. 1 Block diagram of electricity monitoring system

6 Implementation

The electricity distribution monitoring system is implemented using both hard-


ware components and software tools. The following steps were carried out in the
implementation phase.
• A customized model is built using the microcontroller and current sensor.
• MQTT is implemented to provide the telemetric message service, and a database is designed for storage purposes.
• A Desktop application is built to monitor the real-time theft.
• A relay module is added to control the load. The relay module will monitor and turn off the load in case theft occurs.
• The Client application is designed to check consumption detail log.

6.1 Hardware Components

• SCT013 CURRENT SENSOR is a Non-Invasive AC current sensor used for


measuring the electricity consumption of a building.
• ARDUINO UNO (R3) microcontroller is used to read the current sensor values.
• NODEMCU ESP8266 has high processing power with inbuilt Wi-Fi/Bluetooth
and Deep Sleep Operating features specifically used for IoT projects.
• A combination of Arduino and NodeMCU is used to build the embedded system because all operations cannot be performed by a single hardware board. The connection between them is established using the UART serial communication protocol.

• A 5 V SINGLE CHANNEL relay is used to disconnect the power supply under theft conditions.


• HiveMQ’s MQTT broker is designed for cloud native deployments and used
for sending sensor messages and receiving control messages from desktop
application.
• A GoDaddy-hosted database is used to store consumer login details and consumption values, which are accessed by the consumer application.

6.2 Software Tools

• Arduino IDE is an open source software, where the editor is used for writing the
required code and compiler is used for compiling and uploading the code into the
given hardware module.
• Microsoft Visual Studio is used to develop ASP.NET web application for
consumer login and C#.NET framework Windows form desktop application for
administrator.

6.3 Algorithm for Monitoring System

• Step 1. Turn on the electricity monitoring system.


• Step 2. Start monitoring in desktop application.
• Step 3. Read the values from current sensors and update to cloud.
• Step 4. If external tampering is detected
Then inform the authority to check the line.
• Step 5. If internal tampering is detected
Then cut-off power supply through relay and reconnect only after certain
measures are undertaken.
• Step 6. Go back to step 3 until the system is turned off (Fig. 2); a sketch of this monitoring loop is given below.
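As a rough illustration of steps 3–5, the admin-side monitoring loop could look like the Python sketch below using the paho-mqtt client. The actual desktop application is written in C#.NET, and the broker address, topic names, payload format, and threshold used here are assumptions made only for this sketch.

```python
import json
import paho.mqtt.client as mqtt   # paho-mqtt 1.x style client API

BROKER = "broker.hivemq.com"          # assumed public HiveMQ broker address
SENSOR_TOPIC = "meter/+/readings"     # assumed topic naming
RELAY_TOPIC = "meter/{}/relay"        # assumed relay command topic

PHASE_NEUTRAL_LIMIT = 0.2             # assumed tamper threshold in amps

def on_message(client, userdata, msg):
    # Assumed payload, e.g. {"meter": "m1", "phase": 2.1, "neutral": 1.2}
    reading = json.loads(msg.payload)
    meter = reading["meter"]
    # Step 5: internal tampering detected -> cut off supply through the relay.
    if abs(reading["phase"] - reading["neutral"]) > PHASE_NEUTRAL_LIMIT:
        client.publish(RELAY_TOPIC.format(meter), "OFF")
        print(f"{meter}: internal theft detected, power disconnected")
    else:
        print(f"{meter}: normal consumption {reading['phase']:.2f} A")

client = mqtt.Client()
client.on_message = on_message
client.connect(BROKER, 1883)          # Step 2: start monitoring
client.subscribe(SENSOR_TOPIC)        # Step 3: read values pushed to the cloud
client.loop_forever()                 # Step 6: keep monitoring until stopped
```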

7 Results

Initially, when the system is turned on, the microcontroller reads the values from the current sensors. It takes around 30 seconds for the microcontroller to calculate the sampling values for the first time. The system then collects the sensor information every 3 seconds, and the sensed data is aggregated in the microcontroller to obtain the instantaneous value.
Figure 3 shows the working model of the electricity monitoring system. It consists of the SCT013 current sensor for measuring alternating current across a wire, the Arduino Uno (R3) for reading the current sensor values, the NodeMCU ESP8266 with inbuilt Wi-Fi to upload sensor values to the cloud, and a 5 V single-channel relay used to disconnect the power supply under theft conditions. Jumper wires are used to interconnect the components, AC mains cords are used for drawing the power supply, and bulbs are used as loads to demonstrate the theft condition.

Fig. 2 Workflow of the electricity monitoring system

Fig. 3 Electricity monitoring system
The admin has to click the 'start monitoring' button to read the sensor values. Figure 4 shows the desktop application designed for the admin; its purpose is to monitor the customer's power consumption status. Monitoring for a particular customer is done by entering their email and clicking the start monitoring button. The power consumption values are read, stored in the cloud, and then sent to the desktop application for further processing. Each value is added to the list view control, and an appropriate log message is generated. The values indicate three conditions, namely the normal condition, the internal theft condition, and the external theft condition.
During normal power consumption, the buttons used for indicating internal and external theft remain green, as shown in Fig. 5. In the case of internal theft, the condition can be detected using the power consumption values, as shown in Fig. 6. The button for internal theft turns yellow on the first attempt of theft and turns red if the same condition continues. A theft warning mail is sent to the customer, and the power supply is automatically disconnected. The power supply is reconnected only after the values return to the normal condition and the internal theft button turns back green.
The desktop application is also able to detect external theft: the corresponding button changes to red, indicating the external theft. The admin notices this change and informs the respective authorities to check the power supply lines and take necessary actions. The button for external theft turns green only after all the issues are resolved and normal values are being read by the sensors.
The consumer application consists of registration, login, and view-consumption pages. The power consumption page contains details such as date, time, phase value, and the summation of the total energy consumed; the phase value is the product of the voltage and the current consumed. Figure 7 shows the consumer login page, and Fig. 8 shows the power consumption details page of the consumer application.

Fig. 4 Desktop application



Fig. 5 Normal condition

Fig. 6 Theft condition

8 Conclusion and Future Enhancements

The electricity monitoring system is designed to monitor power theft in the distribution system. The measured data is collected from the sensors and updated to the MQTT broker on a public cloud. The system also generates an alert and sends the theft information to the concerned authorities. The experimental results show that the electricity monitoring system effectively monitors and alerts against power theft. The designed system is reliable, low cost, scalable, and capable of handling the data collected from the different sensors, and it helps the admin to access and control the theft remotely without human intervention. This will reduce electric line tampering to the maximum extent. The proposed framework is capable of resolving the most prevalent issue of power theft. By designing appropriate devices, the functionality of the system can be moved to the edge for easy implementation in the existing system. Online billing notification through SMS or email could be linked with this system in the future.

Fig. 7 Login page

Fig. 8 Power consumption details

References

1. Ogu RE, Chukwudebe GA (2017) Development of a cost-effective electricity theft detection


and prevention system based on IoT technology. In: 2017 IEEE 3rd international conference
on electro-technology for national development (NIGERCON). pp 756–760. https://doi.org/
10.1109/NIGERCON.2017.8281943
2. Choudhary P, Bera JN (2020) SMS based load flow monitoring and analysis for theft location
detection in rural distribution systems. 2020 IEEE calcutta conference (CALCON). pp 386–390.
https://doi.org/10.1109/CALCON49167.2020.9106499
3. Uvais M (2020) Controller based power theft location detection system. In: 2020 international
conference on electrical and electronics engineering (ICE3). pp 111–114. https://doi.org/10.
1109/ICE348803.2020.9122940
4. Kamatagi AP, Umadi RB, Sujith V (2020) Development of energy meter monitoring system
(EMMS) for data acquisition and tampering detection using IoT. In: 2020 IEEE international
conference on electronics, computing and communication technologies (CONECCT). pp 1–6.
https://doi.org/10.1109/CONECCT50063.2020.9198495
5. Shahid MB, Shahid MO, Tariq H, Saleem S (2019) Design and development of an efficient
power theft detection and prevention system through consumer load profiling. In: 2019 inter-
national conference on electrical, communication, and computer engineering (ICECCE). pp
1–6. https://doi.org/10.1109/ICECCE47252.2019.8940644
6. Amhenrior HE, Edeko FO, Ogujor EA, Emagbetere JO, Design and implementation of an
automatic tamper detection and reporting capability for a single phase energy meter. 2017 IEEE
3rd international conference on electro-technology for national development (NIGERCON)
7. Mohamed Mufassirin MM, Hanees AL, Shafana MS, Energy theft detection and control-
ling system model using wireless communication media. Proceedings of fifth annual science
research sessions 2016 on “Enriching the novel scientific research for the development of the
nation”. pp 123–130
8. Ballal MS, Suryawanshi H, Mishra MK, Jaiswal G, Online electricity theft detection and
prevention scheme for smart cities. IET Smart Cities 2(3):155–164. https://doi.org/10.1049/iet-
smc.2020.0045
9. Jaya Deepthi B, Ramesh J, Chandra Babu Naidu P (2019) Detection of electricity theft in the
distribution system using arduino and GSM. In: 2019 international conference on computation
of power, energy, information and communication (ICCPEIC). pp 174–179. https://doi.org/10.
1109/ICCPEIC45300.2019.9082397
10. Nalinaksh K, Pathak L, Rishiwal V (Feb 2018) An internet of things solution for real-time identi-
fication of electricity theft and power outrages caused by fault in distribution systems(converting
existing electrical infrastructure of third world countries to smart grids). Conference: 2018 3rd
international conference on internet of things: smart innovation and usages (IoT-SIU). https://
doi.org/10.1109/IoT-SIU.2018.8519935
11. Sukumar P, Mounika S, Nesan G, Boopathi B, Vinoth Kumar M (2019) Wireless electricity
theft detection using Zigbee. Int Res J Eng Technol (IRJET) 6(3)
12. Kulkarni AU, Jayalakshmi GN (2018) IoT solution for live wire tampering. 2018 IEEE Punecon.
pp 1–7. https://doi.org/10.1109/PUNECON.2018.8745414

13. Khan ZA, Adil M, Javaid N, Saqib MN, Shafiq M, Choi J-G (2020) Electricity theft detection
using supervised learning techniques on smart meter data. Sustainability 12:8023. https://doi.
org/10.3390/su12198023, http://www.mdpi.com/journal/sustainability
14. Singh M, Kumari A, Goyal V, Kumar P (Feb 2019) Energy theft detection by smart energy
meter using WSN in real time. Int J Eng Res Technol (IJERT) 8(2) ISSN: 2278-0181
IJERTV8IS020089
15. Jindal A, Schaeffer-Filho A, Marnerides AK, Smith P, Mauthe A, Granville L (30 March
2020) Tackling energy theft in smart grids through data-driven analysis. In: 2020 international
conference on computing, networking and communications (ICNC), IEEE Xplore, INSPEC
accession number: 19493849. https://doi.org/10.1109/ICNC47757.2020.9049793
An Exploration of Attack Patterns
and Protection Approaches Using
Penetration Testing

Kousik Barik, Karabi Konar, Archita Banerjee, Saptarshi Das,


and A. Abirami

Abstract The purpose of penetration testing is to assess the vulnerabilities present in communication networks and digital devices. Penetration testing analyzes the strength of protection techniques in the digital environment. The test is conducted at periodic intervals to analyze risks and controls in order to accomplish higher security standards. The proposed work discusses the factors and components involved in preparing a penetration test. Various penetration tests are performed on private networks using different tools on the Kali Linux platform. The types of attack considered for this study are credential harvester, web jacking, and smartphone device penetration, performed in a secured penetration testing laboratory setup. The tests are analyzed in detail with various outcomes such as successful, partially successful, and failure. Recent studies show how organizations have suffered because of security incidents. Finally, some mitigation strategies are pointed out to counteract these threats and to develop awareness among users.

Keywords Kali Linux · Metasploit framework · Mitigation strategies · Penetration


testing · Social engineering toolkit

1 Introduction

Data privacy and information security are at the top of the priority list for organizations nowadays; all sensitive business information needs to be protected to sustain competitive success. Therefore, penetration testing examines an organization's information technology infrastructure, which incorporates software, hardware, and resources.

K. Barik (B) · K. Konar · A. Banerjee · S. Das


JIS Institute of Advanced Studies and Research, Kolkata, India
S. Das
e-mail: saptarshi@jisiasr.org
A. Abirami
Bannari Amman Institute of Technology, Anna University, Erode, India


This process involves reviewing an organization's information technology infrastructure for vulnerabilities in areas such as operational processes, hardware configuration, software configuration, patch management, system configuration, and software in order to identify weaknesses.
The prime goal of penetration testing is to discover the security gap in existing
systems like exploring internal employee awareness, analyzing an organization’s
security policy, and examining the efficacy of an organization to counter security
events. There are four standard penetration tests: external testing, internal testing,
blind testing, and double-blind testing [1]. An external pen test involves an ethical
hacker attempting to enter an organization’s devices, such as domain name service
(DNS), email, routers, firewall, web applications, and websites across the Internet.
The aim is to assess if an outside intruder can get illegal entrance. An internal
penetration test [2] is conducted inside an organization’s network, seeing vulnera-
bility indoors. Blind testing [3] simulates the operations and methods of an authentic
hacker—the penetration tester endeavors to gather information concerning the target
from publicly accessible sources. Finally, double-blind testing [4] is conducted based
on public data available without informing the employee. It allows the organization
to perceive the reactions of the security administrator. There are different penetration
testing tools available with open source and commercial versions. Some of them can
be customizable to increase their potency and depends on the situation to be exam-
ined. Penetration testing comprises three types: black, white, and gray [5] based on
information. The penetration tester has no earlier details on the targeted system in
case of a black-box test, and they use their method to find out the breach. A gray box
penetration test prompts focusing on the targets with the most significant risk and
value from the start. It is ideal for an attacker with long-term access to the network
by knowing a user with elevated rights. The penetration tester has earlier informa-
tion about the intended system like IP address, protocols, network infrastructure in
white-box penetration testing.
There are various penetration testing approaches [3], like network penetration testing, web application penetration testing, website penetration testing, wireless penetration testing, social engineering attacks, physical testing, and cloud penetration testing, to analyze an organization's security. Network penetration testing simu-
lates penetration methods for breaching networks, including routers, firewalls, port
scanning, policy evaluation, network traffic analysis, evaluation IDS, identifying
legacy devices, third-party appliances [2]. Common vulnerabilities include asset
misconfiguration, weak passwords, unwanted open ports, wireless network distur-
bance, product-specific issues. In web application penetration testing, the penetration
tester explores potential risks in server-oriented applications, including a web appli-
cation, connection, APIs, mobile application, framework, and patch management.
Common vulnerabilities include cross-site scripting, session management, source
code flaws, injection flaws, authentication protocols, and cryptography-related issues
[6]. In website and wireless penetration testing, penetration testing detects vulnerable
wireless network configuration, authentication check, and website vulnerabilities,
including web server configuration-related issues, mac address spoofing, cross-site
scripting, access points, and encryption [7]. While in social engineering attacks,

the penetration test explores vulnerabilities related to employees and ethics, including tailgating, phishing attacks, and eavesdropping [8]. In physical testing, an endeavor is
to get the physical path to test the integrity of cameras, sensors, RFID systems, entry
systems, keypads [9]. Finally, cloud penetration testing measures the protection of
cloud-based assets, including configuration, networks, credentials, procedure, and
data sensitivity [10].
Three types of penetration tests, conducted using independent networks in a protected environment, are presented. Different penetration analysis tools within the Kali Linux platform are explored. The results are formulated and presented, along with the defensive strategies to ward off those compromises.
The remainder of the paper is organized as follows. Section 2 discusses the related work. Section 3 comprises the proposed methodology, the laboratory environment setup, and the attacks performed. In Sect. 4, the results of the different types of penetration attacks are captured and presented. Section 5 comprises the mitigation strategies, outcomes, and discussions. Finally, Sect. 6 concludes the paper and provides various directions for future research.

2 Related Work

Reddy and Yalla [11] proposed mathematical analysis of penetration testing and
simulation and graphs to develop data security and strategy. Guarda et al. [12]
suggested a framework to provide guidelines for penetration testing in virtual envi-
ronments. Nagpure and Kurkure [13] performed vulnerability assessment and pene-
tration testing of web applications using both manual and automation processes.
They showed that the automatic penetration testing process is more accurate than the
manual process. Zitta et al. [14] performed penetration testing of intrusion detection
and prevention system Suricata IPS tools in applications of security of embedded IoT
devices. Hasan and Meva [15] studied the mechanics of the VAPT process and gath-
ered tools that are useful during the VAPT process in web applications. Lyashenko
et al. [16] analyzed the effectiveness of detection of phishing attacks using real-time
data using different tools.
Salahdine and Kaabouch [17] presented a detailed study of social engineering
attacks, classifications, detection strategies, and prevention procedures. Rahalkar
[18] proposed a study on the essential details and configurations invoking other tools
in the Metasploit framework. Cayre et al. [19] proposed a new security audit and
penetration testing framework called mirage dedicated for IoT systems. Patel [20]
surveyed current vulnerabilities, tools used to determine vulnerabilities to secure the
organization from cybersecurity threats. Patel and Patel [21] presented an analytical
study of penetration testing using tools to enhance wireless infrastructure security.
Raj and Walia [22] performed a scanning exploits system using the Metasploit frame-
work tool. Pandey et al. [23] conducted a vulnerability assessment and penetration
testing using a controlled setup using Raspberry Pi 3b+. Alabdan [24] showed a

comprehensive analysis of phishing attack techniques using vectors and other tech-
nical approaches. Lu and Yu [25] proposed monitoring, scanning, capturing, data
analysis on Wi-Fi networks using Kali Linux.

3 Proposed Methods

The Kali Linux platform [26], an open source tool downloadable from www.kali.org, is used for this study. It comes with preinstalled tools to support information security assignments like ethical hacking. Kali Linux was developed by Offensive Security [27], a renowned information security company primarily focused on advanced penetration testing and security checking. Some significant points of Kali Linux are: it is customizable, it includes over 600 penetration testing tools, it has a custom kernel with the latest patches included, it supports multiple languages and GPG-signed packages, it is Filesystem Hierarchy Standard (FHS) compliant, and known bug issues are documented.
There are many penetration testing tools available in the market, and a comparison
study of various tools is presented in Table 1.
In this section, three types of penetration tests are conducted in a secure
environment.

Table 1 Comparison of penetration testing tools

Name | Ownership | Platform | Functionality
Nmap [28] | Freeware | Windows, Linux, Unix | Network mapper; supports scanning of various types of protocols
Nessus [29] | Freeware | Windows, Unix, Linux | Popular vulnerability scanner, used for remote scanning
Metasploit [30] | Freeware | Windows, Unix, Linux | Used to test weaknesses in the operating system
Wireshark [31] | Freeware | Windows, Unix, Linux | Protocol analyzer with a GUI version
Internet Security Scanner [32] | Shareware | Windows, Unix, Linux | Manages security risks and detects network vulnerabilities
BeEF [33] | Freeware | Windows, Linux | Focuses on web browser security
Aircrack-ng [34] | Freeware | Windows | For assessing wireless and network security
Social Engineering Toolkit [35] | Freeware | Linux, Mac | For social engineering attacks

3.1 Credential Harvester Attack Method

This section discusses how to generate the credential harvester attack method [36]. This attack method is applied to clone a website and perform a phishing attack to get user credentials. Using Kali Linux, Metasploit, and the Social Engineering Toolkit by TrustedSec [35], a clone of the application has been created. A cloned URL of https://gmail.com is started and runs on port 80. Once the target user clicks on the link in our secured test setup, it presents them with a replica of gmail.com. Once the user logs in by entering a username and password, the user is redirected back to the legitimate site. This setup can trap the username and password of any user who logs in through the cloned URL. Figure 1 shows the replica of gmail.com, and Fig. 2 represents the number of users who clicked and tried to log in using the cloned URL.

3.2 Web Jacking Attack Method

This method creates a site clone and presents the target user with a link affirming that the website has moved to a new location [37]. A clone application is created using Kali Linux, Metasploit, and the Social Engineering Toolkit by TrustedSec [21]. In this section, gmail.com is cloned; the URL floated over it would appear to be gmail.com. When the user clicks the moved link, Gmail opens but is replaced with the malicious web server. The timing of the web jacking attack can be changed through the config/set_config flags. A standard website cloner for https://gmail.com on port 80 is used, as represented in Fig. 3.

Fig. 1 Clone of gmail.com



Fig. 2 The number of users clicked the clone URL

Fig. 3 A standard website cloner

After clicking the link, users are redirected to the reproduced web page shown in Fig. 4.

Fig. 4 Clone of gmail.com



Fig. 5 Pentest.apk file generation

3.3 Smartphone Penetration Testing

Kali Linux, a virtual machine, and an Android emulator are used for smartphone penetration testing [38]. The Android emulator acts as the Android device on which the penetration testing tasks are performed. First, start Kali Linux and log in as the root user through the virtual machine. Create a deployable application using Kali Linux and Metasploit by entering the following command in the Kali Linux terminal: msfvenom -p android/meterpreter/reverse_tcp LHOST=(our IP address) LPORT=4444 R > pentest.apk. The pentest.apk file is generated, as shown in Fig. 5.
Load the Metasploit framework using the msfconsole command and enter the multi/handler exploit, as shown in Fig. 6. Set the payload to android/meterpreter/reverse_tcp, the same payload used while creating the APK file with msfvenom, and set the intended IP address and port 4444. Now transfer the .apk file to the target mobile device. After the app is installed on the target mobile device, the smartphone can be accessed through the meterpreter session.

4 Results

These experiments utilize the credential harvester attack, web jacking, and smartphone penetration testing on a secured testing platform using the Metasploit framework. Two computers, one for the attacker and one for the server, have been employed. The server computer runs the Windows 10 Professional operating system with an Intel(R) Core(TM) i7 5 GHz processor and 16 GB of RAM. Additionally, three virtual machines with Ubuntu Server 14.04.6 LTS x86 and 1 GB RAM each are installed inside the server.

Fig. 6 Msfconsole and exploit

Figure 7 presents the results of the credential harvester attack, Fig. 8 shows the web jacking attack method, and Fig. 9 shows the mobile device penetration testing performed. Based on the results, the outcomes are classified into three categories, namely successful, partial, and failed. The blue color represents a successful attack, the red shows partial success, and the green color shows unsuccessful attacks.
The credential harvester experiment is conducted among 40 users, out of which nine attempts are successful, six are partial, and 25 are failed attacks. Successful means the users fully responded to the cloned URL, and partial means the users only partially reacted to the cloned URL.

A. Credential Harvester Attack Method


Fig. 7 Attack analysis using credential harvest method



B. Web Jacking Attack Method


Fig. 8 Attack analysis using web jacking method

C. Mobile Device Penetration Testing


Fig. 9 Attack analysis, mobile device penetration testing

In the web jacking attack method, attacks are performed among 40 users, out of which eight attempts are successful, twenty-one are partial, and eleven are failed.
In mobile device penetration testing, attacks are performed among 40 users, out of which sixteen are successful, six are partial, and eighteen are failed.

5 Discussions

Protecting ourselves and building awareness among users against digital crimes are major requirements. The first step is to identify the vulnerabilities and plan to patch them. In this section, mitigation measures for the three different attacks performed in the laboratory setup are illustrated.
Credential Harvester Attack Method
User awareness and employee response are significant aspects; in addition, the organization should use antiphishing and antivirus software with the latest versions. Users must be conscious of phishing and should not open any link from an unknown source. They should also check the URL address details properly before entering user credentials to avoid being duped.
Web Jacking Attack Method
This is another type of social engineering phishing attack and an illegal attempt to take command of a site. To prevent this attack, the user should not provide sensitive information to unknown links and should properly check the website URL. Additionally, users should not assume a site is legitimate just because it looks right; a browser with an antiphishing detection program can be employed.
Smartphone Penetration Testing
In the laboratory setup examined in this paper, Kali Linux and Metasploit tools were used to demonstrate remote control of an Android device. To safeguard against such abuse, users should not download applications from unknown sources or untrusted cloud websites and should apply antivirus software with regular updates on mobile devices.
The graph shown in Fig. 10 represents the typical reasons and causes of security violations in organizations in 2020 [39]. As per the survey report, 34% of organizations suffered security incidents because of malware attacks and 29% because of data exposure [39]. As per the report [39], among failed RDP login attempts, 2,647,428 tried to log in as administrator, 376,206 as admin, and 9384 as user. Figure 11 presents the comprehensive breakdown of usernames used in failed RDP login attempts.
Awareness is crucial in preventing the malicious activities illustrated by these experiments, including phishing attacks, hacking of smartphones, ATM fraud, online banking
fraud, etc. However, there are certain areas where this work can be substantially
improved. For example, the experimental evaluation in this work is performed in
a secured test platform. Still, in a real-life attack scenario, the actual environment
can vary to a great extent. In such a situation, there is always a chance of the tools
as mentioned above behaving unpredictably. Therefore, this work will significantly
benefit from expanding the experimental scenario from secure to real-life.

Fig. 10 Organizations suffered security incidents (causes: malware, exposed data, ransomware, account compromise, and cryptojacking)

Fig. 11 The usernames used in failed login attempts (administrator 83%, admin 12%, ssm-user 2%, user 2%, test 1%)

6 Conclusion

Different penetration testing processes are discussed in this paper, several factors
are considered while conducting penetration tests, and popular tools are utilized to
perform the penetration test. With Internet technology and fast digitization advance-
ment, information security is quite challenging for organizations and regular users.
Penetration testing plays a significant role in achieving the security analysis gap in
the existing setup. Open source penetration testing tools can be customized as per
user requirements and used in diverse domains. Three types of attacks, namely the
Credential Harvester Attack Method, the Web Jacking Attack Method, and Smartphone
Penetration Testing, are analyzed and performed in a secured environment. The attacks
are analyzed under three different scenarios and presented with the corresponding
mitigation techniques. The future scope is to explore and examine other cyberattacks
and devise algorithmic strategies to prevent them.

Declaration The work is performed in a secure laboratory setup and does not possess any malicious
intent.

References

1. Weissman C (1995) Handbook for the computer security certification of trusted systems.
Information assurance technology analysis center falls church VA.
2. Denis M, Zena C, Hayajneh T (April 2016) Penetration testing: concepts, attack methods, and
defense strategies. In: 2016 IEEE long island systems, applications and technology conference
(LISAT). IEEE, pp 1–6
3. Shah S, Mehtre BM (2015) An overview of vulnerability assessment and penetration testing
techniques. J Comput Virol Hacking Tech 11(1):27–49
4. Shorter JD, Smith JK, Aukerman RA (2012) Aspects of informational security: penetration
testing is crucial for maintaining system security viability. Technol Plann 13
5. Blackwell C (2014) Towards a penetration testing framework using attack patterns. In:
Cyberpatterns. Springer, Cham, pp 135–148
6. Shuaibu BM, Norwawi NM, Selamat MH, Al-Alwani A (2015) Systematic review of web
application security development model. Artif Intell Rev 43(2):259–276
7. Rahman A, Ali M (Aug 2018) Analysis and evaluation of wireless networks by implementation
of test security keys. In: International conference for emerging technologies in computing.
Springer, Cham, pp 107–126
8. Shindarev N, Bagretsov G, Abramov M, Tulupyeva T, Suvorova A (Sep 2017) Approach
to identifying of employees profiles in websites of social networks aimed to analyze social
engineering vulnerabilities. In: International conference on intelligent information technologies
for industry. Springer, Cham, pp 441–447
9. Al Shebli HMZ, Beheshti BD (May 2018) A study on penetration testing process and tools.
In: 2018 IEEE long island systems, applications and technology conference (LISAT). IEEE,
pp 1–7
10. Mishra S, Sharma SK, Alowaidi MA (2020) Analysis of security issues of cloud-based web
applications. J Ambient Intell Humanized Comput 1–12
11. Reddy MR, Yalla P (March 2016) Mathematical analysis of penetration testing and vulnera-
bility countermeasures. In: 2016 IEEE international conference on engineering and technology
(ICETECH). IEEE, pp 26–30
12. Guarda T, Orozco W, Augusto MF, Morillo G, Navarrete SA, Pinto FM (Dec 2016) Penetra-
tion testing on virtual environments. In: Proceedings of the 4th international conference on
information and network security. pp 9–12
13. Nagpure S, Kurkure S (Aug 2017) Vulnerability assessment and penetration testing of web
application. In: 2017 international conference on computing, communication, control and
automation (ICCUBEA). IEEE, pp 1–6
14. Zitta T, Neruda M, Vojtech L, Matejkova M, Jehlicka M, Hach L, Moravec J (Dec 2018)
Penetration testing of intrusion detection and prevention system in low-performance embedded
IoT device. In: 2018 18th international conference on mechatronics-mechatronika (ME). IEEE,
pp 1–5
15. Hasan A, Meva D (2018) Web application safety by penetration testing. Int J Advan Stud Sci
Res 3(9)

16. Lyashenko V, Kobylin O, Minenko M (Oct 2018) Tools for investigating the phishing attacks
dynamics. In: 2018 international scientific-practical conference problems of infocommunica-
tions. Science and technology (PIC S&T). IEEE, pp 43–46
17. Salahdine F, Kaabouch N (2019) Social engineering attacks: a survey. Future Internet 11(4):89
18. Rahalkar S (2019) Metasploit. In: Quick start guide to penetration testing. Apress, Berkeley,
CA. https://doi.org/10.1007/978-1-4842-4270-4_3
19. Cayre R, Nicomette V, Auriol G, Alata E, Kaâniche M, Marconato G (Oct 2019) Mirage:
towards a metasploit-like framework for IoT. In: 2019 IEEE 30th international symposium on
software reliability engineering (ISSRE). IEEE, pp 261–270
20. Patel K (April 2019) A survey on vulnerability assessment & penetration testing for secure
communication. In: 2019 3rd international conference on trends in electronics and informatics
(ICOEI). IEEE, pp 320–325
21. Patel AM, Patel HR (March 2019) Analytical study of penetration testing for wireless infrastruc-
ture security. In: 2019 international conference on wireless communications signal processing
and networking (WiSPNET). IEEE, pp 131–134
22. Raj S, Walia NK (July 2020) A study on metasploit framework: a pen-testing tool. In: 2020
international conference on computational performance evaluation (ComPE). IEEE, pp 296–
302
23. Pandey R, Jyothindar V, Chopra UK (Sep 2020) Vulnerability assessment and penetra-
tion testing: a portable solution implementation. In: 2020 12th international conference on
computational intelligence and communication networks (CICN). IEEE, pp 398–402
24. Alabdan R (2020) Phishing attacks survey: types, vectors, and technical approaches. Future
Internet 12(10):168. https://doi.org/10.3390/fi12100168
25. Lu HJ, Yu Y (2021) Research on WiFi penetration testing with Kali Linux. Complexity
26. https://www.kali.org/
27. https://www.offensive-security.com/
28. https://nmap.org/
29. https://www.tenable.com/products/nessus
30. https://www.metasploit.com/
31. https://www.wireshark.org/
32. https://www.ibm.com/jm/download/IBM_ISS_Overview.pdf
33. https://beefproject.com/
34. https://www.aircrack-ng.org/
35. https://www.trustedsec.com/tools/the-social-engineer-toolkit-set/
36. Boyanov PK, Savova ZN (Oct 2019) Implementation of credential harvester attack method in
the computer network and systems. In: International scientific conference “Defense technolo-
gies,” faculty of artillery, air defense and communication and ınformation systems. Shumen,
Bulgaria
37. Goutam A, Tiwari V (Nov 2019) Vulnerability assessment and penetration testing to enhance
the security of web application. In: 2019 4th international conference on information systems
and computer networks (ISCON). IEEE, pp 601–605
38. Alanda A, Satria D, Mooduto HA, Kurniawan B (May 2020) Mobile application security
penetration testing based on OWASP. IOP Conf Ser: Mater Sci Eng 846(1):012036. IOP
Publishing
39. SOPHOS (2021) Threat report. https://www.sophos.com/en-us/labs/security-threat-report.
aspx
Intrusion Detection System Using
Homomorphic Encryption

Aakash Singh, Parth Kitawat, Shubham Kejriwal, and Swapnali Kurhade

Abstract IT infrastructures are more at risk of attacks related to cybersecurity.


Modern businesses need a great deal of security in order to be protected against
various attacks such as U2R, Probe, R2L, and denial-of-service (DoS). The issue
with current IDS is that the analysis of the system data is done by external SOC
(Security Operational Centers) which brings up many security concerns like revealing
the details of the network packet which in turn reveals really important information
about the company’s regular activities. We are proposing a method in which we are
basically assessing the detection model on the system data privately such that the
system data, as well as the detection model, is encrypted which helps in minimizing
information leakage. Our desired security goal will be that the security operation
center is not able to learn anything about the data owner's data, and the data owner
is not able to learn anything about SOC’s model. We encrypt the DO’s data and
the SOC’s model. The various machine learning algorithms can be explored for the
detection model like support vector machines, decision trees, and neural networks.

Keywords Intrusion detection system (IDS) · DoS · R2L · Probe · Security


operation center · One hot encoding · Partial homomorphic encryption · Paillier
cryptosystem

A. Singh (B) · P. Kitawat (B) · S. Kejriwal (B) · S. Kurhade


Sardar Patel Institute of Technology, Mumbai, India
e-mail: aakash.singh@spit.ac.in
P. Kitawat
e-mail: parth.kitawat@spit.ac.in
S. Kejriwal
e-mail: shubham.kejriwal@spit.ac.in
S. Kurhade
e-mail: swapnali.kurhade@spit.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 505
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_37

1 Introduction

Our project lies in the domain of cybersecurity and focuses mainly on intrusion
detection, which is currently one of the major research topics worldwide. With hackers
adopting new techniques and technologies, interest in cyberattack detection systems has
grown, as threats have become more advanced. For defense against cyberattacks such as
denial of service (DoS), R2L, U2R, and Probe, intrusion detection systems are a valid and
convenient solution. Many IDSs rely on two techniques for efficient detection: (1) surveilling
IT systems to collect data such as system logs and network packets, and (2) using detection
models such as anomaly detectors, classifiers, and attack signatures to classify the
operation of an IDS. Moreover, an IDS which is accurate enough can be formed only
when we have a set containing an ample amount of historical data indicating attacks
and good expertise in this field. Also, alleviation, prevention, and reaction after an
attack has occurred need teams which have some well-defined skill sets. Thus, exter-
nalizing the IDS to cybersecurity specialists is a good policy for many organizations.
Security operation centers, also called SOCs, are a convenient and economical
alternative. The issue with current IDSs is that the analysis of the system data is done
by external SOC (Security Operational Centers). Intrusion Detection Systems clas-
sify attacks by tracking various activities in IT systems containing various computers
and network links. This is done by monitoring system data, which can be taken from
multiple sources like system network traffic or log files which can reveal sensitive
information about the firm or organization. This brings up many security concerns
like revealing the details of the network packet which in turn reveals really impor-
tant information about the company’s regular activities [1]. The main objectives of
this paper are to provide an end-to-end encrypted model such that the SOC is
not able to learn anything about the Data owner's data, to evaluate the intrusion
detection model on the system data using different machine learning algorithms, and to
compare this model with other traditional or existing intrusion detection systems
with respect to security analysis and performance. For this, we tried various machine
learning models and different types of encryption techniques. The main crux of our
paper is to create an Intrusion Detection System which is highly efficient, secure, and
maximizes the leakage prevention of sensitive information from the Data owner’s
side (Table 1).

2 Related Work

Intrusion detection is one of the major research topics worldwide. All
the work done by a company can be stolen in moments if the company cannot stop
intruders from stealing its data, does not know that someone has hacked its system,
or cannot tell whether an attack has occurred; in either of these cases the

Table 1 Common attacks and their effects

Sr. No.  Attacks  Effects
1        DoS      Economic losses due to downtime and resources, service downtime, disruption to dependent services
2        U2R      Intruders gain control of the local systems
3        R2L      Intruders gain control of the local systems
4        Probe    Surveillance sweep performing either a port sweep or a ping on multiple host addresses; a sweep through many ports to determine which services are used by the system

data is going to be leaked. Current IDSs are of two types based on the data source:
either network based or host based. In Intrusion Detection using a host-based system
the data is being taken from the host’s computer, it also keeps check on log Files
and network traffic in accordance with host computer [2]. Network based IDS keeps
checking on data packets of user’s work in a network [2]. In this paper by Roshan
Kumar, the authors worked on a misuse based intrusion detection system. Intrusion
detection based on Anomaly and Misuse are also 2 categories of IDS’s [3]. Anomaly
Intrusion Detection System takes into account the history of user’s actions, whereas
Misuse IDS uses a set of predefined rules in order to work [3]. These rules must be
updated regularly. As defined by S. Niksefat, we can classify privacy issues
in intrusion detection systems [1]. There are no techniques that can identify all types
of intrusion, therefore to protect data, the model is chosen on the specific application
[1]. The dataset is very difficult to obtain for intrusion detection projects. The dataset
must contain various types of cyberattacks which can be used to attack the data owner.
I. Sharafaldin takes a dataset which includes various attacks and defines the best set
of features to be considered while tackling those attacks [4]. In the paper by R.A.
Popa et al. [5], the data is encrypted before being sent to the SOC, and the features used
in the IDS are also encrypted, which prevents leakage of the data to the security system
owner and of the model to the data owner. They implement three different classification
algorithms over encrypted data, of which the decision tree is about three times more
efficient than the other methods [5]. D. Archer implements
steganography to secure data storage on the cloud [6]. The most flexible and
secure encryption scheme we have seen is homomorphic encryption, as it works well
even on big data [7]. In their scenario, machine learning is used to advantage by employing
such models for intrusion detection purposes. We can put to work a machine learning
model which first ranks the security features based on the effect those features had
and later on help construct a specialized tree-based Intrusion Detection System on
the basis of the features that had previously been selected [8]. Also, we can use an
algorithm in which we first make random combinations of 3 features using simu-
lated annealing and then SVM is applied on that feature combination, which is then
able to detect anomalous behavior from the Internet data traffic [9]. Also using a
good fusion of machine learning feature selection techniques and classifiers, we can
produce high-performance combinations [10]. Deep learning is one of the more
complex branches of ML that helps learn hierarchical feature representations

and constant, continuous relationships by passing the network information through


various hidden layers. Deep learning has achieved significant results in artificial
intelligence, speech recognition, image processing, etc. These capabilities are also
used for various cybersecurity tasks
like IDS, classifying virus attacks, analyzing and predicting possible network traffic,
detecting ransomwares, categorizing texts that are encrypted, detecting URLs that
can be harmful, detecting various anomalies, and detecting domain names which can
be harmful. The paper basically concentrates on doing the analysis regarding the
efficiency of several classical ML models with Deep learning models for Network
based Intrusion Detection System using the datasets of NIDS that are accessible
openly like the KDD Cup dataset, the NSL KDD dataset, etc. [11]. A lot of academic
analysis to better the efficiency of Intrusion Detection System has been done using
the benchmark KDD Cup dataset. Attacks can be classified into Probe attacks, Denial
of service attacks, User to root attacks, and the Remote to local attacks. The Intru-
sion Detection System created using the RNN model has a good ability for creating
efficient systems for detecting an intrusion, with a great precision for both multiclass
and binary classification [12]. We can create a system using the techniques that are
stated above as an alternative to the traditional Intrusion Detection Systems.

3 Technologies and Concepts

Security Data: Data obtained from a networked system that helps us determine
whether attacks, threats, suspicious behavior, anomalies, or any other type of
unsanctioned action has occurred is known as security data; examples are
network packets and system log files. The company or firm that provides the
security data is known as the Data owner.
Detection Model: A detection model is a machine learning model which
takes historical security data as input and uses it for intrusion
detection. The decision tree depicted in Fig. 1 is one example of a model used
for detection, where the nodes of the tree are the TCP flag descriptions for source and
destination, the flow direction, and the protocol name [13].
Intrusion Policy: An intrusion policy is a set of attack rules which, when enforced and
combined with the OR operation, indicate whether an attack has occurred and of what type.
Homomorphic encryption: Homomorphic encryption is a technique that allows us to
operate on encrypted data without decrypting it first [14]. It is a very important
concept in this paper, as it prevents leakage of the Data owner's data at the
Security Operation Center (SOC). A homomorphic encryption system provides four
main functions:
Encryption: converting plain text to cipher text. Decryption: converting cipher text
back to plain text. Key generation: producing the private and public keys.
Evaluation: performing operations on encrypted data by carrying out the procedure
represented as a binary circuit. For every binary circuit, the depth, the number of
inputs, and the size must be specified.

Fig. 1 Decision tree for intrusion detection

Partial Homomorphic Encryption: Partial homomorphic encryption (PHE) is a
subset of homomorphic encryption in which only a limited set of arithmetic
operations or functions can be executed on the encrypted values [7]. The essence of
PHE is that only certain operations, such as addition or multiplication, can be
executed an unlimited number of times on the cipher text. The Paillier cryptosystem
is one such PHE; what it is and how it can be used is explained in the following
paragraph.
Paillier Cryptosystem: The Paillier cryptosystem is a partially homomorphic system.
It supports two homomorphic operations: addition of plaintexts, performed by
multiplying their cipher texts, and multiplication of a plaintext by a constant. In the
Paillier cryptosystem, if the public key is the modulus n and the base is g, the
homomorphic property is

$$E(m_1)\cdot E(m_2) = \big(g^{m_1} r_1^{\,n}\big)\big(g^{m_2} r_2^{\,n}\big) \bmod n^2 = g^{m_1+m_2}(r_1 r_2)^{n} \bmod n^2 = E(m_1+m_2)$$

where m1 and m2 are messages, g is the base, r1 and r2 are random values, and E(·)
denotes the encryption of a message.
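To make the homomorphic property above concrete, the following is a minimal, illustrative Python sketch of textbook Paillier with toy primes (the key size and prime values are assumptions for readability; a real deployment would use a hardened library such as python-paillier with 2048-bit keys). It shows that multiplying two cipher texts decrypts to the sum of the plaintexts.

```python
# Minimal textbook Paillier sketch (toy key size, for illustration only).
# Requires Python 3.8+ for pow(x, -1, n) modular inverses.
import random
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def generate_keys(p=293, q=433):            # small demo primes (assumption)
    n = p * q
    lam = lcm(p - 1, q - 1)                 # lambda(n)
    g = n + 1                               # common simplification: g = n + 1
    mu = pow(lam, -1, n)                    # with g = n + 1, mu = lambda^-1 mod n
    return (n, g), (lam, mu, n)

def encrypt(pub, m):
    n, g = pub
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(priv, c):
    lam, mu, n = priv
    x = pow(c, lam, n * n)
    L = (x - 1) // n                        # L(x) = (x - 1) / n
    return (L * mu) % n

pub, priv = generate_keys()
c1, c2 = encrypt(pub, 15), encrypt(pub, 27)
# Multiplying cipher texts corresponds to adding plaintexts: E(m1)*E(m2) -> E(m1+m2)
c_sum = (c1 * c2) % (pub[0] ** 2)
assert decrypt(priv, c_sum) == 42
```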

4 System Diagram

In the proposed system, there are two entities involved: Data Owner (DO) and Secu-
rity Operation Center (SOC). Security data is owned by the Data owner but lacks the
expertise in the field of intrusion detection and thus shares its data with the external
SOC which has the required expertise and offers its intrusion detection service to the
Data owner. The DO, however, is hesitant to share the data with an external party because of
security concerns and does so only after having taken all the necessary precautions.

1. First, SOC with the help of an intrusion policy which is just another defined
bunch of intrusion detection configurations forms its proprietary detection
model.
2. The feature selection process is used to eliminate features which are either
redundant or irrelevant to lower the computing time.
3. The data owner then encrypts the security data with its public key using partial
homomorphic encryption and sends it to the SOC.
4. After the pattern matching phase, the result of the phase which is encrypted by
default is sent to the DO. The DO then decrypts the results using its private key
and sends it to the SOC for examination.
5. The SOC then decrypts the result and learns about the offensive records and to
which rule in the intrusion policy these records are matched.
6. It then alerts the Data owner in case of an intrusion and sends the offensive
records and also advises on the steps to be taken in case of an attack.

5 Implementation Details

5.1 Data Cleaning and Preprocessing

The duplicates have already been removed as NSL KDD dataset is already standard-
ized [15]. The nan and infinity values are replaced with zero initially. Preprocessing
operation is done on the dataset as the dataset contains numerical and non-numerical
values. One Hot Encoding is used for this operation. An integer matrix denoting
the values of the categorical features is an input to the One Hot Encoder. This will
transform all the categorical features to their corresponding binary features out of
which one will be active at a time. The dataset is then divided into four parts based
on the attacks (U2R, Probe, DoS, R2L) which need to be classified (Fig. 2).
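As a sketch of the preprocessing step just described, the snippet below replaces NaN/infinite values and one-hot encodes the categorical NSL-KDD columns; it assumes pandas/scikit-learn and the usual NSL-KDD categorical column names protocol_type, service, and flag (these names are an assumption for illustration, not dictated by the paper).

```python
# Sketch of the data cleaning and one-hot encoding described in Sect. 5.1.
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

def preprocess(df: pd.DataFrame) -> np.ndarray:
    # Replace NaN and infinite values with zero, as described above.
    df = df.replace([np.inf, -np.inf], np.nan).fillna(0)
    categorical = ["protocol_type", "service", "flag"]   # assumed column names
    # One-hot encode: each category becomes a binary feature, one active at a time.
    enc = OneHotEncoder(handle_unknown="ignore")
    X_cat = enc.fit_transform(df[categorical]).toarray()
    X_num = df.drop(columns=categorical).to_numpy(dtype=float)
    return np.hstack([X_num, X_cat])

# encoded = preprocess(df)   # df: DataFrame holding the NSL-KDD records
```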

5.2 Feature Scaling

Feature scaling is performed to avoid features with large values dominating the
final result. A Standard Scaler is used to perform this operation. In
the Standard Scaler, the mean of a feature is calculated, the mean is subtracted
from the current value of the feature, and the result is divided by the standard deviation.
After scaling, each feature has a standard deviation of 1 (Fig. 3).
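The standardization just described corresponds to scikit-learn's StandardScaler; a brief self-contained sketch with a stand-in feature matrix (the data here is synthetic, for illustration only):

```python
# Sketch of the feature scaling step: (x - mean) / std, column-wise.
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=50.0, scale=10.0, size=(100, 5))  # stand-in feature matrix

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)   # subtract mean, divide by std per feature
print(X_scaled.std(axis=0))                # ~1 for every feature after scaling
```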

Fig. 2 System diagram

5.3 Feature Selection and Model

It is the process in which irrelevant and unnecessary features are eliminated with
minimal information loss. Subsets of the features are selected which fully repre-
sents all the features in the dataset in terms of accuracy and other metrics. It is also
possible that there is a correlation between features when a large number of features
are present. Feature selection also helps to eliminate this problem. We have used
Recursive Feature Elimination (RFE) to perform this operation. We plot the graph of
Figs. 4, 5, 6, and 7 for the accuracy against the number of features and based upon
that we select the optimal number of features for each of the attacks. Here, we have
built two models: decision trees and random forest. Both machine learning models
are built for all 4 types of attack, i.e., U2R, R2L, DoS, and Probe. This model is used
on the dataset containing every feature (123) and also separately for the features (13)
selected after feature selection operation.
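A condensed sketch of the feature-selection and modelling step follows, assuming scikit-learn and binary labels (attack vs. normal) for one attack category; the feature counts (123 total, 13 selected) come from the paper, while the synthetic data and model settings are illustrative assumptions.

```python
# Sketch: recursive feature elimination (RFE) followed by the two classifiers
# used in the paper (decision tree and random forest) on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=123, n_informative=20,
                           random_state=42)   # stand-in for the encoded NSL-KDD data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Keep the 13 most useful features, as in the paper, ranked by a tree model.
selector = RFE(DecisionTreeClassifier(random_state=42), n_features_to_select=13)
X_tr_sel = selector.fit_transform(X_tr, y_tr)
X_te_sel = selector.transform(X_te)

for model in (DecisionTreeClassifier(random_state=42),
              RandomForestClassifier(n_estimators=100, random_state=42)):
    model.fit(X_tr_sel, y_tr)
    print(type(model).__name__, accuracy_score(y_te, model.predict(X_te_sel)))
```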

5.4 Encryption

– The customer data is encrypted using a public key at the Data owner and is sent to
the SOC and the encryption scheme used is a paillier cryptosystems-based partial
Homomorphic encryption system.

Fig. 3 Flow chart of intrusion detection system

Fig. 4 Optimization of features for DoS



Fig. 5 Optimization of features for R2L

Fig. 6 Optimization of features for U2R

– At the SOC the encrypted data is applied to the machine learning model which
produces the encrypted result as an output.
– That encrypted result is sent to the Data owner where it can be decrypted using a
private key which is only available with the Data owner and not the SOC.
– Then the unencrypted data is again encrypted using a simple encrypted scheme to
maintain end to end encryption and protect the system from external adversaries
knowing about the system.

Fig. 7 Optimization of features for probe

– The result is decrypted at the SOC, and an alarm is raised if an intrusion has
  happened; appropriate steps to reduce the severity of the attack damage are then
  provided to the DO (Fig. 8). A minimal sketch of this encrypted exchange is given below.
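The sketch below uses the python-paillier (phe) package, which the paper cites as the Paillier library used. Paillier only supports addition of cipher texts and multiplication by plaintext constants, so this shows the kind of evaluation it permits directly, an encrypted weighted sum (a linear score) computed by the SOC on the DO's encrypted features, rather than the paper's full tree evaluation, which needs additional comparison protocols; the feature values and weights are illustrative.

```python
# Sketch of the encrypted DO <-> SOC exchange using python-paillier (phe).
from phe import paillier

# Data owner (DO): generates keys and encrypts its feature vector.
public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)
features = [0.12, 3.0, 1.0, 0.0]                       # illustrative record
enc_features = [public_key.encrypt(x) for x in features]

# SOC: holds plaintext model weights and evaluates them on encrypted features.
weights, bias = [1.5, -0.4, 2.0, 0.7], 0.25            # illustrative model
enc_score = public_key.encrypt(bias)
for w, enc_x in zip(weights, enc_features):
    enc_score += enc_x * w                              # ciphertext operations only

# DO: the only party able to decrypt the resulting score.
print(private_key.decrypt(enc_score))
```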

6 Results and Evaluation

6.1 Evaluation Metrics

Intrusion Detection is considered and approached as a problem where the records


need to be categorized into two classes either malicious (Intrusion attack) or normal
state. SOC raises an alarm when the record is categorized as malicious. In practice the
Intrusion Detection System misses some attacks or falsely classifies some attacks.
The evaluation metrics of any Intrusion Detection System therefore must take all of
the above mentioned points into consideration.
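Because missed attacks and false alarms matter differently, metrics beyond plain accuracy are useful; the following small sketch computes them from a confusion matrix using scikit-learn (the label vectors are toy values for illustration).

```python
# Sketch: metrics that account for missed attacks (false negatives) and false
# alarms (false positives), computed from illustrative predictions.
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = malicious, 0 = normal (toy labels)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("missed attacks (FN):", fn, " false alarms (FP):", fp)
print("precision:", precision_score(y_true, y_pred),
      "recall:", recall_score(y_true, y_pred),
      "F1:", f1_score(y_true, y_pred))
```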

6.2 Results

All the implementations which include training the data, extracting the features,
and Homomorphic encryption have been implemented using the python libraries.
Partial homomorphic encryption based on paillier cryptosystems is achieved using
the paillier library in python. Figure 9 compares our model with the existing work
in the literature [11]. Figure 10 represents the result for attacks (DoS, Probe, U2R,

Fig. 8 Data encryption process

R2L) obtained when decision trees is used as Intrusion Detection Model. Figure 11
represents the result obtained for attacks when random forest is used as an Intrusion
Detection Model.

7 Conclusion

In this paper, we present a protocol for signature-based IDS on security data which
is encrypted. This protocol helps the Data owner to trust the third party security
operations center which has the required expertise in IDS, because he is confident
that the security data will remain encrypted during the entire protocol and can never

Fig. 9 Comparison of different intrusion detection models

Fig. 10 Results for different attacks using decision tree as intrusion detection model

Fig. 11 Results for different attacks using random forest as intrusion detection model

be decrypted without the private key which is held only by the Data owner. Decision
trees and random forest are used for the machine learning model which are then
privately evaluated over the encrypted network data using Homomorphic encryption.
This intrusion detection protocol has several drawbacks mainly because of the high
computing power required by Homomorphic encryption algorithm and significantly
higher overhead generated by HE compared to the traditional approaches. Also the
IDS generates alerts after a certain time lag as the SOC does not have any clear
information on the output of the intrusion detection model which is also encrypted
and needs to be sent to the data owner where it is decrypted using the private key.
The decrypted results are then sent to the SOC for analysis and thus the time lag.

8 Future Work

For future, we would try to use parallel execution to reduce the overhead that comes
with Homomorphic encryption and to include other intrusion detection models and
classification methods in our proposed system.

References

1. Niksefat S, Kaghazgaran P, Sadeghiyan B (2017) Privacy issues in intrusion detection systems:


a taxonomy, survey and future directions. Comput Sci Rev 25:69–78
2. Chaipa S, Eloff MM, Eloff MM (16–17 Aug 2017) Towards the development of an effec-
tive intrusion based detection model. In: 2017 information security for South Africa (ISSA).
Johannesburg, South Africa
3. Kumar R, Sharma D (2018) Signature-anomaly based intrusion detection algorithm. In: 2018
second international conference on electronics, communication and aerospace technology
(ICECA). Coimbatore, pp 836–841. https://doi.org/10.1109/ICECA.2018.8474781
4. Sharafaldin I, Lashkari AH, Ghorbani AA (22–24 Jan 2018) Toward generating a new intrusion
detection dataset and intrusion traffic characterization. ICISSP, Funchal, Madeira–Portugal
5. Bost R, Popa RA, Tu S, Goldwasser S (2015) Machine learning classification over encrypted
data, presented at the 2015 NDSS conference. CA, USA
6. Anitha Ruth J, Sirmathi H, Meenakshi A (17 June 2019) Secure data storage and intrusion
detection in cloud using mann and dual encryption through various attacks. IET Inform Security
13(4):7. Tamil Nadu, India
7. Archer D, Chen L, Cheon JH, Gilad-Bachrach R, Hallman RA, Huang Z, Jiang X, Kumaresan
R, Malin BA, Sofia H, Song Y, Wang S (July 2017) Applications of homomorphic encryption,
homomorphic encryption.org, redmond WA, Tech. Rep
8. Sarker I, Yb A, Alsolami F, Khan A (6 May 2020) IntruDTree: a machine learning based cyber
security intrusion detection model. https://doi.org/10.20944/preprints202004.0481.v1
9. Chowdhury MN, Ferens K, Ferens M (2016) Network intrusion detection using machine
learning. In: 2016 International conference on security and management SAM’16
10. Biswas SK (2018) CSE dept., NIT Silchar, Assam, India, 788010. Intrusion detection using
machine learning: a comparison study. 118(19):101–114
11. Meena G, Choudhary RR (2017) A review paper on IDS classification using KDD 99 and NSL
KDD dataset in WEKA. In: 2017 international conference on computer, communications and
electronics (Comptelix). Jaipur, pp 553–558. https://doi.org/10.1109/COMPTELIX.2017.800
4032
12. Yin C, Zhu Y, Fei J, He X (2017) A deep learning approach for intrusion detection using
recurrent neural networks. IEEE Access 5:21954–21961. https://doi.org/10.1109/ACCESS.
2017.2762418(2017)
13. Lashkari AH, Kadir AFA, Taheri L, Ghorbani AA (2018) Toward developing a systematic
approach to generate benchmark android malware datasets and classification. In: 2018 inter-
national carnahan conference on security technology (ICCST). Montreal, QC, pp 1–7. https://
doi.org/10.1109/CCST.2018.8585560
14. Gentry C, Boneh D (2009) A fully homomorphic encryption scheme. Stanford University,
Stanford
15. Tavallaee M, Bagheri E, Lu W, Ghorbani AA (2009) A detailed analysis of the KDD CUP
99 data set. In: 2009 IEEE symposium on computational intelligence for security and defense
applications. Ottawa, ON, pp 1–6. https://doi.org/10.1109/CISDA.2009.5356528
16. Dias LP, Cerqueira JJF, Assis KDR, Almeida RC (2017) Using artificial neural networks in
intrusion detection systems to computer networks. In: 2017 9th computer science and electronic
engineering (CEEC). Colchester. pp 145–150. https://doi.org/10.1109/CEEC.2017.8101615
Reversible Data Hiding Using LSB
Scheme and DHE for Secured Data
Transfer

D. N. V. S. L. S. Indira, Y. K. Viswanadham, J. N. V. R. Swarup Kumar,


Ch. Suresh Babu, and Ch. Venkateswara Rao

Abstract There are many ways to make data secure with different processing techniques.
The data is embedded in a host and converted using encryption methods for
further transfer. The host medium is modified using certain alteration rules,
and the genuine host medium is recovered after the secret data is extracted
from it. This paper adopts a reversible data hiding approach to increase security by
hiding data in an image. Color images are used rather than grayscale
to increase the capacity of the hidden data. Senders can encrypt the original image
using data hiding in encryption (DHE) with an encryption key and a dynamic
histogram. The LSBs are then compressed to make space for the data hidden with the
data hiding key. The receiver uses both the encryption and hiding keys for
accurate retrieval of the data. If the receiver uses only one key, only the
corresponding functionality is available.

Keywords Reversible data hiding · Image and data recovery · Dynamic


histogram · Image encryption · Least significant bit

1 Introduction

The idea of information concealment proposes that privileged data must be mediated
into a transporter medium with the idea of some host adjustment requirements. In the
usual approaches, the information concealment strategies will cause the host medium

D. N. V. S. L. S. Indira (B) · Y. K. Viswanadham (B) · Ch. Suresh Babu · Ch. Venkateswara Rao
Department of Information Technology, Gudlavalleru Engineering College, Gudlavalleru, AP
521356, India
J. N. V. R. Swarup Kumar
Department of Computer Science and Engineering, Gudlavalleru Engineering College,
Gudlavalleru, AP 521356, India
e-mail: swarupjnvr@gecgudlavalleru.ac.in
D. N. V. S. L. S. Indira · Y. K. Viswanadham · J. N. V. R. Swarup Kumar · Ch. Suresh Babu ·
Ch. Venkateswara Rao
Gudlavalleru Engineering College, Gudlavalleru, AP 521356, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 519
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_38

to distort. In certain domains, such as clinical images and military material, such
distortions are simply not allowed. A few reversible data hiding strategies have been
presented; lossless compression-based techniques, difference expansion strategies,
and histogram modification techniques are some of those developed [1, 2].
A lot of the commonly used lossless pressure-based tactics rely on factual repe-
tition of the host in order to create room for the concealment of sensitive data. The
difference upgrade is one of the most important steps in the picture preparation
procedure. As an example, we will discuss histogram leveling as a way to improve
the distinctiveness of a picture. A histogram is a visual representation of data [3, 4].
Histograms are graphical representations of the force of pixels in an image.
When we use this technique, we stretch the image so that it is more distinct.
This is a graphical representation of a picture that is dependent on its pixels and
the specific power of comparing pixels [5, 6]. The same is true for an 8-cycle dim
scale, where there are 256 unique potential powers. If you are interested in learning
more about the histogram of color photographs, click here. So as to increase the
difference of the photos, power levels are again remapped into images using the
histogram equalizing process and change the histogram of a genuine picture into a
level, uniform histogram. There is a constant center or close to middle brightness level
in the restored image, which indicates a high degree of brightness. Changes should be
made to photographs with low and high brightness values [5, 7]. A few standards are
employed for the enhancement of histogram balance-based differentiation upgrading,
such as bi-histogram adjustment (BBHE), equivalent region dualistic sub-picture
histogram evening out (DSIHE), and least mean brilliance blunder bi-histogram
balance (MMBEBHE) [8–10].
The method of BBHE precisely parceled the picture into two comparable parts
and the partition force is introduced by the information mean splendor esteem, which
is the normal power of all pixels that develop the information picture, and there two
parts are equalized (Figs. 1 and 2).
The procedure of dualistic sub-picture histogram leveling (DSIHE) follows a
similar model as followed by BBHE. While the least mean brilliance blunder,
bi-histogram evening out (MMBEBHE) is the expansion of BBHE for additional
improvement of differentiation [7, 11].

2 Related Work

Image contrast is improved by histogram equalization (HE), as it performs dynamic
range expansion and flattens the histogram. Information theory shows that the entropy
of a message source is highest when the message has a uniform distribution. In this
manner, histogram equalization can produce strong changes in image contrast
[3, 12].
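For reference, here is a compact NumPy sketch of plain histogram equalization on an 8-bit grayscale image, the CDF remapping that the HE, BBHE, and DSIHE variants all build on; the input array is a synthetic stand-in, not an image from the paper.

```python
# Sketch of classical histogram equalization for an 8-bit grayscale image:
# remap intensities through the normalized cumulative histogram (CDF).
import numpy as np

def equalize(img: np.ndarray) -> np.ndarray:
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Classical HE mapping: stretch the CDF to cover the full 0..255 range.
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255), 0, 255)
    return lut.astype(np.uint8)[img]

img = np.random.randint(60, 120, size=(64, 64), dtype=np.uint8)  # low-contrast stand-in
print(img.min(), img.max(), "->", equalize(img).min(), equalize(img).max())
```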

Fig. 1 DHS

2.1 BBHE—Brightness Preserving Bi-histogram Equalization

BBHE is one such procedure which is, by and large, used to stay aware of the
distinction of the image. In the BBHE connection, we will consider an image and
undertaking its histogram see. The image histogram will furthermore be apportioned

Fig. 2 3-D histogram

into equal parts. The splendor regard is resolved as the mean of the image power
regard which is just the separation power which builds up the image. These two
picture histograms are autonomously leveled out to make a histogram which will lie
in the extent of information mean and dim level. Exactly when we join these two
histograms, we will make a histogram going from zero to L-1. When this histogram
is disconnected depending on power, we will convey two histograms of arrive at 0
and reach esteem [5, 13].

2.2 DSIHE—Dualistic Sub-Image Histogram Equalization

Identical locale dualistic sub-image histogram equalization is another cycle which


follows a comparable communication as of BBHE in the DSIHE method, and we
will consider the image and further separate the image into two sub-pictures. This
decay of picture is assembled not concerning the faint scale regard yet rather we
will isolate these two sub-pictures into light sub-picture and faint sub-picture. Right
when the image is crumbled, the splendid picture conveyed has a value of ordinary
picture level and focuses faint level. In this procedure, the DSIHE methodologies will
sub-hole the image into lighter and more mind blowing, the more splendid picture
will go probably as the best yield for the securing the distinction of the picture [1, 5].

2.3 MMBEBHE—Minimum Mean Brightness Error Bi-HE Method

Least mean brightness botch bi-histogram system is one such strategy which is used
to stay aware of the distinction of the image. The MMBEBHE also accepts a compar-
ative communication as the BBHE. The solitary difference is that when the image
is sub-isolated into sub-modules, there we consider the cutoff levels of the pictures.
The output modules will have the levels which are in the running [0, lt] and I [lt +
1, L −1]. MMBEBHE is officially characterized by the accompanying systems:
(1) Determine the AMBE for each potential threshold level.
(2) Determine the XT threshold level that produces the smallest AMBE.
(3) Divide the input histogram into two halves based on the XT obtained.

2.4 RMSHE—Recursive Mean-Separate HE Method

Recursive mean separate histogram evening out method is moreover one such system
which is, by and large, used to save the splendor of the image. In the BBHE technique,
we will play out the mean separation and a while later parcel the image to save the
quality. By virtue of RMSHE, we will separate the image into extra events to stay
aware of the magnificence of the main picture. HE is identical to RMSHE level 0
(r = 0). BBHE is identical to RMSHE with r = 1.The yielded picture brilliance is
protected, and unique picture is acquired [5].

2.5 Classical Encryption Caesar Cipher Algorithm

The most fundamental issue that cryptography addresses is ensuring the security of
communication over an untrusted medium. Encryption techniques are used because a
strong encryption computation is needed so that an intruder cannot decrypt the data,
and the keys should be shared in such a way that only the sender and receiver can
decode the restricted information. The Caesar cipher is a substitution cipher, named
after Julius Caesar. The Caesar cipher shifts the letters of the alphabet, making the
output look like gibberish [13, 14] (Fig. 3).

Fig. 3 Symmetric encryption model
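A minimal sketch of the Caesar cipher described above (the shift value and sample text are illustrative):

```python
# Minimal Caesar cipher: shift each letter by a fixed key, leaving other
# characters untouched. Decryption simply shifts by the negative key.
def caesar(text: str, shift: int) -> str:
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

cipher = caesar("attack at dawn", 3)     # 'dwwdfn dw gdzq'
print(cipher, "->", caesar(cipher, -3))  # round-trips to the original text
```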

3 Data in Encrypted Image Using Reversible Data Hiding Technique

3.1 Reversible Data Hiding

The standard approach to data hiding introduces some disturbances in the input
picture when the information is recovered from the stego picture. The method of
reversible data hiding, by contrast, embeds the secret information inside a picture
and recovers the original cover picture without any distortion.
In the recent years, scientists had proposed numerous new philosophies for the
reversible information stowing away. In the distinction development technique, we
consider the two contiguous pixel worth of the picture and twofold the pixel worth
of them. The multiplying of the picture pixel will create new LSB esteem. The new
LSB esteem gives an extra space to implant the information in the picture (Fig. 4).
The information hider method is additionally one such strategy which performs
reversible information stowing away. In this technique, we consider the histogram’s
apex points and adjust the pixel values to introduce information into it. In the further
examination, there are numerous methods which play out the reversible information
concealing ways to deal with work on the presentation [15].

3.2 LSB Scheme

The LSB scheme of reversible data hiding is better realized and more precisely
accomplished with color images than with a grayscale picture. For stronger security
protection, the content

Fig. 4 Image used for DHE

owner can encode the original picture using a legitimate encryption key. Then,
using the hiding key, the LSB bits are compressed to accommodate the information
in the least significant bits. If the recipient has just one kind of key, either the
hiding key or the encryption key, only one output can be obtained: either the hidden
information or the decrypted picture.
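To make the LSB step concrete, here is a small NumPy sketch that hides a byte string in the least significant bits of an image array and recovers it; it is a simplified illustration of plain LSB substitution, not the full reversible DHE pipeline with encryption and hiding keys described in the paper, and the cover image is a synthetic stand-in.

```python
# Simplified LSB substitution sketch: write the bits of a secret message into
# the least-significant bit of each pixel value, then read them back.
import numpy as np

def embed(img: np.ndarray, secret: bytes) -> np.ndarray:
    bits = np.unpackbits(np.frombuffer(secret, dtype=np.uint8))
    flat = img.flatten()                    # flatten() copies, cover stays intact
    assert bits.size <= flat.size, "cover image too small for this payload"
    flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits   # overwrite LSBs
    return flat.reshape(img.shape)

def extract(stego: np.ndarray, n_bytes: int) -> bytes:
    bits = stego.flatten()[: n_bytes * 8] & 1
    return np.packbits(bits).tobytes()

cover = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)  # stand-in image
stego = embed(cover, b"hidden")
print(extract(stego, 6))   # b'hidden'
```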

4 Experimental Results

We consider four host pictures of size 512*512, which we name lena, mandrill,
plane, and cake.
Both sets A and B are divided into 16 subsets. The capacity of a subset is expected
to be larger than the amount of data plus the auxiliary information of the previous
subset; the auxiliary information of a subset is produced after the data is embedded.
The original content can be recovered from this in reverse order. An optimal transfer
mechanism, carried out for every subset except the last one, is used to achieve a good
payload-distortion performance. Using the LSB substitution technique, the auxiliary
information is embedded in the last subset, and the content is recovered in reverse
order (Figs. 5, 6, 7, and 8).

5 Conclusion

In the investigation of histogram adjustment-based strategies, it was found that there


are numerous cases that require a higher level of brilliance protection and are not
handled well by the histogram adjustment-based strategies HE, BBHE, and DSIHE,

Fig. 5 Original image

Fig. 6 Image encrypted

Fig. 7 Image decrypted directly

and that the RMSHE procedure is able to take care of these cases. In addition to
the BBHE, the MMBEBHE is a technique that allows for maximum brightness
to be preserved in a photograph. In spite of the fact that these tactics are useful

Fig. 8 Image is decrypted using DHE technique

for constructing a suitable brightness-preserving strategy, they can have certain
unintended impacts depending on the histograms of distinct gray levels.
Using the DHE technique, the information in an image is protected with a high
level of security and without incidental effects that might otherwise compromise its
integrity. This information is converted to a coded format for a better degree of
security protection. If the amount of information to be embedded is too large, a
color picture is considered instead of a grayscale image.
A color picture is preferred over a grayscale picture because far more information
can be carried in an RGB picture than in a grayscale one. By embedding video
components and applying strong encryption and decryption techniques, the security
of this process can be enhanced further.

References

1. Zhang X (2011) Reversible data hiding in encrypted image. IEEE Signal Process Lett
18(4):255–258. https://doi.org/10.1109/LSP.2011.2114651,April
2. Pravalika SL, Joice CS, Joseph Raj AN (2014) Comparison of LSB based and HS based
reversible data hiding techniques. In: 2014 2nd international conference on devices, circuits
and systems (ICDCS). pp 1–4
3. Lee J-D, Chiou Y-H, Guo J-M (Oct 2013) Reversible data hiding scheme with high embedding
capacity using semi-indicator-free strategy. Comput Intell Image Process 2013
4. Qin C, Zhang X (2015) Effective reversible data hiding in encrypted image with privacy
protection for image content. J Vis Commun Image Represent 31:154–164
5. Puteaux P, Puech W (July 2018) An efficient MSB prediction-based method for high-capacity
reversible data hiding in encrypted images. IEEE Trans Inform Forensics Secur 13(7):1670–
1681
6. Anita H, Hangargi K, Pattan P (July 2019) Reversible data hiding in encrypted image. Int J
Innovative Technol Exploring Eng (IJITEE) ISSN: 2278-3075 8(9)

7. Wedaj FT, Kim S, Kim HJ et al. (2017) Improved reversible data hiding in JPEG images based
on new coefficient selection strategy. J Image Video Proc 63
8. Gonzalez RC, Woods RE (2002) Digital image processing, 2nd edn. Prentice Hall
9. Al-qershi O, Ee KB (Oct 2009) An overview of reversible data hiding schemes based on
difference expansion technique. First international conference on software engineering and
computer systems
10. Peter N (2015) A system for separable reversible data hiding using an encrypted image. Int J
Eng Res Technol (IJERT) 3(28)
11. Abikoye O, Adewole S, Oladipupo J (2012) Efficient data hiding system using cryptography
and steganography. Int J Appl Inform Syst (IJAIS) 4:6–11. https://doi.org/10.5120/ijais12-
450763
12. Yu C, Zhang X, Tang Z, Chen Y, J Huang (2018) Reversible data hiding with pixel prediction
and additive homomorphism for encrypted image. Secur Commun Networks 2018:13. Article
ID 9103418
13. Manikandan VM, Masilamani V (2018) Reversible data hiding scheme during encryption using
machine learning. Procedia Comput Sci 133:348–356
14. Sabeen Govind PV, Wilscy M (2015) A new reversible data hiding scheme with improved
capacity based on directional interpolation and difference expansion. Procedia Comput Sci
46:491–498
15. Ayyappan S, Lakshmi C, Menon V (2020) A secure reversible data hiding and encryption
system for embedding EPR in medical images. Curr Signal Transduct Ther 15(2)
Prediction of Solar Power Using Machine
Learning Algorithm

M. Rupesh, J. Swathi Chandana, A. Aishwarya, C. Anusha, and B. Meghana

Abstract The stability and reliability of power in an integrated renewable energy


system vary according to changes in environmental conditions, as the radiation,
temperature, and humidity change continuously; the power generation through PV
system also changes, and hence, the power scheduling and operation mainly depend
on the estimation of power through renewable energy sources. The power generation
in PV systems initially depends on the radiation and temperature, so prediction of
weather conditions helps in predicting the power generation through PV solar plant.
The stability of a power system can be improved by predicting solar energy which
can tell approximately how much solar power can be generated in the future at a
particular location. Solar power forecasting has several methods, one of all methods
is using machine learning/neural networks. In this paper, the power generation with
a solar plant is forecasted by predicting the future weather generation using machine
learning algorithms. The accuracy of forecasting will be checked directly with the
practical data which is generated and simulated data using MATLAB/Simulink by
applying various machine learning algorithms.

Keywords Weather forecasting · Solar power forecasting · Artificial neural


network (ANN) · Machine learning · Feed-forward back-propagation algorithm

M. Rupesh (B) · J. Swathi Chandana · A. Aishwarya · C. Anusha · B. Meghana


Electrical & Electronics Engineering Department, BVRIT Hyderabad College of Engineering for
Women, Hyderabad, India
e-mail: rupesh.m@bvrithyderabad.edu.in
J. Swathi Chandana
e-mail: 17wh1a0223@bvrithyderabad.edu.in
A. Aishwarya
e-mail: 17wh1a0242@bvrithyderabad.edu.in
C. Anusha
e-mail: 17wh1a0249@bvrithyderabad.edu.in
B. Meghana
e-mail: 17wh1a0257@bvrithyderabad.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 529
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_39

1 Introduction

At present, economic growth is directly proportional to electricity demand. In
developed and developing countries like India, fossil fuels are exhaustible, and
their by-products result in pollution and greenhouse gases, so power generation
increasingly depends on renewable energy to satisfy customer demand.
The PV system [1] plays a major role among renewable energy systems because
solar energy is abundant and renewable [2]. However, as PV power varies with
changing weather conditions, it can affect the operation of the grid adversely. To
keep the system stable under variable generation, fossil fuel plants must act as
spinning reserve, and proper planning and estimation of their operating time and
capacity are very important to save the environment and reduce fossil fuel usage.
Prior knowledge of power produced through PV systems helps in estimating the
fossil fuel power requirements and reduces the cost for power generation. In solar
power generation, the prediction mainly depends on weather prediction and past
data. Hence, the collection of past data is very important in solar power forecasting.
Various researchers have proposed many forecasting mechanisms with good results,
but still, there is room to improve the results. Artificial neural network (ANN) in
machine learning gives a very high degree of accuracy in predicting the weather and
power in solar PV systems [3].
The paper is organized like the proposed method in Sect. 2, implementation of
ANN forecasting method in Sect. 3, simulation and results in Sect. 4, and Sect. 5
deals with conclusion.

2 Proposed Model

Reliable data availability and choosing the right attributes from the collected data
is very important to predict accurately especially in solar power generation [4]. In
this work, the data is collected from the BV Raju Institute of Technology, Narsapur,
Medak Dist., India.
We choose three years of weather data from the above location. The dataset consists
of minute-wise values of weather parameters such as irradiance, temperature, panel
temperature, wind direction, and wind speed. The weather values collected from
2012 to 2014, i.e., a dataset of 201,235 records, are used to analyze the relationship
between the weather parameters and the power generation for accurate prediction
(Figs. 1 and 2).
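A brief sketch of how the correlation between the logged weather parameters and the generated power (Fig. 1) can be computed, assuming the minute-level records are loaded into a pandas DataFrame; the file name and column names here are illustrative assumptions, not the actual names used in the study.

```python
# Sketch: load the minute-level weather/power log and inspect correlations.
# The file name and column names are illustrative assumptions.
import pandas as pd

df = pd.read_csv("bvrit_solar_2012_2014.csv", parse_dates=["timestamp"])
cols = ["irradiance", "ambient_temp", "panel_temp",
        "wind_speed", "wind_direction", "power"]
corr = df[cols].corr()            # Pearson correlation matrix (cf. Fig. 1)
print(corr["power"].sort_values(ascending=False))
```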

Fig. 1 Correlation of weather parameters

3 Machine Learning Algorithm

State-of-the-art solar power generation can only be established if the forecast algorithm
predicts how much power will be generated at any location and time [5–8]. The
machine learning algorithm is developed by training, testing, and validating
the collected data; the workflow [9] follows the flowchart shown in Fig. 3.

3.1 Feed-Forward Back-Propagation Algorithm for Prediction

We have tested the selected data of the given work in the commonly used machine
learning algorithm feed-forward neural network with back-propagation to evaluate
the performance with weather data. In this, the weather parameters are given as input
to the ANN [10], and it gives the predicted solar power as output. The neuron functions

Fig. 2 Relationship between the temperature and irradiance

used in this model are used to train the dataset for 1000 epochs. It is observed that
the RMS error value decreases as training progresses.
The given dataset is split into three parts categorized as training, testing, and
validation datasets.
A multilayer feed-forward neural network in our proposed method consists of
input layer, two hidden layers and output layer [11]. The input layer consists of the
weather parameters as attributes, and output layer consists of solar power and voltage
as attributes.
Figure 4 [12] is representing multilayer feed-forward back-propagation neural
network.
Cost function of gradient is defined as [13].

$$J(W, b; x, y) = \frac{1}{2}\,\big\lVert h_{W,b}(x) - y \big\rVert^{2} \qquad (1)$$
From the above, the squared-error cost function is defined as

Fig. 3 Flowchart for process of ANN



Fig. 4 Multilayer feed-forward back-propagation neural network model [13]

$$J(W, b) = \left[\frac{1}{m}\sum_{i=1}^{m} J\big(W, b; x^{(i)}, y^{(i)}\big)\right] + \frac{\lambda}{2}\sum_{l=1}^{n_l-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} \big(W_{ji}^{(l)}\big)^{2} \qquad (2)$$

$$J(W, b) = \left[\frac{1}{m}\sum_{i=1}^{m} \frac{1}{2}\,\big\lVert h_{W,b}\big(x^{(i)}\big) - y^{(i)} \big\rVert^{2}\right] + \frac{\lambda}{2}\sum_{l=1}^{n_l-1}\sum_{i=1}^{s_l}\sum_{j=1}^{s_{l+1}} \big(W_{ji}^{(l)}\big)^{2} \qquad (3)$$

One iteration of gradient descent updates the parameters W, b as follows:

$$W_{ij}^{(l)} = W_{ij}^{(l)} - \alpha \frac{\partial}{\partial W_{ij}^{(l)}} J(W, b) \qquad (4)$$

$$b_{i}^{(l)} = b_{i}^{(l)} - \alpha \frac{\partial}{\partial b_{i}^{(l)}} J(W, b) \qquad (5)$$

The partial derivatives of the cost function are

$$\frac{\partial}{\partial W_{ij}^{(l)}} J(W, b) = \left[\frac{1}{m}\sum_{i=1}^{m} \frac{\partial}{\partial W_{ij}^{(l)}} J\big(W, b; x^{(i)}, y^{(i)}\big)\right] + \lambda W_{ij}^{(l)} \qquad (6)$$

$$\frac{\partial}{\partial b_{i}^{(l)}} J(W, b) = \frac{1}{m}\sum_{i=1}^{m} \frac{\partial}{\partial b_{i}^{(l)}} J\big(W, b; x^{(i)}, y^{(i)}\big) \qquad (7)$$
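The following is a compact NumPy sketch of one gradient-descent update implementing Eqs. (4)-(7) for a small feed-forward network with one hidden layer; the layer sizes, learning rate, and synthetic data are assumptions for illustration only, since the paper's actual model was built with MATLAB's nntool.

```python
# Sketch of Eqs. (4)-(7): one regularized gradient-descent step for a tiny
# feed-forward network (4 weather inputs -> 8 hidden units -> 1 power output).
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((32, 4))                      # stand-in weather features
y = rng.random((32, 1))                      # stand-in normalized power
W1, b1 = rng.normal(0, 0.1, (4, 8)), np.zeros(8)
W2, b2 = rng.normal(0, 0.1, (8, 1)), np.zeros(1)
alpha, lam, m = 0.1, 1e-3, X.shape[0]

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Forward pass: h_{W,b}(x)
a1 = sigmoid(X @ W1 + b1)
out = a1 @ W2 + b2

# Backward pass: batch-averaged gradients plus the weight-decay term.
d_out = (out - y) / m                                    # d/d(out) of (1/2m)||out-y||^2
gW2 = a1.T @ d_out + lam * W2                            # Eq. (6)
gb2 = d_out.sum(axis=0)                                  # Eq. (7)
d_a1 = (d_out @ W2.T) * a1 * (1 - a1)
gW1 = X.T @ d_a1 + lam * W1                              # Eq. (6)
gb1 = d_a1.sum(axis=0)                                   # Eq. (7)

# Eqs. (4)-(5): gradient-descent parameter updates.
W1, b1 = W1 - alpha * gW1, b1 - alpha * gb1
W2, b2 = W2 - alpha * gW2, b2 - alpha * gb2
print("batch MSE after one step:",
      float(np.mean((sigmoid(X @ W1 + b1) @ W2 + b2 - y) ** 2)))
```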

4 Simulation and Results for Solar Prediction Using ANN Model

The feed-forward back-propagation model is used as the neural network model to study
the prediction of solar generation using the collected dataset, following the various
steps in developing a neural network model. The steps include dividing the collected
data into 70% training data, 15% testing data, and 15% validation data, and then
developing the neural network model [14] using nntool in MATLAB to predict the
solar generation [14, 15]. The flowchart for developing the neural network model
using the feed-forward back-propagation algorithm is shown in Fig. 5.
The proposed neural network model using feed-forward back-propagation algo-
rithm is shown in Fig. 6.
Figure 7 shows the training of NN for 100 epochs.
Figure 8 shows the performance of the NN model using FFBP algorithm for
training, testing, and validation processes.
Figure 9 shows the regression values of the NN model using the FFBP algorithm
for the prediction of solar generation at any location.

Fig. 5 Flowchart for preparing the neural network model using the FFBP algorithm (Start → Read training data → Read testing data → Use neural network for training and get prediction result → Plot prediction result → Plot regression prediction result → End)

Fig. 6 NN model using FFBP algorithm

Fig. 7 Training the NN model

The R value for the given model is about 0.994, which tells about the accuracy of
the predicted model.

5 Conclusion

In this paper, the sun irradiance, temperature, wind velocity, and humidity as input
variables, and the solar-generated voltage and power as output variables, have been
collected from BVRIT Narsapur, Medak Dist., Telangana, and a generalized artificial

Fig. 8 Performance curves of NN model

neural network model using a machine learning algorithm, i.e., the feed-forward back-
propagation algorithm, has been developed for weather and solar power forecasting
using the MATLAB/Simulink application. Finally, it can be concluded that solar
forecasting is achieved with an accuracy of 99.4%; hence, our model can be used to
estimate the power generation of any solar plant at any location.

Fig. 9 Regression plot for the NN model with FFBP algorithm

Acknowledgements The authors would like to thank BVRIT, Narsapur Solar Plant in charge Mr.
N. Ramchandar, Associate Professor, EEE, BVRIT, Narsapur, and Mr. M. Sudheer Kumar, Assistant
Professor, BVRIT HYDERABAD College of Engineering for Women, Hyderabad.

References

1. Rupesh M, Shivalingappa TV (2019) Evaluation of optimum MPPT technique for PV system using MATLAB/Simulink. 5:1403–1408
2. Revana G, Kota VR (2020) Simulation and implementation of resonant controller based PV
fed cascaded boost-converter three phase five-level inverter system. J King Saud Univ—Eng
Sci 32(7):411–424
3. Trivedi S (2021) Evaluation of the use of artificial neural networks to predict the photovoltaic
power generation factors by using feed forward back propagation (FFBP) technique. Int J Curr
Sci Res Rev 04(02):113–119
4. Jawaid F, Nazirjunejo K (2017) Predicting daily mean solar power using machine learning
regression techniques. 2016 6th Int Conf Innov Comput Technol INTECH 2016:355–360

5. Wu YK, Chen CR, Abdul Rahman H (2014) A novel hybrid model for short-term forecasting
in PV power generation. Int J Photoenergy 2014
6. Coelho JP, Boaventura-Cunha J (2014) Long term solar radiation forecast using computational
intelligence methods. Appl Comput Intell Soft Comput 2014(December):1–14
7. Gupta A, Kumar P, Pachauri RK, Chauhan YK (2014) Performance analysis of neural network
and fuzzy logic based MPPT techniques for solar PV systems. 2014 6th IEEE Power India Int
Conf 1–6
8. Khan I, Zhu H, Khan D, Panjwani MK (2018) Photovoltaic power prediction by cascade
forward artificial neural network. 2017 Int Conf Inf Commun Technol ICICT 2017
2017(December):145–149
9. Ahmed R, Sreeram V, Mishra Y, Arif MD (2020) A review and evaluation of the state-of-the-
art in PV solar power forecasting: techniques and optimization. Renew Sustain Energy Rev
124(June 2019):109792
10. Aljanad A, Tan NML, Agelidis VG, Shareef H (2021) Neural network approach for global
solar irradiance prediction at extremely short-time-intervals using particle swarm optimization
algorithm. Energies 14(4)
11. Shekher A, Khanna V (2016) Modelling and prediction of 150KW PV array system in Northern
India using artificial neural network. 5(5):18–25
12. Kabilan R, et al. (2021) Short-term power prediction of building integrated photovoltaic (BIPV)
system based on machine learning algorithms. Int J Photoenergy 2021
13. (2015) Multi-layer neural network model. http://deeplearning.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/. pp 1–6
14. Shaik NB, Pedapati SR, Ammar Taqvi SA, Othman AR, Abd Dzubir FA (2020) A feed-forward
back propagation neural network approach to predict the life condition of crude oil pipeline.
Processes 8(6)
15. Choudhary A, Pandey D, Bhardwaj S (2020) Artificial neural networks based solar radiation
estimation using backpropagation algorithm. Int J Renew Energy Res 10(4):1566–1575
Prediction of Carcinoma Cancer Type
Using Deep Reinforcement Learning
Technique from Gene Expression Data

A. Prathik, M. Vinodhini, N. Karthik, and V. Ebenezer

Abstract In recent decades, the investigation based on the molecular level for the
classification of cancer is becoming trending research topic for several researchers to
identify the type of cancer based on the gene expression data. Analyzing large number
of gene characteristics offered in-depth classification problem for cancer types. These
characteristics help in understanding the gene functions and interaction between the
abnormal and normal conditions of it. Under various conditions, the expression data
of gene to genes behavior is monitored by this characteristic. In this paper, a deep
reinforcement learning (DRL) model is proposed for the effective analysis of gene
expression data to find the type of cancer. The dataset of gene expression is used
for analyzing the model for predicting the cancer types. Furthermore, the simulation
results show that the proposed DRL model can predict the cancer type by obtaining
a 97.8% of accuracy when compared with other existing models.

Keywords Deep reinforcement learning · Genes behavior · Classification

1 Introduction

Deoxyribonucleic acid (DNA) stores the hereditary information required by every living creature to form, function, and develop. DNA is considered a vital part of every living creature, since it encodes all of the information needed for the maintenance of life. This genetic information is safeguarded and passed from each cell to every other cell during the process of cell division, in which an initial parent cell divides into two newer daughter cells. Molecules of DNA form a twisted double helix held together

A. Prathik (B) · M. Vinodhini · N. Karthik


Department of Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D
Institute of Science and Technology, Chennai, India
V. Ebenezer
Department of Computer Science and Engineering, Karunya Institute of Technology and Science,
Coimbatore, India
e-mail: ebenezerv@karunya.edu


and arranged in a precise manner. The four molecular units forming the DNA helix are sequenced in specific arrangements, such that each unit on one strand can pair only with a specific unit on the other strand. DNA replication occurs by breaking the bonds between the two strands of the twisted double helix, and each strand then forms a matching complementary strand.
The expression level of a gene represents the total amount of RNA produced in each cell under different biological conditions. Thus, during the process of cell division, if the cells suffer from diseases that are malignant or cancerous, which cause mutations or alterations in genes, the abnormal behavior of the gene is passed on to the daughter cells. In addition, specific gene expressions will be dominant, and therefore expression levels can be determined by analyzing the RNA. The expression levels of thousands of genes can be continuously measured under certain experimental conditions and circumstances owing to advances in DNA microarray technology. This methodology has made it possible to understand life at the molecular level. Several steps are carried out to transform the DNA microarray experiment from its analog form, which involves DNA sequences printed in a high-density array on a microscopic glass slide, into a digital form, which is the matrix containing the gene expression values that can be observed and analyzed.
The typical strategy is to extract the mRNA obtained from two cells, reverse-transcribe both to form cDNA, and label them using fluorescent dyes. The two samples are spread over the entire microarray to hybridize; each labeled cDNA binds to its complementary cDNA on the microarray to form a double-stranded molecule in a procedure known as hybridization. This hybridization thus acts as an indicator of a specific gene. The slide is then processed to obtain numerical values for each dye; the measured intensity of each dye corresponds to the amount of mRNA expressed for each gene. By comparing the intensity of a gene's color under two different experimental circumstances, gene expression levels can be monitored. For each gene on each chip, the gene expression ratio is computed from the red dye intensity and the green dye intensity.
The selection of genes that are discriminant among different types of cancers or classes is a vital research area. Furthermore, various trade-offs have been exposed, such as maintaining the rate of accuracy versus maintaining generalization, handling complexity versus enhancing the performance of classifiers, and improving the memory requirement. These parameters have affected the significance of algorithms for cancer classification. However, for the significant amount of gene expression data now available for cancer classification, the number of samples involved in training is very small when compared to the huge number of genes included in the analysis.
When the total number of genes is drastically higher than the total number of samples, it is likely that spurious rather than relevant biological relationships between gene behavior and sample classes will be identified. To guard against such results, the aim of gene selection is to find the smallest possible, yet most informative, subset of genes. This is a major problem in machine learning, where it is characterized as feature selection [1]. In addition, a smaller subset of genes is also important in developing gene-expression-based diagnostic applications. Moreover, the small number of training samples and the huge number of genes make gene subset selection a highly relevant and challenging issue in gene-expression-based classification. Traditional statistical models for classification and clustering have been extensively utilized for gene selection [2].

Fig. 1 Block diagram of reinforcement learning
Support vector machines (SVM) [3] have also been utilized widely for solving classification problems. Linear SVMs are utilized in a backward elimination process for gene selection, and this selection procedure is referred to as SVM recursive feature elimination (SVM-RFE). Compared with various other feature selection models, SVM-RFE is a scalable, effective wrapper strategy. Further insights concerning other feature selection models can be found in [4]. Figure 1 depicts the overall reinforcement learning process, in which an agent obtains information pertaining to the environment [5].

2 Related Work

Okun [6] described an ensemble model evaluated over a colon dataset. Filter-based feature selection methods are utilized to alleviate the overfitting effects. Three different gene selection models were evaluated, namely backward elimination based on the Hilbert–Schmidt independence criterion "BAHSIC" [7], extreme value distribution-based gene selection "EVD" [8], and singular value decomposition entropy-based gene selection [9]. The ensemble includes five classifiers and utilizes the k-nearest neighbor "K-NN" classifier with K set to either 3 or 5 nearest neighbors. The choice of K-NN classifiers was advocated because K-NN does not require training, which makes it largely suitable for use with the colon dataset as a result of the nature of microarray information. Because of the small sample size, bolstered resubstitution error estimation "BRE" is utilized in [10]. The bolstered resubstitution estimator is, in theory, of low variance and normally of low bias as well. It has distinct advantages in comparison with cross-validation and bootstrap error estimation, particularly for small-sample problems [11].
Utilizing miRNA expression profiles to distinguish cancerous samples from normal ones, and for classifying cancer into its sub-stages, is a predominant area of research and has been reported for various types of cancers such as lung, breast, and pancreatic cancer in [12], and liver cancer in [13]. The earlier papers utilized one of the following techniques, which involve supervised machine learning models such as SVM, prediction analysis of microarrays, and the compound covariate predictor. Various attempts at improving cancer classifiers have been mentioned in [14]. In [14], a number of feature selection methods, such as Spearman's and Pearson's correlations, cosine coefficient, Euclidean distance, information gain, mutual information, and signal-to-noise ratio, are utilized to improve the cancer classifier. Also, various classification models, namely the k-nearest neighbor model, multilayer perceptrons, and support vector machines with a linear kernel, are utilized in [15]. That examination concentrated only on improving the classifier based on labeled miRNA expression profiles and did not make use of unlabeled sets; additionally, gene expression profiles were not used to improve miRNA-based cancer classifiers. Improving classification accuracy by building two classifiers, one for the miRNA data and another for the mRNA data, was reported in [16]. That examination first performs feature selection using relief-F feature selection, then applies a bagged fuzzy K-NN classifier, and finally combines the two classifiers using a decision fusion rule.
A drawback of these methods is that they assume the presence of both mRNA and miRNA data for every patient, and they only utilize a decision fusion rule to combine the classifier decisions without improving the classifiers themselves. Authors in [17] have mentioned utilizing a discrete function learning (DFL) model on miRNA expression profiles to identify the sub-group of miRNAs whose expression levels are distinctive in normal and tumor tissues, and then utilize these miRNAs to form a classifier. Another kind of machine learning model is semi-supervised learning, which is a blend of unsupervised and supervised learning. It combines unlabeled and labeled data to form a precise learning framework. Semi-supervised machine learning approaches make use of publicly available unlabeled groups to improve the training of data classifiers. Moreover, the method depends mainly on gene expression and does not associate both the gene expression and miRNA expression sets.
Different semi-supervised machine learning techniques such as self-learning and co-training have been reported in various domains. The heuristic techniques of self-learning are among the earlier models in semi-supervised learning and were referenced in [18]. Self-learning has been used in different applications such as object recognition, word sense disambiguation, and subjective noun identification. Likewise, co-training, which is a semi-supervised strategy, is also used in applications such as word sense disambiguation and email classification.

3 Proposed Methodology

A deep reinforcement learning model is proposed for analyzing the cancer types
based on the gene expression data. Figure 2 shows the overall architecture of the
proposed model. The deep neural network with reinforcement learning can easily be
identified, and the performance is improved based on the accuracy metrics. There
are three main modules in this research framework, and these are: preprocessing
method, feature extraction, and classification.

3.1 Dataset Description

The microarray dataset of gene expression is collected as given in Table 1. The


description of dataset is clearly defined in Table 1. Three types of gene dataset are
brain tumor, glioblastoma, and lung cancer. Each dataset has its own description

Fig. 2 Proposed framework for cancer-type prediction



Table 1 Description of gene expression data

Gene expression dataset | Samples | Genes  | Class labels
Glioblastoma dataset    | 50      | 12,625 | 4
Brain tumor dataset     | 40      | 7129   | 5
Lung cancer dataset     | 34      | 10,541 | 3

where it consists of sub-types of cancer. The sub-types of brain tumor consist of


malignant glioma having 10 samples, normal cerebellum having 4 samples, medul-
loblastoma having 10 samples, rhabdoid tumor having 10 samples, and primitive
neuroectodermal having 6 samples.
The data described above includes the gene expression values that are given as input to the proposed model. Various dataset profiles from different nominal conditions are included for the model. Let S be a gene set and D the gene expression values observed in the framework, and let V = (f_ab) be the expression matrix, where f_ab describes the expression value of every gene. Let J be a vector containing the training labels corresponding to the various instances (I). Deep features are generated automatically, and the final classifier is formed using these features.

3.2 Preprocessing Technique

In this process, the dataset is manipulated using the preprocessing module. The major operations of this module are filtering, logarithmic transformation, data normalization, and thresholding. Before the classification process is carried out, these preprocessing steps are used to prepare the dataset in a suitable format. Once the preprocessing phase is completed, enrichment of the gene data is initiated utilizing the various datasets provided in Table 1. The functional span is estimated using the provided gene expression. Algorithm 1 lists the different phases involved in it. Mathematically, the functional span is written as a function (F) which takes three parameters as input:

F(X, geneset, mc)   (1)

where X describes the input gene profile and mc is the number of cores available for parallel processing.

Algorithm 1
Input: Gene Expression (V)
Output: Functional Span (F)
Step 1: V = Scale(V) // scale the columns of numeric values
Step 2: [R, C] = size(V)
Step 3: calculate the measurement in parallel
Step 4: for ag_i ∈ R do
Step 5:   s_p = preprocess(V[ag_i]) // preprocessing of the gene list
Step 6:   F = CalculateEnrichment(s_p, s_set)
Step 7: end for
Step 8: return F
Training the classifier is performed using Algorithm 2. After the initial preprocessing and estimation of the functional span of the gene expression, classifier training is carried out using the DRL model.

Algorithm 2
Input: gene expression's functional span (S), labels (L)
Output: Classifier
Step 1: do in parallel
Step 2:   Adjust framework
Step 3:   J_f = as.h2o(S, L) // data frame preparation for H2O
Step 4:   A_m = h2o deep learning model building(training_data)
Step 5:   C_s = feature(A_m, S)
Step 6: return A_m
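Since Algorithm 2 refers to H2O's deep learning interface, a minimal sketch of the corresponding calls through the H2O Python API is given below. The file name functional_span.csv, the label column, and the hidden-layer/epoch settings are illustrative assumptions rather than the authors' configuration.

```python
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()

# Assume the functional-span features S, plus a "cancer_type" label column, are stored in a CSV file.
frame = h2o.import_file("functional_span.csv")
frame["cancer_type"] = frame["cancer_type"].asfactor()   # treat the label as categorical
features = [c for c in frame.columns if c != "cancer_type"]

train, valid = frame.split_frame(ratios=[0.8], seed=1)

# Deep learning model roughly corresponding to Step 4 of Algorithm 2.
model = H2ODeepLearningEstimator(hidden=[64, 32], epochs=50, seed=1)
model.train(x=features, y="cancer_type", training_frame=train, validation_frame=valid)
print(model.model_performance(valid))
```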

3.3 Parameter Optimization

Parameter optimization is a means of improving the input for a specific step. It is carried out using various methodologies and procedures. In this paper, the deep reinforcement learning model is incorporated for this optimization purpose. The parameters of the H2O-based parallel classifier are modified automatically using the option known as hyperparameter search. It performs a random grid search over all the existing parameters and returns a highly accurate model. In this context, the proposed DRL model is utilized to tune the number of neurons. This is achieved using random numbers of neurons when populating the hidden layers in order to obtain the optimal model.
Encoding chromosome: Let us assume that the maximum number of neurons in the current hidden layer is n and the number of output neurons is o. The presence of neurons in the hidden layer can be expressed using a binary encoding as:

m_1, m_2, m_3, ..., m_n   (2)

Binary encoding is used for the neurons, meaning m_i will be either 0 or 1 depending on whether the corresponding neuron exists or not. Real-valued encoding is utilized for the weights (V_ij), which is usually represented as follows:

V_11 V_21 ... V_n1, V_12 V_22 ... V_n2, ..., V_1o V_2o ... V_no   (3)
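A simplified sketch of a random search over the binary neuron mask m_1, ..., m_n of Eq. (2) is shown below. The evaluation function and search budget are placeholders, since the paper does not specify them.

```python
import numpy as np

def random_neuron_search(evaluate, n_max=64, trials=20, seed=1):
    """Randomly sample binary masks over the hidden-layer neurons (Eq. 2)
    and keep the mask whose evaluated model scores best."""
    rng = np.random.default_rng(seed)
    best_mask, best_score = None, -np.inf
    for _ in range(trials):
        mask = rng.integers(0, 2, size=n_max)          # m_i in {0, 1}
        if mask.sum() == 0:                            # require at least one active neuron
            continue
        score = evaluate(int(mask.sum()))              # e.g. validation accuracy of a trained model
        if score > best_score:
            best_mask, best_score = mask, score
    return best_mask, best_score

# Example with a toy evaluation function standing in for actual model training.
mask, score = random_neuron_search(lambda n_neurons: 1.0 - abs(n_neurons - 24) / 64)
print(mask.sum(), "neurons selected, score", round(score, 3))
```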

3.4 Feature Extraction

Feature extraction is an important process for improving the performance of the proposed model for better classification. The focus here is on extracting particular variables so that the classification accuracy can be improved. Here, the PCA algorithm is used for analyzing and extracting the features for the proposed model, and the significant genes are extracted from the dataset.
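A minimal sketch of this PCA-based feature extraction step using scikit-learn is given below; the number of retained components is an assumption, not a value reported in the paper.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def extract_pca_features(expression_matrix, n_components=20):
    """Project the (samples x genes) expression matrix onto its top principal components."""
    scaled = StandardScaler().fit_transform(expression_matrix)
    pca = PCA(n_components=n_components)
    features = pca.fit_transform(scaled)
    print("explained variance retained:", pca.explained_variance_ratio_.sum())
    return features

# Example on random data shaped like the glioblastoma dataset (50 samples, 12,625 genes).
X = np.random.default_rng(0).normal(size=(50, 12625))
reduced = extract_pca_features(X)
print(reduced.shape)   # (50, 20)
```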

3.5 Proposed Classification Module

The proposed DRL module is used for the prediction of the cancer type from the gene expression dataset. This classification module can easily predict the type of cancer based on the gene dataset even if it has multiple class labels. Each class is identified by the deep neural network, with continuous estimation performed using the Q-learning method of reinforcement learning.
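The paper does not detail the Q-learning procedure; the sketch below only illustrates the standard tabular Q-learning update that such a module could use, with the discretized states, actions (candidate class labels), and reward scheme being placeholder assumptions.

```python
import numpy as np

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])
    return Q

# Toy usage: 10 discretized feature states, 4 candidate class labels as actions.
Q = np.zeros((10, 4))
# Assumed reward scheme: +1 when the chosen label matches the true class, otherwise -1.
Q = q_update(Q, state=3, action=2, reward=1.0, next_state=4)
print(Q[3])
```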

4 Experimental Results

In this section, the simulation results are analyzed using the proposed DRL model.
The results are compared with other existing techniques to analyze the performance
of the proposed classifier model. For the performance analysis, the accuracy, TPR,
and FPR metrics are used for the result analysis and three main datasets used for
this result analysis are breast cancer, glioblastoma dataset, and lung cancer. Table 2
shows the performance analysis and comparison results of the proposed model vs
existing algorithms.
Figure 3 shows the comparison results of different algorithms for various datasets,
namely breast cancer, glioblastoma, and lung cancer. The proposed DRL model
outperforms by obtaining more than 98% of accuracy when compared with other
existing techniques.

4.1 ROC Curves & Model Validation

The ROC curves depicted in Fig. 4a, b illustrate the sensitivity and specificity of the model on the training data and on additional test data. The prediction performance on the test data shows an accuracy for the DRL model of 86.5%. The data utilized for training the model yields an AUC of 0.95. More remarkably, the model still classifies well with an AUC of 0.75 on validation, while the validation curve remains above the random-classifier line.
Table 2 Performance comparison for different datasets

Algorithm          | Breast cancer                 | Glioblastoma                  | Lung cancer
                   | Acc (%)  TPR (%)  FPR (%)     | Acc (%)  TPR (%)  FPR (%)     | Acc (%)  TPR (%)  FPR (%)
Proposed DRL model | 98.3     97.8     1.34        | 99.2     98.34    0.98        | 97.34    96.9     2.34
SVM                | 91.23    90.78    8.77        | 92.34    91.25    7.65        | 93.42    91.3     7.26
RF                 | 78.9     76.7     23.2        | 81.23    80.94    20.34       | 82.34    81.98    20.12
ANN                | 94.5     93.2     6.57        | 93.47    92.34    7.12        | 94.5     93.8     5.34

Fig. 3 Analysis of proposed vs existing methods

In our model, we have illustrated that the primary gene expression can be an
excellent predictor of response to cancer drugs. By utilizing various classification
and clustering techniques, we analyzed the cancer gene expression with validation
accuracy of 86%. Our performance analysis depicts that the DRL model performs
better than the other existing models as shown in Fig. 3. The DRL model had a large
substantial sample size of patient’s data. This was beneficial, as the model was able to
achieve increased diversity in the data used for training to build a demanding model
that was able to successfully forecast on a newer dataset.
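For reference, ROC curves and AUC values of the kind reported in Fig. 4 can be computed as in the small sketch below; the label and score arrays are dummies rather than the study's actual predictions.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Dummy binary labels and predicted scores standing in for the DRL model outputs.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])
y_score = np.array([0.1, 0.3, 0.8, 0.6, 0.9, 0.4, 0.7, 0.2, 0.65, 0.95])

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", roc_auc_score(y_true, y_score))
# Sensitivity/specificity at each threshold: sensitivity = TPR, specificity = 1 - FPR.
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  sensitivity={t:.2f}  specificity={1 - f:.2f}")
```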

5 Conclusion

In this research, the DRL model is proposed for analyzing cancer types using gene expression data. This classification method obtains the correct class of the particular cancer with more than 98% accuracy when compared with the ANN, RF, and SVM classifiers. The false rate of the proposed model is much lower for identifying the cancer types. Overfitting is reduced by obtaining correct testing and training data for the model, and using the PCA extraction technique, we further analyzed the features to improve performance. Moreover, this proposed model can be easily used for the classification of multi-class datasets in different domains.

Fig. 4 a ROC curve for DRL model (sensitivity: 0.87 specificity: 0.70 AUC:0.88) b ROC curve
gene expression model: cross-validation (sensitivity:0.75 specificity:1.0 AUC:086)

References

1. Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324
2. Blum A, Langley P (1997) Selection of relevant features and examples in machine learning.
Artif Intell 97(1–2):245–271
3. Golub T, Slonim D, Tamayo P, Huard C, Gaasenbeek M, Mesirov J, Coller H, Loh M, Downing J,
Caligiuri M, Bloomfield C, Lander E (1999) Molecular classification of cancer: class discovery
and class prediction by gene expression. Science 286:531–537
4. Alizadeh A et al (2000) Distinct types of diffuse large b-cell lymphoma identified by gene
expression profiling. Nature 403:503–511
5. Tizhoosh HR, Taylor GW (2006) Reinforced contrast adaptation. Int J Image Graph 6(03):377–
392
6. Okun O (2011) Feature selection and ensemble methods for bioinformatics: algorithmic
classification and implementations. Med Inform Sci Ref

7. Song L, Smola A, Gretton A, Borgwardt KM, Bedo J (June 2007) Supervised feature selection
via dependence estimation. In: Proceedings of the 24th international conference on machine
learning. ACM. pp 823–30
8. Li W, Sun F, Grosse I (2004) Extreme value distribution based gene selection criteria for
discriminant microarray data analysis using logistic regression. J Comput Biol 11(2–3):215–
226
9. Varshavsky R, Gottlieb A, Linial M, Horn D (2006) Novel unsupervised feature filtering of
biological data. Bioinformatics 22(14):e507–e513
10. Dougherty ER, Sima C, Hanczar B, Braga-Neto UM (2010) Performance of error estimators
for classification. Curr Bioinform 5(1):53–67
11. Braga-Neto UM, Dougherty ER (2004) Is cross-validation valid for small-sample microarray
classification? Bioinformatics 20(3):374–380
12. Volinia S, Calin G, Liu C (2006) A microRNA expression signature of human solid tumors
defines cancer gene targets. Proc Natl Acad Sci USA 103:2257–2261
13. Murakami Y, Yasuda T, Saigo K (2006) Comprehensive analysis of microRNA expression
patterns in hepatocellular carcinoma and nontumorous tissues. Oncogene 25:2537–2545
14. Bishop JA, Benjamin H, Cholakh H, Chajut A, Clark DP, Westra WH (2010) Accurate classifi-
cation of non-small cell lung carcinoma using a novel microRNA-based approach. Clin Cancer
Res 16(2):610–619
15. Parry RM, Jones W, Stokes TH, Phan JH, Moffitt RA, Fang H, Shi L et al (2010) k-Nearest
neighbor models for microarray gene expression analysis and clinical outcome prediction.
Pharmacogenomics J 10(4):292–309
16. Yousef M, Nebozhyn M, Shatkay H, Kanterakis S, Showe LC, Showe MK (2006) Combining
multi-species genomic data for microRNA identification using a Naive Bayes classifier.
Bioinformatics 22(11):1325–1334
17. Zheng Y, Kwoh CK (2006) Cancer classification with microRNA expression patterns found
by an information theory approach. J Comput 1(5):30–39
18. Ibrahim R, Yousri NA, Ismail MA, El-Makky NM (2013) MiRNA and gene expression
based cancer classification using self-learning and co-training approaches. In: 2013 IEEE
international conference on bioinformatics and biomedicine. IEEE, pp 495–498
Multi-variant Classification
of Depression Severity Using Social
Media Networks Based on Time Stamp

M. Yohapriyaa and M. Uma

Abstract Many people in the modern day are suffering from severe depressive
illness. According to the World Health Organization (WHO), depression will become
more common in the next twenty years. Detecting depression at an early stage is
difficult since many people are unaware that they are suffering from it, and this
undetected situation can lead to suicidal thoughts. Thus, depression needs to be
predicted at early stages. Due to the increase in number of people using social media,
the online social network became a platform for many individuals to share their
feeling and expression in day-to-day life. This paper has attempted to develop a
system for analyzing social media posts (Twitter tweets) of the individual for a
specific time period of four weeks or more depending upon the case. The emotions
in the textual data are examined using LSTM-CNN; if the pattern changes, it identifies
a change in the person’s emotional well-being. The method would identify the degree
of depression and the reason of depression based on the change, whether it is due to
a personal connection, the job, or some other factor.

Keywords Depression · LSTM · CNN · Time stamp · Web scrapping · RNN ·


Social media · Depression severity.

1 Introduction

Individuals experience a variety of emotions; among these, more than 350 million
people suffer from depression, one of the most prevalent mental diseases [1]. It
happens in various intensities as well. Prolonged phases of depression lead to a
number of serious mental health issues, not only affecting the productivity of a
person, but also sometimes lead to self-harm and suicide. Symptoms of depression
include anxiety, sometimes a feeling of loneliness, in worst cases considering oneself
not worthy enough, along with mood swings, eating disorders, etc. People normally

M. Yohapriyaa (B) · M. Uma


SRM Institute of Science and Technology, Chennai, India
e-mail: ym7034@srmist.edu.in


show different sets of symptoms, and while being in that state, they do not feel
comfortable in talking to others about their problems freely.
Nowadays, many people use social media like Twitter, Facebook, and forums to
share their emotions and feelings in day-to-day life [2]. Open and free communication
platforms such as the social media sites, online blogs, and discussion forums help in
problem solving and information sharing [3]. The traditional method of research in
the area of depression analysis was based on questionnaire methods, which require
subjective response or comments from the individuals. This method does not provide
good accuracy since it differs from person to person and it is difficult to obtain real
emotions like social media data. Thus, social media is widely used in detecting the
disorders like stress, anxiety, and depression. According to WHO September 2012
Report, it is stated that 75% of suicides were happened in low- and middle-income
countries. Based on Lancet Report in 2012, it is clear that in India many adults aged
15–29 years were committing suicide. National Crime Records Bureau Reports state
that 2471 students commit suicide due to failure in examination in the year 2013.
Figure 1 shows the percentage of various causes of suicides in the year 2015.

Fig. 1 Percentage of various causes of suicides in 2015 (categories shown: family problem, illness, causes unknown, marriage related, other causes, unemployment, failure in examination, drug abuse, bankruptcy, love affair)

According to the WHO, about 7,88,000 persons were affected by this illness in 2015, with roughly 8934 students dying as a result of depression [4]. In the following five years,
this figure is expected to grow to 39,775. As a result, there is a need to establish
a system for early depression prediction in order to save the lives of many young
people. In this paper, we tried to develop a system to detect depression from Twitter
post of individuals for a specific time period. LSTM-CNN is used to detect emotions
that reflect in the post posted by the user. The system will detect the severity of the
depression based on the result of the neural network.

2 Related Works

Many research works were carried out in the field of depression detection by
analyzing the social media data. Depression can affect the language used by suffering
individuals. University of Texas has conducted an examination in the form of essay
writing for a group of individuals, who are depressed, non-depressed, and formerly
depressed college students. The study confirms the increased usage of the word “I” in
particular, along with more negative emotion words in the depressed student’s group,
thus telling us the use of singular form of words is done more frequently by such
individuals. Similarly, a Russian speech study also found an increased frequency of
pronouns and verbs in past tense among the depressed patients. Another study done
on English forum posts observed an elevated use of words like absolutely, completely,
every, nothing, etc., commonly known as absolutist words, among people suffering
from depression and anxiety. Thus, we believe that Twitter tweets will be very useful
in detection of depression [5].
Trotzek et al. [6] have used natural language processing for depression detection by
comparing user-extracted data with the dataset and classify the severity of depression.
The proposed system considers early risk detection error as a metric for depression
detection. This method does not provide high accuracy due to high false-positive
rate of the system. Minimizing false-positive rate can increase the accuracy of the
system.
Aldarwish et al. [7] use Naïve Bayes classifier algorithm along with the question-
naire method to detect stress on Facebook users based on their location and Facebook
post posted by the individuals. The proposed system uses API to extract the data and
questionnaire for further classification to increase the accuracy of the system. Iram
et al. [8] focus on classifying depression through studying linguistic style of the post
of the individual along with the sentiment analysis with the use of mood tags. This
paper implements linguistic inquiry word count as an analysis tool to determine the
severity of depression based on the content generated by the user. Choudhury et al.
[9] in this paper use Twint to extract the Twitter data of individuals using Twitter
Username. Finding the annotated data is difficult with this approach. If the data is
cleansed well, the accuracy can be increased.
Song et al. [10] in this paper consider the ruminative thinking, writing style along
with depression symptoms for text analysis. Recurrent neural network is used to

understand user semantics. This paper presents feature attention network which simu-
lates the process of detecting depression using social media text by domain expert.
Oak [11] in this paper focus on structuring and processing of text before analysis
followed by radial basis function network (RBFN). These discriminant predictors
along with random forest classifier will help in identifying the depressed post and
differentiate it from neutral one. Victor et al. [12] in this paper proposed automa-
tion evaluation with multimodal neural networks to detect depression based on the
Facebook post of the user along with the questionnaire methods. The proposed frame-
work also incorporates artificial intelligence mental evaluation (AIME) and Naïve
Bayes classifier algorithm to increase the accuracy of the system. Fedil et al. [13]
analyze user behavior based on different aspects of their writings and other features
like textual spreading, time span, and time gap. The research is unique because they
considered time gap between the posts which is not taken into account in previous
research, but the time variable nature of this parameter makes the training data diffi-
cult and decreases the accuracy of the system. Smys et al. [15] proposed a hybrid
approach of support vector machine and Naïve Bayes algorithm to improve the accu-
racy in early detection of depression. Depression data from different social media
domains should be included to test the accuracy and sensitivity of the proposed
model.
Valanarasu et al. [16] proposed a model that uses dynamic multi-context infor-
mation from various social media data like Twitter, Facebook, and Instagram for
predicting the personality of person. The accuracy of the proposed approach is high
compared to other traditional approach in the process of personality prediction of
person. Senthil Kumar et al. [17] proposed a hybrid technique based on Naïve Bayes
and the decision tree for predicting children behavior based on their emotional reac-
tion. The limitation in the proposed model is that it led to overfitting when there is
more change in the training data.

3 Methodologies

The proposed model will work as follows: The LSTM with convolutional neural
network was built using Keras to determine whether social platform users are depres-
sive based on their Twitter posts. We used binary classification in this project because
retrieving datasets on mental illnesses is difficult. Long short-term memory (LSTM)
is well suited to classify and predict sequential data, which was chosen for this project
to retrieve random tweets. We retrieved a CSV file from the Kaggle dataset Twitter
sentiment. Since there are no public datasets available for depression, the Twint tool
is used to scrape data with the keyword depression to get data from over thousands of
users at once. For the data procession stage, the tweets data goes through a cleansing
stage, where all the irrelevant data is removed, which includes the emojis used in
tweets, hashtags, stop words, and various punctuations. The contractions are then
expanded. The tokenizer is then used to assign indices to words and filter out infrequent words, thus increasing the usability of the datasets and decreasing the time complexity of the system.

Fig. 2 Workflow of proposed system

Then, we proceed with making the embedding matrix for
the embedding layer of the model. For model architecture, the tokens and tweets
are entered into the embedding layer in a structured manner to get an embedding
vector, which forms our working unit. Figure 2 represents the architecture model of
the proposed system.

3.1 Data Extraction

3.1.1 Twint Data Scraper

We have designed the graphic user interface in such a way that it accepts basic user
details, such as name, age, gender, date of birth, and their Twitter account user ID.
This user ID is our main source for data extraction to get all the tweets from the user's account. This includes all the tweets/posts, number of comments, number of reactions, etc. For this, we have used the tool called Twint.

Fig. 3 Code snippet for entering Twitter ID and start date of analysis
Twint is an advanced scraping tool, written in Python, that allows the user to scrape tweets from Twitter user profiles without having to use the Twitter APIs. It utilizes the search operators of Twitter to permit scraping tweets from specified users, along with the hashtags used and the comments, with the date and time the tweets were posted by the user, without extracting sensitive user information such as their messages and other personal interactions. Twint can differentiate tweets from other information such as e-mails and telephone numbers.
Some of the benefits of using Twint that made us choose it include: First, it fetches almost all user tweets (the API of Twitter limits it to the last 3200 tweets only); second, it facilitates a fast initial setup; third and most important, it can be used anonymously without a Twitter sign-up, all this without any charges per usage. Figure 3 shows the part of the code where the user enters the Twitter ID and the start date for the analysis.
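A minimal sketch of the Twint configuration implied by Fig. 3 is shown below; the username, start date, and output file are placeholders.

```python
import twint

def scrape_tweets(username, since_date, out_csv="tweets.csv"):
    """Scrape a user's tweets from a given start date without using the Twitter API."""
    c = twint.Config()
    c.Username = username          # Twitter user ID entered through the GUI
    c.Since = since_date           # start date of the analysis window, e.g. "2021-01-01"
    c.Store_csv = True
    c.Output = out_csv
    c.Hide_output = True
    twint.run.Search(c)

# Placeholder values for illustration only.
scrape_tweets("example_user", "2021-01-01")
```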

3.1.2 Data Cleaning

In this stage, the raw data is cleaned to avoid all unnecessary information like hash-
tags, links, emojis, mentions, stop words, and punctuations. It is essential to clean the
data so that the unnecessary data is removed and only data essential to the working
of the project remains.
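A sketch of this cleansing stage using simple regular expressions is given below; the exact stop-word list and emoji handling of the authors' pipeline are not specified, so this is only indicative.

```python
import re
import string

STOP_WORDS = {"a", "an", "the", "is", "am", "are", "to", "of", "and", "in"}  # tiny illustrative list

def clean_tweet(text):
    """Remove URLs, mentions, hashtags, emojis/non-ASCII, punctuation and stop words."""
    text = re.sub(r"http\S+|www\.\S+", " ", text)          # links
    text = re.sub(r"[@#]\w+", " ", text)                   # mentions and hashtags
    text = text.encode("ascii", "ignore").decode()         # drop emojis / non-ASCII characters
    text = text.translate(str.maketrans("", "", string.punctuation)).lower()
    tokens = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(tokens)

print(clean_tweet("Feeling so empty today... #depressed @friend https://t.co/xyz"))
# -> "feeling so empty today"
```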

3.1.3 Data Padding

Here, the individual tweets are padded with extra tweets to get uniform tweet length
in every tweet of 140 characters. This is done to make all data uniform in length by
inserting spaces to make every tweet 140 character long.

3.1.4 Tokenizer

The tokenizer is used to assign indices to words and to filter out infrequent words. The tokenizer converts human-readable text into machine-readable form; it separates the words in a sentence so that the text is well understood by the machine. During training, the model learns the depressive words since the tokenizer has separated them. Thus, the tokenizer is essential for the project.
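A short sketch of the padding and tokenization stage with Keras follows; the vocabulary size and the 140-token padded length are treated as assumptions here.

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tweets = ["feeling so empty today", "great day with friends", "cant stop crying lately"]
labels = [1, 0, 1]   # 1 = depressive, 0 = not depressive (dummy labels)

# Keep only the most frequent words and map each tweet to a padded index sequence.
tokenizer = Tokenizer(num_words=20000, oov_token="<unk>")
tokenizer.fit_on_texts(tweets)
sequences = tokenizer.texts_to_sequences(tweets)
padded = pad_sequences(sequences, maxlen=140, padding="post")
print(padded.shape)   # (3, 140)
```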

3.2 Analysis

The main model used for analysis is based on LSTM-CNN. Long short-term memory
(LSTM) is an artificial form of recurrent neural network (RNN) architecture used
in the field of deep learning. It is unique as it not only processes single data points
like images but entire sequences of data like speech or video [14]. Its unit includes
a cell, an output gate, an input gate, and a forget gate. The main purpose of cell is
to remember values over arbitrary time intervals, while the other three gates have
to regulate the information flow into and out of the cell. LSTM is used to classify,
process, and make predictions based on time series data by considering the lags
varying duration between important events in a time series. Convolutional neural
network (CNN) is a class of deep neural networks, applied to analyze visual images. It
is useful in applications such as image and video recognition, recommender systems,
natural language processing, image classification, medical image analysis, and finan-
cial time series. CNNs are like regularized versions of multilayer perceptron; thus,
they have each neuron in one layer which is connected to all neurons present in the
next layer. Figure 4 represents the combined structure of CNN and LSTM algorithm.
The LSTM-CNN combined architecture involves using the CNN layers combined
with LSTM; here, CNN helps in feature extraction on input data, while LSTM
supports sequence prediction. This combination is used to analyze and predict visual
time series, and for generating textual descriptions using sequences of images, or
videos. This model is used for activity recognition problems, which included gener-
ating textual descriptions of activities, demonstration done through a sequence of
images and image description, which generated textual descriptions of single images.
Thus, the importance of this architecture lies in generating textual descriptions of
images. A key asset of the CNN is that it can be pre-trained on tackling challenging image classification tasks and used as a feature extractor. CNN-LSTM has also been used for speech recognition, where LSTMs work on audio and textual input data, and CNNs are used for feature extraction.

Fig. 4 Architectural representation of CNN and LSTM algorithm
The convolutional layer is added in the proposed approach because CNN is great
at learning spatial structure from data and the convolutional layer takes advantage
from that and learns some structure from the embedding vector. The output obtained
from the convolutional layer is then fed into the LSTM layer, whose corresponding
output is fed into the dense layer with sigmoid function for final prediction.
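A compact Keras sketch of the Embedding → Conv1D → LSTM → Dense(sigmoid) stack described above is given below; the embedding dimension, filter sizes, and training settings are illustrative assumptions rather than the authors' exact hyperparameters.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, MaxPooling1D, LSTM, Dense, Dropout

def build_lstm_cnn(vocab_size=20000, embed_dim=100, max_len=140):
    """Conv1D learns local structure from the embedding vectors; the LSTM models the
    sequence; a sigmoid unit outputs the probability that a tweet is depressive."""
    model = Sequential([
        Input(shape=(max_len,)),
        Embedding(vocab_size, embed_dim),
        Conv1D(filters=64, kernel_size=5, activation="relu"),
        MaxPooling1D(pool_size=4),
        LSTM(64),
        Dropout(0.3),
        Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_lstm_cnn()
model.summary()
# model.fit(padded, labels, validation_split=0.15, epochs=5, batch_size=64)
```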

3.3 Classification

For determining the depression severity classification, we implemented the following parameters (a minimal sketch of this thresholding is given after the list).
• If the frequency of the depressed tweets is over 60%, then the person is suffering
from severe depression.
• If the frequency of the depressed tweets is over 40%, then the person is suffering
from moderate depression.
• If the frequency of the depressed tweets is over 20%, then the person is suffering
from mild depression.
• If the frequency of the depressed tweets is below 20%, then the person is not
depressed.
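A direct translation of these thresholds into code is sketched below, assuming the bands are applied from the highest downwards so that each user receives a single label.

```python
def depression_severity(depressed_tweets, total_tweets):
    """Map the fraction of depressed tweets to a severity label using the thresholds above."""
    if total_tweets == 0:
        return "no data"
    freq = depressed_tweets / total_tweets
    if freq > 0.60:
        return "severely depressed"
    if freq > 0.40:
        return "moderately depressed"
    if freq > 0.20:
        return "mildly depressed"
    return "not depressed"

print(depression_severity(13, 50))   # 26% of tweets flagged -> "mildly depressed"
```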

4 Result and Analysis

The model has been successful in detecting depression of users using their Twitter IDs and obtaining their severity measure. The result of our model is shown as a bar graph of the number of depressed tweets as a function of the time stamp. Depressed tweets are depicted through spikes, and the absence of spikes represents normal or non-depressed tweets from the user.
Figure 5 represents the results of test users; the graph depicts the total number of
depressed and not depressed tweets, and based on this the severity is obtained.
The most important feature of the system is detecting depression without inter-
acting with any psychologists. The system is reasonably accurate in detecting depres-
sion from the scraped tweets without any external input, and the combined nature
of LSTM and CNN in this proposed system is well suited for increasing the perfor-
mance of the system. The system has successfully proved that it can help in situation
where the user can recover themselves from the depression without the involvement
of therapist or psychologist. Figure 6 represents the confusion matrix, and the input
for the confusion matrix is obtained from fifty individuals continuously using Twitter.
Twitter ID and few other factors like age, gender, and name are collected from the
users. Data like comments, tweets, and retweets is extracted from the user’s Twitter
ID, and the depression severity is obtained using the proposed model. The confusion
matrix states that even though the system has high successful detection rate it needs
to be improved on better accuracy and detection rate. This can be done by increasing
the size of the dataset and running more epochs while training the model.

5 Conclusion

A combined method of convolutional neural network and long short-term memory (LSTM) is proposed with the aim of attaining the highest accuracy in simulating the process of detecting depressed people from social media texts collected from Twitter for a period of four weeks. Through our model, we can focus more on depression-relevant sentences by
post-level attention, which fits well into the real-world situation where only a few
posts are relevant to depression even for depressed users. It also enables interpreta-
tion of why a particular post is relevant to depression in terms of features taken from
psychological studies, which is important for further clinical analysis of depressive
symptoms. However, our model uses smaller training data as input due to limited
computing power; with more computing power, we can anticipate that our model
will show competitive performance against the state-of-the-art model. The proposed
model takes advantage of high-dimensional representations of neural networks and
at the same time allows other high-level features to be readily incorporated; if we
add other useful features to the model, it will be possible to obtain more reason-
able and diverse explanations for different aspects of depression. If we can generate
appropriate feature for other mental disorders (such as dementia, schizophrenia, and
bipolar disorder), it will be possible to simulate the process of diagnosing them in a
similar way.

Fig. 5 User results. User 1: based on the classification of depressed and not-depressed tweets, the severity is stated as not depressed. User 2: mildly depressed. User 3: moderately depressed. User 4: severely depressed

Fig. 6 Confusion matrix



References

1. Depression and other common mental disorders: global health estimates (2017) World Health
Organization
2. Glavan R, Mirica A, Firtescu B (2016) “The use of social media for communication”, official
statistics at European level. Rom Stat Rev 4:37–48
3. Rosa RL, Schwartz GM, Ruggiero WV, Rodríguez DZ (2018) A knowledge-based recom-
mendation system that includes sentiment analysis and deep learning. IEEE Trans Industr Inf
15(4):2124–2135
4. Khan A, Husain MS, Khan A (2018) Analysis of mental state of users using social media to
predict depression! a survey. Int J Adv Res Comput Sci 9(2):100–106
5. Al Asad N, Pranto MAM, Afreen S, Islam MM (2019) Depression detection by analyzing social
media posts of user. In 2019 IEEE international conference on signal processing, information,
communication & systems (SPICSCON). IEEE, pp 13–17
6. Trotzek M, Koitka S, Friedrich CM (2018) Utilizing neural networks and linguistic metadata
for early detection of depression indications in text sequences. IEEE Trans Knowl Data Eng
32(3):588–601
7. Aldarwish MM, Ahmad HF (2017). Predicting depression levels using social media posts.
In 2017 IEEE 13th international symposium on autonomous decentralized system (ISADS).
IEEE, pp 277–280
8. Fatima I, Mukhtar H, Ahmad HF, Rajpoot K (2018) Analysis of user-generated content from
online social communities to characterise and predict depression degree. J Inf Sci 44(5):683–
695
9. Choudhury MD, Gamon M, Counts S, Horvitz E (2016) Predicting depression via social media.
In: Proceeding of AAAI conference on weblogs and social media
10. Song H, You J, Chung JW, Park JC (2018) Feature attention network: interpretable depression
detection from social media. In PACLIC
11. Oak S (2017) Depression detection and analysis. In: 2017 AAAI spring symposium series
12. Victor E, Aghajan ZM, Sewart AR, Christian R (2019) Detecting depression using a frame-
work combining deep multimodal neural networks with a purpose-built automated evaluation.
Psychol Assess 31(8):1019
13. Cacheda F, Fernandez D, Novoa FJ, Carneiro V (2019) Early detection of depression: social
network analysis and random forest techniques. J Med Internet Res 21(6):e12554
14. Tadesse MM, Lin H, Xu B, Yang L (2020) Detection of suicide ideation in social media forums
using deep learning. Algorithms 13(1):7
15. Smys S, Raj JS (2021) Analysis of deep learning techniques for early detection of depression
on social media network-A comparative study. J Trends Comput Sci Smart Technol (TCSST)
3(01):24–39
16. Valanarasu MR (2021) Comparative analysis for personality prediction by digital footprints in
social media. J Inf Technol 3(02):77–91
17. Kumar TS (2021) Construction of Hybrid deep learning model for predicting children behaviour
based on their emotional reaction. J Inf Technol 3(01):29–43
Identification of Workflow Patterns
in the Education System: A Multi-faceted
Approach

Ganeshayya Shidaganti, M. Laxmi, S. Prakash, and G. Shivamurthy

Abstract Educational institutions struggle with an overwhelming amount of docu-


mentation and legislation that must be handled on a daily basis. In this regard, answer-
ability and accuracy becomes extremely critical. This sector deals with multiple tasks
that are associated with desirability accompanied with assorted cost. Hence, each
institution demands for cost-effective outcomes that accommodate both reusability
and automation, often interpreted in the form of workflow patterns. This paper
is tantamount to the application of workflow patterns in an all-inclusive manner
primarily in the education sector. The uphill novel task proposed in this paper is the
multitude of scenarios ranging from easy to difficult. A multi-faceted approach with
different scenarios of slow, manual, labour-some tasks and its systematic workflow
of execution are validated with substantial results and inferences.

Keywords Workflow systems · Models · Education · Automation · Development ·


Design · Scheduling · Cloud computing

1 Introduction

Workflow technology is one that is an on-going process with respect to time, in the
sense that the development process is ever-green and it is a newly emerging field of
Technology. The paradigm is construed in such a manner that the scope of its outreach
is broad, i.e. there are multiple products available on the market. As newer products
are taken into consideration, the older frames of references turn out to be obsolete in
nature and this provides very little contextual information. Although, this is a problem

G. Shidaganti (B) · M. Laxmi


M.S. Ramaiah Institute of Technology, Bangalore, Karnataka, India
e-mail: ganeshayyashidaganti@msrit.edu
S. Prakash
Chandigarh University, Chandigarh, India
G. Shivamurthy
Visvesvaraya Institute of Advanced Technology (VIAT), Muddenahalli, India


that does not have a quick-fix, retrieving domain knowledge is easy with the help
of perspectives in the Education Sector. The control flow perspective deals with the
transcend of information from the source to the output layer, the various parameters
and procedures involved like serialization, parallelism, synchronizations and joins.
The data perspective deals with all the related information that is useful at any instance
of time which can be either be on the business end of things or on the model end of
things. The local variables hold crucial states of data before and after the execution
of the workflow model. The resource perspective provides a significant level of
anchor able support and ties the whole workflow operation together with the help of
fixed syntactic and semantic functionality. The operational perspective deals with the
mapping of higher and lower level elements. This is essential as the business activities
can directly be translated into meaningful applications which is of utmost importance
in education related domains. The main contribution proposed in this paper is the uphill novel task of addressing a multitude of scenarios ranging from easy to difficult. A multi-faceted approach with different scenarios of slow, manual, labour-intensive tasks and their systematic workflow of execution is validated with substantial results and inferences. The different critical workflow patterns that are considered as part of the proposed approach are as follows (a minimal sketch of two of these patterns is given after the list):
• Sequence
• Parallel Split
• Synchronization
• Exclusive Choice
• Simple Merge
• Multi-Choice
• Synchronizing Merge
• Multi-Merge
• Discriminator
• N-Out-Of-M Join
• Arbitrary Cycles
• Multiple Instances With A Priori Design Time Knowledge
• Multiple Instances With No A Priori Run Time Knowledge.
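To make two of these patterns concrete, a small Python sketch of a Sequence and of a Parallel Split followed by a Synchronization is given below; the education-related task names are purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def task(name):
    print(f"running {name}")
    return name

# Sequence pattern: tasks execute strictly one after another.
for step in ["collect_marks", "verify_marks", "publish_results"]:
    task(step)

# Parallel Split + Synchronization: branches run concurrently, then join before continuing.
with ThreadPoolExecutor() as pool:
    branches = [pool.submit(task, t) for t in ["notify_students", "notify_parents", "update_portal"]]
    results = [b.result() for b in branches]   # synchronization point: wait for all branches
task("archive_records")
```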

2 Related Work

Referring to a number of papers mentioned below, we have focused in this paper on the application of these meaningful workflow patterns in the educational sector to derive insights on the same. A large number of in-house college related activities are taken into account, and the necessary circumstantial situations are derived based on the current parameters which are relevant to the given specifications.
Cloud computing [1] provides high level Internet services to large number of
users, scalable storage and high performance. Cloud systems store data in the form
of large and distributed repositories which need parallel, distributed techniques for
analysis and such analysis methods will be used by scientists, working professionals.

The flow of data, documents or tasks between the participants and procedural rules
are called as workflow patterns. Models which implement such pattern will have
control programme in-built and will handle scheduling, sequencing of each step
from central location. An advantage of using workflow patterns in analysis is, patterns
will be defined before saving. Hence, they can be called either for modification or
re-execution which allows analysers to abolish typical forms while reusing them
in different scenarios. Existing visual workflow management systems are Galaxy,
Swift, Taverna, e-Science, Tavaxy, Kepler, ClowdFlows, etc.
Task scheduling plays an important role in cloud systems. The problem of task
scheduling is, a task needs to be divided into set of subtasks, and available resources
should be distributed among sets of multiple tasks in such a way that the desired
goal should be met. Performance [2] of cloud systems depends upon task scheduling
algorithms. The referred paper discusses different approaches of task scheduling
based on energy and deadline awareness. Fine-grained and Coarse-grained tasks
together form scientific workflow. Scheduling of tasks to virtual machine introduces
system overhead. If multiple fine grain tasks are executing in a scientific workflow,
then it increases overhead. In order to overcome the scheduling overhead, multiple
small tasks have been combined to one large task, which decreases the scheduling
overhead and improves execution time of the workflow.
The dramatic rise in size, volume and sophistication in cloud services and
resources contributes to their increasing difficulty in monitoring and accessing them.
Developing new strategies for identifying, implementing and handling resources to
ensure the need of quality of services [3] is becoming an area of research commonly
referred to as Resource Orchestration (RO). The increasing complexity of Cloud
services prompted the development of new programming and delivery frameworks.
Major providers are encouraging pattern-based production to create additional value-
added services. This approach attempts to provide complicated systems and infras-
tructure through the integration of simpler ones. E-businesses need to actively alter
business operations, i.e. the processing of reports and activities in a business called
as workflow, in this extremely competitive and evolving world. To help these contin-
ually evolving processes, additional robust workflow management systems [4] are
needed. In this study, an autonomous application model is demonstrated for the
implementation of e-workflow applications.
The workflow management in [5] is made simpler with the help of various tools
technology available today like Kubernetes and terraform. Workflow management
systems like Hyperflow accompanied with the aforementioned technology assures
a complete distributed and non-centralised execution and management workflow. In
[6] the concept of bioinformatic workflow is proposed as well where the scalability
is another major concern as the data is generated rapidly and every recorded data has
a role to play. Though cloud and containers contribute extensively they come with
the inability to work over different cloud providers and difficulty in management
of the tremendous number of containers. The authors of [7] highlight the impor-
tance of workflow management systems in the domains of scientific management.
Over the years, scientists have dealt with the growing technological advancements
and modelling all the requirements has caused several setbacks and troubles. In [8]

Table 1 Summary of the related work

[5] Limitations: The workflow included only sequence flow of the information and not the various forms of parallel split. Proposed approach: The proposed approach considers the different types of workflows along with parallel split in a systematic manner, accompanied with Kubernetes and Terraform.
[6] Limitations: It is limited in the usage of resources and does not consider the distributed container approach. Proposed approach: The proposed approach was implemented using Kubernetes and thus considers container management effectively for the parallel split-up workflows.
[7] Limitations: It considers the workflow of scientific management systems with homogeneous resources only. Proposed approach: The proposed approach considers different types of resources and the workflows of synchronization and merge.
[8] Limitations: The workflow considered is based on the parallel split-up, with no synchronization strategy and no distributed execution. Proposed approach: The proposed approach considers the synchronization-based workflow in a distributed manner using Kubernetes.

the development using workflow management systems in the field of Blockchain
Technology is addressed. The entire system is built based on the general steps of a
Blockchain model and this paper demonstrates how the two can be seamlessly inte-
grated to achieve a working demonstration of the same. The workflow manage-
ment is essential as it helps with the correct understanding and identification of all
the crucial Blockchain aspects. The verification of the ledger is enforced using the
workflows. A summary of the related work with its limitations and the advantages
of the proposed system is shown in Table 1.

3 Workflow Patterns in Education

The workflow patterns [9] offer a spectrum of possibilities ranging from the seemingly
easy to the genuinely difficult. To enhance the scope of understanding of these patterns,
each one is presented as a generic description together with a related example to understand the
given situation better. For each of the patterns and the specific examples, a random
dataset was considered using MongoDB.
Pattern 1: Sequence: A workflow is a sequence of steps in any environment to
achieve a defined goal. These steps are designed to improve performance and ensure
efficiency in a certain order. The end goal of a workflow [10–15] defines the structure,
performance and tracking of the different tasks.
In Fig. 1, the placement_eligibility activity gets triggered after completion of all the
previous activities execution in sequence. SSLC_marks_verification task checks for

Fig. 1 Sequence

the particular cutoff in 10th grade, if satisfied the 12th_marks_verification task will
be executed which checks for the cutoff in 12th grade, if the cutoff still holds,
CGPA_verification task checks for CGPA cutoff in current degree and on satis-
fying the required criteria the candidate is given eligibility for the placement by
placement_eligibility task.
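In implementation terms, the sequence pattern is simply a chain of dependent checks, each enabled only by the completion of the previous one. The following minimal Python sketch illustrates the chain described above; the cut-off values and record fields are illustrative assumptions, not values from the dataset used in this work.

```python
def placement_eligibility(student):
    """Sequence pattern: each verification runs only after the previous one passes."""
    if student["sslc_marks"] < 60:      # SSLC_marks_verification (assumed cut-off)
        return False
    if student["puc_marks"] < 60:       # 12th_marks_verification (assumed cut-off)
        return False
    if student["cgpa"] < 7.0:           # CGPA_verification (assumed cut-off)
        return False
    return True                         # eligibility for placement is granted

print(placement_eligibility({"sslc_marks": 82, "puc_marks": 74, "cgpa": 8.1}))
```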
Pattern 2: Parallel Split: A parallel split pattern refers to a workflow that simul-
taneously executes two or more tasks. The pattern emulates a scenario wherein
parallelization using threads is key. The order in which they are defined is not
specified.
In Fig. 2, the scenario of a student applying for a competitive exam is examined.
When the candidate is applying for competitive exams like JEE or CET, the filled
application form is submitted to the examination portal meanwhile status of the appli-
cation will be notified to the applicant via email or message. Hence, the activation of
the exam_application will trigger the following two activities, application_sent and
status_update, simultaneously using the AND workflow.
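A parallel split can be sketched with a thread pool: submitting the application enables both follow-up tasks at once, with no ordering between them. The task bodies below are placeholders assumed only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def application_sent(form):
    return f"application {form['id']} forwarded to the exam portal"

def status_update(form):
    return f"status notification sent to {form['email']}"

def exam_application(form):
    # AND-split: both branches are started concurrently once the form is submitted
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(application_sent, form),
                   pool.submit(status_update, form)]
        return [f.result() for f in futures]   # results gathered here only for printing

print(exam_application({"id": "CET-1042", "email": "student@example.com"}))
```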
Pattern 3: Synchronization: The confluence of more than one branch into a primary
branch such that, when all incoming branches are enabled, the control thread is passed
to the subsequent branch.
In Fig. 3, Activity SEE_eligibility is activated only after the completion of both
the activities, attendance_eligibility and internal_marks_eligibility. Both these tasks

Fig. 2 Parallel split

Fig. 3 Synchronization

Fig. 4 Exclusive choice

are processing synchronously and are influencing the final task SEE_eligibility.
According to the example in order to get eligibility for SEE (semester end examina-
tion) student should have 85% of attendance and secured at least 30 out of 50 marks
in internals.
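Synchronization is the converse AND-join: the downstream task starts only when every incoming branch has completed. A minimal sketch of the SEE-eligibility example, using the thresholds stated above:

```python
from concurrent.futures import ThreadPoolExecutor

def attendance_eligibility(student):
    return student["attendance"] >= 85       # 85% attendance, as stated above

def internal_marks_eligibility(student):
    return student["internals"] >= 30        # at least 30 out of 50 marks in internals

def see_eligibility(student):
    with ThreadPoolExecutor() as pool:
        a = pool.submit(attendance_eligibility, student)
        b = pool.submit(internal_marks_eligibility, student)
        # AND-join: block until both branches finish, then combine their outcomes
        return a.result() and b.result()

print(see_eligibility({"attendance": 91, "internals": 34}))
```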
Pattern 4: Exclusive Choice: Division of a particular branch into sub-branches
so that, if the approaching branch is triggered, the control is quickly transferred to
exactly one single branch. This is based on the procedure that any one of the outgoing
splits can be chosen.
In Fig. 4, Activity supplementary_eligibility is implementing exclusive choice
and hence only one of the activities will be triggered. In this example, supplemen-
tary_eligibility task is checking the eligibility of the student to take up Supplementary
exam based on the number of subjects not cleared. If the count is lesser than or equal
to two then write_supplementary task will be invoked and student will be allowed to
write supplementary exam else year_backlog will be activated and student will not
be allowed to write supplementary.
Pattern 5: Simple Merge: It is the unification of more than one branch into
a resultant, subsequent branch. On activation of an approaching branch, the
control thread is transferred to the subsequent branch.
In Fig. 5, the task reciept_generate will be activated either by completion of
activity online_payment or by offline_payment. According to the example student
can pay the fees through online or offline mode and after completion of payment the
receipt will be generated.

Fig. 5 Simple Merge
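Patterns 4 and 5 are the two sides of an XOR construct: exactly one outgoing branch fires, and the downstream task is enabled as soon as whichever branch ran has completed. A small sketch combining the supplementary-exam and fee-payment examples above; the payment strings are placeholders.

```python
def write_supplementary(student):   # taken when the backlog count is at most two
    return "supplementary exam scheduled"

def year_backlog(student):          # taken otherwise
    return "year backlog recorded"

def supplementary_eligibility(student):
    # XOR-split (exclusive choice): exactly one branch is chosen
    return write_supplementary(student) if student["backlogs"] <= 2 else year_backlog(student)

def receipt_generate(payment_result):
    # XOR-join (simple merge): runs after whichever payment branch completed
    return f"receipt issued for: {payment_result}"

print(supplementary_eligibility({"backlogs": 1}))
mode = "online"                     # assumed payment mode
payment = "online_payment done" if mode == "online" else "offline_payment done"
print(receipt_generate(payment))
```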



4 Advanced Branching and Synchronization Patterns

Higher levels of branching and synchronization result in a variety
of patterns that are prominent in business processes. They require the use of advanced
branching and unification concepts.
Pattern 6: Multi-choice: It deals with the splitting of a parent branch into child
branches so that if an incoming branch is triggered, the control is quickly transferred
to any of the departing branches, based on due selection criterion.
In Fig. 6, Multi-choice workflow is used to describe the placement mailing system
by the function placement_mail where the mail is sent to the branch it is meant to be.
Companies for recruitment may or may not come for all the branches hence, sending
the recruitment mail to all the branches are not necessary. If BE candidates are eligible
for a particular recruitment drive the mail is sent to them by the function BE_mail.
If more than one branch is eligible the mail is sent collectively to all the eligible
branches by the functions called MTech_mail for MTech students, “MCA_mail” for
MCA students and “MBA_mail” for MBA students.
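Multi-choice differs from exclusive choice in that any subset of the outgoing branches may fire, driven by a per-branch selection criterion. A sketch of the placement mailing example; the eligible-branch list is an assumed input.

```python
def placement_mail(drive):
    """OR-split (multi-choice): notify every eligible branch, and only those."""
    mailers = {
        "BE":    lambda: "BE_mail sent",
        "MTech": lambda: "MTech_mail sent",
        "MCA":   lambda: "MCA_mail sent",
        "MBA":   lambda: "MBA_mail sent",
    }
    return [mailers[branch]() for branch in drive["eligible_branches"] if branch in mailers]

print(placement_mail({"company": "Acme Corp", "eligible_branches": ["BE", "MTech"]}))
```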
Pattern 7: Synchronizing Merge: It promotes the convergence of more than one
single branch, that was earlier separated, into one resultant branch. When newer
inputs are activated, the control is moved to the subsequent branch, respectively.

Fig. 6 Multi-choice

Fig. 7 Synchronizing merge

In Fig. 7, to explain the synchronizing merge, a student’s fund is considered.


Certain documents have to be submitted by the student for the grant of scholar-
ship or student loan. The grant_scholarship will be triggered only after both the
processes, get_aadhar and bank_account, are enabled by the multi-choice construct.
The scholarship will be granted to the student if and only if the student provides the
institution with the Aadhar and bank account details. The grant_scholarship will be
triggered in spite of get_PAN not being enabled since PAN card details are not neces-
sary for scholarship to be granted and it is only the activated branches, get_aadhar
and bank_account, that has to be enabled for the execution of the grant_scholarship
task. The student_loan will be processed only after the multi-choice executes the
get_aadhar, get_PAN and bank_account. In the sense the loan will be granted to the
student only after the submission of the Aadhar, bank account and PAN card details.
Comparatively, the student_loan will not be triggered until and unless the get_PAN
is being enabled since PAN card details is necessary for loan to be granted. Each and
every activated branch, get_aadhar, bank_account and grant_scholarship has to be
enabled for the execution of the student_loan task.
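A synchronizing merge therefore waits only for the branches that the preceding multi-choice actually activated, not for every possible branch. A minimal sketch of the scholarship and loan cases; the document-collection callables are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor, wait

DOCUMENT_TASKS = {
    "get_aadhar":   lambda: "Aadhar collected",
    "get_PAN":      lambda: "PAN collected",
    "bank_account": lambda: "bank details collected",
}

def run_and_synchronize(activated):
    """OR-join: wait for exactly the branches the multi-choice enabled."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(DOCUMENT_TASKS[name]) for name in activated]
        wait(futures)                      # synchronize only the activated branches
        return [f.result() for f in futures]

# grant_scholarship needs only Aadhar and bank account; student_loan needs PAN as well
print("grant_scholarship after:", run_and_synchronize(["get_aadhar", "bank_account"]))
print("student_loan after:", run_and_synchronize(["get_aadhar", "get_PAN", "bank_account"]))
```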
Pattern 8: Multi-merge: The unification of more than one branch into one concur-
rent unit after an earlier internal hassle involving the system template, because of
which the control thread is transferred to the relatively easier branch when the prelim-
inary inbound branch is allowed. Subsequent branch allowing is not related to the
control thread being transferred, whatsoever.
In Fig. 8, internal_component of the marks of a respective student is shown to
depict multi-merge workflow. Internal assessment is split into two tasks, such as,
paper publication or NPTEL exam and online quiz. It is assumed that each student
is given a choice to either attend the NPTEL exam or publish a research paper and
that it is mandatory for every student to take up the quiz. AND workflow is used to
activate both online_quiz and either paper_publication or NPTEL_exam, which is in
turn implemented using XOR split restricted by the choice of the respective student.
The evaluation of these tasks are merged for the final 20 marks internal component.
Ten marks is granted for either paper publication or NPTEL exam and another 10
for online quiz. These sub tasks are independent of each other and the completion
of either of the subtasks results the MULTIPLE MERGE workflow to activate an
instance of SIS_updation, resulting in multiple instances of SIS_updation activity
for multiple active incoming transitions. Most work flow engines do not generate

Fig. 8 Multi-merge

any instances once the first instance is up and running. If the concept of multi-merge
is absent from the loop, generic design pattern ensures that the activity instances are
multiplied and are thus followed in the workflow model.
The previous scenario dealt with multiple instances of the SIS_updation, for each
incoming transitions that is, Paper_publication, NPTEL_exam and Online_quiz.
In Fig. 9, in order to overcome the above stated problem, in here, the SIS_Updation
activity is replicated with respect to each incoming transition which would result in
the single instantiation of SIS_updation for each incoming transition but happen
concurrently.
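The distinguishing feature of multi-merge, as opposed to a synchronizing join, is that the downstream activity is instantiated once per completed incoming transition. A sketch of the internal-assessment example, where each finished component triggers its own SIS_updation instance; the mark values are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def online_quiz():
    return ("online_quiz", 8)

def paper_publication():                 # XOR with NPTEL_exam, chosen by the student
    return ("paper_publication", 9)

def sis_updation(component, marks):
    return f"SIS_updation instance: {component} -> {marks} marks"

with ThreadPoolExecutor() as pool:
    futures = [pool.submit(online_quiz), pool.submit(paper_publication)]
    # multi-merge: one SIS_updation instance per completed transition, without waiting for the rest
    for finished in as_completed(futures):
        print(sis_updation(*finished.result()))
```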
Pattern 9: Discriminator: The unification of more than one branch into
one concurrent branch after an earlier canonical departure in the system template,
so that the control is transferred to the succeeding unit when the primary arriving
unit is allowed. Subsequent branch allowing is not correlated with the passing of the
control thread in any manner.
In Fig. 10, the review process of a research paper can be considered to explain
the discriminator workflow. The review process will involve multiple sub-processes
and three such sub-processes, plagiarism check, standard conformation, domain
aptness, are considered here. The subsequent activity accept_paper will be trig-
gered if and only if the sub-processes, plagiraism_check, standard_conformation
and domain_apt, are enabled and return positive responses.

Fig. 9 Multi-merge with SIS_updation

Fig. 10 Discriminator

Consider a scenario where the activation of the plagiarism_check returned a negative
value; then the accept_paper will be triggered to not accept the paper, but then
the discriminator will still continue to wait for the remaining two processes, stan-
dard_conformation and domain_apt, to be completed and ignores their respective
results. The discriminator resets itself to its original state only after all the incoming
transitions have been triggered.
Some workflow engines are explicit in the sense that no instances will be spawned
as long as the primary instance of the activity exists. This does not prove to be
a rightful solution for the Discriminator, as instances will be generated based on the
completion of the previous one. In order to overcome the above stated problem, the
Cancelling Discriminator needs just the primary thread of the control to be accepted
by an inbound unit. As soon as this is accepted, the residual branches are care-
fully drawn into “bypass” mode. This property ensures that the tasks that have been
untouched remain so, thereby allowing the Discriminator to be restarted as quickly
as possible.
In Fig. 11, the activation of the paper_review process will result in the
AND workflow triggering all the required sub-processes, plagiraism_check, stan-
dard_conformation and domain_apt. Once either of the process is completed, the
other processes that have not completed will be cancelled. In other words, if the
plagiarism_check transition is completed and returned a negative response, the
accept_paper will be triggered to not accept the paper, then the remaining incomplete
processes, standard_conformation and domain_apt, can be terminated (Task E).
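The cancelling discriminator can be sketched with a thread pool in which control is passed on the first completed branch and the still-pending branches are cancelled. A single worker is used so that later tasks remain queued and therefore cancellable; the review functions and the similarity threshold are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def plagiarism_check(paper):
    return ("plagiarism_check", paper["similarity"] < 0.2)   # assumed acceptance threshold

def standard_conformation(paper):
    return ("standard_conformation", True)

def domain_apt(paper):
    return ("domain_apt", True)

def paper_review(paper):
    with ThreadPoolExecutor(max_workers=1) as pool:          # keeps later tasks cancellable
        futures = [pool.submit(f, paper)
                   for f in (plagiarism_check, standard_conformation, domain_apt)]
        done, pending = wait(futures, return_when=FIRST_COMPLETED)
        for fut in pending:
            fut.cancel()                  # try to put the branches that have not started into bypass mode
        name, ok = done.pop().result()
        return f"accept_paper triggered by {name}: {'accept' if ok else 'reject'}"

print(paper_review({"similarity": 0.35}))
```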
Pattern 10: N-out-of-M join: Coalition of more than one branch (say M) merging
to form one succeeding unit after a related beforehand separation of the system
template so that the construct is transferred to the consequent branches when N of
the approaching branches is allowed, keeping in mind that N is a lesser value than
M. Consequent branch enabling is not related to the control thread being handed out.

Fig. 11 Discriminator with AND workflow



So, activating all functional incoming divisions, the link build resets. The join takes
place in a fixed, cohesive and structured manner, i.e. in the finite model where-in the
join is present, there should be one Parallel Split construction beforehand and
it must unify all the units emerging out of the same.
In Fig. 12, a final year BE student is given the facility of choosing two subjects
out of three subjects as open electives. The selection of the respective subjects by
the student can be implemented using the N-out-of-M join. In Fig. 13, once the
open_elective is triggered, the availability of the multiple subjects, DS, CC and DL,
for each student is provided by the AND workflow. The selection of any two subjects
out of the three options provided will lead to the subsequent activity of SIS_updation.
Largely, workflow models do not have features that ensure a quick and
easy realisation of the N-out-of-M join. However, combining the constructs of patterns 3
and 9 works well and one can obtain the necessary results. One
disadvantage is that the model becomes considerably more advanced and harder to understand.
In Fig. 13, the AND workflow is used at three levels. The
activation of the open_elective activity will trigger the first level AND which is
used to depict the availability of the three subjects. AND at second level is used

Fig. 12 N-out-of-M join



Fig. 13 N-out-of-M join with AND workflow

to form all the possible combinations such as DS_CC, DS_DL and DL_CC, that a
particular student can opt for. All the possible combinations formed are provided
to the discriminator using the third level AND, which will in return activate the
SIS_updation.
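The N-out-of-M join can be expressed directly by counting completions: the subsequent activity fires once N of the M branches have finished, and later completions are ignored. A sketch of the open-elective example with N = 2 and M = 3:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def choose(subject):
    return f"{subject} selected"

def open_elective(subjects, n_required=2):
    """N-out-of-M join: SIS_updation fires after any n_required branches complete."""
    completed = []
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(choose, s) for s in subjects]
        for finished in as_completed(futures):
            completed.append(finished.result())
            if len(completed) == n_required:
                return f"SIS_updation after {completed}"      # remaining branches are ignored
    return "join never satisfied"

print(open_elective(["DS", "CC", "DL"]))
```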

5 Structural Patterns

Cohesive data retrieval is proving to be increasingly complex and difficult because
of the numerous factors associated with it. Analysing and obtaining the structure
of these systems will result in the entire process being informative and increase
understandability.
Pattern 11: Arbitrary Cycles: The ability to represent cycles with more than
one entry or exit point in a process model. Individual entry and exit points must be
associated with separate branches.
In Fig. 14, the complete CET process can be used to explain the arbitrary cycle
workflow pattern. We have considered a single entry and multiple exit points. Let
us consider a student who is interested in both engineering and medical colleges.
The student will enter the respective list of colleges he is interested in, activating the
enter_list transition. The implicit AND workflow is used to allocate both medical

Fig. 14 Arbitrary Cycles

and engineering colleges to the respective student based on the respective conditions
(acquired rank). The above process will result in the activation of the allocate_eng
and allocate_med processes. The result of the above two processes will result into
the activation of final_list providing a single final list of colleges using the MULTI-
MERGE workflow. Using XOR workflow, the student can accept the allocated college
if he is satisfied and exit the process, activating the terminate_process transition. Else,
if the student is not satisfied with the allocated college then he/she can either withdraw
from the entire CET process and exit or proceed with the next round of counselling.
This can be achieved using another XOR workflow where either withdraw_process
or continue_process is triggered based on the condition whether the user wants to
proceed with the process or not. The continue_process will result in iterative cycles. The
unstructured nature of the Arbitrary Cycles pattern is difficult to support in some of
the BPM offerings, most likely those that abide by structured principles. In such situations,
it becomes necessary to rework process models containing Arbitrary Cycles into
structured elements.
In Fig. 15, the structured cycle is similar to the previous process till the final list of
colleges is generated. Let ϕ be the parameter stating whether the student is satisfied. If the
student is satisfied, the parameter Θ is set to TRUE; otherwise, for further computations,
the parameter Θ is set according to whether the student wants to discontinue the process.
Either of the two assigned values will be passed on to MULTI-MERGE. Further
computations are carried out as follows based upon the Θ and ϕ values using the XOR
workflow. Θ set to TRUE indicates ϕ being set to satisfied, resulting in the student
accepting the offer and terminating the process, triggering terminate_process. If the

Fig. 15 Structured Arbitrary Cycles

student wants to discontinue and is not satisfied with the allocation (−ϕ), then the
student withdraws from the entire process, activating withdraw_process, else the
student proceeds to continue with further rounds of Counselling (−Θ).
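The structured version of the cycle is essentially a loop with a single entry and explicit, flag-controlled exits. A minimal sketch of the counselling rounds; the allocation function and the satisfaction input are placeholders, not the actual CET logic.

```python
def final_list(preferences, round_no):
    # stand-in for allocate_eng / allocate_med followed by the multi-merge into final_list
    return f"college allocated in round {round_no}"

def cet_counselling(preferences, satisfied_after, max_rounds=5):
    round_no = 1
    while round_no <= max_rounds:                   # single entry into the cycle
        offer = final_list(preferences, round_no)
        satisfied = round_no >= satisfied_after     # stand-in for the parameter phi
        if satisfied:
            return f"terminate_process: accepted ({offer})"   # exit 1: offer accepted
        if round_no == max_rounds:
            return "withdraw_process: student withdrew"       # exit 2: withdrawal
        round_no += 1                               # continue_process: next counselling round

print(cet_counselling(["engineering", "medical"], satisfied_after=3))
```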
Pattern 12: Implicit Termination: If there are no left-over elements or
residues of work, present either now or at any point in the future, a special construct
should ensure that the system is not in a deadlock state. There are
normative ways to determine the successful completion of the system.
In Fig. 16, fee_payment is followed by two subtasks, bank_update and
college_update. In bank_update subtask bank database is updated and fee receipt
is generated by the activation of the process receipt_generate in sequential manner.
Second subtask updates the database of college with the amount paid by student as

Fig. 16 Implicit termination



fee, activating the college_update process. Both the subtasks are independent of each
other and terminate right after completing their function. As per the example, the first subtask
terminates right after generating the fee receipt and the second terminates after updating the
college database. Neither process is terminated upon the termination of the
other process.

6 Patterns Involving Multiple Instances

Pattern 13: Multiple Instances with a Priori Design Time Knowledge: More than
one instance of a task can be generated in a given system example. The number
of instances needed is known at the time of development. Such cases are separate
and operate at the same time. Before any additional activities can be activated, the
task instances must be coordinated at completion.
In Fig. 17, activity result_in_progress is followed by an activity of updating the
result (subject_update) for each subject. At the time of result estimation (design)
the number of instances which are enabled will be known as the number of subjects
the student has registered for is known beforehand. In this example, the student is
assumed to have registered for six subjects, hence six instances are enabled. Comple-
tion of all the enabled tasks will lead to the activation of result_announcement. In
simple terms before announcing the final results, result of each subject has to be
updated, which takes place concurrently.
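Because the number of instances is fixed before execution, all of them can be launched together and joined before the next activity. A sketch of the six-subject result example; the per-subject update is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor

SUBJECTS = ["S1", "S2", "S3", "S4", "S5", "S6"]     # count known at design time

def subject_update(subject):
    return f"{subject} result updated"

def result_in_progress():
    with ThreadPoolExecutor() as pool:
        updates = list(pool.map(subject_update, SUBJECTS))     # six concurrent instances
    return "result_announcement after: " + ", ".join(updates)  # join before announcing

print(result_in_progress())
```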
Pattern 14: Multiple Instances with a Priori Run Time Knowledge: Numerous
instances of a task can be generated in a given system example. The number of
instances needed can depend on many runtime variables, including state data,
resource availability and within-process interaction, but it is known before
the task instances need to be generated. These instances, once begun, are independent
of the rest and execute at the same time. At completion, the instances
must be synchronized before any consequent functions can be invoked.
In Fig. 18, connectivity to the college Wi-Fi is depicted in the flow chart. The multiple-
instances-with-prior-run-time-knowledge workflow initiates the calls depending
on a number of factors like the network bandwidth, number of devices on the network,
latency, etc. Whether the connection can be established or not is not known in advance,
since the number of established connections at a particular instant varies dynamically
and can be known only at run time. Before a new connection is established, the
constraint, that is, the number of connections already established, is checked at run time. If
the count is below the limit, new connection is granted and the number of connections

Fig. 17 Multiple instances with a priori design time knowledge



Fig. 18 Multiple instances with a priori run time knowledge

is incremented by activating the task increment_count_by_one as well as notifying


the successful registration by registration_successful activation. All the connections
established so far are independent of each other as per the workflow. Provided, if the
constraint is not met then the new connection is not granted and a message saying
maximum limit reached is sent by notification_sent process.
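Here the number of instances that may exist is bounded by a value that only becomes known at run time (the current connection count against a capacity limit). A small sketch of the Wi-Fi registration example; the capacity figure is an assumption.

```python
class WifiRegistrar:
    def __init__(self, capacity):
        self.capacity = capacity       # run-time constraint on how many instances may exist
        self.count = 0

    def register(self, device):
        if self.count < self.capacity:
            self.count += 1            # increment_count_by_one
            return f"registration_successful for {device}"
        return f"notification_sent to {device}: maximum limit reached"

registrar = WifiRegistrar(capacity=2)
for device in ["laptop-1", "phone-7", "tablet-3"]:
    print(registrar.register(device))
```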
Pattern 15: Multiple Instances with No A Priori Run Time Knowledge:
Different instances of a task can be generated in a given system example.
The total count of instances required relies on various runtime factors,
comprising state data, availability of system resources and within-
process communication, and is not known until the last instance has
finished execution. Once they have begun, they are separate from each other and
operate concurrently. When instances are operating, new instances can be started at
any moment. Synchronizing the instances at the end is required before any future
calls can be enabled.
In Fig. 19, an online coding competition conducted on platforms like HackerRank
is used to depict multiple instances without prior runtime knowledge. Before
participating in such events it is mandatory to register, which can be done either
right after the announcement of the competition or on the spot via the on_spot_registration
function. Multiple registrations (instances) are entertained, depending on
resources like the number of systems to conduct the competition, Internet connec-
tivity, invigilators, etc., and the final count is not known until registration closes
(including on-spot registrations); thus there is no prior knowledge until the competition starts. As
competitors register, resources are allocated by the function system_allocation and
they are prepared to take up the event by the function ready_to_take_challenge. During on-

Fig. 19 Multiple instances with no a priori run time knowledge

spot registration, if the resources are available they are allocated, else additional resources
are brought in and allocated as in the function add_additional_systems. Only when the
registration closes is the competition initiated, which is the next subsequent task.
Pattern 16: Multiple Instances with Synchronization: Several instances of a
task can be generated in a given system example. Such cases are equally exclusive
and operate separately. There is no need to synchronize them when they are done.
Each instance of the numerous value activities imbibed has to execute immediately
within the contextual information retained. They must be independent of each other
and should not be referentially tied to each other.
In Fig. 20, library fine is used to showcase the multi instances requiring synchro-
nization workflow. In the flow chart r stands for return date of the book that is when
the book is returned, e stands for due date which implies the date after which the
fine will be charged. Initially, days and due_amount are 0. Days are computed by
subtracting e from r, that is, the number of days for which a fine is charged, which is known only
during runtime, and that many instances of the function add_delay_charges are
initiated. The final due_amount is computed by synchronizing (adding) the results
of all the instances of add_delay_charges, where each instance contributes the library's
per-day delay charge to the due_amount.
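The library-fine example spawns one add_delay_charges instance per day of delay, a count known only at run time, and synchronizes them by summing their contributions. A sketch with an assumed per-day charge:

```python
from datetime import date
from concurrent.futures import ThreadPoolExecutor

PER_DAY_CHARGE = 5                                   # assumed library charge per day of delay

def add_delay_charges(day_index):
    return PER_DAY_CHARGE                            # one instance per overdue day

def library_fine(return_date, due_date):
    days = max((return_date - due_date).days, 0)     # days = r - e, known only at run time
    with ThreadPoolExecutor() as pool:
        charges = list(pool.map(add_delay_charges, range(days)))
    return sum(charges)                              # synchronize: total due_amount

print(library_fine(date(2021, 3, 10), date(2021, 3, 4)))   # six days late -> 30
```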

7 Conclusion and Future Work

This paper provides a complete, in-depth and comprehensive understanding of
the various kinds of workflow patterns that are recognized in the education sector.
The application of workflow patterns is exhibited in this paper using a multi-faceted
approach. Through the analyses discussed above, the shortcomings, capabilities and
drawbacks of the designs along with their implementations were identified. The case
studies listed provide a complete description of the undertaken activities and delivers

Fig. 20 Multiple instances with synchronization

effective solutions for the same. To further understanding, every pattern has also been
explained diagrammatically. The content presented in this paper can help facilitate
more work in this field and related domains.

References

1. Prerana KA, Sadashiv N (2017) A study of workflow management systems in the cloud
environment. In: 2017 International conference on energy, communication, data analytics and
soft computing (ICECDS). IEEE, pp 2262–2267
2. Kaur S, Aggarwal M (2018) Extended balanced scheduler with clustering and replication for
data intensive scientific workflow applications in cloud computing. J Electron Res Appl 2(3)
3. Amato F, Moscato F (2017) Exploiting cloud and workflow patterns for the analysis of
composite cloud services. Futur Gener Comput Syst 67:255–265
4. Ndeta J, Katriou S, Siakas K (2015) An approach to E-workflow systems with the use of
patterns. Int J Entrepreneurial Knowl 3. https://doi.org/10.1515/ijek-2015-0007
5. Orzechowski M, Balis B, Pawlik K, Pawlik M, Malawski M (2018) Transparent deployment
of scientific workflows across clouds-kubernetes approach. In: 2018 IEEE/ACM international
conference on utility and cloud computing companion (UCC Companion). IEEE, pp 9–10
6. Moreno P, Pireddu L, Roger P, Goonasekera N, Afgan E, Van Den Beek M, He S,
Larsson A, Schober D, Ruttkies C, Johnson D (2018) Galaxy-Kubernetes integration: scaling
bioinformatics workflows in the cloud. BioRxiv, 488643
7. da Silva RF, Filgueira R, Pietri I, Jiang M, Sakellariou R, Deelman E (2017) A characterization
of workflow management systems for extreme-scale applications. Futur Gener Comput Syst
75:228–238
8. Fridgen G, Radszuwill S, Urbach N, Utz L (2018) Cross-organizational workflow management
using blockchain technology-towards applicability, auditability, and automation
9. Eder J, Gruber W (2002) A meta model for structured workflows supporting workflow
transformations. In: East European conference on advances in databases and information systems.
Springer, Berlin, Heidelberg, pp 326–339
10. Gogolla M, Kobryn C (eds) (2003) UML 2001-The unified modeling language. Modeling
languages, concepts, and tools. In: 4th international conference, Toronto, Canada, 1–5 Oct
2001. Proceedings, vol 2185. Springer, Berlin
11. Raj JS (2020) Improved response time and energy management for mobile cloud computing
using computational offloading. J ISMAC 2(01):38–49
12. Konjaang JK, Xu L (2021) Multi-objective workflow optimization strategy (MOWOS) for
cloud computing. J Cloud Comput 10(1):1–19
13. Abualigah L, Diabat A, Abd Elaziz M (2021) Intelligent workflow scheduling for Big Data
applications in IoT cloud computing environments. Cluster Comput 1–20
14. Zhang L, Zhou L, Salah A (2020) Efficient scientific workflow scheduling for deadline-
constrained parallel tasks in cloud computing environments. Inf Sci 531:31–46
15. Jiugen Y, Ruonan X (2020) Cloud computing-based big data mining connotation and solution.
In: 2020 15th international conference on computer science & education (ICCSE). IEEE, pp
245–248
Detection of COVID-19 Using Segmented
Chest X-ray

P. A. Shamna and Arun T. Nair

Abstract COVID-19 is quickly gaining popularity across the globe. By April 14,
2020, 128,000 individuals had been killed by COVID-19, and 1.99 million inci-
dents had been recorded in 210 countries and regions, totaling 219.747 cases. The
rapid spread of the virus throughout the globe has resulted in a severe shortage of
medical test kits in many parts of the world, particularly in Africa. A chest X-ray may
prove to be a more successful screening method in certain situations than thermal
screening of the whole body, due to the fact that the respiratory system is the most
susceptible area in a human’s body to infection. Lung segmentation is the initial
stage in identifying diseases using a chest x-ray picture. We describe a method for
segmenting the lung region from CXR images that is based on the Euler number
thresholding approach, i. When compared to current state-of-the-art methods, the
suggested method demonstrates superior accuracy and performance.

Keywords COVID-19 · Chest X-ray · Resnet-50 · Euler number · ImageNet ·


Confusion matrix · ROC

1 Introduction

In December 2019, a cluster of atypical pneumonia cases was discovered in
the Chinese city of Wuhan, and the disease rapidly spread across the globe [1–4].
Initially, just a few cases were recorded in the European Union, in countries like
France and Germany, but the number of cases exploded in the months that followed.
Multiple cruise ships have been affected by the epidemic, and cruise companies
have begun canceling or rerouting their itineraries in response to travel restrictions
imposed by nations worldwide to limit the spread of the disease [5]. By April 14,

P. A. Shamna (B)
Electronics and Communication Engineering, KMCT College of Engineering, Kozhikode, Kerala,
India
A. T. Nair
KMCT College of Engineering, Kozhikode, Kerala, India


2020, 128,000 individuals had died from COVID-19, and 1.99 million instances
had been recorded in 210 countries and territories, for a total of 219.747 cases.
COVID-19 is an infectious illness caused by the corona virus, which was recently
discovered and recognized. It was not identified until December 2019, when an
epidemic occurred in Wuhan, China [6]. The most often seen symptoms of COVID-
19 are fever, exhaustion, and a dry cough. The vast majority of patients (about 80%)
recover completely without additional therapy. According to the Centers for Disease
Control and Prevention, about one out of every six people infected with COVID-19
becomes very sick and has difficulty breathing. Senior individuals and those who have
chronic medical conditions such as hypertension, heart disease, or diabetes are more
likely than the general population to suffer from serious diseases [7], according to
research. COVID-19 virus is mostly spread via association with respiratory droplets
instead of through the air, according to current knowledge. Pneumonia is the term
used to describe an infection of the lungs caused by a kind of acute respiratory illness.
The lungs are composed of tiny air pockets known as alveoli that fill with oxygen as
a healthy person breathes in and out. When a person develops pneumonia, the alveoli
become blocked with pus and blood, causing discomfort when breathing and raising
the body’s oxygen absorption rate. Symptoms associated with pneumonia include
fever, difficulty breathing, and fatigue. Pneumonia infections can be transmitted in
a variety of ways [8, 9].

2 Literature Review

Hubel and Wiesel’s [10] 1959 publication on single-neuron receptive fields is
widely regarded as a seminal work in computer vision [11], since it outlines the
key response characteristics of visual cortex neurons and the mechanisms by which
a cat’s sensory experience changes its cortical architecture. Roberts [11] published
a paper in 1963 describing a technique for obtaining three-dimensional data from
two-dimensional pictures of solid objects. Simply stated, the external world has been
reduced to a collection of geometric shapes that are flat on the surface. According to
[12], the insight that vision is hierarchical was presented in 1982. The vision system’s
primary role is to produce three-dimensional world representations with which the
user may interact. Early on in the development of a perception system, low-level
algorithms for line detection, curve detection, and corner detection were used as
stepping stones to a high-level understanding of visual information [13].
Simultaneously, this paper describes the construction of a self-organizing network
simulator comprising simple and complex cells capable of pattern recognition
and that is not influenced by changes in location. Numerous convolutional layers
were used in this example, with weight vectors serving as filters in their receptive
fields. Following the completion of correct calculations, the filters were intended
to generate activating events that would be used as inputs for future layers of the
network in order to function properly. Various commercial text recognition and zip
code decoding programs have been launched [14], with the most recent being text

recognition plus. After everything was said and done, the MNIST data collecting
system, which utilized handwritten numbers, was viable to create. Around the year
1999, many scholars concentrated their efforts on recognizing objects based on
their physical characteristics [15].
Features that are invariant to rotation and position,
and to a lesser degree to orientation and changes in lighting, have been created to assist in
the recognition of objects in a visual recognition system. The first real-world facial
recognition program was implemented a few years later, in 2001 [16]. Despite the
algorithm’s lack of attention on deep learning, it has figured out which characteristics
assist in facial recognition. A standardized picture collection, as well as a set of
common evaluation criteria, was seen as being immediately necessary when the field
of computer vision first started to take form, and the group set out to create these as
soon as they could.
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was established
in 2010. During the event, the most inventive submissions are
judged for this award, which is given out on an annual basis. With over one million
images, ImageNet has established itself as the gold standard for categorizing and
characterizing objects across a wide variety of object categories, and it continues to
do so. On average, the ILSVRC error rate in picture description was approximately
26% during 2010 and 2011.
In 2012 [17], researchers at the University of Toronto created a convolutional
neural network that achieved a 16.4% error rate on
image classification tests. The history of CNNs was transformed by this result.
Microsoft's research [17] achieved remarkable results in the fields
of object detection and identification, as well as in localization tasks, via the
use of its Residual Network, or ResNet. When applied to the ImageNet test set,
the ensemble of residual nets resulted in a 3.57% error rate.
This achievement earned the team first place in the ImageNet Large Scale Visual
Recognition Challenge (ILSVRC) 2015 (Table 1).

3 Proposed Method

The Residual Network (ResNet) outperforms prior classification networks, such as


CNN and others, in image classification tests [17]. A growing number of people
are becoming interested in deep neural networks, despite the fact that they need a
large quantity of data to train and operate at the cutting edge of technology. Beyond
numbers, hyperparameters such as learning rate and drop-out values are often found
to be important in obtaining the best possible results in the shortest amount of time
and minimizing the issue of overfitting [20]. On the other hand, selecting
hyperparameter values by a random hit-and-trial approach might be time consuming and
inefficient. It is critical to evaluate the learning rate value, as a value that is too low
makes neural network training inefficient and time-consuming. However, a number
that is too high may result in divergent behavior in the loss function [21], which is

Table 1 Review on computer vision

Roberts [11]. Methodology: machine perception of 3D solids. Features: faster and simpler process; accurate outcome; cost reduction. Challenges: need of regular monitoring; lack of specialists.
Marr [12]. Methodology: Vision: a computational investigation into the human representation and processing of visual information. Features: a perception system was developed. Challenges: low-level algorithms.
Fukushima [13]. Methodology: the Neocognitron, a hierarchical neural network capable of visual pattern identification and classification. Features: self-organizing artificial network; recognizes patterns. Challenges: several convolution layers.
LeCun et al. [14]. Methodology: recognizing handwritten digits using a back-propagation network. Features: minimal pre-processing of data was required. Challenges: architecture of the network was highly constrained; specially designed for a single task.
Lowe [15]. Methodology: local scale invariant features. Features: features are robust to occlusion and clutter. Challenges: quite slow; costs a long time.
Viola and Jones [16]. Methodology: face detection that is accurate in real time. Features: the AdaBoost classifier has faster performance; extra voting. Challenges: harsh backlighting; occlusions.
Krizhevsky et al. [18]. Methodology: deep neural networks are used to classify images in ImageNet. Features: for image identification tasks, the CNN produced an error rate of 16.4 percent. Challenges: a big neural network is required.
Zhang et al. [17]. Methodology: in order to recognize images, deep residual learning is used. Features: recognizability and identification of objects. Challenges: in VGG, the number of parameters; reduces storage space.
Punia et al. [19]. Methodology: for the detection of COVID-19, computer vision and radiology are used. Features: detects COVID-19; differentiates pneumonia and COVID-19. Challenges: lack of data quality; limited data set.

undesirable. In this paper, we provide a method that is based on ADADELTA [22]


and is designed to choose an acceptable learning rate value while eliminating the use
of hit and try. It is necessary to train the neural network using a batch of 128 pictures,
and the loss function is calculated by analyzing the neural network’s structure.

3.1 The Transfer Learning

In machine learning, transfer learning is a technique in which a model that has previously
been developed for one task is reused as a starting point for a different
task [23]. Training a network from scratch is not an option in the present research
owing to the fact that the dataset is too small to provide significant findings. For the
purpose of achieving exceptional outcomes, the method makes use of an existing
neural network that has been trained on a bigger dataset. It is thus being used as the
foundation for a new model that makes use of the accuracy of the previous network in
order to achieve a particular objective. Because of a number of factors, this method
for optimizing the outputs of a neural network trained on a short dataset has acquired
increasing popularity in recent years, including its effectiveness. Image classification
tasks were carried out with the help of ResNet-50 (Fig. 1). Initially, it was trained
using the ImageNet data set, which included about 3.2 M images.
With the use of transfer learning and the data set that was acquired, both previously
learned architectural models were re-trained and fine-tuned. ResNet-50 is designed
in the same manner as ResNet-34, and it is divided into five stages. However, each
convolution block is composed of three convolution layers, for a total of 23.52 million
trainable parameters.
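A minimal transfer-learning sketch in Keras is given below: the ImageNet-pretrained ResNet-50 is loaded without its classification head, a new head is attached, and the model is compiled for fine-tuning with the Adam optimizer. The dataset directory, the 224 × 224 input size and the three-class layout (COVID-19, pneumonia, normal) are illustrative assumptions; only the batch size of 128 and the 10 epochs stated in this paper are reused.

```python
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, models

# Backbone pre-trained on ImageNet, with the original classification head removed
base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = True                        # fine-tune all layers (set False for pure feature extraction)

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(3, activation="softmax"),   # assumed classes: COVID-19, pneumonia, normal
])

model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical directories of segmented CXR images, one sub-folder per class
train_ds = tf.keras.utils.image_dataset_from_directory(
    "segmented_cxr/train", image_size=(224, 224), batch_size=128, label_mode="categorical")
val_ds = tf.keras.utils.image_dataset_from_directory(
    "segmented_cxr/val", image_size=(224, 224), batch_size=128, label_mode="categorical")

model.fit(train_ds, validation_data=val_ds, epochs=10)
```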

3.2 Segmentation of Chest x-ray

The initial step toward automated cardiothoracic ratio computation is to create a


segmented CXR picture with the lungs segregated from the background and import
it into a data processing system. Segmentation of the lungs can be achieved in a
variety of ways. Segmentation of the image is performed using an Euler number-
based thresholding technique. After isolating the lungs, the picture may be examined
to ascertain the presence of cardiomegaly. Preprocessing CXR pictures is essential
to increase the overall quality of the segmented images. If the CXR image has an
overwhelming quantity of background, the chest area of the image must be cut
off before to usage. By expanding the size of the histogram, histogram equalization
increases the image’s contrast. When producing a smooth image, the two-dimensional
Gaussian operator is used to preserve edge details, while achieving a smooth image.
Additionally, the Gaussian operator is used to remove the noise. As seen in the

Fig. 1 Resnet-50 Architecture

equation, the picture is segmented into a collection of regions R, each of which is


made of homogeneous, non-overlapping, and linked sub-regions.

R = {Ri : i = 1, 2, 3, ... N } (1)

The original image, represented by the equation, is formed by the union of all
subregions.

I = R1 ∪ R2 ∪ R3 ∪ ... ∪ RN (2)

For every i = 0,1,2, … ,N, the regions Ri should be connected, and each area Ri
should be homogeneous. Disjointness between adjacent regions Ri and Rj should be
maintained, i.e.,

Ri ∩ Rj = ∅ (3)

Several research groups have focused their efforts during the last decade on
the segmentation of the lung fields in chest X-rays. Numerous solutions have
been proposed. There are various generalized classifications of solutions, including
rule-based approaches, pixel classification-based methods, deformable model-based

approaches, and hybrid strategies. We utilize a rule-based segmentation technique


that incorporates thresholding and morphological techniques. Despite the fact that a
number of segmentation techniques are available, the most often used approach in
practice is thresholding. You can convert an input image I to a binary image B via
thresholding.
T value determination was carried out using histogram-based methods. However,
a significant drawback is that the image’s coherence cannot be guaranteed. As a
result of the segmentation, holes and extraneous pixels may appear in the segmented
image. We propose an Euler number-based approach for calculating the threshold T
in order to preserve the image’s coherence. Euler number thresholding has been used
in real-time applications [24], but it is not used for picture thresholding. When an
image is assigned a Euler number [25–27], this signifies that the topological structure of
the picture has been captured and stored. Transforms such as translations, rotations,
scale changes, affinities, projections, and even non-linear transformations such as
the deformation of the forms included inside an image have no impact on the Euler
number. Calculating the Euler number of a binary
image can be done mathematically in two ways: globally, where C denotes the
number of regions in the image (the number of connected components of the object) and
H denotes the number of holes in the image (isolated regions of the
image's background); or locally, using small neighbourhood patterns.
In order to calculate the Euler number E of a binary picture, it
is possible to utilize local computations, as shown in the following equation.

E(t) = 1/4 [ q1 (t) − q3 (t) − 2qd (t)] (4)

where E(t) is the Euler number of the binary picture generated from the
gray level image at threshold t, and q1(t) is the number of 2 × 2 neighbourhoods in the image that include a single
1 and the rest 0’s. There are four distinct matrices from which the value
of q1 may be determined. q3 denotes the number of 2 × 2 matrices included inside
the picture, with three 1’s and one 0’s in each row and column. There are four distinct
matrices that might be counted as q3 . The letter qd is used to represent the number
of diagonal 2 × 2 matrices. There are two different sorts of qd matrices that may be
constructed. The Euler number E is computed for each given binary picture. In the
instance of a chest X-ray, it is anticipated that the segmentation technique will be
utilized to divide the picture into two lung regions. As a consequence, the expected
Euler number is 2, owing to the expected number of linked components being two
and the expected number of holes being zero. As a result, the Euler number should
be anticipated to be two. The formula for computing Euler’s number E is as follows.

E = C − H (5)

It has been proven that a graph with unique threshold values on the X-axis and
related Euler numbers on the Y-axis can exhibit fading exponential behavior for
a given picture [28, 29]. As a consequence, a matching threshold value may be
determined for a given Euler number. As a consequence of this observation, the
second possibility has been ruled out; thus, this is a singleton set that contains a single
value, which is the required threshold value. The Lung Segmentation Algorithm
consists of the following steps:
1. Following grayscale acquisition, it is required to convert the grayscale picture to
a binary image. A binary image B is generated by thresholding the input image I with the threshold T.
2. The Euler number-based thresholding approach is used to calculate the value
of T: for a given chest X-ray, T is the threshold at which the Euler number equals 2
(a sketch of this step is given after this list). In order to determine the Euler number, use the formula E =
C−H, where E indicates the Euler number, C the number of linked components,
and H represents the number of holes. In the CXR that has been given, there
are two linked components without holes. E = 2 as a result.
3. Eliminate the black zone that occurs in the CXR image’s four corners using the
Breadth First Search Algorithm.
4. Apply an erosion and dilution technique to the lung borders, smoothing them
out with the aid of a disk as a structural element.
5. Erosion is a process in which a structuring element S does an operation on a
binary image B that results in data loss (denoted B ⊖ S). It generates a new binary
image, Be = B ⊖ S. Erosion is the process by which a layer of pixels is removed
from the inner and outer boundaries of a region of interest [30].
6. A structuring element S performs a dilation operation on the image B, denoted
B ⊕ S. Dilation has the opposite effect to erosion: when dilation is
utilized, a layer of pixels is added to
both the inner and outer boundaries of regions.
7. Using the boundary acquired in step 6, initialize the snake's points with a
random point selection technique.
8. Minimize the snake energy at each control point. The greedy
snake assumes that by minimizing the energy at each control point, the overall
quantity of energy is reduced.
9. Examine the acquired picture to aid in the diagnosis of a variety of diseases.
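The core of steps 1 and 2, choosing the threshold T at which the binary image has Euler number E = 2, can be sketched as follows. Connected components and holes are counted with scipy.ndimage; the candidate-threshold range, the dark-region (lung) polarity and the morphological clean-up are illustrative assumptions rather than the exact procedure used here.

```python
import cv2
import numpy as np
from scipy import ndimage

def euler_number(binary):
    """E = C - H: foreground components minus holes (background regions not touching the border)."""
    _, c = ndimage.label(binary)
    bg_labels, n_bg = ndimage.label(binary == 0)
    border = np.unique(np.concatenate([bg_labels[0, :], bg_labels[-1, :],
                                       bg_labels[:, 0], bg_labels[:, -1]]))
    holes = n_bg - len([b for b in border if b != 0])
    return c - holes

def segment_lungs(gray):
    gray = cv2.equalizeHist(gray)                 # histogram equalization (contrast enhancement)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)      # Gaussian smoothing to suppress noise
    # NOTE: dark corner regions should be removed beforehand, as in step 3 of the algorithm
    for t in range(250, 5, -5):                   # scan candidate thresholds
        binary = (gray < t).astype(np.uint8)      # lung fields appear darker than surrounding tissue
        if euler_number(binary) == 2:             # two lung regions, no holes
            kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
            return cv2.morphologyEx(binary * 255, cv2.MORPH_OPEN, kernel)  # erosion then dilation
    return None

# usage: mask = segment_lungs(cv2.imread("cxr.png", cv2.IMREAD_GRAYSCALE))
```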

3.3 Experimental Setup

Figure 2 illustrates the experimental setup schematically. OpenCV and MATLAB


computer languages are used for image preprocessing, segmentation, augmentation,
and manipulation. We used Adam Optimizer to further train pre-trained ResNet-50
models, which we initialized with random weights and then further trained using

Fig. 2 Schematic representation of COVID-19 detection using segmented CXR

Adam Optimizer. As previously stated, a batch size of 128 with a learning rate of 0.1
was used to train the model for a total of 10 epochs.

4 Experimental Results

4.1 Training

Prior to commencing the training method, pre-process the dataset. It enhances


the training data set through the use of randomized augmentation. Additionally,
augmenting enables networks to be trained to be invariant to aberrations in image
data. Pre-processing includes resizing and grayscale conversion. We used the Euler
number to segment lung regions in MATLAB and showed the segmented picture
using the suggested techniques (Fig. 3). By charting numerous variables throughout
training, one may ascertain the training's progress. When the 'Plots' option
in trainingOptions is set to 'training-progress' and network training begins, trainNetwork
creates a figure and shows training metrics for each iteration. Each cycle calculates
the gradient and modifies the network parameters. If the validation data is included
in the training options, the figure displays validation metrics each time
trainNetwork validates the network (Fig. 4).
It displays training accuracy, validation accuracy, and train loss, among other
metrics.

4.2 Evaluation

The confusion matrix (Fig. 5) and receiver operating characteristic curves (Fig. 6)
depict the performance of the system (Table 2). Confusion matrix chart generates a

Fig. 3 Segmented CXR

Fig. 4 Training progress of Resnet-50

confusion matrix chart object from the true and predicted
labels. The actual class is represented by the rows of the confusion matrix, while the
predicted class is represented by the columns of the confusion matrix. Observations
that were properly classified are represented by diagonal cells,
whereas observations that were erroneously classified are represented
by off-diagonal cells.
The multiclass receiver operating characteristic (ROC) curve revealed that the
ResNet 50 model performed very well, attaining statistically significant features
(Table 2). This result established that the proposed technique was the most precise,

Fig. 5 Normalized confusion matrix

Fig. 6 ROC curve



Table 2 Features of ROC curve

Accuracy 0.96104
Sensitivity 0.96296
Specificity 0.96
Precision 0.92857
Recall 0.96296
f_measure 0.94545
g_mean 0.96148

with a false positive rate (misclassification) of just 0.60. Thus, the ResNet 50 CNN
can enhance the categorization of COVID-19 images.
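The figures reported in Table 2 follow directly from the confusion-matrix counts. The short sketch below shows how such metrics can be derived with scikit-learn; the label vectors are hypothetical and are not the experimental predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# hypothetical ground-truth and predicted labels (1 = COVID-19, 0 = non-COVID)
y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0, 0, 1])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0, 1, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)                    # also reported as recall
specificity = tn / (tn + fp)
precision   = tp / (tp + fp)
accuracy    = (tp + tn) / (tp + tn + fp + fn)
f_measure   = 2 * precision * sensitivity / (precision + sensitivity)
g_mean      = (sensitivity * specificity) ** 0.5

print(dict(accuracy=accuracy, sensitivity=sensitivity, specificity=specificity,
           precision=precision, f_measure=f_measure, g_mean=g_mean))
```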

5 Conclusion

In this paper, we present a technique for detecting the COVID-19 virus by analyzing
X-Ray pictures that we have generated. Additionally, the method developed differen-
tiates between people suffering from pneumonia and those suffering from COVID-
19, a distinction that is necessary since the symptoms of both diseases are similar
and patients often confuse the two. A COVID-19 test kit is much more costly than
utilizing an X-ray to identify the presence of COVID-19, and it is not nearly as quick
as current thermal imaging methods. This implies that airports, hotels, and retail
malls may all utilize it for basic screening. The authors believe that their study will
inspire other researchers to create other methods for detecting potential COVID-19
infection that do not rely on medical COVID-19 test kits. COVID-19 detection with
segmented CXR has a higher detection rate than in prior examples. The suggested
hybrid lung segmentation approach may be used to estimate heart boundaries. It has
the potential to be used as a screening tool for lung disease. We hope that our study
inspires other researchers to create other methods for identifying viral infection in
patients that do not need the use of COVID-19 test kits.

A Dynamic Threshold-Based Technique
for Cooperative Blackhole Attack
Detection in VANET

P. Remya Krishnan and P. Arun Raj Kumar

Abstract VANET is a highly dynamic network, where the vehicles frequently move
around various locations. Due to its rapidly changing topology and unreliable security
infrastructure, the routing protocols in VANET are vulnerable to several attacks such
as DDoS attacks, blackhole attacks, and wormhole attacks. This paper focuses on a
cooperative blackhole attack where several malicious nodes collaborate to execute
the attack. The attacker nodes drop all packets they receive. We present a security
technique to detect the cooperative blackhole attackers by analyzing the dropped
packets at each node. Using linear regression to determine the packet drop threshold
helps our proposal to improve its accuracy further. The simulation results show that
our proposed system provides a high detection accuracy of 99.78% and false positives
limited to 0.025%.

Keywords VANET · Security · Blackhole attack · Packet drop · Machine learning · Dynamic threshold

1 Introduction

VANET is a vehicle network whose principal goal is to ensure safe driving and
efficient traffic flow. It is a part of intelligent transport systems (ITS). The com-
munication in VANET occurs in two modes, vehicle-to-vehicle (V2V) and vehicle-
to-infrastructure (V2I) communication. VANET uses multi-hop intermediate nodes
to transfer the messages among the vehicles outside the communication range [1].
VANET has several complex features such as high density, dynamic topology, rapid changes in the environment, interference, mobility, and short-lived connections that lead to packet loss in the network. The routing protocols in VANET must consider
these factors to improve the network performance. The well-known routing protocols
in VANET include AODV, DSR, DSDV, and OLSR [2]. Evaluating the efficiency of
existing routing protocols shows that the AODV has a better performance in VANET

P. Remya Krishnan (B) · P. Arun Raj Kumar


National Institute of Technology Calicut, Kozhikode, Kerala, India


scenarios concerning its end-to-end delay, throughput, packet delivery ratio, and
latency [3].
The VANET routing protocols should be robust against attacks. Hence, finding a
secure and suitable routing path in the rapidly changing topology of VANET becomes
a critical issue [4]. The blackhole attack is a severe packet-dropping attack in VANET,
as it discards all data packets that pass through it without forwarding them to their
intended destination. As a result, the critical messages may not reach the destination
on time, and the VANET safety application may fail [5]. Depending on the count of
attacker nodes that execute the attack process, a blackhole attack can be either single
or cooperative [6]. This paper suggests a method for secure routing in VANET using
AODV by eliminating the cooperative blackhole attack in VANET [7].
The following are our contributions in this paper:
1. We propose a security mechanism for detecting cooperative blackhole attacks in
VANET based on dropped packet analysis at each node.
2. Determination of dynamic packet drop threshold in the network using linear
regression.
3. A less complex detection algorithm for blackhole attacker nodes takes O(cn)
time, where ‘c’ is the observation time, and ‘n’ is the number of vehicles in the
RSU range.
The remaining part of this paper is structured as follows: Sect. 2 discusses the existing
works. Section 3 goes into detail about the proposed detection method. Section 4 gives
the simulation setup and analysis. We conclude the paper in Sect. 5.

2 Related Works

Several works have been done to ensure the quality of routing in VANET. Hortelano
et al. [4] proposed a watchdog-based blackhole detection mechanism for VANET. In
this approach, each sender node verifies whether the receiver retransmits the packet or
not. Each vehicle keeps a trust value for its neighbor vehicles. Moreover, if a neighbor
node drops packets higher than a predefined threshold, it is labeled malicious. The
researcher has proved that the solutions proposed for routing attacks in MANET are
also applicable to the VANET scenario. Banarje [6] introduced an approach to identify
cooperative blackhole/grayhole attacks in VANET by monitoring the neighbor traffic.
The source sends a prelude message before transmission, and the destination responds
via an acknowledgment in the postlude message. Each node keeps a list of blacklisted
nodes. The approach is well-explained but lacks simulation results and performance
analysis. In [7], a customized algorithm is introduced to guarantee the security and
performance of the AODV in VANET toward the blackhole and grayhole attacks.
The approach identifies attacker nodes according to their behaviors and deletes them
from the routing process.
Ramaswamy et al. [8] implemented a technique to evict cooperative blackhole
nodes. It presents a trust-based algorithm that uses data routing information (DRI)

information for monitoring the trusted nodes. The approach is not applicable in gray-
hole attack detection due to its reliance on trust values. Agrawal et al. [9] presented
an approach for detecting cooperating attacker nodes. The approach is based on a
backbone node called a vital node that is considered trusted. The strong nodes are in
charge of detecting the malicious nodes. The approach fails when the vital node is
compromised.
Wahab et al. [10] also detected the blackhole nodes using watchdogs. The method
works in five phases. The reputation phase is to calculate vehicle reputation values.
The watchdog phase is for monitoring. The voting phase is to collect data, the fourth
phase is for reliability check, and the fifth phase is for information propagation.
The outcome is a list of blackhole attackers. In [11], the performance of blackhole
mitigation techniques in MANET is analyzed for grayhole attacks. A multi-attack
detection system for blackhole and grayhole attacks in MANET is implemented by
Ali et al. [12]; it chooses watchdog nodes based on the connected dominating set concept.
In this approach, before adding a node into the watchdog set, its energy and non-
existence in the blacklist are cross-checked.
Ankit et al. [13] developed a modified AODV protocol to eliminate blackhole
attacks. The RREQ and RREP packets are modified for routings. It also uses a cryp-
tographic mechanism for verification. The performance of the approach is demon-
strated using NS2-based simulations. Ankit et al. [14] implemented another tech-
nique to minimize the causes of the blackhole attack by finding alternative paths to
the destination or using the sequence number in packets.
Most of the existing approaches mentioned here detect/prevent the blackhole
attack by bringing modifications to the AODV protocol or using an additional back-
bone node to perform the watchdog mechanism. However, our proposed approach
does not need any additional infrastructure or changes to the existing protocols.
Instead, the blackhole nodes are detected using a dynamic threshold determined for
packet drop at each node.

3 Proposed Dynamic Threshold-Based Detection of Cooperative Blackhole Attack in VANET

This section presents the attack model and assumptions. Then, we discuss the pro-
posed methodology for detecting the cooperative blackhole nodes in detail. Figure 1
shows the flow diagram of our proposed detection approach.

3.1 Attack Model and Assumptions

This approach considers the attack scenario where multiple blackhole attacker
nodes cooperate themselves to drop the packets. The routing is done based on the Ad-

Fig. 1 Proposed system model

Fig. 2 Attack model

Hoc On-Demand Distance Vector (AODV) protocol. In AODV, the network topology
is not maintained in the routing table all the time. Instead, the best suitable path to
the destination is selected only when the source makes a request. Hence, due to the
dynamic behavior and rapidly changing topology of VANET, AODV is suitable for
its environment [1].
Figure 2 shows an attack scenario in VANET with multiple blackhole nodes. In
this scenario, the vehicles labeled A, B, F, and D are normal nodes, and vehicles
‘E’ and ‘C’ are the blackhole attacker nodes. The attacker nodes place themselves
in positions such that they receive the maximum traffic from the normal nodes. The
RREQ and RREP indicate the flow of route requests and route response packets sent
between the vehicles.
When source A requires a route to destination D, it makes a route request (RREQ)
to its neighbor nodes. Upon receiving this RREQ request, the blackhole attacker

nodes C and E along the path to destination D forge a false route response (RREP)
with a high sequence number and less hop count value. This forged RREP is sent
back to source A. Thus, source A selects either of the forged routes A-E-D (2 hops)
or A-C-D (2 hops), discarding the RREP from the original neighbor node B with
route A-B-F-D (3 hops) [15]. When the route is created, source A sends the data
packets to D, and the consequence is that the blackhole nodes E or C may drop all
received packets.
The detection mechanism we propose aims to identify these attacker nodes ‘E’
and ‘C’ based on their packet drop rate. According to our mechanism, the RSU
observes all vehicles in their range and records their packet drop values. The RSU
then computes a threshold for the tolerable packet drop value and compares the
packet drop at each node. The nodes that drop packets above this threshold will be
removed from the network, considering them as the attacker nodes. The following
are the assumptions made in our proposed approach:
• Assumption 1: Every node in range of an RSU broadcasts a beacon message that
contains the vehicle ID.
• Assumption 2: The incoming and outgoing traffic of each node is visible to RSU.

3.2 Proposed Detection Mechanism

In a cooperative blackhole attack, the attackers will drop every data packet they
receive that affects the network’s performance. Here we present a mechanism to find
the blackhole nodes in VANET based on the mean of dropped packets measured
from the vehicular nodes over an observation period. The RSU is responsible for
performing the detection process for all vehicles in its range. From Fig. 2, suppose
the source node A establishes a path to destination D via the blackhole node E,
i.e., Route (A–E–D), and starts sending data packets. Then, the RSU performs the
following steps:
1. Calculate the dropped packets at each vehicle
Consider a vehicle Vi , where the packets are either received properly or dropped
after the reception. Also, the packets may be sent properly or get dropped while
sending. The RSU calculates the packet drop rate PDi of a vehicle Vi using Eq. 1
[16].
PDi = (packetRD + packetSD) / (packetRC + packetSC + packetSD)    (1)

where,
PDi —Packet Drop (PD) at vehicle Vi
packetRD —Packets received but dropped.
packetSD —Packets sent but dropped.
packetRC —Packets received correctly.
packetSC —Packets sent correctly.
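Eq. 1 translates directly into code. The sketch below assumes the RSU has already accumulated the four packet counters for a vehicle; the guard against an empty denominator is an added assumption for vehicles with no observed traffic.

def packet_drop_rate(recv_dropped, sent_dropped, recv_ok, sent_ok):
    """Packet drop PD_i of vehicle V_i (Eq. 1)."""
    denom = recv_ok + sent_ok + sent_dropped
    if denom == 0:
        return 0.0                        # no observed traffic for this vehicle (assumed handling)
    return (recv_dropped + sent_dropped) / denom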

2. Determine the Dynamic threshold for Packet Drop (PD)


After the collection process, the RSU gets a set of packet drop values for all
vehicles in its range. However, the packet drop at a vehicle can be due to several
reasons other than the intentional drop. An increase in traffic density or vehicle
mobility may lead to packet discard, drop from the queue, packet lifetime expi-
ration, etc., which results in packet drop at a normal node. So, we cannot classify
every node that drops the packet as an attacker node. We need to set a threshold
value to separate the normal nodes from the attackers based on the packet drop
rate. However, if we set a constant threshold when traffic density or mobility
increases, it may result in false positives.
To solve this problem, we consider the threshold as a function of vehicle den-
sity and vehicle mobility. Moreover, the determination of the threshold is defined
as a binary classification problem that seeks the optimal decision boundary in
the density-packet drop plane and mobility-packet drop plane. There exist several
methods in machine learning for classification. In this paper, we use linear regres-
sion (LR) to finalize the threshold [17]. Linear regression gives a line that best
fits the data points available on the plot to differentiate among blackhole attackers
and genuine nodes [18]. Traffic density is the vehicle count present in a given
length of road. It is expressed as vehicles/kilometers. The RSU can estimate the
traffic density D at a particular road segment as follows [19]:

D = Vehicle_Count / Segment_Len    (2)

Here, Vehicle_Count is the number of vehicles an RSU can hear within the density
estimation period. Segment_Len is the length of the road occupied by the vehi-
cles. Thus, for an estimated density D and mobility M, the RSU can determine
the Permissible packet drop thresholds (δ1 and δ2 ) based on linear regression as
follows:
δ1 = s1 + b1 D (3)

δ2 = s2 + b2 M (4)

Here, s1 and b1 are the intercept and slope of the decision boundary in the density-packet drop plane, and s2 and b2 are the intercept and slope of the decision boundary in the mobility-packet drop plane. These parameters can be learned using data from our simulation experiments.
First, we run several simulations with various traffic densities and node mobilities
and record all measured packet drops for verification. Then, these values are
used as training data to identify the optimal decision boundaries. Figures 3 and 4 show the obtained results. After training, we obtain the intercept s1 and slope b1 from the node density-packet drop plane in Fig. 3, and the intercept s2 and slope b2 from the node mobility-packet drop plane in Fig. 4.
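The decision boundaries can be obtained with any least-squares fit; the sketch below uses scikit-learn's LinearRegression on (density, packet drop) and (mobility, packet drop) pairs recorded from simulation runs. The variable names densities, mobilities, and drops are illustrative placeholders, not names from the original implementation.

import numpy as np
from sklearn.linear_model import LinearRegression

def fit_threshold(x, packet_drop):
    """Fit the packet-drop threshold as a linear function of density or mobility,
    returning the intercept s and slope b used in Eqs. 3 and 4."""
    model = LinearRegression().fit(np.asarray(x).reshape(-1, 1), packet_drop)
    return model.intercept_, model.coef_[0]

# Example (placeholders): densities, mobilities, drops are arrays from the simulations.
# s1, b1 = fit_threshold(densities, drops)    # delta1 = s1 + b1 * D
# s2, b2 = fit_threshold(mobilities, drops)   # delta2 = s2 + b2 * M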

Fig. 3 Threshold of varying node density

Fig. 4 Threshold of varying node mobility

3. Procedure of cooperative blackhole node detection


Algorithm 1 describes the procedure for detecting blackhole nodes using our
proposed approach. From the threshold determination results shown in Figs. 3
and 4, it is visible that the packet drop at normal nodes is minimal, close to 0, while that at the blackhole nodes is much higher. Our detection algorithm
first estimates the density D and mobility M of the current scenario. Then, the
threshold values δ1 and δ2 are calculated by putting the values ‘D’ and ‘M’ in
Eqs. 3 and 4.
Next, we find the packet drop value PDi at each vehiclei and compare it with the
threshold values (δ1 and δ2 ) to distinguish the cooperative blackhole nodes from
normal nodes as follows:
if (PDi ≥ δ1 or PDi ≥ δ2) then
    Add Vehiclei to Blackhole_list.
end if

The Blackhole_list containing the IDs of attacker vehicles is broadcast to all vehicles
in the range of the RSU. Upon receiving this list, the genuine vehicles will discard
these attacker nodes from their routing choices.
The proposed algorithm’s complexity is calculated as O(cn), where c is the obser-
vation time, which is assumed to be constant, and n gives the number of vehicles in
RSU range. As a result, our algorithm’s complexity is linear, as it grows in direct
proportion to the size of n.

Algorithm 1: Blackhole node detection using dynamic packet drop threshold

Input: V = {V1, V2, ..., Vn}: set of vehicle IDs in range of the RSU.
PDi: dropped packets from vehicle Vi.
D: estimated traffic density.
M: estimated traffic mobility.
s1, s2: intercepts of the optimal decision boundaries.
b1, b2: slopes of the optimal decision boundaries.
c: observation (critical) time.
Output: Blackhole_list = list of IDs of blackhole attacker nodes.
1: for T = 1 to c do
2:   Compute the packet drop threshold for density D: δ1 = s1 + b1·D
3:   Compute the packet drop threshold for mobility M: δ2 = s2 + b2·M
4:   for i = 1 to n do
5:     Packet drop PDi at Vehiclei = (packetRD + packetSD) / (packetRC + packetSC + packetSD)
6:     if (PDi ≥ δ1 or PDi ≥ δ2) then
7:       Blackhole_list = insert(Vehiclei)
8:     end if
9:   end for
10: end for
11: Broadcast the Blackhole_list to all vehicles in the network.
12: Eliminate the vehicles in Blackhole_list from routing.
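A compact Python rendering of Algorithm 1 is sketched below. It assumes the RSU has already taken per-round snapshots of density, mobility, and per-vehicle packet counters; the snapshot structure is an assumption made for illustration, not part of the original implementation.

def detect_blackholes(observations, s1, b1, s2, b2):
    """Dynamic-threshold blackhole detection (Algorithm 1).

    `observations` is a list of per-round RSU snapshots; each snapshot is
    (density D, mobility M, {vehicle_id: (recv_dropped, sent_dropped, recv_ok, sent_ok)}).
    s1/b1 and s2/b2 are the fitted intercepts and slopes of Eqs. 3 and 4.
    """
    blackhole_list = set()
    for density, mobility, vehicles in observations:       # one pass per observation round
        delta1 = s1 + b1 * density                          # Eq. 3
        delta2 = s2 + b2 * mobility                         # Eq. 4
        for vid, (rd, sd, rc, sc) in vehicles.items():
            denom = rc + sc + sd
            pd_i = (rd + sd) / denom if denom else 0.0      # Eq. 1
            if pd_i >= delta1 or pd_i >= delta2:
                blackhole_list.add(vid)                     # flag as cooperative blackhole
    return blackhole_list  # broadcast to all vehicles, which then drop these IDs from routing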

4 Simulation and Result Analysis

4.1 Simulation Setup

We simulated the VANET environment using SUMO traffic simulator version 0.32.0
with Open Street Map [20] and the entire simulation is ported to version 2.34 of NS-2
[21] for packet tracing and network animation. Table 1 shows the simulation parame-
ters. Individual simulations were run by varying the node density and mobility in the
network, in which 10% of the nodes were set as malicious blackhole nodes that perform packet drops. The entire simulation runs for 100 s, and detection is performed every 10 s. The observation period of 10 s was determined by conducting several simulation experiments.

Table 1 Parameters for simulation


Parameters Values
Simulator NS2
Area 800 * 800 m
Simulation time 100 s
Antenna Omni antenna
Number of Vehicles 20–200
Packet Generation Rate 10 packets per second
MAC protocol 802.11p
Routing protocol AODV

4.2 Performance Metrics

We assess the efficiency of the proposed technique based on the following metrics:


• Detection Rate (DR): It is the rate at which vehicles are correctly identified as
the blackhole nodes. The following expression defines it. Here, TP refers to true
positive, i.e., the blackhole nodes correctly classified as attackers [22]. False-
negative (FN) is the black hole nodes falsely identified as genuine node [23].
 
DR = (TP / (TP + FN)) ∗ 100    (5)

• False Positive Rate (FPR): The normal nodes falsely found as blackhole nodes.
Here, true negative (TN) gives the genuine vehicles correctly classified [22].
 
FPR = (FP / (FP + TN)) ∗ 100    (6)

4.3 Result and Comparison

We analyzed the efficiency of our technique by changing the number of nodes in the network at various mobility levels. Ten percent of the total nodes were set as blackhole
nodes. Figure 5 shows the detection rate of our technique. The highest detection
rate of 99.78% is achieved by the proposed approach at a mobility of 60 km/h.
Figure 6 demonstrates the plot for false positive Rate. The simulation has achieved
a minimum false positive rate of 0.025%. The results indicate that our technique
exhibits a high detection accuracy initially, and then there is a slight drop in it when
the number of nodes increases but stays within the 98–99% range. It is also observed that the false positive rate is lower when there are fewer nodes. As the node count increases, the approach gives more false positives, but it soon recovers,

Fig. 5 Detection rate of proposed approach

Fig. 6 False positive rate of proposed approach

and the false positive rate decreases. This can be justified as follows: initially, there are only a few nodes, so the packet drop at normal nodes is low and the approach can easily separate the attackers from genuine vehicles. However, when the node count increases, the approach takes time to fine-tune the packet drop thresholds δ1 and δ2 using linear regression. Once the threshold for the increased traffic density is determined, the approach again improves its performance. The approach also shows a better detection rate and false positive rate at a node mobility of 60 km/h. Due to mobility in VANET, there are rapid changes in the topology that lead to frequent route changes and affect the accuracy of determining the packet drop thresholds δ1 and δ2 in our approach, affecting the detection and false positive rates.
We compare our approach with existing approaches [24–26] that detect blackhole attacks in VANET. The comparison is made in terms of detection rate by changing the attacker rate from 10 to 20%. In [24–26], the detection rate varies from
95–98% in the presence of 10% attacker nodes and 88–97% when 20% nodes are
attacker nodes. However, our approach maintains a detection rate of 97–99% in both

Fig. 7 Detection rate comparison-10% attackers

Fig. 8 Detection rate comparison-20% attackers

cases. The results of the comparison are shown in Figs. 7 and 8. From this, we can
observe that our technique attains good detection accuracy even when there is a large number of attackers, compared to the existing approaches. We do not use a constant
threshold to classify the nodes into normal or attacker nodes in our approach. Instead,
as the node density increases, the RSU fine-tunes the packet drop threshold using
linear regression that helps in achieving a high detection rate.

5 Conclusion

This paper presents a technique to detect cooperative blackhole attacks in VANET


by analyzing the dropped packets at each node. The RSU determined the optimal
packet drop threshold δ1 and δ2 at each interval of time using the linear regres-

sion technique. The estimation of the packet drop threshold is critical in detecting
blackhole nodes accurately. The proposed system provides a high detection accu-
racy of 99.78% and false positives limited to 0.025%. Our method has a minimum
computational overhead compared with the existing techniques since we do not use
any complex cryptographic algorithms for the detection process. In the future, we
intend to evaluate the efficiency of our system for the grayhole and wormhole attack
detection in VANET.

Acknowledgements This work is supported by the funding agency Science and Engineering
Research Board (SERB), Government of India, under Core Research Grant (CRG) scheme. The
funding grant number is EMR/2016/007502.

References

1. Lu Z, Qu G, Liu Z (2019) A survey on recent advances in vehicular network security, trust,


and privacy. IEEE Trans Intell Transp Syst 20(2):760–776
2. Smys S, Vijesh Joe C (2021) Metric routing protocol for detecting untrustworthy nodes for
packet transmission. J Inf Technol 3(02):67–76
3. Mohamad F, Mohamad O, Abedelhalim H, Abdellah E (2014) Efficiency evaluation of routing
protocols in VANET. In: Third IEEE international colloquium in information science and
technology (CIST), October 2014
4. Hortelano J, Ruiz JC, Manzoni P (2010) Evaluating the usefulness of watchdogs for intrusion
detection in VANETs. In: IEEE conference on communications workshops, May 2010
5. Dhaya R, Kanthavel R (2021) Bus-based VANET using ACO multipath routing algorithm. J
Trends Comput Sci Smart Technol (TCSST) 3(01):40–48
6. Banarje S (2008) Detection/removal of cooperative black and gray hole attack in mobile Ad-
Hoc networks. In: Proceedings of the World Congress on engineering and computer science,
pp 337–342, October 2008
7. Shahabi S, Ghazvini M, Bakhtiarian M (2016) A modified algorithm to improve security and
performance of AODV protocol against black hole attack. Wirel Netw 22(5):1505–1511
8. Ramaswamy S, Fu H, Sreekantaradhya M, Dixon J, Nygard K (2003) Prevention of cooperative
black hole attack in wireless ad hoc networks. In: Proceedings of 2003 International conference
on wireless networks (ICWN’03), pp 570–575. Las Vegas, Nevada, USA
9. Agrawal P, Ghosh RK, Das SK (2008) Cooperative black and gray hole attacks in mobile ad
hoc networks. In: Proceedings of the 2nd international conference on Ubiquitous information
management and communication. ACM, New York, pp 310–314
10. Wahab OA, Otrok H, Mourad A (2014) A Dempster-Shafer based Tit-for-Tat strategy to regulate
the cooperation in VANET using QoS-OLSR protocol. Wireless Pers Commun 75(3):1635–
1667
11. Gurung S, Chauhan S (2019) Performance analysis of black-hole attack mitigation protocols
under gray-hole attacks in MANET. Wireless Netw 25(3):975–988
12. Ali Zardari Z, He J, Zhu N, Mohammadani KH, Pathan MS, Hussain MI, Memon MQ (2019)
A dual attack detection technique to identify black and gray hole attacks using an intrusion
detection system and a connected dominating set in MANETs. Future Int 11(3):61
13. Kumar A, Varadarajan V, Kumar A, Dadheech P, Choudhary SS, Ambeth Kumar VD, Panigrahi
BK, Veluvolu KC (2021) Black hole attack detection in vehicular ad-hoc network using secure
AODV routing algorithm. Microprocessors Microsyst 80
14. Kumar A, Dadheech P, Goyal D, Patidar PK, Dogiwal SR, Janu N (2021) A novel scheme for
prevention and detection of black hole & gray hole attack in VANET network. Recent Patents
Eng 15(2):263–274

15. Tobin J, Torpe C (2017) An approach to mitigate multiple black hole attacks in VANET. In:
16th European conference on Cyber Warfare and Security
16. Al-Ani AD, Seitz J (2015) QoS-aware routing in multi-rate Ad hoc networks based on ant
colony optimization. Netw Protoc Algorithms 7:1–25
17. Huang M (2020) Theory and implementation of linear regression. In: 2020 International con-
ference on computer vision, image and deep learning (CVIDL), pp 210–217
18. Dhende S, Musale S, Shirbahadurkar S et al (2017) SAODV: black hole and gray hole attack
detection protocol in MANETs. In: 2017 International conference on wireless communications,
signal processing and networking (WiSPNET), pp 2391–2394
19. Darwish T, Abu Bakar K (2015) Traffic density estimation in vehicular ad hoc networks. Ad
Hoc Netw 24(PA):337–351
20. Lim KG, Lee CH, Chin RKY et al (2017) SUMO enhancement for vehicular ad hoc network
(VANET) simulation. In: 2017 IEEE 2nd international conference on automatic control and
intelligent systems (I2CACIS), pp 86–91
21. Bavarva A (2013) Traffic detection in VANET using NS2 and SUMO. Int J Adv Res Comput
Sci Software Eng 3:1–7
22. Hichem S, Senouci SM (2015) An accurate and efficient collaborative intrusion detection
framework to secure vehicular networks. Comput Electr Eng 43:33–47
23. Tyagi P, Dembla D (2018) A secured routing algorithm against black hole attack for better
intelligent transportation system in vehicular ad hoc network. Int J Inf Technol 04:11
24. Lachdhaf S, Mazouzi M, Abid M (2017) Detection and prevention of black hole attack in
VANET using secured AODV routing protocol. In: Proceedings of the 9th international con-
ference on networks and communications, pp 25–36, November-2017
25. Gautham PS, Shanmughasundaram R (2017) Detection and isolation of Black Hole in VANET.
In: 2017 International conference on intelligent computing, instrumentation and control tech-
nologies (ICICICT), pp 1534–1539
26. Hassan Z, Mehmood A, Maple C, Khan MA, Aldegheishem A (2020) Intelligent detection
of black hole attacks for secure communication in autonomous and connected vehicles. IEEE
Access 8:199618–199628. https://doi.org/10.1109/ACCESS.2020.3034327
Detecting Fake News Using Machine
Learning

Ritik H. Patel, Rutvik Patel, Sandip Patel, and Nehal Patel

Abstract The evolution of information and communication in this digital era has increased the number of people with Internet access. The Internet has changed the way information is consumed, and as a consequence, the fake news market has boomed. Fake news is one of the major concerns regarding the spread of Internet connectivity because it has the potential to cause great political damage to countries. “Fake news” gained popularity during the US electoral campaign. Fake news detection applies natural language processing to clean and clarify the news text, and the model then uses term frequency-inverse document frequency for further processing. The aim of this paper is a computational approach that automatically detects fake news and reports the accuracy of the model.

Keywords Convolutional neural network · Deep learning · Face mask detection · Machine learning

1 Introduction

Fake news generally refers to false reports or misinformation which is shared in


the form of various images, articles, or videos which are often disguised as real
news which aims to manipulate people’s opinions. Fake news and lack of faith in
the media are regularly emerging problems with tremendous repercussions in our
culture. Obviously, “fake news” is a purposely misleading fabrication, but lately online media chatter has been changing its meaning. The concept is currently used by some to dismiss evidence contrary to their preferred viewpoints. The
term “fake news” became common parlance since the US presidential elections.
The term “fake news” is used especially to portray really erroneous, disturbing,

R. H. Patel · R. Patel · S. Patel · N. Patel (B)


K D Patel Department of Information Technology, Faculty of Technology & Engineering (FTE),
Chandubhai S. Patel Institute of Technology (CSPIT), CHARUSAT, Changa, Gujarat, India
e-mail: nehalpatel.it@charusat.ac.in
S. Patel
e-mail: sandippatel.it@charusat.ac.in


and misdirecting articles distributed generally to bring in cash through site visits.
Facebook and particularly its child company WhatsApp have been at the epicenter
of much of the fake news spread. A function has already been introduced to flag fake
news on the Web when it is seen by a user in Facebook. Fake news and varying types
of false information spread can take on different forms. They have major impacts,
because information plays a major role in shaping the world view as humans make
important decisions based on information. How fake news affects the financial sector
can be explained using various examples. Whether it is a company, an institution, or
even a government, fake news has a big adverse effect on the financial sector.
The mental effects of fake news on a person or crowd are considered a very sensitive
issue; reportedly, there has been a huge spike of mental harassment cases due to
ruining one’s social image using fake news.
In politics also, fake news is considered to be a major player, not only during
the elections but also during the presidential terms. By the spread of fake news, the
reputation of a political party or even a politician can be badly damaged. Fake news is not a new thing in India either. False accusations are nothing new nowadays; people can easily be misled through brainwashing. Such was the
case in the recently ended CAA protest where the protesters did not know what they
were protesting for. So, the above points give brief information about how fake news
can have adverse impacts but to understand them more here are some recent cases
where fake news played a major role.
Regarding the CAA protests, the Supreme Court of India advised the central
government to consider “objectives and the benefits of the Citizenship Amendment
Act, a plea for publicizing aims” to cut out all the misleading news portrayed about
CAA. The plea lawyer stated that he had visited Jamia and Seelampur and observed
that more than 95% of the protesters did not even know what CAA is. They were
made to feel that the law will take back their citizenship. This was carried out using
deep fake videos circulated by accounts created in the neighboring countries. The
spread of those deep fake videos was to such an extent that the Indian Ministry of
External Affairs had to call out the Prime Minister of Malaysia for publicizing them
and also for making “factually inaccurate remarks” on CAA. Fake news was very
prevalent in the 2019 elections as well. As Vice writes, political parties have weaponized the platforms and the misinformation spread on them. It rose to such an extent that Facebook went on to delete about one million accounts a day that were spreading
false information [1].
Perhaps, the Kashmir issue is the best issue which exhibits the adverse ill effects of
fake news. Misinformation and disinformation related to Kashmir are widely preva-
lent; there have been many instances of photographs from the Syrian and Iraqi civil
wars being passed off with the intention of fueling violence and backing insurgen-
cies from the Kashmir conflict. After the Indian revocation of Article 370 of Jammu
and Kashmir in August 2019, misinformation relating to defense, public welfare,
lack of supplies, and other administrative issues followed. The Ministry of Electronics and Information Technology responded by getting Twitter to remove accounts distributing such fake news [1].

2 Literature Review

Ghosh et al. [2] used different combinations of support vector machine (SVM), convolutional neural network (CNN), logistic regression (LR), and bidirectional long short-term memory (Bi-LSTM) algorithms. They used tweets from Twitter to create the dataset. The combined use of CNN and LSTM layers gave an efficient model with the maximum accuracy.
Pantech Solutions Institute created a fake news detector in which a count vectorizer and a TF-IDF vectorizer are used to transform the text, with a Naive Bayes classifier for classification. For model tuning, scikit-learn's grid search functionality is utilized. They found that the optimal configuration for the count vectorizer is to use two-word phrases only (no single words), no lowercasing, and only words that appear at least three times in the corpus [3].
Rodríguez et al. [4] created three different neural architectures, two of their own and one with the BERT language model. The first model is based on SVM, the second on LSTM, and the last on CNN. The article suggests that BERT is a pre-trained language model that achieves high efficiency in a large number of NLP tasks. With training over 5 epochs, the LSTM-based model achieved an accuracy of 91%, and with training over 4 epochs, the CNN-based model achieved an accuracy of 93.7%.
Muhammed et al. [5] implemented a combination of support vector machine, passive-aggressive, and Naive Bayes (NB) classifier algorithms. For text processing, two vectorizers are used: a count vectorizer and a TF-IDF vectorizer. Their approach combines Web crawling with machine learning processes. The maximum accuracy observed was 80%.

3 Proposed Model

After observing and analyzing different algorithms and models, this project has
considered 2 algorithms: Naive Bayes and passive-aggressive, for carrying out clas-
sification on the datasets. The varying results of both classifiers are discussed in the
discussion section.

3.1 Model Flow

See Fig. 1 and Table 1.



Fig. 1 Model flow



Table 1 Literature survey

[6] Methodology: support vector machine (BERT SVM), random forest classifier (BERT RFC), multi-layer perceptron neural network (BERT MLP). Features: text embedding and fact verification. Dataset: COVID-19 news tweet collection from Twitter. Accuracy: 74% (SVM), 73% (RFC), 73% (MLP). Result: proposed a zero-shot learning approach for fake tweet detection; the zero-shot model achieves about 81% accuracy.

[7] Methodology: long short-term memory (LSTM), random forest (RF), support vector machine, graph convolutional network (GCN). Features: Word2Vec embedding. Dataset: news tweet collection from Twitter. Accuracy: 74.8% (SVM), 76% (RF), 67% (LSTM), 79% (GCN). Result: the performance of the GCN shows the effectiveness of this approach.

[8] Methodology: bidirectional encoder representations from transformers (BERT) with zero-shot and supervised settings. Features: Doc2Vec embedding. Dataset: FEVER dataset. Accuracy: 56.49% (zero-shot setting) and 73.74% (supervised). Result: the zero-shot variant of this approach significantly outperforms all compared zero-shot baselines.

[9] Methodology: graph neural network (GNN), gradient episodic memory (GEM), elastic weight consolidation (EWC). Features: unigrams, bigrams, punctuation, psycholinguistic, readability, and syntax features. Dataset: FakeNewsNet dataset. Accuracy: 72.6%. Result: requires a limited number of features of the social context, does not rely on text information, and achieves superior performance to methods that require syntactic and semantic analysis.

[10] Methodology: TF-IDF vectors, dense neural network. Features: bag of words, word vector representation. Dataset: dataset created by Craig Silverman, used for the FNC-I challenge. Accuracy: 94.31%. Result: the model is efficient when the stances between news articles and headlines are "unrelated," "agree," and "discuss," but the accuracy drops to 44% for the "disagree" stance.

[11] Methodology: unsupervised fake news detection framework (UFD). Features: Gibbs sampling, update rule. Dataset: LIAR dataset, BuzzFeed dataset. Accuracy: 75.9%, 67.9%. Result: UFD performs better on the LIAR dataset than on the BuzzFeed dataset.

[12] Methodology: SVM. Features: length, conveys less clout, appears more negative in tone. Dataset: National Public Radio, New York Times, and Public Broadcasting Corporation. Accuracy: 87%. Result: SVM gives the best prediction results among logistic regression, random forest, decision tree, and k-neighbor classifier.

3.2 Dataset

The dataset used for this model is from the 2016 US Presidential Elections. Kaggle released this dataset as a challenge to create an accurate fake news detection model, since this election brought fake news into the spotlight. There are 20,000 articles in this dataset with a column named “label.” This label column takes only two values, 1 or 0: 1 is assigned if the news is true, and 0 if the news is false.

3.3 Data Preprocessing

Before feeding the dataset into the model, the raw news texts require preprocessing. We have used different methods to remove different types of text noise. In the data cleaning step, regular expressions are used first: only words are kept, while symbols, numbers, and signs are filtered out of the text. The second cleaning step is stop word removal. To remove stop words, the text must first be converted from sentences into words, for which a tokenizer is essential. The tokenizer breaks a string into tokens; here, the tokens are words. After tokenization, it is easy to remove the stop words.

3.3.1 Stop Word Removal

Stop words are a set of commonly used words in a language that carry little significance in the news text. They can be filtered out during processing, as they are very common and hold little importance. Stop words are the parts of a sentence that connect other words, for example, prepositions like “in,” “from,” “of,” “to”; conjunctions like “or,” “but,” “and”; and articles like “the,” “a,” “an.” Such stop words hold little useful information and consume valuable processing time; therefore, removing stop words is a key step in natural language processing. The library used in this model to remove stop words is the natural language toolkit (NLTK).

3.3.2 Lemmatization

Lemmatization is a technique to convert any given word into its normal form by removing suffixes. Using lemmatization, the inflected forms of a word can be grouped together and reduced to a common form so that they can be analyzed as a single item. For example, without lemmatization the processor would treat “follows,” “following,” and “followed” as different words, which becomes a problem when dealing with a large dataset and increases the processing time. After lemmatization, these three words are all reduced to “follow,” which optimizes the processing.
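A minimal sketch of the cleaning pipeline described above, using the re module and NLTK (stop-word list, tokenizer, and WordNet lemmatizer), is shown below; the exact regular expression and the lowercasing step are illustrative choices, not necessarily those of the original implementation.

import re
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

# Requires one-time downloads: nltk.download('stopwords'), nltk.download('punkt'), nltk.download('wordnet')
stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_text(text):
    """Keep only words, drop stop words, and lemmatize the rest."""
    text = re.sub(r"[^a-zA-Z]", " ", text).lower()    # strip digits, symbols, and signs
    tokens = word_tokenize(text)                       # sentence -> word tokens
    tokens = [t for t in tokens if t not in stop_words]
    return " ".join(lemmatizer.lemmatize(t) for t in tokens)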

3.4 NLP Techniques

NLP techniques are used to transform the text input data into vectors so that the classifiers can perform mathematical operations on it. This model uses the TF-IDF vectorizer for this conversion.

3.4.1 TF-IDF Vectorizer

Term Frequency (TF): Term frequency is how many times a word appears in a docu-
ment divided by the total of how many words are in the document. Term frequency
changes in every document; it is unique for every document [13]. It is calculated by:
tf_{i,j} = n_{i,j} / Σ_k n_{k,j}    (1)

Inverse Document Frequency (IDF): The inverse document frequency measures how much information a word provides, that is, whether the word is rare or common across all documents. It is the log of the inverse fraction of the documents
that contain the word [13]. It is found by:
 
w_{i,j} = tf_{i,j} ∗ log(N / df_t)    (2)
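In practice the TF-IDF weighting is rarely coded by hand; the sketch below uses scikit-learn's TfidfVectorizer (whose IDF smoothing differs slightly from Eq. 2) to turn cleaned texts into feature vectors. The toy texts and labels are placeholders for the actual Kaggle articles, not data from this study.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Toy stand-ins for the cleaned article texts and their 1 (real) / 0 (fake) labels.
texts = ["government passes new bill after long debate",
         "shocking miracle cure doctors do not want you to know",
         "election results announced by the commission",
         "celebrity secretly replaced by clone says source"]
labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()                  # computes TF-IDF weights per term
X = vectorizer.fit_transform(texts)             # sparse document-term matrix
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25,
                                                    random_state=0)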

3.5 Naive Bayes Classifier

Naive Bayes is a simple supervised, generative model that returns probabilities. Predictions are made using Bayes' theorem under the assumption that the presence of a specific feature is independent of the presence of any other feature. The model's predictions are “naive” because of this assumed conditional independence between pairs of features, hence the name Naive Bayes. For a class variable y that depends on a feature vector x = (x_1, ..., x_n), the assumption is written as [14]:

P(x_i | y, x_1, ..., x_{i−1}, x_{i+1}, ..., x_n) = P(x_i | y)    (3)



3.6 Passive-Aggressive Classifier

The passive-aggressive classifier belongs to a family of online learning algorithms proposed by Crammer. It is based on a simple idea and, in terms of performance, has proved more efficient than many alternative algorithms. It is an incremental learning algorithm and is simple to implement since it has a closed-form update rule. Suppose we have a dataset [15]:

X = {x_0, x_1, ..., x_t, ...} where x_i ∈ R^n
Y = {y_0, y_1, ..., y_t, ...} where y_i ∈ {−1, +1}    (4)

In Eq. (4), t is the temporal dimension. If the data are drawn from the same data-generating distribution, there are no large parameter modifications and the algorithm keeps learning; but if the data source changes to a different distribution, the weights adapt to the new distribution and slowly forget the previous one. In this model, the data are drawn from the same distribution. Given a weight vector w, the prediction is calculated as [15]:

y_t = sign(w^T · x_t)    (5)

This algorithm is based on Hinge loss function [15]:


 
L(θ) = max(0, 1 − y · f(x_t; θ))    (6)

In Eq. (6), the value of L lies between 0 and k, and a value of 0 means a perfect match; it depends on f(x_t; θ) [15]. The update rule generally used in the passive-aggressive algorithm is:

w_{t+1} = argmin_w (1/2) ‖w − w_t‖² + Cξ²    (7)
subject to L(w; x_t, y_t) ≤ ξ

In Eq. (7), first assume the slack variable ξ = 0. When a sample x_t is presented, the classifier determines the sign using the current weights. When the sign is correct, the value of the loss function is 0 and the arg min is w_t itself; this clearly shows that when a correct classification occurs, the classifier stays in the passive state. If instead a misclassification occurs, the classifier becomes aggressive and updates the weights [15].
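Continuing the illustrative sketch from Sect. 3.4.1, both classifiers are available in scikit-learn: MultinomialNB and PassiveAggressiveClassifier can be trained directly on the TF-IDF features. The hyper-parameters shown are library defaults, not those of the original experiments, and X_train/X_test/y_train/y_test are the split produced in the earlier TF-IDF sketch.

from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score

# X_train, X_test, y_train, y_test come from the TF-IDF sketch above.
nb = MultinomialNB().fit(X_train, y_train)
pa = PassiveAggressiveClassifier(max_iter=50).fit(X_train, y_train)

for name, clf in [("Naive Bayes", nb), ("Passive-aggressive", pa)]:
    print(name, accuracy_score(y_test, clf.predict(X_test)))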

Table 2 Confusion matrix


Predicted value = 0 Predicted value = 1
Actual value = 0 True negative False positive
Actual value = 1 False negative True positive

3.7 Accuracy Measurement

The accuracy is measured on the basis of the confusion matrix. While applying the Naive Bayes algorithm and the passive-aggressive classifier, the confusion matrix is used to find out how accurate each model is.
A confusion matrix is a table used to describe the performance of a classification model on a set of test data for which the true values are known. The concept is simple, but the terminology related to it can be confusing; the confusion matrix is a table [16] with four different combinations of actual and predicted values, as shown in Table 2.
To understand and analyze performance of this model, methods used are recall,
F1-score, and precision. Precision is the ratio between the true positives and all the
positives, i.e., fraction of relevant instances among the retrieved instances [17].

Precision = True Positive / (True Positive + False Positive)    (8)

The recall is the measure of model correctly identifying true positives. Recall is
given by fraction of the total amount of positive instances that were actually retrieved
[17].

Recall = True Positive / (True Positive + False Negative)    (9)

F1-score is the harmonic mean of recall and precision. It is given by [17]:

F1-score = 2 × (Precision × Recall) / (Precision + Recall)    (10)

4 Results and Discussion

After training the model, in the testing phase the accuracy of Naive Bayes is higher for some datasets and the accuracy of passive-aggressive is higher for others. For the 2016 US Presidential Elections dataset, the accuracies of the passive-aggressive and Naive Bayes classifiers are 96.8% and 83.6%, respectively; observing both accuracies, this model suggests carrying out further classification using the passive-aggressive classifier only. It is to be noted that the Naive Bayes classifier

Fig. 2 Confusion matrix of Naive Bayes: [[1009 5] [323 663]]

Fig. 3 Confusion matrix of passive-aggressive: [[987 27] [37 949]]

works on Bayes' probability theorem, so its accuracy in predicting fake news varies, whereas the passive-aggressive classifier remains more stable.
The passive-aggressive classifier first checks the prediction; if it matches the actual result, the weights stay the same and the classifier remains in the passive state, but if the prediction does not match, it goes into the aggressive state and changes the weights so that the predicted value comes as close as possible to the actual value. This is why passive-aggressive is usually more efficient than other classifiers, and why it gives a higher accuracy in this model. The confusion matrices obtained in this model are shown in Figs. 2 and 3.
In Figs. 2 and 3, the values represent the numbers of test cases classified into these confusion matrices. Here, 2000 test cases were taken, which means 2000 news articles have been classified in these matrices. From Figs. 2 and 3, the values of true negative and true positive are 1009 and 663 for Naive Bayes and 987 and 949 for passive-aggressive. For the prediction to be accurate, the sum of the true negative and true positive values has to be higher than the sum of the false negative and false positive values. As reported in Sect. 3.7, the result clearly shows that the sum of true values for passive-aggressive is much higher than for Naive Bayes, and hence passive-aggressive has a higher accuracy than Naive Bayes. To measure the performance of the Naive Bayes and passive-aggressive classifiers more closely, precision, recall, and F1-score are used.
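As a quick check, the reported accuracies can be reproduced from the counts printed in Figs. 2 and 3 (2000 test articles in total); the short computation below uses only those counts.

def summarize(tn, fp, fn, tp):
    """Accuracy, precision, recall, and F1 from the four confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Fig. 2, Naive Bayes:         [[1009    5], [ 323  663]]
print(summarize(tn=1009, fp=5, fn=323, tp=663))    # accuracy = 0.836
# Fig. 3, passive-aggressive:  [[ 987   27], [  37  949]]
print(summarize(tn=987, fp=27, fn=37, tp=949))     # accuracy = 0.968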
Figure 4 compares these accuracy parameters for the classifiers of the proposed simulated model (Naive Bayes and passive-aggressive) and a support

Fig. 4 Comparison of classifiers



vector machine (SVM) classifier. Precision, recall, and F1-score for an ideal classifier are 1; hence, the model that achieves values closer to 1 is more likely to be an ideal classifier. As the graph shows, the passive-aggressive classifier has values much closer to 1 than Naive Bayes, so passive-aggressive can be considered the better classifier for this model.

5 Conclusion

In this paper, we have proposed a model for detecting fake news from a dataset. The dataset used is the 2016 US Presidential Elections news data. Stop word removal and lemmatization techniques are used to preprocess the dataset. The passive-aggressive classifier and the Naive Bayes classifier are used to classify whether the news is fake or real. The result of each classifier differs with different datasets. The accuracy of Naive Bayes is 83.6%, and the accuracy of passive-aggressive is 96.8% for this dataset, so the performance of passive-aggressive is better than Naive Bayes. The reason is that the passive and aggressive states of the passive-aggressive classifier allow it to classify more accurately than the probability-based Naive Bayes classifier. Overall, the performance of the passive-aggressive classifier is far better than the other classifiers.

References

1. https://en.wikipedia.org/wiki/Fake_news_in_India
2. Ghosh A, Veale T (2017) Magnets for sarcasm: making sarcasm detection timely, contextual
and very personal. In: Proceedings of the 2017 conference on empirical methods in natural
language processing, pp 482–491
3. “Fake news detection using machine learning”. Pantechsolutions (2018) www.pantechsolut
ions.net/fakenews-detection-using-machine-learning
4. Rodríguez ÁI, Iglesias LL (2019) Fake news detection using Deep Learning. arXiv preprint
arXiv:1910.03496
5. Murshid TM, Nikhil PP, Ranjith EP, Francis JJ (2019) Fake news detection using machine
learning. Int J Innovative Res Sci Eng Technol 8(06):6784–6786
6. Kar D, Bhardwaj M, Samanta S, Azad AP (2020) No rumours please! A multi-indic-lingual
approach for COVID fake-tweet detection. arXiv preprint arXiv:2010.06906
7. Sharma S, Sharma R (2020) A graph neural network based approach for detecting suspicious
Users on Online Social Media. arXiv preprint arXiv:2010.07647
8. Li Q, Zhou W (2020) Connecting the dots between fact verification and fake news detection.
arXiv preprint arXiv:2010.05202
9. Han Y, Karunasekera S, Leckie C (2020) Graph neural networks with continual learning for
fake news detection from social media. arXiv preprint arXiv:2007.03316
10. Thota A, Tilak P, Ahluwalia S, Lohia N (2018) Fake news detection: a deep learning approach.
SMU Data Sci Rev 1(3):10
11. Yang S, Shu K, Wang S, Gu R, Wu F, Liu H (2019) Unsupervised fake news detection on social
media: a generative approach. In: Proceedings of the AAAI conference on artificial intelligence,
vol 33, no 01, pp 5644–5651.

12. Singh V, Dasgupta R, Sonagra D, Raman K, Ghosh I (2017) Automated fake news
detection using linguistic analysis and machine learning. In: International conference on
social computing, behavioral-cultural modeling, & prediction and behavior representation in
modeling and simulation (SBP-BRiMS), pp 1–3
13. https://towardsdatascience.com/natural-language-processing-feature-engineering-using-tf-
idf-e8b9d00e7e76
14. https://en.wikipedia.org/wiki/Naive_Bayes_classifier
15. https://www.bonaccorso.eu/2017/10/06/ml-algorithms-addendum-passive-aggressive-algori
thms/
16. https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62
17. https://www.analyticsvidhya.com/blog/2020/09/precision-recall-machine-learning/
Predicting NCOVID-19 Probability
Factor with Severity Index

Ankush Pandit, Soumalya Bose, and Anindya Sen

Abstract Hospitals worldwide are struggling to cope with admission issues caused
by the increasing number of COVID-19 cases, driven mainly by the Delta variant:
severely ill nCOVID patients are left waiting for hospital beds that are occupied by
non-critical COVID patients. To make the situation worse, people who are partially
or fully vaccinated against COVID-19 are also getting re-infected. In the absence of
prior knowledge of a severity index for COVID-19 patients, hospitals with a limited
number of ventilators and medical equipment cannot admit patients on a priority basis.
Among the many test kits available in the market, none provides an instantaneous
severity index for COVID. This research develops a free and user-friendly algorithm
titled "SAHAYATA 1427" (referred to herein as Sahayata) which predicts, for each
patient, the probability of having the disease nCOVID-19, termed the "probability
factor" of COVID-19. Concurrently, the algorithm also provides an index of how
severely the patient is affected by nCOVID, termed the "severity index." The input
data is both demographic and patient provided. The severity index is determined using
artificial intelligence. Using a logistic regression model trained on a dataset of existing
COVID patients, Sahayata predicts the probability factor for an nCOVID-19 patient
with an accuracy, precision and recall of 88.17%, 100% and 87.3%, respectively.
Results indicate that it can be used effectively both at hospitals by trained medical
personnel and at home by the
A. Pandit (B)
Department of Computer Science and Engineering, Heritage Institute of Technology, Kolkata,
West Bengal, India
e-mail: ankush.pandit.cse20@heritageit.edu.in
S. Bose
Communications and Multimedia Engineering, University of Erlangen-Nuremberg, Erlangen,
Bavaria, Germany
e-mail: soumalya.bose@fau.de
A. Sen
Department of Electronics and Communication Engineering, Heritage Institute of Technology,
Kolkata, West Bengal, India
e-mail: anindya.sen@heritageit.edu


general population. Sahayata helps COVID-19 patients living in rural communities,
where smaller patient-care facilities have limited equipment, by providing a way
toward efficient treatment care.

Keywords Artificial intelligence · COVID-19 detection · Prediction · Probability factor · Severity index · Logistic regression · Classification · Normalization

1 Introduction

The motivation for this research work arises from the ongoing COVID-19 global
crisis, in which the hospitals of multiple nations are running short of beds, ventilators
and medical equipment, leading to the collapse of healthcare systems across the globe.
Nations like France, China and the USA are facing renewed outbreaks, mostly due to
the Delta variant of COVID-19. China's COVID-19 cases hit a seven-month high on
August 10, 2021, with 143 new cases reported [1]. Similar trends can be seen further
west, with B.1.617.2 (Delta) variant cases piling up in the USA. As per the report of
the Centers for Disease Control and Prevention [2] in the USA, the daily trends in
COVID-19 cases and death rates per 100,000 population are increasing, as shown in
Figs. 1 and 2. The report also states that hospital admissions in the USA for new
COVID-19 cases increased by 22% between August 01, 2021, and August 07, 2021,
compared to the previous week, i.e., from July 25, 2021, to July 31, 2021, as shown in
Fig. 3. The report goes on to say that there was a 21.4% increase in registered
COVID-19 cases in the age group of 0–17 years during the same time frame, as shown
in Fig. 4. The report further shows the global trend in COVID-19 cases in terms of
incidence rates, as shown in Fig. 5. A report from the "Times of India"

Fig. 1 Daily trends in number of COVID-19 cases till August 08, 2021, in the USA reported to
centers for disease control and prevention and total and cumulative incidence rate of COVID-19
deaths per 100,000 population

Fig. 2 Daily trends in number of COVID-19 cases till August 08, 2021, in the USA reported to
centers for disease control and prevention and total and cumulative incidence rate of COVID-19
cases per 100,000 population

Fig. 3 New hospital admission in USA from August 01, 2021, to August 07, 2021 as per centers
for disease control and prevention

[3] states that the USA is recording over 100,000 new COVID cases a day, and the
average number of cases has doubled from two weeks earlier. During the same time,
deaths have doubled to 516 a day. "Bloomberg" reported [4] that Austin, the capital city
of Texas, USA, with 2.4 million inhabitants, had only six intensive care unit beds left
due to the massive rise in COVID-19 cases. The condition is even worse in the Middle
East, with the "Times of India" [5] reporting that in Iran one person is dying of
COVID-19 every 2 min, with total deaths having reached 94,603.
"Sahayata" means "help" in Sanskrit, and the algorithm was developed in the Bengali
year 1427, viz. in the year 2020. As the name suggests, the proposed prediction
algorithm helps health workers take quick decisions in the proper

Fig. 4 New hospital admission in USA by age groups from August 01, 2021, to August 07, 2021
as per centers for disease control and prevention

Fig. 5 Global trends in epidemic curve trajectory classification till August 8, 2021, as per centers
for disease control and prevention

allocation of hospital beds, ventilators or ICU for deserving patients and eventually
reducing the pressure on healthcare institutes.
Though multiple high-accuracy COVID-19 rapid testing kits are now available in the
global markets, the cost of the kits and testing [6–8] is a matter of concern, especially
for people from poor and developing nations. Also, these kits do not indicate the
severity of COVID-19 cases. Although many effective methods have been developed
to predict the severity of COVID-19 [9, 10], their cost and user-friendliness are
questionable.
This research develops a free-of-cost novel prediction algorithm named Sahayata
which provides a probability factor of COVID-19 along with a severity index for each
case.
Sahayata is easy to use and can be used by healthcare institutes as well as at home.
Sahayata takes into account manually and scientifically computed weightages for
parameters such as COVID-19 symptoms and their duration, the severity factor of the
locality, a national impact factor, comorbidity, age, sex group, travel history and the
SpO2 level of the patient to predict the probability of a person being diagnosed with

COVID-19, with severity in each case. The model parameters are manually adjusted
through repeated test trials with the available data until the output best represents the
chance that the concerned person is diagnosed with COVID-19. Sahayata operates as a
logistic regression model in which the weighted sum of the contributing parameters is
passed to a sigmoid function. Following the same method, the model also outputs a
severity index for COVID patients, representing the severity of illness of the patient.
The chances of COVID-19 infection and its severity computed by the model are
plotted against the corresponding values already obtained from the preexisting
database, to determine the accuracy of the algorithm.
The two measurements that Sahayata provides are of utmost importance, especially
during an hour of crisis in a country like India, where the population density is very
high. Given the scarcity of medical resources faced by doctors or officials-in-charge,
patients have to be prioritized based on their medical situation, and the probability
factor and severity index prove handy for making such quick decisions. Thus, our
algorithm not only saves a significant amount of time but also saves the money
required to run additional tests. The severity index is calculated in such a way that if
the health condition of a patient is likely to deteriorate quickly, his/her severity index
will be higher than that of others, so he/she will automatically get higher priority if
our algorithm is followed.
Our algorithm, together with AI-based apps for measuring the SpO2 level, GPS-based
positioning, and a few Q&A items regarding physical condition and travel history,
can predict someone's COVID status and measure its severity with high accuracy,
thus creating an easy way for self-assessment, especially for people with no particular
medical knowledge.
Sahayata predicted the patients diagnosed with COVID-19 with an accuracy of
88.17%, precision of 100% and recall of 87.3%. Besides, it has computed the severity
index in each case. A graph has been plotted between the SpO2 level and the severity
index for each patient. The graph shows that severity increases as the SpO2 value
drops, which is the typical medical picture, thus supporting the validity of the severity
index prediction curve given by Sahayata.
The major novel contributions presented by our project are:
• Self-testing is possible by users, besides use by doctors or nurses.
• Easy integration of new symptoms into the system, if observed during a particular
period.
• Helps in decision making for a resource allocation approach.
• Predicts the severity of each case.
• Applicable anywhere in the world, as global and local factors have been taken into
account for each of the parameters.

2 Materials

Sahayata has been developed in the Python high-level programming language. The
algorithm is open-source, user-friendly, and available in a GitHub repository. Two
freely available datasets are primarily used: (i) a probability factor dataset and (ii) a
severity index dataset, as discussed below. The program is developed on Python 3.6
IDLE on the Windows 10 platform.

2.1 Probability Factor Dataset

Sahayata has been tested on the dataset collected by the "Open COVID-19 Data
Working Group" [11] as on May 26, 2020, for prediction of the probability factor. The
dataset has been filtered further to remove redundancy and incomplete data. The
filtered dataset includes 59 patient IDs from different nationalities, the corresponding
symptoms experienced, travel history and travel country. Of these, 55 are COVID-19
positive, two of whom are asymptomatic.

2.2 Severity Index Dataset

Sahayata has been tested on a dataset of 103 patients from India, of different age, sex
and comorbidity status, with the duration of COVID-19 symptoms and the SpO2 level
at the time of admission to hospital, for predicting the severity index.

3 Methods

3.1 Prediction of Probability Factor for NCOVID-19

3.1.1 Algorithm to Predict Probability Factor of NCOVID-19

The prediction of the probability factor for nCOVID-19 is shown in Fig. 6.
As the diagram depicts, our algorithm has three main modules: the first module
receives the necessary user data and extracts the corresponding factors; the second
module calculates the weighted sum of the different factors and normalizes the
weighted value between 0 and 1 using a sigmoid filter; and the last module classifies
the user by comparing the normalized value to a pre-calculated threshold value. The
following sections describe each module in detail.

Fig. 6 Flowchart depicting detection of nCOVID-19

User Data Collection Module


In this section, Sahayata asks for the COVID-19 symptoms experienced by the
patient. Sahayata replaces each of the symptoms with the sum of its corresponding
scientifically, mathematically and manually computed pre-defined local and global
factors. The local factor of a COVID-19 symptom represents the effect and frequency
of the symptom for COVID-19 in the concerned nation. Multiple symptoms of
COVID-19 have been identified that are not common to all countries across the globe;
as a result, the authors have computed a global factor for each symptom so that
Sahayata can be used at the global stage. The global factor of a COVID-19 symptom
represents the effect and frequency of the symptom for COVID-19 in the whole world.
After this, Sahayata takes into account the weightage of the locality the patient
belongs to. The weightage of a locality is pre-defined and is assigned based on the
number of COVID-19 cases observed in the area. Localities have been divided into
four groups: containment zone, red zone, orange zone and green zone (in order of
highest to lowest number of COVID-19 cases detected). The locality factor data used
for training our model was taken from different news reports and government sources.
Then, the algorithm takes into account the travel factor, which is the sum of the
travelling country fatality rate and the world fatality rate of COVID-19.
Calculation and Normalization Module
Once all the user-given information is converted into numerical values, the second
module calculates the weighted sum of the different factors.
Our explanation of the first module of the Sahayata algorithm shows that the weighted
sum increases with an increase in the number of symptoms. But if a person is
COVID-19 positive and already showing some symptoms, adding a few more
symptoms to his condition does not alter the fact that he is a positive case. So, the
final probability should not change much even if some symptoms are added to the
existing condition, even though they increase the weighted sum by a significant
amount. Keeping this idea in mind, the three factors described above under 3.1.1 are
summed up and passed through a sigmoid filter to normalize the probability score to
the scale of 0 to 1.
Classification Module
Finally, the authors have computed the average probability factor of COVID-19
positive patients from the training dataset. This average is taken as the threshold
value. The probability factor of each patient is compared with the threshold value; if it
is larger than or close to the threshold value, Sahayata declares the patient as
COVID-19 positive, else negative.

3.1.2 Formulae for Prediction of Probability Factor for NCOVID-19


Travelling Country Factor (TCF) = Total COVID-19 cases in the travelling country / Population of the country   (1)

World Factor (WF) = Sum of TCF over all countries   (2)

Travel Factor (TF) = TCF + WF   (from Eqs. 1 and 2)   (3)

Local Factor (LF) of a symptom = Total number of instances of the symptom in a particular country / Total COVID-19 cases of the country   (4)

Global Factor (GF) of a symptom = Total number of instances of the symptom / Total COVID-19 cases   (5)

Total Symptoms Factor (TSF) = Σ (LF + GF) over every symptom   (from Eqs. 4 and 5)   (6)

Locality Factor (LoF) = 0.8 (containment zone), 0.6 (red zone), 0.4 (orange zone), 0.2 (green zone)   (7)

Probability Factor (PF) of nCOVID-19 = sigmoid(2 · TF + 6.5 · TSF + 1.5 · LoF)   (from Eqs. 3, 6 and 7 and Sect. 3.1.1)   (8)

TCF and WF have been calculated from Worldometer [12] as on June 14, 2020. LF,
GF and TSF have been calculated from the filtered dataset of [11] as on May 26, 2020.
The LoF values have been chosen experimentally to give the best prediction result,
with higher weightage given to the more severe zones. The authors have defined a
total weightage on a scale of 10 for TF, TSF and LoF (in Eq. 8), with the value for
each parameter chosen experimentally to give the best prediction result.
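A minimal sketch of Eqs. 1–8 follows; the symptom factors, TCF, WF and the locality zone are supplied by the caller, and the numeric values in the usage line are purely illustrative (they are not taken from the paper's dataset). The threshold of 0.6 reflects the best-performing threshold reported in Sect. 5.

import math

def probability_factor(symptom_factors, locality_zone, tcf, wf):
    """Sketch of Eqs. 1-8: PF = sigmoid(2*TF + 6.5*TSF + 1.5*LoF).

    symptom_factors: list of (LF, GF) pairs for the symptoms reported
    locality_zone:   one of "containment", "red", "orange", "green"
    tcf, wf:         travelling-country factor and world factor (Eqs. 1-2)
    """
    tf = tcf + wf                                         # Eq. 3
    tsf = sum(lf + gf for lf, gf in symptom_factors)      # Eq. 6
    lof = {"containment": 0.8, "red": 0.6,
           "orange": 0.4, "green": 0.2}[locality_zone]    # Eq. 7
    weighted = 2 * tf + 6.5 * tsf + 1.5 * lof             # weighted sum of Eq. 8
    return 1.0 / (1.0 + math.exp(-weighted))              # sigmoid normalization

# Illustrative values only:
pf = probability_factor([(0.12, 0.15), (0.08, 0.10)], "red", 0.002, 0.03)
print("COVID-19 positive" if pf >= 0.6 else "COVID-19 negative", round(pf, 3))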

3.2 Prediction of Severity Index for NCOVID-19

3.2.1 Algorithm to Predict Severity Index of NCOVID-19

The computation of the severity index for a patient affected with nCOVID-19 is shown
in Fig. 7.
The diagram shows the basic difference between the calculation of the severity index
and the probability factor: here, we are concerned with the absolute value of the
weighted sum of all the user-related information, and so subsequent comparison or normalization

Fig. 7 Flowchart depicting severity index prediction of nCOVID-19



is not used here. So, the whole algorithm has two sections: User Module to collect
user information and extract corresponding factor values and Calculation Module
to calculate the weighted sum and output the final severity index.
User Module
The computation of the weightage of the COVID-19 symptoms works in the same way
as explained in Eqs. 4, 5 and 6 under Sect. 3.1.2.
The duration of each symptom is taken into account and is multiplied with the TSF
(from Eq. 6) calculated in the above section.
Then, Sahayata asks for the sex group and replaces it with the existing fatality rate of
the corresponding sex group defined in [12] as on July 5, 2020.
After this, the algorithm takes into account the difference of the SpO2 level from the
standard value for the patient and stores it as the SpO2 level factor. The standard is
calculated as the average SpO2 level of healthy persons.
Sahayata then asks for the age of the patient, which is replaced with the fatality rate of
the corresponding age group pre-defined by the World Health Organization [13] for
the period from February 24, 2020 to April 13, 2020.
Next, the algorithm takes the comorbidities of the patient and replaces each with its
corresponding pre-defined fatality rate [14–17]. This pre-defined fatality rate of a
comorbidity is defined as the comorbidity factor. The sum of the comorbidity factors
of the existing comorbidities is computed.
Calculation Module
The six factors described above under 3.2.1 are summed up and multiplied with the
fatality rate of the country to predict the severity index of the COVID-19 positive patient.

3.2.2 Formulae for Prediction of Severity Index for NCOVID-19



Comorbidity Status (CS) = Σ (comorbidity factor) over every comorbidity   (9)

Severity Index (SI) of nCOVID-19 = {(TSF × Duration of symptoms) + Sex factor + (2 × SpO2 level factor) + Age factor + (2 × CS)} × (Fatality rate of the country)   (from Eqs. 6 and 9 and Sect. 3.2.1)   (10)
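The following sketch mirrors Eqs. 9 and 10; the reference SpO2 value of 97% stands in for the "average SpO2 level of healthy persons" mentioned in Sect. 3.2.1 and, like the numbers in the usage line, is an assumed illustrative value.

def severity_index(tsf, duration_days, sex_factor, spo2_level, age_factor,
                   comorbidity_factors, country_fatality_rate, spo2_reference=97.0):
    """Sketch of Eqs. 9-10 for the severity index of a COVID-19 positive patient."""
    spo2_factor = spo2_reference - spo2_level         # deviation from the standard value
    cs = sum(comorbidity_factors)                     # Eq. 9
    weighted = (tsf * duration_days + sex_factor
                + 2 * spo2_factor + age_factor + 2 * cs)
    return weighted * country_fatality_rate           # Eq. 10

# Illustrative values only:
print(severity_index(tsf=0.45, duration_days=5, sex_factor=2.8, spo2_level=91,
                     age_factor=3.6, comorbidity_factors=[4.0],
                     country_fatality_rate=1.3))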

4 Results

Graphical illustrations of the results given by the Sahayata algorithm are depicted in
Figs. 8, 9, 10 and 11.

Fig. 8 Prediction results for probability of COVID-19 positive given by Sahayata algorithm. The
ordinate labeled “probability of COVID positive” represents “probability factor”

Fig. 9 Prediction results for probability of COVID-19 positive given by Sahayata algorithm with
threshold line at y = 0.6

Figure 8 shows the probability of being COVID-19 positive, viz. the probability factor
(PF), plotted on the y-axis against the corresponding patient or user, present in dataset
[11], on the x-axis, with no threshold line.
Figure 9 shows the same plot as Fig. 8 with the threshold line placed at y = 0.6, and
Fig. 10 shows it with the threshold line placed at y = 0.8.
In Fig. 11, the severity index (SI) is plotted on the y-axis against the corresponding
patient's SpO2 level, from the dataset under Sect. 2.2, on the x-axis. The plot shows a
general increase in the severity with a decrease in

Fig. 10 Prediction results for probability of COVID-19 positive given by Sahayata algorithm with
threshold line at y = 0.8

Fig. 11 Prediction results for severity value of nCOVID-19 cases given by Sahayata algorithm.
The ordinate labeled “severity values” represents “severity index”

the SpO2 level, which is a common medical scenario. Thus, the plot validates our
calculation. However, the plot is not strictly a straight line, as factors other than the
SpO2 level are taken into consideration during the calculation.
Let us elaborate the usage of these two measurements with simple case studies
from our dataset:
Case Study 1: A person from an urban area of India with no serious symptoms and
no travel history has a probability 0.131683213 of being COVID positive, which is
quite low in terms of the decided threshold. Hence, the person is deemed COVID
negative.

Case Study 2: A person living in Spain, who is exhibiting flu-like symptoms and has a
recent travel history from Italy, has a probability 0.844886766 of being COVID
positive, which is well above the threshold and thus falls into the COVID positive
category.
Case Study 3: A patient with cough and breathlessness for 2–3 days but no
comorbidity history gives a 0.8437 COVID positivity chance and a severity value of
111.2953507.
Case Study 4: Another patient with fever, cough and breathlessness for almost a week
and a history of breathing problems gives a severity value of 209.5281853 with a
0.95631 chance of being COVID positive.
The interesting observation here is that even though both patients described in case
studies 3 and 4 are predicted to be COVID positive, patient 4 will be given higher
priority over patient 3 based on the higher severity value of patient 4. This saves the
time of going through the previous detailed test reports and the cost of re-conducting
the tests.

5 Discussions

Referring to Fig. 8 under Sect. 4, if the probability of being COVID-19 positive, viz.
the PF value, is 1 or close to 1, it signifies that the patient or user is COVID-19
positive. The patients who were COVID-19 positive are shown with red dots. A
number of patients were asymptomatic with no serious symptoms; as a result, the PF
values of these patients were found to be low, and they are denoted with yellow dots.
The patients who were COVID-19 negative have a very low PF value and are
represented with green dots. Depending upon the threshold line, the researchers have
classified each of the 59 patients in the filtered dataset obtained from [11] as COVID
positive if lying above the threshold line, else negative.
Considering Fig. 10 under Sect. 4, it has been found that when the threshold line is at
y = 0.8, Sahayata can predict the PF for COVID-19 with an accuracy of 81.4%,
precision of 100% and recall of 80%. The threshold line is shown as a blue dotted
line. The patients above the threshold have been classified as COVID-19 positive and
are denoted with red dots; the patients below the threshold have been considered
COVID-19 negative and are represented with green dots. In this case, the values of
true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN)
were found to be 44, 4, 0 and 11, respectively.
Taking Fig. 9 of Sect. 4, when the threshold line is at y = 0.6, Sahayata can predict the
PF for COVID-19 with an accuracy of 88.1%, precision of 100% and recall of 87.3%.
The threshold line is shown as a blue dotted line. The patients above the threshold
have been classified as COVID-19 positive and are denoted with red dots; the patients
below the threshold have been considered COVID-19 negative and are represented
with green dots. For this case, the values of TP, TN, FP and FN were found to be 48,
4, 0 and 7, respectively.
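The reported metrics can be reproduced directly from the stated confusion matrix counts; the short check below is a sketch using only the TP, TN, FP and FN values given above.

def metrics(tp, tn, fp, fn):
    """Accuracy, precision and recall from confusion matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Threshold y = 0.8: TP, TN, FP, FN = 44, 4, 0, 11 -> ~0.814, 1.0, 0.800
print([round(v, 3) for v in metrics(44, 4, 0, 11)])
# Threshold y = 0.6: TP, TN, FP, FN = 48, 4, 0, 7  -> ~0.881, 1.0, 0.873
print([round(v, 3) for v in metrics(48, 4, 0, 7)])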

In Fig. 11 under Sect. 4, the authors have plotted the prediction of the SI given by the
Sahayata algorithm against the corresponding SpO2 level in each case. The SI scores
are shown as blue dots. It can be seen from the figure that the blue dots are present
either at the bottom-right side or the upper-left side, which can be physically
interpreted as the SI being inversely proportional to the patient's SpO2 level. The
graph shows that the severity increases at low SpO2 values, which is a classical
medical case. This finding supports the accuracy of the proposed Sahayata algorithm
in predicting the severity index for nCOVID-19 cases.
The authors have experimentally and manually optimized several parameters in the
algorithm to get the best prediction results. Future work can include the following:
• Fully autonomous optimization of the parameters.
• Integrating more potential COVID-19 symptoms into Sahayata.
• Multiple research works are ongoing worldwide that aim to use computed
tomography (CT) of the chest for early detection of COVID-19 [18–21]; CT of the
chest could be integrated as a symptom of COVID-19 in Sahayata.
• Use of image processing tools for the detection of bluish lips and COVID toes,
eventually to be integrated into Sahayata to widen the scope of the algorithm in
predicting PF.
• Development of a user-friendly and platform-independent application based on the
Sahayata algorithm to make it more accessible to the common public.

6 Conclusion

nCOVID-19 was officially first reported in China at the end of January 2020, followed
by other countries, and a lockdown in the UK was announced on March 23, 2020 [22].
India and many other countries followed suit. The community believes that as the
first-, second- and third-order impacts of the virus manifest over different time frames
across regions, this pandemic will not necessarily be "over" until we are through the
impact of the "third wave" of the COVID-19 pandemic [22]. As of December 2020,
India was coming down from the first peak of the pandemic, while Africa and Europe
experienced the second peak and the UK had exposed a new strain of the virus. Now,
the whole world is facing a spike in COVID-19 cases yet again, mainly driven by the
Delta variant, and countries around the globe are busy defending against the virus
again. This clearly demonstrates the worldwide severity of the pandemic and reflects
the need for efficient services for the ever-growing patient population. With a
noticeable rise in demand for critical patient-care units at hospitals, severely ill
nCOVID patients are found waiting for hospital beds that are occupied by non-critical
COVID patients. The demand has peaked so much in small villages, or areas with a
low number of hospitals per resident, that patients are queued to wait in turn for the
availability of critical survival units. Due to the absence of prior knowledge of an
index of severity for COVID-19 patients, hospitals with a limited number of
ventilators and medical equipment fail to admit patients on any priority

basis. Among the many test kits available in the market till now, none provides an
instantaneous index for severity prediction for COVID. This research has developed
and implemented a free and user-friendly algorithm, "SAHAYATA 1427", which
predicts for each patient the probability of having the disease nCOVID-19, termed the
"probability factor" of COVID-19, and concurrently provides an index of how
severely the patient is affected by nCOVID, termed the "severity index." The
probability factor and severity index are intended to act as an initial guide for the
caregiver and patient in making progressive decisions. Sahayata will be a great
advantage and boon for the patient and the caregiver if the critical care unit can be
prioritized based on a patient's probability factor and severity of nCOVID-19.
The COVID situation is predicted to worsen when the second wave reaches its peak.
The shortage of test kits, the limitation in oxygen supply and, most importantly, the
absence of sufficient human resources will be unimaginably difficult to handle.
Amidst such an adverse situation, Sahayata can be an extremely useful tool to counter
the hostility. The use of the "probability factor" and "severity index" will be beneficial
from an economic, medical and social point of view.
Sahayata is user-friendly, and the program is easily accessible. It can be used by the
general population at home and by trained medical personnel at medical and
healthcare institutes on the Python programming language platform. One potential
situation may arise if a household does not have any measuring tools, such as an
oximeter, available. Many AI-based applications can measure the SpO2 very
accurately but fail to provide any kind of probability factor or severity measurement;
hence, by using our model with the available SpO2 value, a nonmedical person can
also get an idea of his or her medical condition while staying inside the home, without
going out.
The proposed algorithm is simple yet efficient and has achieved high accuracy in
predicting the PF of nCOVID-19 along with the SI in each case. Moreover, Sahayata
is open-source and thereby opens the scope for a useful future mobile application
development. Most importantly, Sahayata is free and can be used by any nation across
the globe.
COVID-19 is evolving day by day, and doctors and researchers likewise associate new
symptoms with it. Our choice of algorithm in Sahayata supports this idea of scaling: if
a new symptom is added to the data set, our model will not break. As we are using
logistic regression, we can achieve similar accuracy after training the model once more.
Author's Contribution Each author has contributed equally toward this project, from collecting
data from various sources to designing the algorithm, performing the testing and finally building a
model based on the results found.

Conflict of Interest The named authors have no conflict of interest, financial or otherwise.

Information About Funding Authors did not receive any financial support in the form of grants.

Resource Availability "SAHAYATA 1427" code and all the datasets, flowcharts and images used in this research, along with the final results, are available at: https://github.com/AnkushPandit/Predicting-Probability-Factor-of-nCOVID-19-with-Severity-Index.

References

1. Agence France-Presse (2021) China cases hit 7-month high, worst outbreak since virus emerged
in Wuhan, NDTV, Aug 2021
2. Centers for disease control and prevention CDC, Aug 2021
3. US see highest caseload since Feb, Times of India, Aug 2021
4. Chua L (2021) Austin sounds ‘Dire’ covid emergency as available ICU beds drop, Bloomberg,
Aug 2021
5. Iran: one person dying of Covid-19 every 2 min, Times of India, Aug 2021
6. Court E (2020) 'Game Changing' 15-minute Covid-19 test cleared in Europe, Bloomberg Quint, Oct 2020
7. Cost of RT-PCR test decreased to ₹1600 in Karnataka, Times of India, Sept 2020
8. India’s Feluda Covid-19 test cheaper, faster alternative to RT-PCR, ET Healthworld, Sept 2020
9. Assandri R, Buscarini E, Canetta C, Scartabellati A, Vigano G, Montanelli A (2020) Laboratory
biomarkers predicting COVID-19 severity in the emergency room. Arch Med Res 51(6):598–
599
10. Blood test can predict severity of Covid-19: Study, ET Healthworld, July 2020
11. https://github.com/beoutbreakprepared/nCoV2019
12. https://www.worldometers.info/coronavirus/
13. Coronavirus disease 2019 (COVID-19) situation report—89, p 3, Apr 2020
14. https://www.worldometers.info/coronavirus/coronavirus-age-sex-demographics/
15. Hussain A, Mahawar K, Xia Z, Yang W, Hasani S (2020) Obesity and mortality of COVID-19.
Meta-analysis. Obes Res Clin Pract 14:295–300
16. Juarez SY, Qian L, King KL, Stevens JS, Hussain SA, Radhakrishnan J, Mohan S (2020)
Outcomes for patients with COVID-19 and acute kidney injury: a systematic review and meta-
analysis. Clin Res 5(8):1149–1160
17. 63% of coronavirus deaths in India in 60+ age group: Health ministry, India Today, Apr 2020
18. Hui JY, Hon TY, Yang MK, Cho DH, Luk W, Chan RY, Chan K, Loke TK, Chan JC (2004)
High-resolution computed tomography is useful for early diagnosis of severe acute respira-
tory syndrome–associated coronavirus pneumonia in patients with normal chest radiographs.
J Comput Assisted Tomogr 28(1)
19. Huang C, Wang Y, LI X, Ren L, Zhao J, Hu Y, Zhang L, Fan G, Xu J, Gu X, Cheng Z, Yu T,
Xia J, Wei Y, Wu W, Xie X, Yin W, Li H, Liu M, Xiao Y, Gao H, Guo L, Xie J, Wang G, Jiang
R, Gao Z, Jin Q, Wang J, Cao B (2020) Clinical features of patients infected with 2019 novel
coronavirus in Wuhan, China. Lancet 395:497–506
20. Lia Y, Xia L (2020) Coronavirus disease 2019 (COVID-19): role of chest CT in diagnosis and
management. Am Roentgen Ray Soc 214:1–7
21. Chung M, Bernheim A, Mei X, Zhang N, Huang M, Zeng X, Cui J, Xu W, Yang Y, Fayad
ZA, Jacobi A, Li K, Li S, Shan H (2020) CT imaging features of 2019 novel coronavirus
(2019-nCoV). Radiology 295(1):202–207
22. Fisayo T, Tsukagoshi S (2020) Three waves of the COVID-19 pandemic. Postgrad Med J 0:1
Differentially Evolved RBFNN
for FNAB-Based Detection of Breast
Cancer

Sunil Prasad Gadige, K. Manjunathachari, and Manoj Kumar Singh

Abstract In this work, computational intelligence-based classification of breast cancer
into malignant and benign categories has been obtained through an evolved radial basis
function neural network. The cancer category is defined over the observed parameter
values of a breast lesion extracted through fine needle aspiration. Generally, the final
performance of a neural network is decided by the quality of the learning algorithm
and the characteristics of the transfer functions selected for the active nodes.
Gradient-based learning is very popular and useful but suffers from trapping in local
minima and from accuracy deficiencies in the outcomes. Hence, the performance of
the radial basis function neural network has been improved by evolving the basis
function parameters and connection weights through a new mutation strategy in
differential evolution. The proposed approach maintains a better balance between
exploration and exploitation by probabilistically selecting whether the differential
vector in the mutation strategy is defined through the currently available best member
or a randomly selected member. The mutation strategy guided by the random member
helps exploration at a large scale, while the best-member-based differential vector
helps faster convergence. The proposed work also provides the option of keeping the
neural network parameters in the integer-value domain, which saves memory and
eases hardware realization. The proposed algorithm's performance has been compared
against the commonly used mutation strategies available in differential evolution and
against a gradient-based strategy, and it has shown significant benefits in efficiency
and precision. The proposed solution model can be used to assist the cytologist in
making the final decision robust and accurate.

S. P. Gadige · K. Manjunathachari
GITAM University, Rudraram, Hyderabad, Telangana, India
e-mail: 221960404501@gitam.in
K. Manjunathachari
e-mail: mkamsali@gitam.edu
M. K. Singh (B)
Manuro Tech Research Pvt. Ltd., Bangalore, India
e-mail: mksingh@manuroresearch.com


Keywords Breast cancer · FNAB · Malignant · Benign · Neural network · RBF neural network · Differential evolution

1 Introduction

The origin of breast cancer is in the breast tissue, most often in the inner linings of the
milk ducts (ductal carcinomas) or in the lobules (lobular carcinomas) [1]. Breast
cancer occurs in humans as well as in other mammals. In the year 2020, 2.3 million
women were diagnosed with breast cancer and more than 685,000 deaths were
recorded globally. More than 7.8 million women were diagnosed with breast cancer in
the 5 years up to the end of 2020, making it the most prevalent of all cancers. The
5-year survival rate after the first diagnosis varies widely: for high-income countries
it is around 90%, while the situation is worse in poor or underdeveloped countries (in
India it is 66%, while in South Africa it is 40%) [1]. The treatment process is
determined by the current status of the cancer characteristics (such as size and growth
rate) and involves an integrated approach comprising drugs (hormonal therapy and
chemotherapy), radiation and/or immunotherapy, and surgery. From a statistical point
of view, breast cancer accounts for nearly 22.9% of all types of cancer in women, and
it occurs more commonly in women than in men. Prognosis and survival rate depend
upon the cancer type and stage, the applied diagnosis and treatment, and the
geographical belonging of the patient. At present, mammograms are considered the
front-line breast cancer screening test, but mammography carries its own limitations,
with a high sensitivity of 97% and a low specificity of 64.5% [2]. Dense breasts,
which are common in women, increase the false-positive percentage further [3].
Another complication is that analyzing a mammogram is highly subjective, which can
induce serious issues with the final observation outcomes. Abnormal mammograms
are confirmed with further diagnostics such as ultrasound, MRI, and biopsy. A breast
biopsy is recommended under certain circumstances, for example when there is a
lump or thickening of the breast, when the mammogram shows a suspicious area in
the breast, or when an ultrasound or MRI scan has shown a suspicious finding. A
breast biopsy provides a confirmative diagnosis of whether the patient has breast
cancer or whether the abnormality in question is benign, with a specificity of 99.6%
and a sensitivity of 97.4% [4]; depending upon the outcome of the biopsy, further
processes can be defined with better confidence. Fine needle aspiration biopsy
(FNAB) of the breast provides a minimally invasive procedure and prevents the need
for an open biopsy [5]. The FNAB process is cheaper and more comfortable, and
results can appear within a short period. Even though the core biopsy diagnostic
process is more robust and reliable, it carries the disadvantages of taking a longer
period, patient discomfort, and being costlier. FNAB uses a smaller needle and hence
has a low probability of developing a hematoma and other rare complications, such

as pneumothorax [6, 7]. The triple diagnostic approach combining clinical evaluation,
mammography, and FNAB provides a precise diagnosis of breast cancer and reduces
the risk of a missed diagnosis to less than 1% [8]. Generally, a pathologist examines
the biopsy tissue sample to establish the finding; however, manual analysis and
quantification, along with the lack of a universal rule, can make the final decision
erroneous [9]. The accuracy level of manual analysis has been observed in the range
of 62–90%, and this variability can greatly affect the patient management process.
Early detection of breast cancer helps very much in the right treatment plan and
survival period, and in this regard computer-aided diagnosis (CAD) has been applied
to assist. Present knowledge-based artificial intelligence (KAI) approaches can
improve CAD system performance to a great extent in comparison with rule-based,
problem-specific solutions.
This work was carried out to develop a knowledge-based approach over FNAB data,
learned by an evolved radial basis function neural network, to recognize the category
of breast cancer as malignant or benign. In the conventional form of RBF, the basis
function parameters are fixed and learning is provided by the updating of the output
layer weights only; this restricts learning and causes poor performance. To overcome
this issue, along with the output layer weights, the basis function parameters (the
center and spread values of the Gaussian function) are also involved in the learning
process. The gradient-based approach of learning has the limitations of getting stuck
in local minima and of limited accuracy, and hence a differential evolution-based
approach has been applied to evolve the whole learning process. A hybrid mutation
strategy, carrying a probabilistic selection between differential vectors derived from
the best member or a random member, has shown excellent benefit. The work has
been divided into several sections: Sect. 2 carries the details about the related work,
while the proposed work is presented in Sect. 3. The detailed experimental results and
analysis are given in Sect. 4, while the conclusion and future work are presented at
the end.

2 Related Work

Several works have been reported in the past toward automated detection of breast
cancer. To distinguish the benign and malignant categories, Osareh and Shadgar [10]
utilized SVM, K-nearest neighbor and PNN classifiers together with the signal-to-noise
ratio for feature ranking, while PCA was applied to extract the features. Single
nucleotide polymorphisms from the BRCA1, BRCA2, and TP53 genes were utilized
in [11] to detect breast cancer with different machine learning approaches. A machine
learning-based classifier was applied over mammograms in [12] to detect breast
cancer from features extracted from segmented regions on craniocaudal (CC) and/or
mediolateral oblique (MLO) mammography image views. The response of breast
cancer patients to a single cycle of neoadjuvant chemotherapy (NAC) has been
discussed in [13]. A different form of artificial meta-plasticity in the multilayer
perceptron was used in [14] to detect breast cancer. The performances of different
approaches such as SVM, C4.5, Naïve Bayes, and K-NN for detecting breast cancer
have been discussed in [15]. Recurrence is an important aspect of the behavior of
breast cancer related to mortality, and in [16] a machine learning-based method was
applied to predict breast cancer recurrence. Microarray technology-based identification
of genetic factors has given great help in diagnosis and treatment; Bektaş and Babur
[17] applied machine learning algorithms to detect and classify breast cancer.
Considering the active genes in breast cancer, Kolay and Erdoğmuş [18] proposed a
clustering approach to classify breast cancer. Weighted K-mean support vector
machines and weighted support vector machines were considered in [19] over two
large-scale real applications in the TCGA pan-cancer methylation data of breast and
kidney cancer. A rule-based approach of breast cancer classification and a machine
learning-based approach for survival prediction have been discussed in [20]. The
performance of SVM alone over breast cancer classification was improved by an
ensemble approach in [21]. An FFBPN for the classification of breast cancer cases as
malignant or benign has been discussed in [22]. How machine learning algorithms
help in the detection of breast cancer has been discussed in [23, 24].

3 Proposed Work

The radial basis function neural network has been shown to be very effective in universal
approximation and function mapping applications. The mathematical form of the output
delivered by an RBF neural network can be represented as given in Eq. 1.


O_q = f_q(p) = Σ_{z=1}^{N} W_qz φ_z(p, k_z) = Σ_{z=1}^{N} W_qz φ_z(‖p − k_z‖_2)   ∀ q = 1, 2, ..., m   (1)

where p ∈ ℝ^(n×1) is the n-dimensional input, φ_z(·) is a transformation function
mapping values from ℝ+ to ℝ, ‖·‖_2 represents the Euclidean distance, W_qz are the
connection weights from the hidden to the output layer, N is the number of hidden
nodes, and k_z ∈ ℝ^(n×1) are the kernel centers of the RBF. First, the Euclidean
distance between the centers and the input is calculated, and then it is mapped using
the nonlinear function available in the hidden layer. The final output is calculated as a
weighted sum of this mapped data. In this work, the mapping function has been
assumed to be the Gaussian function given by Eq. 2.

φ(p) = exp(−‖p − k‖² / σ²)   (2)
Differentially Evolved RBFNN for FNAB … 647

where is a spread parameter and regulates the “width” of the Gaussian function. The
centers are normally taken into account from the input data set. The output layer
weights, which are adaptive and play an active part in the function mapping, are the
only parameter in the standard form of RBFNN. When there is a complex function
available for mapping, this can be an issue. Along with output layer weights, two
more kernel function variables, centers and the spread of, were investigated in this
work. This simplifies and expedites the mapping process. Equation 3 shows how to
define the error output.
J(n) = (1/2)|E(n)|² = (1/2) [ O_d(n) − Σ_{z=1}^{N} w_z(n) φ{p(n), k_z(n)} ]²   (3)
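A minimal sketch of the forward pass and the per-sample error of Eqs. 1–3 follows, assuming a single output node and Gaussian basis functions; the array shapes are an implementation assumption rather than anything prescribed by the text.

import numpy as np

def rbf_forward(p, centers, spreads, weights):
    """Gaussian RBF output for one input vector p (Eqs. 1-2).

    p:        (n,) input vector
    centers:  (N, n) kernel centers k_z
    spreads:  (N,) Gaussian widths sigma_z
    weights:  (N,) output layer weights (single output node assumed)
    """
    dist_sq = np.sum((centers - p) ** 2, axis=1)   # ||p - k_z||^2
    phi = np.exp(-dist_sq / spreads ** 2)          # Eq. 2
    return np.dot(weights, phi)                    # Eq. 1

def sample_error(p, target, centers, spreads, weights):
    """Squared error for one training sample (Eq. 3)."""
    return 0.5 * (target - rbf_forward(p, centers, spreads, weights)) ** 2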

In a conventional form of learning, such as the gradient descent-based approach, the
change in the weights and other parameters is updated based on the current condition
of the error function. Such an approach is prone to getting stuck in local minima,
causing sub-optimal learning as well as longer convergence times. To overcome this
issue, we have proposed differential evolution-based learning, in which the weights
and kernel parameters of the RBF architecture are updated from one generation to the
next. The proposed approach does not suffer from discontinuity in the objective
function, explores the solution domain in a broader manner, and leads to optimal
convergence.
Differential evolution has been considered one of the most efficient evolutionary
approaches in the area of evolutionary computation because of its simplicity and
exploration capability. There are fundamentally three parts in DE: (i) development of a
differential vector-based mutation vector, (ii) a crossover strategy to create offspring,
and (iii) a greedy, fitness-oriented selection mechanism. In the formation of a mutation
vector, a differential vector is first created by taking the difference between two
random members, and later its scaled version is added to a third randomly selected
member to form the mutation vector. To create the offspring, the mutation vector is
crossed over with the parent, mostly at all points under a probabilistic setting, and
finally the selection process, a fitness-oriented greedy process, decides which of the
parent and its corresponding offspring will survive into the next generation. The
mathematical forms of the mutation vector, crossover, and selection process are
defined in Eqs. 4–6. In Eq. 4, F is a mutation scaling factor and should be less than
one. CR in Eq. 5 is the crossover rate and should be high, close to but less than 1. In
Eq. 6, f(·) represents the fitness function value.
 
MV_i^(G) = M_r1^(G) + F × (M_r2^(G) − M_r3^(G))   (4)

u_{i,j}^(G) = mv_{i,j}^(G)  if rand(0, 1) ≤ CR or j = j_rand;  otherwise  u_{i,j}^(G) = x_{i,j}^(G)   (5)

x_i^(G+1) = u_i^(G)  if f(u_i^(G)) ≤ f(x_i^(G));  otherwise  x_i^(G+1) = x_i^(G)   (6)

The major feature that makes differential evolution distinctive and efficient is its
mutation strategy, which is defined by the differential vector terms. Apart from the
mutation strategy of DE defined in Eq. 4, which is called DE/rand/1, a number of
other variants exist, as shown in Eqs. 7–9. In Eq. 7, there are two difference vectors
and the base member is selected randomly; hence it is called DE/rand/2. In Eq. 8, the
base member is the best member of the present generation and one difference vector
exists; hence it is called DE/best/1. In a similar way, the strategy in Eq. 9 is called
DE/rand-to-best/1.
MV_i = M_r1 + F(M_r2 − M_r3) + F(M_r4 − M_r5)   (7)

MV_i = M_best(g) + F(M_r2 − M_r3)   (8)

MV_i = M_r1 + F(M_best(g) − M_r2) + F(M_r3 − M_r4)   (9)

The DE/rand/1 strategy is very efficient and has a high level of exploration capability
because all the component members are selected randomly from the population. The
problem with this strategy is that it does not explore extensively the region that may
carry the better solution, and hence there is either slow convergence or a chance of
missing quality solutions. The DE/rand/2 strategy generates extra pressure by carrying
differences from additional members and can be a cause of premature convergence.
The DE/best/1 strategy centers exploration around the best solution only and can
cause trapping in a local solution. In a similar way, DE/rand-to-best/1 tries to increase
the level of exploration through a random differential vector alongside the
best-member-based difference vector, which seems to help exploitation. But having a
differential vector always defined with respect to the best solution can dominate the
random exploration, and as a result sub-optimal convergence appears. To overcome
this issue, a hybrid mutation strategy has been proposed in this work, called
probabilistically best solution directive differential evolution (PBDDE), as shown in
Eq. 10.
MV_i = M_r1 + F(M_best(g) − M_r2)   if rand < Thr
MV_i = M_r1 + F(M_r2 − M_r3)        otherwise   (10)
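One generation of the evolution can be sketched as follows; the probability threshold Thr of Eq. 10 is not given a value in the text, so 0.5 is used here purely as an assumption, and the population/fitness interface is an illustrative choice.

import numpy as np

def pbdde_step(pop, fitness, F=0.5, CR=0.9, thr=0.5):
    """One DE generation with the hybrid mutation of Eq. 10 (a sketch).

    pop:      (P, D) array of solution members
    fitness:  callable mapping a D-vector to a scalar to be minimized
    thr:      probability of using the best-member-directed differential vector
    """
    P, D = pop.shape
    fit = np.array([fitness(x) for x in pop])
    best = pop[np.argmin(fit)]
    new_pop = pop.copy()
    for i in range(P):
        r1, r2, r3 = np.random.choice([j for j in range(P) if j != i], 3, replace=False)
        if np.random.rand() < thr:                      # Eq. 10, best-directed branch
            mv = pop[r1] + F * (best - pop[r2])
        else:                                           # Eq. 10, random branch (DE/rand/1)
            mv = pop[r1] + F * (pop[r2] - pop[r3])
        cross = np.random.rand(D) <= CR                 # binomial crossover (Eq. 5)
        cross[np.random.randint(D)] = True              # guarantee the j_rand component
        trial = np.where(cross, mv, pop[i])
        if fitness(trial) <= fit[i]:                    # greedy selection (Eq. 6)
            new_pop[i] = trial
    return new_pop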

DE has been applied to evolve the optimal parameters of the basis functions and the
output layer weights of the RBF architecture. Consider an RBF architecture with two
input nodes, two hidden nodes, and one output node, with the Gaussian function as
the basis function. As a result, there are four center values of the Gaussian functions,
two values for the function spreads, and two values of output layer weights, and

Fig. 1 Representation of a solution member in the DE for evolving the RBF: an array of eight values (p1–p8) holding the centers C, the spreads σ, and the output layer weights wo, drawn from the population
Fig. 2 Functional block diagram of DEARBF: the inputs pass through the RBFNN architecture with Gaussian basis functions (parameters C, σ, Wo); the error between the outputs and the targets drives the differential evolution of these parameters

hence, there is a total of 8 parameters to explore, which also defines the problem
dimension. A solution representation in DE therefore carries an array of 8 numeric
values, in which the first 4 values represent the basis function means, the next 2
values represent the spreads of the Gaussian functions, and the last 2 values represent
the output layer weights, as shown in Fig. 1.
The complete functional block diagram of the proposed DEARBF is shown in Fig. 2.
First, a random population is generated, containing a number of solution members of
a defined length (equal to the number of parameters in the RBF architecture). From
each individual solution, the centers, spreads, and output layer weights are extracted
and applied to the RBF architecture, where, for the given inputs, the output is
generated with the available parameters. The DE evolves the corresponding offspring,
and the fittest among parent and offspring is selected as the next-generation member.
This whole process is repeated for all other members in the population to form the
next-generation population. Once the next generation has been obtained, the
above-defined process is repeated to evolve further.
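The decoding of a solution member and the fitness used by the DE can be sketched as below for the 2-2-1 XOR architecture of Sect. 4.1; the mean squared error is taken as the fitness, which is an implementation assumption consistent with Eq. 3.

import numpy as np

def decode(member):
    """Split an 8-value DE member as in Fig. 1: 4 center values (2 hidden
    nodes x 2 inputs), 2 spreads, and 2 output layer weights."""
    member = np.asarray(member, dtype=float)
    return member[:4].reshape(2, 2), member[4:6], member[6:8]

def rbf_fitness(member, X, T):
    """Mean squared error of the decoded 2-2-1 Gaussian RBF over (X, T)."""
    centers, spreads, weights = decode(member)
    errors = []
    for x, t in zip(X, T):
        dist_sq = np.sum((centers - x) ** 2, axis=1)
        phi = np.exp(-dist_sq / spreads ** 2)
        errors.append((t - np.dot(weights, phi)) ** 2)
    return float(np.mean(errors))

# XOR training pairs used in Sect. 4.1
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])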

4 Experimental Result and Analysis

The detailed experiments are presented in two different sections. In the first part, the
efficacy of the proposed differential evolution in evolving the RBF architecture is
tested over the benchmark XOR classification problem, which carries nonlinear
characteristics. A number of different algorithms have been considered for comparison
purposes. First, a static RBF has been considered to see the limitation associated with
fixed values of the basis function parameters. The gradient-based approach has then
been applied to make the RBF self-adaptive, where, along with the weight values, the
basis function parameters also change with iterations. The performance of gradient
algorithms heavily depends upon the value of the learning rate; hence, performances
have been evaluated for three different values of the learning rate. Later, different
forms of mutation strategies in differential evolution have been applied, and
performance comparisons have been given. In the second section, the FNAB-based
breast cancer category has been obtained. The complete experimental work has been
developed in the MATLAB environment.

4.1 XOR Classification

The XOR classification problem is well known and is generally used to test developed
algorithms. Its nonlinear characteristics make the problem difficult to solve with a
small number of hidden nodes. In this work, the least optimum size of architecture
[1, 2] has been considered, which includes two input nodes, two hidden nodes, and
one output node. Ten different algorithms have been considered, namely static RBF
(SRBF) and gradient algorithm-based self-adaptive RBF with three different learning
rates, Gr1 (learning rate 0.1), Gr2 (learning rate 0.5), and Gr3 (learning rate 0.9), to
capture the detailed effect of the learning rate on performance. Differential evolution
carried a population size of 100 with mutation factor F equal to 0.5 and crossover rate
CR equal to 0.9 in all cases. In this work, the notation DE1 represents DE/rand/1,
DE2 represents DE/rand/2, DE3 represents DE/best/1, while DE4 represents
DE/rand-to-best/1. The allowed number of iterations for all the algorithms was 1000,
and the mean and standard deviation have been estimated over 10 independent trials.
The convergence performances are shown in Fig. 3. It can be observed that, except for
DE1 (DE/rand/1), jDE, and the proposed PBDDE, none of the algorithms converged
in all 10 trials. It can also be observed that the performance of PBDDE was better.
For the given inputs {[0, 0], [0, 1], [1, 0], [1, 1]}, the obtained mean outputs and
standard deviations for all algorithms are shown in Table 1; the success rate has also
been included. The proposed form of mutation strategy has shown excellent
performance against all others. Evolution in the integer domain has also been applied
with PBDDE, and the mean convergence performance over 10 trials is shown in
Fig. 4. The best

Fig. 3 Mean convergence characteristics over the XOR problem: log10(MSAE) versus iteration for SRBF, Gr1, Gr2, Gr3, DE1, jDE, DE2, DE3, DE4, and PBDDE

parameters evolved for the self-adaptive RBFNN under the real and integer domains
are also shown in Tables 2 and 3.

4.2 FNAB-Based Cancer Category Prediction

Generally, a cytologist, after observation of the lesion, assigns a binary value to 10
different attributes. The considered attributes are intracytoplasmic lumina, cellular
dyshesion, 3D epithelial cell clusters, bipolar naked nuclei, foamy macrophages,
nuclei, nuclear pleomorphism, nuclear size, necrotic epithelial cells, and apocrine
change. The age of the patient has also been considered. In the considered dataset,
there were a total of 692 instances of data, out of which 200 randomly chosen
instances have been used for training while the remainder has been used for testing.
The size of the RBF neural architecture has been taken as 11 input nodes, 6 hidden
nodes, and 1 output node. The mean performances obtained by PBDDE over 10 trials
for the real and integer domains are shown in Fig. 5. The performance in terms of
sensitivity and specificity has also been estimated for each case and is shown in
Tables 4 and 5, respectively. It can be observed that the proposed PBDDE, for both the
real and the integer domain, has shown very satisfactory performance, which is
appreciable for an application like health care. The best-evolved RBF center positions
of the basis functions at the different hidden nodes in the real domain are shown in
Table 6, while the corresponding spread and output layer weight values are shown in
Table 7. In a similar manner, the best evolved in the integer domain is shown in
Table 8 for the center positions, and Table 9 contains the spreads and output layer
weights.
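Sensitivity and specificity, as reported in Tables 4 and 5, can be computed from predicted and true labels as in the sketch below; encoding the malignant class as 1 is an assumption made for illustration only.

import numpy as np

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP); class 1 = malignant (assumed)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative use with hypothetical 0/1 label arrays:
print(sensitivity_specificity([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]))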

Table 1 Mean and {std. deviation} output along with success rate over 10 trials

Algorithm         Mean output                       {Std. deviation}                  Success rate (%)
SRBF              0.1741 0.4674 0.5856 0.1690       {0.1161 0.4889 0.4800 0.1183}     10
GRARBF1           0.0046 0.9907 0.9813 0.0272       {0.0011 0.0260 0.0526 0.0652}     80
GRARBF2           0.0016 0.8650 0.7392 0.0323       {0.0013 0.1863 0.3599 0.0413}     60
GRARBF3           0.0016 0.9126 0.8612 0.0266       {0.0001 0.2001 0.3105 0.0561}     40
DE/rand/1         0.0001 0.9999 0.9999 0.0001       {0.0001 0.0005 0.0002 0.0001}     100
jDE               0.0003 0.9993 1.0000 0.0004       {0.0005 0.0013 0.0013 0.0005}     100
DE/rand/2         0.0443 0.8713 0.7297 0.0865       {0.0793 0.3330 0.4661 0.1392}     60
DE/best/1         0.0000 0.8750 0.6250 0.0000       {0.0000 0.3536 0.5175 0.0000}     80
DE/rand-best/1    0.0670 0.9121 0.9448 0.0810       {0.0660 0.2734 0.1649 0.0867}     90
PBDDE             0.0000 1.0000 1.0000 0.0000       {0.0000 0.0000 0.0000 0.0000}     100

5 Conclusion

The proposed work provides a module for the CAD facility in health care to detect
breast cancer. The complexity of formulating the relationship among the various cell
parameters has been shown to be manageable with the help of knowledge-based
computational intelligence. The knowledge learning has been done through an RBF
neural network. Instead of a deterministic approach to learning, evolution-based
learning has been proposed to improve performance. The learning capability of the
gradient-based approach shows limitations in optimal convergence, while the
differential evolution-based stochastic search gives optimal and faster convergence.
The proposed mutation strategy carries a probabilistic selection between the

Fig. 4 Mean convergence characteristics (MSAE versus iteration, 0–100 iterations) over XOR problem with integer parameters of PBDDE-based evolved self-adaptive RBF

Table 2 Best RBF evolved real parameters value by PBDDE over XOR problem
Algorithms Center value Spread O/p layer weight
Input HN1 HN2 HN1 HN2 w1 w2
X1 1.0736 −0.1743 0.2870 0.2721 1.5225 1.5112
X2 0.1709 1.0134

Table 3 Best RBF evolved integer parameters value by PBDDE over XOR problem
Algorithms Center value Spread O/p layer weight
Input HN1 HN2 HN1 HN2 w1 w2
X1 5 −2 −3 2 3 2
X2 −1 2

Table 4 Performances by PBDDE evolved RBF for breast cancer in real value domain
Training data Test data
Error Sensitivity Specificity Error Sensitivity Specificity
Best 0.0500 0.9130 0.9695 0.0427 0.9096 0.9816
Mean 0.0625 0.8580 0.9794 0.0811 0.8428 0.9577

random differential vector, and the best-member-directed differential vector has shown
a very good balance between exploration and exploitation. The RBF has also been evolved in the
integer domain, which can have numerous advantages in terms of low arithmetic cost
and less susceptibility to noisy data during training. The proposed approach can
further be integrated with mammogram outcomes to make the final decision
robust and accurate.
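
To illustrate the general idea of probabilistically mixing a random differential vector with a best-member-directed one, a minimal NumPy sketch is given below; the switching probability, scale factor, and function names are illustrative assumptions and not the authors' exact PBDDE formulation.

```python
# Minimal sketch (not the authors' exact PBDDE rule): a DE mutation step that
# probabilistically chooses between a best-member-directed differential vector
# and a purely random one. p_best and F are illustrative values.
import numpy as np

def mutate(population, best, F=0.5, p_best=0.5, seed=None):
    """Return one mutant vector per member of `population`."""
    rng = np.random.default_rng(seed)
    n, dim = population.shape
    mutants = np.empty_like(population)
    for i in range(n):
        # pick three distinct members different from i
        r1, r2, r3 = rng.choice([k for k in range(n) if k != i],
                                size=3, replace=False)
        if rng.random() < p_best:
            # best-member-directed differential vector (DE/best/1 style)
            mutants[i] = best + F * (population[r1] - population[r2])
        else:
            # purely random differential vector (DE/rand/1 style)
            mutants[i] = population[r3] + F * (population[r1] - population[r2])
    return mutants

# Example: 10 candidate RBF parameter vectors of dimension 5
pop = np.random.default_rng(0).normal(size=(10, 5))
print(mutate(pop, best=pop[0]).shape)   # (10, 5)
```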

Fig. 5 Convergence characteristics (MSAE versus iteration, 0–100 iterations) of PBDDE-based evolved self-adaptive RBF for FNAB-based cancer detection in the real value (PBDDE) and integer value (iPBDDE) domains

Table 5 Performances by PBDDE evolved RBF for breast cancer in integer value domain
Training data Test data
Error Sensitivity Specificity Error Sensitivity Specificity
Best 0.0650 0.8841 0.9618 0.0813 0.8855 0.9356
Mean 0.0770 0.8246 0.9748 0.0925 0.8199 0.9521

Table 6 Evolved best center values by PBDDE for RBF for breast cancer in real domain
Input Basis function mean value over hidden nodes
H1 H2 H3 H4 H5 H6
X1 2.3463 2.2099 10.5743 −19.7346 3.2959 20.5460
X2 −0.6736 2.5770 12.2502 −3.5754 −14.3008 −2.3810
X3 11.6818 −19.6153 −4.1456 5.5049 −11.1878 27.6747
X4 −4.6449 9.6686 −26.4811 −12.4404 −12.4535 16.6607
X5 2.2615 −4.4348 1.1222 −14.2182 0.6125 −12.5923
X6 7.9362 20.0116 5.0546 −27.2334 2.9395 −1.7550
X7 −5.6084 −24.8404 −8.7345 3.3618 −24.8347 9.3254
X8 20.8940 25.1380 −21.4681 12.8040 12.3531 6.6946
X9 12.7413 −15.4768 13.6615 −1.7509 −3.1255 14.7529
X10 −4.9205 −0.2314 −2.0485 5.1427 23.6456 7.8693
X11 12.9655 6.4217 −20.3528 4.3555 −5.0317 −6.5937

Table 7 Evolved best spread and o/p weight values by PBDDE for RBF for breast cancer in real
domain
Basis fun Hidden node
H1 H2 H3 H4 H5 H6
Spread −9.1167 −8.7138 14.6923 −5.7163 −1.7460 −25.9035
wo 9.4985 22.9837 11.9000 13.0526 4.9120 10.2559

Table 8 Evolved best center values by PBDDE for RBF for breast cancer in integer domain
Input Basis function center value over hidden nodes
H1 H2 H3 H4 H5 H6
X1 6 21 3 11 44 37
X2 54 −113 16 4 −66 23
X3 9 −3 47 20 −36 −12
X4 5 −10 35 −18 72 −18
X5 −57 −7 32 −7 −16 9
X6 39 12 58 28 −50 −5
X7 39 −13 54 30 −41 −25
X8 10 −39 123 6 −17 23
X9 −22 −97 46 48 −57 36
X10 19 12 43 4 −18 7
X11 −29 −20 −34 −49 20 −23

Table 9 Evolved best spread and o/p weight values by PBDDE for RBF for breast cancer in integer
domain
Basis fun Hidden node
H1 H2 H3 H4 H5 H6
Spread 33 −31 61 −44 −30 22
wo 2 21 22 21 −2 39

Acknowledgements This research work has been completed in Manuro Tech Research Pvt. Ltd.,
Bangalore, India, under the program of Computational Intelligence in Health Care (CIHC).

A Real-Time Face Mask Detection-Based
Attendance System Using MobileNetV2

Kishansinh Rathod, Zeel Punjabi, Vivek Patel,


and Mohammed Husain Bohara

Abstract COVID-19 has pushed global trade and commerce into a phase of recession.
A dramatic loss has been seen in the gross domestic product (GDP) of many
nations worldwide. To set their economies back on track, nations all over the globe
are pulling back full lockdowns and taking steps to help businesses and the
economy. In order to secure a triumph over the pandemic, wearing a protective mask should be the
new normal. Hence, because masks are needed and compulsory, the task of detecting
them on faces has become vital for all of us. We train a simple image classification model
with the help of machine learning libraries such as TensorFlow and
Keras, together with the MobileNetV2 neural network architecture. Live video
is taken as input from a webcam, and the model predicts whether the face present
in the region of interest (ROI) wears a mask or not. The system first detects the face of
a person and then identifies whether a mask is worn. A masked face in motion
can also be detected with the help of this system. Instead of
detecting only a single face in a frame, our system is also capable of detecting multiple
faces and the masks on them, and their presence is recorded in a tabular manner.

K. Rathod (B) · Z. Punjabi


Department of Information Technology, Devang Patel Institute of Advance Technology and
Research (DEPSTAR), CHARUSAT, Charotar University of Science and Technology
(CHARUSAT), CHARUSAT Campus, Changa 388421, India
e-mail: 18dit067@charusat.edu.in
Z. Punjabi
e-mail: 18dit064@charusat.edu.in
V. Patel
Department of Computer Engineering, School of Technology, Pandit Deendayal Energy
University (PDEU), Gandhinagar, Gujarat 382007, India
e-mail: Vivek.pce18@sot.pdpu.ac.in
M. H. Bohara
Computer Engineering Department, Devang Patel Institute of Advance Technology and Research
(DEPSTAR), CHARUSAT, Charotar University of Science and Technology (CHARUSAT),
CHARUSAT Campus, Changa 388421, India
e-mail: Mohammedbohara.ce@charusat.ac.in


Keywords Convolutional neural network · Face mask detection · Computer vision · Multi box detector · MobileNetV2 · Deep learning

1 Introduction

The novel coronavirus has affected human lives at their worst. Senior citizens and
people with respiratory complications are at higher risk. On
March 11, 2020, COVID-19 was declared a pandemic by the World Health Organization
(WHO), with nearly 3 million cases and 2,07,973 deaths across two hundred
thirteen countries and territories worldwide [1]. According to scientists, there is a
huge family of such deadly viruses already present around us; the coronavirus
is just one of several of them that has infected humans. When the novel coronavirus
started making people ill in late 2019, scientists named it the coronavirus, which
is also called SARS-CoV-2 by experts [2]. After attaching to a human cell and
getting inside it, the virus makes copies of its RNA to spread everywhere, but if
some mistake is made while copying, the RNA gets changed, and scientists call that a mutation
[3]. After mutating into different variants, the coronavirus got new names with updated
effects, namely B.1.1.7 (Alpha), B.1.351 (Beta), P.1 (Gamma), and B.1.617.2 (Delta) [4].
All these variants were found at different places throughout the globe and
caused huge destruction in the affected countries. Out of all of them, the Delta variant did the
worst to India: more than 2 lakh people died as a result of COVID's second wave in
India, which included more than 2000 deaths each day [4]. It was like a nightmare
for all.
To prevent various respiratory problems, including those caused by COVID-19,
wearing a mask has become vital. The World Health Organization (WHO) prioritizes
face masks mainly for health care frontline workers and for all the citizens
of a country. Many countries mandated wearing a protective mask outdoors. Also,
during the second wave [4], WHO even recommended people wear double masks
as the strain was becoming too deadly. As a result, detecting face masks became
an important responsibility to society. Our model detects where
someone's face is located and then determines whether a mask is worn or not. The result
is shown in a table, where beside the name the value '1' = with_mask and
the value '0' = without_mask. Face detection categorically deals with
distinguishing a specific entity group, i.e., the face. This paper presents a
simple idea to achieve the previously mentioned purpose using important
libraries such as Keras, OpenCV, Scikit-learn, and TensorFlow. The convolutional neural
network architecture MobileNetV2 is used here, which is a very effective
feature extractor for object detection and segmentation [5].

2 Methodology

The entire flow of the project is as follows:

1. Collecting and loading the dataset.
2. Extraction of data from the dataset.
3. Model training.
4. Face mask detection.

First and foremost, in the training part of our detector, the main focus is on loading
the dataset from the disk; the model is then trained on this dataset after a few
preprocessing steps. Fine tuning was performed with the help of the MobileNetV2
architecture, followed by serialization of the trained model. After the training
part, deployment was done: as soon as the model is trained, the project is loaded
and face detection is performed. Finally, the model classifies the result into the
distinct categories of with mask and without mask (Fig. 1).

Fig. 1 Flowchart

Fig. 2 Part of dataset containing people with mask

3 Related Work

3.1 Data Collection

First of all, we started with the collection of the dataset. For this, we collected data
covering two categories of person: with_mask and without_mask. There are a total of
3833 images, out of which 1915 include people wearing a mask and the remaining
1918 come under the category of without mask (Figs. 2 and 3).

3.2 Data Preprocessing

The data preprocessing steps convert the given data into a format which is user
friendly as well as meaningful. The data could be of any form, including
images, tables, graphs, videos, and more [6]. Firstly, in order to get started with the
training part, we need a dataset which lets the model reach a good range of
accuracy. The data can be imported into our model with the help of various Python
libraries. Many times we encounter missing data in the dataset; to deal with it,
we either delete the whole row or take the mean value of the data present.
As already mentioned, our dataset is divided into with_mask and
without_mask parts. In the preprocessing step, operations such as re-shaping and
resizing are applied, followed by conversion of the images to NumPy arrays. Resizing
the image is a critical part of preprocessing: the smaller the image, the better the
tendency to run well. Here, our images are resized to 224*224 pixels and then converted to

Fig. 3 Part of dataset containing people without mask

the format of an array, and afterward we scaled the pixel intensities of the input
image to the range [−1, 1]. At last, the input images are preprocessed
using MobileNetV2's preprocessing function.
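
A minimal sketch of these preprocessing steps with the Keras utilities mentioned in this paper is given below; the file path and the helper name are placeholders rather than the authors' actual code.

```python
# Sketch of the described preprocessing: resize to 224x224, convert to a NumPy
# array, and scale pixel intensities to [-1, 1] with MobileNetV2's
# preprocess_input. The path "face.jpg" is a placeholder.
import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

def load_face_image(path):
    image = load_img(path, target_size=(224, 224))  # resize while loading
    image = img_to_array(image)                     # HxWx3 float array
    image = preprocess_input(image)                 # scale to [-1, 1]
    return image

# Stack all preprocessed images into one NumPy array for training, e.g.:
# data = np.array([load_face_image(p) for p in image_paths], dtype="float32")
```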

3.3 MobileNetV2 Architecture

The face mask detection model uses a convolutional neural network. Any
visual imagery can be analyzed by this deep neural network model. Data in
image form is taken as input; the data is then captured and sent through the neuron
layers. MobileNetV2 is used here as the convolutional neural network architecture.
MobileNetV2 is a network model which uses depthwise separable
convolution as its basic unit. Its depthwise separable convolution has two layers:
depthwise convolution and pointwise convolution [7]. The MobileNetV2 architecture contains
an initial fully convolutional layer with 32 filters, followed by 19 residual bottleneck
layers [8] (Fig. 4).
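
The following sketch shows one way to assemble such a network with Keras: a MobileNetV2 base (ImageNet weights, top removed) with a small two-class head for with_mask/without_mask. The head layer sizes and dropout rate are illustrative assumptions, not necessarily the exact configuration used by the authors.

```python
# Sketch: MobileNetV2 feature extractor plus a small classification head.
# Head sizes (128 units, 0.5 dropout) are assumptions for illustration.
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import (AveragePooling2D, Dense, Dropout,
                                     Flatten, Input)
from tensorflow.keras.models import Model

base = MobileNetV2(weights="imagenet", include_top=False,
                   input_tensor=Input(shape=(224, 224, 3)))

head = AveragePooling2D(pool_size=(7, 7))(base.output)
head = Flatten()(head)
head = Dense(128, activation="relu")(head)
head = Dropout(0.5)(head)
head = Dense(2, activation="softmax")(head)   # with_mask / without_mask

model = Model(inputs=base.input, outputs=head)

# Freeze the convolutional base so only the new head is trained (fine tuning)
for layer in base.layers:
    layer.trainable = False
```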

3.4 Proposed Method

Training Part:
To achieve the main goal of face mask detection [9], we started by importing various
necessary libraries.

Fig. 4 MobileNetV2 architecture

a. TensorFlow and Keras:
   TensorFlow and Keras are used for data augmentation, preprocessing of the
   images, loading the image data, and also for loading the MobileNetV2
   classifier for fine tuning.
b. Scikit-learn:
   The Scikit-learn library is used for printing the classification report,
   binarizing class labels, and segmenting the dataset.
c. Imutils:
   In our model, imutils is used to find and list the images present in our
   dataset. Imutils can also be used for image preprocessing; some of its
   functions perform basic image processing operations such as resizing,
   translation, and rotation of the image.
d. Matplotlib:
   This library is used for visualization of two-dimensional plots of arrays.
   Matplotlib is built on NumPy arrays and is used to plot the training curves
   and decorate them with labels.
After this, preprocessing of the data was done.
In preprocessing, each image is resized to 224*224 pixels and then converted to
array format (with the help of the preprocess_input function). In order
to achieve data augmentation, our dataset was further partitioned. Following
that, we performed fine tuning with the help of the MobileNetV2 [10] architecture.
After the completion of fine tuning, we finally trained the model to detect face
masks. Lastly, we plot the accuracy and loss curves to know the total accuracy
rate and the loss ratio.
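
A hedged sketch of this training step is given below. Here `model` refers to the MobileNetV2-based classifier sketched in Sect. 3.3, and `trainX`, `trainY`, `testX`, `testY` are assumed to be the preprocessed image arrays and one-hot labels from the earlier steps; the augmentation ranges and optimizer settings are illustrative choices rather than the authors' exact hyperparameters.

```python
# Hedged training sketch: data augmentation, compilation, 20 epochs, and the
# accuracy/loss plot. `model`, trainX, trainY, testX, testY are assumed to
# come from the previous steps; hyperparameters are illustrative.
import matplotlib.pyplot as plt
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator

aug = ImageDataGenerator(rotation_range=20, zoom_range=0.15,
                         width_shift_range=0.2, height_shift_range=0.2,
                         shear_range=0.15, horizontal_flip=True)

model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])

history = model.fit(aug.flow(trainX, trainY, batch_size=32),
                    validation_data=(testX, testY), epochs=20)

# Plot loss and accuracy curves for the report
for key in ("loss", "val_loss", "accuracy", "val_accuracy"):
    plt.plot(history.history[key], label=key)
plt.xlabel("Epoch")
plt.legend()
plt.savefig("loss_accuracy.png")
```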
Testing and Implementation:
After training, the testing part is performed.
To begin with, we load the images from the dataset. Then we focus on detecting
the faces in the image. After that, we apply our model to classify whether the
detected face contains a mask or not. The function detect_and_predict_mask
detects the faces and then applies our model to each face region of
interest (ROI). Once the face ROI is extracted and preprocessed,
we are ready to test our project.
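
A simplified stand-in for this detect_and_predict_mask flow is sketched below using OpenCV's Haar cascade face detector; the detector choice, the saved model file name, and the class order are assumptions, since the paper does not specify them.

```python
# Simplified inference sketch: detect faces, preprocess each ROI, and classify
# it with the trained mask model. Face detector, model file, and label order
# are assumptions; the paper's own detect_and_predict_mask may differ.
import cv2
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
mask_model = load_model("mask_detector.model")   # hypothetical saved model

def detect_and_predict_mask(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    results = []
    for (x, y, w, h) in face_detector.detectMultiScale(gray, 1.1, 5):
        roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2RGB)
        roi = preprocess_input(cv2.resize(roi, (224, 224)).astype("float32"))
        with_mask, without_mask = mask_model.predict(roi[np.newaxis])[0]
        # '1' = with_mask, '0' = without_mask, as in the attendance table
        results.append(((x, y, w, h), 1 if with_mask > without_mask else 0))
    return results
```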
While training the model, in order to check the loss and accuracy, we performed
20 epochs. It was observed that right from the second epoch the accuracy
increased, while the loss value decreased. Also, once the accuracy curve
became stable, there was no further need to iterate more in order to increase
the model's accuracy (Figs. 5, 6, 7, 8, 9).
In the attendance table (Fig. 9), '1' refers to the presence of a mask, while '0' refers to the
absence of a mask.

Fig. 5 Face with no mask



Fig. 6 Face with mask

Fig. 7 Multi-face detection

4 Result and Analysis

The face mask detector model has been trained and tested on the chosen dataset.
On that set of data, our method has an accuracy of up to 98–99%.
Looking at the following figure, little sign of overfitting can be observed,
with the validation loss lower than the training loss. Hence, we can conclude that our
model could also generalize well to other images which are not part of our dataset
(Fig. 10).

Fig. 8 Side angled face

Fig. 9 Attendance table based on mask detection

Fig. 10 Loss and accuracy



5 Conclusion and Future Work

In this paper, with the help of basic machine learning tools and libraries, the method
has achieved high accuracy. This method can be integrated into public health care
centers, and the technique can be used in a variety of applications. Finally, the work opens
interesting future directions for researchers. In the future, wearing a mask may
be mandatory because of the COVID-19 crisis, and this model will help to identify
whether a person is wearing a mask and whether it is worn properly. Thus,
we have created a model that is capable of detecting whether a person's face is
covered with a face mask or not with the help of machine learning libraries such
as OpenCV, Keras, and TensorFlow. A two-class model has been trained on people
who are not wearing face masks and ones who are wearing them. A classifier with
approximately ~99% accuracy is obtained by fine tuning MobileNetV2 on the face
mask/no face mask dataset. The model was applied to both real-time video streams
and images by detecting one or more faces in the images/video,
extracting each individual face, and applying the face mask classifier. It is also capable
of showing the results in tabular form. It should be made more flexible so that
wherever this system is implemented, whether in malls or any public place, it should
inform the authority or alert the person who has not covered their face with
the help of an alarm or buzzer beep.

References

1. World Health Organization (2020) Naming the coronavirus disease (COVID-19) and the virus
that causes it. Braz J Implantology Health Sci 2(3)
2. Gage A et al (2021) Perspectives of manipulative and high-performance nanosystems to manage
consequences of emerging new severe acute respiratory syndrome coronavirus 2 variants. Front
Nanotechnol 3:45
3. Campbell F et al (2021) Increased transmissibility and global spread of SARS-CoV-2 variants
of concern as at June 2021. Eurosurveillance 26(24):2100509
4. Bhattacharya A (2021) COVID-19: deaths in 2nd wave cross 2 lakh at daily average of over
2,000. The Times of India [online]
5. Modi S, Bohara MH (2021) Facial emotion recognition using convolution neural network. In:
2021 5th International conference on intelligent computing and control systems (ICICCS).
IEEE
6. Wang W et al (2020) A novel image classification approach via dense-MobileNet models.
Mobile Inf Syst 2020
7. Venkateswarlu IB, Kakarla J, Prakash S (2020) Face mask detection using MobileNet and
Global Pooling Block. In: 2020 IEEE 4th conference on information & communication
technology (CICT), pp 1–5. https://doi.org/10.1109/CICT51604.2020.9312083
8. Ejaz MS, Islam MR (2019) Masked face recognition using convolutional neural network.
In: 2019 International conference on sustainable technologies for industry 4.0 (STI), pp 1–6.
https://doi.org/10.1109/STI47673.2019.9068044

9. Jiang M, Fan X, Yan H (2020) Retinamask: a face mask detector. arXiv preprint arXiv:2005.03950
10. Howard A, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H
(2017) MobileNets: efficient convolutional neural networks for mobile vision applications
A New Coded Diversity Combining
Scheme for High Microwave Throughput

Yousra Lamrani, Imane Benchaib, Kamal Ghoumid,


and El Miloud Ar-Reyouchi

Abstract Multiple copies of signals received via different diversity branches are
combined using appropriate combining techniques. Future wireless microwave
communication systems will face two major challenges: high throughput and long
distance. Network coding (NC) communication has recently piqued researchers'
attention; it combats multipath fading using techniques in which many microwave
stations improve capacity. This paper investigates a new coded diversity combining
scheme (CDCS) for large throughput in a microwave point-to-point link. To reach
this goal, we exploit NC at each microwave station. The suggested CDCS improves
microwave transmission link quality and reliability, avoids signal deterioration, and
increases throughput. We demonstrate that our CDCS outperforms several widely
used combining methods, improving throughput by 70% and 50% compared to the
typical technique and cross-polarization interference cancelation (XPIC), respectively.

Keywords Microwave transmission · Fading channels · Diversity combining techniques · Network coding · XPIC

1 Introduction

Establishing a direct microwave station connection without an intermediary wireless


access point often suffers from signal degradation, especially for distances exceeding
80 km. The radiofrequency (RF) signal transmitted from the transmitter (Tx) to the
receiver (Rx) antennas is attenuated as it propagates through space, which drastically

Y. Lamrani · I. Benchaib · K. Ghoumid


National School of Applied Sciences (ENSAO), University Mohammed First, Oujda, Morocco
E. Miloud Ar-Reyouchi (B)
Abdelmalek Essaâdi University, 93000 Tetouan, Morocco
e-mail: e.arreyouchi@m.ieice.org
SNRT, Rabat, Morocco


affects measurement [1] performance at reception level. This attenuation is due to


several phenomena, which include fast fading and shadow fading [2].
Diversity, or the merging of multiple receivers, is a particular receiving technique
that may be utilized to solve this problem. Diversity is a signal transmission method
that reduces the effects of interference and fading in a communication channel. The
receiver receives many versions or sources of sent and received signals on separate
channels, which is the fundamental concept of diversity [3]. Different types of diver-
sity may be used simultaneously in microwave wireless communication systems via
various combination techniques to improve performance.
An appropriate mix of antennas is needed to acquire separate fading signals to
provide acceptable diversity. Maximizing the average signal-to-noise ratio (SNR)
at the output is useful for merging signals. The receiving station may indicate the
successful reception of each packet transmitted by the microwave station over the
radio channel by sending an extremely short service packet. If the sender does not
get an acknowledgment, the station retransmits the packet using the retries option.
It should also be noted that the packet sizes [4, 5] play a primary role in the performance of
the microwave stations, particularly concerning throughput. In this paper, we suggest an
efficient packet combination based on NC. The fundamental idea is for station 1
to send encoded data via the graph's edges.
The network stations process the data; they receive and create encoded packets
subsequently sent to their neighbors. The input data packets are merged; this process
enables a wireless network’s intermediate microwave station to mix information
streams and retrieve data at receivers, as shown in [6]. This method has the potential
to offer many benefits to the P2P microwave link network, including reduced retransmissions,
improved packet recovery [7], and higher throughput in the wireless communication
system [8, 9].
The traditional store-and-forward paradigm, in which microwave stations broad-
cast bits of the original flow without additional processing, cannot provide the best
microwave connection. Furthermore, utilizing store-and-forward to calculate the
maximum feasible rate is difficult. On the other hand, NC may increase throughput
and sensitivity in all situations when we consider the potential of packet loss [10], as
is the case with microwave wireless communication connections (unicast, multicast,
and broadcast). The packet creation concept ensures that the receivers decode all
received packets, as explained in [11].
Random linear network coding (RLNC) can assist in achieving maximum network capacity and performance in
multi-rate situations. The throughput and receiver sensitivity [12] of a microwave
connection network are key performance parameters. The receiver sensitivity (RX
sensitivity) is an important radio characteristic since it affects how much power is
needed to decode incoming signals effectively. Various variables, such as modulation
types [13, 14], error correction methods, RF filter performance, and self-produced
noise, affect RX sensitivity.
In a carrier-class microwave connection network, XPIC is used to increase
capacity and spectrum efficiency. In a microwave link outfitted with XPIC tech-
nology, the potential total network capacity [15, 16] of a microwave route is virtually
doubled. Using XPIC, the vertical and horizontal polarizations on a route may be

given the same frequency. When the available frequencies are limited, the frequency
is assigned twice on the same route using both polarizations.
In contrast to traditional methods, which combine input flows according to prede-
fined schemes established a priori, we explore a CDCS in which microwave stations
mix packets using random or opportunistic network coding methodologies. While the
latter may be more efficient, they may cause buffering in the coding nodes, lowering
throughput and receiver sensitivity.
The primary advantage of our approach is that it employs a new microwave recep-
tion combination technology that delivers high throughput for enhanced transmission
link durability while using current methods.
After introducing the main concepts in our methodology, we go through the
problem formulation and remedy in Sect. 2. The system model is presented and discussed
in Sect. 3. The typical diversity combination techniques and the proposed combination
approach are explained in Sect. 4. Section 5 compares these various methods,
showing the better technique. Section 6 brings this study to a conclusion.

2 Problem Formulation and Remedy

XPIC is a technique that overcomes spectral efficiency limits by


utilizing two radio frequency polarizations on a single frequency channel. Two
distinct data streams may be sent on a single carrier, thanks to the double polarization.
Each data stream may maximize the use of the channel’s bandwidth. However, the
dual-polarization transmission offers clear benefits and seems to be a simple tech-
nology deployed quickly. While antennas’ capacity to separate the two polarizations
has improved significantly, current advances are inadequate since rain may cause
the polarization to spin as the signal travels, resulting in a mix of polarization and
cross-polarization discrimination (XPD) interference. Because the chosen branch is
all that matters, this kind of interference is unavoidable.
Minimal-spectral-efficiency systems using QPSK-format modulation are unaffected
by this interference and may be utilized since the required SNR is low. More
spectrally efficient modulations, such as those employed by the system equipment, need
a considerably higher SNR, with 2048 QAM [17] being the most efficient modulation.
Transmission with double polarization is impossible for such systems due to
XPD interference, so XPIC technology is required in these situations. In the system
suggested in this paper, the receiver's SNR would need to be raised by 24 dB; XPD
interference then no longer impacts the system or the double-polarization communication
using 2048 QAM modulation.
Two signals are received in XPIC, one for each of the two polarizations. In
both signals, cross-polarization interference is evident. Adaptive filters are used
to cancel the interference in the opposite branch utilizing the signal received in
each branch. Adaptive filters are also included in the receiver to equalize the two
branches. However, this method cannot satisfy all of the customer’s requirements

over long distances, and the adaptive coding and modulation technique cannot effec-
tively handle interference. Furthermore, sea surface reflection and strong multi-
path fading may readily degrade microwave connection transmissions. The usage
of CDCS provides for a much cheaper cost than undersea fibers.
We suggest an effective diversity combination method for long-range point-to-
point microwave wireless connections in this letter, which may be used in XPIC to
improve their performance. This method may be used for long-distance line-of-sight
transmission and improves transmission throughput.

3 System Model with Description

The suggested CDCS with NC increases the transmission system’s capacity, resulting
in high overall connection spectrum efficiency.
The proposed CDCS system architecture is depicted in Fig. 1.
From N terrestrial receiving dishes, the microwave station receives N flows
designated as F_1, F_2, ..., F_N. Each bloc represents a corresponding source flow. The
received packets in each generation will be placed in the buffer after demodulation
and demultiplexing; at this point, the proposed CDCS's NC will be executed.
Co-channel dual-polarization technology (CCDP) is used with XPIC to allow one
RF channel to transmit two service streams simultaneously. The transmitters put out
two electromagnetic waves with polarization orientations orthogonal to the receiver
on the same channel. After XPIC processing removes the interference between the
two electromagnetic waves, the receiver retrieves the two original signal channels. As
a result, using XPIC doubles transmission capacity without changing channels. When
the XPIC line is not in use, the adjacent channel alternative polarization (ACAP)
arrangement is used to broadcast two service signals concurrently across two RF
channels.

Fig. 1 An intermediate microwave relay station is implementing XPIC transmission with double
polarization with the proposed scheme

An additional buffer may be employed at the coding microwave stations to create
a linear combination of input packets rather than just forwarding them. The suggested
method allows flexible combination at the intermediate coding station and relies
on packet checksums produced for the receiver to ensure that the original packets are
recovered. We focus on increasing end-to-end throughput in the worst-case scenario.
Each input flow F_i represents a packet bloc B_i, i = 1, 2, ..., N.

4 Diversity Combining Techniques

Several combination techniques are known, but in this study we present only the
three main techniques used and then detail the proposed CDCS.

4.1 Typical Diversity Combination Techniques (TDCT)

Selection diversity. Combination by selection: This is the simplest, and probably


the most often used, technique. In this technique, the signal is chosen from the path
with the highest SNR. The drawback of this technique is that the signal with the most
interference may be chosen because the interference along a path cannot be easily
evaluated without knowing the transmitted signal beforehand. We assume that the
signal envelope follows a Rayleigh probability density function.
Defining $w_i$ as the instantaneous power of the signal in each path divided by the
average noise power (i.e., the instantaneous SNR), and $w_a$ as the average signal
power in each path divided by the average noise power, the probability of the SNR
being less than or equal to a specific value $w_s$ is given by
$\Pr[w_i \le w_s] = 1 - e^{-w_s/w_a}$.
It follows that the probability that all the $w_i$ independent values are simultaneously
less than the specified value $w_s$ is given by:

$$\Pr[w_1, w_2, \ldots, w_m \le w_s] = \left(1 - e^{-w_s/w_a}\right)^m \qquad (1)$$

where $m$ denotes the number of paths to the receiver. The average of $w_s$ is therefore:

$$E[w_s] = w_a \sum_{k=1}^{m} \frac{1}{k} \qquad (2)$$

Therefore, this method corresponds to selecting the path with the best SNR.
Maximum-ratio combining. In this technique, it is required that the m links be
aligned in phase and weighted proportionally to the signal level before their summa-
tion. This method, therefore, corresponds to the summation of all the branches with

weights calibrated according to the SNR. The distribution in this method follows (3):

$$\Pr[w \le w_m] = 1 - e^{-w_m/w_a} \sum_{k=1}^{m} \frac{(w_m/w_a)^{k-1}}{(k-1)!} \qquad (3)$$

$$E[w_m] = m\, w_a \qquad (4)$$

Equal-gain combining. This method is simply a summation after co-phasing of the branches.
It has a similar distribution to the previous method, and its mean is given in
(5). Therefore, this method corresponds to summing all the paths with the weights $W_n$ set
to the same gain level, or to $1/N$, for example:

$$E[w_e] = w_a \left[ 1 + (m - 1)\frac{\pi}{4} \right] \qquad (5)$$
These combination techniques do not improve the throughput or sensitivity,
although they do improve transmission. In contrast, the proposed technique can
provide many benefits and strongly complements the XPIC and BA techniques.
Diversity combination techniques can combat fading without increasing the
data rate. Generally, the signal processing for microwave stations is performed
at the intermediate frequency level, in which an incoming signal is shifted to an
intermediate frequency for amplification before final detection is carried out.
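
As a quick numerical illustration of Eqs. (2), (4), and (5), the short Python sketch below evaluates the mean output SNR gain (relative to the single-branch average $w_a$) offered by each combining technique for a few branch counts; it simply plugs in the formulas above.

```python
# Mean-SNR gains implied by Eqs. (2), (4), (5): selection, maximum-ratio, and
# equal-gain combining for m diversity branches, relative to w_a.
import math

def selection_gain(m):        # Eq. (2): sum_{k=1}^{m} 1/k
    return sum(1.0 / k for k in range(1, m + 1))

def maximum_ratio_gain(m):    # Eq. (4): m
    return float(m)

def equal_gain_gain(m):       # Eq. (5): 1 + (m - 1) * pi / 4
    return 1.0 + (m - 1) * math.pi / 4.0

for m in (2, 4, 8):
    print(f"m={m}: selection={selection_gain(m):.2f}x, "
          f"MRC={maximum_ratio_gain(m):.2f}x, EGC={equal_gain_gain(m):.2f}x")
# e.g. m=2: selection=1.50x, MRC=2.00x, EGC=1.79x
```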

4.2 Technical Description of the Suggested Coded Diversity Combining Scheme (CDCS)

The applied CDCS is a random coding technique in terms of packet arrival dates, which
minimizes the time spent by packets in the queues of microwave stations. It uses the
concept of a block code, is based on the definition of network coding, and can
reduce the waiting times of packets in queues by forwarding them without waiting
for all packets of the same block.
Consider a directed graph G = (V, E) representing a wireless transmission
link, where V = {v_1, v_2, ..., v_N} is the vertex set and E is the edge set.
We consider a microwave station 1 with N input flows and one output flow (Fig. 1).
We assume that each input flow F_1, F_2, ..., F_N is considered as a bloc of packets,
that the deadline for the arrival time of packet p_i, i = 1, 2, ..., n of bloc B_j, j =
1, 2, ..., N is known, and that the Rx and Tx stations are synchronized. Each link
has a capacity of C_{1,2} (bits/s); i.e., a packet of L bits can be transmitted in at least
L/C_{1,2} seconds.
Each flow is constituted by n packets. The packets of all the blocs are given by
$\bigcup_{i=1}^{N}\bigcup_{j=1}^{n} p_i^j$, where N is the number of blocs and n is the number of packets in each
bloc. We thus have $F_1 = \{p_1^1, p_1^2, \ldots, p_1^n\}$, so $F_1 = \bigcup_{j=1}^{n} p_1^j$, $F_2 = \bigcup_{j=1}^{n} p_2^j$, …, and

$F_N = \bigcup_{j=1}^{n} p_N^j$. These microwave flows arrive via the edges $\{e_1, e_2, \ldots, e_N\}$
of station 1, as shown in Fig. 1.
Suppose that the packets in a given bloc B_j arrive at coding station 1 at time
t. A buffer is created to store the data of a received packet bloc.
In typical receivers, the data is received without applying any coding technique.
In our scheme, the packets undergo network coding operations [18], identifying and
reassembling the original error-free data from the sent messages.
Consider the case where packets from the flows are combined in station 1 (Fig. 1) to
produce a combined output packet flow:


$$p_{\mathrm{coded}}^{\mathrm{out}} = \bigoplus_{i=1}^{N} \bigoplus_{j=1}^{n} \alpha_i^j\, p_i^j \qquad (6)$$

Suppose that a packet belonging to a given generation arrives at coding station 1


at the time t. The CDCS at coding station 1 is applied as follows:
After demodulation, the linear combination corresponding to the bloc B_j is
computed as soon as at least one of the following conditions is satisfied for all
the input flows:
• The buffer contains all of the packets from bloc B_j.
• The buffer does not include specific packets from bloc B_j.
The second condition describes the absence of a packet from the bloc B_j in the
associated flow. The linear combination in this instance is limited to the packets
from the bloc B_j that are present in the station. If the buffer is empty at station 1, the
packet is multiplied by the finite field coefficient of the network code and sent via
the output link. If the buffer is not empty and no packets from the bloc are present,
the packet is multiplied by its appropriate finite field coefficient and appended at
the end. Otherwise, if there is a packet from the bloc B_j in the buffer, the incoming
packet is multiplied by the finite field coefficient corresponding to that of the packet
in the buffer. The incoming packet is immediately added to the packet already in the
buffer.
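
A minimal sketch of this finite-field packet combination is given below; the choice of GF(2^8) with the AES reduction polynomial, the helper names, and the random coefficients are illustrative assumptions rather than the authors' implementation.

```python
# Sketch of combining the packets of one bloc with random finite-field
# coefficients into a single coded output packet (random linear NC style).
# Field, polynomial, and helper names are assumptions for illustration.
import os

def gf256_mul(a, b, poly=0x11B):
    """Carry-less multiplication in GF(2^8) with the AES reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def encode_bloc(packets, coefficients):
    """Combine equal-length packets: out[k] = XOR_i coeff[i] * packets[i][k]."""
    coded = bytearray(len(packets[0]))
    for coeff, packet in zip(coefficients, packets):
        for k, byte in enumerate(packet):
            coded[k] ^= gf256_mul(coeff, byte)
    return bytes(coded)

# Example: three packets of one bloc combined with random coefficients;
# the coefficients travel with the coded packet so the receiver can decode.
bloc = [os.urandom(16) for _ in range(3)]
coeffs = list(os.urandom(3))
coded_packet = encode_bloc(bloc, coeffs)
```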
The CDCS parameters utilized in our simulation are shown in Table 1.

5 Results and Analysis

We compare the suggested CDCS to the presently utilized methods to assess its
performance improvement. Our proposed CDCS method significantly increases the
throughput of the microwave connection under challenging transmission circum-
stances, as shown in Fig. 2.
The throughput for the different modulation types was significantly improved, as
seen in Fig. 2. When compared to 256 QAM modulation, the maximum 4096 QAM

Table 1 CDCS settings


Settings Value/designation
Bandwidth 56 MHz
Modulation formats QPSK, 8-PSK, 16 QAM, 32 QAM, 64 QAM, 128
QAM, 256 QAM, 512 QAM, 1024 QAM, 2048
QAM, 4096 QAM
EIRP: Equivalent isotropic radiated power 25 dBm
Tx and Rx antenna heights 40 m
Distance between Rx and Tx antennas More than 150 km
Strong environment adaptability −40 °C, +55 °C

Fig. 2 Performance throughput evaluation and comparison of two modulation formats 256-QAM,
4096-QAM (TDCT), and 4096-QAM (CDCS)

modulation increased throughput by 45%. In addition, Fig. 2 illustrates the signif-


icant throughput increase achieved by CDCS. The proposed scheme outperforms
the typical technique with 256 QAM and 4096 QAM by 6.5 times and 4.3 times,
respectively.
Figure 3 compares our proposed scheme CDCS with the typical technique TDCT
under different modulation formats.
Figure 4 compares our proposed approach with typical and recent techniques
XPIC under different modulation formats.
As shown in Fig. 4, CDCS’s throughput was better than the typical and current
XPIC using different modulation formats. The results of this improvement may be
attributed to the increased transmission of packets that have been encoded using NC.

Fig. 3 Performance evaluation and comparison of throughput between the TDCT and the proposed
CDCS under various modulation formats

Fig. 4 Performance evaluation and comparison of throughput between the TDCT, XPIC, and the
proposed CDCS under various modulation formats

Fig. 5 Capacity evolution toward ever larger values

Thus, CDCS also outperformed the typical method and XPIC by 70% and 50%, respectively,
and it eliminated all kinds of interference efficiently and rapidly.
In comparison with the current methods illustrated in Fig. 4, the suggested scheme
improves the throughput by 4.36 times and 2.18 times, compared to the typical tech-
nique and XPIC, respectively. CDCS uses the technique of not waiting for all packets
to arrive at the microwave station before removing interferences, which results in
these benefits. When greater symbol rates were employed, however, the power effi-
ciency of these modulations rapidly deteriorated. Small variations in frequency at
higher symbol frequencies are to blame for the deterioration.
Figure 5 shows the capacity evolution for various modulation types under the
recent technique XPIC and the proposed CDCS.
CDCS outperformed the state-of-the-art techniques by 12.6 times, 8.5 times, and
4.8 times for 256 QAM, 4096 QAM, and XPIC, respectively.

6 Conclusion

The CDCS presented in this paper is an excellent option for overcoming the significant
difficulties that long-distance connections face, especially when the largest single-hop
link spans up to 160 km. The CDCS improves the performance of the channel by
avoiding interference and efficiently eliminates unnecessary waiting time in the
buffer. The deployment of EDCT for medical applications will be the focus
of future research.

References

1. Ar-Reyouchi E, Ghoumid K (2019) Technical accuracy based on efficiency algorithm for


measuring standing wave ratio in wireless sensor network. Int J Commun Antenna Propag
(IRECAP) 9(2):137–143
2. Fengyu L, Yan Z, Limin X, Chunhui Z, Shidong Z (2013) Fading characteristics of wireless
channel on high-speed railway in hilly terrain scenario. Int J Antennas Propag, pp 1–9
3. Razmhosseini M, Vaughan RG (2015) Diversity design in wireless communications. In: 5th
International conference and workshop on computing and communication (IEMCON). IEEE,
Vancouver, pp 1–6
4. Maslouhi I, Ar–reyouchi EM, Ghoumid K, Baibai K (2018) Analysis of End-to-end packet
delay for internet of things in wireless communications. Int J Adv Comput Sci Appl (IJACSA)
9(9):338–343
5. Chatei Y, Hammouti M, Ar-reyouchi EM, Ghoumid K (2017) Downlink and uplink message
size impact on round trip time metric in multi-hop wireless mesh networks. Int J Adv Comput
Sci Appl (IJACSA) 8(3):223–229
6. Hoang TQV, Vuong TP, Trinh LH, Ferrero F, Dubard J (2014) Space diversity for robust
wireless power transmission in multipath environments. In: IEEE antennas and propagation
society international symposium (APSURSI), pp 1125–1126
7. Ho T, Medard M, Koetter R, Karger DR, Effros M, Shi J, Leong B (2006) A random linear
network coding approach to multicast. IEEE Trans Inf Theory 52(10):4413–4430
8. Ahlswede R, Cai N, Li S–YR, Yeung RW (2000) Network information flow. IEEE Trans Inf
Theory 46(4):1204–1216
9. Ar-Reyouchi EM, Ghoumid K, Ar-Reyouchi D, Rattal S, Yahiaouiand R, Elmazria O (2021)
An accelerated end-to-end probing protocol for narrowband IoT medical devices. IEEE Access
9:34131–34141
10. Bhadra DR, Joshi CA, Soni PR, Vyas NP, Jhaveri RH (2015) Packet loss probability in wireless
networks: a survey. In: International conference on communications and signal processing
(ICCSP). IEEE, Melmaruvathur, India
11. Song D, Shin W, Lee J (2021) A maximum throughput design for wireless powered
communication networks with IRS-NOMA. IEEE Wireless Commun Lett 10(2):849–853
12. Jamil S, Ali M, Hussain Y, ul Haq MI, Qamar F (2020) Throughput and energy efficiency maxi-
mization in millimeter wave - micro wave hetnets. In: 23rd international multitopic conference
(INMIC). IEEE, Bahawalpur, Pakistan
13. İnceöz E, Tutgun R, Turgut AMY (2020) FPGA based transmitter design using adaptive coding
and modulation schemes for low earth orbit satellite communications. In: IEEE 5th international
symposium on telecommunication technologies (ISTT). IEEE, Shah Alam, Malaysia
14. Lamrani Y, Benchaib I, Rattal S, Ar-Reyouchi EM, Ghoumid K (2021) Performance analysis
on modulation techniques for medical devices sensitivity in wireless NB-IoT network. In:
International conference on smart data intelligence (ICSMDI), SSRN, Tamil Nadu, India
15. Zhao D, Guo F, Guo S, Wang B (2018) Research on shaping microwave fields at UESTC.
In: International conference on microwave and millimeter wave technology (ICMMT). IEEE,
Chengdu, China
16. Ar-Reyouchi EM, Lamrani Y, Benchaib I, Ghoumid K, Rattal S (2020) The total network
capacity of wireless mesh networks for IoT applications. Int J Interact Mob Technol (iJIM)
14(8):61–75
17. Wakayama Y, Gerard T, Sillekens E, Galdino L, Lavery D, Killey RI, Bayvel P (2021) 2048-
QAM transmission at 15 GB over 100 km using geometric constellation shaping. Opt Express
29:18743–18759
18. Cai N, Yeung RW (2002) Network coding and error correction. In: IEEE information theory
workshop. IEEE, Bangalore, India
Extractive Text Summarization
of Kannada Text Documents Using Page
Ranking Technique

C. P. Chandrika and Jagadish S. Kallimani

Abstract Text summarization is one of the core concepts of natural language
processing. It is the process of producing quality summary statements from a
document that contains a set of raw information, without losing or altering any
valid information. We have developed a Kannada text summarization model using
the page ranking algorithm. The sentences are represented as vectors; the similarity
between sentences is calculated using cosine similarity, and a similarity graph is
generated. The graph is fed as input to the text ranking algorithm, which then
prioritizes the sentences, and this output is considered the summary of the Kannada
document. The proposed model is evaluated by ROUGE scores; the ROUGE-1 and
ROUGE-L average F1 values are 65% and 64%, respectively, and the ROUGE-2
average F1 score is near 60%. This indicates that the performance of the proposed
text summarization model for the Kannada language is good and can further be
improved with different Kannada datasets.

Keywords Word embeddings · Vectorization · Sentence vectorization · Cosine similarity · Page ranking algorithm

1 Introduction

In modern days, the internet and social media are loaded with an abundance of infor-
mation. Any form of information is reduced to only what is necessary to understand
the topic. For this, one of the tools implemented is the text summarization software.
Applications of summarization are:
• To get the highlights of a newspaper

C. P. Chandrika (B)
Department of Computer Science and Engineering, M S Ramaiah Institute of Technology,
Bangalore 560054, India
e-mail: chandrika@msrit.edu
J. S. Kallimani
Visvesvaraya Technological University, Belagavi, Karnataka, India


• Easy to understand a poem/article of huge size
• A quick way to get instant information if summarization tools work automatically.
The proposed work focuses on developing an application which summarizes a
Kannada document into a more concise form while maintaining the same meaning and
without losing any crucial information. A lot of summarization techniques are already
available for the English language, but only a little effort has been made in this field
for the Kannada language. A researcher therefore has great scope in developing an
efficient tool for Kannada text summarization.
In this work, we tried to develop a summarization model for the Kannada language.
There are fundamentally two summarization methods—extractive approach and
abstractive approach.
Extractive Text Summarization: A text summarization method which summarizes
the Kannada text without modifying the actual meaning or words in the given text;
it extracts the important sentences from the paragraphs of the Kannada document
and produces the desired summary.
Abstractive Summarization: Given an article, only novel sentences are generated
as the summary, by reframing the sentences or adding new terms, instead of simply
extracting the important sentences. In this paper, we have implemented the
extractive method of text summarization.
Text Rank Algorithm: Text rank is the same as the page ranking algorithm used
by Google to list the important web pages in response to a user's query. In the
proposed work, instead of listing pages, we want the top-ranked sentences to produce the
summary, so the page ranking algorithm is termed the text ranking algorithm in this work.
Implementing the text rank algorithm in Python is an easy task; it does not require
any complex operation beyond a similarity graph. Compared to the other techniques
found in the survey, the proposed method was found effective in prioritizing the
sentences.
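
A hedged sketch of this pipeline in Python is shown below; the TF-IDF sentence vectors are an illustrative stand-in for whatever sentence embedding the proposed model uses for Kannada, and the NetworkX pagerank call plays the role of the text ranking step.

```python
# Sketch of the described pipeline: sentence vectors -> cosine similarity
# matrix -> similarity graph -> PageRank scores -> top-ranked sentences.
# The TF-IDF vectorizer is an assumed stand-in for the actual sentence vectors.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def summarize(sentences, top_n=3):
    vectors = TfidfVectorizer().fit_transform(sentences)   # sentence vectors
    similarity = cosine_similarity(vectors)                 # similarity matrix
    graph = nx.from_numpy_array(similarity)                 # similarity graph
    scores = nx.pagerank(graph)                              # text-rank scores
    ranked = sorted(range(len(sentences)), key=scores.get, reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_n])]    # keep source order

# Usage: `sentences` would be the list of Kannada sentences obtained after
# sentence tokenization of the input document.
# print("\n".join(summarize(sentences)))
```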

2 Related Works

It is always a good start to do surveys on the existing techniques to implement a new


model. Even though summarization techniques are heavily available for English text,
we have limited our survey to Indian regional languages since our proposed work is
on summarizing the Kannada text. Various methodologies, conclusions and remarks
used by different researchers are summarized in Table 1.
From this survey, we found that the main text summarization methods, extractive
and abstractive, have been implemented by many researchers using different
methodologies such as the latent semantic analysis (LSA) tool and supervised machine
learning algorithms. Deep learning techniques are also employed for summarization
using neural networks, RNNs, and Bi-LSTM with attention. The main
challenges identified are:
Table 1 Survey summary on Kannada text summarization (columns: S. No, Objective, Methodology, Findings/outcomes, Remarks)

[1] Objective: To explore different summarization techniques on Indian regional languages.
    Methodology: A survey has been carried out to analyze the different approaches like statistical, linguistic, machine learning and hybrid approaches used for summarizing the documents of different languages.
    Findings/outcomes: The challenges faced during the summarization processes are identified. A lot of work in text processing needs to be standardized, like stop word list, segmentation, POS tagging, etc.
    Remarks: The survey is necessary for a researcher to build a summarization tool for his local language.

[2] Objective: To automatically summarize news articles using an extractive summarization approach.
    Methodology: MNB algorithm is used for text classification and ROCK clustering algorithm is used for clustering the articles.
    Findings/outcomes: The results were found to be satisfactory; the model to be tested for huge corpus and for different domains.
    Remarks: The authors have used cosine similarity to find the similarity between the documents, which is quite common and powerful. The model to be evaluated is using a different Rouge score.

[3] Objective: To demonstrate an abstractive summarization on the Kannada texts.
    Methodology: Extraction algorithm and scheme-based templates are used.
    Findings/outcomes: When compared to the human made summaries, few aspects were missed due to restriction in length.
    Remarks: The NER tagger rule is useful for both short and lengthy data samples.

[4] Objective: To automate extractive summary process for various Kannada datasets.
    Methodology: The model is trained and tested using clustering and LSA separately, and the summaries generated by both techniques are combined using set operations and the final summary is generated.
    Findings/outcomes: The authors obtained F1 scores of 76.48 and 78% using clustering, LSA and Fusion techniques, respectively. This approach can be combined with other techniques in future to get better accuracy.
    Remarks: The fusion is a new idea for Kannada summarization. Obtained promising results and can be tried with different sets of articles.

[5] Objective: To implement text classification on AG’s News Dataset using various machine learning algorithms.
    Methodology: Different machine learning algorithms are used for training and testing the classification model.
    Findings/outcomes: Results of all the models are compared, and the accuracy of all the algorithms is more than 80%. The same model can be trained and tested with deep learning techniques in future.
    Remarks: Text classification is one of the important tasks to perform summarization; the ideas can be used to build models for other regional languages also.

[6] Objective: To develop an efficient data preprocessing for the quality summarization of the Marathi language.
    Methodology: All the preprocessing steps like tokenization, stemming, removing unnecessary words and vector creation have been done on Marathi text.
    Findings/outcomes: The authors have tried to create a ready data processing model to be used for different text processing applications.
    Remarks: The proposed data preprocessing model is found to be useful and more effective.

[7] Objective: To develop an extractive text summarization for the Telugu language.
    Methodology: Linguistic characteristics like events and named entities are used for selecting quality statements; sentences are ranked based on the similarity features.
    Findings/outcomes: Obtained a maximum F1-Rouge score of 87.3% compared to other models like keyword and neural summarizer. The model to be tested with different datasets to evaluate its performance.
    Remarks: The proposed model is found useful to implement for other regional languages.

[8] Objective: To develop an abstractive text summarization model for the Telugu language using deep learning techniques.
    Methodology: Deep learning techniques like RNN and LSTM with attention mechanism have been used for training and testing Telugu articles.
    Findings/outcomes: The outcome is measured against many metrics, but the output summary is found to be satisfactory. The model to be improved by using transformer technique in future.
    Remarks: The lack of human-generated summaries for Telugu is a major drawback, so creating a dataset regarding this is essential.

[9] Objective: To develop an abstractive summarization model for multi-Telugu documents using a semantic approach.
    Methodology: The RNN is used with an attention mechanism to train and test the dataset. The model uses semantic concepts and Jiang similarity.
    Findings/outcomes: The model is tested with 50 and 100 epochs and found max Rouge 1 and L score of 0.29%. The model to be tested with various datasets to measure its performance.
    Remarks: The model requires further fine-tuning of parameters to improve the accuracy.

[10] Objective: To develop a summarization model using word and part-of-speech based on the attention mechanism.
     Methodology: Bi-LSTM is implemented to extract the word and POS features. Attention mechanism is also used to identify the important words required for summarization.
     Findings/outcomes: The model performed well with two datasets and obtained a max Rouge score of 37.67%.
     Remarks: The proposed model proves to be useful to combine two features and test it with different datasets.

[11] Objective: To develop a text summarization tool based on the keywords that are generated automatically.
     Methodology: Automated NLP tools were used for the summarization module.
     Findings/outcomes: An efficient and effective method to achieve text summarization using the automation concept.
     Remarks: Keywords play a vital role in generating the summary; this can be tested on the local languages.

[12] Objective: To develop a summarization module using the text rank algorithm based on the principle of word frequency.
     Methodology: Text ranking algorithm with similarity metrics and NLP libraries.
     Findings/outcomes: The model obtained an accuracy of 80%, and the mean time of summarization is around 15.2 ms. An efficient preprocessing model to be developed in future to address the limitations of the current work.
     Remarks: Text ranking algorithm is a common approach used for summarizing foreign languages, and the same idea is utilized in our work.

[13] Objective: To summarize Tamil sports articles using neural networks.
     Methodology: The model is developed using a restricted Boltzmann machine with a neural network.
     Findings/outcomes: The output summary is evaluated using ROUGE score but not tabulated in the article.
     Remarks: The authors have listed quality features for the summarization, and ANN is also a good choice to train and test the model.

[14] Objective: To develop an extractive summarization model for the Malayalam text documents.
     Methodology: Text ranking algorithm using different encoding techniques, and the performance is compared.
     Findings/outcomes: Obtained max Rouge F1-score of 57.4% using modified text rank with SIF + MMR approach.
     Remarks: Different encoding techniques also have an impact on the accuracy of the summarization model.
[15] To develop a graph-based model for Graphical sentence representation A max F1 score of 60% is obtained Getting the dataset and processing
Extractive Text Summarization of Kannada Text Documents ...

text summarization for model with text ranking algorithm by using this model, and average of is a tedious task; there is a scope
the Malayalam text 51% sentences are similar between for the researcher to look into
machine and human-generated these aspects
summary
687

• Picking up the quality words to deliver effective summarization
• Applying different algorithms to the same dataset may produce different summary reports
• For Kannada text, a few words may have different meanings; identifying and selecting the correct words based on the context is a difficult task.
We propose a text ranking algorithm for summarization, which has proved successful for other languages.

3 Implementation and Results

This section discusses the entire process from collecting the dataset, data prepro-
cessing to applying the text ranking algorithm. The Kannada dataset is collected
from various articles published over the Internet for the summarization. The kind
of data varies from sports and mobile reviews to life skills.

3.1 Summarization Module

The text rank algorithm is a derivative of the page rank algorithm, in which web pages are ranked by the number of times they are visited and the number of pages that link to them. This algorithm is the heart of the summarization model, and the following are the necessary steps to be carried out before applying the text ranking algorithm; the same is demonstrated in Fig. 1.
• The first step would be to concatenate all the text contained in the articles
• Then split the text into individual sentences
• In the next step, we will find vector representation (word Embeddings) for each
word and every sentence

Fig. 1 Flow diagram of the summarization model



• Similarities between sentence vectors are then calculated and stored in a matrix
• The similarity matrix is then converted into a graph, with sentences as vertices
and similarity scores as edges, for sentence rank calculation
• Finally, a certain number of top-ranked sentences is represented as a final
summary.
A brief discussion about the prerequisite tasks to be carried out before applying the text ranking is given below:
Text Preprocessing:
Kannada text processing is quite complex due to the richness of its morphological structure. The same language is spoken in different ways in different regions of Karnataka, and processing it is not as easy as processing English. The dataset should be cleaned before processing to make it noise-free and efficient to work with. A few words in the dataset are identified as stop words, and these words do not contribute any meaning to the process, for example: {Mattu}[and], {Haagu}[also], {Avaru}[those people], {Bagge}[about], {Aadare}[but], {Avarannu}[them], {Thamma}[them], {Ondu}[single], {Endaru}[said], {Mele}[above], {Helidaru}[said], {Seridante}[including], {Balika}[afterward], etc. These words are removed from the dataset using the Kannada stop words file, which has around 200 words. After this, we obtained a clean dataset.
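A rough Python sketch of this stop-word removal step is given below. The file name, the placeholder sentences, and the simple whitespace tokenization are assumptions made only for illustration; the paper states only that a Kannada stop-word file of around 200 words is used.

def load_stop_words(path="kannada_stopwords.txt"):
    # The paper mentions a Kannada stop-word file of around 200 words (file name assumed).
    with open(path, encoding="utf-8") as f:
        return set(line.strip() for line in f if line.strip())

def remove_stop_words(sentence, stop_words):
    # Simple whitespace tokenization; a morphological tokenizer could be used instead.
    return " ".join(word for word in sentence.split() if word not in stop_words)

stop_words = load_stop_words()
sentences = ["<Kannada sentence 1>", "<Kannada sentence 2>"]   # placeholder text
clean_sentences = [remove_stop_words(s, stop_words) for s in sentences]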
Word Embeddings:
The model does not work on raw Kannada words; these words should be represented in numerical form, and this is done using the word embedding representation available online from the Wiki word vectors. It is a huge file which has 300 dimensions for all the Kannada words. Each dimension represents the usage of a word in a different way. Example for the word {Echharike} [careful]: the following are the dimension values (only a sample of the dimension values is shown).
−0.41236, 0.082654, 0.64355, −0.2292, −0.043449, 0.20626, −0.039713, −
0.28893, 0.32872, −0.12913, −0.12276, −0.042511, 0.12026, −0.35143, 0.091768,
0.44619, −0.25532, 0.23058, −0.23937, 0.073449, −0.28712, 0.41471, −0.052678,
0.094267.
The Python code is written to obtain the 300-dimension values for all the words in the dataset. The word embedding file is about 400 MB and covers the Kannada words in 300 dimensions.
Once the numerical value of each word in the sentence is obtained, each sentence is represented in numeric form, called a vector, for the further processes. It is created using the simple Eq. 1.

vector_each_sentence = sum(numerical_values_of_individual_words) / total_no_of_words_in_sentence    (1)

The 300-dimension values for an example sentence vector are as follows (a sample of 10 dimensions is shown):
0.05722991, −0.01796172, −0.28291529, 0.11364377, 0.03276174, 0.07978289, 0.08735752, 0.01251564, −0.23779888, 0.01059534
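A minimal Python sketch of Eq. 1, averaging the word embeddings to build a sentence vector, is given below. The `embeddings` dictionary (word to 300-dimensional vector) is assumed to have been loaded from the Wiki word vector file and is not shown, and `clean_sentences` comes from the preprocessing sketch above.

import numpy as np

def sentence_vector(sentence, embeddings, dim=300):
    # Eq. (1): sum of the word vectors divided by the total number of words in the sentence.
    words = sentence.split()
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return np.zeros(dim)   # fallback when no word of the sentence is in the embedding file
    return np.sum(vectors, axis=0) / len(words)

sentence_vectors = [sentence_vector(s, embeddings) for s in clean_sentences]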
Similarity Matrix Graph:
After creating a vector for each sentence, we have to check the similarity between the sentences; this means we will be removing duplicate and similar-meaning sentences, because the final summary report should be smaller than the original dataset. To achieve this, the cosine similarity shown in Eq. 2 is used. Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them. Before this, a zero matrix of size "n × n" should be created, where "n" is the number of sentences. Consider two sentences S1 and S2; the cosine similarity between these two is given by Eq. 2.

cosine_similarity_kannada(S1, S2) = (S1 · S2) / (||S1|| × ||S2||)    (2)

Example: Consider two sentences S1 and S2. The corresponding row of the similarity matrix for S1 is:
S1: [0   0.35831465]   (columns: S1, S2)
S1 to S1 similarity is 0, but between S1 and S2 the value is 0.35831465; in this way, the similarity matrix is generated for all the sentences. The maximum value indicates high similarity between the sentences. Once the sentence vectors are ready, the next step is applying the text rank algorithm. This algorithm works on graphs, so there is a need to create a similarity graph as shown in Fig. 2 (a sample of 11 sentences is considered). In the proposed work, sentences are considered as nodes and similarity scores between sentences as edges. The graph below shows that, for a given text, the summary is obtained over thirteen sentences, and it indicates how one sentence is linked to another. If sentence S1 has some words present in S2, then there will be an edge between S1 and S2.
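Continuing the sketch, the similarity matrix and graph described above can be built roughly as follows. Here `sentence_vectors` is the list of 300-dimensional vectors from Eq. 1, and the use of scikit-learn and networkx is an implementation choice of this illustration, not something stated in the paper.

import numpy as np
import networkx as nx
from sklearn.metrics.pairwise import cosine_similarity

n = len(sentence_vectors)
sim_matrix = np.zeros((n, n))                 # n x n zero matrix, as described above
for i in range(n):
    for j in range(n):
        if i != j:                            # diagonal stays 0
            sim_matrix[i, j] = cosine_similarity(
                sentence_vectors[i].reshape(1, -1),
                sentence_vectors[j].reshape(1, -1))[0, 0]    # Eq. (2)

# Sentences become nodes; similarity scores become weighted edges.
graph = nx.from_numpy_array(sim_matrix)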

Fig. 2 Similarity graph of sentences

3.2 Applying Text Rank Algorithm:

Page rank is one of the popular algorithms used to identify important pages. It prioritizes a webpage based on how many links point to it. Similarly, in our work it produces the top important sentences. Initially, the text rank of every sentence is equal to 1/11, where 11 is the total number of nodes. In the next iteration, for a given node 0, it identifies how many nodes are pointing to it. Consider that nodes 2 and 3 are pointing to node 0. Also, assume the total number of outgoing links is 2 for node 2 and 3 for node 3; then the rank of node 0 is calculated as shown in Eq. 3.
2 = 2 and 3 = 3, then rank of node 0 is calculated as shown in Eq. 3.
   
Rank of Node 0 = (1/11)/2 + (1/11)/3    (3)

Numerator = text rank of the previous iteration.
Denominator = total number of outgoing links of nodes 2 and 3, respectively.
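A hedged sketch of this ranking step is shown below; it uses networkx's built-in pagerank for brevity rather than re-implementing the iteration of Eq. 3, which starts from an equal score of 1/n per node. The `graph` object comes from the sketch above, and `original_sentences` and the summary length are illustrative assumptions.

import networkx as nx

scores = nx.pagerank(graph)                       # rank score per sentence node
ranked = sorted(scores, key=scores.get, reverse=True)

top_k = 3                                         # summary length requested by the user
summary = " ".join(original_sentences[i] for i in sorted(ranked[:top_k]))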
Collecting Summary:
After applying the algorithm, the entire rank score array is sorted in descending order, because higher-ranked sentences may have similar meanings when the differences in their weights are very small. The summary can be presented as short, average, or long based on the percentage specified in the model. The result of the model after uploading a Kannada document is shown in Fig. 3.
ROUGE 1, 2, and L metrics are used for the evaluation. These are used to calculate the F1 score, precision, and recall based on the system-generated summary and a human-generated summary. The model is customized to produce the summary based on the user requirement; for example, a two-sentence summary can be produced out of four sentences. The ROUGE metric is applied on variable sets of statements, and the results are tabulated in the following tables.

ROUGE 1: This metric considers the unigram feature. Table 2 shows the results of the same.
ROUGE 2: It calculates results in the same way as ROUGE 1, but based on bi-grams of words. The results produced by this metric are shown in Table 3.
ROUGE-L: This considers the longest common subsequence between two sequences of text; it mainly indicates how long the similar portion between two statements is. If it is longer, the two statements are more similar. Table 4 shows the Rouge-L metrics of our model.
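One possible way to compute these ROUGE values in Python is with the rouge-score package, as sketched below. The package choice and the `system_summary`/`human_summary` strings are assumptions, since the paper does not name its evaluation tooling.

from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=False)
results = scorer.score(human_summary, system_summary)   # (reference, candidate)

for metric, score in results.items():
    # Each score holds precision, recall, and F1 (fmeasure), as reported in Tables 2-4.
    print(metric, round(score.precision, 2), round(score.recall, 2), round(score.fmeasure, 2))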
Comparisons with other techniques:
From the survey we have carried out, the performance of summarization techniques used by other models is tabulated in Table 5.
In the proposed model, accuracy, precision, and recall using ROUGE metrics have been calculated, while others have evaluated differently. ROUGE 1 shows better values compared to ROUGE 2. We obtained average F1 score, recall, and precision values of 65%, 70%, and 64%, respectively, which is better compared to the other techniques listed in Table 5. The remaining limitation is due to the lack of data processing tools for the Kannada language. If the model is trained with a good set of processing tools and a huge and varied dataset, the results might improve. A sample snapshot of the summary is shown in Fig. 3.

Table 2 ROUGE 1 results

No of sentences in the input | Total no of words in the system summary | Total no of words in the manual summary | F1 score % | Recall % | Precision %
11 | 56 | 45 | 63 | 67 | 60
20 | 74 | 81 | 66 | 77 | 66
30 | 111 | 105 | 63 | 64 | 63
40 | 156 | 153 | 62 | 72 | 63
53 | 301 | 162 | 68 | 69 | 68
Average | | | 65 | 70 | 64

Table 3 ROUGE 2 results

No of sentences in the input | Total no of words in the system summary | Total no of words in the manual summary | F1 score % | Recall % | Precision %
11 | 56 | 45 | 46 | 49 | 43
20 | 74 | 81 | 71 | 52 | 51
30 | 111 | 105 | 46 | 67 | 46
40 | 156 | 153 | 45 | 45 | 45
50 | 301 | 162 | 63 | 53 | 52
Average | | | 54 | 53 | 47

Table 4 Rouge-L results

No of sentences in the input | Total no of words in the system summary | Total no of words in the manual summary | F1 score % | Recall % | Precision %
11 | 56 | 45 | 62 | 65 | 59
20 | 74 | 81 | 66 | 67 | 65
30 | 111 | 105 | 64 | 64 | 63
40 | 156 | 153 | 62 | 63 | 63
50 | 301 | 162 | 69 | 70 | 68
Average | | | 64 | 66 | 63

Table 5 State-of-the-art comparisons

References | Technique used | Language | Accuracy in % | ROUGE score F1 in %
[4] | Fusion technique | Kannada | | 78
[6] | Text ranking using named entities | Telugu | | 87.3
[9] | RNN with attention | Telugu | | 29
[10] | Bi-LSTM using POS features | Tamil | | 37.67
[12] | Text ranking with NLP libraries | Malayalam | 80 |
[14] | Text ranking algorithm using different encoding techniques | Malayalam | | 57.4
[15] | Graph-based model | | | 60

4 Conclusion and Future Works

Extractive text summarization gathers the important sentences from a given document. We have used a text rank graph-based algorithm which is similar to the Google Page rank algorithm; here, the sentences play the role of the web pages. A GUI application is also developed to upload a Kannada document and to take as input from the user how many lines he wants to see in the summary. Once the original and summarized documents are compared, it is found that the summarized document contains the important lines without losing the meaning of the original document. The result is satisfactory. The proposed work considers documents with an average size of 5000 lines drawn from different documents. Since we are using word embeddings of 300 dimensions for constructing the vector of a single word, it would instead be good to identify the correct POS tag of that word. The similarity calculation is based only on the cosine function; other techniques can be used to test the performance of the proposed model. These limitations can be considered as our future work.

Fig. 3 Output snapshot of the proposed work

References

1. Mamidala KK, Sanampudi SK (2021) Text summarization for Indian languages: a survey. Int J Adv Res Eng Technol (IJARET) 12(1):530–538. https://doi.org/10.34218/IJARET.12.1.2021.049
2. Anusha BS, Ramesh D, Uma D, Lalithnarayan C (2019) Multi-classification and automatic text
summarization of Kannada news articles. Int J Comput Appl 181(38):24–29. ISSN: 0975-8887
3. Shilpa GV, Shashi Kumar DR (2019) Abs-Sum-Kan an abstractive text summarization tech-
nique for an India regional language by induction of tagging rules. Int J Recent Technol Eng
(IJRTE), ISSN: 2277-3878
4. Swamy A, Srinath S (2019) Fused extractive summarization approach for Kannada text
documents. Int J Adv Sci Technol 28(18):565–580. ISSN: 2005-4238
5. Sunagar P, Kanavalli A, Nayak SS, Mahan SR, Prasad S, Prasad S (2020) News topic classi-
fication using machine learning techniques. In: International conference on communication,
computing and electronics systems: proceedings of ICCCES 2020, pp 461–474
6. Dhawale AD, Kulkarni SB, Kumbhakarna VM (2020) Automatic pre-processing of Marathi
text for summarization. Int J Eng Adv Technol (IJEAT) 10(1). ISSN: 2249-8958
7. Mamidala KK, Sanampudi SK (2021) A heuristic approach for Telugu text summarization
with improved sentence ranking. Turkish J Comput Math Educ 12(3). ISSN: 4238-4243
8. Mohan Bharath B, Aravindh Gowtham B, Akhil M (2021) Neural abstractive text summarizer
for Telugu language. Comput Lang. arXiv:2101.07120

9. Naga Sudha D, Madhavee Latha Y (2020) Multi-document abstractive text summarization


through semantic similarity matrix for Telugu language. Int J Adv Sci Technol 29(1). ISSN:
2005-4238
10. Zhao F, Quan B, Yang J, Chen J, Zhang Y, Wang X (2019) Document summarization using
word and part-of-speech based on attention mechanism. J Phys Conf Ser 1168(3):032008, 1–9.
https://doi.org/10.1088/1742-6596/1168/3/032008
11. Bharti SK, Babu KS, Jena SK (2017) Automatic keyword extraction for text summarization: a survey. arXiv:1704.03242, Corpus ID: 23384543
12. Sarika M, Rajeswari KC, Lavanya AP (2021) Comparative analysis of Tamil and English news
text summarization using text rank algorithm. Turkish J Comput Math Educ 12(9):2385–2391.
ISSN: 1309-4653
13. Priyadharshan T, Sumathipala S (2018) Text summarization for Tamil online sports news using
NLP. In: 3rd international conference on information technology research (ICITR), pp 1–5.
https://doi.org/10.1109/ICITR.2018.8736154
14. Manju K, David Peter S, Idicula SM (2021) A framework for generating extractive summary
from multiple Malayalam documents. Information 12(1):1–16. https://doi.org/10.3390/info12
010041
15. Kanitha DK, Mubarak DMN, Shanavas SA (2018) Malayalam text summarization using graph
based method. Int J Comput Sci Inf Technol 9(2):40–44. ISSN: 0975-9646
Destructive Outcomes of Digitalization
(Credit Card), a Machine Learning
Analysis

Yashashree Patel, Panth Shah, Mohammed Husain Bohara, and Amit Nayak

Abstract As the phase of digitization reaches our day-to-day lives, things are easily available and accessible through the computer, which is a much easier and faster way to carry out transactions. The pandemic also played a huge role in the growth of credit card fraud activities, and that has led to a dramatic increase in credit card fraud. As a result, fraud detection should include surveillance of the spending customer's behavior in order to determine, prevent, and detect unwanted activity. For both online and in-person buying, credit cards are the most convenient way of payment. Fraud detection is concerned not only with capturing fraudulent activities, but also with discovering them as early as possible, since this kind of fraud costs people millions of dollars. Machine learning algorithms have proven to be extremely useful in detecting fraud involving smart cards. Because of the uneven nature of the data, regular classification algorithms are ineffective in detecting credit card fraud. The isolation forest algorithm is used in the proposed scheme, and the local outlier factor (Tripathi et al. in J Pure Appl Math 118:229–234, [1]) is used to recognize fraudulent transactions and their accuracy.

Keywords Machine learning · Imbalanced data · Smart card · Fraud detection ·


Isolation forest model · Local outlier factor

Y. Patel (B)
Department of Computer Science and Engineering, Devang Patel Institute of Advance
Technology and Research (DEPSTAR), CHARUSAT, Charotar University of Science and
Technology (CHARUSAT), CHARUSAT Campus, Changa 388421, India
P. Shah · A. Nayak
Department of Information Technology, Devang Patel Institute of Advance Technology and
Research (DEPSTAR), CHARUSAT, Charotar University of Science and Technology
(CHARUSAT), CHARUSAT Campus, Changa 388421, India
e-mail: amitnayak.it@charusat.ac.in
M. H. Bohara
Department of Computer Engineering, Devang Patel Institute of Advance Technology and
Research (DEPSTAR), CHARUSAT, Charotar University of Science and Technology
(CHARUSAT), CHARUSAT Campus, Changa 388421, India
e-mail: mohammedbohara.ce@charusat.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 697
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_52

1 Introduction

Deceit involving a smart card that occurs as a consequence of the card owner losing the card, the card being stolen by fraudsters, or techniques such as phishing, skimming, identity theft, and so on is referred to as credit card fraud. Financial fraud of this nature has a significant influence on a country's commercial, organizational, and government sectors. The rate of fraudulent transactions has increased in today's world of cyberspace technology, where credit card purchases have become the most convenient means of transaction, whether online or offline. As previously stated, there are two sorts of credit card fraud transactions that might occur. The first type is a situation in which the cardholder's information is leaked. The second category is credit card theft that occurs when a lost card falls into the hands of fraudsters. Credit card fraud was always thought of as a figure clad in all black snatching your card from your wallet, but that was before the Internet erupted into society. Most scammers nowadays do not even require your physical card. Before applying any machine learning techniques, successful preprocessing of the dataset is needed. This research takes into account the imbalanced nature of credit card data and uses the isolation forest and the local outlier factor. The most notable advantage of employing this approach for recognition of deceit in smart cards is that it can operate with large amounts of training data with ease. Electronic shopping has become a vital and necessary part of our modern life. Credit card companies must now be able to detect bogus credit card purchases in order to prevent customers from being charged for things they did not buy. As the number of transactions grows, the number of dishonest transactions grows as well [1]. Such difficulties can be solved using machine learning and related methods. The purpose of this project is to demonstrate how to use machine learning to model a data collection. The model is then used to determine whether or not a new transaction is fraudulent. Our objective is to detect all fraudulent transactions while minimizing the number of false fraud classifications. The major focus was on data analysis and preprocessing, as well as the application of a number of anomaly identification methods, such as the isolation forest approach and the local outlier factor.

2 Literature Review

Fraud of smart cards has been the subject of a significant amount of research. The
methods created can be divided into two categories, as described below.

2.1 Techniques of Machine Learning

The technique presented in this work uses the most up-to-date machine learning
algorithms to discover outliers, or unusual behaviors, in credit card fraud detection.
When viewed in depth on a larger scale with real-life elements, the entire architecture
diagram can be depicted as follows. To begin, we obtained our dataset from Kaggle,
a data analysis and dataset-sharing Web site [2]. This dataset has thirty-one columns,
with twenty-eight of them labeled as V1–V28 to protect personal information. The other columns represent the time and the amount. The time is the interval between the first transaction and each subsequent transaction, and the amount is the total sum of money exchanged in the transaction. Class 0 represents a genuine transaction,
whereas class 1 represents a fraudulent transaction. We use various graphs to look for
abnormalities in the dataset and to visualize it. According to our research, the number
of fraudulent transactions is far smaller than the number of actual transactions [3].
According to the data, the least number of transactions was performed at night, while
the most were made during the day. Only a few of the columns correlate closely with the class variable, and the majority are trivial. The dataset has been formatted and examined at this stage. To make sure that the evaluation is fair, the class column is excluded. A sequence of algorithms from different modules then processes the data. The isolation forest algorithm and the local outlier factor are applied to this data after it has been fitted into a model [4].
These algorithms are part of the sklearn library; classification, regression, clustering, and outlier detection are included in the sklearn packages [5]. A single data point that departs considerably from the rest of the data points is considered anomalous; one example is detecting credit card fraud based on the amount spent. If an object is anomalous only in a specific context, it is referred to as a contextual anomaly. If a group of related objects can be observed as an anomaly when compared to other objects, this is referred to as a collective anomaly; a set of items, not a single entity, is anomalous in this case.
Anomaly detection can be done using various methods, including supervised anomaly detection: a setup in which training and test datasets are labeled, allowing a basic classifier to be trained and applied [6]. This case is similar to conventional pattern recognition, except that the classes are usually highly unbalanced. Not all classification methods are appropriate for this task; for example, some decision trees struggle to deal with unbalanced data, while an artificial neural network (ANN) or support vector machine (SVM) can perform better. This configuration, however, requires that we are aware of all irregularities and have correctly labeled data. Anomalies are not always known ahead of time, or they may appear as novelties discovered during the research process. Semi-supervised anomaly detection is the term used for this kind of detection [4, 7], where only partial knowledge is available at training time. This setup often employs training and evaluation datasets, with the training data consisting solely of normal data free of anomalies. According to the theory, a model of the normal class has already been learned, and discrepancies can be detected as deviations from it. This technique of classification is referred to as "one-class" classification; well-known methods are one-class SVMs and autoencoders. In general, any density estimation method, such as Gaussian mixture approaches or kernel density estimation, can be used to model the probability density function of the normal classes.
Unsupervised anomaly detection [8, 9] is a setup in which we do not know what is normal and what is not in the data. It is the most flexible configuration and does not need labels. Furthermore, there is no distinction between a training dataset and a test dataset. By definition, unsupervised anomaly detection algorithms score data based on intrinsic properties of the dataset; distances or densities are frequently used to distinguish what is normal and what is an outlier.

2.2 Isolation Forest Method

The isolation forest is a self-contained anomaly detection method that works by isolating abnormalities [10, 11]. Rather than aiming to develop a model of normal examples, it isolates anomalous points in the collection directly. It is a quick, memory-friendly algorithm that locates anomalies in data by isolating outliers. The isolation forest is an unsupervised machine learning algorithm [2] based on the decision tree algorithm. In a decision tree, the procedure for predicting the class of a given sample begins at the root node, compares the attribute value at each node and moves to the corresponding sub-node, and continues; the loop is repeated until a leaf node of the tree is reached. The isolation forest operates on the same recursive principle. This method iteratively creates partitions on the dataset by picking a feature at random and then selecting a split value for that feature at random. In comparison with so-called normal data points in the collection, anomalies require fewer random partitions to be isolated. As a consequence, the anomalies will be the points with short path lengths.
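A minimal scikit-learn sketch of this step is shown below. The contamination value, column names, and file path follow the Kaggle dataset described earlier and are assumptions of this illustration, not parameters reported by the paper.

import pandas as pd
from sklearn.ensemble import IsolationForest

data = pd.read_csv("creditcard.csv")
X = data.drop(columns=["Class"])                  # class column excluded from the features
y = data["Class"]

iso_forest = IsolationForest(n_estimators=100, contamination=0.0017, random_state=42)
pred = iso_forest.fit_predict(X)                  # -1 = anomaly, 1 = normal
pred = (pred == -1).astype(int)                   # map to 1 = fraud, 0 = genuine
print("Transactions flagged as fraud:", pred.sum())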

2.3 Local Outlier Factor

The local outlier factor is an unsupervised outlier detection tool. It computes an anomaly score for each sample by measuring the local density deviation of a given sample with respect to its neighbors. The anomaly score is decided by how distinct the sample is from its surrounding neighborhood. The local outlier factor generates an anomaly score that reflects which data points are regarded as outliers. This is accomplished by estimating a data point's local density deviation from nearby data points. The distances between data points and their close neighbors (k-nearest neighbors) are used to determine local density. As a consequence, the local density of each data point may be determined.

Fig. 1 k-distances between different neighbors in a point cluster

By comparing data points, we can see which have comparable densities and which have lower densities than their neighbors; the points with the lowest densities are considered outliers [12]. To begin, the k-distances, i.e., the distances between points, are measured for each point in order to determine its k-nearest neighbors. For example, a point's second nearest neighbor is the second-closest point to it. The k-distances between different neighbors in a point cluster are shown in Fig. 1.
The reachability distance is calculated using this distance. It is the maximum of the k-distance of the second point and the direct distance between the two points. Consider the equation below, in which B represents the center point and A represents a position close to it [13, 14].

reachability_distance_k(A, B) = max{k-distance(B), d(A, B)}    (1)

To estimate a point's local reachability density (LRD), the reachability distances to all of the point's k-nearest neighbors are measured. The local reachability density is calculated by taking the inverse of the average of the reachability distances of all the k-nearest neighboring points around a point. The larger these distances, the lower the density; hence, the equation is inverted [15].

lrd_k(A) := 1 / ( ( Σ_{B ∈ N_k(A)} reachability_distance_k(A, B) ) / |N_k(A)| )    (2)

The local outlier factor (LOF) of a point is then obtained by comparing the average of the lrds of the point's k nearest neighbors with the lrd of the point itself. The LOF equation is as follows:
LOF_k(A) := ( Σ_{B ∈ N_k(A)} lrd_k(B) / lrd_k(A) ) / |N_k(A)| = ( Σ_{B ∈ N_k(A)} lrd_k(B) ) / ( |N_k(A)| · lrd_k(A) )    (3)

Outliers might be difficult to identify at times: a point that is a short distance from a dense cluster can be an outlier, whereas a point that is a greater distance from a widely spread cluster can still be an inlier. Because LOF scores points relative to their local region, this issue is no longer a problem. The LOF approach may be used to tackle outlier identification problems in a number of fields, including geographic data, video streams, and so on. A different dissimilarity function can likewise be used with LOF, and it also outperforms a variety of other anomaly detection approaches [16, 17].
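A corresponding scikit-learn sketch for LOF is given below, reusing the feature matrix X from the isolation forest sketch above; the neighborhood size and contamination value are assumptions.

from sklearn.neighbors import LocalOutlierFactor

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.0017)
lof_pred = lof.fit_predict(X)                     # -1 = outlier, 1 = inlier
lof_pred = (lof_pred == -1).astype(int)           # 1 = suspected fraud

# The (negated) LOF score of Eq. (3) for each sample is exposed after fitting.
lof_scores = -lof.negative_outlier_factor_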

3 Procedure

This fraudulent transaction detection activity is completed in three stages. The loading of the dataset is the first phase, often known as the data discovery phase. Visual exploration is used in data exploration and data analysis to comprehend what is in the dataset and what its features are. We used a dataset from the Kaggle Web site that comprises a number of parameters that were dimensionally reduced using the PCA approach, including amount, class, time, and others. The dataset is examined and represented in order to generate descriptive statistics for the supplied series object that summarize the dataset's central tendency, dispersion, and shape. There are no null values in any of the columns. Using this representation, a histogram is constructed to display the information. The second phase is data preprocessing. It reloads the dataset and removes all of the null and garbage values to improve its efficiency. During this process, we must divide the dataset into training and testing phases. Here, we primarily focus on the training process, labeling genuine transactions as class 0 and fraudulent transactions as class 1. During the dataset training process, fraudulent and valid entries are delivered at random to improve the data quality; as a result, more realistic data is obtained. The third and last process is data classification. It is simply a matter of providing the algorithm a training dataset of pre-labeled classes to learn from. The model is then applied to a new dataset in which the classes are not specified, and the model uses the training set's learning to predict the class to which each record belongs.

4 Experimental Setup

Many classification tasks utilize fundamental assessment metrics like accuracy to evaluate performance among models, since it is a straightforward measure to use and generalizes to more than just binary labels. However, accuracy has one major weakness: it is a deceptive metric on skewed datasets like ours, since it implicitly assumes that each class has an equal representation of instances. It does not generate precise results, so accuracy alone is not a good predictor of performance in our circumstance. We will need some more information to determine whether a transaction is fraudulent or not. The evaluation criteria therefore include precision, recall, F1-score, and support; the actual and predicted classes are at the center of all of these criteria. Outlier detection, often known as anomaly detection, is one of the trendiest topics in data mining. Local outliers are ignored by iForest, which is only receptive to global outliers, while LOF is good at finding local outliers but has a high time complexity. To address the drawbacks of iForest and LOF, a two-layer progressive ensemble technique for outlier detection is proposed. It is simple to use and can accurately find outliers in large datasets. This method searches the dataset rapidly, prunes the data that appears to be regular, and generates an outlier candidate collection using a low-complexity variant of iForest.
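The two-layer idea sketched above could be prototyped as follows; the pruning threshold and candidate-set size are illustrative assumptions and not values from the cited work. X is the feature matrix from the earlier sketches.

import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Layer 1: a fast isolation forest pass scores every record and prunes the
# points that look clearly normal, keeping only an outlier candidate set.
iso = IsolationForest(n_estimators=50, random_state=0).fit(X)
anomaly_score = -iso.score_samples(X)                 # higher = more anomalous
candidates = np.argsort(anomaly_score)[-1000:]        # most suspicious records only

# Layer 2: LOF refines the candidate set, where its higher cost is affordable.
lof = LocalOutlierFactor(n_neighbors=20)
lof.fit_predict(X.iloc[candidates])
refined_scores = -lof.negative_outlier_factor_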

5 Result

Here, we calculate the mean, count, max, and other information of the data (Fig. 2).
Histograms are used in the project to help distinguish between fraudulent and
legitimate transactions. The Matplotlib package can be used for this. We can also
change the plot’s size to fit our needs (Fig. 3).
The output above shows that a bar chart is created for every attribute within the dataset. Histograms group the data into bins and are the quickest way to get an idea of the distribution of every attribute in the dataset.
The correlation matrix is a heat map that is used to see whether there is a relationship between various parameters and variables in our dataset (Fig. 4).
The above graph was generated with pyplot using a seaborn (sns) heat map. It gives our basic correlation matrix a visual form and makes analysis easier. On both the X and Y axes, with values ranging from −0.75 to +0.50, all 31 parameters (V1–V28, class, time, and amount) are present.
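The exploration shown in Figs. 2-4 can be reproduced roughly as follows; `data` is the DataFrame loaded in the earlier sketch, and the figure size and color range are assumptions.

import matplotlib.pyplot as plt
import seaborn as sns

print(data.describe())            # mean, count, max, min, etc. of each column (Fig. 2)

data.hist(figsize=(20, 20))       # one histogram per attribute (Fig. 3)
plt.show()

corr = data.corr()                # correlation matrix heat map (Fig. 4)
sns.heatmap(corr, vmax=0.8, square=True)
plt.show()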

6 Conclusion and Future Work

For successfully identifying credit card fraud, this study provides local outlier and
isolation forest techniques. This study looked on the unbalanced presence of credit
card data. The results of the trials revealed that the suggested model is effective in

Fig. 2 Mean, count, max, min and other information of each of the predictor columns

Fig. 3 Histogram of each of the predictor columns

addressing unbalanced situations in credit card fraud detection. Due to the growing
usage of credit cards for transactions, credit card fraud is on the rise. This study
explores credit card fraud detection using machine learning methods such as local
outlier factor and isolation forest using a publicly available dataset. The Python
programming language was used to construct the proposed framework. The suggested
model has not been verified for high-dimensional datasets. Cleaning techniques
include sampling or feature selection algorithms, and the suggested model may
be enhanced by combining it with additional data to be used in high-dimensional
datasets. This study does not address the unbalanced nature of detection methods or
their influence on results. Another area of potential research is a thorough examina-
tion of the problem in relation to the computational efficiency of smart card fraud
detection techniques. The code outputs the number of false positives it discovered
and compares it to the real figures. This is how the algorithm’s precision and accuracy
are calculated. We only used 10% of the total dataset for speedier testing. Finally, the
entire dataset is utilized, and all reports are generated. These results, as well as the

Fig. 4 Correlation matrix of predictor columns

classification report for each algorithm, are presented in the output, where class 0 indi-
cates that the transaction was determined to be legitimate and class 1 indicates that
the transaction was determined to be fraudulent. To rule out false positives, this result
was compared to the class values. This article reviewed recent research in the subject
and identified the most common types of fraud, as well as methods for detecting
them. The technique, pseudocode, explanation, and experimentation results are all
included in this paper, as well as a full description of how machine learning might
be applied to improve fraud detection outcomes. Since the full dataset is made up of
just two days’ worth of transaction information, it is just a small portion of the data
that could be made accessible if this project were to be used commercially. Since the
software is based on machine learning algorithms, it can only get more efficient over
time as more data is fed into it. Although we did not reach our goal of 100% fraud
detection accuracy, we did develop a system that can get close with enough time and
data. There is room for improvement in this project, as with any other of its kind.
Several algorithms may be integrated as modules in this project, and their outputs
can be merged to increase the accuracy of the final result. To improve these models
even more, new algorithms may be included using blockchain technology [11]. The
performance of these algorithms, on the other hand, must be in the same format as
the others. Once that requirement is fulfilled, the modules are straightforward to add,
as demonstrated in the code. As a result, the project gets a lot of modularity and
flexibility. There is more room for modification in the dataset. As previously shown,
the accuracy of the algorithms improves as the dataset size grows. As a result, more

data will undoubtedly improve the model’s accuracy in detecting frauds and reduce
the number of false positives.

Acknowledgements We would like to take this opportunity to express our heartfelt appreciation
and warm regards to our advisor for their outstanding guidance, monitoring, and relentless encour-
agement during the thesis. The blessings, assistance, and encouragement that they provide from
time to time will take us a long way in the life path that we are about to embark on. Every successful
project is built on the continuous motivation, goodwill, and support of those who surround it. We
want to take this moment to thank everyone who has helped the project flourish by donating their
time, full support, and collaboration. Dr. Amit Ganatra, Head of Department; Dr. Amit Nayak, Head
of Department and Project Guide; and Prof. Mohammed Bohara all deserve our thanks for their help
during the study and development phase. It is because of them that we have been motivated to work
hard and implement new technologies. They created a favorable atmosphere for us, and without
them, we would not have been able to achieve our target.

References

1. Tripathi D, Lone T, Sharma Y, Dwivedi S (2018) Credit card fraud detection using local outlier
factor. Int J Pure Appl Math 118(7):229–234
2. Banerjee R, Bourla G, Chen S, Purohit S, Battipaglia J (2018) Comparative analysis of machine
learning algorithms through credit card fraud detection, pp 1–10
3. Machine Learning Group, Credit card fraud detection. Kaggle, 23-Mar-2018. [Online].
Available: https://www.kaggle.com/mlgulb/creditcardfraud. Accessed 06 May 2019
4. Li Z et al (2021) A hybrid method with dynamic weighted entropy for handling the problem
of class imbalance with overlap in credit card fraud detection. Expert Syst Appl 175:114750
5. Desai J, Bohara MH (2021) “Farmer Connect”—a step towards enabling machine learning
based agriculture 4.0 efficiently. In: 2021 6th international conference on communication and
electronics systems (ICCES). IEEE
6. Isolation forests for anomaly detection improve fraud detection (2019) Blog Total Fraud
Protection, 2019. [Online]. Available
7. Waleed GT, Mawlood AT, jabber Abdulhussien A (2020) Credit card anomaly detection using
improved deep autoencoder algorithm. J College Educ 1
8. Joshi A, Soni S, Jain V, An experimental study using unsupervised machine learning techniques
for credit card fraud detection
9. Rai AK, Dwivedi RK (2020) Fraud detection in credit card data using unsupervised machine
learning based scheme. In: 2020 international conference on electronics and sustainable
communication systems (ICESC). IEEE
10. Beigi S, Amin Naseri MR (2020) Credit card fraud detection using data mining and statistical
methods. J AI Data Mining 8.2:149–160
11. Bohara MH et al, Adversarial artificial intelligence assistance for secure 5G-enabled IoT.
Blockchain for 5G-Enabled IoT: 323
12. Anand H, Gautam R, Chaudhry R (2021) Credit card fraud detection using machine learning.
No. 5616. EasyChair
13. Revathi N (2021) Credit card fraud detection using unsupervised technique in time series data.
Turkish J Comput Math Educ (TURCOMAT) 12(13):3082–3088
14. Hussein AS et al (2021) Credit card fraud detection using fuzzy rough nearest neighbor and
sequential minimal optimization with logistic regression. Int J Interact Mobile Technol 15(5)

15. Lim CP, Seera M, Nandi AK, Randhawa K, Loo CK, Credit card fraud detection using adaboost
and majority voting. IEEE Access 6
16. Shukur HA (2019) Credit card fraud detection using machine learning methodologies.
8(3):257–260
17. “Local outlier factor”, En.wikipedia.org, 2019. [Online]. Available: https://en.wikipedia.org/
wiki/Local_outlier_factor. Accessed 06 May 2019
Impact of Blockchain Technology
in the Healthcare Systems

Garima Anand, Ashwin Prajeeth, Binav Gautam, Rahul, and Monika

Abstract The healthcare industry is one of the most important industries in the
world which is in dire need of a restructuring process because of its poor and outdated
techniques of data management. Healthcare system has adopted a centralized envi-
ronment and deals with a lot of intermediaries which makes it prone to issues of
single point of failure, lack of traceability of transactions, and privacy issues such as
data leakage. Blockchain is a relatively new technology which is able to tackle the
obsolete methods and practices existing in the healthcare industry. In this chapter, we
analyzed the applications of blockchains in the healthcare industry which can solve
the issues prevalent in the healthcare industry. The aim of this chapter is to reveal
the potential benefits that comes from using blockchain technology in the healthcare
industry and identify the various challenges that this technology has.

Keywords Blockchain · Decentralization · Scalability · Security · Health care

G. Anand
Department of Computational Sciences, CHRIST (Deemed to be University), Bangalore, India
e-mail: garima.anand@christuniversity.in
A. Prajeeth · B. Gautam
Department of Computer Science and Engineering, Delhi Technological University, Delhi
110042, India
e-mail: ashwinkurumkulamprajeeth_2k19co092@dtu.ac.in
B. Gautam
e-mail: binavgautam_2k19co103@dtu.ac.in
Rahul (B)
Department of Software Engineering, Delhi Technological University, Delhi 110042, India
e-mail: rahul@dtu.ac.in
Monika
Department of Computer Science, Shaheed Rajguru College of Applied Sciences for Women,
University of Delhi, Delhi 110096, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 709
D. J. Hemanth et al. (eds.), Intelligent Data Communication Technologies and Internet
of Things, Lecture Notes on Data Engineering and Communications Technologies 101,
https://doi.org/10.1007/978-981-16-7610-9_53

1 Introduction

Blockchain is a technology that was first brought to light in 2008 through a white
paper published by Satoshi Nakamoto, which introduced the peer-to-peer electronic
cash system called bitcoin. Blockchain has disrupted every industry in the past decade
and is considered to be one of the pathbreaking technologies in today's world.
This is because it offers a platform that is based on trust and transparency in a
decentralized environment [1]. Bitcoin was created as an alternative to fiat currencies
after the financial crisis of 2008 [2]. The cause of the crisis was that most banks
maintained financial records in a centralized manner. There was no one to look
after the processes, and most of the faults of the system were overlooked [3]. This
made the arrival of bitcoin huge as it utilizes the technical aspects of blockchain, a
decentralized currency maintaining an immutable ledger [4].
Blockchain is a very powerful technology as it is an ‘unbreakable’ chain of data
entries that allows its nodes to conduct secure and open transactions. Nodes are
nothing but the computers present in the blockchain network. Transactions that are
done can be viewed by every node present because of its transparent and distributed
structure. Blockchain does not rely on third parties to enable transactions between
entities owing to its decentralized property. It uses miners, who validate transac-
tions in a decentralized way as opposed to third parties. This is done through a
distributed consensus which describes an algorithm used to validate information [5].
The management of the transactions done in healthcare industries is often complex
[6]. With the healthcare system being centralized, it is susceptible to many attacks
from malicious users. Blockchain being decentralized ledger can solve this problem
effectively (Fig. 1).
The code upon which bitcoin was developed was released as open source which
helped researchers to develop their own blockchain-based applications and proto-
types. This made researchers aware of the vast potential blockchain holds and started
implementing this into non-financial industries. The properties of blockchain such
as transparency and being a distributed network make it extremely useful for safe
exchange of information. This awareness increased the popularity of the technology,
and more funds were invested in the research and development for use in applications
like health care, operations, supply chain, and many more [1].
Few of the many problems faced by healthcare researchers and practitioners today
are dealing with fragmented data, lack of communication, and unreliable supply chain
of workflow tools. Practitioners are either afraid to share data as they are unsure if
the patient’s health and identification information safekeeping regulations prevent
such sharing or if there are any financial consequences associated with sharing data
[7]. Interoperability is another pressing issue in the healthcare industry [8]. Hospitals
store data in a centralized structure which makes it vulnerable to problems such as
fragmentation of health data, single point of failure, lack of quantity and quality of
data for medical research [9]. Many records are fragmented due to the lack of the
ability to share the data to other sources which is due to the centralized nature of the
health recording system.

Fig. 1 P2P network in blockchain

The many characteristics of blockchain technology can be used efficiently to


improve various functional aspects in the field of health care. Blockchain is able to
share data efficiently among the different healthcare stakeholders while preserving
privacy and security [10]. The remainder of this chapter is divided into the following
sections: Sect. 2 talks about the issues faced in the healthcare industry. Section 3
provides a background of the blockchain technology, briefly discussing the back-
ground of the technology and its characteristics that makes it very useful in health
care. Section 4 explores the application of the blockchain technology in health care,
and how blockchain technology can help digitize health care and also be integrated
along with various other technologies. Section 5 summarizes the concepts that we
have explored in this chapter.

2 Issues Faced by Healthcare Industry

The structure followed by most medical institutions today is outdated and unreli-
able. With the increase in counterfeit products and the significant lack in communi-
cation between other healthcare institutions, the structure of the industry has been
declining. In this section, we highlight a few significant problems, such as the lack
of interoperability, supply chain integrity, and obsolete practices followed.

2.1 Interoperability

Healthcare interoperability is the property of various healthcare systems, clinics,


and third-party service providers to communicate and exchange information in a
specific manner [11]. Interoperability is crucial for enriching effective care given to
individuals and communities. It can improve or maybe even extend someone’s life
[11]. It allows other institutional organizations to view a patient’s medical records
(with the permission of the patient) in a safe and scalable way even if the provider is
placed in a far-off location and there is no trust between the two parties [7].
Interoperability within the hospital helps to increase the overall efficiency and the
effective collaborative treatment for the patients. For example, whenever a family
doctor suspects a complex fracture for any of his patients, he can send the patient
to a radiology practice and easily share the patient’s records to them. Likewise, the
radiologist’s team can send over the results received from the imaging department,
as soon as they are done with the scans. The physician receives the information
without any hassle, even if the two practices use different software systems [12].
Another example would be a patient who has to undergo an important operation in a
different hospital. It would be crucial for the physicians and the anesthetist to access
the patient’s medical records to know of any potential drug interactions. Likewise,
the patient’s primary health care should be notified and updated about the condition
of the patient and if any complications arose from the operation.
Medical data sharing today requires the patient to do most of the work as they are
responsible to share and receive medical records with different practitioners. Even
though interoperability increases security and scalability of the patients’ records, very
few institutions actually follow it. This is mainly due to the lack of trust between the
practitioners and the lack of knowledge in the IT area.
A major issue in such an implementation is how to facilitate different providers
to access, manage, and understand the information in a coordinated way such that
the access is real time and instantaneous without any loss of data during storing and
exchanging [13]. This means that all healthcare service providers would have to grant
permissioned access to storing and sharing such information. Even when access is
granted, it can be difficult to manage information stored in various different kinds of
formats at different healthcare centers [14].
The focus of interoperability is also to transfer the control of ownership from
the providers to the patients, so that they can be responsible for their own personal
information and make sure that all their necessary data is properly present in their
electronic health record [15]. This also facilitates several doctors to give their consul-
tations collaboratively to a patient while being able to consult each other for a more
conclusive diagnosis. Practically, this is a complex task to implement, but there is
huge ongoing research on how this issue can be solved. The first step would be to
ensure data interoperability in hospitals and clinics across a city by making sure all
of them have access to all the records at all times without any hassle. This could
then be built up to the state level, where all hospitals across all major cities follow

the same protocols. This is a long and arduous process, and will take several years
before the theory can be applied to practice [16].

2.2 Supply Chain Integrity

Supply chain integrity is the various procedures and technologies used to monitor and
trace the products within the supply chain. This is implemented to discard counterfeit
products and provide high-quality and safe products to the consumer. Some of the
risks to the integrity of products are:
• Adulteration of products
• Counterfeit materials or products
• Misbranded products
• Expired products which are relabeled and sold to consumers [17].
Following certain protocols and assuring quality by tracking the materials and
products throughout the supply chain ensure that the patients or consumers receive
safe therapies and the problems are contained and minimized (Fig. 2).
The control environment within hospitals and clinics is very complex. The mate-
rials and products are dealt by numerous individuals, and they are hard to identify as
they are usually removed from their original packaging. This makes quality assurance
of the supply chain in health care increasingly difficult [18].
The rise of gray markets poses a significant problem for supply chain integrity.
Prices for many pharmaceutical products vary drastically from country to country.
For example, it costs around $210 (~Rs. 15,000) every month for a diabetic patient in the USA, whereas it only costs around $50 (~Rs. 3500) every month in India [19]. This initiates gray market distribution, and this is very damaging for healthcare institutions as supply chain integrity is lost.

Fig. 2 Integration of blockchain in supply chain integrity
The centralized structure of hospitals imposes certain restrictions on supply chain
integrity as well, which results in drug shortages. Since most of the raw materials
for drugs are made in a single location, any disturbances in this location can cause
significant shortages of the manufacturing of drugs and increased prices of raw
materials. For example, India imports 70% of its pharmaceutical raw materials from
China. During the dreadful second wave of COVID-19 in India, China suspended all
cargo flights to and from India for two weeks. This resulted in the price of the raw materials used in the manufacturing of COVID-19 drugs jumping more than 200% in the matter of a month [20].

2.3 Obsolete Practices

Medical data is one of the most important types of information that is abundantly
available in today’s world, and its safe and secure storage is very important to maintain
the integrity and privacy of the general public. But even today, healthcare data is
largely paper-based and as such faces a number of challenges. The management
of multiple records becomes very difficult, and sometimes, a patient is not able to
procure all the prescriptions, thus having an undesirable gap in their medical history.
This leads to relaying incomplete information about the patient to the doctor, which
makes it harder to provide an accurate diagnosis. A possible solution is to digitize
all the data stored in hospitals and make them accessible to patients online through
a portal. All healthcare systems and third-party providers that receive access to such
information must make sure that they have a very secure system to prevent any kind
of breach of this sensitive data.
This not only includes the healthcare system but extends to all other participants, such as insurance companies, as well. Insurance companies also work in very outdated ways, storing all their policies in paper format and keeping them within their agencies. Exchange of information between different insurance providers or between hospitals/medical providers becomes a hassle for the patient when they might be in urgent need of the insurance funds and are short of time. Claiming one's own insurance benefits becomes a long and arduous process of weeks to get all the required confirmations and receive the reimbursements. Even then, there might be technical difficulties or external wait times on either side of the transaction, leaving the patient in a vulnerable position even though they have already purchased the
insurance. For this reason, sometimes people choose to go through middleman agen-
cies, who handle all their insurances in a simpler way as they are more experienced
in the field and have tie-ups with insurance companies to attract more customers.
This adds extra cost that the public must pay if they choose to avoid the lengthy and
complicated procedures of filing for insurance. Due to the involvement of so many
parties in the system, there is also the problem of constant fraud; many people fall victim to it every year and lose a big portion of their life savings.
These problems are addressed in Sect. 4, where we talk about how the use of blockchain along with other technologies can help simplify and digitize these processes, so as to favor the patient and not make them work for facilities that they have already purchased. Blockchain also helps remove the problems of middlemen and concerns about getting confirmations for claims, as the whole process will be automated so that when certain criteria are met, it will be executed automatically without any external intervention from any party.

3 Background of Blockchain

Blockchain is a peer-to-peer network that records all transactions in an immutable and secured way. It was first introduced with bitcoin, which aimed to be a more stable alternative currency after the financial crisis in 2008. A blockchain does not necessarily need a cryptocurrency such as bitcoin to exist. Blockchain is like a log of records that are timestamped. All the transactions are stored in blocks which are connected to each other similar to a chain. The first block in the chain is known as the genesis block. Each block consists of a cryptographic hash that is used to identify itself, a hash reference to the previous connected block, and a list of verified transactions—see Fig. 3. The hash which links all the blocks in the blockchain is generated using a cryptographic one-way hash function (e.g., SHA-256, the hash function used in bitcoin). This ensures anonymity, immutability, and compactness of the block.

Fig. 3 Components of a block in a blockchain network
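The hash-linked structure described above can be illustrated with a short sketch. This is only a simplified model for exposition, assuming a block holds just a timestamp, the previous block's hash, and a transaction list; it is not the layout used by bitcoin or any production blockchain, and the field names and transactions are hypothetical.

```python
import hashlib
import json
import time

def compute_hash(block: dict) -> str:
    # Serialize the block deterministically and digest it with SHA-256,
    # the one-way hash function also used by bitcoin.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def create_block(previous_hash: str, transactions: list) -> dict:
    # Every block carries a reference to the hash of the previous block;
    # this reference is what chains the blocks together.
    block = {
        "timestamp": time.time(),
        "previous_hash": previous_hash,
        "transactions": transactions,
    }
    block["hash"] = compute_hash(block)
    return block

# The genesis block has no predecessor, so a sentinel value is used.
genesis = create_block(previous_hash="0" * 64, transactions=[])
block_1 = create_block(previous_hash=genesis["hash"],
                       transactions=["patient record R17 shared with clinic C2"])

# Tampering with the genesis block would change its hash and break the link held by block_1.
recomputed = compute_hash({k: v for k, v in genesis.items() if k != "hash"})
assert block_1["previous_hash"] == recomputed
```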
The infrastructure of a blockchain is formed by nodes. Any device that is on the
blockchain network is known as a node. They can range from laptops all the way up
to big servers. They are all connected to each other, and they are able to exchange
the blockchain data and always stay up to date. Any device that contains a full copy
of the transaction history of a blockchain is known as a full node [21]. Each node in the network possesses two keys: a private and a public key. The private key is for the user to access their account, and the public key is shared to conduct transactions on their account. A public key can be used by a user to encrypt a message and send it to a specific user. This message can only be decrypted by the recipient addressed in the transaction, using their respective private key. This mechanism is known
as asymmetric cryptography. Every node in the network has access to the public
key, whereas the private key is known only to the key’s initiator [22]. Asymmetric
cryptography is used to ensure the consistency and irreversibility of a blockchain
[5].
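A minimal sketch of this public/private key mechanism, using RSA with OAEP padding from the third-party `cryptography` package as a stand-in for a node's key pair (the message content is purely illustrative):

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Each node holds a private key; the matching public key is shared with the network.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

# Anyone holding the public key can encrypt a message addressed to this node...
ciphertext = public_key.encrypt(b"share lab report #42 with Dr. Rao", oaep)

# ...but only the holder of the corresponding private key can decrypt it.
plaintext = private_key.decrypt(ciphertext, oaep)
assert plaintext == b"share lab report #42 with Dr. Rao"
```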
When transactions that have occurred in the network are deemed valid by the peers, they are stored in timestamped blocks by the respective miners. The consensus protocol (explained in Sect. 3.2) chooses the miners and the data to be included in the block. The block is then broadcast to the blockchain network, where the other nodes present in the network verify the validity of the transactions of the block and check that it references the previous block in the chain by using the corresponding hash. Blocks are only added to the blockchain if both conditions are met; otherwise, they are discarded [5].

3.1 Types of Blockchain

Blockchains are divided into three categories based on their permission level. These
are:
• Public permissionless: These are open to the public, and any user can easily
participate and validate the transactions. It is a permissionless blockchain, which
means that users do not require any permission from a central authority to join
the blockchain. The transactions are pseudonymous. There is the highest level of
decentralization in this as it is maintained by the community. Most cryptocurren-
cies use public permissionless blockchain (e.g., bitcoin and Ethereum).
• Consortium: These are permissioned blockchains that operate under the authority of a certain group of nodes or users. Predefined consortium nodes control the consensus process. Users cannot participate in the network unless they are a member of an organization that has access to the blockchain. The transactions
may or may not be accessible to the public. Some examples of this blockchain
are Quorum and Corda.
• Private: In this, there is usually a single authority or organization to look after
the network. It is a permissioned blockchain where employees of only a single
organization can get access to it. However, the total decentralization property is
lost and it becomes a partially decentralized system because the blockchain is controlled by a single entity. This will, however, speed up block times and provide greater transaction throughput. Transactions are validated by this single
entity, and it may or may not be available for the public. Some examples of private
blockchain are Hyperledger Fabric and Ripple.

3.2 Consensus Protocols

The consensus protocol provides a specific method to verify whether a transaction is valid or not. It is used to verify and determine the order of the transactions that have to be added to a block in the blockchain network. Since blockchain is a peer-to-peer network, there is not a single authority that determines the validity of the transactions. It is done, instead, with the agreement of all the nodes of the network by following the protocol. Some of the widely used distributed consensus protocols that we are going to briefly discuss are:
• Proof-of-work (PoW): This is used by the bitcoin network to validate transactions. In this protocol, the miner is decided based on computational power, as the nodes of the network are required to solve an arbitrary mathematical puzzle (in the case of bitcoin, based on SHA-256) [23]. The first node to solve it is rewarded with a set number of the respective cryptocurrency, which is added to the miner's account. This incentivizes the miners to continuously mine the transactions. This process is called mining, and it is based on calculating a random nonce number and a hash reference to the previous block [24] (a minimal nonce-search sketch follows this list). Once the miner computes a value, the other nodes of the bitcoin network validate it, and if correct, it is broadcast to the network. This solves the problem of double spending, as the entire network is required to validate each transaction. A Sybil attack is when a malicious entity creates multiple fake identities and subverts the consensus by injecting biased information. However, proof-of-work is largely resistant to these attacks due to the sheer computational power required to mine bitcoin.
• Proof-of-stake (PoS): Unlike proof-of-work, where the miner is determined by
the computational power, the miner in proof-of-stake is chosen with respect to
the quantity of the asset they own. This gives an advantage to ‘rich’ owners as
the more coins or assets a miner owns, the more mining power it will possess.
However, this method is more efficient than proof-of-work as it needs minimal
power per transaction.
• Practical byzantine fault tolerance: In this method, all the nodes are required to communicate with one another and find out all the nodes that have the exact same copy of the ledger. These nodes are known as honest nodes, and consensus is reached when the honest nodes agree on their values.
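As a companion to the proof-of-work item above, the following sketch shows the nonce search that mining amounts to. The difficulty is expressed here as a required number of leading zero hex digits, which is a deliberate simplification of bitcoin's actual difficulty target; the block header string is a placeholder.

```python
import hashlib

def mine(block_header: str, difficulty: int = 4) -> tuple:
    """Search for a nonce whose SHA-256 digest starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_header}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

# Finding the nonce is computationally expensive, but any node can verify the
# result with a single hash computation, which is what keeps validation cheap.
nonce, digest = mine("prev_hash|transactions|timestamp", difficulty=4)
print(nonce, digest)
```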

4 Uses and Applications of Blockchain in Health Care

Blockchain is an immensely versatile technology that has a plethora of applications in the healthcare sector. In this discussion, we will go over some of these applications,
giving a brief overview of the problem at hand and then talking about how blockchain
provides a more effective solution than traditional methods, which is also more
economically viable in the given domain.

4.1 Electronic Health Records

An electronic health record is one of the most popular ways of storing patient infor-
mation in a digitalized manner. These are real-time records that update instantly as
patients perform any kind of medical activity, and are then securely available for
access to all authorized personnel [25]. A patient’s entire medical history can be
recorded in this way, so that any medical professional can have complete access to
the patients’ medical history without any missing information in between.
Health care is an intensely data-driven industry, and there is always ongoing
research for finding more efficient technologies that can amplify the productivity
of the healthcare system while also cutting down on costs. Blockchain provides the
opportunity to overcome the old and disparate health systems that are still in play
today [6]. With the help of blockchain, health care can be shifted from being provider-
centric, i.e., controlled by hospitals and service providers, to patient-centric, i.e.,
being controlled by a peer-to-peer network comprised of the patients and providing
them with the authority to choose who gets to access their personal information
(Fig. 4).
The field of health care deals with a large number of information exchanges and
microtransactions every second, and it is necessary to ensure that every one of them
is transparent and secure across all authorized organizations [3]. Many studies have
published material on how blockchain can be used for administering digital rules
for information access and maintenance [26]. Blockchain is considered as the most
robust and efficient method for ensuring proper interoperability wherever necessary,
and well-planned large-scale implementation is the most appropriate solution for
addressing this concern [1]. Many top-tier healthcare institutions have already put their research and development teams to work on finding out the possibilities blockchain can bring into their systems. This has led to a large increase in the volume of research papers
being published, hence signifying the growing popularity of blockchain technology
[27].
Fig. 4 Various components of an electronic health record

Organizations attempt to either create their own blockchain implementations or use preexisting implementations such as Hyperledger Fabric [28], created by the Linux Foundation, or Ethereum [29], created by Vitalik Buterin. It is easier to
work on an already existing blockchain implementation as a well-built foundation
is already provided, and the company then develops upon it as per their needs and
system requirements. A secure blockchain system could then be established for all
the patients’ needs and cares and for managing all their diagnostic records, such
as general-purpose records comprising personal information and all past doctor
prescriptions [30], or for more specific categories such as across diabetes consul-
tancy specialists [14], radiology records [31], etc. Sometimes, medical data can be
very large in size, and so it is more feasible to store only a reference address on the
blockchain and store the data elsewhere in a dedicated database [14].

4.2 Security and Privacy

Healthcare data is one of the most sensitive forms of data, as a person’s entire life
history can be tracked out from their past medical data. Hence, such information
must be stored with the highest levels of security and complete privacy except those
who have permitted access to it. Until now, healthcare data is largely in the hands
of the hospitals and so is controlled by only one entity. This presents a single point
of failure that any hackers wishing to steal or publicize information can choose to
attack from. If hackers manage to get through the security system, they effectively
gain access to the entire library of patient records stored in that hospital and can then
use such information against an innocent person or to blackmail them for their own
benefit. As blockchain is a distributed peer-to-peer ledger system [32], it is controlled by a large network of smaller entities and thus becomes very difficult to target, keeping it safe against any malicious threats to the system [33]. Even if anyone did manage to breach the system at a certain point, to cause any actual damage they would have
to be able to change the records at a rate faster than new records are being verified
and stored in the blockchain, which is a computationally infeasible task due to the
secure hashing algorithm used by blockchain to hash each of the records stored in
the various blocks across the chain.
The use of blockchain technology will shift the authorization of their own records to the patients, and only they will be able to manage who is allowed to access what data, without having to reveal the rest of the information to anyone else. This
greatly increases the privacy of the data by allowing restricted access granted with
permissions decided by the patient.

4.3 IoT and Blockchain

Internet of Things, or IoT, is also another technology that has gained a lot of traction
in recent years. Any device that facilitates Internet connection and transfer of data
comprises the Internet of things. Currently, we are surrounded by thousands of IoT
devices at all times, from our mobile phones, laptops to our smartwatches and Blue-
tooth devices. With the advancements in technology, devices such as smartwatches
are now capable of performing a variety of tasks very efficiently, such as tracking our
pulse and heart health while continuously transferring data with our smartphones.
Health care is a large system that has a huge amount of activity and movement on
a daily basis. Hospital attendants are usually in a rush to meet all their checkups and
daily duties and make sure all equipment is present at the correct places, and that they
are periodically getting changed wherever necessary. This brings along with it the
problem of mismanagement. It is difficult to keep track of all the moving parts in the
hospital at all times, and hence, there is a very high frequency of items getting lost or
misplaced. A key challenge [34] is to tag medical equipment with a usable ID and in
integrating trust in device identification and tracking. If this could be implemented
effectively, then any misplaced device could simply be traced back using its ID and
be recovered [35]. With the use of radio frequency identification tags, or RFID on
IoT devices used across the entire hospital, it becomes much easier to keep track of
all the moving components at all times and see their entire movement history. This frees up a lot of time otherwise spent searching manually for untagged items, leaving more time for attending to patients in need.
Various studies conducted by big name companies such as Deloitte [28] and
IBM [29] have shown that implementation of blockchain technology along with IoT
can greatly benefit the management of medical assets in a hospital and bring more
efficiency into the system. This would save the hospital huge sums of money that
would otherwise be spent in repurchasing of misplaced or missing items.
4.4 Drug and Pharmaceuticals

Pharmaceuticals is one of the biggest industries in the world responsible for a large
portion of income for many hospitals, clinics, and drug manufacturing companies. It
is a huge industry with a large number of working components and several thousands
of employees working on a daily basis, and hence is very prone to human errors and
consequently large sums of extra overhead expenditures. Blockchain can be used to automate the entire system by integrating it with IoT, and this can eliminate a majority of unnecessary expenses, thereby freeing more money for drug research instead of paying for recurring faults within the obsolete supply chain system.
A major problem that pharmaceutical companies currently face is that of drug counterfeiting [36]. This poses a serious health risk to consumers who come upon these fake medicines and do not have access to proper healthcare facilities [37].
Various bodies are involved from the starting point of the manufacturing process
until they reach the end consumer. This creates many opportunities for slipping fake products into these large shipments of medicinal supplies, and it then becomes difficult to differentiate between the two unless a thorough examination of the medicines is conducted, which is infeasible considering the bulk quantity of supplies. The immutable and traceable nature of blockchain technology comes into play here: it makes it possible to track a medicine right from its inception until its sale to the consumer, who can then check a record of the medicine's movement right from the start of manufacturing and hence ensure that the medicines are genuine [35]. A start-up company has implemented a technology that creates a
chain of custody model [38], tracking the entire manufacturing timeline and using
the immutable nature of blockchain to track medicine and prevent fraud. Blockchain
facilitates the removal of a central authority in these trades, thus removing factors
such as corruption or other negative incentives for fraudulent behavior and tampering.

4.5 Smart Contracts

Smart contracts are programs stored on the blockchain that execute when certain
conditions are met [39]. They are a great way to automate certain daily tasks that
take place in a repeated and similar manner, so that they can be performed instantly
and without any error. This may include activities such as purchasing of medicines,
paying of bills, or regular consultation fees. Due to it being a digital contract, there
is no paperwork or external manual filling of data, hence making processes more
efficient and also cutting costs at the same time. Contracts once executed on the
blockchain cannot be reversed, and the execution is broadcasted to the blockchain
and hence can easily be verified. This removes all possibilities of bribery or any
other negative influences. Another possibility is the incorporation of smart contracts
in the supply chain of pharmaceutical products. The contract can keep track of the
movement of the shipment, execute conditions as they happen in real time, and
ultimately execute the contract once the medicines have been successfully delivered
and paid for. This not only removes the possibility of counterfeit drugs, but also
automates the payment for the medicines, hence making it a seamless process for
the patient to receive genuine medicines.
These could also be used for the purchasing and continuity of insurance policies.
People have realized the importance of different kinds of insurance for their safety
and well-being, and have begun purchasing house, car, life, medical, and other various
kinds of insurance. The current insurance industry is very inefficient, as it is mainly run by people and recorded on paper, and this carries with it the same faults that medical records suffer from. A person could provide their credentials to an insurance
company through the blockchain, and every time they wish to purchase a new type
of insurance, they could simply do so with the click of a button, without having to
go through the long and arduous insurance filing and registration process every time.
The payments of the insurance can also be stored on the smart contracts, making it an
automatic process and hence guaranteeing that the insurance is always kept up to date
and readily available, without any kind of fault or delay in the system withholding the
insurance policy that was promised. If an insuree ever undergoes any kind of surgical procedure covered under an insurance policy, the claim would instantly go through without them having to make several appointments with the insurers.

4.6 Big Data

We know that health care, being one of the largest generators of data, has massive amounts of information produced every year that could be useful for learning and for performing various analytical studies for academic purposes, which could greatly benefit research scholars and help bring theoretical studies into practice. The volume
of data coming out from health care is so large that it is difficult for normal database
software to store and work with this kind of data [40]. Types of data include patient
consultation records, patient health records, identification, insurance, etc. [39]. A
major concern is the safekeeping of privacy, security, and authenticity of such data.
It is also essential to hide confidential information such as name, gender, and age
that could potentially be used to identify a person if this data was ever to be leaked.
These concerns can be addressed by the use of blockchain technology. Blockchain
provides a safe way to ensure the secure transfer and usage of such data by recording
all transactions immutably, and also provides the facilities of sharing only informa-
tion that is required for research purposes without revealing any sensitive patient
identifiable information (Fig. 5).
The field of deep learning is one of the largest growing fields in computer science
and has made tremendous progress in previous years regarding diagnostics of medical
data from radiological information such as X-ray scans and CT scans. Getting report
results is an extensive and expensive task. With the help of blockchain and big data,
it is possible to create a model that is capable of giving radiology reports instantly
as the scans are performed, and upload it to the EHR of the respective patients using
Fig. 5 How blockchain can help in storage and sharing of big data

a smart contract after clearance of all outstanding payments. In this way, blockchain
allows the seamless integration of various technologies to truly create a modern
smart experience for users. MedRec [41] is a popular prototype for blockchain in
health care that aims to prioritize patient agency, giving them control of a transparent
and accessible view of their electronic health records [42]. It incentivizes miners by
providing them with medical data from hospitals for performing the hashing work
which can then be used for research purposes, thus benefiting everyone taking part
in the blockchain system.

4.7 Clinical Trials

Clinical trials are research studies that are conducted upon a large group of willing
volunteers to test for the efficacy and safety of new treatment methods, such as a new
drug, diet, or medical device. It is used to figure out possible side effects and long-
term effects that the treatment may have, and to weigh out the pros and cons of it to
finally decide whether to introduce it to the general public. Sometimes, new medical procedures may also be performed on people with chronic or life-threatening diseases in an attempt to figure out new revolutionary cures for them [43]. But the task of
recruitment is a tedious one with a lot of procedures and formalities that must be duly
completed before the trial can begin. Meeting all concerns before the required time
becomes a difficult task and results in a lot of extra expenditures for the concerned
authorities [44]. Due to these problems, it is estimated that 86% of clinical trials do
not achieve their recruitment goals on time [45] and 19% of registered clinical trials
were either closed or terminated due to failure to reach expected enrollment [46].
A possible solution to this problem can be figured out by the combined use of
blockchain technology along with smart contracts. Blockchain can be used to increase
immutability and transparency of the recruitment procedures, check the auditability
and accountability of medical practitioners, and verify the trials and findings of the
researchers [47]. Patient recruitment and matching, efficient data management, and
daily updates of the trial procedures can then all be programmed into a smart contract
which will be uploaded on the blockchain for an efficient monitoring of the whole
trial. This will also help prevent any trial frauds that commonly happen as anyone
applying to the trial will be able to verify the complete authority and authenticity
of the parties involved through the blockchain [48] and ensure that it is a genuine
clinical trial approved by a government body. Blockchain provides a decentralized
data tracking system for keeping records of all patients, doing background checks
on them and checking up on their health records while also maintaining their privacy
by allowing access to only necessary information [49]. This helps smoothen out the
process as participants willing to undergo the trial can directly apply to the clinical
trials without going through any third parties, making it easier to match personal
profiles with the protocol inclusion and exclusion criteria [50]. Another integrating
feature is that of IoT and blockchain, where IoT devices can be used to continuously monitor the participants, tracking and uploading sensory data and other vital information. Data traceability of the life cycle of the drugs can be used to verify
their authenticity and guarantee their safety for use, which then makes it easier to
check that all trials being performed are legitimate and are done so at the approval of
correct compliances that follow a regulatory code established by the organizations
holding the clinical trials [51].

4.8 Digital Patient Identity

Digital patient identity consists of storing all identifiable information of a person on the blockchain, including their basic credentials as well as government-approved
identifications. Patient identification matching is a common task in all healthcare
systems to verify the identity of someone [52], by comparing their identity with
those stored in the hospital database. Until now, most patient identification entries
are manually entered, so they are prone to human error and sometimes patients tend
to forget their previous credentials and create another record. Hence, this leads to
multiple duplicate copies of the same person in the database. This can cause several
problems, such as mixing up of patients due to their names or unverifiable identities
due to lacking or false information [53]. Duplicated identification data and incomplete
medical details are major issues that lead to high extra costs that healthcare systems
and patients might have to bear due to identity conflicts. Radiological imaging or
pathological sampling may have to be carried out repeatedly due to previous data
being stored in the wrong identity, which also leads to delay in treatments.
Blockchain can be used to solve this problem by creating a single, shared identity registry that stores all the identifying information of individuals in one dedicated place, so that all necessary information can easily be obtained from one source, which can easily be verified by any organization trying to check the identity of any person. This will also remove the possibility of duplicate data, as all information is stored in one place and can easily be recovered from there if necessary. A set protocol must be defined for storing all information, so that the information is universally the same wherever it may be used. The name, address, government-approved identity number, passport number, etc., can be stored in a proper format following specific guidelines so that there is no mix-up of information and identifying credentials [52]. This also solves the problem of having several different IDs for several
different organizations, as one universally acceptable ID can be used for all necessary
purposes [54]. Due to the cryptographic nature of blockchain, the identities will be
kept secure by providing the user with a key that allows only them to share their
identity and also use it to verify themselves at any place when necessary. In this
way, the decentralized and auditable characteristics of blockchain technology help
enforce a much more robust and secure identity sharing and management for the
system, alleviating many of the problems that the current system faces.

5 Conclusion and Future Scope

We have learnt a great deal about the importance of blockchain and its impact in the
healthcare systems and how it is one of the most essential technologies that can help
revolutionize the healthcare sector. Though it is still a very novel technology, and
there is much to learn about all its possible impacts, pros and cons, it is for sure one of
the most promising developments in this field and is getting huge investments from
healthcare giants for research and development purposes to truly utilize blockchain
to its fullest potential. In this chapter, we have discussed the various problems faced
by healthcare systems today, why they face such problems and what the possible
solutions for these are, and how blockchain is one of the most efficient solutions to
address these concerns. We discussed the features and technical details of blockchain
and blockchain-based solutions to the aforementioned problems in healthcare system.
Although the idea of a completely digitized healthcare system is intriguing, it
comes along with a number of obstacles that must be overcome to have a practi-
cally implementable solution of blockchain. It is essential to make such a task scal-
able across a large network of providers, such that it can be accessed smoothly and
securely for effective collaboration and accurate medical diagnosis by a large audi-
ence simultaneously without excessive wait times, so that all medical professionals
working on such cases can collaborate together and give more meaningful results
[55]. Blockchain inherently suffers from problems of scalability, as the technology cannot keep up with the growing network of connected devices on it: a blockchain can only handle a certain number of transactions in a given time frame. This leads to longer waiting queues for anyone wanting to access or validate data stored on the blockchain. The blockchain cannot compromise on the time taken for mining, as the consensus protocol is what guarantees the blockchain's immutability. So, there is a
need to develop new and more efficient algorithms that can help the scalability of
blockchain, such that it can easily handle the large traffic it will receive.
Another concern is that of usability. Blockchain, from a technical standpoint, is a
complex technology with lots of intricate working parts and requires deep knowledge
to ensure that its working is not compromised in any way. Most people who will be working with blockchain-based applications would not be interested in the technical workings of the blockchain and would rather prefer an application that handles it all and gives them a smooth user experience. Although the application can be greatly
simplified, the user must still be familiar with the concept of public and private keys,
and how they work in the context of the application and understand the administrative
role they play in the application.
To overcome these challenges, further research can be conducted in the following
fields:
• Research is required to address the scalability issues faced by blockchain.
• Research is required to develop software that will provide a more user-friendly interface for the consumers.
• Further research should be done focusing on security, as well as possible attacks
like Sybil attacks on the network.
This chapter provides an introduction to blockchain and all its possibilities, and
encourages the reader to further explore blockchain in greater depth at their own
interest to build a deeper understanding and realize the potential of blockchain for
the future.

References

1. Shukla RG, Agarwal A, Shukla S (2020) Blockchain-powered smart healthcare system. In:
Handbook of research on blockchain technology, Elsevier Academic Press, S.l., pp 245–270
2. Markham JW (2011) A financial history of the United States. M.E. Sharpe, Armonk, NY, New
York
3. Rathore H, Mohamed A, Guizani M (2020) Blockchain applications for healthcare. In: Energy
efficiency of medical devices and healthcare applications, pp 153–166
4. Johnston D, Yilmaz SO, Kandah J, Bentenitis N, Hashemi F, Gross R, Wilkinson S, Mason S
(2014) The general theory of decentralized applications. DApps
5. Hölbl M, Kompara M, Kamišalić A, Nemec Zlatolas L (2018) A systematic review of the use
of blockchain in healthcare. Symmetry 10(10):470
6. Abujamra R, Randall D (2019) Blockchain applications in healthcare and the opportunities and
the advancements due to the new information technology framework. Adv Comput, 141–154
7. Zhang P, Schmidt DC, White J, Lenz G (2018) Blockchain technology use cases in healthcare.
Adv Comput, 1–41
8. Azaria A, Ekblaw A, Vieira T, Lippman A (2016) MedRec: using blockchain for medical data
access and permission management. In: 2016 2nd international conference on open and big
data (OBD)
9. McGhin T, Choo K-KR, Liu CZ, He D (2019) Blockchain in healthcare applications: Research
challenges and opportunities. J Netw Comput Appl 135:62–75
10. Dubovitskaya A, Xu Z, Ryu S, Schumacher M, Wang F (2017) Secure and trustable electronic
medical records sharing using blockchain. AMIA Annu Symp Proc 2017:650–659
11. Reid PP, Compton WD, Grossman JH, Fanjiang G (2005) Building a better delivery system
12. Integrations H, Healthcare interoperability. [Online]. Available: https://www.healthcareinteg
rations.com/healthcare-interoperability.php. Accessed 23 Apr 2021
13. O’Connor S (2017) What is interoperability, and why is it important? Advanced data systems
corporation, 30-May-2017. [Online]. Available: https://www.adsc.com/blog/what-is-interoper
ability-and-why-is-it-important. Accessed 27 Apr 2021
14. Hasselgren A, Kralevska K, Gligoroski D, Pedersen SA, Faxvaag A (2020) Blockchain in
healthcare and health sciences—a scoping review. Int J Med Inform 134:104040
15. Cichosz SL, Stausholm MN, Kronborg T, Vestergaard P, Hejlesen O (2018) How to use
blockchain for diabetes health care data and access management: an operational concept. J
Diabetes Sci Technol 13(2):248–253
16. Ash JS, Berg M, Coiera E (2003) Some unintended consequences of information technology
in health care: the nature of patient care information system-related errors. J Am Med Inform
Assoc 11(2):104–112
17. Zhang P, White J, Schmidt D, Lenz G (2017) Applying software patterns to address
interoperability in blockchain-based healthcare apps
18. Kennedy G (2015) Supply chain integrity and security
19. Byrnes J, Fixing the healthcare supply chain. HBS working knowledge. [Online]. Available:
https://hbswk.hbs.edu/archive/fixing-the-healthcare-supply-chain. Accessed 01 May 2021
20. Prasad R (2019) The human cost of insulin in America. BBC News, 14-Mar-2019. [Online].
Available: https://www.bbc.com/news/world-us-canada-47491964. Accessed 04 May 2021
21. Chandna H, SG, TNN, KS, Pharma industry warns of Covid drug shortages as raw materials
prices surge 200%. ThePrint, 03-May-2021. [Online]. Available: https://theprint.in/health/
pharma-industry-warns-of-covid-drug-shortages-as-raw-materials-prices-surge-200/650792/.
Accessed 05 May 2021
22. JS (2020) Blockchain: what are nodes and masternodes? Medium, 14-Oct-2020.
[Online]. Available: https://medium.com/coinmonks/blockchain-what-is-a-node-or-master
node-and-what-does-it-do-4d9a4200938f. Accessed 07 Mar 2021
23. Brush K, Rosencrance L, Cobb M (2020) What is asymmetric cryptography and how does
it work? SearchSecurity, 20-Mar-2020. [Online]. Available: https://searchsecurity.techtarget.
com/definition/asymmetric-cryptography. Accessed 07 Mar 2021
24. King S (2013) Primecoin: cryptocurrency with prime number proof-of-work
25. Zheng Z, Xie S, Dai H, Chen X, Wang H (2017) An overview of blockchain technology:
architecture, consensus, and future trends. In: 2017 IEEE international congress on big data
(BigData Congress)
26. What is an electronic health record (EHR)? HealthIT.gov, 10-Sep-2019. [Online]. Available:
https://www.healthit.gov/faq/what-electronic-health-record-ehr. Accessed 02 Mar 2021
27. Gökalp E, Gökalp MO, Çoban S, Eren PE (2018) Analysing opportunities and challenges of
integrated blockchain technologies in healthcare. Inf Syst Res Dev Appl Educ, 174–183
28. Kassab M, DeFranco J, Malas T, Graciano Neto VV, Destefanis G (2019) Blockchain: a panacea
for electronic health records? In: 2019 IEEE/ACM 1st international workshop on software
engineering for healthcare (SEH)
29. Open Source Blockchain Technologies. Hyperledger, 19-May-2021. [Online]. Available:
https://www.hyperledger.org/. Accessed 20 Mar 2021
30. Home. ethereum.org. [Online]. Available: https://www.ethereum.org/. Accessed 20 Mar 2021
31. Yue X, Wang H, Jin D, Li M, Jiang W (2016) Healthcare data gateways: found healthcare
intelligence on blockchain with novel privacy risk control. J Med Syst 40(10)
32. Patel V (2018) A framework for secure and decentralized sharing of medical imaging data via
blockchain consensus. Health Informatics J 25(4):1398–1411
33. Shah B, Shah N, Shakhla S, Sawant V (2018) Remodeling the healthcare industry by employing
blockchain technology. In: 2018 international conference on circuits and systems in digital
enterprise technology (ICCSDET)
34. Esmaeilzadeh P, Mirzaei T (2019) The potential of blockchain technology for health infor-
mation exchange: experimental study from patients’ perspectives. J Med Internet Res
21(6):e14184
35. Why Healthcare Industry Should Care About Blockchain? [Online]. Available: https://ww3.
frost.com/files/8615/0227/3370/Why_Healthcare_Industry_Should_Care_About_Blockch
ain_Edited_Version.pdf. Accessed 7 Mar 2021
36. Bell L, Buchanan WJ, Cameron J, Lo O (2018) Applications of blockchain within healthcare.
Blockchain in healthcare today, 1
37. Sylim P, Liu F, Marcelo A, Fontelo P (2018) Blockchain technology for detecting falsified and
substandard drugs in distribution: pharmaceutical supply chain intervention. JMIR Res Protoc
7:e10163
38. Coelho FC (2018) Optimizing disease surveillance with blockchain. bioRxiv 1 18
39. Chronicled I (2018) Chronicled Releases 2017 Progress Report for Blockchain Platform for
Track-and-Trace of Prescription Medicines, 27-Jun-2018. [Online]. Available: https://www.
prnewswire.com/news-releases/chronicled-releases-2017-progress-report-for-blockchain-pla
tform-for-track-and-trace-of-prescription-medicines-300611648.html. Accessed 29 Apr 2021
40. Szabo (1996) Smart contracts: building blocks for digital markets. NJETJoTT 18
41. Dhagarra D, Goswami M, Sarma P, Choudhury A (2019) Big data and blockchain supported
conceptual model for enhanced healthcare coverage: the Indian context. Bus Process Manage
J. https://doi.org/10.1108/BPMJ-06-2018-0164
42. Iaw A, Azaria A, Halamka JD, Lippman A (2016) A case study for blockchain in health-
care:“MedRec” prototype for electronic health records and medical research data. In:
Proceedings of IEEE open and big data conference, vol 13, p 13
43. MedRec. [Online]. Available: https://medrec.media.mit.edu/. Accessed 15 Apr 2021
44. What Are Clinical Trials and Studies? National institute on aging. [Online]. Available: https://
www.nia.nih.gov/health/what-are-clinical-trials-and-studies. Accessed 15 Apr 2021
45. Zhuang Y et al (2020) Applying blockchain technology to enhance clinical trial recruitment.
In: AMIA ... Annual symposium proceedings. AMIA symposium, vol 2019, pp 1276–1285
46. Sullivan J (2004) Subject recruitment and retention: barriers to success. Appl Clin Trials
47. Carlisle B, Kimmelman J, Ramsay T, Mackinnon N (2015) Unsuccessful trial accrual and
human subjects protections: an empirical analysis of recently closed trials. NJCT 12(1):77–83
48. Roma P, Quarre F, Israel A et al (2016) Blockchain: an enabler for life sciences and healthcare
blockchain: an enabler for life sciences healthcare. Deloitte, 1–16
49. Barrett J (2007) Fraud and misconduct in clinical research. Princ Pract Pharm Med 4(2):631–
641
50. Omar IA, Jayaraman R, Salah K et al (2021) Applications of blockchain technology in clinical
trials: review and open challenges. Arab J Sci Eng 46:3001–3015
51. Gross CP, Mallory R, Heiat A, Krumholz HM (2002) Reporting the recruitment process in
clinical trials: who are these patients and how did they get there? Ann Intern Med 137:10–16
52. Petersen S, Hediger T (2017) The Blockchain (R)evolution—how blockchain technology can
revolutionise the life sciences and healthcare industry. Deloitte
53. Just BH, Marc D, Munns M, Sandefer R (2016) Why patient matching is a challenge: research
on master patient index (MPI) data discrepancies in key identifying fields. Perspectives in
health information management, 01-Apr-2016. [Online]. Available: https://www.ncbi.nlm.nih.
gov/pmc/articles/PMC4832129/. Accessed 03 May 2021
54. A framework for cross-organizational patient identity management (2015) The sequoia project
55. Krawiec R, Housman D, White M, Filipova M, Quarre F, Barr D, Nesbitt A, Fedosova K, Killmeyer J, Israel A (2016) Blockchain: opportunities for health care. In: Proceedings NIST workshop blockchain healthcare, pp 1–16
56. Castaneda C, Nalley K, Mannion C, Bhattacharyya P, Blake P, Pecora A, Goy A, Suh KS (2015)
Clinical decision support systems for improving diagnostic accuracy and achieving precision
medicine. J Clin Bioinform 5(1)
A Comparison of Machine Learning
Techniques for Categorization of Blood
Donors Having Chronic Hepatitis C
Infection

Sukhada Bhingarkar

Abstract Hepatitis C is a liver disease whose infection is often silent and can lead to
fibrosis or cirrhosis if it becomes chronic and goes undetected. It is generally spread
through blood-to-blood contact. Hence, it is important to accurately classify blood
donors as healthy blood donor or a person having Hepatitis C infection before blood
transfusion happens. Nowadays, machine learning has been used in various domains
including health care for accurate and fast results. This paper proposes a framework
for accurate classification of blood donors using five machine learning algorithms,
namely logistic regression, support vector machine, k-nearest neighbours, decision
tree, and neural networks. The backward elimination technique is implemented for
feature selection to improve the classification accuracy. The experimental results
show that k-nearest neighbours performs better than the other classifiers, with a testing accuracy of 94.3%.

Keywords Classification · Feature selection · Hepatitis C · Machine learning · Prediction

1 Introduction

Hepatitis C is a liver disease caused by Hepatitis C virus (HCV) [1]. Unlike Hepatitis
A and Hepatitis B, there is no vaccination for Hepatitis C. Hepatitis C begins as an
acute infection after exposure to the Hepatitis C virus. Some infected individuals can clear the virus on their own, but 75% of infected individuals develop chronic HCV.
As per World Health Organization (WHO), more than 71 million people world-
wide have chronic Hepatitis C infection. There are many challenges in dealing with
Hepatitis C. Firstly, the infection with HCV is often silent. There are many infected
individuals who either do not have any symptoms or have unspecific symptoms
such as mild fatigue or discomfort in an abdomen. Hepatitis C does not spread by
coughing, sneezing, hugging, shaking hands, or through food and water. However, it
can spread through activities that involve blood-to-blood contact, like injection drug use, sharing personal hygiene items such as razors or toothbrushes with an infected person, tattooing or piercing with inappropriate sterilization, blood transfusion, organ transplants, etc. Secondly, over many years, inflammation in the liver caused by the Hepatitis C virus usually results in the formation of scar tissue, called fibrosis. The developing fibrosis in the liver can eventually reach a specific level which is called cirrhosis. Hepatitis C infection with cirrhosis leads to an increased risk of liver failure and liver cancer. It is important to cure the disease before cirrhosis develops. Thus, the purpose of this paper is to propose an automated model to analyse laboratory data which can help in making timely decisions.
Machine learning is one of the hottest trends in today’s market. According to
Gartner [2], by 2022, there will be at least 40% new application development projects
going on in the market that would require machine learning. Data mining and machine
learning are playing a vital role in the healthcare industry as well, to detect several diseases at an early stage, like diabetes, heart attack, autism, arthritis, blood cancer, etc. Classification is one of the essential branches of machine learning algorithms, wherein the category of the data item or object is predicted. In the last few decades, researchers have extensively used machine learning algorithms to design predictive models based
on clinical data. This paper proposes a framework based on machine learning tech-
niques that can help especially pathologists to classify the blood donor correctly
as a healthy blood donor or having Hepatitis C infection based on various blood
attributes.
The rest of the paper is structured as follows: Sect. 2 discusses the related work
in this area. Section 3 describes the proposed framework. Section 4 evaluates the
performance of proposed framework with the help of various evaluation parameters.
Section 5 discusses the results. Finally, the conclusions are drawn in the last section.

2 Related Work

In the related work discussed below, most of the researchers have worked on
predicting the risk of various side effects seen in fibrosis/cirrhosis patients. Some of
the researchers have implemented prediction model for drug design or for treatment.
In [3], esophageal varices have been detected with the help of various machine learning algorithms, such as support vector machine (SVM), Naïve Bayes (NB), decision tree (DT), artificial neural network (ANN), random forest (RF), and Bayesian network, to diagnose liver cirrhosis at an early stage. Esophageal varices are one of the most common side effects of liver cirrhosis. The dataset used had twenty-four features. Various feature selection techniques were applied in the proposed work to select nine significant features. Out of the six machine learning algorithms, the Bayesian network exhibited the highest accuracy of 74.8%.
Harry Chown [4] used SVM, ANN, RF, generalized linear model (GLM), and
linear discriminant analysis (LDA) to predict Hepatitis C NS3 cleavage patterns
of viral proteases, which can be helpful for future drug design. Two sequence-based feature extraction methods were implemented in the proposed work. It was observed that the method of feature extraction compensated for the chosen machine learning algorithm.
Georg Hoffmann et al. [5] have proposed to use two functions of decision tree
algorithm, i.e. rpart and ctree to detect liver fibrosis and cirrhosis. The authors have
implemented leave-one-out cross-validation method to improve the accuracy of diag-
nosis, wherein the feature used was enhanced liver fibrosis (ELF). The highest accu-
racy achieved through rpart with ELF was 75.3%, and ctree without ELF was 72.6%.
However, small changes done in the input data can result in largely deviated decision
trees.
George N. Ioannou et al. [6] have implemented recurrent neural network (RNN)
to predict the risk of hepatocellular carcinoma (HCC) in Hepatitis C patients having
cirrhosis. The authors had focused upon using two types of features: features that remain constant over a period of time and features that change over a period of time. It was observed that the RNN outperformed logistic regression, with an area under the receiver operating characteristic curve (AUROC) of 0.759 among all samples.
Hiroaki Haga et al. [7] have developed a treatment prediction model in which
nine machine learning algorithms have been applied to HCV genome variants. The experiments showed that the SVM algorithm performed well, with 95% validation accuracy.
The four machine learning algorithms, namely logistic regression, RF, gradient
boosted trees, and stacked ensemble, were used in [8] to find undiagnosed patients
with HCV infection. The authors extracted information like risk factors, symptoms,
treatment relevant with HCV from the patient’s medical history. It was demonstrated
in the work that the stacked ensemble accomplished the maximum precision of 97% among all the algorithms.
In [9], the authors have employed synthetic minority sampling technique
(SMOTE) to deal with the problem of imbalance in the dataset of HCV patients
and to rank the features accordingly. The authors have implemented five classifica-
tion algorithms, namely decision tree, k-nearest neighbours, random forest, logistic
regression, and Naïve Bayes. After removing the imbalance, it was found that random
forest performed better than other algorithms with the classification accuracy of 92%.
K. Santosh Bhargav et al. [10] implemented decision tree, support vector machine,
logistic regression, and Naïve Bayes on an HCV dataset to classify whether the person will live or die depending upon the attributes mentioned in the dataset. It was concluded that logistic regression had a greater accuracy of 87.17% compared to the rest of the classifiers.
The prediction of cirrhosis development in veterans was done using cross-sectional and longitudinal models in [11]. The performance of the models was measured on the basis of the concordance index, and it was observed that the longitudinal model resulted in a 0.764 concordance index, whereas the cross-sectional model resulted in a 0.746 concordance index.
In another study [12], the authors had evaluated the performance of machine
learning classifiers on Egyptian patients’ dataset [13] using Python and R tools. The
authors have implemented binary and multi-class classification after applying feature selection techniques such as principal component analysis (PCA). It was observed that the feature selection mechanism helped to improve the classification accuracy.
Another study was done by the researchers in [14] to compare machine learning
approaches for prediction of advanced liver fibrosis in chronic HCV patients. Deci-
sion tree, particle swarm optimization, genetic algorithm, and linear regression
models were applied, and it was concluded that the machine learning techniques achieved accuracy in the range between 66.3 and 84.4%.

3 Proposed Framework

This section involves three parts. The first part discusses the laboratory dataset that
involves blood values of the donors. The second part demonstrates the preprocessing
and the feature selection techniques employed before applying the machine learning
algorithms. Lastly, the third part discusses the machine learning techniques employed
to find out the classifier that can achieve the highest accuracy. Figure 1 represents
the proposed framework that demonstrates various phases of it.

3.1 Dataset Description

The dataset used for this research is the HCV dataset from the University of California, Irvine (UCI) machine learning repository [15], which is publicly available.
The dataset consists of records of 615 patients out of which 238 are women and 377
are men. The dataset has 14 features which present the information of each patient.
Table 1 represents the description of the features of the dataset.

Fig. 1 Proposed framework


Table 1 Features description

S. No. Name of feature Description Mean Standard deviation
1 Id Patient's ID 308 177.679
2 Age Patient’s age 47.408 10.055
3 Sex Patient’s sex – –
4 ALB Albumin 41.62 5.781
5 ALP Alkaline phosphatase 68.284 26.028
6 ALT Alanine aminotransferase 28.451 25.47
7 AST Aspartate aminotransferase 34.786 33.091
8 BIL Bilirubin 11.397 19.673
9 CHE Choline esterase 8.197 2.206
10 CHOL Cholesterol 5.368 1.133
11 CREA Creatinine 81.288 49.756
12 GGT γ-glutamyl transferase 39.533 54.661
13 PROT Protein 72.044 5.403
14 Category Patient’s category – –

Table 2 Description of category

Code Category Number of records
0 Blood donor 533
0s Suspected blood donor 7
1 Hepatitis C 24
2 Fibrosis 21
3 Cirrhosis 30

The dataset is labelled into five classes/categories: blood donor, suspected blood donor, Hepatitis C patient, fibrosis patient, and cirrhosis patient. Table 2 shows
the count of records under each of the categories.

3.2 Preprocessing and Feature Selection

Quality of data is an important factor in data mining that leads to accurate prediction.
In order to get precise results, firstly, data preprocessing is implemented, which includes dropping a column named "Unnamed: 0", encoding the columns "Sex" and "Category" to numeric values, and filling the missing values with the mean value of the column where the missing value was located. Secondly, it is required to have features
with Gaussian or normal distribution as conventional statistical methods perform
better with such distribution. Hence, as a part of preprocessing, except the "Sex" feature, which is a categorical feature, the rest of the features have been power transformed to have normal distribution.

Table 3 Correlation score of features

Feature Correlation score
CHE −0.329472
CHOL −0.300254
ALB −0.285467
PROT 0.007160
ALP 0.028488
Sex 0.060657
Age 0.106341
ALT 0.106369
CREA 0.182040
GGT 0.471164
BIL 0.473006
AST 0.648341
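A minimal sketch of these preprocessing steps using pandas and scikit-learn is shown below; the CSV file name is an assumption (the UCI repository distributes the HCV data as a CSV), and the exact encoding choices are illustrative rather than taken from the paper's code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, PowerTransformer

# File name is an assumption; the UCI HCV data is distributed as a CSV.
df = pd.read_csv("hcvdat0.csv")

# Drop the leftover index column and encode the two categorical columns.
df = df.drop(columns=["Unnamed: 0"])
df["Sex"] = LabelEncoder().fit_transform(df["Sex"])
df["Category"] = LabelEncoder().fit_transform(df["Category"])

# Fill missing laboratory values with the mean of their respective columns.
numeric_cols = df.columns.drop(["Sex", "Category"])
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# Power-transform the numeric features towards a Gaussian-like distribution.
df[numeric_cols] = PowerTransformer().fit_transform(df[numeric_cols])
```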
Further, a feature selection technique was employed to choose significant features from the dataset and achieve better classification results. In the proposed framework, feature selection is performed in two stages. In the first stage, the correlation score of each feature is measured with every other feature as well as with the target feature. Features that are highly correlated with each other have a similar impact on the target feature; hence, when two features have a correlation score greater than the threshold, one of them can be dropped. Here, the threshold is set to 0.9. However, for the given dataset, no pair of features has a correlation score greater than 0.9, so no feature was omitted in this stage on the basis of correlation score. Table 3 depicts the correlation score of each feature with the target feature.
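The first stage of the feature selection could be sketched as follows; as noted above, no pair of features actually crosses the 0.9 threshold for this dataset, so the listing is illustrative only.

```python
import numpy as np

X = df.drop(columns=["Category"])
y = df["Category"]

# Pairwise absolute correlations; keep only the upper triangle so that a
# feature is never compared with itself and each pair is counted once
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

threshold = 0.9
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
X = X.drop(columns=to_drop)      # empty for this dataset, so nothing is removed

# Correlation of each feature with the target (cf. Table 3)
print(X.corrwith(y).sort_values())
```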
In the second stage, feature selection is done using the backward elimination method, wherein probability values (p-values) are calculated for each feature. The p-value tests the null hypothesis that the feature has no effect on the output, and the significance level is the threshold below which this hypothesis is rejected. The significance level is set to 0.05. In backward elimination, a regression model is fitted with all the features and the p-values are calculated. If the p-value of a feature is higher than the significance level, that feature is removed. These steps are repeated until only features with p-values less than or equal to the significance level remain. For the given dataset, the two features CHE and ALT have p-values greater than the significance level; hence, these two features were removed from the dataset and the remaining features were selected for further classification. Figure 2 shows the distribution plot for the selected features.
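The second stage, backward elimination on p-values, could be sketched with statsmodels as below. The use of an ordinary least squares regressor is an assumption, since the text only states that a regressor model is fitted.

```python
import statsmodels.api as sm

significance = 0.05
features = list(X.columns)

# Repeatedly fit the regressor and remove the feature with the highest
# p-value until all remaining p-values are at or below the significance level
while features:
    model = sm.OLS(y, sm.add_constant(X[features])).fit()
    pvalues = model.pvalues.drop("const")
    worst = pvalues.idxmax()
    if pvalues[worst] > significance:
        features.remove(worst)   # CHE and ALT end up being removed here
    else:
        break

X_selected = X[features]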

Fig. 2 Distribution plot for selected features

3.3 Proposed Model

In the proposed model, logistic regression (LR), support vector machine (SVM), k-nearest neighbours (k-NN), decision tree (DT), and neural network (NN) have been implemented on the selected feature subset for classification.
Logistic regression (LR) is a statistical model that generally produces results in a binary format like 0/1 or yes/no and is used to predict the outcome of a categorical dependent variable. However, in the proposed framework, LR is used for multi-class classification by importing linear_model.LogisticRegression from the sklearn library.
Support vector machine (SVM) is a discriminative classifier that works on linear
as well as nonlinear data. The examples are represented in the form of points in space,
and SVM finds out the hyperplane to categorize those points into multiple classes.
The hyperplanes are chosen such that they maximize the margin among the points for clear separation. In the proposed work, SVC() is imported from the sklearn.svm library with a radial basis function (RBF) kernel to map the input data to a higher-dimensional space.
k-Nearest neighbour (k-NN) is a nonparametric method used for classification and
regression. It is called a lazy learner as the computation is deferred until the arrival of a new instance for classification. It considers the k training examples that are closest to the new instance by measuring either the Euclidean distance or the Manhattan distance between the new instance and each training example. The new instance is classified by a plurality vote of its neighbours, being assigned to the class most common among its k-nearest neighbours. In the proposed work, the number of neighbours considered is three. KNeighborsClassifier is imported from the sklearn.neighbors library for the implementation.
Decision tree (DT) is a graphical representation of all the possible solutions to a
decision. It consists of various nodes like root node, decision nodes, and leaf nodes.
The root node is the starting node of the tree, whereas decision nodes are the subnodes
that split based on conditions. The leaf nodes represent the final decision or the target class to which the instance belongs after traversal through the tree. An attribute selection measure such as the Gini index, information gain (IG) or entropy is used to decide the root node of the tree. This process is repeated in a recursive manner to identify the feature of the dataset that will be used to split the root node and the subsequent decision nodes: the feature having the lowest Gini score/entropy or the highest IG score is used to create the split. In the proposed work, DecisionTreeClassifier is imported from the sklearn.tree library for the implementation.
Neural network (NN) is a supervised machine learning algorithm which consists
of one input layer, one or more hidden layers, and one output layer. An input layer
consists of neurons equal to the number of input features. Hidden layers can have
multiple neurons, whereas output layer has neurons corresponding to the number
of target classes. Each neuron in the input layer is connected to every neuron in the hidden layer, and the neurons in the hidden layer are further connected to the neurons in the output layer. Each connection between neurons carries a weight, and these weights are adjusted during the training phase so that the correct output is produced at the output layer. In the proposed work, tensorflow.keras is used to implement the NN classifier.
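The five classifiers could be trained as sketched below, using the settings stated in the text (RBF kernel for SVM, three neighbours for k-NN) and the 80/20 split described in Sect. 4. The neural-network layer sizes and training epochs are assumptions, since the paper does not state them.

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from tensorflow import keras

X_train, X_test, y_train, y_test = train_test_split(
    X_selected, y, test_size=0.2, random_state=42)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="rbf"),
    "k-NN": KNeighborsClassifier(n_neighbors=3),
    "DT": DecisionTreeClassifier(),
}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    print(name, "testing accuracy:", clf.score(X_test, y_test))

# Neural network: one hidden layer (size assumed) and a softmax output
# over the five categories
nn = keras.Sequential([
    keras.layers.Dense(16, activation="relu", input_shape=(X_train.shape[1],)),
    keras.layers.Dense(5, activation="softmax"),
])
nn.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
           metrics=["accuracy"])
nn.fit(X_train, y_train, epochs=100, verbose=0)
print("NN testing accuracy:", nn.evaluate(X_test, y_test, verbose=0)[1])
```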

4 Performance Evaluation

The dataset was divided into 80% for training and 20% for testing. To evaluate the
performance of all classifiers, the test set was used. The evaluation indices were true
positive (TP), true negative (TN), false positive (FP), and false negative (FN). These
indices were used to calculate performance measures like sensitivity, specificity,
precision, F-measure, and accuracy. Sensitivity, also known as the true-positive rate (TPR), is the proportion of actual positive cases that are correctly predicted as positive. It is also termed recall and is calculated as:

Sensitivity (TPR) = TP / (TP + FN)    (1)

Specificity, also termed the true-negative rate (TNR), is the counterpart of TPR and depicts the proportion of actual negative cases that are correctly predicted as negative. It is formulated as:

Specificity (TNR) = TN / (TN + FP)    (2)

Precision, also called the positive predictive value (PPV), is the proportion of instances predicted as positive that are actually positive. It is expressed as:

Precision (PPV) = TP / (TP + FP)    (3)

Accuracy depicts the percentage of correct predictions, both positive and negative, over the entire population. It is represented as:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (4)

Precision and recall typically trade off against each other. The F-measure combines precision and recall into a single measure, computed as their harmonic mean. It is also termed the F-score and is represented as:

F-Measure = 2 × (Precision × Recall) / (Precision + Recall)    (5)
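These per-class measures can be computed directly from the multi-class confusion matrix, as in the following sketch (shown here for the k-NN model from the previous listing).

```python
from sklearn.metrics import confusion_matrix, accuracy_score

y_pred = models["k-NN"].predict(X_test)
cm = confusion_matrix(y_test, y_pred)

for k in range(cm.shape[0]):
    tp = cm[k, k]
    fn = cm[k, :].sum() - tp      # actual class k predicted as something else
    fp = cm[:, k].sum() - tp      # other classes predicted as class k
    tn = cm.sum() - tp - fn - fp
    sens = tp / (tp + fn) if (tp + fn) else 0.0                      # Eq. (1)
    spec = tn / (tn + fp) if (tn + fp) else 0.0                      # Eq. (2)
    prec = tp / (tp + fp) if (tp + fp) else 0.0                      # Eq. (3)
    f1 = 2 * prec * sens / (prec + sens) if (prec + sens) else 0.0   # Eq. (5)
    print(f"class {k}: sensitivity={sens:.3f} specificity={spec:.3f} "
          f"precision={prec:.3f} F-measure={f1:.3f}")

print("accuracy:", accuracy_score(y_test, y_pred))                   # Eq. (4)
```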

The classifier models perform multi-class classification. Hence, Table 4 presents the performance of the classifier models for each class label against all the evaluation measures described above.
Figure 3 depicts testing accuracy achieved by each classifier.

5 Discussion

It is observed from the above results that the k-nearest neighbour algorithm performed best, with a classification accuracy of 94.3%, which is higher than the other machine learning algorithms considered in this study. The results reveal the significance of applying machine learning models in the healthcare domain, which saves time and is inexpensive. Application of the feature selection technique shows that the two features, namely cholinesterase (CHE) and alanine aminotransferase (ALT), can be removed from the feature dataset while retaining good accuracy.

6 Conclusion

This paper has proposed the use of five machine learning algorithms on laboratory data to classify blood donors: logistic regression, support vector machine, k-nearest neighbours, decision tree, and neural networks. The

Table 4 Performance of classifier models

Category (class labels) Sensitivity Specificity Precision F-measure
Logistic regression
Blood donor 99.07 93.33 99.07 99.00
Suspected blood donor 0.00 99.18 0.00 0.00
Hepatitis C 50.00 97.52 25.00 33.00
Fibrosis 25.00 97.47 25.00 25.00
Cirrhosis 50.00 98.26 66.66 57.00
Testing accuracy (%): 91.86
Support vector machine
Blood donor 99.05 35.29 91.00 95.00
Suspected blood donor 0.00 100.00 0.00 0.00
Hepatitis C 0.00 100.00 0.00 0.00
Fibrosis 0.00 98.31 0.00 0.00
Cirrhosis 40.00 97.45 40.00 40.00
Testing accuracy (%): 86.99
k-nearest neighbours
Blood donor 100.00 63.63 97.00 98.00
Suspected blood donor 0.00 100.00 0.00 0.00
Hepatitis C 100.00 99.18 50.00 67.00
Fibrosis 66.66 100.00 100.00 80.00
Cirrhosis 20.00 98.30 33.00 25.00
Testing accuracy (%): 94.30
Decision tree
Blood donor 96.03 68.18 93.26 95.00
Suspected blood donor 50.00 100.00 100.00 67.00
Hepatitis C 28.57 100.00 100.00 44.00
Fibrosis 50.00 94.87 33.33 40.00
Cirrhosis 71.42 98.27 71.42 71.00
Testing accuracy (%): 87.80
Neural networks
Blood donor 67.27 100.00 100.00 80.00
Suspected blood donor 100.00 61.15 4.00 8.00
Hepatitis C 0.00 100.00 0.00 0.00
Fibrosis 0.00 100.00 0.00 0.00
Cirrhosis 0.00 100.00 0.00 0.00
Testing accuracy (%): 61.79

Fig. 3 Testing accuracy of classifier models

proposed framework has implemented a feature selection mechanism in two stages to improve the performance of the classifiers. In the first stage, the correlation score of each feature was calculated, while in the second stage, the backward elimination method was applied on the basis of the p-value of each feature. As a result of feature selection, two features were eliminated from the dataset. The performance of all classifiers was measured using various evaluation metrics such as sensitivity, specificity, precision, F-measure, and accuracy. It is clear from the results that k-nearest neighbours performed best, with a testing accuracy of 94.3% compared to the other algorithms.
As future work, it would be interesting to incorporate blood attributes such as blood type and genotype, and to explore other machine learning algorithms and different feature selection techniques.

References

1. Moosavy SH et al (2017) Epidemiology, transmission, diagnosis, and outcome of Hepatitis C virus infection. Electron Phys 9(10):5646–5656
2. Global Research and Advisory Company | Gartner. https://www.gartner.com/en
3. Abd El-Salam SM, Ezz MM, Hashem S, Elakel W, Salama R, ElMakhzangy H, ElHefnawi M (2019) Performance of machine learning approaches on prediction of esophageal varices for Egyptian chronic hepatitis C patients. Inf Med Unlocked 17
4. Chown H (2019) A comparison of machine learning algorithms for the prediction of Hepatitis C NS3 protease cleavage sites. EuroBiotech J 3(4):167–174
5. Hoffmann GF, Bietenbeck A, Lichtinghagen R, Klawonn F (2018) Using machine learning techniques to generate laboratory diagnostic pathways—a case study. J Lab Precis Med 3:58–58
6. Ioannou GN et al (2020) Assessment of a deep learning model to predict hepatocellular carcinoma in patients with hepatitis C cirrhosis. JAMA Network Open 3
7. Haga H et al (2020) A machine learning-based treatment prediction model using whole genome variants of hepatitis C virus. PLoS One 15(11)
8. Doyle OM, Leavitt N, Rigg JA (2020) Finding undiagnosed patients with hepatitis C infection: an application of artificial intelligence to patient claims data. Sci Rep 10:10521
9. Oladimeji O, Oladimeji A, Olayanju O (2021) Machine learning models for diagnostic classification of hepatitis C tests. Front Health Inf 10(1)
10. Santosh Bhargav K et al (2018) Application of machine learning classification algorithms on hepatitis dataset. Int J Appl Eng Res 13(16)
11. Konerman MA et al (2019) Machine learning models to predict disease progression among veterans with hepatitis C virus. PLoS One 14
12. Nandipati SCR, XinYing C, Wah K (2020) Hepatitis C virus (HCV) prediction by machine learning techniques. Appl Model Simul 4
13. Dua D, Graff C (2019) UCI machine learning repository. http://archive.ics.uci.edu/ml
14. Hashem S, Esmat G, Elakel W, Habashy S, Raouf S, Elhefnawi M, Eladawy M, Elhefnawi M (2017) Comparison of machine learning approaches for prediction of advanced liver fibrosis in chronic hepatitis C patients. IEEE/ACM Trans Comput Biol Bioinform
15. Dua D, Graff C (2020) UCI machine learning repository. http://archive.ics.uci.edu/ml
Monitoring the Soil Parameters Using
IoT for Smart Agriculture

K. Gayathri and S. Thangavelu

Abstract Agriculture is extremely important to India's economy and people's survival. The proposed work aims to create a system for soil monitoring and irrigation that will lessen manual monitoring of fields and provide details through the Google Firebase cloud server. The proposed work is designed to assist farmers in increasing agricultural production. The system has been developed to accumulate real-time details from the agricultural site, including humidity, soil moisture content, soil temperature, water quality via a TDS sensor, and soil NPK nutrients via color sensors. Along with this, the proposed work implements a soil texture classification framework and a fire detection module. Fire is an unpredicted event that results in significant losses for farmers. Hot temperatures and dry conditions can cause tinder-dry crops and residue in agricultural fields, so field fires can happen unintentionally. Hence, the fire detection module is used in this project to avoid this unexpected situation. The ESP32 microcontroller is the heart of the system, allowing it to take sensor data over Wi-Fi and send it to the Google Firebase cloud server platform. The end-user or the farmer can store every piece of sensor information within the Google Firebase data storage element and access it from any smartphone or website using the Google Firebase cloud server. Hence, the user can monitor sensor data and control irrigation. Farmers or end users can grow the appropriate and suitable crop in the soil based on the results.

Keywords Soil classification · Soil monitoring · Automatic irrigation · Machine learning

K. Gayathri · S. Thangavelu (B)


Department of Computer Science and Engineering, Amrita School of Engineering, Amrita
Vishwa Vidyapeetham, Coimbatore, India
e-mail: s_thangavel@cb.amrita.edu
K. Gayathri
e-mail: cb.en.p2cse19008@cb.students.amrita.edu


1 Introduction

Agriculture is important to any country's economic growth and development. As stated by the United Nations' Food and Agriculture Organization, agriculture is vital to the survival of nearly 60–70% of the world's population. The growth of a crop is influenced by a variety of factors such as light, soil temperature, water, soil humidity, and fertilizers. Measuring the quantity of nutrients present in the soil is also necessary for efficient crop growth. Although all of these factors are important, water is regarded as the most important. Water is wasted unrestrictedly in domestic and industrial settings due to an unplanned and casual approach to water usage, resulting in a daily decrease in the groundwater level. It has therefore become critical to modernize traditional agricultural practices in order to conserve water and increase crop productivity. Accordingly, the proposed work is designed to remotely observe and keep track of the necessary soil parameters, using color sensors for NPK nutrients, and to provide automatic management of the soil moisture content, automatic irrigation, a fire detection module and a soil texture classification framework. This will improve the quality and volume of agricultural products.
One of the most valuable natural resources is soil. Organized study of the soil provides information about its nature and its types. Soil pH is an important element of soil health. The pH scale is used to determine the acidity and basicity of the soil, which have an impact on plant growth, and the soil nutrients are determined by this acidity and basicity. NPK is an example of a nutrient group found in the soil. If the pH value is below 7, the soil is considered acidic; if it is equal to 7, the soil is considered neutral; and if it is greater than 7, the soil is considered basic. The presence of iron oxide is indicated by the soil's yellow and red colors, and the presence of organic matter is indicated by a dark black or brown color. Minerals in the soil can also have an impact on its color. As a result, color image processing can be used to determine the nutrient content of the soil.
In typical existing systems, the sensor nodes collect details and information from the field site, which are then stored within a helper node that connects to a cloud server over the internet; users can then view the sensor information on a smartphone, PC, and so on. Such systems are bulky, consume a lot of power and are therefore unsuitable for remote use. The ESP32 microcontroller, in contrast, is small and consumes very little power. It is reported that wireless monitoring systems using GPS and the X-Bee series have a limited range and that users may experience connectivity issues. To avoid these issues, an ESP32 microcontroller system is used to provide real-time sensor data, with global access provided through an internet connection.

2 Related Work

Agriculture is very important to any country's economic growth and development. Changing climatic conditions have had a negative impact on agricultural production.
As a result, numerous new technologies have evolved to enable smart agriculture, which can adapt to varying climatic conditions and thereby improve the quality and volume of agricultural products. Srivastava et al. [1] proposed one related system: a novel and straightforward IoT-based approach for smart farms. Hardware and software setups are made to analyze essential soil parameters from any remote place and to automatically manage the moisture content present in the soil. Remote monitoring and water conservation are aided by the proposed approach.
Agriculture has one significant position in our country’s economic progress. Crop
yield is largely determined by the fertility of the soil and also moisture of the soil.
The soil nutrient analysis, which is mostly done using laboratory techniques, is
required in order to suggest an appropriate fertilizer quantity. It takes a long time
to manually measure soil nutrients. Many farmers do not conduct soil testing in the
laboratory and continue to cultivate the same types of crops on the agricultural land,
causing the loss of soil fertility. A model is developed by Madhumathi et al. [2] for
using Wireless Sensor Networks to implement precision agriculture, which allows
for remote analysis of fertility of the soil and also remote analysis of other factors
such as moisture of soil, temperature, and so on. This information and data are sent
to the cloud. Then, the resulting values and outcomes are showed on any mobile
application. The built system has the capability to suggest the required quantity of
water and also to suggest the required amount of fertilizer, thereby improving the
standard of the soil and ensuring proper crop cultivation.
Agriculture is primary occupation of a large portion of India’s population. Crop
production is extremely important in our country. Excessive fertilizer use or insuf-
ficient fertilizer use are common causes of poor crop quality. The level of nutrients
present in the soil must be measured for efficient crop growth. The proposed IoT-
enabled soil testing system is based on soil parameter measurement and observation.
Kapse et al. [3] described a study of soil and the relevant parameters involved in
predicting suitable crops in order to avoid soil infertility and improve crop quality.
This system was created with farmers’ needs in mind, resulting in its ability to make
suggestions through the mobile application.
Over time, agriculture has been the most traditional activity. Agriculture has under-
gone numerous changes since its origination in order to improve crop productivity
and quality. Weather disasters and natural disasters have had an impact on agriculture
over time. As a result, the next step in the development is to develop IoT solutions for
analyzing a variety of elements in order to improve agriculture. A model like this gives
an essential data on crop growth and soil properties. Ioana et al. [4] described a model
that can monitor the elements that have an impact on agricultural crops for Smart
Agriculture. Pawar et al. [5] stated agriculture is the main source of India’s economic
development. Hydroponics is one of the most practical methods that are available.

Growing plants without the help of soil is known as hydroponics. By replicating
their environmental requirements, the technique provides us with superior-quality
crops. It’s also known as vertical farming. Different parameters, such as temperature
and humidity, must be measured for the weather monitoring system and irrigation
controller.
Bhosale et al. [6] described an irrigation scheduler that executes user-defined
functions. The irrigation scheduler also generates commands in order to control
relevant actuators. The soil moisture sensor was designed, developed, and also tested
in order to achieve accurate and reliable measurements at a low cost.
The sensor can also measure humidity using the same PCB circuit. As a result,
Bhosale et al. [6] show the model of a PIC16F877A microcontroller-based irriga-
tion system. Bhaskar et al. [7] proposed a design that will assist farmers who are
experiencing power outages in maintaining a consistent supply of water because
of power outages and insufficient supply of water. The designed model aids in the
reduction of human labor. Due to reduced interest, cultivation in our country has greatly declined. Moreover, limited knowledge about the dryness of the land and improper usage of pesticides result in very low production. Sowmiya et al. [8] described how the sensed data is processed, stored in the cloud, and then relayed to registered farm owners in a user-friendly format through their phone or device. In addition, if the pH value of the soil is low, the system recommends the best pesticides for better cultivation.
This will be extremely beneficial to farmers who are unable to visit their farms and
will improve crop cultivation. In the Internet world, the Internet of Things is the
hottest topic. The concepts aid in the interconnection of physical objects that have
sensing, actuating, and computing capabilities. Thus, Lakhwani et al. [9] discussed
about the Agricultural IoT, Internet of Things, a list of application where Internet
of Things can be used for agriculture, the advantages of IoT in agriculture, and an
analysis about the literature.
On-site engineers require some basic information about the type and structure of
the soil. Chandan et al. [10] investigated traditional soil classification techniques and
developed and tested an image processing-based efficient classifier for soil classifi-
cation. Humus Clay, Clay, Silty Sand, Sandy Clay, Clayey Peat, Clayey Sand, and
Peat were seven soil classes studied for classification. Preprocessed images of the
soils under study were collected. The feature extracted from the preprocessed images
is used to train the classifier-SVM. Developed SVM is then put into the test for clas-
sification efficiency and accuracy for each class. The built model is utilized for the
development of the classification of soil in real-time. Bhat-
tacharya et al. [11] used a computer vision approach for characterizing and also for the
soil classification. For soil classification and characterization, a Gravity Analog Soil
Moisture Sensor is utilized here along with an Arduino Uno and image processing
tool. The data sets of this study are from Ethiopia’s Amhara region and Addis Ababa
city. Bhattacharya et al. [11] used six different types of soil, each containing 90 images. To achieve the study's goal, pre-processing is performed in MATLAB after the dataset is collected. A BPNN with seven input feature vectors and six output neurons is used to classify the soils, achieving an accuracy of 89.7%.

3 Proposed Work

3.1 Proposed System

There are four basic modules in the proposed system, as shown in Fig. 1: the soil texture classification module, the soil monitoring module, the automatic irrigation module, and the fire detection module. The soil texture classification module helps to classify the soil into different types such as Humus Clay, Clay, Silty Sand, Sandy Clay, Clayey Peat, Clayey Sand, and Peat. Based on the type of soil, suitable crops to cultivate are suggested and predicted. The farmers can test the type of soil multiple times before or during the cultivation process and take necessary actions and precautions to get a good yield. For this classification, the SVM algorithm is implemented. Input images are fed to the classifier, which classifies and detects the soil type. If any abnormalities are found, an alert is raised with the help of a buzzer so that necessary actions can be taken. The next module is the soil monitoring module. The
soil monitoring module provides the temperature, pH, humidity, soil moisture, and NPK level of the soil. The automatic irrigation module helps to predict and analyze the adequate amount of water required for irrigation. That is, if the moisture level in the soil is below the specified threshold, the ESP32 microcontroller turns on a water pump so that it is possible to give

Fig. 1 Proposed system



water to the crops and plants in the farm. The water pump turns off automatically whenever the system finds the required moisture content in the soil, and thus automatic irrigation is achieved. The proposed system can also detect fire using the fire detection module. Fire is an unexpected and unpredicted event that results in significant losses for farmers. Hot temperatures and dry conditions can cause tinder-dry crops and residue in agricultural fields, so field fires can happen unintentionally. Hence, the fire detection module is used in this project to avoid this unexpected situation. Whenever fire is found in the agricultural land, the proposed model detects it and alerts the end-user; likewise, whenever a sensor value exceeds its threshold, an alert is produced with the buzzer alarm.

3.2 Methodology

Soil texture classification is done using the SVM algorithm, which is one of the best and fastest algorithms and outputs accurate results in real time; the algorithm classifies the soil within seconds. The block diagram of the proposed system is depicted in Fig. 2.

Fig. 2 Block diagram of the proposed system

Hardware used in the proposed work includes the ESP32 Microcontroller, DHT11,
Soil Moisture Sensor, Color Sensor, Buzzer, Relay, Water pump, TDS Sensor, and
Gas Sensor as shown in Fig. 2. The software requirements of this project are
MATLAB, Arduino IDE, Embedded C, and Google Firebase Cloud. The data is
stored in the cloud. MATLAB software is used for soil classification. The soil texture
classification module helps to classify the soil into seven different types. And also, it
will suggest and predict about the suitable crops to cultivate there. Soil monitoring
module provides temperature, pH, humidity and soil moisture, and NPK level of the
soil using DHT11 sensor, and color sensor. After this, automatic irrigation module
provides the adequate amount of water required for the irrigation using a Soil Mois-
ture sensor, Relay, Water pump, and TDS Sensor. TDS sensor checks the quality of
the water. The soil moisture sensor checks the moisture level in the soil and if the
moisture level is low then the microcontroller switches on a water pump to provide
water to the plant. The water pump gets automatically off when the system finds
enough moisture in the soil. All the sensors are connected to ESP32 Microcontroller.
Then using the gas sensor, the fire detector module detects fire. Whenever fire is
found in the agricultural land, the proposed model identifies fire. Along with the
detection of fire, the proposed system provides an alert to the end-user. Also, when-
ever the value comes greater than the threshold value, it will produce an alert using
the buzzer. The data will be updated in the Google Firebase cloud which is used for
data monitoring.
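The threshold logic of the irrigation and fire-alert path can be illustrated with a MicroPython-style sketch for the ESP32. The actual firmware is written in Embedded C with the Arduino IDE, and the pin numbers, thresholds and Firebase URL below are illustrative assumptions only (Wi-Fi setup is omitted).

```python
import time
import dht
import urequests
from machine import ADC, Pin

soil = ADC(Pin(34))                  # soil moisture sensor (analog), pin assumed
soil.atten(ADC.ATTN_11DB)
gas = ADC(Pin(35))                   # gas/smoke sensor for fire detection, pin assumed
gas.atten(ADC.ATTN_11DB)
dht11 = dht.DHT11(Pin(4))            # temperature and humidity sensor
pump = Pin(26, Pin.OUT)              # relay driving the water pump
buzzer = Pin(27, Pin.OUT)

MOISTURE_THRESHOLD = 1800            # assumed ADC thresholds
GAS_THRESHOLD = 2500
FIREBASE_URL = "https://smart-agri-demo.firebaseio.com/field1.json"   # hypothetical

while True:
    dht11.measure()
    moisture = soil.read()
    smoke = gas.read()

    pump.value(1 if moisture < MOISTURE_THRESHOLD else 0)   # automatic irrigation
    buzzer.value(1 if smoke > GAS_THRESHOLD else 0)         # fire / threshold alert

    payload = {"temperature": dht11.temperature(), "humidity": dht11.humidity(),
               "moisture": moisture, "gas": smoke, "pump": pump.value()}
    try:
        urequests.patch(FIREBASE_URL, json=payload).close()  # push to Firebase REST API
    except OSError:
        pass                                                 # ignore transient Wi-Fi errors
    time.sleep(30)
```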

3.3 SVM Algorithm

The SVM (Support Vector Machine) algorithm proves to be one of the efficient algorithms for providing accurate results at a fast rate in real time. It is one of the well-known supervised learning algorithms, mainly utilized to solve classification as well as regression problems. SVM's goal is to find the best line or decision boundary for categorizing n-dimensional space into classes, so that new data points can be placed in the correct category in the future. The best decision boundary is called a hyperplane. SVM selects the extreme points or vectors that aid in creating the hyperplane; these extreme cases are called support vectors, and hence the algorithm is named the Support Vector Machine. The SVM's goal is thus to find a hyperplane in N-dimensional space, where N denotes the number of features, that categorizes the data points clearly. Numerous hyperplanes could be selected to separate the two classes of data points.
The main goal of the algorithm is to identify the plane with the greatest margin, i.e., the greatest distance between the data points (vectors) of both classes. The hyperplane is the decision boundary that aids in classifying data points; different classes are assigned to the vectors on each side of the hyperplane [12–20]. The hyperplane's dimension is determined by the number of features: if there are two features, the hyperplane is a straight line; if there are three features, it is a two-dimensional plane; and if the number of features exceeds three, it becomes hard to visualize.
In a higher-dimensional space, a hyperplane is described as the set of points whose dot product with a fixed vector in that space is constant. The vectors defining the hyperplanes can be chosen as linear combinations of the images of feature vectors occurring in the database, and kernels are used to calculate the distance between each test point and the vectors originating from one of the sets being discriminated. As a result, the set of points mapped into the hyperplane can become quite convoluted, permitting far more complex discrimination between sets that are not convex in the original space.
The soil texture classification steps are shown in Fig. 3. Image acquisition, pre-
processing of soil images for image enhancement, feature extraction, and classifi-
cation are all steps in the classification of soil. The color quantization technique,
Gabor filter, and low pass filter are used in order to get the features and the char-
acteristics from the images of the soil. The statistical parameters used are standard
deviation, mean amplitude, and HSV histogram. SVM, segmentation method, trans-
formation, and statistical parameters are the methods used in the proposed work. The
SVM algorithm, or support vector machine algorithm, is a supervised learning model associated with learning algorithms that are primarily utilized for classification and regression. Given a set of training examples, each belonging to one of two categories, the SVM maps the examples as points in space such that the different categories are separated as widely as possible; new data are then mapped into the same space and classified according to the category to which they belong. The segmentation process separates the Region of Interest (RoI) from the non-interest areas; viewing segmentation as a two-class problem, a two-class classifier is trained to classify pixels in the feature space. The goal of color quantization is to create a new image that looks visually similar to the original image while reducing the number of distinct colors used. A low-pass filter reduces higher frequencies while passing frequencies
below the cut-off frequency. The frequency that is reduced is determined by the filter
design. As an edge detector, the Gabor filter is used. Gabor filters with different
frequencies are useful for extracting features [21–24] from an image. Statistical
parameters such as Standard Deviation and Mean are used to describe the content
and texture of an image.
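A simplified Python sketch of this feature-extraction and classification pipeline is shown below (the paper's implementation is in MATLAB). It uses only a subset of the listed features (HSV histogram, colour moments and Gabor texture statistics), and the directory layout, image size and Gabor frequency are assumptions.

```python
import numpy as np
from pathlib import Path
from skimage.io import imread
from skimage.color import rgb2hsv, rgb2gray
from skimage.transform import resize
from skimage.filters import gabor
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def extract_features(img):
    img = resize(img, (128, 128), anti_aliasing=True)
    hsv = rgb2hsv(img)
    # HSV histogram (8 bins per channel)
    hist = np.concatenate([np.histogram(hsv[..., c], bins=8, range=(0, 1))[0]
                           for c in range(3)])
    # Colour moments: per-channel mean and standard deviation
    moments = np.concatenate([img.mean(axis=(0, 1)), img.std(axis=(0, 1))])
    # Gabor filter response as a simple texture descriptor
    real, _ = gabor(rgb2gray(img), frequency=0.6)
    texture = np.array([real.mean(), real.std()])
    return np.concatenate([hist, moments, texture])

X, y = [], []
for class_dir in Path("soil_dataset").iterdir():      # one folder per soil class
    for f in class_dir.glob("*.jpg"):
        X.append(extract_features(imread(f)))
        y.append(class_dir.name)

# 8:2 split as described in Sect. 3.4, then a multi-class linear-kernel SVM
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = SVC(kernel="linear").fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```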

3.4 Dataset

The dataset identified and used in this proposed work is soil classification image
dataset. The soil classification image dataset is composed of 700 images that include
different types of soil. This dataset is used for soil classification. This dataset is
taken from Kaggle. Soil type classification image dataset is an image dataset that

Fig. 3 Steps in the soil classification

contains images as “Humus Clay,” “Clay,” “Silty Sand,” “Sandy Clay,” “Clayey
Peat,” “Clayey Sand,” and “Peat.” This dataset is primarily used to classify soil into
various types. Indian soils are divided into groups based on where the soil is found or
the predominant particle size present in the soil. And the soil is classified as laterite
soil, alluvial soil, black or regur soil, forest soil, red soil, marshy soil or peaty, arid
or desert soil, and so on, depending on its location. Soil is classified as clay, peat,
or sand based on the dominant particle size. Silty Sand, Clayey Sand, Clayey Peat,
Humus Clay, and Sandy Clay, on the other hand, are classified as mixtures of two soils. The dataset is divided in an 8:2 ratio, meaning 80% of the dataset is used for training purposes and 20% for testing purposes. For this project, the necessary data and images of soil were also gathered from various sources.

4 Results

Soil texture classification using an image processing technique is done with the SVM algorithm, and the implementation is carried out in MATLAB. The hardware and software are serially connected using a USB cable. Using the SVM algorithm, soil texture classification results are obtained with a high accuracy of 95.72%. The hardware setup was made, and the results of the soil monitoring module, automatic irrigation module, and fire detection module were obtained and assessed for accuracy.
The different input images are successfully classified into seven types of soil: "Humus Clay," "Clay," "Silty Sand," "Sandy Clay," "Clayey Peat," "Clayey Sand," and "Peat," with an average accuracy of 95.72%. The six selected features given to the proposed SVM classifier are Auto Correlogram, Energy, Mean Amplitude, HSV Histogram, Wavelet Moments, and Color Moments. Suitable crops to cultivate are also predicted. The results of the soil texture classification module are shown below.
Figure 4 shows the input image, which has been classified as clay. It has a
95.7742% accuracy. Also, suitable crops are predicted. The predicted suitable crops
are Paddy, Fruit trees, and Ornamental trees.
Figure 5 shows the input image, which has been classified as Silty Sand. It has a
95.7742% accuracy rating. Also, suitable crops are predicted. The predicted suitable
crops are Willow, Birch, Dogwood, Cypress, and Fruit crops.

Fig. 4 Soil texture classification module—input image is classified into clay



Fig. 5 Soil texture classification module—input image is classified into silty sand

Figure 6 shows the input image, which has been classified as Humus Clay. It has a
95.7742% accuracy. Also, suitable crops are predicted. The predicted suitable crops
are Berry crops, Climbers, Bamboos, Perennials, Shrubs, and Tubers.
Figure 7 shows the values from Sensors for the soil monitoring, automatic irriga-
tion, and fire detection modules. These values are displayed in the Arduino console.

Fig. 6 Soil texture classification module—input image is classified into humus clay

Fig. 7 Sensor values output snapshot

That is, Gas sensor value, TDS value, Humidity, Soil Moisture Sensor value, and
temperature rate are displayed in the Arduino console.
Figure 8 shows the real-time values obtained from the sensors. These real-time
values are stored in the Google Firebase cloud server. Gas sensor value, TDS value,
humidity, soil moisture sensor value, and temperature rate are stored in the Google
Firebase cloud server.
Table 1 depicts the performance comparison of existing algorithms and the proposed SVM for the soil texture classification module.
Precision is defined as the ratio of correctly predicted positive observations to
total predicted positive observations. Recall or sensitivity is the percentage of actual
positive cases that were predicted as positive or true positive. And F1 Score is a
metric for assessing a test’s accuracy. Harmonic mean of precision and recall are

Fig. 8 Real-time database snapshot



Table 1 Comparative analysis of previous research methods and proposed SVM method

Model  Number of soil classes for the experiment  Classification algorithm  Accuracy (%)
Bhattacharya and Solomatine [11]  3  Multi SVM with linear kernel  90.7
Chung et al. [12]  13  Linear regression  48
Vibhute et al. [13]  5  Multi SVM with linear kernel  71.78
Chandan and Thakur [10]  7  Fine KNN  93.8
Proposed SVM model  7  Multi SVM with linear kernel  95.72

Table 2 Precision, recall and F1 score of the proposed SVM method

Class  Precision  Recall  F1 score
Humus clay  0.972  0.957  0.964
Clay  0.941  0.957  0.954
Silty sand  0.952  0.957  0.954
Sandy clay  0.973  0.957  0.964
Clayey peat  0.957  0.954  0.955
Clayey sand  0.972  0.957  0.964
Peat  0.957  0.941  0.948

used to calculate the F1 score. Table 2 depicts the Precision, Recall, and F1 Score
of the SVM Method.

5 Conclusion

Remote monitoring of the moisture content, humidity, and temperature of the soil is done at a very low cost, and farmers can access the values from anywhere in the world at any time. As a result, the proposed work provides precise values of the soil's moisture content, temperature, and humidity, which is really important on farms. To assess any additional data, the humidity sensor, soil moisture sensor, and temperature sensor are connected to the microcontroller. A sustainable and reliable monitoring model focused on each farmer's land has been developed successfully. The developed model is a low-cost, low-power, noninvasive, real-time agriculture monitoring model. It is also simple to use and gives precise results. The project has been implemented with both hardware and software components, and the hardware containing several sensors has been tested with good accuracy.

References

1. Srivastava A, Das DK, Kumar R (2020) Monitoring of soil parameters and controlling of soil moisture through IoT based smart agriculture. IEEE Students Conf Eng Syst (SCES) 13(3):1–6
2. Madhumathi R, Arumuganathan T, Shruthi R (2020) Soil NPK and moisture analysis using
wireless sensor networks. In: 11th international conference on computing, communication and
networking technologies (ICCCNT), vol 9, no. 1, pp 1–6
3. Kapse S, Kale S, Bhongade S, Sangamnerkar S, Gotmare Y (2020) IoT enable soil testing &
NPK nutrient detection. JAC J Compos Theory 13(5):310–318
4. Marcu IM, Suciu G, Balaceanu CM, Banaru A (2020) IoT based system for smart agriculture.
In: 11th international conference on electronics, computers and artificial intelligence (ECAI),
vol. 11, no. 2, pp 1–4
5. Pawar S, Tembe S, Acharekar R, Khan S, Yadav S (2020) Design of an IoT enabled automated
hydroponics system using NodeMCU and Blynk. In: IEEE 5th international conference for
convergence in technology (I2CT), vol 11, no. 1, pp 1–6, March 2020.
6. Bhosale PA, Dixit VV (2020) Water saving-irrigation automatic agricultural controller. Int J
Sci Technol Res 1(11):118–123
7. Bhaskar L, Koli B, Kumar P, Gaur V (2020) Automatic crop irrigation system. In: 4th interna-
tional conference on reliability, infocom technologies and optimization (ICRITO) (trends and
future directions), vol 15, no. 1, pp 1–4
8. Sowmiya E, Sivaranjani S (2020) Smart system monitoring on soil using internet of things
(IoT). Int Res J Eng Technol (IRJET) 4(2):1070–1072
9. Lakhwani K, Gianey H, Agarwal N, Gupta S (2018) Development of IoT for smart agriculture
a review. Emerg Trends Expert Appl Secur 841(1):425–432
10. Chandan, Thakur R (2018) An intelligent model for Indian soil classification using various
machine learning techniques. Int J Comput Eng Res (IJCER) 8(9):33–41
11. Bhattacharya B, Solomatine DP (2020) Machine learning in soil classification. Neural Netw
19(2):186–195
12. Chung S-O, Cho K-H, Kong J-W, Sudduth KA, Jung K-Y (2020) Soil texture classification
algorithm using RGB characteristics of soil images. IFAC Proc 43(26):34–38
13. Vibhute AD, Kale KV, Dhumal RK, Mehrotra SC (2019) Soil type classification and mapping
using hyperspectral remote sensing data. In: International conference on man and machine
interfacing (MAMI), vol 13, no. 1, pp 1–4
14. Byiringiro E, Ndashimye E, Kabandana I (2021) Smart soil monitoring application (Case Study:
Rwanda). In: Future of information and communication conference, FICC 2021: advances in
information and communication, vol 1363, pp 212–224
15. Prakash C, Singh LP, Gupta A, Singh A (2021) Smart farming: application of internet of
things (IoT) systems. In: Congress of the international ergonomics association, IEA 2021:
proceedings of the 21st congress of the international ergonomics association (IEA 2021), vol
221, pp 233–240
16. Koresh, James Deva H (2021) Analysis of soil nutrients based on potential productivity tests
with balanced minerals for maize-chickpea crop. J Electron 3(01):23–35
17. Adam EEB, Sathesh A (2021) Construction of accurate crack identification on concrete
structure using hybrid deep learning approach. J Innov Image Process (JIIP) 3(02):85–99
18. Shankhdhar GK, Sharma R, Darbari M (2021) SAGRO-lite: a light weight agent based semantic
model for the internet of things for smart agriculture in developing countries. Semantic IoT
Theory Appl 941:265–302
19. Bharti B, Pandey S, Kumar S (2021) An advanced agriculture system for smart irrigation and
leaf disease detection. Adv Electr Comput Technol 711:221–233
20. Srunitha K, Padmavathi S (2017) Performance of SVM classifier for image based soil classifica-
tion. In: International conference on signal processing, communication, power and embedded
system (SCOPES), pp 411–415
21. Sabarish BA, Vidhya S (2019) Facility recommendation system using domination set theory
in graph. Int J Innov Technol Explor Eng 8:313–317
22. Nandhini S, Suganya R, Nandhana K, Varsha S, Deivalakshmi S, Senthil Kumar T (2020) Automatic detection of leaf disease using CNN algorithm. Mach Learn Predictive Anal
141:237–244
23. Subramanian MA, Selvam N, Rajkumar S, Mahalakshmi R, Ramprabhakar J (2020) Gas
leakage detection system using IoT with integrated notifications using pushbullet: a review. In:
Fourth international conference on inventive systems and control (ICISC)
24. Giridhararajan R, Vasudevan SK, Thangavelu S (2020) IoT based approach for the increased and
improved sales for the brick and mortar stores. Int J Adv Trends Comput Sci Eng 9:3048–3052
NRP-APP: Robust Seamless Data
Capturing and Visualization System
for Routine Immunization Sessions

Kanchana Rajaram, Pankaj Kumar Sharma, and S. Selvakumar

Abstract Immunization of children is one of the essential child health strategies and saves millions of lives against vaccine-preventable diseases. Vaccination under Routine Immunization (RI) is one of the most cost-effective health investments. Immunization coverage information is useful to monitor the performance of immunization programs and improve vaccine delivery. Existing mobile phone-based applications for tracking vaccination coverage either require healthcare workers to enter a lot of data, resulting in data quality problems like missing and inconsistent data, or lack history data, leading to problems like not knowing the vaccine due for a child. To overcome these difficulties, we have proposed a smart mobile application along with a portal and a data warehouse, with seamless data capturing and synchronized data. The load testing of the App with around 1 MB of local data and its stress testing in synchronizing with the data warehouse for 5000 users have shown response times of just 5 and 27.5 s, respectively.

Keywords Mobile app · Data visualization · Data synchronization · Vaccination · Work-plans · Dashboard · Data-warehouse

1 Introduction

The Government of India launched the Expanded Program on Immunization (EPI) in 1978 and relaunched it as the Universal Immunization Program (UIP) with some
enhancements to provide nationwide vaccination. Later, Mother Child Tracking
System (MCTS) [1] was launched in 2009 which is an information system for tracking

K. Rajaram (B)
Department of Computer Science and Engineering, Sri Sivasubramaniya Nadar College of
Engineering, Chennai, Tamil Nadu, India
e-mail: rkanch@ssn.edu.in
P. K. Sharma · S. Selvakumar
IIIT Una, Una, Himachal Pradesh, India
e-mail: director@iiitu.ac.in


the maternal and child health and provisioning of health-related services to the bene-
ficiaries of India’s public health system. This ambitious project currently covers all
the states in India. MCTS required Auxiliary Nurse Midwives (ANMs) to capture
data in the registers which are weekly or monthly uploaded in the RCH (Repro-
ductive and Child Health) portal. The ANMs used to carry the RCH registers with
them to the PHCs (Primary Health Centers) where the data entry operator collects
the data and updates it in the RCH portal, which is tremendously burdensome [2]. This
system of data uploading had a scope for data inconsistency and incompleteness as
well as fraudulent data entries, thus defeating the actual purpose of building a robust
system. The data were not being uploaded timely, and the feedback mechanisms
for the health-care workers to take appropriate action were not in place [3]. In a
study conducted in the Indian state of Haryana, there were some observations such
as lack of appropriate training [4], overburdened data entry operators and ANMs,
poor internet connectivity, slow server speed and frequent power failures.
In 2014, Mission Indradhanush was introduced to improve the vaccination rate
among rural population of India. Further, Intensified Mission Indradhanush with
some changes introduced in Mission Indradhanush was launched in 2017 to increase
the vaccination rate in urban slums and in higher urban classes to eradicate the
vaccine-preventable diseases (VPD) completely [2].
UNICEF along with the State Government of Bihar launched a computer tablet-
based MCTS in 2014, to capture real-time data online and to minimize the challenges
faced with the conventional MCTS [3]. The MCTS software contained modules
embedded within the tablet for entering data electronically. These data were preserved
and managed on a dummy server and could be accessed in real time. The nonlinking of
the dummy server to the national MCTS portal has not lessened the burden of data
entry operators, who continue to enter data into the national portal as before. In 2017,
various states in India like Haryana, Jammu & Kashmir, Uttar Pradesh, Telangana,
etc., launched ANMOL (ANM Online), a tablet-based software built to eliminate
redundancy, automate data processing and empower healthcare personnel to achieve
improved throughput. ANMOL offered real-time data entry and update by the ANMs
through the options provided in the App for capturing mother and child related data.
ANMOL minimizes the paperwork required to fill the RCH registers by providing
fields in the App itself to register mother and children for health services. However,
this process required ANMs to fill the various fields in the App, while providing
vaccine to the children and to manually verify the details of the children to be vacci-
nated. This APP leads to several human errors and throws huge burden of manual
verification of the details on the ANMs.
Collecting immunization data, maintaining them in Electronic Immunization
Registers (EIR) and sharing the reports helps in better decision making to improve
the vaccination rates among the missed communities [5]. More meaningful infor-
mation can be extracted from the collected data by representing data in a visualized
format in the form of graphs.
To overcome the difficulties in the existing mobile applications used in RI sessions,
a robust system namely Nagarik Rog Prathirakshak Application (NRP-APP) is
proposed with the following functionalities:

• Seamless data capturing by automating the authentication of children to be vaccinated.
• Updating the immunization data of RI sessions in the data warehouse without
requiring manual intervention.
• Visualizing the immunization data in different perspectives through a web portal.
The paper is organized as follows: Sect. 2 reviews the existing mobile applica-
tions for immunization. The modules and functionalities of the proposed mobile
application are elaborated in Sect. 3. The performance testing results are discussed
in Sects. 4 and 5 summarizes our work.

2 Literature Review

Various studies have shown that the use of growing mobile technologies (mobile apps) has significantly enhanced immunization services. These studies emphasize
on aiding healthy diet and exercise through regular monitoring of one’s BMI, blood
pressure and caloric intake for the local communities where hospitals and health
centers are not easily accessible [6–8]. In a similar study, it has been shown that
m-Health has played a vital role in eradicating polio in developing countries [9]. The
m-Health work focuses on providing vaccination and routine immunization services in
low- and middle-income countries using mobile technologies for polio eradication.
It is also backed by another research conducted to identify m-Health intervention
studies on vaccination update in 21 countries. Ten peer-reviewed studies and seven
white or gray studies showed improved updating of vaccination after interventions
[10].
A study conducted under WHO to measure the impact of using e-Health tech-
nologies to encourage immunization and increase vaccination rate has shown positive
results which encourages the usage of mobile technologies for immunization [11].
A study was conducted in Pakistan [12] to capture the qualitative experiences of front-line health workers and district managers while engaging with real-time digital technology to improve vaccination coverage in an underserved rural district.
It showed that the use of digital technologies has increased satisfaction, transparency
and enhanced reliability of the system. Time required to complete both manual and
digital entries and outdated phones over time were considered as constraints.
A software application called Jeev [13] tracks the vaccination coverage of children in rural communities by combining the power of smartphones with the ubiquity of cellular infrastructure, QR codes and national identification cards. Its main focus
is to reduce childhood deaths by strengthening the immunization surveillance and
monitoring in developing countries as 24 million children born every year do not
receive adequate immunization during their first year. The Comprehensive Public Health Management (CPHM) application was launched by a non-governmental organization in Bengaluru; using it, data could be entered in an offline mode and synchronized with the cloud later. Although it was easy to retrieve data from the field, there were many barriers, such as poor internet connectivity, lack of technical support and, importantly, the need for health workers to visit the PHC to synchronize the data [14].
A study was conducted at the Aga Khan University Hospital vaccination centre in Pakistan to evaluate whether an Artificial Intelligence based mobile app can improve children's on-time visits at 10 and 14 weeks of age. The study revealed that caregivers suggested that the mobile app should have information regarding the doses and that they
were interested in monitoring their children’s health progress through the app [15].
A vaccination App VAccApp [16] was developed by the Vienna Vaccine Safety
Initiative which enabled parents to keep track of vaccinations for their children, to
check the status of the vaccination and study previous vaccination history. In a study
conducted in rural Sichuan Province, China with 32 village doctors [17], showed
that village doctors found it more convenient to use the EPI App as it saved time by
looking up information of caregivers and contacting them for overdue vaccinations
at time.

3 Proposed Work

The proposed system is a combination of NRP-App, an Android mobile phone-based application, along with a data warehouse, namely NRP-DW, and a web portal, namely NRP-Portal. NRP-App has been developed for the usage of the ANMs, who collect
and track the immunization data in the field level as front-line health workers. It
provides various functions that simplifies the work of ANMs without any manual
data entry and with easy-to-use multi-language user interface. Figure 1 shows the
block diagram of the working of the proposed NRP-App.
The NRP-Portal is designed to help the medical officials and planning managers
take different reports of the immunization data and visualize them with different

Fig. 1 Proposed system of NRP-APP



perspectives. The portal is useful to provide effective real-time monitoring of the vaccination services provided at the field level. The data warehouse NRP-DW, consisting of the details of children and their immunization, is constructed in an HDFS cluster. A columnar NoSQL model has been used in designing the NRP-DW due to the huge volume of structured and semi-structured immunization data. The NRP-App
uses a lightweight relational model based local database. It is synchronized with
NRP-DW before and after every RI session to retrieve the history data in the local
database as well as to update the vaccination data in the warehouse. The users of the
NRP-Portal can retrieve various reports of the immunization data and visualize it in
different perspectives. Various functionalities and features of the proposed work are
detailed in the subsequent sub-sections.

3.1 Data Storage Repositories

NRP-App uses SQLite [18] as the local database, which is a structured database embedded
in the App itself and stores data in the device's local storage. As the SQLite database
resides in the device itself, internet connectivity is not required to access the database
and the data remains under the security of the Android device. Figure 2 shows the
schema of the local database. It consists of 9 normalized relational tables pertaining to
the details of children, mothers, facilities, ANMs, vaccines and immunization. The
data in the tables NRP_child_immunization and NRP_vaccine_barcode is updated
during the RI session. The data in the rest of the tables is used in the RI session to
authenticate the children, verify the vaccines due, etc.

Fig. 2 Schema of NRP-APP's local database

Fig. 3 Schema of data warehouse NRP-DW
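As a concrete illustration of the embedded local store just described, the following is a minimal Android sketch of a SQLiteOpenHelper for one of the local tables; the column names and data types shown are illustrative assumptions and not the actual NRP-App schema.

    import android.content.Context;
    import android.database.sqlite.SQLiteDatabase;
    import android.database.sqlite.SQLiteOpenHelper;

    // Minimal sketch of the App's embedded local store; schema details are assumed.
    public class NrpLocalDbHelper extends SQLiteOpenHelper {

        public NrpLocalDbHelper(Context context) {
            // The database file lives in the device's local storage, so no connectivity is needed.
            super(context, "nrp_local.db", null, 1);
        }

        @Override
        public void onCreate(SQLiteDatabase db) {
            // One of the nine relational tables; the columns shown are illustrative only.
            db.execSQL("CREATE TABLE NRP_child_immunization ("
                    + "child_id TEXT NOT NULL, "
                    + "vaccine_code TEXT NOT NULL, "
                    + "date_of_vaccination TEXT, "
                    + "weight_at_vaccination REAL, "
                    + "anm_id TEXT, "
                    + "facility_id TEXT, "
                    + "PRIMARY KEY (child_id, vaccine_code))");
        }

        @Override
        public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
            // A real app would migrate existing data; this sketch simply recreates the table.
            db.execSQL("DROP TABLE IF EXISTS NRP_child_immunization");
            onCreate(db);
        }
    }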
The immunization data warehouse NRP-DW is designed as a NoSQL columnar
model database using Cassandra. Apache Cassandra is an open-source NoSQL
distributed database offering scalability and high availability without compromising
performance [19]. Linear scalability and proven fault-tolerance on commodity hard-
ware make it a perfect platform for mission-critical data. It is deployed in an HDFS
cluster of three nodes, where one node is the server and the remaining two are
data nodes. Figure 3 shows the schema of NRP-DW. It consists of four highly
denormalized tables holding the details of children, locations or facilities, ANMs
and vaccine barcodes. In the table child_master, the columns v1–v21, representing 21
vaccines, are of the user-defined data type vaccine with four fields: date_of_vaccination,
weight_at_vaccination, anm_id_at_vaccination and facility_id.
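A hedged sketch of how the user-defined type and part of the child_master table could be declared with the DataStax Java driver is given below; the keyspace name ("nrp"), the CQL data types and the truncation to two vaccine columns are assumptions for illustration, not the actual NRP-DW definitions.

    import com.datastax.oss.driver.api.core.CqlSession;

    public class NrpDwSchemaSketch {
        public static void main(String[] args) {
            // Connects to the Cassandra cluster backing NRP-DW; the keyspace name is assumed.
            try (CqlSession session = CqlSession.builder().withKeyspace("nrp").build()) {
                // User-defined type with the four fields described in the text (CQL types assumed).
                session.execute("CREATE TYPE IF NOT EXISTS vaccine ("
                        + "date_of_vaccination date, "
                        + "weight_at_vaccination float, "
                        + "anm_id_at_vaccination text, "
                        + "facility_id text)");
                // Highly denormalized child table; only two of the v1-v21 vaccine columns are shown.
                session.execute("CREATE TABLE IF NOT EXISTS child_master ("
                        + "child_id text PRIMARY KEY, "
                        + "mother_id text, "
                        + "date_of_birth date, "
                        + "v1 frozen<vaccine>, "
                        + "v2 frozen<vaccine>)");
            }
        }
    }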

3.1.1 Data Synchronization

On-time data synchronization plays a vital role in real-time tracking and provisioning
of the services. Delayed uploading of data has always remained an issue, which further
delays the generation of work-plans for the ANMs. NRP-App supports real-time data
synchronization. After the RI session gets over, the ANM can synchronize the updated
records in the device's local storage with the NRP-DW in the server using the option
provided in the App.
The local database in the NRP-App and the NRP data warehouse communicate with
each other via a layer of Spring Boot [20] APIs running on the server. The Spring
Boot API is configured with the NRP-DW's keyspace running on the cluster. Spring
Boot provides the libraries using which CRUD operations can be performed on the
configured database. The API consists of various GET and POST methods, with different
API addresses, each having a different purpose. To make API calls from the NRP-
App, the Volley library [21] has been used. Volley is an Android-based library which
is used to make HTTP requests. The benefit of using Volley over a simple HTTP request is
that it uses a cache to store the response of the HTTP request, so that when the same
request is made again, it fetches the response from the cache without delay. With the
API libraries from Volley, the data from NRP-DW is accessed and stored in the local
database in the App, and vice versa, the NRP-DW is updated with RI session data from the
local database.
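A hedged sketch of one such synchronization call from the App is shown below; the endpoint path, server address and JSON handling are illustrative assumptions and not the authors' actual API.

    import android.content.Context;
    import com.android.volley.Request;
    import com.android.volley.RequestQueue;
    import com.android.volley.toolbox.JsonObjectRequest;
    import com.android.volley.toolbox.Volley;

    // Fetches a child's immunization history from the Spring Boot layer in front of NRP-DW.
    public class SyncClient {

        public void fetchChildHistory(Context context, String childId) {
            RequestQueue queue = Volley.newRequestQueue(context);
            // Hypothetical endpoint exposed by the Spring Boot API; the real address differs.
            String url = "http://nrp-server:8080/api/children/" + childId;

            JsonObjectRequest request = new JsonObjectRequest(
                    Request.Method.GET, url, null,
                    response -> {
                        // Insert or refresh the child's record in the SQLite local database here.
                    },
                    error -> {
                        // Retry later or flag the record as pending synchronization.
                    });

            // Volley caches the HTTP response, so a repeated request is served without delay.
            queue.add(request);
        }
    }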

3.2 Natural Language User Interface

Language should not be a barrier for the front-line health workers to operate the app.
To eradicate this barrier, NRP-App supports multiple natural languages in the user
interface. Users can choose any language of their preference either in the starting
screen of the App or in the navigating screens by simply going to the menu bar.
Currently, it supports three languages (English, Hindi and Tamil), but any natural
language can be supported. Inbuilt XML strings have been used in the App to
represent a word or a character in a language. Google Script API is used to translate
a word from English to another language. To extend the support for more languages
in the App, a few parameter values need to be changed in the Google Script API for
dynamic translation of English words. The XML strings to represent the words in
the additional language need to be added to the App for static translation.
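A minimal sketch of how the App might switch the resource locale at runtime is given below; the string resource name used at the end is a hypothetical example, and the actual NRP-App implementation may differ.

    import android.content.Context;
    import android.content.res.Configuration;
    import java.util.Locale;

    public class LanguageSwitcher {

        // Returns a context whose string resources resolve in the chosen language,
        // e.g. "ta" for Tamil, "hi" for Hindi, "en" for English.
        public static Context localizedContext(Context base, String languageCode) {
            Locale locale = new Locale(languageCode);
            Locale.setDefault(locale);
            Configuration config = new Configuration(base.getResources().getConfiguration());
            config.setLocale(locale);
            return base.createConfigurationContext(config);
        }
    }

With this, a call such as localizedContext(this, "ta").getString(R.string.vaccine_due), where vaccine_due is a hypothetical XML string resource, would render the label in Tamil, provided the corresponding translated strings have been added to the App.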

3.3 ANM Work-Plan and Vaccine Due List Generation

NRP-App provides options to generate the day or week wise work-plan of an ANM
and vaccine due for children under an ANM or a facility. Figure 4 shows NRP-
App home screen containing various options. In day-wise work-plan, vaccines are
grouped and planned to be given on a specific day. For instance, DPT, the second dose
of OPV and Hepatitis B, and Pentavalent 1 are planned for Monday; the second dose
of Pentavalent, DPT 3 and the third dose of Hepatitis are planned for Wednesday; and so
on. This way it will be convenient for the ANMs to see their day-wise work-plan.
The weekly work-plan shows the list of all the beneficiaries who need to receive
vaccines in the current week. Similarly, the vaccine due list can also be generated
date wise or vaccine wise. With the date wise due list, an ANM can know the details
of the children who need to take vaccine but did not receive it till the current date.

Fig. 4 NRP-App options

The vaccine-wise due list shows how many children did not receive each vaccine
in a particular facility. In summary, the work-plan lists the children who have not
received vaccines on the due date and the beneficiaries who need to receive vaccines
in the current week, while the due list includes only the beneficiaries who have not
received vaccines on their due dates till the current date. Tables 1 and 2 show a sample
work-plan as well as a due list of a particular ANM extracted from NRP-APP.
In order to track long pending unimmunized children, certain color coding is
used in work-plan and due list. The work-plan records highlighted with green color
denote the beneficiaries who need to receive vaccines in the current week, while
green highlighted due list records indicate the beneficiaries who did not receive
vaccination due for them, in the last one week. Yellow colored records indicate that
the beneficiaries who are pending without taking the required vaccination, for more
than a week. The records in red colour show the beneficiaries who are pending for
a particular vaccine for more than 2 months. Figure 5 shows the color coding in
the day-wise work-plan. Lack of knowledge is the most prominent among the reasons
behind non- or partial immunization of children in India [22]. With these
colored listings, ANM can easily identify the beneficiaries who require immediate
attention and can contact them to get vaccinated on a scheduled routine immunization
session. Thus, the work-plan and due list are useful in improving the immunization
coverage.
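A simple way to implement this color coding is to classify each record by how long its vaccine has been pending; the sketch below assumes day thresholds of 7 days and roughly 60 days for the one-week and two-month boundaries described above, and uses the java.time classes available on recent Android versions.

    import android.graphics.Color;
    import java.time.LocalDate;
    import java.time.temporal.ChronoUnit;

    public class DueListColours {

        // Maps a pending vaccine's due date to the highlight colour used in the work-plan and due list.
        public static int colourFor(LocalDate dueDate, LocalDate today) {
            long daysPending = ChronoUnit.DAYS.between(dueDate, today);
            if (daysPending > 60) {
                return Color.RED;        // pending for more than about two months
            } else if (daysPending > 7) {
                return Color.YELLOW;     // pending for more than a week
            } else {
                return Color.GREEN;      // due or pending within the last week
            }
        }
    }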

Table 1 Work-plan report between August 08, 2021 and August 15, 2021
Manoj Sharma
DOB: 06-Jun-2020
Child Id: 777709256620
Mother Id: 222220742091
Last Visit Date: 10-Aug-2020
Baby of Vijiya K
DOB: 05-Jul-2021
Child Id: 888881002105
Mother Id: 111110013086
Last Visit Date: 05-Jul-2021
Baby of Saru R
DOB: 03-Jul-2021
Child Id: 888881002110
Mother Id: 111110013286
Last Visit Date: 04-Jul-2021
Baby of Kajol
DOB: 20-June-2021
Child Id: 888881002124
Mother Id: 111110013692
Last Visit Date: 21-Jun-2021
Baby of Janaki
DOB: 15-Dec-2020
Child Id: 888881002123
Mother Id: 111110013592
Last Visit Date: 14-Jan-2021

3.4 Routine Immunization Sessions

During an immunization session, the tiresome data entry is completely removed to
improve the data quality. According to the work-plan generated in the App, ANMs
administer the vaccines in the respective immunization sessions. Figure 6 shows the
steps involved in recording the immunization data during the RI session. The details
of the children are stored in an optical QR or barcode printed on the RCH card or
immunization card [23]. It can be scanned using an inbuilt QR/barcode scanner in
the App [24–26]. Quick Response (QR) codes or 2D barcodes have fast readability
and higher storage capacity (up to 400 times more) as compared to standard UPC
barcodes [27]. They can be easily decoded using applications on smartphones. Hence,
by simply scanning the card, the child can be authenticated.
In our system, the ANM scans the RCH card carried by mother or caretaker of
the child using the App’s integrated scanner to authenticate the child and verify the
vaccine due. Then, she can proceed with administering the vaccine and weighing the child. A
Bluetooth enabled weighing scale is used so that the weight value is communicated

Table 2 Due list report as on August 07, 2021

Baby of Prem Lata
DOB: 08-Jun-2021
Child Id: 888881002128
Mother Id: 111110013769
Last Visit Date: 20-Jul-2021
Karuna
DOB: 30-Dec-2020
Child Id: 888881002106
Mother Id: 111110013168
Last Visit Date: 29-Mar-2021
Baby of Saru R
DOB: 03-Jul-2021
Child Id: 888881002110
Mother Id: 111110013286
Last Visit Date: 04-Jul-2021
Chitra
DOB: 18-Nov-2020
Child Id: 888881002113
Mother Id: 111110013439
Last Visit Date: 27-Feb-2021
Baby of Madheshwari
DOB: 27-Aug-2020
Child Id: 888881002115
Mother Id: 111110013473
Last Visit Date: 20-Nov-2020

in a wireless manner. Android phones support various APIs useful for establishing a
connection between Bluetooth-capable devices, with which the Bluetooth weighing scale
can be paired with the mobile phone and the reading obtained in the Android
app. Our Bluetooth weighing scale uses a half-inch seven-segment LED display (6
digits) working on classic Bluetooth technology. As a next step, the ANM scans the
barcode on the vaccine vials to get the vaccine name, dosage information and the batch
details into the App's local database. The data captured in this way will be accurate
without manual intervention. The RI session doesn't require any manual entry by
the health worker with our proposed system, and the data entry process is seamless
without any manual errors. Figure 6 shows the working of routine immunization in
the NRP-App.
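The weight reading could be captured over a classic Bluetooth serial connection roughly as sketched below; the scale's MAC address and the line-oriented text format of its readings are assumptions about the device, not documented behaviour.

    import android.bluetooth.BluetoothAdapter;
    import android.bluetooth.BluetoothDevice;
    import android.bluetooth.BluetoothSocket;
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.UUID;

    public class ScaleReader {

        // Well-known UUID for the Bluetooth Serial Port Profile (SPP).
        private static final UUID SPP_UUID =
                UUID.fromString("00001101-0000-1000-8000-00805F9B34FB");

        // Requires the Bluetooth permissions and a scale already paired in Android settings.
        public String readWeight(String scaleMacAddress) throws IOException {
            BluetoothAdapter adapter = BluetoothAdapter.getDefaultAdapter();
            BluetoothDevice scale = adapter.getRemoteDevice(scaleMacAddress);
            try (BluetoothSocket socket = scale.createRfcommSocketToServiceRecord(SPP_UUID)) {
                socket.connect();
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(socket.getInputStream()));
                // Assumes the scale transmits one text line per reading, e.g. "4.35" (kg).
                return in.readLine();
            }
        }
    }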

Fig. 5 Color coding for day-wise work-plan

Fig. 6 Workflow of a routine immunization session

3.5 NRP-Portal and Data Visualization

NRP-Portal has been designed to get reports on an ANM's work-plan and due list based
on filters like district, block, PHC and sub-centre. NRP-Portal communicates with
the NRP-DW via a layer of Spring Boot APIs. As the data warehouse is used by both
NRP-App and NRP-Portal, APIs with different API addresses allow the user to
query and update the data in the NRP-DW.

Fig. 7 Vaccine-wise vaccination due status of children
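One endpoint of this Spring Boot layer might look roughly like the sketch below; the path, the DueRecord type and the DueListService bean are hypothetical names used only for illustration and are not the authors' actual API.

    import java.util.List;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RestController;

    // Placeholder types assumed only for this sketch.
    record DueRecord(String childId, String vaccineCode, String dueDate) {}
    interface DueListService { List<DueRecord> findPending(String facilityId); }

    @RestController
    @RequestMapping("/api")
    public class DueListController {

        private final DueListService dueListService; // would query the Cassandra-backed NRP-DW

        public DueListController(DueListService dueListService) {
            this.dueListService = dueListService;
        }

        // Returns the vaccine due list for one facility, consumed by both NRP-Portal and NRP-App.
        @GetMapping("/duelist/{facilityId}")
        public List<DueRecord> dueList(@PathVariable String facilityId) {
            return dueListService.findPending(facilityId);
        }
    }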
Data visualization using graphs helps the human mind to comprehend the data
and identify trends, patterns and outliers within large data sets. It plays a vital role
in improving immunization coverage by allowing the authorities to take better deci-
sions. A dynamic dashboard with various filters has been designed using the Metabase
API to visualize the vaccination trend in a facility. Metabase [28] is an open-source
business intelligence tool useful for asking questions about data and displaying the
answers in visual formats such as tables and graphs. It provides graphs showing
the children pending for vaccination for each vaccine or across different facilities like
state, district, PHC, etc. Optionally date filters can be applied. Figures 7 and 8 depict
bar graphs and pie charts showing vaccine-wise number of children pending for
vaccination and pending vaccination status across different states of India. Immu-
nization data pertaining to states such as Tamil Nadu and Himachal Pradesh has
been considered. With color coding of the bar chart, the alarming cases pending for
a long time can be easily identified and appropriate action can be taken for improved
coverage.

4 Experimentation

The HDFS based cluster with a name node and two data nodes has been set up.
The name node is an Intel Xeon server with a 3.3 GHz CPU, 32 GB RAM, 4 cores and 2 TB storage.
The data nodes are Intel i7 4-core workstations with 16 GB RAM and 1 TB HDD.
Fig. 8 State-wise vaccination due status of children

Fig. 9 Number of users versus throughput

Fig. 10 Number of users versus response time in seconds

Our own dataset has been generated for the NRP data warehouse, comprising
immunization data for sample states of India such as Tamil Nadu and Himachal
Pradesh according to the schema shown in Fig. 3. The average population of the Tamil
Nadu state was 7.7 crore in 2020 and around 30% of them are children. The number
of children in a state could thus be approximately 2.3 crore. The Cassandra based data
warehouse is expected to withstand the load of the children in a state. The details of children
along with other immunization details amount to 3.27 crore records. Hence, the
data warehouse is loaded with 3.27 crore records with a storage size of 13 GB.
The data is generated using a tool called DbGen [29] and using MS Excel. DbGen is
a Windows based tool that can be configured based on the schema to generate data.
The proposed NRP-App is capable of running on all Android devices (mobiles and
tablets) above Android version 6.
Two experiments have been conducted and their objectives are given below:
• Load testing of NRP-APP to test its performance under high loads of the local
SQLite database.
• Stress testing to check the performance of the App while synchronizing the local
database and the data warehouse.
NRP-App has been tested using a tool called JMeter [30]. Apache’s JMeter is
an open-source test tool that is used to analyze and measure the performance of
applications.

4.1 Load Testing in Terms of Database Size

For load testing of the App, the number of children under an ANM has been varied
between 50 and 200 and the minimum throughput and maximum response time have
been calculated by varying SQLite DB size from 0.2 to 0.8 MB. The testing results
are tabulated in Table 3. It is observed that for a fourfold increase in DB size, response
time increases by 40% and throughput decreases by only 26%.

Table 3 NRP-App load testing results


S. No   Number of children   SQLite DB size in MB   Throughput   Response time in seconds
1 50 0.2 0.30 3.28
2 100 0.4 0.29 3.47
3 150 0.6 0.25 3.99
4 200 0.8 0.22 4.59

4.2 Stress Testing of Data Synchronization in Terms of Number of Users

As per the Rural Health Statistics Bulletin of 2014, there are 8682 sub-centers
which provide health services to the rural population of Tamil Nadu. Hence, while
performing stress testing of the data synchronization between the SQLite database
and NRP-DW, the number of users accessing the data warehouse has been varied
from 1000 to 10,000. The data warehouse size is kept constant at 13 GB. The perfor-
mance parameters such as response time and throughput values, while performing
the stress testing, are shown as graphs in Figs. 9 and 10. It is observed that for a tenfold
increase in the number of users, the response time is only 1.5 min, along with a two-fold
decrease in throughput.

5 Conclusion

With the proposed robust system for seamless data capturing and visualization for
RI sessions, there is significant scope for improved coverage of immunization, while
improving data quality by reducing manual data entry errors. The multi-fold advan-
tages of the proposed system include: authentication of children and vaccines by
scanning the RCH card and vaccine barcode, accurate acquisition of baby weight, a
user interface supporting any natural language, real-time data synchronization and
data visualization using a dynamic dashboard. Color-coded work-plans and due lists
help front-line workers to easily identify long-pending beneficiaries. A data storage
warehouse is designed using a modern big-data storage and processing framework
and it provides increased performance. The proposed system has been load tested
with 0.8 MB of local database size and stress tested with 10,000 users for data
synchronization. The performance of the App is found to be satisfactory.
Though the existing similar application, ANMOL, provides an immense set of func-
tionalities and features, NRP-APP is an attempt to provide important additional
features like a user interface in any natural language, no manual data entry, color-
coded work-plan and vaccine due lists and visualization of RI session data through a
portal. These features help the ANMs to simplify their work while improving vacci-
nation coverage. Biometric-based registration and authentication of children, which
would further improve ease of use by the ANMs and thus immunization coverage, is
ongoing work.

Acknowledgements This work was supported by Grand Challenges India (GCI) for Immuniza-
tion Data: Innovating for Action (IDIA) funded by BIRAC and jointly funded by Department of
Biotechnology, Government of India and Bill & Melinda Gates foundation.

References

1. Gera R, Muthusamy N, Bahulekar A et al (2015) An in-depth assessment of India’s mother and


child tracking system (MCTS) in Rajasthan and Uttar Pradesh. BMC Health Serv Res 15:315
2. Bhadoria AS, Mishra S, Singh M et al (2019) National immunization programme—mission
Indradhanush programme: newer approaches and interventions. Indian J Pediatr 86:633–638
3. Negandhi P, Chauhan M, Das AM, Sharma J, Neogi S, Sethy G (2016) Computer tablet-based
health technology for strengthening maternal and child tracking in Bihar. Indian J Public Health
60(4):329
4. Das R, Mondal S, Mukherjee N (2018) MoRe-care: mobile-assisted remote healthcare
service delivery. In: 10th international conference on communication systems & networks
(COMSNETS), pp 677–681
5. Siddiq DA, Abdullah S, Dharma VK, Shah MT, Akhter MA, Habib A et al (2021) Using a
low-cost, real-time electronic immunization registry in Pakistan to demonstrate utility of data
for immunization programs and evidence-based decision making to achieve SDG-3: insights
from analysis of big data on vaccines. Int J Med Inform 149:104413
6. Atkinson KM, Westeinde J, Ducharme R, Wilson SE, Deeks SL, Crowcroft N et al (2016) Can
mobile technologies improve on-time vaccination? A study piloting maternal use of Immu-
nizeCA, a Pan-Canadian immunization app. Human Vaccines Immunother 12(10):2654–2661
7. Chua JE, Zaldua JA, Sevilla TJ, Tapel MJ, Orlino MR, Camilo RD, Manuela LRC (2014)
An android phone application for a health monitoring system with integrated medical devices
and localized health information and database for healthy lifestyle changes. In: 2014 interna-
tional conference on humanoid, nanotechnology, information technology, communication and
control, environment and management (HNICEM). IEEE, pp 1–6
8. Ridad G, Esporsado GJ, Garangan A, Escabarte AB, Usman OK (2017) Acceptability testing
of a mobile application to improve immunization status monitoring and compliance in selected
barangay health centers in Iligan city. Int J Trend Res Dev (IJTRD) 4(5):16–19
9. Kim SS, Patel M, Hinman A (2017) Use of m-Health in polio eradication and other
immunization activities in developing countries. Vaccine 35(10):1373–1379
10. Oliver-Williams C, Brown E, Devereux S, Fairhead C, Holeman I (2017) Using mobile phones
to improve vaccination uptake in 21 low-and middle-income countries: systematic review. JMIR
mHealth uHealth 5(10):e7792
11. Dumit EM, Novillo-Ortiz D, Contreras M, Velandia M, Danovaro-Holliday MC (2018) The use
of eHealth with immunizations: an overview of systematic reviews. Vaccine 36(52):7923–7928
12. Zaidi S, Shaikh SA, Sayani S, Kazi AM, Khoja A, Hussain SS, Najmi R (2020) Operability,
acceptability, and usefulness of a mobile APP to track routine immunization performance in
rural Pakistan: interview study among vaccinators and key informants. JMIR mHealth uHealth
8(2):e16081
13. Katib A, Rao D, Rao P, Williams K, Grant J (2015) A prototype of a novel cell phone applica-
tion for tracking the vaccination coverage of children in rural communities. Comput Methods
Programs Biomed 122(2):215–228
14. Shilpa DM, Naik PR, Shewade HD, Sudarshan H (2020) Assessing the implementation of a
mobile App-based electronic health record: A mixed-method study from South India. J Educ
Health Promot 9
15. Kazi AM, Qazi SA, Khawaja S, Ahsan N, Ahmed RM, Sameen F et al (2020) An artificial
intelligence–based, personalized smartphone app to improve childhood immunization coverage
and timelines among children in Pakistan: protocol for a randomized controlled trial. JMIR
Res Protoc 9(12):e22996
16. Seeber L, Conrad T, Hoppe C, Obermeier P, Chen X, Karsch K et al (2017) Educating parents
about the vaccination status of their children: a user-centered mobile application. Prev Med
Rep 5:241–250
17. Chen L, Du X, Zhang L et al (2016) Effectiveness of a smartphone app on improving immu-
nization of children in rural Sichuan Province, China: a cluster randomized controlled trial.
BMC Public Health 16:909

18. SQLiteDatabase in Android (2021) https://developer.android.com/reference/android/database/sqlite/SQLiteDatabase
19. Carpenter J, Hewitt E (2020) Cassandra: the definitive guide, 3rd edn. O’Reilly Media, Inc.
20. Spring-Boot (2021) https://spring.io/projects/spring-boot
21. Android Volley Library (2021) https://developer.android.com/training/volley
22. Natu SA, Mhatre S, Shanbhag R, Captain M, Kulkarni K (2020) Immunization status of children
admitted to a tertiary hospital in India. Int J Contemp Pediatr 7(8):1686
23. Brown DW (2012) Commentary: child immunization cards: essential yet underutilized
in national immunization programmes. Open Vaccine J 5(1)
24. Wilson K, Atkinson KM, Westeinde J (2015) Apps for immunization: Leveraging mobile
devices to place the individual at the center of care. Human Vaccines Immunother 11(10):2395–
2399
25. Hasan S, Yousuf MM, Farooq M, Marwah N, Ashraf Andrabi SA, Kumar H (2021) e-Vaccine:
an immunization app. In: 2021 2nd international conference on intelligent engineering and
management (ICIEM), pp 605–610
26. Atkinson KM, Westeinde J, Ducharme R, Wilson SE, Deeks SL, Crowcroft N, Hawken
S, Wilson K (2016) Can mobile technologies improve on-time vaccination? A study
piloting maternal use of ImmunizeCA, a Pan-Canadian immunization app. Human Vaccines
Immunotherapeutics 12(10):2654–2661
27. Wilson K, Atkinson KM, Deeks SL, Crowcroft NS (2016) Improving vaccine registries through
mobile technologies: a vision for mobile enhanced immunization information systems. J Am
Med Inform Assoc 23(1):207–211
28. Metabase (2021) https://www.metabase.com/docs/latest/users-guide/01-what-is-metabase.
html
29. DbGen Database Management Tool (2021) https://www.bcdsoftware.com/iseries400solutions/
dbgen/
30. Apache Jmeter (2021) https://jmeter.apache.org
Methodologies to Ensure Security
and Privacy of an Enterprise Healthcare
Data Warehouse

Joseph George and M. K. Jeyakumar

Abstract The power of data lies in the insights derived from it. We trace this journey
as Data-Information-Knowledge-Wisdom and then come to the Insight (Rowley in
J Inf Sci 33:163–180, 2007). A data warehouse is meant for storing and processing
enormous amounts of data, gathered and transformed from various data sources
(Yessad and Labiod in 2016 International conference on system reliability and
science, ICSRS 2016—proceedings, pp 95–99, 2017). Data security is a major
concern in the data warehouse domain, along with the privacy and confidentiality
factors. This paper discusses the various measures and actions to be taken care of,
to protect the data in a data warehouse. We are considering the healthcare industry
use cases in this study, and the proposed measures are discussed in the context of
healthcare data. The healthcare industry is governed by various strict guidelines and
regulatory requirements in the aspects of data storage, processing and transferring.
Our proposed methods are concentrated on the areas of privacy and confidentiality of
healthcare data warehouse and consist of de-identification and user privilege-based
access controls.

Keywords Data mart · Data warehouse · Healthcare data warehouse · Business
intelligence · Data warehouse security · Information security

J. George (B)
Department of Computer Science and Engineering, Noorul Islam Centre for Higher Education
Kumaracoil, Tamilnadu, India
M. K. Jeyakumar
Department of Computer Applications, Noorul Islam Centre for Higher Education Kumaracoil,
Tamilnadu, India

1 Introduction

Information security is a serious concern for all organizations and industries across
the globe. The awareness and precautions taken with respect to information security
have increased over the past years. Healthcare is one of the industries where security
and privacy are key concerns. The healthcare industry is monitored and regulated
by various national and international standards and rules. Privacy, confidentiality
and security of Protected Health Information (PHI) are among the crucial aspects of
healthcare regulations. When it comes to the data warehousing spectrum, one of the
arguments that always opposes data aggregation is the "privacy" issue [3].
Data warehouses themselves present an irony in terms of security. Data warehouses, by
virtue of having the consolidated data view, enable a holistic view of the subject of
examination [4], i.e., in a healthcare environment, the patient's treatment and history.
At the same time, since they store all the PHI and low-level data, they need to be
handled with care and protected seriously [5].
Information disclosure, unauthorized access or a data breach of healthcare data
has an unparalleled impact, and it is irreversible. In a healthcare data repository, even
the most confidential information, such as psychiatric history, HIV status, etc., could
be present. Any medical information exposure can lead to irreconcilable privacy
breaches [6].
Studies reveal that information and security threats are more from within the orga-
nization than from outsiders or intruders. One of the main reasons for security threats
is lack of awareness. By improving people’s awareness on information security, to
some extent, this can be mitigated. There is enormous research done on network and
data security and every organization is taking all possible measures to ensure the
security of the network and resources [7].
In this paper, we are concentrating on the measures and mechanisms to ensure
privacy and confidentiality in a healthcare data warehouse environment.

2 Healthcare Security Concepts

Healthcare information protection concentrates on the three areas below:


• Privacy—Every patient has the right to control the disclosure of his/her PHI [8].
• Confidentiality—All information related to a patient shall be viewed and used
by only those who are required to see it, and within the boundary of the specific
treatment scope [9].
Increasing business competition leads to the tendency to utilize the data
beyond the confidentiality boundaries. This has been discussed for quite some time.
Confidentiality breaches come from three possibilities:
• From within the care provider organization: This could be because of accidental
disclosure or because of someone accessing the non-intended information out of
curiosity. The most dangerous problems in this category happen when an autho-
rized person reveals the confidential information to an outsider, for monetization
or for any other purposes.
• From the secondary usage of the intended information.
• From outside the organization through intrusion.

• Security—All the measures and precautions taken to safeguard the data and
network.
International privacy laws are formulated based on the risk versus benefit view-
point and data access shall be allowed only if there exists a substantial benefit for
the patient. HIPAA (Health Insurance Portability and Accountability Act) of the
United Sates came into picture to simplify the health information exchange between
healthcare organizations and to protect the rights of the patients [10].
PIPEDA (Personal Information Protection and Electronic Documents Act) of
Canada [11] is somewhat equivalent to HIPAA in the USA. The GDPR (General
Data Protection Regulation), which is the European Union regulation, focuses on the
protection and privacy of data within the region. The major difference between the
GDPR and the HIPAA is the area of concentration. GDPR is for protecting the PII
(Personally Identifiable Information) of EU citizens, while HIPAA concentrates on
the PHI.

3 Methodologies

Let us consider the following scenario for the study. Consider a healthcare orga-
nization with multiple facilities and multiple Hospital Information Systems. A data
warehouse is to be created from these multiple source systems to have a consolidated
view of the clinical and financial information [12]. Figure 1 describes the data flow
from multiple data sources to the data warehouse through a staging area.
The possible data sources are:
• EMR—Various Electronic Medical Records
• LIS—Various Laboratory Information Systems
• RIS—Various Radiology Information System
• CRM—Customer Relationship Management Systems
• RCM—Revenue Cycle Management Systems/Claim Management Systems
• Various types of regulatory benchmark data and patient surveys.
Now, let us focus on the data warehouse formed after the ETL (Extraction Trans-
formation Loading) process [13]. Multiple groups of people will have access to the
centralized data warehouse, which demands proper data governance and security
mechanisms to safeguard the data from unauthorized access and data breaches.

4 Proposed Privacy/Confidentiality Protection Methods

De-identification—The process deployed to prevent a patient's identity from being
exposed. For example, all the PHI information shall be de-identified, such that with
the available de-identified record, the readers won't be able to identify the owner.

Fig. 1 Healthcare data warehouse data flow—source to visualization

The privacy rules permit the healthcare providers to disclose de-identified data for
secondary usage such as research and data mining. De-identification done on a scien-
tifically acceptable range ensures that re-identification of the person is not possible
using de-identified data [14].
We will be using 3 different approaches for de-identification, namely:
Data jumbling/hiding: All data which can potentially be used to identify an indi-
vidual shall be anonymized. This shall be done at the data source itself before moving
to the staging area for transformation and loading. Carefully designed scripts shall be
deployed for this. This will ensure, only the de-identified data shall leave the source
data location to the staging area. Data warehouse could be in a different geographic
region than the source systems. This will facilitate legitimate data movement with
compliance and privacy.
Data Removal—Removal/nullifying data, which could lead to identification of
individuals/patients.
Data Grouping—Replace absolute data with a data range. For example, if a
patient’s age is 45, this absolute value shall be replaced with a range, in this case,
40–50.
Table 1 shows a snippet of patient data at the source system and at the data
warehouse (Tables 1 and 2).
De-identification is one of the optimal ways of protecting privacy. However, one
should bear in mind that no method is 100% foolproof.
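As a concrete illustration of the three approaches, the Java sketch below applies jumbling, removal and grouping to a record shaped like Table 1; the field keys and the tokenization scheme (a plain hash) are assumptions, and a production system would use a keyed, scientifically validated de-identification method.

    import java.util.HashMap;
    import java.util.Map;

    public class DeIdentifier {

        // Applies jumbling/hiding, removal and grouping before a record leaves the source system.
        public static Map<String, String> deIdentify(Map<String, String> record) {
            Map<String, String> out = new HashMap<>(record);

            // Data jumbling/hiding: replace direct identifiers with meaningless tokens.
            out.put("first_name", token(record.get("first_name")));
            out.put("last_name", token(record.get("last_name")));

            // Data removal: drop fields that could lead to identification of the individual.
            out.remove("surname");

            // Data grouping: replace the absolute age with a ten-year range, e.g. 52 -> "50-60".
            int age = Integer.parseInt(record.get("age"));
            int lower = (age / 10) * 10;
            out.put("age", lower + "-" + (lower + 10));

            return out;
        }

        // Placeholder token generator; a keyed hash or lookup table would be used in practice.
        private static String token(String value) {
            return Integer.toString(Math.abs(value.hashCode()));
        }
    }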

Table 1 Sample data at the source system (only for representation purpose)

First name              ABC
Last name               XYZ
Surname                 PQR
Age                     52
Date of birth           01/01/1968
Sex                     Male
Insurance               Medicare
Primary diagnosis       E08.21
Secondary diagnosis     R50
Presenting complaint    Hyperglycaemia-weakness-dry mouth
Medication              Metformin
HbA1c                   7.5
Billed amount           $55

Table 2 Sample data at the data warehouse (only for representation purpose)

First name              11,111
Last name               22,222
Surname                 –
Age                     50–60
Date of birth           01/01/2001
Sex                     Male
Insurance               Medicare
Primary diagnosis       E08.21
Secondary diagnosis     R50
Presenting complaint    Hyperglycaemia-weakness-dry mouth
Medication              Metformin
HbA1c                   7.5
Billed amount           $55

Role-based access control model—This method is responsible for granting


permissions and access to the groups who are allowed to view the specified data,
and at the same time, denying access to those user groups who don’t have sufficient
rights. Roles represents the JDs (Job Descriptions) irrespective of who perform it.
Roles shall be created with specifically scrutinized optimal privileges, and individual
actors shall be assigned to the roles. All users in a specific role or access group will
have the same access rights. Access roles make security management much
easier than individual assignments [15]. Figure 2 gives a logical representation
of access rights of different roles in a data warehouse.

Fig. 2 Role-based access restriction in a healthcare data warehouse

Table 3 Sample roles and permissions (only for representation purpose)


Role Data objects
Demographics Diagnosis Medications Billing Staff
Admin team X X X X
Physicians X X X
Nurses X X X
Finance X X X
Pharmacy X X X
HR X

Table 3 shows the sample access right matrix of various roles assigned in the
healthcare data warehouse. For demonstration, we are considering the roles of Admin
Team, Physicians, Nurses, Pharmacy Team and Finance.
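A minimal sketch of such a role-to-object permission check is shown below; the role and data-object names follow Table 3, but the exact permission set per role is illustrative, since the flattened table does not preserve which columns carry each mark.

    import java.util.EnumSet;
    import java.util.Set;

    public class RoleBasedAccess {

        enum DataObject { DEMOGRAPHICS, DIAGNOSIS, MEDICATIONS, BILLING, STAFF }

        // Each role carries a scrutinized set of data objects it may access (sets shown are illustrative).
        enum Role {
            PHYSICIAN(EnumSet.of(DataObject.DEMOGRAPHICS, DataObject.DIAGNOSIS, DataObject.MEDICATIONS)),
            NURSE(EnumSet.of(DataObject.DEMOGRAPHICS, DataObject.DIAGNOSIS, DataObject.MEDICATIONS)),
            FINANCE(EnumSet.of(DataObject.DEMOGRAPHICS, DataObject.BILLING, DataObject.STAFF)),
            HR(EnumSet.of(DataObject.STAFF));

            private final Set<DataObject> allowed;

            Role(Set<DataObject> allowed) {
                this.allowed = allowed;
            }

            // Individual users are assigned to a role; access checks are made against the role only.
            boolean canAccess(DataObject object) {
                return allowed.contains(object);
            }
        }

        public static void main(String[] args) {
            System.out.println(Role.HR.canAccess(DataObject.DIAGNOSIS));    // false
            System.out.println(Role.NURSE.canAccess(DataObject.DIAGNOSIS)); // true
        }
    }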

5 Conclusion

Data is the new oil and in today’s world data is the driving force behind every
business. Secured storage of data, irrespective of whether it is online or offline, is
not an optional feature but mandatory in current era. Data is more valuable than any
other assets of any organization. Data warehousing, data lakes and big data concepts
are getting more popular day by day and security threats are also looming large.

A lot of research and many advancements have already taken place in the data security
arena, but data security is still an open-ended question. In this paper, we proposed two
prominent approaches, namely de-identification and role-based security, to ensure
privacy in the healthcare data warehouse. The power of data is enormous, and in the
same way, every individual has the right to protect his or her private information.

References

1. Rowley J (2007) The wisdom hierarchy: representations of the DIKW hierarchy. J Inf Sci
33(2):163–180. https://doi.org/10.1177/0165551506070706
2. Yessad L, Labiod A (2017) Comparative study of data warehouses modeling approaches:
Inmon, Kimball and Data Vault. In: 2016 international conference on system reliability and
science, ICSRS 2016—proceedings, pp 95–99. https://doi.org/10.1109/ICSRS.2016.7815845
3. Khalifa Alhamami H, Kumar Udupi P (2021) Warehouse safety and security. GSJ 8(7).
Accessed 24 Jul 2021. [Online]. Available: www.globalscientificjournal.com
4. Mathur S, Gupta SL, Pahwa P (2020) Enhancing security in banking environment using business
intelligence. Int J Inform Retrieval Res 10(4):21–34. https://doi.org/10.4018/IJIRR.202010
0102
5. Kong G, Xiao Z (2015) Protecting privacy in a clinical data warehouse. Health Inform J
21(2):93–106. https://doi.org/10.1177/1460458213504204
6. Sorathiya H, Patel A, Jain H, Khajanchi A (2017) Security in data warehousing. Int J Eng Dev
Res 5(2). Accessed 24 Jul 2021. [Online]. Available: www.ijedr.org
7. Konda S, More R (2021) Augmenting data warehouse security techniques-a selective survey.
Int Res J Eng Technol. Accessed 24 Jul 2021. [Online]. Available: www.irjet.net
8. George J, Bhila T (2019) Security, confidentiality and privacy in health of healthcare data. Int
J Trend Sci Res Dev 3(4):373–377. https://doi.org/10.31142/ijtsrd23780
9. Abouelmehdi K, Beni-Hessane A, Khaloufi H (2018) Big healthcare data: preserving security
and privacy. J Big Data 5(1). https://doi.org/10.1186/s40537-017-0110-7
10. HIPAA in a Nutshell—RightPatient. https://www.rightpatient.com/blog/hipaa-explained/.
Accessed 24 Jul 2021
11. Personal Information Protection and Electronic Documents Act. https://laws-lois.justice.gc.ca/
ENG/ACTS/P-8.6/page-1.html
12. Kimball R, Caserta J (2004) The data warehouse ETL toolkit
13. Ong TC et al (2017) Dynamic-ETL: a hybrid approach for health data extraction, transformation
and loading. BMC Med Inform Decis Mak 17(1):134. https://doi.org/10.1186/s12911-017-
0532-3
14. Methods for De-identification of PHI|HHS.gov. https://www.hhs.gov/hipaa/for-professionals/
privacy/special-topics/de-identification/index.html. Accessed 27 Jul 2021
15. Wyllie D, Davies J (2015) Role of data warehousing in healthcare epidemiology. J Hosp Infect
89(4):267–270. https://doi.org/10.1016/j.jhin.2015.01.005
Comparative Analysis of Open-Source
Vulnerability Scanners for IoT Devices

Christopher deRito and Sajal Bhatia

Abstract Internet of Things devices are commonly overlooked when it comes to
security. Deployment follows the trend that the devices are powered on and installed,
often without proper configuration or regard for the security they possess. Being
Internet connected, these devices should be held to the security standards that other
systems are held to. Vulnerability scanners are the most effective and least time-
consuming method to determine the vulnerabilities present on a device and provide
insight on steps for mitigation and hardening. However, these scanners do not inher-
ently support the lightweight, low powered, and proprietary nature of IoT devices.
This paper analyzes and compares the use of several well-known and lesser-known
open-source vulnerability scanners used with home IoT devices. The aim is to cover
all aspects of using these programs: the ease of use, support available, effectiveness
of the scanners, direction provided in mitigation, and various operational metrics. In
the end, a comprehensive analysis of each scanner will be provided, discussing the
advantages and disadvantages of each, as well as their best use cases. The intention
of these results is to provide an informative viewpoint on what vulnerability scanner
should be selected for an individual based on a hands-on analysis and comparison.

Keywords Internet of things (IoT) · Vulnerability scanner · Open source

C. deRito · S. Bhatia (B)
School of Computer Science and Engineering, Sacred Heart University, Fairfield, CT 06825, USA
e-mail: bhatias@sacredheart.edu
C. deRito
e-mail: deritoc@mail.sacredheart.edu

1 Introduction

Internet of things devices are becoming more and more common in everyday life.
These "things" often control many daily functions, whether it is apparent or not.
Everything from signage displays in a company building, to security cameras, to
smoke detectors, to even industrial applications: IoT is everywhere. 2018 saw an
estimated 3.96 billion devices, 4.81 billion in 2019, and in 2020 up to 5.81 billion
worldwide [1]. The vast number of IoT devices is hard to comprehend, but they are
in use practically everywhere. Making these devices intrinsically secure should be a
priority of manufacturers, administrators, and even consumers.

1.1 What Is an IoT Device?

IoT stands for Internet of things. “Internet of things (IoT) is a collection of many
interconnected objects, services, humans, and devices that can communicate, share
data, and information to achieve a common goal in different areas and applica-
tions.” [2]. These IoT devices are network connected to share information between
devices or to computer systems. Often IoT devices serve a singular specific purpose.
For example, an IP security camera captures footage and sends it over the network.
In a commercial building, a signage player receives a video feed and outputs it to
a display. IoT devices can be used in a variety of situations such as transportation,
agriculture, health care, and industrial settings [3]. The focus of this paper will be
on smart home devices. Smart home IoT devices are very similar to the other types,
however often catered toward the more consumer market. This means the devices are
designed with ease of use and accessibility as main priorities. Security is sometimes
seen as unimportant by some of the cheaper and less reputable manufacturers but is
still maintained by the more popular ones.

1.2 What Is a Vulnerability Scanner?

Vulnerability scanners are automated tools that scan given networks or systems on the
network and produce a set of scan results. These vulnerabilities can be software bugs,
backdoors, missing patches, misconfiguration, and vulnerable ports and services [4].
Vulnerability scanners are an essential part of any organization’s vulnerability man-
agement program. These scanners by no means are a one-stop solution to finding
vulnerabilities, but they greatly cut down on the amount present and time taken to
find and resolve them. Vulnerability scanners are mainly used by organizations and
companies, as they have large inventory of devices and often hold large amounts of
sensitive data. However, these types of programs can also be used in a home setting
to ensure that a home network and systems are secure. In fact, if fixing vulnerabilities
in a home network is desired, then this is most likely the best route to go. It takes
the need for experience and knowledge of security out of the equation. The scanners
present the results directly to you and often also include clear steps on resolving
each vulnerability. This paper will focus on vulnerability scanners utilized in a home
network, but the concepts are very similar and can translate to a company setting.

1.3 Problem Statement and Contributions

IoT devices are becoming more and more prevalent in daily life; however, the security
of these devices is often overlooked. Vulnerability scanners are a great solution
to automating and ensuring the security of devices. There however aren’t many
options for vulnerability scanners that are specifically meant to work with IoT. Ideally,
there would be vulnerability scanners built for IoT, targeting the vulnerabilities that
affect those devices and the architecture of them. These scanners would decrease the
frequency of attacks on IoT devices, including the ever-common botnets that plague
IoT. Analyzing and testing some of the common open-source vulnerability scanners
can give some insight into how well these scanners work with IoT devices, if at all.
This will also show what is missing from the scanners in terms of IoT support and
how this goal can be achieved. It can also provide insight into the ideal use cases that
each scanner tested would be ideal for.
This paper analyzes and compares the use of several open-source vulnerability
scanners used with home IoT devices. The paper covers all aspects of using these
programs: the ease of use, support available, effectiveness of the scanners, direction
provided in mitigation, and various operational metrics. In the end, a comprehensive
analysis of each scanner will be provided, discussing the advantages and disadvan-
tages of each, as well as their best use cases with the intent to provide an informative
viewpoint on the selection of vulnerability scanner based on a hands-on analysis and
comparison.

2 Related Work

IoT and specifically IoT security is a very popular topic in academic research cur-
rently. Vulnerability scanner research is also fairly abundant; however, the focus is
on vulnerability scanners for the web, which is not a focus of this paper. Even though
the two topics separately are well-covered, there still seems to be little in the way of
vulnerability scanners meant for IoT devices.
Chalvatzis et al. [4] cover a comparison and analysis of vulnerability scanners for
general use with standard systems. This is a similar concept to the research conducted
in this paper, but without the focus on IoT. It goes through three scanners in total:
Nessus, OpenVAS, and Nmap.
In terms of IoT devices, there are a few studies on IoT vulnerability scanning.
However, there is a difference between scanning for vulnerable IoT devices and
scanning IoT devices for vulnerabilities. The papers [5, 6] go into scanning a network
or the Internet in general for vulnerable devices. This is done through applications
such as Shodan or Masscan. This allows you to search the Internet of things for
devices with specific vulnerabilities. For example, search for webcams with default
username “admin” and password “admin.” Shodan and Masscan are very powerful
tools for what they are built for, just not within the scope of this research.

IoT security is a very popular research topic, and there are many papers going
into the specific types of threats that plague IoT and ways to mitigate said threats.
These also go into well-known vulnerabilities and mitigation techniques. Hassija et
al. [7] go into the threats that IoT devices face on the various layers. It also talks
about ways to secure these devices such as blockchain, machine learning, and edge
computing.
Anand et al. [8] go into the specific vulnerabilities of IoT devices across all
functions. It is a thorough review of IoT itself, the security problems that these
devices face, the broad vulnerabilities that are commonly seen, and solutions to
these issues. Similarly, Corp [9] goes into the same attack vectors commonly seen
with IoT, as well as case studies exploring these areas. Many IoT device categories
are explored including drones, IP cameras, smart cars, smart thermostats, etc.
Smart home is a separate category of IoT in and of itself. The key difference here is
that these devices are intended to bring automation and more specifically ease of use
to the consumer market. For example, Corp [9] dives into IoT smart home and city
environments along with the security risks that these devices pose. IoT can be used
in realistically any environment or industry, so focusing on the smart home area will
have significant differences from other areas such as industrial applications.

3 Methodology

This study will dive into the practicality of different open-source vulnerability scan-
ners and how well, if at all, they can be used with IoT devices. Five different vul-
nerability scanners and five different IoT devices have been chosen to carry out the
testing. In total, there will be twenty-five different tests performed, each vulnerability
scanner against each of the five IoT devices.

3.1 Vulnerability Scanners Used

Five different open-source vulnerability scanners are going to be tested. They were
chosen based on a few factors. The first is that they are open source, as closed-source
and paid software will usually have barriers or restrictions on how they can be used
at different pay tiers or settings. The second factor is how well-known they are. Most
of the list comprises well-known scanners or scanners from reputable groups. The last
factor is how well they could potentially be used with IoT devices specifically. The
five scanners chosen are OpenVAS (Greenbone Vulnerability Management), Vuls,
SNOUT, Vulscan, and IoTSeeker.
OpenVAS Originally developed as a completely free open-source project, Open-
VAS is now developed by a company called Greenbone as part of their Greenbone
Security Management (GSM) product [10]. This product is a complete all-in-one vul-
nerability management solution. The source code is still open source; however, they
offer both a free Greenbone Community Feed (GCF) and a subscription Greenbone
Security Feed (GSF). The GSF subscription feed offers more features along with
support and various enterprise applications. The free GCF is completely adequate
for home use and the purposes of this analysis.
OpenVAS was originally based on the Nessus Vulnerability Scanner.
Nessus is a closed-source product that offers very similar features to OpenVAS.
OpenVAS forked from the GPL version of Nessus (version 2) after Nessus went proprietary
in 2005. The plugins for OpenVAS are still written in the Nessus Attack Scripting
Language (NASL) [4].
Vuls Vuls is an open-source agentless vulnerability scanner for Linux/
FreeBSD [9]. The vulnerabilities that it searches for are all based on multiple vulner-
ability databases, which are NVD, OVAL, JVN, RHSA/ALAS/ELSA/FreeBSD-SA.
The scanner can be run anywhere, meaning it can be installed and run on the cloud,
physical hardware, virtual machines, and within Docker. Vuls is also capable of scan-
ning various components of a system. It can scan non-OS packages such as libraries,
frameworks, or code compiled yourself. These however must all be registered in the
Official Common Platform Enumeration Dictionary (CPE). This is simply a naming
scheme for IT systems, software, and packages.
In terms of UI and reporting, Vuls has both a terminal user interface (TUI) and a
web-based graphical user interface (GUI). Both options are very descriptive in the
results they show, providing all sorts of information on the specific vulnerabilities
found. Email and Slack notifications can also be configured.
Snout Snout stands for SDR-based Network Observation Utility Toolkit [11].
This application utilizes software-defined radio (SDR) to be able to interact with the
various non-IP wireless communication protocols, the most popular of which, and the
two that SNOUT supports, are Zigbee and Bluetooth Low Energy. SNOUT is adver-
tised as having a few features: device enumeration, vulnerability assessment,
advanced packet replay, and packet fuzzing. This program is not only a vulnerability
scanner, but also has other features that support functions such as penetration testing.
SDR is a radio frequency communication method that does all of the processing
on the software level. The actual hardware (transmitter, receiver) is used just to
send and receive messages. The messages themselves are created with software.
This function allows SNOUT to communicate on the different wireless
protocols, as they do not use the same communication methods that a Wi-Fi dongle or
physical connection would be capable of. This does mean that this program requires
a supported transceiver to be able to perform this SDR-based communication. The
supported ones are the HackRF One and USRP transceivers.
Vulscan This scanner is an addition to the popular and well-known command line
utility Nmap [12]. Nmap supports custom scripts to allow additional functionality.
Vulscan utilizes the Nmap flag -sV which enables version detection. From here,
the script analyzes the port number, port state, service running, and version of that
service to come up with a prediction of the vulnerabilities on that specific host and
ports.
Results from this scanner can be complicated and hard to interpret. Since Nmap
is a command line utility, the results are presented in this fashion. In turn, the results
are not very intuitive or easy to read. It can take some time to go through the results
and determine for yourself what is of concern and what can be mostly ignored.
From the Vulscan GitHub, “Keep in mind that this kind of derivative vulnerability
scanning heavily relies on the confidence of the version detection of Nmap, the
amount of documented vulnerabilities and the accuracy of pattern matching” [13]
The resulting vulnerabilities presented are very dependent on how accurate Nmap
is of services and versions being detected. This is especially important for IoT use,
as many of the devices meant for the home are using their own proprietary services
or custom embedded operating systems that may not be easily detected by Nmap,
resulting in no results found or incorrect results.
IoTSeeker This isn’t a traditional vulnerability scanner in the sense that it scans
for all vulnerabilities on a system [13]. This scanner scans for a single vulnerability,
but a very relevant and common one. This vulnerability is default credentials. Often,
owners will deploy an IoT device without changing the default logon credentials or
configuring the device at all. This is very common and easy to discover on services
such as Shodan. It will allow any unauthorized person to have full administrative
rights to a device.
IoTSeeker is open source and developed by Rapid7, the creators of Metasploit
and other IT security solutions [13]. It is a less well-known scanner but is developed
by a reputable and well-known company. Although this only scans for a single vul-
nerability, it is a very useful one that the other scanners do not check for. The script
includes a file containing the username and password combinations that it checks
for, which according to the creator is updated often.

3.2 IoT Devices Used

The devices used in this analysis were chosen based on a few reasons. The types of
devices used were those that are commonly seen in the smart home environment.
These devices are a smart personal assistant, security camera, Smart IoT Hub, Zigbee
sensor, and a Raspberry Pi.
Smart personal assistants are seen everywhere and constantly used in the smart
home environment. These are devices that you can verbally ask questions and get
responses. They are based on speech recognition and artificial intelligence to
determine what the user asks and to come up with the appropriate response. As of
2017, an estimated 10% of worldwide consumers own a smart personal assistant [14],
so in 2021 the percentage should be higher. The most popular devices are Amazon
Echo, Google Home, and Apple HomePod. In this test, the Amazon Echo was chosen.
As of 2017, over 50 million Amazon Echos had been sold in the USA alone [14].
Security cameras are one of the most popular IoT devices currently used. Not only
are they used in the home, but also across all industries. In 2017, 98 million network
surveillance cameras and 29 million HD CCTV cameras were distributed [15]. Not
only are these devices used frequently, but often by users who are not aware of the

Table 1 IoT devices used for vulnerability assessment


Device type Device name and manufacturer
Smart personal assistant Amazon Echo Dot
Security camera Wyze Cam v3
Smart IoT Hub Aqara Smart Hub
Zigbee sensor Aqara temperature and humidity sensor
Raspberry Pi Raspberry Pi

security risk they pose [15]. This is the case with most home IoT devices, but IP
cameras in particular.
In the home, IoT devices are often connected to one centralized device so that
they can be better managed and monitored. These devices are called hubs or smart
home hubs. The hub is the heart of the system and controls all elements connected
to it [16]. Hubs are very common in the home and critical to secure. They are a
gateway to all devices connected to it, so if this device gets compromised, all devices
connected are compromised.
The Zigbee device was chosen to analyze the effectiveness and how necessary it
is for a scanner to support non-IP wireless protocols such as Zigbee and Bluetooth
LE. In the tests performed, Snout is the only scanner to contain this feature. Usually,
insight into these devices is limited to the information sent to the gateways (hubs) that
connect the device [11]. Including one of these devices will give insight as to how
crucial and impactful it can be for a scanner to support the other wireless protocols.
The last device being tested is a Raspberry Pi. A Raspberry Pi is a very popular
computer about the size of a credit card that can be fully set up and configured any way
the user wants. They are often used in the home for DIY IoT projects that take advantage
of the modest processing power, small size, and control that these boards offer. They
are perfect for building out custom solutions that may prove to be much cheaper or
provide more control than other off-the-shelf options. The Raspberry Pi has built-in
support for 10/100 Mb/s Ethernet, Wi-Fi, and Bluetooth [16]. Although not as easy
to set up and use as the other smart home devices tested, the Raspberry Pi would be
the kind of device used by individuals who would also deploy a vulnerability scanner
in their home. Table 1 lists the IoT devices used in this study.

3.3 Experimental Setup and Testing Process

Each of the vulnerability scanners was installed into its own virtual machine using
VMware. The VMs were configured with 4 GB of RAM, one processor, a 45 GB hard
drive, and a bridged network adapter, which provides plenty of storage and resources
to run the Linux operating systems and scanners. Bridged networking allows the virtual
machine to act as if it were its own device on the network it is connected to. In this
case, the VM appears as a device on the home network, and the DHCP server in the router
assigns the VM an IP address. This allows the VM to communicate with all other IPs on
the network.
The IoT devices were all configured and set up as if they were being used normally.
This means downloading the required apps and connecting the devices as needed. The
Aqara Zigbee sensor was connected to the Aqara Smart Hub so that it would be actively
transmitting data over the Zigbee protocol. The Raspberry Pi was set up with Raspbian
OS, a version of Debian tailored to the low computing power of the Raspberry Pi.
SSH, Telnet, and Apache were all installed and enabled, as these are services sometimes
used to communicate with IoT devices; they also provide some vulnerabilities to discover.
Once everything was set up and installed, the scans were performed. This was
fairly straightforward at this point, although the procedure for each scanner varied.
Once the scans finished, a few things were evaluated: whether the scan worked, how
long it took, how many vulnerabilities were found, and how usable the scanner was.

4 Results and Discussion

4.1 Ease of Use, Documentation, and Installation

An important aspect of any application or piece of software is how usable it is.
If something is incredibly complex to use and interpret, it is not going to be used
as frequently. Especially as these devices and scanners are being tested in a home
environment, ease of use is an important factor to analyze. Along the same lines is
documentation. Many issues and user errors can be resolved hassle-free with solid
documentation of the product, which should explain all aspects of the package and
how to use it properly. Lastly, the installation process is important in analyzing the
effectiveness of the product. Installing packages on Linux is not always easy and
direct, which means that the installation process can influence the usability of the
product. If the product has a lengthy and problematic installation process, it will
not be as effective as a package that can be installed quickly and effortlessly.
OpenVAS As one of the oldest and most established vulnerability scanners, OpenVAS
comes with the most support and documentation of the five scanners chosen.
There are a lot of resources that exist to help with OpenVAS. The problem is that,
since the original scanner has been around for some time and the project changed
direction into a fully featured vulnerability management solution for businesses,
there is a lot of conflicting or outdated information. OpenVAS no longer exists as its
own standalone software; it is contained within the Greenbone vulnerability
management solution. This just needs to be kept in mind when reading information
that may be outdated.
Installation for OpenVAS can be a struggle depending on the route taken. Greenbone
offers a couple of methods for the free open-source option: GSM Source and GSM
Trial. GSM Source allows the user to download the source code and compile it
directly on their machine. This allows for more control over the installation but is also
much more complicated, not to mention time-consuming. The GSM Trial is a free
version of the professional solution meant to run on laptops or virtual machines.
This solution is incredibly simple to implement, consisting of just an ISO file
that can be installed like any other OS.
All in all, the program is very simple to install and configure if the GSM Trial
route is taken. The whole download, installation, and configuration process only took
around 10 min to complete and be fully operational.
Vuls Documentation for this scanner from the creators was substantial but seemed
to be lacking in community engagement. The Vuls website itself has plenty of
information on getting the application started and operational, but beyond that
there is not much information on other parts of the internet. Very little can be found
on YouTube, although a few guides can be found through Google on various blogs.
Installation was fairly straightforward. Vuls can either be installed manually, which
requires installing each module separately along with all dependencies, or by
using what the creators call Vulsctl. Vulsctl is an easy-to-install Docker image that offers the
advantage of a non-complex installation and quick startup. The basics of Docker need
to be understood to use it at a moderate level, but all of the essentials can be found
in the tutorial that the creators provide.
Snout Out of all the scanners in this list, Snout is in the earliest stage of development.
Being at version 0.0.1, it is at the very first release, meaning that no bugs have been fixed
and that it may not be as refined as it could be. The documentation is almost non-existent.
The only materials that were found were the GitHub page, which gives very
brief installation instructions, a brief showcase video by the creators with some
sample usage and explanations, and the academic paper written for it. Out of all this
material, the only install instructions are from the GitHub page, which contains two
simple Linux commands, whereas in reality the build process was much more involved
once dependency errors and build errors came up.
The installation and build were attempted on various operating systems and versions
to see if the issues would be resolved; they persisted regardless. The
build and installation process was, all in all, fairly frustrating. Although the package
was installed and could be run, in the end a usable installation of Snout was not
achieved. Snout relies upon software-defined radio (SDR) hardware for its communication
with the various wireless protocols on different frequencies. As it turns out, the creators
only built it to support two pieces of SDR hardware, the HackRF and Ettus Research USRP
devices. This was not known beforehand and was only discovered once a working
installation of Snout was achieved and commands were run.
Vulscan Built upon the famous Nmap port scanner, Vulscan was very easy to
install and use. Nmap is installed directly through the package manager ("apt" on a
Debian-based OS). This creates the folder /usr/share/nmap/scripts, where
all Nmap scripts can be copied to. Once this is done, running the scanner is as easy
as running an Nmap scan and specifying the script to use. Documentation for Nmap itself
is very robust. Nmap's popularity means that there are a ton of resources that exist on
how to use it. Most of this can be applied to Vulscan, as the general syntax is the same,
and different troubleshooting solutions can also be applied to problems encountered
with the Vulscan script.
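As a rough illustration of how such a scan can be scripted, the snippet below shells out to Nmap with the Vulscan script from Python; it assumes Nmap is installed, the Vulscan script has been cloned into /usr/share/nmap/scripts/vulscan, and the target IP is only a placeholder for a device on the local network.

```python
import subprocess

# Hypothetical target: an IoT device on the home network.
TARGET = "192.168.1.50"

# Assumes nmap is on PATH and the vulscan NSE script is available under
# /usr/share/nmap/scripts/vulscan/ (the usual location after cloning it there).
result = subprocess.run(
    ["nmap", "-sV", "--script=vulscan/vulscan.nse", TARGET],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)  # per-port service and vulnerability listing
```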
IoTSeeker This is a simple Perl script that is very easy to install and run. It is
only one Perl file; no build or installation needs to be done, and the script just needs to
be downloaded and run. Perl uses what is called the Comprehensive Perl Archive Network
(CPAN), which is the package manager for the language, similar to pip for Python.
The two dependencies for this script are installed using CPAN, which also needs to
be updated beforehand. Once the dependencies are set, the actual script can simply be
cloned from GitHub and run from the command line using Perl.
IoTSeeker does not have tons of documentation; however, it does not need it. The
script is very simple and does not warrant the community and support around it that the
others do. Since it was created and is maintained by Rapid7, there is support in that
sense. Rapid7 is a professional security software provider and has the necessary
support for all of its projects.

4.2 Results and Potential Use Cases

OpenVAS OpenVAS was able to scan almost all of the devices. The Raspberry Pi
had the most robust results, with all levels of vulnerabilities being detected. This is the
closest to a traditional device that would be scanned, so this makes sense. The Wyze
Cam v3 and Aqara Smart Hub also had results, with mostly log-level vulnerabilities
but also a low-level vulnerability. The Amazon Echo was not able to be scanned at
all. This seems to be the most consumer-aimed device, with rather strict security in
place. The scanner could not pull any information from it and did not even report
that the host was scanned at all.
One strength that OpenVAS presents is the amount of information gathered.
Not only does it detect vulnerabilities, but the reports also show various other
information, including the OS that is running, the open ports and their services, and the
applications running. This is all very useful information, especially for IoT devices
that are not monitored as closely as other machines would be. Even if vulnerabilities
are not directly detected and reported by OpenVAS, the user reviewing the results can
see what is going on with that device and whether it lines up with what is desired. The
presence of log-level results can also prove to be useful. Even though they are not
vulnerabilities per se, these results can provide solid insight into how the devices
are operating.
Overall, OpenVAS provided great results that the other scanners could not achieve.
This scanner would prove to be the most useful in situations where large amounts
of information are desired. The amount of insight that this scanner provides is very
useful for enumeration and device monitoring. Also, no additional steps needed to be
taken on the end of the target devices; everything is performed remotely without the need
to install agents or set up authentication.
Vuls Vuls provides thorough results for the devices that it can scan. That is
the catch, however: in order to scan a device with Vuls, key-based SSH authentication
needs to be set up on the target. This is not always possible with home IoT
products, as the manufacturers do not allow that type of control over the device. This is
understandable, though, and is done for security reasons. Consumers do not need that
level of control over a smart home device, as they are provided with an application
that allows control that way. The only device that was able to be scanned with Vuls
was the Raspberry Pi, as it is completely set up by the user, who has control
of the root account, allowing anything to be done.
The vulnerabilities that Vuls detected were not very informative. The website
shows vulnerabilities with much more detail, including remediation steps, but the
ones that came up in this project had most fields blank. The only information present
was the CVE number, the affected processes/packages, and a short description of the
vulnerability. Having more information present, such as remediation steps and criticality,
would be much more helpful to the user in determining what to do in response to
each vulnerability: whether it should be fixed, and how, or whether it can be ignored
and left alone.
Vuls would be great for instances where all devices are able to have SSH set up, but
this would usually not be the case. With home IoT this generally is not possible, so
Vuls should be avoided, as it would not provide full coverage. If all home IoT devices
are self-made using Raspberry Pis or other microcomputers, then Vuls would be
possible and would provide a higher level of security than the other scanners. Having SSH
set up would also allow remote management of each device, which could prove
very useful.
Snout Based on a few glaring reasons, Snout does not seem to be in any usable
state for consumers currently. The package has quite a bit of potential as an
all-around utility for IoT security, but its current version does not quite hit that mark
for the average user. The package needs more work in many areas,
in particular the installation process. It is on its very first version (v0.0.1), so this is
completely expected. It does, however, seem like more of a proof of concept
than a supported package.
The installation process and documentation are the biggest problems with the
current iteration. The package needs to be built from source, which is never an easy
thing to do on Linux, and thorough knowledge of the OS is needed to properly
go about this. There were issues with the build process that were encountered
on multiple OS installations and multiple Linux distributions, so this needs to be
addressed. The documentation also could use a decent amount of work. The GitHub
page does not give clear instructions on how the installation is done, and there is no
wiki or documentation supporting the package.
Disregarding the state that the package is in currently, Snout boasts some impressive
features that could have many different potential use cases. It can be used for
monitoring and enumeration. If it is desired to see what devices are connected or
using the different wireless protocols (Zigbee, Bluetooth LE) around the area, this
is the best option. In terms of vulnerability scanning, however, the package does seem to be
lacking. It does have the potential to scan Zigbee devices for vulnerabilities,
but as the demo video demonstrated, only one vulnerability can be detected with
Zigbee: the ZLL vulnerability. Although Zigbee is not a major target of malicious
users, there definitely exist security risks that need to be managed. Snout has
the framework and groundwork set to perform more robust vulnerability scans but needs
more in the way of vulnerability detection.
Since Snout has many more features than vulnerability scanning, it is not
recommended as the only solution in place for this, but it is a solid toolkit for
managing and monitoring a smart home environment. It can best be used in a situation
where there are multiple wireless protocols being utilized and many devices in
close vicinity. Snout will allow the user to enumerate all devices and confirm that
everything is in order. It should be used alongside a more robust and established
vulnerability scanner, however, because this aspect is not as fleshed out as would be
needed. This would also require a fairly tech-savvy user to get the package installed
and in a usable state; it would not be feasible for an average user.
Vulscan Nmap paired with the Vulscan script makes a very effective scanning tool.
Capable of scanning devices completely remotely without any interaction with the
device itself, this solution would be very effective on a smart home network with a
wide variety of devices from different manufacturers. The script interacts with the
device purely at the network level, so nothing is altered or changed on the device
itself. IoT devices are different but also very similar in terms of the protocols they
utilize. The common protocols are HTTP, HTTPS, SSH, Telnet, and a few others.
These protocols and services running can be analyzed very easily with Nmap and
Vulscan. Vulscan also allows a wide variety of customization regarding how the
results are presented and what vulnerability databases are used for each scan.
The issue with Vulscan is that it requires the device to have open ports and to be
running common services in order to operate normally. With more specialized devices, this is not
always the case. For example, the Wyze Cam had only one open port, and it was
running a proprietary service. The Amazon Echo had a few open ports but
would not allow Nmap to scan it at all; it never returned any fingerprints of the
services running.
This scanner would also be useful when just a quick and easy vulnerability scanning
solution is needed. Being very easy to install and perform scans with, Vulscan is the ideal
solution for quick tests. It is not a thorough, be-all-end-all solution for vulnerability
scanning, but it will give a good idea of what is vulnerable at the OS level and the
port/service level. Since it runs on Nmap, this scanner will also be very familiar to
a lot of people. Nmap is very popular, so users are likely to have used it in the past if
they are involved in IT or technology.
Lastly, Vulscan would be a good option for consumers who want a non-intrusive
scanner. It requires no installation or interaction of any kind with the target devices,
will leave almost no trace on the target machines, and will not interfere with the
devices' normal operations. All in all, Vulscan is a great quick and easy solution that
will present a basic but satisfactory understanding of the vulnerabilities present.
IoTSeeker This seems to be a great scanner for those looking for the bare
minimum in terms of IoT security. A common problem with these devices is that
they are not configured or set up correctly. Especially when a large home network's
worth of devices is deployed, sometimes these steps can simply be neglected, often
not on purpose, but one or two devices might get skipped over. This scanner
is perfect for this kind of situation. It is also ideal for occasional use, as once this
vulnerability is fixed, it will not come back unless the device is completely reset.
IoTSeeker would be perfect to use after many devices are deployed to ensure nothing
was glossed over. It can also be used a few times a year if devices are slowly
accumulated over time.
This script is actually the only one out of the five that searches for this vulnerability.
Since most of the other scanners are not tailored toward IoT devices, they are not
searching for default IoT manufacturer passwords. This scanner should most likely
be used alongside a more robust scanner that looks for multiple vulnerabilities.
IoTSeeker in conjunction with Vuls or OpenVAS can ensure that most of the bases
are covered.
IoTSeeker does not seem to support many device types at present. It includes a
configuration file with the device names and the username and password combinations
it checks for. At the time of writing, there were only 18 different devices
supported. In the grand scheme of things, this is a tiny amount when compared to
the number of IoT manufacturers and products that exist. This file can be added to
manually, but this takes out the convenience and ease of use the script has.
Default usernames and passwords are a very common problem with IoT devices.
Leaving these configured like this makes it very easy for an attacker to gain access.
To an attacker that is aware of this vulnerability, it is practically like leaving the
device completely open with no password. IoTSeeker is a great solution to this, even
if just to ensure that no device was overlooked in its setup. All in all, it is a very simple
script, but very useful as well. This would definitely be recommended as the time it
takes for installation and execution is negligible. Tables 2, 3, and 4 summarize the
comparison of IoT vulnerabilities discovered, scan times of vulnerability scanners,
and additional information discovered by vulnerability scanners, respectively.

Table 2 Comparison of IoT vulnerabilities discovered

Vulnerability scanner | Amazon Echo | Wyze Cam v3 | Aqara Smart Hub | Aqara sensor | Raspberry Pi
OpenVAS [10] | No results | 5 log | 8 log, 1 low | N/A | 3 high, 5 medium, 1 low, 100 log
Vuls [9] | N/A | N/A | N/A | N/A | Fast scan: 54, Fast-root: 63
Snout [11] | No results | No results | No results | No results | No results
Vulscan [12] | No vulnerabilities found | No vulnerabilities found | No vulnerabilities found | N/A | Over 500
IoTSeeker [13] | No vulnerabilities found | No vulnerabilities found | No vulnerabilities found | N/A | No vulnerabilities found

Table 3 Comparison of scan times of vulnerability scanners

Vulnerability scanner | Amazon Echo | Wyze Cam v3 | Aqara Smart Hub | Aqara sensor | Raspberry Pi
OpenVAS [10] | 1 min | 3 min | 11 min | N/A | 18 min
Vuls [9] | N/A | N/A | N/A | N/A | Fast scan: 5 s, Fast-root: 2 min
Snout [11] | N/A | N/A | N/A | N/A | N/A
Vulscan [12] | 1 min 12 s | 7 s | 24 s | N/A | 37 s
IoTSeeker [13] | 1 min 25 s | 1 min 25 s | 1 min 25 s | N/A | 1 min 25 s

Table 4 Comparison of additional information discovered by vulnerability scanners

Vulnerability scanner | Amazon Echo | Wyze Cam v3 | Aqara Smart Hub | Aqara sensor | Raspberry Pi
OpenVAS [10] | Could not interact with the device at all | OS (Linux kernel), no open ports | Open ports (1), OS (Linux kernel) | Not connected to the network, no IP | OS (1), applications (21), open ports (3)
Vuls [9] | Cannot set up SSH key authentication | Cannot set up SSH key authentication | Cannot set up SSH key authentication | Cannot set up SSH key authentication | Viewed results with the TUI
Snout [11] | Scanner not compatible with SDR hardware used | Scanner not compatible with SDR hardware used | Scanner not compatible with SDR hardware used | Scanner not compatible with SDR hardware used | Scanner not compatible with SDR hardware used
Vulscan [12] | Ports 1080 and 8888 open, no service detected | All ports closed, detected as webcam, running Linux | Port 4567 open only, OS and OS version detected | Does not use IP | Many duplicate vulnerabilities
IoTSeeker [13] | Failed to establish TCP connection | Failed to establish TCP connection | Failed to establish TCP connection | Does not use IP | Did not find device type from config file

5 Conclusion

In general, most of these vulnerability scanners still have a way to go to be as effective
with IoT devices. The general framework and functionality exist, but the IoT-specific
capabilities leave more to be desired. The open-source scanners are all very robust
and work well for their intended target hardware; the functionality just needs to be
extended. With IoT devices, it is also important to have minimal direct interaction
with the target devices themselves. Not all IoT devices can be accessed through the
OS with services such as SSH; some are much more locked down and only allow
interaction through their custom-built services, such as phone applications. When using
a scanner for this class of hardware, it is important to ensure that nothing such as
an SSH connection or any configuration directly on the device needs to be done.
Most of the scanners used in these tests leverage vulnerability databases on which to base
their discoveries. These vulnerability databases are precompiled with discovered
vulnerabilities and the corresponding information. These databases need to be updated to
include IoT devices, or a separate database focused on IoT needs to be created. There
are a few ongoing initiatives to create such a database, but they are not fully developed
yet. For example, the Warren B. Nelms Institute at the University of Florida is in
the process of building an IoT-specific security vulnerability database called IoT-
SVC [17]. It is a great start, but more work still needs to be done to make it a reliable
source. The last measure that needs to be taken with IoT scanners concerns the different
wireless protocols. The Snout scanner implemented this and is a good start but still
leaves a lot to be desired; it is simply not robust or developed enough to
be effective at vulnerability discovery. This functionality of scanning other wireless
protocols needs to be expanded and combined with the vulnerability discovery of
other scanners like OpenVAS or Vuls.
IoT devices are in use across practically all industries and environments, with the
total number of devices worldwide being upwards of 5 billion. Vulnerabilities present
on these devices can make them easy targets for attackers. Current vulnerability
scanners can be used with IoT devices, but the effectiveness is not consistent across
different device types and manufacturers. Expanding the functionality of open-source
scanners or having an IoT-specific scanner will greatly improve the security of these
devices and the convenience of making them as secure as possible.

References

1. Goasduff L (2021) Gartner Says 5.8 Billion enterprise and automotive IoT endpoints will be in
use in 2020. https://www.gartner.com/en/newsroom/press-releases/2019-08-29-gartner-says-
5-8-billion-enterprise-and-automotive-io. Accessed 8 June 2021
2. Mahmoud R, Yousuf T, Aloul F, Zualkernan I (2015) Internet of things (IoT) security: current
status, challenges and prospective measures. 2015 10th International conference for internet
technology and secured transactions (ICITST). IEEE, New York, pp 336–341

3. Deogirikar J, Vidhate A (2017) Security attacks in IoT: a survey. In: 2017 International conference on I-SMAC (IoT in social, mobile, analytics and cloud) (I-SMAC). IEEE, New York, pp 32–37
4. Chalvatzis I, Karras DA, Papademetriou RC (2019) Evaluation of security vulnerability scanners for small and medium enterprises business networks resilience towards risk assessment. In: 2019 IEEE international conference on artificial intelligence and computer applications (ICAICA). IEEE, New York, pp 52–58
5. Amro A (2020) IoT vulnerability scanning: a state of the art. Comput Security, pp 84–99
6. Markowsky L, Markowsky G (2015) Scanning for vulnerable devices in the internet of things.
2015 IEEE 8th International conference on intelligent data acquisition and advanced computing
systems: technology and applications (IDAACS), vol 1. IEEE, New York, pp 463–467
7. Hassija V, Chamola V, Saxena V, Jain D, Goyal P, Sikdar B (2019) A survey on IoT security:
application areas, security threats, and solution architectures. IEEE Access 7:82721–82743
8. Anand P, Singh Y, Selwal A, Alazab M, Tanwar S, Kumar N (2020) IoT vulnerability assess-
ment for sustainable computing: threats, current solutions, and open challenges. IEEE Access
8:168825–168853
9. Corp F (2021) Vuls. https://github.com/future-architect/vuls. Accessed 8 June 2021
10. Rahalkar S (2019) Openvas. Quick start guide to penetration testing. Springer, Berlin, pp 47–71
11. Mikulskis J, Becker JK, Gvozdenovic S, Starobinski D (2019) Snout: an extensible IoT pen-
testing tool. In: Proceedings of the 2019 ACM SIGSAC conference on computer and commu-
nications security, pp 2529–2531
12. Vulscan (2021) https://github.com/scipag/vulscan. Accessed 8 June 2021
13. Rapid7 (2017) IoTSeeker: locate connected IoT devices and check for default passwords.
https://information.rapid7.com/iotseeker.html. Accessed 8 June 2021
14. Bugeja J, Jönsson D, Jacobsson A (2018) An investigation of vulnerabilities in smart connected
cameras. 2018 IEEE international conference on pervasive computing and communications
workshops (PerCom workshops). IEEE, New York, pp 537–542
15. Yang H, Lee W, Lee H (2018) IoT smart home adoption: the importance of proper level automation. J Sensors 2018
16. Singh KJ, Kapoor DS (2017) Create your own internet of things: a survey of IoT platforms. IEEE Consumer Electron Mag 6(2):57–68
17. Jin Y (2018) IoT/CPS security vulnerability database. https://iot.institute.ufl.edu/academics/
iot-cps-security-vulnerability-database/. Accessed 9 June 2021
Emotion and Collaborative-Based Music
Recommendation System

R. Aparna, C. L. Chandana, H. N. Jayashree, Suchetha G. Hegde, and N. Vijetha

Abstract Music plays a vital role in the lives of many people, and they consider it
a part of their life. Whenever a person is happy, sad or emotional, they prefer to relax
their mind by listening to music. To get songs of their own interest, users keep searching
for them in search engines. Looking into the history of searching, the complexity of
search has gradually decreased, perhaps due to advancements in technology and the various
methods adopted for searching. In this paper, we concentrate on suggesting
appropriate songs for users based on their feelings (or mood), known as a music
recommendation system. The objective of the paper is to find a suitable method for
providing recommendations based on access to the music by similar users and on history.
Here, we consider different methods for implementation, like cosine similarity,
collaborative filtering, popularity-based and emotion-based methods, and also many
parameters like the singer, the name of the song, the genre and the movie, which help in finding
the proper song. We also analyze the performance of the same. The advantage of
the music recommendation system is that it saves the user from searching manually.
It not only saves time for searching, but also suggests similar new songs, if any.

Keywords Music recommendation system · Content-based · Collaborative-based · Emotion-based

1 Introduction

There is abundant information available on the Web, which can be accessed by
many people to get solutions to many problems. Music is one of the areas of interesting,
rapid development in the field of mobile devices, and its availability to most
people has made it easy to access music freely from anywhere and at any time. The
topic of interest here is not only providing what the user wants but also making the
user's experience personalized each time a search happens [1]. Hence, there is a demand

for the development of such a system which, along with the main functionality,
constantly keeps track of the user's interest.

1.1 Objectives

Music is found to have an indirect effect on the listeners’ mood and makes them
active and energetic. It is also important to note that music can be used to cure
health issues in human beings like psychiatric disorders, substance abuse issues,
sensory impairments, physical disabilities, communication disorders, developmental
disabilities, interpersonal problems, aging, etc. Using music to enhance or maintain
health is known as music therapy.
A recommender system is a subclass of the information filtering system which
usually predicts the “rating” or “preference” a user would allot to the item. We
find the application of recommender systems in different areas, such as product
recommenders in online stores, playlist generators in the video and music services,
recommendation based on content and context for social media platforms, and also
the open-Web recommenders for content [2–4]. These systems can be made to operate
across various platforms like books, news and search queries. We also know the
popular recommender systems for specific topics such as hotels and restaurants.
Recommender systems are also developed for exploring research articles and experts,
collaborators, and financial services. The main objective of developing such
a recommendation system is to boost the user experience and attract more users.

1.2 Recommendation Techniques

The usage of highly efficient and more accurate recommendation techniques plays an
important role in a system to provide good and useful recommendations to the users of
the system [5–7]. This helps the developers to understand the importance of features
and potentials of different recommendation techniques. There are three different
types of recommendation techniques, namely content-based filtering, collaborative
filtering and hybrid filtering.

1.2.1 Content-Based Filtering

Content-based technique of recommendation is a domain-dependent algorithm. It


highlights the analysis of the attributes of the items for generating predictions. In
content-based recommendation, recommendation is done based on the user’s taste
of music and using the features extracted from contents of the items that the user
has evaluated previously [6]. In this technique, keywords are used for describing the
items. These algorithms recommend items which are similar to the one that a user

Fig. 1 Content-based
recommendation system

liked previously, or that the user is examining presently [1]. It is independent of
a user sign-in mechanism for generating this temporary profile.
In content-based recommendation, the item which the user likes (say song 1)
is compared with the other songs present in the dataset. If another song (say song
2) is found to be similar to song 1, then song 2 is recommended to the user. Figure 1
shows the representation of content-based recommendation.

1.2.2 Collaborative Filtering

Collaborative-based filtering is a recommendation technique in which recommendation
involves filtering of information or patterns with the help of techniques which
involve collaboration among multiple agents, viewpoints or data sources [8]. The
system makes recommendations using information about the rating profiles of
different users or items. Collaborative-based filtering methods are usually classified
into memory-based and model-based. User-based algorithms are considered
memory-based approaches, while the Kernel-Mapping recommender is an
example of model-based approaches.
In collaborative filtering, if user A likes song 1 and song 2 and user B likes only
song 1, then song 2 can be recommended to user B, as it can be assumed that user
B has a similar taste to user A.
The most popular online shopping website, Amazon, uses collaborative filtering to
recommend to its users the items they may like to buy. Collaborative filtering can
be represented as shown in Fig. 2.

1.2.3 Hybrid Filtering

Hybrid recommendation is a popular recommendation technique. It combines
content-based and collaborative filtering together. There are different methods of
implementing hybrid recommendation: one approach is to perform the content-based and
collaborative-based predictions separately and later combine the results. Another

Fig. 2 Collaborative
filtering

approach is to first add content-based capabilities to a collaborative-based
approach (or vice versa); yet another method is to unify the approaches into
one model. The performance of the hybrid approach is more accurate than pure
collaborative- and content-based methods. Netflix is one good example of a hybrid
recommendation system. This website considers the watching and searching
patterns of similar users and compares them, i.e., collaborative filtering, and also
offers movies which share the characteristics of films that the user highly rated,
i.e., content-based filtering.

1.2.4 Proposed Work

In this paper, we have implemented content-based, collaborative filtering and
emotion-based recommendation systems and analyzed the performance of these
techniques. The content-based technique is implemented using the cosine similarity
method. It is based on the cosine angle between the songs, which is calculated based
on the features of the songs such as genre, artist and movie. Collaborative filtering
is implemented using the K-nearest neighbor (KNN) algorithm. The item
similarity-based approach is built upon the popularity-based approach. Item similarity
filtering involves the creation of a co-occurrence matrix, which holds
the weighted average of all the user's songs. In the emotion-based recommendation
system, the user's emotion is captured using a webcam. The songs which represent that
emotion are recommended to the users based on the genre of the songs to which they
belong.

2 Literature Survey

2.1 Existing System

Adiyansjah et al. have worked on a music recommender system based on genre using
convolutional recurrent neural networks in [9]. Here, they have recommended
music by comparing similar features of audio signals. This approach can be
considered a content-based recommendation system because it recommends based
on the perceptual resemblance to what users have heard previously. The input is
preprocessed and fed into a convolutional neural network. They have used convolutional
recurrent neural networks (CRNNs) for extracting features and a similarity distance to
find the similarity between features. Receiver operating characteristics and precision-
recall are used for evaluation. The drawback of this paper is that fewer
features are considered for recommending.
In [10], Anand Neil et al. have recommended music based on collaborative filtering
and deep learning. In this paper, they have used collaborative filtering and YOLO
(you only look once) methods for music recommendation. Data is preprocessed using
R and Python. This paper concludes that hybrid recommendation systems yield better
results once the model is trained enough to recognize the labels.
Hu et al. recommend music based on user behavior in [11]. In [12], the authors
have used an ANN model and the KNN regression algorithm to compare different songs
based on similarity. Ranking scores were calculated based on the combination factor
of songs. Here, the loss function decreases as the number of epochs increases. The drawback of
this paper is the high prediction complexity for large datasets.
In [13], Prachi Singh et al. have used random forest and XGB classifiers for music
recommendation. The accuracy of the random forest algorithm is 0.75, and the accuracy of
the XGB classifier is 0.72. The drawback of this paper is that the accuracy of the
recommendation depends on the split between test and training data.
In the paper titled “Multimedia Recommender System using Facial Expression
Recognition,” the author Prateek Sharma [14] considers the human face as an input
which is captured from a webcam. The face is the usual source of expression which
is the key information used in the system to identify the emotion of the user. The
emotions of the user are mapped with the genres of song or movies.
In [15], Schedl et al. have addressed the current challenges in music recommendation
systems. The drawbacks mentioned in this paper are the cold start problem
and automatic playlist continuation.
In [16], the authors have used the "Forgetting Curve" to assess the freshness of a song and
evaluate its "favoredness" using the user log. They have analyzed the user's listening pattern
to estimate the level of interest of the user in the next song. The running time increases
linearly with the size of the song library. The drawback of this paper is
that the dataset which they have used is not fetched from any music server.

2.2 Proposed System

2.2.1 Scope of the Paper

By analyzing the work carried out in the above papers and the drawbacks identified,
in this paper we propose and develop a system whose performance is
better than the systems currently available. Among the recommendation systems
developed so far, most concentrate on recommending movies and products (items),
but very few address music recommendation. In this paper, we develop
collaborative- and emotion-based music recommendation systems by considering
many parameters like the singer, the name of the song, the genre and the movie, which
help in coming out with accurate results. To the best of our knowledge, emotion-based
recommendation systems are rarely addressed in the literature. It is an innovative way
to recommend music, which takes the real-time emotion of the user as input and thus
benefits the user with a dynamic experience. It is also an emerging trend in music
recommendation. In the existing methods, fewer emotions were considered, so fewer
genres of songs could be covered. In our paper, we propose a work which includes
more emotions; hence, a wider variety of genres is considered.
The existing methods usually require additional hardware like EEG devices or sensors for
emotion recognition. In our proposed system, however, we use a CNN model that takes
an image as input, and the image is captured using the webcam.
In our paper, we have implemented four methods: cosine similarity,
popularity-based, collaborative filtering and emotion-based music recommendation.
The popularity-based method is used to display the top recommended songs from
the playlist; the emotion-based method recommends songs based on real-time
expressions of the user; the cosine similarity method considers multiple features
from the dataset and recommends songs; and the collaborative method recommends
songs to the user based on both the user similarity and item similarity approaches.

3 System Design

3.1 Content-Based Music Recommendation System

Cosine similarity is one of the content-based recommendation methods. In this
method, the model is trained based on the data provided by the user, and similar
songs are recommended. As the name suggests, cosine similarity finds the cosine of
the angle between any two n-dimensional vectors in an n-dimensional space.
The counts of the similar words in the documents are considered for comparison. The
documents are converted to vectors, which are simply arrays of the word counts in
the documents. The vectors are then projected into multiple dimensions, and the
angle between two documents is calculated. Figure 3 represents the flow chart for
recommending a song using cosine similarity.

Fig. 3 Flowchart of cosine similarity

As mentioned earlier, this is a recommendation system based on user input.
To start the process, the user has to give a song of his/her interest. The features
of the other songs in the dataset are then combined, and those songs are sent to the cosine
similarity function for analysis. The angle between the song of interest and each song
in the dataset is found, and similar songs are then printed based on the smallest
angle between the songs.

The similarity matrix can be used as a scale in the recommendation system to get
the similar songs. To start with any recommendation system, the first important
thing is to get a proper dataset, since the dataset decides the result of the model. We
have considered some songs in the native language as the dataset, along with
some English songs. The songs have features like index, title, singer, genre,
album/movie, and user rating.
After getting the dataset, the dataset has to be filtered. When the dataset is ready,
we have to read the file. For this, we have used the pandas library, which
provides easy readability. The dataset which we have considered for the paper is in
the form of a .csv file. The dataset is read from the IBM cloud; we have created a bucket
to store and read the CSV files, which makes data storage convenient.
Then the features required for the model are selected. They can be a set of titles, singers or
any other feature; in our paper, we have considered all the features. The features of
each song are combined, leaving a space in between. There are two functions which
can be used to vectorize the songs: CountVectorizer and TfidfVectorizer
from scikit-learn. The output is a matrix whose vectors consist of the word counts
present in the respective songs.
As explained before, the intention is to get the angle between the two vectors, so the
vectorized matrix is given as input to the fit_transform() method. This method
performs fit and transform on the input data at the same time and converts
it into data points.
The fit method is used to calculate the mean and variance of each feature in the
data, and the transform method is used to transform all the features using that mean and
variance. fit_transform() is a combination of both, which increases the efficiency
of the model: it calculates the mean (μ) and standard deviation (σ)
of a feature F and transforms the data points of the feature F at the same time.
When we get the two vectors from this, they are projected into the multidimensional
space. The angle between the songs is calculated by using the Euclidean dot product formula:

$$A \cdot B = \|A\| \, \|B\| \cos\theta \qquad (1)$$

and the similarity formula

$$\mathrm{Similarity} = \cos\theta = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}} \qquad (2)$$

Here, $A_i$ and $B_i$ are components of vectors A and B, respectively. The angle between A and B is as depicted in Fig. 4.
The output from fit_transform() is given as input to the cosine_similarity()
method. This method finds the angle between the two vectors (two songs) and provides
the final output. The songs with a smaller angle will be placed at the
beginning; a smaller angle indicates that the song is most likely to be similar to
the input song. If the angle is larger, then the song cannot be recommended to the users as

Fig. 4 Angle between A and B

it is out of interest. With the Euclidean distance method, if a song is compared with
the same song repeated n times, the method indicates that the songs are not similar,
whereas the same scenario with cosine similarity shows that both songs are the
same. The reason for this is that the word count will be n times that of the given song,
and when it is plotted on the graph, the angle between them is 0: both vectors point
in the same direction but have different magnitudes.
Cosine similarity can be used while considering multiple features of the song, which
makes it unique among the other methods. As defined earlier, it is based on user data,
and it finds the similar songs using the matrix of word counts. This method is
advantageous over the Euclidean method and gives precise output.
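The following is a minimal sketch of this content-based pipeline using scikit-learn; a small illustrative pandas DataFrame stands in for the actual dataset read from the cloud bucket, and the column names and example values are assumptions rather than the paper's exact schema.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative stand-in for the dataset described above (normally read from the
# .csv file stored in the cloud bucket).
songs = pd.DataFrame({
    "title":  ["Song A", "Song B", "Song C"],
    "singer": ["Singer 1", "Singer 1", "Singer 2"],
    "genre":  ["classical", "classical", "rap"],
    "album":  ["Album X", "Album Y", "Album Z"],
})

# Combine the selected features of each song into one string, separated by spaces.
combined = songs["title"] + " " + songs["singer"] + " " + songs["genre"] + " " + songs["album"]

# Vectorize the word counts and compute pairwise cosine similarity.
count_matrix = CountVectorizer().fit_transform(combined)
similarity = cosine_similarity(count_matrix)

# Rank the other songs by similarity to the song at index 0 (the user's input).
scores = sorted(enumerate(similarity[0]), key=lambda pair: pair[1], reverse=True)
print([songs["title"][i] for i, _ in scores if i != 0])
```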
Figure 5 shows the output of the method in the user interface. As we can see,
"Majhe maher pandhari" is classical music, and the system has recommended songs of
a similar genre, the majority by the same singer. If the song is not present, then
"No Such Song" is displayed on the front-end.

3.2 Collaborative-Based Filtering

The content-based filtering approach predicts what a user might like based on the
content previously listened to or streamed by that particular user. In collaborative
filtering, on the other hand, the system predicts what a particular user might like while
also taking into consideration the tastes and likes of similar users. Collaborative-based
filtering alone has three different ways of approaching the problem: the
model-based approach, the neighborhood-based approach and hybrid models
which combine the implementation of both. In this work, we have focused on the naïve
popularity-based approach to predicting songs. We then combine this with the item
similarity-based personalized recommendation system. This is also called
memory-based filtering, which mainly consists of two methods, namely

Fig. 5 Output of cosine similarity in the UI

(i) User Item Filtering: which predicts the songs listened to by similar users like
you.
(ii) Item Item Filtering: which predicts the items which you and other users also
liked.

3.2.1 Collaborative Filtering Using KNN Algorithm

The K-nearest neighbor (KNN) algorithm is one of the algorithms used for collaborative
filtering because it is considered the standard method for user-based collaborative
filtering as well as the item-based approach. KNN is a non-parametric, supervised
method used for regression and classification, and is an example of a lazy-learner
algorithm. It is based on feature similarity: it is assumed that similar
things, here songs, are located near each other. The selection of the k value is
very important in the KNN algorithm.
Whenever the KNN algorithm is used to recommend similar songs to users, the
algorithm calculates the distance between the input and the other songs
in the dataset. It then sorts the distances in ascending order and returns the top
k nearest neighbor songs, which can be considered as song recommendations to the
user. We use the NearestNeighbors method from scikit-learn. This method takes
several parameters such as metric, algorithm and n_neighbors, and the steps are as shown
in Fig. 6.
Collaborative filtering is based on the historical preference of the user on a set of
songs. We know the preference of the user by rating. Rating can be calculated both

Fig. 6 Flowchart for KNN algorithm implementation

implicitly and explicitly. Explicit rating means asking users to rate the song, while
implicit rating means checking whether the user has listened to the song or not; the
implicit rating can be taken as the listening count. After finding the rating,
we generate an interaction matrix. The interaction matrix has many entries, each
consisting of a user-song pair and a value which represents the rating of the song.
The interaction matrix is huge, and most of the values are missing because most
of the songs are not rated by the user.
We use a dataset that is uploaded to the cloud. As the interaction matrix is
very sparse, and dealing with sparse values directly wastes resources and memory, we
consider only the songs which have a listening count greater than or equal to
16, and we use a SciPy sparse matrix. We use the csr_matrix function
from the scipy.sparse library. We reshape the data based on unique
values, with song_id as index and user_id as columns, resulting in a dataframe. Then
we use the pivot function to convert the dataframe into a pivot table. The pivot table
is then converted to a sparse matrix, which is used to fit the model. This
fitted model can be used to recommend songs. We use a fuzzy_matching function to
match the string of a new song to all the songs that are present in the dataset; it uses
Levenshtein distance to match the strings.
We take the input from the user and recommend the best songs that are similar to
the song entered by the user; a sample output is shown in Fig. 7.
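A minimal sketch of this item-based KNN step is shown below, assuming a small illustrative play-count DataFrame; the column names, listen counts, and parameter values are assumptions and stand in for the actual dataset and tuning used in the paper.

```python
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Illustrative user-song listening counts (implicit ratings).
df = pd.DataFrame({
    "user_id":      ["u1", "u1", "u2", "u2", "u3"],
    "song":         ["s1", "s2", "s1", "s3", "s2"],
    "listen_count": [20, 35, 16, 40, 25],
})

# Reshape into a song x user pivot table; missing entries become zero.
pivot = df.pivot_table(index="song", columns="user_id",
                       values="listen_count", fill_value=0)
sparse_matrix = csr_matrix(pivot.values)

# Fit the nearest-neighbour model on the sparse matrix using cosine distance.
model = NearestNeighbors(metric="cosine", algorithm="brute")
model.fit(sparse_matrix)

# Songs closest to the first song in the pivot table (skipping the song itself).
distances, indices = model.kneighbors(sparse_matrix[0], n_neighbors=3)
print([pivot.index[i] for i in indices.flatten()[1:]])
```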

Fig. 7 Output of collaborative approach for the input ammate

3.2.2 Popularity Playlist

The popularity-based approach is a type of recommendation system which works on
the principle of popularity. The music that is trending or most popular is
recommended directly to the user. We have considered a million song dataset for
displaying the popular playlist. We first load the data; as the dataset is really huge,
we consider a subset of it. The song title and artist
columns are merged. After merging, we calculate the number of unique users in
the dataset and a percentage for each song based on its listen count. We split the
considered data into a training dataset and a testing dataset, create a
popularity-based instance and feed it with the training data, and create a class
to find the similarity between the songs. A sample output of the popularity playlist
is shown in Fig. 8.
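As a rough sketch of the popularity-based step, the snippet below counts how many unique users listened to each song and turns that into a percentage score; the DataFrame and column names are illustrative assumptions.

```python
import pandas as pd

# Illustrative subset of a user-song interaction log.
df = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u3", "u4"],
    "song":    ["s1", "s2", "s1", "s3", "s2", "s2"],
})

# Score each song by the share of unique users who listened to it.
listeners = df.groupby("song")["user_id"].nunique()
popularity = (listeners / df["user_id"].nunique() * 100).sort_values(ascending=False)

# The most popular songs form the recommended playlist.
print(popularity.head(10))
```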

3.3 Emotion-Based Music Recommendation

From the biological point of view, facial expressions are produced by the relative
position or movement of the muscles that lie under the skin of the human face. According to
some of the controversial theories proposed, these also convey the emotional state of
the individual at a given instance of time. They are considered controversial because one
can fake expressions easily. Figure 9 represents the flow chart of the emotion-based
music recommendation system.

Fig. 8 Output of popularity playlist that is displayed on the front-end

In this method, an image of the user is captured using the webcam. The image
captured from the webcam is converted to a grayscale image, and the face is marked with a
rectangular frame. Here, we consider the human face as the region of interest (ROI), as
it is the primary source where the emotion of the user is visible. From this
ROI, we make predictions for each emotion class and determine the probability of each
emotion. The emotion with the maximum prediction is identified and is considered
as the expression of the human in the image. We use a CNN model for training
the emotions of the user; this is done using thousands of images. Once the emotion
is recognized, we search for the songs that relate to the emotion identified. This is
done by mapping the emotion of the user to the genre of the song, so a particular
emotion is mapped to a set of genres. Table 1 shows the mapping of the genres of
songs to the emotions of the user. For example, if the emotion of the user in the image
is found to be happy, then romantic, funny and comedy songs are used to represent
the user's emotion, and songs belonging to those genres are recommended to the user.
The data of the songs is present in the dataset as a comma-separated values (.csv)
file. This file is stored in the bucket, which is the container used to store data in the
IBM cloud. The user can give as input how many songs he/she would like to
be recommended, and that many songs are displayed. If the number of songs suitable
for recommendation is less than the number of songs expected by the user,
then the user is informed of this by displaying a message.
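A small sketch of this emotion-to-genre lookup is given below; the mapping follows Table 1, while the songs DataFrame, its column names, and the recommendation count are illustrative assumptions (in practice, the data would come from the .csv file in the cloud bucket).

```python
import pandas as pd

# Emotion-to-genre mapping following Table 1.
EMOTION_TO_GENRES = {
    "sad":      ["sad", "pathos"],
    "happy":    ["romantic", "funny", "comedy"],
    "surprise": ["dance"],
    "neutral":  ["classical", "folk"],
    "anger":    ["rap"],
}

# Illustrative stand-in for the song dataset.
songs = pd.DataFrame({
    "title": ["Song A", "Song B", "Song C", "Song D"],
    "genre": ["romantic", "classical", "rap", "comedy"],
})

def recommend_by_emotion(emotion: str, count: int) -> pd.DataFrame:
    """Return up to `count` songs whose genre matches the detected emotion."""
    genres = EMOTION_TO_GENRES.get(emotion.lower(), [])
    matches = songs[songs["genre"].str.lower().isin(genres)]
    if len(matches) < count:
        # Inform the user when fewer suitable songs exist than requested.
        print(f"Only {len(matches)} songs available for emotion '{emotion}'")
    return matches.head(count)

print(recommend_by_emotion("happy", 3))
```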
For the purpose of identifying the human face, a Haar cascade is used, which is an
algorithm used for object detection to identify faces in an image or a real-time video.
This algorithm usually uses edge or line detection features. It is provided with a lot of

Fig. 9 Flowchart of emotion-based music recommendation system

Table 1 Mapping of user emotions with genres of songs

Expression | Music genre
Sad        | Sad, pathos
Happy      | Romantic, funny, comedy
Surprise   | Dance
Neutral    | Classical, folk
Anger      | Rap

images containing faces (these are considered positive images) and a lot of images
not containing any face (negative images) to train the model. As the number of
images used for training increases, the accuracy of the method also increases. The
repository has the XML files in which the models are stored, and these are read using
OpenCV methods. They include models for the detection of the human face, eyes,
upper body, lower body, etc.
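The face-capture step described above can be sketched with OpenCV as follows; the cascade file is the frontal-face model bundled with the opencv-python package, while the camera index, detection parameters, and crop size are illustrative assumptions.

```python
import cv2

# Load the pre-trained frontal-face Haar cascade shipped with OpenCV.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

capture = cv2.VideoCapture(0)   # default webcam
ok, frame = capture.read()
capture.release()

if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)      # convert to grayscale
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in faces:
        # The face crop is the region of interest passed on to the emotion CNN.
        roi = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
    print(f"Detected {len(faces)} face(s)")
```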
To predict the emotion of the user, we use the convolutional neural network (CNN)
model for training the emotions. There are three kinds of layers. These layers are:
the convolution layer, pooling layer, and fully connected layer (FC layer). All these
layers are brought together and combined to form the CNN architecture. In addition
to these three layers, there are two more vital parameters. They are the dropout layer
and the activation function.
Convolutional Layer: It is the very first layer used for extracting the numerous
features from the given input images. These convolutional layers perform various
mathematical processes of convolution between the given input images and filters of
the specific size NxN. After sliding the filters on the input images, the scalar product
(dot product) is calculated between the parts of the input images and the filters with
respect to the size of the filter (N × N). The output thus obtained is termed as the
feature map. It gives us information regarding the corners and edges of the image.
Later on, the obtained feature map is given as input to the other layers for the purpose
of learning about the various different features of the image which is the input.
Pooling Layer: In the CNN model, the pooling layer usually follows the convolutional layer. It reduces the size of the convolved feature maps so that the computational cost is minimized, by reducing the number of connections present between the layers. It operates independently on each feature map. Several types of pooling operations exist, depending on the methodology used.
One of them is max pooling, in which the maximum element is taken from each region of the feature map. In average pooling, we calculate the average of the elements in a predefined-size image section. In sum pooling, the summation of the elements present in the predefined section is calculated. The pooling layer acts as a connecting bridge between the convolutional layer and the fully connected (FC) layer.
FC Layer: It comprises the weights and biases in addition to the neurons of the CNN, and it establishes the connection between two non-identical layers. Usually, the FC layers are placed just before the output layer, forming the last few layers of the CNN architecture.
The output of the previous layers is flattened and fed into the FC layer. This flattened vector passes through one or more FC layers in which the mathematical operations occur, and the classification of the image takes place at this stage.
Dropout: When every feature is connected to the fully connected layer, the model tends to over-fit the training dataset. Over-fitting occurs when a model works very well on the training data but performs poorly when it is used on new, unseen data.
To resolve this issue, we use dropout layers. Here, a fraction of the neurons is dropped from the neural network during the training process, resulting in a reduced model size. With a dropout value of 0.2, the neural network randomly removes 20% of the nodes.
Activation Function: The activation function is a vital component of the CNN model. It is used for learning and approximating continuous and complex relationships between variables of the network, and it decides which information is propagated forward through the network and which is not, thereby adding non-linearity to the CNN. Commonly used activation functions include ReLU, softmax, tanh and sigmoid, each with a specific use. For example, for a binary classification CNN model, the sigmoid or softmax function is used, whereas for multi-class classification, softmax is usually preferred.
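The sketch below shows how such layers can be assembled with the Keras API. The 48 × 48 grayscale input, the filter counts and the five emotion classes are assumptions for illustration only, not the exact architecture used in the paper.

```python
# Illustrative Keras CNN for emotion classification (assumed input and layer sizes).
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(48, 48, 1)),
    layers.MaxPooling2D((2, 2)),               # pooling reduces the feature-map size
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                          # flatten before the FC layers
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.2),                       # randomly drop 20% of the nodes
    layers.Dense(5, activation="softmax"),     # one output per emotion class
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```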
We used Adam as the optimizer. Adam is an adaptive learning rate optimization algorithm designed for training deep neural networks. It leverages the ability of adaptive learning rate methods to compute an individual learning rate for each parameter.
An epoch is "one pass over the entire dataset." It can be used to separate training into well-defined phases, which is helpful for logging and periodic evaluation, although the cutoff itself is arbitrary. When validation_data or validation_split is used with the fit_generator() method of a Keras model, evaluation is performed at the end of each epoch. The Keras library also provides the ability to add callbacks that run at the end of each epoch, for example model checkpointing and learning rate changes. At every epoch, the loss and accuracy on the training and validation data are monitored. The training of the model stops when the loss starts to increase or the accuracy starts to decrease. If the number of epochs is too large, overfitting of the training dataset takes place; if there are too few epochs, the model underfits. Early stopping is a method that allows us to specify an arbitrarily large number of training epochs and to stop the training when the model performance stops improving, or when the accuracy starts to decrease, on a validation dataset.
In our case, training stops at the 14th epoch, as the loss increases from 1.0726 to 1.0975 and the accuracy decreases from 0.6175 to 0.6054 between the 13th and 14th epochs. Once training stops, the model restores the weights from the best epoch; this is called early stopping. This can be seen in Fig. 10.

Fig. 10 Loss and accuracy value in CNN model at 13th and 14th epoch
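A minimal sketch of this behavior with the Keras EarlyStopping callback is shown below; monitoring validation loss with a patience of 1 is an assumption for illustration, not necessarily the paper's exact configuration.

```python
# Sketch of Keras early stopping that restores the best-epoch weights.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor="val_loss",          # stop when validation loss starts to increase
    patience=1,                  # tolerate one worse epoch before stopping
    restore_best_weights=True,   # roll back to the weights of the best epoch
)
# model.fit(train_data, validation_data=val_data, epochs=100, callbacks=[early_stop])
```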

Fig. 11 CNN model for emotion classification

Figure 11 is a representation of the CNN model that is used for the classification of images based on the expression of the human in the image captured by the webcam. Thousands of images are used for training. The images are divided into different classes based on the emotion of the human in the image, and the images in each class are further split into training and validation data. The large amount of data obtained from this model is stored in a hierarchical data format 5 (HDF5) file, an open-source format that stores data in a hierarchical structure within a single file.
The user needs to decide how many songs he/she would like to be recommended and give this as an input. The recommender system displays the songs which are suitable for the particular emotion of the user. If the number of songs requested by the user is more than the number of songs representing that particular expression, then a message is displayed to convey this to the user. Figure 12 illustrates the working of this case.

4 Results and Discussion

4.1 Content-Based Music Recommendation System

Fig. 12 Working of an emotion-based music recommendation system

In cosine similarity, when we compare the input song of interest with the recommended songs, they usually share common words. These can be words in the song name, movie name, genre or singer name, which shows that the recommendation is up to the mark. The cosine similarity is evaluated by considering some basic input songs, for which we manually checked whether the recommended songs are related or not. Ten songs have been considered for calculating the efficiency, covering scenarios such as obtaining mostly related songs and obtaining only a few related songs. Table 2 lists the songs which we have considered. From the content of the table, one can observe that the song "Bhagyada Laxmi Baramma" has the most similar songs; its line in the dataset is "Bhagyada Laxmi Baramma, Pt. Bhimsen Joshi, Kannada, "Classical, Bhajan, Hindustani", Nodi Swami Navirode Heege."
There are many songs which have common words like Pt. Bhimsen Joshi, Kannada, "Classical, Bhajan, Hindustani." The cosine similarity function creates a matrix in which the counts of these words are one or more, and hence those songs are recommended.
Similarly, the song "Yaakinge" has the fewest similar songs. The details of this song are "Yaakinge, All ok, Kannada, Rap." Taking a glance at the dataset, we observe that there are only a few rap songs. So, the available Kannada rap songs are recommended first, followed by other Kannada songs. This observation shows that the dataset plays a predominant role in the recommendation system. When the data size is increased, the recommendation system fails, as it takes more time for analysis.
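A minimal sketch of this content-based step is given below; each song is represented by one line of metadata, and the example entries are placeholders rather than the actual dataset.

```python
# Sketch of word-count vectorization plus cosine similarity over song metadata.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

songs = [
    "Bhagyada Laxmi Baramma Pt.Bhimsen Joshi kannada Classical Bhajan Hindustani",
    "Nodi Swami Navirode Heege Pt.Bhimsen Joshi kannada Classical Hindustani",
    "Yaakinge All ok Kannada Rap",
]

counts = CountVectorizer().fit_transform(songs)   # word-count matrix over the metadata
sim = cosine_similarity(counts)                    # pairwise song similarity scores

query = 0                                          # index of the input song of interest
ranked = sim[query].argsort()[::-1]                # most similar songs first
print([songs[i] for i in ranked if i != query])
```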
Emotion and Collaborative-Based Music Recommendation System 819

Table 2  List of songs considered for efficiency

Index  Song                              Number of similar songs  Total number of songs  Accuracy in percentage (%)
1      Bhagyada Laxmi Baramma            15                       15                     100
2      Belageddu                         12                       15                     80
3      Ammate                            10                       15                     66.67
4      Dil Me Kayi Armaan                12                       15                     80
5      Saddi Dilli                       14                       15                     93.33
6      Saara Saara Din Tum Kaam Karoge   15                       15                     100
7      Tumhi Ho Bandhu                   8                        15                     53.33
8      Yaakinge                          7                        15                     46.67
9      Srivenkatesa Suprabhatam          15                       15                     100
10     Oh olave                          13                       15                     86.67

The accuracy of the cosine similarity has been calculated by taking the ratio of the number of similar songs to the total number of songs, expressed as a percentage. The average accuracy over the ten songs is then taken, giving an accuracy of 80.67% for the considered songs.
In Fig. 13, we have considered the index of the song in the table on the x-axis and
the accuracy of the particular song on the y-axis. This provides a visualization of the
accuracy of the cosine similarity.

Fig. 13 Bar graph for accuracy of the songs



Fig. 14 Precision–recall curve for collaborative-based approach

4.2 Collaborative-Based Approach Using KNN Algorithm

Efficiency of the collaborative-based approach is calculated by using the precision–recall curve, which shows the trade-off between precision and recall values for different thresholds. Precision describes how close the predicted value is to the actual value, while recall describes how often the model predicts correctly whenever it makes a positive call. The calculations are done using the following equations:

Precision = TP/(TP + FP) (3)

Recall = TP/(TP + FN) (4)

TP refers to true positives, denoting that the recommended song is near the input song. FP refers to false positives, denoting that the recommended song is far away from the input song in terms of Euclidean distance. FN refers to false negatives, denoting that the song recommended to the user is erroneous. The precision–recall curve for the collaborative-based approach is depicted in Fig. 14.
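A minimal sketch of plotting such a curve with scikit-learn is shown below; the labels and scores are placeholder arrays, not the paper's actual recommendation data.

```python
# Sketch of a precision-recall curve computed from relevance labels and model scores.
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]                          # 1 = relevant recommendation
scores = [0.9, 0.4, 0.8, 0.65, 0.3, 0.7, 0.5, 0.2, 0.85, 0.6]    # similarity/model scores

precision, recall, thresholds = precision_recall_curve(y_true, scores)
plt.plot(recall, precision)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-recall curve (collaborative approach)")
plt.show()
```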

4.3 Emotion-Based Music Recommendation

The result of the emotion-based music recommendation system can be considered as the result of training the CNN model for emotion recognition and of recommending the songs which represent the recognized emotion.

The accuracy of the CNN model for emotion recognition is calculated using the fit_generator() method of the Keras neural network library. In fit_generator(), we first initialize the number of epochs for which we are going to train our network, along with the batch size. As real-world datasets are usually too large to fit into memory, they tend to be challenging and require data augmentation to avoid overfitting and to increase the ability of the model to generalize. In data augmentation, a new dataset is artificially created for training from the previously existing training dataset, improving the performance of the deep learning network along with the amount of data available. We use the Keras ImageDataGenerator object to apply data augmentation to images by randomly translating, resizing, rotating them, etc. Every new batch of data is randomly adjusted according to the parameters supplied to ImageDataGenerator. Once the maximum accuracy is obtained, we consider that epoch as the best epoch and restore the model weights. Our model has a training accuracy of 72.34% and a validation accuracy of 60.54%. This is graphically represented in Fig. 15.
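A minimal sketch of this augmentation step is given below; the directory name, target size and augmentation parameters are illustrative assumptions.

```python
# Sketch of Keras ImageDataGenerator-based augmentation for the emotion images.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # normalize pixel values
    rotation_range=15,       # randomly rotate images
    width_shift_range=0.1,   # randomly translate horizontally
    height_shift_range=0.1,  # randomly translate vertically
    zoom_range=0.1,          # randomly zoom
)
train_gen = datagen.flow_from_directory(
    "data/train", target_size=(48, 48), color_mode="grayscale", batch_size=64
)
# model.fit(train_gen, epochs=20, callbacks=[early_stop])
```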
The accuracy of recommending the song is analyzed on the basis of the number
of songs present in the actual dataset which belongs to a particular genre and the
number of songs recommended to the user. This is represented in Table 3. Thus, the
overall accuracy of recommending the songs is the average accuracy of songs of all
emotions, and that is found to be 98%.

Fig. 15 Graphical representation of accuracy of the model

Table 3  Accuracy of recommending songs of different genres

Emotion   Expected number of songs  Actual number of songs  Accuracy (in percentage) (%)
Sad       379                        346                     91.3
Happy     750                        750                     100
Surprise  1300                       1283                    98.7
Neutral   95                         95                      100
Anger     4                          4                       100

5 Conclusion and Future Work

Music is woven into many of our lives in such a way that it is no longer considered an extra activity; we listen to it as part of our daily routine and enjoy it. But in most cases, we find it difficult to discover the most appropriate songs. We are not always open to new songs that eventually become famous, and a user might miss out on many songs simply by not being updated about new releases. The user keeps listening to the same set of songs or has to search for songs on his/her own. To help curb this issue, there is nowadays a rise in the usage of recommendation techniques and systems.
Here, we have developed an emotion- and collaborative-based music recommendation system and implemented it using the KNN algorithm. We have implemented four approaches, namely emotion-based, collaborative-based, cosine similarity and popularity-based approaches, and analyzed the performance of each method. The efficiency of KNN is high, but it is prone to cold-start problems: some strong prior data is needed before recommendations can be made. That is why we have also implemented the emotion-based method, from which we obtained accurate results. Along with the above two, we have implemented the cosine similarity and popularity-based approaches to support these two implementations. The recommendation systems implemented have proved to be better compared to existing techniques.
We have implemented all four approaches in the front-end. There are also options for improving the system, such as adding multi-language support, easy search-based access, improving the front-end design for aesthetics, adding more data to the dataset, displaying the songs bifurcated into each genre, adding media files, etc.

Cricket Commentary Classification

A. Siva Balaji, N. Gunadeep Vignan, D. S. V. N. S. S. Anudeep, Md. Tayyab,


and K. S. Vijaya Lakshmi

Abstract As technology advances in the field of machine learning (ML), humans


will be able to automate everything. Consider a game of cricket. In cricket, there are
often two outcomes: the ultimate match outcome, win or lose, and the outcome of each
ball in an over. The majority of commentary delivery platforms use manual methods
to update the game database. To eradicate this manual method of updating, this
research work has attempted to automate the update process of the game database by
using natural language processing (NLP). This project consists of a machine learning
model called cricket commentary classifier. The proposed ML model is trained by
using the commentary data obtained from various cricket score delivery platforms
like Cricbuzz, Espn Sports, etc. The commentary delivered by the commentator was
captured with a microphone and later that audio is converted into text. The converted
text was given as input to the cricket commentary classifier. This classification model
classifies the outcome of the ball among four classes: wicket, dot, boundary, and runs.
The final prediction will be automatically reflected in the score log or game database
by eliminating human involvement using REST API calls.

Keywords Machine learning (ML) · Natural language processing (NLP) ·


Outcome · Artificial intelligence (AI) · Classification · Term frequency–inverse
document frequency (TF–IDF) · Random forest classifier

1 Introduction

Let us return to a time before the Internet and technology. In those days, viewers
rely on radio to learn the outcome of a match or a ball in an over. Cricket is a fast-
paced sport that demands quick reactions. In this case, responses may include the

A. S. Balaji (B) · N. G. Vignan · D. S. V. N. S. S. Anudeep · Md. Tayyab · K. S. V. Lakshmi


Department of Computer Science and Engineering, VR Siddhartha Engineering College,
Vijayawada, India
K. S. V. Lakshmi
e-mail: vijaya@vrsiddhartha.ac.in


conclusion of a match or the outcome of a coin flip, for example. Besides, cricket is a game that involves a lot of predictions, starting with the toss, team selection, the outcome of a ball in an over, the innings-end score, and the final result of the entire match. It is believed to be one of the areas that will be infused with artificial intelligence most rapidly. The proposed study mainly focuses on the outcome of the ball in an over; here, outcome means whether a ball in an over resulted in a boundary, a wicket, a dot or runs.
Have you ever thought about inferring the outcome of a ball just by hearing the voice commentary? In the majority of cases, commentators specify what the outcome of a ball was, but there are situations where they do not state anything explicitly and only convey the outcome indirectly in their voice. In such a scenario, it is very difficult for a normal listener to interpret the outcomes of balls in an over, and people following the match casually may be somewhat slow to register them. This study was proposed mainly to eradicate this issue. In addition, we found that updates to the game databases (storage for scores, commentary, and ball-by-ball outcomes) were happening manually; manual updates generally consume some time, and we were trying to eliminate that delay as well.
Nowadays, people are very mobile, moving from place to place easily with the help of modern transportation. Here comes the problem with mobility: while on the move, the signal strength is often too low to handle video streaming, so it is difficult for users to watch the match as video. The alternative is score or commentary delivery websites like Cricbuzz or Dream11. We observed that these platforms generally perform manual updates to their game databases. Through our study, we are trying to eliminate these manual updates with the help of automation, and machine learning is one of the artificial intelligence techniques used to create this automation.
To make this automation happen, we need the commentary data (the voice delivered by a commentator during a match about a ball in an over), which acts as the input to our project. After obtaining the commentary data, we make predictions about the outcome of a ball in an over using machine learning and natural language processing. Once we have the outcome, we transfer it to the game database so that the required updates are made automatically, completely eliminating human involvement.

2 Literature Survey

Our project is on commentary text classification, where identifying the meaning of various words and keywords in the commentary text is crucial. We were unable to find any other work that tries to classify the outcomes of a ball in the game of cricket, so we reviewed some related domains such as image captioning, sports event recognition, and action recognition.

Simonyan et al. [1, 2] provide access to various datasets by using a multi-task learning approach; for recognizing human sports movements and gestures, they use multi-frame optical flow to train a ConvNet.
In [1, 3], Karpathy et al. trained a convolutional neural network together with a recurrent neural network for large-scale classification of images and their descriptions.
Geng et al. [1, 4] pretrained their convolutional neural network model using an autoencoder and then recognized human sports actions using an SVM classifier.
Karpathy et al. [1, 5] integrate CNNs to develop fusion architectures that classify videos into specific kinds of sports; the architectures in [1, 5] are a large source of motivation for our project. Donahue et al. [1, 6] proposed a long-term recurrent convolutional network trained for image and video description; using novel architectures and data augmentation methods, action recognition was the primary task performed on video attributes. For human action recognition in videos, Ji et al. [1, 7] proposed using a 3D ConvNet. Tyagi et al. [8] used machine learning algorithms to predict the duration of a match in terms of the number of balls expected to be delivered; in this study [8], the prediction was based on historical data.
Amin et al. [9] suggested a new method for cricket team selection using data
envelope analysis (DEA). They proposed DEA formulation for evaluation of cricket
players based on the various outputs. This evaluation ranks cricket players based on
the DEA scores.
Kumar et al. [10] predicted the ball-by-ball outcome of a cricket match from videos with the help of convolutional neural networks and long short-term memory networks.
Rahman et al. [11] utilized a preliminary CNN architecture and transfer learning models to classify the outcome of the ball based on the grip of the bowler. They believed that if the grip of the ball in the bowler's hand is good, then the probability of a mis-hit by the batsman will be high, and vice versa.
Singh et al. [12] used linear regression model to predict the score in first innings as
well as second innings in a match based on some attributes like players performance,
current run rate, and venue. They used naïve Bayes classification to predict the
outcome of the match.
Subramaniyaswamy et al. [13] proposed a system called iSCoReS to formulate or
provide relevant data about a player during a match to the commentator. Main theme
of this was to increase the efficiency of commentary delivery to the end users.
Kaluarachchi et al. [14] developed a software tool called CricAI. This tool outputs
the probability of the victory in an ODI cricket match using input factors such as toss
advantage, player’s strength and home advantage.
Semwal et al. [15] utilized deep convolutional neural network (DCNN) to classify
different type of bad shots played in cricket. This approach utilizes videos to classify
the bad shots played by the players during the match.

3 Proposed System

The proposed approach consists of several steps. As stated earlier, the input to the proposed classifier model is the commentary delivered by the commentator during the game. The methodology starts with data collection to train the model, and the entire process is illustrated in Fig. 1.

3.1 Data Collection and Dataset Preparation

Machine learning projects include a dataset for training and the development of a model for prediction or categorization. As in the previous case, our model requires a prebuilt dataset as a primary functional need: we cannot train or create a model without a training dataset, and a well-collected dataset that contains all the class labels in a balanced manner is needed.
Sentiment analysis is the main task here, and to do it well we need data that covers all kinds of situations and emotions. For this purpose, we collected the data from four different score delivery platforms such as Dream11, Espn Sports, Cricbuzz, etc. The dataset includes around 20,000 records.

Fig. 1 Project block diagram

Fig. 2 Dataset collection

This dataset includes all emotional sentiments for each particular outcome in cricket. A detailed illustration of the collection of training data is shown in Fig. 2. We use Microsoft Excel to store these records; all records were collected by a manual process, and some of the work was done by a tool that uses regular expressions to collect the training data for the machine learning model. A record is formulated as a table row consisting of 3 columns: Id, Commentary Text and Class Label. The dataset follows a specific schema, like relational databases: the column "Id" takes only numbers, the column "Commentary Text" takes only text, and similarly "Class Label" also takes only text.
The process of getting the data ready for a machine learning model is illustrated in 3 steps.
1. Data Selection: This is the process of selecting the relevant data. This project has chosen only the commentary data; we do not want any other data, such as how many runs were scored in an over or information about the number of overs in a match.
2. Data Preprocessing: This is the most important step in dataset preparation. In this step, we remove all the unnecessary data. For example, in our dataset we removed stopwords like "the," "a," "an," etc. In other words, data which is not useful for prediction is removed in this step.
3. Data Transformation: This step has its own importance. In some cases, data in one format is not useful for prediction, but the same data becomes useful if it is converted to another format. For example, our model requires the data in text format rather than audio format.
One important thing we need to consider here is to make sure that there is no leakage of the dependent variable in the input commentary data. For this reason, we use a technique called masking, which hides any occurrence of the predicted label in the training commentary data. An example of this technique and a sample record are given in Fig. 3, where XXXX may stand for four or six.
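A minimal sketch of such masking with regular expressions is shown below; the list of label-revealing words is an illustrative assumption, not the paper's exact word list.

```python
# Sketch of masking label-revealing words in commentary text with the re module.
import re

LEAKY_WORDS = ["four", "six", "wicket", "out", "dot"]   # words that reveal the class label
pattern = re.compile(r"\b(" + "|".join(LEAKY_WORDS) + r")\b", re.IGNORECASE)

def mask_commentary(text):
    # Replace any label-revealing word with the placeholder XXXX
    return pattern.sub("XXXX", text)

print(mask_commentary("Driven through the covers for FOUR!"))
# -> "Driven through the covers for XXXX!"
```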
The final dataset contains an equal number of records for each class label; the complete composition of the dataset is given in Table 1. The main reason for building the dataset in this way is to avoid making any class label dominant during training, and it also lets us avoid up- and down-sampling.

Fig. 3 Sample record

Table 1  Composition of records

Class label    Number of records
Dot            5000
Boundary       5000
Wicket         5000
Runs           5000

3.2 Data Preprocessing

Since natural language processing (text mining) is used, it is always necessary to preprocess (clean) the data before determining the importance of each word in the input commentary data. Several techniques are available to clean the training data. They are as follows:
• Spell checking and correction.
• Removal of punctuations marks and unnecessary special symbols.
• Removal of unnecessary or stopwords.
• Case folding
• Tokenizing, etc.

3.3 Term Frequency and Inverse Document Frequency


(TF–IDF)

Once the preprocessing is over, it is time to extract some useful features from the commentary data. TF–IDF is a technique that enables us to find the usefulness of each word in the commentary data. The way the usefulness of a word is computed is shown below with the example sentence "I LOVE NLP."

N = 20 (the number of records or documents in the corpus) (1)

df_NLP = 1 (2)

tf_NLP,j = (occurrences of the word NLP)/(number of words in the text) = 0.3333 (3)

W_NLP,j = tf_NLP,j × log(N/df_NLP) (4)

W_NLP,j = 0.3333 × log(20/1) (5)

W_NLP,j = 0.3333 × 1.301 (6)

W_NLP,j = 0.43 (usefulness of the word NLP in the text "I love NLP") (7)

As shown in the above example, the word NLP in the sentence "I Love NLP" has an importance of 0.43. In the same way, we try to extract the best possible combination of words for predicting the outcome from the commentary data. If W_word,text is high, then we can conclude that the word is very useful for the prediction of the class label, and vice versa.
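A minimal sketch of computing such TF–IDF weights with scikit-learn is given below; the example commentary sentences are placeholders, not the real dataset.

```python
# Sketch of TF-IDF feature extraction over commentary text.
from sklearn.feature_extraction.text import TfidfVectorizer

commentary = [
    "driven through the covers, races away to the boundary",
    "full and straight, beaten, big appeal and given, what a wicket",
    "pushed to mid-on, no run",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(commentary)        # TF-IDF weight matrix
print(vectorizer.get_feature_names_out())       # vocabulary learned from the text
print(X.toarray().round(2))                     # usefulness of each word per document
```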

3.4 Model Building and Algorithm Used

After covering all the prerequisites required to build our cricket commentary classifier model, it is time to build the model using a classification algorithm. The random forest classifier is an ensemble method that internally uses the decision tree algorithm for classification. Its working is simple to understand: based on our training set, a finite number of decision trees is formed, say three for instance. If, among the three decision trees, two trees give the outcome "Dot" and one tree gives the outcome "Boundary," then the final outcome is "Dot," because "Dot" is the dominant class label predicted by the maximum number of decision trees. This working is illustrated in Fig. 4.
The data flow diagram for the proposed model is shown in Fig. 5. It provides a complete overview of the project and gives some clarity about how the data flows among the modules of the project. The data flow diagram is used as a basis for the architecture.
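A minimal sketch of the TF–IDF plus random forest pipeline is given below; the training samples, labels and the n_estimators value are illustrative placeholders, not the paper's actual data or tuning.

```python
# Sketch of the commentary classification pipeline (TF-IDF features + random forest).
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

X_train = [
    "driven through the covers, races away to the rope",
    "big appeal and given, the batsman has to walk back",
    "pushed to mid-on, no run",
    "worked off the pads, they come back for two",
]
y_train = ["boundary", "wicket", "dot", "runs"]

clf = make_pipeline(TfidfVectorizer(), RandomForestClassifier(n_estimators=100))
clf.fit(X_train, y_train)
print(clf.predict(["edged and taken behind the stumps"]))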

Fig. 4 Ensemble method

Fig. 5 Data flow diagram

3.5 Capturing of Voice Data from Commentator

The main input to our prediction is the commentary text. This text comes from the audio delivered by the commentator at the end of each delivery in an over. To capture the voice of the commentator, microphones are placed in front of the commentator; the microphone captures the voice and sends the audio data to the central server location. To do this job, we require a microphone and some Python code.

Commentator → microphone → commentary data (audio format)

3.6 Converting Voice Data into Text Format

The commentary data obtained from the commentator is in audio format, but our cricket commentary classifier model needs text as input for prediction. Therefore, the audio format must be converted to text format. Services for this are available in Google Cloud (Speech-to-Text API), Amazon Cloud (Amazon Transcribe), and Azure Cloud, and our job would be simple if we used those cloud services for transcription. However, we are not using any cloud services; instead, we use Python libraries within our Flask application to get the job done, and we wrote our own Python code to complete this task.

Commentary data (audio format) → Python code → text
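A minimal sketch of this conversion using the SpeechRecognition library is shown below; the paper only states that Python libraries were used, so this particular library and the WAV file name are assumptions for illustration.

```python
# Sketch of converting captured commentary audio to text with SpeechRecognition.
import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("ball_commentary.wav") as source:   # audio captured by the microphone
    audio = recognizer.record(source)                  # read the entire clip

try:
    text = recognizer.recognize_google(audio)          # free Google Web Speech endpoint
    print("Commentary text:", text)
except sr.UnknownValueError:
    print("Speech could not be understood")
```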

3.7 API to Update the Game Database

API stands for application programming interface. On the server, we wrote Python code that accepts requests from the client computer; here, the client computer is our cricket commentary classifier model. Once the prediction is made by the classifier, the predicted class label is sent to a dedicated API method. This API method contains the code for handling the automatic update of the game database. After this process completes, the updated outcome of the ball is reflected on the client devices. This automation eliminates human involvement in handling the game databases. The entire process is shown in Fig. 6.
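A minimal sketch of such an endpoint with Flask is given below; the route name, the table/column names and the use of sqlite3 (standing in for the SQL Server database used in the paper) are illustrative assumptions.

```python
# Sketch of a Flask REST endpoint that receives the predicted label and logs it.
import sqlite3
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/update_score", methods=["POST"])
def update_score():
    outcome = request.json["outcome"]          # e.g. "boundary", "wicket", "dot", "runs"
    with sqlite3.connect("game.db") as conn:   # assumes a ball_log table already exists
        conn.execute("INSERT INTO ball_log (outcome) VALUES (?)", (outcome,))
    return jsonify({"status": "updated", "outcome": outcome})

if __name__ == "__main__":
    app.run(port=5000)
```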

4 Results

Fig. 6 Process to update the game database automatically

After building the cricket commentary classifier model, this research work calculated model parameters such as accuracy, recall, and F1 score on 2000 records to determine its efficacy. Most classification models show a discrepancy between accuracy and F1 score of less than 0.5; our classification model produced good results, with accuracy and F1 score quite close to each other, which is a desirable property of a classification model. Model accuracy: 0.84.
• Precision: 0.838
• Recall: 0.84
• F1 Score: 0.836
Figure 7 is a screenshot that shows how the live predictions happen on the commentary text data. As shown in Fig. 7, the Web page is interactive (dynamic in nature). Once the predicted outcome of the ball is obtained, the game database is updated automatically. The SQL server running in the background then returns the entire database (along with the modifications) as a result set (a table with a defined schema) to the frontend of the project. The frontend presents the results to users in the same way as well-known websites such as Cricbuzz and Espn Sports, as shown in Fig. 7.

Fig. 7 Live predictions on commentary data

5 Conclusion and Future Work

This project has used some basic natural language processing and machine learning techniques to classify the outcome of the ball. TF–IDF is the main technique used to find the usefulness of a word in the commentary text, and the random forest classifier is the algorithm used to build our model. To deploy our application in real time, we used Python Flask to create a server that manages Web pages and Web requests, and a SQL server to manage (update) the game database. Replicating any real-world task through automation is quite challenging, and we have made our best possible effort to make this automation work. With the help of our model, at least some milliseconds of delay are eliminated; with less delay, users get a better game experience, and serving users the best possible experience in turn has a large impact on the business. In this project, we used some basic algorithms and approaches to make a difference in the field of cricket. Some improvements still remain and need to be addressed in the future. We believe this is a high-growth area for automation, since customers want quick responses. If feasible, we intend to apply a similar system to other sports such as football, volleyball, hockey, and so on in the future.

References

1. Dixit K, Balakrishnan A (2016) Deep learning using CNNs for ball-by-ball outcome classifica-
tion in sports. cs231n.stanford.edu, Mar 23, 2016. [Online]. Available: http://cs231n.stanford.
edu/reports/2016/pdfs/273_Report.pdf. Accessed 13 Mar 2020
2. Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition
in videos. In: Proceedings of NIPS’14 27th international conference on neural processing
system’01, pp 568–576
3. Karpathy A, Fei-Fei L (2017) Deep visual-semantic alignments for generating image descrip-
tions. IEEE Trans Pattern Anal Mach Intell 39(4):664–676. https://doi.org/10.1109/TPAMI.
2016.2598339
4. Geng C, Song J (2016) Human action recognition based on convolutional neural networks
with a convolutional auto-encoder. In: Proceedings of 2015 5th international conference on
computer sciences and automation engineering, vol 42, pp 933–938
5. Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video
classification with convolutional neural networks. In: IEEE conference on computer vision and
pattern recognition, pp 1725–1732.https://doi.org/10.1109/CVPR.2014.223
6. Donahue J et al (2015) Long-term recurrent convolutional networks for visual recognition
and description. In: IEEE conference on computer vision and pattern recognition (CVPR), pp
2625–2634. https://doi.org/10.1109/CVPR.2015.7298878

7. Ji S, Xu W, Yang M, Yu K (2013) 3D convolutional neural networks for human action recog-


nition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231. https://doi.org/10.1109/TPAMI.
2012.59
8. Tyagi S, Kumari R, Makkena SC, Mishra SS, Pendyala VS (2020) Enhanced predictive
modeling of cricket game duration using multiple machine learning algorithms. In: Interna-
tional conference on data science and engineering (ICDSE), pp 1–9.https://doi.org/10.1109/
ICDSE50459.2020.9310081
9. Amin GR, Sharma S (2014) Cricket team selection using data envelopment analysis. Eur J
Sport Sci 14:S369–S376
10. Kumar R, SD, Barnabas J (2019) Outcome classification in cricket using deep learning. In:
2019 IEEE international conference on cloud computing in emerging markets (CCEM), pp
55–58. https://doi.org/10.1109/CCEM48484.2019.00012
11. Rahman R, Rahman MA, Islam MS, Hasan M (2021) DeepGrip: cricket bowling delivery
detection with superior CNN architectures. In: 2021 6th international conference on inventive
computation technologies (ICICT), pp 630–636. https://doi.org/10.1109/ICICT50816.2021.
9358572
12. Singh T, Singla V, Bhatia P (2015) Score and winning prediction in cricket through data mining.
In: International conference on soft computing techniques and implementations (ICSCTI), pp
60–66.https://doi.org/10.1109/ICSCTI.2015.7489605
13. Subramaniyaswamy V, Logesh R, Indragandhi V (2018) Intelligent sports commentary
recommendation system for individual cricket players. Int J Adv Intell Paradigms 10:103–117
14. Kaluarachchi A, Aparna SV (2010) CricAI: a classification based tool to predict the outcome in
ODI cricket. In: Fifth international conference on information and automation for sustainability,
pp 250–255.https://doi.org/10.1109/ICIAFS.2010.5715668
15. Semwal A, Mishra D, Raj V, Sharma J, Mittal A (2018) Cricket shot detection from videos. In:
2018 9th international conference on computing, communication and networking technologies
(ICCCNT), pp 1–6. https://doi.org/10.1109/ICCCNT.2018.8494081
Performance Comparison of Weather
Monitoring System by Using IoT
Techniques and Tools

Naveen S. Talegaon, Girish R. Deshpande, B. Naveen,


Manjunath Channavar, and T. C. Santhosh

Abstract Environmental monitoring has become more important as a result of climate change, and continuous tracking of environmental parameters is required to analyze environmental sustainability. The Internet of things is the latest technology and is crucial for collecting information from sensor devices. This paper uses an IoT trainer kit mounted with all the sensors, a Raspberry Pi, an Arduino, and a Wi-Fi unit that aids in the processing and transmission of sensor information to the ThingSpeak cloud. The received variables are saved on the cloud platform (ThingSpeak), and through cloud computing a database is used to keep track of environmental conditions. ThingSpeak also provides a public channel for analyzing and estimating the data. An Android application is developed for easy access to the measured parameters, and the results obtained from the Raspberry Pi and the Arduino are compared to see how well they perform.

Keywords Sensor devices · ThingSpeak · Cloud · Arduino · Raspberry Pi · Wi-Fi


unit · Internet of things

1 Introduction

The growing demand for Internet-based services necessitates efficient data collection and exchange. The Internet of things refers to a fast-growing network of interconnected devices that can gather and exchange information via integrated sensors. It is now widely used in virtually every industry and plays an important role in the proposed environmental surveillance system. The convergence of IoT and cloud computing provides a fresh approach to improved data handling from sensing devices, low power consumption, low-priced data-gathering and communication microcontrollers such as the Arduino, and high-cost mini-computers like the Raspberry Pi.

N. S. Talegaon (B) · G. R. Deshpande · B. Naveen · M. Channavar · T. C. Santhosh


Department of Computer Science and Engineering, KLS Gogte Institute of Technology, Belagavi,
India
G. R. Deshpande
e-mail: grdeshpande@git.edu


ThingSpeak, an open-source website, is used to record the parameter measurements. ThingSpeak is a free IoT platform and application programming interface for storing and retrieving live sensor data via the Hypertext Transfer Protocol over the Internet. It is a cloud-based IoT monitoring framework that enables you to integrate, display, and interpret real-time data feeds. The cloud provides graphic visualization operations and is accessible to clients as a virtual computer; the devices communicate with the data center via available wireless Internet services, and a large proportion of the components use sensors to report the analog readings of the environment. The Internet of things connects everything and allows us to interact through our own devices. The obtained metrics can be viewed in formats such as JSON, XML, and CSV. The suggested method allows the user to access the environmental parameters directly, removing the requirement for third parties.

2 Literature Survey

Arko Djajad et al. implemented a system architecture for environmental quality monitoring using IoT. In their proposed system, different sensors are connected over the Internet via a serial interface called Modbus, and the collected data is then sent over the network. The data collected from all the motes acts as input to an IoT circuit board made up of an Arduino and a Raspberry Pi, and the results can be sent to the cloud to monitor changes in environmental conditions. Users can easily access the results via their Wi-Fi-enabled laptops and mobiles. Different sensors are deployed in the IoT kit; the sensors used in the circuitry are analog sensors connected to the analog ports of the Raspberry Pi and the Arduino [1].
Tamilarasi B. et al. proposed model that provides functional design and also
helps in the implementation of sensor networks, which can be deployed in observing
environmental conditions IoT applications [2].
Nikhil Ugale et al. presented a system that helps monitor environmental conditions in the home. The system uses different types of sensors, such as light, humidity and temperature sensors, to observe different conditions, all controlled by a PIC microcontroller. Different devices are connected with the help of the sensors, which makes it possible to identify the functionality of each connected device. Once a particular device is turned on, the sensors sense whether the device is working correctly; if any anomaly is identified, a message is automatically sent to the concerned user through email. This system demonstrated the new IoT technology very efficiently and proved that IoT technology is very helpful for advanced home automation [3].
Kondamudi Siva Sai Ram et al. proposed a system that is an advanced solution for monitoring the weather conditions at a particular place and making the information visible anywhere in the world. Their architecture is more advanced for weather monitoring systems: depending on location and time, the environmental conditions can easily be assessed and the data observed from anywhere in the world. The system is designed to monitor temperature, humidity, light, etc. After data computation and processing, the data can be accessed over the Internet from anywhere. In this experiment, all functions, data processing and data collection are supported by a microcontroller (LPC2148); data is retrieved from the sensors using the microcontroller and then sent over the Internet using a Wi-Fi module.
Ms. Padwal S. C. et al. proposed a system architecture for sensor networks that can be used for environmental monitoring in IoT applications; the proposed system builds sensor networks by combining them with IoT applications [4].

3 System Architecture

The constructed system includes an Arduino board, which is a microcontroller, and a Raspberry Pi board, which is a mini-computer; together they act as the main processing unit of the entire system, on which all sensors and gadgets are embedded and integrated. The Arduino and Raspberry Pi operate the sensors to extract live information from them, carry out the analysis using the sensor information, and upload that data to the Internet via the Wi-Fi unit attached to the IoT kit (Figs. 1 and 2).
The Raspberry Pi and Arduino mounted on an IoT trainer kit, a sophisticated development platform, were used to create this IoT-based weather monitoring system.

Fig. 1 Complete IoT system hardware setup

Fig. 2 Raspberry Pi

The Raspberry Pi module is useful for minimizing the number of hardware components in the system; as a result, this project does not make use of any external ADCs or communication modules. A DHT11 (temperature and humidity) sensor, a light intensity sensor (LDR), a gas sensor, and a soil moisture sensor are used in this system. All of these sensors are linked to the GPIO headers of the Arduino and Raspberry Pi boards. The Ethernet network is utilized to obtain real-time monitoring of sensor data. Figures 3 and 5 show block diagrams of interfacing sensors with the Raspberry Pi and the Arduino. The data flow diagrams for the system are given in Figs. 11 and 12.

1. Raspberry Pi:

Raspberry Pi is used to create a real-time weather surveillance system based on the


Internet of things. The ARM11 processor serves as the central core in the proposed
system architecture. A 32-bit processor with a single core is used in this system with
512 MB of RAM memory. This tool features USB port, an Ethernet port, an HDMI
port, and an SD card slot. It is simple to connect this board to the Internet via the
Ethernet or USB ports. For monitoring environmental parameters, various sensors
are connected to the Raspberry Pi unit’s general-purpose input and output (GPIO).
The Raspberry Pi board is also supplied with a 5 V, 1A power supply via a micro
USB port. The OS, as well as all programs and files required for this system, is saved
on a SD card with 8 GB memory. The Raspberry Pi board is connected to a keyboard
and mouse via USB ports.
An HDMI to VGA cable connects the monitor to the Raspberry Pi board via the HDMI port. The Ethernet port is designed to connect the system to the Internet through a local area network (LAN). The board also has a standard RCA-type port for composite NTSC or PAL video signals, as well as a basic 3.5 mm analog audio connection for driving high-impedance loads (such as amplified speakers). It also includes a Camera Serial Interface (CSI) socket for interfacing a camera unit and a Display Serial Interface (DSI) socket for interfacing an LED or LCD display. Both CSI and DSI are 15-pin connectors.

Fig. 3 Block diagram of interfacing sensors with Raspberry Pi

Fig. 4 Arduino

Fig. 5 Block diagram of interfacing sensors with Arduino
2. Arduino:
The Arduino is a freely accessible, open-source tool that makes it easier for devices to sense and respond to their environment than standalone systems. It is an open platform built around a basic microcontroller with an integrated development environment for building board applications. Arduino may be utilized to build responsive devices that accept input from various switches or sensing devices and control lighting systems, motors, and other physical outputs. Arduino applications can operate independently or communicate with desktop applications. The board can be assembled by hand or bought pre-assembled, and the open-source IDE software is free to download. Figure 4 depicts an Arduino microcontroller.
The Arduino programming language is a fork of Wiring, a framework built on the Processing multimedia development platform. The Arduino board provides 14 digital pins, of which 6 can be used as pulse width modulation (PWM) output pins, and it also combines six analog inputs with a USB connector, a power supply jack, a 16 MHz clock, and a reset button. The microcontroller can easily be connected to a desktop PC via USB or powered from an AC-to-DC adapter or battery. All integrated circuit components are directly connected to the Arduino board.

Fig. 6 DHT11 sensor

3. Humidity and Temperature Sensor (DHT11):


Humidity Sensor: This sensor will report the ambient humidity level in the nearby
environment.
Temperature Sensor: This sensor measures the temperature and calculates the dew
point and heat index.
GPIO9 on the Raspberry Pi is used to connect the DHT11 sensor to the IoT trainer kit for detecting humidity (in percent) and temperature (in degrees Celsius) via a single-wire serial interface. This sensing device employs a resistive element for humidity measurement and a negative temperature coefficient (NTC) element for temperature measurement. The DHT11 output is a calibrated digital signal that the Raspberry Pi can read directly, eliminating the need for an analog-to-digital converter. The DHT11 requires a supply voltage of 3–5.5 V and a supply current of 0.5–2.5 mA.
This sensor detects humidity and provides both analog and digital outputs. We use the digital output pin and connect it directly to the Arduino's digital pin 7. The power is controlled by an integrated regulator in the sensing device, and the VCC and GND pins of the Arduino are linked through the 4-pin connector. Figure 6 depicts a DHT11 sensor. The DHT11 features a wide temperature range, minimal power consumption, long-term durability, and a calibrated digital output. A DHT sensor is made up of two parts: a resistive humidity sensor and a thermistor. A basic microchip inside performs the analog-to-digital conversion and emits a digital signal containing the temperature and humidity. To provide accurate readings, a high-efficiency 8-bit microcontroller is integrated into the sensing device, with the calibration coefficients stored in OTP memory. Plugging the sensor in and out is no longer a problem with the 3-pin connector, which includes soldered surfaces as well as a durable casing. The 3-pin connector is ideal for getting started quickly and is very simple to use. It is efficient and affordable.

Fig. 7 Light sensor

To gain access to the DHT11 sensor embedded within the IoT trainer kit, we used a four-pin connector wire and connected it from the kit's RM2 socket to the RM19 socket.
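A minimal sketch of reading this sensor from the Raspberry Pi is given below, using the legacy Adafruit_DHT library as one common option (an assumption, since the paper does not name a library); GPIO pin 9 follows the wiring described above.

```python
# Sketch of reading humidity and temperature from the DHT11 on the Raspberry Pi.
import Adafruit_DHT

SENSOR = Adafruit_DHT.DHT11
GPIO_PIN = 9   # DHT11 data line on the Raspberry Pi

humidity, temperature = Adafruit_DHT.read_retry(SENSOR, GPIO_PIN)
if humidity is not None and temperature is not None:
    print(f"Temperature: {temperature:.1f} C  Humidity: {humidity:.1f} %")
else:
    print("Failed to read from the DHT11 sensor")
```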
4. Light Intensity Sensor (LDR) (Fig. 7):
Light sensor: This sensor is designed to gauge the intensity of the light that strikes it.
The LDR is linked to the Raspberry Pi's GPIO10. The light-dependent resistor (LDR) is used to measure the strength of light and generates an analog signal. To avoid the use of an ADC, a capacitor network is established: light intensity is measured by estimating the capacitor charging time, which depends on the LDR resistance. The resistance of the LDR varies significantly with the incident light; it drops as light intensity increases and rises as light intensity decreases. The charging time of the capacitor varies according to this principle, and light intensity is classified as high, medium, or low depending on the charging time. It is displayed as a percentage on thingspeak.com.
An LDR is a component whose (varying) resistance changes with the amount of light that strikes it, so it can be used in light-detecting devices. The light-dependent resistor is a variable resistor controlled by light: it exhibits photoconductivity, as its resistance reduces when the ambient light intensity increases. Light-sensing circuits, as well as light- and dark-triggered switching units, can all benefit from the use of an LDR. The LDR is a high-resistance semiconductor device whose resistance ranges from a few hundred ohms to a few megaohms.
When the light incident on an LDR exceeds a particular threshold, photons are absorbed by the semiconductor, providing sufficient energy for bound electrons to jump into the conduction band. The resulting free electrons (and their hole partners) conduct electricity, lowering the resistance. The resistance range and sensitivity of an LDR can vary greatly between devices.
To gain access from the light sensor embedded within the IoT trainer kit, we used
a four-pin connector wire and connected it from the kit’s RM3 socket to the RM20
socket.

Fig. 8 MQ-7 gas sensor

5. Gas Sensor (Fig. 8):


A gas sensor is a device that detects the presence of gases in a particular area, usually as part of a safety system. The MQ-135 sensor is extremely responsive to carbon monoxide, ammonia, sulfides, and aromatic vapors. Carbon monoxide (CO) sensing is used for measuring CO levels in the atmosphere; CO concentrations ranging from 20 to 2000 ppm are detected using the MQ-7. This sensor is highly sensitive and has a quick response time, and its output is an analog resistance.
To gain access from the gas sensor embedded within the IoT trainer kit, we used
a four-pin connector wire and connected it from the kit’s RM12 socket to the RM22
socket.
6. Soil Moisture Sensor (Fig. 9):
The soil moisture sensor detects the quantity of water level in the soil. It estimates
the volumetric water content of the soil.
To gain access from the soil moisture sensor embedded within the IoT trainer kit,
we used a four-pin connector wire and connected it from the kit’s RM13 socket to
the RM23 socket.
7. Raspbian:
Raspbian is a free operating system based on Debian and on Linux, made specifically for the Raspberry Pi, and it is installed from a downloadable image. Raspbian is a collection of programs and libraries that support the proper functioning of the Raspberry Pi device.

Fig. 9 Soil moisture sensor



Fig. 10 ESP8266 Wi-Fi unit

Fig. 11 System data flow with Raspberry Pi

8. Wi-Fi Unit (Fig. 10):


An ESP8266 Wi-Fi unit with an integrated TCP/IP network stack was used, which allows any microcontroller to be connected to a Wi-Fi hotspot. The ESP8266 is a pre-programmed SoC that any microcontroller communicates with via the UART port, and it is operated from a 3.3 V power supply. The unit has AT instructions written into it, and the Arduino microcontroller should be configured to deliver the AT instructions in the correct order to enable the unit to operate in user mode. The unit can function as both a client and a server.
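The AT instruction sequence mentioned above can also be exercised over a USB–serial adapter; the sketch below is a minimal illustration using Python and the pyserial library, with the serial port, SSID, and password as placeholders (on the kit itself the same commands would be sent by the Arduino over its UART).

```python
# Minimal sketch: drive the ESP8266 AT firmware over a serial port (pyserial assumed).
# Port name, SSID, and password are placeholders.
import time
import serial

def send_at(port, command, wait=2.0):
    """Send one AT command and return the raw reply."""
    port.write((command + "\r\n").encode())
    time.sleep(wait)
    return port.read(port.in_waiting or 1).decode(errors="ignore")

with serial.Serial("/dev/ttyUSB0", 115200, timeout=1) as esp:
    print(send_at(esp, "AT"))                         # sanity check, expect "OK"
    print(send_at(esp, "AT+CWMODE=1"))                # station (client) mode
    print(send_at(esp, 'AT+CWJAP="my_ssid","my_pass"', wait=8.0))  # join the hotspot
    print(send_at(esp, "AT+CIFSR"))                   # report the assigned IP address
```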
9. System FlowChart (Figs. 11 and 12):

10. Python:

Python is a free, high-level programming language that is used for general-purpose programming. It is interpreted, interactive, and object-oriented, and it is suitable for beginners. Python is compatible with Linux. The Integrated Development and Learning Environment (IDLE) is the text editor used for Python programming.

Fig. 12 System data flow with Arduino

4 ThingSpeak

ThingSpeak is an open IoT platform and application programming interface (API) that stores and fetches information from sensing devices via the Internet using the HTTP protocol. It is a cloud-based IoT analytics platform that aggregates, visualizes, and interprets live data feeds (Fig. 13).
ThingSpeak allows you to develop gauge-reading devices, position-mapping devices, and a virtual social network of objects with status alerts. ThingSpeak has built-in support for numerical computing through MATLAB, so users can easily observe and keep track of all the information via MATLAB [5].
Fig. 13 Working of ThingSpeak

ThingSpeak is primarily responsible for continuously updating data: it has APIs for gathering the information produced by sensing devices as well as APIs for reading that information from the developed Android application. There are two sides to this work. One side requires you to program something to transfer information, and the other side is where someone else must look at that information; ThingSpeak sits in the middle, allowing you to do both. The paper builds a proof-of-concept IoT system using readily available hardware to keep track of the surrounding environment's humidity level, temperature level, gas level, soil moisture level, light intensity, and so on. This can be further extended with various sensing devices or automation systems to create something for a specific purpose. After the above-mentioned procedure is completed, the user has immediate access to all ecological factors.
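As a concrete illustration of the update path, the minimal Python sketch below posts one set of readings to a ThingSpeak channel through its public HTTP update API using the requests library; the write API key, field assignments, and sample values are placeholders, not the channel configuration used in the paper.

```python
# Minimal sketch: push one set of sensor readings to a ThingSpeak channel
# over its HTTP update API. The API key, field mapping, and readings are placeholders.
import requests

THINGSPEAK_URL = "https://api.thingspeak.com/update"
WRITE_API_KEY = "YOUR_WRITE_API_KEY"  # placeholder

def push_readings(humidity, temperature, gas_level, moisture):
    """Send one update; ThingSpeak returns the new entry id as text (0 means rejected)."""
    params = {
        "api_key": WRITE_API_KEY,
        "field1": humidity,
        "field2": temperature,
        "field3": gas_level,
        "field4": moisture,
    }
    response = requests.get(THINGSPEAK_URL, params=params, timeout=10)
    return response.text

if __name__ == "__main__":
    print(push_readings(humidity=54.2, temperature=29.5, gas_level=120, moisture=37.8))
```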

5 Result Analysis and Performance Comparison

After information is detected by the various sensing tools located in the area of interest and an appropriate link is established with the server device, the detected information is instantly transferred to the web server. The results are displayed on the web server page, which shows all environmental data according to the client's request and also stores all variations in the values. The entire record is automatically generated through Google spreadsheets, and the data can be analyzed and compiled frequently. All the data are saved in a cloud database, so the changes that occur in the environment can easily be observed.
• Visualization of humidity field on the thingspeak.com channel in a graphical form
with exact humidity on that particular day and time (Fig. 14).
• Visualization of temperature field on the thingspeak.com channel in a graphical
form with exact temperature on that particular day and time (Fig. 15).
• Visualization of gas field on the thingspeak.com channel in a graphical form with
exact gas level on that particular day and time (Fig. 16).
• Visualization of moisture field on the thingspeak.com channel in a graphical form
with exact moisture level on that particular day and time (Fig. 17).
• Performance comparison of weather monitoring system under different scenarios
(Fig. 18).
For the comparison between the IoT tools (Raspberry Pi and Arduino), we consider four parameters of the surrounding environment: humidity, temperature, gas level, and moisture level.
Both the Raspberry Pi and the Arduino give nearly the same outcome because the sensors used are the same, but with the built-in Wi-Fi unit of the Raspberry Pi we get more accurate results, as it acts as a mini-computer and does multiple tasks at once.
The Arduino, in contrast, can do one task at a time and acts as a microcontroller; an additional module or Wi-Fi unit is needed to connect to the Internet because it has very few ports and hardware components.
Both the Raspberry Pi and Arduino tools are affordable, small in size, and low in power consumption, and they provide fast data transfer, good performance, and remote monitoring.

Fig. 14 Simulation of humidity in percentage

Fig. 15 Simulation of temperature in °C



Fig. 16 Simulation of gas level

Fig. 17 Simulation of moisture level



Fig. 18 Table of results obtained by using Raspberry Pi and Arduino tool

Fig. 19 Graph between both systems

The graph shows that the proposed method with the Raspberry Pi performs better than with the Arduino in every respect (Fig. 19).

6 Conclusion

We may conclude from this study that the Arduino is useful for repetitive jobs like opening the garage door, turning the lights on and off, reading temperature sensors, controlling a motor as the user desires, and so on, while the Raspberry Pi is capable of executing numerous activities such as controlling complex robots, playing videos, connecting to the Internet, interacting with cameras, and so on. For example, if you want to create
an application that monitors humidity and temperature from a DHT11 sensor and
displays the results on an LCD, you can use Arduino to do so. However, if you want
to track the humidity and temperature from a DHT11 sensor, send an e-mail with
the statistics and examine/interpret the outcomes with an online weather report, and
show the data on an LCD, then the Raspberry Pi is the appropriate choice. In simple
terms, Arduino is intended for novice projects and quick electronics prototyping,
whereas the Raspberry Pi is used for more complex projects that are within its capabilities. Under many environmental conditions, we can compare the performance of our system between the two IoT tools (Raspberry Pi and Arduino).
This IoT-based device monitors environmental indicators in real time. Tempera-
ture, humidity, light intensity, gas intensity level, and soil moisture level are all moni-
tored by this system. Data may be viewed from anywhere on the planet. Using this
method, the client can continuously monitor various environmental factors without
interacting with any other server. The Raspberry Pi itself serves as the server, and the Raspbian operating system handles this task admirably. This weather monitoring system, built with the Raspberry Pi and Arduino, is inexpensive, compact in size, and low in power consumption, provides quick data transfer and good performance, and can be monitored remotely.

7 Future Scope

1. A smoke alert system can be connected to the module to inform the recipient in the event of excessive smoke concentrations.
2. Clients can be notified through SMS of the temperature/humidity/smoke
parameters.

References

1. Deshpande GR, Sannakki S, Madi S (2021) Advanced Home Automation by using Raspberry
Pi
2. Djajadi A, Wijanarko M (2016) Ambient environmental quality monitoring using IoT sensor
network. Internetworking Indonesia J (IIJ) 8(1)
3. Tamilarasi B, Saravanakumar P (2016) Smart sensor interface for environmental monitoring
in IoT. Int J Adv Res Electron Commun Eng (IJARECE) 5(2) Feb 2016
4. Ram KSS, Gupta ANPS (2016) IoT based data logger system for weather monitoring using
wireless sensor networks. Int J Eng Trends Technol (IJETT) 32(2) Feb 2016
5. Padwal SC, Kumar M (2016) Application of WSN for environment monitoring in IoT applica-
tions. In: International conference on emerging trends in engineering and management research
(ICETEMR-16) 23rd Mar 2016
6. Richardson M, Wallace S (2012) Getting started with Raspberry Pi. 1st edn
7. Ugale N, Navale M (2016) Implementation of IoT for environmental condition monitoring in
homes. Int J Eng Appl Technol (IJFEAT). Feb 2016
8. Rao BS, Rao KS, Ome N (2016) Internet of Things (IOT) based weather monitoring system.
IJARCCE J 5(9) Sept 2016.
9. Gawali SM, Gajbhiye SM (2014) Design of ARM based embedded web server for agricultural
application. Int J Comput Sci (1)

10. Singh KK. Design of wireless weather monitoring system. Department of Electronics and
Communication Engineering National Institute of Technology
11. DeHennis AD, Wise KD. A wireless microsystem for the sensing of temperature, and relative
humidity. J Micro Elect
A Study on Surface Electromyography
in Sports Applications Using IoT

N. Nithya, G. Nallavan, and V. Sriabirami

Abstract In recent years, surface electromyography has played a vital role in monitoring muscle activity. Our every action depends on muscles, which help us move our body, control the neuromuscular system, and perform all sorts of actions. Due to overtraining of the muscles, however, a condition called muscle fatigue occurs. Athletes use their muscles for long periods during their training, and due to this over-involvement, the muscles may be subjected to the risk of muscle fatigue. The surface EMG technique is vital for monitoring the activity of muscles. Research in this field has revealed that, with the help of surface EMG methods, muscle fatigue can be monitored and detected and the injuries caused by it can be prevented. This paper presents a complete study of surface EMG techniques in the evaluation of the muscle fatigue that occurs during different sports activities, the prevention of injuries, the assessment of performance in sports activities, and the signal processing methods involved in EMG signals. These considerations provide various advances in measuring the muscle fatigue condition of a player with the help of the surface EMG method. The paper concludes with sufficient knowledge of surface EMG methods, the challenges faced, and future developments.

Keywords Surface electromyography · Muscle fatigue · Prevent injury · Assess performance · Signal processing

1 Introduction

In our day-to-day life, muscle fatigue has become a very common problem, especially among people who perform heavy activities such as sports [1] and bodybuilding. Muscles are mainly responsible for our body movements and posture, and they also control our heartbeat, breathing, and digestion [2]. Muscle fatigue is the

N. Nithya (B) · G. Nallavan


Department of Sports Technology, Tamil Nadu Physical Education and Sports University, Chennai
600127, India
V. Sriabirami
Department of Electronics and Communication Engineering, Panimalar Engineering College,
Chennai 600123, India


reduction of muscles’ maximum force while it is contracted. Severe fatigue affects


our ability to move or lift any object [3]. It is related to the exhaustion caused by strenuous activity. When fatigue is experienced, the force behind the muscle movement decreases, which makes us feel weak. As the muscle contracts to its maximum extent, high-frequency signals are generated, but the signal cannot be sustained for a long time, which leads to a decline in muscle force. A fatigue detection technique is the best solution: it can guide a person in their training or other activities and can provide accurate readings of the muscle fatigue level. People are also interested in wearable devices that can monitor their body condition during activities. These wearable devices come in the form of watches or bands that can measure numerous conditions of our body, such as heart rate, pulse rate, muscle activity, and so on [4]. This detection technique can also
be implemented in other fields like human–computer interaction, neurophysiology,
rehabilitation medicine, physiotherapy, prosthetics and sports rehabilitation [5]. The
aim of this study is how surface EMG technology helps to determine the muscle
fatigue which monitors the fatigue conditions in real time in the field of sports
performance. EMG sensors find applications in sports biomechanics, a tool to access
performance and muscle injury [6–8]. The paper is organized as follows. Section 1
focuses on detection of muscle fatigue in the limbs and lumbar region, the sensor
used and the methodology adopted. Section 2 deals with injury prevention during muscle fatigue in various sports activities and effective communication to the user with the help of IoT, and discusses the various architectures of IoT. Section 3 presents
the role of EMG in performance monitoring of a sports person. Section 4 surveys
various signal processing methods used in evaluating muscle fatigue. Limitations
and challenges are discussed in Sect. 5. Lastly, Sect. 6 gives the conclusion that
summarizes the advantages and future scope.
Muscle fatigue is the inability to maintain the required force or expected force. It
is also reported as a feeling of weakness or pain in the muscle [9]. It can be classified into two types: the first is central fatigue, which comprises a reduction in the voluntary activation of muscle due to a smaller number of recruited motor units and a lower discharge rate; the second comprises the changes that take place in nerve–muscle transmission and in the action potential that is generated [10]. Muscle fatigue is mostly seen in athletes who strain their bodies to achieve in their respective sports, and they suffer greatly from this fatigue condition. Recording their fatigue condition with a fatigue detection system helps them monitor their body condition in an easy manner. Recently, most detection systems have been applied to find the fatigue condition by analyzing muscle pigments. One main methodology is the surface electromyography technique, which is used to study muscle activity; muscle fatigue can be detected with this non-invasive method. In addition, the surface EMG technique is easily portable, easy to handle, and inexpensive [11]. Surface EMG sensors are used to register the electrical signal of muscle activity with bipolar setups. In sports applications, the sEMG technique is used to assess muscle activation amplitude in onset or offset conditions [12]. A wireless sEMG recording system is also available, but because of its high sampling frequency, the transmission
of sEMG data is a great challenge [13]. This study mainly deals with the detection
of muscle fatigue in upper limb, lower limb and lumbar region.

1.1 Upper Limb

Several studies on muscle fatigue in the upper limbs of athletes have been conducted with the help of the surface EMG technique. Clinical aspects such as kinematics and the surface EMG of athletes during their isometric contractions were examined. The system is also used in sports activities to monitor the transition-to-fatigue condition and detect its progression. Data from five male athletes were collected: they were seated at a biceps curl machine to perform their activities and stopped once they reached total biceps fatigue.
SHORT-TIME FOURIER TRANSFORM—It extracts the surface EMG's median frequency (MDF) and mean frequency (MF) when frequency spectrum analysis is applied:

$$X(t, f) = \int_{-\infty}^{+\infty} x(\tau)\, h(\tau - t)\, e^{-j 2\pi f \tau}\, d\tau \qquad (1)$$

where t denotes time, f denotes frequency, h(t) denotes the window function, and |X(t, f)|^2 is the energy of x(t). To indicate the energy at each time and frequency, a new variable P(t, f) is introduced:

$$P(t, f) = |X(t, f)|^2 \qquad (2)$$

MDF and MF
The power spectral density, where r(t) denotes the autocorrelation function, is

$$p(f) = \int_{-\infty}^{+\infty} r(t)\, e^{-j f t}\, dt \qquad (3)$$


$$\int_{0}^{\mathrm{MDF}} p(f)\, df = \int_{\mathrm{MDF}}^{f_0} p(f)\, df \qquad (4)$$

$$\mathrm{MF} = \frac{\int_{0}^{f_0} f\, p(f)\, df}{\int_{0}^{f_0} p(f)\, df} \qquad (5)$$

Table 1 Sensors used in the muscle activity monitoring system

Sensors          Findings                               Position placed
Goniometer       Measures angle between elbow joints    Lateral surface of arm's elbow
EMG electrode    Measures muscle activity               Biceps brachii

Table 2 Accuracy rate of the system

Characteristics        Accuracy rate (%)
1-D spectro feature    85.72
Specificity            98.4
Sensitivity            60.2

f0 is the upper limit frequency of the power spectral density.


RMS

$$W_x(a, \tau) = \frac{1}{\sqrt{|a|}} \int x(t)\, \varphi^{*}\!\left(\frac{t - \tau}{a}\right) dt \qquad (6)$$

where W_x(a, τ) denotes the wavelet coefficient, a > 0 is the scale parameter, and ϕ(t) denotes the mother wavelet.
Two types of sensors were used, a goniometer and an EMG electrode [14], and good classification results with accurate readings were obtained. Table 1 lists the sensors used and their findings, Table 2 gives the accuracy rate of the system, and Table 3 enumerates the methods used to find the muscle condition. Several studies show that an increase in the electromyography signal amplitude and a shift of the spectrum toward the low-frequency band indicate an increasing degree of muscle fatigue during isometric contraction. Different methodologies and their correlations have been discussed in many papers.
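As a small illustration of Eqs. (4) and (5), the following Python sketch estimates the median and mean frequency of an sEMG segment from its Welch power spectral density; numpy and scipy are assumed, and the signal here is synthetic rather than data from the cited experiment.

```python
# Minimal sketch: median frequency (MDF) and mean frequency (MF) of an sEMG
# segment, following Eqs. (4) and (5). numpy/scipy assumed; signal is synthetic.
import numpy as np
from scipy.signal import welch

def mdf_mf(emg, fs):
    """Return (MDF, MF) in Hz computed from the Welch power spectral density."""
    freqs, psd = welch(emg, fs=fs, nperseg=256)
    total = np.sum(psd)
    cumulative = np.cumsum(psd)
    mdf = freqs[np.searchsorted(cumulative, total / 2.0)]   # splits power into equal halves
    mf = np.sum(freqs * psd) / total                         # spectral centroid
    return mdf, mf

if __name__ == "__main__":
    fs = 1000.0                               # sampling rate in Hz
    t = np.arange(0, 2.0, 1.0 / fs)
    rng = np.random.default_rng(0)
    emg = np.sin(2 * np.pi * 80 * t) + 0.5 * rng.standard_normal(t.size)  # toy signal
    print("MDF = %.1f Hz, MF = %.1f Hz" % mdf_mf(emg, fs))
```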

Table 3 Methods to find muscle condition

Methodology                                    Function                           Formulas
Short-time Fourier transform                   To extract MDF and MF              Eq. (1)
Median frequency (MDF), mean frequency (MF)    Indicators of spectrum shifting    Eqs. (4) and (5)
Root mean square                               Determine the amplitude of         Eq. (6)
                                               surface EMG

1.2 Correlation Coefficient

For variables X and Y,

$$X = \{X_i\}, \quad i = 1, 2, \ldots, N_1 \qquad (7)$$

$$Y = \{Y_j\}, \quad j = 1, 2, \ldots, N_2 \qquad (8)$$

$$r = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^{2} \sum_{i=1}^{N} (Y_i - \bar{Y})^{2}}} \qquad (9)$$

where X̄ and Ȳ denote the mean values of Xi and Yi.
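Equation (9) can be computed directly, as in the short Python sketch below; numpy is assumed and the two arrays are purely illustrative.

```python
# Minimal sketch: Pearson correlation coefficient as in Eq. (9). numpy assumed.
import numpy as np

def correlation_coefficient(x, y):
    """r = sum((x - x_mean)(y - y_mean)) / sqrt(sum((x - x_mean)^2) * sum((y - y_mean)^2))"""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Illustrative values only, e.g. MDF estimates from two recording sessions.
x = [91.0, 88.5, 84.2, 80.1, 77.3]
y = [90.2, 87.9, 85.0, 81.4, 76.8]
print(round(correlation_coefficient(x, y), 3))   # agrees with np.corrcoef(x, y)[0, 1]
```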


The muscle fatigue experiment was performed on the right forearm of an athlete under isometric contractions [15]. The deltoid and biceps muscles were monitored with the help of athletes. The surface electrodes were placed on the deltoid muscle at a distance of one finger width, distally and anteriorly; on the biceps muscle, the electrodes were placed between the medial acromion, which extends over the shoulder joint, and the fossa cubit. These muscles were analyzed, and it was found that the repetitive task of post-stroke rehabilitation is the most suitable methodology for recovery from muscle weakness [16]. Wearable devices are used to detect biceps muscle fatigue that occurs during gym activities. During elbow flexion, electromyographic information from the upper limbs was monitored, normalized, and filtered during maximal isometric tasks based on the amplitudes of the EMG signal [17].

1.3 Lower Limb

Muscle fatigue reduces metabolic performance and the capacity of the neuromuscular system, which results in persistent muscle contraction and decreases its steady activity. Postural control plays an important role in an appropriate biomechanical stance, but the main factor that affects postural control is fatigue. Recent research has found that the lower extremity muscles play an important role in maintaining and balancing postural control [18]. Athletes who mostly use their lower limbs will be affected by reduced balance in postural control and by muscle fatigue in the lower limb. The activities of the lower extremity muscles were analyzed for fatigue using surface EMG before and after activity.
These results indicate that the muscle activity levels of the rectus femoris, hamstring, and gastrocnemius muscles change significantly before and after fatigue. An important relationship was found between postural control, the rectus femoris muscle, and the tibialis anterior muscle [19]. Investigating the activity of the major muscles of the lower limbs
during soccer was undertaken with the help of 10 soccer players, and the electromyographic activity of the lower limb muscles was recorded. Muscles such as the rectus femoris, biceps femoris, tibialis anterior, and gastrocnemius were monitored before and after exercise. The EMG data were then analyzed, and the root mean square was computed over ten gait cycles. The results showed that after exercise at the intensity of a soccer match-play simulation, the electromyographic activity in most of the lower limb muscles was lower than before [20]. A real-time fatigue monitoring system to detect muscle fatigue during cycling exercise has also been developed, which provides online fatigue monitoring and analysis of the lower limb. It consists of a physical bicycle with a number of peripheral devices, a wireless EMG sensor set, and a computer that provides visual feedback. The cyclists pedaled at a constant speed while the EMG signals of the lower limb muscles, the velocity, and the time were recorded. Once fatigue occurs, the cycling speed shows a larger deviation in velocity, which was used as a reference to judge cycling stability. This method can be applied on a bicycle ergometer to monitor in real time the onset and activity of lower limb muscle fatigue. Kinesiological and kinematic data are measured using this system [21].

1.4 Lumbar Region

Lumbar muscle function is considered an important factor among the physical deficiencies that cause long-term low-back trouble [22]. It is difficult to monitor the back muscles because they have several fascicles which together generate the trunk tasks [23]. The literature has demonstrated the detection of lumbar muscle fatigue in sports players. One of the studies estimates the frequency compression of the surface EMG signal during cyclical lifting. Surface EMG techniques play an important role in monitoring the normal functional interaction among active trunk muscles during exercise and specific movements. Activities such as cyclic lifting and isometric trunk extension were performed, and the paraspinal EMG signals were observed and noted. The signals were extracted from the paraspinal muscles using surface EMG electrodes; this method is called back analysis, and the signals were extracted during isometric muscle contraction. EMG signals were accurately recorded from six bilateral lumbar paraspinal regions, which demonstrated static and dynamic results in different patterns of EMG spectral change and recorded the metabolic fatigue processes [24].

2 Prevention of Injuries

Observing the muscle condition of an athlete during their activities is very important to prevent injury. Swimming is a sport in which the arms and legs play a vital role, creating the successive movements that propel the body through the water. A system was therefore developed that is capable of measuring the stress level of the swimmer's muscles and also indicating the muscle fatigue level. This device is
designed using an EMG sensor. It consists of EMG electrodes and a microcontroller unit, and it is used to analyze the deltoid muscle of the swimmer because this muscle is the one most involved in the swimming activity. Once detection of muscle movement has started, the system triggers and gives an alert signal when the measured EMG signal exceeds the reference muscle fatigue level; by doing so, it helps to prevent injury [25]. Another application of EMG is its normalization techniques, which are used to detect alterations in the neuromuscular system; this was applied to subjects with an anterior cruciate ligament (ACL) knee injury during heavy treadmill walking [26]. The Internet of Things is the interconnection of computational devices embedded in everyday objects, permitting them to transmit and receive data over the Internet. Recent work has found that muscle fatigue prevention can also be designed with the help of IoT. Muscle fatigue can be detected and its recovery supported with the help of pulse modulation techniques such as pulse width modulation (PWM) together with the ESP8266. With the combination of PWM, the ESP8266, and surface electromyographic signals, muscle fatigue can be monitored and detected on a real-time basis over a wireless network. This technique consists of a power supply, an EMG transducer, an infrared transducer, an ESP8266 Wi-Fi module, a vibration motor, and a motor drive module. The ESP8266 is not only a Wi-Fi adapter but also a processor that can run independently. The main function of this wireless fatigue detection system is to prevent the injuries caused by muscle fatigue during heavy training activities [27]. Table 4 gives the components of the IoT-based muscle fatigue detection system, and Table 5 lists the various layers and their functions in the IoT architecture.

Table 4 Components of the IoT-based muscle fatigue detection system

Components            Functions
Infrared transducer   Detects infrared signals to find if someone is using the system
EMG transducer        Detects muscle activation through potential and transmits EMG pulse signals
ESP8266               Acts as the sole communicator and as a processor handling monitored data from the sensors, serving as a wireless access point

Table 5 Architecture of IoT

Layers                           Functions
Perceptual bottom-level layer    Consists of the EMG transducer and the IR transducer
Network middle-level layer       Processes and sends data given by the ESP8266 to the intelligent mobile terminals
Upper-level application layer    Information from the perceptual layer is analyzed and displayed

3 Performance Assessment

In the field of sports, surface EMG can analyze and monitor different situations, which makes it of special interest. Improvement in the efficiency of a movement is mainly determined by the economy of effort, its effectiveness, and also injury prevention [28]. The main goal of performance monitoring systems is to prevent overtraining of athletes, to reduce injuries caused by muscle fatigue, to monitor training activities, and to ensure performance maintenance [29]. In sports, movement strategy is very critical, and surface EMG is used to evaluate the activation of muscles in sports applications, which includes performance, recovery, and also evaluating the risk parameters of injuries. There is a system called the Athos wearable garment system, which integrates surface electromyography electrodes into the construction of compression athletic apparel.
It decreases the complexity and increases the portability of EMG data collection and also delivers processed data. A portable device that clips into the apparel collects the surface EMG signal, processes it, and sends it wirelessly to a client device that presents it to a trainer or coach. It monitors and provides a consistent measure of surface EMG [30]. The performance of a muscle is calculated in terms of its strength or its ability to generate force during contraction [31]. As we are evolving in a highly competitive world, we need a monitoring system that analyzes our body functions at a high performance level, especially for athletes who train their bodies vigorously. Monitoring of the fatigue condition is therefore necessary to measure the fatigue stress level accurately and maximize performance [32]. Table 6 lists the various applications of EMG sensors in sports.

4 Signal Processing

In the past few years, electromyogram signals have become greatly needed in different fields of application such as human–machine interaction, rehabilitation devices, clinical use, biomedical applications, sports applications, and many more [33]. EMG signals acquired from the muscle need advanced techniques for detection, processing, decomposition, and classification. These signals are very complicated because they are controlled by the nervous system and depend on the physiological and anatomical properties of the muscles. When the EMG transducer is placed on the skin surface, it collects the signals from all the motor units active at a given time, which can generate interaction between various signals [34]. The EMG signals collected from the muscles using electrodes contain noise, so removing noise from the signal also becomes an important factor. Such noise is caused by different factors originating from the skin–electrode interface, the hardware, and other external sources; internal noise generated by the semiconductor devices also affects the signal. Examples include motion artifact, ambient noise, ECG noise, crosstalk, and so on [35]. The EMG signals may be

Table 6 Application of EMG sensors in various sports fields

Sports: Wheelchair basketball. Muscles: trunk muscles. Methods/parameters: wearable EMG device. Findings: provides a study on the use of wearable EMG sensors in sports persons with disability.

Sports: Bodybuilders. Muscles: biceps brachii. Methods/parameters: 1-D spectral analysis, Sun SPOT (small programmable object technology), wearable surface EMG with goniometer. Findings: found 90.37% accuracy in monitoring and detecting muscle fatigue.

Sports: Students who completed sport science. Muscles: lower extremity muscles such as the rectus femoris, tibialis anterior, lateral hamstrings, and gastrocnemius. Methods/parameters: the Y balance test evaluates dynamic balance and standardizes the SEBT (star excursion balance test); a paired t-test finds the relationship between postural control and muscle fatigue in the lower extremity muscles. Findings: dynamic balance is evaluated accurately; the paired t-test shows the activity level of the lower extremity muscles, which changes before and after fatigue.

Sports: Soccer. Muscles: rectus femoris, tibialis anterior, gastrocnemius, biceps femoris. Methods/parameters: custom-written software to compute RMS. Findings: EMG activity in the lower limb muscles was reduced after soccer match play.

Sports: Cyclical lifting. Muscles: paraspinal muscles. Methods/parameters: time–frequency analysis. Findings: static and dynamic tasks give different patterns of EMG spectrum change.

Sports: Swimming. Muscles: deltoid muscles. Methods/parameters: EMG device with Arduino UNO REV3, Bluetooth, and an EMG sensor. Findings: measures the muscle stress level and indicates the muscle fatigue level in athletes.

of high or low frequency. The amplifier's direct-current offsets produce low-frequency noise, which can be filtered using high-pass filters, whereas nerve conduction produces high-frequency noise. High-frequency interference also comes from radio broadcasts and computers and can be filtered using a low-pass filter [36]. Only a specific band of frequencies should be transmitted in the EMG transmission process, which requires removing the low and high frequencies; this is achieved with a band-pass filter. It is well suited to EMG signals because it allows specific bands to be transmitted according to the range fixed by a trainer [37]. EMG signal processing includes three basic procedures: filtration, rectification, and smoothing (a small processing sketch follows the list below). Advanced signal processing methods are used in muscle fatigue detection systems. The surface EMG signal processing methods suitable for muscle fatigue evaluation and detection are listed below.

1. Time domain methods—estimation of the surface EMG amplitude, the zero crossing rate of the signal, and spike analysis.
2. Frequency domain methods—Fourier-based spectral analysis and parametric-based spectral analysis.
3. Combined analysis of the spectrum and amplitude of the EMG signal, which gives both the fatigue and the force involved.
4. Time–frequency and time-scale methods—general time–frequency representations (also known as the Cohen class), the short-time Fourier transform and spectrogram, the Wigner distribution, the time-varying autoregressive approach, wavelets, and the Choi–Williams distribution.
5. Spectral shape indicators and other mathematical methods such as the frequency band method, the logarithmic power–frequency representation, fractal analysis, recurrence quantification analysis, and the Hilbert–Huang transform [38].
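As an illustration of the basic filtration, rectification, and smoothing pipeline referred to above, the following minimal Python sketch band-pass filters a raw sEMG trace, full-wave rectifies it, and low-pass filters the result into a linear envelope; scipy is assumed, and the cut-off frequencies are typical textbook values rather than parameters taken from the cited studies.

```python
# Minimal sketch: filtration -> rectification -> smoothing of a raw sEMG signal.
# scipy assumed; cut-off frequencies are illustrative textbook values.
import numpy as np
from scipy.signal import butter, filtfilt

def emg_envelope(raw, fs, band=(20.0, 450.0), smooth_hz=6.0):
    """Return the smoothed linear envelope of a raw sEMG trace sampled at fs Hz."""
    nyq = fs / 2.0
    # 1. Filtration: 4th-order Butterworth band-pass applied forwards and backwards.
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="bandpass")
    filtered = filtfilt(b, a, raw)
    # 2. Rectification: full-wave rectify the filtered signal.
    rectified = np.abs(filtered)
    # 3. Smoothing: low-pass the rectified signal to obtain the envelope.
    b_lp, a_lp = butter(4, smooth_hz / nyq, btype="lowpass")
    return filtfilt(b_lp, a_lp, rectified)

if __name__ == "__main__":
    fs = 1000.0
    t = np.arange(0, 1.0, 1.0 / fs)
    raw = np.random.default_rng(1).standard_normal(t.size)   # stand-in for a raw sEMG trace
    envelope = emg_envelope(raw, fs)
    print("envelope mean amplitude: %.4f" % envelope.mean())
```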
Many sports activities require heavy physical training, and the vigorous workouts undergone by athletes lead to muscle fatigue and sometimes cause injuries [39]. With these advanced signal processing techniques, muscle fatigue can be detected and analyzed, which also helps to prevent the injuries caused by it.

5 Discussion

This paper has mainly focused on the role of surface EMG sensors and their contribution to monitoring and detecting muscle fatigue in different body parts, such as the lower limb, upper limb, and lumbar region, during sports activities. The study also concentrated on analyzing the various signal processing methods. From this analysis, it is found that there are only prototypes and samples built under certain conditions. The challenges faced with surface EMG are: (1) the signal received from the surface EMG must be accurate, because if any noise gets mixed in, the interpretation might go wrong; (2) the wearables are handy and use batteries, and the challenge is that the operating hours should be long enough to avoid repeated replacements; (3) every person is concerned about their data privacy, so data security must be ensured while transferring the data; (4) if the electrodes are displaced on the muscles, the spatial relationship cannot be maintained, which will affect the amplitude of the signal; (5) the variation between the surface EMG and the power loss is higher before and after the activity, so the EMG models might not give proper values of muscle fatigue after intense training. There are not yet sufficient and advanced technologies
for evaluating muscle fatigue. Further, researchers can concentrate on adopting effi-
cient machine learning and artificial intelligence technologies with secured IoT data
transfer to give an instant update about the strain encountered in the muscles.

6 Conclusion

The aim of this study is to analyze various methods of surface EMG techniques
used to monitor muscle fatigue condition in different sports activities. This paper
also demonstrated various categories such as prevention of injuries in athletes with
the use of surface EMG, monitoring the performance with muscle activity and its
signal processing techniques. EMG signals can be transmitted over the Internet for
further analysis. Cloud infrastructure provides storage and processing resources over
the Internet to support the EMG monitoring system. Researchers should focus on the feasibility of making wearable devices available in the market as reliable tools to monitor the signals and derive valuable information in real time. Efficient machine
learning algorithms can be introduced to classify the signals based on the activity. In
future, technologically advanced and compact muscle fatigue detection system with
surface EMG can be implemented.

References

1. Nithya N, Nallavan G (2021) Role of wearables in sports based on activity recognition and
biometric parameters: a survey. In: International conference on artificial intelligence and smart
systems (ICAIS), pp 1700–1705
2. Chaudhari S, Saxena A, Rajendran S, Srividya P (2020) Sensors to monitor the musclar
activity—a survey. Int J Sci Res Eng Manage (IJSREM) 4(3):1–11
3. Yousif H, Ammar Z, Norasmadi AR, Salleh A, Mustafa M, Alfaran K, Kamarudin K, Syed
Z Syed Muhammad M, Hasan A, Hussain K (2019) Assessment of muscle fatigue based on
surface EMG signals using machine learning and statistical approaches: a review. In: IOP
conference series materials science and engineering, pp 1–8
4. Adam DEEB, Sathesh P (2021) Survey on medical imaging of electrical impedance tomography
(EIT) by variable current pattern methods. J IoT Soc Mob Anal Cloud 3(2):82–95
5. Liu SH, Lin CB, Chen Y, Chen W, Hsu CY (2019) An EmG patch for real-time monitoring of
muscle-fatigue conditions during exercise. Sensors (Basel) 1–15
6. Taborri J, Keogh J, Kos A, Santuz A, Umek A, Urbanczyk C, Kruk E, Rossi S (2020) Sport
biomechanics applications using inertial, force, and EMG sensors: a literature overview. Appl
Bionics Biomech 1–18
7. Fernandez-Lazaro D, Mielgo-Ayuso J, Adams DP, Gonzalez-Bernal JJ, Fernández Araque A
(2020) Electromyography: a simple and accessible tool to assess physical performance and
health during hypoxia training. Syst Rev Sustain 12(21):1–16
8. Worsey MTO, Jones BS, Cervantes A, Chauvet SP, Thiel DV, Espinosa HG (2020) Assess-
ment of head impacts and muscle activity in soccer using a T3 inertial sensor and a portable
electromyography (EMG) system: a preliminary study. Electronics 9(5):1–15
9. Gonzalez-Izal M, Malanda A, Gorostiaga E, Izquierdo M (2012) Electromyographic models
to access muscle fatigue. J Electromyogr Kinesiol 501–512
10. Boyas S, Guevel A (2011) Neuromuscular fatigue in healthy muscle: underlying factors and
adaptation mechanisms. Annal Phys Rehabil Med 88–108
11. Al-Mulla MR, Sepulveda F, Colley M (2012) Techniques to detect and predict localised muscle
fatigue 157–186
12. Rum L, Sten O, Vendrame E, Belluscio V, Camomilla V, Vannozzi G, Truppa L, Notarantonio
M, Sciarra T, Lazich A, Manniini A, Bergamini E (2021) Wearable sensors in sports for persons
with disability. Sensors (Basel) 1–25

13. Chang KM, Liu SH, Wu XH (2012) A wireless sEMG recording system and its application to muscle fatigue detection. Sensors (Basel) 489–499
14. Al-Mulla MR, Sepulveda F, Colley M (2011) An autonomous wearable system for predicting
and detecting localised muscle fatigue. Sensors (Basel) 1542–1557
15. Ming D, Wang X, Xu R, Qiu S, Zhao Xin X, Qi H, Zhou P, Zhang L, Wan B (2014) SEMG
feature analysis on forearm muscle fatigue during isometric contractions 139–143
16. Cahyadi BN, Khairunizam W, Zunaidi I, Lee Hui L, Shahriman AB, Zuradzman MR, Mustafa
WA, Noriman NZ (2019) Muscle fatigue detection during arm movement using EMG Signal.
In: IOP conference series: materials science and engineering, pp 1–6
17. Angelova S, Ribagin S, Raikova R, Veneva I (2018) Power frequency spectrum analysis of
surface EMG signals of upper limb muscles during elbow flexion—a comparison between
healthy subjects and stroke survivors. J Electromyogr Kinesiol 1–29
18. Filipa A, Bymes R, Paterno MV, Myer GD, Hewett TE (2010) Neuromuscular training improves
performance on the star excursion balance test in young female athletes. J Orthopeadic Sports
Phys Theraphy 551–558
19. Fatahi M, Ghesemi GHA, Mongasthi Joni Y, Zolaktaf V, Fatahi M (2016) The effect of lower
extremity muscle fatigue on dynamic postural control analysed by electromyography. Phys
Treatments. 6(1):37–50
20. Rahnama N, Lees A, Reilly T (2006) Electromyography of selected lower-limb muscles
fatigued by exersice at the intensity of soccer match-play. J Electromyogr Kinesiol 16(3):257–
263
21. Chen SW, Liaw JW, Chan HL, Chang YJ, Ku CH (2014) A real-time fatigue monitoring
and analysis system for lower extremity muscles with cycling movement. Sensors (Basel)
14(7):12410–12424
22. Elfving B, Dedering A, Nemeth G (2003) Lumbar muscle fatigue and recovery in patients
with long-term low-back trouble—electromyography and health-related factors. Clin Biomech
(Bristol, Avon) 18(7):619–630
23. Coorevits P, Danneels L, Cambier D, Ramon H, Vandeerstraeten G (2008) Assessment of the
validity of the biering- sorensen test for measuring back muscle fatigue based on EMG median
frequency characteristics of back and hip muscles. J Electromyogr Kinesiol 18(6):997–1005
24. Roy SH, Bonato P, KnaflitZ M (1998) EMG assessment of back muscles during cyclical lifting.
J Electromyogr Kinesiol 8(4):233–245
25. Helmi M, Ping C, Ishak N, Saad M, Mokthar A (2017) Assesment of muslce fatigue using
electromyographm sensing. In: AIP conference proceedings, pp 1–8
26. Benoit DL, Lamontage M, Cerulli G, Liti A (2003) The clinical significance of electromyo-
graphy normalisation techniques in subjects with anterior cruciate ligament injury during
treadmill walking. Gait Posture 18(2):56–63
27. Yousif HA, Zakaria A, Rahim NA, Salleh AF, Mahmood M, Alfran KA, Kamarudin L, Mamduh
SM, Hsan A, Hussain MK (2019) Assesment of muscle fatigue based on surface EMG signal
using machine learning and statistical approaches: a review. In: IOP conference series: materials
science and engineering, pp 1–8
28. Masso N, Rey F, Remero D, Gual G (2010) Surface electromyography application in the sport.
Apunts Med Esport 45(165):121–130
29. Taylor KL, Chapman D, Cronin J, Newton M, Gill N (2012) Fatigue monitoring in high
performance sport: a survey of current trends. J Aust Strength Conditioning 12–23
30. Lynn SK, Watkins CM, Wong MA, Balfany K, Feeney DF (2018) Validity and reliability of
surface electromyography measurements from a wearable athlete performance system. J Sports
Sci Med 17(2):205–215
31. Kuthe C, Uddanwadiker R, Ramteke A (2018) Surface electromyography based method for
computing muscle strength and fatigue of biceps brachii muscle and its clinical implementation.
Inf Med Unlocked 34–43
32. Austruy P (2016) Neuromuscular fatigue in contact sports: theories and reality of a high
performance environment. J Sports Med Doping Stud 6(4):1–5

33. Chowdhury RH, Reaz RH, Ali MA, Bakar AA, Chellapan K, Chang TG (2013) Surface elec-
tromyography signal processing and classification techniques. Sensors (Basel) 13(9):12431–
12466
34. Raez MB, Hussain MS, Mohd-Yasin F (2006) Techniques of EMG signal analysis: detection,
processing, classification and application. Biol Proced Online 11–35
35. Shair EF, Ahmad S, Marhaban MH, Tamrin SM, Abdullah AR (2017) EMG processing based
measures of fatigue assessment during manual lifting. BioMedical Res Int 1–12
36. Senthil Kumar S, Bharath Knnan M, Sankaranarayanan S, Venkatakrishnan A (2013) Human
hand prosthesis on surface EMG signals for lower arm amputees. Int J Emerg Technol Adv
Eng 3(4):199–203
37. De Luca CJ, Gilmore LD, Kuznetsov M, Roy SH (2010) Filtering the surface EMG signal:
movement artifacts and baseline noise contamination. J Biomech 43(8):1573–1579
38. Cifrek M, Medved V, Tonkovic S, Ostojic S (2009) Surface EMG based muscle fatigue
evaluation in biomechanics 24(4):327–340
39. Ahmad Z, Jamaudin MN, Asari MA, Omar A (2017) Detection of localised muscle fatigue
by using wireless surface electromyogram(sEMG) and heart rate in sports. Int Med Devices
Technol Conf 215–218
Detection of IoT Botnet Using Recurrent
Neural Network

P. Tulasi Ratnakar, N. Uday Vishal, P. Sai Siddharth, and S. Saravanan

Abstract The Internet of Things (IoT) is one of the most widely used technologies nowadays. Hence, the number of DDoS attacks generated using IoT devices has risen. Normal anomaly detection methods, like signature-based and flow-based methods, cannot be used for detecting IoT anomalies, as the user interface in IoT devices is limited or unhelpful. This paper proposes a solution for detecting botnet activity within IoT devices and networks. Deep learning is currently a prominent technique used to detect attacks on the Internet. Hence, we developed a botnet detection model based on a bidirectional gated recurrent unit (BGRU). The developed BGRU detection model is compared with a gated recurrent unit (GRU) model for detecting four attack vectors, Mirai, UDP, ACK, and DNS, generated by the Mirai malware botnet, and is evaluated for loss and accuracy. The dataset used for the evaluation is the traffic data created using a Mirai malware attack performed on a target server using C&C and scan servers.

Keywords Internet of Things (IoT) · Botnet · Gated recurrent unit (GRU) · Bidirectional gated recurrent unit (BGRU) · Deep learning

1 Introduction

1.1 Internet of Things (IoT)

The Internet of Things (IoT) is an evolving means of communication [1]. In the near future, it is anticipated that objects of everyday life will be equipped with microcontrollers, microprocessors for virtual communication [2], and proper protocol stacks so that they can communicate with one another and with users and become a vital element of the Internet.

P. Tulasi Ratnakar · N. Uday Vishal · P. Sai Siddharth · S. Saravanan (B)


Department of Computer Science and Engineering, Amrita School of Engineering, Bengaluru,
Amrita Vishwa Vidyapeetham, India
e-mail: s_saravanan@blr.amrita.edu


1.2 Distributed Denial of Service (DDoS) Attacks

DoS and DDoS attacks have become widespread, posing substantial hazards to network security and to the efficiency of online services. Because machines are interconnected through the World Wide Web, they are convenient targets for denial of service (DoS) attacks [2]. A denial of service (DoS) attack attempts to prohibit legitimate users from accessing a computer or network resource by permanently disrupting or stopping the service on an Internet host. A distributed DoS attack occurs when several hosts work together to bombard a victim with an excess of attack packets, so that the attack takes place from several locations at the same time.

1.3 Botnet

A bot is a computer program that carries out complex tasks. Bots are automatic, meaning that they can run without assistance from a human user according to their own set of instructions, and they often try to imitate or replace human activities [3]. Botnets are networks of such compromised computers used to steal information, send phishing emails, perform distributed denial of service attacks, and allow a hacker to access and extract information from a particular system. Botnet detection refers to the various techniques used to identify botnets of IoT devices [4]. The botmasters are responsible for sending commands to all bots in a particular network using command and control (C&C) tools. Multiple DoS and DDoS attacks have emerged as threats with the increased use and deployment of IoT devices in recent years. These attacks take place at different IoT network protocol levels: the physical, MAC, 6LoWPAN, network, and device layers [5]. In 2016, the DNS provider 'Dyn' was the victim of the largest DDoS attack recorded at that time. Linux.Mirai created a massive botnet (a network of infected devices) by infecting millions of linked devices, including webcams, routers, and digital video recorders. This incident, which occurred on October 21, 2016, is known as the largest DDoS attack of its time, with an attack speed of approximately 1.1 terabits per second (Tbps).

1.4 Deep Learning

Deep learning is a new field of machine learning designed to model higher-level data abstractions; its objective is to move closer to artificial intelligence. It substitutes typical handcrafted features with unsupervised or semi-supervised feature learning algorithms and hierarchical feature extraction. Deep learning is a type of machine learning algorithm that uses multiple layers to extract higher-level characteristics from raw data: lower layers, for instance, identify edges in image processing, while higher layers identify humans or animals, numbers, letters, or faces [6]. The main benefits of deep learning are leveraging unstructured data, achieving higher quality results, reducing costs, and eliminating much of the need for manual data classification, which allows deep learning to be applied effectively through neural networks.

1.5 Role of Deep Learning in Detecting Botnet

In the domain of networking and cyber security, deep learning is crucial, since networks are prone to security risks including IP spoofing, replay attacks, SYN flooding, and jamming, as well as resource restrictions such as out-of-memory conditions, insecure software, etc. [7]. Deep learning's self-learning capability has improved accuracy and processing speed, allowing it to be used effectively to detect Mirai botnet attacks on IoT devices.
This paper proposes a way to detect botnet activity among IoT devices and networks. A detection model is created using recurrent neural networks (RNN), and the algorithm used for detection is the gated recurrent unit (GRU). Detection is performed at the packet level, with an emphasis on text recognition within features rather than flow-based approaches. For text recognition and conversion, a method called word embedding is used. The BGRU-based detection model is further compared with the GRU-based detection model using the evaluation metrics accuracy and loss.
The main contribution of this paper is:
• To develop GRU and BGRU recurrent neural networks (RNN)-based botnet
detection models.
• To compare the performance of GRU and BGRU models with respect to loss and
accuracy.
The rest of the paper is organized as follows: Section 2 deals with the related work.
Section 3 outlines the design of the system to develop GRU and BGRU recurrent
neural networks (RNN)-based detection models. Section 4 explains the detailed
implementation of GRU and BGRU recurrent neural networks (RNN). Section 5
contains results for comparing the accuracy and loss of GRU and BGRU recurrent
neural networks (RNN). Section 6 concludes the paper and makes recommendations
for future studies.

2 Related Work

Torres et al. [8] have proposed a work in which network traffic is modeled as a sequence of time-varying states and the viability of recurrent neural networks for this task is analyzed. The recent success of applying RNNs to data sequence problems makes them a viable candidate for sequence analysis. The performance of the RNN is evaluated in view of two important issues, optimal sequence length and network traffic imbalance, both of which have a potentially real impact on deployment. The evaluation is performed by means of a stratified k-fold check, and a separate test on previously unseen traffic from another botnet is also carried out. The RNN model resulted in an accuracy of 99.9% on unseen traffic.
Sriram et al. [9] have proposed a botnet detection system based on deep learning
(DL), which works with network flows. On various datasets, this paper compares and
analyzes the performance of machine learning models versus deep neural network
models for P2P botnet detection. They employ the t-distributed stochastic neighbor-
embedding (t-SNE) visualization technique to comprehend the various characteris-
tics of the datasets used in this study. On the DS-1 V3 dataset, the DNN model they
used achieved 100% accuracy.
A recently implemented DNN approach is used to detect malware in an efficient
way. DNN methods have a key importance in their ability to achieve a high rate of
detection while generating a low false positive rate. Ahmed et al. [10] have proposed a
strategy for identifying botnet assaults that relies on a deep learning ANN. Other
machine learning techniques are compared with the developed model. The performance of the ANN model is evaluated for different numbers of neurons in the hidden layers. For
six neurons, accuracy is 95%; for eight neurons, accuracy is 96%; for ten neurons,
accuracy is 96.25%.
Yerima et al. [11] have proposed a deep learning approach based on convolutional
neural networks (CNN) to detect Android botnet. A CNN model which is able to
differentiate between botnet applications and normal applications with 342 static app
features is implemented in the proposed botnet detection system. The trained botnet
detection model is evaluated by a series of 6802 real apps with 1929 botnets of the
open botnet dataset ISCX. The results of this model are examined by different filter
sizes. The best results can be achieved with 32 filters, with an accuracy of 98.9%.
Nowadays, IoT devices are widely used to form botnets, and as a result, McDermott et al. [12] have proposed a solution to detect IoT-based botnet attack packets
using deep learning algorithms such as long short-term memory (LSTM) and bidirec-
tional long short-term memory (BLSTM). They have used a technique called word
embedding for mapping text data to vectors or real numbers. As LSTM is a recurrent
neural network, it stores past data in order to predict future results. To remember the
past memory, it uses three gates: forget gate, input gate, and output gate, whereas
bidirectional LSTM uses these gates to store both past and future memory. Both
LSTM and BLSTM resulted in the accuracy of 0.97.
By comparing the collected data with the actual expected data, it is possible to detect real glitches in the data received from lower-level fog network devices. The glitches impacting performance might take the form of a single data point, a set of data points, or even data from sensors of the same type or from many different components. To detect these glitches, Shakya et al. [13] proposed a deep learning approach that learns from the expected data to identify the glitches. The proposed deep learning model resulted in an accuracy close to 1.0.
A network attack is possible on IoT devices since they are interconnected with
the network to analyze accumulated data via the internet. To detect IoT attacks, it
is necessary to develop a security solution that takes into account the characteristics
of various types of IoT devices. Developing a custom designed safety solution for
every sort of IoT device is, however, a challenge. A large number of false alarms
would be generated using traditional rule-based detection techniques. Hence, Kim
et al. [14] proposed a deep learning-based model using LSTM and recurrent neural
network (RNN) for detecting IoT-based attacks. The N-BaIoT dataset is used to train
this model. When it came to detecting BashLite Scam botnet data, LSTM achieved
the highest accuracy of 0.99.
A massively connected world, such as the Internet of Things (IoT), generates a
tremendous amount of network traffic. It takes a long time to detect malicious traffic
in such a large volume of traffic. Detection time can be considerably decreased if this
is done at the packet-level. Hwang et al. [15] proposed a unique word embedding
technique to extract the semantic value of the data packet and used LSTM to find
out the time relationship between the fields in the data packet header, and determine
whether the incoming data packet is a normal flow component or a malicious flow
component. This model was trained on four datasets: ISCX-IDS-2012, USTC-TFC-
2016, Mirai-RGU, and Mirai-CCU. The highest accuracy of 0.9999 is achieved on
ISCX-IDS-2012 dataset.
Hackers have been attracted to IoT devices by their proliferation. The detection of
IoT traffic anomalies is necessary to mitigate these attacks and protecting the services
provided by smart devices. Since anomaly detection systems are not scalable, they
fail miserably when dealing with large amounts of data generated by IoT devices.
Hence, in order to achieve scalability, Bhuvaneswari Amma et al. [16] proposed
an anomaly detection framework for IoT using vector convolutional deep learning
(VCDL) approach. Device, fog, and cloud layers are included in the proposed frame-
work. As the network traffic is sent to the fog layer nodes for processing, this anomaly
detection system is scalable. This framework achieves a precision of 0.9971.
The dependability of IoT-connected devices depends on the security model employed to
safeguard user data and prevent devices from participating in malicious activity. Many
DDoS assaults and botnet attacks are identified utilizing technologies that target
devices or network backends. Parra et al. [17] proposed a cloud-based distributed
deep learning framework to detect and defend against botnet and phishing attacks.
The model contains two important security mechanisms that work in tandem: (i) the
distributed convolutional neural network (DCNN) model is embedded in the micro-
security plug-in of IoT devices to detect application-level phishing and DDoS attacks
and (ii) a temporal long short-term memory (LSTM) network model hosted in the cloud is
used to detect botnet attacks and receive CNN attachments. The CNN component in
the model achieved an accuracy of 0.9430, whereas the LSTM component achieved
an accuracy of 0.9784.
In the above-mentioned works, many botnet identification methods have employed
deep learning algorithms. We can infer from [18] that deep learning algorithms have better performance than basic machine learning algorithms. As our detection is performed at the packet level, most of the packet information is present in a sequential pattern in the Info feature. A recurrent neural network is more efficient than an arti-
ficial neural network when it comes to sequential series [19]. We developed a gated
recurrent neural network (GRU)-based botnet detection model that runs faster than
LSTM [20] as it has fewer training parameters, and the word embedding technique
is used for mapping text data to vectors or real numbers.

3 System Architecture

This section provides the blueprint for developing the GRU and BGRU-based recur-
rent neural networks (RNN) for detection of IOT based botnet attack vectors. This
architecture functions as illustrated in Fig. 1.

3.1 Feature Selection

The network traffic dataset contains the following features (1) No., (2) Time, (3)
Source, (4) Destination, (5) Protocol, (6) Length, (7) Info, and (8) Label. There may
be some features which do not affect the performance of the classification or perhaps make the results worse; hence, we need to remove those features. As a result, we selected the Protocol, Length, and Info features from the dataset [10], with Label as the target feature.

3.2 Word Embedding

Our computers, scripts, and deep learning models are unable to read and understand
text in any human sense. Text data must therefore be represented numerically. The numerical values should capture as much of a word's linguistic meaning as possible.

Fig. 1 System architecture

Choosing an input representation that is both informative and well-chosen can have a
significant impact on model performance. To solve this problem, word embeddings
are the most commonly used techniques. Hence, post-feature selection, all of the
text characters in the network traffic data must be converted to vector or real number
format. We adopted word embedding to transform the text characters in the Info field into real number format.

3.3 Building GRU and BGRU Models

Gated recurrent units simplify the process of training a new model by improving
the memory capacity of recurrent neural networks. They also solve the vanishing
gradient problem in recurrent neural networks. Among other uses, they can be applied
to the modeling of speech signals as well as to machine translation and handwriting
recognition. Considering the advantages of GRU, we employed it for detection of
botnets. Once the feature selection and word embedding are complete, we need to
split the data into train data and test data. The GRU and BGRU models are built, and
the models are trained using train data. The trained models are tested using test data,
and the required metrics are evaluated to compare the models.

4 Implementation

The developed model uses a GRU and BGRU recurrent neural network, as well as
word embedding, to convert the string data found in the captured data packets into
data that can be used as GRU and BGRU input.

4.1 Dataset and Algorithm

Dataset [21] used in this work includes both normal network traffic and botnet attack
network traffic. No., Time, Source, Destination, Protocol, Length, Info, and Label
are some of the features in our dataset. Some features, such as No., Time, Source, and
Destination, are omitted as they are not useful for data processing. The Info feature
contains most of the captured information. Algorithm 1 shows the detailed steps of
our implementation.

Algorithm 1: Algorithm for Detecting Botnet


1: Read the training dataset
2: Extract length, protocol, info, label features from dataset
3: Set vocabulary size ← 50000
4: repeat
5: for row ←1, rows do
6: Convert text data into tokenized integer format through hashing (hash values range from 0 to 49999 as vocabulary size is set to 50000)
7: Pad data arrays with 0s to max 35
8: end for
9: until return training dataset
10: set model ← sequential()
11: add 3 GRU hidden layers and 3 BGRU hidden layers with each layer of size 50 units (neurons) to the
model
12: add dense layer i.e. output layer with activation function that is sigmoid to the model
13: compile the model by setting the following parameters:
14: optimizer ← adam, loss ← categorical_crossentropy, metrics ← accuracy
15: Fit the model to the training data by dividing 10% of the data as validation data to check for
overfitting.
16: run the model for 50 epochs
17: after running all the epochs return loss, validation loss, accuracy, validation accuracy
18: Read the test data and perform the steps 1 to 9
19: Predict the results of test data using the trained model and evaluate the accuracy

4.2 Feature Selection

As explained in 4.1, No., Time, Source, Destination are not useful; hence, we omitted
them. The remaining features Protocol, Length, and Info are selected for further
processing.

4.3 Word Embedding

Actually, the data in our dataset’s Info feature follows a sequential pattern. Hence, we
built our solution by converting each letter into a token and storing it in binary format.
A vocabulary dictionary of all tokenized words is produced, and their associated index
is substituted with the index number in the info column. To understand each type of
attack, the order of the indices in a series must be maintained, and hence, an array of
the indices is generated. Since the protocol and length of the packet that was captured
are related with each attack, the protocol and length features are both included in the
array we previously generated. Word embedding is also used to convert and generate
a dictionary of tokenized protocols together with their index. The length features, as
well as the tokenized protocols, are added to the array. The target feature is converted
from string to integer to classify each type of captured packet. We used the one_hot function to transform strings into indexes while simultaneously creating a 2D list and a dictionary. At last, since deep neural networks require equal-length arrays, we find the maximum length of the text in the Info feature, and the pad_sequences function is used to pad all the arrays to a maximum length of 35 for better processing. The arrays obtained are converted into 3D Numpy arrays, as required for the GRU layer.
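A minimal Python sketch of this preprocessing step is shown below; it assumes a Pandas data frame df with the Protocol, Length, Info, and Label columns and uses the Keras one_hot hashing tokenizer and pad_sequences, as in Algorithm 1 (the exact feature arrangement is illustrative, not the authors' exact code).

import numpy as np
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.preprocessing.sequence import pad_sequences

VOCAB_SIZE = 50000  # hash space for tokens (indices range from 0 to 49999)
MAX_LEN = 35        # fixed sequence length after padding

def encode_packets(df):
    # Convert the Info, Protocol, and Length features into equal-length integer sequences
    sequences = []
    for _, row in df.iterrows():
        info_tokens = one_hot(str(row["Info"]), VOCAB_SIZE)      # hash the Info tokens
        proto_token = one_hot(str(row["Protocol"]), VOCAB_SIZE)  # hash the protocol name
        sequences.append(info_tokens + proto_token + [int(row["Length"])])
    # pad (or truncate) every sequence with zeros to the maximum length of 35
    padded = pad_sequences(sequences, maxlen=MAX_LEN, padding="post", truncating="post")
    # reshape to the 3D (samples, timesteps, features) array required by the GRU layer
    X = padded.reshape((padded.shape[0], MAX_LEN, 1)).astype("float32")
    return X, df["Label"].values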

4.4 Building GRU and BGRU Models

After feature selection and word embedding, the data is split into train and test data.
The IoT based botnet detection models are built using GRU and BGRU and trained on the train data. The detection models incorporate an output layer with sigmoid activation. The models are trained for 50 epochs using the categorical cross-entropy loss function and the Adam optimizer. We evaluated metrics such as loss, accuracy, validation loss, and validation accuracy, and compared the results of GRU and BGRU to determine their efficiency.
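A minimal Keras sketch of the model construction described above is given below. The layer sizes and training settings follow Algorithm 1 (three recurrent hidden layers of 50 units, a sigmoid output layer, the Adam optimizer, categorical cross-entropy loss, 50 epochs, and a 10% validation split); the number of output classes is a parameter, and the BGRU variant simply wraps each GRU layer in Bidirectional. This is an illustrative reconstruction, not the exact training script.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, GRU, Bidirectional, Dense

def build_model(n_classes, timesteps=35, bidirectional=False):
    # wrap each 50-unit GRU layer in Bidirectional when building the BGRU variant
    def recurrent(return_sequences):
        layer = GRU(50, return_sequences=return_sequences)
        return Bidirectional(layer) if bidirectional else layer

    model = Sequential()
    model.add(Input(shape=(timesteps, 1)))
    model.add(recurrent(True))
    model.add(recurrent(True))
    model.add(recurrent(False))                         # last recurrent layer returns a single vector
    model.add(Dense(n_classes, activation="sigmoid"))   # output layer
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# Example: train the GRU model with 10% of the training data held out for validation
# gru_model = build_model(n_classes=y_train.shape[1])
# history = gru_model.fit(X_train, y_train, epochs=50, validation_split=0.1)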

5 Results

Six experiments assess the overall performance of the GRU and BGRU models. Python is the programming language used to build these models. We used Anaconda (IDE for Python), Keras (Python library for building deep learning models), Scikit-learn (Python library for data preprocessing), and Pandas and Numpy (Python libraries for working on data frames and arrays) to build the models.

5.1 Model Comparison

Six experiments for comparing GRU and BGRU models are conducted on each
model. The first four experiments use a train dataset and a test dataset containing
normal network traffic and an attack vector network traffic. Both models are trained
using train data and then tested using test data. For each attack vector, evaluation
metrics such as accuracy and loss are calculated. The fifth experiment uses the
train dataset containing normal network traffic and multi-attack vector [Mirai, UDP,
DNS, ACK] network traffic. Both models are trained using train data and then tested
using test data. Evaluation metrics like accuracy and loss are calculated for multiple
attack vectors. The sixth experiment uses the train dataset containing normal network
traffic and multi-attack vector [excluding ACK attack] network traffic. Both models
are trained using train data and then tested using test data. Evaluation metrics like
accuracy and loss are calculated for multiple attack vectors. The validation data used

Table 1 Evaluation metrics


Train data Validation data Test data
Accuracy Loss Accuracy Loss Accuracy
GRU EXPT-1 1.0 8.0517*10ˆ−6 1.0 5.2180*10ˆ−6 0.999717
EXPT-2 1.0 2.3431*10ˆ−5 1.0 1.4151*10ˆ−5 0.999999
EXPT-3 1.0 2.3432*10ˆ−4 1.0 4.2185*10ˆ−4 0.999981
EXPT-4 1.0 3.0320*10ˆ−5 1.0 1.8424*10ˆ−5 0.999992
EXPT-5 1.0 0.0011 1.0 5.5934*10ˆ−4 1.0
EXPT-6 0.9988 0.0013 0.9990 8.3498*10ˆ−4 0.999317
BGRU EXPT-1 1.0 8.0868*10ˆ−6 1.0 1.1475*10ˆ−6 0.999717
EXPT-2 1.0 4.6347*10ˆ−6 1.0 2.9357*10ˆ−6 0.999980
EXPT-3 1.0 4.6718*10ˆ−6 1.0 2.4362*10ˆ−6 0.999687
EXPT-4 1.0 6.0714*10ˆ−6 1.0 3.8378*10ˆ−6 1.0
EXPT-5 1.0 1.3253*10ˆ−4 1.0 4.3010*10ˆ−6 0.999987
EXPT-6 1.0 6.5985*10ˆ−5 1.0 3.0802*10ˆ−5 0.939178

in these experiments is 10% of the train data, and this data is further validated to
determine whether or not overfitting exists in our model.
Table 1 shows the evaluation metrics for all six experiments, including accuracy, validation accuracy, test accuracy, loss, and validation loss.
According to the above Table 1, BGRU is more efficient than GRU since the
accuracy of both algorithms is almost equal, but the loss for BGRU is minimal when
compared to GRU in all the experiments performed. While detecting ACK attacks
in conjunction with other attack vectors, the accuracy is reduced; however, the GRU
model used in this paper performs commendably when predicting ACK attacks. Table
1 shows that the accuracy of experiments that include ACK attack vector (EXPT-3,
EXPT-5) is nearly equal to 1.0. Table 1 shows that the accuracy of validation data in
all experiments is nearly equal to 1.0. This indicates that our model does not exhibit
overfitting.
Table 2 displays the number of training and testing tuples used in each of the six
experiments, as well as the Avgtime/Epoch. Since BGRU is bidirectional, it takes
more time to train than GRU, as given in Table 2. Though BGRU takes more time compared to GRU, we can see from Table 1 that it has minimal loss compared to the GRU model, which makes it more effective than GRU.
As mentioned in Sect. 4.4, 50 epochs are executed for both models, and two graphs
are plotted for each experiment to show how the accuracy and loss varied across each
epoch.
The graphs obtained from each experiment are shown in Figs. 2, 3, 4, 5, 6, and 7.
Once the highest accuracy is reached, the variation of the accuracy and loss across
the epochs in GRU and BGRU in experiments 1, 2, 3, 4 (single attack vector network
traffic) is linear, as shown in Figs. 2, 3, 4, and 5. However, in experiments 5, 6 (multi-
attack vector network traffic), the graphs of accuracy and loss across each epoch
show slight deviations in the case of GRU, as shown in Figs. 6 and 7, whereas the

Table 2 Training and testing tuples


Experiment Train tuples Test tuples Avgtime/Epoch (s)
GRU EXPT-1 462,174 586,615 216
EXPT-2 444,672 205,957 213
EXPT-3 518,770 214,300 274
EXPT-4 489,552 139,798 269
EXPT-5 521,446 211,585 271
EXPT-6 510,861 193,548 357
BGRU EXPT-1 462,174 586,615 459
EXPT-2 444,672 205,957 443
EXPT-3 518,770 214,300 550
EXPT-4 489,552 139,798 513
EXPT-5 521,446 211,585 567
EXPT-6 510,861 193,548 526

Fig. 2 a–d Graphs of experiment-1 (Mirai attack) a GRU accuracy b GRU loss c BGRU accuracy
d BGRU loss

Fig. 3 a–d Graphs of experiment-2 (UDP attack) a GRU accuracy b GRU loss c BGRU accuracy
d BGRU loss

Fig. 4 a–d Graphs of experiment-3 (ACK attack) a GRU accuracy b GRU loss c BGRU accuracy
d BGRU loss

Fig. 5 a–d Graphs of experiment-4 (DNS attack) a GRU accuracy b GRU loss c BGRU accuracy
d BGRU loss

Fig. 6 a–d Graphs of experiment-5 (Multi-attack with ACK) a GRU accuracy b GRU loss c BGRU
accuracy d BGRU loss

Fig. 7 a–d Graphs of experiment-6 (Multi-attack without ACK) a GRU accuracy b GRU loss c BGRU accuracy d BGRU loss

BGRU model works the same as in experiments 1, 2, 3, 4. That is why, despite the
additional overheads, BGRU is a better model than GRU.

6 Conclusion

This paper contains the implementation of GRU and BGRU along with a technique called word embedding for the detection of IoT based botnet attacks. GRU and
BGRU models are compared based on the evaluation metrics such as accuracy, loss,
validation accuracy, validation loss, and test accuracy. The attack vectors “Mirai,”
“UDP,” “ACK,” “DNS” resulted in a test accuracy of 0.999717, 0.999999, 0.999981,
0.999992 for GRU model and 0.999717, 0.999980, 0.999687, 1.0 for BGRU model.
These results demonstrate the power of our IoT botnet detection model, which
concentrates and analyzes packet-level detection and applies text recognition on
features. The bidirectional approach adds overhead for every epoch during training
and increases processing time, compared to single-direction approach, but seems like
a better model with efficient results as of its layers.
The client-server architecture for the formation of a botnet in IoT networks has the problem of a single point of failure. Hence, botnet attackers have started using a peer-to-peer (P2P) architecture for designing botnets. In future work, we need to develop a P2P botnet detection method to detect P2P botnets within IoT.

References

1. Ullas S, Upadhyay S, Chandran V, Pradeep S, Mohankumar TM (2020) Control console


of sewage treatment plant with sensors as application of IOT. In: 2020 11th international
conference on computing, communication and networking technologies (ICCCNT). IEEE, pp
1–7
2. Mahjabin T, Xiao Y, Sun G, Jiang W (2017) A survey of distributed denial-of-service attack,
prevention, and mitigation techniques. Int J Distrib Sens Netw 13(12):1550147717741463
3. Vinayakumar R, Soman KP, Poornachandran P, Alazab M, Jolfaei A (2019) DBD: deep learning
DGA-based botnet detection. In: Deep learning applications for cyber security. Springer, Cham,
pp 127–149
4. Thejiya V, Radhika N, Thanudhas B (2016) J-Botnet detector: a java based tool for HTTP
botnet detection. Int J Sci Res (IJSR) 5(7):282–290
5. Džaferović E, Sokol A, Almisreb AA, Norzeli SM (2019) DoS and DDoS vulnerability of IoT:
a review. Sustain Eng Innovation 1(1):43–48
6. Harun Babu R, Mohammed, Soman KP (2019) RNNSecureNet: recurrent neural networks for
cyber security use-cases. arXiv e-prints (2019): arXiv-1901
7. Vinayakumar R, Soman KP, Prabaharan Poornachandran (2017) Applying deep learning
approaches for network traffic prediction. In: 2017 international conference on advances in
computing, communications and informatics (ICACCI). IEEE, pp 2353–2358
8. Torres P, Catania C, Garcia S, Garino CG (2016) An analysis of recurrent neural networks
for botnet detection behavior. In: 2016 IEEE biennial congress of Argentina (ARGENCON).
IEEE, pp 1–6
9. Sriram S, Vinayakumar R, Alazab M, Soman KP (2020) Network flow based IoT botnet
attack detection using deep learning. In: IEEE INFOCOM 2020-IEEE conference on computer
communications workshops (INFOCOM WKSHPS). IEEE, pp 189–194
10. Ahmed AA, Jabbar WA, Sadiq AS, Patel H (2020) Deep learning-based classification model
for botnet attack detection. J Ambient Intell Humanized Comput 1–10
11. Yerima SY, Alzaylaee MK (2020) Mobile botnet detection: a deep learning approach
using convolutional neural networks. In: 2020 international conference on cyber situational
awareness, data analytics and assessment (CyberSA). IEEE, pp 1–8
12. McDermott CD, Majdani F, Petrovski AV (2018) Botnet detection in the internet of things
using deep learning approaches. In: 2018 international joint conference on neural networks
(IJCNN). IEEE, pp 1–8
13. Shakya S, Pulchowk LN, Smys S (2020) Anomalies detection in fog computing architectures
using deep learning. J Trends Comput Sci Smart Technol 1:46–55, 1 Mar 2020
14. Kim J, Won H, Shim M, Hong S, Choi E (2020) Feature analysis of IoT botnet attacks based
on RNN and LSTM. Int J Eng Trends Technol 68(4):43–47, Apr 2020
15. Hwang R-H, Peng M-C, Nguyen V-L, Chang Y-L (2019) An LSTM-based deep learning
approach for classifying malicious traffic at the packet level. Appl Sci 9(16):3414
16. Bhuvaneswari Amma NG, Selvakumar S (2020) Anomaly detection framework for Internet
of things traffic using vector convolutional deep learning approach in fog environment. Future
Gener Comput Syst 113: 255–265
17. Parra GDLT, Rad P, Choo K-KR, Beebe N (2020) Detecting Internet of Things attacks using
distributed deep learning. J Net Comput Appl 163:102662
18. Kumar V, Garg ML (2018) Deep learning as a frontier of machine learning: a review. Int J
Comput Appl 182(1):22–30, July 2018

19. Apaydin H, Feizi H, Sattari MT, Colak MS, Shamshirband S, Chau K-W (2020) Compara-
tive analysis of recurrent neural network architectures for reservoir inflow forecasting. Water
12(5):1500
20. Yang S, Yu X, Zhou Y (2020) LSTM and GRU neural network performance comparison
study: taking yelp review dataset as an example. In: 2020 international workshop on electronic
communication and artificial intelligence (IWECAI). IEEE, pp 98–101
21. Dataset link: https://drive.google.com/drive/folders/148XD5gU7cAIlOGzF98N2uC42Bf74-
LID?usp=sharing
Biomass Energy for Rural India:
A Sustainable Source

Namra Joshi

Abstract Energy plays a crucial role in the social-economic development of


developing nations like India. To address the issues like depletion of fossil fuels
and increasing concern toward environmental pollution, the Government of India
promoting the use of renewable energy sources which are clean and green. Biomass
energy is a type of effective source of energy. This paper focuses on the Indian poten-
tial of biomass energy and grid integration opportunities for biomass energy-based
plants. Various methodologies are also addressed to utilize biomass energy at a major
level.

Keywords Renewable energy · Biomass energy · Grid interconnection

1 Introduction

India is a developing nation, and its population is rising year by year. As per the census of the year 2011, the Indian population is 1.21 billion, and it is expected to rise by 25% by the year 2036. With such a rise in population, the power demand is increasing tremendously [1]. In the upcoming two decades, worldwide power consumption will rise by 60–70%. According to the World Energy Outlook, India will have peak energy demand, and to fulfill it, emissions will also increase. India is looking toward clean sources of energy, i.e., renewable sources of energy. This sector accounts for around 17% of the entire GDP. Energy sources that can be renewed again and again are termed renewable sources of energy. Renewable energy sources like Solar
[2], Wind [3], Geothermal, etc., include any type of energy obtained from natural
resources that are infinite or constantly renewed. The classification of renewable
energy sources is illustrated in Fig. 1. India is about to achieve the aim of 10 GW
bioenergy-based generation by the year 2022. India ranks fourth in renewable energy capacity. The Government of India is promoting a waste-to-energy program with the help of financial support from the Ministry of Petroleum and Natural Gas.

N. Joshi (B)
Department of Electrical Engineering, SVKM’s Institute of Technology, Dhule, India


Fig. 1 Types of renewable energy sources (geothermal, hydel, solar, wind, and biomass energy)

Agriculture plays a very crucial role in the Indian Economy. India has 60.43%
of agricultural land. Agriculture waste material can be used for power generation
through the biomass-based plant. In India, MNRE [4] is promoting biomass-based
power plants and cogeneration plants. The basic target is to extract as much power as possible from sugarcane bagasse and agricultural waste. A special scheme was introduced in the year 2018 by the ministry to promote such type of generation. The estimated potential is around 18,000 MW. As per the MNRE annual report 2020–21, more than 550 cogeneration plants were installed in India up to December 2020 [5]. The major states having the potential for such type of generation are Maharashtra, Chhattisgarh, Tamil Nadu, Uttar Pradesh, West Bengal, Punjab, and Andhra Pradesh. For rural areas located away from the central grid, biomass-based power plants are a very good option. Currently, around 200 biomass-based plants with a capacity of 772 MW are installed in the country. The MNRE has launched the New National Biogas and Organic Manure Program (NNBOMP) for promoting biogas plant installation in rural India. As of 3 June 2021, the installed capacity of such power plants in India is 10,170 MW. A major contributor to achieving the set target of bioenergy is sugar mills' bagasse-based plants [6].
percentage-wise in January 2021 is illustrated in Fig. 2 [7].

Fig. 2 Generation in India from RES in Jan 2021 (solar 42%, wind 31%, bagasse 17%, small hydro 5%, biomass 3%, others 2%)

2 Biomass-based power plant

Biomass is a very effective source of energy. Wood garbage, crops, and agricultural waste are categorized as biomass, as shown in Fig. 3. It can be transformed into energy-
rich fuel either chemically or biochemically. The energy can be extracted through
various methods as illustrated in Fig. 4. Either we can adopt a dry process or a wet
process for extraction of energy from biomass. The dry process is further classified
into pyrolysis and combustion, whereas the wet process is further classified into

Fig. 3 Types of biomass

Biomass Energy

Dry Process Wet Process

Anerobic
Digestion

Pyrolysis
Gasification

Combustion
Fermentation

Fig. 4 Methods of extracting energy from biomass


888 N. Joshi

anaerobic digestion, gasification and fermentation [8]. The energy obtained from
biomass can be further utilized to either produce electrical power or heat.

2.1 Types of biomass-based power plant

The major types of biomass power generation modes are as follows:


• Combustion Based Plant: In such type of plant, biomass is fed into a boiler, and the steam produced is further converted into electricity. This type of plant has a low running cost, but it has low efficiency at small scale and requires a large investment. It is suitable for large scale only.
• Gasification Combustion Based: In such types of plants, the solid part of biomass is split into a flammable gas. The biomass is gasified, and after that the fuel gas is burned.
• Mixed Burning Based: In this type of plant, biomass is burned in a boiler along with coal. Its mode of operation is very simple and convenient, and low investment is needed. It is most appropriate for timber biomass [9].
• Gasification Mixed Burning Based: As the name illustrates, in such type of plant both solid and liquid biomass materials are burned in the boiler. Low-energy-density and liquid-based biomass are suitable for this system. The biomass is gasified, and after that the fuel gas is burned with coal in the boiler. It has good economic advantages, although metal erosion issues are also observed in such plants. It is suitable for applications like power generation from mass biomass [10].

2.2 Working of biomass-based power plant

Biomass-based power plants are more popular in rural areas. As illustrated in Fig. 5, for the operation of a biomass-based power plant, we first have to gather biomass materials like agricultural waste, garbage, wood, animal dung cakes, etc. After that, suitable sorting is carried out [11]. Once sorting is done, the gathered biomass materials are treated to make them suitable for the gasification process. After gasification, we check whether the gas obtained is suitable to run turbines. If it is, the gas is fed to turbines, which in turn run the generator shaft, and electrical power is obtained. If the gas is not capable enough to drive the turbine, it is given to a bio-fueled engine. The engine runs the shaft of the generator, and thus we generate the power [12].

Fig. 5 Working of biomass-based power plant



3 Biomass-based power plant: Indian Scenario

Biomass is a very crucial source of energy, particularly in rural regions. It is non-conventional in nature, and it is competent enough to give firm energy [13]. Around 32% of the entire usage of energy is fulfilled through this source. MNRE is promoting the use of biomass energy through several schemes. As on June 30, 2021, the installed capacity of biomass power is 10,170 MW. As illustrated in Fig. 6, the share of bagasse cogeneration plants is 74% (7562 MW), that of biomass-based independent power producers is 18% (1836 MW), and that of non-bagasse cogeneration plants is 8% (772 MW). The Indian government is providing a subsidy of 25 lakh for biomass bagasse cogeneration plants, and for non-bagasse cogeneration plants a 50 lakh subsidy is provided. Figure 7 shows the 6 MW biomass-based power plant located at Birpind, Punjab.
Major industries that can contribute their waste for biomass-based power
generation are:
• Sugar Industries
• Corn Industries
• Palm Oil Industries
• Food Processing Industries.

3.1 Advantages of biomass-based power plant

The advantages of biomass-based power plant are as follows:

Fig. 6 Installed capacity of biomass power in India (bagasse cogeneration: 7562 MW, 74%; biomass IPP: 1836 MW, 18%; non-bagasse cogeneration: 772 MW, 8%)



Fig. 7. 6 MW Biomass power plant at Birpind, Punjab

• Reliability: The power generated from a biomass-based power plant is reliable, and it reduces dependence on the central power plant.
• Economic Feasibility: The cost per unit generated and the capital cost of biomass are much lower as compared to a thermal power plant. Thus, power generated from such a plant is economically competitive.
• Ecofriendly: In the biomass-based power plant the process of power generation
is environmentally sustainable.
• Local Availability: As the power generated is available locally so dependence on
foreign sources will reduce.
• Less Residue: Also cost required for disposal of residue material will be very
less.
• Employment Opportunities: It creates employability opportunities for people
residing in rural regions.
• Cost-Effective: In biomass-based power plant, the transmission cost, labor cost,
and overall running cost are low as compared to the thermal power plant.
• Residue Utilization: The waste obtained from such types of plants can be used
as organic fertilizer.
• A barrier to Pollution: It helps in minimizing water and soil pollution.

3.2 Challenges for biomass-based power plant

Biomass-based power plants are proven to be a good option to fulfill energy needs.
But so many challenges are associated will biomass-based power plants [14]. The
major challenges associated with biomass-based power are as follows:
• Various seasonal agricultural wastes are used as biomass fuel; it is quite difficult to maintain a constant supply of such biomass, as agriculture depends on climatic conditions [15].
• The cost per unit may not be sustainable throughout the year in a competitive power market.
• The space requirement is more in such type of plant.
• It is not suitable for densely populated areas.
• It is affected by temperature variation.

4 Future Scope

The number of biomass power plants is increasing day by day in rural areas worldwide. In India [16], however, we have to place more emphasis on the usage of such a useful mode of power generation. GoI is promoting biomass plants through several schemes and policy frameworks as discussed in this paper. Several research projects are going on to improve effective generation through biomass power plants. Investment from outside the country will also help to promote biomass plant installation. Power production through biomass plants is a nice step toward sustainable development.

5 Conclusion

India is the second largest producer of agricultural waste, and it has a very good potential for biomass energy. As of now, around 30% of the available potential is used for generation. The Government of India has an excellent policy framework to implement biomass-based power generation plants in India. It can be concluded that in the rural regions of India, biomass energy has proven to be the best available option for fulfilling power requirements. As power is generated locally, the cost required to construct a huge transmission and distribution network is saved. At the same time, T & D losses are also minimized. The feed-in tariff will also motivate the use of biomass-based power generation plants.

References

1. Paul S, Dey T, Saha P, Dey S, Sen R (2021) Review on the development scenario of renew-
able energy in different country. In: 2021 Innovations in energy management and renewable
resources (52042), pp 1–2
2. Khandelwal A, Nema P (2021) A 150 kW grid-connected roof top solar energy system—case
study. In: Baredar PV, Tangellapalli S, Solanki CS (eds) Advances in clean energy technologies.
Springer Proceedings in Energy. Springer, Singapore
3. Joshi N, Sharma J (2020) Analysis and control of wind power plant. In: 2020 4th international
conference on electronics, communication and aerospace technology (ICECA), pp 412–415
4. Annual Report MNRE year 2020–21
5. Tyagi VV, Pathak AK, Singh HM, Kothari R, Selvaraj J, Pandey AK (2016) Renewable energy
scenario in Indian context: vision and achievements. In: 4th IET clean energy and technology
conference (CEAT 2016), pp 1–8
6. Joshi N, Nagar D, Sharma J (2020) Application of IoT in Indian power system. In: 2020 5th
international conference on communication and electronics systems (ICCES), pp 1257–1260
7. www.ireda.in
8. Rahil Akhtar Usmani (2020) Potential for energy and biofuel from biomass in India. Renew
Energy 155:921–930
9. Patel S, Rao KVS (2016) Social acceptance of a biomass plant in India. In: 2016 biennial
international conference on power and energy systems: towards sustainable energy (PESTSE),
pp 1–6
10. Parihar AKS, Sethi V, Banerjee R (2019) Sizing of biomass based distributed hybrid power
generation systems in India. Renew Energy 134:1400–1422
11. Sharma A, Singh HP, Sinha SK, Anwer N, Viral RK (2019) Renewable energy powered elec-
trification in Uttar Pradesh State. In: 2019 3rd international conference on recent developments
in control, automation and power engineering (RDCAPE), pp 443–447
12. Khandelwal A. Nema P (2020) Harmonic analysis of a grid connected rooftop solar energy
system. In: 2020 fourth international conference on I-SMAC (IoT in social, mobile, analytics
and cloud) (I-SMAC). pp 1093–1096
13. Sen GP, Saxena BK, Mishra S (2020) Feasibility analysis of community level biogas based
power plant in a village of Rajasthan. In: 2020 international conference on advances in
computing, communication and materials (ICACCM), pp 385–389
14. Saidmamatov O, Rudenko I, Baier et al (2021) Challenges and solutions for biogas production
from agriculture waste in the aral sea basin. Processes 9:199
15. Ghosh S (2018) Biomass-based distributed energy systems: opportunities and challenges. In:
Gautam A, De S, Dhar A, Gupta J, Pandey A (eds) Sustainable energy and transportation.
energy, environment, and sustainability. Springer, Singapore
16. Seth R, Seth R, Bajpai S (2006) Need of biomass energy in India. Prog Sci Eng Res J PISER
18, 3(02/06):13–17
Constructive Approach for Text
Summarization Using Advanced
Techniques of Deep Learning

Shruti J. Sapra, Shruti A. Thakur, and Avinash S. Kapse

Abstract Text summarization is one of the popular fields, and a great demand is also
associated with text summarization due to a large amount of text which is available
with the Internet in the form of various social media sites, blogs, and other Web sites.
Therefore, the demand with the shortening the information is increasing for reducing
the information for various reasons. Nowadays, there are plenty of resources for the
data is available, and also there the number of tools available for reducing the amount
of information is increasing due to such a great requirement. This paper also discusses
the various types of methods and techniques which are effective in shortening the
text or information using the various advanced technology and advanced algorithms
such as deep learning, machine learning, and artificial intelligence. The advanced
algorithms and technology also work with the other technology to make a great
combination of technology which will resolve the various issues regarding the text
summarization or in other words reduction of information. The main aspect while
reducing the amount of information is that the reduced information must retain the
information which is very essential from the user or application point of view and
must maintain the consistency in the information which is available with the different
sources.

Keywords Text summarization · Deep learning · Information · Machine learning

S. J. Sapra
Research Scholar, Department of Computer Science and Engineering, Sant Gadge Baba Amravati
University, Amravati, India
S. A. Thakur (B)
Assistant Professor, Department of Computer Science and Engineering, G H Raisoni College of
Engineering, Nagpur, India
e-mail: shruti.thakur@raisoni.net
A. S. Kapse
Head of Department, Information Technology, Anuradha College of Engineering, Amravati
University, Chikhli, India


1 Introduction

Though there are various challenges associated with the reduction in the size or the
amount of information on the different social media or Web sites sources, there are
many effective methods or techniques available which can efficiently reduce the text
without changing the meaning of the information [1]. Text summarization is the
restating of the actual information of the text and makes that text as short as possible
or in other words expresses the large information in very few words or sentences as
is possible [2].
This research also studies and analyzes the various tools for maintaining the
integrity of the data or information which is to be reduced and also finds the various
parameters which must be efficiently handled while dealing with such a large amount
of data. This research mostly studies the different aspects that how the redundant
information should be removed from the main content of information and to be
replaced by short or summarized text or information [3].
The most important parameter while reducing the size of the data or information
available should be shortened; in other words, after the summarization of the text or
information, this will lead to a significant reduction in the amount of memory required
for saving the shortened content of information which is very less as compared to
the original text [4].
Deep learning gives better results when plenty of quality data is available to learn from, and this trend continues as the available data increases.
But, if the quality data is not available, this may result in the loss of data which may
be severe or may create damage to the whole system due to loss of useful data.
There is another well-known example where researchers fooled Google's deep learning system by introducing errors, changing the useful data, and adding noise. Such errors are deliberately introduced by the researchers on a trial basis in
the case of image recognition algorithms. And, it is found that the performance is
greatly hampered because of the changes in the quality and quantity of data that were
introduced with the system [5].
Though the data changes are very small, it is found that the results change greatly; no change in the input data is allowed in such a case, and it is suggested not to alter the input data. Hence, it has become very essential to add some constraints to deep learning algorithms that will improve the accuracy of such systems, which will lead to great efficiency of system performance [6] (Fig. 1).
After reduction of the text or information, different analytical methods are also applied to judge the performance of the applied method and technology. Thus, performance measurement is again an important aspect in understanding text summarization [7].
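For example, summary quality is commonly judged with the ROUGE measure used later in this paper; a minimal sketch using the open-source rouge-score Python package (an illustrative choice, not necessarily the tooling used in this work, and with placeholder example sentences) is shown below.

from rouge_score import rouge_scorer  # pip install rouge-score

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "the proposed model produces a short and informative summary of the document"
generated = "the model produces a short summary of the document"
scores = scorer.score(reference, generated)
for name, result in scores.items():
    # each entry holds precision, recall, and F-measure for that ROUGE variant
    print(name, round(result.precision, 3), round(result.recall, 3), round(result.fmeasure, 3))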
Many times, the text may contain data that is non-essential data, and such data
must be removed in an efficient way to reduce the text or shorten the text. There
is another data called metadata which is the important data, that is, data about the
data that must be preserved for the shortening of the text and must be represented
differently.

Fig. 1 Summarization scenario to reduce a large amount of data to sort data

Then, another significant parameter while reducing the text or information is the
application domain where this reduced information is to be used; this plays a very
important role as based on the application domain; the method or technology to be
used changes continuously [8].

2 Literature Survey

2.1 Automated Text Summarization Techniques

Researchers in the early days designed systems based on neural networks, drawing on an analogy with human intelligence. They grouped and mixed mathematics and algorithms to create the processes described below.
Researchers from the various corners of the world are also continuously making
efforts for smart techniques of the text summarization and are also very successful
in most of the cases, but still, there is more requirement of finding the still more
efficient techniques which are again more effective and efficient. This section deals
with the various studies made by the researchers in the field of text summarization
and also studies and analyzes the different advantages and disadvantages concerning
the various crucial parameters [7].
Different methods may have some disadvantages but are useful in particular
scenarios and are very useful in the various applications related to the various needs
of the users and produce some special usefulness for the different domain and might
have different characteristics of the particular method and are essential to be studied
[9].
In automated text summarization, machines are employed to summarize different kinds of documents using different forms of statistical or heuristic techniques. A summary in this case is a shortened form of text which captures the most crucial and related data contained in the document or documents being summarized. There are numerous tried-and-true automated text summarization methods that are currently applied in different domains and different fields of information.

Fig. 2 Automated text summarization approaches

This process of classifying automated text summarization methods can be done in different ways. This article presents these methods from the aspect of the summarization approach. In this regard, there are two different kinds of methods, namely extractive and abstractive [10] (Fig. 2).

3 Proposed Methodology

The new approach proposed is applicable in all the different domains of the data
and information for the use of different methods of artificial intelligence, and deep
learning algorithms are also used for better and faster output. These deep learning
algorithms are used for increasing efficiency and effectiveness which may lead to
faster operation of the proposed methods and may give better very short and abstract
data regarding the context of the information and may save a large amount of time
[11]. Different operations can be explored in different ways for the proposed methods
as shown in Fig. 3.
The method proposed is very useful in many domains and applications, and tech-
nologies are very useful and have produced very great results related to the various
parameters and constraints which are related to the close of the summarization. Thus,
the proposed method has different constraints based on the process of summarization,
and such great things also have many benefits for the specific applications associated
with the different parameters such as time of summarization, speed of summarization,
and accuracy or perfectness of the summarization as shown in Fig. 4.
The architecture addresses a known issue of the abstractive summarization task: accurate details in the source document, like dates, locations, or phone numbers, were often reproduced erroneously in the summary [12].

Fig. 3 Automatic summarization using ranking algorithm for text summarization

Fig. 4 Graph of ROUGE measure performance for text summarization

A finite vocabulary prevents different patterns of words and sentences or paragraphs, like similar or repeated words and their proper names, from being taken into consideration. Unnecessary repetitions of source fragments or sentences can often be removed from the paragraph or the complete information, irrespective of the contained information (Fig. 5).
Fig. 5 Seq2Seq attention model

The proposed method deals with various parameters such as time of summarization, effectiveness of summarization, and efficiency of summarization, and the performance of this method has been carefully analyzed based on these crucial parameters. All parameters are important in making the text useful for showcasing large information in a very short space, in other words, making the text crucial and readable for the user and for different applications [13].
The proposed methodology was implemented on the CNN/DailyMail dataset.

4 Scope

The proposed model is very efficient in dealing with many challenges faced during
the shortening of text and forming a new short text. Thus, there is a great use of the
proposed method, and its scope is broad in a large number of business applications
also [14]. This proposed method is very suitable in the real-time applications related
to the various domains which provide numerous advantages to the user and gives
better efficiency as compared to the other studied methods for the shortening or
information which may lead to a better quality of text and can be directly used in
many sorts of application where short data or information is essential or where it
is very essential to represent the data or information in very short words. Thus,
the proposed method provides a great help in reducing the amount of redundant
information and producing meaningful information in very few words [1].
• Precision-target—Precision-target (prec(t)) does the same thing as prec(s) but w.r.t. the real summary. The idea is to calculate how many of the entities the model produces in the hypothesis summary are also present in the real summary. Mathematically, it is set as

prec(t) = N(h ∩ t)/N(h)

here, N(h) and N(t) refer to the named entity sets in the generated/hypothesis summary and the real summary, respectively.
• Recall-target—Under recall-target (recall(t)), the idea is to calculate how many entities in the real summary are covered by the model-generated hypothesis summary. Mathematically, it is set as

recall(t) = N(h ∩ t)/N(t)

here, N(h) and N(t) refer to the named entity sets in the produced/hypothesis summary and the real summary, respectively. To obtain a single measurable number, prec(t) and recall(t) are merged together and denoted as the F1-score. Mathematically, it is set as

F1 = 2 · prec(t) · recall(t)/(prec(t) + recall(t))
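As an illustration, these entity-level scores can be computed directly from the sets of named entities found in the hypothesis and the real summary; the small sketch below uses plain Python sets, and the entity lists themselves are placeholders, since the named-entity recognizer is not specified here.

def entity_scores(hyp_entities, target_entities):
    # prec(t), recall(t), and F1 computed from named-entity sets
    h, t = set(hyp_entities), set(target_entities)
    overlap = len(h & t)                              # N(h ∩ t)
    prec_t = overlap / len(h) if h else 0.0           # generated entities also in the real summary
    recall_t = overlap / len(t) if t else 0.0         # real-summary entities covered by the hypothesis
    denom = prec_t + recall_t
    f1 = 2 * prec_t * recall_t / denom if denom else 0.0
    return prec_t, recall_t, f1

# placeholder entity sets for a generated and a real summary
print(entity_scores(["London", "2021", "WHO"], ["London", "2021", "Geneva"]))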

So, these mathematical terms were used:


The following coverage loss formula is found to be very beneficial for minimizing repeated words or repetitive sentences which have the same significance, so that the final text is very short, crucial, and informative from the perspective of the different applications:

L_t^coverage := Σ_i min(a_i^t, c_i^t)

These mathematical terms are found to be very useful and essential for the summarization of different types of text or information.

L_t := L_t^ML + λ · L_t^coverage
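A small NumPy sketch of this coverage penalty is given below; it assumes an attention matrix with one row of attention weights per decoder step, and it is an illustrative reconstruction rather than the exact training code.

import numpy as np

def coverage_loss(attention):
    # attention: (decoder_steps, source_len) array of attention weights a_i^t
    # the coverage vector c^t accumulates the attention of all previous decoder steps
    coverage = np.zeros(attention.shape[1])
    total = 0.0
    for a_t in attention:
        total += np.minimum(a_t, coverage).sum()  # sum_i min(a_i^t, c_i^t)
        coverage += a_t
    return total / len(attention)                 # averaged over decoder steps

attn = np.array([[0.7, 0.2, 0.1],
                 [0.6, 0.3, 0.1],   # attending again to the first token is penalized
                 [0.1, 0.1, 0.8]])
print(round(coverage_loss(attn), 3))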

Classification can be done on different datasets, which is useful for generating the different usage patterns of the necessary information; these can be grouped into clusters for better quality of data for the application and are also useful in representing the different notations used in a specific domain of application [15].
Training data: ROUGE1 / ROUGE2 / ROUGEL / Macro-prec(s) / Micro-prec(s) / Macro-prec(t) / Micro-prec(t) / Macro-recall(t) / Micro-recall(t) / Macro-F1(t) / Micro-F1(t)

Newsroom
Original: 47.7±0.2 / 35.0±0.3 / 44.1±0.2 / 97.2±0.1 / 97.0±0.1 / 65.4±0.3 / 62.9±0.4 / 70.8±0.3 / 68.5±0.2 / 68.0±0.2 / 65.6±0.3
+ filtering: 47.7±0.1 / 35.1±0.1 / 44.1±0.1 / 98.1±0.1 / 98.0±0.0 / 66.5±0.1 / 63.8±0.1 / 70.2±0.2 / 67.7±0.3 / 68.3±0.1 / 65.7±0.1
+ classification: 47.7±0.2 / 35.1±0.1 / 44.2±0.2 / 98.1±0.1 / 98.0±0.0 / 67.2±0.4 / 64.2±0.4 / 70.3±0.2 / 67.8±0.4 / 68.7±0.3 / 65.9±0.4
JAENS: 46.6±0.5 / 34.3±0.3 / 43.2±0.3 / 98.3±0.1 / 98.3±0.1 / 69.5±1.6 / 67.3±1.2 / 68.9±1.5 / 66.8±1.6 / 69.2±0.1 / 67.0±0.2

CNNDM
Original: 43.7±0.1 / 21.1±0.1 / 40.6±0.1 / 99.5±0.1 / 99.4±0.1 / 66.0±0.4 / 66.5±0.4 / 74.7±0.7 / 75.4±0.6 / 70.0±0.2 / 70.7±0.3
+ filtering: 43.4±0.2 / 20.8±0.1 / 40.3±0.2 / 99.9±0.0 / 99.9±0.0 / 66.2±0.4 / 66.6±0.3 / 74.1±0.6 / 74.9±0.6 / 69.9±0.2 / 70.5±0.2
+ classification: 43.5±0.2 / 20.8±0.2 / 40.4±0.3 / 99.9±0.0 / 99.9±0.0 / 67.0±0.6 / 67.5±0.5 / 74.7±0.2 / 75.5±0.1 / 70.6±0.3 / 71.3±0.3
JAENS: 42.4±0.6 / 20.2±0.2 / 39.5±0.5 / 99.9±0.0 / 99.9±0.0 / 67.9±0.7 / 68.4±0.6 / 75.1±0.7 / 76.4±0.7 / 71.3±0.2 / 72.2±0.3

XSUM
Original: 45.6±0.1 / 22.5±0.1 / 37.2±0.1 / 93.9±0.1 / 93.6±0.2 / 74.1±0.2 / 73.3±0.2 / 80.1±0.1 / 80.3±0.3 / 77.0±0.1 / 76.6±0.2
+ filtering: 45.4±0.1 / 22.2±0.1 / 36.9±0.1 / 98.2±0.0 / 98.2±0.1 / 77.9±0.2 / 77.3±0.2 / 79.4±0.2 / 79.6±0.2 / 78.6±0.1 / 78.4±0.2
+ classification: 45.3±0.1 / 22.1±0.0 / 36.9±0.1 / 98.3±0.1 / 98.2±0.1 / 78.6±0.3 / 78.0±0.3 / 79.5±0.3 / 79.8±0.4 / 79.1±0.1 / 78.9±0.1
JAENS: 43.4±0.7 / 21.0±0.3 / 35.5±0.4 / 99.0±0.1 / 99.0±0.1 / 77.6±0.9 / 77.1±0.6 / 79.5±0.6 / 80.0±0.5 / 78.5±0.2 / 78.5±0.1

The resulting BIO tagging loss for a training example i is computed as:

L_i^BIO(θ^(enc), x^i, z^i) = − Σ_{t=1}^{ts(i)} log p_{θ^(enc)}(z_t^i | x^i)

5 Results

The results of the proposed method are found to give a very accurate summarization of the text, as it produces a short summary that is self-explanatory and directly applicable in applications for the representation of the data, which leads to better performance of the overall system.

6 Conclusion

This contribution is a multidimensional approach useful for the interdisciplinary


field of applications in which the proposed method demonstrates a new strategy
of generating a unique and short summary which is the need for different areas of
applications like expert summarization of a variety of text and information. This
approach also finds the best in such a complex domain in which other techniques
of summarization may not work effectively. This approach finds the most suitable
technique in the field of summarization of the text or information available in different
sources of the Internet.

7 Future Scope

It is expected that the continuous research and improvement in the proposed model
will definitely increase the usefulness of the proposed architecture in the field of
text summarization and will eventually result in a variety of utility and tools. These
strategies will also improve the effectiveness and efficiency of implementing various
methods and technologies of text summarization for fast application in different
domains. And therefore, it leads to enhanced summarization approach that will
improve the proposed method to a great extent.

References

1. Bhargava R, Sharma Y, Sharma G (2016) ATSSI: abstractive text summarization using


sentiment infusion. Procedia Comput Sci 89:404–411. https://doi.org/10.1016/j.procs.2016.
06.088
2. Zarrin P, Jamal F, Roeckendorf N, Wenger C (2019) Development of a portable dielectric
biosensor for rapid detection of viscosity variations and It’s in vitro evaluations using saliva
samples of COPD patients and healthy control. Healthcare 7(1):11. https://doi.org/10.3390/hea
lthcare7010011
3. Shruti M, Thakur JS, Kapse AS, Analysis of effective approaches for legal texts summarization
using deep learning 3307:53–59
4. Verma S, Nidhi V (2019) Extractive Summarization using deep learning, arxiv.org, v2(1) arxiv:
1708.04439
5. Roulston S, Hansson U, Cook S, McKenzie P (2017) If you are not one of them you feel out of
place: understanding divisions in a Northern Irish town. Child Geogr 15(4):452–465. https://
doi.org/10.1080/14733285.2016.1271943
6. Baxendale PB (2010) Machine-made index for technical literature—an experiment. IBM J Res
Dev 2(4):354–361. https://doi.org/10.1147/rd.24.0354
7. Allahyari M et al (2017) Text summarization techniques: a brief survey. Int J Adv Comput Sci
Appl 8(10). https://doi.org/10.14569/ijacsa.2017.081052
8. Sahoo D, Bhoi A, Balabantaray RC (2018) ScienceDirect hybrid approach to abstractive
summarization. Procedia Comput Sci 132(Iccids):1228–1237. https://doi.org/10.1016/j.procs.
2018.05.038
9. Song S, Huang H, Ruan T (2019) Abstractive text summarization using LSTM-CNN based deep
learning. Multimed Tools Appl 78(1):857–875. https://doi.org/10.1007/s11042-018-5749-3
10. Widyassari AP et al (2020) Review of automatic text summarization techniques and methods.
J King Saud Univ Comput Inf Sci xxxx. https://doi.org/10.1016/j.jksuci.2020.05.006
11. Martín C, Langendoerfer P, Zarrin PS, Díaz M, Rubio B (2020) Kafka-ML: connecting the data
stream with ML/AI frameworks. (June):1–10. [Online]. Available: http://arxiv.org/abs/2006.
04105
12. Anand D, Wagh R (2019) Effective deep learning approaches for summarization of legal texts.
J King Saud Univ Comput Inf Sci xxxx. https://doi.org/10.1016/j.jksuci.2019.11.015
13. Barzilay R, McKeown KR, Elhadad M (1999) Information fusion in the context of multi-
document summarization 550–557. https://doi.org/10.3115/1034678.1034760
14. Abualigah L, Bashabsheh MQ, Alabool H, Shehab M (2020) Text summarization: a brief
review. Stud Comput Intell 874:1–15. https://doi.org/10.1007/978-3-030-34614-0_1
15. Khatri C, Singh G, Parikh N (2018) Abstractive and extractive text summarization using docu-
ment context vector and recurrent neural networks [Online]. Available: http://arxiv.org/abs/
1807.08000
Lane Vehicle Detection and Tracking
Algorithm Based on Sliding Window

R. Rajakumar, M. Charan, R. Pandian, T. Prem Jacob, A. Pravin,


and P. Indumathi

Abstract Lane vehicle detection is fundamental to vehicle driving systems and self-
driving. The proposed concept is to employ the pixel difference in the intended lane
line backdrop to isolate the lane and the road surface, and then, the curve fitting model
is used to identify the lane in the image. A histogram on gradient, histogram graph,
and binary spatial features are extracted from the vehicle and non-vehicle images.
For vehicle detection, support vector machine classifier is employed to separate the
vehicle and non-vehicle images using the extracted features. But many methods are
constrained by light conditions and road circumstances, such as weak light, fog, rain,
etc., which may result in invisible lane lines. In feature extraction, features of the lane images are picked using various filters. Our work focuses on a lane detection technique
founded on the Sobel filter and curve fitting model for lane line tracking in different
conditions. Preprocessing encompasses the mitigation of noise as well as getting
the image ready for the subsequent procedure. To achieve this, HLS color space
was performed which identifies the lane by adding pixel values. The main aim is to
increase the accuracy and reduce the computation time compared to other existing
methods.

Keywords Sobel filter · Curve fitting model · Lane detection · Vehicle detection ·
Sliding window · Support vector machine

1 Introduction

Most accidents occur due to invisible road lanes. The accidents can be reduced
drastically, by employing improved driving assists. A system that warns the driver
can save a considerable number of lives. To increase safety and reduce road

R. Rajakumar (B) · R. Pandian · T. P. Jacob · A. Pravin


Sathyabama Institute of Science and Technology, Sholinganallur, Chennai 119, India
e-mail: rrajakumar.maths@sathyabama.ac.in
M. Charan · P. Indumathi
Anna University, MIT Campus, Chromepet, Chennai 44, India


accidents, researchers have worked on better driving techniques that assure
security.
While driving, accidents occur due to the driver's unawareness of the lane, specifically at curved lanes. Therefore, if it is possible to infer the road and vehicles before the advent of adverse lane conditions, the proposed sliding window algorithm can assist the driver in advance to reduce speed and avoid road accidents.
In driving assistance to achieve safety on roads, the challenging tasks are road lane detection and boundary detection, where lanes appear as white and yellow lines on roads.
Many researchers are working on lane detection, lane tracking, and warning on lane
departure. Yet, many systems have limitations of shadows, changing illumination,
worse conditions of road paintings, and other image interference. This problem can
be overcome by using the proposed algorithm.
This paper develops a curve fitting model that enhances the strength of lane detection and tracking for safe transportation. In our method, lane detection and tracking are carried out by the curve fitting model and a related component function to improve lane detection and tracking. The support vector machine classifier is used for the detection of vehicles.

2 Related Works

The literature [1] extracted the AROI to overcome the complexity of computation.
Then, Kalman filter along with progressive probabilistic Hough transform (PPHT)
is used to find boundaries of the lane in the image. Depending on the lane and the
position of the vehicle, their algorithm decides if the vehicle is offset. Different lane
conditions are used for detection and tracking for both city roads and highways. In
the literature [2], lane marks in road images are extracted which is based on the
multi-constraint model and a clustering algorithm is proposed to detect the lane. By
dividing the region of interest into sections, it is easy to track lane lines with curved
shapes. The literature [3] used the B-spline fitting from the RANSAC algorithm
for the front lane and Hough transform for the rear lanes. The algorithm is used
for lane detection, and it eliminates the interference lines, better than the RANSAC
algorithm. The literature [4] improved the accuracy of lane recognition and aimed
to minimize the pixel-wise difference. The predicted lane has both white and yellow
pixels that also do not directly reflect the lane parameters which are essential to detect
the straight line. To detect a lane, we avoid the interference of fixed objects and other
parameters on the outside of the lanes. After the pixels in the road area are selected
as a reorganized data matrix, for the detection of a pre-trained vehicle, a deep neural
network is employed to get the moving vehicle’s information.
The literature [5] proposed a flexible road identification method that connects
both lane lines and obstacle boundaries, applicable for detecting lanes. This algo-
rithm uses an adaptive sliding window for lane extraction using the least-squares
method for lane line fitting. The literature [6] employs a color threshold method
Lane Vehicle Detection and Tracking Algorithm … 907

to identify the lane edges along with perspective transform and Hough transform
technique to detect lane segments in the image. These conditions are a straight lane
and sunny climate. The literature [7], dealing with a vision-based methodology, performs well only in controlled weather conditions and uses the Hough transform to identify the straight road; there, edge-based detection with OpenStreetMap is used to detect lanes, which increases computation time. The Hough transform [8] identifies
the straight lane, and the curve fitting identifies the curved lane which increases the
computation time. In [9], vehicles are constructed by their geometry and structured
as a combination of a small image to form one histogram, similar to the sliding
window model. In the literature [10], feature pairing elimination (FPE filter) is used
for only feature extraction and SVM, random forest, and K-nearest neighbor clas-
sifiers were compared. In this lane detection, the Hough transform [11, 12] is used to detect the lanes, but these algorithms increase the computational time and the processing complexity. It is essential to focus on the edge image derived from the response of the
CenSurE algorithm. By using the edge lane, we can identify the traffic lane which is
detected from its geometry. For identifying the blobs, an SVM classifier is used [13].
The literature [14] predicts the position of the vehicle by using the Kalman filter
and histogram technique with a mean detection accuracy of 94.05% during the day.
Particle-filter-based tracking was used to learn some road scene variations in [15].

3 Lane Detection Algorithm

Lane detection is a problem in image processing and computer vision with many applications. Previous literature on lane detection dealt with curve detection methods.
These algorithms detect the lane edges and are used to determine the vehicle position
in the lane. A dataset captured using a single monocular camera is used for lane
detection. This work contributes to determining the correct position of the vehicle within its lane. The system recognizes most of the white and yellow markings across the lane
effectively during different climatic conditions which include shadows, rain, snow,
or any damage on the road.
The lane detection algorithm includes lane detection and lane tracking method.
By changing the input parameters, regions of interest are identified. Both perspective
and inverse perspective transforms were performed on the lanes to detect the region
of interest. In the next step, detected lanes are analyzed by the Sobel filter and the
future lanes are calculated using the polynomial curve model.
In this section, we explain lane detection using the HLS color space and Sobel filter edge detection; the results obtained with this method are examined in the experimental results section. The following procedure is performed to detect the lane. The preprocessed image is perspective transformed, which converts the 3-dimensional road scene into a 2-dimensional top-down image. Then, the Sobel filter is applied to reduce noise and to identify the pixels representing edges. The filtered image is converted into HLS color space, and its components (hue, lightness, and saturation) are used to detect the yellow lane. To detect the white lane, a maximum lightness value of 100% was selected.

Fig. 1 Lane detection and tracking flowchart: captured lane image → perspective transform → Sobel filter and HLS color transformation → sliding window to detect the lane → inverse perspective transform → detected lane image

The histogram is computed to separate the left and right lanes by summing the pixel values in each column and selecting the peak columns, which identify the lane positions.
The sliding window method is applied from the bottom of the image by identifying lane pixels. Each next, upward sliding window is constructed based on the previous window. Then, a polynomial is fitted to both lanes, and the previous lane fit is used to estimate the search area for the next frame. Eventually, the fitted lane is drawn on the original image and an inverse perspective transform is performed.

3.1 Lane Detection and Tracking Flowchart

See Fig. 1.

3.2 Sobel Edge Detection

The Sobel filter works by estimating the image intensity gradient at every pixel of the lane image; it estimates the direction of the change in intensity at each point. Figure 2 shows how the lane image changes at each pixel and how the pixels representing edges are highlighted.
The Sobel filter has two 3 × 3 kernels: one kernel to identify changes in the horizontal direction and another kernel to identify changes in the vertical direction. The two kernels are convolved with the original lane image to approximate the horizontal and vertical derivatives.

Fig. 2 Sobel filter and perspective transform

By applying a threshold to the selected part of the image (the ROI), we obtain a hue, lightness, and saturation (HLS) component color image as input. In this step, the Sobel filter is used as the edge detection method to find and mark the lane boundaries.
The main objective of the Sobel filter is to detect edges that are close to the real lane edges. Sobel edge detection uses the gradient vector of the intensity image. Lane boundary features are extracted from this gradient vector, through which the lane can be detected.
Many edge detection methods use different edge operators, but their efficiency varies; Sobel edge detection is one of the most efficient. A notable feature of the Sobel-based method here is its low error rate, because the algorithm uses a double threshold for the yellow and white lanes. Therefore, the detected edge is close to the real-world lane.
In the next step, the captured color image is transformed to HLS color space to speed up the process and make it less sensitive to scene conditions. To detect the white lane, the lightness is set to a value close to 100%. Then, a combination of saturation and lightness values is defined to detect the yellow lane. In our proposed method, captured images chosen from the directory of the Xi'an city database are processed. The camera is calibrated so that the vanishing point of the road is placed at the top of the region of interest (ROI).
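A minimal sketch of the Sobel and HLS thresholding described above is given below (Python, OpenCV/NumPy); the gradient, lightness, and saturation threshold values are illustrative assumptions, not the exact values used by the authors.

import cv2
import numpy as np

def binary_lane_mask(warped_bgr):
    # Combine a horizontal Sobel gradient with HLS color thresholds.
    gray = cv2.cvtColor(warped_bgr, cv2.COLOR_BGR2GRAY)
    sobel_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)           # 3 x 3 horizontal kernel
    sobel_x = np.uint8(255 * np.abs(sobel_x) / (np.abs(sobel_x).max() + 1e-6))
    grad_mask = (sobel_x > 30) & (sobel_x < 150)                    # assumed gradient thresholds

    hls = cv2.cvtColor(warped_bgr, cv2.COLOR_BGR2HLS)
    l_chan, s_chan = hls[:, :, 1], hls[:, :, 2]
    white_mask = l_chan > 220                       # lightness close to its maximum: white lane
    yellow_mask = (s_chan > 120) & (l_chan > 60)    # saturation plus lightness: yellow lane

    return (grad_mask | white_mask | yellow_mask).astype(np.uint8)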

3.3 Histogram Computation

Fig. 3 Histogram computation

A histogram summarizes the numerical content of an image and carries a large amount of information. It indicates the frequency of the different gray levels in a lane image. Lane images contain a series of pixel values, and each pixel value corresponds to a specific color intensity. Histogram computation is an important step in segmentation and decreases the computation time. At the lower part of the image only the lane is present; scanning upward in the image, other structures appear, and a peak is drawn for each band. The histogram sums the pixel values along each column, so the left and right lanes can be identified as the columns with the larger pixel sums, as shown in Fig. 3.
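The column-wise histogram used to locate the bases of the left and right lanes can be computed as in the short NumPy sketch below; the binary mask is assumed to come from the thresholding step above.

import numpy as np

def lane_base_positions(binary_mask):
    # Sum the lower half of the mask column-wise and take one peak per image half.
    lower_half = binary_mask[binary_mask.shape[0] // 2:, :]
    histogram = np.sum(lower_half, axis=0)
    midpoint = histogram.shape[0] // 2
    left_base = np.argmax(histogram[:midpoint])              # peak of the left lane
    right_base = np.argmax(histogram[midpoint:]) + midpoint  # peak of the right lane
    return left_base, right_base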

4 Lane Tracking Algorithm

Lane tracking is mainly employed to reduce computation by storing information that estimates future states. The algorithm includes a prediction step as well as a measurement step. In the case of lane tracking, the prediction stage involves shifting the detected lanes by a specific amount in the image, based on the polynomial fit. In the measurement step, the radius of curvature and the vehicle offset are computed. Much research has already been done on lane tracking; the most widely and efficiently used approach is the curve fitting model.
Lane tracking uses information from past states to estimate the current detection and is an efficient complement to the lane detection algorithm. Once a lane is detected, it is identified by points on both lane boundaries. Lane tracking takes the previously identified lane points, changes them depending on the vehicle projection, and then updates the points based on the values of the left and right edge points on the lane.
A curve fitting model was selected for efficient tracking. This approach requires little computational time and is less vulnerable to distortion, and equations of higher degree can provide a more accurate fit of the lane. In this work, a curve fitting model is employed that can trace curved roads and is robust to noise, shadows, and weak lane markings. It can also give details about lane orientation and curvature. Lane tracking involves two components, which are discussed in the next sections.

4.1 Sliding Window

To construct the sliding window, the initial point of the windows must be known. To
find the initial point, a histogram for the bottom part of the image is calculated. Based
on the peak value of the histogram, the initial window is selected and the mean of
the nonzero points inside the window is determined. For the first half of the image,
the left lane peak is obtained and the other right half gives the peak of the right lane.
Thus, left and right starting sliding windows are formed, and then, left lane center
and right lane center are calculated. This kind of selection works fine for both lanes
on the left and right sides of the image.
In some cases, for example when the vehicle is gradually steered more toward the right, the right lane may appear in the left half of the image. In such situations, improper detection is possible. To avoid this, a variable cache is defined
to save the starting point windows of previous lanes. The histogram is not calculated
throughout the detection process but only for the first few frames, and later, it will
be dynamically tracked using the cache. For each initial sliding window, the mean
of the points inside each window is calculated. Two windows to the left and right of
the mean point and three more windows on top of the mean point are selected as the
next sliding windows. This kind of selection of windows helps to detect the sharp
curves and dashed lines. The selection of sliding windows is shown in Fig. 4.
The window width and height are fixed depending upon the input dataset. The
width of the sliding window should be adjusted depending on the distance between
both lanes. The sliding windows on top help track the lane points turning left and
right, respectively. The windows need to have a relatively well-tuned size to make
sure the left- and right-curved lanes are not tracked interchangeably when lanes have
a sharp turn and become horizontally parallel to each other. The detected points inside
the sliding window are saved. The process of finding the mean point and next set of
sliding windows based on valid points inside the respective sliding windows for left
and right lanes is continued until no new lane points are detected. Points detected
in the previous sliding windows are discarded when finding points in the next set
of sliding windows. Then, the searching can stop tracking when no new points are
discovered.
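The following is a simplified sketch of the sliding window search described above; the number of windows, the window margin, and the minimum-pixel count are illustrative assumptions rather than the exact parameters used in the paper.

import numpy as np

def sliding_window_points(binary_mask, base_x, n_windows=9, margin=80, min_pixels=50):
    # Collect lane-pixel coordinates by stacking windows upward from base_x.
    ys, xs = binary_mask.nonzero()
    window_height = binary_mask.shape[0] // n_windows
    current_x, lane_idx = base_x, []
    for w in range(n_windows):
        y_high = binary_mask.shape[0] - w * window_height
        y_low = y_high - window_height
        x_left, x_right = current_x - margin, current_x + margin
        inside = ((ys >= y_low) & (ys < y_high) &
                  (xs >= x_left) & (xs < x_right)).nonzero()[0]
        lane_idx.append(inside)
        if len(inside) > min_pixels:                # re-center the next window on the mean hit
            current_x = int(xs[inside].mean())
    lane_idx = np.concatenate(lane_idx)
    return xs[lane_idx], ys[lane_idx]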

Fig. 4 Sliding window output

4.2 Polynomial Fit Curve Lane

Once the left and right points are detected, these points are processed to polynomial
fitting to fit the respective lanes. Average polynomial fit values of the past few frames
are used to avoid any intermittent frames, which may have unreliable lane informa-
tion. The lane starting points are retrieved from the polynomial fitting equation. This
approach helps increase the confidence of the lane’s starting point detection based
on lanes rather than relying on starting sliding windows. The deviation of the vehicle
from the center of the lanes is estimated. Then, the image is inverse perspective trans-
formed, and the lanes are fitted onto the input image. The sliding window output is
shown in Fig. 4.
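A minimal polynomial-fitting sketch for the collected lane points is shown below; fitting x as a quadratic function of the row coordinate y, and smoothing with a simple running average over recent frames, are assumptions about how the averaging described above could be implemented.

import numpy as np

def fit_lane(xs, ys, previous_fits, keep=5):
    # Fit x = A*y^2 + B*y + C and average the coefficients over the last few frames.
    fit = np.polyfit(ys, xs, 2)
    previous_fits.append(fit)
    if len(previous_fits) > keep:
        previous_fits.pop(0)
    return np.mean(previous_fits, axis=0)

# Example: the lane's x position at the bottom row of a 720-row image
# x_bottom = np.polyval(fit_lane(xs, ys, history), 719)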

4.3 Lane Design Parameters

The curve model is obtained for the lane curve, and a quadratic equation is implemented to analyze and compare the merits and demerits of models with different structures.
The equation of the curve model is given as

Ax² + Bx + C = 0 (1)

where A, B, and C are the given constants of the quadratic curve that define the lane model.

5 Vehicle Detection and Tracking Algorithm

5.1 Vehicle Detection Algorithm

Vehicle detection is implemented through a support vector machine classifier. To extract features, histogram of oriented gradients (HOG), color histogram, and binary spatial features are computed on the training images and input images. The processed image is first converted via a YCbCr color space transformation, which increases the brightness. The training images are then fed into the SVM; the model normalizes the data to approximately the same scale. The GTI vehicle image dataset, comprising 8792 vehicle images and 8968 non-vehicle images, is used to train the SVM classifier, and the trained model is stored in a pickle file.
For vehicle detection, a sliding window technique is performed at each pixel level and the trained classifier is used to search for vehicles in the images. After training is completed, the support vector machine classifier is applied to the lane images. As noted in [13], the support vector machine classifier is a simple and efficient technique for classifying vehicles based on these features. To eliminate false positives, a heat map function with a higher threshold value was selected. This algorithm was simulated using PyCharm software.
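A minimal sketch of this feature extraction and SVM training step is given below (Python, scikit-image/scikit-learn); the directory names, the HOG parameters, and the choice of LinearSVC are assumptions made for illustration and are not necessarily the exact configuration used by the authors.

import glob
import pickle
import cv2
import numpy as np
from skimage.feature import hog
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def extract_features(path):
    # HOG on the luma channel of a 64 x 64 YCbCr patch (single channel for brevity).
    img = cv2.resize(cv2.imread(path), (64, 64))
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    return hog(ycrcb[:, :, 0], orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

vehicle_feats = [extract_features(p) for p in glob.glob("vehicles/*.png")]        # hypothetical paths
non_vehicle_feats = [extract_features(p) for p in glob.glob("non-vehicles/*.png")]
X = np.vstack([vehicle_feats, non_vehicle_feats])
y = np.hstack([np.ones(len(vehicle_feats)), np.zeros(len(non_vehicle_feats))])

scaler = StandardScaler().fit(X)                    # normalize features to a common scale
clf = LinearSVC().fit(scaler.transform(X), y)
pickle.dump({"clf": clf, "scaler": scaler}, open("svm_vehicle.p", "wb"))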

5.2 Histogram on Gradients

The histogram of oriented gradients (HOG) is a representation of an image that simplifies it by extracting the important information. HOG is employed in image processing to detect objects. The gradient technique counts the occurrences of gradient orientations at every image position. Vehicle appearance and shape can be found by detecting the positions of edges. Figure 6 shows the extracted HOG feature output.

5.3 Histogram Graph

The histogram graph computes the number of pixels in the vehicle image at each intensity value. A color histogram records the color level of every color channel, and the luminance histogram shows the brightness level from black to white. The maximum peak on the graph indicates the luminance level at which the most pixels occur. Figure 7 shows the histogram output.

5.4 Binary Spatial Feature

Our technique encodes the spatial variation between a reference pixel and its neighboring pixels, which depends on abrupt gray-level changes in the horizontal, vertical, and oblique directions. The difference between the center pixel and its surrounding neighbors is calculated to reflect the amplitude information of the entire image. We use a support vector machine classifier that exploits this spatial information to classify the lane images, and the necessary features are identified for each pixel. The features are then quantized to train the support vector machine model. The resulting regions are modeled using statistical summaries of their textural and shape properties, and the support vector machine model is used to compute the classification maps. Figure 8 shows the binary spatial graph.
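The color histogram and binary spatial features of Sects. 5.3 and 5.4 can be sketched as follows; the bin count and patch sizes are assumed values, not necessarily those used in the experiments.

import cv2
import numpy as np

def color_histogram(img, bins=32):
    # Concatenate per-channel intensity histograms of a color patch.
    hists = [np.histogram(img[:, :, c], bins=bins, range=(0, 256))[0] for c in range(3)]
    return np.concatenate(hists)

def binary_spatial(img, size=(32, 32)):
    # Down-sample the patch and flatten it to keep its coarse spatial layout.
    return cv2.resize(img, size).ravel()

def combined_features(patch_bgr):
    ycrcb = cv2.cvtColor(cv2.resize(patch_bgr, (64, 64)), cv2.COLOR_BGR2YCrCb)
    return np.concatenate([binary_spatial(ycrcb), color_histogram(ycrcb)])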

5.5 Vehicle Detection Flowchart

See Figs. 5, 6, 7, and 8.

Fig. 5 Vehicle detection flowchart: captured image → feature extraction → YCbCr color transformation → support vector machine classifier → sliding window to detect vehicle → detected vehicle image



Fig. 6 HOG features extraction

Fig. 7 Histogram graph

Fig. 8 Binary spatial graph

5.6 Support Vector Machine

A support vector machine (SVM) classifier is a machine learning approach for separating two classes [11]. The SVM classifier uses a set of labeled training data for each category and is used to classify vehicles. The algorithm constructs a hyperplane in N-dimensional space that classifies the data points. The large-margin separator is a supervised learning method formulated to solve classification problems. The SVM separates positive and negative samples with a hyperplane placed so that the margin between the two classes is close to maximum. Training an SVM involves selecting the support vectors, the discriminative vectors from which the hyperplane is estimated.

5.7 Heat Map Function

In a given image, overlapping detections occur for each of the two vehicles, and two frames exhibit a false positive detection in the center of the road. We therefore build a heat map that combines overlapping detections and removes false positives; for this purpose, a heat map with a higher threshold limit is used.
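A minimal heat-map sketch for merging overlapping detections and suppressing false positives is shown below, using SciPy's connected-component labelling; the threshold value is an assumption.

import numpy as np
from scipy.ndimage import label

def heatmap_boxes(frame_shape, detections, threshold=2):
    # Accumulate detection boxes, threshold the heat map, and label the remaining blobs.
    heat = np.zeros(frame_shape[:2], dtype=np.float32)
    for (x1, y1, x2, y2) in detections:          # each detection is a window that fired
        heat[y1:y2, x1:x2] += 1.0
    heat[heat <= threshold] = 0                  # drop weakly supported (false) detections
    labels, n_cars = label(heat)
    boxes = []
    for car in range(1, n_cars + 1):
        ys, xs = (labels == car).nonzero()
        boxes.append((xs.min(), ys.min(), xs.max(), ys.max()))
    return boxes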

6 Experimental Result

6.1 Lane Vehicle Detection for Video Frames from Xi’an City
Dataset

The present section details the experimental results of our lane vehicle detection
method with two sets of various video frames obtained from the Xi’an city dataset.
Frames in this dataset have shadows from trees and cracks on the surface of the
roads. Figure 9a, b, c, d shows some sample frames marked with lanes and vehicles
for the dataset. When all frames in the dataset are processed, we see that our holistic
detection and tracking algorithm has 95.83% accuracy in detecting the left lane and
vehicle.
The proposed sliding window model was tested using a dataset with different
driving scenes to check the adaptiveness and effectiveness. The results showed that
the proposed sliding window algorithm can easily identify the lanes and vehicles
in various situations and it is possible to avoid wrong identifications. In analyzing
the parameters, different window sizes were found to make improvements on the
performance of lane and vehicle detection.

6.2 Time Calculation of Xi’an City Database

To assess the computational complexity of the proposed hybrid lane vehicle detection
and tracking algorithm, we first computed the time required to fully process a single
frame of size (1280 × 720). For a (1280 × 720) frame, processing time was found to
be around 3 to 4 s/frame. To operate in real time, time for calculation is an important
parameter (Table 1).

Table 1 Time calculation


Images Computation time (seconds)
Figure 9a 3.64
Figure 9b 3.75
Figure 9c 3.79
Figure 9d 3.45

Fig. 9 a Output frame of Xi’an city database. b Output frame of Xi’an city database. c Output
frame of Xi’an city database. d Output frame of Xi’an city database

Table 2 Accuracy calculation comparison

Performance (%)      Dataset
Total frames         975
MLD                  1.8
ILD                  2.37
Accuracy             95.83

Table 3 Accuracy comparison

Source                 Accuracy (%)
Literature [8]         93
Literature [10]        95.35
Literature [13]        94.05
Proposed algorithm     95.83

6.3 Accuracy Formula


Missed Lane Vehicle Detection, MLD = (MD/N) × 100%
Incorrect Lane Vehicle Detection, ILD = (ID/N) × 100%
Detection Rate, DR = (C/N) × 100%

where MD denotes the number of missed detections, ID the number of incorrect detections, C the number of correctly detected images in the dataset, and N the total number of dataset images.
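As a rough worked check of the detection rate formula, using the figures reported in Sect. 6.4 (C = 934 correctly identified frames out of N = 975 tested frames):

DR = (934/975) × 100% ≈ 95.8%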

6.4 Accuracy Calculation

The GTI vehicle image dataset comprises 8792 vehicle images and 8968 non-vehicle images that were used to train the SVM classifier, and its accuracy was calculated as 99%. In this paper, different scenes are selected from the video as samples to test the accuracy. A sequence of 975 frames was tested, 934 lane vehicle frames were correctly identified, and 95.83% accuracy was obtained by the curve fitting model (Tables 2 and 3).

7 Conclusion and Future Work

Lane vehicle detection and tracking is an important application for reducing the number of accidents. The proposed algorithm was tested under different conditions to make the transport system strong and effective.

As in the case of lane detection, we described and implemented the HLS color
space and edge detection by using the Sobel filter. Then, we analyzed the curve fitting
algorithm for efficient lane detection.
For vehicle detection and tracking, support vector machine classification and sliding window techniques were applied. For our dataset, the accuracy was calculated as 95.83%, and the computation time of the algorithm was 3–4 s/frame.
In the future, we will improve the lane and vehicle detection system by reducing the computation time of the proposed algorithm, so that the detected lanes and vehicles can be obtained efficiently in real time. This algorithm can be further developed for self-driving vehicles.

References

1. Marzougui M, Alasiry A, Kortli Y, Baili J (2020) A lane tracking method based on progressive
probabilistic Hough transform. IEEE Access 8:84893–84905, 13 May 2020
2. Xuan H, Liu H, Yuan J, Li Q (2018) Robust lane-mark extraction for autonomous driving under
complex real conditions. IEEE Access, 6:5749–5766, 9 Mar 2018
3. Xiong H, Yu D, Liu J, Huang H, Xu Q, Wang J, Li K (2020) Fast and robust approaches for lane
detection using multi-camera fusion in complex scenes. IET Intell Trans Syst 14(12):1582–
1593, 19 Nov 2020
4. Wang X, Yan D, Chen K, Deng Y, Long C, Zhang K, Yan S (2020) Lane extraction and
quality evaluation: a hough transform based approach. In: 2020 IEEE conference on multimedia
information processing and retrieval (MIPR), 03 Sept 2020
5. Li J, Shi X, Wang J, Yan M (2020) Adaptive road detection method combining lane line and
obstacle boundary. IET Image Process 14(10):2216–2226, 15 Oct 2020
6. Stević S, Dragojević M, Krunić M, Četić N (2020) Vision-based extrapolation of road lane lines
in controlled conditions. In: 2020 zooming innovation in consumer technologies conference
(ZINC), 15 Aug 2020
7. Wang X, Qian Y, Wang C, Yang M (2020) Map-enhanced ego-lane detection in the missing
feature scenarios. IEEE Access 8:107958–107968, 8 June 2020
8. Wang H, Wang Y, Zhao X, Wang G, Huang H, Zhang J (2019) Lane detection of curving
road for structural high-way with straight-curve model on vision. IEEE Trans Veh Technol
68(6):5321–5330, 26 Apr 2019
9. Vatavu A, Danescu R, Nedevschi S (2015) Stereovision-based multiple object tracking in traffic
scenarios using free-form obstacle delimiters and particle filters. IEEE Trans Intell Trans Syst
16(1):498–511
10. Lim KH, Seng KP, Ang LM et al (2019) Lane detection and Kalman-based linear parabolic lane
tracking. In: International conference on intelligent human-machine systems and cybernetics,
pp 351–354
11. Kang DJ, Choi JW, Kweon IS (2018) Finding and tracking road lanes using line-snakes. In:
Proceedings of the conference intelligent vehicles, pp 189–194
12. Wang Y, Teoh EK, Shen D (2014) Lane detection and tracking using B-snake. Image Vis
Comput 22(4):269–280
13. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
14. Zhang X, Huang H (2019) Vehicle classification based on feature selection with anisotropic
magnetoresistive sensor. IEEE Sens J 19(21):9976–9982, 15 July 2019, 1 Nov 2019
15. Gopalan R, Hong T, Shneier M et al (2019) A learning approach toward detection and tracking
of lane markings. IEEE Trans Int Transp Syst 13(3):1088–1098
A Survey on Automated Text
Summarization System for Indian
Languages

P. Kadam Vaishali, B. Khandale Kalpana, and C. Namrata Mahender

Abstract Text summarization is the process of finding specific information after


reading the document text and generating a short summary of the same. There are
various applications of text summarization. It is important when we need a quick
result of information instead of reading the whole text. It has become an essen-
tial tool for many applications, such as newspaper reviews, search engines, market
demands, medical diagnosis, and quick reviews of the stock market. It provides
required information in a short time. This paper is an attempt to summarize and
present the view of text summarization for Indian regional languages. There are two
major approaches of automatic text summarization, i.e., extractive and abstractive
that are discussed in detail. The techniques for summarization ranges from struc-
tured to linguistic approach. The work has been done for various Indian languages,
but they are not so efficient at generating powerful summaries. Summarization has
not yet reached to its mature stage. The research carried out in this area has experi-
enced strong progress in the English language. However, research in Indian language
text summarization is very few and is still in its beginning. This paper provides the
present research status or an abstract view for automated text summarization for
Indian languages.

Keywords Automated text summarization · Natural language processing (NLP) ·


Extractive summary · Abstractive summary

P. K. Vaishali (B) · B. K. Kalpana · C. N. Mahender


Department of Computer Science and I.T, Dr. Babasaheb Ambedkar Marathwada University,
Aurangabad, Maharashtra, India
C. N. Mahender
e-mail: cnamrata.csit@bamu.ac.in


1 Introduction

The need for automatic summarization increases as the amount of textual information
increases. Unlimited information is available on the Internet, but sorting the required
information is difficult. Automated text summarization is the process of developing
a computerized system that has the ability to generate an extract or abstract from
an original document. It presents that information in the form of a summary. The
need for summarization has increased due to the unlimited number of sources. Summarization is useful in information retrieval, for example for news article summaries, email summaries, mobile messages, business and office information, and online search. There are numerous online summarizers accessible, such as Microsoft News, Google, and Columbia Newsblaster [1]. For biomedical summarizing, BaseLine, FreqDist,
SumBasic, MEAD, AutoSummarize, and SWESUM are utilized [2]. Online tools
include Text Compacter, Sumplify, Free Summarizer, WikiSummarizer, and Summa-
rize Tool. Open-source summarizing tools include Open Text Summarizer, Clas-
sifier4J, NClassifier, and CNGL Summarizer [3]. As the need for knowledge in
abstract form has grown, so has the necessity for automatic text summarization.
The first summarizing method was introduced in late 1950. The automatic summa-
rizer chooses key sentences from the source text and condenses them into a concise
form for the general subject. It takes less time to comprehend the information of a
huge document [4]. Automatic text summarization is a well-known application in
the field of Natural Language Processing (NLP). The majority of the work in this is
focused on sentence extraction and statistical analysis. However, recent study trends
are focusing on cue phrases and discourse structure. Text summarizing is widely clas-
sified into two types: extractive summarization and abstractive summarization. The
extractive approach takes essential lines or phrases from the original text and puts
them together to provide a summary that retains the original meaning [5]. Reading and
comprehending the source text is required for abstractive summarization. It employs
linguistic components and grammatical rules from the language. The abstractive
system can produce additional sentences, which improves the summary’s quality or
standard.

1.1 Need of Automatic Text Summarization in NLP

Manual summarization of the large text document is a difficult task. It also requires
more time for the summary generation. A text summarizer is an essential tool for
understanding the text and then generating the summary. For this reason, an automatic
summarizer tool is very much required to provide a quick view as a concise summary.
It is the need of the current era of information overload. The automatic summarizer
converts a large document text to its shorter form or version by maintaining its overall
content by its meaning.

1.2 Challenges in Abstractive Summarization

The purpose of abstract summarization in Natural Language Processing is to provide


a short summary of a source text. The representation of the summary with suitable
linguistic components is the most difficult aspect of abstractive summarization. The
development of the structure of sentences is required with the help of appropriate
words and phrases in order to generate accurate meaning. However, representing
such a big volume of text is a challenging task and a constraint. It is feasible to
create proper representations of significant items with the help of linguistic norms
and expertise. However, in practice, language has been employed in a variety of ways
and is reliant on domain semantics in general.

1.3 Challenges in Extractive Summarization

Extractive text summarization is used to choose the most important sentences from
the original source. The relevant sentences are extracted by combining statistical and
language-dependent characteristics of sentences. Extractive summaries are chosen
in most instances around the world since they are simple to implement. The problem
with extractive systems is that the summaries are long and may contain information
that isn’t necessary for the summary. The crucial information is dispersed across the
document or in several text sections [6].

2 Literature Survey

To study automatic text summarization systems, a survey of the past literature was carried out to gain specific knowledge and to identify the scope of applications. Table 1 gives a brief overview of the past literature.

2.1 Types of Summarization

Different forms of summaries are required for different applications, and summarizer systems can be classified according to the type of summary required. There are two types of summarizer systems: extractive and abstractive. The table below summarizes the key concepts of extractive and abstractive summarization in brief (Table 2).
In addition to extractive and abstractive, there are various other types of summaries
that exist. Different summarization methods are used based on the type of summary
Table 1 Automatic text summarization systems for Indian languages

Author | Trend of research and language | Technique/Methodology | Dataset and features | Lacuna | Outcomes
Sivaganesan et al. [7] Social networks Interest based parallel Interactive behavior of Parallelism in large Social influence analysis
influence analysis algorithm, semantics the user is weighted networks is difficult algorithm enables
structure based, network, dynamically task identifing influential
partitioned graph with users, implementing the
page rank machines with CPU
architecture and
community structure
Valanarasu et al. [8] Summarization of social Prediction of personality Job applicants data, If job applicants are Digital footprint used for
media data for of job applicants, Naïve collection of the various non-social media prediction of people
personality prediction Bayes, and SVM dataset from different users, proposed model through communication,
using machine learning probability prediction social media sites cannot be used sentiments, emotions,
and A.I models and expectations to their
data
Sinha et al. [9] Extractive mutidocument Single doccument, Data set with 100 Lack of standard ROUGE-1 and
summarization sentence encoding, document sets, each set Malayalam NLP tools, ROUGE-2 based
(Malyalum) TextRank, MMR, with three news articles, problem in multi evaluation calculated at
sentence scoring TF-idf, Word2Vec and document 0.59, 0.56, 0.57% for
aalgorithm Smooth Inverse summarization Precision Recall F-Score,
Frequency SIF, respectively
TextRank
Malagi et al. [10] Survey on automatic text Extraction and Own data sets. Sentence Lack of It is an effort to bridge
summarization abstraction, LSA, HMM, scoring features multi-document the gap in researches in
SVM, DT models, summarizers due to the development of text
clustering algorithms tools and sources summarizers
Manju et al. [11] Abstractive text Abstractive machine Structure, semantic Predefined structures Semantic approach
summarization for learning approaches, features, word may not result in a improves better analyzed
sanskrit prose graph-based method signicance, compounds coherent or usable summary
and sandhis, verb usage summary
diversity
Verma et al. [12] Extractive and Stop words list, stemmer, 100 news articles, NLP Unavailability of NLP tools are essential
abstractive NER sentiment analyzer, tools as stemmer, PoS resource for Language for summarizing the text
summarization methods wordNet, word vector, tagger, parser, named understanding and accurately
and NLP tools for Indian segmentation rules, entity recognition generation
languages corpus system, etc
Mamidala et al. [13] Automatic text Extraction and Own datasets, sentence Extractive summaries Combination of the
summarization abstraction, Text Rank length, title similarity, are not convenient, preprocessing and
techniques, text mining, Algorithm, TF and IDF, semantic similarities in abstractive sometimes processing techniques
OCR, K-Nearest sentences, ANN, Fuzzy not able to represent could give good model

Neighbor and Naïve logic meaning for all relevant features


Bayes Classifier,
Sarkar et al. [14] A light-weight text Extraction, extrinsic Own corpus of Domain knowledge A ROUGE-L F1-score of
summarization system evaluation techniques, evidence-based medicine essential 0.166
for fast access to medical MMR with 456 queries.
evidence (English) similarity-based and
structural features
Bhosale et al. [15] Automatic keyword Keyword extraction Online e-newspaper Limited to Marathi Average article length
extraction for algorithm, article, highest scored language calculated at average of
e-newspaper text summarization module words 30% to 40% size of
(Marathi) article
Rathod [16] News articles Extractive, Text rank for Own collection of news Language specific File1 Score is 0.84
summarization (Marathi) sentence extraction, article, similarity-based domain dependent File 2 Score is 0.56
Graph-based ranking features Average ROUGE-2 score
model 0.70
Mohamed et al. [17] Document clustering LSA, Multiple document Own database, word Language specific Clustering of Tamil text.
(Tamil) summarization, weight, sentence feature, Gives good results to
similarity, clustering length, position, generate cohesive
centrality, proper nouns summaries
Dalwadi, et al. [18] Text summarization Extractive and Various own designed Current systems not so Study concludes most
using fuzzy logic and abstractive document dataset efficient to produce researchers used
LSA summarization summary rule-base approaches
techniques
Kanitha et al. [19] Comparison of extractive Word and phrase Own datasets, sentence Comparin manual Domain independent
text summarization frequency algorithm, ranking methods summary with generic summary. LSA
models Machine learning, machine sumary not based systems
HMM, Cluster-based appropriatt summarize the large
algorithm datasets within the
limited time
Gaikwad et al. [20] Text summarization Abstractive as well Researcher used own Aabstraction requires Study gives all about text
overview for Indian extractive approaches datasets, news articles, more learning and summarization. with its
languages story docum-ents, reasoning importance
linguistic, statistical
features
Sarda et al. [21] Text summarization Neural network model, Own document Difficulties in Neural Numerical data feature
using neural networks back propagation collections, sentence network training, and rhetorical structure
and Rhetorical structure technique, rhetorical ranking, clustering theory helps to select
theory structure theory highly ranked summary
sentences
Gulati et al. [22] Study of text Machine learning Most of the researchers Separation of Two main techniques
summarization techniques, text mining used their own collection important contents extraction and
techniques algorithms and semantic of text corpus as from text is difficult abstraction studied for
technologies database text summarization
Ragunath et al. [23] Ontology based Concept extraction Own database collection, Genre specific Accuracy is calculated at
document summarization algorithm, ontology Domain specific features 87.03%
model
Deshmukh et al. [24] Query Dependent Multi document, using News document dataset, Issues regarding Study gives all detail on
Multi-Document Feature based and Clustering K-means, i.e., limitations of feature multi-document

Summarization Cluster based Method Hierarchical, partitioned and clustering summarization


algorithm
Patil et al. [25] Text summarization Summarization with Own database. Title, Transform-ation of Fuzzy logic improves
using fuzzy logic feature extraction, use of Sentence length, osition, knowledge base into quality of summary.
fuzzy rule sets numerical data, Term fuzzy rule set is Proposed model given
weight, sentence difficult task better results as
similarity features compared to online
summary
Babar et al. [26] Text summarization Extractive Own database, direct The focus of this paper Precision of fuzzy based
using Fuzzy Logic and summarization Feature word matching and is narrow summary is 86.91%
LSA vector algorithm, Fuzzy, sentence feature feature average recall is 41.64%
Inference model score average f-measure is
64.66%
Gupta [27] Survey on summarizer Weight learning Topic identification, Lack of techniques of Study observed research
for Indian languages algorithm and regression, statistical and language text Summarization on summarization is at
(Punjabi) dependent features initial state for Indian
languages
Deshpande et al. [28] Text summarization Extractive, Own collection sentence Lack of simplification Result compared using
using Clustering Multi-document scoring, document on technique for large precision, recall and
summarization, clustering by cosine and complex sentences F-measure. Clustering
document, sentence similarity reduces redundancy
clustering by K-means
Dhanya et al. [29] Comparison of text Extractive, Tf-Idf, Own collection of Same set of sentences Feature selection is
summarization technique sentence scoring, documents, LSW in English are used for important in summary
for eight different graph-based sentence similarity weight, comparing all the generation
languages weights sentence score, features methods
Dixit et al. [30] Automatic text Feature based extraction 30 documents from news The system is tested 81% resemblance with
summarization using of important sentences based URLs. compared only with 30 news human summ-ary. And
fuzzy logic using fuzzy logic, with Copernic and MS document similarity in sentence
sentence scoring, fuzzy Word 2007 summarizer position has got 79%
inference rule resemb-lance
Prasad et al. [31] Feature based text Extraction, sentence Own collected dataset, Limited dataset or Module with 9 and 5
summarization scoring, fuzzy algorithm, utilizes a combination of documents features has better
feature decision module nine features to achieve accuracy for precision,
feature scores of each recall and f-measure as
sentence compared to MS—Word
Jayashree et al. [32] Text summarization Extractive, key word Database obtained from Requirement of human Machine summary
using sentence ranking based summary Kannada Webdunia summary from expert compared with Human
(Kannada) news articles. GSS summary average at
coefficients and IDF, TF 0.14%, 0.11%, and
for extracting key words 0.12% for sports,
Entertainment, Religious
article respectively
Siva Kumar et al. [33] Query-based summarizer Multi-document, Newswire articles from Need of simplification Summary can be
topic-driven summarizer AQUAINT-2 IR Text techniques for very Evaluated using N-gram

sentence similarity, word Research Collections, complex d large Co-occurrences


frequency, VSM model TAC 2009 datasets sentences
Das et al. [34] Opinion summarization Extractive, single Own dataset, Theme Issue related to Result calculated at
(Bengali) document, theme identification using sentence ordering. It is precision, recall and
clustering, relational lexical, syntactic, important in F-score which is 72.15%,
graph representation discourse level features summa-rization 67.32%, and 69.65%
Agrawal et al. [35] Resources and Corpus development for Own developed corpora More language It introduced different
techniques development multi-lingual and using news articles. expertise is the techniques of corpus
for Marathi, Hindi, Multi-document Linguistic features requirement development
Tamil, Gujarati, Kannada summarization

Table 2 Comparison of extractive and abstractive technique [15, 20, 32]


Extractive technique Abstractive technique
Summary is a collection of extracted sentences Summary is a collection of meaningful phrases
or sentences
The extracted sentences follow the order in For the summary, new sentences or
which they have appeared in the text paraphrases are generated
It is unnecessary to develop domain knowledge It is necessary to develop domain knowledge
and features and features
It produces a summary with specific important It produces a summary with new sentences
sentences as result showing the theme of the source as a result
It is easier to produce expected results Difficult to achieve the desired results
It has a great demand for early research It has great scope in the present NLP
applications
Results are based on statistical approach Results based on linguistic and semantic
approach
It does not use a framework It uses encoder and decoder frameworks
Most of the work has been based on sentence Current research is underway to use cue
extraction and statistical analysis phrases and discourse structure
It extracts important sentences, phrases from It uses linguistic components, grammatical
the text and groups them to produce a summary rules, and significance of the language to write
the summary
It does not need interpretation It need study and analysis for the text
interpretation
It does not consist reading and understanding It consists reading and understanding of the
of the text text for its meaning
Unable to generate new sentences Ability to generate new meaningful sentences
Generated summary is not so standard it It raises the quality of the summary by
consists repeated sentences reducing sentence redundancy
The issue with extraction is coherence The issue with abstraction is the separation of
main content from the text

and applications. The below table shows the classification of summarization systems
by their categories (Table 3).

2.2 Observed Techniques for Feature Identification

Text summarizers identify and extract key sentences from the source and group them properly to generate a concise summary. A list of features is required to select content for analysis and to better understand the theme. Some of the features used to select the important, meaning-carrying content from the text are given in the table below (Table 4).

Table 3 Text summarization classification [13, 20, 28]


Content Type Scope of the summary
Technique Supervised Training data or predefined datasets are
needed for selecting the main contents from
the document
Unsupervised No need of training data. System
automatically summarize the text
Approach Statistical It counts the sentence weights based on
various parameters. Tf-Idf, term frequency,
word count etc
Linguistic It is related to words lexical, semantic, and
syntactic features. Word dictionary, POS
tagger, word pattern, n-grams are used for
lexical analysis of words
Machine learning It has used training datasets. It is based on
linguistic features. It finds the relevance of
the sentence in the summary. Naive Bayes,
Decision Trees, HMM, Neural Networks,
SVM, etc., are used to extract relevant
sentences
Hybrid Combination of the features of statistical,
lexical, and machine learning based models
Summary information Extractive Summary consists of extracted sentences
from the source. Methods use text mining
approaches
Abstractive The summary consists of the overall
meaning or theme from the source and is
presented with new sentence generation. It
uses natural language generation tools to
derive summary
Real-time It produce a relevant real time summary.
When new contents are added in the source
summary is updated by the system
Details Informative Concise information given to the user as a
summary
Indicative Provides the main idea and a quick view of
a lengthy document
Contents Generic Summary is subject independent or generic
in nature
Query-based Summary is a result of some question
Limitations Domain/genre dependent It only accepts special input like newspaper
articles, stories, manuals, medical reports.
Summary is limited to that input
Domain independent It can accept different type of text for
summary generation
Input Single document It involves summarization of single
document
Multi-document Several documents are used to summarize
at a time
Language Mono-lingual Input documents only with specific
language and output is also based on that
language
Multi-lingual It accepts documents as an input with
different languages and generate summary
in different languages

2.3 Observed Preprocessing Methods for Text Summarization

Preprocessing is the process of performing basic operations to prepare and simplify the data for further processing. In this step, unstructured data are transformed into a structured form according to the needs of the summarization application. The table below gives an overview of the preprocessing techniques widely used over the last few years (Table 5).

2.4 Observed Methods for Text Summarization

Different types of methods are implemented for summarizer systems that are capable of identifying and extracting the important sentences from the source text and grouping them to generate the final summary. Tables 6 and 7 provide the important methods of extractive and abstractive summarization, respectively.

3 Dataset

From the early days of summarization, most of the work has been done in English. A number of standard datasets are available in English for research, such as DUC (Document Understanding Conference), CL-SciSumm (Computational Linguistics scientific document summarization), TAC (Text Analysis Conference), and the TIPSTER text summarization evaluation conference (SUMMAC). These datasets are used to test the language and perform experimental research. In the case of Indian languages, however, there are no proper datasets available to researchers. Most of the data are collected from newspapers and medical documents; another source is datasets collected by the researchers themselves in the respective languages. The corpora were designed by the researchers on the basis of the specifications outlined for each system.

Table 4 Text summarization features [19, 20, 30]


Feature Description
Term frequency The number of repeated words in the sentence
Word location The find importance of the word by its position
Title word Identification of title or theme of the source
Sentence location First and last position of sentence in a paragraph is important
to be included in summary
Numerical data Presence of numerical data in the sentence
Sentence to sentence similarity For each sentence S, the similarity between S and every other
sentence is computed by the method of token matching
Sentence length It measures the size of sentences, long or very short sentences
Cue phrases Indicative words that show positive or negative sense
Upper-case word Sentences containing acronyms or proper names are included
in summary. Some languages are exception of this
Keywords Most important words from the source document
Similarity Ratio It is the similarity between the sentence and the title of the
document
Clauses The key textual elements present in the source
Paragraphs Each paragraphs is used to discuss single point
Sentence weight The ratio of the size of paragraphs
Proximity Determining factor in the formation of relationships between
entities
Sentence boundary A dot or comma or semicolon is the indicator of end of
sentence
Stop words Frequently occurring words that do not affect meaning of the
sentence
Proper noun A lexicon representing the name of a person or a place or an
organization
Thematic words Domain specific words with maximum possible relativity in
sentence
Term weight The ratio of summation of term frequencies of all terms in a
sentence over the maximum of summation values of all
sentences in a document
Information density Factors for determination of the relevance of sentences and
their meaning
Font based feature Words appearing in upper case, bold, italics or underlined
fonts are usually important
Biased word feature A word in a sentence is from biased word list, then that
sentence is important. These are domain specific words

Table 5 Preprocessing techniques for text summarization [19, 29]


Procedure Purpose
Stop word removal Removal of frequently occurring words that has less linguistical
importance in the sentence
Text validation Confirmation of the text for a required language script, correct
grammatically in spelling
Sentence selection Selection of sentences for generation of the summary
Sentence segmentation Sentences are broken out into individual words
Term weight A statistical measure for the weight of the word considered for
summary
Word frequency Counting of the word for number of occurrences in the sentence
Stemming Removal of the suffixes from the word to get the stem or base word
Sentence tokenization Paragraph text is split into a number of individual sentences
Word tokenization Sentences are split into individual tokens or words
Lemmatization Removing of suffixes to generate base word or lemma
Morphological analysis Investigation of the structure of the language or format by
linguistical, semantical aspects of the language
Text normalization Simplification of text by stemming or lemmatization for its quality
Paragraph segmentation Paragraph texts are divided into a number of individual sentences
POS tagging Labeling of the words by its part-of-speech
Chapter segmentation It separates or cuts the text document into number of chapters
Set of proper nouns A collection of all the pronouns extracted from the source
Bag of words/tags [b] A vector space model in which each sentence is described as a token
and each appearance of a word is counted regardless of its order

4 Proposed Methodology

Through the literature review, it is observed that various types of methodologies are useful for, or followed in, the development of text summarization systems. Figure 1 shows a general architectural view of text summarization. Most systems follow some important steps to achieve the target summary after selecting the textual content.

4.1 Major Steps for Text Summarization

Text summarization can be done by the following steps, as shown in Fig. 1:


(a) Input Text Document—In this, input source document is given as an Input to
the system.

Table 6 Extractive text summarization methods [20, 23, 29]


Technique Description Features Example
Tf-Idf model It is based on term Term frequency, A document have 100
frequency and inverse inverse document words in this the word
document frequency frequency count for cat appears 3 times.
the sentence scoring The term frequency
(i.e., tf) for cat is then
(3/100) = 0.03. for 10
million documents and
the word cat appears in
1000 times. Then, Idf
= log (10, 000, 000
/1000) = 4. Thus, the
Tf-idf product is 0.03
* 4 = 0.12
LSA (Latent semantic Semantic Semantic features Useful for selection of
analysis) method representation of sentences on the basis
terms, sentences, or of their contextual use
documents
Neural network-based Summary by selecting Linguistic features to Create artificial
model the most important generate a semantic networks
words from the input grammatical have been modeled
sentence summary. It requires using graph theory
advanced language
modeling techniques
Fuzzy logic based Fuzzy model, fuzzy The fuzzy rules are in Model are useful to
Model[d] rules and, Triangular in the form of determine whether
membership function IF–THEN. It fuzzifies sentence is important,
at Low, Medium, and unimportant or average
high value
Query-based Model Sentences in a given Sentence features Used in frequency
document are scored extraction counts of terms
based on the query
Machine learning Training the machine Linguistic and Useful naive bayes
Model to learn using statistical features for algorithm
experience relevance of sentence
Graph theoretic model It is based on the Semantic relationship Useful for knowledge
identification of the or features representation
themes
Cluster-based model Document clusting Document clustering Cluster models
based model to generate a generally used for
meaningful summary grouping and
classification. like
k-means algorithm
Statistical-based Words, sentences are Statistical features, It is useful to find the
counted for number of word, phrase, concept relevance
occurrences keywords, and
sentence length

Table 7 Abstractive text summarization methods [20]


Techniques Description Features Advantages
Machine learning A popular method it Linguistic and It has a predefined
is to train the statistical features training dataset
machine by
experience
Topic modeling The important Statistical and Useful for discourse
information in the structure features like segmentation
text is identified position, cue phrases,
word frequency
Rule-based method This method is based Linguistic features Rule sets for correct
on grammar rules and like part-of-speech, identification of word
predefined data sets suffix removal, etc type
Fuzzy based It is based on Fuzzy inference Used as a semantic
properties of a text as rules, sentence analysis model
similarity to the title length, and keyword
similarity
Neural network model Training the neural Sentence scoring Used as vector space
networks to learn the model
types of sentences
that should be
included in the
summary
Ontology-based Domain specific It gives a cosine Useful for
systems designed by distance between the identification of the
domain experts feature vectors of the word which has high
sentence and its weight for a particular
category domain
Tree based It uses a dependency It creates nodes of Easy to generate
tree for important sentences summary
representation of text for the given text
document
Template based Template are used the Linguistic features or It enhances the quality
representation of the extraction rules of the summary
whole document matched to identify
text that mapped into
template slots
Lead and body phrase It is based on the Semantic features It is easy to identify
operations of phrases used to rewrite the the important
that have the same lead sentence sentences
syntactic head chunk
in the lead and body
sentences
Information-Item-based It is an abstract Linguistic features to System produces
method representation of generate abstract short, coherent,
source documents summary content rich summary
Multimodal semantic It is the semantic Semantic features, Represent the contents
model model it captures useful to generate an of multimodal
concepts or their abstract summary documents
relationship
Semantic graph based It summarizes a Semantic features Used for single
method document by creating document
a rich semantic graph summarization
(RSG)
Query-based It generate summary Sentence scoring Useful to generate
of text based on the based on the precise summary
query frequency counts

Fig. 1 General architectural view of the text summarization system (pipeline: input text document; preprocessing with special characters removal, stop word removal, stemming, sentence tokenization, word tokenization, and POS tagging; processing with feature extraction, feature weight computation, theme identification, selection of salient content, and similarity computation with sentence ranking; generation of the extract or abstract summary)

(b) Preprocessing—In this step, some basic operations are performed for normalization of the text. It is important for the selection of text in a particular script.
(c) Processing—In this step, the text is processed for selection of the text and extraction of the main important sentences using the features.
(d) Theme Identification—In this step, the most important information in the text is identified, using techniques such as word position, sentence length, and cue phrases.
(e) Interpretation—This is important for abstractive summarization. In this step, different techniques are employed to form generalized content from the source text.
(f) Generate extract or abstract summary—In this step, the extract or abstract summary is generated as an output.
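The stages above can be illustrated with a very small extractive pipeline. The following Python sketch is not taken from any of the surveyed systems; the stop-word list, the plain word-frequency scoring, and the regex tokenization are simplified placeholders standing in for the richer features discussed in this survey.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "is", "a", "an", "of", "and", "to", "in", "for", "it", "are"}  # placeholder list

def preprocess(text):
    """Sentence and word tokenization with stop-word removal (no stemming here)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tokenized = [[w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOP_WORDS]
                 for s in sentences]
    return sentences, tokenized

def summarize(text, k=2):
    """Score each sentence by the frequency of its content words and extract the top-k."""
    sentences, tokenized = preprocess(text)
    freq = Counter(w for words in tokenized for w in words)        # feature weights
    scores = [sum(freq[w] for w in words) / (len(words) or 1)      # salience of each sentence
              for words in tokenized]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))             # keep original sentence order

if __name__ == "__main__":
    doc = ("Text summarization reduces a document to its most important content. "
           "Extractive methods select salient sentences from the source text. "
           "Abstractive methods rewrite the content in new words. "
           "Sentence scoring often uses word frequency and position features.")
    print(summarize(doc, k=2))
```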

5 Evaluation of the Summarization Systems

Evaluation of the summary is important to measure the performance of the text summarization system in terms of its quality. Intrinsic and extrinsic methods are widely used to measure the quality of the summaries, considered in terms of the extract and the abstract of the summary. For extractive summarizers, evaluation is done with the popular measures precision, recall, and F-score. For abstractive summaries or content-based systems, evaluation is done using ROUGE or N-gram similarity matching methods that use word or sentence similarity and compare the human-generated summary with the machine-generated summary using WordNet or synonym lists, word paraphrasers, and dictionary tools.
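To make the content-overlap idea behind such measures concrete, the snippet below computes unigram precision, recall, and F-score between a machine-generated and a human-written summary, in the spirit of a ROUGE-1 overlap. It is a simplified illustration only; the actual ROUGE tooling also supports stemming, synonym matching, and multiple references.

```python
from collections import Counter

def unigram_prf(machine_summary, human_summary):
    """ROUGE-1-style overlap: clipped unigram matches between candidate and reference."""
    cand = Counter(machine_summary.lower().split())
    ref = Counter(human_summary.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in cand)          # clipped match count
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f_score = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f_score

if __name__ == "__main__":
    machine = "extractive systems select important sentences from the document"
    human = "extractive summarizers pick the most important sentences of a document"
    p, r, f = unigram_prf(machine, human)
    print(f"precision={p:.2f} recall={r:.2f} f-score={f:.2f}")
```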

5.1 Result and Discussion

For our literature study, we used the last 10 years' research papers and their trends. From the study, it is observed that extraction methods are mostly used for summarization. Extraction is easier than abstraction, and the results are good for extractive systems. Today, extractive systems have good scope in industry, while abstraction systems are difficult to implement. The most useful features for generating a summary are word frequency, word length, sentence scoring, sentence position, keywords and phrases, semantics, and linguistic or structural features. Abstractive summaries are sometimes not clear enough to express the meaning, and their development remains a challenging task.

5.2 Observed Issues in Text Summarization for Indian Languages

Based on the research study of past literature from 2008 to 2021, it is observed that there are various challenges in the development of automated text summarization. These problems remain challenges for present technologies to resolve.
• It is a difficult task to summarize the original content by selecting significant content from the rest of the text.
• No standard metric is available for evaluation of the summary.
• Ambiguity of words is the main problem with Indian languages.
• Language expertise is essential for text analysis and interpretation.
• Language variants create complexity in understanding the meaning.
• Machine-generated automatic summaries may result in incoherence within the sentences.
• Abstractive summarizers mainly depend on internal tools to perform interpretation and language generation.
• Abstraction requires an expert system for linguistic or semantic analysis.
• Designing a generic standard for evaluating a summary is a great challenge.
• It is difficult to achieve similarity between a machine-generated summary and an ideal summary.
• It is a problem to achieve accuracy or efficiency in results due to the human interface.

6 Conclusion

Text summarization is an important NLP application. There is a demand for summarization of large amounts of information due to the Internet and related services, and it helps to search more effectively. It is a need for professionals, marketing agencies, government and private organizations, research students, and institutions. Summarization is seen to be powerful in providing the required information in a short time. This paper covers the details of both the extractive and abstractive approaches along with the techniques, features, and language specifications for Indian languages. Text summarization has its importance in the commercial and research fields. An abstract summary requires proper learning and linguistic reasoning, and implementation of abstractive systems is more complex than that of extractive systems.
Abstraction provides a more meaningful and appropriate summary based on knowledge as compared to extraction. Through the study, it is observed that very little work has been done using abstractive methods in Indian languages. Research has a lot of scope for exploring methods for more appropriate and efficient summarization. This makes the study of automated summarization exciting and more challenging.

Acknowledgements The authors would like to acknowledge and thank the CSRI DST Major Project sanctioned No. SR/CSRI/71/2015(G) and the Computational and Psycholinguistics Research Lab Facility supporting this work, as well as the Department of Computer Science and Information Technology, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra, India. The authors are also thankful to the SARATHI organization for providing financial assistance as a Ph.D. research fellowship, and express sincere thanks to the research guide Dr. C. Namrata Mahender (Asst. Professor) of the Computer Science and IT Department, Dr. B.A.M.U., Aurangabad, for providing research facilities and constant technical and moral support.

References

1. Sindhu CS (2014) A survey on automatic text summarization. Int J Comput Sci Inf Technol
5(6)
2. Reeve Lawrence H, Hyoil H, Nagori Saya V, Yang Jonathan C, Schwimmer Tamara A, Brooks
Ari D (2006) Concept frequency distribution in biomedical text summarization. In: ACM 15th
conference on information and knowledge management (CIKM), Arlington, VA, USA
3. Mashape.com/list-of-30-summarizer-apis-libraries-and-software
4. Atif K, Naomie S (2014) A review on abstractive summarization methods. J Theor Appl Inf
Technol 59(1)
5. Manne S, Mohd ZPS, Fatima SS (2012) Extraction based automatic text summarization system
with HMM tagger. In: Proceedings of the international conference on information systems
design and intelligent applications, vol 132, pp 421–428
6. Sarwadnya VV, Sonawane SS (2018) Marathi extractive text summarization using graph
based model. In: Fourth international conference on computing communication control and
automation. (ICCUBEA). 978-1-5386-5257-2-/18/$31.00 IEEE
7. Sivaganesan D (2021) Novel influence maximization algorithm for social network behavior
management. J ISMAC 03(1):60–68. http://irojournals.com/iroismac/. https://doi.org/10.
36548/jismac.2021.1.006
8. Valanarasu R (2021) Comparative analysis for personality prediction by digital footprints in
social media. J Inf Technol Digital World 03(02):77–91. https://www.irojournals.com/itdw/.
https://doi.org/10.36548/jitdw. 2021.2.002
9. Sinha S, Jha GN (2020) Abstractive text summarization for Sanskrit prose: a study of
methods and approaches. In: Proceedings of the WILDRE5–5th workshop on Indian language
data: resources and evaluation, language resources and evaluation conference (LREC 2020),
Marseille, 11–16 May 2020 European Language Resources Association (ELRA), licensed
under CC-BY-NC, pp 60–65
10. Malagi SS, Rachana, Ashoka DV (2020) An overview of automatic text summarization tech-
niques. In: International journal of engineering research and technology (IJERT), Published
by, www.ijert.org NCAIT—2020 Conference proceedings, vol 8(15)
11. Manju K, David Peter S, Idicula SM (2021) A framework for generating extractive
summary from multiple Malayalam documents. Information 12:41. https://doi.org/10.3390/
info12010041 https://www.mdpi.com/journal/information
12. Verma P, Verma A (2020) Accountability of NLP tools in text summarization for Indian
languages. J Sci Res 64(1)
13. Mamidala KK, Sanampudi SK (2021) Text summarization for Indian languages: a survey. Int J
Adv Res Eng Technol (IJARET), 12(1):530–538. Article ID: IJARET_12_01_049, ISSN Print:
0976-6480 and ISSN Online: 0976-6499
14. Sarker A (2020) A light-weight text summarization system for fast access to medical evidence.
https://www.frontiersin.org/journals/digital-health
15. Bhosale S, Joshi D, Bhise V, Deshmukh RA (2018) Marathi e-newspaper text summarization
using automatic keyword extraction. Int J Adv Eng Res Dev 5(03)
16. Rathod YV (2018) Extractive text summarization of Marathi news articles. Int Res J Eng
Technol (IRJET) 05(07), e-ISSN: 2395-0056
17. Mohamed SS, Hariharan S (2018) Experiments on document clustering in Tamil language.
ARPN J Eng Appl Sci 13(10), ISSN 1819-6608
18. Dalwadi B, Patel N, Suthar S (2017) A review paper on text summarization for Indian languages.
IJSRD Int J Sci Res Dev 5(07), ISSN (online): 2321
19. Kanitha DK, Muhammad Noorul Mubarak D (2016) An overview of extractive based automatic
text summarization systems. AIRCC’s Int J Comput Sci Inf Technol 8(5). http://www.i-scholar.
in/index.php/IJCSIT/issue/view/12602
20. Gaikwad DK, Mahender CN (2016) A review paper on text summarization. Int J Adv Res
Comput Commun Eng 5(3)

21. Sarda AT, Kulkarni AR (2015) Text summarization using neural networks and rhetorical
structure theory. Int J Adv Res Comput Commun Eng 4(6)
22. Gulati AN, Sarkar SD (2015) A pandect of different text summarization techniques. Int J Adv
Res Comput Sci Softw Eng 5(4), Apr 2015, ISSN: 2277 128X
23. Ragunath SR, Sivaranjani N (2015) Ontology based text document summarization system using
concept terms. ARPN J Eng Appl Sci 10(6), ISSN 1819-660
24. Deshmukh YS, Nikam RR, Chintamani RD, Kolhe ST, Jore SS (2014) Query dependent multi-
document summarization using feature based and cluster based method 2(10), ISSN (Online):
2347-2820
25. Patil PD, Kulkarni NJ (2014) Text summarization using fuzzy logic. Int J Innovative Res Adv
Eng (IJIRAE) 1(3), ISSN: 2278-2311 IJIRAE | http://ijirae.com © 2014, IJIRAE
26. Babar SA, Thorat SA (2014) Improving text summarization using fuzzy logic and latent
semantic analysis. Int J Innovative Res Adv Eng (IJIRAE) 1(4) (May 2014) http://ijirae.com,
ISSN: 2349-2163
27. Gupta V (2013) A survey of text summarizer for Indian languages and comparison of their
performance. J Emerg Technol Web, ojs.academypublisher.com
28. Deshpande AR, Lobo LMRJ (2013) Text summarization using clustering technique. Int J Eng
Trends Technol (IJETT) 4(8)
29. Dhanya PM, Jethavedan M (2013) Comparative study of text summarization in Indian
languages. Int J Comput Appl (0975-8887) 75(6)
30. Dixit RS, Apte SS (2012)Improvement of text summarization using fuzzy logic based method.
IOSR J Comput Eng (IOSRJCE) 5(6):05–10 (Sep-Oct 2012). www.iosrjournals.org, ISSN:
2278-0661, ISBN: 2278-8727
31. Prasad RS, Uplavikar Nitish M, Sanket W (2012) Feature based text summarization. Int J Adv
Comput Inf Res Pune. https://www.researchgate.net/publication/328176042
32. Jayashree R, Srikanta KM, Sunny K (2011) Document summarization for Kannada, soft
computing and pattern, 2011-ieeexplore.ieee.org
33. Siva Kumar AP, Premchand P, Govardhan A (2011) Query-based summarizer based on simi-
larity of sentences and word frequency. Int J Data Min Knowl Manage Process (IJDKP)
1(3)
34. Das A (2010) Opinion summarization in Bengali: a theme network model. Soc Comput (Social
Com), -ieeexplore.ieee.org
35. Agrawal SS (2008) Developing of resources and techniques for processing of some Indian
languages
36. Gupta V, Lehal GS (2010) A survey of text summarization extractive techniques. Int J Emerg
Technol Web Intell 2:258–268
A Dynamic Packet Scheduling Algorithm
Based on Active Flows for Enhancing
the Performance of Internet Traffic

Y. Suresh, J. Senthilkumar, and V. Mohanraj

Abstract Since the Internet is a large-scale network, the packet scheduling scheme must be highly scalable. This work develops a new packet scheduling mechanism to enhance the performance of today's Internet communication. The per-flow control technique has a scalability challenge because of the vast number of flows in a large network. The proposed method G-DQS is based on aggregated flow scheduling and can be used to manage huge networks. Packets are divided into two categories in this study: short TCP flows and long TCP flows. A scheduling ratio is determined based on the edge-to-edge bandwidth and the maximum number of flows that can be accepted on the path. This ratio varies dynamically and minimizes the packet drop for short flows, while the long flows are not starved. This is required for today's Internet communication, as recent Internet traffic shows a huge proportion of short flows. The simulation results show that the suggested technique outperforms other algorithms that use a constant packet scheduling ratio, such as RuN2C and DRR-SFF.

Keywords TCP flows · Internet · Scheduling · Edge-to-edge bandwidth

1 Introduction

The Internet has been transformed into the world’s greatest public network as a
result of the Web. The Web has acted as a platform for delivering innovative appli-
cations in the domains of education, business, entertainment, and medicine in recent
years. Banking and multimedia teleconferencing are just two examples of business
applications.
The practice of storing multimedia data on servers and allowing users to access
it via the Internet has become increasingly common. Other applications include
distance education provided by colleges via video servers and interactive games that
are revolutionizing the entertainment sector. The quality of Internet communication

Y. Suresh (B) · J. Senthilkumar · V. Mohanraj


Department of Information Technology, Sona College of Technology, Salem, TamilNadu, India


will have a big impact on the utility value of these applications. The network should be able to handle large volumes of this type of traffic in a scalable manner, and it must be able to meet quality-of-service (QoS) standards to enable these services.

2 Literature Review

Today’s Internet is still based on best-effort service model [1]. It represents a service
that all data packets are treated equally, and the network tries to ensure reliable
delivery. The design of such model is very simple and easily scalable, but it does not
ensure the delivery of traffic flows from end to end. In this model, during the conges-
tion of network, the FIFO drops the packets regardless its priority and transmission
control protocol (TCP) assures the retransmission for the dropped packets.
The work done in [2, 3] shows that Internet traffic characteristics have heavy-
tailed property. Several academics have attempted to increase system performance
by utilizing the considerable unpredictability of Internet traffic. The research includes
high-speed switching [4], dynamic routing [5], and scheduling [6]. The short flow
and long-flow concept have been applied to the data centers [7]. Determining whether
Internet traffic fits heavy-tailed distributions or not is difficult [8].
According to studies applied to Internet traffic, the majority of flows are short, with less than 5% of long flows carrying more than 50% of total bytes. The average size of a short flow is only 10–20 packets. In [9], short flows and long flows are referred to as mice and elephants. Long flows have been observed specifically in P2P data transfers [10] and in Web servers [11].
The short flows are primarily attributable to online data transfers initiated by interactivity [12]. The author of [13] proved that preferential treatment of short flows reduces Web latency. Most of the long flows originate from P2P applications. The MMPTCP transport protocol is introduced in [14], which benefits short flows by randomly scattering packets. The protocol then switches to multi-path TCP (MPTCP), which is a very efficient mode for long flows. FDCTCP [13] dramatically improves performance and decreases flow completion times, particularly for small and medium-sized flows, when compared to DCTCP.
As discussed above, from the recent Internet traffic measurements, it is necessary to classify Internet flows as short and long to achieve a service guarantee. The proposed scheduling algorithm applies a flow-classified service to improve the performance of the Internet.
In today's Internet routers, a basic scheduling technique known as first in first out (FIFO) or first come first served (FCFS) is extensively utilized. FIFO only provides best-effort service. Because flows are not categorized, it is not suited for delivering guaranteed service. Any special requirements of a flow, such as latency and throughput, are not taken into account by this scheduling technique. A high-throughput data transfer in FIFO, for example, can starve a low-throughput real-time link like voice traffic.

Existing algorithms such as weighted fair queuing (WFQ), deficit round robin (DRR), deficit round robin-short flow first (DRR-SFF), and least attained service (LAS) execute per-flow scheduling that involves a complicated mechanism for flow identification as well as flow state maintenance. It is impracticable to keep all of the flow state in routers with the significant expansion in Internet communication. The huge advantage of the proposed approach is that there is no need to maintain flow state in the routers.
In the RuN2C scheduling technique [15], packets with a low running number (class-1) are placed in one queue, whereas packets with a high running number (class-2) are placed in another. The class-2 packets get a chance only if the class-1 packets are completely scheduled. This creates starvation for the class-2 packets.
LAS [16] is used in packet networks to interoperate effectively with TCP to support short flows. This is accomplished by placing the first packet of a newly arriving flow, which has received the least amount of service, at the top of the queue. LAS prioritizes this packet and reduces the round trip time (RTT) of a slow-starting flow. Short-flow transfer times are reduced as a result.
The authors of [17] compared the performance of round robin-based scheduling algorithms. The various attributes of the network performance of WRR/SB are compared with WRR and PWRR, and WRR/SB outperforms the other algorithms.
The author of [18] proposed that the network time delay might be reduced by using a neural network approach. This is accomplished by collecting weights that influence network speed, such as the number of nodes in the path and the congestion on each path. The study demonstrates that an efficient technique that can assist in determining the shortest path can be used to improve existing methods that use weights.

3 Calculation of Fmax

The maximum number of active flows that can be allowed on the edge-to-edge network path is represented by Fmax. In [19], a method is described to determine the flow completion time (FCT). For a packet flow of size $S_f$, the FCT is determined using Eq. (1):

$$\mathrm{FCT} = 1.5 \times \mathrm{RTT} + \log_2\!\left(\frac{S_f}{\mathrm{MSS}}\right) \times \mathrm{RTT} \qquad (1)$$

where MSS is the maximum segment size and RTT is the round trip time.
Relating $S_f$ and FCT, the throughput $T_f$ and Fmax are determined based on the network bandwidth using Eqs. (2) and (3):

$$T_f = \frac{S_f \times (\mathrm{MSS} + H)}{\mathrm{FCT} \times \mathrm{MSS}} \qquad (2)$$
$$F_{\max} = \frac{BW_{Pr}}{T_f} \otimes F_{A_i}(d_k) \qquad (3)$$

Fmax is related to the scheduling ratio in the proposed algorithm, which is detailed in Sect. 4, for effective scheduling of packets.
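For a numerical feel of Eqs. (1) and (2), the short Python sketch below evaluates FCT and T_f for a few flow sizes. The RTT, MSS, and header-size values are invented for illustration; Eq. (3) is not evaluated here because BW_Pr and F_Ai(d_k) depend on path parameters not fixed in this section.

```python
import math

def flow_completion_time(flow_size_pkts, rtt, mss):
    """Eq. (1): FCT = 1.5*RTT + log2(S_f / MSS) * RTT, with S_f in bytes."""
    s_f = flow_size_pkts * mss
    return 1.5 * rtt + math.log2(s_f / mss) * rtt

def throughput(flow_size_pkts, rtt, mss, header):
    """Eq. (2): T_f = S_f * (MSS + H) / (FCT * MSS)."""
    s_f = flow_size_pkts * mss
    fct = flow_completion_time(flow_size_pkts, rtt, mss)
    return s_f * (mss + header) / (fct * mss)

if __name__ == "__main__":
    # Illustrative values only: 100 ms RTT, 500-byte segments, 40-byte header.
    for pkts in (10, 20, 100, 500):
        fct = flow_completion_time(pkts, rtt=0.1, mss=500)
        tf = throughput(pkts, rtt=0.1, mss=500, header=40)
        print(f"{pkts:4d} packets: FCT = {fct:6.2f} s, T_f = {tf:10.1f} B/s")
```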

4 Proposed Algorithm

The proposed algorithm captures short flows using a threshold th and considers the remaining flows as long flows. They are inserted into two queues: SFQ and LFQ. The total number of flows in SFQ is used to initialize the counter variable DC(r). Using the number of flows in SFQ and LFQ, the algorithm derives the dynamic scheduling ratio Q(r). It also determines the maximum number of flows that can be accommodated on the path using Fmax, which is compared with Q(r). To schedule the flows from SFQ and LFQ, the conditions Fmax > Q(r) and Fmax < Q(r) are tested.
Algorithm

Begin
Using a threshold th, divide the flows into short and long flows, and place them in two queues, SFQ and LFQ, respectively.
S:
• Total flows in SFQ $= \sum_{i=1}^{n} B_{\mathrm{SFQ}}(i)$
• Total flows in LFQ $= \sum_{i=1}^{n} B_{\mathrm{LFQ}}(i)$
• Counter variable initialization: $DC = \sum_{i=1}^{n} B_{\mathrm{SFQ}}(i)$
D:
• Scheduling ratio Q(r) for any round r: $Q(r) = \dfrac{\sum_{i=1}^{n} B_{\mathrm{SFQ}}(i) + \sum_{i=1}^{n} B_{\mathrm{LFQ}}(i)}{\sum_{i=1}^{n} B_{\mathrm{LFQ}}(i)}$
• Estimate Fmax using Eq. (3).
If Fmax > Q(r):
• Flows served from SFQ = Q(r)
• Flows served from LFQ = 1
• Perform DC(r) = DC(r) − Q(r)
• If DC(r) > Q(r), return to D:; else return to S: for the calculation of Q(r) and Fmax for the next round.
• When $\sum_{i=1}^{n} B_{\mathrm{SFQ}}(i) = 0$, flows served from LFQ = Q(r)
If Fmax < Q(r):
• Flows served from SFQ = Fmax, and no flow is served from LFQ
• Perform DC(r) = DC(r) − Q(r)
• If DC(r) > Q(r), return to D:; else return to S: for the calculation of Q(r) and Fmax for the next round.
End

When Fmax is greater than the number of flows to be scheduled, the algorithm schedules both short and long flows from SFQ and LFQ. When Fmax is limited, it prioritizes only short flows. The proposed algorithm works in accordance with the characteristics of the Internet, as Internet traffic exhibits short flows in a vast manner. This method provides a more reliable service than the best-effort method utilized in the Internet.
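The round-by-round bookkeeping of the algorithm can be sketched as follows. This is a simplified Python illustration, not the authors' ns-2 implementation: Fmax is passed in as a fixed value instead of being derived from Eq. (3), and flows are represented only by queue entries.

```python
def schedule_round(sfq, lfq, fmax):
    """One G-DQS-style round: compute Q(r), then decide how many short/long flows to serve."""
    if not lfq:                                   # no long flows: serve only the short queue
        return min(len(sfq), int(fmax)), 0
    q_r = (len(sfq) + len(lfq)) / len(lfq)        # dynamic scheduling ratio Q(r)
    if fmax > q_r:
        served_short = min(len(sfq), round(q_r))  # Q(r) short flows for every long flow
        served_long = 1
    else:
        served_short = min(len(sfq), int(fmax))   # limited path capacity: short flows only
        served_long = 0
    if not sfq:                                   # SFQ drained: long flows get the Q(r) share
        served_long = round(q_r)
    return served_short, served_long

if __name__ == "__main__":
    sfq = [f"s{i}" for i in range(18)]            # toy short-flow queue
    lfq = [f"l{i}" for i in range(3)]             # toy long-flow queue
    while sfq or lfq:
        ns, nl = schedule_round(sfq, lfq, fmax=10)
        nl = min(nl, len(lfq))
        del sfq[:ns]
        del lfq[:nl]
        print(f"served {ns} short / {nl} long; remaining {len(sfq)} short, {len(lfq)} long")
```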

5 Performance Analysis

The proposed scheduling algorithm’s performance is measured in Dumbbell


topology. The performance is evaluated based on the parameters like mean trans-
mission time, packet loss, and throughput. Using network simulation, the G-DQS
is compared with other algorithms such as LAS, RuN2C, and FIFO. The results
recorded from the simulation are presented and analyzed. ns-2 is used to run all of
the simulations.

Fig. 1 Dumbbell topology



In Fig. 1, R0 and R1 are the edge routers. S1–S5 are source nodes, and C1–C5 are sink nodes, which transmit and receive the packets. In our simulation, the packet size is set to 500 bytes; short flows are considered as flows with a size of 1–20 packets, and long flows as flows with a size of 1–500 packets. The transmission time, packet drop, and throughput parameters are analyzed here.

5.1 Transmission Time of Short Flows

The transmission time of short flows is depicted in Fig. 2. In comparison with FIFO, the proposed algorithm G-DQS and the other algorithm RuN2C greatly lower the transmission time of short flows. FIFO, a single-queue discipline, does not make any distinction between the two kinds of flows, which largely increases the transmission time.
In FIFO, long flows can obtain priority over short flows, increasing the transmission time as shown in Fig. 2. The proposed method gives short flows preference over long flows and reduces the mean transmission time of short flows by 30.7% in comparison with FIFO.
Fig. 2 Transmission time versus flow size

5.2 Packet Drop Analysis

Figures 3 and 4 show the packet drop for various flow sizes. They demonstrate that short flows of fewer than 25 packets do not incur packet loss when using the proposed technique, although FIFO flows of the same size do. Packet loss for short flows is lower in the proposed approach than in RuN2C due to the dynamic approach to the scheduling ratio. The proposed algorithm schedules packets from both SFQ and LFQ, whereas the RuN2C technique schedules long flows only if the short flows are completely served. As a result, in addition to short flows, long flows are serviced in G-DQS.

Fig. 3 Packet drop versus flow size

Fig. 4 Packet drop versus flow size (zoomed version of Fig. 3)

Fig. 5 Throughput versus simulation time

5.3 Throughput

Figure 5 depicts the flow throughput as the number of packets received per second. During the simulation time, it is observed that the FIFO throughput drops abruptly; long flows are penalized and starved as a result in FIFO. It also shows that the FIFO and RuN2C throughputs are not constant across the simulated duration. The proposed G-DQS has a nearly constant throughput and guarantees it.

6 Conclusion

The proposed algorithm reduces the transmission time of all flows in comparison with the other protocols. The transmission time of short flows has been analyzed, and it is found that there is no packet loss up to the threshold th. The proposed algorithm reduces packet loss, as it schedules the flows based on the Fmax of the path. Since it reduces the packet loss, the number of retransmissions decreases, resulting in a reduction of the mean transmission time. The throughput analysis has also been carried out, and the results show that G-DQS maintains an almost constant throughput and performs better than the other protocols.

References

1. Anand A, Dogar FR, Han D, Li B, Lim H, Machado M, Wu W, Akella A, Andersen DG, Byers JW, Seshan S, Steenkiste P (2011) XIA: an architecture for an evolvable and trustworthy internet. In: Proceedings of HOTNETS '11, tenth ACM workshop on hot topics in networks, vol 2
2. Bansal N, Harchol-Balter M (2001) Analysis of SRPT scheduling: investigating unfairness. In: Proceedings of SIGMETRICS 2001/Performance 2001, pp 279–290
3. Crovella M (2001) Performance evaluation with heavy tailed distributions. In: Proceedings of JSSPP 2001, 7th international workshop on job scheduling strategies for parallel processing, pp 1–10
4. Harchol-Balter M, Downey A (1997) Exploiting process lifetime distributions for dynamic
load balancing. Proc ACM Trans Comput Syst 15(3):253–285
5. Shaikh A, Rexford J, Shin KG (1999) Load-sensitive routing of long-lived IP flows. Proc ACM
SIGCOMM 215–226
6. Qiu L, Zhang Y, Keshav S (2001) Understanding the performance of many TCP flows. Comput
Netw 37(4):277–306
7. Carpio F, Engelmann A, Jukan A (2016) DiffFlow: differentiating short and long flows for load
balancing in data center networks. Washington, DC, USA, pp1–6
8. Gong W, Liu Y, Misra V, Towsley DF (2005) Self-similarity and long range dependence on the
internet: a second look at the evidence. Orig Implications Comput Netw 48(3):377–399
9. Guo L, Matta I (2001) The war between mice and elephants. In: Proceedings of ICNP 2001, IEEE international conference on network protocols, pp 180–191
10. Brownlee N, Claffy KC (2004) Internet measurement. IEEE Internet Comput 8(5):30–33
11. Bharti V, Kankar P, Setia, Gürsun G, Lakhina A, Crovella M (2010) Inferring invisible traffic. In: Proceedings of CoNEXT 2010, ACM conference on emerging networking experiments and technology, vol 22, ACM, New York
12. Chen X, Heidemann J (2003) Preferential treatment for short flows to reduce web latency
computer networks. Int J Comput Telecommun Network 41:779–794
13. Wang M, Yuan L (2019) FDCTCP: a fast data center TCP. In: IEEE international conference on computer science and educational informatization (CSEI), China, pp 76–80
14. Kheirkhah M, Wakeman I, Parisis G (2015) Short versus long flows. ACM SIGCOMM Comput
Commun Rev 45:349–350
15. Avrachenkov K, Ayesta U, Brown P, Nyberg E (2004) Differentiation between short and long TCP flows: predictability of the response time. In: Proceedings of IEEE INFOCOM, 04(2):762–733
16. Rai IA, Urvoy-Keller G, Biersack EW (2004) LAS scheduling to avoid bandwidth hogging in heterogeneous TCP networks. In: HSNMC 2004, high speed networks and multimedia communications, 7th IEEE international conference, Toulouse, France, pp 179–190
17. Balogh T, Luknarva D, Medvecky M (2010) Performance of round robin-based queue scheduling algorithms. In: Proceedings of CTRQ '10, third international conference on communication theory, reliability, and quality of service, Athens, Greece, pp 156–161
18. Zhang B, Liao R (2021) Selecting the best routing traffic for packets in LAN via machine
learning to achieve the best strategy. Complexity 1–10
19. Jiang Y, Striegel A (2009) Fast admission control for short TCP flows. In: Proceedings of the
global communications conference, 2009. GLOBECOM 2009, Hawaii, USA, pp1–6
Automated Evaluation of Short Answers:
a Systematic Review

Shweta Patil and Krishnakant P. Adhiya

Abstract Automated short answer grading (ASAG) of free-text responses is a field of study wherein a student's answer is evaluated considering the baseline concepts required by the question. It mainly concentrates on evaluating the content written by the student rather than its grammatical form. In the educational domain, assessment is indeed a tedious and time-consuming task. If the time utilized for this task is reduced, then the instructor can focus more on teaching and learning activities and can help students in their overall growth. Many researchers are working in this field to provide a solution that can assign a more accurate score to a student response, similar to the score assigned by a human tutor. The goal of this paper is to provide insight into the ASAG domain by presenting a concise review of existing ASAG research work. We have included the research work carried out using machine learning and deep learning approaches. We have also proposed our methodology to address this problem.

Keywords Automated evaluation · Short answer grading · Feature engineering · Natural language processing · Machine learning · Deep learning

1 Introduction

Research in the fields of natural language processing (NLP), machine learning, and deep learning has opened doors for providing solutions to complex problems. One such complex problem is automated short answer grading (ASAG). It is the field wherein a student's short answer, which comprises a few sentences or one paragraph, is evaluated and scores are assigned which are close enough to the grades assigned by a human evaluator.
In the education domain, along with teaching and learning, evaluation is one of the important tasks. Evaluation helps to assess student understanding of the course

S. Patil (B) · K. P. Adhiya


Department of Computer Engineering, SSBT’s College of Engineering and Technology,
Bambhori, Jalgaon, Maharashtra, India


being taught. Evaluation mainly comprises multiple choice, true or false, fill in the blanks, short answer, and essay type questions [1]. We are interested in assessing short answers which comprise 2–3 sentences or a few paragraphs and are closed ended, as this helps to analyze the overall understanding related to the course. In evaluating short closed-ended answers, students are usually expected to concentrate on including specific concepts related to the question, which ultimately helps the student to get a good score. Automated grading will also help to reduce the amount of time devoted to checking student answers and will provide students with immediate detailed feedback, which finally assists them in their overall growth. Moreover, the system will produce an unbiased score.
Example 1: What is Stack?
Model Answer: Stack is a linear data structure which has PUSH and POP operations for inserting and deleting an element in an array, respectively.
Student Response: Stack is a data structure where elements are arranged sequentially and which has majorly 2 operations, PUSH for inserting an element and POP for deleting an element from an array.
In Example 1 shown above, the underlined words are important concepts which are essential to be included in the student response for it to be evaluated positively. But many a time, it may happen that a student represents the same concepts with the help of synonyms, paraphrases, or polysemous words. So the system should be developed such that it can recognize all such surface representations and assess student responses correctly.
Our main motivations to take up this task for research are:
• To evaluate content rather than concentrating on grammar and style.
• Unbiased evaluation.
• To save the instructor's evaluation time and utilize the same for the overall progress of students.
• To provide immediate detailed feedback to students, which will help them in their future progress.
The study conducted so far has clearly shown that the problem of ASAG can be solved by three state-of-the-art methodologies: rule-based approaches, machine learning, and deep learning [1, 2]. We have majorly studied, analyzed, and reported machine learning and deep learning approaches in this paper.
The remainder of this article is organized as follows: first, we review the existing ASAG systems; then, we illustrate the general approach of ASAG and our proposed methodology; finally, we present the discussion and conclusion.

2 Related Study

Many researchers have worked on the problem of automated short answer grading and provided solutions for it by applying various feature extraction techniques and machine learning and deep learning approaches. In this study, we have surveyed automated grading via machine learning and deep learning.

2.1 Automated Grading by Applying Machine Learning Approach

The authors of [3] proposed a method in which the feature vector for a student answer was generated by utilizing the word's part-of-speech (POS) tag acquired from the Penn treebank, along with the POS tags of the preceding and next words. The term frequency-inverse document frequency (TF-IDF) value and entropy are also included in the feature vector. Finally, using an SVM classifier, the student answers were labeled as +1 or −1. The authors report an average precision rate of 68% for the proposed model.
Kumar et al. [4] proposed the AutoSAS system, which incorporates various feature vector generation techniques such as lexical diversity, word2vec, prompt, and content overlap. The authors trained all student answers using the features described above and later used random forest for regression analysis. They employed quadratic weighted kappa for calculating the level of agreement between their proposed model and human-annotated scores, which comes to 0.79. The authors tested their proposed model on the universally available dataset ASAP-SAS.
Galhardi et al. [5] presented an approach for the well-known ASAG datasets Beetle and SciEntsBank, which consist of electricity and electronics, and physics, life, earth, and space science questions, respectively. They utilized the best of distinct features such as text statistics (student answer and question length ratio, count of words, average word length, and average words per sentence), lexical similarity (token based, edit based, sequence based, and compression based), semantic similarity (LC, Lin, Resnik, Wu & Palmer, Jiang and Conrath, and shortest path), and bag of n-grams. Once the features were generated, they experimented with random forest and extreme gradient boosting classifiers. The system was evaluated using the macro-averaged F1-score, weighted average F1-score, and accuracy. They reported the overall accuracy between 0.049 and 0.083.
In [6], an automated assessment system is proposed for engineering assignments which majorly comprise textual and mathematical data. The authors used the TF-IDF method for extracting features from textual data after performing initial preprocessing such as stop word removal, case folding, and stemming. Later, they utilized the support vector machine (SVM) technique for assigning scores to student answers. They have shown an accuracy of 84% for textual data.
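As a rough illustration of the TF-IDF-plus-SVM pipelines used in [3, 6], the sketch below trains a scikit-learn linear SVM on TF-IDF vectors of a few toy graded answers. The answers, labels, and settings are invented for illustration and are not taken from the cited studies.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data: student answers labeled +1 (acceptable) / -1 (not acceptable).
answers = [
    "stack is a linear data structure with push and pop operations",
    "push inserts an element and pop deletes an element from the stack",
    "stack follows last in first out order for insertion and deletion",
    "stack is a sorting algorithm that runs in constant time",        # wrong concept
    "push deletes elements and pop inserts them into a tree",         # wrong concept
]
labels = [1, 1, 1, -1, -1]

# TF-IDF features (lowercasing, English stop words removed, unigrams and bigrams) feed a linear SVM.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2)),
    LinearSVC(),
)
model.fit(answers, labels)

test = ["stack supports push and pop operations on a linear structure"]
print(model.predict(test))   # expected label: [1]
```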

2.2 Automated Grading by Applying Deep Learning Approach

In the following, we present various existing ASAG systems based on deep learning.
Zhang et al. [7] developed word embeddings from domain-general (Wikipedia) and domain-specific (student responses) information using CBOW. Later, student responses are evaluated using an LSTM classifier.
Ichida et al. [8] deployed a measure to compute semantic similarity between sentences using a Siamese neural network which uses two symmetric recurrent networks, i.e., GRUs, due to their capability to handle the vanishing/exploding gradient problem. They also studied LSTM and showed how their GRU approach is superior to LSTM, as it has far fewer parameters to train. The system could be improved by utilizing sentence embeddings instead of word embeddings. They reported a Pearson correlation of 0.844, which is far better than the baseline approaches studied by the authors.
Kumar et al. [9] proposed a model which comprises a cascade of three neural building blocks: a Siamese bidirectional LSTM unit applied to both the student and model answers, a pooling layer based on the earth mover distance applied over the LSTM network, and finally a regression layer which comprises support vector ordinal regression for predicting scores. The evaluation of LSTM-Earth Mover Distance with support vector ordinal regression has shown an RMSE score of 0.83, which is better than the scores generated by softmax.
Kwong et al. [10] proposed ALSS, which checks the content, grammar, and delivery of speech using three Bi-LSTMs. The end-to-end attention of context is learned through a MemN2N network. The system can be enhanced by the utilization of a GAN.
The systems we studied so far used word embedding techniques, which have the limitation of context. So, the approaches in [11, 12] utilized sentence embedding techniques, skip-thought and Sentence-BERT, respectively. The authors of [11] deployed a model wherein vectors for both the student and model answers are generated using the skip-thought sentence embedding technique. Later, the component-wise product and absolute difference of both vectors are computed, and a logistic linear classifier is utilized to predict the final score. The work in [12] proposed a method that provides automated scoring of short answers by utilizing the SBERT language model. The model performs a search through all reference answers provided to the model during training, determines the more semantically close answer, and provides the rating. Its limitation is that false negative scores are also generated, which need to be checked manually, which is a very tedious job.
Hassan et al. [13] employed paragraph embedding using two approaches: (1) the sum of pretrained word vector models such as word2vec, GloVe, ELMo, and FastText; (2) utilizing pretrained deep learning models for computing paragraph embeddings, such as skip-thought, doc2vec, and InferSent, for both the student and reference answers. Once the vectors are generated, the authors used the cosine similarity metric for computing the similarity between both vectors. Yang et al. [14] utilized a deep autoencoder model which has encoding and decoding layers, wherein, in the encoding layer, student answers are represented in lower dimensions and their label information is encoded using softmax regression, while in the decoding layer, the outputs of the encoding are reconstructed.
Gong and Yao [15] and Riordan et al. [16] utilized bidirectional LSTM with an attention mechanism. In [15], word embeddings are initially fed to a CNN to extract the relevant features, which are later given as input to an LSTM layer. The hidden layers of the LSTM are aggregated either by a mean-over-time layer or an attention layer, which gives a single vector. That single vector is passed through a fully connected layer to compute a scalar value or a label. In [16], student responses and reference answers are segmented into sentences, which in turn are tokenized. Each feature is fed into a bidirectional RNN network to generate sentence vectors. On top of this, an attention mechanism is applied to each sentence vector, and final answer vectors are generated. At last, the answer vector is passed through a logistic regression function to predict the scores. Tan et al. [17] introduced an entirely new approach for ASAG by utilizing a graph convolutional network (GCN). The authors deployed a three-step process: (1) graph building: an undirected heterogeneous graph with sentence-level nodes and word bi-gram-level nodes is constructed, with edges between them; (2) graph representation: a two-layer GCN model encodes the graph structure; (3) grade prediction.
Table 1 gives a summary of the ASAG systems studied by us.

3 Methodology

3.1 General ASAG Architecture

The architecture employed by most of the ASAG systems studied so far is shown in Fig. 1. It majorly comprises four modules: preprocessing, feature engineering, model building, and evaluation.

3.1.1 Preprocessing

Though it is not a compulsory phase, some sort of preprocessing, such as stop word removal, case folding, and stemming/lemmatization, is employed in many works to extract content-rich text for generating vectors.

3.1.2 Feature Engineering

It is a task in which domain-specific as well as domain-general information is extracted and vectors are generated which can be fed into the model to produce a more accurate score for the student answer (SA). Various feature engineering techniques have been utilized in the studies carried out so far; some of them are TF-IDF, n-grams, word embedding [8], and sentence embedding [11, 12].

Fig. 1 General system architecture for automated short answer grading system

Table 1 Summary of existing ASAG systems studied
• [6], 2009. Approach: machine learning. Technique: TF-IDF and SVM. Evaluation metrics: accuracy.
• [3], 2010. Approach: machine learning. Technique: POS, TF-IDF, entropy, and SVM. Evaluation metrics: precision.
• [18], 2011. Approach: machine learning. Technique: dependency graph and SVM. Evaluation metrics: RMSE and Pearson correlation.
• [9], 2017. Approach: deep learning. Technique: Siamese Bi-LSTM. Evaluation metrics: MAE and RMSE.
• [15], 2017. Approach: deep learning. Technique: Bi-LSTM with attention mechanism. Evaluation metrics: QWKappa.
• [8], 2018. Approach: deep learning. Technique: word2vec and Siamese GRU model. Evaluation metrics: Pearson correlation, Spearman, and MSE.
• [13], 2018. Approach: deep learning. Technique: paragraph embedding. Evaluation metrics: cosine similarity, Pearson correlation, and RMSE.
• [14], 2018. Approach: deep learning. Technique: deep autoencoder grader. Evaluation metrics: accuracy and QWKappa.
• [5], 2018. Approach: machine learning. Technique: text statistics, lexical similarity, semantic similarity, n-grams, random forest, and extreme gradient boosting. Evaluation metrics: accuracy, macro F1-score, and weighted F1-score.
• [2], 2019. Approach: deep learning. Technique: Bi-LSTM with attention mechanism. Evaluation metrics: MSE and MAE.
• [7], 2019. Approach: deep learning. Technique: LSTM. Evaluation metrics: QWKappa.
• [10], 2019. Approach: deep learning. Technique: 3-attention-based Bi-LSTM and MemN2N. Evaluation metrics: Pearson correlation.
• [4], 2019. Approach: deep learning. Technique: lexical diversity, word2vec, prompt and content overlap, and random forest. Evaluation metrics: QWKappa.
• [11], 2020. Approach: deep learning. Technique: skip-thought. Evaluation metrics: Pearson correlation and RMSE.
• [17], 2020. Approach: deep learning. Technique: graph convolutional network. Evaluation metrics: QWKappa.
• [12], 2020. Approach: deep learning. Technique: Sentence-BERT. Evaluation metrics: QWKappa and accuracy.

3.1.3 Model Building

In this phase, researchers have incorporated various machine learning techniques such as SVM [3, 6, 18] and logistic regression, and deep learning techniques such as LSTM [7], Bi-LSTM [9, 10], Bi-LSTM with attention mechanism [15, 16], and many more for predicting the correct class label or the correct score for the SA.

3.1.4 Evaluation

This phase majorly contributes to computing the amount of similarity and variance between system-generated labels/scores and the labels/scores assigned by a human evaluator. Various evaluation methods employed are root mean square error (RMSE), Pearson correlation, quadratic weighted kappa (QWKappa), accuracy, etc.
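For concreteness, the snippet below computes these commonly reported agreement measures (RMSE, Pearson correlation, and quadratic weighted kappa) for a pair of toy score vectors with NumPy, SciPy, and scikit-learn. The scores are invented; real studies compute these over full test sets.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score, mean_squared_error

# Toy integer grades on a 0-5 scale: human-assigned versus system-predicted.
human = np.array([5, 3, 4, 2, 5, 1, 0, 4])
system = np.array([4, 3, 4, 1, 5, 2, 0, 3])

rmse = float(np.sqrt(mean_squared_error(human, system)))       # root mean square error
pearson, _ = pearsonr(human, system)                            # linear correlation
qwk = cohen_kappa_score(human, system, weights="quadratic")     # quadratic weighted kappa

print(f"RMSE = {rmse:.3f}, Pearson r = {pearson:.3f}, QWK = {qwk:.3f}")
```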

3.2 Proposed Methodology

Our major intention in carrying out this work is to recognize the level of semantic similarity between the student answer and the model answer. As per the study conducted, there are many ways through which semantic equivalence between terms can be recognized, such as TF-IDF, LSA, and embeddings. Much of the research work studied related to ASAG utilized word-embedding-based features for recognizing the semantic similarity between terms, but the corpus on which the word embeddings are trained usually consists of the model answers and the collected student responses, which many a time are limited in context. So, domain-specific and domain-general corpora can be utilized to train word embeddings. Even the utilization of sentence embedding techniques can overcome the problem of understanding the context and intention of the entire text.
The proposed methodology concentrates on utilizing the word2vec skip-gram model for feature extraction and a 2-Siamese Bi-LSTM with attention mechanism for predicting scores for student answers. The work will be limited to the data structures course of an undergraduate engineering program. Instead of using pretrained word embeddings, we are going to generate domain-specific word vectors by utilizing the knowledge available for data structures. Once the domain-specific word embedding is generated, word vectors for the concepts used in the reference answer and the student answer will be extracted from those embeddings, and we will feed the same to the Siamese network to predict the similarity between SA and RA.
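Only the embedding part of this plan is sketched below: a skip-gram word2vec model (gensim, sg=1) is trained on a tiny invented data-structures corpus, and an SA/RA pair is compared by the cosine similarity of their averaged word vectors. The corpus, dimensions, and the simple averaging are placeholders; in the proposed model, the word vectors would instead be fed to the Siamese Bi-LSTM with attention.

```python
import numpy as np
from gensim.models import Word2Vec

# Tiny illustrative domain corpus (real training would use a larger data-structures text collection).
corpus = [
    "stack is a linear data structure".split(),
    "push inserts an element into the stack".split(),
    "pop deletes an element from the stack".split(),
    "queue follows first in first out order".split(),
]

# Skip-gram word2vec (sg=1) to learn domain-specific word vectors.
w2v = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, sg=1, epochs=200, seed=1)

def answer_vector(tokens, model):
    """Average the vectors of in-vocabulary tokens (zero vector if none are known)."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

ra = "stack is a linear data structure with push and pop".split()          # reference answer
sa = "stack stores elements and supports push and pop operations".split()  # student answer
print(f"SA-RA similarity: {cosine(answer_vector(ra, w2v), answer_vector(sa, w2v)):.3f}")
```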
For the purpose of this research, we have created our own dataset by conducting two assignments in a class of undergraduate students. In assignment-1, ten general questions related to data structures were asked to more than 200 students, who were expected to attempt the questions in 3–4 sentences; in assignment-2, which has four application-oriented open-ended questions on real-world situations, students were asked to answer using more than 150 words. A total of about 1820 answers have been collected so far, and we have also graded the acquired answers through a human evaluator for checking the reliability of the scores predicted by the model in the near future.
Table 2 shows sample questions asked in assignment-1 and assignment-2 and sample student answers collected.
4 Discussion and Conclusion

The main objective behind carrying out this work was to study, analyze, and present the developments going on in the field of automated short answer grading related to machine learning and deep learning approaches.
We discovered that many researchers utilized only the reference answers (RA) provided by the instructor for rating the student answer (SA). So there is a chance that some of the concepts which are left out or presented in a different way in the RA and SA may lead to incorrect label/score assignment.

Table 2 Sample question and answer collected for future study

Assignment-1
Sample question: What do you understand from stack overflow and stack underflow condition?
SA1: Stack overflow - A stack overflow is an undesirable condition in which a particular computer program tries to use more memory space than the call stack has available. Stack underflow - An error condition that occurs when an item is called for from the stack, but the stack is empty.
SA2: Stack overflow condition means user cannot be able to insert or push any element as the stack is already been filled by elements i.e. if top = max − 1 then it is overflow. Stack underflow condition means there is no element in the stack as top is pointing to null.

Assignment-2
Sample question: Suppose we want to implement a navigation option in a web browser. Now we have two options for this particular purpose, a circular queue array based and doubly linked list. Which option you will select and compare both the options.
SA1: The most recommended among both of them is the doubly linked list because we can easily go back and forth with the doubly linked list as it hast 2 pointers aiming to front and previous and it will be easy to navigate back and forth which is a common feature in our file managers navigation bar whereas if we consider the circular queue array we can easily navigate in forward direction but to go back we have to cover a loop and hence it'll take more time and also will destroy the users experience. Also the circular queue array don't have end nodes hence creating an overall mess of how the navigation will be executed.
SA2: Circular Queue is a linear data structure in which the operations are performed based on FIFO (First In First Out) principle and the last position is connected back to the first position to form a circle. Where as doubly linked list is the linear data structure in which data is sequentially stored. It has one extra pointer (*prev) along with *next to point to the previous node. Hence in doubly linked list we can traverse back and forth through the list. Main purpose of the navigation option is to help user quickly switch between the pages. If it is implemented using circular queue then if the user wants to traverse to first page from somewhere in middle, he will have to first traverse till the end and then only he will be able to access the first page. Hence more time will be required. Where as if the same was implemented using doubly linked list the user could have back traversed easily to the first page, which he cannot do in circular queue. Hence navigation option should be implemented using doubly linked list.

Also, the generation of word vectors contributes highly towards assigning labels/scores to the SA as compared to model building, because a word vector captures semantic relationships, which will categorize the SA more accurately even if it does not have exactly the same terms or concepts as the RA. Therefore, there is a need for identifying textual entailment between the SA and RA and computing the correct level of similarity between them. In the near future, we will implement the proposed model and test it on the data collected by us. We will also compare the accuracy of our proposed model with already available popular ASAG systems. The proposed method mainly concentrates on implementing an assessment method for evaluating textual answers; mathematical expressions and diagrammatic representations will not be evaluated by the proposed model.
So, we can conclude that the latest advancements in the fields of natural language processing, machine learning, deep learning, and feature extraction methods will surely contribute to the domain of the short answer grading task.

References

1. Galhardi LB, Brancher JD (2018) Machine learning approach for automatic short answer grad-
ing: a systematic review. In: Simari GR, Fermé E, Gutiérrez Segura F, Rodríguez Melquiades
JA (eds) IBERAMIA 2018. LNCS (LNAI), vol 11238. Springer, Cham, pp 380–391. https://
doi.org/10.10007/1234567890
2. Burrows S, Gurevych I, Stein B (2014) The eras and trends of automatic short answer grading.
Int J Artif Intell Educ 25(1):60–117. https://doi.org/10.1007/s40593-014-0026-8
3. Hou WJ, Tsao JH, Li SY, Chen L (2010) Automatic assessment of students’ free-text answers
with support vector machines. In: García-Pedrajas N, Herrera F, Fyfe C, Benítez JM, Ali M
(eds) Trends in applied intelligent systems. IEA/AIE 2010. Lecture Notes in Computer Science,
vol 6096. Springer, Berlin, Heidelberg
4. Kumar Y, Aggarwal S, Mahata D, Shah R, Kumaraguru P, Zimmermann R (2019) Get IT scored
using AutoSAS—an automated system for scoring short answers, AAAI
5. Galhardi LB, de Mattos Senefonte HC, de Souza RC, Brancher JD (2018) Exploring distinct
features for automatic short answer grading. In: Proceedings of the 15th national meeting on
artificial and computational intelligence. SBC, São Paulo, pp 1–12
6. Quah JT, Lim L, Budi H, Lua K (2009) Towards automated assessment of engineering assign-
ments. In: Proceedings of international joint conference on neural networks, pp 2588–2595.
https://doi.org/10.1109/IJCNN.2009.5178782
7. Zhang L, Huang Y, Yang X, Yu S, Zhuang F (2019) An automatic short-answer grading model
for semi-open-ended questions. Interact Learn Environ, pp 1–14
8. Ichida AY, Meneguzzi F, Ruiz DD (2018) Measuring semantic similarity between sentences
using a siamese neural network. In: International joint conference on neural networks (IJCNN),
pp 1–7. https://doi.org/10.1109/IJCNN.2018.8489433
9. Kumar S, Chakrabarti S, Roy S (2017) Earth mover’s distance pooling over Siamese LSTMs
for automatic short answer grading. In: Proceedings of the twenty-sixth international joint
conference on artificial intelligence, pp 2046–2052. https://doi.org/10.24963/ijcai.2017/284
10. Kwong A, Muzamal JH, Khan UG (2019) Automated language scoring system by employing
neural network approaches. In: 15th International conference on emerging technologies (ICET),
pp 1–6. https://doi.org/10.1109/ICET48972.2019.8994673
11. Gomaa WH, Fahmy AA (2019) Ans2vec: a scoring system for short answers. In: Hassanien
A, Azar A, Gaber T, Bhatnagar R, Tolba MF (eds) The international conference on advanced
machine learning technologies and applications (AMLTA2019). AMLTA 2019. Advances in
intelligent systems and computing, vol 921. Springer, Cham
Automated Evaluation of Short … 963

12. Ndukwe IG, Amadi CE, Nkomo LM, Daniel BK (2020) Automatic grading system using
sentence-BERT network. In: Bittencourt I, Cukurova M, Muldner K, Luckin R, Millán E (eds)
Artificial intelligence in education. AIED 2020. Lecture Notes in Computer Science, vol 12164.
Springer, Cham
13. Hassan S, Fahmy AA, El-Ramly M (2018) Automatic short answer scoring based on paragraph
embeddings. Int J Adv Comput Sci Appl (IJACSA) 9(10):397-402. https://doi.org/10.14569/
IJACSA.2018.091048
14. Yang X, Huang Y, Zhuang F, Zhang L, Yu S (2018) Automatic Chinese short answer grading
with deep autoencoder. In: Penstein Rosé C et al (eds) AIED 2018, vol 10948. LNCS (LNAI).
Springer, Cham, pp 399–404
15. Riordan B, Horbach A, Cahill A, Zesch T, Lee C (2017) Investigating neural architectures
for short answer scoring. In: Proceedings of the 12th workshop on innovative use of NLP for
building educational applications, pp 159–168. https://doi.org/10.18653/v1/W17-5017
16. Gong T, Yao X (2019) An attention-based deep model for automatic short answer score. Int J
Comput Sci Softw Eng 8(6):127–132
17. Tan H, Wang C, Duan Q, Lu Y, Zhang H, Li R (2020) Automatic short answer grading by encod-
ing student responses via a graph convolutional network. In: Interactive learning environments,
pp 1–15
18. Mohler M, Bunescu R, Mihalcea R (2011) Learning to grade short answer questions using
semantic similarity measures and dependency graph alignments. In Lin D (ed) Proceedings
of the 49th annual meeting of the association for computational linguistics: human language
technologies volume 1 of HLT ’11. Association for Computational Linguistics, Portland, pp
752–762
Interactive Agricultural Chatbot Based
on Deep Learning

S. Suman and Jalesh Kumar

Abstract The goal of technological innovation is to assist humans in making their lives easier. This is especially true in the field of natural language processing (NLP), which is why conversational systems, often known as chatbots, have gained popularity in recent years. Chatbots have been used in different domains. These are intelligent systems developed using machine learning algorithms and NLP. Although technology has advanced significantly in the agriculture sector, farmers still do not have easy access to this knowledge, which necessitates extensive online searches. This is where a chatbot can help them by providing the answers to their queries quickly and easily when compared to traditional methods. In this paper, the chatbot has been intelligently built to recognize poorly grammatically formed statements, misspelled words, and unfinished phrases. Natural language processing is used by the system to read user queries and keywords, match them against the knowledge base, and offer an answer with correct results, making it easier for users to communicate with the bot. To make the responses more intelligible, classification algorithms are used to provide non-textual responses that are easily viewed by the farmers.

Keywords Natural language processing · Chatbot · Natural language toolkit ·


Knowledge base · Machine learning

1 Introduction

Agriculture plays a significant role in employing people in many parts of the world and is the main source of income for the majority of the population. India is a country where 70% of people reside in rural areas and primarily depend on agriculture, with 82% of farmers being marginal or small. The GDP growth of many countries is still based on agriculture [1].

S. Suman (B) · J. Kumar


J N N College of Engineering, Shivamogga, India
J. Kumar
e-mail: jaleshkumar@jnnce.ac.in


Farming technology is advancing at a faster pace nowadays, but much of this information is still not accessible to farmers: retrieving it requires many steps and often fails to return answers to their queries.
A chatbot is a conversational assistant that lets users communicate as easily as if they were conversing with a human being. Users' requests are processed and interpreted, and appropriate responses are sent back [2]. Identifying and interpreting the intent of a user's request and extracting the relevant entities is therefore a vital task for the chatbot.
Farmers face low yields partly due to this lack of information. Several advanced agriculture-related techniques are discussed in [3–5]. In the proposed work, querying techniques that help farmers obtain agricultural information are designed and implemented. An NLP [6] technique is used to take natural human language as input; it also helps the system interpret a user query even when the sentence is incomplete or contains grammatical mistakes.
The objectives of the project are as follows:
(i) Creating a user interface that allows people to engage effectively and attain the required results in fewer steps.
(ii) Processing the extracted data into a suitable format using machine learning algorithms.
(iii) Responding quickly to the user query and suggesting an appropriate answer.
(iv) Building a system that responds to users in real time.
The paper is organized as follows. The literature review is covered in Sect. 2. Section 3 describes the system architecture. In Sect. 4, the methodology is explained. The results and analysis are presented in Sect. 5. Section 6 is the conclusion.

2 Literature Survey

This section presents a literature review highlighting the work done so far in the field of chatbots.
Kannagi et al. [1] give insight into the farmbot application, which helps farmers resolve queries related to their agricultural farmland. Farmbot uses natural language processing (NLP) to identify keywords and respond with accurate results; the NLP technique interprets natural human language given as input. A neural network is constructed from the training dataset, and the gradient descent algorithm is used for error optimization. The test dataset goes through preprocessing, classification, and finally the constructed neural network. The system output is shown as text in the user interface, and the text is translated to speech using the Web Speech API. The ARIMA prediction method is used to forecast the future cost of agricultural products.
The study by Karri et al. [7] discusses a chatbot that was successful in answering queries. It follows two steps:
(1) a Bag-of-Words algorithm and
(2) a Seq2seq (training) model.
This model is then used by a recurrent neural network, which takes two inputs at the same time, one from the previous step and the other from the user, making it recurrent. A state-space tree is constructed using a breadth-first search strategy. Successors are generated from all current states at each level, and the heuristic values are determined by sorting in ascending order. If no response is found, the procedure is repeated with a widened beam. The chatbot is trained with the help of the Corpus dataset.
Sawant et al. [8] propose an intelligent system that uses analytics and data mining to help farmers assess different agricultural approaches so that they can choose appropriate crops based on the meteorological, geographical, and soil conditions of their location. The implemented algorithm is K-nearest neighbors, where the precision of the model is determined by the k value. Each tree's output is a single class, and the forest selects the class with the highest number of votes. The results of the algorithms are compared with one another, and accuracy is measured on both training and testing data. Agribot's role is not limited to suggesting crops to farmers; it also helps them acquire a better understanding of their crops so that they can extend their shelf life.
Vijayalakshmi et al. [9] portray how machine learning and artificial intelligence are transforming the IT industry into a new landscape. Talkbot is a virtual conversational assistant that parses inquiries, identifies keywords, matches them to the knowledge base, and responds to the user with correct results using natural language processing. If the query is a classification problem, it goes through a naïve Bayes classifier, which retrieves related results from the knowledge base, and the highest-scoring responses are looped over to obtain equivalent answers. The textual responses are then sent to a speech synthesis API, which accepts text as input, converts it to speech, and outputs the audio.
An overview of the numerous question-answering systems that have been used to resolve farmer-related queries is given in [10–12]. A chatbot is constructed using the Kisan Call Center dataset to answer farmers' questions. Agribot uses the Sen2Vec model. The model's outputs, as well as the weights required for its matrices, are trained into word embeddings. The model outputs are derived from the most similar query in the training data, and cosine similarity is then used to compare them to the embedding vectors; the ranking is determined by the best response. Agribot helped farmers resolve queries related to agriculture, animal husbandry, and horticulture using natural language technology.
Kohli et al. [13] note that even though several chatbots have been created, data-driven systems still struggle because it is difficult to cope with the massive quantity of data required for development. Therefore, a chatbot is developed with a Python (PyPy) source, taking user requirements into account. The main buffer, where the message to be printed by the chatbot to a user is stored, resides in the Run.py file, which builds a connection socket switch. Once initialization is completed, the send message is issued over the socket to the joinroom feature. One hundred interactions are mined for the testing domain to see how correctly the chatbot understands user inquiries; it is then run with questions posed in the chatroom through an agent and examined to see whether the chatbot replies credibly or deceptively.
Arora et al. [14] propose a chatbot that helps farmers by providing solutions to their queries and supporting their decision-making. The bot not only answers frequently asked questions but also emphasizes weather forecasting and crop disease detection. A sequence-to-sequence model, a multilayer RNN also known as an encoder-decoder, is used to build the conversational system. It belongs to the generative class of models, meaning the model grasps the data automatically and generates the response word by word. The next step is the creation of a classification model, which can be built from scratch or through transfer learning. The created model was trained for 50 epochs with a batch size of 20. Weather prediction is included as one of the features of Agribot: OpenWeatherMap is the service that provides climate information, including current weather conditions. The chatbot is thus able to guide farmers in crop disease detection and weather prediction.
The creation of chatbots using natural language approaches, as an initiative to annotate and observe the interaction between humans and chatbots, is described in [15]. The proposed system analyzes machine learning parameters that help farmers increase their yield. The analysis is done on the rainfall, season, weather, and soil type of a particular area based on historic data, and the chatbot is trained using NLP. The system helps farmers in remote places understand which crop to grow based on atmospheric conditions. The K-nearest neighbors algorithm is used, which stores the available cases and classifies new ones based on a similarity measure. The data has been collected from different government websites and repositories. The database is trained using machine learning with the TensorFlow architecture and the KNN algorithm, and NLP is used for training and validation of the data. Once the system has gone through data collection, cleaning, preprocessing, training, and testing, it is deployed to the server for use. The system helps farmers in remote places, where connectivity is limited, to better understand the crop to be grown based on atmospheric conditions and also suggests answers to their queries.
Based on the analysis of the literature survey, a chat platform is required that uses the Internet to make the discussion process more accessible and automated. In addition, the system should include capabilities such as real-time outputs and a user-friendly interface for farmers. A system like this could help farmers bridge the knowledge gap and develop a more productive market.

Fig. 1 System architecture of the chatbot

3 System Architecture

This section gives an overview of the system architecture employed in the project.
The proposed model is divided into three stages: processing of the query, training and development of the chatbot, and retrieval of responses. The chatbot application system architecture is shown in Fig. 1. The user enters a query as text through the user interface. The interface receives user questions, which are then sent to the chatbot application, where the textual query goes through a preprocessing stage. During preprocessing, the query sentence is tokenized into words, stopwords are removed, and words are stemmed to their root forms. The query is then classified using a neural network classifier, and the appropriate result is returned to the user as text.

4 Methodology

The proposed methodology focuses on responding to farmer queries so that they can obtain the benefits; it comprises three steps:
(A) Processing of the query
(B) Training and development of a chatbot
(C) Retrieval of responses

(A) Processing of the query

The primary goal of NLP is to understand human language given as input. This helps the system comprehend the input even if it contains grammatical errors or incomplete phrases and, as a result, improves the efficiency of the classification algorithm.
(1) Segmentation of Sentences:
The initial step in NLP is sentence segmentation, which divides a paragraph into individual sentences.
(2) Tokenization:
Tokenization is a technique for separating sentences into tokens or words.
(3) Noise Removal:
Noise removal is the process of removing stopwords that are not related to the context; these stopwords are omitted so that they do not distort the likelihoods used for classification.
(4) Normalization of the Lexicon:
The process of transforming various input representations into a single representation is known as lexicon normalization. One such technique is stemming, in which the suffixes of a word are removed.
(5) Bag of Words or a Vector Space:
Each extracted word is converted into a feature vector entry, with a binary value serving as the weight of each feature (0 indicates that the feature is absent, 1 indicates that it is present). A short code sketch of these preprocessing steps is given below.
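To make the pipeline concrete, the following is a minimal preprocessing sketch in Python. It assumes the NLTK toolkit mentioned in the keywords; the stemmer choice, the sample questions, and the vocabulary are illustrative placeholders, not the exact resources used in the implementation.

```python
# Minimal preprocessing sketch: tokenization, stopword removal,
# stemming, and a binary bag-of-words vector (0/1 weights).
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download('punkt', quiet=True)       # one-time tokenizer data
nltk.download('stopwords', quiet=True)   # one-time stopword list

stemmer = PorterStemmer()
stop_words = set(stopwords.words('english'))

def preprocess(sentence):
    """Tokenize, drop stopwords/punctuation, and stem the remaining words."""
    tokens = word_tokenize(sentence.lower())
    return [stemmer.stem(t) for t in tokens if t.isalnum() and t not in stop_words]

def bag_of_words(sentence, vocabulary):
    """Binary feature vector: 1 if the vocabulary word appears, else 0."""
    words = set(preprocess(sentence))
    return [1 if w in words else 0 for w in vocabulary]

# Hypothetical vocabulary built from two sample training questions
vocabulary = sorted({w for q in ["which soil suits maize crop",
                                 "pesticide for sugarcane crop"]
                     for w in preprocess(q)})
print(bag_of_words("What soil is best for maize?", vocabulary))
```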
(B) Training and development of a chatbot
The dataset file, which comprises 200 agricultural questions and responses for various crops (rice, groundnut, cotton, wheat, bajra, and sugarcane), has been created and imported. The data is then processed and translated into a vectored format. The chatbot is trained by developing a neural network: a computational model with an input layer, two hidden layers, and an output layer. Each hidden layer transforms its inputs into a format that can be used by the next layer. In the input layer, the nodes (neurons) correspond to the words in the dataset vocabulary. Each node is given a random weight; the weighted value is combined with a bias and passed through an activation function. The second hidden layer performs the same task as the first, but its input is the output of the first hidden layer. The output layer applies its weights and passes the result to the activation function, and its nodes correspond to the features or classes. The activation function used at the output is softmax, which produces values between 0 and 1. This deep neural network is employed for classification.
(C) Retrieval of responses
A neural network classification model is built using the training dataset. Probabilities for test data are generated using the built model, and the output of the system is provided to the user as text via the user interface.
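The classifier described in (B) and the probability-based retrieval in (C) can be sketched as below. This is a minimal illustration assuming a TensorFlow/Keras implementation with hypothetical layer sizes, vocabulary size, and number of intent classes; the actual network and training settings of the deployed chatbot may differ.

```python
# Two-hidden-layer intent classifier over bag-of-words vectors,
# with a softmax output (values between 0 and 1) over the classes.
import numpy as np
import tensorflow as tf

vocab_size = 300      # hypothetical vocabulary size
num_classes = 25      # hypothetical number of intents/classes

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(vocab_size,)),  # hidden layer 1
    tf.keras.layers.Dense(64, activation='relu'),                              # hidden layer 2
    tf.keras.layers.Dense(num_classes, activation='softmax'),                  # output layer
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# X: binary bag-of-words vectors, y: one-hot intent labels (dummy data here)
X = np.random.randint(0, 2, size=(200, vocab_size)).astype('float32')
y = tf.keras.utils.to_categorical(np.random.randint(0, num_classes, 200), num_classes)
model.fit(X, y, epochs=5, batch_size=8, verbose=0)

# Retrieval step: pick the intent with the highest predicted probability
probs = model.predict(X[:1])
print(int(np.argmax(probs, axis=1)[0]))
```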

5 Result and Analysis

The chatbot was created based on this research and the methodologies described above. A Google Colaboratory setup was created to make the chatbot interactive.
The developed chatbot can help farmers in areas such as soil detection, pesticide recommendation, and details about the Kisan Call Center, as shown in Fig. 2.
Testing is done to assess the chatbot's quality. The procedures involved in conducting chatbot tests are
(i) gathering an overview of questions that can be asked and
(ii) determining whether the responses are correct or incorrect.
Table 1 gives a sample of the responses retrieved from the chatbot application. In the first query there is a spelling mistake, yet the chatbot still renders the correct answer. In the second query, the chatbot has not been trained for that particular query, but it generalizes from similar examples in the training data. In the third query, the user asks an unrelated question, so the chatbot responds that it is an out-of-bound question.
The false-negative scenario is described in Table 2, where the chatbot inaccurately predicts the response to the user query. This can be overcome by training the model with a larger dataset.

Fig. 2 Chatbot interface

Table 1 Chatbot responses

User query                   Chatbot response
Soil for maize/maiz crop     The best soil for maize crop is clay loam
Hiiiiiiiiiiiii               Good to see you
Do u have bf/boyfriend?      This is out of bound question I can't answer

Table 2 Incorrect responses of the chatbot

User query                   Chatbot response
Sugar cane pesticide         The pesticides for groundnut crop is Carbaryl 10 D-25 kg/ha
Kisan call center number     Please visit our website pmkisan.gov

6 Conclusion

The need for chatbots in numerous industries is justified by their expansion and popularity. The performance of chatbots is relatively high when compared with traditional approaches. The typical amount of time spent interacting with a chatbot is fairly brief, and it helps farmers get quick responses to their questions. A chatbot has been shown to suit the needs of users by responding quickly and offering services and information. By leveraging natural language to answer questions about agriculture, our chatbot has benefited neglected communities. The chatbot provides agricultural facts to the farmer, who can send a direct message to get an answer. Our approach allows a farmer to ask any number of questions at any moment, which would aid in the speedier and more widespread adoption of current farming technology. Because most farmers interact in their native languages, future advancements are possible: there is a need for a solution that can connect the model to their languages, as well as estimate rainfall, production, and other aspects.

7 Future Scope

With a speech recognition capability, farmers could ask their questions verbally and receive answers from the bot. Because most farmers interact in their native languages, there is also a need for a solution that can connect the model to those languages. A weather prediction module can be added that accesses the location and suggests crops accordingly. To support farmers further, integration with different channels such as phone calls, SMS, and various social media platforms can be used.

References

1. Kannagi L, Ramya C, Shreya R, Sowmiya R (2018) Virtual conversational assistant—the FARMBOT. In: International journal of engineering technology science and research, vol 5, pp 520–527
2. Akma N, Hafiz M, Zainal A, Fairuz M, Adnan Z (2018) Review of chatbots design techniques.
Int J Comput Appl 181:7–10
2. Akma N, Hafiz M, Zainal A, Fairuz M, Adnan Z (2018) Review of chatbots design techniques.
Int J Comput Appl 181:7–10

3. Talaviya T, Shah D, Patel N, Yagnik H, Shah M (2020) Implementation of artificial intelligence


in agriculture for optimisation of irrigation and application of pesticides and herbicides. Artif
Intell Agric 4:58–73
4. Jha K, Doshi A, Patel P, Shah M (2019) A comprehensive review on automation in agriculture
using artificial intelligence. Artif Intell Agric 2:1–12
5. Ganatra N, Patel A (2021) Deep learning methods and applications for precision agriculture.
In: Joshi A, Khosravy M, Gupta N (eds) Machine learning for predictive analysis. Lecture
notes in networks and systems, vol 141. Springer, Singapore. https://doi.org/10.1007/978-981-
15-7106-0_51
6. Nagarhalli TP, Vaze V, Rana NK (2020) A review of current trends in the development of chatbot systems. In: 2020 6th international conference on advanced computing and communication systems (ICACCS), pp 706–710
7. Karri SPR, Kumar (2020) Deep learning techniques for implementation of chatbots. In: 2020 international conference on computer communication and information (ICCCI-2020)
8. Sawant D, Jaiswal A, Singh J, AgriBot P (2019) An intelligent interactive interface to assist
farmers in agricultural activities. In: 2019 IEEE Bombay section signature conference (IBSSC)
9. Vijayalakshmi K, Meena P (2019) Agriculture TalkBot using AI. Int J Recent Technol Eng
(IJRTE) 8:186–190
10. Jain N, Jain P, Kayal P, Sahit PJ, Pachpande S, Choudhari J, Singh M (2019) AgriBot:
agriculture-specific question answer system. In: 2019 IEEE Bombay section signature
conference (IBSSC)
11. Niranjan PY, Rajpurohit VS, Malgi R (2019) A survey on chatbot system for agriculture domain. In: 2019 international conference on advances in information technology, pp 99–103
12. Jain N, Jain P, Kayal P, Sahit J, Pachpande S, Choudhari J (2019) AgriBot: agriculture-specific
question answer system
13. Kohli B, Choudhury T, Sharma S, Kumar P (2018) A platform for human-chatbot interaction using python. In: Second international conference on green computing and internet of things, pp 439–444
14. Arora B, Chaudhary DS, Satsangi M, Yadav M, Singh L, Sudhish PS (2020) Agribot: a natural language generative neural networks engine for agricultural applications. In: 2020 international conference on contemporary computing and applications (IC3A), pp 28–33
15. Yashaswini DK, Hemalatha R, Niveditha G (2019) Smart chatbot for agriculture. Int J Eng Sci
Comput 9:22203–22205, May 2019
Analytical Study of YOLO and Its
Various Versions in Crowd Counting

Ruchika, Ravindra Kumar Purwar, and Shailesh Verma

Abstract Crowd counting is one of the main concerns of crowd analysis. Estimating density maps and crowd counts in crowd videos and images has a large application area, such as traffic monitoring, surveillance, crowd anomalies, congestion, public safety, urbanization, and planning and development. There are many difficulties in crowd counting, such as occlusion and inter- and intra-scene deviations in perspective and size. Nonetheless, in recent years crowd count analysis has improved, from earlier approaches typically restricted to minor changes in crowd density up to recent state-of-the-art systems that perform successfully in a broad variety of circumstances. The recent success of crowd counting methods can be credited mostly to deep learning and the different datasets that have been published. In this paper, a CNN-based technique named You Only Look Once (YOLO) and its various versions are studied, and its latest version, YOLOv5, is analyzed in the crowd counting application. The technique is evaluated on three benchmark datasets with different crowd densities. It is observed that YOLOv5 gives favorable results for crowd densities ranging from low to medium but not for very dense crowds.

Keywords CNN · YOLO · YOLOv5 · Crowd counting · Density estimation ·


Crowd analysis

1 Introduction

Crowd counting is an approximation of the number of persons in an image or a


video sequence. It is extensively used in application domains such as public safety,
traffic monitoring, surveillance, smart city strategies, crowd abnormality detection,

Ruchika (B) · R. K. Purwar · S. Verma


Guru Gobind Singh Indraprastha University, Delhi, India
e-mail: ruchika.usict.077164@ipu.ac.in
R. K. Purwar
e-mail: ravindra@ipu.ac.in


etc. [1]. Crowd counting has remained a persistent challenge in the computer vision
and machine learning fields due to perspective distortions, severe occlusions, diverse
densities, and other issues. Existing research has primarily focused on crowds of
consistent density, i.e., sparse or dense. In the real world, however, an image may
include uneven densities due to camera perspective and changing distribution of
people in the crowd. As a result, accurately counting individuals in crowds requires focusing on all density levels.
Crowd counting can be done in a number of ways. The fundamental approach is to count manually, but this is infeasible in moderately and highly congested scenes. Another approach is to count the people in part of a video frame and extrapolate to the whole frame to estimate the total count. No such algorithm gives a precise count, but computer vision techniques can produce notably accurate estimates. The five broadly used methods of crowd counting are given below [2]:
• Detection-based methods: Detectors are like moving windows used to identify
and count the people in an image. This method can further be classified as:
Monolithic detection: a direct approach to count people in an image or video.
Pedestrian detection involves this technique by training the classifier based on full
human body appearance [3–5].
Part-based detection: classifiers are trained on partially occluded human body parts such as the head, shoulder, and face to count the people [6–8].
Shape based detection: realistic shape prototypes are used for detection
purposes. These prototypes are employed to identify the people in images.
Multi-sensor detection: this technique incorporates multi-view data generated
by multiple surveillance cameras applied in an area. However, multi-camera setup
suffers from varied resolutions, viewpoints, and variations in illumination and background. Different solutions to such issues exploit spatio-temporal occurrences, scene structures, and object size [9, 10].
• Clustering-based methods: This method uses the relative uniformity of visual features and individual motion fields. Coherent feature trajectories are clustered to reveal independently moving entities. The Kanade-Lucas-Tomasi (KLT) tracker [11], Bayesian clustering to track and group local features into clusters [12], and head detection-based person hypotheses [13] are some of the techniques used in this method.
• Regression-based methods: In these methods, patches are cropped from the
image, and corresponding to each patch, low-level features are extracted. Density
is estimated based on the collective and holistic description of crowd patterns
[14].
• Density estimation-based methods: A density map is formed for objects in the
image. Extracted features and their object density maps are linearly mapped.
Random forest regression is also used for learning non-linear mapping.
• CNN-based methods: CNNs are used to build an end-to-end regression model
to analyse an image. Crowd counting is done on the whole image rather than

only on a particular part of it. CNNs give remarkable results when working with
regression or classification tasks for crowd counting.
For sparse crowds, the authors in [11, 15] used sliding-window detectors, while handcrafted features were used by the authors in [16, 17] for regression-based techniques. These techniques are not effective for dense crowd counts due to occlusions. Researchers have therefore used CNN-based approaches to predict density, which give better results, as shown in [18–22].
YOLOv5 [23] has been introduced for the detection of different types of objects in videos and images. In this paper, it is analyzed exclusively for crowd counting in video sequences ranging from low to high density. Weights are obtained by training the model on the COCO dataset [24]. Based on these pre-trained weights, the model is tested on three different datasets. Results show that the model works well for low- to medium-density crowds, but its performance degrades in densely crowded scenarios.

2 Literature Survey

YOLO—You Only Look Once—is a fast and easy-to-use model designed for object detection. YOLO was first introduced by Joseph Redmon et al. [25] in 2016. Later versions YOLOv2, v3, v4, and v5 were published with improvements over the previous releases. Table 1 describes the basic working and year of publication of all the YOLO versions. Table 2 shows the features and limitations of each version.

2.1 YOLO

Before YOLO, classifiers were repurposed for object detection; in YOLO, the full image is fed directly to a neural network that predicts class probabilities and bounding boxes. In real time, YOLO processes 45 fps, and its modified version, Fast YOLO, processes 155 fps. The detailed architecture of YOLO can be found in [25].
Steps involved in object detection are as given below:
1. The input image is divided into A × A grids.
2. A grid cell containing the center of the object is used for detection.
3. Predict bounding box B and confidence score C for each grid cell.
4. The confidence of the model for the bounding box and the object in it is predicted as:

Confidence score C = Pr(object) × IoU(truth, pred)

If there is no object, C = 0; otherwise, C equals the IoU between the ground truth and the predicted box.

Table 1 YOLO, its versions, and their working procedures


YOLO versions Publication year Working
YOLO [25] 2016 • Divide image in a grid size A X A, and detect
bounding box (BB)
• Calculate object center coordinates (x, y),
height and width (h, w), confidence score C,
conditional probability Pr factors for different
classes
• Select boxes depending on the threshold for
confidence and Intersection of Union (IoU)
• Non-Max suppression removes duplicate
detection of object in the image by accepting
the prediction with maximum confidence level
from all detected BB
• Detection window with threshold more than the
threshold of both IoU and confidence score is
accepted
YOLOv2/YOLO9000 [26] 2017 • Batch normalization layers are used after each
convolutional layer
• Model consists of 30 layers
• Anchor box is added to the model architecture
YOLOv3 [27] 2018 • Network has 106 layers
• Small to tiny objects are detected on three
different scales
• Nine anchor boxes are taken corresponding to
three boxes per scale
• It is a MultiLabel problem with modified error
functions
YOLOv4 [28] 2020 • Model uses Weighted-Residual-Connections
(WRC), Cross Stage-Partial Connections
(CSP), to improve the learning competency of
CNN
• Self-adversarial training (SAT), and data
augmentation techniques operates both in
forward–backward stages of network
• A new self-regularized non-monotonic
activation function is used
• Mosaic data augmentation mix four training
images instead of single image
• Drop block regularization method is applied for
CNN
YOLOv5 [23] 2020 • YOLOv5 has four variations as YOOv5-s, -m,
-l, -x: small, medium, large, extra-large,
respectively
• A two-stage detector contains Cross Stage
Partial Network (CSPNet) [29] as backbone,
Path Aggregation Network (PANet) [30] as
head of model

Table 2 Features and limitations of YOLO and all of its versions


YOLO version Features Limitations
YOLO • YOLO is a fast regression-based • Accuracy of YOLO is less
problem. It trains and tests on the • YOLO cannot localize small
whole image for detection, so it objects in a group efficiently
implicitly encodes • It is not able to detect objects
circumstantial data about classes with unusual aspect ratios
and their appearance • YOLO uses ’A x A’ grid size,
• Background errors are almost and each grid can predict one
half in YOLO than the existing class of objects, which leads to a
method Fast R-CNN limited number of detections, so
• YOLO represents objects it misses various objects
generally, i.e., training the model • It can detect a maximum of ’A x
in real images and testing on A’ objects
artwork. Probability of system • YOLO classifier network trains
failure is very less even when and detect on images with
tested on novel fields or hasty resolution 224 X 224 and
inputs [31, 25] 448 X 448, respectively. So,
• In YOLO, the whole model is simultaneous switching between
trained collectively and is two different resolutions is
trained on the loss function that required
is directly linked to detection
accuracy
YOLOv2/YOLO 9000 • It has better speed and accuracy • Dimensions of anchor boxes are
than YOLO selected manually
• It can execute small-sized, low • Model instability occurs as
and high-resolution images, location coordinates (x, y) of the
high-framerate videos, multiple anchor box is predicted during
video streams, and real-time initial iterations
videos • It takes longer to stabilize the
• Network can realize generalized prediction due to random
object representation, so model initialization
training on real world images is
easy
YOLOv3 • It can detect tiny objects • Reduced speed due to Darknet53
• It can detect the objects at three architecture
different scales • Dissimilar-sized objects are
• More number of bounding boxes difficult to detect
are there in YOLOv3, so • It is not easy to detect objects
perform better predictions that are very close to one another
• Multilabel classification of • It is not applicable in sensitive
detected objects is possible domains like autonomous
driving, surveillance, security,
etc
YOLOv4 • For accuracy and convergence • YOLOv4 is incompatible with
speed, Complete IoU loss is mobile device integration with
better than Bounding Box virtual reality [32]
regression problem
• It can be trained even on a single
GPU
YOLOv5 • YOLOv5 is blazingly fast and • YOLOv5 has limited
accurate performance on highly dense
• It can detect objects with images and videos
inconsistent aspect ratios
• YOLOv5 architecture is small,
so it can be deployed to
embedded devices easily
• PyTorch weights of YOLOv5 can be translated to Open Neural Network Exchange (ONNX) weights and then to Core Machine Learning (CoreML) for iOS

5. For each bounding box, five prediction and confidence parameters are: center
coordinates of box w.r.t. grid cell boundary (cx , cy ), height and width of bounding
box relative to the image size (h, w), predicted confidence C, the IoU between
ground truth and predicted box.
6. Single conditional class probability, Pr(classx |object), is predicted for each grid
cell.
7. For testing, the class-specific confidence score for each bounding box is computed as:

Pr(classx | object) × Pr(object) × IoU(truth, pred) = Pr(classx) × IoU(truth, pred)    (1)
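The IoU term appearing in the confidence score and in Eq. (1) can be computed directly from box corner coordinates. The helper below is an illustrative sketch (the box format and names are ours, not taken from any YOLO implementation).

```python
# Intersection over Union (IoU) between two boxes given as
# (x1, y1, x2, y2) corner coordinates.
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Coordinates of the intersection rectangle
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Class-specific confidence = Pr(class_x) * IoU(truth, pred), as in Eq. (1)
print(0.9 * iou((10, 10, 60, 60), (20, 20, 70, 70)))
```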

2.2 YOLOv2/YOLO9000

YOLOv2 is designed to execute various image resolutions with improved speed


and accuracy. A hierarchical view of object classification is used in YOLOv2, and
object detectors can be trained on different datasets for detection and classification.
In YOLOv2, a batch normalization layer is used after each convolutional layer, and dropout layers are removed without causing overfitting. The classification network is first fine-tuned at 448 × 448 resolution for ten epochs on ImageNet, so that the network filters adjust to higher-resolution images and give efficient results. Bounding box prediction is made by adding anchor boxes [28] in place of the fully connected layers of YOLO, and pooling layers are removed to obtain higher-resolution outputs. The network operates on 416 × 416 input images rather than 448 × 448 so that the feature map has a single centered cell. IoU is the main factor in predicting objects in the image. The network predicts five terms for every bounding box: start coordinates bx, by; width and height bw, bh; and centroid bo. The cell offset from the top left corner of the image is (cx, cy), and the prior width and height of the bounding box are ow, oh. YOLOv2 computes the predictions as below:

Px = σ(bx + cx)    (2)

Py = σ(by + cy)    (3)

Pw = ow e^bw    (4)

Ph = oh e^bh    (5)

Pr(obj) × IoU(P, obj) = σ(bo)    (6)

A variation of YOLOv2 uses DarkNet-19, which results in a faster network than the original YOLOv2. Another variant of YOLOv2 is YOLO9000, in which classification and detection are done jointly. It is capable of detecting 9000 different object categories in images and videos using the WordTree method. YOLO9000 is trained on the ImageNet classification and COCO detection datasets simultaneously, and the network can also detect unlabeled objects in images.

2.3 YOLOv3

YOLOv3 replaced YOLO9000 because of its higher accuracy. Due to the more complex DarkNet-53 architecture, YOLOv3 is slower, but its accuracy is better than YOLO9000. For each bounding box, the network predicts four coordinates labeled nx, ny, nw, nh; the cell offset from the top left corner of the image is (ox, oy), and the initial height and width of the bounding box are (ih, iw).
Method to predict next location is shown in Fig. 1. It is derived as:

Fig. 1 Initial dimension and next location prediction of bounding box [27]

Px = σ(nx) + ox    (7)

Py = σ(ny) + oy    (8)

Pw = iw e^nw    (9)

Ph = ih e^nh    (10)

The overlap threshold is fixed at 0.5. The objectness score for the bounding box that covers the maximum area of the ground truth object is 1. YOLOv3 predicts across three different scales, i.e., the detection layer operates on three different-sized feature maps. Three anchor boxes are assigned per scale, and these nine boxes help YOLOv3 perform better than both YOLO and YOLO9000.
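Equations (7)–(10) amount to a small decoding step that turns raw network outputs into box centers and sizes. The following sketch applies them under the notation of this section ((ox, oy) cell offset, (iw, ih) prior size); it is an illustration only, not the actual YOLOv3 code.

```python
import math

def decode_box(nx, ny, nw, nh, ox, oy, iw, ih):
    """Apply Eqs. (7)-(10): sigmoid offsets for the box center,
    exponential scaling of the prior size for width and height."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    px = sigmoid(nx) + ox          # Eq. (7)
    py = sigmoid(ny) + oy          # Eq. (8)
    pw = iw * math.exp(nw)         # Eq. (9)
    ph = ih * math.exp(nh)         # Eq. (10)
    return px, py, pw, ph

# Raw outputs for one box in the cell at offset (3, 5) with a 2x4 prior
print(decode_box(0.2, -0.1, 0.3, 0.1, 3, 5, 2.0, 4.0))
```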

2.4 YOLOv4

YOLOv4 is a fast and accurate real-time network. The structure of YOLOv4 is built using CSPDarknet53 [30] as the backbone; Path Aggregation Network (PAN) [2] and Spatial Pyramid Pooling (SPP) [33] as the neck; and YOLOv3 [27] as the head. The authors combine several universal features of backbones and detectors into two categories: Bag of Freebies (BoF) and Bag of Specials (BoS). BoF comprises techniques that increase the training cost but improve the detector's accuracy while keeping inference time roughly the same, whereas BoS comprises several plugins and post-processing units that improve detection accuracy with only a small increase in inference cost.
A number of crowd counting methods have been proposed in literature. Some
of them are suitable for low-density videos, while others are suitable for moderate
density videos. Further, there is very little work that is applicable for all density
crowd videos. Therefore, there is a need to design ubiquitous methods for all density
crowd videos.

3 Proposed Model

A detailed explanation of the proposed model is given in this section. The main aim of this study is to count the number of people in images and videos using the object detection technique YOLOv5. Glen Jocher from Ultralytics LLC [23] designed YOLOv5 and published it on GitHub in May 2020. YOLOv5 is an improvement on YOLOv3 and YOLOv4 and is implemented in PyTorch.

Table 3 Variations of YOLOv5 model


YOLOv5 Number of parameters Number of frames per second on GPU
YOLOv5s (smallest) 7.5 million 416
YOLOv5m 21.8 million 294
YOLOv5l 47.8 million 227
YOLOv5x (largest) 89.0 million 145

To date, the developer has been updating and improving YOLOv5 by adding new features related to data augmentation, activation functions, and post-processing to attain the best possible performance in detecting objects.

function, post-processing to attain best possible performance in detecting the objects.
A major emphasis of new features and enhancements in YOLOv5 is the cutting
edge for deep learning networks in terms of data augmentation and activation func-
tions. They were partly adapted from YOLOv4 like CSPNet Wang et al. (2000) and
partly derived by the YOLOv5 maintainer prior to YOLOv4 contributions. YOLOv5
works on mosaic augmented architecture, i.e., combining N number of images to get
a new image, which makes it fast and more accurate in detecting small objects. This
enables object detection external to their typical environment and at lower dimen-
sions that decreases the demand of large mini-batch sizes. The YOLOv5 architecture
incorporates the extraction of final prediction boxes, features and object classifica-
tion into a neural network. It simultaneously extracts prediction boxes and detects
objects accordingly from the whole image. Compact model size and high-inference
speed are reported by YOLOv5, allowing for easy transformation to mobile use cases
thru model export. All the versions of YOLOv5 are pre-trained on COCO dataset.
YOLOv5 comprises four distinct models as given in Table 3.
The different YOLOv5 models also vary in terms of model width and depth and layer channels, with values of 1.23 and 1.33, respectively, for the YOLOv5x model. The pre-trained YOLOv5x model is used in this study; the list of layers with the number of filters and their sizes is tabulated in Table 4.
YOLOv5x is a two-stage detector trained on MS COCO [24]; it consists of a Cross Stage Partial Network (CSPNet) [29] as the backbone and a Path Aggregation Network (PANet) [30] in the head of the model for instance segmentation. The Bottleneck CSP unit is composed of two convolutional layers with filter sizes 1 × 1 and 3 × 3. A Spatial Pyramid Pooling (SPP) network [2] in the backbone of the architecture allows variable-size input images and is robust against object distortions, i.e., it can detect objects with different aspect ratios. Figure 2 shows the overall architecture of YOLOv5x.
The YOLOv5 repository uses a V100 GPU along with PyTorch FP16 inference for person identification in images and videos. Pre-training of the model is done with a batch size of 32 labeled images on the COCO dataset, and the corresponding weights are used for object detection. The steps of crowd counting using YOLOv5 are as below:
1. Train YOLOv5 model using the COCO pre-trained weights.
2. A ResNet classifier is used for object classification using pre-trained weights.
3. For testing the model, the image or video is loaded into the inference directory.

Table 4 Layer in YOLOv5x


Layers Number of filters Filter size
[33]
Backbone
Focus 12 3×3
Convolutional 160 3×3
BottleneckCSP (4 layers) 160 1×1+3×3
Convolutional 320 3×3
BottleneckCSP (12 layers) 320 1×1+3×3
Convolutional 640 3×3
BottleneckCSP (12 layers) 640 1×1+3×3
Convolutional 1280 3×3
SPP
BottleneckCSP (4 layers) 1280 1×1+3×3
Head
Convolutional 640 1×1
Upsample 2
BottleneckCSP (4 layers) 640 1×1+3×3
Convolutional 320 1×1
Upsample 2
BottleneckCSP (4 layers) 320 1×1+3×3
Convolutional 320 3×3
BottleneckCSP (4 layers) 640 1×1+3×3
Convolutional 640 3×3
BottleneckCSP (4 layers) 1280 1×1+3×3
Detection

Fig. 2 Overview of YOLOv5 architecture [33]



4. The model weights are optimized using the strip optimizer for input images and videos.
5. Images are rescaled and reshaped along with their bounding boxes using the normalization gain.
6. The boxes are then labeled according to the classes of the pre-trained weights.
7. The model processes the weights and generates bounding boxes on the input image or video frames to produce the person count as output (see the sketch below).
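As an illustration of these steps, the sketch below loads a COCO pre-trained YOLOv5x model through torch.hub and counts detections of the 'person' class (class 0 in COCO). The image path and confidence threshold are assumed values; the exact scripts and post-processing used in this study are not reproduced here.

```python
# Count people in one image with a COCO pre-trained YOLOv5x model.
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5x', pretrained=True)
model.conf = 0.25          # confidence threshold (assumed value)

results = model('inference/images/frame_0001.jpg')   # hypothetical image path
detections = results.xyxy[0]                          # columns: x1, y1, x2, y2, conf, cls

# Class 0 in the COCO label map is 'person'
person_count = int((detections[:, 5] == 0).sum())
print(f'Estimated crowd count: {person_count}')
```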

4 Experimental Results

The given model is tested on three benchmark datasets: the AVENUE dataset [34], the PETS dataset [31], and the Shanghaitech dataset [35]. The first two datasets represent low to medium crowd density, whereas the last one is for high-density crowds.
AVENUE Dataset: The dataset consists of 35 videos. 30,652 frames corre-
sponding to these videos are used for testing.
PETS Dataset: Regular workshops are held by the Performance Evaluation of
Tracking and Surveillance (PETS) program, generating a benchmark dataset. This
dataset tackles group activities in public spaces, such as crowd count, tracking indi-
viduals in the crowd and detecting different flows and specialized crowd events.
Multiple cameras are used to record different incidents, and several actors are
involved. Dataset consists of four different subsections with different density levels.
Shanghaitech Dataset: This is a large crowd counting dataset containing 1198 annotated crowd images. The dataset is divided into two sections, Part-A and Part-B, containing 482 and 716 images, respectively. In all, 330,165 persons are marked in the dataset. Part-A images are collected from the Internet, while Part-B images are from Shanghai's bustling streets.
Figures 3, 4, and 5 show the detected humans in bounding boxes for the AVENUE, PETS2009, and Shanghaitech datasets. Average Precision (AP) and Mean Absolute Error (MAE) are the evaluation parameters. Table 5 shows the average precision for the three datasets, and Table 6 gives the resulting MAE for all datasets.
It can be seen that the AP for the AVENUE and PETS2009 datasets is 99.5% and 98.9%, respectively, whereas it is 40.2% for the high-density Shanghaitech dataset. Further, in terms of MAE, the value is higher for the high-density dataset. Therefore, it is concluded that YOLOv5 works efficiently for crowd detection in low- to medium-density videos, and its performance degrades for high-density videos.
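For reference, MAE in crowd counting is typically the mean absolute difference between predicted and ground-truth counts over all test frames; a minimal sketch with made-up per-frame counts (not the actual dataset annotations) is shown below.

```python
# Mean Absolute Error between predicted and ground-truth crowd counts.
def mean_absolute_error(predicted, ground_truth):
    assert len(predicted) == len(ground_truth)
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)

# Illustrative per-frame counts only; the real evaluation uses dataset labels.
pred = [12, 30, 7, 55]
gt = [14, 28, 7, 60]
print(mean_absolute_error(pred, gt))   # 2.25
```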

5 Conclusion and Future Work

The proposed work analyses the performance of YOLOv5, a CNN-based architecture, for crowd counting in video sequences ranging from low to high density. Through experimental results, it has been observed that crowd counting is handled well for low- to medium-density videos only.

Fig. 3 Detection of humans in various frames of different video sequences of AVENUE dataset
(i)–(ii) represent frame 1, and frame 51 of video sequence 2, (iii)–(iv) represent frame 20 and frame
101 of video sequence 7, (v)–(vi) represent frame 301 and 451 of video sequence 13

Fig. 4 Detection of humans in various frames of different video sequences of PETS2009 dataset.
(i)–(ii) represent frame 1, 1001 of video sequence 2, (iii)–(iv) represent frame 101, 651 of video
sequence 8, and (v)–(vi) represent frame 1, 251 of video sequence 13

As future work, authors are working on modifying the YOLOv5 architecture to


make it suitable for dense video sequences.

Fig. 5 Detection of humans in different images of Shanghaitech Dataset

Table 5 Average precision for three datasets

Dataset          Average precision (AP)
AVENUE           99.5
PETS2009         98.9
SHANGHAITECH     40.2

Table 6 MAE for all the datasets

Dataset          MAE
AVENUE           5.62
PETS2009         4.89
SHANGHAITECH     55.28

Acknowledgements This work is sponsored by Visvesvaraya Ph.D. Scheme issued by the


Ministry of Electronics and Information Technology, Govt. of India, as employed by Digital India
Corporation.

References

1. Ford M (2017) Trump’s press secretary falsely claims: ‘Largest audience ever to witness an
inauguration, period.’ The Atlantic 21(1):21
2. Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation.
In: Proceedings of the IEEE conference on computer vision and pattern recognition 2018, pp
8759–8768
3. Cheng Z, Zhang F (2020) Flower end-to-end detection based on YOLOv4 using a mobile
device. Wirel Commun Mob Comput 17:2020
4. Leibe B, Seemann E, Schiele B (2005) Pedestrian detection in crowded scenes. In: 2005 IEEE
computer society conference on computer vision and pattern recognition (CVPR’05), vol 1,
IEEE, pp 878–885, 20 June 2005
5. Tuzel O, Porikli F, Meer P (2008) Pedestrian detection via classification on riemannian
manifolds. IEEE Trans Pattern Anal Mach Intell 30(10):1713–1727
6. Dalal N, Triggs B (2005) Histograms of oriented gradients for human detection. In: 2005 IEEE
computer society conference on computer vision and pattern recognition (CVPR’05), vol 1, 20
Jun 2005. IEEE, pp 886–893
7. Lin SF, Chen JY, Chao HX (2001) Estimation of number of people in crowded scenes using
perspective transformation. IEEE Trans Syst Man Cybern-Part A Syst Hum 31(6):645–654
8. Wu B, Nevatia R (2007) Detection and tracking of multiple, partially occluded humans by
bayesian combination of edgelet based part detectors. Int J Comput Vision 75(2):247–266
9. Felzenszwalb PF, Girshick RB, McAllester D, Ramanan D (2009) Object detection with
discriminatively trained part-based models. IEEE Trans Pattern Anal Mach Intell 32(9):1627–
1645
10. Wang M, Li W, Wang X (2012) Transferring a generic pedestrian detector towards specific
scenes. In: 2012 IEEE conference on computer vision and pattern recognition. IEEE, pp 3274–
3281, 16 Jun 2012
11. Wang M, Wang X (2011) Automatic adaptation of a generic pedestrian detector to a specific
traffic scene. In: CVPR 2011. IEEE, 20 June 2011, pp 3401–3408
12. Lucas BD, Kanade T (1981) An iterative image registration technique with an application to
stereo vision 1981
13. Brostow GJ, Cipolla R (2006) Unsupervised Bayesian detection of independent motion in
crowds. In: 2006 IEEE computer society conference on computer vision and pattern recognition
(CVPR’06), vol 1 17. Jun 2006. IEEE, pp 594–601
14. Tu P, Sebastian T, Doretto G, Krahnstoever N, Rittscher J, Yu T (2008) Unified crowd segmen-
tation. In: European conference on computer vision. Springer, Berlin, pp 691–704, 12 Oct
2008
15. Wu B, Nevatia R (2005) Detection of multiple, partially occluded humans in a single image
by bayesian combination of edgelet part detectors. In: Tenth IEEE international conference on
computer vision (ICCV’05), vol 1. IEEE, pp 90–97, 17 Oct 2005
16. Chan AB, Vasconcelos N (2009) Bayesian poisson regression for crowd counting. In: 2009
IEEE 12th international conference on computer vision 2009 Sep 29. IEEE, pp 545–551
17. Ryan D, Denman S, Fookes C, Sridharan S (2009) Crowd counting using multiple local features.
In: 2009 digital image computing: techniques and applications. IEEE, pp 81–88, 1 Dec 2009
18. Ruchika, Purwar RK (2019) Crowd density estimation using hough circle transform for video
surveillance. In: 2019 6th international conference on signal processing and integrated networks
(SPIN). IEEE, 2019 Mar 7, pp 442–447
19. Kampffmeyer M, Dong N, Liang X, Zhang Y, Xing EP (2018) ConnNet: A long-range
relation-aware pixel-connectivity network for salient segmentation. IEEE Trans Image Process
28(5):2518–2529
20. Li Y, Zeng J, Shan S, Chen X (2018) Occlusion aware facial expression recognition using cnn
with attention mechanism. IEEE Trans Image Process 28(5):2439–2450
21. Zhang K, Zuo W, Chen Y, Meng D, Zhang L (2017) Beyond a gaussian denoiser: Residual
learning of deep cnn for image denoising. IEEE Trans Image Process 26(7):3142–3155

22. Chao H, He Y, Zhang J, Feng J (2019) Gaitset: regarding gait as a set for cross-view gait
recognition. In: Proceedings of the AAAI conference on artificial intelligence, vol 33(01), pp
8126–8133, 17 Jul 2019
23. Jocher G, Changyu L, Hogan A, Changyu LY, Rai P, Sullivan T (2020) Ultralytics/yolov5. Init
Release. https://doi.org/10.5281/zenodo.3908560
24. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Doll’ar P, Zitnick CL (2014)
Microsoft COCO: common objects in context. In: ECCV, 2014. ISBN 978-3-319-10601-4.
https://doi.org/10.1007/978-3-319-10602-148
25. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object
detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition
2016, pp 779–788
26. Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE
conference on computer vision and pattern recognition 2017, pp 7263–7271
27. Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv preprint arXiv:1804.
02767. 8 Apr 2018
28. Bochkovskiy A, Wang CY, Liao HY (2020) Yolov4: optimal speed and accuracy of object
detection. arXiv preprint arXiv:2004.10934. 23 Apr 2020
29. Davies AC, Yin JH, Velastin SA (1995) Crowd monitoring using image processing. Electron
Commun Eng J 7(1):37–47
30. Wang CY, Liao HY, Wu YH, Chen PY, Hsieh JW, Yeh IH (2020) CSPNet: a new backbone
that can enhance learning capability of CNN. In: Proceedings of the IEEE/CVF conference on
computer vision and pattern recognition workshops 2020, pp 390–391
31. Lu C, Shi J, Jia J (2013) Abnormal event detection at 150 fps in MATLAB. In: Proceedings of the IEEE international conference on computer vision 2013, pp 2720–2727. Available at: http://www.cse.cuhk.edu.hk/leojia/projects/detectabnormal/dataset.html [Accessed 15 Nov 2020]
32. Zhang Y, Zhou D, Chen S, Gao S, Ma Y (2016) Single-image crowd counting via multi-column convolutional neural network. In: Proceedings of the IEEE conference on computer vision and pattern recognition 2016, pp 589–597
33. He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks
for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916
34. Oh MH, Olsen P, Ramamurthy KN (2020) Crowd counting with decomposed uncertainty. In:
Proceedings of the AAAI conference on artificial intelligence, vol 34(07), pp 11799–11806, 3
Apr 2020
35. Ferryman J, Shahrokni A (2009) Pets2009: dataset and challenge. In: 2009 twelfth IEEE inter-
national workshop on performance evaluation of tracking and surveillance. IEEE, 7 Dec 2009,
pp 1–6. Available at: http://www.cvg.reading.ac.uk/PETS2009/a.html [Accessed 15 July 2021]
IoT Enabled Elderly Monitoring System
and the Role of Privacy Preservation
Frameworks in e-health Applications

Vidyadhar Jinnappa Aski, Vijaypal Singh Dhaka, Sunil Kumar,


and Anubha Parashar

Abstract Healthcare IoT (HIoT) or electronic health (e-health) is an emerging paradigm of IoT in which multiple bio-sensors capture body vitals and disseminate the captured information to the nearest data center through the underlying wireless infrastructure. Despite the rapid research and development trends observed in the e-health field, with its key facets (i.e., sensing, communication, data consolidation, and delivery of information) and inherent benefits (e.g., error reduction, homecare, and better patient management), it still faces several challenges. These challenges range from the development of an interoperable e-health framework to the design of an attack-free security model for both data and devices. In this article, an overview of recent technological trends in designing HIoT privacy preservation frameworks is provided, and the corresponding security challenges are discussed subsequently. Alongside, we also propose an architectural framework for monitoring the health vitals of differently abled persons or patients with degenerative chronic disorders. The interaction of the application components is illustrated with the help of different use-case scenarios.

Keywords Healthcare IoT (HIoT) · Interoperability · e-health · Access control ·


Authentication

V. J. Aski (B) · V. S. Dhaka · S. Kumar


Department of Comupter and Communicaton Engineering, Manipal University Jaipur, Jaipur,
India
e-mail: Vidyadharjinnappa.aski@jaipur.manipal.edu
V. S. Dhaka
e-mail: vijaypalsingh.dhaka@jaipur.manipal.edu
S. Kumar
e-mail: kumar.sunil@jaipur.manipal.edu
A. Parashar
Department of Computer Science and Engineering, Manipal University Jaipur, Jaipur, India
e-mail: anubha.parashar@jaipur.manipal.edu


1 Introduction

IoT has gained massive ground in the day-to-day life of researchers and practitioners due to its capability of offering advanced connectivity and uniquely identifying every physical object on the planet, thanks to the IPv4 and IPv6 address spaces, which facilitate seamless addressing schemes that can be invoked remotely. These technological evolutions offer new ways for smart objects to communicate with things (M2T), machines (M2M), and humans (M2H) [1]. With the rapid increase in remote monitoring requirements, IoT-enabled healthcare devices can now commonly be seen in many domestic settings. These devices help in monitoring body vitals, mainly temperature, heart rate, blood pressure, and glucose level, thus enabling patients to perform self-helped diagnosis and avoiding overcrowding at hospitals. The captured vitals are constantly uploaded to iDoctor-like cloud-centric data centers, and eventually one can seek professional medical advice on a regular basis [2]. In addition, such self-helped medical services are becoming more prominent and able to produce more accurate results because of recent advancements in wireless communication techniques (WCTs) and information technologies [3, 4].
Privacy protection (of data and devices) is one of the key problems IoT has faced since its inception. There is an immense need for a universal solution to such issues so that technological solutions can be widely accepted in critical application domains such as healthcare and allied sectors. Researchers have witnessed a huge spike in medical information leakage incidents compromising security goals in the recent past, since malware is becoming more virulent and resistant [5, 6]. For instance, the database of the second-largest health insurance corporation in the US was once targeted by hackers, leading to the leakage of the personal information, including health data, of approximately 80 million customers. Therefore, privacy protection strategies play a vital role in designing any e-health application. In this article, the authors provide a holistic overview of recent trends and techniques employed in the development of healthcare privacy preservation frameworks and the associated security concerns.
The following scenario helps us understand how privacy breaches have become a day-to-day reality in our lives. Akshata looks at her smart wristband while doing her regular workout and observes that her heart rate is a little higher than the target rate (the target heart rate during exercise should normally be within the range of "220 minus your age" [7]). After the workout, she asks Alexa (a smart speaker) to book her an appointment with the nearest cardiologist for a general heart health checkup. The next day, after finishing her office work, she visits the cardiologist and feels relieved when the doctor reassures her that nothing is wrong and that the heart rate went high due to the intensified workout. The next time Akshata uses her browser, she feels irritated because annoying ads related to heart medications, heart complications, and tutorials about identifying a heart attack keep popping up. Things get worse when she receives a telephone call from a health insurance agency recommending her a plan. This is only one of several such incidents where modern technology brings high privacy risks that have become unavoidable in daily life.

2 Related Work

Our primary objective is to carry out a joint assessment of survey articles on privacy preservation protocols for healthcare IoT (covering both data and device aspects). Secondly, we aim to investigate research studies that, together or separately, discuss the implementation of privacy preservation protocols in HIoT application scenarios. In this direction, we have divided the literature review into the following subsections.

2.1 Survey Articles on HIoT Security and Privacy Aspects

Several survey articles exist in the domain of HIoT security and privacy concerns [8–13]. Most of them give an overview of security issues and privacy aspects along with proposed solutions in the healthcare IoT context, as summarized in Table 1. In the present article, we focus on providing a holistic overview of privacy protection for healthcare data and devices. In addition, we provide a conceptual architecture for designing an application that monitors the health status of elderly people (patients with degenerative disorders) and differently abled persons.

2.2 Research Study on Implementation of Security Protocols and Algorithms

In this subsection, we review some of the prominent security algorithms proposed in the recent past. In [15], the authors proposed a scheme for labeling data in order to manage privacy in healthcare IoT applications. Using information-flow control techniques, events representing the data are tagged with several private attributes such as date of birth and place, and these tags enable data privacy in the applications. The model is not suitable for large-scale IoT applications, because handling a large set of user attributes on a small computing platform is extremely difficult. In [16], the authors proposed an access control algorithm for preserving the privacy of IoT users that is based on an anonymous privacy policy. In this scheme, users keep control over their data and define policies specifying which system user has what kind of access rights, and these policies can be changed in real time. In [14], the authors proposed an algorithm for naming a continuous dataflow with the help of adaptive clustering; this cluster-based approach guarantees naming novelty and imposes latency restrictions on the continuous dataflow.
Table 1 Comparative analysis of existing survey articles in the domain of HIoT and implementation issues

Author and year of publication | Aim of the article | Security concerns discussed | Architecture | Research questions | Open issues | Challenges | Drawbacks
Luis et al. [8] | An exhaustive literature review on medical IoT | Access controlling rules and policies | ✕ | ✓ | ✕ | ✕ | Limited frameworks
Abbas et al. [9] | Survey on health clouds | Attribute-based encryption | ✕ | ✕ | ✓ | ✓ | Centralized architecture
Nuaimi et al. [10] | Survey on healthcare cloud implementation | Not mentioned | ✕ | ✕ | ✓ | ✕ | Limited scope variation
Idoga et al. [11] | Comprehensive review on security issues of e-health | AES | ✕ | ✕ | ✓ | ✕ | Application scope limitation
Pankomera et al. [12] | A review on security and privacy issues in healthcare | Not mentioned | ✓ | ✕ | ✓ | ✕ | Lack of public health concerns
Olaronke et al. [14] | A survey on bigdata challenges in healthcare | Biometric security functions | ✕ | ✕ | ✕ | ✕ | Incomplete information
Proposed study | A holistic overview on privacy preservation schemes | Access control and authentication | ✓ | ✓ | ✓ | ✓ | NA

In addition, several research studies published in the recent past center their discussion on HIoT privacy [17–23].
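As a rough illustration of the policy-driven access control idea in [16], the sketch below (Python) checks a user's access request against patient-defined policies; the attribute names, roles, and policy format are illustrative assumptions rather than the scheme's actual specification.

```python
# Illustrative attribute/policy-based access check, not the algorithm of [16].
from typing import Dict, List

# Patient-defined policies: role -> data category -> list of allowed actions.
policies: Dict[str, Dict[str, List[str]]] = {
    "physician": {"vitals": ["read"], "history": ["read", "annotate"]},
    "pharmacist": {"prescription": ["read"]},
}

def is_allowed(role: str, category: str, action: str) -> bool:
    """Return True only if the patient's policy explicitly grants the action."""
    return action in policies.get(role, {}).get(category, [])

# The patient can update policies at run time, e.g. revoke pharmacist access:
policies["pharmacist"]["prescription"] = []

print(is_allowed("physician", "vitals", "read"))        # True
print(is_allowed("pharmacist", "prescription", "read")) # False after revocation
```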

3 Privacy Preservation in HIoT and Its Implications

IoT provides healthcare consumers with a high degree of control over day-to-day tasks, ranging from capturing data from the patient's body to disseminating the captured information to remote servers for further analysis. It also provides a way to saturate the patient's environment with smart things. Smart things denote a broad spectrum of low-power computing platforms such as microcontrollers, microprocessors, sensor area networks, and wireless communication entities that help the data settle on a cloud platform. Figure 1 describes a generic healthcare environment comprising a sensor network, data processing platforms, wireless infrastructure, a cloud platform, and the base station from which multiple health workers benefit. HIoT can be implemented in both static and dynamic environments. In a static environment, the patient's movement is restricted to one place, such as an ICU or a physician's examination hall (in-hospital monitoring use case of Fig. 1).

Fig. 1 Generic healthcare IoT architecture (in-hospital, out-patient, and ambulance monitoring use cases connected over wireless gateways and an IP network to a medical data server)

However, in dynamic environments, the patient can wear the device while performing daily activities such as jogging and walking (out-patient monitoring use case of Fig. 1). In this article, we examine privacy concerns related to healthcare applications. In addition, the risk exposure of these devices is much higher at the place of deployment than at the place of development, so a safeguard technique must be used to protect them against security threats once deployed.
Given the highly vulnerable nature of IoT devices, it is essential to understand the risks and challenges such devices pose to the privacy of patient data. Moreover, one has to obtain a satisfactory answer to the following question before opting for an IoT-enabled healthcare device from a hospital: is it possible to get a device that fully supports privacy-preserving and safe environments such as the traditional Internet provides? To answer this question precisely, one must understand the logical differences between trust, privacy, and confidentiality. Privacy, in healthcare IoT terms, means that any individual's health data must be protected from third-party access. Privacy also means that the information should not be exposed to others without the explicit consent of the patient; it is a fundamental right of a patient to decide with whom to share his or her data. In our earlier example, only Akshata decides whether or not to share her data with the insurance company. Trust, in turn, is the consequential product of transparency and consistency. Finally, confidentiality ensures that the right person accesses the right data and prevents the data from being accessed by unauthorized entities. Saying that my data is confidential means that the data is accessible only by me, and without my permission no one is authorized to access it.
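To make the distinction concrete, the snippet below is a minimal, illustrative sketch (Python) of how an HIoT backend could enforce patient consent before releasing data to a third party; the record fields, registry structure, and function names are our own assumptions, not part of any specific framework discussed above.

```python
# Minimal consent-based release check (illustrative only).
from dataclasses import dataclass, field

@dataclass
class ConsentRegistry:
    # patient_id -> set of parties the patient explicitly allowed
    grants: dict = field(default_factory=dict)

    def give_consent(self, patient_id: str, party: str) -> None:
        self.grants.setdefault(patient_id, set()).add(party)

    def revoke_consent(self, patient_id: str, party: str) -> None:
        self.grants.get(patient_id, set()).discard(party)

    def may_release(self, patient_id: str, party: str) -> bool:
        # Confidentiality: deny by default; release only with explicit consent.
        return party in self.grants.get(patient_id, set())

registry = ConsentRegistry()
registry.give_consent("akshata", "cardiologist_clinic")

print(registry.may_release("akshata", "cardiologist_clinic"))  # True
print(registry.may_release("akshata", "insurance_agency"))     # False: no consent given
```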

4 Taxonomy of Security and Privacy Preservation in Healthcare

In this section, we discuss the various privacy and security concerns. We categorize them under three heads: process based, scheme based, and network- and traffic-based. The detailed classification is presented in the taxonomy diagram shown in Fig. 2.

4.1 Process-Oriented Schemes

The modern lifestyle has generated the need to incorporate a huge number of smart devices around us. These smart devices are made of multiple sensors and actuators comprising a data acquisition system that captures numerous physical parameters such as temperature and heart rate. The devices create a massive amount of data that can only be handled with the specific sets of algorithms developed in big data technologies. This massive amount of data travels over an open channel such as the Internet, whose vulnerable nature needs no emphasis, and the more challenging job is to handle the issues of process-based techniques such as cloud computing, fog computing, and edge computing.

Fig. 2 Security and privacy taxonomy in HIoT

4.1.1 Distributed Approaches

In centralized systems, the user data is stored in one central database and is queried from that same database whenever the end user requires it. If the central server fails, the whole system freezes and it is difficult to recover the lost data, which is the biggest drawback of such systems. In distributed systems, such issues can be handled easily. Table 2 shows the state-of-the-art security protocols in the distributed computing field.

4.1.2 Centralized Approaches

Traditional Internet technologies have benefitted rapidly from the latest advancements in WCTs and ICTs. Due to such advancements, enormous volumes of data are being generated in all sectors, and storing and processing such humongous data on traditional Internet devices is difficult. This problem can be solved by using centralized cloud-based systems. Table 3 discusses the state of the art in the centralized computing area.

Table 2 State-of-the-art security protocols in distributed computing field

State-of-the-art | Contribution | Schemes proposed | Observations
Hamid et al. [24] | Designed an authenticated key agreement protocol for distributed systems | Bilinear matching cryptography | Does not provide security against MIM attack, key theft, and brute-force attack
Zhou et al. [25] | Designed an authenticated key agreement scheme for distributed systems overcoming the drawbacks of [24] | Hybrid real-time cryptography algorithm | More vulnerable to threats; patients' confidentiality may be compromised; not resistant to replay attack
Kaneriya et al. [26] | Presented a scheme that handles the replay attack | Multi-authority ABE | Not resistant to MIM attack
Mutlag et al. [27] | Highlighted the limitations of computation, storage, and networking resources in a distributed environment | Vision and key properties of FC | Not suitable for bandwidth-sensitive IoT applications

4.1.3 Decentralized Approaches

Multiple entities are involved in managing the healthcare industry: pharmacies, patient groups, the doctors' community, and emergency response teams. IoT provides a unique way to bind these entities together on a single platform. When they are connected through a network, it is essential to generate trust among all of them. Decentralized computing approaches are used to manage the trusted relationships in such multiparty environments. Table 4 discusses the state of the art in the decentralized computing area.

4.2 Authentication Oriented Schemes

IoT exposes several Internet paradigms to security vulnerabilities due to its highly open nature. In healthcare, data needs to be securely captured, transferred, stored, and processed. Sensitive or critical data, such as the body vitals of patients, can be protected from unauthorized access with password-based mechanisms, cryptographic algorithms, or biometric authentication schemes.

Table 3 State-of-the-art security protocols in centralized computing area

State-of-the-art | Contribution | Schemes proposed | Observations
Zhou et al. [28] | Designed a privacy-preserving dynamic medical text mining and image feature extraction approach in the cloud healthcare system | Medical text mining and image feature extraction approach | More secure for input and output data; also reduces both the communication and computational cost
Ziglari et al. [29] | Analyzed the security aspects in the deployment models for the healthcare system | Security analysis between service providers and cloud service providers | Presented an architecture for deploying information technology systems generated on the cloud by multiple cloud providers
Requena et al. [30] | Proposed a design for a cloud-assisted radiological gateway | Scheme to permit patients to access health images and diagnosis reports from the cloud | NA
Huang et al. [31] | Proposed a secure PHR system for data collection and attribute-based health record access | Biometric-based secure health data collection (BBC) | Protects against known attacks such as replay; the authors claim the scheme is efficient in terms of storage, computational, and communication needs

4.2.1 Password-Based Authentication Schemes

The primary concern in healthcare IoT technology is to prevent unauthorized access. Password-based mechanisms provide an efficient way to protect the data from such attacks; the password needs to be changed periodically and should be a complex combination of alphanumeric characters. The authors in [36] proposed a lightweight technique that can be implemented on miniaturized computers such as the Raspberry Pi; the scheme utilizes a centrally shared key agreement.
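As a purely illustrative companion to such lightweight schemes, the sketch below (Python) shows a simple HMAC-based challenge–response check built on a pre-shared key, of the kind a constrained gateway could run; it is not the protocol of [36], and the key handling shown (a hard-coded key) is simplified for brevity.

```python
# Toy HMAC challenge-response over a pre-shared key (illustration, not [36]).
import hmac
import hashlib
import os

PRE_SHARED_KEY = b"demo-key-distributed-out-of-band"  # assumption: provisioned securely

def make_challenge() -> bytes:
    """Server side: issue a fresh random nonce."""
    return os.urandom(16)

def prove(key: bytes, challenge: bytes) -> bytes:
    """Device side: respond with HMAC-SHA256 over the nonce."""
    return hmac.new(key, challenge, hashlib.sha256).digest()

def verify(key: bytes, challenge: bytes, response: bytes) -> bool:
    """Server side: constant-time comparison against the expected tag."""
    expected = hmac.new(key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)

challenge = make_challenge()
response = prove(PRE_SHARED_KEY, challenge)
print(verify(PRE_SHARED_KEY, challenge, response))      # True: legitimate device
print(verify(PRE_SHARED_KEY, challenge, b"\x00" * 32))  # False: forged response
```

A fresh nonce per exchange is what limits replay; in a real deployment the key would come from the key agreement step rather than a constant.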

4.2.2 Biometric-Based Authentication Schemes

In biometric-based authentication schemes, various biometric features such as fingerprint, iris, gait, and facial feature sets can be used to verify the legitimacy of the user. This is the most effective mechanism for preventing unauthorized access. Figure 3 compares different biometric features with respect to security level and accuracy.

Table 4 State-of-the-art security protocols in decentralized computing area

State-of-the-art | Contribution | Schemes proposed | Remarks
Banerjee et al. [32] | Proposed a blockchain-based decentralized system | Model to detect and prevent current threats in IoT systems | The proposed system has the capability to predict possible threats and attacks
Gordon et al. [33] | Presented a healthcare framework which can be used in different areas such as patient-driven and institution-driven | Information exchange model | Analyzed blockchain security considering five dimensions: data aggregation, digital access rules, data liquidity, data immutability, and patient identity
Kshetri et al. [34] | Did a comprehensive analysis of blockchain characteristics concerning security in IoT-based supply chains | Comparative model between blockchain and IoT | Examined the associations between hierarchical systems and the healthcare industry and discussed several privacy suggestions
Kumar et al. [35] | Examined the possible security and privacy problems in IoT and suggested a distributed ledger-based blockchain technology | Not specified | Highlighted the requirements of BC in IoT and its broad scope of services in various fields

Fig. 3 Comparison matrix of various biometric authentication mechanisms



4.3 Network Traffic Oriented Schemes

4.3.1 Distributed Approaches

Figure 4 depicts the proposed HIoT layered architectural framework for monitoring elderly and differently abled people. Here, we have organized the different application scenarios and components into three layers in accordance with their functionalities and requirements. The object layer, or perception layer, is where all the physical objects such as sensors and actuators work toward the common goal of capturing data from the patient's body in dynamic environments.

Fig. 4 Proposed HIoT layered architectural framework (object layer, network/gateways layer, and application layer, with the attack vectors at each layer) for monitoring elderly and differently abled people: a real-time monitoring of foot pressure and heart activity of a diabetic patient, b a real-time assistive smart wheelchair for a Parkinson patient, c real-time monitoring of an ICU patient

The network or gateway layer is responsible for transporting data from the DAQs to storage infrastructures such as clouds and fog nodes. The application layer is responsible for data analytical tasks such as creating graphs and flowcharts to improve business processes; it is sometimes also called the business layer. The attack vectors of the layered architecture are also shown in Fig. 4.
Several researchers have worked on model-based, attack-oriented schemes to prevent unauthorized access. For instance, the authors in [37] presented a model-based attack-oriented algorithm to safeguard healthcare data that works on the basic principle of a Markov model. The authors in [38] designed a model to prevent information breaches in healthcare applications.
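To give a flavor of the Markov-model idea (and only that; this is not the algorithm of [37]), the toy sketch below scores a sequence of observed device states against transition probabilities learned from normal traffic, flagging sequences whose likelihood is unusually low. The states, probabilities, and threshold are invented for illustration.

```python
# Toy Markov-chain anomaly score over device state sequences (illustrative).
import math

# Assumed transition probabilities learned from "normal" behavior.
TRANSITIONS = {
    ("idle", "measure"): 0.6, ("idle", "idle"): 0.35, ("idle", "upload"): 0.05,
    ("measure", "upload"): 0.8, ("measure", "idle"): 0.2,
    ("upload", "idle"): 0.9, ("upload", "upload"): 0.1,
}
FLOOR = 1e-4  # probability assigned to transitions never seen in training

def neg_log_likelihood(states):
    """Average negative log-likelihood of a state sequence; higher = more anomalous."""
    nll = 0.0
    for prev, cur in zip(states, states[1:]):
        nll -= math.log(TRANSITIONS.get((prev, cur), FLOOR))
    return nll / max(len(states) - 1, 1)

normal = ["idle", "measure", "upload", "idle", "measure", "upload"]
odd = ["idle", "upload", "upload", "upload", "upload", "upload"]

THRESHOLD = 2.0  # assumed operating point
for seq in (normal, odd):
    score = neg_log_likelihood(seq)
    print(f"{score:.2f}", "ALERT" if score > THRESHOLD else "ok")
```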

5 Proposed HIoT Architecture

The proposed HIoT architectural framework is shown in Fig. 4. It is a three-layered architecture, and the functionality of each layer is briefly explained in Sect. 4.3.1.

5.1 Use Cases

We consider various chronic health issues of both the elderly and the differently abled community as use cases; they are briefly discussed in the subsections below.

5.1.1 Real-Time Monitoring of Foot Pressure and Heart Activity of a Diabetic Patient

Here, force sensitive resistive (FSR) pressure sensors are installed in the foot sole of the patient. A diabetic patient tends to develop wounds easily, as the skin of the foot is highly sensitive to rough surfaces; such a wound may lead to gangrene and therefore to permanent disability or amputation of the infected body part. It is therefore important for the patient to know the variations in foot pressure. These variations are continuously monitored by a medical professional through the FSRs, so that any abnormal variation can be easily tracked and further medical attention sought. Multiple other sensors, such as a pulse rate sensor, an ECG sensor, and IR sensors, are interfaced to a microcontroller, and the data is transferred to the medical health server (MHS) through WiMAX-like wireless technologies. At the cloud level, the data is segregated according to the nature of the application; for instance, data from the diabetic patient and the ICU patient is stored in the mobile sensor database and the monitoring database, respectively. Generally, LoRa gateways are used at the cloud level for further data distribution.
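The snippet below is a minimal, assumption-laden sketch (Python) of the kind of edge-side logic this use case implies: sample an FSR reading, flag abnormal pressure against a per-patient baseline, and queue an alert for the MHS. The baseline, threshold, and upload call are placeholders, not part of the described prototype.

```python
# Illustrative edge logic for FSR-based foot-pressure monitoring (placeholders throughout).
import json
import random
import time

BASELINE_KPA = 180.0        # assumed per-patient baseline pressure
TOLERANCE = 0.25            # flag deviations beyond +/-25% of baseline

def read_fsr_kpa() -> float:
    """Stand-in for an ADC read of the FSR sensor."""
    return random.gauss(BASELINE_KPA, 30.0)

def is_abnormal(value_kpa: float) -> bool:
    return abs(value_kpa - BASELINE_KPA) > TOLERANCE * BASELINE_KPA

def queue_for_mhs(sample: dict) -> None:
    """Stand-in for the wireless upload to the medical health server."""
    print("queued:", json.dumps(sample))

for _ in range(5):
    value = read_fsr_kpa()
    sample = {"ts": time.time(), "fsr_kpa": round(value, 1), "alert": is_abnormal(value)}
    if sample["alert"]:
        queue_for_mhs(sample)   # abnormal variation: escalate to the care team
    time.sleep(0.1)
```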

5.1.2 Real-Time Assistive Smart Wheelchair for Parkinson's Disease

Here, the patient is equipped with a motion sensor (to detect movement), a GPS sensor (to know the location), and a motor controller. The wheelchair captures the patient's data and transfers it to the nearest cloud database through a microcontroller. In Parkinson's disease the patient cannot move his or her arms at will, so a smart joystick takes care of the patient's movements. The data captured from this patient is stored in a separate cloud database, the mobile sensor database, for further evaluation.
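A very small sketch of the joystick-to-motor mapping such a wheelchair controller might use is given below (Python); the axis ranges, dead zone, and differential-drive mixing are generic assumptions for illustration, not the controller actually built.

```python
# Illustrative joystick -> differential-drive mapping for a smart wheelchair.
def mix(x: float, y: float, dead_zone: float = 0.1) -> tuple:
    """Map joystick axes in [-1, 1] (x = turn, y = forward) to (left, right) wheel speeds."""
    if abs(x) < dead_zone and abs(y) < dead_zone:
        return 0.0, 0.0                      # ignore tremor-scale input near center
    left = max(-1.0, min(1.0, y + x))
    right = max(-1.0, min(1.0, y - x))
    return left, right

print(mix(0.0, 0.8))    # straight ahead: (0.8, 0.8)
print(mix(0.5, 0.5))    # gentle right turn: (1.0, 0.0)
print(mix(0.05, 0.02))  # within dead zone: (0.0, 0.0)
```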

5.2 Research Challenges and Future Work

In this section, we discuss the research challenges common to designing IoT frameworks for monitoring the vitals of elderly and disabled people. The major challenge is customizing healthcare devices so that they fit disabled people comfortably, as every disabled individual has specific needs and different circumstances. Context-aware environments are created in smart workflows that take intelligent decisions based on the context information received from the bio-sensing devices. Another important challenge is self-management of the IoT device: it is preferable to design a device free of human intervention, which updates its environment automatically, since it is difficult for elderly or disabled people to perform regular updates. Standardization is another key problem that every healthcare IoT designer needs to address; incorporating globally accepted standards into IoT devices is essential to avoid interoperability problems.
Finally, the future goal is to envision and build on the evolution of technologies in IoT and allied fields that help create devices for disabled and elderly people. Advances in the brain–computer interface (BCI) have made it possible to create control environments for artificial limbs such as arms and legs. Continuous transformations are occurring in BCI technologies around the globe to address these research challenges, and it is expected that the disabled community will benefit greatly from such advancements.

6 Conclusion

This paper offers a detailed overview of the numerous privacy preservation concerns and security issues seen in the day-to-day functioning of HIoT applications. We have discussed the key security aspects through a taxonomy diagram, and a heterogeneous variety of recent state-of-the-art authentication and access control schemes and their implications are covered in detail. In addition, we have presented insights into the different policy-based, process-based, and authentication-based security and privacy preserving schemes used in the HIoT application domain. An IoT-based healthcare architectural framework has been discussed, and multiple use cases, such as real-time monitoring of the foot pressure and heart activity of a diabetic patient and a real-time assistive smart wheelchair for Parkinson's disease, are deliberated with the diagram. Various sensors, such as force sensitive resistive (FSR) pressure sensors, an ECG sensor, GPS sensors, and a pulse rate sensor, are explained along with their usage implications.

References

1. Bahga A, Madisetti VK (2015) Healthcare data integration and informatics in the cloud. Comput (Long Beach Calif) 48(2):50–57
2. Zhang Y, Chen M, Huang D, Wu D, Li Y (2017) iDoctor: personalized and professionalized
medical recommendations based on hybrid matrix factorization. Futur Gener Comput Syst
66:30–35
3. Yu K, Tan L, Shang X, Huang J, Srivastava G, Chatterjee P (2020) Efficient and privacy-preserving medical research support platform against COVID-19: a blockchain-based approach. IEEE Consumer Electronics Magazine
4. Yu K-P, Tan L, Aloqaily M, Yang H, Jararweh Y (2021) Blockchain-enhanced data sharing with traceable and direct revocation in IIoT. IEEE Trans Industrial Informatics
5. Sriram S, Vinayakumar R, Sowmya V, Alazab M, Soman KP (2020) Multi-scale learning based
malware variant detection using spatial pyramid pooling network. In: IEEE INFOCOM 2020-
IEEE conference on computer communications workshops (INFOCOM WKSHPS), IEEE, pp
740–745
6. Vasan D, Alazab M, Venkatraman S, Akram J, Qin Z (2020) MTHAEL: cross-architecture IoT
malware detection based on neural network advanced ensemble learning. IEEE Trans Comput
69(11):1654–1667
7. Target heart rates chart | American heart association. [Online]. Available: https://www.heart.
org/en/healthy-living/fitness/fitness-basics/target-heart-rates. [Accessed: 28 May 2021]
8. Fernández-Alemán JL, Señor IC, Lozoya PÁO, Toval A (2013) Security and privacy in
electronic health records: a systematic literature review. J Biomed Inf 46(3):541–562
9. Abbas A, Khan SU (2014) A review on the state-of-the-art privacy-preserving approaches in
the e-health clouds. IEEE J Biomed Health Inform 18(4):1431–1441
10. Al Nuaimi N, AlShamsi A, Mohamed N, Al-Jaroodi J (2015) e-Health cloud implementation issues and efforts. In: 2015 international conference on industrial engineering and operations management (IEOM). IEEE, pp 1–10
11. Idoga PE, Agoyi M, Coker-Farrell EY, Ekeoma OL (2016) Review of security issues in e-
Healthcare and solutions. In: 2016 HONET-ICT, pp 118–121. IEEE
12. Pankomera R, van Greunen D (2016) Privacy and security issues for a patient-centric approach
in public healthcare in a resource constrained setting. In: 2016 IST-Africa week conference.
IEEE, pp 1–10
13. Olaronke I, Oluwaseun O (2016) Big data in healthcare: prospects, challenges and resolutions.
In: 2016 future technologies conference (FTC). IEEE, pp 1152–1157
14. Lee I, Lee K (2015) The Internet of Things (IoT): applications, investments, and challenges
for enterprises. Bus Horiz 58(4):431–440
15. Zhang D, Zhang D, Xiong H, Hsu C-H, Vasilakos AV (2014) BASA: building mobile Ad-Hoc
social networks on top of android. IEEE Network 28(1):4–9
16. Sharma G, Bala S, Verma AK (2012) Security frameworks for wireless sensor networks-review.
Procedia Technol 6:978–987

17. Wang K, Chen C-M, Tie Z, Shojafar M, Kumar S, Kumari S (2021) Forward privacy
preservation in IoT enabled healthcare systems. IEEE Trans Ind Inf
18. Hassan MU, Rehmani MH, Chen J (2019) Privacy preservation in blockchain based IoT
systems: ıntegration issues, prospects, challenges, and future research directions. Future Gener
Comput Syst 97:512–529
19. Bhalaji N, Abilashkumar PC, Aboorva S (2019) A blockchain based approach for privacy preservation in healthcare IoT. In: International conference on intelligent computing and communication technologies. Springer, Singapore, pp 465–473
20. Du J, Jiang C, Gelenbe E, Lei X, Li J, Ren Y (2018) Distributed data privacy preservation in
IoT applications. IEEE Wirel Commun 25(6):68–76
21. Ahmed SM, Abbas H, Saleem K, Yang X, Derhab A, Orgun MA, Iqbal W, Rashid I, Yaseen A
(2017) Privacy preservation in e-healthcare environments: state of the art and future directions.
IEEE Access 6:464–478
22. Xu X, Fu S, Qi L, Zhang X, Liu Q, He Q, Li S (2018) An IoT-oriented data placement method
with privacy preservation in cloud environment. J Net Comput Appl 124:148–157
23. Bhattacharya P, Tanwar S, Shah R, Ladha A (2020) Mobile edge computing-enabled blockchain
framework—a survey. In: Proceedings of ICRIC 2019. Springer, Cham, pp 797–809
24. Al Hamid HA, Rahman SMM, Hossain MS, Almogren A, Alamri A (2017) A security model
for preserving the privacy of medical big data in a healthcare cloud using a fog computing
facility with pairing-based cryptography. IEEE Access 5 (2017):22313–22328
25. Zhou J, Cao Z, Dong X, Lin X (2015) TR-MABE: White-box traceable and revocable multi-
authority attribute-based encryption and its applications to multi-level privacy-preserving e-
healthcare cloud computing systems. In: 2015 IEEE conference on computer communications
(INFOCOM). IEEE, pp 2398–2406
26. Kaneriya S, Chudasama M, Tanwar S, Tyagi S, Kumar N, Rodrigues JJPC (2019) Markov
decision-based recommender system for sleep apnea patients. In: ICC 2019–2019 IEEE
international conference on communications (ICC). IEEE, pp 1–6
27. Mutlag AA, Abd Ghani MK, Arunkumar NA, Mohammed MA, Mohd O (2019) Enabling
technologies for fog computing in healthcare IoT systems. Future Gener Comput Syst 90:62–78
28. Zhou J, Cao Z, Dong X, Lin X (2015) PPDM: a privacy-preserving protocol for cloud-assisted
e-healthcare systems. IEEE J Sel Top Sign Process 9(7):1332–1344
29. Ziglari H, Negini A (2017) Evaluating cloud deployment models based on security in EHR system. In: 2017 international conference on engineering and technology (ICET). IEEE, pp 1–6
30. Sanz-Requena R, Mañas-García A, Cabrera-Ayala JL, García-Martí G (2015) A cloud-based radiological portal for the patients: IT contributing to position the patient as the central axis of the 21st century healthcare cycles. In: 2015 IEEE/ACM 1st international workshop on technical and legal aspects of data privacy and security. IEEE, pp 54–57
31. Huang C, Yan K, Wei S, Hoon Lee D (2017) A privacy-preserving data sharing solution for mobile healthcare. In: 2017 international conference on progress in informatics and computing (PIC). IEEE, pp 260–265
32. Banerjee M, Lee J, Choo K-KR (2018) A blockchain future for internet of things security: a
position paper. Digital Commun Net 4(3):149–160
33. Gordon WJ, Catalini C (2018) Blockchain technology for healthcare: facilitating the transition
to patient-driven interoperability. Comput Struct Biotechnol J 16:224–230
34. Kshetri N (2017) Blockchain’s roles in strengthening cybersecurity and protecting privacy.
Telecommun Policy 41(10):1027–1038
35. Kumar NM, Mallick PK (2018) Blockchain technology for security issues and challenges in
IoT. Procedia Comput Sci 132:1815–1823
36. Li X, Ibrahim MH, Kumari S, Sangaiah AK, Gupta V, Choo K-KR (2017) Anonymous mutual
authentication and key agreement scheme for wearable sensors in wireless body area networks.
Comput Netw 129:429–443

37. Strielkina A, Kharchenko V, Uzun D (2018) Availability models for healthcare IoT systems: classification and research considering attacks on vulnerabilities. In: 2018 IEEE 9th international conference on dependable systems, services and technologies (DESSERT). IEEE, pp 58–62
38. McLeod A, Dolezel D (2018) Cyber-analytics: Modeling factors associated with healthcare
data breaches. Decis Support Syst 108:57–68
Hybrid Beamforming for Massive MIMO
Antennas Under 6 GHz Mid-Band

Kavita Bhagat and Ashish Suri

Abstract Mid-band spectrum and massive MIMO together have become a game-changer for 5G technology in wireless systems. The mid-band is considered a sweet spot, holding opportunities for new deployments that cover several miles while offering large throughput and spectral efficiency alongside massive MIMO systems. Executing massive MIMO systems at mid-band frequencies helps boost speed, capacity, and coverage. Hybrid beamforming is an efficient way to design our model at 6 GHz, requiring a smaller number of RF chains while maximizing throughput and antenna gain. In our experimental model, we use a massive multiple-input multiple-output OFDM ray-launching design that splits the precoding between the digital baseband and the radio frequency analog components at both ends of the transceiver link. Fewer RF chains reduce system complexity and computation time. The experiment is carried out using the scattering and MIMO propagation channel models, and the simulated channel link uses OFDM with 16-, 64-, and 256-QAM modulation.

Keywords Mid band · Massive MIMO · TDD · Hybrid beamforming · OFDM · QAM

1 Introduction

The ongoing evolution in wireless technologies has become an essential part of our everyday life. Present systems use RF signals, i.e., electromagnetic (EM) waves, to forward data from the source point to the destination point. 5G redefines the network with new global wireless standards for the fastest communications.
K. Bhagat (B) · A. Suri


School of Electronics and Communication Engineering, Shri Mata Vaishno Devi University of
Katra, Katra, Jammu and Kashmir 182320, India
e-mail: 19mmc004@smvdu.ac.in
A. Suri
e-mail: ashish.suri@smvdu.ac.in


users simultaneously at longer distances. Higher frequencies of 5G technology will


eventually allow even more technologies to connect on a massive scale. The frequency
spectrum of 5G is divided into three bands: low band, mid-band, and millimeter
wave. The low band offers a similar capacity to the 4G advanced version with an
air latency of 8–10 ms. The millimeter wave is the super fastest from all supporting
high data transfer speeds and large coverage areas at the line of sight (LOS) [1].
The mid-band is the sweet pot that overcomes the shortcomings of both the low
band and millimeter-wave band because it bridges the speed, capacity, coverage, and
longer distances over LOS and NLOS conditions. It offers more adaptability in the
5G networks because of its long reach and more compatibility to cover large areas
and penetrate through obstacles. The major ultimatum using mid-band frequencies
is path loss [2], which can be covered up by using massive MIMO antennas in the
particular scale. For increasing the data transfer speeds, massive MIMO is the only
solution for transferring multiple data streams in parallel format simultaneously,
and it can be achieved through beamforming. Hybrid beamforming is an efficient
solution where beamforming is required in the radio frequency and the baseband area,
resulting in a smaller number of RF chains than the quantity of transmitting element
arrays [3–5]. The technology is itself the combination of analog beamforming and
digital beamforming in RF and baseband domains which smartly forms the patterns
transmitted from a large antenna array. In order to send more than one data stream
in a particular sequence over a propagation channel, we need to express it with the
help of precoding weights at the transmitter and combining weights to the receiver
over an impulse matrix [6]. At last, every single data stream from the users can
be recovered from the receiver independently. The SNR and complexities can be
improved with the hybrid beamforming in a multiuser MIMO set [7]. Additionally,
it shows the formulation of the transmit-end matrix of the channel and ray launching
to trace rays from every single user. For wideband OFDM modeled systems, the
analog weights of beamformers are the only average complex weights through the
subcarriers [8].
The hybrid beamforming technique controls the antenna gains, cost, and power utilization to maximize spectral efficiency. By reducing the number of RF chains, it keeps the computational time and complexity of massive MIMO systems manageable even as the numbers of users and base stations grow [9]. However, it is more challenging to design the transmit vectors when base station transmitters communicate [10–13] with many receivers simultaneously using the same transmission resources. Allowing base stations to serve many users therefore increases the number of data streams per cell, and these must be managed carefully to keep the network flowing smoothly.

2 Literature Review

From the literature review, we observed that this technology, together with massive MIMO systems, has become a hot topic in this research area. The article [10] proposed hybrid beamforming with selection (HBWS), reducing cost and enhancing system performance by adapting to the channel statistics, which enables better user separability and beamforming gains. Another work [14], on interference mitigation in MIMO radar, reduces the dimensionality of the covariance matrix to improve anti-jamming and interference capabilities; the complexity of the radar signal processing can be decreased using space–time adaptive processing. In [15], an HBF–PDVG (predefined virtual grouping) algorithm is adopted, reducing the system complexity and the feedback overhead needed to achieve beamforming under hardware constraints. A hybrid precoder design for cross-polarized arrays is proposed through a joint algorithm for enhancing overall performance [16]. The author in [17] derived a hybrid energy beamforming (EBF) architecture model under phase shifter impairment, proposing an optimal least-squares estimator for the energy-source-to-EH-user channel that maximizes the average harvested energy. An optimal selection framework [18] achieves energy efficiency by deactivating parts of the beamformer structure, which keeps the typical power consumption at a low level.
Different methods have been adopted for tracing rays: for example, [19] studied MIMO indoor propagation using a 3D shooting-and-bouncing-rays (SBR) ray-tracing technique with 802.11n Wi-Fi in the 2.4 and 5 GHz bands. Both the imaging method and the ray launching method were used for the channel calculations launched from the base station; conventionally, 2 * 3 dual-band MIMO antennas are used for correct prediction of the signal, which needs to be achievable from nearby and farther locations and distributed equally among all users.
The DeepMIMO dataset, designed with hybrid beamforming in mind for machine learning applications, uses massive MIMO and millimeter-wave channels for both indoor and outdoor environments [20]. Its parameters include the number of active base stations, active users, antenna spacing, system bandwidth, OFDM parameters, and channel paths, and it shows how such a dataset can be applied to deep learning for massive MIMO and millimeter-wave beam prediction. Reference [21] presents a beamforming neural network dataset for deep learning in millimeter-wave applications, optimizing the design under channel state information constraints. Reference [22] analyzes the downlink performance of massive MIMO hybrid beamforming with large antenna arrays for millimeter-wave applications, obtaining better throughput and spectral efficiency with the OMP algorithm. In this situation, it is essential to improve the spectral efficiency with a minimum number of RF chains for channel estimation and accurate results [23]; antenna gains are increased by resolving the issue of high path loss. The authors of [24] proposed low-complexity hybrid beamforming for downlink millimeter-wave applications that works with separate analog and digital precoding and shows higher transmission rates than are possible with analog-only beamforming solutions. Reference [25] combines large-dimensional analog precoding with small-dimensional digital precoding to reduce complexity, hardware cost, and hardware constraints, and shows how this combination, based on average CSI, yields better-SNR designs in massive MIMO systems. Reference [26] considers a new system model that accounts for general transceiver hardware impairments at both the base stations (equipped with large antenna arrays) and the single-antenna user equipment (UEs).

survey was conducted for the macro-cells millimeter-wave system which discussed
the performance of the MIMO systems showing that it is better to model for multi-
dimensional accuracy using scattering models at outdoor scenarios. Our paper is
organized in a systematic way following with the section firstly proposed model. The
next section includes the mathematical model representation of our model. Thirdly,
it includes measurements units following with the results and discussion. The fifth
section includes the conclusion and scope in future time.

3 Proposed Model

Starting with basic beamforming: analog beamformers produce a single beam per antenna array, which makes generating multiple beams complicated. Digital beamformers process every antenna at baseband and require a digital transceiver chain per element at every station, which increases cost, power utilization, and system complexity. To overcome such problems, hybrid beamforming is the best choice [28]: combining beamforming in the RF and baseband domains smartly forms the patterns transmitted from a large antenna array. In a hybrid beamforming system, transmission resembles the other beamforming schemes. If more than one data stream is to be sent over the propagation channel, it is expressed through precoding weights at the transmitter and combining weights at the receiver, connected by the channel impulse matrix. Each user's data stream can then be recovered independently at the receiver. For signal propagation, a shooting-and-bouncing-rays (SBR) ray-tracing method is applied to the model to estimate the rays that are launched and traced. Figure 1 shows the transmission and reception of the signal in the MU-MIMO OFDM system.

Fig. 1 Block diagram of data transmission in MU-MIMO OFDM system [23]



In the transmitter section, the data of one or more users is sent over the transmit antenna array after being channel encoded with convolutional codes. The channel-encoded bits are then mapped into quadrature amplitude modulation (QAM) symbols of different orders (2, 16, 64, 256), generating complex mapped symbols for every single user [29]. The QAM data of the users is then distributed into multiple data streams for transmission. Next, the output is passed into digital baseband precoding, which assigns precoding weights to the data streams. In our proposed model, these weights are computed using hybrid beamforming with the Orthogonal Matching Pursuit (OMP) algorithm for single users and the Joint Spatial Division Multiplexing (JSDM) algorithm for multi-user operation. JSDM is used because of its better performance with respect to the maximum array response vector, and it also allows many base stations to transmit. A rough sketch of the OMP precoding step is given below.
Channel sounding and estimation are performed at both the transmitter and the receiver to reduce the number of radio frequency chains [30]. The base station sounds the channel by transmitting a reference signal that is easily detected at the mobile station receiver, which estimates the channel. The mobile stations then transmit this information back to the base station so that it can calculate the precoding required for the upcoming data transmission [31]. After the precoding weights are assigned, the MU-MIMO system combines the corresponding complex weights at the receiver. The precoded digital signal is further modulated using orthogonal frequency-division multiplexing with pilot mapping, followed by radio frequency analog beamforming for every transmit antenna. This modulated signal is then fed into the scattering MU-MIMO channel, and demodulation is performed at the destination to decode the original signal [32]. Table 1 below lists the parameters of our model, covering the different numbers of users, the data streams allotted to those users, and the OFDM system.

3.1 Mathematical Model Representation

The channel matrix of the MIMO system, with $h_{ij}$ denoting the channel impulse response between transmit antenna $i$ and receive antenna $j$, is

$$H = \begin{bmatrix} h_{11} & h_{21} & \cdots & h_{N1} \\ h_{12} & h_{22} & \cdots & h_{N2} \\ \vdots & \vdots & \ddots & \vdots \\ h_{1M} & h_{2M} & \cdots & h_{NM} \end{bmatrix}$$

where $N$ is the number of transmit antennas and $M$ the number of receive antennas.

We assume downlink transmission from the first base station, acting as the transmitter, to the mobile users.

Table 1 Parameters for proposed model

Serial no | Parameter | Value
1 | Users | 4, 8
2 | Data streams per user | 3, 2, 1, 2; 3, 2, 1, 2, 1, 2, 2, 3
3 | Base stations | 64
4 | Receiving antennas | 12, 8, 4, 8
5 | Bits per subcarrier | 2, 4, 6, 8
6 | OFDM data symbols | 8, 10
7 | Position of MS | 180
8 | Position of BS | 90
9 | Maximum range of MS | 250 m, 500 m, 1 km
10 | Carrier frequency | 6 GHz
11 | Sampling rate | 100 × 10^6
12 | Channel type | Scattering, MIMO
13 | Noise figure | 4, 6, 8
14 | Rays | 500
15 | FFT length | 256
16 | Cyclic prefix | 64
17 | Carriers | 234
18 | Carrier indices | 1:7, 129, 256
19 | Code rate | 0.33
20 | Tail bits | 6
21 | Modulation schemes | QAM: 2, 16, 64, 256

In each transmitter section, the baseband digital precoder $F_{BB}$ processes the $N_S$ data streams, and its outputs are converted onto the RF chains and, through the analog precoder $F_{RF}$, onto the $N_{BS}$ antenna elements for propagation over the channel. At the receiver, the analog combiners $W_{RF}$ combine the signals received across the RF chains from the user's antennas to create the output.
Mathematically, this can be written as

$$F = F_{BB} F_{RF}, \qquad W = W_{RF} W_{BB} \qquad (1)$$

where $F_{BB}$ is an $N_S \times N_t^{RF}$ matrix, $F_{RF}$ is an $N_t^{RF} \times N_t$ matrix, $W_{BB}$ is an $N_r^{RF} \times N_S$ matrix, and $W_{RF}$ is an $N_r \times N_r^{RF}$ matrix.
The precoding and combining weight matrices at the transmitter and receiver antennas are therefore

$$\text{Precoding weight matrix: } F = F_{BB} F_{RF} \in \mathbb{C}^{N_S \times N_T} \qquad (2)$$

where $F_{RF}$ is the analog precoder, $F_{BB}$ the digital precoder, $N_S$ the number of signal streams, and $N_T$ the number of transmit antennas, and

$$\text{Combining weight matrix: } W = W_{RF} W_{BB} \in \mathbb{C}^{N_R \times N_S} \qquad (3)$$

where $W_{RF}$ is the analog combiner, $W_{BB}$ the digital combiner, $N_R$ the number of receive antennas, and $N_S$ the number of signal streams.
The downlink signal for each user is calculated as

$$\forall k, \quad y_k = H_k W_k s_k + H_k \sum_{a \ne k} x_a + n_k \qquad (4)$$

where $k$ indexes the users, $x_k$ is the signal allotted to user $k$, $H_k$ is the channel from the transmitter to user $k$, and $n_k$ is the noise. With the precoding weights applied,

$$x_k = w_k s_k \qquad (5)$$

so the downlink signal for each user becomes

$$\forall k, \quad y_k = H_k w_k s_k + H_k \sum_{a \ne k} x_a + n_k \qquad (6)$$
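As a quick numerical illustration of Eqs. (4)–(6), the snippet below simulates the per-user downlink signal with inter-user interference and noise for randomly drawn channels and matched-filter precoding vectors; the dimensions and the precoder choice are illustrative assumptions only.

```python
# Illustrative per-user downlink signal y_k = H_k w_k s_k + H_k * sum_{a != k} x_a + n_k.
import numpy as np

rng = np.random.default_rng(1)
n_t, n_users, noise_std = 64, 4, 0.05

# One random channel row vector per single-antenna user (1 x n_t each).
H = rng.standard_normal((n_users, n_t)) + 1j * rng.standard_normal((n_users, n_t))
# Matched-filter precoding vectors (assumption), one unit-norm column per user.
W = (H.conj() / np.linalg.norm(H, axis=1, keepdims=True)).T   # n_t x n_users
s = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=n_users) / np.sqrt(2)  # QPSK symbols
X = W * s                                                     # x_k = w_k s_k, column per user

for k in range(n_users):
    desired = H[k] @ X[:, k]
    interference = sum(H[k] @ X[:, a] for a in range(n_users) if a != k)
    noise = noise_std * (rng.standard_normal() + 1j * rng.standard_normal())
    y_k = desired + interference + noise
    sinr = abs(desired) ** 2 / (abs(interference) ** 2 + noise_std ** 2)
    print(f"user {k}: |desired|={abs(desired):.2f}  SINR~{10 * np.log10(sinr):.1f} dB")
```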

4 Measurement Units

Our channel measurements are obtained for an indoor scenario in MATLAB using the Communications Toolbox and the Phased Array System Toolbox. The Communications Toolbox plays a major role in designing the model and provides algorithms that make it easy to analyze the system and obtain outputs. The Phased Array System Toolbox provides the correct positioning of the transmitter and receiver antennas when they are deployed in large numbers. The performance of the software tools was verified first, so that they produce accurate results when used to characterize the indoor radio channels, and the MIMO hybrid beamforming is designed so that accurate results are highly achievable. The implementation work consists of planning a suitable environment for conducting measurements with the channel sounder. The environment considered in our work is an indoor environment operating in the 6 GHz frequency range, targeting broadband together with the low band for enhanced network capacity. The equipment used for the channel measurements includes the channel sounder, antennas (isotropic antennas, which radiate power equally in all directions), mobile users at 250 m, 500 m, and 1 km from the base station, an uninterruptible power supply, and a laptop. The number of rays is set to 500. The modulation scheme is OFDM with 10 and 8 data symbols. Four and eight users are assigned multiple data streams in the order 3, 2, 1, 2, 2, 2, 1, 3. The next step is to calibrate the channel sounder for the spatially multiplexed system; as shown in Fig. 2, it consists of two main units, the transmitter and the receiver.

Fig. 2 Channel sounder with transmitter unit and receiver unit [30]

The function of the channel sounder is to apply the maximum power to the signal in the desired direction [27]. A preamble signal is sent to the transmitter for processing the channel at the Rx section; it is generated for all sounded channels and then sent to the selected MIMO system. The receiver section then performs pre-amplification and OFDM demodulation for all established links. The experiment is repeated while varying parameters such as the transmission distance, the propagation channel model, the number of data symbols, and the noise figure. We evaluate the results in terms of the error vector magnitude (EVM) values, beam patterns, and ray patterns. In the following, different cases are considered by varying the number of users, the data symbols, the range from the base station, the noise figure, the propagation channel model, and the modulation scheme. A small sketch of the EVM computation used throughout the result cases is given below.
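The following sketch (Python) shows the RMS EVM computation referred to in the result cases: the root-mean-square error between received and reference constellation symbols, normalized to the reference power and expressed as a percentage. The random test symbols and noise level are placeholders.

```python
# RMS EVM (%) between received symbols and their reference constellation points.
import numpy as np

def rms_evm_percent(received: np.ndarray, reference: np.ndarray) -> float:
    err_power = np.mean(np.abs(received - reference) ** 2)
    ref_power = np.mean(np.abs(reference) ** 2)
    return 100.0 * np.sqrt(err_power / ref_power)

rng = np.random.default_rng(7)
# Placeholder 16-QAM reference symbols and a noisy received copy.
levels = np.array([-3, -1, 1, 3])
ref = (rng.choice(levels, 1000) + 1j * rng.choice(levels, 1000)) / np.sqrt(10)
rx = ref + 0.05 * (rng.standard_normal(1000) + 1j * rng.standard_normal(1000))
print(f"RMS EVM = {rms_evm_percent(rx, ref):.2f} %")
```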

5 Results and Discussions

The MU-MIMO communication link between the BS and the UEs is validated using a scattering-based MIMO spatial channel model with a single-bounce ray-tracing approximation. The users are placed randomly at different locations. Different cases are studied, and on this basis the experiments have been performed and analyzed.

Case 1 In the first case, we change the distance between the base station and the users and then increase the number of users to check the impact on the bits received by the users, the RMS EVM values, and the antenna patterns. The number of data symbols is set to 10 with a noise figure of 10 dB, and the number of rays is fixed at 500. The propagation channel model selected here is the MIMO model, as it is more efficient than the scattering channel model (Fig. 3). Figure 3 clearly shows that increasing the distance has no effect on the EVM values or on the received bits; they remain constant when the distance is increased from 250 to 1000 m and beyond. It is observed that users with a high number of data streams have low RMS EVM values, whereas users with a single data stream have high RMS EVM values, and the value increases as the number of base station antennas is reduced for multi-stream users. In other words, no impact on the bits or the EVM values is seen when increasing the distance (Fig. 3).

Fig. 3 Comparison of EVM values using 2-QAM modulation by varying distances

Fig. 4 Comparison of all modulations using four users by varying distances



Fig. 5 Comparison of EVM values using the 16-QAM, 64-QAM scheme by considering eight
users

In Case 2, the number of users is increased from four to eight, with all other parameters kept as in Case 1. We observe very little effect on the error vector magnitude values, on which the performance relies. As shown in Fig. 5, the EVM values increase slightly with more users, the bit error rate remains low, and the output bits received are the same for the first four users. The EVM value increases as the number of base station antennas is reduced for multi-stream users. The number of bits increases slightly for multiple data streams, as shown in Fig. 5, and the transfer bit rate remains the same for multiple data streams as for 64-QAM. The RMS EVM value is minimized when the number of base station antennas is increased, and more data streams lower the RMS EVM value for every single data stream.
Case 3 In the third case, we change the number of data symbols from 10 to 8 and compare the outputs with those of Fig. 4. Varying the data symbols affects the output bits and the error vector magnitude values; the output bits of this case are compared with the output bits of Case 2. The EVM values are low when the number of data symbols is low and increase with the symbol rate; to increase the output bits, the number of data symbols is set high, as shown in Fig. 6. Comparing 64-QAM and 256-QAM for eight users, we find that the RMS EVM values of the two modulations are roughly the same, while the EVM value is very high for two data streams in 64-QAM.
Case 4 As seen in Figs. 7, 8, 9, and 10, the noise figure is varied from 4 to 6 and then from 6 to 8. A prominent difference is seen in the RMS EVM values, while the bits remain constant. There is a direct relation between the noise figure and the EVM values, which decrease as the noise figure is lowered.
Case 5 Figure 11 shows the effect of varying the propagation channel model from the scattering model to the MIMO model: the RMS EVM value is high with the scattering channel, whereas it is lower with the MIMO channel. There is no effect on the bit error rate, which remains the same for both propagation channels.

Fig. 6 Comparison of values of 64-QAM and 256-QAM schemes by changing the data symbols at eight users

Fig. 7 Comparison of EVM values at different noise levels using the 2-QAM scheme for four users
Figures 12 and 13 show the 3D antenna radiation patterns using the MIMO and scattering channel models with the 256-QAM modulation scheme. More lobes are formed with the MIMO model, as it works with multiple antennas; compared with it, the scattering design in Fig. 13 forms fewer lobes. The lobe on the right side of the diagram signifies the data streams of the users, and the pointer shows that hybrid beamforming is achieved with the data streams of every user separated. It is clear from the diagrams that the radiated beam pattern grows sharper as the number of base station antennas increases, which raises the throughput of the signal.

Fig. 8 Comparison of EVM values at different noise levels using the 16-QAM scheme for four
users

Fig. 9 Comparison of EVM values at different noise levels using the 64-QAM scheme for four
users

The constellation diagrams for eight users are shown in Figs. 14, 15, 16, and 17. They reveal the point-tracing blocks of every data stream for the higher-order modulation schemes in the working model. The ray-tracing blocks show that the retrieval rate is high for users with fewer data streams. The position of the blocks indicates that blocks whose points are packed closely correspond to a high rate of retrieved streams for users with multiple data streams, whereas blocks whose points are spaced farther apart correspond to a lower retrieval rate for users with single data streams. More recovered data streams for multi-stream users results in a lower SNR, and fewer retrieved data streams for single-stream users results in a higher SNR.

Fig. 10 Comparison of EVM values at different noise levels using the 256-QAM scheme for four
users

Fig. 11 Comparison using scattering and MIMO channel model of 256-QAM scheme

6 Conclusion and Future Work

We have analyzed hybrid beamforming with the ray-tracing method, in which each user can use multiple data streams. The spectral efficiency improves significantly with a higher number of data streams. It is observed that users with a high number of data streams have low RMS EVM values, whereas users with single data streams have comparatively high RMS EVM values. The possibility of errors is reduced, and the bit errors can be assessed by comparing the actually transmitted bits with the bits received at the decoder for each user. The number of antennas required decreases only when the users transmit their information over multiple data streams; if the users transmit over a single data stream, the antenna requirement increases, which results in more system complexity.

Fig. 12 Antenna pattern design using the MIMO channel model at 256-QAM

Fig. 13 Antenna pattern design using the scattering channel model at 256-QAM

Fig. 14 2-QAM ray-tracing blocks

Fig. 15 16-QAM ray-tracing blocks

Fig. 16 64-QAM ray-tracing blocks



Fig. 17 256-QAM ray-tracing blocks

are more advantageous to the users for every higher-order modulation scheme, as seen from the diagrams. Different parameters can be analyzed in the future through this experiment. The environment can be changed from indoor to outdoor, and the analysis can then be repeated to compare different environmental conditions. Additionally, the ray-tracing results will be compared with different channel models, and the analysis should be done precisely and accurately. Complexity is reduced by keeping the number of RF chains in the uplink conversions to a minimum. The MU-MIMO hybrid beamforming is to be designed with the aim of reducing the RMS EVM values for users with single data streams.

Multi-Class Detection of Skin Disease:
Detection Using HOG and CNN Hybrid
Feature Extraction

K. Babna, Arun T. Nair, and K. S. Haritha

Abstract It is essential to monitor and analyse skin problems early on in order to


prevent them from spreading and turning into deadly skin cancers. Due to artefacts,
poor contrast and similar imaging of scars, moles and other skin lesions, it is difficult
to distinguish skin diseases from skin lesions. As a consequence, automated skin
lesion identification is performed using lesion detection methods that have been
optimised for accuracy, efficiency and performance. Photographs of skin lesions are
used to illustrate the suggested technique. To assist in the early detection of skin
lesions, the proposed method uses CNN, GLCM and HOG feature extraction. The
files include many skin lesions of various kinds. The suggested work includes a
pre-processing step that aims to improve the quality and clarity of the skin lesion
and to remove artefacts, skin colour and hair, amongst other things. Then, segmentation is done using geodesic active contours (GAC). Skin lesions can be separated individually during the segmentation step, which is beneficial for subsequent feature
extraction. The proposed system detects skin lesions via the use of feature extraction
methods such as CNN, GLCM and HOG. Score features are extracted using the CNN
technique, whilst texture features are extracted using the GLCM and HOG methods.
After collecting characteristics, a multi-class SVM classifier is utilised to categorise
skin lesions. Using ResNet-18 transfer learning for feature extraction, many skin
diseases, including malignant lesions, may be rapidly classified.

Keywords Geodesic active contour · Grey-level co-occurrence matrix · Histogram


of oriented gradients · Convolution neural network · ResNet-18 · SVM classifier

K. Babna (B)
Electronics and Communication Engineering, KMCT College of Engineering, Kozhikode, Kerala,
India
A. T. Nair
KMCT College of Engineering, Kozhikode, Kerala, India
K. S. Haritha
College of Engineering, Kannur, Kerala, India


1 Introduction

The skin, which acts as the body’s outer layer, is the biggest organ in the human
body. The skin is made up of up to seven layers of ectodermal tissues that serve
as a protective covering over the underlying muscles, bones, ligaments and internal
organs. The skin protects the body from harmful substances and viruses, aids in
temperature control and gives feelings of cold, heat and touch. A skin lesion is defined
as a patch of skin that is abnormal in contrast to the surrounding skin. Infections
inside or on the skin are the basic and main cause of skin lesions. Skin lesions
may be categorised as primary (present at birth or developed over time) or secondary
(resulting from poor treatment of the original skin lesion), both of which can progress
to skin cancer. As a consequence, manual skin cancer diagnosis is not optimal, as
the skin lesion is assessed with the naked eye, resulting in mistreatment and ultimately
death. Accurate detection of skin cancer at an early stage may significantly increase
survival chances. As a consequence, automated detection becomes more reliable,
increasing accuracy and efficiency.
In the proposed method, three types of skin lesions are included in the dataset. The training sets pass through four major steps: pre-processing, segmentation, feature extraction and classification. Here, we propose a novel feature extraction stage that combines HOG and GLCM features with ResNet-18 transfer learning for a better output in the classification process.

2 Literature Review

Dermoscopy methods are being developed in order to produce a clear skin lesion
site, which improves the visual impact by removing reflections. Automatic skin lesion
identification, on the other hand, is difficult owing to artefacts, poor contrast, skin
colour, hairs [1] and the visual similarities between melanoma and non-melanoma
[2]. All of this may be reduced to a minimum by using pre-processing processes.
The exact location of the skin lesion is determined by segmenting the pre-processed
skin lesion picture. The wavelet algorithm, basic global thresholding, region-based
segmentation, the watershed algorithm, the snakes approach, the Otsu method, active
contours and geodesic active contours are some of the segmentation methods avail-
able. Geodesic active contours [3] are used to segment the data. There are a variety
of methods for extracting characteristics from a segmented skin lesion image [4, 5],
including the CASH rule, ABCD rule and ABCDE rule, as well as GLCM, HOG, LBP and HLIFS features. The ABCD rule is a scoring method
that collects asymmetry, colour, border and diameter information [6]; the authors
describe how to take the total dermoscopic score and identify melanoma and non-
melanoma using wavelet analysis. Using HOG, it is possible to extract the shape and
edge of a skin lesion [7]. In this study, the recovered feature is passed straight to an
SVM classifier, which yields an accuracy of 97.32%. The classifier is the last step in

the process of identifying skin lesions and is responsible for categorising them. This
method consists of two parts: teaching and testing. Unknown patterns are fed into
the system, and the information acquired during the training process is utilised to
categorise the unknown patterns. There are many different kinds of classifiers, such
as SVM, KNN, Naive Bayes, and neural networks, amongst others. Author Khan [8]
applied features to the SVM, KNN, and Naive Bayes classifiers and achieved accu-
racy rates of 96%, 84%, and 76%, respectively, for the three classifiers. In their article
[9], Victor and Ghalib describe pre-processing as the first and most significant step of image processing, which helps in the elimination of noise. The output of the median filter is supplied as an input to the histogram equalisation phase of the pre-processing stage, and the histogram-equalised picture is then provided as an input to the segmentation stage. Segmentation aids in the identification of the desired area. Area, mean, variance and standard deviation calculations for feature extraction are then carried out on the extracted output of the segmentation phase, and the result is fed into classifiers such as support vector machine (SVM), k-nearest neighbour (KNN), decision tree (DT) and boosted tree (BT). The classifications are compared with one another. Goel and Singh showed that GLCM extracts textural characteristics [10], and that the extracted features may then be passed straight to a neural network, resulting in a success rate of 95.83%. Skin
lesion segmentation is the essential step for most classification approaches. Codella
et al. proposed a hybrid approach, integrating convolutional neural network (CNN),
sparse coding and support vector machines (SVMs) to detect melanoma [11]. Yu et al.
applied a very deep residual network to distinguish melanoma from non-melanoma
lesions [12]. Schaefer used an automatic border detection approach [13] to segment
the lesion area and then assembled the extracted features, i.e. shape, texture and
colour, for melanoma recognition. Moataz et al. practised upon a genetic algorithm
with an artificial neural network technique for early detection of the skin cancers
and obtained a sensitivity of 91.67% and a specificity of 91.43%. [14]. Kamasak
et al. classified dermoscopic images by extracting the Fourier identifiers of the lesion
edges after dividing the dermoscopic images. They obtained an accuracy of 83.33%
in diagnosing of the melanoma [15] (Table 1).

3 Proposed Methodology

The detection of cutaneous lesions proceeds in stages, as illustrated in Fig. 1. It entails data acquisition, pre-processing, segmentation, feature extraction and classification.

Table 1 Review of conventional methods

Sl No | Author (citation) | Methodology | Features | Challenges
1 | Jaisakthi et al. | GrabCut and k-means algorithms | Automatic recognition of skin lesion | Recognition of skin lesions has a few difficult tasks such as artefacts, low contrast, skin colour and hairs
2 | Chung et al. | PDE-based method | The pre-processed skin lesion image is segmented to get the accurate position of the skin lesion | Accuracy is low; only melanoma and non-melanoma detection
3 | Hemalatha et al. | Active contour-based segmentation | Preparation of pixels of interest for different image processing; decomposes the image into parts for future analysis | The sample size is relatively low
4 | Salih et al. | Active contour modelling, FCM | Fuzzy clustering based on region growing | A large amount of images and different algorithms needed for better classification
5 | Li et al. | FCRN | A straightforward CNN is proposed for the dermoscopic feature extraction task | Accuracy is comparatively low
6 | Kasmi et al. | ABCD rule | 92.8% sensitivity and 90.3% specificity reported | The accuracy is low
7 | Bakheet et al. | HOG, SVM | Evaluations demonstrated on a large dataset of dermoscopic images | Only two types of classification
8 | Khan et al. | K-means clustering, FCM | Extraction of textural and colour features from the lesion | Only two classifications
9 | Victor et al. | SVM, KNN | Detect and classify the benign and the normal image | KNN is 92.70%; SVM is 93.70%
10 | Goel et al. | GLCM, back propagation neural network | GLCM matrix characterises the features of the image | Accuracy is low
11 | Celebi et al. | Survey on lesion border detection | Lesion border detection in dermoscopy images | It is a comparison with various criteria
12 | Yu et al. | Very deep residual networks | Automated melanoma recognition in dermoscopy images | Accuracy is low and only two classifications
13 | Schaefer et al. | An ensemble classification approach | Ensemble classification | Accuracy is 93.83%
14 | Moataz et al. | Artificial intelligence techniques | Image classification using ANN and AI | Sensitivity 91.67% and specificity 91.43%
15 | Kamasak et al. | ANN, SVM, KNN and decision tree | Classification with different machine learning methods | Comparison of different classifiers

Fig. 1 Block diagram

3.1 Data Acquisition

3.1.1 Dataset

It is intended that the initial phase of this project will include the gathering of data
from the International Skin Imaging Collaboration’s databases of images of skin
lesions (ISIC). There are three types of cancer represented in this experiment: actinic

Fig. 2 Examples for skin lesion images

keratosis, basal cell carcinoma and melanoma. Photographs of skin lesions were
taken using data from the ISIC 2017 dataset. Images in JPEG format are utilised. It
was decided to divide the skin lesion pictures into three groups. There were 69 actinic
keratosis, 80 basal cell carcinoma and 60 melanoma images for training and testing.
Figure 2a, b, c shows actinic keratosis, basal cell carcinoma and melanoma,
respectively.

3.2 Pre-Processing

It is necessary to do pre-processing on the skin lesions datasets in the second phase.


To ensure that the lesion is detected in the future stages, pre-processing removes
everything else from the sample except for the lesion. Artefacts, poor contrast, hairs,
veins, skin tones and moles are all examples of undesirable components. They are removed in the following ways: (i) the RGB picture is converted to greyscale so that digital systems can recognise the intensity information included within the image; (ii) median filtering is applied to reduce noise from the greyscale picture, which enhances the image of the skin lesion, and this median-filtered image is then used for hair identification and removal; (iii) hair on the skin lesion is first detected using bottom-hat filtering, which isolates small dark elements in an image such as hair, and the detected hair is then removed by employing area-filling morphology, which interpolates pixels from the outside in.
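As a rough illustration of this pre-processing chain (greyscale conversion, median filtering, bottom-hat hair detection and hair removal), the following Python/OpenCV sketch shows one possible implementation; the file name, kernel size and threshold value are illustrative assumptions, and inpainting is used here as a stand-in for the area-filling morphology described above.

```python
import cv2

def preprocess_lesion(path):
    """Greyscale conversion, median filtering, bottom-hat hair detection
    and hair removal (illustrative parameter values)."""
    rgb = cv2.imread(path)                           # skin lesion image
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)     # keep only intensity information
    smooth = cv2.medianBlur(gray, 5)                 # suppress noise

    # Bottom-hat (black-hat) filtering highlights thin dark structures such as hairs
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (17, 17))
    bottom_hat = cv2.morphologyEx(smooth, cv2.MORPH_BLACKHAT, kernel)

    # Threshold the hair mask and fill the hair pixels from their surroundings
    _, hair_mask = cv2.threshold(bottom_hat, 10, 255, cv2.THRESH_BINARY)
    hair_free = cv2.inpaint(rgb, hair_mask, 3, cv2.INPAINT_TELEA)
    return hair_free, smooth

hair_free_rgb, filtered_gray = preprocess_lesion("lesion.jpg")
```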

3.3 Segmentation

The third step involves the segmentation of the images that have been pre-processed.
The technique of segmentation is used to pinpoint the exact site of a skin lesion.
Geodesic active contours (GAC) were used in this study to segment the dataset. In general, GAC identifies the most significant changes in the overall

skin lesion, which are usually seen near the lesion's borders. The Otsu thresholding technique is used to binarise the pre-processed skin image, and the GAC technique is then applied to the binarised image.
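A minimal sketch of this segmentation step, assuming scikit-image is available: the Otsu threshold provides an initial binary lesion mask, and a morphological geodesic active contour then evolves it toward the lesion border (the iteration count, smoothing and balloon force below are illustrative choices).

```python
from skimage import io, color, filters, segmentation

# Pre-processed (hair-free) lesion image, converted to greyscale
image = color.rgb2gray(io.imread("lesion_preprocessed.jpg"))

# Otsu thresholding gives a coarse binary mask used as the initial level set
init_mask = image < filters.threshold_otsu(image)   # lesions are darker than skin

# Edge-stopping function: small values near strong gradients (lesion border)
gimage = segmentation.inverse_gaussian_gradient(image)

# Geodesic active contour evolution toward the lesion border
lesion_mask = segmentation.morphological_geodesic_active_contour(
    gimage, 200, init_level_set=init_mask, smoothing=2, balloon=-1)
```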

3.4 Feature Extraction

The extraction of characteristics from the segmented skin lesion is the subject of the
fourth step. In order to acquire accurate information about the skin lesion, the feature
extraction method was utilised to gather information on the lesion’s border [16],
colour, diameter, symmetry and textural nature, which makes the identification of skin cancer more straightforward. Three distinct feature extraction methods were employed: GLCM, HOG and CNN, of which GLCM is the most commonly used.

3.4.1 GLCM (Grey-Level Co-occurrence Matrix)

In textural analysis, GLCM is usually used to obtain the intensity distribution of an object. The GLCM [17, 18] algorithm analyses two pixels at a time, one of which is a neighbouring pixel and the other of which is a reference. GLCM may be used to produce contrast, correlation, energy, entropy, homogeneity, prominence and shadow, amongst other things. The computation for each characteristic is described below, and an illustrative extraction sketch follows the list:
• Contrast: In a skin lesion, the spatial frequency of texture is measured.
• Correlation: The linear relationships of a skin lesion at the grey level.
• Energy: The degree to which a skin lesion is disordered.
• Homogeneity: The element’s distribution throughout the skin lesion.
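The sketch below illustrates GLCM feature extraction with scikit-image (assuming a recent version where the functions are named graycomatrix and graycoprops); entropy is computed directly from the normalised matrix since graycoprops does not provide it.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray_lesion):
    """Texture descriptors from a grey-level co-occurrence matrix.
    gray_lesion is an 8-bit greyscale image of the segmented lesion."""
    glcm = graycomatrix(gray_lesion, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    feats = {prop: graycoprops(glcm, prop).mean()
             for prop in ("contrast", "correlation", "energy", "homogeneity")}
    # Entropy is not provided by graycoprops, so compute it from the matrix
    p = glcm[glcm > 0]
    feats["entropy"] = float(-np.sum(p * np.log2(p)))
    return feats
```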

3.4.2 HOG (Histogram of Oriented Gradients)

HOG is used to extract information about the shape and edges of objects. It is neces-
sary to utilise the orientation histogram in order to assess the intensity of a lesion’s
edges. When it comes to this goal, there are two basic components to consider: the
cell and the block.
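A minimal HOG extraction sketch using scikit-image; the patch size, cell size and block size below are illustrative choices rather than the exact values used in this study.

```python
from skimage.feature import hog
from skimage.transform import resize

def hog_features(gray_lesion):
    """Histogram-of-oriented-gradients descriptor of the lesion shape and edges."""
    patch = resize(gray_lesion, (128, 128))   # fixed size gives a fixed-length vector
    return hog(patch,
               orientations=9,                # gradient orientation bins
               pixels_per_cell=(8, 8),        # the "cell"
               cells_per_block=(2, 2),        # the "block"
               block_norm="L2-Hys",
               feature_vector=True)
```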

3.4.3 ResNet-18 Convolution Neural Network

A residual network (ResNet) is an artificial neural network that helps in building deeper neural networks by utilising skip connections, or shortcuts, to bypass particular layers. Skipping allows the construction of deeper network layers whilst avoiding the problem of vanishing gradients. ResNet is available in a number of flavours,

Fig. 3 Residual blocks in ResNet

Fig. 4 Architecture ResNet-18

including ResNet-18, ResNet-34 and ResNet-50; the design is similar across variants, and the number indicates the number of layers. ResNet-18 is therefore a convolutional neural network with 18 layers. The addition of a shortcut to the main path of a conventional neural network results in the residual blocks shown in Fig. 3. A diagram of the ResNet-18 architecture is shown in Fig. 4.
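One way to obtain the CNN score features through ResNet-18 transfer learning is sketched below with PyTorch/torchvision (an assumed toolchain, since the paper does not name its framework); the final fully connected layer is replaced by an identity so that the 512-dimensional penultimate activations serve as the feature vector.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Pretrained ResNet-18 with the classification head replaced by an identity,
# so the 512-dimensional penultimate activations become the feature vector
backbone = models.resnet18(pretrained=True)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

def resnet18_features(image_path):
    """512-dimensional deep feature vector for one lesion image."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return backbone(x).squeeze(0).numpy()
```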

3.5 Classification

There are a plethora of models available for distinguishing between malignant and
non-cancerous skin lesions. The SVM, KNN, Naive Bayes and neural networks

algorithms are the most frequently used machine learning methods for lesion classification. Specifically, a multi-class SVM classifier is used in this study, and the extracted features are passed directly to the classifier.
The training and testing framework is based on SVMs. The support vector machine method uses these feature vectors (colour and texture) to build and train the proposed structural model. The colour and texture attributes of each cancer image in the database are recorded, and these attributes are used in the subsequent categorisation phase.
This suggested SVM-based structure categorises cancer pictures on the basis of the colour and texture feature vectors. Multiple distance metrics are used to measure feature similarity between one picture and the others in order to categorise an image successfully. This is done by comparing the characteristics of the query image with the features of the database images using SVM classifiers. Based on these values, the SVM classifier decides which class the input picture belongs to.
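A sketch of this classification stage with scikit-learn, assuming the feature-extraction helpers from the earlier sketches: the GLCM, HOG and CNN features are concatenated, standardised and fed to a multi-class SVM (the kernel and hyper-parameters are illustrative).

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def combined_features(gray_lesion, image_path):
    """Concatenate GLCM, HOG and ResNet-18 features for one lesion image
    (glcm_features, hog_features and resnet18_features are the helpers
    sketched in the previous sections)."""
    g = np.array(list(glcm_features(gray_lesion).values()))
    h = hog_features(gray_lesion)
    c = resnet18_features(image_path)
    return np.concatenate([g, h, c])

# X: one combined feature vector per training image; y: class labels
# (0 = actinic keratosis, 1 = basal cell carcinoma, 2 = melanoma)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10, gamma="scale"))
# clf.fit(X_train, y_train)
# y_pred = clf.predict(X_test)
```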

4 Experimental Results

The proposed method is applied to skin lesion images collected from the ISIC dataset, where it yields excellent
results. There are 69 pictures of actinic keratosis in the datasets, 80 images of basal
cell carcinoma and 60 images of melanoma in the databases. Classes are taught to
classifiers by utilising a number of different training and testing sets. As specified in
the method, three different feature extractors are used in the analysis. In addition to
GLCM [19] and HOG, CNN is used for feature extraction in this application. Many
stages of the process, such as training, pre-processing, segmentation, feature
extraction and classification, may be automated using the algorithms that have been
proposed. Multi-SVM classifier is used for the classification.
We created five push buttons in the interface for easily invoking the different stages of the process, and for each step the relevant results are shown. Figure 5 shows the selected image and its processed stage. The processed image then enters the noise-removal stage; the hair-removed version of the selected image is shown in Fig. 6.
Then, the image undergoes segmentation, and the segmented images are shown in
Fig. 7. We need a confusion matrix in order to get a thorough grasp of our suggested
models, which is necessary due to the problem of class imbalance. This allows us
to identify areas in which our models may be inaccurate, and the confusion matrix
is used to assess the performance of the architecture. A comparison of the accuracy
and precision of feature extraction using the proposed approach is shown in Table
2 [20, 21]. The accuracy may be increased to 95.2% by combining the CNN, HOG and GLCM features. The statistical results in Table 3 compare different classifiers and show the better sensitivity and specificity of the proposed classifier.

Fig. 5 Selected image and processed stage

Fig. 6 Hair removed image

Fig. 7 Image after segmentation

Table 2 Classification accuracy of the proposed method

Features | Accuracy (%) | Precision (%) | Recall (%)
GLCM | 82 | 81 | 82.3
HOG | 87 | 86.4 | 87.15
GLCM + HOG | 94 | 93.2 | 95
GLCM + HOG + CNN | 95.2 | 94.8 | 95.13

Table 3 Comparison of different classifier methods

Classifier | Sensitivity (%) | Specificity (%) | Positive predicted value (%) | Negative predicted value (%)
KNN | 86.2 | 85 | 87 | 13
Naïve Bayes | 72 | 82 | 85.2 | 14.8
SVM | 95.13 | 95.13 | 89 | 11

Specificity (SP) and sensitivity (SE) of the classifier models are used to evaluate their performance. They are defined as follows:

Specificity = TN / (TN + FP)

Sensitivity = TP / (TP + FN)

where
TP correctly classified positive class (True positive).
TN correctly classified negative class (True negative).
FP incorrectly classified positive class (False positive).
FN incorrectly classified negative class (False negative).
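These per-class metrics can be computed from the confusion matrix in a one-vs-rest fashion; the following scikit-learn sketch is illustrative.

```python
from sklearn.metrics import confusion_matrix

def sensitivity_specificity(y_true, y_pred, positive_class):
    """Per-class sensitivity and specificity in a one-vs-rest sense."""
    y_true_bin = [1 if y == positive_class else 0 for y in y_true]
    y_pred_bin = [1 if y == positive_class else 0 for y in y_pred]
    tn, fp, fn, tp = confusion_matrix(y_true_bin, y_pred_bin, labels=[0, 1]).ravel()
    return tp / (tp + fn), tn / (tn + fp)   # (sensitivity, specificity)
```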

5 Advantages and Future Scope

By using hybrid feature extraction of HOG and GLCM along with convolution neural network features, the proposed method becomes more accurate. The classifier achieves high sensitivity and specificity compared with other methods. Because a multi-class SVM is used for classification, more skin disease classes can be added, so the system could work like a skin specialist able to identify any skin disease in the future. Further investigation of deeper convolution networks for classification may increase the accuracy.

6 Conclusion

Skin lesions were classified using hybrid feature extraction in this proposed study. The suggested technique is applied to Kaggle images of skin lesions taken with a digital camera. Images of three distinct kinds
of skin diseases, including melanoma, are included inside the files. In addition to
GLCM and HOG, CNN is used for feature extraction in this application. The GAC
method, as proposed in this work, was used to segment the skin lesion.
It has been possible to achieve segmentation with a JA of 0.9 and a DI of 0.82 in

this study. It is possible to extract CNN features by utilising the ResNet-18 transfer
learning technique, whilst texture features may be retrieved by using the GLCM
and HOG methods. In this instance, we use a multi-SVM classifier to allow for the
inclusion of additional skin disease classes in the future, as well as to serve as a skin
expert capable of detecting any skin condition in the future. The suggested technique
was tested on a variety of datasets, including pictures of lesions on the skin. The
multi-SVM classifier categorises the pictures into three different categories of skin diseases with 95.2% accuracy and 94.8% precision. As a result, we may be able to add more skin ailment classifications in the future and act as a skin expert capable of detecting any skin condition. In the light of the information gathered, we can infer that accuracy is further enhanced once augmentation is implemented. It is also possible to use this technique on a deeper neural network platform to enhance accuracy.

References

1. Jaisakthi SM, Mirunalini P, Aravindan C (2018) Automated skin lesion segmentation of dermo-
scopic images using GrabCut and k-means algorithms. IET Comput Vis 12(8):1088–1095
2. Chung DH, Sapiro G (2000) Segmenting skin lesions with partial-differential- equations-based
image processing algorithms. IEEE Trans Med Imaging 19(7):763–767
3. Hemalatha RJ, Thamizhvani TR, Dhivya AJ, Joseph JE, Babu B, Chandrasekaran R (2018)
Active contour based segmentation techniques for medical image analysis. Med Biolog Image
Anal 4:17
4. Salih SH, Al-Raheym S (2018) Comparison of skin lesion image between segmentation
algorithms. J Theor Appl Inf Technol 96(18)
5. Li Y, Shen L (2018) Skin lesion analysis towards melanoma detection using deep learning
network. Sensors 18(2):556
6. Kasmi R, Mokrani K (2016) Classification of malignant melanoma and benign skin lesions:
implementation of automatic ABCD rule. IET Image Proc 10(6):448–455
7. Bakheet S (2017) An SVM framework for malignant melanoma detection based on optimized
hog features. Computation 5(1):4
8. Khan MQ, Hussain A, Rehman SU, Khan U, Maqsood M, Mehmood K, Khan MA (2019)
Classification of melanoma and nevus in digital images for diagnosis of skin cancer. IEEE
Access 7:90132–90144
9. Victor A, Ghalib M (2017) Automatic detection and classification of skin cancer. Int J Intell
Eng Syst 10(3):444–451
10. Goel R, Singh S (2015) Skin cancer detection using glcm matrix analysis and back propagation
neural network classifier. Int J Comput Appl 112(9)
11. Kawahara J, Hamarneh G (2017) Fully convolutional networks to detect clinical dermoscopic features. arXiv:1703.04559
12. Jerant AF, Johnson JT, Sheridan CD, Caffrey TJ (2000) Early detection and treatment of skin
cancer. Am Fam Phys 62:381–382
13. Binder M, Schwarz M, Winkler, A, Steiner A, Kaider A, Wolff K, Pehamberger H (1995)
Epiluminescence microscopy. A useful tool for the diagnosis of pigmented skin lesions for
formally trained dermatologists. Arch Dermatol 131:286–291
14. Celebi ME, Wen Q, Iyatomi H, Shimizu K, Zhou H, Schaefer G (2015) A state-of-the-art survey
on lesion border detection in dermoscopy images. In: Dermoscopy image analysis. CRC Press,
Boca Raton, FL, USA

15. Erkol B, Moss RH, Stanley RJ, Stoecker WV, Hvatum E (2005) Automatic lesion boundary
detection in dermoscopy images using gradient vector flow snakes. Skin Res Technol 11:17–26
16. Celebi ME, Aslandogan YA, Stoecker WV, Iyatomi H, Oka H, Chen X (2007) Unsupervised
border detection in dermoscopy images. Skin Res Technol 13
17. Nair AT, Muthuvel K (2021) Automated screening of diabetic retinopathy with optimized deep convolutional neural network: enhanced moth flame model. J Mech Med Biol 21(1):2150005 (29 pages). World Scientific Publishing Company. https://doi.org/10.1142/S0219519421500056
18. Nair AT, Muthuvel K (2020) Blood vessel segmentation and diabetic retinopathy recognition:
an intelligent approach. Comput Methods Biomech Biomed Eng Imaging Vis. Taylor & Francis.
https://doi.org/10.1080/21681163.2019.1647459
19. Nair AT, Muthuvel K (2020) Research contributions with algorithmic comparison on the diagnosis of diabetic retinopathy. Int J Image Graphics 20(4):2050030 (29 pages). World Scientific Publishing Company. https://doi.org/10.1142/S0219467820500308
20. Nair AT, Muthuvel K (2021) Effectual evaluation on diabetic retinopathy Lecture notes in
networks and systems, vol 191. Springer, Singapore. https://doi.org/10.1007/978-981-16-0739-
4_53
21. Nair AT, Muthuvel K (2021) Blood vessel segmentation for diabetic retinopathy. J Phys Conf Ser 1921:012001
DeepFake Creation and Detection Using
LSTM, ResNext

Dhruti Patel, Juhie Motiani, Anjali Patel, and Mohammed Husain Bohara

Abstract Technology was created as a means to make our lives easier. There is
nothing more fast-paced than the advancements in the field of technology. Decades
ago, virtual assistants were only a far-fetched imagination; now, these fantasies have
become a reality. Machines have started to recognize speech and predict stock prices.
Witnessing self-driving cars in the near future will be an anticipated wonderment.
The underlying technology behind all these products is machine learning. Machine
learning is ingrained in our lives in ways we cannot fathom. It may have many good
sides but it is misused for personal and base motives. For example, various forged
videos, images, and other content, termed DeepFakes, go viral in a matter of
seconds. Such videos and images can now be created with the usage of deep learning
technology, which is a subset of machine learning. This article discusses the mecha-
nism behind the creation and detection of DeepFakes. DeepFakes is a term generated
from deep learning and fake. As the name suggests, it is the creation of fabricated
and fake content, distributed in the form of videos and images. Deep learning is one
of the burgeoning fields which has helped us to solve many intricate problems. It
has been applied to fields like computer vision, natural processing language, and
human-level control. However, in recent years, deep learning-based software has
accelerated the creation of DeepFake videos and images without leaving any traces
of falsification which can engender threats to privacy, democracy, and national secu-
rity. The motivation behind this research article was to spread awareness among the
digitally influenced youth of the twenty-first century about the amount of fabricated
content that is circulated on the internet. This research article presents one algorithm
used to create DeepFake videos and, more significantly, the detection of DeepFake
videos by recapitulating the results of proposed methods. In addition, we also have
discussed the positive aspects of DeepFake creation and detection, where they can
be used and prove to be beneficial without causing any harm.

D. Patel (B) · J. Motiani · A. Patel · M. H. Bohara


Department of Computer Science and Engineering, Devang Patel Institute of Advance
Technology and Research (DEPSTAR), Charotar University of Science and Technology
(CHARUSAT), Changa 388421, India
M. H. Bohara
e-mail: Mohammedbohara.ce@charusat.ac.in


Keywords DeepFake · DeepFake creation · DeepFake detection · Generative


Adversarial Networks

1 Introduction

Fake images and fake videos formed by DeepFake methods have become a great
public concern. The term “DeepFake” means to swap the face of one person by
the face of another. The first DeepFake video was generated by a Reddit user in
2017 to morph celebrity faces into pornography by using machine learning
algorithms. Furthermore, some other harmful uses of DeepFake are fake news and
financial fraud. Due to these factors, research traditionally devoted to general media
forensics is being revitalized and is now dedicating growing efforts to detecting facial
manipulation in images and videos [1].
The increasing sophistication of smartphones as well as the growth of social networks have resulted in an enormous increase in new digital content in recent times. This widespread use of digital images has led to a rise in techniques for altering image content [2]. Until recently, such techniques remained out of reach for most users because they were time-consuming and tedious and required a high level of domain expertise in computer vision. Those constraints have steadily faded away, thanks to recent advances in machine learning and access to vast quantities of training data. Consequently, the time required to produce and manipulate digital content has dropped significantly, allowing unskilled individuals to modify content at their leisure. Deep generative models, in particular, have recently been widely used to create synthetic photos that appear natural. These models are based on deep neural networks, which can approximate the real-data distribution of a given training dataset. Consequently, variations may be generated by sampling from the learned distribution. Two of the most frequently used and effective techniques are Variational Autoencoders (VAE) and Generative Adversarial Networks (GAN). In particular, GAN techniques have recently been pushing the boundaries of state-of-the-art results, improving the resolution and quality of the images generated. Therefore, deep generative models are ushering in a new era of AI-based fake image generation, paving the way for the rapid dissemination of high-quality tampered image content [2].
Face manipulation is broken down into four categories:
(i) Entire face synthesis
(ii) Identity swap (Deep Fakes)
(iii) Attribute manipulation, and
(iv) Expression swap [1].
Illustrations of these face manipulation categories are given below in Fig. 1.
One of the mechanisms which can manipulate or change digital content is “Deep-
Fake.” DeepFake is a word that is derived from “Deep Learning” and “Fake.” It
is a mechanism through which one can morph or change an image over a video

Fig. 1 Examples of facial manipulation groups of real and fake images [1, 3, 4].

and thereby create a new fabricated video that may appear to be real. The underlying mechanisms for DeepFake development are autoencoders and Generative Adversarial Networks (GAN), which are deep learning models. Their
usage is concentrated in the computer vision field. These models are used to analyze a
person’s facial expressions and movements and synthesize facial images of someone
with similar expressions and movements. So, through the DeepFake mechanism, we
can create a video of a person saying or doing things that the other person is doing
just by using an image of the target person and a video of the source person.

2 Methods

In the following section, we describe our approach toward DeepFake creation and
DeepFake detection algorithms.

2.1 DeepFake Creation

The popularity of DeepFakes can be attributed to the creative users who target
celebrities and politicians to generate fake and humorous content. DeepFakes have
burgeoned over the past 3–4 years due to the quality of tampered videos and also the easy-to-use capability of their applications for a broad range of users, from professional to amateur. These applications are built on deep learning techniques. One
such application is called Faceswap, captioned as “the leading free and open-source
multi-platform Deepfakes software.” Deep autoencoders constitute the blueprint of
this application. The idea behind using autoencoders is dimensionality reduction
and image compression because deep learning is well known for extracting the
higher-level features from the raw input.
A brief introduction about the techniques used is given below:
1. CNN: Convolutional Neural Network (CNN or ConvNet) is a category of deep
neural networks which are primarily used to do image recognition, image
classification, object detection, etc.
Image classification is the challenge of taking an input image and outputting a
category or a probability of classes that best describes the image. In CNN, we take an image as an input, assign significance to its numerous aspects/features in the image, and have the ability to distinguish one from another. The pre-processing required in CNN is much less than for other classification algorithms [5, 6].
2. RNN: RNN is short for Recurrent Neural Network. An RNN remembers the past, and the decisions it makes are influenced by that past. One or more input vectors are provided to the RNN to produce single or multiple output vectors. These outputs are governed not only by the weights that are applied to the input but also by a "hidden" state vector. This hidden state vector represents the context of the previous input(s)/output(s) [7].
3. GAN: GANs stand for Generative Adversarial Networks. As the name implies,
GANs are largely used for generative purposes. They generate new and fake
outputs based on a particular input. GANs comprise of two sub models, which
are the generator model and the discriminator model. The difference between
the two is that the generator model, as the name suggests, is trained to generate
or create new examples, whereas the discriminator model is more of a binary
classification model that tries to identify the generated output as real or fake.
Interestingly, the two models are trained together until the discriminator classifies the generator's output as plausible (real) about half of the time.
DeepFake creation uses two autoencoders, one trained on the face of the target
and the other on the source. Once the autoencoders are trained, their outputs are
switched, and then something interesting happens. A DeepFake is created!
The encoder extracts the latent features from the face image, and the decoder is used to reconstruct the face image. Two encoder-decoder pairs are needed to swap faces between source images and target images, where each pair is

Fig. 2 A diagram depicting the working of two encoder-decoder pairs [8]

trained on its own image set, and the encoder's parameters are shared between the two network pairs. This strategy helps the common encoder find and learn the similarity between the two sets of face images, since faces generally have similar attributes such as eyes, nose, and so forth. We can say that the encoder provides data in a lower dimension, thus performing dimensionality reduction. The job of the decoder is to reconstruct the face again from the compressed and extracted latent features. Figure 2 shows the DeepFake creation process.
One may notice that the diagram shown in Figure 2 uses the same encoder but
two different decoders. Since latent features are common to all faces, the job of an
encoder remains uniform for all inputs. However, in order to generate a morphed
picture, one needs to use the decoder of the driving image on the source image.
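A highly simplified sketch of this shared-encoder, two-decoder arrangement is given below in Keras (an assumed framework; it is not necessarily the implementation used by Faceswap). Each autoencoder is trained to reconstruct its own identity, and at swap time a face of the source person is encoded and passed through the other identity's decoder. Network and image sizes are illustrative.

```python
from tensorflow.keras import layers, Model

IMG = (64, 64, 3)   # illustrative face crop size

def build_encoder():
    inp = layers.Input(shape=IMG)
    x = layers.Conv2D(64, 5, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(128, 5, strides=2, padding="same", activation="relu")(x)
    latent = layers.Dense(256, activation="relu")(layers.Flatten()(x))
    return Model(inp, latent, name="shared_encoder")

def build_decoder(name):
    latent = layers.Input(shape=(256,))
    x = layers.Dense(16 * 16 * 128, activation="relu")(latent)
    x = layers.Reshape((16, 16, 128))(x)
    x = layers.Conv2DTranspose(64, 5, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2DTranspose(3, 5, strides=2, padding="same", activation="sigmoid")(x)
    return Model(latent, out, name=name)

encoder = build_encoder()                     # one encoder shared by both identities
decoder_a = build_decoder("decoder_source")   # reconstructs faces of person A
decoder_b = build_decoder("decoder_target")   # reconstructs faces of person B

auto_a = Model(encoder.input, decoder_a(encoder.output))
auto_b = Model(encoder.input, decoder_b(encoder.output))
auto_a.compile(optimizer="adam", loss="mae")
auto_b.compile(optimizer="adam", loss="mae")

# Training: auto_a.fit(faces_a, faces_a, ...) and auto_b.fit(faces_b, faces_b, ...)
# Swap: encode a face of A with the shared encoder and decode it with decoder_b
# fake_frame = decoder_b.predict(encoder.predict(face_of_a))
```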

2.2 DeepFake Detection

Creating DeepFakes and spreading them over social media platforms, swapping the faces of celebrities and politicians onto bodies in pornographic images or videos, can threaten one's privacy. DeepFakes also threaten world security through videos of world leaders giving fake speeches for falsification purposes, and they can even be used to generate fake satellite images. Therefore, the technology can be menacing to privacy, democracy, and national security. This raises the concern of distinguishing DeepFake videos from genuine ones.
A brief introduction about the techniques used for detection of deepfakes is given
below:

1. LSTM: Long Short-Term Memory or LSTM are artificial recurrent neural


networks, which as the name suggests are capable of learning the order depen-
dence in sequence prediction problems. LSTMs are majorly used in complex
machine learning domains such as machine translation and speech recognition
[9].
2. ResNext: It is a network architecture that is mainly used for image classification.
This architecture is built on repeated building blocks that aggregate a set of
transformations with the same topology.
DeepFake detection can be considered as a binary classification problem between
authentic videos and tampered ones. DeepFake detection techniques are different
for fake image detection and fake video detection. We have primarily focused on
DeepFake video detection.
These fabricated videos can be identified by temporal features across the frames
[8]. Temporal features are features related to time; these time-domain features are simple to extract and provide an easy physical interpretation. A video is composed of coherent frames. Any manipulation done on a video occurs on a frame-by-frame basis, so the unevenness between contiguous frames manifests as temporal discrepancies
across frames. It can be metaphorically described as replacing a puzzle piece with
another random piece that doesn’t properly fit.
Current DeepFake detection methods rely on the drawbacks of the DeepFake
generation pipeline. The detection method using LSTM and ResNext parallels the
method used to create a DeepFake by the generative adversarial network. This method exploits certain characteristics of the DeepFake video: since the creation task is limited by computational resources and production time, the algorithm only synthesises face images of a fixed size. The next step is to subject these images to affine warping. The idea is to check whether collinearity is preserved or not after an affine transformation. Affine warping will unveil any resolution inconsistencies between
the warped face area and surrounding context. The target video is divided into frames,
and the corresponding features are extracted by a ResNext Convolutional Neural
Network (CNN). The aforementioned temporal inconsistencies are captured by the
Recurrent Neural Network (RNN) with the Long Short-Term Memory (LSTM). The
simplification of the process is done by directly simulating the resolution inconsis-
tency in the affine face wrappings. This is then used to train the ResNext CNN model
[10]. The prediction flow of DeepFake detection is given in Fig. 3.

Fig. 3 Architecture of DeepFake detection
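One possible concrete form of the detection network sketched in Fig. 3, a ResNext feature extractor followed by an LSTM over the frame sequence, is shown below in PyTorch; the ResNext-50 variant, hidden size and use of the last time step are assumptions rather than details reported by the authors.

```python
import torch
import torch.nn as nn
from torchvision import models

class DeepFakeDetector(nn.Module):
    """Frame-level ResNext features aggregated over time by an LSTM,
    ending in a real/fake classification head (illustrative sizes)."""
    def __init__(self, hidden=512, num_classes=2):
        super().__init__()
        backbone = models.resnext50_32x4d(pretrained=True)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the fc layer
        self.lstm = nn.LSTM(input_size=2048, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, frames):                 # frames: (batch, time, 3, 224, 224)
        b, t, c, h, w = frames.shape
        feats = self.cnn(frames.view(b * t, c, h, w)).view(b, t, -1)  # (b, t, 2048)
        seq_out, _ = self.lstm(feats)
        return self.head(seq_out[:, -1])       # classify from the last time step

model = DeepFakeDetector()
logits = model(torch.randn(1, 20, 3, 224, 224))   # one clip of 20 face frames
```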



3 Experiments

In this section, we present the tools and experimental setup we used to design, develop and implement the model. We then present the results obtained from the execution of the DeepFake detection model and give an interpretation of the experimental outcomes [11].

3.1 Dataset

The detection model has been trained with three pairs of datasets. The variety of
datasets allow the model to train from a diverse dataset and create a more generic
model.
The description of the datasets has been listed below:
1. FaceForensics++: This dataset largely consists of manipulated videos. It
has 1000 original videos tampered with four automated face manipulation
techniques, which are, DeepFakes, Face2Face, FaceSwap, and NeuralTextures
[3].
This dataset itself has been derived from 977 distinct YouTube videos. These
videos primarily contain frontal faces without occlusions, which allows the automated
tampering methods to regenerate realistic forgeries. This data can be used for
both image and video classification.
2. Celeb-DF: This dataset is a large-scale dataset for DeepFake Forensics. It stands
apart from other datasets for having DeepFake synthesized videos having similar
visual quality at par with those circulated online [4].
It contains 590 original videos collected from YouTube. This dataset has
been created carefully keeping in mind to maintain the diversity of the dataset,
thus, it contains subjects of different ages, ethnic groups, and genders. It also
contains 5639 corresponding DeepFake videos.
3. DeepFake Detection Challenge: [12] This dataset is provided by Kaggle. This
data contains files in the ‘.mp4’ format, which is split and compressed into sets
of 10 GB apiece. The files have been labeled REAL or FAKE and accordingly
the model is trained.
The prepared dataset used to train the model includes 50% real videos and 50% manipulated DeepFake videos. The dataset is split into a 70–30 ratio, i.e.,
70% for training and 30% for testing.

3.2 Proposed System

First, the dataset is split in a 70–30 ratio. In the pre-processing phase, the videos in the dataset are split into frames. After that, the face is detected in each frame and the detected face is cropped from the frame; frames with no detected face are ignored during pre-processing. The model consists of a ResNext CNN followed by one LSTM layer, and the pre-processed data of cropped-face videos are split into train and test datasets. ResNext is used to accurately extract and detect the frame-level features, and LSTM is used for sequence processing so that temporal analysis can be done across the frames. The video is then passed to the trained model, which predicts whether the video is fake or real [10].

Table 1 Depicting respective models and their accuracies [10]. Source: https://github.com/abhijitjadhav1998/Deepfake_detection_using_deep_learning

Model | No. of videos | No. of frames | Accuracy
1 | 6000 | 10 | 84.21461
2 | 6000 | 20 | 87.79160
3 | 6000 | 40 | 89.34681
4 | 6000 | 60 | 90.59097
5 | 6000 | 80 | 91.49818
6 | 6000 | 100 | 93.58794
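A rough sketch of the pre-processing described above, splitting a video into frames and keeping only cropped faces, is given below; OpenCV's Haar cascade detector is used here purely as an illustrative stand-in for whichever face detector the authors employed.

```python
import cv2

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_face_frames(video_path, max_frames=20, size=(224, 224)):
    """Return up to max_frames cropped face images from one video;
    frames without a detectable face are skipped."""
    faces, cap = [], cv2.VideoCapture(video_path)
    while len(faces) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        boxes = face_detector.detectMultiScale(gray, 1.3, 5)
        if len(boxes) == 0:
            continue                      # ignore frames with no detected face
        x, y, w, h = boxes[0]
        faces.append(cv2.resize(frame[y:y + h, x:x + w], size))
    cap.release()
    return faces
```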

3.3 Evaluation

Models trained and their respective accuracy [10] (Table 1).


For this project, we have used the second model, with an accuracy of 87.79%. The relation between the number of frames and the accuracy appears to be directly proportional: an increased number of frames allows the algorithm to more easily identify the distortions between adjacent frames, thus yielding a higher accuracy. However, the larger the number of frames, the slower the algorithm responds.

4 Results

This section depicts the working of our model. The DeepFake creation has been
depicted in Fig. 4. An image and a driver video are passed to the model to create a
resultant DeepFake. The generated DeepFake is of low resolution because when the
input is of high resolution, an extremely accurate GAN is required to generate fake
videos which are hard to detect. The poor resolution of the DeepFake makes it easily
identifiable by the naked eye; however, advancements in DeepFake technology are
making it increasingly difficult to identify DeepFakes even with the help of detection
algorithms.
DeepFake detection results have been depicted in Figs. 5 and 6. With the help of
LSTM and ResNext, we were able to build a model that detects fabricated videos
based on the inherent inconsistencies between the frames. These results were derived
by passing test data to a pre-trained model. The model was trained on a dataset of
videos containing low-resolution videos divided into 20 frames per video.

Fig. 4 Output of DeepFake creation

Fig. 5 Output of DeepFake detection showing that the provide video is REAL along with the
confidence of prediction

Fig. 6 Output of DeepFake detection showing that the provided video is FAKE along with the
confidence of prediction

DeepFake detection algorithms need to catch up with the constantly improving


creation algorithms. As more and more fabricated content floods the internet, more of it goes undetected by existing detection algorithms.

5 Challenges

Although the performance and quality of the creation of DeepFake videos and espe-
cially in the detection of DeepFake videos have greatly increased [13], the challenges
affecting the ongoing detection methods are discussed below.
The following are some challenges in DeepFake detection:
(1) Quality of DeepFake Datasets: To develop DeepFake detection methods, we
require the availability of copious datasets. However, the available datasets
have certain impairments such as there lies a significant difference in the visual
quality to the actual fake videos circulated on the internet. These imperfections
in the dataset can either be some color discrepancy, some parts of the original
face are still visible, low-quality synthesized faces, or certain inconsistencies
in the face orientations [13].

(2) Performance Evaluation: The current DeepFake detection is considered as a


binary classification problem, where each video is classified as real or fake. The
DeepFake detection methods are helpful only when the fabricated videos are
created from the corresponding DeepFake creation algorithms. Nevertheless,
many factors affect the detection methods when we implement them in the
real world, e.g., videos fabricated in ways other than DeepFake, videos with multiple faces, and murkier pictures. Therefore, binary classification needs
to be expanded to multi-class, and multi-label to handle the complexities of
the real world [13].
(3) Social Media Laundering: A myriad of DeepFake videos are spreading
through social media, i.e., Instagram, Facebook, and Twitter. To reduce the
bandwidth of the network and to protect users’ privacy, these types of videos
usually remove meta-data, reduce the video size, and then compress it before
uploading, it’s usually known as social media laundering. Because of this, we
cannot recover the traces of manipulation and chances are high for detecting a
fake video as a real one. Therefore, the robustness of the DeepFake detection
method should be improved to avoid such types of issues [13].
The limitation of the proposed system of DeepFake detection is that the method
we have used does not consider audio. So, DeepFakes with manipulated audio will not be detected [10].

6 Current Systems

In this part, we give insights regarding the current system that can be utilized to
generate DeepFake videos.
Currently, applications such as FakeApp, Zao, and DeepFaceLab are used to
create DeepFake videos and images. The first DeepFake application to appear on the
internet was called DeepFaceLab. This application is very useful for understanding
the step-by-step process of a DeepFake creation. DeepFaceLab allows users to swap
faces, replace entire faces, change the age of people, and change lip movements. So, one could
easily morph an image or video and create a phony one. Zao is a Chinese application
that allows users to create DeepFake videos, but it is observed that Zao cannot create
natural images of Indian faces because it is mainly trained with Chinese facial data.
DeepFakes of Indian faces created with the Zao app can therefore be clearly recognized as fake.
Faceswap is another DeepFake application that is free and open source. It is supported
by Tensorflow, Keras, and Python. The active online forum of Faceswap allows
interested individuals to get useful insights on the process of creation of DeepFakes.
The forum accepts questions and also provides tutorials on how to create DeepFake
[14].

7 Discussion

In this paper, we have delineated and assessed the mechanism of face manipulation
(DeepFake). We have explained the methods for creating the fake identity swap
video and also how to detect such videos. We were able to create a low-resolution
DeepFake video, as the accessible frequency spectrum is much smaller. Thus, we can both create a DeepFake video and detect one. DeepFake is a technology that
has many negative aspects and if not applied wisely may cause a threat to society
and turn out to be dangerous. Since most online users believe stuff on the internet
without verifying them, such DeepFakes can create rumors.
Looking at the positive aspects, the concept of DeepFakes can be applied to create
DeepFake videos/images which can be used in a creative way; for example, someone who is not able to speak or communicate properly can swap their face into the video of a good orator and hence create their own video. It can also be used in film industries for updating
the episodes without reshooting them. Face manipulated videos can be created for
entertainment purposes as long as they do not create any threat to society or someone's privacy.
The DeepFake detection method can be applied in the courtrooms to check
whether the evidence provided in digital form is real or fake. It could be very bene-
ficial for such scenarios. Every coin has two sides, and thus, technology has its pros
and cons; if used wisely, it can be a boon for society.

8 Summary and Conclusion

Many real-life problems have been solved as a result of technological advancements,
but certain technologies have more negative aspects than positive ones. One such
example is face manipulation, also known as DeepFake or identity swap. In this article,
we have discussed the concept of DeepFakes in detail and briefly covered the DeepFake
creation and detection algorithms. In addition, we have implemented the algorithms and
presented the results accordingly. We were able to create a low-resolution DeepFake
video, as the available frequency spectrum is much smaller, and we built a DeepFake
detection model with an accuracy of 87%.
DeepFake has more negative aspects than positive ones. Hence, more research is
being carried out on various detection methods. There is a gap between DeepFake
creation and detection technologies, and the latter is lagging. It is important to educate
society about the malicious intent behind the creation of DeepFakes.

Acknowledgements Every work that one accomplishes relies on the constant motivation, benevolence,
and moral support of the people around us. Therefore, we take this opportunity to show our
appreciation to the people who extended their precious time, support, and assistance in the
completion of this research article. This research article has given us a wide opportunity to think
about and expand our knowledge of new and emerging technologies, and through it we were able to
explore more about current research and related experiments. We would therefore like to show our
gratitude to our mentors for their guidance throughout the process and for encouraging us to look
forward to learning and implementing new emerging technologies.

References

1. Tolosana R, Vera-Rodriguez R, Fierrez J, Morales A, Ortega-Garcia J (2020) DeepFakes and
beyond: a survey of face manipulation and fake detection. Inf Fusion 64. https://doi.org/10.
1016/j.inffus.2020.06.014
2. Durall R et al (2020) Unmasking DeepFakes with simple features. arXiv:1911.00686
3. Rössler A, Cozzolino D, Verdoliva L, Riess C, Thies J, Nießner M (2019) FaceForensics++:
learning to detect manipulated facial images
4. Li Y, Sun P, Qi H, Lyu S (2020) Celeb-DF: a large-scale challenging dataset for DeepFake
forensics. In: IEEE conference on computer vision and pattern recognition (CVPR)
5. Modi S, Bohara MH (2021) Facial emotion recognition using convolution neural network. 2021
5th International conference on intelligent computing and control systems (ICICCS). IEEE
6. Parekh M (16 July 2019) A brief guide to convolutional neural network (CNN). Medium. https://
medium.com/nybles/a-brief-guide-to-convolutional-neural-network-cnn-642f47e88ed4
7. Venkatachalam M (1 March 2019) Recurrent neural networks. Towards data science. https://
towardsdatascience.com/recurrent-neural-networks-d4642c9bc7ce
8. Nguyen T, Nguyen CM, Nguyen T, Nguyen D, Nahavandi S (2019) Deep learning for
DeepFakes creation and detection: a survey
9. Brownlee J (17 June 2019) A gentle introduction to generative adversarial networks (GANs).
Machine learning mastery. https://machinelearningmastery.com/what-are-generative-advers
arial-networks-gans/
10. Jadhav A et al (2020) DeepFake video detection using neural networks. IJSRD—Int J Sci Res
Dev 8(1):4. https://github.com/abhijitjadhav1998/DeepFake_detection_using_deep_learning/
blob/master/Documentation/IJSRDV8I10860.pdf
11. Wodajo D, Solomon A (2021) DeepFake video detection using convolutional vision trans-
former, p 9. arXiv. https://arxiv.org/pdf/2102.11126.pdf
12. DeepFake detection challenge dataset. Kaggle. https://www.kaggle.com/c/DeepFake-detect
ion-challenge/data
13. Lyu S (2020) DeepFake detection: current challenges and next steps. In: 2020 IEEE international
conference on multimedia & expo workshops (ICMEW). IEEE Computer Society, pp 1–6
14. Zhukova A (24 August 2020) 7 Best DeepFake apps and websites. OnlineTech Tips. https://
www.online-tech-tips.com/cool-websites/7-best-DeepFake-apps-and-websites/
Classification of Plant Seedling Using
Deep Learning Techniques

K. S. Kalaivani, C. S. Kanimozhiselvi, N. Priyadharshini, S. Nivedhashri,


and R. Nandhini

Abstract Agriculture is an important livelihood for many nations. The increase in the world's
population leads to an increasing demand for food and cash crops, and growing crops under
changing climate conditions is challenging, so it is important to increase agricultural
production at low cost. In agriculture, weeds are a big issue for
farmers. Weeds take up the nutrients and water intended for crops, which causes
huge losses. Chemical herbicides are used to kill weeds, but they are harmful to the
ecosystem and raise costs. In order to protect the environment and save money, an
automated machine vision system that can identify crops and remove weeds in a safe
and cost-effective manner is needed. To classify between plant species, the dataset
used was downloaded from the Kaggle platform and consists of 12 different species of
plants, namely black-grass, charlock, cleavers, common chickweed, common wheat,
fat hen, loose silky-bent, maize, scentless mayweed, shepherds purse, small-flowered
cranesbill, and sugar beet. A convolutional neural network (CNN) was used to classify the
plant species, and the accuracy obtained was 74%. In order to improve the accuracy,
CNN variants such as VGG-19 and ResNet-101 are used in this work. The accuracies
obtained for VGG-19 and ResNet-101 are 87% and 94%, respectively. From the
results obtained, it is found that the ResNet-101 model outperforms VGG-19 and the basic
CNN for classifying plant species. In addition, hyperparameter tuning of the batch size,
learning rate, and optimizer is performed on ResNet-101.

Keywords Deep learning · Plant seedling classification · CNN · ResNet-101 ·
VGG-19

K. S. Kalaivani (B) · C. S. Kanimozhiselvi · N. Priyadharshini · S. Nivedhashri · R. Nandhini


Department of Computer Science and Engineering, Kongu Engineering College, Perundurai,
Erode, Tamil Nadu, India


1 Introduction

Food and oxygen are necessities for the living organisms of the world. In countries
such as India, where agriculture is an important occupation, proper automation
of the farming process will help to maximize crop yield while also ensuring long-
term productivity and sustainability [1, 2]. Crop yield in agriculture is
challenged by weed invasion on farm lands. In general, weeds are unwanted
plants on farm land; they have no valuable benefits in terms of nutrition, food, or medication.
Weeds grow faster than crops and hence deplete crop growth,
taking the nutrients and the space that crops require to grow.
To obtain better productivity, it is necessary to remove weeds from farm
land at an early stage of growth. Manual removal of weeds is neither easy nor
efficient. For precision agriculture, a decision-making system is employed to
save resources, control weeds, and minimize cost. Robots are used for
removing weeds from the field, and it is therefore necessary to accurately detect a weed in the
field through machine vision [3–6]. In this work, the dataset is taken from the Kaggle
platform and consists of 12 species of plants, with 5545 images in total.
A basic CNN is widely used to classify the plant species; to improve
the classification accuracy, the VGG-19 and ResNet-101 architectures are used. The VGG-19
architecture is a nineteen-layer deep network, and ResNet-101 has 101 layers. The
proposed architecture helps the vision system to classify plant species more accurately
than existing work.
Ashqar [7] implemented a CNN architecture for classifying plant seedlings.
The implemented algorithms are used extensively in this task to recognize images.
On a held-out test set, the implemented model classifies eighty percent of the samples correctly,
demonstrating the feasibility of this method.
Nkemelu [8] compared the performances of two traditional algorithms and a
CNN. From the obtained results, it is found that the basic CNN architecture obtains
higher accuracy than the traditional algorithms.
Elnemr [9] developed a CNN architecture to differentiate plant seedling images
between crop and weed at an early growth stage. Due to the combination of the normaliza-
tion layer, pooling layer, and the filters used, the performance of this system has been
increased. With the help of the elaborated CNN, this work achieved higher precision
with reduced complexity and accurate classification. A segmentation phase is involved
in order to classify plant species. This work can be combined with IoT for controlling
the growth of weeds by spraying herbicides. The system achieved an accuracy of 90%.
Alimboyong [10] proposed deep learning approaches using CNN and RNN archi-
tectures. The dataset used for classification contains 4234 images belonging to 12
plant species, taken from the Aarhus University Signal Processing group. The system
achieves low memory consumption and high processing capability. Performance
metrics such as sensitivity, specificity, and accuracy are considered for evaluation. The
system involves three phases. First, the data are augmented and then compared with
the existing data. The second phase is a combination of RNN and CNN using various other

Table 1 Number of images present in each species

Type of species              Number of images
Black-grass                  263
Charlock                     390
Cleavers                     287
Common chickweed             611
Common wheat                 221
Fat hen                      475
Loose silky-bent             654
Maize                        221
Scentless mayweed            516
Shepherds purse              516
Small-flowered cranesbill    496
Sugar beet                   385

plant seedling datasets. Finally, a mobile application for plant seedling images is
created using the developed model. This work produced an accuracy of 90%.
Dyrmann [11] worked on deep learning techniques for the classification of crop and
weed species from different datasets. The dataset contains 10,413 images of
22 different crop and weed species, taken from six different datasets
and combined. The proposed convolutional neural network recognizes plant species in
color images, and the model achieves approximately eighty-six percent accuracy in
classification between species.
Rahman [12] developed deep learning techniques to identify plant seedlings at
an early stage. The quality and quantity of the data, the F1-score, and the accuracy were
measured for the proposed architecture, and these measurements were used for comparison
with previously implemented architectures. In this work, ResNet-50 performs well when compared
to previous models, producing an accuracy of 88%, higher than previous work.
Haijian [13] proposed CNN variants; in particular, VGG-19 is used for the classification of pests
in vegetables. The fully connected layers have been optimized in VGG-19. The analysis
shows that VGG-19 performs better than existing work, with an accuracy of
97%.
Sun [14] designed a twenty-six-layer deep learning model with eight residual
building blocks. The prediction is performed in a natural environment, and the implemented
model predicts with 91% accuracy.

2 Materials and Methods

The dataset is taken from the Kaggle platform. The total number of images present in
this dataset is 5550; the training dataset contains 4750 images, and the test dataset contains
790 images. This dataset contains 12 species of plants (Table 1).
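As a concrete illustration of this split, the snippet below is a minimal, hedged sketch of how such a Kaggle image folder could be loaded in TensorFlow/Keras; the directory names, image size, and batch size are assumptions for illustration and are not taken from this work.

import tensorflow as tf

IMG_SIZE = (224, 224)    # assumed input size; the text does not state one
BATCH_SIZE = 32          # assumed batch size

# Assumed layout: one sub-folder per species under plant-seedlings/train and /test
train_ds = tf.keras.utils.image_dataset_from_directory(
    "plant-seedlings/train", image_size=IMG_SIZE,
    batch_size=BATCH_SIZE, label_mode="categorical")
test_ds = tf.keras.utils.image_dataset_from_directory(
    "plant-seedlings/test", image_size=IMG_SIZE,
    batch_size=BATCH_SIZE, label_mode="categorical")

print(train_ds.class_names)   # should list the 12 species of Table 1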

Fig. 1 CNN architecture

2.1 Convolutional Neural Network

CNN is widely used to classify plant seedlings in order to distinguish between crop
and weed species at an early stage of development. The CNN has three types of layers:
an input layer, hidden layers, and an output layer. Before the images are passed to the
input layer, they are all resized to the same dimensions. There are five stages of learning
layers in the hidden layers. The convolutional layer at each stage uses filters with a kernel
size of 3 × 3, with 32, 64, 128, 256, and 1024 filters, respectively (Fig. 1).
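A minimal Keras sketch of such a five-stage CNN is shown below; only the 3 × 3 kernel size and the filter counts (32, 64, 128, 256, 1024) come from the description above, while the input size, pooling, and classification head are assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_basic_cnn(num_classes=12, input_shape=(128, 128, 3)):  # input size assumed
    model = models.Sequential([layers.Input(shape=input_shape)])
    # Five learning stages, each a 3 x 3 convolution followed by pooling
    for filters in [32, 64, 128, 256, 1024]:
        model.add(layers.Conv2D(filters, (3, 3), padding="same", activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.GlobalAveragePooling2D())
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model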

2.2 VGG-19

VGG-19 is a 19-layer network with 16 convolution layers interleaved with pooling layers,
followed by two fully connected layers and a dense output layer (Fig. 2).
It uses 16 convolution layers with different numbers of filters: 64, 128, 256, and 512
filters are used in the different convolution blocks. Blocks 1 and 2 each have two
convolution layers, with 64 filters and 128 filters, respectively. Block 3 has four
Fig. 2 VGG-19 architecture



Fig. 3 Skip connection in ResNet

convolution layers with 256 filters, and blocks 4 and 5 each have four convolution layers
with 512 filters. The input of VGG-19 is a fixed size of 224 × 224 × 3, and the filters are
of size 3 × 3. Of the fully connected layers, the first two have 4096 channels each,
activated by the ReLU activation function, and the third fully connected layer has
1000 channels and acts as the output layer with softmax activation.

2.3 ResNet-101

ResNet-101 is made up of residual blocks and has 101 layers. It introduced an
approach called the residual network to solve the problem of vanishing and exploding
gradients. In this network, a technique called the skip connection is used: the skip connection
skips a few layers and connects directly to the output. The advantage of adding
skip connections is that layers that hurt the performance of the architecture can be skipped
by regularization. This allows neural networks to be trained without the problem
of vanishing or exploding gradients (Fig. 3).
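The sketch below illustrates how a pre-trained ResNet-101 (or, analogously, VGG-19) can be adapted to the 12 seedling classes via transfer learning; the use of ImageNet weights, the 224 × 224 input size, the frozen base, and the Adam learning rate are assumptions, not the authors' exact training configuration.

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet101   # VGG19 can be swapped in here

def build_transfer_model(num_classes=12):
    base = ResNet101(weights="imagenet", include_top=False,
                     input_shape=(224, 224, 3))
    base.trainable = False                       # train only the new head first
    inputs = layers.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model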

2.4 Optimizer

An optimizer is a method or algorithm that reduces the loss by changing attributes
of the neural network such as its weights and learning rate. It is used to solve
optimization problems by minimizing the objective function. The following are the different
types of optimizers used.
AdaGrad. It is an optimization method that adapts the learning rate to the parameters.
It applies low learning rates to parameters associated with frequently occurring features
and high learning rates to parameters associated with infrequently occurring features. It
takes a default learning rate value of 0.01 and removes the need to tune the learning
rate manually.

Stochastic Gradient Descent (SGD). The model parameters are updated in each iteration;
that is, the loss function is evaluated and the model is updated after each training
sample. The advantage of this technique is that it requires little memory.
Root-Mean-Square Propagation (RMSProp). It balances the step size by
normalizing the gradient itself. It uses an adaptive learning rate, which means the learning
rate changes over time.
Adam. It is a replacement optimization method for SGD for training deep learning
models. It combines the properties of AdaGrad and RMSProp to provide optimiza-
tion when handling large and noisy problems. It is efficient because the default
parameters perform well on most problems.
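The hyperparameter tuning reported in Sect. 3 can be sketched as a simple grid over these optimizers, learning rates, and batch sizes; the helper below only shows how each optimizer would be instantiated, and the grid values other than those named in the text (the four optimizers, learning rates 0.1–0.0001, batch size 128) are assumptions.

import tensorflow as tf

def get_optimizer(name, lr):
    # Return one of the four optimizers discussed above with a given learning rate
    return {
        "adam":    tf.keras.optimizers.Adam(learning_rate=lr),
        "adagrad": tf.keras.optimizers.Adagrad(learning_rate=lr),
        "sgd":     tf.keras.optimizers.SGD(learning_rate=lr),
        "rmsprop": tf.keras.optimizers.RMSprop(learning_rate=lr),
    }[name]

# Hypothetical tuning loop (build_transfer_model, train_ds, val_ds as in the
# earlier sketches; the epoch count is chosen by the experimenter):
# for name in ["adam", "adagrad", "sgd", "rmsprop"]:
#     for lr in [0.1, 0.01, 0.001, 0.0001]:
#         model = build_transfer_model()
#         model.compile(optimizer=get_optimizer(name, lr),
#                       loss="categorical_crossentropy", metrics=["accuracy"])
#         model.fit(train_ds, validation_data=val_ds, epochs=20, batch_size=128)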

3 Results and Discussion

To improve the accuracy, the CNN variants VGG-19 and ResNet-101 are used in this
work. The accuracies obtained for VGG-19 and ResNet-101 are 87% and 94%, respec-
tively. From the results obtained, it is found that the ResNet-101 model outperforms
VGG-19 and the basic CNN for classifying plant species (Fig. 4).
Figure 4 shows the accuracy comparison of the VGG-19 and ResNet-101
models over different epochs (Fig. 5).
Figure 5 shows the accuracy for different batch sizes with ResNet-101;
when the batch size is increased, the accuracy also increases (Fig. 6).
Figure 6 shows the accuracy for varying learning rates of 0.1, 0.01,
0.001, and 0.0001. The highest accuracy is obtained for the 0.0001 learning rate.

Fig. 4 Epoch-wise accuracy



Fig. 5 Batch-wise accuracy

Fig. 6 Accuracy for different learning rate

4 Conclusion

The main aim of this project is to classify plant species in order to remove the
weeds in the farmland. Removing weeds helps the plants to get enough nutrients and
water, which in turn makes the plants grow healthier; this increases productivity
and gives a good yield to the farmers. In this paper, we proposed the VGG-19 and
ResNet-101 models for plant seedling classification. A dataset containing images of
12 different species, with 5550 images in total, is used in this project. The model can
detect and differentiate a weed from other plants. By comparing the accuracy of VGG-19
and ResNet-101, it is found that the ResNet-101 model outperforms the VGG-19 model.
Further, hyperparameter tuning is performed on ResNet-101 with different optimizers,
namely Adam, AdaGrad, SGD, and RMSProp. Among the optimizers, the Adam optimizer
with a batch size of 128 and a learning rate of 0.0001 performs best on ResNet-101,
with an accuracy of 94%. The proposed system can be extended to work with robotic
arms for performing actual weeding operations in large farmlands.

References

1. Chaki J, Parekh R, Bhattacharya S (2018) Plant leaf classification using multiple descriptors:
a hierarchical approach. J King Saud Univ—Comput Inf Sci 1–15
2. Prakash RM (2017) Detection of leaf diseases and classification using digital ımage processing
3. Kamilaris A, Prenafeta-Boldú FX (2018) Deep learning in agriculture: a survey. Comput
Electron Agric 147:70–90
4. Mohanty SP, Hughes D, Salathé M (2016) Using deep learning for ımage-based plant disease
detection
5. Grinblat GL, Uzal LC, Larese MG, Granitto PM (2016) Deep learning for plant identification
using vein morphological patterns. Comput Electron Agric 127:418–424
6. Lecun Y, Bengio Y, Hinton G (2015) Deep learning
7. Ashqar BAM, Abu-Nasser BS, Abu-Naser SS (2019) Plant seedlings classification using
deep learning
8. Nkemelu DK, Omeiza D, Lubalo N (2018) Deep convolutional neural network for plant seedling
classification. arXiv preprint arXiv:1811.08404
9. Elnemr, HA (2019) Convolutional neural network architecture for plant seedling classification.
Int J Adv Comput Sci Appl 10
10. Alimboyong CR, Hernandez AA (2019) An improved deep neural network for classification of
plant seedling images. In: 2019 IEEE 15th international colloquium on signal processing & its
applications (CSPA). IEEE
11. Dyrmann M, Karstoft H, Midtiby HS (2016) Plant species classification using deep convolu-
tional neural network. Biosyst Eng 151(2016):72–80
12. Rahman NR, Hasan MAM, Shin J (2020) Performance comparison of different convolutional
neural network architectures for plant seedling classification. In: 2020 2nd International conference
on advanced information and communication technology (ICAICT), Dhaka, Bangladesh,
pp 146–150. https://doi.org/10.1109/ICAICT51780.2020.93333468
13. Xia D et al (2018) Insect detection and classification based on an improved convolutional neural
network. Sensors 18(12):4169
14. Sun Y et al (2017) Deep learning for plant identification in natural environment. Comput Intell
Neurosci 2017
A Robust Authentication
and Authorization System Powered
by Deep Learning and Incorporating
Hand Signals

Suresh Palarimath, N. R. Wilfred Blessing, T. Sujatha, M. Pyingkodi,


Bernard H. Ugalde, and Roopa Devi Palarimath

Abstract Hand gesture recognition has several uses: communication for visually
challenged people, assistance for the elderly or the handicapped, health care, auto-
mobile user interfaces, security, and surveillance are just a few of the possible appli-
cations. A deep learning-based edge computing system that is capable of authenticating
users without the need for a physical token or physical contact is designed and implemented
in this article. To authenticate, sign language digits are represented by hand gestures,
and deep learning is used to classify these digit hand gestures. The suggested deep
learning model's feature and bottleneck modules are based on deep residual networks.
On the collection of sign language digits accessible online, the model achieves a
classification accuracy of 97.20%, which is excellent. The model is deployed on a
Raspberry Pi 3 Model B+, used as an edge computing device. Edge computing is
implemented in two phases: first, the device collects camera data and stores it in a buffer;
the model then predicts the digit using the first photograph in the buffer as input, with an
inference rate of 280 ms.

S. Palarimath (B) · N. R. W. Blessing · B. H. Ugalde


Department of IT, University of Technology and Applied Sciences, Salalah, Oman
e-mail: suresh.p@sct.edu.om
N. R. W. Blessing
e-mail: wilfred.b@sct.edu.om
B. H. Ugalde
e-mail: bernard.u@sct.edu.om
T. Sujatha
Department of CSE, Karunya Institute of Technology and Sciences, Coimbatore, India
e-mail: sujatha@karunya.edu
M. Pyingkodi
Department of Computer Applications, Kongu Engineering College, Erode, India
e-mail: pyingkodi@kongu.ac.in
R. D. Palarimath
Faculty of Computer Science, Mansarovar Global University, Bhopal, India


Keywords Artificial intelligence · Edge computing · Machine learning ·
MobileNetV2 · Convolutional neural network · Deep learning · Hand signal ·
Raspberry Pi

1 Introduction

Contactless biometric identification is considered to be more sanitary, secure, and
efficient than traditional biometric identification. Biological measurements fall into
two types: physiological and behavioural biometrics. Physiological biometrics
include fingerprints, facial characteristics, palm prints, retinas, and ears; common
behavioural biometrics include keystrokes and signatures. In body
language, hand gestures are an element that may be communicated via the centre
of a person's palm, the finger positions, and the shape formed by the hand. Hand
gestures may be divided into two categories: static and dynamic. In contrast to a
dynamic gesture, which consists of a sequence of hand motions such as waving, a
static gesture is defined by its fixed form, as its name indicates [1].
Prior to deep learning, biometric identification relied on handcrafted features
extracted using methods such as SIFT [2] and similar approaches. The advent of deep
learning in the last decade revolutionized the field of biometric identification, and most
modern biometric identification systems use convolutional neural networks (CNNs) or varia-
tions of them. A CNN is a deep multilayer artificial neural network whose convolutional
approach offers an excellent representation of the input photographs directly from raw
pixels with little to no pre-processing and can easily recognize visual patterns [3]; this
also makes CNNs popular in medical imaging applications. The representations learned
by CNN models are visual features, which are more effective than handcrafted features [2].
Recent deep learning-based biometric identification systems include CNNs for face biometric
authentication [4], graph neural networks combined with CNNs for palm print identification,
and deep neural networks (DNNs) for identifying and authenticating users based on
speech [5], as well as DNNs for fingerprinting. While deep learning has made significant
advances in biometric identification, there are still many difficulties to solve. Some obstacles
include the need for more demanding datasets to train the models, interpretable deep
learning models, real-time model deployment, memory efficiency, and security issues.
We present an authentication system that validates a person by verifying an
'authentication code' created by a memory-efficient CNN model. The system gener-
ates an authentication code for each user consisting of digits ranging from 0 to 9.
The task of automatically assigning images to categories is called image classification,
and deep learning models based on convolutional neural networks now perform very
well on image classification datasets. On the ILSVRC-2012 dataset [6], the sharpness-aware mini-
mization technique applied to a CNN model obtains 86.73% top-1 accuracy, whereas the
EnAET model achieves a 1.99% error rate on the CIFAR-10 dataset. All of these
models are large and need a lot of memory and model inference time. This has proven
problematic, especially when inference must occur on a computing device at the
network edge or in the cloud (central processing system), since cutting-edge perfor-
mance has been attained with complex models requiring more significant computing
resources [7]. We present a memory-efficient CNN model for use in edge computing
systems. The suggested memory-efficient CNN model is compared with the existing
state-of-the-art memory-efficient CNN model, MobileNetV2.

2 Literature Review

In the last decade, many papers on processing hand gestures have been
published, and the topic has become interesting for researchers, with some of these
studies considering a range of different applications. However, hand gesture
interaction systems depend on the recognition rate, which is affected by several factors,
including the type of camera used and its resolution, the technique utilized for hand
segmentation, and the recognition algorithm used. This section summarizes some
key papers on hand gestures.
In [8], the authors discussed recognizing hand gestures for the standard
Indonesian sign language, using the Myo armband as the hand gesture sensor. The authors in [9]
reviewed various hand gesture techniques and their merits and demerits. The authors
in [1] used the Kinect V2 depth sensor for identifying hand gestures and suggested
three different scenarios to obtain effective outcomes. The authors in [10] used
inertial measurement unit (IMU) sensors for human–machine interface (HMI) appli-
cations using a hand gesture recognition (HGR) algorithm. The authors in [11] discussed
hands-free presentations using hand gesture recognition and described the design of a
wearable armband to perform such hands-free presentations. Finally, the authors in [12]
addressed the development and deployment of an end-to-end deep learning-based edge
computing system for authentication through gesture recognition.
The authors in [13] described in detail a technique to rectify the rotational orientation of
the MYO bracelet's sensors. The method is used to identify the highest-energy channel for
given samples of the timing-synchronization gesture (wave out), and the researchers report
that it can improve the recognition and classification of hand gestures. The authors in [14]
used hand segmentation masks combined with RGB frames for real-time hand gesture
recognition. Furthermore, the authors in [15] discussed a handy application of hand
gesture recognition: utilizing hand gestures in times of emergency. Recognition
of hand gestures was achieved via support vector machine-based classification as
well as deep learning-based classification techniques.

3 Proposed Model

3.1 Dataset

We used the Sign Language Digits Dataset [16] to train the proposed CNN
and MobileNetV2 systems. The dataset has 2500 samples belonging to ten classes
numbered 0–9; Figure 1 illustrates four of the classes. Each sample is 150 × 150
pixels. Table 1 lists the dataset's statistics, including the total number of samples
in each grouping. The dataset is separated into three parts: training, validation, and
testing. Because there are ten classes, the test data is split evenly among them. The
test set included 630 samples, or 25.20% of the whole dataset, with 63 instances

Fig. 1 Hand signals and the decoded numbers. (Source dataset [16])

Table 1 Number of samples from dataset [16] for training, testing, and validation
Class No. of samples No. of training No. of validation No. of testing
0 250 149 38 63
1 250 149 38 63
2 250 149 38 63
3 250 149 38 63
4 250 149 38 63
5 250 149 38 63
6 250 149 38 63
7 250 149 38 63
8 250 149 38 63
9 250 149 38 63
Total 2500 1490 380 630

Fig. 2 Classification (digit recognition) task of the suggested authentication system

from each class chosen at random. The remaining data is divided into training and validation
sets, comprising 59.60% and 15.20% of the total dataset, respectively.
Each sample in the collection has an image size of 150 × 150 pixels. Due to the
real-time testing restrictions stated in Sect. 3.4, these images were upscaled to 256 ×
256 pixels using bicubic interpolation [17]. The scaled photograph samples are then
utilized as input pictures for the deep learning system. Section 3.2 details the proposed
authentication mechanism.
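A small sketch of this pre-processing step is given below; the choice of OpenCV is an assumption (any library offering bicubic resampling would serve), and the file path is a placeholder.

import cv2

def upscale_sample(path):
    # Upscale a 150 x 150 dataset sample to 256 x 256 with bicubic interpolation
    img = cv2.imread(path)                       # 150 x 150 x 3 sample
    return cv2.resize(img, (256, 256), interpolation=cv2.INTER_CUBIC)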

3.2 Proposed Authentication System

The authentication system presented in this article shows how to employ CNNs to
produce authentication codes utilizing hand movements and sign language digits.
It consists of two steps: image capture and classification (digit recognition). Figure 2
depicts the system's whole classification task. Before feeding the input picture into
the deep learning model for digit recognition, the system resizes it to 256 × 256
pixels, as shown in Fig. 2.

3.3 Deep Learning Models for Hand Gesture Recognition


Using MobileNetV2

This section addresses the deep learning-based CNN for hand gesture identification
using MobileNetV2.
The MobileNetV2 architecture is a state-of-the-art CNN that outperforms other
models in applications such as object recognition [18]. The network has efficient
depthwise-separable convolutional layers, and its premise is to encode the intermediate
inputs and outputs across bottleneck layers efficiently.
We train the MobileNetV2 model gradually using two transfer learning methods:
feature extraction and fine-tuning. We first train the model using feature extraction
and then refine it using fine-tuning. These are briefly described below:

3.3.1 Feature Extraction

In this method, the MobileNetV2 model pre-trained on the ImageNet dataset is used.
The classification layer of the MobileNetV2 model is not included, since the ImageNet
dataset has more classes than the 'Sign Language Digits Dataset' we utilize. The base
model uses 53 pre-trained layers of the MobileNetV2 model, and the learnt features are
a four-dimensional tensor of size [None, 8, 8, 1280]. A global average pooling 2D
function [19] flattens the base model output into a two-dimensional matrix of size
[None, 1280]. Then, we add a dense layer, which is the classification layer for our
dataset. Using the feature extraction approach, we do not train the base model
(MobileNetV2, except the final layer), but rather utilize it to extract features from the
input sample and feed them into the dense layer (the additional classification layer
according to the 'Sign Language Digits Dataset'). This technique uses the RMSprop
optimizer (gradient-based optimization for training neural networks) [20].
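A hedged Keras sketch of this feature-extraction setup is shown below: a frozen MobileNetV2 base with ImageNet weights and no classification head, global average pooling, and a new 10-way dense layer trained with RMSprop. The 256 × 256 input shape follows Sect. 3.1, while the learning rate is an assumption.

import tensorflow as tf
from tensorflow.keras import layers, models

base = tf.keras.applications.MobileNetV2(
    input_shape=(256, 256, 3), include_top=False, weights="imagenet")
base.trainable = False                       # feature extraction: base stays frozen

inputs = layers.Input(shape=(256, 256, 3))
x = tf.keras.applications.mobilenet_v2.preprocess_input(inputs)
x = base(x, training=False)                  # learnt features: [None, 8, 8, 1280]
x = layers.GlobalAveragePooling2D()(x)       # flattened to [None, 1280]
outputs = layers.Dense(10, activation="softmax")(x)   # one class per digit 0-9
model = models.Model(inputs, outputs)
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-3),  # assumed lr
              loss="categorical_crossentropy", metrics=["accuracy"])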

3.3.2 Fine-Tuning

The MobileNetV2 base model (minus its classification layer) is pre-trained using the
ImageNet dataset, and an additional dense layer (the classification layer according to
our dataset) is added; the model is then fine-tuned. On the 'Sign Language Digits
Dataset', we train 53 layers, including the last dense layer. Compared to the previous
approach, this method has more trainable parameters. The RMSprop optimizer was
used to train the MobileNetV2 model [20].
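Continuing the sketch from Sect. 3.3.1, fine-tuning can be expressed by unfreezing the base and recompiling with a smaller learning rate; the specific learning rate and epoch count here are assumptions, and train_ds/val_ds stand for hypothetical training and validation datasets.

base.trainable = True                          # unfreeze the pre-trained MobileNetV2 layers
model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=1e-5),  # assumed lr
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)   # hypothetical datasets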

3.4 Deployment on Edge Computing Device

The Raspberry Pi 3 Model B+ microprocessor is used for this job. The Raspberry Pi
3 Model B+ is the newest single-board edge computing device; it has a faster
CPU and better connectivity than the Raspberry Pi 3 Model B. We used a Raspberry Pi
Camera V2 module to collect pictures in real time, and the trained model predicts hand
gestures on the Raspberry Pi 3. Using the proposed and MobileNetV2 models in
their original form would create prediction delay; therefore, TensorFlow Lite (TFL)
versions of these models are created and deployed to address the latency problem.
During real-time testing, the camera's pictures are the system's input. Before
these pictures are sent to the deep learning model, they are shrunk (downscaled) to
256 × 256 pixels; images with a resolution lower than 256 × 256 were warped for
real-time prediction. Figure 3 depicts the complete authentication process. Figure 3
shows the counter variable 'i', which keeps track of the number of iterations in the
system, and the loop runs 'n' times, where n is the authentication code length. Two
fundamental stages occur within the loop. The system first reads the live camera feed
from the Pi Camera and saves it in the frame buffer. The input picture (the first image

Fig. 3 Diagram of the process flow for the generation of authentication codes (note: the length of
the code is denoted by 'n'; here n = 5)

frame in the buffer) is then scaled to 256 × 256 pixels, and the predicted digit class
is displayed on the screen. After 2 s, the person changes the digit sign and the frame
buffer is cleared; the cycle is then performed again. This pause time for changing the
sign digit may be customized to meet specific application requirements. After the loop
concludes, the authentication code is displayed and printed for the purpose of
verification.
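The sketch below illustrates the two deployment steps described above: a one-off conversion of the trained Keras model to TensorFlow Lite, followed by the n-digit prediction loop on the Raspberry Pi. The capture_frame_from_pi_camera helper, the file name, and the code length n = 5 with a 2-second pause are illustrative assumptions.

import time
import numpy as np
import tensorflow as tf

# One-off conversion on the workstation ('model' is the trained Keras model from Sect. 3.3)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
open("hand_digit.tflite", "wb").write(converter.convert())

# On the Raspberry Pi: load the TFLite model and predict n digits
interpreter = tf.lite.Interpreter(model_path="hand_digit.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

code = []
for i in range(5):                                    # n = 5 digit authentication code
    frame = capture_frame_from_pi_camera()            # hypothetical camera helper
    frame = tf.image.resize(frame, (256, 256)).numpy()
    interpreter.set_tensor(inp["index"], frame[np.newaxis].astype(np.float32))
    interpreter.invoke()
    digit = int(np.argmax(interpreter.get_tensor(out["index"])))
    code.append(digit)
    time.sleep(2)                                      # pause so the user can change the sign
print("Authentication code:", "".join(map(str, code)))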

4 Results

Figure 4 depicts a few instances of predicted labels produced by the proposed model
on the test data, and Fig. 5 shows real-time predictions made by the proposed mechanism;
the model accurately predicts the authentication code under both uniform and
non-uniform illumination conditions.

4.1 Discussion

The findings show that authentication using hand gestures and deep learning on a
Raspberry Pi is possible. The created technology can generate an authentication
PIN without touching a keyboard. Because ATMs have enclosures on both sides,
the digit signals (hand motions) are hidden from view, and the entire authentication is
low cost. Moreover, as demonstrated in Fig. 5, code creation is independent of
illumination conditions and achieves excellent accuracy under both uniform and
non-uniform lighting. The Raspberry Pi 3 Model B+, used in this study as an edge
computing device, may offer different outputs for entering the security code.

Fig. 4 Predictions of the proposed CNN model on samples taken from the dataset [16]

Fig. 5 Five digits predicted in real time under a variety of natural lighting conditions (both uniform
and non-uniform illumination), together with the captured sign images; each prediction is processed
on the edge computing device. The picture also depicts the final code produced by the proposed
CNN algorithm

4.2 Limitations

The number of samples available for each class (0–9) in the hand gesture sign
recognition dataset being used is limited; nevertheless, the proposed deep learning
model's performance is auspicious. The dataset may be further improved in future by
including new classes. On the other hand, the model's performance may be somewhat
reduced due to motion artefacts that occur during picture capture. In addition, the
performance may be adversely affected by the camera's restricted field of view and by
the practical placement of the hands in front of the camera.

5 Conclusions

This study designed a comprehensive system that uses sign language digit hand
gestures to authenticate users in public and commercial locations, including ATMs,
information desks, railways, and shopping malls. A convolutional neural network
(CNN) was utilized to generate an authentication code from camera input, making the
system genuinely contactless. The whole deep learning model inference was made on
a Raspberry Pi 3 Model B+ CPU with a connected camera, making it ideal for
large-scale deployment. The suggested CNN obtained 97.20% accuracy on the test
dataset. The proposed system operates in real time with a model inference rate of
280 ms per picture frame and may replace traditional touchpad and keypad
authentication techniques. Furthermore, it is possible to expand the dataset in future
to include classes such as 'accept', 'close', 'home', 'ok', and 'go back' to further
minimize the requirement for surface interaction in these secure systems. As
previously mentioned, deep learning techniques may be applied to a variety of appli-
cations, including water quality calculation, medical illness prediction, and instruc-
tional computing [20–25]. The authors have planned future studies in these fields.

References

1. Oudah M, Al-Naji A, Chahl J (2021) Elderly care based on hand gestures using Kinect sensor.
Computers 10:1–25
2. Zhou R, Zhong D, Han J (2013) Fingerprint identification using SIFT-based minutia descriptors
and improved all descriptor-pair matching. Sensors (Basel). 13(3):3142–3156. https://doi.org/
10.3390/s130303142
3. Gu J, Wang Z, Kuen J, Ma L, Shahroudy A, Shuai B, Liu T, Wang X, Wang G, Cai J et al
(2018) Recent advances in convolutional neural networks. Pattern Recogn 77:354–377. https://
doi.org/10.1016/j.patcog.2017.10.013

4. Zulfiqar M, Syed F, Khan M, Khurshid K (2019) Deep face recognition for biometric authen-
tication. In: International conference on electrical, communication, and computer engineering
(ICECCE), Swat, Pakistan, pp 24–25. https://doi.org/10.1109/ICECCE47252.2019.8940725
5. Aizat K, Mohamed O, Orken M, Ainur A, Zhumazhanov B (2020) Identification and
authentication of user voice using DNN features and i-vector. Cogent Eng 7:1751557
6. Foret P, Kleiner A, Mobahi H, Neyshabur B (2020) Sharpness-aware minimization for
efficiently improving generalization. arXiv arXiv:cs.LG/2010.01412
7. Deng BL, Li G, Han S, Shi L, Xie Y (2020) Model compression and hardware acceleration for
neural networks: a comprehensive survey. Proc IEEE 108:485–532
8. Anwar A, Basuki A, Sigit R (2020) Hand gesture recognition for Indonesian sign language
interpreter system with myo armband using support vector machine. Klik—Kumpul J Ilmu
Komput 7:164
9. Oudah M, Al-Naji A, Chahl J (2020) Hand gesture recognition based on computer vision: a
review of techniques. J Imaging 6
10. Kim M, Cho J, Lee S, Jung Y (2019) Imu sensor-based hand gesture recognition for human-
machine interfaces. Sensors (Switzerland) 19:1–13
11. Goh JEE, Goh MLI, Estrada JS, Lindog NC, Tabulog JCM, Talavera NEC (2017) Presentation-
aid armband with IMU, EMG sensor and bluetooth for free-hand writing and hand gesture
recognition. Int J Comput Sci Res 1:65–77
12. Dayal A, Paluru N, Cenkeramaddi LR, Soumya J, Yalavarthy PK (2021) Design and implemen-
tation of deep learning based contactless authentication system using hand gestures. Electron
10:1–15
13. López LIB et al (2020) An energy-based method for orientation correction of EMG bracelet
sensors in hand gesture recognition systems. Sensors (Switzerland) 20:1–34
14. Benitez-Garcia G et al (2021) Improving real-time hand gesture recognition with semantic
segmentation. Sensors (Switzerland) 21:1–16
15. Adithya V, Rajesh R (2020) Hand gestures for emergency situations: a video dataset based on
words from Indian sign language. Data Brief 31:106016
16. Mavi A (2020) A new dataset and proposed convolutional neural network architecture for
classification of American sign language digits. arXiv:2011.08927 [cs.CV] https://github.com/
ardamavi/Sign-Language-Digits-Dataset
17. Dengwen Z (2010) An edge-directed bicubic interpolation algorithm. In: 3rd International
Congress on image and signal processing, pp 1186–1189. https://doi.org/10.1109/CISP.2010.
5647190
18. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L (2018) MobileNetV2: inverted residuals
and linear bottlenecks. In: IEEE/CVF conference on computer vision and pattern recognition,
Salt Lake City, UT, USA, 18–22 June 2018, pp 4510–4520
19. Lin M, Chen Q, Yan S (2013) Network in network. arXiv:1312.4400
20. Blessing NRW, Benedict S (2017) Computing principles in formulating water quality
informatics and indexing methods: an ample review. J Comput Theor Nanosci 14(4):1671–1681
21. Sangeetha SB, Blessing NRW, Yuvaraj N, Sneha JA (2020) Improved training pattern in back
propagation neural networks using holt-winters’ seasonal method and gradient boosting model.
Appl Mach Learn. ISBN 978-981-15-3356-3, Springer, pp 189–198
22. Blessing NRW, Benedict S (2014) Extensive survey on software tools and systems destined
for water quality. Int J Appl Eng Res 9(22):12991–13008
23. Blessing NRW, Benedict S (2016) Aquascopev 1: a water quality analysis software for
computing water data using aquascope quality indexing (AQI) scheme. Asian J Inf Technol
15(16):2897–2907

24. Haidar SW, Blessing NRW, Singh SP, Johri P, Subitha GS (2018) EEapp: an effectual appli-
cation for mobile based student centered learning system. In: The 4th international conference
on computing, communication & automation (ICCCA 2018), December 14–15, India. IEEE,
pp. 1–4
25. Pyingkodi M, Blessing NRW, Shanthi S, Mahalakshmi R, Gowthami M (2020) Performance
evaluation of machine learning algorithm for lung cancer. In: International conference on artifi-
cial intelligence & smart computing (ICAISC-2020), Bannari Amman Institute of Technology,
Erode, India, Springer, October 15–17, 92
Author Index

A Bhavani, Y., 281


Abdullah, S. K., 129 Bhingarkar, Sukhada, 731
Abirami, A., 491 Bindu, R., 477
Adhiya, Krishnakant P., 953 Blessing, N. R. Wilfred, 1061
Agarwal, Ritu, 251 Bodavarapu, Pavan Nageswar Reddy, 1
Ahamed, A. Fayaz, 49 Bohara, Mohammed Husain, 659, 697,
Ahmed, Md Apu, 345 1039
Aishwarya, A., 529 Bose, Soumalya, 627
Alam, Sanim, 345
Ambika, G. N., 453
Anand, Garima, 709 C
Anithaashri, T. P., 93 Chandana, C. L., 801
Anudeep, D. S. V. N. S. S., 825 Chandrika, C. P., 683
Anu Keerthika, M. S., 373 Channavar, Manjunath, 837
Anusha, C., 529 Charan, M., 905
Aparna, R., 801 Chhabra, Gunjan, 385
Arefin, Md. Rashedul, 345
Arivarasan, S., 463
D
Arthi, R., 321
Das, Saptarshi, 491
Arun Raj Kumar, P., 599
deRito, Christopher, 785
Aski, Vidyadhar Jinnappa, 991
Deshpande, Girish R., 837
Athira, M., 359
Dhaka, Vijaypal Singh, 991
Dhanushmathi, S. K., 75
Dharaniga, M., 75
B
Dharsheeni, R., 75
Babna, K., 1025
Dinesh Kumar, J. R., 75
Baby Shalini, V., 373
Duvvuru, Jahnavi, 331
Balaji, A. Siva, 825
Banerjee, Archita, 491
Baranidharan, V., 265 E
Barik, Kousik, 491 Ebenezer, V., 541
Bedekar, Gayatri, 441
Benchaib, Imane, 671
Bhagat, Kavita, 1007 F
Bhatia, Sajal, 785 Ferdous, Tafannum, 345

Firke, Shital N., 193 Kaushik, Sunil, 385


Kavya Sri, E., 281
Keerthana, M., 49
G Keerthivasan, S. N., 265
Gadige, Sunil Prasad, 643 Kejriwal, Shubham, 505
Gautam, Binav, 709 Kitawat, Parth, 505
Gayathri, K., 743 Konar, Karabi, 491
George, Joseph, 777 Krishnaveni, S., 219, 291, 321
Ghoumid, Kamal, 671 Kumar, Badri Deva, 331
Gopinath, Nimitha, 359 Kumar, Jalesh, 965
Goudar, R. H., 441 Kumar, Sunil, 991
Kurhade, Swapnali, 505
H
Habib, Md. Rawshan, 345
Halsana, Aanzil Akram, 411 L
Haritha, K. S., 173, 359, 1025 Lakshmi, K. S. Vijaya, 825
Hasan, S. K. Rohit, 129 Lalithadevi, B., 219
Hegde, Suchetha G., 801 Lamrani, Yousra, 671
Hossain, Sk Md Mosaddek, 411 Laxmi, M., 565
Lokhande, Unik, 115

I
Indira, D. N. V. S. L. S., 519
Indrani, B., 139 M
Indumathi, P., 905 Mahender, C. Namrata, 921
Isaac Samson, S., 265 Mandara, S., 309
Islam, Saiful, 345 Mane, Sunil, 103
Iswarya, M., 373 Manjunathachari, K., 643
Manohar, N., 309
Meghana, B., 529
J Miloud Ar-Reyouchi, El, 671
Jacob, T. Prem, 905 Mohammed, S. Jassem, 33
Jain, Amruta, 103 Mohanraj, V., 943
Jain, Ayur, 385 Mollah, Ayatullah Faruk, 129
Jain, Ranjan Bala, 193 Monika, 709
Jayalakshmi, V., 17 Motiani, Juhie, 1039
Jayashree, H. N., 801
Jeyakumar, M. K., 777
Jeyanthi, D. V., 139
N
Joshi, Brijendra Kumar, 399
Joshi, Namra, 885 Nair, Arun T., 173, 239, 359, 585, 1025
Nallavan, G., 855
Namboothiri, Kesavan, 173, 359
K Namritha, M., 425
Kalaivani, K. S., 425, 1053 Nandhini, R., 1053
Kaliraj, S., 207 Narayan, T. Ashish, 1
Kallimani, Jagadish S., 683 Naveen, B., 837
Kalpana, B. Khandale, 921 Naveen Kumar, S., 265
Kamakshi, P., 281 Nayak, Amit, 697
KanimozhiSelvi, C. S., 425, 1053 Nikhil, Chalasani, 331
Kanisha, B., 207 Nithya, N., 855
Kapse, Avinash S., 895 Nivedhashri, S., 1053
Karthik, N., 541 Niveetha, S. K., 425

P Sapra, Shruti J., 895


Palarimath, Roopa Devi, 1061 Saravanan, S., 869
Palarimath, Suresh, 1061 Saritha Haridas, A., 173
Pandian, R., 905 Sasirekha, R., 207
Pandit, Ankush, 627 Selvakumar, S., 759
Parashar, Anubha, 991 Sen, Anindya, 627
Patel, Anjali, 1039 Senthilkumar, J., 943
Patel, Dhruti, 1039 Shadiya Febin, E. P., 239
Patel, Nehal, 613 Shah, Jaini, 115
Patel, Ritik H., 613 Shah, Panth, 697
Patel, Rutvik, 613 Shah, Rashi, 115
Patel, Sandip, 613 Shamna, P. A., 585
Patel, Vivek, 659 Sharma, Pankaj Kumar, 759
Patel, Yashashree, 697 Shawmee, Tahsina Tashrif, 345
Patil, Rudragoud, 441 Shidaganti, Ganeshayya, 565
Patil, Shweta, 953 Shivamurthy, G., 565
Pavithra, K., 425 Shukla, Mukul, 399
Phaneendra, H. D., 477 Sikha, O. K., 59
Prajeeth, Ashwin, 709 Sindhu Sai, Y., 281
Prakash, S., 463, 565 Singh, Aakash, 505
Prathik, A., 541 Singh, Bhupinder, 251
Prathiksha, R., 49 Singh, Manoj Kumar, 643
Pravin, A., 905 Singh, Rishi Raj, 385
Priyadharshini, N., 1053 Sivamohan, S., 291
Priyadharsini, K., 75 Sobhana, M., 331
Priya, D. Mohana, 49
Sriabirami, V., 855
Punjabi, Zeel, 659
Sridhar, Gopisetti, 331
Purwar, Ravindra Kumar, 975
Sridhar, S. S., 291
Pyingkodi, M., 1061
Srihari, K., 59
Srilalitha, N. S., 477
Srinivas, P. V. V. S., 1
R Sujatha, T., 1061
Radhika, N., 33 Suman, S., 965
Rahul, 709
Surendran, S., 463
Rajakumar, R., 905
Suresh Babu, Ch., 519
Rajaram, Kanchana, 759
Suresh, Yeresime, 453, 943
Rajendran, P. Selvi, 93
Suri, Ashish, 1007
Raju, C., 265
Swarup Kumar, J. N. V. R., 519
Ramya Sri, S., 373
Swathi Chandana, J., 529
Rathod, Kishansinh, 659
Ravichandran, G., 93
Rekha, K. S., 477
Remya Krishnan, P., 599 T
Ruchika, 975 Talegaon, Naveen S., 837
Rupesh, M., 529 Tanvir, Md Shahnewaz, 345
Tanwar, Priya, 115
Tayyab, Md., 825
S Tergundi, Parimal, 441
Saadhavi, S., 477 Thakral, Manish, 385
Sadhana, S. Ram., 477 Thakur, Shruti A., 895
Sai Siddharth, P., 869 Thangavelu, S., 743
Santhosh Kumar, S., 159 Thiyagu, T., 321
Santhosh, T. C., 837 Tulasi Ratnakar, P., 869

U Vignan, N. Gunadeep, 825


Uday Vishal, N., 869 Vijayalakshmi, K., 17
Ugalde, Bernard H., 1061 Vijetha, N., 801
Uma, M., 553 Vinodhini, M., 541
Viswanadham, Y. K., 519
Viswasom, Sanoj, 159
V
Vaishali, P. Kadam, 921
Venkateswara Rao, Ch., 519 Y
Verma, Shailesh, 975 Yohapriyaa, M., 553
