
Advances in Intelligent Systems and Computing 1354

Brijesh Iyer
Debashis Ghosh
Valentina Emilia Balas   Editors

Applied
Information
Processing
Systems
Proceedings of ICCET 2021
Advances in Intelligent Systems and Computing

Volume 1354

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland

Advisory Editors
Nikhil R. Pal, Indian Statistical Institute, Kolkata, India
Rafael Bello Perez, Faculty of Mathematics, Physics and Computing,
Universidad Central de Las Villas, Santa Clara, Cuba
Emilio S. Corchado, University of Salamanca, Salamanca, Spain
Hani Hagras, School of Computer Science and Electronic Engineering,
University of Essex, Colchester, UK
László T. Kóczy, Department of Automation, Széchenyi István University,
Gyor, Hungary
Vladik Kreinovich, Department of Computer Science, University of Texas
at El Paso, El Paso, TX, USA
Chin-Teng Lin, Department of Electrical Engineering, National Chiao
Tung University, Hsinchu, Taiwan
Jie Lu, Faculty of Engineering and Information Technology,
University of Technology Sydney, Sydney, NSW, Australia
Patricia Melin, Graduate Program of Computer Science, Tijuana Institute
of Technology, Tijuana, Mexico
Nadia Nedjah, Department of Electronics Engineering, University of Rio de
Janeiro, Rio de Janeiro, Brazil
Ngoc Thanh Nguyen, Faculty of Computer Science and Management,
Wrocław University of Technology, Wrocław, Poland
Jun Wang, Department of Mechanical and Automation Engineering,
The Chinese University of Hong Kong, Shatin, Hong Kong
The series “Advances in Intelligent Systems and Computing” contains publications
on theory, applications, and design methods of Intelligent Systems and Intelligent
Computing. Virtually all disciplines such as engineering, natural sciences, computer
and information science, ICT, economics, business, e-commerce, environment,
healthcare, life science are covered. The list of topics spans all the areas of modern
intelligent systems and computing such as: computational intelligence, soft comput-
ing including neural networks, fuzzy systems, evolutionary computing and the fusion
of these paradigms, social intelligence, ambient intelligence, computational neuro-
science, artificial life, virtual worlds and society, cognitive science and systems,
Perception and Vision, DNA and immune based systems, self-organizing and
adaptive systems, e-Learning and teaching, human-centered and human-centric
computing, recommender systems, intelligent control, robotics and mechatronics
including human-machine teaming, knowledge-based paradigms, learning para-
digms, machine ethics, intelligent data analysis, knowledge management, intelligent
agents, intelligent decision making and support, intelligent network security, trust
management, interactive entertainment, Web intelligence and multimedia.
The publications within “Advances in Intelligent Systems and Computing” are
primarily proceedings of important conferences, symposia and congresses. They
cover significant recent developments in the field, both of a foundational and
applicable character. An important characteristic feature of the series is the short
publication time and world-wide distribution. This permits a rapid and broad
dissemination of research results.
Indexed by DBLP, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and
Technology Agency (JST).
All books published in the series are submitted for consideration in Web of Science.

More information about this series at http://www.springer.com/series/11156


Brijesh Iyer · Debashis Ghosh ·
Valentina Emilia Balas
Editors

Applied Information
Processing Systems
Proceedings of ICCET 2021
Editors
Brijesh Iyer
Department of Electronics and Telecommunications Engineering
Dr. Babasaheb Ambedkar Technological University
Lonere, India

Debashis Ghosh
Department of Electronics and Computer Engineering
Indian Institute of Technology Roorkee
Roorkee, Uttarakhand, India

Valentina Emilia Balas


Department of Automatics and Applied
Software
Aurel Vlaicu University of Arad
Arad, Romania

ISSN 2194-5357 ISSN 2194-5365 (electronic)


Advances in Intelligent Systems and Computing
ISBN 978-981-16-2007-2 ISBN 978-981-16-2008-9 (eBook)
https://doi.org/10.1007/978-981-16-2008-9

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Preface

Dr. Babasaheb Ambedkar Technological University, Lonere-402103, is a state tech-
nological university of Maharashtra, India. Over the years, the Department of
Electronics and Telecommunication Engineering of this University has been
organizing faculty and staff development and continuing education programs.
In 2013, the department took a new initiative to organize international conferences
in the frontier areas of engineering and computing technologies. The ICCET series
(earlier ICCASP) is an outcome of this initiative. The 6th ICCET 2021 was
organized by the Department of E&TC Engineering of the University. The event was
conducted online due to the ongoing pandemic across the globe. Keynote lectures,
invited talks by eminent professors, and panel discussions among the delegates,
academicians, and industry personnel were the key features of the 6th ICCET 2021.
This volume collects scholarly articles in the area of applied information processing
systems that will help cater to the needs of next-millennium communication systems.
We received a great response, in both quantity and quality, of individual research
contributions for consideration. The conference adopted a single-blind peer-review
process to select the papers, with a strict plagiarism verification policy. Hence, the
selected papers are a true record of research work on the theme of this volume.
We are thankful to the reviewers, session chairs, and rapporteurs for their support.
We also thank the authors and the delegates for their contributions and presence.
Finally, we are incredibly grateful to the University officials for their support of this
activity.
We pledge to take this conference series to greater heights in the years to come
and to put forward need-based research and innovation.
Thank you one and all.

Lonere, India Dr. Brijesh Iyer


Roorkee, India Dr. Debashis Ghosh
Arad, Romania Dr. Valentina Emilia Balas

Contents

CNN Parameter Adjustment for Brain Tumor Classification . . . . . . . . . . . 1
Toan Pham Ho and Vinh Truong Hoang
Advance Fuzzy Radial Basis Function Neural Network . . . . . . . . . . . . . . . . 11
Balaji S. Shetty, Manisha S. Mahindrakar, and U. V. Kulkarni
Unbounded Fuzzy Radial Basis Function Neural Network Classifier . . . . 25
Balaji S. Shetty, Manisha S. Mahindrakar, and U. V. Kulkarni
A Study on the Adaptability of Deep Learning-Based Polar-Coded
NOMA in Ultra-Reliable Low-Latency Communications . . . . . . . . . . . . . . 39
N. Iswarya, R. Venkateswari, and N. Madhusudanan
Heart Rate Variability-Based Mental Stress Detection Using Deep
Learning Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Ramyashri B. Ramteke and Vijaya R. Thool
Product-Based Market Analysis Using Deep Learning . . . . . . . . . . . . . . . . 63
Aayush Kumaria, Nilima Kulkarni, and Abhishek Jagtap
Driver Drowsiness Detection Using Deep Learning . . . . . . . . . . . . . . . . . . . . 73
Ajinkya Rajkar, Nilima Kulkarni, and Aniket Raut
Emotion Detection from Social Media Using Machine Learning
Techniques: A Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Vijaya Ahire and Swati Borse
Deep Age Estimation Using Sclera Images in Multiple Environment . . . . 93
Sumanta Das, Ishita De Ghosh, and Abir Chattopadhyay
Data Handling Approach for Machine Learning in Wireless
Communication: A Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Niranjan S. Kulkarni, Sanjay L. Nalbalwar, and Anil B. Nandgaonkar
Breast Cancer Detection in Mammograms Using Deep Learning . . . . . . . 121
Abhiram Pillai, Amaan Nizam, Minita Joshee, Anne Pinto,
and Satishkumar Chavan
Deep Learning-Based Parameterized Framework to Investigate
the Influence of Pedagogical Innovations in Engineering Courses . . . . . . 129
M. Ashok, Kumar Ramasamy, Umadevi Ashok, and Revathy Pandian
Modern Transfer Learning-Based Preliminary Diagnosis
of COVID-19 Using Forced Cough Recordings with Mel-Frequency
Cepstral Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Shariva Dhekane, Vaishnavi Agrawal, Aniruddha Datta, and Kunal Kulkarni
Biomedical Text Summarization: A Graph-Based Ranking
Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Supriya Gupta, Aakanksha Sharaff, and Naresh Kumar Nagwani
EEG-Based Diagnosis of Alzheimer’s Disease Using Kolmogorov
Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Digambar Puri, Sanjay Nalbalwar, Anil Nandgaonkar, and Abhay Wagh
Quantification of Streaking Effect Using Percentage Streak Area . . . . . . 167
Sajjad Ahmed and Saiful Islam
Improving Topographic Features of DEM Using Cartosat-1 Stereo
Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Litesh Bopche and Priti P. Rege
Active Noise Cancellation System in Automobile Cabins Using
an Optimized Adaptive Step-Size FxLMS Algorithm . . . . . . . . . . . . . . . . . . 187
Arinjay Bisht and Hemprasad Yashwant Patil
FFT-Based Robust Video Steganography over Non-dynamic
Region in Compressed Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Rachna Patel, Kalpesh Lad, and Mukesh Patel
An Improved Approach for Devanagari Handwritten Characters
Recognition System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
Rajdeep Singh, Arvind Kumar Shukla, Rahul Kumar Mishra, and S. S. Bedi
PSO-WT-Based Regression Model for Time Series Forecasting . . . . . . . . 227
P. Syamala Rao, G. Parthasaradhi Varma, and Ch. Durga Prasad
Leaf Diagnosis Using Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
Prashant Udawant and Pravin Srinath
Attendance System Using Face Recognition Library . . . . . . . . . . . . . . . . . . 247
Bhavna Patel, Vedika Patil, Onkar Pawar, Omkar Pawaskar, and J. R. Mahajan
Studies on Performance of Image Splicing Techniques Using
Learned Self-Consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
Bhukya Krishna Priya, Anup Das, Shameedha Begum,
and N. Ramasubramanian
Random Forest and Gabor Filter Bank Based Segmentation
Approach for Infant Brain MRI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Vinodkumar R. Patil and Tushar H. Jaware
Sensory-Motor Cortex Signal Classification for Rehabilitation
Using EEG Signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
Vinay Kulkarni, Yashwant Joshi, and Ramchandra Manthalkar
D-CNN and Image Processing Based Approach for Diabetic
Retinopathy Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
Armaan Khan, Nilima Kulkarni, Ankit Kumar, and Anirudh Kamat
Pothole Detection Using YOLOv2 Object Detection Network
and Convolutional Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
R. Sumalatha, R. Varaprasada Rao, and S. M. Renuka Devi
A New Machine Learning Approach for Malware Classification . . . . . . . 301
G. Shruthi and Purohit Shrinivasacharya
Analysis of Feature Selection Techniques to Detect DoS Attacks
Using Rule-Based Classifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
Atharva Vaidya and Deepak Kshirsagar
Botnet Detection Using Bayes Classifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
Prapti Kolpe and Deepak Kshirsagar
Insider Attack Prevention using Multifactor Authentication
Protocols - A Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
Siranjeevi Rajamanickam, N. Ramasubramanian, and Satyanarayana Vollala
Link Scheduling in Wireless Mesh Network Using Ant Colony
Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
Makarand D. Wangikar and Balaji R. Bombade
Development of an Integrated Security Model for Wireless Body
Area Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
K. R. Siva Bharathi and R. Venkateswari
An Improved Node Mobility Pattern in Wireless Ad Hoc Network . . . . . . 361
Manish Ranjan Pandey, Rahul Kumar Mishra, and Arvind Kumar Shukla
IGAN: Intrusion Detection Using Anomaly-Based Generative
Adversarial Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
Jui Shah and Maniklal Das
CodeScan: A Supervised Machine Learning Approach to Open
Source Code Bot Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
Vipul Gaurav, Shresth Singh, Avikant Srivastava, and Sushila Shidnal
Green Internet of Things: The Next Generation Energy Efficient
Internet of Things . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
Navod Neranjan Thilakarathne, Mohan Krishna Kagita,
and W. D. Madhuka Priyashan
iGarbage: IoT-Based Smart Garbage Collection System . . . . . . . . . . . . . . . 403
Zofia Noorain, Mohd. Javed Ansari, Mohd. Shahnawaz Khan,
Tauseef Ahmad, and Md. Asraful Haque
IoT-Based Smart Home Surveillance System . . . . . . . . . . . . . . . . . . . . . . . . . 417
Shruti Dash and Pallavi Choudekar
Optimized Neural Network for Big Data Classification Using
MapReduce Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
Sridhar Gujjeti and Suresh Pabboju
Impact of Deployment Schemes on Localization Techniques
in Wireless Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
Prateek, Aakansha Garg, and Rajeev Arya
A Survey on 5G Architecture and Security Scopes in SDN and NFV . . . . 447
Jehan Hasneen and Kazi Masum Sadique
Study and Analysis of Hierarchical Routing Protocols in Wireless
Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
Ankur Choudhary, Santosh Kumar, and Harshal Sharma
Circularly Polarized 1 × 4 Antenna Array with Improved Isolation
for Massive MIMO Base Station . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
Ravindra S. Bakale, Anil B. Nandgaonkar, S. B. Deosarkar, and R. Bhadade
Analysis of Rectangular Microstrip Array Antenna Fed Through
Microstrip Lines with Change in Width . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487
Tarun Kumar Kanade, Alok Rastogi, Sunil Mishra, and Vijay D. Chaudhari
Parametric Study of Electromagnetic Coupled MSA Array
for PAN Devices with RF Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
Shilpa Nandedkar, Shankar Nawale, and Anirudha Kulkarni
Fractal Tree Microstrip Antenna Using Aperture Coupled Ground . . . . . 507
Sanjay Khobragade, Sanjay Nalbalwar, and Anil Nandgaonkar
Wind Speed at Hub Height (Using Dynamic Wind Shear)
and Wind Power Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519
Rohit Kumbhare, Suraj Sawant, Sanand Sule, and Amit Joshi
Modeling and Simulation of Microgrid with P-Q Control
of Grid-Connected Inverter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529
Nasir Ul Islam Wani, Anupama Prakash, and Pallavi Choudekar
Smart Student Assessment System for Online Classes Participation . . . . 541
Sudheer Kumar Nagothu
Recommendation System for Location-Based Services . . . . . . . . . . . . . . . . 553
Ritigya Gupta, Ishani Pandey, Kritika Mishra, and K. R. Seeja
Optimal and Higher Order Sliding Mode Control for Systems
with Disturbance Rejection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
Ishwar S. Jadhav and Gajanan M. Malwatkar
Synchronization and Secure Communication of Chaotic Systems . . . . . . . 575
Ajit K. Singh
Improvement in Ranking Relevancy of Retrieved Results
from Google Search Using Feature Score Computation Algorithm . . . . . 585
Swati Borse and B. V. Pawar

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599


About the Editors

Dr. Brijesh Iyer received his Ph.D. degree in Electronics and Telecommunication
Engineering from Indian Institute of Technology, Roorkee, India, in 2015. He is Asso-
ciate Professor in the University Department of E&TC Engineering at Dr. Babasaheb
Ambedkar Technological University, Lonere (A State Technological University). He
is a recipient of an INAE research fellowship in the field of engineering. He has two
patents to his credit and has authored over 50 research publications in peer-reviewed
reputed journals and conference proceedings. He has also authored five books on
curricula as well as cutting-edge technologies like sensors and healthcare technology.
He has served as Program Committee Member of various international conferences
and Reviewer for various international journals. His research interests include RF
front-end design for 5G and beyond, IoT and biomedical image/signal processing.

Dr. Debashis Ghosh is presently working as Full Professor and Head of the Depart-
ment of E&CE Engineering at IIT Roorkee. He earned his B.E. from MNIT, Jaipur,
in 1993, and his M.Sc. (Engineering) and Ph.D. from IISc Bangalore in 1996 and
2000, respectively, in the area of E&CE Engineering. He has served as Visiting
Professor at many reputed overseas technological institutes and universities. He is a
recipient of the “Excellence in Teaching” award of Multimedia University, Malaysia,
in the year 2007. He has vast experience in handling research and consultancy
projects at IIT Guwahati and IIT Roorkee. He has published several research papers
in various journals and conferences of international and national repute. His areas
of interest include communication systems and signal processing, cognitive radio
and sensor networks, image and video processing, and computer vision and pattern
recognition.

Valentina Emilia Balas is currently Full Professor in the Department of Automatics
and Applied Software at the Faculty of Engineering, “Aurel Vlaicu” University
of Arad, Romania. She holds a Ph.D. in Applied Electronics and Telecommunica-
tions from Polytechnic University of Timisoara. Dr. Balas is the author of more than
350 research papers in refereed journals and international conferences. Her research
interests are in intelligent systems, fuzzy control, soft computing, smart sensors,
information fusion, modeling and simulation. She is Editor-in-Chief of the International
Journal of Advanced Intelligence Paradigms (IJAIP) and the International Journal of
Computational Systems Engineering (IJCSysE), Editorial Board Member of several
national and international journals and Evaluator Expert for national and interna-
tional projects and Ph.D. thesis. Dr. Balas is Director of Intelligent Systems Research
Centre in Aurel Vlaicu University of Arad and Director of the Department of Interna-
tional Relations, Programs and Projects in the same university. Dr. Balas participated
in many international conferences as Organizer, Honorary Chair, Session Chair and
Member in Steering, Advisory or International Program Committees. She is currently
working on a national project with EU funding support, BioCell-NanoART (Novel
Bio-inspired Cellular Nano-Architectures for Digital Integrated Circuits), funded
with 3M Euro by the National Authority for Scientific Research and Innovation.
EUSFLAT and SIAM, Senior Member of IEEE, Member in TC—Fuzzy Systems
(IEEE CIS), Chair of the TF 14 in TC—Emergent Technologies (IEEE CIS) and
Member in TC—Soft Computing (IEEE SMCS). Dr. Balas was past Vice-President
(Awards) of IFSA International Fuzzy Systems Association Council (2013–2015),
is Joint Secretary of the Governing Council of Forum for Interdisciplinary Math-
ematics (FIM), A Multidisciplinary Academic Body, India, and a recipient of the
“Tudor Tanasescu” Prize from the Romanian Academy for contributions in the field
of soft computing methods (2019).
CNN Parameter Adjustment for Brain
Tumor Classification

Toan Pham Ho and Vinh Truong Hoang

Abstract Brain tumors are considered among the most prominent and detrimental
neurological disorders, so diagnosing the category of a brain tumor as early as
possible is tremendously imperative for patients; at present, that diagnosis relies
heavily on human judgment. In order to address this issue and enhance classification
performance in deep learning, this paper proposes several methods combined with
Convolutional Neural Networks, namely transfer learning, data augmentation, and
the arrangement of Batch Normalization and Dropout. Experimental results show
that the proposed approaches outperform state-of-the-art work on the benchmark
brain tumor dataset. The proposed architecture for each particular Convolutional
Neural Network yields more promising outcomes than the original methods with
default parameters. The highest accuracy achieved in the conducted experiments
is 98.8%.

Keywords Deep learning · Transfer learning · Brain tumor classifications ·


Convolutional neural networks · Batch normalization · Dropout

1 Introduction

A brain tumor is one of the most prominent and detrimental neurological disorders,
alongside others such as dementia, stroke, and Parkinson’s disease. A brain tumor,
also known as an intracranial tumor, is an abnormal mass or growth of tissue in
which cells grow and multiply out of control, seemingly unchecked by the
mechanisms that govern normal cells. To date, more than 150
different brain tumors have been documented, but mainly there are two prime groups
of brain tumors which are termed primary and metastatic. An enormous number of

T. P. Ho · V. T. Hoang (B)
Faculty of Information Technology, Ho Chi Minh City Open University,
Ho Chi Minh City, Vietnam
e-mail: vinh.th@ou.edu.vn
T. P. Ho
e-mail: 1751010162toan@ou.edu.vn

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 1
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_1

people from all walks of life in the United States and other regions of the world
have to deal with brain tumor symptoms yearly, approximately 700,000 in total.
Moreover, an estimated 87,000 people were expected to receive a primary brain
tumor diagnosis in 2020, while the average survival rate for all malignant brain
tumor patients is only 36%; survival rates also vary by age and by tumor type or
grade, broadly decreasing with age. Therefore, diagnosing the kind of brain tumor
as soon as possible after symptoms appear, so that plausible treatments can
eventually be carried out, is tremendously imperative for the patient.
Nowadays, brain tumor classification is studied widely in the computer vision
literature, and increasingly robust classification models with high performance and
diverse classifiers are proposed by researchers from all over the world. Jun Cheng
et al. [7] proposed to enhance classification performance by applying three kinds
of features (intensity histogram, GLCM, and a raw patch-based BoW model) to
verify the effectiveness of the recommended techniques, namely direct use of
GLCM as features, a pipeline of BoW-based tissue classification, and tumor region
augmentation and partition. The final outcomes were relatively good, with over
90% accuracy for each type of brain tumor. J. Seetha and S. Selvakumar Raja [19]
proposed automatic brain tumor detection using a Convolutional Neural Network
(CNN), with a deeper architecture design realized using small kernels. The
recommended models showed improved validation accuracy compared to other
algorithms such as SVM and DNN. Parnian Afshar et al. [2] proposed a Capsule
Network model with the potential to preserve spatial relations, owing to its
routing-by-agreement process. This work aimed to classify three different groups
of brain tumors: Meningioma, Pituitary, and Glioma. Javaria Amin et al. [3]
proposed a fusion process that combines structural and texture information from
four MRI sequences (T1C, T1, Flair, and T2) for the detection of brain tumors. A
Discrete Wavelet Transform (DWT) with the Daubechies wavelet kernel was
utilized for the fusion, providing a more informative tumor region than any single
MRI sequence. Overall, the success rates of these studies range from 87% to
98.7% across diverse models and methods.
Recently, Arshia Rehman et al. [18] used CNN (Convolutional Neural Network)
models such as AlexNet, Inception, and VGG16 with different improvement
techniques to classify images of three types of brain tumors (Meningioma, Glioma,
and Pituitary) in 2019. Specifically, training parameters were adjusted and the
models were fine-tuned, achieving accuracies of 98.6% with VGG16, 98.04% with
Inception, and 97.3% with AlexNet. Deepak and Ameer [8] also employed the
Inception model, but only for the feature extraction stage of their two proposed
models; in the classification step, SVM (Support Vector Machine) and KNN
(K-Nearest Neighbors) classifiers were chosen to diagnose MRI images,
accomplishing significant results of 97.8% and 98.0%, respectively. Zar Nawab
Khan Swati et al. [20] proposed efficient methods in 2019 using VGG19
pre-trained on the ImageNet database combined with fine-tuning, block by block
from the first to the sixth, for brain tumor classification. The highest accuracy for
the transfer-learning VGG19 model was 96.13%, obtained when fine-tuning all of
the first six blocks. In this paper, we present several methods that allow CNN
architectures to achieve more remarkable performance than their original forms in
brain tumor classification.
Meanwhile, five simple Convolutional Neural Network architectures for brain
tumor classification were constructed by Nyoman Abiwinanda et al. [1] to show
that simple models can outperform numerous complicated ones. With only two 2D
convolutions, two ReLU activations, and two max-pooling layers, they obtained
98.51% training and 84.19% validation accuracy. In 2020, modified CNNBCNs
(Convolutional Neural Networks Based on Complex Networks) were constructed
and tested by Zhiguan Huang et al. [14] with three random graph-generation
algorithms, namely Erdős–Rényi (ER), Watts–Strogatz (WS), and Barabási–Albert
(BA), proving more effective than the original CNNBCN in diagnosing brain tumor
types from MRI images. The highest accuracy belonged to CNNBCN-ER, at
exactly 95.49%. The VGG16 model was chosen as the base network of the Faster
R-CNN architecture presented by Yakub Bhanothu et al. [5] in 2020. Faster
R-CNN consists of three primary blocks, namely the RPN, Region of Interest
(RoI), and Region-based CNN (R-CNN) for object classification. The final results
for each brain tumor class (Glioma, Meningioma, and Pituitary) were 75.18%,
89.45%, and 68.18%, respectively. Likewise, a model consisting of two primary
stages was introduced by Kazihise Ntikurako Guy-Fernand et al. [11] in 2020.
Specifically, input images first pass through a visual attention mechanism for
training; the acquired knowledge is then transferred to the proposed architecture as
a feature selector that mainly uses staples of CNNs such as convolutional layers
and Batch Normalization layers. The resulting accuracy was quite good, at roughly
96%. In February 2020, a new CNN-based model for brain tumor classification
was presented by Milica M. Badža and Marko Č. Barjaktarović [4]: MR images
are first preprocessed (normalized, resized) and augmented (rotated, flipped
vertically) to enlarge the training image database before passing through an
architecture with two proposed feature-extraction blocks with different layer
arrangements. The highest accuracy obtained in this way was around 95%. Preethi
Kurian and Vijay Jeyakumar [16] in early 2020 experimented with the CBIR task
on seven separate databases with fifteen diverse classes, implementing the LeNet
and AlexNet architectures and comparing validation accuracy against the number
of epochs; the higher the number of epochs, the more accurate the models became.
This paper is organized as follows. Section 2 introduces the proposed approach of
enhancing different CNN frameworks, and Sect. 3 presents experimental results.

2 Proposed Approach

Regarding CNNs, several versions of ubiquitous CNN models, namely AlexNet
[21], VGG [22], Inception [12], MobileNet [13], ResNet [9], and DenseNet [17],
were experimented with the proposed methods in order to draw comparisons
between the original results and the proposed ones. The major parameter values
can be summarized as follows:
• Original set values and techniques for original CNNs
– Optimizer: Stochastic Gradient Descent (SGD).
– Activations: ReLU for hidden layers and Softmax for FC layers.
– Regularizations: ModelCheckpoint and EarlyStopping.

• Proposed techniques to improve performance


– Optimizer: Adaptive Moment Estimation (Adam).
– Regularizations: Batch Normalization, Dropout, and ReduceLROnPlateau.
– Preprocessing images: Data augmentation (Rotation).

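The main optimizer change above, from SGD to Adam, can be sketched in plain Python. This is a minimal, self-contained illustration of the standard Adam update rule for a single scalar parameter, not the authors' implementation; the quadratic toy loss, the starting point, and the step count are made-up demo values.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: moment estimates, bias correction, parameter step."""
    m = b1 * m + (1 - b1) * grad          # first moment (mean of gradients)
    v = b2 * v + (1 - b2) * grad * grad   # second moment (uncentered variance)
    m_hat = m / (1 - b1 ** t)             # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize the toy loss f(theta) = (theta - 3)^2, gradient 2*(theta - 3).
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 5001):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=1e-2)
print(round(theta, 1))  # converges toward the minimum at 3.0
```

Because the effective step size is roughly the learning rate regardless of gradient magnitude, Adam tends to converge quickly even when features are poorly scaled, which is one common reason to prefer it over plain SGD.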
According to example settings taken from research papers and guides on implementing the CNN models, input images are first read and resized to 224×224 for almost all algorithms, except the Inception architectures (299×299). The Adam optimizer is applied universally, with a learning rate of 1e-3, instead of SGD. The settings differ between the original and proposed methods depending on each CNN framework; the values for both are illustrated in Table 1. AlexNet, VGG, and Inception train more plausibly with Adam at a learning rate of 1e-4. Data augmentation (rotation) is also preferable not

Table 1 Parameter values of enhancing approaches

CNN        Variant              Train/Val/Test  Optimizer       ReduceLR (min LR)  Augmented
AlexNet    –                    4/1/5           Adam (lr=1e-4)  No                 Yes
VGG        VGG-16               4/1/5           Adam (lr=1e-4)  No                 Yes
VGG        VGG-19               4/1/5           Adam (lr=1e-4)  No                 Yes
Inception  Inception v3         4/1/5           Adam (lr=1e-4)  No                 No
Inception  Inception-ResNet v2  4/1/5           Adam (lr=1e-4)  No                 No
MobileNet  –                    4/1/5           Adam (lr=1e-4)  No                 Yes
DenseNet   DenseNet-121         4/1/5           Adam (lr=1e-3)  1e-4               Yes
DenseNet   DenseNet-169         4/1/5           Adam (lr=1e-3)  1e-4               Yes
DenseNet   DenseNet-201         4/1/5           Adam (lr=1e-3)  1e-4               Yes
ResNet     ResNet-50            4/1/5           Adam (lr=1e-3)  1e-4               No
ResNet     ResNet-50 v2         4/1/5           Adam (lr=1e-3)  1e-4               No
ResNet     ResNet-101           4/1/5           Adam (lr=1e-3)  1e-4               No
ResNet     ResNet-101 v2        4/1/5           Adam (lr=1e-3)  1e-4               No
ResNet     ResNet-152           4/1/5           Adam (lr=1e-3)  1e-4               No
ResNet     ResNet-152 v2        4/1/5           Adam (lr=1e-3)  1e-4               No
CNN Parameter Adjustment for Brain Tumor Classification 5

Fig. 1 Enhancing architectures based on ResNet (a), DenseNet (b), and MobileNet (c). Each backbone (without its FC layers) receives resized, augmented images and is followed by a proposed head: for ResNet, Batch Normalization, an FC layer (3 filters), and a Softmax layer; for DenseNet and MobileNet, stacks of FC layers (1024 and 512 filters), ReLU, Batch Normalization, and Dropout (0.5), ending in an FC layer (3 filters) and a Softmax layer that outputs Glioma, Meningioma, or Pituitary

only for these models but also for DenseNet, to accomplish better results. For MobileNet, DenseNet, and ResNet, the learning rate within the Adam optimizer is set slightly higher, at precisely 1e-3, and training is combined with ReduceLROnPlateau with a minimum learning rate of 1e-4. These best-fitting parameters for achieving the best results with the given deep learning algorithms were eventually determined experimentally. Figure 1 illustrates the enhancing architectures of ResNet, DenseNet, and MobileNet, respectively.

3 Results and Discussions

3.1 Database Preparation

The proposed approach is evaluated on a benchmark brain tumor database [6], which consists of 3,064 T1-weighted contrast-enhanced images covering three brain tumor categories (Glioma, Meningioma, and Pituitary). Table 2 summarizes the characteristics of this database, and several example images are illustrated in Fig. 2.

3.2 Results

Table 3 presents the classification performance of the original and the proposed methods, respectively. Comparing the accuracies of the original configurations with those of the proposed methods identifies the most potent algorithms for classifying the three types of brain tumors. Roughly speaking, almost all the CNN algorithms see an increase in testing accuracy when the proposed methods are applied, by at least around 1% and at most approximately 12%. While the proposed VGG-16 model holds the dominant position in accomplished accuracy, the proposed MobileNet architecture shows the smallest growth. For the AlexNet, VGG, Inception, and MobileNet architectures, the proposed accuracy rises slightly, by at least 1%, for AlexNet and VGG, but dramatically for Inception when the parameter values are modified. To be more specific, the figures for the proposed Inception architectures grow remarkably, to exactly 95.48% for Inception v3 and 95.67% for Inception-ResNet v2, as opposed to 91.86% for the original

Table 2 Summary of brain tumor dataset

Brain tumor   No. of images   Image size
Glioma        1,426           512×512
Meningioma    708             512×512
Pituitary     930             512×512

Fig. 2 Several examples from the brain tumor database for the three different classes (a–c)



Table 3 Accuracy from both original and proposed methods


CNNs Original Proposed
AlexNet 97.39 97.47
VGG VGG-16 98.69 98.80
VGG-19 96.13 97.00
Inception Inception v3 91.86 95.48
Inception-ResNet v2 88.63 95.67
MobileNet 91.34 92.76
DenseNet DenseNet-121 92.25 95.97
DenseNet-169 84.28 96.52
DenseNet-201 87.88 96.43
ResNet ResNet-50 94.00 95.88
ResNet-50 v2 94.75 95.65
ResNet-101 88.92 94.86
ResNet-101 v2 94.92 95.39
ResNet-152 91.75 94.04
ResNet-152 v2 92.82 95.73

Inception v3 and 88.63% for the initial Inception-ResNet v2. Concerning MobileNet, the proposed methods, which lead to a change in the architecture, provide a more significant result, at 92.76% in comparison with 91.34% for the original work.
Regarding the DenseNet architecture, DenseNet-169 sees the most considerable increase of all, by almost 12% in proposed accuracy, when the proposed arrangement of Batch Normalization and Dropout is implemented. The figures for the other two DenseNet versions also grow. Furthermore, the accuracy percentages of all ResNet versions trend upward when the proposed arrangement in the FC layers is applied: ResNet-50 accomplishes the highest accuracy, at 95.88%, compared with 95.65% for ResNet-50 v2, 94.86% for ResNet-101, 95.39% for ResNet-101 v2, 94.04% for ResNet-152, and 95.73% for ResNet-152 v2.
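The proposed head arrangement behind these gains (Fig. 1) can be sketched as a plain forward pass. The sketch below follows the general pattern FC → ReLU → Batch Normalization → FC(3) → Softmax in inference mode (where Dropout is the identity and is therefore omitted); the pooled-feature size, weights, and stored batch-norm moments are toy values for illustration, not the trained network.

```python
import math

def dense(x, W, b):
    """Fully connected layer: y_i = sum_j W[i][j]*x[j] + b[i]."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def relu(x):
    return [max(0.0, xi) for xi in x]

def batch_norm(x, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Inference-mode batch normalization using stored moments."""
    return [gamma * (xi - m) / math.sqrt(v + eps) + beta
            for xi, m, v in zip(x, mean, var)]

def softmax(z):
    mz = max(z)
    e = [math.exp(zi - mz) for zi in z]
    return [ei / sum(e) for ei in e]

# Toy pooled features from a frozen backbone (4 values), toy weights.
feat = [0.2, -1.0, 0.5, 3.0]
W1 = [[0.5, 0.0, 0.0, 0.0], [0.0, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
h = batch_norm(relu(dense(feat, W1, [0.0, 0.0, 0.0])),
               mean=[0.0] * 3, var=[1.0] * 3)
# Dropout(0.5) is the identity at inference time, so it is skipped here.
W2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
probs = softmax(dense(h, W2, [0.0, 0.0, 0.0]))  # glioma, meningioma, pituitary
print([round(p, 3) for p in probs])
```

Placing Batch Normalization between the activation and the Dropout stage, as the proposed heads do, is what the paper credits with keeping the two regularizers from conflicting.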
In the comparison with other papers' results presented in Table 4, our proposed methods achieve an accuracy of 98.8% on the same brain tumor database, obtained only by rationally modifying parameter values (learning rate, batch size, epochs, k-fold) and taking advantage of other techniques, namely the ReduceLROnPlateau class, data augmentation, Batch Normalization, and Dropout.
By implementing the proposed adjusted parameters and the suggested architectures for particular convolutional neural networks, the accuracy has been enhanced significantly compared with both the plain algorithms and other papers' results. Nevertheless, the accomplished accuracy does not yet seem to reach its peak; to be more specific, fine-tuning the proposed models could be a more effective way to raise the accuracy further.

Table 4 The comparison with previous works


Reference, Year Accuracy
Kaplan et al. 2019 [15] 91.12
Gumaei et al. 2019 [10] 94.23
Badza et al. 2020 [4] 95.40
Huang et al. 2020 [14] 95.49
Fernand et al. 2020 [11] 95.50
Swati et al. 2019 [20] 96.13
Deepak et al. 2019 [8] 98.00
Rehman et al. 2019 [18] 98.69
Our method 98.80

4 Conclusion

This paper proposed enhancement methods for CNN architectures to improve brain tumor classification performance with a low training database rate as opposed to the testing one. Only by rationally modifying values in the CNN models with respect to the database and the models, arranging the positions of Batch Normalization and Dropout suitably to avoid conflicts, and applying other useful techniques such as the ReduceLROnPlateau class and data augmentation, the final results of all the experimented CNN-based deep learning algorithms accomplish a significantly better performance in comparison with other papers' results.
Through these experiments, the paper shows that Batch Normalization and Dropout still work well with each other if they are arranged wisely. Moreover, the learning rate in the optimizer still plays an immensely vital role in determining the models' eventual accuracy when dealing with different databases and deep learning architectures. Nevertheless, our proposed architectures find it quite tough to achieve flawless accuracy in all conducted algorithms, especially Inception, MobileNet, ResNet, and DenseNet, which remain at around 95%. Future work is to implement the fine-tuning technique in our proposed models and to increase the training database volume so that the models have enough data to digest, which could lead to more outstanding performance.

References

1. Abiwinanda, N., Hanif, M., Hesaputra, S.T., Handayani, A., Mengko, T.R.: Brain tumor clas-
sification using convolutional neural network. In: Lhotska, L., Sukupova, L., Lacković, I.,
Ibbott, G.S. (eds.) World Congress on Medical Physics and Biomedical Engineering 2018, pp.
183–189. Springer Singapore, Singapore (2019)

2. Afshar, P., Plataniotis, K.N., Mohammadi, A.: Capsule networks for brain tumor classification
based on mri images and coarse tumor boundaries. In: ICASSP 2019–2019 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1368–1372 (2019)
3. Amin, J., Sharif, M., Gul, N., Yasmin, M., Shad, S.A.: Brain tumor classification based on
dwt fusion of MRI sequences using convolutional neural network. Pattern Recognit. Lett. 129,
115–122 (2020)
4. Badza, M., Barjaktarovic, M.: Classification of brain tumors from MRI images using a convo-
lutional neural network. Appl. Sci. 10(03), 1999 (2020)
5. Bhanothu, Y., Kamalakannan, A., Rajamanickam, G.: Detection and classification of brain
tumor in mri images using deep convolutional network. In: 2020 6th International Conference
on Advanced Computing and Communication Systems (ICACCS), pp. 248–252 (2020)
6. Cheng, J.: Brain tumor dataset (2017). https://figshare.com/articles/dataset/brain_tumor_
dataset/1512427
7. Cheng, J., Huang, W., Cao, S., Yang, R., Yang, W., Yun, Z., Wang, Z., Feng, Q.: Enhanced
performance of brain tumor classification via tumor region augmentation and partition. PLOS
ONE 10(10), 1–13 (2015)
8. Deepak, S., Ameer, P.: Brain tumor classification using deep CNN features via transfer learning.
Comput. Biol. Med. 111, 103345 (2019)
9. Gong, T., Niu, H.: An implementation of resnet on the classification of RGB-D images. In: Gao,
W., Zhan, J., Fox, G., Lu, X., Stanzione, D. (eds.) Benchmarking, Measuring, and Optimizing,
pp. 149–155. Springer International Publishing, Cham (2020)
10. Gumaei, A., Hassan, M.M., Hassan, M.R., Alelaiwi, A., Fortino, G.: A hybrid feature extraction
method with regularized extreme learning machine for brain tumor classification. IEEE Access
7, 36266–36273 (2019)
11. Guy-Fernand, K.N., Zhao, J., Sabuni, F.M., Wang, J.: Classification of brain tumor leverag-
ing goal-driven visual attention with the support of transfer learning. In: 2020 Information
Communication Technologies Conference (ICTC), pp. 328–332 (2020)
12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceed-
ings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
13. Howard, A., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M.,
Adam, H.: Mobilenets: efficient convolutional neural networks for mobile vision applications
(2017)
14. Huang, Z., Du, X., Chen, L., Li, Y., Liu, M., Chou, Y., Jin, L.: Convolutional neural network
based on complex networks for brain tumor image classification with a modified activation
function. IEEE Access 8, 89281–89290 (2020)
15. Kaplan, K., Kaya, Y., Kuncan, M., Ertunç, H.M.: Brain tumor classification using modified
local binary patterns (LBP) feature extraction methods. Med. Hypotheses 139, 109696 (2020)
16. Kurian, P., Jeyakumar, V.: 3—multimodality medical image retrieval using convolutional neural
network. In: Agarwal, B., Balas, V.E., Jain, L.C., Poonia, R.C., Manisha (eds.) Deep Learning
Techniques for Biomedical and Health Informatics, pp. 53–95. Academic Press (2020)
17. Liu, Q., Xiang, X., Qin, J., Tan, Y., Tan, J., Luo, Y.: Coverless steganography based on image
retrieval of densenet features and dwt sequence mapping. Knowl.-Based Syst. 192, 105375
(2020)
18. Rehman, A., Naz, S., Razzak, M., Akram, F., Imran, M.: A deep learning-based framework for
automatic brain tumors classification using transfer learning. Circuits, Syst., Signal Process.
39 (2019)
19. Seetha, J., Raja, S.S.: Brain tumor classification using convolutional neural networks. Biomed.
Pharmacol. J. 11(3), 1457–1461 (2018)
20. Swati, Z.N.K., Zhao, Q., Kabir, M., Ali, F., Ali, Z., Ahmed, S., Lu, J.: Content-based brain
tumor retrieval for MR images using transfer learning. IEEE Access 7, 17809–17822 (2019)

21. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V.,
Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition (CVPR) (2015)
22. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception archi-
tecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition (CVPR) (2016)
Advance Fuzzy Radial Basis Function
Neural Network

Balaji S. Shetty, Manisha S. Mahindrakar, and U. V. Kulkarni

Abstract Data mining discovers meaningful knowledge attributes from a provided dataset and further transforms them into meaningful information. Data mining became more popular with the combined use of statistics and machine learning, in various scenarios such as improved decision making, revenue and operations improvement, cost reduction, anomaly detection, and many more. It is found that the use of inappropriate data mining pattern classification and clustering algorithms may lead to imprecise decisions. Many clustering algorithms perform pattern recognition but provide poor classification accuracy. In order to resolve this problem, the advance fuzzy radial basis function neural network (AFRBFNN) learning algorithm for pattern classification is proposed. The primary objective of this approach is to reduce the misclassification rate. The proposed novel method is generic and can be applied to any kind of labeled dataset for classification. The AFRBFNN algorithm is based on the combination of the radial basis function (RBF), the fuzzy neural network (FNN), and fuzzy clustering (FC). The approach is evaluated using six benchmark datasets and the results are compared with well-known classifiers. The obtained results show that it is an efficient and accurate method of pattern classification that improves classification accuracy by approximately 5–10%.

Keywords Radial basis function neural network · Fuzzy membership function ·


Fuzzy clustering · Fuzzy set hypersphere

1 Introduction

Data mining and knowledge discovery in databases (KDD) involve classification and clustering approaches. The different approaches of artificial intelligence are shown in Fig. 1. Categorizing the dataset is one of the most important steps in KDD.
Data patterns grouped together on the basis of common characteristics are known as
clusters [1]. Classification methods are used for labeled data and clustering methods are used for unlabeled data.

B. S. Shetty (B) · M. S. Mahindrakar · U. V. Kulkarni
SGGS Institute of Engineering and Technology, Vishnupuri, Nanded, Maharashtra, India
e-mail: bsshetty@sggs.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 11
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_2

Fig. 1 Artificial intelligence hierarchy

A lot of research has been carried out in recent
years on the use of RBFNNs in designing clustering algorithms. Many clustering techniques have been developed for pattern analysis, grouping, decision making, document retrieval, image segmentation, and data mining; yet many significant challenges remain in forming the clusters correctly. Density-based, partitional, and hierarchical methods are the three broad categories of clustering approaches [2]. The artificial neural network (ANN) is widely used in machine learning (ML) for solving classification and clustering problems. An ANN is a three-layer architecture (input, hidden, and output layers); feedforward and backpropagation are its two broad categories. Many researchers have performed clustering and classification using hyperspheres [3–8]. Determining the number of hidden layers is among the most researched areas in ANNs. Various researchers have proposed many clustering algorithms, including K-means [9], enhanced K-means [10], subtractive [11], ART [12], fuzzy [13], scatter [14], output-constricted [15], ant colony [16], artificial fish swarm [17], particle swarm optimization [18], genetic algorithm [19], etc. Recently, Kulkarni et al. [20] proposed a novel approach in which fuzzy clustering is used with an RBFNN.
The major issue in any machine learning algorithm is the creation of nonlinear boundaries for performing pattern classification and recognition. Generally, the classifiers group the patterns of the same class into the respective class cluster and learn from them. Pattern classification accuracy increases with the proper construction of nonlinear boundaries. The proposed approach is an extension of the algorithm described in [20] with improved classification accuracy. The creation of optimum
linear and nonlinear boundaries is a major objective of the proposed approach. Additionally, it removes overlap between different classes. The designed classifier is the combination of fuzzy clustering and an RBFNN, with the following advantages over earlier RBFNN and fuzzy neural network classifiers.
1. The proposed approach does not make use of any tuning parameter.
2. The Gaussian neurons in the hidden layer of the RBFNN are replaced by fuzzy neurons. The fuzzy neurons are characterized by the fuzzy membership function, due to which the clustered patterns give 100% training accuracy for any dataset.
3. The learning between the hidden layer and the output layer uses the optimum spread fuzzy clustering algorithm instead of the traditional least mean squares algorithm.
Thus, the proposed AFRBFNN classifier overcomes most of the drawbacks of earlier RBFNNs and results in a precise classifier for pattern recognition.
The paper is organized as follows. Section 2 describes the basics of the RBFNN in brief. Section 3 elaborates the proposed AFRBFNN classifier architecture. Section 4 describes the AFRBFNN learning and recall algorithm. In Sect. 5, the AFRBFNN is evaluated using various classifiers and datasets. Conclusions and future work are stated in Sect. 6. The notation used in this paper is kept consistent with FRBFNN [20], as far as possible, for reference and comparison purposes.

2 Radial Basis Function Neural Network

The radial basis function neural network (RBFNN) is a special type of feedforward network that uses exactly one hidden layer. RBFNNs are widely used in classification and regression problems. The role of the RBF is to transform the data from a nonlinear to a linear format before classification is performed: the RBF increases the dimension of the feature vector, transforming d-dimensional feature vectors into f-dimensional feature vectors where f > d. The RBFNN is a modified ANN that uses radial basis functions in the hidden layer [21]. Neurons, that is, clusters, in the hidden layer are formed during the learning phase and are characterized by the RBF as their activation function. Quadratic, inverse, and Gaussian are common types of RBF. The core concern in the use of an RBFNN is the determination of the centroids and the widths of the clusters in the hidden layer. The architecture of the RBF is composed of input, hidden, and output layers. The RBFNN has only one hidden layer, which is also referred to as the feature vector. For hidden layer formation, the radial basis function is used. The main property of a radial function is that the membership value of a pattern decreases as its distance from the centroid increases. One feature of the RBF is that it gives high accuracy with quick convergence for dense data, and radial basis functions can be used in linear and nonlinear models. The three-layer architecture of the RBF is shown in Fig. 2.
Fig. 2 RBF architecture

Clusters are formed in the hidden layer; these are sometimes referred to as nodes or neurons. Cluster formation, as demonstrated in Fig. 10, follows Kulkarni et al. [20]. The clusters formed are represented by H1, H2, …, Hj. Every cluster represents a subset of the respective class data. The kth cluster is represented as Hk = [ck1, ck2, …, ckn].
The following steps show cluster formation in RBFNN:
• Step 1: The input layer receives the n-dimensional input X = [x1, x2, …, xn] and forwards it to the middle (hidden) layer.
• Step 2: The output of each hidden neuron uses the Gaussian function stated in Eq. 1, where σ is the width of the cluster:

ψj = exp( −Σi=1..n (xi − cij)² / (2σ²) )   (1)

• Step 3: The gradient descent method is used to determine the weights between the hidden and output layers. Wij represents the weight between the jth hidden node and the ith output class node.
• Step 4: The output layer assigns the input to a particular cluster. In the output layer of the RBFNN, the output of the ith node for m classes is decided by Eq. 2:

yi = f( Σj=0..J Wij · ψj )   (2)

where i = 1, 2, …, m.
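Equations 1 and 2 together give the forward pass of the RBFNN: a Gaussian activation per hidden cluster, then a weighted sum per class node. Below is a minimal pure-Python sketch under assumed toy centroids, width, and weights; since the text leaves the output function f unspecified, it is taken as the identity here.

```python
import math

def gaussian_activation(x, center, sigma):
    """Eq. 1: psi = exp(-sum_i (x_i - c_i)^2 / (2*sigma^2))."""
    d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def rbfnn_forward(x, centers, sigmas, W):
    """Eq. 2: y_i = sum_j W[i][j] * psi_j (taking f as the identity)."""
    psi = [gaussian_activation(x, c, s) for c, s in zip(centers, sigmas)]
    return [sum(wij * pj for wij, pj in zip(row, psi)) for row in W]

centers = [(12.0, 12.0), (17.0, 17.0)]   # toy hidden clusters
sigmas = [2.0, 2.0]                      # toy cluster widths
W = [[1.0, 0.0],                         # class 1 reads cluster 1
     [0.0, 1.0]]                         # class 2 reads cluster 2
# A pattern at cluster 1's centroid activates class 1 strongly.
print(rbfnn_forward((12.0, 12.0), centers, sigmas, W))
```

At the centroid the Gaussian activation is exactly 1, and it decays smoothly with squared distance, which is the radial property the text describes.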

3 Advance Fuzzy Radial Basis Function Neural Network Architecture

The AFRBFNN consists of three layers, namely the input, hidden, and output layers, depicted as FI, FH, and FC, respectively, in Fig. 3.

FI = (X1, X2, X3, …, Xn). Input Layer
FH = (H11, H12, …, H3z). Hidden Layer
FC = (C1, C2, C3, …, Cn). Class Layer

The role of the input layer FI is to accept the n-dimensional input as a feature vector and forward it to the middle layer FH; it does not perform any operation. The proposed algorithm is applied to the input data, and the nonlinear data is transformed into a linear format in the hidden layer: the learning algorithm given in Sect. 4 maps the input data into unbounded hyperboxes in the hidden layer.

Fig. 3 Advance fuzzy radial basis function neural network architecture

The output of the hidden layer FH is fuzzy hyperboxes (FHBs). If a pattern lies within the hyperbox region, its membership value is 1; as the input moves away from the hyperbox region, the membership value gradually decreases. The membership function is determined as shown in Eq. 3:

mj(Xh, Cpj, rj) = 1 − f(l, rj)   (3)

where Xh = (xh1, xh2, …, xhn) is the hth input to be trained, and Cpj and rj are the center point and temporary radius of the FHS. The function f is defined as

f(l, rj) = 0 if l ≤ rj, and f(l, rj) = (l − rj)/l otherwise,

where l is the Euclidean distance between Xh and Cpj, so that the membership is 1 inside the hypersphere and gradually decreases outside it. The weights between FI and FH are stored in the matrix C, which gives the center points of the FHSs. FC represents the class layer, and the k nodes in this layer each represent one of the classes.
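The hypersphere membership of Eq. 3 can be sketched directly in Python. The decay rule used outside the hypersphere, f = (l − rj)/l so that the membership falls off as rj/l, is one plausible reading of the piecewise definition (the extracted original is garbled at that point) and is labeled as an assumption; the centroid and radius below are toy values taken from the 2-D case study later in the paper.

```python
import math

def membership(x, center, radius):
    """Eq. 3 sketch: m = 1 - f(l, r). Assumes f = 0 inside the hypersphere
    and f = (l - r)/l outside, so m = 1 inside and m = r/l beyond it."""
    l = math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, center)))
    if l <= radius:
        return 1.0
    return 1.0 - (l - radius) / l   # = radius / l, decays toward 0

center = (13.5, 13.5)   # toy FHS centroid
radius = 2.13           # toy FHS radius
print(membership((13.0, 13.0), center, radius))  # inside the hypersphere
print(membership((17.0, 17.0), center, radius))  # outside, membership < 1
```

Whatever the exact decay rule, the key property used by the learning algorithm is the same: full membership inside the hypersphere and a monotonically decreasing value outside it.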

4 Proposed AFRBFNN Learning Algorithm

To design the AFRBFNN classifier, the optimum spread fuzzy clustering (OSFC) algorithm is given below.

Let D be the input training set with P patterns for training; the hth input is denoted by {Xh, dh}, where Xh = (xh1, xh2, …, xhn) and dh represents one of the K classes.
For every class index k, where k = 1, 2, …, K, follow Steps 1 to 9 below.

Step 1: Let Sk and Dk hold the same-class and different-class input pattern distances, respectively, and let βk be the number of input patterns remaining in class Ck (initially αk, the class size).

Sk = ||Xi − Xj||, an αk × αk matrix, i, j = 1, 2, …, αk, where Xi, Xj ∈ Ck
Dk = ||Xi − Xj||, an αk × (p − αk) matrix, i = 1, 2, …, αk and j = 1, 2, …, p − αk, where Xi ∈ Ck, Xj ∉ Ck

Step 2: Calculate the smallest distance from every pattern Xi ∈ Ck to the patterns of the other classes Xj ∉ Ck using the data in the Dk matrix:

Bk = min(Dk)

Step 3: (Temporary Cluster Creation) Choose the pattern xjk that covers the maximum number of patterns of the same class without overlapping with the other pattern classes. Its radius is calculated by the following formula, and the coordinates of the pattern xjk are taken as the centroid of the cluster:

radiusk = max(Bk)

Step 4: (Temporary SET Creation) Collect all the patterns falling under the cluster created in Step 3. Using the pattern xjk, the membership function stated in Eq. 3, and the initial radius radiusk, the input patterns of class k are collected and stored in the set SET for cluster formation.
Step 5: (Unbounded Hyperbox Node Creation in the FH Layer) Call Algorithm 1, OSFC_CR(SET, n), passing the SET derived in the previous step and n, the number of features of the input pattern. Here, new nodes are created in the FH layer and the unbounded hyperbox is created.
Step 6: (Update the βk Count) As βk holds the number of remaining patterns of class Ck, update it:
nj = COUNT(SET), where the COUNT function calculates the number of patterns in SET
βk = βk − nj
where nj is the number of patterns included in the cluster.
Step 7: (Check Whether Patterns Remain in Class k) If βk > 0, go to Step 1.
Step 8: (Class Node Creation in the FC Layer) Create a class node in the output layer with label k.
Step 9: (FH-to-FC Link Creation) Connect the output-layer node for class k with its respective FHBs.

Algorithm 1 — Calculation of centroid and radius

1: Input: the set SET and the number of features n
2: Output: centroid and radius
3: procedure OSFC_CR(SET, n)
4: for each feature f in n do
5: Tmin ← +∞
6: Tmax ← −∞
7: for each pattern p in SET do
8: if SET[p][f] > Tmax then
9: Tmax ← SET[p][f]
10: end if
11: if SET[p][f] < Tmin then
12: Tmin ← SET[p][f]
13: end if
14: end for
15: centroid[f] ← (Tmax − Tmin)/2
16: smax[f] ← Tmax
17: smin[f] ← Tmin
18: end for
19: for each feature k in n do
20: Centroid_final[k] ← smin[k] + centroid[k]
21: end for
22: radius ← max(centroid)
23: return Centroid_final and radius
24: end procedure
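Algorithm 1 translates directly into Python. The sketch below follows the pseudocode as written: per-feature minimum and maximum, centroid at the mid-range of each feature, and radius taken as the largest per-feature half-range. The four class 1 patterns from the later 2-D case study are reused as a demo input; note that this max-half-range reading yields a radius of 1.5 for them, whereas the case study reports a Euclidean radius of 2.13, so the radius convention is an assumption.

```python
def osfc_cr(patterns):
    """Centroid and radius per Algorithm 1: the midpoint of each feature's
    range, with radius = the largest per-feature half-range."""
    n = len(patterns[0])
    centroid, half_ranges = [], []
    for f in range(n):
        values = [p[f] for p in patterns]
        t_min, t_max = min(values), max(values)
        half = (t_max - t_min) / 2.0
        half_ranges.append(half)
        centroid.append(t_min + half)   # smin[f] + (Tmax - Tmin)/2
    return centroid, max(half_ranges)

# Class 1 patterns from case study 1 of this paper.
class1 = [(12, 12), (13, 13), (14, 14), (15, 15)]
print(osfc_cr(class1))  # centroid (13.5, 13.5)
```

Initializing the running minimum to +infinity and the maximum to -infinity (as `min`/`max` do internally) is the fix over the pseudocode's original 0/100 initialization, which would never update for data inside that range.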

Algorithm 2 — Calculation of the number of patterns clustered by the HS

1: Input: a pairwise-distance matrix S and a radius
2: Output: the counts nj
3: procedure NoP(S, radius)
4: for each pattern j in S do
5: n ← 0
6: for each pattern h in S do
7: if S[j][h] ≤ radius then
8: n ← n + 1
9: end if
10: end for
11: nj[j] ← n
12: end for
13: return nj
14: end procedure
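Algorithm 2 counts, for each pattern, how many patterns fall within the given radius. Reading it as operating on a precomputed pairwise Euclidean distance matrix S (which is what the indexing S[j][h] suggests) and returning one count per pattern, it can be sketched as:

```python
import math

def pairwise_distances(patterns):
    """Euclidean distance matrix S with S[j][h] = d(patterns[j], patterns[h])."""
    return [[math.dist(a, b) for b in patterns] for a in patterns]

def nop(S, radius):
    """Algorithm 2 sketch: the number of patterns within `radius` of each
    pattern (each pattern counts itself, since S[j][j] = 0 <= radius)."""
    return [sum(1 for d in row if d <= radius) for row in S]

pts = [(12, 12), (13, 13), (14, 14), (15, 15)]
S = pairwise_distances(pts)
print(nop(S, radius=1.5))  # -> [2, 3, 3, 2]
```

Consecutive patterns here are sqrt(2) apart, so a radius of 1.5 captures each pattern and its immediate neighbours; the end patterns have one neighbour and the middle patterns have two.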

5 Performance Evaluation and Analysis of the AFRBFNN

To evaluate the performance of the AFRBFNN classifier, two case studies and their results are discussed in the following subsections. The learning algorithm is implemented in MATLAB 2019a.

5.1 Case Study with 2-D Examples

Example 1: In this experiment, for a better understanding of the OSFC algorithm and to build the AFRBFNN classifier, a two-dimensional example with 2 classes and 6 patterns is used. While constructing the network with the OSFC algorithm, its performance is also compared with the maximum spread fuzzy clustering (MSFC) algorithm given in [20]. The patterns, their features, and class labels are given in Table 1. After successful execution, the OSFC algorithm constructs two clusters for the given classes. The centroid of the class 1 cluster is (13.5, 13.5) with radius 2.13, as shown in Fig. 4b. The class 2 centroid is (16.5, 16.5) with radius 0.5, as shown in Fig. 5b. Figure 6b shows the clusters formed after training all patterns with the OSFC algorithm. The clusters formed for class 1 and class 2 on the same dataset using the MSFC algorithm are shown in Figs. 4a and 5a, respectively; a detailed description of these cluster formations using the MSFC algorithm is given in [20].

Table 1 Case study 1—Example dataset


Sr. no. Pattern class Total count Feature vectors
1 1 4 (12, 12), (13, 13), (14,14), (15,15)
2 2 2 (16, 16), (17, 17)

Fig. 4 Case study 1—Example 1—class 1 cluster formation

Fig. 5 Case study 1—Example 1—class 2 cluster formation

Fig. 6 Case study 1—Example 1—class 1 and class 2 cluster formation

Figure 6a, b shows the final clusters formed by the MSFC and OSFC algorithms, respectively. When this experiment is compared with [20], it is evident that the radii are reduced by 3.03 and 2.33 for class 1 and class 2, respectively. The details for class 1 are shown in Fig. 7a and for class 2 in Fig. 7b. This figure shows the redundant area occupied by the MSFC method and compares the MSFC method with the OSFC method class-wise. When the proposed algorithm is applied to a larger dataset, it will significantly reduce the misclassification rate, increase the classification accuracy, and improve decision making.

Fig. 7 Case study 1—Example 1—comparison for class 1 and class 2

Table 2 Case study 2—Example 2—dataset


Sr. no. Pattern class Total count Feature vectors
1 1 8 (1, 9), (1, 5), (1.5, 8.5), (2, 4.5), (1.5, 7),
(0.75, 6), (0.75, 8), (1, 7.5)
2 2 8 (4, 1), (2, 1.5), (2, 3.5), (1, 1), (1.5, 2.5),
(3, 1.5), (0.75, 3), (3.25, 4)
3 3 8 (3, 9), (4, 5), (3.75, 9), (4, 7), (3.75, 8),
(3.7, 6), (3.5, 8.5), (3, 8)

Initially, as per the steps of the algorithm, cluster formation can be initiated for any one of the k classes. In this example, let class 1 start first: the intra-class Sk and inter-class Dk distances for all αk patterns of class 1 are calculated using Step 1. In Steps 2 and 3, the pattern (12, 12) of class 1, as centroid with initial radius 2.13, forms the cluster of class 1 patterns using the membership function stated in Eq. 3. In the same step, the set S is created with all class 1 patterns included in the cluster. The cluster with optimum centroid and radius over the patterns in set S is then calculated in Step 5 using Algorithm 1. The cluster formed before the optimum centroid and radius calculation is like the cluster formed by the MSFC algorithm shown in Fig. 4a; the cluster formed for class 1 is shown in Fig. 4b. Once the possible clusters for one class, along with their connections with the class node, are done, the same procedure is repeated for class 2 as per Step 8. As per the MSFC method in [20], the class 2 pattern (17, 17) is selected as centroid with an initial radius of 2.83, as shown in Fig. 5a. As per the proposed algorithm, the optimum radius 0.5 and centroid (16.5, 16.5) are calculated using Algorithm 1, and the respective cluster is shown in Fig. 5b. Here the architecture consists of 2 input nodes for the 2 features, 2 hidden-layer nodes for the 2 clusters, and 2 output nodes for the 2 classes.
Example 2: In this example, 3 classes and 24 patterns are considered. To compare the proposed OSFC algorithm with the MSFC algorithm, the input is taken from the latter [20]. The patterns with their features and class labels are given in Table 2. Class 1, 2, and 3 patterns are shown in green, red, and blue, respectively, in Fig. 8.
Fig. 8 Scatter plot of Table 2: case study 2

After applying Algorithm 1, four clusters are created with optimum radii for
all classes as shown in Fig. 9b. Comparison of MSFC with the proposed OSFC
algorithm is as shown in Fig. 9a, b. Centroid and radius values of all the classes
are given in Table 3. The AFRBFNN architecture is shown in Fig. 10 and it consists
of an input layer with 2 nodes representing 2 features. Four nodes in the hidden
layer represent 4 clusters of three classes. Three nodes in the output layer represent
3 classes, respectively. From both the above examples, we can conclude that the
overlapping region between inter-class clusters is decreased in OSFC algorithm due
to optimum centroid and radius calculation. Reduced overlapping region improves
prediction accuracy and decreases misclassification of patterns. To prove this, the
performance of the same algorithm is evaluated with other classifiers in the next
section.

5.2 Case Study 2

In case study 2, the proposed OSFC algorithm's performance in AFRBFNN is verified. AFRBFNN is evaluated by using 7 datasets from the UCI repository and compared with well-known pattern classification methods. For a fair comparison, the experimental setup is kept as per [20]. k-fold cross-validation with k = 5 is used for evaluation. The results are shown in Table 4 and compared with the results given
Fig. 9 Case study 2—Example 2—cluster formation by MSFC and OSFC approach

Table 3 Centroid and radii comparison for case study 2

       Centroid                  Radius    Class
MSFC   (0.75, 6.0), (1.0, 9.0)   2.61, 1   1
       (1, 9)                    0         1
       (4.0, 1.0)                3.8161    2
       (4.0, 7.0)                2.2361    3
OSFC   (1.37, 6.5)               2         1
       (2.37, 1.75)              1.6       2
       (6, 7)                    3         3

in [20]. The analysis of the results shows that the classification accuracy of AFRBFNN is higher than that of all the other classifiers listed. The OSFC algorithm improves classification accuracy more effectively than all the classifiers shown in the comparison.

6 Conclusions and Future Work

We have proposed a novel precise clustering algorithm, AFRBFNN, for the classification of patterns. The proposed algorithm improves the training accuracy by approximately 5–10% over other RBF-based classifiers. Because of the optimum cluster centroid calculation and the respective radii values, the number of clusters created is the same as that of the FRBFNN algorithm. This method assures linear separability of nonlinear input in a more efficient way, leading to better classification accuracy. Unnecessary overlap between inter-class clusters is removed because of the optimum cluster radii values. Removal of overlap leads to an increase in pattern recognition
Fig. 10 AFRBFNN architecture for case study 2

Table 4 Classification accuracy by five-fold cross-validation on standard dataset


Dataset OSFC MSFC RBF RBF-R RBFN RBF-WTA
Hepatitis 93 88.2 65.0 81.9 81.1 82.1
Heart 90 77.0 73.5 81.9 80.5 80.6
Liver 80 69.3 53.8 62.2 62.8 61.0
Ionosphere 98 90.0 81.5 95.5 95.2 94.3
Monks-3 92 84.8 97.5 99.0 95.8 68.6
Breast 99 96.3 94.1 96.3 96.4 97.0
Pima 86 76.9 71.0 75.3 72.1 73.8

rate and a decrease in misclassification rate. The improved algorithm gives optimized classification with better fidelity and accuracy. The proposed algorithm is not sensitive to any tuning parameter. The number of clusters formed in the hidden layer is independent of the input order of the training data.
The proposed algorithm can be further enhanced by using the feature selection and
dimension reduction method. Also, pattern classification accuracy can be increased
by the creation of k more clusters using the k-means algorithm.
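The k-means refinement suggested above could look like the following generic sketch. This is a plain k-means on the patterns of one class, not the authors' implementation; the data used for the demonstration are the class 1 patterns of Table 1.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: split one class's patterns into k sub-clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # assign each point to its nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[j].append(p)
        # recompute centers as the mean of each group (keep old center if empty)
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

# class 1 patterns of Table 1 split into 2 sub-clusters
centers = kmeans([(12, 12), (13, 13), (14, 14), (15, 15)], k=2)
print(centers)
```

Each resulting sub-cluster center would then seed one additional hidden-layer node.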
References

1. Bindra, K., Mishra, A.: A detailed study of clustering algorithms. In: International Conference on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions), Sep. 20–22, AIIT, Amity University Uttar Pradesh, Noida, India (2017)
2. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31(3),
264–323 (1999)
3. Kulkarni, U.V., Sontakke, T.R.: Fuzzy hypersphere neural network classifier. In: 10th Interna-
tional Conference on Fuzzy Systems, Melbourne, Victoria, Australia, pp. 1559–1562 (2001)
4. Kulkarni, U.V., Doye, D.D., Sontakke, T.R.: General fuzzy hypersphere neural network. In:
Proceedings of the 2002 International Joint Conference on Neural Network, Honolulu, HI,
USA, USA, vol. 3, pp. 2369–2374 (2002)
5. Doye, D.D., Kulkarni, U.V., Sontakke, T.R.: Speech recognition using modified fuzzy hyper-
sphere neural network. In: Proceedings of the International Joint Conference on Neural Net-
works (IJCNN02), Honolulu, Hawaii, vol. 1, pp. 65–68 (2002)
6. Patil, P.M., Kulkarni, U.V., Sontakke T.R.: Modular fuzzy hypersphere neural network. In:
The 12th IEEE International Conference on Fuzzy Systems, St Louis, MO, USA, vol. 1, pp.
232–236 (2003)
7. Patil, P.M., Kulkarni, S.N., Patil, A.J., Doye, D.D., Kulkarni, U.V.: Modular general fuzzy
hypersphere neural network. In: 17th IEEE International Conference on Tools with Artificial
Intelligence, Hong Kong, China, vol. 4, pp. 211–216 (2005)
8. Sonar, D.N., Kulkarni, U.V.: Pruned fuzzy hypersphere neural network (PFHSNN) for lung
cancer classification. Int. J. Comput. Appl. 157, 36–39 (2017)
9. Moody, J., Darken, C.J.: Fast learning in networks of locally-tuned processing units. Neural
Comput. 1, 281–294 (1989)
10. Chen, S.: Nonlinear time series modeling and prediction using Gaussian RBF networks with
enhanced clustering and RLS learning. Electr. Lett. 3, 117–118 (1995)
11. Sarimveis, H., Alexandridis, A., Bafas, G.: A fast training algorithm for RBF networks based
on subtractive clustering. Neurocomputing 501–505 (2003)
12. Shie-Jue, L., Chun-Liang, H.: An ART-based construction of RBF networks. IEEE Trans.
Neural Netw. 13, 1308–1321 (2002)
13. Tsekouras, G.E., Tsimikas, J.: On training RBF neural networks using input-output fuzzy
clustering and particle swarm optimization. Fuzzy Sets Syst. 221, 65–89 (2013)
14. Sohn, I., Ansari, N.: Configuring RBF neural networks. Electronic Lett. 34, 684–685 (1998)
15. Wang, D., Zeng, X.J., Keane, J.A.: A clustering algorithm for radial basis function neural
network initialization. Neurocomputing 77, 144–155 (2012)
16. Li, J., Liu, X.: Melt index prediction by RBF neural network optimized with an adaptive new
ant colony optimization algorithm. J. Appl. Polym. Sci. 119, 3093–3100 (2011)
17. Shen, W., Guo, X., Wu, C., Wu, D.: Forecasting stock indices using radial basis function neural
networks optimized by artificial fish swarm algorithm. Knowl. Based Syst. 24, 378–385 (2011)
18. Feng, H.M.: Self-generation RBFNs using evolutional PSO learning. Neurocomputing 70,
241–251 (2006)
19. Billings, S.A., Zheng, G.L.: Radial basis function network configuration using genetic algorithms. Neural Netw. 8, 877–890 (1995)
20. Kulkarni, A., Bonde, S., Kulkarni, U.: A Novel fuzzy clustering algorithm for radial basis
function neural network. Int. J. Fut. Revolut. Comput. Sci. Commun. Eng. 4(4), 751–756
(2018). ISSN: 2454-4248
21. Raitoharju, J., Kiranyaz, S., Gabbouj, M.: Training radial basis function neural networks for classification via class specific clustering. IEEE Trans. Neural Netw. Learn. Syst. 27, 2458–2471 (2016)
Unbounded Fuzzy Radial Basis Function Neural Network Classifier

Balaji S. Shetty, Manisha S. Mahindrakar, and U. V. Kulkarni

Abstract The area of pattern recognition deals with the recognition of patterns by using different machine learning algorithms without human intervention. Many different data mining algorithms are used in pattern recognition. The selection of an appropriate and precise algorithm is very crucial; an imprecise algorithm may lead to a wrong decision. Recognition can be supervised or unsupervised. This paper presents a novel unbounded fuzzy radial basis function neural network (UFRBFNN) classifier model to perform supervised classification. This classifier is constructed using fuzzy clustering, and the clusters are further converted into fuzzy hyperboxes. Fuzzy set hyperboxes (FHBs) represent the neurons in the hidden layer. The creation of these FHBs is based on the unbounded spread derived from inter-class information and an intra-class fuzzy membership function. The proposed approach is faster and independent of tuning parameters. The output is determined by the union operation of the FHB outputs, which are connected to the class nodes in the output layer. Using k-fold cross-validation, the UFRBFNN model is verified by applying 7 different standard datasets from the UCI machine learning repository and by comparing the results with well-known radial basis function neural network (RBFNN) variants. The analysis of the results shows that the proposed model provides 5–10% improved training accuracy over previous radial basis function classifiers.

Keywords Fuzzy set · Fuzzy neuron · Fuzzy clustering · Fuzzy hyperbox · Radial
basis function neural network

1 Introduction

Pattern classification is the area devoted to the study of techniques designed to categorize data into distinct classes. It has been under extensive study for a long time, as it is a classification problem that the human brain can solve incredibly well but that is difficult for computers to perform.

B. S. Shetty (B) · M. S. Mahindrakar · U. V. Kulkarni


SGGS Institute of Engineering and Technology, Vishnupuri, Nanded, Maharashtra, India
e-mail: bsshetty@sggs.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 25
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_3
However, the fuzzy min–max neural network (FMMN) is one of the efficient and powerful models [1] for pattern classification. The FHSNN [2] classifier is one of its variants, proposed by U. V. Kulkarni in 2001. A weighted FMN was proposed by Kim and Yang [3]. In this model, a hyperbox can be expanded without considering the hyperbox contraction process or the overlapping test. During the training of patterns, the feature distribution information is utilized to avoid hyperbox distortion, which may be caused by eliminating the overlapping area of hyperboxes in the contraction process. Ma, Liu, and Wang proposed a novel FMN-based algorithm for pattern classification [4]. In this model, a new hyperbox membership function is defined in which the data characteristics are considered. Additionally, it does not use a contraction process but needs only expansion, and no additional neurons are used for the overlapped area. An enhanced FMN (EFMM) was proposed by Falah Mohammed and Lim [5]. In EFMM, three heuristic rules are proposed to eliminate the overlapping problem and to discover and resolve possible overlapping cases.
In recent years, RBFNNs [6] have become popular pattern classifiers and have been applied in several engineering applications. The key issue with the RBFNN is the determination of the centroids and radii of the radial basis functions, along with the number of hidden nodes in the hidden layer [7]. A lot of research has been carried out in recent years on the use of RBFNN in designing clustering algorithms. Many clustering techniques have been developed for pattern analysis, grouping, decision making, document retrieval, image segmentation, and data mining, and yet many significant challenges remain in forming the clusters correctly. Density-based, partitional, and hierarchical are three broad categories of clustering approaches [8]. Popular approaches proposed by researchers to create hidden layer nodes are provided in various research papers [6, 9–21]. Similarly, the FMMN model has several limitations, and over the years many variants [3, 4, 22] have been proposed to overcome them. In this paper, the Unbounded Fuzzy Radial Basis Function Neural Network algorithm is proposed. The designed classifier is a combination of RBF-based and fuzzy neural network classifiers.
The proposed approach has the following features:
1. The proposed model is independent of the tuning parameter.
2. The hidden layer Gaussian neurons of the RBFNN are replaced by fuzzy neurons.
These neurons are characterized by the fuzzy membership function due to which
the trained network gives 99% training accuracy for any standard dataset.
3. The learning between the hidden layer and output layer uses the unbounded spread fuzzy hyperbox (USFH) algorithm instead of the traditional least mean square algorithm.
4. The connection between the hidden layer and output layer, i.e., the creation of class nodes, is done concurrently with the creation of fuzzy set hyperboxes (FHBs) in the hidden layer. The output of the class nodes is determined by the union operation of the respective FHB outputs.
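As a concrete illustration of feature 4, the class-node output can be sketched as a fuzzy union (max) over the memberships of the hyperboxes linked to that class. The box bounds and the simple decaying membership used here are our own illustrative stand-ins, not the paper's exact functions:

```python
# Illustrative sketch (not the authors' code): a class node outputs the fuzzy
# union (max) of the membership values of the FHBs connected to it.

def hyperbox_membership(x, lower, upper):
    """1.0 inside the box; decays with distance outside (illustrative form)."""
    # per-feature distance of x to the interval [lower, upper], then Euclidean
    d = sum(max(lo - xi, 0.0, xi - hi) ** 2
            for xi, lo, hi in zip(x, lower, upper)) ** 0.5
    return 1.0 if d == 0.0 else 1.0 / (1.0 + d)

def class_output(x, class_hyperboxes):
    """Union (max) over all FHBs connected to the class node."""
    return max(hyperbox_membership(x, lo, hi) for lo, hi in class_hyperboxes)

# two hypothetical FHBs for one class
boxes = [((0.75, 4.5), (2.0, 9.0)), ((3.0, 4.5), (4.0, 9.0))]
print(class_output((1.0, 7.0), boxes))   # pattern inside the first box -> 1.0
```

A pattern outside every box of the class receives a membership strictly between 0 and 1, so the winning class can still be decided by comparing class-node outputs.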
Hence, the proposed UFRBFNN classifier overcomes most of the drawbacks of ear-
lier RBFNNs and Fuzzy neural network classifiers. The rest of the paper is organized
as follows. Section 2 describes fundamentals of RBFNN in brief. Section 3 describes


the architecture of the UFRBFNN classifier. In Sect. 4, the learning rule, i.e., the unbounded spread fuzzy hyperbox algorithm used to construct the UFRBFNN classifier, is discussed.
Simulation results with case studies are explained in Sect. 5. Finally, Sect. 6 con-
cludes the paper with future work.

2 Radial Basis Function Neural Network

The simplest form of the radial basis function network is a three-layer feed-forward neural network. Training input is provided in the first layer of the network. The second layer is a hidden layer with multiple RBF nonlinear activation functions. The last layer corresponds to the class layer and gives the final output of the network. The radial basis function applied in the hidden layer maps the m-dimensional n-patterns of the training dataset [m ∗ n] to m1-dimensional n-patterns [m1 ∗ n], where m1 > m, by adding more dimensions to the input. The hidden layer thus transforms and maps nonlinear m-dimensional patterns to linear m1-dimensional patterns. The number of hidden layer nodes is always less than the number of input patterns. Each hidden node corresponds to a hyperplane, which may be represented by shapes such as a circle, box, cluster, hypersphere, hyperbox, or hyperline. RBFNN classifiers are useful for solving regression, prediction, and classification problems. RBF is a special type of multi-layer perceptron (MLP) with a single middle layer. The main property of a radial function is that the membership value of a pattern decreases as its distance from the centroid increases. The three-layer architecture of RBF is as shown in Fig. 1. The clusters are formed in the hidden layer of RBFNN. The clusters formed are represented by [H1, H2, …, Hj]. Every cluster represents a subset of the respective class data. The kth class cluster Hk is represented as Hk = [ck1, ck2, …, ckn].
The following steps demonstrate cluster formation in RBFNN:

– Step 1: First layer receives n-dimensional input X = [x1 , x2 , ..., xn ] and forwards
it to middle layer (hidden). Number of nodes in input layer is equal to number of
features in dataset. Creation of input layer occurs here.

– Step 2: Any radial basis function can be used for creation of the middle layer. In this example, the Gaussian function is used, as shown in Eq. 1, where σ is the cluster width.

    ψ_j = exp( −Σ_{i=1}^{n} (X_i − C_ij)² / (2σ²) )    (1)

– Step 3: The gradient descent method is used to determine the weights between the hidden and output layers. W_ij represents the weight between the jth hidden node and the ith output class node.
Fig. 1 Radial basis function general architecture

– Step 4: The output layer assigns the input to a particular cluster. In the output layer of RBFNN, the output of the ith node for m classes is decided by Eq. 2.

    y_i = f( Σ_{j=0}^{J} W_ij ∗ ψ_j )    (2)

where i = 1, 2, ..., m.
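The four steps above can be sketched end to end in NumPy. The centroids, the identity output function f, and the identity weight matrix are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Sketch of Eqs. 1 and 2: Gaussian hidden-layer activations followed by a
# weighted output layer (f taken as the identity here).

def rbf_hidden(X, centroids, sigma):
    """psi_j = exp(-||X - C_j||^2 / (2*sigma^2)) for every centroid C_j."""
    d2 = ((X[None, :] - centroids) ** 2).sum(axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def rbf_output(X, centroids, sigma, W):
    """y_i = f(sum_j W_ij * psi_j), with f = identity."""
    return W @ rbf_hidden(X, centroids, sigma)

centroids = np.array([[1.0, 7.0], [2.5, 2.0]])   # hypothetical cluster centers
W = np.eye(2)                                    # one output node per cluster
y = rbf_output(np.array([1.0, 7.0]), centroids, sigma=1.0, W=W)
print(y)   # y[0] is 1.0: the input coincides with the first centroid
```

In a trained network, W would come from gradient descent (step 3) rather than being fixed to the identity.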

3 Unbounded Fuzzy Radial Basis Function Neural Network Architecture

The UFRBFNN consists of 3 layers, represented as the input layer FI, the hidden layer FH, and the output layer FC, as shown in Fig. 2.

FI = (X1, X2, X3, ..., Xn)
FH = (H11, H12, ..., H3z)
FC = (C1, C2, C3, ..., Cn)
The role of the input layer FI is to accept n-dimensional input as a feature vector
and forward it to the middle layer FH . It does not perform any operation. The pro-
posed algorithm is applied to input data and nonlinear data gets transformed into a
linear format in the hidden layer. Input data is mapped into unbounded hyperboxes
format in the hidden layer by learning algorithm given in Sect. 4. Hidden layer FH
Fig. 2 Unbounded fuzzy radial basis function neural network architecture

output is fuzzy hyperboxes (FHBs). If the pattern lies under the hyperbox region, the
membership value is 1. As input moves away from the hyperbox region, the mem-
bership value of the input pattern gradually decreases. The formula for determining
the membership function is as shown in Eq. 3.

m j (Xh , Cpj , r j ) = 1 − f (l, r j ) (3)

where Xh = (x_h1, x_h2, ..., x_hn) is the hth input to be trained, and Cp_j and r_j are the center point and radius of the jth cluster. The function f() is defined as

    f(l, r_j) = { 0            if l ≤ r_j
                { 1 − r_j / l  if r_j < l      (4)

where l is the Euclidean distance between Xh and Cp_j. The weights between FI and FH are stored in the matrix C, which gives the center points of the FHBs. FC represents the class layer, and each of its k nodes represents one class.
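Under this reading of Eqs. 3 and 4 (membership 1 inside the cluster, decaying as r_j/l outside, which is our reconstruction of the piecewise definition), the hidden-node membership can be sketched as:

```python
import math

# Sketch of the hidden-node membership of Eqs. 3 and 4; the piecewise form of
# f is our reconstruction: 0 inside the cluster, 1 - r_j/l outside.

def membership(x, centroid, radius):
    l = math.dist(x, centroid)                  # Euclidean distance l
    f = 0.0 if l <= radius else 1.0 - radius / l
    return 1.0 - f                              # m_j = 1 - f(l, r_j)

print(membership((16.5, 16.5), (16.5, 16.5), 0.5))  # inside  -> 1.0
print(membership((18.5, 16.5), (16.5, 16.5), 0.5))  # outside -> 0.25
```

The centroid (16.5, 16.5) with radius 0.5 is taken from the 2-D example in Sect. 5; a pattern at distance 2 receives membership r_j/l = 0.25.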
4 Proposed UFRBFNN Learning Algorithm

To design the novel UFRBFNN classifier, the unbounded spread fuzzy hyperbox (USFH) algorithm is given below.
Let D be the input training set with p patterns, and let the hth input be denoted by the ordered pair {Xh, dh}, where Xh = (x_h1, x_h2, ..., x_hn) holds the pattern features and dh represents one of the K class indices.
For every class index k, where k = 1, 2, ..., K, follow steps 1 to 9.

Step 1: Let M^k and O^k estimate the distances between same-class and different-class patterns, respectively, and let βk be the number of patterns in class Ck.

    M^k = [‖Xi − Xj‖] of size βk × βk,  i, j = 1, 2, ..., βk,  where Xi, Xj ∈ Ck

    O^k = [‖Xi − Xj‖] of size βk × (p − βk),  i = 1, 2, ..., βk,  j = 1, 2, ..., p − βk,  where Xi ∈ Ck and Xj ∉ Ck

Step 2: Calculate the smallest distance of each pattern Xi ∈ Ck from the patterns of the other classes Xj ∉ Ck using O^k:

    B^k = min(O^k)  (row-wise minimum over O^k)

Step 3: (Cluster creation) From B^k, select the pattern x_j^k having the maximum distance, consider it to be the centroid of the cluster, and set the radius equal to that maximum distance:

    radius^k = max(B^k)

Step 4: (SET creation) Using pattern x_j^k as centroid, the membership function stated in Eq. 3, and the initial radius radius^k, find the class k patterns included in the cluster formed in step 3. Store all such patterns in SET for fuzzy hyperbox formation.
Step 5: (Unbounded hyperbox node creation in Fh layer) Call Algorithm 1, USFH_LU(SET, n), with SET and the number of pattern features n. Here a new unbounded FHB node is created in the Fh layer.
Step 6: (Update βk count) Let n_j = COUNT(SET), where COUNT returns the number of patterns in SET, i.e., the number of patterns included in the cluster. Then

    βk = βk − n_j

Step 7: (Check βk value) If βk > 0, then go to step 1.
Step 8: (Class node creation in Fc layer) Create a class node in the output layer with label k.
Step 9: (Link creation) Create links between the Fh nodes and the respective class node of the Fc layer.
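Steps 1–3 above can be sketched as follows; the variable and function names are ours, and the matrix O^k and vector B^k are built with NumPy broadcasting. The demonstration uses the Example 1 data from Sect. 5:

```python
import numpy as np

# Sketch of steps 1-3: inter-class distance matrix O^k, per-pattern smallest
# other-class distance B^k, and the seed pattern / initial radius choice.

def pick_seed(class_k, others):
    """Return (centroid, radius): the class-k pattern farthest from its
    nearest other-class pattern, with that distance as the initial radius."""
    O = np.linalg.norm(class_k[:, None, :] - others[None, :, :], axis=2)  # O^k
    B = O.min(axis=1)                 # step 2: row-wise minimum
    i = int(B.argmax())               # step 3: pattern with maximum B^k
    return class_k[i], float(B[i])

class1 = np.array([[12., 12.], [13., 13.], [14., 14.], [15., 15.]])
class2 = np.array([[16., 16.], [17., 17.]])
centroid, radius = pick_seed(class1, class2)
print(centroid, radius)   # (12, 12), the class 1 pattern farthest from class 2
```

Here the seed is (12, 12) with initial radius equal to its distance to the nearest class 2 pattern (16, 16).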
5 Performance Evaluation and Analysis of the UFRBFNN Classifier

To evaluate the performance of UFRBFNN classifier, two case studies along with
obtained results have been discussed in the following sub-sections. The learning
algorithm is implemented in MATLAB 2019a.

5.1 Case Study with 2-D Examples

5.1.1 Example 1

In this experiment, for a better understanding of the USFH algorithm and the construction of the UFRBFNN classifier, a 2-D example with 2 classes and 6 patterns is used. The pattern features and class labels are given in Table 1. Initially, as per the algorithm, the clustering process for the class 1 patterns is as follows: in step 1, the intra-class distance Sk for the class 1 patterns and the inter-class distance Dk with the class 2 patterns are calculated. The cluster and SET are created using steps 2, 3, and 4. The formed cluster is as shown in Fig. 3a. Once cluster formation is over, the process of FHB creation is initiated in step 5. Using Algorithm 1, lower and upper bounds are calculated for

Algorithm 1 – Calculation of Min and Max values of Hyperbox.

1: Input: a set SET and number of features n
2: Output: Min and Max values of Hyperbox
3: procedure USFH_LU(SET, n)
4:   for every feature f in n do
5:     K_min ← SET[1][f]
6:     K_max ← SET[1][f]
7:     for every pattern p in SET do
8:       if SET[p][f] > K_max then
9:         K_max ← SET[p][f]
10:      end if
11:      if SET[p][f] < K_min then
12:        K_min ← SET[p][f]
13:      end if
14:    end for
15:    LowerBound[f] ← K_min
16:    UpperBound[f] ← K_max
17:  end for
18:  return LowerBound and UpperBound
19: end procedure
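The intent of Algorithm 1, the per-feature minimum and maximum over the patterns collected in SET, can be rendered directly in Python. Applied to the class 1 patterns of Table 2, it reproduces the class 1 FHB bounds listed in Table 3:

```python
# A direct Python rendering of Algorithm 1 (USFH_LU): per-feature minimum and
# maximum over the patterns in SET give the hyperbox's min and max corners.

def usfh_lu(SET):
    n = len(SET[0])                                   # number of features
    lower = [min(p[f] for p in SET) for f in range(n)]
    upper = [max(p[f] for p in SET) for f in range(n)]
    return lower, upper

# class 1 patterns of Table 2 collected into SET
SET = [(1, 9), (1, 5), (1.5, 8.5), (2, 4.5), (1.5, 7),
       (0.75, 6), (0.75, 8), (1, 7.5)]
print(usfh_lu(SET))   # -> ([0.75, 4.5], [2, 9])
```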
Table 1 Case study 1—Example 1 dataset


Sr. no. Pattern class Total count Feature vectors
1 1 4 (12, 12), (13, 13), (14,14), (15,15)
2 2 2 (16, 16), (17, 17)

Fig. 3 Case study 1—Example 1—class 1 cluster formation

Fig. 4 Case study 1—Example 1—class 2 cluster formation

FHB formation. The class 1 FHB is shown in Fig. 3b. In step 6, all the patterns of class 1 are counted. If any patterns of class 1 are left outside the FHB, steps 1 to 6 are repeated. In our example, no class 1 pattern is left out. Hence, steps 8 and 9 are followed to establish the connection between the FHB and the respective class node in the output layer. After all the possible FHBs for one class, along with their connections with class nodes, are completed, the same procedure is repeated for the other classes. In this example, the cluster and FHB for class 2 are shown in Fig. 4a and b, respectively. Finally, two FHBs, one each for class 1 and class 2, are created as shown in Fig. 5b.
Figure 6a and b, for class 1 and class 2, respectively, clearly indicate the removal of excess space after conversion of the cluster into an FHB. Removal of excess space improves training accuracy and reduces overlap between different classes, which in turn reduces the overall misclassification rate.
Fig. 5 Case study 1—Example 1—class 1 and class 2 cluster formation

Fig. 6 Case study 1—Example 1—class 1 and class 2—excess space removal

Table 2 Case study 2—Example 2—dataset


Sr. no. Pattern class Total count Feature vectors
1 1 8 (1, 9), (1, 5), (1.5, 8.5), (2, 4.5), (1.5, 7),
(0.75, 6), (0.75, 8), (1, 7.5)
2 2 8 (4, 1), (2, 1.5), (2, 3.5), (1, 1), (1.5, 2.5),
(3, 1.5), (0.75, 3), (3.25, 4)
3 3 8 (3, 9), (4, 5), (3.75, 9), (4, 7), (3.75, 8),
(3.7, 6), (3.5, 8.5), (3, 8)

5.1.2 Example 2

In this example, 3 classes and 24 patterns were considered. The pattern features and
class labels are taken from [23] as stated in Table 2. Input patterns and class labels
are kept the same in order to compare the proposed USFH algorithm with maximum
spread fuzzy clustering (MSFC) algorithm [23].
Class 1, class 2, and class 3 patterns are shown by green, red, and blue colors, respectively; the scatter plot is shown in Fig. 7.
Final results computed by both MSFC and USFH algorithms are given in Table 3.
In addition to it, final clusters formed by MSFC are as shown in Fig. 8a, and FHBs
created by USFH are as shown in Fig. 8b.
Fig. 7 Scatter plot of Table 2: case study 2

Table 3 Case study 2: centroid and radii calculated by MSFC and FHB dimensions calculated by USFH

MSFC:
Centroid                  Radius    Class
(0.75, 6.0), (1.0, 9.0)   2.61, 1   1
(1, 9)                    0         1
(4.0, 1.0)                3.8161    2
(4.0, 7.0)                2.2361    3

USFH:
Min point     Max point   Class
(0.75, 4.5)   (2, 9)      1
(0.75, 1)     (4, 4)      2
(3, 4.5)      (4, 9)      3

From the above comparison, we can conclude that the overlapping region between
inter-class clusters is decreased in the USFH algorithm due to lower and upper
bound calculation for FHB formation. Reduced overlapping region improves training
accuracy of USFH algorithm over MSFC algorithm. To prove this, the performance
of the USFH algorithm is evaluated with other classifiers in the following section.
Fig. 8 Case study 2: cluster formation by MSFC approach and FHBs by USFH approach

5.2 Case Study 2

In case study 2, the proposed USFH algorithm's performance in UFRBFNN is evaluated against well-known pattern classification methods. For evaluation purposes, 7 standard datasets from the UCI repository [24] were used. The properties of the datasets are given in Table 4. For a fair comparison, the experimental setup is kept as per [23]. k-fold cross-validation with k = 5 is used for evaluation. The results of USFH are appended to the results given in [23] and tabulated in Table 5. The analysis shows that the classification accuracy of UFRBFNN is higher than that of all the other listed classifiers. The USFH algorithm improves classification accuracy as compared to the other RBF classifiers.
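The evaluation protocol can be sketched generically; `train` and `predict` below are placeholders standing in for the UFRBFNN learning and recall routines, which are not reproduced here:

```python
import numpy as np

# Generic k-fold cross-validation sketch (k = 5 as in the paper); `train` and
# `predict` are supplied by the caller.

def cross_validate(X, y, train, predict, k=5, seed=0):
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    accs = []
    for i in range(k):
        test = folds[i]
        tr = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train(X[tr], y[tr])
        accs.append(float(np.mean(predict(model, X[test]) == y[test])))
    return float(np.mean(accs))       # average accuracy over the k folds

# trivial stand-in: a model that always answers class 0
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.zeros(10)
acc = cross_validate(X, y, train=lambda X, y: 0.0,
                     predict=lambda m, X: np.full(len(X), m))
print(acc)   # 1.0, since every label here really is class 0
```

The figures reported in Table 5 correspond to the fold-averaged accuracy returned by such a loop, with the actual classifier in place of the stand-in.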

Table 4 The properties of datasets


Dataset Data size Features Classes
1 Hepatitis 155 19 2
2 Heart 270 13 2
3 Liver 345 7 2
4 Ionosphere 351 34 2
5 Monk-3 432 7 2
6 Breast 699 9 2
7 Pima 768 8 2
Table 5 Classification accuracy by five-fold cross-validation on standard dataset


Dataset USFH MSFC RBF RBF-R RBFN RBF-WTA
Hepatitis 95 88.2 65.0 81.9 81.1 82.1
Heart 91 77.0 73.5 81.9 80.5 80.6
Liver 85 69.3 53.8 62.2 62.8 61.0
Ionosphere 98.5 90.0 81.5 95.5 95.2 94.3
Monks-3 94 84.8 97.5 99.0 95.8 68.6
Breast 99 96.3 94.1 96.3 96.4 97.0
Pima 90 76.9 71.0 75.3 72.1 73.8

6 Conclusions and Future Work

The proposed novel UFRBFNN algorithm is useful for pattern classification. It is a combination of fuzzy neural network and RBFNN classifiers. The result analysis shows that the training accuracy of the algorithm is improved because of the unbounded spread fuzzy hyperbox method. This method performs better classification by minimizing overlap among different classes and hence decreases the pattern misclassification rate. The algorithm converts clusters into unbounded FHBs, due to which unnecessary inter-class overlap as well as the excess spread of the clusters themselves is eliminated. The number of FHBs formed in the hidden layer is independent of the sequence of patterns applied for training.
The proposed algorithm can be further enhanced by using optimum feature selection and dimension reduction methods. Also, pattern classification accuracy can be increased by the creation of a more appropriate hyperplane using different clustering algorithms such as k-means and k-medoids on the SET created in UFRBFNN.

References

1. Simpson, P.K.: Fuzzy min-max neural network Part I: Classification. IEEE Trans. Neural Netw.
3, 776–786 (1992)
2. Kulkarni, U.V., Sontakke, T.R.: Fuzzy hypersphere neural network classifier. In: 10th Interna-
tional Conference on Fuzzy Systems, Melbourne, Victoria, Australia, pp. 1559–1562 (2001)
3. Kim, H.J., Ryu, T.W., Nguyen, T.T., Lim, J.S., Gupta, S.: A weighted fuzzy min-max neural
network for pattern classification and feature extraction. In: Lagana, A., Gavrilova, M.L.,
Kumar, V., Mun, Y., Tan, C.J.K., Gervasi, O. (eds.) Computational Science and Its Applications
ICCSA 2004 (2005)
4. Ma, D., Liu, J., Wang, Z.: The pattern classification based on fuzzy min-max neural network
with new algorithm. In: Wang, J., Yen, G.G., Polycarpou, M.M. (eds.) Advances in Neural Net-
works ISNN 2012. Lecture Notes in Computer Science, vol. 7368. Springer, Berlin, Heidelberg
(2012)
5. Mohammed, M.F., Lim, C.P.: An enhanced fuzzy min-max neural network for pattern classi-
fication. IEEE Trans. Neural Netw. Learn. Syst. 26(3), 417–429 (2015)
6. Chen, S.: Nonlinear time series modeling and prediction using Gaussian RBF networks with
enhanced clustering and RLS learning. Electron. Lett. 3, 117–118 (1995)
7. Carse, B., Pipe, A.G., Fogarty, T.C., Hill, T.: Evolving radial basis function neural networks
using a genetic algorithm. In: IEEE International Conference on Evolutionary Computation,
Perth, December 1995, pp. 300–305 (1995)
8. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv. 31(3),
264–323 (1999)
9. Moody, J., Darken, C.J.: Fast learning in networks of locally-tuned processing units. Neural
Comput. 1, 281–294 (1989)
10. Sarimveis, H., Alexandridis, A., Bafas, G.: A fast training algorithm for RBF networks based on subtractive clustering. Neurocomputing 501–505 (2003)
11. Tsekouras, G.E., Tsimikas, J.: On training RBF neural networks using input-output fuzzy
clustering and particle swarm optimization. Fuzzy Sets Syst. 221, 65–89 (2013)
12. Shie-Jue, L., Chun-Liang, H.: An ART-based construction of RBF networks. IEEE Trans.
Neural Netw. 13, 1308–1321 (2002)
13. Sohn, I., Ansari, N.: Configuring RBF neural networks. Electron. Lett. 34, 684–685 (1998)
14. Wang, D., Zeng, X.J., Keane, J.A.: A clustering algorithm for radial basis function neural
network initialization. Neurocomputing 77, 144–155 (2012)
15. Niros, A.D., Tsekouras, G.E.: A novel training algorithm for RBF neural network using a hybrid fuzzy clustering approach. Fuzzy Sets Syst. 193, 62–84 (2012)
16. Feng, H.M.: Self-generation RBFNs using evolutional PSO learning. Neurocomputing 70,
241–251 (2006)
17. Billings, S.A., Zheng, G.L.: Radial basis function network configuration using genetic algo-
rithms. Neural Netw. 8, 877–890 (1995)
18. Li, J., Liu, X.: Melt index prediction by RBF neural network optimized with an adaptive new
ant colony optimization algorithm. J. Appl. Polym. Sci. 119, 3093–3100 (2011)
19. Shen, W., Guo, X., Wu, C., Wu, D.: Forecasting stock indices using radial basis function neural
networks optimized by artificial fish swarm algorithm. Knowl. Based Syst. 24, 378–385 (2011)
20. Rouhani, M., Javan, D.S.: Two fast and accurate heuristic RBF learning rules for data classifi-
cation. Neural Netw. 75, 150–161 (2016)
21. Liu, Y., Huang, H., Huang, T.W., Qian, X.: An improved maximum spread algorithm with
application to complex-valued RBF neural networks. Neural Netw. Neurocomput. 216, 261–
267 (2016)
22. Kumar, A., Sai Prasad, P.S.V.S.: Hybridization of fuzzy min-max neural networks with kNN for
enhanced pattern classification. In: Singh, M., Gupta, P., Tyagi, V., Flusser, J., Oren, T., Kashyap,
R. (eds.) Advances in Computing and Data Sciences. ICACDS 2019. Communications in
Computer and Information Science, vol. 1045. Springer, Singapore (2019)
23. Kulkarni, A., Bonde, S., Kulkarni, U.: A novel fuzzy clustering algorithm for radial basis
function neural network. Int. J. Future Revol. Comput. Sci. Commun. Eng. 4(4), 751–756
(2018). ISSN: 2454-4248
24. Frank, A., Asuncion, A.: UCI Machine Learning Repository. University of California, School of Information and Computer Science, Irvine, CA (2010). http://archive.ics.uci.edu/ml
A Study on the Adaptability of Deep Learning-Based Polar-Coded NOMA in Ultra-Reliable Low-Latency Communications

N. Iswarya, R. Venkateswari, and N. Madhusudanan

Abstract According to ITU-R, the primary goal of 5G wireless communication is
achieving very high data rates in the broadcast region. Polar coding has emerged
as a pivotal channel coding technique for 5G to accomplish the previously men-
tioned goals. Subsequently, Polar-Coded Non-Orthogonal Multiple Access (PC-
NOMA) is observed as a favorable channel accessing technique for the sporadic
traffic of low data rate devices in a 5G Internet of Things (IoT) environment. Deep
Learning algorithms are revolutionizing data analysis, prediction, and decision-
making by employing neural network hierarchies. When these Deep Learning algo-
rithms are incorporated in the channel estimation or resource allocation of Polar-coded
NOMA, they appear to be a promising and robust solution for an uncertain channel.
Meanwhile, ultra-reliable low-latency communication, one of the vital 5G use
cases, has tremendous potential applications in the Internet of Things generation. Con-
sequently, the challenges of integrating deep learning techniques with PC-NOMA
for URLLC use cases are reviewed, and the adaptability of Deep Learning algorithms
for the channel estimation and resource allocation of NOMA is surveyed here.

Keywords Deep learning · DNN · Machine learning · NOMA · Polar codes · URLLC

1 Introduction

In recent days, wireless communication systems have shown swift growth, essentially
demanding extensive data bandwidth to support the currently evolving 5G appli-
cations. The 5th-generation New Radio (5G-NR) is expected to accomplish
distinct demands such as Enhanced Mobile Broadband (eMBB), massive Machine

N. Iswarya (B) · R. Venkateswari · N. Madhusudanan


PSG College of Technology, Coimbatore, India
e-mail: rvi.ece@psgtech.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_4

Type Communications (mMTC), and Ultra-Reliable and Low-Latency Communi-
cations (URLLC). To this end, channel coding, resource allocation, and channel
estimation become significant for wireless communications. Claude Shannon showed
in his paper [1] that error-free transmission can be achieved over
noisy channels by adopting error correction codes. Yet, practically speaking, a com-
munication system can neither manage highly complex implementations nor endure
extended transmission delay. Therefore, we need optimized codes that approach Shan-
non's capacity limit and assure the desired performance targets. Arıkan [2]
proposed Polar codes, which appear to be an assuring channel coding technique for 5G
communication. Polar codes outperform other coding techniques by achieving
Shannon's capacity limit while affording reduced encoding and decoding complexi-
ties. The authors in [3, 4] have provided an overall analysis of Polar codes, presenting
their features and challenges. Furthermore, NOMA, a potentially evolving radio
accessing technique, achieves high system capacity and spectral efficiency and con-
sequently can be utilized for 5G and future radio access schemes. Simultaneously, the
Polar-coded NOMA system [5] has gained attention recently because of its multistage
channel polarization.
On the other hand, channel estimation and resource allocation play a vital role in
the NOMA system's performance gains, and sharply varying channel characteristics
increase the complexity and, in turn, degrade the entire NOMA system. Therefore, to
overcome this problem, Deep Learning (DL) is recommended as a capable way to
auto-detect the channel information, as discussed in [6]. Under the 3rd Genera-
tion Partnership Project (3GPP) URLLC requirements, a 32-byte packet transmission
must be 99.999% reliable with no more than 1 ms latency [7]. An optimization
problem to enhance NOMA's reliability when utilized for grant-free access in
Tactile IoT, one of the URLLC applications, to benefit low-latency massive access
is formulated in [8].
Therefore, this paper analyzes Deep Learning algorithms applied in channel coding
for polar codes in Non-Orthogonal Multiple Access, aiming at URLLC applications.
The rest of the article is organized as follows. In Sects. 2
and 3, the construction and channel coding of Polar codes are studied. In Sect. 4,
Deep Learning algorithms for channel coding are reviewed, and in Sect. 5, their use
for NOMA techniques. Section 6 deals with the adaptability of Deep
Learning techniques in PC-NOMA specifically for the URLLC use case, followed by its
implementation challenges. The scope for future research is explored in Sect. 7. Finally,
Sect. 8 concludes this paper.

2 A Study on the Construction of Polar Codes

Polar codes are a subset of linear block error correction codes that can achieve Shan-
non’s channel capacity limit with decreased encoding and decoding complexity. Polar
coding is a process of recursive concatenations of a kernel matrix that transforms
the physical channel into virtual channels. As the iterations are repeated, the virtual
channels become either extremely reliable or unreliable, and hence the data bits are
allocated to the most reliable channels. The transformation of physical channels into
virtual channels is called channel polarization, and it consists of two phases:
(i) a channel combining phase and (ii) a channel splitting phase. After polariza-
tion, reliable channels support a maximum data rate, and unreliable channels support
only a reduced data rate. The codeword length of polar codes is represented as N,
which is a power of 2, i.e., N = 2^n, with the K most reliable channels being
assigned to data bits.

Fig. 1 Construction of polar codes
Figure 1 shows the construction of a polar code from two channels, u_1 and u_2, to gen-
erate the codeword transmitted over a channel. The codeword consists of information
bits (I) and frozen bits (F), arranged by a generator matrix G_N. G_N is the nth
Kronecker power of the kernel matrix G_2, i.e., G_N = G_2^{⊗n}, where

G_2 = [ 1 0
        1 1 ]

A codeword C of length N can be written as

C_N = u_i G_N    (1)

where u_i = (u_1, u_2, ..., u_N) is the input vector, consisting of I and F.


Channel polarization is the operation of constructing, from N independent copies of
a channel W, a second set of channels {W_N^(i): 1 ≤ i ≤ N}. As N increases, the
symmetric capacity terms {I(W_N^(i))} approach 0 or 1 for all but a vanishing
fraction of indices i.
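As an illustration, the recursive Kronecker construction above can be sketched in a few lines of Python with NumPy; the function name and the choice of frozen-bit positions below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def polar_encode(u, n):
    """Encode u of length N = 2**n with G_N = G_2^(Kronecker n), over GF(2)."""
    G2 = np.array([[1, 0], [1, 1]], dtype=int)  # polarization kernel G_2
    G = G2
    for _ in range(n - 1):                      # build the nth Kronecker power
        G = np.kron(G, G2)
    return u.dot(G) % 2                         # C_N = u * G_N (Eq. 1), mod 2

# N = 4 example: positions 0 and 2 frozen to 0 (hypothetical reliability order),
# positions 1 and 3 carry information bits.
u = np.array([0, 1, 0, 1])
print(polar_encode(u, 2))  # -> [0 0 1 1]
```

In a full encoder, the frozen set would be chosen from the reliability ordering produced by channel polarization rather than fixed by hand.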

3 A Study on the Channel Coding of Polar Codes

Arıkan in [2] described polar encoding and decoding. The codeword C_N is generated by
multiplying the input vector u_i with the G_N matrix, as in Eq. (1). The generated codeword
is then transmitted over the channel. At the receiver end, y_N is received and decoded. In
[2, 9], the Successive Cancellation (SC) decoding method is used. Log-Likelihood
Ratio (LLR) values are calculated, and the output vector (û) is generated. LLR
soft decisions are made successively from û_1 to û_N, and hard decisions are then
applied to decide between 0 and 1. The SC decoding method performs well as
the codeword length N tends to infinity, with system complexity of order O(N
log N). However, although the SC approach achieves high channel capacity, its error-correction
capability deteriorates at short codeword lengths. In [10], to overcome this drawback of
SC decoding, the authors designed Successive Cancellation List
(SCL) decoding, which tracks L candidate paths. Consequently, the system com-
plexity increases to order O(LN log N). SCL performs well at limited code
lengths but fails when the correct codeword is displaced from the chosen paths. From
the simulation results of [10], it is evident that the BLER decreases as the list size
increases, but the complexity keeps growing. The authors in [11] proposed the
Successive Cancellation Flip (SC-Flip) decoding method to enhance SC by maximiz-
ing its error-correction efficiency. However, the selection and sorting process adopted
in SC-Flip increases its execution complexity. Therefore, to better use
SC-Flip algorithms at low code rates, two techniques were suggested in [12], the Fixed
Index Selection (FIS) scheme and the Enhanced Index Selection (EIS) scheme, to cir-
cumvent the execution cost and boost the error-correction capability. Adaptive SCL
decoding in [13] aims at improving the throughput of the decoder, thereby reducing
the system complexity; it combines SC, SCL, and SCL-
CRC (Cyclic Redundancy Check). Notable decoding methods have been found
in the literature to enhance channel coding of polar codes, achieving low latency
and easing hardware complexity. Machine learning (ML), an Artificial Intelligence
application, is booming in the data science field, and utilizing ML algorithms in
channel coding for wireless communications appears to be a promising solution to
the channel coding problem.
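For concreteness, the per-stage LLR updates at the heart of SC decoding are usually written as an f (check-node) and a g (variable-node) function. The sketch below uses the common min-sum approximation of f; it is a simplified illustration, not the exact formulation of any surveyed decoder:

```python
import numpy as np

def f_node(l1, l2):
    """Check-node update: min-sum approximation of
    f(l1, l2) = 2 * atanh(tanh(l1/2) * tanh(l2/2))."""
    return np.sign(l1) * np.sign(l2) * np.minimum(np.abs(l1), np.abs(l2))

def g_node(l1, l2, u_hat):
    """Variable-node update: the earlier hard decision u_hat flips the sign of l1."""
    return l2 + (1 - 2 * u_hat) * l1

# Hard decisions are then taken successively: u_hat_i = 0 if LLR_i >= 0, else 1.
print(f_node(2.0, -3.0))    # -> -2.0
print(g_node(2.0, 5.0, 1))  # -> 3.0
```

SCL decoding applies these same updates along L parallel candidate paths, keeping the most likely ones at each bit decision.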

4 A Study on Deep Learning Algorithms for Channel Coding

The input and output vectors of a DNN are coherent with those of channel coding,
and this coherence offers a beneficial way to apply DNNs to channel
decoding [14]. The favorable aspects of Deep Learning-based channel decoding are
that it is non-iterative and low-latency, but it is bounded by scalability limitations
concerning block length, named the "curse of dimensionality." Exploiting the fact that
shorter codeword lengths can be trained efficiently, the polar encoding graph is
partitioned into multiple sub-graphs in [15], where ML algorithms take over polar
decoding. Analogous to that work, the authors in [16] constructed a Neural Successive
Cancellation (NSC) decoder, comprising several Neural Network Decoders com-
bined with SC. A modified neural belief propagation method is proposed in [17], and
estimates of decoding reliability are presented to accomplish better decoding
outcomes. Most of the literature has considered the noise environment to be Gaussian
white noise, whereas practical communication systems exhibit noise correla-
tion due to filtering and oversampling. However, adopting conventional methods to
process colored noise leads to computational complexity. In [18], an iterative
belief propagation (BP) algorithm is integrated with a convolutional neural network
for an LDPC decoder under a colored noise model; the authors proved CNN to be
a potential solution by extracting the noise correlation as a feature for the CNN. A similar
combination of CNN and BP, accounting for BER and latency, is proposed in [19]. DL
schemes for channel coding of polar codes have come into the spotlight in communication
systems, though with a shortfall in two aspects: they apply solely to
short codes and to the BP decoding scheme [20–24]. Artificial Neural Networks (ANN)
for channel coding have started emerging [22]. Rather than considering error detec-
tion, the capability of error correction is considered by designing a table-based error
correction decoding framework in [25, 26]. An optimization technique for the BP
algorithm's weighting scheme to accelerate the training convergence is presented in [27].
In [28], a neural offset Min-Sum (MS) decoder is suggested, using machine learning tech-
niques and offering enhanced performance over other BP decoders applied to BCH
codes. Hence, in [29], the authors considered MS decoders for polar codes
exploiting both scaling and offset. Advancements in ML have
motivated a Multi-Flips SC decoding scheme in [30] that achieves lower
latency than previous flip-successive cancellation decoding methods. Free Space
Optical (FSO) communications combined with Polar codes are considered for apply-
ing DL techniques in [31, 32], resulting in a higher convergence rate and better BER
performance, outperforming the standard LLR input. The authors in [33] came up
with sparse training of neural networks employed for polar codes to attain
better BER performance. Although the authors utilize the partitionable property of polar codes, the
NND itself lacks generalization capability, which remains an open problem.
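To make the scaled/offset min-sum idea of [28, 29] concrete, the check-node update with a scaling factor α and an offset β can be sketched as below. In a neural MS decoder these two constants would be trainable (e.g., per-edge) weights; the default values used here are merely illustrative assumptions:

```python
import numpy as np

def scaled_offset_min_sum(l1, l2, alpha=0.9375, beta=0.25):
    """Min-sum check-node update with scaling (alpha) and offset (beta);
    both would be learned parameters in a neural MS decoder."""
    mag = np.maximum(alpha * np.minimum(np.abs(l1), np.abs(l2)) - beta, 0.0)
    return np.sign(l1) * np.sign(l2) * mag

print(scaled_offset_min_sum(2.0, -3.0, alpha=1.0, beta=0.5))  # -> -1.5
```

The offset shrinks the min-sum magnitude toward the exact f function, which is why learning α and β narrows the gap to full BP decoding.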

5 A Study on DL for Non-orthogonal Multiple Access in URLLC Use Case

The 5th-generation (5G) networks are intended to achieve essential services: dras-
tically increased capacity, support for the Internet of Things (IoT) by connecting
an abundance of intelligent devices, and the capability to sustain highly reliable,
mission-critical implementations, as portrayed in Fig. 2.

Fig. 2 Use cases of 5G

Advancements in Artificial Intelligence have induced authors to employ
deep learning schemes that beneficially impact NOMA techniques [34]. High-
performance 5G systems can be realized by effectuating NOMA combined with
MIMO [35], mMIMO [36], and mmWave [37, 38] technologies. A communication archi-
tecture for a Tactile Internet application based on NOMA allowing non-orthogonal
resource sharing is discussed in [39]. The role of these 5G generic services is ana-
lyzed, and the suitability of NOMA-based Tactile Internet is explored for real-time
applications. Sporadic data traffic is generated by smart devices


operating in an IoT environment in which synchronization overheads and control
functions become a hostile problem in low-latency applications. An identical con-
tent transmission over NOMA (ICToNOMA) is proposed for transmitters in [40].
They cooperatively combine similar messages and then transmit over successive
data packets to enhance the channel’s reliability by improving the power and spec-
tral efficacy. In consideration of a URLLC scenario accompanied by Finite Block
Length (FBL) subjected to quasi-static Rayleigh fading channel, a NOMA scheme
joined with Wireless Power Transfer (WPT) is employed in [41] to reduce the latency
and enhance the reliability. The communications in URLLC services are divided into
two: (i) latency-critical and (ii) latency-tolerant communications. On adopting grant-
free (GF) access for latency-critical and grant-based (GB) access for latency-tolerant
applications, an orthogonal frequency division multiplexing with index modulation
(OFDM-IM)-based NOMA is proposed in [42] to obtain 99.99% success probabil-
ity. Correspondingly, Deep Learning-based autoencoders [42–44] are in ascension
shortly, focusing on uncertain channel conditions. A summary of various literature
on Deep Learning techniques for polar decoding and NOMA channel accessing
schemes is indexed in Table 1. Successive Cancellation and Successive Cancellation
List are proposed as channel decoding techniques for polar codes. Recently, involving
machine learning algorithms in channel decoding appears to be a promising solution
for reduced BER decoding.
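The two-user power-domain NOMA principle underlying several of the surveyed schemes can be sketched as superposition coding at the transmitter and successive interference cancellation (SIC) at the near user. The power split, BPSK symbols, and noiseless channel below are purely illustrative assumptions:

```python
import numpy as np

def superpose(s_near, s_far, p_near=0.2, p_far=0.8):
    """Superposition coding: the far (weak-channel) user gets the larger power share."""
    return np.sqrt(p_near) * s_near + np.sqrt(p_far) * s_far

def sic_near_user(y, p_far=0.8):
    """Near user: decode the far user's (stronger) symbol first, subtract it (SIC),
    then decode its own symbol. BPSK symbols in {-1, +1}, noiseless for clarity."""
    s_far_hat = np.sign(y)
    s_near_hat = np.sign(y - np.sqrt(p_far) * s_far_hat)
    return float(s_near_hat), float(s_far_hat)

y = superpose(+1, -1)
print(sic_near_user(y))  # -> (1.0, -1.0)
```

The far user, by contrast, decodes its own symbol directly, treating the near user's low-power signal as noise; a noisy channel would simply add a random term to y.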
Table 1 A table on deep learning techniques applied in PC-NOMA for URLLC scenarios

Topic | References | Inference
Channel encoding and decoding of polar codes | [9–13, 15, 16] | Channel decoding techniques like SC, SCL, and SCL-CRC are discussed. Latency and hardware complexity remain a constraint for the design of polar codes for 5G-NR
Deep learning algorithms for decoding of polar codes | [17–22] | Deep learning techniques are applied in polar decoding to reduce the decoder complexity and enhance latency
DL-based error-correcting codes | [23, 24] | Error correction codes are considered based on DL algorithms
Convolutional and deep neural networks for polar decoding | [18, 19, 24, 31, 32] | CNN, belief propagation (BP-CNN), and DNN schemes are applied for polar decoding
Polar-coded non-orthogonal multiple access (PC-NOMA) | [5] | A PC-NOMA framework is designed to jointly optimize polar coding, modulation, and transmission
DL-based NOMA and its advancements with MIMO, mMIMO, mmWave | [6, 34–38] | Deep learning techniques for NOMA schemes are proposed for efficient channel estimation under imperfect CSI and uncertainties
Use cases of 5G NR with NOMA | [7, 39] | The main objectives of 5G NR are to reduce latency and to accommodate billions of smart devices. NOMA is employed for Tactile Internet and URLLC applications for 5G radio access
Performance of NOMA when accompanied by URLLC | [41, 45] | NOMA applied to URLLC use cases with index modulation schemes in a cooperative environment is explained

6 A Study on DL-Based PC-NOMA Applied in URLLC Scenario

Enabling Artificial Intelligence for 5G and beyond entails highly robust, well-
performing, and low-complexity techniques [46]. PD-NOMA, a power-domain variant of
NOMA in a two-user scenario involving the URLLC environment, a critical use case of 5G,
is shown in Fig. 3; it serves to reduce latency and improve reliability in the communication
system. Employing PC-NOMA in URLLC applications remains a
challenging problem, and involving deep learning for channel estimation in
NOMA for URLLC applications appears to be a promising solution for reliability
and latency complications.
Fig. 3 NOMA applied in URLLC application

7 Challenges and Future Research Scope

NOMA schemes seem to be an encouraging channel accessing solution for 5G-NR,
and through polarization of NOMA, performance improvement is expected. Channel
estimation under continuously varying channel conditions and uncertainties remains
an open problem, and employing deep learning techniques for channel estima-
tion appears to be a promising solution to such challenges. Subsequently, considering
the rapid progress of 5G and its extremely demanding reliability requirements, zero-latency com-
munication evolves into a significant research challenge, paving the way for availing
Artificial Intelligence in wireless communications.

8 Conclusion

The machine learning techniques for polar decoding aimed at channel coding in the
5G-NR network are surveyed. Deep learning techniques involving neural schemes
appear to be beneficial in the channel decoding of polar codes. Applying machine
learning algorithms in channel estimation for Non-Orthogonal Multiple Access
schemes, proposed in very few reported works, seems to be an encouraging solu-
tion under channel uncertainties. Involving deep learning algorithms in NOMA
also enhances the reliability and reduces the latency for critical enabling use cases of
5G. Accounting for those significant use cases, low-latency deep learning-based
NOMA schemes can be incorporated to implement the applications mentioned above.
The suitability of employing deep learning polar codes for NOMA is addressed as a
prominent solution for 5G and beyond use cases.
References

1. Shannon, C.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–
656 (1948)
2. Arikan, E.: Channel polarization: a method for constructing capacity-achieving codes for sym-
metric binary-input memoryless channels. IEEE Trans. Inf. Theory 55(7), 3051–3073 (2009)
3. Bioglio, V., Condo, C., Land, I.: Design of polar codes in 5G new radio. arXiv preprint (2018).
arXiv:1804.04389
4. Babar, Z., et al.: Polar codes and their quantum-domain counterparts. In: IEEE Commun. Surv.
Tutor. 22(1), 123–155. Firstquarter (2020). https://doi.org/10.1109/COMST.2019.2937923
5. Dai, J., Niu, K., Si, Z., Dong, C., Lin, J.: Polar-coded non-orthogonal multiple access. IEEE
Trans. Signal Process. 66(5), 1374–1389 (2018). https://doi.org/10.1109/TSP.2017.2786273
6. Gui, G., Huang, H., Song, Y., Sari, H.: Deep learning for an effective nonorthogonal multiple
access scheme. IEEE Trans. Veh. Technol. 67(9), 8440–8450 (2018). https://doi.org/10.1109/
TVT.2018.2848294
7. Sutton, G.J., et al.: Enabling technologies for ultra-reliable and low latency communications:
from PHY and MAC layer perspectives. IEEE Commun. Surv. Tutor. 21(3), 2488–2524.
Thirdquarter (2019). https://doi.org/10.1109/COMST.2019.2897800
8. Zhang, M., Lou, M., Zhou, H., Zhang, Y., Liu, M., Zhong, Z.: Non-orthogonal coded access
based uplink grant-free transmission for URLLC. In: IEEE/CIC International Conference on
Communications in China (ICCC), Changchun, China, pp. 624–629 (2019). https://doi.org/
10.1109/ICCChina.2019.885590
9. Alamdar-Yazdi, A., Kschischang, F.R.: A simplified successive-cancellation decoder for polar
codes. IEEE Commun. Lett. 15(12), 1378–1380 (2011). https://doi.org/10.1109/LCOMM.
2011.101811.111480
10. Tal, I., Vardy, A.: List decoding of polar codes. IEEE Trans. Inf. Theory 61(5), 2213–2226
(2015)
11. Afisiadis, O., Balatsoukas-Stimming, A., Burg, A.: A low-complexity improved successive
cancellation decoder for polar codes. In: Asilomar Conference on Signals, Systems and Com-
puters, pp. 2116–2120 (2014)
12. Condo, C., Ercan, F., Gross, W.J.: Improved successive cancellation flip decoding of polar
codes based on error distribution. arXiv preprint (2017). arXiv:1711.11096
13. Li, B., Shen, H., Tse, D.: An adaptive successive cancellation list decoder for polar codes with
cyclic redundancy check. IEEE Commun. Lett. 16(12), 2044–2047 (2012)
14. Xu, S., Luo, F.-L.: Machine Learning for Future Wireless Communications, 1st edn. Wiley
(2020)
15. Cammerer, S., Gruber, T., Hoydis, J., ten Brink, S.: Scaling deep learning-based decoding of
polar codes via partitioning. In: IEEE Global Communications Conference, Singapore (2017).
https://doi.org/10.1109/GLOCOM.2017.8254811
16. Doan, N., Ali Hashemi, S., Gross, W.J.: Neural successive cancellation decoding of polar
codes. In: IEEE 19th International Workshop on Signal Processing Advances in Wireless
Communications (SPAWC), Kalamata, pp. 1–5 (2018). https://doi.org/10.1109/SPAWC.2018.
8445986
17. Yuan, C., Wu, C., Cheng, D., Yang, Y.: Deep learning in encoding and decoding of polar codes.
J. Phys. Conf. Ser. 1060(1), 012021. IOP Publishing (2018)
18. Liang, F., Shen, C., Wu, F.: An iterative BP-CNN architecture for channel decoding. IEEE J.
Sel. Top. Sig. Process. 12(1), 144–159 (2018)
19. Wen, C., Xiong, J., Gui, L., et al.: A novel decoding scheme for polar code using convolutional
neural network. In: 2019 IEEE International Symposium on Broadband Multimedia Systems
and Broadcasting (BMSB). IEEE, Jeju, Korea (South), pp. 1–5 (2019)
20. Gruber, T., Cammerer, S., Hoydis, J., ten Brink, S.: On deep learning-based channel decoding.
In: Annual Conference on Information Sciences and Systems (CISS), pp. 1–6 (2017)
21. Nachmani, E., et al.: Learning to decode linear codes using deep learning. In: 54th Annual
Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL,
pp. 341–346 (2016)
22. Song, X., Zhang, Z., Wang, J., Qin, K.: A graph-neural-network decoder with MLP-based
processing cells for polar codes. In: 2019 11th International Conference on Wireless Commu-
nications and Signal Processing (WCSP). IEEE, Xi’an, China, pp. 1–6 (2019)
23. Xu, W., Wu, Z., Ueng, Y.-L., You, X., Zhang, C.: Improved polar decoder based on deep
learning. In: IEEE International Workshop on Signal Processing Systems (SiPS), pp. 1–6
(2017)
24. Lyu, W., Zhang, Z., Jiao, C., Qin, K., Zhang, H.: Performance evaluation of channel decoding
with deep neural networks. IEEE International Conference on Communication (ICC), pp. 1–6
(2018)
25. Liu, X., Wu, S., Wang, Y., et al.: Exploiting error-correction-CRC for polar SCL decoding: a
deep learning-based approach. IEEE Trans. Cogn. Commun. Netw. 6, 817–828 (2020). https://
doi.org/10.1109/TCCN.2019.2946358
26. Wang, J., Li, J., Huang, H., Wang, H.: Fine-grained recognition of error correcting codes based
on 1-D convolutional neural network. Dig. Sig. Process. 99, 102668 (2020). https://doi.org/10.
1016/j.dsp.2020.102668
27. Gao, J., Niu, K., Dong, C.: Exploiting error-correction-CRC for polar SCL decoding. IEEE
Access 8, 27210–27217 (2020)
28. Lugosch, L., Gross, W.J.: Neural offset min-sum decoding. In: 2017 IEEE International Sym-
posium on Information Theory (ISIT) (2017)
29. Dai, B., Liu, R., Yan, Z.: New min-sum decoders based on deep learning for polar codes. In:
IEEE International Workshop on Signal Processing Systems (SiPS), Cape Town, pp. 252–257
(2018). https://doi.org/10.1109/SiPS.2018.8598384
30. He, B., Wu, S., Deng, Y., Yin, H., Jiao, J., Zhang, Q.: A machine learning based multi-flips
successive cancellation decoding scheme of polar codes. In: IEEE 91st Vehicular Technology
Conference (VTC2020-Spring) 2020, pp. 1–5 (2020)
31. Fang, J., Bi, M., Xiao, S., et al.: Neural network decoder of polar codes with tanh-based modified
LLR over FSO turbulence channel. Opt. Express 28, 1679 (2020). https://doi.org/10.1364/OE.
384572
32. Fang, J., et al.: Neural successive cancellation polar decoder with Tanh-based modified LLR
over FSO turbulence channel. IEEE Photon. J. 12(6), 1–10. Art no. 7906110 (2020). https://
doi.org/10.1109/JPHOT.2020.3030618
33. Xu, W., You, X., Zhang, C., Be’ery, Y.: Polar decoding on sparse graphs with deep learning.
In: 52nd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA,
pp. 599–603 (2018). https://doi.org/10.1109/ACSSC.2018.8645372
34. Narengerile, Thompson, J.: Deep learning for signal detection in non-orthogonal multiple
access wireless systems. UK/China Emerging Technologies (UCET), Glasgow, United King-
dom, pp. 1–4 (2019). https://doi.org/10.1109/UCET.2019.8881888
35. Kang, J.-M., Kim, I.-M., Chun, C.-J.: Deep learning-based MIMO-NOMA with imperfect
SIC decoding. IEEE Syst. J. 14(3), 3414–3417 (2020). https://doi.org/10.1109/JSYST.2019.
2937463
36. Boloursaz Mashhadi, M., Gündüz, D.: Deep learning for massive MIMO channel state acquisi-
tion and feedback. J. Indian Inst. Sci. 100(2), 369–382 (2020). https://doi.org/10.1007/s41745-
020-00169-2
37. Cui, J., Ding, Z., Fan, P.: The application of machine learning in mmWave-NOMA systems.
In: 2018 IEEE 87th Vehicular Technology Conference (VTC Spring). IEEE, Porto, pp. 1–6
(2018)
38. Cui, J., Ding, Z., Fan, P., Al-Dhahir, N.: Unsupervised machine learning-based user clustering
in millimeter-wave-NOMA systems. IEEE Trans. Wirel. Commun. 17(11), 7425–7440 (2018).
https://doi.org/10.1109/TWC.2018.2867180
39. Budhiraja, I., Tyagi, S., Tanwar, S., Kumar, N., Rodrigues, J.J.P.C.: Tactile internet for smart
communities in 5G: an insight for NOMA-based solutions. IEEE Trans. Ind. Inform. 15(5),
3104–3112 (2019). https://doi.org/10.1109/TII.2019.2892763
40. Ahmad Khan Beigi, N., Soleymani, M.R.: Ultra-reliable energy-efficient cooperative scheme
in asynchronous NOMA with correlated sources. IEEE Internet Things J. 6(5), 7849–7863
(2019). https://doi.org/10.1109/JIOT.2019.2911434
41. Wang, Z., Lv, T., Lin, Z., Zeng, J., Mathiopoulos, P.T.: Outage performance of URLLC NOMA
systems with wireless power transfer. IEEE Wirel. Commun. Lett. 9(3), 380–384 (2020). https://
doi.org/10.1109/LWC.2019.2956536
42. Chen, X., Cheng, J., Zhang, Z., Wu, L., Dang, J., Wang, J.: Data-rate driven transmission
strategies for deep learning-based communication systems. IEEE Trans. Commun. 68(4), 2129–
2142 (2020). https://doi.org/10.1109/TCOMM.2020.2968314
43. Shlezinger, N., Farsad, N., Eldar, Y.C., Goldsmith, A.J.: ViterbiNet: a deep learning based
Viterbi algorithm for symbol detection. IEEE Trans. Wirel. Commun. 19(5), 3319–3331 (2020).
https://doi.org/10.1109/TWC.2020.2972352
44. Besser, K.-L., Lin, P.-H., Janda, C.R., Jorswieck, E.A.: Wiretap code design by neural network
autoencoders. IEEE Trans. Inf. Forensic Secur. 15, 3374–3386 (2020). https://doi.org/10.1109/
TIFS.2019.2945619
45. Doğan, S., Tusha, A., Arslan, H.: NOMA with index modulation for uplink URLLC through
grant-free access. IEEE J. Sel. Top. Sig. Process. 13(6), 1249–1257 (2019). https://doi.org/10.
1109/JSTSP.2019.2913981
46. Shafin, R., Liu, L., Chandrasekhar, V., Chen, H., Reed, J., Zhang, J.C.: Artificial intelligence-
enabled cellular networks: a critical path to beyond-5G and 6G. IEEE Wirel. Commun. 27(2),
212–217 (2020). https://doi.org/10.1109/MWC.001.1900323
Heart Rate Variability-Based Mental
Stress Detection Using Deep Learning
Approach

Ramyashri B. Ramteke and Vijaya R. Thool

Abstract Health problems are rising with today's stressful life, as stress promotes car-
diac diseases, depression, and violence, and may provoke suicide. Hence, it is essential
to develop a computer-aided diagnosis system to identify relaxed versus stressed
individuals and classify them correctly. Heart rate variability (HRV) based on
the RR interval is a well-proven clinical and diagnostic tool strongly associated with
the autonomic nervous system (ANS). In this study, a conventional method was
compared with a deep learning-based method. In the conventional method, features
were extracted from various domains and fed to a classifier to
detect stressed states. However, this method uses hand-crafted features, and hence
there is a possibility of missing high-potential features that may be responsible for
maximizing the classifier's generalization performance. This work presents a new
approach motivated by the long short-term memory (LSTM) network in sequence
learning to generate a concrete decision about the signal category. We propose a
deep learning-based Inception-LSTM network to improve performance and reduce
computational cost. Two different stress datasets, viz., self-generated stress data and
Physionet driver stress data, were used to analyze the proposed method's perfor-
mance. The presented Inception-LSTM architecture outperforms existing
literature methods, achieving an accuracy of 93% for the self-generated stress data and
97.19% for the driver stress data.

Keywords Mental stress · RR interval · Heart rate variability · Conventional method · Deep learning

1 Introduction

Humans face various health problems out of which stress is a primary and critical
issue in day-to-day life. Stress can be studied as a physiological and psychological
response of the body, and it occurs due to workload, social media, some personal

R. B. Ramteke (B) · V. R. Thool


Biomedical Instrumentation Lab., Shri Guru Gobind Singhji Institute of Engineering &
Technology, Nanded, Maharashtra, India
e-mail: vrthool@sggs.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_5
problems, etc. According to the Centers for Disease Control (National Institute for
Occupational Safety and Health), the primary cause of life stress is the workplace. They also
reported that 110 million people die every year because of stress (7 persons every
2 s) [18]. Stress is categorized into acute (short-term) and chronic (long-term). Chronic
stress releases the stress hormone cortisol, which may give rise to bad habits such
as poor eating, smoking, and drug-taking. It also increases health risks by reducing
healing power and immunity, raising blood pressure (BP), and affecting the brain, and it
may lead to heart attack, stroke, violence, suicide, and even cancer [18]. Even though acute
stress is treatable, its frequent occurrence may lead to chronic stress, as chronic
stress develops gradually. In general, stress is a prevalent and severe problem in the
twenty-first century, and thus early detection and accurate classification have become a
need of human beings. This motivates us to focus on the detection of stress in everyday
human life.
Motivated by the significance of mental stress and the need to quantify it, numerous studies have been reported in this field. Existing methods for recognizing stress are based on the ECG-derived study of heart rate variability (HRV) [10, 13, 16]. Delaney et al. [3] investigated HRV-based cardiovascular reactivity under short-term psychological stress. The stress dataset was created using a 5-min Stroop Word Color Conflict Test, and the analysis was performed in the time and frequency domains. A significant reduction was found in the standard deviation of the RR intervals and in the high-frequency HRV component, whereas the low-frequency HRV component increased. Reijmerink et al. [12] found that HRV is a reliable index for assessing long-term stress effects during surgery, helping surgeons distinguish operating techniques that induce high stress. In [17], the authors implemented a k-nearest neighbor (KNN) classifier for stress detection with an HRV feature-based transformation algorithm. The algorithm involves feature generation, selection, and dimension reduction for robust feature generation. They used the Physionet driver database to conduct the analysis.
Notable literature is available for stress assessment based on manual feature generation and traditional classification approaches. HRV-based stress detection adopts a considerable signal window length, usually minutes [2]. However, acute mental stress detection requires a decision made within a short window [5]. The performance of conventional HRV-based methods can be constrained by a decreased window length. Intelligent and reliable deep learning, which has emerged swiftly and achieved impressive performance on image classification and sequence learning tasks, may be a potential solution. A few researchers have already formulated ECG-based stress detection problems using deep learning [5, 8, 14]. Rastgoo et al. [11] proposed a multimodal fusion CNN-LSTM driver stress detection network that integrates vehicle data, contextual data, and ECG signal data to enhance the classification results, with a remarkable accuracy of 92.8%. In [4], the authors suggested a method similar to that of [11] to automatically recognize diversified pilot mental states. Recent deep learning studies on stress detection use multimodal input, and the fusion-based models they adopt carry heavy network parameter counts that make them computationally complex.
This work presents an approach to utilize the HRV parameter for the evaluation of
mental stress. The conventional method and deep learning method were employed
Heart Rate Variability-Based Mental Stress Detection Using Deep Learning Approach 53

for the performance comparison. The conventional method offers prominent features that allow the classifier to generate optimal results. The proposed deep learning study combines the Inception module and an LSTM network with a minimum of learnable parameters. To the best of our knowledge, this is the first attempt to use an LSTM-based Inception network for stress recognition. The contributions of the proposed work are as follows: (1) generation of a database of 180 recordings of a stressed category (data collected during the viva-voce engineering examination of students) and 179 recordings of a normal category (data collected during the regular college routine); (2) filters designed for noise removal from the generated ECG data; (3) stress evaluated using HRV parameters in different domains, from which eight highly effective features were picked; (4) SVM and ANN classifiers implemented with hyperparameter values specified by trial and error, such that the most favorable classification rule was developed; (5) an Inception-LSTM architecture designed to enhance the classification results.
The rest of the paper is organized into three major sections. Section 2 describes the proposed methodology, in which the database and methods are explained; Sect. 3 presents the analysis of experimental results and discussions; and Sect. 4 concludes the proposed approach.

2 Proposed Methodology

The proposed method initiates a strategy to detect mental stress and classify whether
the subject is stressed or relaxed. In this study, the conventional method and the
deep learning-based method have been implemented as shown in Fig. 1. Firstly,
an electrocardiogram (ECG) signal database was acquired and preprocessed using
designed filters explained in the preprocessing part of Sect. 2.2. The RR interval data
was extracted from the preprocessed ECG signal. The conventional method follows
the HRV analysis in which features were extracted in various domains and then fed
as an input to the classifier. The deep learning-based method is a fusion of a single Inception module [15], inspired by GoogLeNet for its ability to produce robust performance, and a bidirectional LSTM network [6], which learns the RR interval sequence step-by-step in forward and backward directions to detect the stressed state.

2.1 ECG Dataset

In this paper, two databases were used for stress detection using HRV. The first dataset
is self-generated ECG data at Shri Guru Gobind Singhji Institute of Engineering and
Technology, Nanded, Maharashtra, India. It contains 359 recordings sampled at a
frequency of 360 Hz. The data were acquired from 180 students during the practical
viva-voce semester examination as stressed, and during the regular college routine
as normal. Out of 359 recordings, 180 are stressed, and 179 are normal ones. The

Fig. 1 Block diagram of the proposed method for stress detection

short segment ECG of 3 min has been acquired with the help of surface electrodes
with lead-I configuration using the BIOPAC system (MP150) from the Biomedical
Instrumentation Lab. of the Instrumentation Department.
The publicly available Physionet driver stress ECG database (drivedb) and the
normal sinus rhythm ECG database (nsrdb) are the second dataset used in this analysis
[19]. Both Physionet datasets consist of 18 recordings of 1 h each, so there were only 36 samples in total. Hence, to increase the dataset, we segmented each recording into 3-min segments, after which there are 1920 samples: 1080 stressed (drivedb) samples and 840 relaxed (nsrdb) samples.
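The windowing step above can be sketched as follows; `segment_signal` is a hypothetical helper name (the paper does not show its segmentation code), and a 3-min window at the 360 Hz rate used later corresponds to 64,800 samples.

```python
import numpy as np

def segment_signal(signal, fs=360, window_min=3):
    """Split a long 1-D recording into non-overlapping fixed-length windows;
    trailing samples that do not fill a whole window are discarded."""
    win = fs * 60 * window_min                    # samples per window
    n_segments = len(signal) // win
    return np.asarray(signal[:n_segments * win]).reshape(n_segments, win)

# A 1-h recording at 360 Hz yields 20 non-overlapping 3-min segments.
segments = segment_signal(np.zeros(360 * 3600))
```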

2.2 Heart Rate Variability (HRV) Analysis

HRV signal is obtained from the RR interval, where each RR interval represents a
point in a graph known as a tachogram [1, 2]. R peak detection is the first step of
acquiring the HRV signal.

2.2.1 Preprocessing

The self-generated ECG signals were corrupted by powerline interference at 50 Hz, which can lead to erroneous predictions. To eliminate it, a Butterworth band-stop filter of order 5 with cut-off frequencies of around 47 and 55 Hz was designed to suppress the 50 Hz noise effectively.

The sampling frequencies of the two standard Physionet datasets differed from this rate. Hence, both datasets were resampled at 360 Hz. Afterward, these samples were filtered using the same filter as mentioned above.
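A sketch of this preprocessing with SciPy (an assumption — the paper does not name its tooling for this step): a fifth-order Butterworth band-stop between 47 and 55 Hz, applied with zero-phase filtering so the QRS timing is not shifted. A pure 50 Hz tone should be almost completely suppressed.

```python
import numpy as np
from scipy import signal

FS = 360  # Hz, common sampling rate after resampling

def remove_powerline(ecg, fs=FS, band=(47.0, 55.0), order=5):
    """Zero-phase Butterworth band-stop filter targeting 50 Hz interference."""
    nyq = fs / 2.0
    b, a = signal.butter(order, [band[0] / nyq, band[1] / nyq], btype="bandstop")
    return signal.filtfilt(b, a, ecg)      # forward-backward pass: no phase shift

# Sanity check on a pure 50 Hz tone.
t = np.arange(0, 3, 1 / FS)
filtered = remove_powerline(np.sin(2 * np.pi * 50 * t))
```

Physionet records at other sampling rates can be brought to 360 Hz with `scipy.signal.resample_poly` before this filter is applied.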

2.2.2 R Peak Detection

The Pan-Tompkins algorithm [9] was used to find the R peaks. A bandpass filter with cut-off frequencies of 5 and 11 Hz was used to emphasize the QRS complex. The algorithm then applies differentiation, squaring, and a sliding-window integration with a window size of 150 ms to make all signal data positive and to amplify the high-frequency content carrying the useful information. Two thresholds (a high threshold and a low threshold) are adjusted to make the decision: a peak is labeled as a signal peak if it crosses the high threshold (or the low threshold in the case of a missed peak); otherwise, it is labeled as a noise peak. This algorithm provides good efficiency for R-peak detection. The RR intervals were estimated as the time duration between two adjacent R peaks. Figure 2 shows the filtered ECG and HRV signals.
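A much-simplified sketch of this detection chain (derivative, squaring, 150 ms moving-window integration), with a single fixed threshold standing in for the algorithm's adaptive threshold pair; all names are illustrative, and a real implementation should follow [9] in full.

```python
import numpy as np

FS = 360  # Hz

def detect_r_peaks(ecg, fs=FS, threshold_frac=0.5):
    """Simplified Pan-Tompkins chain: derivative -> squaring -> 150 ms
    moving-window integration -> single fixed threshold (the real algorithm
    adapts two thresholds on the fly)."""
    deriv = np.diff(ecg, prepend=ecg[0])
    squared = deriv ** 2
    win = int(0.150 * fs)                              # 150 ms window
    mwi = np.convolve(squared, np.ones(win) / win, mode="same")
    above = mwi > threshold_frac * mwi.max()
    # One peak per contiguous above-threshold region.
    rising = np.flatnonzero(np.diff(above.astype(int)) == 1) + 1
    peaks = []
    for start in rising:
        stop = start
        while stop + 1 < len(above) and above[stop + 1]:
            stop += 1
        region = np.arange(start, stop + 1)
        peaks.append(int(region[np.argmax(mwi[region])]))
    return np.array(peaks)

def rr_intervals(peaks, fs=FS):
    """RR intervals in seconds from successive R-peak sample indices."""
    return np.diff(peaks) / fs

# Synthetic check: unit impulses every 1 s at 360 Hz.
ecg = np.zeros(FS * 5)
ecg[np.arange(180, 1800, 360)] = 1.0
peaks = detect_r_peaks(ecg)
```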

Fig. 2 a Filtered ECG, b Relaxed tachogram, and c Stressed tachogram



2.3 Feature Extraction

Feature extraction improves the performance of the classifier. Two methods have
been used for feature extraction to recognize mental stress.
I. Conventional Method
II. Deep Learning-based Method.

2.3.1 Conventional Method

In this work, statistical analysis (time domain) [2], Fourier analysis (frequency
domain) [2], and nonlinear analysis (Poincare plot) [7] were carried out to extract
features from the RR intervals. The extracted features are shown in Table 1.
Two classifiers were used, i.e., support vector machine (SVM) and artificial neural
network (ANN), to predict stressed and normal states. In this study, the classification
task has been accomplished using 8 potential features.
SVM: The performance of SVM depends on regularization parameter C and
kernel parameters. The radial basis function (RBF) kernel was used, and it is defined
as

Table 1 Extracted feature values from different domains for the relaxed and stressed state

Variable | Description | Relaxed values (mean ± SD) | Stressed values (mean ± SD)
Time domain features
sdrr (ms) | Standard deviation of the RR intervals | 129 ± 20 | 98 ± 13
rmssd (ms) | RMS value of the differences between adjacent RR intervals | 58 ± 12 | 23 ± 9
Frequency domain features
LF (ms²) | Power in the low-frequency band [0.04–0.15 Hz] | 202 ± 49 | 400 ± 134
HF (ms²) | Power in the high-frequency band [0.15–0.4 Hz] | 328 ± 82 | 200 ± 60
LF/HF | Ratio | 0.6 ± 1.5 | 2 ± 3.2
Poincare plot features
SD1 (ms) | Standard deviation of short-term variability of the RR intervals | 54 ± 8 | 39 ± 6
SD2 (ms) | Standard deviation of long-term variability of the RR intervals | 118 ± 11 | 90 ± 9
SD1/SD2 | Ratio | 0.4 ± 0.2 | 0.2 ± 0.2
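A few of the Table 1 features can be computed directly from an RR series in milliseconds. The SD1/SD2 expressions below use the standard Poincaré identities (SD1² equals half the variance of successive differences, SD2² = 2·var(RR) − SD1²), which is an assumption about how the authors computed them; the frequency-domain powers would need a PSD estimate and are omitted.

```python
import numpy as np

def hrv_features(rr_ms):
    """Time-domain and Poincare features from an RR-interval series (ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)
    sdrr = rr.std(ddof=1)                            # SDRR
    rmssd = np.sqrt(np.mean(diff ** 2))              # RMSSD
    sd1 = np.sqrt(0.5 * diff.var(ddof=1))            # short-term Poincare axis
    sd2 = np.sqrt(max(2 * rr.var(ddof=1) - sd1 ** 2, 0.0))  # long-term axis
    return {"sdrr": sdrr, "rmssd": rmssd,
            "sd1": sd1, "sd2": sd2, "sd1_sd2": sd1 / sd2}

feats = hrv_features([800, 805, 815, 830, 850, 840, 820, 810])
```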
f(z, z′) = exp(−α‖z − z′‖²)    (1)

where α = 1/(2σ) and σ is called a free parameter. Here, C = 2 × 10¹ and σ = 2 (i.e., α = 0.25) were chosen, giving the least CV error of 0.0607.
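Equation (1) with the chosen α can be checked directly; the sketch below is plain NumPy, and the mapping to an off-the-shelf SVM (e.g., scikit-learn's `SVC(kernel="rbf", C=20, gamma=0.25)`, where `gamma` plays the role of α) is an assumption rather than the paper's own implementation.

```python
import numpy as np

def rbf_kernel(z, z_prime, alpha=0.25):
    """RBF kernel of Eq. (1): exp(-alpha * ||z - z'||^2)."""
    z = np.asarray(z, dtype=float)
    z_prime = np.asarray(z_prime, dtype=float)
    return float(np.exp(-alpha * np.sum((z - z_prime) ** 2)))
```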
ANN: A multi-layer perceptron with a single hidden layer was used for the classification of stressed subjects. The input layer consists of 8 neurons, as there are eight features; the hidden layer consists of 50 neurons; and the output layer has 2 neurons, as it is a two-class classification problem. The activation function used was the sigmoid. The model was trained using gradient descent with momentum and an adaptive learning rate; the momentum parameter helps the classifier reach the minimum faster. The network updates the values of the weights and biases according to gradient descent with momentum (m) and an adaptive learning rate (lr), given as

dw = m × dw_prev + lr × m × d(CE)/dw    (2)

The previous change in the weight or bias is denoted by dw_prev. The term CE is the binary cross-entropy used to estimate the performance of the network. The hyperparameter values were chosen such that the network produces the least error. For the proposed work, the learning rate (lr) is 0.001 with a momentum value (m) of 0.89.
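Written out, one step of the update in Eq. (2) looks as follows (note that, as the equation states, the momentum m multiplies both terms); the numbers use the paper's lr = 0.001 and m = 0.89, and the helper name is illustrative.

```python
def momentum_update(dw_prev, grad, lr=0.001, m=0.89):
    """One weight change per Eq. (2): dw = m*dw_prev + lr*m*d(CE)/dw."""
    return m * dw_prev + lr * m * grad

# Two consecutive steps with a constant unit gradient.
step1 = momentum_update(0.0, 1.0)      # lr*m = 0.00089
step2 = momentum_update(step1, 1.0)
```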

2.3.2 Deep Learning-Based Method

In this work, an LSTM network is incorporated with the Inception module, as shown in Fig. 1. LSTM networks have achieved remarkable results in the prediction of time-series signals such as RR interval signals. A single Inception-LSTM module was used, and the detailed structure of the proposed network is shown in Fig. 1 (see the zoomed portion of the Inception-LSTM module, marked by the ellipse). The LSTM acts as a feature extractor with a many-to-many structure. The proposed Inception-LSTM-based approach optimizes the binary cross-entropy loss with the Adam optimizer for classification. If t is the actual output and q is the predicted output, then the loss function is defined as

L(t, q) = −log Pr(t | q) = −(t log(q) + (1 − t) log(1 − q))    (3)

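As a numeric check, the loss of Eq. (3) can be evaluated directly:

```python
import numpy as np

def binary_cross_entropy(t, q, eps=1e-12):
    """Loss of Eq. (3) for a target t in {0, 1} and a prediction q in (0, 1)."""
    q = np.clip(q, eps, 1.0 - eps)        # guard against log(0)
    return float(-(t * np.log(q) + (1 - t) * np.log(1 - q)))
```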
The hidden units of the bidirectional LSTM were set to 5. To prevent overfitting, a dropout layer with a drop rate of 0.4 was used. There are only 39,942 learnable parameters; hence, the network is computationally cheap. The training process was terminated after 25 epochs with a fixed global learning rate of 0.01 and a batch size of 150. The fully connected layer's weight and bias learning rates were kept at 5 times the global learning rate. All experiments were implemented on a system configured with a 2 GB NVIDIA GeForce MX230 GPU using MATLAB R2020a.

3 Results and Discussions

The ultimate objective of this research work is to detect severe stress. To achieve it, each dataset was split in a 65:15:20 ratio for training, validation, and testing of the proposed network.
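The 65:15:20 split can be realized with a shuffled index; the helper below is an illustrative sketch (the paper does not show its splitting code), demonstrated on the 1920 samples of Dataset-II.

```python
import numpy as np

def split_65_15_20(n_samples, seed=0):
    """Return shuffled train/validation/test index arrays in a 65:15:20 ratio."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(0.65 * n_samples)
    n_val = int(0.15 * n_samples)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_65_15_20(1920)
```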
The overall results comprise the classification of the stressed and relaxed conditions of humans. Initially, HRV-based conventional methods were implemented for classification. The time-domain, frequency-domain, and Poincare plot features were computed for the HRV analysis (see Table 1). Afterward, SVM and ANN classifiers were trained on the extracted features of each dataset to detect mental stress. The ANN classifier has better accuracy than the SVM classifier, as shown in Tables 2 and 3, because the momentum parameter used in the ANN helps the classifier reach the minimum faster and the adaptive learning rate helps the optimization converge.
The deep learning-based method was utilized to improve the classification accuracy further. To evaluate the performance of the deep learning-based model, two approaches were examined. In the first approach, only the Inception module was trained, while in the second, the Inception-LSTM network was used, which considerably upgraded the performance. The result analysis shows the substantial enhancement of classification using the deep learning-based method over the conventional method. The overall classification results were intensified by replacing the Inception module with the Inception-LSTM network, which learns sequential features in both forward and backward directions, signifying the importance of learning long-term dependencies in time-series data.

Table 2 Results obtained using different approaches for Dataset-I (self-created ECG data)

Sr. No. | Method | Features | Classifier | Sen. (%) | Spec. (%) | Acc. (%)
1 | Conventional methods | Time-domain, frequency-domain, and Poincare plot features | SVM | 79.41 | 76.92 | 78.33
  |  |  | ANN | 77.94 | 82.69 | 80
2 | Deep learning-based method | – | Inception network | 81.6 | 83.72 | 82.52
  |  |  | Inception-LSTM network | 94.19 | 91.84 | 93

Table 3 Results obtained using different approaches for Dataset-II (Physionet stress ECG data)

Sr. No. | Method | Features | Classifier | Sen. (%) | Spec. (%) | Acc. (%)
1 | Conventional methods | Time-domain, frequency-domain, and Poincare plot features | SVM | 86 | 80.9 | 83.9
  |  |  | ANN | 93 | 85.61 | 89.75
2 | Deep learning-based method | – | Inception network | 93 | 90.59 | 91.47
  |  |  | Inception-LSTM network | 98 | 97.11 | 97.19

Table 4 Comparison of deep learning-based studies for mental stress detection

Sr. No. | Author | Stressor | Approach | Deep learning method | Accuracy
1 | He et al. [5] | Mental arithmetic calculation | Spectral ECG data | 1D CNN | 82.7%
2 | Seo et al. [14] | Mental stress in a workplace | ECG and RESP | 1D CNN-LSTM | 83.9%
3 | Han et al. [4] | Pilot's diversified mental states | ECG, EEG, RESP | 1D CNNs-LSTM | 85.2%
4 | Ali et al. [8] | Firefighter trainees participating in a drill | RR interval data | 1D CNN-LSTM | 88.23%
5 | Rastgoo et al. [11] | Self-created driver stress data | ECG, vehicle and contextual features | 1D CNN-LSTM | 92.8%
6 | This work | Dataset-I: self-created data during practical viva examination; Dataset-II: Physionet driver stress ECG data | RR interval data | Inception-LSTM network | Dataset-I: 93%; Dataset-II: 97.19%

Tables 2 and 3 show the contrast between the conventional methods and the deep learning-based methods, in which the deep learning-based method achieved the highest accuracy of 93% for Dataset-I and 97.19% for Dataset-II on the test set using the Inception-LSTM module. Table 4 compares the results obtained by various approaches proposed by other researchers with the proposed work, indicating that our method performs strongly in feature extraction and classification.

4 Conclusions

Mental stress is a severe problem that decreases performance and potentially increases health risk; hence, mental stress detection becomes essential. The proposed work presented a comparative study of the conventional method and a deep learning-based method for recognizing mental stress using RR interval data. The work was carried out using two types of mental stress data, viz., the self-created academic-level stress data and the standard publicly available Physionet driver stress data. This paper presented a robust Inception-LSTM network as an automated classifier for mental stress detection. The classifier provided accuracies of 93% and 97.19% for Dataset-I and Dataset-II, respectively. Clinicians can use this method to monitor severely stressed patients outside the hospital. This work provides a better sense of RR interval time-series data in HRV-based psychological stress detection problems. In the future, the work will emphasize analyzing the proposed algorithm on larger data collections with a variety of stressors, for recognizing mental stress and its different levels.

References

1. Acharya, U.R., Joseph, K.P., et al.: Heart rate variability: a review. Med. Biol. Eng. Comput. 44(12),
1031–1051 (2006)
2. Camm, A.J., Malik, M., et al.: Heart rate variability: standards of measurement, physiological
interpretation and clinical use. In: Task Force of the European Society of Cardiology and the
North American Society of Pacing and Electrophysiology, pp. 1043–1065 (1996)
3. Delaney, J.P.A., et al.: Effects of short-term psychological stress on the time and frequency
domains of heart-rate variability. Percept. Motor Skills 91(2), 515–524 (2000)
4. Han, S.-Y., Kwak, N.-S., et al.: Classification of pilots’ mental states using a multimodal deep
learning network. Biocybern. Biomed. Eng. 40(1), 324–336 (2020)
5. He, J., Li, K., Liao, X., Zhang, P., Jiang, N.: Real-time detection of acute cognitive stress using a
convolutional neural network from electrocardiographic signal. IEEE Access 7, 42710–42717
(2019)
6. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780
(1997)
7. Hoshi, R.A., Pastre, C.M., et al.: Poincaré plot indexes of heart rate variability: relationships
with other nonlinear variables. Auton. Neurosci. 177(2), 271–274 (2013)

8. Oskooei, A., Chau, S.M., et al.: DeStress: Deep Learning for Unsupervised Identification
of Mental Stress in Firefighters from Heart-rate Variability (HRV) Data. arXiv preprint
arXiv:1911.13213 (2019)
9. Pan, J., Tompkins, W.J.: A real-time QRS detection algorithm. IEEE Trans. Biomed. Eng.
32(3), 230–236 (1985)
10. Ramteke, R., Thool, V.R.: Stress detection of students at academic level from heart rate variability.
In: 2017 International Conference on Energy, Communication, Data Analytics and Soft
Computing (ICECDS), pp. 2154–2157. IEEE (2017)
11. Rastgoo, M.N., et al.: Automatic driver stress level classification using multimodal deep learn-
ing. Expert Syst. Appl. 138, 112793 (2019)
12. Reijmerink, I., et al.: Heart rate variability as a measure of mental stress in surgery: a systematic
review. Int. Arch. Occup. Environ. Health 1–17 (2020)
13. Rigas, G., et al.: Real-time driver’s stress event detection. IEEE Trans. Intell. Transp. Syst.
13(1), 221–234 (2011)
14. Seo, W., Kim, N., Kim, S., et al.: Deep ECG-respiration network (DeepER net) for recognizing
mental stress. Sensors 19(13), 3021 (2019)
15. Szegedy, C., Vanhoucke, et al.: Rethinking the inception architecture for computer vision.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.
2818–2826 (2016)
16. Tanev, G., et al.: Classification of acute stress using linear and non-linear heart rate variability
analysis derived from sternal ECG. In: 2014 36th Annual International Conference of the IEEE
Engineering in Medicine and Biology Society, pp. 3386–3389. IEEE (2014)
17. Wang, J.-S., et al.: A k-nearest-neighbor classifier with heart rate variability feature-based
transformation algorithm for driving stress recognition. Neurocomputing 116, 136–143 (2013)
18. The Science of Stress. https://www.slma.cc/the-science-of-stress/. Last accessed 20 Aug 2020
19. PhysioBank Databases. https://archive.physionet.org/physiobank/database/. Last accessed 15
Nov 2020
Product-Based Market Analysis Using
Deep Learning

Aayush Kumaria, Nilima Kulkarni, and Abhishek Jagtap

Abstract Product market analysis is the study of how the market reacts to a product manufactured by a company. In this paper, a deep learning-based model is created that can understand how a customer feels about a particular product. The dataset used is "fer2013" (a Kaggle dataset), which is widely used for sentiment analysis. The model developed is a self-made model giving a training accuracy of 68.61% and a test accuracy of 65.92%. The self-made model is a 27-layer deep convolutional neural network consisting of eight convolutional layers, three max-pooling layers, and two fully connected layers. The model is created using Keras, a framework built on TensorFlow, a machine learning library. A total of 427,319 parameters are used to develop the proposed model; of these, 426,839 are trainable and 480 are non-trainable.

Keywords Convolutional neural network · Deep learning · Emotion recognition ·


Market analysis · Sentiment analysis

1 Introduction

Product market analysis is the process of assessing the market, or the public, to fully understand what they require or how they react to a particular product. It involves studying demand for economic purposes and getting to know what the end-user wants or requires. Based on a market survey, the developers of a particular product can fix issues and plan a release strategy to succeed. Market analysis also brings the profit margin into the picture, because if the reviews are mostly positive, the product can yield a more significant profit margin.
Every company that plans to launch a product or feature has to go through market analysis. There are mainly two phases of market analysis, as shown in Fig. 1.

A. Kumaria (B) · N. Kulkarni · A. Jagtap
Department of Computer Science and Engineering, MIT School of Engineering, MIT Arts Design and Technology University, Pune 412201, India
N. Kulkarni
e-mail: nilima.kulkarni@mituniversity.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 63
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_6
64 A. Kumaria et al.

Fig. 1 Phases of market analysis

1. Phase 1:
a. The target market is identified, and requirements are understood.
b. Requirements and Features are gathered from the target audience, and a
fair price point is decided based on the complexity.
c. Based on the requirements, a product is developed, keeping the target
audience in mind.
2. Phase 2:
a. The product is sent out to various users in the form of a “testing” phase.
This is known as the Alpha Test and Beta Test.
b. The testers review the product and give feedback to the developers and
notify them about bugs and issues. They also mention what further
improvements could be made.
To date, the second phase has been a strenuous process. Most software has an automatic bug-capturing feature, but it will never give the full story. Along with that, the user experience is something that needs to be expressed to the developers manually. When it comes to products like video games, the testers need to call and inform the developers about bugs and improvements manually. Most Android phones in the market have an automatic bug-capturing feature, but the user interface experience and whether the user is happy with the product cannot be captured.
This software is made with the second phase in mind. Let us take the example of an application: a video game created for mobile devices. In the eyes of the creators and the company, the video game is already perfect. But when the game is sent out to the public, issues may arise, such as unnoticed bugs, a hard-to-understand user interface, and so on. Hence, beta and alpha testing is a necessary process. The application is sent out to a handful of registered users around the world. They use the app, try to find bugs, and report them back to the company
Product-Based Market Analysis Using Deep Learning 65

that works on eliminating them. In some cases, the testers are also supposed to say
how they feel about the application.
Due to the rise of artificial intelligence and deep learning, convolutional neural networks have found a way to recognize human sentiment, i.e., the emotion a person is displaying, from something as simple as an image. While sentiment analysis can be used for market analysis, it has rarely been used for this purpose. Hidden layers in a ConvNet can automatically identify features in a person's face: the earlier layers might identify lines, building up to layers that identify smiles, eyebrows, etc. The hidden layers do all this, so there is no need for another human being to manually deduce a person's emotions.
One of the most widely known datasets for performing sentiment analysis is the "fer2013" dataset, consisting of 35,887 rows, where each row is a unique image depicting a particular emotion. There are three columns in the dataset: one holding the pixels that form the image, another holding the emotion id, and the last one indicating whether the image is part of training or testing. There are 28,709 images for training the model and 7178 images for testing it. The dataset covers seven different emotions: Happy, Sad, Surprised, Angry, Disgust, Neutral, and Fear. It contains images of various faces centered within a 48 × 48 grayscale frame, and on it the proposed model achieves an accuracy of 68.61% on the training data and 65.92% on the testing data.
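The pixels column stores each image as a space-separated string of 48 × 48 = 2304 integer values; decoding one row can be sketched as follows (`row_to_image` is an illustrative helper name):

```python
import numpy as np

def row_to_image(pixel_string, size=48):
    """Parse a fer2013 'pixels' cell (space-separated ints in 0-255)
    into a size x size grayscale array."""
    values = np.array([int(v) for v in pixel_string.split()], dtype=np.uint8)
    return values.reshape(size, size)

# Synthetic row: pixel i holds the value i mod 256.
demo = row_to_image(" ".join(str(i % 256) for i in range(48 * 48)))
```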
In Sect. 2, a discussion on related work is provided. The proposed system is
explained in Sect. 3. Then, in Sect. 4, results are discussed and analyzed. Our proposed
work is concluded in Sect. 5.

2 Related Work

Minaee and Abdolrashidi [1] aimed to identify facial expressions using convolutional neural networks, reaching an accuracy of 70.02%. Pramerdorfer and Kampel [2] tried CNNs for facial expression recognition, identifying facial expressions with deep convolutional neural network architectures such as VGG, ResNet, and Inception. The VGG net of depth 10 achieved an accuracy of 72.7%, the Inception model of depth 16 an accuracy of 71.6%, and the ResNet model of depth 33 an accuracy of 72.4%.
Badjatiya et al. [3] proposed a model that identifies whether a particular tweet is hurtful to any community, using a deep learning model to classify the nature of a specific tweet. Poria et al. [4] presented a multimodal dataset for emotion recognition; the model analyzes a particular person's emotion after training on a dataset created from the TV series Friends. Hence, along with facial expressions, it can use tone and speech to understand the sentiment. A
review paper by Mäntylä et al. [5] discusses the rise of sentiment analysis through the years and how it is, in a way, related to customer feedback, showing that the customer's sentiment is equivalent to how the product performs in the market. Goodfellow et al. [6] took part in a competition where they implemented facial expression

analysis. To create a high-performance model, they implemented a Residual Masking Network and an ensemble of ResMaskingNet with six other CNNs, which acquired accuracies of 74.14% and 76.82%, respectively. Li and Deng [7] discuss the prob-
lems that arise while performing facial expression analysis, including, but not limited to, the lack of training data, which causes overfitting, and variations such as head tilt and illumination. Zhang et al. [8] discuss the problem that arises in facial expression analysis due to occlusion, i.e., the blocking of a specific part of the face, and review the recent advancements made in FEA to tackle this problem.
Rouast et al. [9] review the trend of machines performing human affect recognition, focusing on deep neural networks, to suggest how deep learning models can best serve human affect recognition. Noroozi et al. [10] discuss how, along with speech and expressions, body language can also reveal a person's emotions or sentiments, using body language and posture as inputs for identifying them. Martinez et al. [11] discuss how facial expressions are a crucial feature in communication and describe the various stages in building a facial expression analysis system, including, but not limited to, preprocessing and feature extraction.
Corneanu et al. [12] describe how it is possible to analyze and detect facial expressions using RGB, 3D, thermal, and multimodal facial expression analysis. Sariyanidi et al. [13] review the progress of facial expression analysis models and shed light on fundamental questions; along with that, they break down state-of-the-art solutions and analyze them. Savvides [14] describes the market evaluation process, starting from the project's description through to a target analysis; further, it discusses market performance, i.e., a product's ability to satisfy critical factors under specific criteria.

3 The Proposed Model

The system flow of the proposed software, as shown below in Fig. 2, would have the
following actors:
1. The Reviewer
a. would be able to see the available products and review them.
b. Once a particular user reviews a product, it cannot be reviewed again by
the same user hence keeping the reviews authentic and unbiased.
2. The Admin can
a. add or delete a product.
b. see the reactions of people to their products.
Fig. 2 System flow of the proposed software

The system starts the user's front camera and, using cv2's Haar cascade, tries to find a face in every frame. This face is passed into the deep learning model, where the facial expression is recognized. The system repeats this process in a loop until the user decides to stop the review, after which the most frequently shown facial expression is extracted and its category is updated in the database. The Admin can see this category alongside the respective product to know how users feel about it.
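Picking the most frequently shown expression over a review session is a simple majority vote; a minimal sketch (the helper name is illustrative):

```python
from collections import Counter

def dominant_emotion(frame_predictions):
    """Return the label predicted most often across all frames of a session."""
    return Counter(frame_predictions).most_common(1)[0][0]
```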
The extracted face is preprocessed before being passed into the proposed Convolutional Neural Network model: it is converted to grayscale and then resized to 48 × 48 pixels. Pixel values range from 0 to 255, so to reduce the computation, each pixel is divided by 32 and stored as a “float32” data type. The same preprocessing is applied to each image in the database. Since the data the proposed Convolutional Neural Network model was trained on and the data passed in for prediction are preprocessed in the same way, the margin of error decreases significantly. In addition, a manual data augmentation step flips each image vertically, providing more data for training and testing. This augmentation was found to increase the accuracy by approximately 6%, from 62 to 68%. Of the various optimization techniques available, such as Gradient Descent, RMSProp, and Momentum, the Adaptive Moment Estimation (Adam) optimizer was chosen, since it combines the RMSProp and Momentum update rules into one high-performance optimizer.
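A minimal sketch of this preprocessing step, using Pillow and NumPy (both listed among the software's libraries); the function names are illustrative assumptions, since the paper does not publish its code:

```python
import numpy as np
from PIL import Image

def preprocess_face(face_rgb):
    """Grayscale, resize to 48x48, scale pixels by 1/32, return a float32 array."""
    img = Image.fromarray(face_rgb).convert("L").resize((48, 48))
    return np.asarray(img, dtype=np.float32) / 32.0

def augment(pixels):
    """Manual augmentation described in the paper: a vertically flipped copy."""
    return np.flipud(pixels)
```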
The architecture of the proposed Convolutional Neural Network Model is
illustrated in Fig. 3.
The proposed model was trained on ASUS Rog-Strix G notebook with the
following specifications:
• Intel i7 9th Generation Processor,
• 8 GB DDR4 RAM,

Fig. 3 Architecture of the proposed convolutional neural network model

• 512 GB SSD,
• NVIDIA GeForce GTX 1650 4 GB Graphics Card, and
• Windows 10 Home Operating System.
The following libraries are used to run the proposed software:
• numpy (1.16.4),
• matplotlib (3.1.3),
• pandas (1.0.1),
• opencv_python (3.4.2.17),
• Keras (2.3.1), and
• Pillow (8.0.1).

4 Results and Discussions

The proposed model was trained for 200 epochs with a batch size of 64. The loss of each epoch was plotted to assess the performance of the ConvNet. The initial loss was approximately 1.8, but within 25–30 epochs it dropped to approximately 1.0. The training loss continued to fall to approximately 0.8, whereas the test loss remained steady at approximately 1.0. The graph is illustrated in Fig. 4.
The classification report for the proposed model is shown in Fig. 5 (Table 1).

Fig. 4 Graph demonstrating the loss in training and testing

Fig. 5 Classification report for the proposed model

Table 1 Comparison between the proposed model and other well-known models

Model                                            Accuracy   Extra training data
Ensemble ResMaskingNet with six other CNNs [6]   76.82%     ✔
Residual Masking Network [6]                     74.14%     ✔
VGG [2]                                          72.7%      ✘
Res-Net [2]                                      72.4%      ✘
Inception [2]                                    71.6%      ✘
DeepEmotion [1]                                  70.02%     ✘
Proposed model                                   68.61%     ✘

The proposed model can successfully predict all seven facial emotion categories, including Happy, Sad, Surprised, and Neutral. Figure 6 shows a snapshot of live-feed prediction in which the “Happy” emotion is successfully detected.

Fig. 6 A happy face

5 Conclusions

In this paper, a deep-learning-based model is created to understand customers’ reviews about a product or service. A convolutional neural network is used for model building. The proposed system achieved a training accuracy of 68.61% and a test accuracy of 65.92%. With the rise of deep learning and artificial intelligence, it can safely be said that research work like this will be among the trendsetters of the future. The proposed method eliminates the need for videos and calls to collect users’ personal views about an upcoming product, making product review a stress-free and straightforward task. A margin of error does exist, but the system reduces both the workload and the human error involved in manually interpreting a person’s view of a product.

References

1. Minaee, S., Abdolrashidi, A.: Deep-emotion: facial expression recognition using attentional
convolutional network (2019)
2. Pramerdorfer, C., Kampel, M.: Facial expression recognition using convolutional neural
networks: state of the art (2016)
3. Badjatiya, P., Gupta, S., Gupta, M., Varma, V.: Deep learning for hate speech detection in
tweets (2017). https://doi.org/10.1145/3041021.3054223
4. Poria, S., Hazarika, D., Majumder, N., Naik, G., Cambria, E., Mihalcea, R.: MELD: A multi-
modal multi-party dataset for emotion recognition in conversations, pp. 527–536 (2019). https://
doi.org/10.18653/v1/P19-1050
5. Mäntylä, M., Graziotin, D., Kuutila, M.: The evolution of sentiment analysis—a review of
research topics, venues, and top cited papers. Comput. Sci. Rev. 27 (2016). https://doi.org/10.
1016/j.cosrev.2017.10.002
6. Goodfellow, I., Erhan, D., Carrier, P., Courville, A., Mirza, M., Hamner, B., Cukierski, W., Tang,
Y., Thaler, D., Lee, D.-H., Zhou, Y., Ramaiah, C., Feng, F., Li, R., Wang, X., Athanasakis, D.,
Shawe-Taylor, J., Milakov, M., Park, J., Bengio, Y.: Challenges in representation learning: a
report on three machine learning contests. Neural Netw. 64 (2013). https://doi.org/10.1016/j.neunet.2014.09.005
7. Li, S., Deng, W.: Deep facial expression recognition: a survey. IEEE Trans. Affect. Comput.
(2020). https://doi.org/10.1109/TAFFC.2020.2981446
8. Zhang, L., Verma, B., Tjondronegoro, D., Chandran, V.: Facial expression analysis under partial
occlusion: a survey. ACM Comput. Sur. 51 (2018). https://doi.org/10.1145/3158369
9. Rouast, P.V., Adam, M., Chiong, R.: Deep learning for human affect recognition: insights
and new developments. IEEE Trans. Affect. Comput. https://doi.org/10.1109/TAFFC.2018.
2890471
10. Noroozi, F., Kaminska, D., Corneanu, C., Sapinski, T., Escalera, S., Anbarjafari, G.: Survey
on emotional body gesture recognition. IEEE Trans. Affect. Comput. (2018). https://doi.org/
10.1109/TAFFC.2018.2874986
11. Martinez, B., Valstar, M.F., Jiang, B., Pantic, M.: Automatic analysis of facial actions: a survey.
IEEE Trans. Affect. Comput. 10(3), 325–347 (2019). https://doi.org/10.1109/TAFFC.2017.273
1763
12. Corneanu, C.A., Simón, M.O., Cohn, J.F., Guerrero, S.E.: Survey on RGB, 3D, thermal, and
multimodal approaches for facial expression recognition: history, trends, and affect-related
applications. IEEE Trans. Patt. Anal. Mach. Intell. 38(8), 1548–1568 (2016). https://doi.org/
10.1109/TPAMI.2016.2515606
13. Sariyanidi, E., Gunes, H., Cavallaro, A.: Automatic analysis of facial affect: a survey of regis-
tration, representation, and recognition. IEEE Trans. Patt. Anal. Mach. Intell. 37(6), 1113–1133
(2015). https://doi.org/10.1109/TPAMI.2014.2366127
14. Savvides, S.C.: Marketing analysis in project evaluation (May 1, 1990). Harvard Institute
for International Development, Development Discussion Paper No. 341. Available at SSRN.
https://ssrn.com/abstract=266721. https://doi.org/10.2139/ssrn.266721
Driver Drowsiness Detection Using Deep
Learning

Ajinkya Rajkar, Nilima Kulkarni, and Aniket Raut

Abstract The drowsiness of a person driving a vehicle is a primary cause of accidents all over the world. Due to lack of sleep and tiredness, fatigue and drowsiness are common among drivers and often lead to road accidents. Alerting the driver ahead of time is the best way to avoid such accidents. There are numerous techniques to detect drowsiness. In this paper, we put forward a deep learning-based approach to detect driver drowsiness. We use convolutional neural networks, a class of deep learning models, applied to the face and eye regions. We use the Closed Eyes in the Wild (CEW) dataset and the Yawning Detection Dataset (YawDD), and achieve an average accuracy of 96%.

Keywords Driver drowsiness detection · Convolutional neural network · Deep learning · Eye detection · Image processing

1 Introduction

The number of vehicles on the road is increasing day by day; road accidents have become common in most parts of the country and are a leading cause of death. The person behind the steering wheel is responsible for road traffic safety: the driver is responsible for himself as well as for the passengers in the vehicle. Drowsiness is a human trait that many people ignore when it comes to their safety, but if it is not recognized and acted upon, it can endanger the driver and the passengers and lead to a fatal accident. Driver drowsiness is therefore a pressing issue that must be addressed to improve road traffic safety. Driver drowsiness detection is an essential component of modern driver monitoring systems because too

A. Rajkar (B) · N. Kulkarni · A. Raut
Department of Computer Science and Engineering, MIT School of Engineering, MIT Arts Design and Technology University, Pune 412201, India
N. Kulkarni
e-mail: nilima.kulkarni@mituniversity.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 73
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_7
74 A. Rajkar et al.

Table 1 Literature survey

Sr. No.  Authors                Method                                                    Result/Observations
1        Gwak et al. [1]        Features of camera-based drowsiness detection divided     An accuracy of 65.2% was evaluated on
                                into handcrafted features or features learned             pretended data
                                automatically using CNNs
2        Kepesiova et al. [2]   A convolutional neural network (CNN), a convolutional     Average accuracy of 84.41% was achieved
                                control gate-based recurrent neural network
                                (Conv GRNN), and a voting layer were used
3        You et al. [3]         Deep cascaded convolutional neural network                Average accuracy of 94.80% was achieved
4        Mehta et al. [4]       Eye aspect ratio and eye closure ratio were used          A random forest classifier was used to get
                                                                                          an accuracy of 84%
5        Sathasivam et al. [5]  Eye aspect ratio (EAR) and SVM classifier                 Accuracy close to 94% was achieved

many traffic accidents are happening worldwide due to drivers’ drowsiness. There
are various attempts in the literature to spot the drowsiness of the driver. We have
studied some approaches, as given in Table 1.
The rest of this paper is organized as follows: related work is discussed above in Sect. 1; the proposed system is explained in Sect. 2; results are discussed and analyzed in Sect. 3; and conclusions are drawn in Sect. 4.

2 Proposed Approach

This section details the proposed approach to detect driver drowsiness, which works on two parameters. The process starts by capturing the live video stream from the camera, which is processed and sent to the model to predict drowsiness. Using the OpenCV library, each frame of the video stream is cropped to the eye region and the face. Each frame is checked to determine whether the eyes are open or closed. If the eyes remain closed for longer than a specific time set in the system, drowsiness is detected and the system alerts the driver and the passengers with an alarm. The same process is followed to detect whether the driver is yawning. The subsequent sections detail the working of each module. Figure 1 shows the flow of the proposed approach.
Driver Drowsiness Detection Using Deep Learning 75

Fig. 1 Flowchart of the proposed methodology



2.1 Datasets

There are some standard datasets available for drowsiness detection. In this paper,
the following datasets are used.
a. YawDD VIDEO DATASET [6]

The Yawning Detection Dataset is a video dataset recorded by a camera mounted on a car’s dashboard. The dataset contains male and female drivers, some wearing glasses and others without. Some examples from the Yawning Detection Dataset are given in Fig. 2.
b. Closed Eyes In The Wild [7]

This dataset contains 2423 subjects: 1192 with closed eyes and 1231 with open eyes. The open-eye images are taken from the Labeled Faces in the Wild dataset. Some examples from the CEW dataset are given in Fig. 3.

Fig. 2 Yawning detection dataset (Ref. YawDD VIDEO DATASET [6])



Fig. 3 Closed eyes in the wild (Ref. Closed Eyes In The Wild [7])

2.2 Pre-processing Stage

In this paper, the driver’s face in real-time video is detected using the OpenCV library’s Haar cascade classifier. OpenCV is an open-source library primarily used for computer vision; it is also used for image processing and machine learning, supports many programming languages such as Python, Java, and C++, and is used to identify faces, objects, and more in images. OpenCV’s built-in Haar feature-based cascade classifiers are used to detect the face and the eye region in the input. A cascade is pre-trained on many positive and negative images, after which it can detect the corresponding objects in new images; this is a machine learning-based approach. Detecting the face and eye regions is a crucial step in determining drowsiness and is shown in Fig. 4.
To use the CNN model on the YawDD dataset, its videos first need to be converted into images and resized to 24 × 24 resolution. The face was then located using the OpenCV library, and the image was converted to grayscale. Each image was labeled according to whether the mouth state is open (“1”) or closed (“0”) and saved into a CSV file. The CEW dataset was already available as cropped grayscale eye images at 24 × 24 resolution; the open eye state is labeled “1,” the closed eye state is labeled “0,” and the result is again saved into a CSV file. The data is split into 80% for training and 20% for validation during model training. Each pixel is then divided by 32 and saved as a float32 value.
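A minimal sketch of how one such labeled CSV-style row could be produced from an already-cropped 24 × 24 grayscale frame; the function name and column layout are illustrative assumptions, not the authors' code:

```python
import numpy as np

def encode_example(gray_24x24, label):
    """Scale a 24x24 grayscale image by 1/32, cast to float32, and flatten
    it into one CSV-style row: [label, p0, p1, ..., p575]."""
    assert gray_24x24.shape == (24, 24), "expects an already-resized 24x24 crop"
    pixels = (gray_24x24.astype(np.float32) / 32.0).ravel()
    return [float(label)] + pixels.tolist()
```

A row produced this way matches the paper's preprocessing (24 × 24 grayscale, pixel values divided by 32, float32, label 0 or 1 in front).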

Fig. 4 Detecting face and eyes

2.3 Drowsiness Detection

The system starts the user-facing camera and, using OpenCV’s Haar cascade, detects the user’s face and eye regions frame by frame. These frames are then forwarded to the trained CNN models, which output whether the eyes are open or closed and whether the driver is yawning. If the eyes remain closed for the given time threshold, or if the user yawns repeatedly within the given time threshold, the system raises a drowsiness alert. The time threshold is a dynamic field and can be set accordingly. The proposed deep learning model is a convolutional neural network. After many trials, the proposed model was settled on four Conv2D layers and four max-pooling layers, followed by a flatten layer and two dense layers; four dropout layers were used to prevent overfitting.
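The alerting logic described above can be sketched as a small state machine over per-frame CNN outputs; the frame-count thresholds stand in for the paper's configurable time thresholds and are illustrative assumptions:

```python
class DrowsinessMonitor:
    """Raise an alert when eyes stay closed, or yawning persists, past a frame threshold."""

    def __init__(self, closed_eye_frames=15, yawn_frames=10):
        self.closed_eye_frames = closed_eye_frames
        self.yawn_frames = yawn_frames
        self.closed_run = 0  # consecutive frames with eyes closed
        self.yawn_run = 0    # consecutive frames with mouth open (yawning)

    def update(self, eyes_closed, yawning):
        """Feed one frame's CNN outputs; return True if an alert should fire."""
        self.closed_run = self.closed_run + 1 if eyes_closed else 0
        self.yawn_run = self.yawn_run + 1 if yawning else 0
        return (self.closed_run >= self.closed_eye_frames
                or self.yawn_run >= self.yawn_frames)
```

Calling `update` once per frame keeps the counters in sync with the video stream, and reopening the eyes (or closing the mouth) resets the corresponding counter.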

3 Results and Discussions

Two datasets (Yawning Detection Dataset and CEW) are used for training the CNN
model and testing purposes. With the help of OpenCV’s Haar cascade algorithm,
the face region and eye region are determined. In the paper, since two features, i.e.,
eyes and yawn, were to be trained, two CNN models were used. Adam’s optimization

Table 2 CEW dataset loss


Epoch Train loss Val loss
1 0.7164 0.5964
10 0.0425 0.0680
20 0.0258 0.1050
30 0.0054 0.0698
40 0.0089 0.0784
50 0.0068 0.0963

Table 3 YawDD dataset loss


Epoch Train loss Val loss
1 1.6290 0.6918
10 0.1591 0.4074
20 0.1208 0.1770
30 0.0734 0.1960
40 0.0848 0.2468
50 0.0468 0.3550

algorithm was used to train the proposed model, since it outperforms the plain stochastic gradient descent procedure. After several trials, a model with three convolutional layers was selected, which yielded the best accuracy. To improve performance, the original photos from the YawDD dataset were resized to 24 × 24 resolution. Categorical cross-entropy loss, also called softmax loss, was used. Table 2 shows the training and validation losses for the Closed Eyes in the Wild dataset, and Table 3 shows the training and validation losses for the Yawning Detection Dataset. Figure 5a, b show the loss values for epochs 1, 10, 20, 30, 40, and 50. The results indicate that the loss decreases with each epoch, showing that the proposed model was trained successfully. The training, validation, and average accuracies are shown in Table 4, and a comparison of average accuracies is given in Table 5.
Driving after alcohol consumption is another serious problem among drivers, and notable research has been reported on detecting and predicting the effects of alcoholism early [9–11]; the proposed work can be extended in this direction. Further, Internet of Things (IoT)-based systems have become popular due to their location-independent services [12–14], and the reported work may be extended in this direction as well: an IoT-enabled system could provide an early alarm to the traffic control unit to avoid accidents.

Fig. 5 a CEW dataset loss. b YawDD dataset loss



Table 4 Average accuracy

                      CEW dataset (%)   YawDD dataset (%)   Average accuracy (%)
Train accuracy        98.54             97.59               98.06
Validation accuracy   97.26             93.93               95.59
Proposed approach     97.90             95.76               96.82

Table 5 Comparison table

                    Year   Dataset            Average accuracy (%)
Wang et al. [3]     2019   CEW and YawDD      98.42
Savas et al. [8]    2019   YawDD and NthuDD   98.81
Proposed approach   2020   CEW and YawDD      96.82

4 Conclusions

This paper aims to detect driver drowsiness using a deep learning approach. CNN models are utilized to detect driver drowsiness in real time. OpenCV’s Haar cascade algorithm is used to locate the driver’s face and eye regions, and the system is then trained with the proposed convolutional neural network to detect drowsiness. The real-time performance is excellent: the driver drowsiness system works successfully, with an average accuracy of 96%. In future work, performance can be improved with a larger dataset, a face recognition feature can be added to prevent vehicle theft, and the system can be converted into a mobile application for more convenient usage.

References

1. Gwak, J., Hirao, A., Shino, M.: An investigation of early detection of driver drowsiness using
ensemble machine learning based on hybrid sensing. Appl. Sci. 10(8), 2890 (2020). https://
doi.org/10.3390/app10082890
2. Kepesiova, Z., Ciganek, J., Kozak, S.: Driver drowsiness detection using convolutional neural
networks. In: 2020 Cybernetics & Informatics (K&I) (2020). https://doi.org/10.1109/ki48306.
2020.9039851
3. You, F., Li, X., Gong, Y., Wang, H., Li, H.: A real-time driving drowsiness detection algorithm
with individual differences consideration. IEEE Access 7, 179396–179408 (2019). https://doi.
org/10.1109/access.2019.2958667

4. Mehta, S., Dadhich, S., Gumber, S., Bhatt, A.J.: Real-time driver drowsiness detection system
using eye aspect ratio and eye closure ratio. SSRN Electron. J. (2019). https://doi.org/10.2139/
ssrn.3356401
5. Sathasivam, S., Mahamad, A.K., Saon, S., Sidek, A., Som, M.M., Ameen, H.A.: Drowsi-
ness detection system using eye aspect ratio technique. In 2020 IEEE Student Conference on
Research and Development (SCOReD) (2020). https://doi.org/10.1109/scored50371.2020.925
1035
6. Abtahi, S., Omidyeganeh, M., Shirmohammadi, S., Hariri, B.: YawDD: yawning detection
dataset. IEEE Dataport (2020). https://doi.org/10.21227/e1qm-hb90.
7. Song, F., Tan, X., Liu, X., Chen, S.: Eyes closeness detection from still images with multi-scale
histograms of principal oriented gradients. Pattern Recogn. (2014).
8. Savas, B.K., Becerikli, Y.: Real time driver fatigue detection system based on multi-task
ConNN. IEEE Access 8, 12491–12498 (2020). https://doi.org/10.1109/access.2020.2963960
9. Bavkar, S., Iyer, B., Deosarkar, S.: Rapid screening of alcoholism: an EEG based optimal
channel selection approach. IEEE Access 7, 99670–99682 (2019). https://doi.org/10.1109/
ACCESS.2019.2927267
10. Bavkar, S., Iyer, B., Deosarkar, S.: BPSO based method for screening of alcoholism. In: Kumar,
A., Mozar, S. (eds.) ICCCE 2019. Lecture Notes in Electrical Engineering, vol. 570, pp. 47–53.
Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-8715-9_6
11. Bavkar, S., Iyer, B., Deosarkar, S.: Optimal EEG channels selection for alcoholism screening
using EMD domain statistical features and harmony search algorithm. Biocybern. Biomed.
Eng. 41(1), 83–96 (2021)
12. Deshpande, P., Iyer, B.: Research directions in the internet of every things (IoET). In: 2017
International Conference on Computing, Communication and Automation (ICCCA), Greater
Noida, pp. 1353–1357 (2017). https://doi.org/10.1109/CCAA.2017.8230008
13. Deshmukh, D., Iyer, B.: Design of IPSec virtual private network for remote access. In: 2017
International Conference on Computing, Communication and Automation (ICCCA), Greater
Noida, pp. 716–719 (2017). https://doi.org/10.1109/CCAA.2017.8229894
14. Iyer, B., Patil, N.: IoT enabled tracking and monitoring sensor for military applications. Int. J.
Syst. Assur. Eng. Manag. 9, 1294–1301 (2018). https://doi.org/10.1007/s13198-018-0727-8
Emotion Detection from Social Media
Using Machine Learning Techniques:
A Survey

Vijaya Ahire and Swati Borse

Abstract The work carried out in this paper is to overview and compare various sentiment analysis methodologies and approaches in detail, with a focus on Sentiment Emotion Detection (SED), and to discuss the limitations of existing work and future directions for sentiment analysis methodologies applied to SED. The main goal of sentiment analysis for market prediction is to recognize the customer’s opinion about the available products; it can pave the way for improvement and prevent future defects and flaws. Tools for identifying and classifying the opinion expressed in a piece of text, audio, or video indicate whether the creator’s mood toward a specific issue, thread, or item is positive, negative, or neutral. Human emotions are not limited to being positive or negative; they span more categories, such as happiness, sadness, joy, disgust, surprise, depression, frustration, anger, fear, confidence, trust, anticipation, shame, kindness, love, friendship, faith, and wonder. Analyzing people’s comments and emotions is essential for countries, businesses, and individuals, which motivates research on sentiment analysis for emotion detection.

Keywords Sentiment Analysis (SA) · Opinion mining · Emotion Detection (ED) · Social network · Machine learning · Social media · Text-Based Emotion (TEM)

V. Ahire (B)
RCPET’s Institute of Management Research and Development, Shirpur, India
S. Borse
SSVPSs. Late Karmveer Dr. P. R. Ghogrey Science College, Dhule, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 83
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_8
84 V. Ahire and S. Borse

1 Introduction

Every minute of the day, a tremendous amount of data is generated by social media networks. Social media platforms like YouTube, Facebook, Twitter, LinkedIn, WhatsApp, and Reddit, as well as product websites, are available online around the globe, and people spend a lot of time on them sharing their thoughts, views, and opinions [1, 2]. When people share their thoughts through social media, they express their emotions directly or indirectly. The process of analyzing these expressions is called sentiment analysis.

1.1 Social Network Analysis (SNA)

Sentiment Analysis (SA) is part of social network analysis (SNA). Through social media, connections between various known or unknown entities are formed in social networks: links with family members, groups, colleagues, and peers, and sometimes connections with users for commercial purposes [3, 4]. The interconnected individuals in a social media network are called groups or communities. People connect on social media platforms through attributes such as relationships and similarity of interests or habits. People tend to believe those whom they trust most and follow them for solutions to real-world problems. The digital world generates a large amount of data through these networks, which is highly important for understanding users’ thoughts accurately. This paper provides a detailed overview and survey of emotion detection from social media using machine learning techniques. Section 1 contains the introduction to SNA. Section 2 contains the background of SNA, covering data collection/acquisition, data cleaning, clustering or community detection, sentiment analysis levels, and some sentiment analysis approaches. Section 3 contains related work on sentiment detection, with a table that reviews emotion detection in sentiment analysis and briefly elaborates on which dataset each researcher used and which approach they implemented to achieve their result. Section 4 contains a discussion and future research directions, followed by conclusions in Sect. 5.

2 The Background

Sentiment analysis (or opinion mining) is an essential part of personal and corporate decision-making. Sentiment analysis is an area of AI and opinion mining in which stated opinions about a specific entity are categorized; demand for SA has grown since 2008. It aims to determine a speaker’s or writer’s attitude concerning some topic, or the overall contextual polarity of a document. Sentiment analysis is widely applied to comments, review sites, tweets and retweets, blogs, discussion groups, or other
Emotion Detection from Social Media Using Machine Learning … 85

spaces where people comment on their social network of choice. Analyzing people’s comments is essential for producers, countries, and other entities in order to offer people or consumers the best services. Hence, it is crucial to recognize the emotion expressed by these users toward the products or services. The information collected from customers or individuals encodes their feelings regarding their purchases. This analysis is essential in any organization’s decision-making process (what people say, how they say it, and what they mean) to ensure growth. The following steps are used for sentiment analysis.

2.1 Data Collection/Acquisition

Data collection is an essential aspect of sentiment analysis. Social media networks like Facebook, Twitter, LinkedIn, YouTube, and Reddit provide comments, likes, posts, and shares among family members, friends, and relatives within the same network. This data has many features and may be noisy or clean, and of similar or mixed type. Researchers have proposed many methodologies for collecting correct web data from the Internet [5]; three main approaches are used to obtain it:
• Network Traffic Analysis: used by private groups due to security concerns.
• Crawling: a widely used mechanism to collect data from social media; various APIs exist for social media platforms.
• Ad hoc Applications: generally used to get specific information about account holders in order to track a user’s operational performance.

2.2 Data Cleaning

Data cleaning turns the dataset into an informative format. Data cleansing means searching for and modifying or erasing corrupt or inaccurate records from the collected data, and classifying the data. Data cleaning can be done easily using modern machine learning techniques.

2.3 Clustering or Community Detection

After collecting data from different social media networks, the next step is to form groups; this is also known as community detection. It can be based on many features, such as likes, dislikes, behavior, culture, and emotion. The process of finding interrelated groups in networks is called community detection. Many algorithms have been developed for community detection in social media network data; they are classified into approaches based on clustering, graph partitioning, genetic algorithms, label propagation, etc. [6].
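As an illustration of the label-propagation family mentioned above, the following sketch detects communities in a small social graph using NetworkX; the toy graph and user names are invented for the example:

```python
import networkx as nx
from networkx.algorithms.community import label_propagation_communities

# Toy social network: two friend groups joined by a single acquaintance edge.
g = nx.Graph()
g.add_edges_from([("ana", "bo"), ("bo", "cy"), ("ana", "cy"),    # group 1
                  ("dee", "ed"), ("ed", "fay"), ("dee", "fay"),  # group 2
                  ("cy", "dee")])                                # weak tie

# Each detected community is a set of users; together they partition the network.
communities = list(label_propagation_communities(g))
print(communities)
```

On a graph like this, label propagation typically recovers the two triangles as separate communities, though the algorithm is not guaranteed to split them.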

2.4 Sentiment Analysis Level

Sentiment polarity can be classified at three levels of analysis, depending on the given data [7, 8].
Document level: This level of sentiment classification considers the opinion of the entire document and predicts the document’s view as either positive or negative.
Sentence level: This level considers the opinion of a single sentence and predicts the sentence’s view as positive, negative, or neutral. The sentence can be subjective with a positive, negative, or neutral state.
Aspect level: This level classifies sentiment with respect to the specific aspects of entities. Instead of looking at language constructs, the aspect level directly looks at the opinion itself, conveying the views or emotions expressed about each aspect.

2.5 Some Sentiment Analysis Approaches

Sentiment analysis tasks involve different strategies, which are classified mainly into three approaches:
Lexicon-based approach: An unsupervised learning mechanism. It works on the polarity of the sentence, measuring positive, negative, or neutral forms. It has two basic variants: the dictionary-based and the corpus-based approach.
Machine learning approach: Classified into supervised and unsupervised procedures. Supervised learning techniques predict the polarity of the target or test data based on a training dataset with a finite set of classes, such as positive and negative, whereas unsupervised learning techniques are used when no labeled training dataset is available. There are four paradigms of machine learning:
a. Supervised.
b. Unsupervised.
c. Semi-supervised.
d. Reinforcement learning.
Hybrid approach: A combination of the above two strategies.
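A minimal lexicon-based scorer of the kind described above can be sketched in a few lines; the tiny word lists are illustrative only, whereas real systems use curated dictionary- or corpus-derived lexicons:

```python
# Tiny illustrative lexicon; real systems use curated dictionaries or corpora.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "sad"}

def lexicon_polarity(text):
    """Return 'positive', 'negative', or 'neutral' from word-level polarity counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(lexicon_polarity("I love this great product"))  # positive
```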

3 Related Work on Sentiment Detection

An emotion is a way of expressing feelings about the environment, situations, moods, or relationships with others. Emotion plays an essential role in every individual’s life in making decisions and acting. Everyone expresses feelings differently, for example through speaking, writing, drawing, dancing, or shouting, and these emotions are essential to their presence [6]. Emotions include happiness, sadness, joy, disgust, surprise, depression, frustration, anger, and fear. Researchers have represented different models of emotion; there are five main emotion models.
represented other models of emotion. Emotions have five main emotion models.
Emotion main models:
1. Discrete—Ekman
2. Dimensional—Russell
3. Componential—OCC
4. Circuit—Ledoux
5. Appraisal model—Smith and Lazarus
Researchers have tried to detect emotion from various modes such as voice/speech, images, and text, but detecting emotion from written/textual form is challenging to analyze. Hence, Emotion Detection (ED) is a significant area for researchers in sentiment analysis (SA); attention to ED has grown since 2010. Much research addresses emotion detection, but text-based ED receives less attention than other modes because of short texts, newly coined words from the younger generation, grammatical mistakes, and emojis. Recently, research available for “ED on text” in IEEE Xplore was only 11.16%, and for the Scopus database it was 10.89% [4, 6, 9, 10].
The following table shows a review of emotion detection in sentiment analysis.
It briefly elaborates on which dataset the researcher used and which approach they
implemented to achieve the result.

4 Discussion and Future Research Directions

Much work has been performed using machine learning techniques, but it has limitations such as disregarding words’ contextual meaning, a high number of misclassifications, a limited number of classes, and weak extraction of context information. To overcome these lacunae, some researchers recommend using deep learning techniques for improved performance.

Table 1 focuses on the details of existing work for emotion detection [11–23], along with limitations and future work.
Some common challenges in emotion sentiment analysis are the following:
1. Text in messages is frequently noisy and syntactically incorrect.
2. In many languages, a single word may have several meanings; hence, polarity depends on context.
3. The vocabulary is not limited: new words arise from named entities as well as user errors and deliberate misspellings.
4. Some sentences may be mocking (sarcastic), which can produce inappropriate results.
5. Sentiments are often ambiguous because multiple opinions about them are mentioned.
Many researchers noted that sentiment analysis using emotion detection has mostly been
carried out with lexicon-based and machine learning techniques of limited scope.
Further, EEG analysis and Big-data approaches can also be used for
emotion detection [24].
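As a concrete illustration of the lexicon-based techniques mentioned above, the following minimal sketch scores a sentence against a toy emotion lexicon. The lexicon entries and the `detect_emotion` helper are purely illustrative assumptions; real systems use curated resources such as the NRC Emotion Lexicon.

```python
# Minimal lexicon-based emotion detection sketch (illustrative only).
EMOTION_LEXICON = {
    "happy": "joy", "great": "joy", "love": "joy",
    "sad": "sadness", "cry": "sadness",
    "angry": "anger", "hate": "anger",
    "scared": "fear", "afraid": "fear",
}

def detect_emotion(text):
    """Return the emotion with the most lexicon hits, or 'neutral'."""
    counts = {}
    for token in text.lower().split():
        word = token.strip(".,!?")          # crude handling of noisy punctuation
        emotion = EMOTION_LEXICON.get(word)
        if emotion:
            counts[emotion] = counts.get(emotion, 0) + 1
    return max(counts, key=counts.get) if counts else "neutral"

print(detect_emotion("I love this movie, it made me so happy!"))  # joy
print(detect_emotion("The weather report was ordinary."))         # neutral
```

Sketches like this also make the challenges above tangible: sarcasm, misspellings, and short texts defeat a plain lexicon lookup, which is why researchers move to machine learning and deep learning methods.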

5 Conclusions

A comprehensive study and discussion on sentiment analysis and allied emotion
detection are reported in this paper. Researchers have worked mainly on structured data
available on the Web or social media; hence, research can continue with unstructured
data available on the Web. The above study suggests that significantly less work has been
performed on Text-Based Emotion (TEM) detection, leaving ample opportunity to work in this
area. It is also observed that various domains like NLP, stock prediction, politics,
agriculture, etc., [14, 22, 23] are among the most demanding research areas in today's
scenario. The above survey covered many machine learning and deep learning
methodologies, which are summarized in Table 1. In this century, the world is digitalized and
produces a massive amount of data; hence, many researchers suggest using deep
learning techniques to improve performance in sentiment analysis.
Emotion Detection from Social Media Using Machine Learning … 89

Table 1 Related literature review with comparison and future work

| Ref  | Dataset | Approach | Outcome | Limitation | Future work |
|------|---------|----------|---------|------------|-------------|
| [11] | Movie, Unwomen, iPhone 8, Agriculture India tweets, Kollywood cinema Twitter | Deep learning (CNN) | Accuracy on different domains: movie 92.09, agriculture 91.19, electronic products 84.48, social 84.41; comparison with CNN and RNN | The DLSARS recommender system framework is evaluated only on documents and sentences | Multidomain, multi-feature approach to get better performance |
| [12] | Emo-Dis-HI | Machine learning (CNN and Bi-LSTM) | The proposed methodology uses transfer learning, achieving an F1-score of 0.53 | Dataset is small in size; hence they faced the problem of under-fitting | Different deep learning algorithms to improve performance |
| [13] | WhatsApp chats | Deep learning, machine learning | Chat Analyzer gave 72.9% accuracy against a set of pre-classified data | Focuses on emojis and text data for 6 types of emotions | Extend work on the rise in hate speech, cyber-bullying, and heckling by adding more emotions |
| [14] | Twitter data | Machine learning (NB and KNN) | NB outperformed K-NN | Cannot work with unsupervised learning | ML algorithms for a hybrid approach with multi-modal forms of data on various domains |
| [15] | ISEAR | Hybrid approach | Accuracy of 66.18% | The "Neutral" emotion was not classified, which reduced the accuracy of the emotion keyword | With the help of deep learning, the Neutral emotion can be classified correctly with high accuracy |
| [16] | SemEval 2015 dataset for Twitter | Machine learning | Subword-LSTM surpasses other methods by a good margin, with an F-score of 65.8% | Works on Hindi-English with a limited dataset | Other language combinations with a deep learning approach |
| [17] | ISEAR dataset | Machine learning (Multinomial NB, SVM, DTC, and KNN) | Multinomial NB gave good results | Complex emotions cannot be predicted accurately | Complex emotions can be handled after adding features or rule-based approaches |
| [18] | AWS dataset | Deep learning (CNN and Bi-LSTM) | Long short-term memory analysis improved on the basic neural network model | In future, other axioms of sentiment can be considered to understand emotion more accurately for the specific domain | To get higher performance, apply convolution to deep learning algorithms |
| [19] | ISEAR | Machine learning (SVM) | Shows enhancement in performance compared with the baseline | Discounted relation between features | Improve performance with a hybrid approach |
| [20] | EmotionLines | Machine learning | F1-score of 0.815 for Friends and 0.885 for EmotionPush | Amount of data insufficient | To get higher performance, use a hybrid approach |
| [21] | SemEval | Machine learning | LSTM F1-score is 0.5861 for four classes | Only a restricted number of groups produced | Using Bi-LSTMs can improve the performance |
| [22] | Tencent Weibo (2013) | Lexicon based | Got 84.3% accuracy | Works on Chinese blogs only | Improve accuracy with machine learning |
| [23] | Facebook multilingual texts | Machine learning | Hindi obtained an F1-score of 0.4521, and English got 0.5520 | Contains non-English text, which reduces the system's performance | Using a multilingual dataset and a hybrid approach can improve performance |

References

1. Chakraborty, K., Bhattacharyya, S., Bag, R.: A Survey of sentiment analysis from social media
data. IEEE Trans. Comput. Soc. Syst. 7(2), 450–464
2. Pokhun, L., Yasser Chuttur, M.: Emotions in texts. Bull. Soc. Inf. Theory Appl. 4(2), 59–69
(2020)

3. Leskovec, J.: Social media analytics: tracking, modeling, and predicting the flow of information
through networks. In: Proceedings of 20th International Conference Companion World Wide
Web, pp. 277–278 (2011)
4. Acheampong, F., Wenyu, C., Nunoo-Mensah, H.: Text-Based Emotion Detection: Advances,
Challenges, and Opportunities (2020)
5. Canali, C., Colajanni, M., Lancellotti, R.: Data acquisition in social networks: issues and
proposals. In: Proceedings of International Workshop Services Open Sources (SOS), pp. 1–12.
ISSN 0167-739X (2011)
6. Flake, G.W., Lawrence, S., Giles, C.L.: Efficient identification of Web communities. In: KDD,
pp. 150–160 (2000)
7. Ray, P., Chakrabarti, A.: A mixed approach of deep learning method and rule-based method to
improve aspect level sentiment analysis. Appl. Comput. Inf. ahead-of-print No. ahead-of-print
(2020)
8. Jain, A., Pal Nandi, B., Gupta, C., et al. Senti-NSetPSO: large-sized document-level sentiment
analysis using Neutrosophic Set and particle swarm optimization. Soft Comput. 24, 3–15
9. Gunes, H., Schuller, B., Pantic, M., Cowie, R.: Emotion representation, analysis and synthesis
in continuous space: a survey. In: Paper Presented at: Proceedings of the Face and Gesture,
pp. 827–834. IEEE (2011)
10. Brusco, M., Doreian, P., Steinley, D.: Deterministic block modelling of signed and two mode
networks: a tutorial with software and psychological examples. Br. J. Math. Stat. Psychol.
(2019)
11. Pradeepth, N.: Deep Learning Based Sentiment Analysis for Recommender System, Annals.
Comput. Sci. Ser., 16th Tome 2nd Fasc-2018, 155–160 (2018)
12. Ahmad, Z., Jindal, R., Ekbal, A., Bhattacharyya, P.: Borrow from rich cousin: transfer learning
for emotion detection using cross-lingual embedding. Expert Syst. Appl. 139, 112851 (2020)
13. Dahiya, S., Mohta, A., Jain, A.: Text Classification based Behavioural Analysis of WhatsApp
Chats, pp. 717–724 (2020). https://doi.org/10.1109/ICCES48766.2020.9137911
14. Suhasini, M., Srinivasu, B.: Emotion detection framework for twitter data using supervised
classifiers. Springer, New York, NY, pp. 565–576 (2020)
15. Seal, D., Roy, U.K., Basak, R.: Sentence-level emotion detection from text based on semantic
rules. In: Paper Presented at: Proceedings of the Information and Communication Technology
for Sustainable Development, pp. 423—430. Springer (2020)
16. Joshi, A.: Sentiment Analysis and Opinion Mining from Noisy Social Media Content.
International Institute of Information Technology, Hyderabad (2020)
17. Nasir, A.F.A., Nee, E.S., Choong, C.S., Ghani, A.S.A., Abdul Majeed, A.P.P. Adam, A., Furqan,
M.: Text-based emotion prediction system using machine learning approach. In: The 6th Inter-
national Conference on Software Engineering & Computer Systems; IOP Conference Series:
Materials Science and Engineering 769, 012022 (2020)
18. Goud, G., Garg, B.: Sentiment analysis using long short-term memory model in deep learning.
In: 2nd EAI International Conference on Big Data Innovation for Sustainable Cognitive
Computing, pp. 25–33 (2019)
19. Singh, L., Singh, S., Aggarwal, N.: Two-stage text feature selection method for human emotion
recognition. In: Paper Presented at: Proceedings of the 2nd International Conference on
Communication, Computing and Networking, pp. 531–538; Springer (2019)
20. Huang, Y.-H., Lee, S.-R., Ma, M.-Y., Chen, Y.-H., Yu, Y.-W., Chen, Y.-S.: EmotionX-IDEA: emotion
BERT–an affectional model for conversation. arXiv preprint arXiv:1908.06264 (2019)
21. Chatterjee, A., Narahari, K.N., Joshi, M., Agrawal, P.: SemEval-2019 task 3: EmoCon-
text contextual emotion detection in text. In: Paper Presented at: Proceedings of the 13th
International Workshop on Semantic Evaluation, pp. 39–48 (2019)
22. Ma, J., Xu, W., Sun, Y.H., Turban, E., Wang, S., Liu, O.: An ontology-based text-mining method
to cluster proposals for research project selection. IEEE Trans. Syst. Man Cybern. Part A Syst.
Hum. 42, 784–790 (2012)
23. Malte, A., Ratadiya, P.: Multilingual cyber abuse detection using advanced transformer archi-
tecture. In: Paper Presented at: Proceedings of the TENCON 2019–2019 IEEE Region 10
Conference, pp. 784–789. IEEE (2019)

24. Kamthekar, S., Deshpande, P., Iyer, B.: Cognitive analytics for rapid stress relief in humans
using EEG based analysis of Tratak Sadhana (Meditation): a Bigdata approach. Int. J. Inf. Retr.
Res. (IJIRR) 10(4), 1–20 (2020)
Deep Age Estimation Using Sclera
Images in Multiple Environment

Sumanta Das, Ishita De Ghosh, and Abir Chattopadhyay

Abstract Human age estimation from images using machine learning techniques
is a challenging task. Due to the physical aging process, the color and texture of the sclera, a
protective outer layer of the human eye, change. In this work, we present an
exploratory study of the effectiveness of using the sclera region of eye images for
age estimation. It employs a modified form of the deep neural network model VGG-16.
The model is trained and tested on the SBVPI dataset, in which the images are acquired
with high-end cameras. The model is also tested using images acquired by a mobile
camera fitted with a macro lens. The work gives a best mean-absolute-error of 0.06,
and the encouraging results lead us to conclude that sclera images can be used as an
effective modality for human age estimation. It is a pioneering work in the sense that
the idea of using the sclera for this purpose has not been explored before.

Keywords Age estimation · Sclera images · Deep learning · VGG-16 model · SBVPI dataset

1 Introduction

Automatic estimation of age from the physical characteristics of a human being is an
interesting research topic. It has applications in many fields, such as electronic customer
relationship management (ECRM), security control and monitoring, including
biometrics systems and child sexual exploitation material (CSEM), and areas of med-
ical sciences. Standard techniques for automatic age estimation employ images of
fingerprints, retina, iris, or face. Fingerprint images are used for authentication and
verification purposes for many years, and nowadays, they are used for age estimation
also [16, 17]. But they are not suitable for monitoring systems like CSEM because
continuous monitoring is not possible for the system user in real-time.

S. Das (B) · A. Chattopadhyay


University of Engineering and Management, Newtown, Kolkata 700156, India
I. De Ghosh
Barrackpore Rastraguru Surendranath College, Kolkata 700120, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 93
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_9
94 S. Das et al.

Moreover, image acquisition needs special devices like a fingerprint scanner.


Retina images show considerable changes due to age. In particular, Bruch’s mem-
brane present in the retina bears marks of the aging process [4]. However, high-
resolution retina images are acquired under constrained imaging conditions using
special imaging devices by expert technicians. There is also evidence of research for
using iris as a modality for age determination [1]. Usually, iris images are acquired
with infra-red cameras from proximity. As image acquisition becomes non-trivial and
costly, retina or iris images are not an obvious choice for age estimation purposes.
The face of a human being is a good indicator of biological age, and face images are
used in state-of-the-art age estimation research with considerable accuracy [2, 11]. In
the field of security control and monitoring, especially in CSEM, their use has become
common [3]. Employing face images for age prediction is advantageous for three
reasons: firstly, image acquisition is possible with ordinary imaging devices
like hand-held or mobile cameras; secondly, it is possible in visible light; and
lastly, it requires no direct contact with the subject. The disadvantages are as follows: (i) faces
may vary in skin color or tone, naturally or artificially, as in the case of cosmetics use;
(ii) faces may contain beards or wigs; and (iii) in some religious practices and in the
present pandemic situation, face masks cover a large portion of the face. In addition,
full facial image capture is not always possible, as often happens in CSEM.
All these together make age prediction from face images difficult.
Observation suggests that eyes constitute a prominent region of face images.
Eyes are usually uncovered and thus remain visible except for sunglass wearers.
Mobile cameras or standard high definition cameras used in our daily life can easily
capture eye images in visible light without physical contact for real-time processing
and monitoring. The sclera is the white portion visible in front-view eye images;
it remains visible for various gaze directions. Sclera color changes with age and
health [5, 15]. This motivates us to explore the possibility of using sclera images to
estimate human age. To our knowledge, this paper is the first to formally address the
use of sclera images in human age estimation.
In this work, a novel deep learning method is proposed to estimate a person's age
using sclera images. It uses a modified version of the deep neural network model
VGG-16. The ocular dataset SBVPI is used for our work. In addition, experiments
are done on images captured by mobile cameras; for that, we captured images in
visible light using a macro lens fitted on a mobile handset camera. Collectively, 206
subjects in the age range of 4–68 years are used. Examples of multi-gaze eye images
from both datasets are given in Fig. 1. Though this is the first reported work on
age estimation from sclera images, we get the best mean-absolute-error of 0.06.
Such encouraging results strengthen the idea that sclera can be used as an effective
modality for age estimation. The paper is organized as follows. A brief literature
review is given in Sect. 2. The proposed method is provided in Sect. 3. Experimental
results are presented in Sect. 4, and concluding remarks are given in the last section.

(a) 15 (b) 38 (c) 55 (d) 75 (e) 4 (f) 31 (g) 50 (h) 68

Fig. 1 Examples of multi-gaze (front, up, left, and right) eye images. Images a–d are taken from
the SBVPI dataset. Images e–h are captured using a mobile handset camera. The age of subjects is
annotated below each image

2 Literature Review

A brief review of the current scenario of research on age estimation is given now.
In 2012, Guo proposed age estimation and sex classification using colored images
acquired with cameras installed in public places [8]. The focus is on face-based
features, and numerous feature extraction techniques are described. In 2015, Jana
et al. proposed a method using skin wrinkle features extracted from face images,
experimenting with images from Asian subjects [10]. In 2017, Lin et al. studied
age estimation for the same subject at multiple ages and proposed a
dimension reduction scheme for face images employing neural networks [12]. In the
same year, Hu et al. used the Kullback–Leibler divergence to estimate the age difference
between subjects and also contributed a large face image dataset [9]. In 2020, Agbo-Ajala
and Viriri showed that age prediction and gender classification using face images work
better with a deep convolutional neural network [2]. In the same year, the DeepUAge
model was proposed by Anda et al. to assist in combating child sexual exploitation
material (CSEM), where the aim is to classify child age groups for restricting access
to specific contents [3]. Apart from face images, other methods were developed in
the curvelet domain for extracting features from fingerprints for age estimation [16].
Eye tracking was proved effective in the case of toddlers [6].
In recent years, sclera images obtained in visible light have been used in biometric
recognition systems, giving rise to two essential classes of work: sclera
segmentation and sclera recognition. Worldwide competitions are organized to explore
the effectiveness of sclera segmentation [7, 19]. The next step after sclera segmentation
is sclera recognition, for use in biometric recognition systems [14]. Since sclera
segmentation is a separate research topic, we used the segmented sclera
images for our work. Novel datasets named MASD, MSD, and SBVPI were proposed
for these works. The SBVPI dataset is provided with corresponding age meta-data for
the subjects, which we used for this work. To our knowledge, this paper is the first to present
age prediction using the advantageous characteristics of sclera images.

3 Proposed Method

The basic assumption of our work is that ‘human sclera color changes with age’. There
is medical evidence for this, reported in [4, 5]. Sclera stiffness also changes
with age because of variation in the underlying choroidal thickness [20]. This is also
discussed in detail from an image-processing point of view in [15]. Recently, sclera
images have been explored for suitability in biometric recognition systems, giving rise
to two essential research fields, namely sclera segmentation and recognition [14,
19]. Since sclera segmentation is a separate problem, the ground-truth images provided
in the SBVPI dataset are used for this work. For images acquired by a mobile camera,
sclera segmentation is done by us [7]. The RGB colored sclera region obtained from the
sclera region due to age, as evidenced by images acquired by mobile cameras from
four subjects.
At first, square-shaped patches of size 300 × 300 are segmented from the sclera
area. The patches are then fed into a deep neural network to estimate the
age of the subject. The network produces a single floating-point number as output,
which essentially indicates the age. The model of the deep network is similar to VGG-16,
with variations [18]. The network has four convolution layers, with three max-pooling
layers in between, to ensure that colors and patterns all over the patch
yield sufficient variation in features and interconnections. The final convolution
layer is flattened and fed to a network of three dense layers. The final dense layer
has only one node with a 'sigmoid' activation function to ensure that the output is a
single positive floating-point value. The 'Adam' optimizer is used with a learning rate of

Fig. 2 Ground-truth image superimposed on original RGB image gives sclera-segmented RGB
image. A sclera patch is then sliced from it. On the right-hand side, four patches obtained from
sclera images (acquired by a mobile camera) belonging to four subjects of different ages are shown

Fig. 3 Layers of deep model used for age prediction

0.00001. Mean-absolute-error (MAE) is used to determine the error or loss at each step
during the training process. Details of the layers used for the model are tabulated and
pictorially presented in Fig. 3.
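To make the described architecture concrete, the following is a minimal Keras sketch consistent with the description above: four convolution layers, three max-pooling layers in between, a flatten step, three dense layers ending in a single sigmoid node, Adam with a learning rate of 0.00001, and MAE loss. The filter counts and dense-layer widths are illustrative assumptions, since they are not listed in the text.

```python
# Hypothetical sketch of the described VGG-16-like model.
# Filter counts and dense widths are assumptions, not the authors' values.
from tensorflow.keras import layers, models, optimizers

def build_age_model(input_shape=(300, 300, 3)):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.Flatten(),                       # final conv output flattened
        layers.Dense(256, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # single positive output (age)
    ])
    model.compile(optimizer=optimizers.Adam(learning_rate=1e-5),
                  loss="mean_absolute_error")   # MAE as training loss
    return model
```

With a sigmoid output, the network emits a value in (0, 1), so ages would be normalized to that range for training and rescaled afterward.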

4 Results and Discussion

Experiments are performed using a system with an Intel i7 processor, 12 GB of
RAM, a 512 GB SSD, and an NVIDIA GTX 1660 Ti GPU with 6 GB of dedicated memory. Python
3.6 with TensorFlow 2.3 libraries is used for execution in the said environment
with the PyCharm IDE. SBVPI is a sclera dataset originally used for sclera recognition
work [13, 14]. This dataset was used for our work since its images are annotated
with the subjects' age information. It contains 1856 images covering both eyes of 55
individuals, which doubles the number of subjects to 110. Each image has one of the gaze
directions: left, right, front, or upward. Eye images in RGB format are obtained with
the least distortion under a constrained environment using a high-resolution camera
with a clear focus on the sclera. The images are cropped to eliminate the background
and contain only peri-ocular and ocular regions consisting of skin, eyelashes, iris,
and sclera. The approximate resolution of each image is 3500 × 2000. The dataset
contains binary ground-truth images for the sclera region, used in our work to segment
the sclera from the rest of the image.
To train the model with a greater variety of sclera-types and more subjects, we
acquire eye images using a mobile camera from 48 individuals whose ages range
from 3 to 68 years. Since images are obtained for each person’s two eyes (left and
right), the number of subjects doubles to 96. Each eye was captured approximately
10 times. Hence, 950 images are acquired in total.
We used a rear mobile camera of 13 MP with a 10× macro lens mounted on it; such
setups come built into the latest mobile devices. The lens allows us to capture

eye images within 2 to 10 cm of the lens with a clear focus. We use Samsung Galaxy J4
and Yunicorn 5530 mobiles for image acquisition. The images are captured
under varying lighting conditions and at varying distances from the lens; most are in good
focus, clearly showing the sclera vessels, while some are slightly blurred or distorted due to
motion. Every individual is asked to look to the left, right, upward, and toward the
camera lens so that images of multiple gazes are captured. The image-capturing device was
slightly rotated or tilted to get a variety of images. So, essentially, the image dataset
made by us has variations in gaze direction, position, capturing distance, illumination,
blur, etc., due to motion and focus change. It also contains sclera-segmented ground-truth
images prepared by us.
We used 70% of the images from both datasets for training; the remaining images are used
for testing. To reduce over-fitting, we randomly select two 300 × 300 patches
instead of a single patch from each image, which doubles the training and test
data size. The model is trained with a batch size of 16. Training converges within
approximately 300 epochs. The average execution time for a batch is
approximately 50 ms.
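The masking and random patch-sampling steps above can be sketched as follows. Here `image` and `mask` stand in for an RGB eye image and its binary sclera ground truth, and `sample_sclera_patches` is a hypothetical helper, a simplified sketch rather than the authors' exact procedure.

```python
import numpy as np

def sample_sclera_patches(image, mask, size=300, n_patches=2, rng=None):
    """Mask the eye image to its sclera region and slice random
    size x size patches, mirroring the augmentation step described above."""
    rng = rng or np.random.default_rng()
    sclera = image * mask[..., None]       # zero out non-sclera pixels
    h, w = mask.shape
    patches = []
    for _ in range(n_patches):
        y = rng.integers(0, h - size + 1)  # random top-left corner
        x = rng.integers(0, w - size + 1)
        patches.append(sclera[y:y + size, x:x + size])
    return patches

# Usage on a dummy image at roughly the dataset's 3500 x 2000 resolution:
img = np.ones((2000, 3500, 3), dtype=np.uint8)
msk = np.zeros((2000, 3500), dtype=np.uint8)
msk[500:900, 1000:1600] = 1                # fake sclera region
p1, p2 = sample_sclera_patches(img, msk)
print(p1.shape)  # (300, 300, 3)
```

A real pipeline would additionally bias the sampled coordinates toward the masked sclera region rather than drawing them uniformly over the whole frame.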
Mean-absolute-error (MAE) is calculated by finding the mean of the absolute
differences between the predicted age and the given age. Using all images of the SBVPI
dataset and the mobile handset images, we obtain MAEs of approximately ±12 and ±9
years, respectively. The fact is further elaborated for all images in Fig. 4 for
the SBVPI dataset and Fig. 5 for the mobile handset images. The graphs show
a high number of predicted images with low MAE, which ascertains the model's
usability for prediction. A few images have high MAE, which increases the overall
MAE. To further analyze the variation of subject ages used in training, Fig. 6 shows
the number of images versus subject age. The figure shows a very low number
of subject images for the children and elderly groups compared with middle-aged subjects. This
has led to a higher prediction error for children and the elderly; the graph in
Fig. 7 depicts this scenario. The higher number of training images for middle-aged

Fig. 4 Graph depicting the distribution of error (MAE) versus its frequency for SBVPI dataset

Fig. 5 Graph depicting the distribution of error (MAE) versus its frequency for mobile handset
images

Fig. 6 Graph depicting the distribution of subject ages used in training the model

subjects gives good results for the middle-aged subjects. Table 1 gives the results
separately for each dataset, along with imaging constraints and sex of subjects. As
the number of subjects is significantly small above age 50 and below 16, we
experimented by removing the eye images of these 9 subjects for further evaluation.
This reduces the overall MAE to ±8 and ±6 years for SBVPI and mobile handset
images, respectively, represented by MAE-R in the table. We can conclude
that more subjects in each age category are essential for unbiased training across all
age groups. We observed no significant impact from the multiple mobile handsets used
for image acquisition or from sex. Further, results on our mobile handset images
performed better than those on the standard dataset prepared in a constrained environment.
Our work suggests that the sclera is an essential feature for predicting age, alongside
the widely used face images.
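The error metric used throughout this section is just the mean of absolute differences between predicted and ground-truth ages. A one-line sketch, using illustrative ages rather than values from the paper:

```python
def mean_absolute_error(predicted, actual):
    """MAE: mean of absolute differences between predictions and labels."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Illustrative ages only (not values from the paper):
pred = [25.0, 40.0, 33.0, 60.0]
true = [22.0, 45.0, 30.0, 58.0]
print(mean_absolute_error(pred, true))  # 3.25
```

Note that when the network's sigmoid output is evaluated directly, the MAE is in normalized units (e.g., the reported 0.1091), and it is rescaled to years (e.g., ±12) for interpretation.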

Fig. 7 Graph depicting the error for subjects with variation in age

Table 1 Results obtained separately for each dataset and sex

| Dataset name | Camera used | Image quality | Imaging environment | Cross environment | Sex | MAE | MAE-R |
|--------------|----------------|------|---------------|-----|--------|--------|--------|
| SBVPI | Standard | High | Constrained | No | Male | 0.1091 | 0.0856 |
| SBVPI | Standard | High | Constrained | No | Female | 0.1254 | 0.0768 |
| In-house | Mobile handset | Low | Unconstrained | Yes | Male | 0.1140 | 0.0683 |
| In-house | Mobile handset | Low | Unconstrained | Yes | Female | 0.0978 | 0.0655 |

5 Conclusion

Estimation of age from the physical characteristics of a human being is a common
but challenging task. It is often required in the fields of medical and forensic
sciences. While face images are widely used for automatic age estimation, the current
work shows that the sclera region of eye images can be used effectively for this purpose.
Deep learning techniques, in which training is done with a variety of images,
are utilized for the work. The advantages of the sclera for age estimation are manifold: (i)
images can be acquired with ordinary imaging devices like standard cameras or
mobile cameras, (ii) they are acquired without direct contact with subjects, (iii) no
special lighting condition is required during image acquisition, and (iv) it is useful
in continuous real-time monitoring, often required in combating CSEM. The work
can be advanced further by including more subjects from children and older adults,
thereby improving the trained model for the entire population.

Acknowledgements We express our gratitude to Dr. Matej Vitek of the University of Ljubljana
and his team members for allowing us to use the SBVPI dataset.

References

1. Abbasi, A., Khan, M.: Iris-pupil thickness based method for determining age group of a person.
Int. Arab J. Inf. Technol. 13(6) (2016)
2. Agbo-Ajala, O., Viriri, S.: Deeply learned classifiers for age and gender predictions of unfiltered
faces. Sci. World J. (2020). https://doi.org/10.1155/2020/1289408
3. Anda, F., Le-Khac, N.A., Scanlon, M.: DeepUAge: improving underage age estimation accu-
racy to aid CSEM investigation. Forensic Sci. Int. Digit. Investig. 32, (2020). https://doi.org/
10.1016/j.fsidi.2020.300921
4. Beattie, J.R., Pawlak, A.M., McGarvey, J.J., Stitt, A.W.: Sclera as a surrogate marker for deter-
mining AGE-modifications in Bruch’s membrane using a Raman spectroscopy-based index of
aging. Investig. Ophthalmol. Vis. Sci. 52(3), 1593–1598 (2011). https://doi.org/10.1167/iovs.
10-6554
5. Coudrillier, B., Tian, J., Alexander, S., Myers, K.M., Quigley, H.A., Nguyen, T.D.: Biomechan-
ics of the human posterior sclera: age and glaucoma-related changes measured using inflation
testing. Investig. Ophthalmol. Vis. Sci. 53(4), 1714–1728 (2012)
6. Dalrymple, K.A., Jiang, M., Zhao, Q., Elison, J.T.: Machine learning accurately classifies age
of toddlers based on eye tracking. Sci. Rep. 9, 6255 (2019). https://doi.org/10.1038/s41598-
019-42764-z
7. Das, S., Ghosh, I.D., Chattopadhyay, A.: An efficient deep learning strategy: its application
in sclera segmentation. In: 2020 IEEE Applied Signal Processing Conference (ASPCON), pp.
232–236. Kolkata (2020)
8. Guo, G.: Human age estimation and sex classification. In: Video Analytics for Business Intel-
ligence, vol. 409, pp. 101–131. Springer, Berlin, Heidelberg (2012)
9. Hu, Z., Wen, Y., Wang, J., Wang, M., Hong, R., Yan, S.: Facial age estimation with age
difference. IEEE Trans. Image Process. 26(7), 3087–3097 (2017). https://doi.org/10.1109/TIP.
2016.2633868
10. Jana, R., Datta, D., Saha, R.: Age estimation from face image using wrinkle features. Procedia
Comput. Sci. 46, 1754–1761 (2015). https://doi.org/10.1016/j.procs.2015.02.126
11. Levi, G., Hassner, T.: Age and gender classification using convolutional neural networks. In:
IEEE Conference on Computer Vision and Pattern recognition (CVPR) Workshop on AMFG.
Boston (2015)
12. Lin, C.T., Li, D.L., Lai, J.H., Han, M.F., Chang, J.Y.: Automatic age estimation system for face
images. Int. J. Adv. Robot. Syst. 9(5), 626–635 (2017). https://doi.org/10.5772/52862
13. Rot, P., Emeršič, Ž., Štruc, V., Peer, P.: Deep multi-class eye segmentation for ocular biometrics.
In: 2018 IEEE International Work Conference on Bioinspired Intelligence (IWOBI), pp.
1–8 (2018). https://doi.org/10.1109/IWOBI.2018.8464133
14. Rot, P., Vitek, M., Grm, K., Emeršič, Ž., Peer, P., Štruc, V.: Deep sclera segmentation and recog-
nition. In: A. Uhl, C. Busch, S. Marcel, R. Veldhuis (eds.) Handbook of Vascular Biometrics,
pp. 395–432. Springer (2020). https://doi.org/10.1007/978-3-030-27731-4_13
15. Russell, R., Sweda, J.R., Porcheron, A., Mauger, E.: Sclera color changes with age and is a cue
for perceiving age, health, and beauty. Psychol. Aging 29, 626–635 (2014). https://doi.org/10.
1037/a0036142
16. Saxena, A.K., Chaurasiya, V.K.: Multi-resolution texture analysis for fingerprint based age-
group estimation. Multimed. Tools Appl. 76(5), 3087–3097 (2017). https://doi.org/10.1007/
s11042-017-4516-1
17. Saxena, A.K., Sharma, S., Chaurasiya, V.K.: Neural network based human age-group estimation
in curvelet domain. In: Eleventh International Multi-Conference on Information Processing-
2015 (IMCIP-2015), pp. 781 –789 (2015)
18. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recog-
nition (2014). arXiv:1409.1556

19. Vitek, M., Das, A., Pourcenoux, Y., Missler, A., Paumier, C., Das, S., Ghosh, I.D., et al.:
SSBC 2020: Sclera segmentation benchmarking competition in the mobile environment. In:
International Joint Conference on Biometrics (IJCB 2020) (2020)
20. Zhou, H., Dai, Y., Shi, Y., Russell, J.F., Lyu, C., Noorikolouri, J., Feuer, W.J., Chu, Z., Zhang,
Q., de Sisternes, L., Durbin, M.K., Gregori, G., Rosenfeld, P.J., Wang, R.K.: Age-related
changes in choroidal thickness and the volume of vessels and stroma using swept-source OCT
and fully automated algorithms. Ophthalmol. Retin. 4(2), 204–215 (2020). https://doi.org/10.
1016/j.oret.2019.09.012
Data Handling Approach for Machine
Learning in Wireless Communication:
A Survey

Niranjan S. Kulkarni, Sanjay L. Nalbalwar, and Anil B. Nandgaonkar

Abstract Recently, the wireless communication network has evolved with different
types of communication architectures and protocols. Operation and management of
such heterogeneous networks with huge demand are manually difficult for network
engineers. In the recent past, Machine Learning (ML) has proven its capability by
significantly improving performance in various fields such as natural language
processing and medical diagnostics. Using ML in wireless communication is also not
a simple task, as we have to track the user’s Quality of Experience (QoE) on the one
hand and network resource management on the other, with continuously changing
wireless scenarios. Identifying channel variability with proper decision-making is
the crucial task of ML in Wireless Communication Network. In this paper, based on
a systematic review of the current use of machine learning techniques in WCN, a
set of crucial design limitations are identified, and a novel computationally efficient
data exchange approach is proposed.

Keywords Data exchange · Decision logic · Machine learning · Network


resources · Quality of Experience (QoE) · Wireless communication network

1 Introduction

Recent technological development has increased the demand for wireless
communication: 5.7 billion people, i.e., 71% of the global population, are expected
to shift to the wireless domain. One reason for this shift is the increase in mobile and
fixed broadband data rates (43.9 Mbps and 110.4 Mbps, respectively). Every year,
new applications such as Machine-to-Machine (M2M) communication and
smartphones enter the market, and recently the Internet of Things (IoT) has been
reshaping the whole market scenario with increased wireless capabilities [1].
Considering this situation, the next-generation network must address more advanced
requirements in terms of computational complexity, data rate, and capacity, captured
by Computation-Oriented Communications (COC), Contextually Agile eMBB
Communications (CAeC), and Event-Defined uRLLC (EDuRLLC) [2].

N. S. Kulkarni (B) · S. L. Nalbalwar · A. B. Nandgaonkar
Department of Electronics and Telecommunication Engineering, Dr. BATU, Lonere, Raigad
402103, Maharashtra, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_10
Considering this requirement, the next-generation network must consider three
design principles: network management via software, flexible management of
network at a local and global level, and intelligent resource management. In the
recent past, efficient management of network devices has attracted wide interest in
the Network Function Virtualization (NFV) and Software Defined Network (SDN),
where different entities are managed by software. With deep insights in these fields,
control and data plane can be separated, and our first goal to manage the network
with the help of software can be fulfilled.
In a next-generation network, most of the applications are mobile, and to satisfy
our second and third design principles, real-time condition and network variability
at local and global levels need to be understood and based on this, an appropriate
decision needs to be taken. Traditional model-based approaches are not sufficient
to catch this complex and non-linear network variability [3]. Artificial Intelligence
(AI) is an excellent solution that accurately extracts the complex features from a
considerable amount of data generated at various levels of wireless networks due
to hardware and software advancements. The paper is organized as follows: the
motivation and background for using AI in Wireless Communication (WC), along
with the various opportunities of AI in WC, are explained in the next subsection.
In Sect. 2, material and methods, recent work in WC using AI is compared and
multiple research opportunities are identified. Limitations of previous work are
discussed and a novel data handling approach is proposed in Sect. 3, with results,
discussion, and concluding remarks in Sect. 4.

1.1 Motivation and Opportunities to Use AI in WC

According to the future needs, the next-generation interface needs to be application-


oriented where accuracy, flexibility, and efficiency are an integral part of handling the
network components. Recently, AI and its sub-parts, namely Machine Learning (ML)
and Deep Learning (DL), are used in various applications such as speech processing,
natural language processing, and image processing with outstanding results. In [4],
multiple types of wireless networks and their challenges are discussed. This suggests
that data preprocessing plays a significant role in drawing accurate decisions in
wireless communication. In network dynamics, threshold setting for categorization
is a challenging task where computation overhead, time delay, and data security are
prime factors on which a data handling approach must be established [5].
Detailed benefits of AI-aided WC equipped with ML are outlined in [6]. In this,
prominent ML families with the corresponding modeling and their use in a different

wireless contexts such as Multi-Input Multi-Output (MIMO), Smart Grid (SG),
Cognitive Radio (CR), HetNets, Small Cells, and Device-to-Device (D2D) networks
is outlined.
ML has previously been used in various WC areas. An extensive survey covering
different network technologies is carried out in [7]; this work gives insights into the
use of ML across wireless domains. In [8], the gap between DL and the Wireless
Communication Network (WCN) is narrowed by mapping various platforms and ML
techniques to simplify the effective deployment of DL onto WC. With a high-level
introduction to supervised and unsupervised learning, [9] presents applications to the
communication network at different layers of the protocol stack, with emphasis on
the physical layer.
ML algorithms can learn and adapt to a changing environment in wireless
networks. From [4–9], we can identify the complex pattern-recognition capacity of
ML; with DL, intelligent management of complex radio environments and
large-scale topologies can be designed.
In WC, massive data related to various aspects of the network is generated, and
using such Wireless Big Data (WBD) and AI-driven intelligence, the network can
be managed intelligently. In this approach, DL, with its brain-like acute feature
extraction capacity, plays an essential role in analyzing the complex relationship and
catching the network’s real-time dynamics.
ML techniques can realize human-like prediction and decision-making. Their most
substantial advantage is that they learn continuously from data over time: even during
the system's operation, the model can be continually updated with newly observed
and produced data.
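This continual-update behavior can be illustrated with a minimal, self-contained sketch (the smoothing factor, SNR thresholds, and modulation choices below are illustrative assumptions, not taken from the cited works): a running channel-quality estimate is refreshed with every new observation and drives a simple decision rule.

```python
def ewma_update(estimate, observation, alpha=0.2):
    """Blend a new observation into the running estimate (online update)."""
    return (1 - alpha) * estimate + alpha * observation

def choose_modulation(snr_db):
    """Toy decision rule: higher estimated SNR permits a denser constellation."""
    if snr_db >= 20:
        return "64QAM"
    if snr_db >= 12:
        return "16QAM"
    return "QPSK"

snr = 10.0  # initial SNR estimate in dB (hypothetical)
for obs in [14.0, 18.0, 22.0, 24.0, 25.0]:  # newly observed SNR samples
    snr = ewma_update(snr, obs)            # model updated during operation
decision = choose_modulation(snr)
```

The point of the sketch is that the decision always reflects the most recent observations, without retraining from scratch.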
This quality makes the AI a vital driving force in the next-generation network
where the network becomes autonomous. To fetch the critical information from
WCN, AI capabilities are subdivided into four types of analytics: descriptive, diag-
nostic, predictive, and prescriptive analytics [10]. In a descriptive kind of analytics,
important information of the network is collected and using diagnostic tools, network
performance analysis is carried out. Recently, predicting the important network
parameters with the help of ML, DL has seized the network engineers’ attention.
However, based on these three analytics, predicting future network impairments and
observing the most probable solution is the challenge for prescriptive analytics in the
next-generation network. A few of the tasks under various analytics are identified in
Table 1.
Presently, three types of analytics are used for network management, and deci-
sions are based on human interaction. In a future network, such manual interven-
tions can lead to operational delay and performance degradation. Hence, based on
this analytics information, an application-oriented prescriptive type decision-making
model is proposed in Sect. 3. In Sect. 2, we will study various wireless communication
scenarios where AI had a significant role.

Table 1 A few of the ML opportunities in wireless communication


Type of the opportunity Name of the opportunity
Descriptive type (Detection) N/W state information, traffic profile, channel
condition, user perspective identification, signal
characteristics, Queuing state of each node,
congestion, SNR
Channel holding time, collision status, routing
delay, routing path, BER, packet loss, link
evaluation, intrusion detection, N/W performance
Descriptive type (Classification) Network modulation/demodulation, security codes,
communication technology
Traffic type, congestion type, fault type, routing
class
Diagnostic type Security, reliability, traffic versus resources, latency
versus delay, N/W anomalies
Service impairment
Predictive type Resource requirement, traffic requirement,
mobility pattern, location prediction/user mobility,
user behavior, user preference, probable fault
type/location
Prescriptive type Resource allocation, queue management,
congestion management, routing performance
optimization, energy enhancement

2 Recent Development in Wireless Communication Using AI

To develop the decision model based on wireless communication channel variability,
various signal detection and classification methods are the first step to draw
meaningful information from the wireless network. After fetching the channel
information, user-centric data such as mobility and context awareness play a useful
role in tracking the user movement in a particular area. Once sufficient relevant
information related to both channel and user is acquired, network performance opti-
mization can be done. With this flow, the recent literature related to signal detection,
classification, mobility, context awareness, traffic prediction, and network perfor-
mance optimization is surveyed to identify key design parameters for next-generation
data handling mechanisms.

2.1 Signal Detection, Modulation, and Demodulation Classification

In this subsection, signal detection and classification methodologies are reviewed to


understand the various methods used recently to draw meaningful information from
the wireless channel. A few of the findings are listed in the following Table 2.
With DL, promising results for channel estimation (CE) and signal detection
(SD) in complicated and distorted channels can be seen in [11]. Unlike the current
OFDM channel, the proposed method with a limited training pilot first estimates the
channel state information (CSI) implicitly and directly detects/recovers the trans-
mitted symbols. Distortion in the channel is handled by offline training based on
simulated data, and data are recovered instantly. Replacing the existing orthogonal
frequency-division multiplexing (OFDM) receiver in WC, a robust expert
knowledge-based data-driven fully connected deep neural network (FC-DNN) model
is proposed in [12]. In this work, the receiver is divided into channel subnet (CS) and
estimation subnet (ES) with DNN in each subnet. Depending upon the complexity,
two types of DL-based zero-forcing techniques, namely fully connected (FC) and
bi-directional long short-term memory (Bi-LSTM) signal detection, are proposed.
The robustness for signal estimation and detection is demonstrated in computational
complexities, signal-to-noise ratio, and memory usage.
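The classical LS and MMSE estimators that such DL detectors are benchmarked against can be sketched for a toy scalar flat-fading channel (the channel coefficient, pilot pattern, noise level, and the assumption that the channel power is known are all illustrative, not taken from [11, 12]):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy flat-fading model: y = h * x + n, estimated from known pilot symbols.
h_true = 0.8 + 0.3j
pilots = np.array([1, -1, 1, 1, -1, -1, 1, -1], dtype=complex)
noise_var = 0.01
noise = np.sqrt(noise_var / 2) * (rng.standard_normal(8) + 1j * rng.standard_normal(8))
y = h_true * pilots + noise

# Least-squares (LS) estimate: average of the per-pilot ratios.
h_ls = np.mean(y / pilots)

# Linear MMSE estimate: the LS estimate shrunk toward zero using the
# (assumed known) channel power and the averaged noise variance.
channel_power = abs(h_true) ** 2          # assumed prior knowledge
p = len(pilots)
h_mmse = (channel_power / (channel_power + noise_var / p)) * h_ls
```

With high SNR the shrinkage factor is close to one and the two estimates nearly coincide; the DL approaches in the surveyed works aim to beat these baselines when the channel is non-linear or distorted.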
To automatically and blindly detect the Morse signal in wideband spectrum data,
a robust DL-based Deep Morse framework is designed [13]. In wideband spectrum,
without requiring prior knowledge, a DL energy-based multi-signal sensing
module is designed to catch the signal contents. A CNN-based non-linear pooling cell
module is designed to fetch the informative local features from the located candidates
to distinguish the Morse signal from other modulation types.
DNN-based end-to-end spectrum monitoring wireless signal identification
approach is described in [14]. To classify the modulation technique and detect the
interference, various features are used such as temporal wireless signal, amplitude-
phase, and frequency domain representation of wireless data with automatic feature
extraction, and end-to-end training.
A DL-based automatic detection of decoding algorithm is presented in [15]. In
the presence of additive white Gaussian noise (AWGN) in the channel, optimal
performance can be seen by creatively designing and training the recurrent neural
network (RNN) architectures to decode well for sequential convolutional and turbo
codes. In this method, the loss function is guided by dynamic programming, and
sequential codes are parameterized for training the RNN.
An ML-based demodulation method in the physical layer of visible light commu-
nication (VLC) systems is presented in [16]. A real-time signal is collected from
a flexible end-to-end hardware prototype system. In this work, modulated signals
are converted into images, and a convolutional neural network (CNN) classifier
recognizes the signal from it; the deep belief network (DBN) and adaptive boosting
(AdaBoost) ML demodulators use restricted Boltzmann and K-nearest neighbor
algorithms for demodulation.
Table 2 A few of the ML/DL opportunities in signal detection and classification

Paper No. Parameter targeted Name ML/DL method Parameter compared O/P compared with Results
[10] Signal detection and recovery Deep learning Bit error rate (BER) with and LS, MMSE With limited data, channel
of the transmitted data without cyclic prefix (CP) characteristic can be learned
with DL
[11] Signal detection Deep learning Robustness—(MSE and LMMSE-MMSE, FC-DNN, DL-based intelligent
BER) and ComNet-BiLSTM, extraction and with efficient
complexity—Floating-point performance can be seen in
multiplication (FLOPs), signal detection
memory usage,
computational intensity, and
time consumption
[12] Signal detection DL and CNN Precision, recall, F1 score, SVM, PSVMHSVM, DNN, Excellent ability to capture
accuracy SAE distinct information with
high accuracy can be seen
[13] Signal classification and DNN Classification accuracy, comparing the IQ-A-Ø and • For accurate decision, a
interference detection precision, recall, and F1 FFT vector with varying balanced trade-off between
score SNR results are compared efficiency and complexity
must be considered
• Time-varying multipath
and channel impairments
must be dealt with
appropriately
[14] Discovery of decoding RNN Robustness, adaptivity Neural decoder, Turbo New codes can be learned on
algorithm decoder the AWGN channel
[15] Demodulation CNN, DBN, and Adaboost Accuracy Changing the distance and Accuracy is inversely
modulation type proportional to the
transmission distance, and
high order of modulation is
preferred for more accuracy
[16] Signal demodulation DL Changing the SNR and DBN, SVM, MLD • Different modulation
different modulation mode models need different
accuracy and training period training periods
is compared • Higher modulation order
needs longer training
signal periods
• With an increase in
modulation order,
demodulation accuracy
decreases
[17] Radio monitoring Deep CNN Throughput, latency With different time Context awareness is one of
scenarios the criteria for resource
optimization
[18] Time-varying underwater DL BER BER changing with SNR With less training better,
acoustic with severe Doppler results can be achieved
effect
[19] Modulation classification Ensemble voting classifier SNR versus accuracy KNN, SVC, AdaBoost, DT, More accurate results can be
BAG, RFC, GB, LR, XGB observed by using DL
[20] Wireless technology FNN, CNN, decision tree, Accuracy, generalizability, FNN manual feature, As automatic feature
classification RForest robustness, complexity CNN-RSSI, image, IQ extraction outperforms
based, random forest manual feature extraction in
all except complexity, the
proper trade-off between
manual and automated
feature extraction methods
needs to be investigated

A real modulated signal dataset of a wireless communication system, together with
a DL-enabled signal demodulation method, is presented in [21]. A flexible
communication prototype platform is proposed for collecting the real modulation
dataset. Based on the
measured dataset, two DL-based demodulators, termed deep belief network (DBN)-
support vector machine (SVM) demodulator and adaptive boosting (AdaBoost)-
based demodulator, are proposed. The proposed DBN-SVM-based demodulator
exploits the advantages of both DBN and SVM, i.e., the benefit of DBN as a feature
extractor and SVM as a feature classifier. In DBN-SVM-based demodulator, the
received signals are normalized before being fed to the DBN network. For the
AdaBoost-based demodulator, a k-nearest neighbor coding is used.
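The k-nearest-neighbor coding used by the AdaBoost-based demodulator can be pictured with a minimal numpy sketch for QPSK (the constellation, noise level, training-set size, and k are illustrative choices, not the parameters of [21]):

```python
import numpy as np

rng = np.random.default_rng(1)

# Ideal QPSK constellation points and their symbol labels.
constellation = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
labels = np.array([0, 1, 2, 3])

# Training set: 50 noisy copies of each constellation point.
train_x = np.repeat(constellation, 50) + 0.1 * (
    rng.standard_normal(200) + 1j * rng.standard_normal(200))
train_y = np.repeat(labels, 50)

def knn_demodulate(samples, k=5):
    """Assign each received sample the majority label of its k nearest neighbors."""
    out = []
    for s in samples:
        nearest = np.argsort(np.abs(train_x - s))[:k]
        out.append(np.bincount(train_y[nearest]).argmax())
    return np.array(out)

# Demodulate a noisy test burst whose true symbols are known.
true_syms = rng.integers(0, 4, size=100)
rx = constellation[true_syms] + 0.1 * (
    rng.standard_normal(100) + 1j * rng.standard_normal(100))
accuracy = np.mean(knn_demodulate(rx) == true_syms)
```

At this noise level the neighborhoods are well separated, so the sketch recovers essentially all symbols; the surveyed DL demodulators target the harder regimes where such simple geometric rules break down.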
A supervised learning-based ‘Decision Tree Boosting’ algorithm is outlined in
[17]. The TCP protocol is extended with a packet loss classifier whose constraints
are computed directly, and the classifier’s parameters are tuned to maximize the TCP
rate.
In [18], an advanced ML algorithm is used for radio monitoring and based on
context awareness information, radio slices are assigned for optimal network perfor-
mance in ongoing traffic in a given spectrum band. Time-domain IQ samples are used
as input to a classifier, and classified output is used for optimal resource mapping
virtualization.
A DL-based Underwater Acoustic Channel (UWA) receiver is proposed for time-
varying single carrier communication [19]. The proposed model alternatively works
on online training and testing to accommodate the time variability of UWA. For this,
a narrowband signaling model along with a predefined threshold and estimated bits is
considered. With a considerable reduction in training overhead, the proposed model
gives better performance results over traditional channel estimates (CE) based on
decision feedback equalizer.
To optimize modulation classification with limited data and low computational
cost using traditional ML algorithms, an ensemble voting approach is proposed
in [20]. The proposed method compares the performance of various algorithms via
soft and hard voting approaches, and the top three methods are utilized for the
final ensemble step.
For technology classification at different locations and conditions, approaches with
manual and automatic feature extraction are compared in [22] based on accuracy,
generalizability, robustness, and complexity. In the first step, without requiring
domain expertise, IQ samples/RSSI values are fetched from a Universal Software
Defined Radio, and the features mentioned above are extracted. In the last part, an
interference map is designed, and spectrum policies are defined for fair coexistence.
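A minimal sketch of such manual feature extraction from raw IQ samples might look as follows (the feature set here, amplitude and instantaneous-frequency statistics, is a common illustrative choice and is not claimed to be the exact set used in [22]):

```python
import numpy as np

def iq_features(iq):
    """Hand-crafted statistics over a burst of complex IQ samples."""
    amplitude = np.abs(iq)
    phase = np.unwrap(np.angle(iq))
    inst_freq = np.diff(phase)  # proportional to instantaneous frequency
    return {
        "amp_mean": amplitude.mean(),
        "amp_std": amplitude.std(),
        "freq_mean": inst_freq.mean(),
        "freq_std": inst_freq.std(),
    }

# A constant-envelope tone should show near-zero amplitude variation
# and a constant instantaneous frequency.
n = np.arange(512)
tone = np.exp(1j * 2 * np.pi * 0.05 * n)
feats = iq_features(tone)
```

Such feature vectors would then be fed to a conventional classifier, whereas the automatic approaches in [22] learn the features directly from the IQ stream.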

2.2 Mobility and Context Awareness

Data exchange using ML in visible light communication (VLC) networks under user
mobility and wireless-traffic dynamics is developed in [23]. The presented work

outlines a performance trade-off between delay and throughput in dynamic indoor


VLC networks. Here, the average system queue backlog is reduced, and throughput
is improved under user mobility conditions. DL-based physical layer communication
is outlined in [24] to improve the overall transmitter’s and receiver’s performance.
Categorizing the system with and without block structure, signal compression, and
DL detection is observed.
Computing resources and training time are vital parameters in resource-constrained
environments. Surveying various applications, such as resource management in the
MAC layer, mobility and network management in the network layer, and localization
in the application layer, [25] explains various DL applications in physical layer
communication with domain knowledge.
An ML-based coordinated beamforming solution is provided in [26] to support
highly mobile users. In this approach, the network is divided into collaborative and
distributed Base Stations (BSs) that simultaneously serve the user. Coordinating base
stations receive the uplink training pilot sequence sent by a user with omni or quasi-
omni beam patterns. This information works as a defining signature to infer the
user's location and interaction with the surrounding environment. A DL model
learns this information and predicts the corresponding beamforming vectors at the BS.
The optimized caching policy is defined in [27] by tracking the user preference and
activity level. The current approach maximizes the offloading probability for cache-
enabled device-to-device communications with a low-complexity algorithm. To learn
user preference, a model for the user request behavior resorting to probabilistic
latent semantic analysis and understanding the model parameters by the expectation–
maximization algorithm is presented.
When clients do not disclose their own private data and resources are constrained,
overall training becomes inefficient under poor channel conditions. In [28], FedCS, a
federated learning approach, is proposed that efficiently and actively manages clients
based on their resource conditions. Via FedCS, many clients can be updated, and
performance improvement can be observed.
To accommodate many users per BS and balance the load, a Channel State
Information-based user association scheme is discussed in [29]. In this method,
instead of the whole CSI, user movement is captured with the Synchronization Signal
Power, which reduces the overhead and complexity observed in load balancing.
In dense cells with limited overhead, user association (UA) is a challenging task.
In [30], with the help of ML, the multi-connectivity-capable UA approach is converted
into a multi-label classification problem. The LPC and RAkEL algorithms convert the
multi-label classification into single-label classification, and the correlation between
users and BSs is modeled with a graphical representation using a Markov random
fields approach.
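The label powerset idea behind LPC, mapping each distinct combination of labels to a single class, can be sketched as follows (the base-station names are hypothetical, and this omits the RAkEL and Markov random field components of [30]):

```python
def label_powerset(multilabels):
    """Map each distinct set of labels to one single-label class id."""
    classes = {}
    single = []
    for labels in multilabels:
        key = tuple(sorted(labels))      # canonical form of the label set
        if key not in classes:
            classes[key] = len(classes)  # new class id for an unseen combination
        single.append(classes[key])
    return single, classes

# Each user may associate with several base stations at once (multi-connectivity).
assignments = [{"BS1"}, {"BS1", "BS2"}, {"BS2"}, {"BS1", "BS2"}]
y, mapping = label_powerset(assignments)
```

After this transformation, any ordinary single-label classifier can be trained on `y`, which is precisely what makes the conversion attractive for the UA problem.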

2.3 Network Performance Optimization

In [31], various Wireless Sensor Network fault detection mechanisms developed
with ML are reviewed, and one such mechanism is enhanced. The results are tested on
a real medical dataset and are more accurate than those of the existing mechanism.
A method for learning data flow rates in a wireless network to improve its quality of
service is presented in [32]. An appropriate neighboring node for packet forwarding
is selected by learning the environment with the help of Reinforcement Learning
(RL). The hierarchical decision technique is used to improve the learning capacity
of nodes. For each layer, the decision is applied, and particular nodes are selected
with whom more information about the environment is present. This information is
shared with less informative nodes, and learning capacity is improved.
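The environment-learning step of such RL-based forwarding can be pictured with a tabular sketch (the neighbor set, reward values, and learning parameters are invented for illustration and do not reproduce the hierarchical scheme of [32]):

```python
import random

random.seed(0)

neighbors = ["A", "B", "C"]
q = {n: 0.0 for n in neighbors}                # estimated value of each next hop
true_reward = {"A": 0.2, "B": 0.9, "C": 0.5}   # hypothetical delivery quality
alpha, epsilon = 0.1, 0.1

for _ in range(2000):
    # Epsilon-greedy: mostly exploit the best-known neighbor, sometimes explore.
    if random.random() < epsilon:
        n = random.choice(neighbors)
    else:
        n = max(q, key=q.get)
    reward = true_reward[n] + random.uniform(-0.05, 0.05)  # noisy feedback
    q[n] += alpha * (reward - q[n])            # one-step value update

best = max(q, key=q.get)
```

Over repeated forwarding decisions the value table converges toward the neighbor with the best delivery quality, which is the behavior the hierarchical decision technique builds on.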
A deep learning framework consisting of a binary measurement matrix, having
a non-uniform quantizer, and a non-iterative recovery solver is presented [33]. By
training the network, these parts are jointly optimized. The results on synthetic and
real datasets reveal a drastic reduction of the transmission bits.
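The sensing side of such a framework, binary measurements followed by non-uniform quantization, can be sketched as follows (a random ±1 matrix and a μ-law companding quantizer are used as illustrative stand-ins for the learned components of [33]; the recovery solver is omitted):

```python
import numpy as np

rng = np.random.default_rng(2)

def measure(x, m):
    """Compress x with a random binary (+/-1) measurement matrix."""
    phi = rng.choice([-1.0, 1.0], size=(m, len(x)))
    return phi @ x / np.sqrt(m), phi

def mu_law_quantize(v, mu=255.0, levels=16):
    """Non-uniform (mu-law) companding followed by uniform quantization."""
    scale = np.max(np.abs(v))
    c = np.sign(v) * np.log1p(mu * np.abs(v) / scale) / np.log1p(mu)
    q = np.round((c + 1) / 2 * (levels - 1))   # integer codes 0..levels-1
    return q.astype(int), scale

x = rng.standard_normal(64)          # signal of length 64
y, phi = measure(x, m=16)            # 16 compressed measurements
codes, scale = mu_law_quantize(y)    # 4-bit codes for transmission
```

Here 64 samples are reduced to sixteen 4-bit codes; in [33] the measurement matrix, quantizer, and recovery solver are learned jointly rather than fixed as above.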
With intelligent optimization and target repair, a jointly optimized extreme
learning machine (JOELM) approach is proposed for the short-term prediction of
fading channels [34]. In this, the firefly algorithm is employed to intelligently
optimize the traditional extreme learning machine.
To optimize spectrum and energy, the symbiotic relationship between cellular and
IoT networks is used in a centralized and decentralized manner [35]. With DL's help,
channel estimation at global and local levels based on different frames is defined to
determine the user association policy at the BS. Based on a distributed DRL algorithm,
users are managed centrally and at distribution centers using historical channel and
interference information. In [36], accuracy and computational efficiency are not
considered together in the data handling approach. Efficient and fast processing of
data queue management in the data processing layer needs to be investigated from an
overhead point of view. In [37], a connection is established between the model-
driven and in-depth learning approach by examining the model and data-driven
approach. In the Wireless Communication scenario, a data-driven approach
alone is insufficient; however, a theoretical mathematical model as a primary
information decider will efficiently balance resource management accuracy and flex-
ibility. Several issues in the model-driven approach in context to receiver design and
channel information accuracy are discussed in [38]. A model-driven approach can
significantly reduce the computation time compared to the Monte Carlo simulation
with a specialized and accurate selection of models.
Various opportunities are listed in Table 3.

Table 3 A few of the ML/DL opportunities in network performance optimization


Paper No. Parameter Name ML/DL method Parameter compared O/P compared with Results
[31] Fault Various ROC curve, MAE, J48, random A new way to
detection ML time to converge forests, and optimize the
k-nearest neighbors performance by
regression, additive combining
regression—linear classification and
regression regression is
proposed
[33] CSI Extreme RMSE by varying AR, BPNN, SVM An auto
ML the SNR for decision-making
(Optimisation, algorithm can
Repair, and reduce the
Robustness), BER computation cost
and SER (for
performance),
PDF and CDF for
channel quality
[34] User Deep RL Average sum Random policy and • Rapid
association transmission rate optimal policy correlation
algorithm with a identification is
centralized and a difficult task if
distributed we consider the
approach user association
• The distributed
approach is
more scalable
than the central
system

2.4 Traffic Prediction in Wireless Communication

A densely connected CNN-based traffic prediction approach is proposed for
predicting traffic in WC [39, 40]. These works identify a significant correlation
between spatial and temporal features. Using this as a reference, a big-data-driven,
intelligent Spatial–Temporal Cross-domain neural Network (STCNet) architecture
is proposed [41] to capture
the intricate patterns unseen in cellular data effectively. Modeling the spatial and
temporal dependencies, the diversity of various city regions is evaluated. A clus-
tering algorithm is presented to segment city areas into different groups, and with
a successive inter-cluster transfer learning strategy, a knowledge reuse-based traffic
prediction framework is designed. Capturing the channel dynamics via two RNN-
based traffic predicting, optimal placement policy and dimension for the data center
are designed in [42].

By decomposing the data into in-tower and inter-tower traffic, various characteristics
and root causes of dynamic channels are studied in [43]. For the first time, DL is used
for predicting individual tower traffic based on spatial dependency. Due to hetero-
geneous devices, uneven bursty traffic reaches switches and may lead to congestion.
A Deep CNN-based intelligent Partial Overlapping Channel allocation strategy is
proposed in [44], predicting future traffic and assigning the channels while reducing
the convergence time. Under dynamic user movement, data access and processing is
a challenging task. Various issues such as privacy and security need to be considered
while mapping user data and satisfaction levels [45]. Multiple opportunities are
listed in Table 4.
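The historical average (HA) baseline that the surveyed predictors are compared against in Table 4 can be sketched on synthetic data as follows (the daily traffic profile and noise level are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic hourly traffic for 7 days with a repeating daily profile plus noise.
hours = np.arange(24 * 7)
daily_profile = 50 + 30 * np.sin(2 * np.pi * (hours % 24) / 24)
traffic = daily_profile + rng.normal(0, 2, size=hours.size)

# Historical-average (HA) prediction for day 7: the mean of the same hour
# over the first six days.
history = traffic[: 24 * 6].reshape(6, 24)
prediction = history.mean(axis=0)
actual = traffic[24 * 6:]

rmse = np.sqrt(np.mean((prediction - actual) ** 2))
```

HA captures the periodic component but nothing else, which is why the dense-CNN and STCNet models, which also exploit spatial dependencies, outperform it in the cited comparisons.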

3 Discussion and Proposed Methodology

In the above section, we have identified many essential points that need to be
considered to develop an ML-based WCN. In [12, 15, 16, 19, 22], novel model-driven
and data-driven training approaches are discussed, alternating online and offline
modes for manual and automatic feature extraction. DL is used efficiently for channel
estimation, and narrow features are extracted from a wideband channel [11, 13],
which motivates the use of ML in WCN.
To enhance the performance, various new approaches are proposed in [34, 35]
for intelligent optimization. Traffic prediction is the first step in managing the
next-generation network; hence, different traffic prediction approaches are studied
[39–44].
Though past developments have improved performance, limitations of the
existing approaches constrain their high-performance utilization. From the above
section, we have observed a few issues with current deep machine learning in wireless
communication, which are as follows:

3.1 Key Observations

1. ML's complexity depends upon the size and quality of the data as well as the
performance objective; efficient learning and updation need to be ensured during
data exchange. Traditional complexity evaluation metrics are not sufficient, as
they cannot capture the dynamic requirements of future data handling networks.
2. The application of the proposed approach to interfacing distributed devices
using interfacing frameworks [46] has been limited to defined protocols, and
dynamic updation leads to operational instability in such conditions.
3. To deal with future network demand, ML models need to train the network
several times, which creates processing overhead and leads to considerable
latency in the network. Such delays reduce the network throughput, data
integrity, and network lifetime by increasing the burden on the allotted resources.

Table 4 A few of the ML/DL opportunities in traffic prediction


Paper No. Parameter targeted Name ML/DL method Parameter compared O/P compared with Results
No. targeted method compared with
[36] Traffic Dense CNN RMSE HA, ARIMA, Accurate
prediction LSTM, predictions result
due to S&T
dependencies are
collectively
modeled; a large
amount of training
data is needed for
better results
[39] Traffic Dense CNN RMSE LSTM, SVR, Needs to consider
prediction 2D-CNN S&T features
simultaneously
[40] Traffic DL RMSE, MAE, LR, SVR, Too complicated
prediction R2 LSTM, model overfits the
Dens-CNN data, and
performance
degradation is
observed
[41] Region RNN’s Varying the GRU, LSTM Dynamic
clustering, LSTM-based activation utilization of
traffic GRU model function and resources by
prediction for different predicting the
geographical traffic
zones
[42] Traffic DL MAE, MARE Naïve, ARIMA, Relation of
prediction LSTM, HW, spatial–temporal
GNN-A dependencies in
traffic prediction
[43] Traffic Deep CNN Accuracy, CoCAG-SBR, Deep CNN
prediction iteration time, CoCAG-BR, perform
and POC throughput, ACPOCA significantly better
assignment packet loss in high-speed
rate transmission in
terms of
convergence time,
packet loss, and
throughput

4. In the future, the proposed model's feasibility is to be checked by practically
implementing and evaluating it in terms of accuracy, generalizability,
robustness, and complexity. A proper set of rules that can capture the
variability of the wireless network needs to be formulated.

3.2 Proposed Methodology

1. Based on the observation of past developments and the limitations outlined, this
work focuses on developing a new middleware interface model for data
exchange in the wireless network. This approach will improve reliability,
throughput, power consumption, accuracy, and data integrity using an
advanced machine learning approach.
2. To minimize the learning and updating overhead, an in-depth learning approach
using input characteristics will be focused. For an ML approach in wireless
communication, a dataset of learning observation will be recorded, and a new
request will be processed based on the current network parameter and past
learning. The ML approach will be modeled using a spatial–temporal cross-
domain neural network (STCNet) [41], where the learning updation of a dataset
will be made to pass through a decision logic before passing for updation.
The decision logic finds a weighted correlation for new input and dataset entry
to decide the updation process. This method will reduce the learning process
overhead and minimize the delay in data exchange.
3. In the updation process of machine learning, each observed data passed for
learning is randomly stored. This updation results in an extensive search over-
head during decision-making, resulting in the delay of allocation. A new data
semantic approach based on input updation rate, characteristic of input vari-
ation, and maximization of expectation parameter will be made. About the
probabilistic latent semantic analysis [27], an input characteristic variation will
be used for making dataset updation. An illustration of the proposed approach
is shown in Fig. 1.

Fig. 1 Operational diagram of the proposed approach
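A minimal sketch of the correlation-gated dataset update in step 2, assuming a plain Pearson-correlation novelty test; `should_update` and the 0.95 threshold are hypothetical names and values, since the proposal leaves the exact weighting scheme open:

```python
import numpy as np

def should_update(dataset, new_obs, threshold=0.95):
    """Gate a dataset update: admit the new observation only when its
    strongest correlation with the stored entries stays below the
    threshold, i.e., when it is not already well represented."""
    if len(dataset) == 0:
        return True
    corrs = [abs(np.corrcoef(entry, new_obs)[0, 1]) for entry in dataset]
    return bool(max(corrs) < threshold)

stored = [np.array([1.0, 2.0, 3.0, 4.0])]
redundant = np.array([2.0, 4.0, 6.0, 8.0])  # perfectly correlated with the entry
novel = np.array([4.0, 1.0, 3.0, 2.0])      # weakly correlated
```

Only the novel observation would trigger an update, which is what keeps the learning overhead low.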



4 Conclusions

In this work, various performance optimization and traffic prediction techniques are
reviewed. Based on the channel and user information, various research opportunities
are identified, and a novel data handling approach is proposed. The proposed method
offers a new algorithm with lower overhead, lower computational complexity, and
improved network integrity. This research solution has scope to improve network
performance under dynamic, time-variant channel conditions in wireless
communication. It can minimize the constraints of dynamic variations, making data
exchange over a wireless medium more robust. This provides enormous scope for
higher service compatibility and resource utilization in next-generation wireless
communication.
Deep machine learning algorithms work very well for network management,
network optimization, signal management, channel assignment, network security,
route selection, etc. Deep reinforcement learning and deep Q-routing are the main
learning techniques useful for network operations. However, it is difficult to obtain
training data that covers the various scenarios. Due to wireless networks' dynamic
behavior, it is challenging to create datasets for training, and due to the dynamicity
and unpredictability of wireless channels, it is hard to find regular patterns in
previously observed data. Learning from and updating observations in a run-time
environment is a highly complex and resource-consuming process. In addition, the
volume and integration of the network add processing complexity and overhead.
These constraints limit the use of ML in wireless communication in many
performance aspects. This research work focuses on developing a low-complexity,
fast adaptive approach for wireless communication to improve the network's overall
performance.

References

1. Deshpande, P., Iyer, B.: Research directions in the internet of every things (IoET). In: 2017
International Conference on Computing, Communication and Automation (ICCCA), Greater
Noida, 2017, pp. 1353–1357. https://doi.org/10.1109/CCAA.2017.8230008
2. Khaled, B.L., Wei, C., Yuanming, S., Jun, Z., Ying-Jun, A.Z.: The roadmap to 6G: AI
empowered wireless networks. IEEE Commun. Mag. 84–90 (2019)
3. Shi, Y., Zhang, J., Letaief, K.B., Bai, B., Chen, W.: Large-scale convex optimization for ultra-
dense cloud-RAN. IEEE Wirel. Commun. 22(3), 84–91 (2015)
4. Dai, H.-N.: Big data analytics for large-scale wireless networks: challenges and opportunities.
ACM Comput. Surv. 52(5), 1–35. Article 99. Publication date: September 2019
5. Aguilar Igartua, M., Almenares Mendoza, F.: INRISCO: INcident monitoRing in Smart
Communities. IEEE Access 8, 72435–72460 (2020)
6. Wang, J., Jiang, C.: Machine learning paradigms in wireless network association. Encyclopedia
of Wireless Networks, pp. 1–9 (2018)
7. Boutaba, R., Salahuddin, M.A., Limam, N., Ayoubi, S., Shahriar, N., Estrada-Solano, F.,
Caicedo, O.M.: A comprehensive survey on machine learning for networking: evolution,
applications and research opportunities. J. Internet Serv. Appl. 1–99 (2018)

8. Zhang, C., Patras, P.: Deep learning in mobile and wireless networking: a survey. IEEE
Commun. Surv. Tutor. 1–67 (2018)
9. Kadam, K., Srivastava, N.: Application of machine learning (Reinforcement Learning) for
routing in wireless sensor networks (WSNs). In: Proceedings of the 2012 1st International
Symposium on Physics and Technology of Sensors, pp. 349–352 (2012)
10. Kibria, M.G., Nguyen, K., Villardi, G.P., Zhao, O., Ishizu, K., Kojima, F.: Big data analytics,
machine learning, and artificial intelligence in next-generation wireless networks. IEEE Access
6, 32328–32338 (2018)
11. Ye, H., Li, G.Y., Juang, B.-H.: Power of deep learning for channel estimation and signal
detection in OFDM systems. IEEE Wirel. Commun. Lett. 7(1), 114–117 (2018)
12. Gao, X., Jin, S., Wen, C.-K., Li, G.Y.: ComNet: combination of deep learning and expert
knowledge in OFDM receivers. IEEE Commun. Lett. 22(12), 2627–2630 (2018)
13. Yuan, Y., Sun, Z., Wei, Z., Jia, K.: DeepMorse: a deep convolutional learning method for blind
Morse signal detection in wideband wireless spectrum. IEEE Access 7, 80577–80587 (2019)
14. Kulin, M., Kazaz, T., Moerman, I., De Poorter, E.: End-to-end learning from spectrum data: a
deep learning approach for wireless signal identification in spectrum monitoring applications.
IEEE Access 6, 18484–18501 (2018)
15. Kim, H., Jiangy, Y., Rana, R., Kannany, S., Oh, S., Viswanath, P.: Communication algorithms
via deep learning. In: Proceedings of ICLR 2018, pp. 1–17 (2018)
16. Ma, S., Dai, J., Lu, S., Li, H., Zhang, H., Du, C., Shiyin, L.: Signal demodulation with machine
learning methods for physical layer visible light communications: prototype platform, open
dataset, and algorithms. IEEE Access 7, 30588–30598 (2019)
17. El Khayat, I., Geurts, P., Leduc, G.: Improving TCP in wireless networks with an adap-
tive machine-learnt classifier of packet loss causes. International Federation for Information
Processing, pp. 549–560 (2005)
18. Liu, W., Santos, J.F., Jiao, X., Paisana, F., DaSilva, L.A., Moerman, I.: Using deep learning and
radio virtualization for efficient spectrum sharing among coexisting networks. In: 13th EAI
International Conference, CROWNCOM, pp. 1–10 (2018)
19. Zhang, Y., Li, J., Zakharov, Y.V., Li, J., Li, Y., Lin, C., Li, X.: Deep learning based single carrier
communications over time-varying underwater acoustic channel. IEEE Access 7, 38420–38430
(2019)
20. Mahabub, A., Sultan bin Habib, A.-Z.: A voting approach of modulation classification for
wireless network. In: Proceedings of the 6th International Conference on Networking, Systems
and Security, pp. 133–138 (2019)
21. Wang, H., Wu, Z., Ma, S., Lu, S., Zhang, H., Ding, G., Li, S.: Deep learning for signal demodula-
tion in physical layer wireless communications: prototype platform, open dataset, and analytics.
IEEE Access 7, 30792–30801 (2019)
22. Fontainea, J., Fonseca, E., Shahida, A., Kist, M., DaSilva, L.A., Moermana, I., De Poortera,
E.: Towards low-complexity wireless technology classification across multiple environments.
Ad Hoc Netw. 91, 101881, 1–12 (2019)
23. Zhang, R., Cui, Y., Claussen, H., Haas, H., Hanzo, L.: Anticipatory association for indoor visible
light communications: light, follow me! IEEE Trans. Wirel. Commun. 17(4), 2499–2510 (2018)
24. Qin, Z., Ye, H., Li, G.Y., Fred Juang, B.-H.: Deep learning in physical layer communications.
IEEE Wirel. Commun. 26(2), 93–99 (2019)
25. Sun, Y., Peng, M., Zhou, Y., Huang, Y., Mao, S: Application of machine learning in wireless
networks: key techniques and open issues. IEEE Commun. Surv. Tutor. 1–37 (2019)
26. Alkhateeb, A., Alex, S., Varkey, P., Li, Y., Qu, Q., Tujkovic, D.: Deep learning coordinated
beamforming for highly-mobile millimeter wave systems. IEEE Access 6, 37328–37348 (2018)
27. Chen, B., Yang, C.: Caching policy for cache-enabled D2D communications by learning user
preference. IEEE Trans. Commun. 66(12), 6586–6601 (2018)
28. Nishio, T., Yonetani, R.: Client selection for federated learning with heterogeneous resources
in mobile edge. In: IEEE International Conference on Communications (ICC), pp. 1–7 (2019)
29. Han, P., Zhou, Z., Wang, Z.: User association for load balance in heterogeneous networks with
limited CSI feedback. IEEE Commun. Lett. 24(5), 1095–1099 (2020)

30. Liu, R., Lee, M., Yu, G., Li, G.Y.: User association for millimeter-wave networks: a machine
learning approach. IEEE Trans. Commun. 68(7), 4162–4174 (2020)
31. Pachauri, G., Sharma, S.: Anomaly detection in medical wireless sensor networks using
machine learning algorithms. Proc. Comput. Sci. 70, 325–333 (2015)
32. Sruthi, S.S., Varghese, A.: Enhance QoS by learning data flow rates in wireless networks using
hierarchical docition. ICECCS 708–714 (2015)
33. Sun, B., Feng, H., Chen, K., Zhu, X.: A deep learning framework of quantized compressed
sensing for wireless neural recording. IEEE Access 4, 5169–5178 (2016)
34. Sui, Y., Yu, W., Luo, Q.: Jointly optimized extreme learning machine for short-term prediction
of fading channel. IEEE Access 6, 49029–49039 (2018)
35. Zhang, Q., Liang, Y.-C., Vincent Poor, H.: Intelligent user association for symbiotic radio
networks using deep reinforcement learning. In: IEEE Global Communications Conference
(GLOBECOM), pp. 1–12 (2019)
36. Jan, B., Farman, H., Khan, M.: Designing a smart transportation system: an internet of things
and big data approach. IEEE Wirel. Commun. 73–79 (2019)
37. Zappone, A., Di Renzo, M., Debbah, M.: Wireless networks design in the era of deep learning:
model-based, AI-based, or both? (2019). arXiv:1902.02647
38. He, H., Jin, S., Wen, C.-K., Gao, F., Li, G.Y., Xu, Z.: Model-driven deep learning for physical
layer communications. IEEE Wirel. Commun. (2019)
39. Zhang, C., Zhang, H., Yuan, D., Zhang, M.: Citywide cellular traffic prediction based on densely
connected convolutional neural networks. IEEE Commun. Lett. 22(8), 1656–1659 (2018)
40. Liang, D., Zhang, J., Jiang, S., Zhang, X., Wu, J., Sun, Q.: Mobile traffic prediction based
on densely connected CNN for cellular networks in highway scenarios. In: 11th International
Conference on Wireless Communications and Signal Processing (WCSP), pp. 1–5 (2019)
41. Zhang, C., Zhang, H., Qiao, J., Yuan, D., Zhang, M.: Deep transfer learning for intelligent
cellular traffic prediction based on cross-domain big data. IEEE J. Sel. Areas Commun. 37(6),
1389–1401 (2019)
42. Paul, U., Liu, J., Troia, S., Falowo, O., Maier, G.: Traffic-profile and machine learning based
regional data center design and operation for 5G network. J. Commun. Netw. 21(6), 569–583
(2019)
43. Wang, X., Zhou, Z., Xiao, F., Xing, K., Yang, Z., Liu, Y., Peng, C.: Spatio-temporal analysis and
prediction of cellular traffic in metropolis. In: IEEE 25th International Conference on Network
Protocols (ICNP), pp. 1–14 (2018)
44. Tang, F., Mao, B., Md. Fadlullah, Z., Kato, N.: On a novel deep-learning-based intelligent
partially overlapping channel assignment in SDN-IoT. IEEE Commun. Mag. 80–86 (2018)
45. Alkurd, R., Abualhaol, I.: Big-data-driven and AI-based framework to enable personalization
in wireless networks. IEEE Commun. Mag. 18–24 (2020)
46. Simeone, O.: A very brief introduction to machine learning with applications to communication
systems. IEEE Trans. Cogn. Commun. Netw. 4(4), 648–664 (2018)
Breast Cancer Detection in
Mammograms Using Deep Learning

Abhiram Pillai, Amaan Nizam, Minita Joshee, Anne Pinto,


and Satishkumar Chavan

Abstract Breast cancer is the most lethal cancer among women. Early-stage diag-
nosis may reduce the mortality associated with breast cancer subjects. Diagnosis
can be made with screening mammography. The main challenge of screening mam-
mography is its high risk of false positives and false negatives. This paper presents
the detection of breast cancer in mammograms using the VGG16 model of deep
learning approaches. The VGG16 model is trained and tested on 322 images from
the MIAS dataset. It performs better as compared to AlexNet, EfficientNet, and
GoogleNet models. Automatic classification of mammograms will improve the
efficiency of screening and serve as a support system for radiologists.

Keywords Breast cancer detection · Classification of mammograms · Digital


mammography · Convolutional neural network · VGG16 · Deep learning ·
Mammographic image analysis (MIAS) dataset

1 Introduction

Breast cancer disease has the second most noteworthy death rate in women [11].
According to the global cancer statistics, the number of new cases in 2018 was
estimated at 18,078,957 and deaths at 9,555,027 (52.85%) globally [3]. Breast cancer
cases amount to 2,088,849 (11.55%), and the deaths are estimated at 626,679
(6.56%). Sixty percent of the deaths occur in low-income developing countries
like Ethiopia, as noted in [5]. If the cancer is detected early, it increases the patient's
survival expectancy and decreases the mortality rate. Many presentations, like
masses, areas of asymmetry and distortion, and micro-calcifications, may reveal breast
cancer. The most common and representative indication is masses, which may not
be detected due to overlapping breast tissues. Masses can be of two types, namely
undetected and misidentified. False negative cases are categorized as undetected
masses, in which delayed diagnosis costs the survival of a patient. A misidentified mass
adds unwanted anxiety and pain to patients, along with the additional burden of

A. Pillai · A. Nizam · M. Joshee · A. Pinto · S. Chavan (B)


Don Bosco Institute of Technology, Kurla, Mumbai, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 121
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_11

re-screening and biopsy [7]. Many mammographic density ratings, ranging from
manual classification (e.g. BI-RADS) to automatic scores, have been suggested.
Radiologists classified the mammograms visually in the early years by a series of
intuitive yet poorly defined breast tissue patterns. Manual classification is a low-cost
solution but may lead to a considerable risk of misclassification. Also, mammogram
interpretation is challenging, and the possibility of missing abnormality for the tired
radiologist or inexperienced personnel may exist. Therefore, it is expected to have
an efficient, inexpensive, robust, and accurate non-invasive system or tool for breast
cancer detection using a mammogram. This paper presents breast cancer detection
using mammograms in Cranial-Caudal (CC) and Medial-Lateral Oblique (MLO)
views using convolutional neural network, i.e. VGG16.
The paper is organized as follows: Sect. 2 discusses the earlier work in breast
cancer detection. Section 3 explains the presented VGG16 framework for the classi-
fication of mammograms. Section 4 provides the experimental findings followed by
the conclusions in Sect. 5.

2 Related Work

Mammography can be used as a non-invasive method for screening purposes and


supporting modality for prognosis and precise treatment. A status report of the various
types of cancer is analyzed for both sexes [3]. Breast cancer is the most lethal cancer
in females.
There was a breakthrough in 2012 for image classification due to the development
of a convolutional neural network (CNN) that classified objects into 1000 classes [8].
Singh et al. [13] presented efficient breast cancer diagnosis at an early stage of the cancer.
To classify mass and non-mass regions in the breast, Petrosian et al. [10] preferred
texture features. Deep learning models provided exceptional performance in the field
of medical image analysis. Wang et al. [15] proposed an auto-encoder to classify
breast lesions. Li et al. [9] developed CNN to classify abnormal mammograms.
Detection of micro-calcification using a multi-stage system was presented in [17].
Hadush et al. [6] used faster R-CNN to detect the abnormality in mammo-
grams for the classification of masses into benign and malignant breast cancer.
Abbas [1] used multilayer deep learning architecture to classify extracted mammo-
graphic masses into benign and malignant breast cancer. CNN-based classification
of benign and malignant breast masses is experimented with by Arevalo et al. [2] and
Hamed et al. [7].

3 Methodology

The work presented in this paper is the detection of abnormal mammograms using
the VGG16 deep learning network, trained in a supervised fashion. The block
schematic of the detection of abnormal mammograms is shown in Fig. 1. The dataset
used for the experiments is the MIAS dataset, which consists of 322 images,
categorized into a normal class and an abnormal class. The distribution is 208 normal
and 114 abnormal (63 benign and 51 malignant) images.
to a size of 1024 × 1024 pixels [4]. Figure 2 shows sample images of cancerous
mammograms in CC and MLO views from the MIAS dataset.

3.1 Preprocessing

The images from the MIAS dataset carry a lot of background noise. The presence
of pectoral muscles and the outer region makes it a challenging dataset for
classification and segmentation. In this work, the annotations are removed from all
the images, and the pectoral muscles are cropped out. Doing so reduces errors and
increases the accuracy of classifying the mammograms. After preprocessing,
we obtain the final cropped breast region for the classification task.
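A minimal NumPy sketch of the breast-region crop, assuming the background can be thresholded out at a fixed gray level; the MIAS-specific annotation and pectoral-muscle removal steps are not reproduced here:

```python
import numpy as np

def crop_breast_region(img, background_level=10):
    """Crop a grayscale mammogram to the bounding box of pixels brighter
    than an assumed background level."""
    mask = img > background_level
    rows = np.any(mask, axis=1)          # rows containing tissue pixels
    cols = np.any(mask, axis=0)          # columns containing tissue pixels
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return img[r0:r1 + 1, c0:c1 + 1]

# synthetic 1024 x 1024 "scan": dark background with one bright region
scan = np.zeros((1024, 1024), dtype=np.uint8)
scan[100:400, 200:600] = 180
region = crop_breast_region(scan)
```

On the synthetic scan above, the crop returns the 300 × 400 bright patch with all background columns and rows removed.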

3.2 Data Augmentation

The CNN models trained on smaller datasets, like the MIAS dataset, suffer from
an over-fitting problem. To mitigate the over-fitting, data augmentation is preferred
[16]. Data augmentation methods like rotation, scaling, horizontal flipping, resizing
of the images, shearing, etc. are used in this work. A total of 2600 images are
generated from the 322 images of the MIAS dataset using data augmentation. The
percentage of mammograms used for training, validation, and testing is 70%, 15%,
and 15%, respectively.

Fig. 1 The block schematic of classification approach for mammograms during breast cancer
screening

Fig. 2 Sample images of cancerous mammograms from the MIAS dataset: a and b MLO views; c
and d CC views
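The augmentation operations named above can be sketched with NumPy alone; rotation is restricted to 90° steps and resizing to crude pixel-skipping here, whereas the actual pipeline presumably used a proper augmentation library (an assumption, as the paper does not name one):

```python
import numpy as np

def augment(img, rng):
    """Produce one randomly augmented copy: horizontal flip, 90-degree
    rotation, and a crude downscale standing in for resizing."""
    out = img
    if rng.random() < 0.5:
        out = np.fliplr(out)                         # horizontal flip
    out = np.rot90(out, k=int(rng.integers(4)))      # rotate 0/90/180/270 deg
    if rng.random() < 0.5:
        out = out[::2, ::2]                          # downscale by a factor of 2
    return out

rng = np.random.default_rng(0)
image = np.arange(16).reshape(4, 4)
copies = [augment(image, rng) for _ in range(8)]
```

Repeatedly sampling such copies is how a few hundred source images can be expanded into a few thousand training images.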

3.3 VGG16

The VGG16 model [12] is selected for the work presented on breast cancer detection.
The framework of this excellent CNN model is displayed in Fig. 3.
It consists of five blocks with 13 convolutional layers for feature extraction from
mammograms (16 weight layers in all, counting the fully connected head). Each
convolutional layer is followed by a ReLU activation, and each block ends with a
max-pooling layer, supporting the extraction of varied and in-depth information.
The combination of these five blocks (as shown in Fig. 3) results in better
characterization of mammograms, which leads to improved classification accuracy.
The 1 × 1 convolution layers [14] support reducing the dimensionality and the
number of trainable parameters. VGG16 uses 3 × 3 convolutions with stride 1 and
2 × 2 max-pooling with stride 2. The network is extensive, and it has about
14,817,193 (approx.) parameters.

Fig. 3 Framework of VGG16 [12]
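The two parameter figures quoted for VGG16 (about 14.8 million here, 138 million in Table 1) can plausibly be reconciled by counting layers: the 3 × 3 convolutional stack of the standard VGG16 holds about 14.7 million weights, and the three fully connected layers supply the rest. A quick check, using the standard VGG16 configuration, which may differ slightly from the modified network used in the paper:

```python
def conv_params(in_ch, out_ch, k=3):
    # k*k kernel weights per input-output channel pair, plus one bias per filter
    return k * k * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    return in_units * out_units + out_units

# VGG16 convolutional stack: (in_channels, out_channels) for each of 13 layers
conv_cfg = [(3, 64), (64, 64),
            (64, 128), (128, 128),
            (128, 256), (256, 256), (256, 256),
            (256, 512), (512, 512), (512, 512),
            (512, 512), (512, 512), (512, 512)]
conv_total = sum(conv_params(i, o) for i, o in conv_cfg)

# fully connected head for 224 x 224 inputs (7 x 7 x 512 after pooling)
fc_total = (dense_params(7 * 7 * 512, 4096)
            + dense_params(4096, 4096)
            + dense_params(4096, 1000))

conv_total              # 14,714,688 -- close to the ~14.8M quoted above
conv_total + fc_total   # 138,357,544 -- the ~138M reported in Table 1
```

The convolutional stack alone thus accounts for roughly one tenth of the full network's parameters.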

4 Results and Discussion

We trained the VGG16 classification model on the dataset, along with AlexNet [8],
GoogleNet, and EfficientNet. A dropout layer and batch normalization were also used
with each model to reduce over-fitting. While implementing the models, we found
that AlexNet and EfficientNet were very slow and less efficient than the other models.
AlexNet, which consists of 8 convolution layers, gave an accuracy of 69.64%, while
GoogleNet, which has 22 layers, provided an accuracy of 71.67%, and EfficientNet
resulted in an accuracy of 72.29%. The best results (the highest accuracy of 75.46%)
were achieved using VGG16, which comprises 16 layers and performs best with
minimal losses. A comparative analysis of these four models is presented in
Table 1. Each network was trained with a learning rate of 0.001 for 250 epochs.

Table 1 Comparison of various networks for classification of mammograms on the MIAS dataset

| Name of the model | Number of layers | Accuracy (%) | Loss | Validation loss | Parameters (Million) |
|---|---|---|---|---|---|
| AlexNet | 8 | 69.64 | 1.84 | 1.94 | 49.0 |
| EfficientNet | 17 | 72.29 | 0.49 | 1.53 | 5.3 |
| GoogleNet | 22 | 71.67 | 0.31 | 0.63 | 22.2 |
| VGG16 | 16 | 75.46 | 0.31 | 0.44 | 138.0 |

Even though the VGG16 has fewer layers than some other models, it achieved the
highest accuracy on the MIAS dataset. The other three models have limited accuracy
compared to VGG16. However, it is still a challenge to achieve good performance
with deep learning approaches to classify mammograms from the MIAS dataset.

5 Conclusion

Breast cancer detection in mammograms using VGG16 is presented in this work. The
system identifies the given image as a normal or abnormal mammogram. The pre-
sented methodology includes image preprocessing, data augmentation, and predict-
ing the outcome of new data provided to the trained model. The average classification
accuracy of 75.46% is achieved for the MIAS dataset with VGG16. It is the highest
accuracy compared to AlexNet, GoogleNet, and EfficientNet. However, the number
of trainable parameters is huge in the VGG16. This classification approach may help
in the early diagnosis of breast cancer during the screening of mammograms. It will
be helpful to radiologists for prioritizing mammograms for abnormality during the
screening programs.

References

1. Abbas, Q.: Deepcad: a computer-aided diagnosis system for mammographic masses using deep
invariant features. Computers 5(4), 28 (2016)
2. Arevalo, J., González, F.A., Ramos-Pollán, R., Oliveira, J.L., Lopez, M.A.G.: Representation
learning for mammography mass lesion classification with convolutional neural networks.
Comput. Methods Programs Biomed. 127, 248–257 (2016)
3. Bray, F., Ferlay, J., Soerjomataram, I., Siegel, R.L., Torre, L.A., Jemal, A.: Global cancer
statistics 2018: Globocan estimates of incidence and mortality worldwide for 36 cancers in
185 countries. CA: Cancer J. Clinic. 68(6), 394–424 (2018)
4. Brzakovic, D., Neskovic. M.: Mammogram screening using multiresolution-based image seg-
mentation. In: Series in Machine Perception and Artificial Intelligence. World Scientific, pp
103–127 (1994). https://doi.org/10.1142/97898127978340006
5. Hadgu, E., Seifu, D., Tigneh, W., Bokretsion, Y., Bekele, A., Abebe, M., Sollie, T., Merajver,
S.D., Karlsson, C., Karlsson, M.G.: Breast cancer in ethiopia: evidence for geographic dif-
ference in the distribution of molecular subtypes in africa. BMC Women’s Health 18(1), 1–8
(2018)
6. Hadush, S., Girmay, Y., Sinamo, A., Hagos, G.: Breast cancer detection using convolutional
neural networks (2020). arXiv:2003.07911
7. Hamed, G., Marey, M., Amin, S., Tolba, M.: Deep learning in breast cancer detection and
classification, pp. 322–333 (2020). https://doi.org/10.1007/978-3-030-44289-7-30
8. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional
neural networks. Commun. ACM 60(6), 84–90 (2017)
9. Li, B., Ge, Y., Zhao, Y., Guan, E., Yan, W.: Benign and malignant mammographic image clas-
sification based on convolutional neural networks. In: Proceedings of 10th International Con-
ference on Machine Learning and Computing, ACM (2018). https://doi.org/10.1145/3195106.
3195163

10. Petrosian, A., Chan, H.P., Helvie, M.A., Goodsitt, M.M., Adler, D.D.: Computer-aided diagno-
sis in mammography: classification of mass and normal tissue by texture analysis. Phys. Med.
Biol. 39(12), 2273–2288 (1994). https://doi.org/10.1088/0031-9155/39/12/010
11. Selvathi, D., Poornila, A.A.: Deep learning techniques for breast cancer detection using medical
image analysis. In: Biologically Rationalized Computing Techniques for Image Processing
Applications. Springer, pp 159–186 (2018)
12. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recog-
nition (2015). arXiv:1409.1556
13. Singh, D., Singh, A.K.: Role of image thermography in early breast cancer detection-past,
present and future. Comput. Methods Programs Biomed. 183(105), 074 (2020)
14. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke,
V., Rabinovich, A.: Going deeper with convolutions. In: 2015 IEEE Conference on Com-
puter Vision and Pattern Recognition (CVPR), IEEE (2015). https://doi.org/10.1109/cvpr.2015.
7298594
15. Wang, J., Yang, X., Cai, H., Tan, W., Jin, C., Li, L.: Discrimination of breast cancer with
microcalcifications on mammography by deep learning. Scienti. Rep. 6(1), 1–9 (2016)
16. Wang, J., Perez, L., et al.: The effectiveness of data augmentation in image classification using
deep learning. Convolut. Neural Netw. Vis. Recognit. 11 (2017)
17. Yang, Z., Dong, M., Guo, Y., Gao, X., Wang, K., Shi, B., Ma, Y.: A new method of micro-calcifications
detection in digitized mammograms based on improved simplified PCNN. Neurocomput.
218(C), 79–90 (2016). https://doi.org/10.1016/j.neucom.2016.08.068
Deep Learning-Based Parameterized
Framework to Investigate the Influence
of Pedagogical Innovations
in Engineering Courses

M. Ashok , Kumar Ramasamy , Umadevi Ashok ,


and Revathy Pandian

Abstract Pedagogical innovations are mandatory for the twenty-first-century
teaching–learning process. This research work identifies parameters for evaluating
the presence of innovations in engineering curriculum pedagogy. The parameters
are composed into a framework, and the framework's result is predicted using deep
learning. The investigation exhibits the adaptability of teachers at the current
implementation level of pedagogy.

Keywords Pedagogy · Innovation · Technology · Investigation · Deep learning

1 Introduction

Pedagogy practices play a vital role in the teaching–learning process, as shown
in Fig. 1. Revising the curriculum with updated subjects makes teachers
innovate to improve the teaching–learning process. In the arena of virtual
learning environments, students and teachers face a range of challenges.
The preparation effort of teachers is increasing irrespective of the subject. The
behavior of students is changing rapidly under the influence of social
media. The current research compares the effect of pedagogical innovations
in the past and present curricula of undergraduate engineering courses. The case
study focuses on the private engineering colleges affiliated with a State Univer-
sity in Tamil Nadu. Section 2 describes the flow of the systematic survey, covering
best practices, implementation, ICT tools, and Artificial Intelligence impact

M. Ashok (B)
Rajalakshmi Institute of Technology, Chennai, TN, India
e-mail: ashok.m@ritchennai.edu.in
K. Ramasamy
Dhirajlal Gandhi College of Technology, Salem, TN, India
U. Ashok
SRM Valliammai Engineering College, Chennai, TN, India
R. Pandian
Velammal Engineering College, Chennai, TN, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 129
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_12

Fig. 1 Conceptual diagram—influence of parameters

in pedagogical innovations. Section 3 narrates the design of the parametric
framework, Sect. 4 validates the framework using the deep learning model, and
Sect. 5 projects the fine-tuning aspects of the model with different sets of data.

2 Literature Survey

Butler-Henderson and Crawford [1] explored the parameters for students’ evalu-
ation in physical, logical, and pedagogical levels. Tan and Matsuda [2] conveyed
the impact of teaching practices in the agenda of pedagogy implementations. Hill
and France [3] identified the factors which influence the technology-based virtual
learning environment. Kiernan [4] commented about the best teaching practices that
happened during the Covid-19 pandemic break out. Grimal et al. [5] elaborated on
implementing pedagogies for professional training in Universities for Engineering
curriculum. van Twillert et al. [6] narrated teachers’ and students’ behavior analysis
during collaborative learning using ICT tools. Punithavathi and Geetha [7] analyzed
the role of mobile technologies in undergraduate engineering education. Nancy et al.
[8] depicted the role of ICT tools in hybrid teaching and its instances. Özgür [9]
listed the stress management parameters for teachers involving in gadget-based

learning. AlMarwani [10] illustrated the experimental study of incorporating crit-


ical thinking as the primary pedagogy for improving Saudi countries’ education.
Guan et al. [11] surveyed 400 research articles and listed the challenges in adapting
Artificial Intelligence/Machine Learning/Deep Learning in pedagogical innovations.
Turvey and Pachler [12] defended the pedagogy provenance as a critical factor to
enhance the education systems introducing technology. Jeong [13] developed soft-
ware to generate the recommendation for diagramming experiments. Chubko et al.
[14] exhibited digital storytelling as pedagogy and compared two different sets of
students. Avis et al. [15] conveyed a theoretical framework that processes
relational pedagogy.

3 Methodology

3.1 Identification of Parameters

The parameters responsible for pedagogical innovations are the Virtual Learning
Environment (VLE) tool, social media influence, the digital professional platform,
and the resources available for the teaching–learning process. To create the
framework, the parameters are treated as Usage Quotients (UQ). Every parameter is
described as the ratio of two or more numerical attributes, which makes it easy to
perform computations or predictions over the data sets.

VLETUQ = a1/45 (1)

a1 Number of hour sessions conducted using the VLE tool.


45 represents the total number of hours allotted to cover a theory subject.

SMTUQ = b1/c1 (2)

b1 Number of hours used for subject-relevant search per day.


c1 Total number of hours spent on social media per day.

DPPUQ = d1/e1 (3)

d1 Number of hours utilized for professional activities per day.


e1 Total number of working hours per day.

TLPUQ = (f1/5) + (g1/3) (4)

f1 Number of units prepared using ICT.


5 represents the total number of units in a subject as prescribed by the University.
g1 Number of tests conducted using ICT.

3 represents the count of exams, viz., Unit Test-1 (First two and a half Units),
Unit Test- 2 (Second two and a half Units), and Model Exam (complete five units)
as prescribed by the University.

3.2 Parametric Framework

The influence factor (IF) is defined to aggregate the influence of (1), (2),
(3), and (4):

IF = [VLETUQ + SMTUQ + DPPUQ] × TLPUQ (5)

The result of (5) is interpreted by the following decision rule: if IF ≥ 3, the
subject has the impact of pedagogical innovations; otherwise, it has no effect.
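Equations (1)–(5) and the decision rule translate directly into code; the function and variable names below are my own, not the paper's:

```python
def vlet_uq(a1, total_hours=45):
    return a1 / total_hours                 # Eq. (1)

def smt_uq(b1, c1):
    return b1 / c1                          # Eq. (2)

def dpp_uq(d1, e1):
    return d1 / e1                          # Eq. (3)

def tlp_uq(f1, g1, units=5, tests=3):
    return f1 / units + g1 / tests          # Eq. (4)

def influence_factor(vlet, smt, dpp, tlp):
    return (vlet + smt + dpp) * tlp         # Eq. (5)

def has_pedagogical_impact(influence):
    return influence >= 3                   # decision rule f(IF)

# Example: all 45 VLE hours used, half of social-media time subject-relevant,
# half the working day on professional activities, all units and tests via ICT
IF = influence_factor(vlet_uq(45), smt_uq(1, 2), dpp_uq(4, 8), tlp_uq(5, 3))
```

For this example IF = (1 + 0.5 + 0.5) × 2 = 4.0, so the subject would be classified as showing the impact of pedagogical innovations.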

4 Results and Discussions

For framework validation, the third-semester Computer Science Engineering
papers of the 2013, 2017, and 2019 regulations of a State University in Tamil Nadu
were considered. The quotients were computed and tabulated, as shown in Table 1.
In keeping with the University's media protocols, the subjects were renamed A–O
across all three regulations. AAX, AAY, AAZ, AB1, AB2, and AB3 are alias code
names for the private engineering colleges affiliated with the State University in
Tamil Nadu. Table 1 shows the adaptability of pedagogical innovations: it gradually
increases whenever the curriculum/regulations are updated, with teachers spending
extra hours to inculcate innovative pedagogical practices.
A stochastic gradient deep learning model was constructed to cross-verify the
parametric framework, as shown in Fig. 2. The four identified parameters were passed
as input features, and the influence of pedagogical innovation was the target to be
predicted/classified. A 4-3-1 model performed well on the sample data set given in
Table 1, and its results are tabulated in Table 2. Table 3 gives a brief comparison of
the proposed model with similar existing models in the research arena of Pedagogical
Innovations.
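The paper does not give implementation details for the 4-3-1 network; a hedged scikit-learn sketch (synthetic data, labeled here with the Eq. (5) rule purely for illustration) might look like:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Synthetic usage quotients per subject/college: VLETUQ, SMTUQ, DPPUQ, TLPUQ.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.5, size=(60, 4))
if_values = X[:, :3].sum(axis=1) * X[:, 3]             # influence factor, Eq. (5)
y = (if_values >= np.median(if_values)).astype(int)    # illustrative binary labels

# 4 input features -> 3 hidden units -> 1 output, trained with SGD.
model = MLPClassifier(hidden_layer_sizes=(3,), solver="sgd",
                      max_iter=2000, random_state=0)
model.fit(X, y)
predictions = model.predict(X)
```

The `coefs_` attribute confirms the 4-3-1 shape: a 4×3 input-to-hidden weight matrix and a 3×1 hidden-to-output matrix.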
Deep Learning-Based Parameterized Framework … 133

Table 1 Statistics of influence factors in three different regulations


Regulation 2013
Sub/College AAX AAY AAZ AB1 AB2 AB3
A 3.25 3.17 3.13 2.78 3.07 2.92
B 3.21 3.15 3.12 3.18 4.03 3.09
C 2.74 3.26 3.73 3.68 3.96 3.15
D 2.30 4.13 3.36 3.82 2.97 3.72
E 3.13 2.68 2.93 3.45 3.13 3.93
Regulation 2017
Sub/College AAX AAY AAZ AB1 AB2 AB3
F 4.13 3.23 3.36 3.13 3.28 3.28
G 3.75 3.32 3.42 3.47 4.32 3.91
H 3.25 3.75 3.92 4.13 4.12 3.82
I 2.86 4.26 3.86 4.24 3.27 4.03
J 3.45 3.10 3.23 3.87 3.67 4.12
Regulation 2019
Sub/College AAX AAY AAZ AB1 AB2 AB3
K 4.25 3.94 3.87 3.92 3.97 4.06
L 4.13 4.01 3.93 3.87 4.63 4.23
M 4.07 4.13 4.07 4.34 4.72 4.18
N 3.12 4.43 4.12 4.43 3.91 4.7
O 3.86 3.82 3.94 4.01 4.09 4.54

Fig. 2 4-3-1 stochastic gradient model



Table 2 Classified outputs of the 4-3-1 model


Regulation 2013
Sub/College AAX AAY AAZ AB1 AB2 AB3
A Y Y Y N Y N
B Y Y Y Y Y Y
C N Y Y Y Y Y
D N Y Y Y N Y
E Y N N Y Y Y
Regulation 2017
Sub/College AAX AAY AAZ AB1 AB2 AB3
F Y Y Y Y Y Y
G Y Y Y Y Y Y
H Y Y Y Y Y Y
I N Y Y Y Y Y
J Y Y Y Y Y Y
Regulation 2019
Sub/College AAX AAY AAZ AB1 AB2 AB3
K Y Y Y Y Y Y
L Y Y Y Y Y Y
M Y Y Y Y Y Y
N Y Y Y Y Y Y
O Y Y Y Y Y Y

Table 3 Comparison of trends in modeling


Parameter | 4-3-1 model and parametric framework | Similar models
Parameter identification | Clubbed and limited | The behavioral impact of students had been treated as an attitude factor
Framework computation | Simple operations | Complexity exhibits in quotient computation
Model learning | Specific clustering | Clusters were ignored
Model validation | Cross-verified with parametric framework | Validated with different sets of data

5 Conclusions

Pedagogical innovations improved whenever the curriculum regulations were
revised; the teaching community also equipped itself to endorse updated skills.
Further, the investigation would be extended to other semester papers of the
Computer Science Engineering curriculum, and the 4-3-1 model would be fine-tuned
with different sets of data from other states of India.

References

1. Butler-Henderson, K., Crawford, J.: A systematic review of online examinations: a pedagogical


innovation for scalable authentication and integrity. Comput. Educ. 159, 1–12 (2020)
2. Tan, X., Matsuda, P.K.: Teacher beliefs and pedagogical practices of integrating multimodality
into first-year composition. Comput. Compos. (58), 1–16 (2020)
3. Hill, J., France, D.: Innovative pedagogies. In: Kobayashi, A. (eds.) International Encyclopedia
of Human Geography, 2nd edn, pp. 331–339. Elsevier (2020)
4. Kiernan, J.E.: Pedagogical commentary: teaching through a pandemic. Soc. Sci. Hum. Open
2(1), 1–5 (2020)
5. Grimal, L., Marty, P., Perez, S., Troussier, N., Perpignan, C., Reyes, T.: Case study: located peda-
gogical situations to improve global sustainable skills engineering education and universities.
Proc. CIRP 90, 766–771 (2020)
6. van Twillert, A., Kreijns, K., Vermeulen, M., Evers, A.: Teachers’ beliefs to integrate Web 2.0
technology in their pedagogy and their influence on attitude, perceived norms, and perceived
behavior control. Int. J. Educ. Res. (1), 100014 (2020). [Article in Press]
7. Punithavathi, P., Geetha, S.: Disruptive smart mobile pedagogies for engineering education.
Proc. Comput. Sci. (172), 784–790 (2020)
8. Nancy, W., Parimala, A., Merlin Livingston, L.M.: Advanced teaching pedagogy as innovative
approach in modern education system. Proc. Comput. Sci. (172), 382–388 (2020)
9. Özgür, H.: Relationships between teachers’ techno stress, technological pedagogical content
knowledge (TPACK), school support and demographic variables: a structural equation
modeling. Comput. Hum. Behav. 112, 1–9 (2020)
10. AlMarwani, M.: Pedagogical potential of SWOT analysis: an approach to teaching critical
thinking. Think. Skills Creat. 38, 1–6 (2020)
11. Guan, C., Mou, J., Jiang, Z.: Artificial intelligence innovation in education: a twenty-year
data-driven historical analysis. Int. J. Innov. Stud. 4(4), 134–147 (2020)
12. Turvey, K., Pachler, N.: Design principles for fostering pedagogical provenance through
research in technology supported learning. Comput. Educ. (146), 1–36 (2019)
13. Jeong, A.C.: Developing computer-aided diagramming tools to mine, model and support
students’ reasoning processes. Educ. Technol. Res. Dev. 68, 3353–3369 (2020)
14. Chubko, N., Morris, J.E., McKinnon, D.H., et al.: Digital storytelling as a disciplinary literacy
enhancement tool for EFL students. Educ. Technol. Res. Dev. 68, 3587–3604 (2020)
15. Avis, R., Gloria, Q., Liang, L.: Seeing the World through Children’s Eyes, 1st edn. Brill, US
(2020)
Modern Transfer Learning-Based
Preliminary Diagnosis of COVID-19
Using Forced Cough Recordings
with Mel-Frequency Cepstral
Coefficients
Shariva Dhekane, Vaishnavi Agrawal, Aniruddha Datta,
and Kunal Kulkarni

Abstract Researchers have used forced recordings of coughs to diagnose conditions
like asthma and pneumonia accurately. Research to analyze COVID-19 using audio
recordings of cough is still in its infancy. Our paper proposes a novel kind of COVID-
recordings of cough is still in its infancy. Our paper proposes a novel kind of COVID-
19 test using forced cough recordings. Our model architecture enables a nearly
cost-free, real-time solution for COVID-19 testing. The model uses forced cough
recordings to recognize whether the patient is COVID-19 positive or not. Readily
available tests can help check the outbreak of this novel virus and gradually ensure
a COVID-19-free world. Machine Learning and Deep Learning approaches were
employed to address this issue: K-Nearest Neighbors, Support Vector Machine,
Decision Tree, and Random Forest classifiers on the Machine Learning side, while
disparate sorts of Convolutional Neural Networks were used under the Deep Learning
approach.

Keywords Artificial Intelligence Diagnosis · Machine Learning · Deep Learning ·
Convolutional Neural Networks · COVID-19 screening · Speech recognition

1 Introduction

More than a million people have succumbed to COVID-19, and more than 75 million
people have been infected. Mass testing is essential for isolating infected individuals
and slowing the spread. India currently has the second most COVID-19 cases
worldwide. The ICMR has recommended faster COVID-19 tests in containment
zones, with results in 30 min, costing ₹450. With 220 million Indians sustained on
an expenditure level of less than ₹32/day, and with India going through the 'unlock'
phase, the reverse migration of workers, and the reopening of offices and some
educational institutes, the need for a quick, accurate, and inexpensive COVID-19
test could not be more significant. Even with the advent of the vaccine, there is no
information about its longevity and durability. Hence, there is an added need for
people to be tested regularly.

S. Dhekane (B) · V. Agrawal · A. Datta · K. Kulkarni


College of Engineering, Wellesley Road, Shivajinagar, Pune 411005, Maharashtra, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 137
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_13

Fig. 1 Overall block diagram for the COVID-19 test: forced cough recording → feature extraction
→ AI model → COVID-19 test result

Dry cough is one of the most common symptoms of COVID-19. Because COVID-19
is a respiratory disease, the lungs are weakened even if the person is asymptomatic;
hence, his/her forced cough is discernible from a healthy person's cough [1]. While
these slight cough differences are not decipherable to the human ear, they can be
picked up by an Artificial Intelligence system.
The Aarogya Setu app by the Indian government has a national reach, and inte-
gration with our model can help people get a preliminary test of COVID-19. The
economy needs to get back on track with the unlock phase. For this to happen,
regular screening on a large scale is of paramount importance. Our model architec-
ture (Fig. 1) enables a nearly cost-free, real-time solution for COVID-19 testing. The
model uses forced cough recordings to recognize whether the patient is COVID-19
positive or not using Artificial Intelligence [2]. Disparate Machine Learning [3] and
Deep Learning [4] methods were implemented, and the results were observed.
Techniques that were employed are briefly described in Sect. 2 Methodology.
Section 3 shows the results of all the methods adopted. It also explains the results
and their implications. Section 4 is a synopsis of the overall methodology and the
results obtained.

2 Methodology

2.1 Dataset Used

A research group at the University of Cambridge held an extensive media campaign to


crowdsource data from many users. They developed a mobile application and website
named COVID-19 Sounds App [5] to collect cough recordings. When we approached
them, they shared the data with us for research purposes. The data we received
contained 724 cough audio recordings in total, of which 141 were declared positive
for COVID-19 and 583 were forced cough recordings from COVID-19-negative
people, including people with other cough-related diseases such as asthma and people
with a clean medical record.

2.2 Feature Extraction

The coughing sounds recorded during data collection were resampled to 16 kHz. The
Librosa library was used for audio processing. From the resampled audio, the leading
and trailing silence was removed. Mel-Frequency Cepstral Coefficients (MFCCs) [6]
were extracted from this silence-removed audio and stored in the form of Mel
spectrogram images. As mentioned in the dataset section, the Cambridge dataset was
skewed in favor of healthy cough recording samples; that is, the dataset contained
fewer COVID-19-positive samples than healthy ones. Hence, data augmentation was
used to increase the COVID-19-positive samples and balance the dataset to get
improved results.

2.3 Data Augmentation

After the Mel spectrogram images were extracted from the audio, they were used
for data augmentation. To augment the Mel images, the image pixel data was scaled
using different normalization techniques: pixel value normalization, centering pixel
values, and standardizing pixel values. All Mel images from COVID-19-positive
samples were processed with these techniques, and the new augmented images were
stored in the dataset for further training.
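Assuming these are the standard per-image scaling operations (the paper gives no formulas), a minimal NumPy sketch of the three techniques is:

```python
import numpy as np

def augment_pixels(img):
    """Return the three scaled copies named above: normalized (to [0, 1]),
    centered (zero mean), and standardized (zero mean, unit variance)."""
    img = img.astype(np.float64)
    normalized = img / 255.0
    centered = img - img.mean()
    standardized = (img - img.mean()) / img.std()
    return normalized, centered, standardized
```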

2.4 Models’ Architecture

To train the model so that it learns to conduct the COVID-19 test, the forced cough
recordings were obtained from the dataset (Sect. 2.1). The recordings were then
processed to extract features. These features were fed to disparate Machine Learning
and Deep Learning models (Fig. 2). Each model was tested to calculate the accuracies,
and a comparative study was performed to conclude which model is best in terms of
reliability and accuracy.
Machine Learning Models:
For training the machine learning models [3], Mel-frequency cepstral coefficients
(MFCCs) were extracted for all 724 audio samples available in our dataset. The
audios were divided into frames of 25 ms duration with an overlap of 10 ms, and
12 MFCC coefficients were extracted for each frame. 75% of the available dataset
was used for training and 25% for testing. The models performed binary classification
into COVID-19 positive and healthy. Support vector machine, decision tree, random
forest, and k-nearest neighbor classifiers were trained to obtain the results.

Fig. 2 Block diagram of the approach followed: forced cough recording → Mel-frequency cepstral
coefficients → ML/DL model → classification as COVID-19 positive or negative
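The framing arithmetic above can be sketched in NumPy (taking the quoted 10 ms as the overlap between consecutive 25 ms frames, i.e., a 15 ms hop; the MFCC computation itself, e.g., via Librosa, is omitted):

```python
import numpy as np

def frame_audio(y, sr=16000, frame_ms=25, overlap_ms=10):
    """Split a 1-D signal into 25 ms frames that overlap by 10 ms."""
    frame_len = int(sr * frame_ms / 1000)            # 400 samples at 16 kHz
    hop = int(sr * (frame_ms - overlap_ms) / 1000)   # 240 samples at 16 kHz
    n_frames = 1 + max(0, (len(y) - frame_len) // hop)
    return np.stack([y[i * hop:i * hop + frame_len] for i in range(n_frames)])
```

One second of 16 kHz audio yields 66 such frames, each 400 samples long.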
Deep Learning Models:
Suitable CNNs [7] were tested for classifying the Mel spectrogram images into binary
classes: COVID-19 positive and healthy. As mentioned above, a Mel-frequency
spectrogram was plotted for each audio. Then, because the dataset was skewed, data
augmentation was applied to balance the data, and hence a total of 1147 images were
subjected to the CNN under test, of which 564 represented COVID-19-positive cough
audios. A custom-made CNN model and a few other transfer learning [8]-based CNN
models were tested to determine which one correctly classified the maximum number
of Mel images.
Custom-made CNN: A custom CNN was designed by building a sequential model
comprising a series of 2D convolution layers, pooling layers, dense layers, and
dropout layers, along with regularization to avoid overfitting. The architecture of
this custom-made CNN is shown in Fig. 3.
Transfer Learning and Fine-Tuning Pre-trained CNNs: Pre-trained CNNs (trained
on the ImageNet dataset to classify a plethora of images into many classes) were
tweaked and used for classifying Mel spectrogram images. These deep neural
networks are loaded with ImageNet weights, and then the top softmax layer that
classifies images into a thousand classes is removed. A few top layers are set to
trainable, and the bottom layers are frozen on ImageNet weights. Dense and dropout
layers are added at the top, including a last dense layer containing two nodes that
classifies the Mel images into two classes: COVID-19 positive and healthy. The
different pre-trained CNNs used are enumerated as follows:
• ResNet50: Out of its 176 layers, 171 were frozen at ImageNet weights. The top
softmax classifier layer was replaced by a dense layer containing 100 nodes with
an L2 regularizer, a dropout layer, and a final dense layer containing two nodes
with softmax activation. The summary of this network architecture can be found
in Table 1.
• Xception: Similar to the procedure followed for ResNet50, after removing the top
softmax classifier layer of the Xception model, 129 of the total 133 layers were
frozen on ImageNet weights. The top softmax classifier was replaced with a
128-node dense layer, a dropout layer, and a final softmax dense binary classifier
layer. The summary of this network architecture can be found in Table 1.
• VGG16: This model, too, was fine-tuned for training on Mel spectrogram images
by removing the top softmax classifier layer, freezing the bottom 17 layers (out
of a total of 20), and adding a 100-node dense layer, a dropout layer, and a
softmax binary classifier layer.
The summary of all network architectures pertaining to the above-mentioned
pre-trained CNNs can be found in Table 1.
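A sketch of this recipe for VGG16 in Keras follows. The dense width, dropout, and frozen-layer count follow the text; `weights=None` keeps the sketch offline, whereas the paper starts from ImageNet weights, and the remaining hyperparameters are assumptions:

```python
import tensorflow as tf

def build_vgg16_classifier(input_shape=(377, 377, 3), n_frozen=17):
    # Backbone without the 1000-class softmax head; global average pooling
    # yields the (None, 512) feature vector reported in Table 1.
    base = tf.keras.applications.VGG16(include_top=False, weights=None,
                                       input_shape=input_shape, pooling="avg")
    for layer in base.layers[:n_frozen]:   # freeze the bottom 17 layers
        layer.trainable = False
    return tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(100, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(2, activation="softmax"),  # positive vs healthy
    ])
```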

Fig. 3 Custom CNN architecture

3 Results and Discussions

This section contains the results of all the models that were adopted to test COVID-19
using cough recordings. The overall accuracy and the categorical accuracy of each
class (COVID-19 positive and healthy) are mentioned for each model. The graphs

Table 1 Summarized description of pre-trained CNN architectures used


Model    | Input shape         | Output shape | Number of layers | Trainable parameters | Non-trainable parameters
ResNet50 | (None, 377, 377, 3) | (None, 2048) | 176              | 1,259,822            | 22,532,992
Xception | (None, 377, 377, 3) | (None, 2048) | 133              | 3,426,178            | 17,697,832
VGG16    | (None, 377, 377, 3) | (None, 512)  | 20               | 2,411,310            | 12,354,880

Table 2 Accuracies of various machine learning models


Model Overall accuracy (%) COVID-19 positive accuracy Healthy accuracy (%)
(%)
SVM 86.18 25.8 98.66
Decision tree 82.87 12.9 97.33
Random forest 88.39 35.48 99.33
KNN 87.29 25.8 100

depicting training and validation accuracies and losses are also plotted for each Deep
Learning model. Two types of models were used: Machine Learning based and Deep
Learning based.

3.1 Results of ML Models

See Table 2.

3.2 Results of DL Models

Custom CNN:
Categorical accuracies:
Accuracy of predicting healthy cough recording correctly: 98.59%, Accuracy of
predicting COVID-19 positive cough recording correctly: 78.72% (Fig. 4)
ResNet50:
Categorical accuracies:
Accuracy of predicting healthy cough recording correctly: 94.37%, Accuracy of
predicting COVID-19 positive cough recording correctly: 86.52% (Fig. 5)
Xception:
Categorical accuracies:

Fig. 4 Losses and accuracies versus epoch for training and testing of Custom CNN

Fig. 5 Losses and accuracies versus epoch for training and testing of ResNet50 using transfer
learning and fine-tuning

Accuracy of predicting healthy cough recording correctly: 98.59%, Accuracy of


predicting COVID-19 positive cough recording correctly: 76.59% (Fig. 6)
VGG16:
Categorical accuracies:
Accuracy of predicting healthy cough recording correctly: 93.66%, Accuracy of
predicting COVID-19 positive cough recording correctly: 90.07% (Fig. 7)
Fig. 6 Losses and accuracies versus epoch for training and testing of Xception using transfer
learning and fine-tuning

Fig. 7 Losses and accuracies versus epoch for training and testing of VGG16 using transfer learning
and fine-tuning

CNN results tabulated:
The detailed results show that the ML methods are insufficient to identify whether a
particular cough recording corresponds to a COVID-19-positive person or not. This
failure of the ML models can easily be attributed to the highly skewed dataset that
was used: the audio recordings were not subjected to data augmentation; hence, the
ML models were trained on a total of 724 audios, of which only 141 were COVID-19
positive and the others healthy. Table 3 summarizes the accuracies of the DL models.
Deep Learning models perform better than ML models here, as the Mel spectrogram
images were augmented to balance the dataset. A total of 724 audio samples present
in the dataset were processed to extract Mel images, which were then augmented to
obtain 1147 Mel images. As previously mentioned, these Mel spectrogram images
were then fed to various CNNs. Among the multiple CNNs tried and tested, it is
evident that applying transfer learning on VGG16 and then fine-tuning gave the best
accuracy. This can be attributed to the fact that a smaller model will work better

Table 3 Accuracies of various Deep Learning CNNs


Model Overall accuracy (%) COVID-19 positive Healthy accuracy (%)
accuracy (%)
Custom CNN 88.28 78.72 98.59
Fine-tuned ResNet50 90.23 86.52 94.37
Fine-tuned Xception 87.50 76.59 98.59
Fine-tuned VGG16 92.19 90.07 93.66

Table 4 Comparison table of present method with the other similar way on a similar dataset
Model Overall accuracy (%) Number of samples for COVID-19 Total samples
positive
Our model 92.19 141 724
Cambridge model 80 141 599

since it is a small dataset. VGG16 has the smallest architecture among the pre-trained
CNNs used.
We obtained the dataset from a research group at the University of Cambridge.
They developed a mobile application and website named COVID-19 Sounds App to
collect cough recordings and conducted research on the collected dataset [5]. Their
research includes a total of 599 sounds from different users, containing cough as
well as breathing sounds. Classifiers such as logistic regression, gradient boosting
trees, and support vector machines were tested on features that combine handcrafted
features and features obtained through transfer learning. Our model was trained to
identify COVID-19 positive or negative based on cough recordings only, whereas
the above model also uses breathing recordings. Comparing the accuracies of the
two, our model performs better (Table 4).

4 Conclusion

An AI-based COVID-19 pre-screening test that discriminates 92.19% of COVID-19
positives from forced cough recordings at essentially no cost was successfully
designed. This model can be deployed on a website or a mobile application to
pre-screen the whole population daily while avoiding the cost of testing each
inhabitant, which is especially important in low-incidence areas where the required
post-test confinement is harder to justify.

Acknowledgements COVID-19 Sounds App’s [5] reliable data has helped us build this model that
can play an essential role in recovering from the pandemic.

Conflict of Interest All of the authors do not have any conflict of Interest with any individuals,
agencies, or institutes.

References

1. Shi, Y., Liu, H., Wang, Y., Cai, M., Xu, W.: Theory and application of audio-based assessment
of cough. J. Sens. 2018. Article ID 9845321 (2018)
2. Imran, A., Posokhova, I., Qureshi, H.N., Masood, U., Riaz, M.S., Ali, K., John, C.N., Iftikhar
Hussain, M.D., Nabeel, M.: AI4COVID-19: AI enabled preliminary diagnosis for COVID-19
from cough samples via an app. Inf. Med. Unlocked (2020)
3. Alpaydın, E.: Introduction to Machine Learning, Second Edition (Adaptive Computation and
Machine Learning), 2nd edn. The MIT Press Cambridge, Massachusetts, London, England
(2010)
4. Moolayil, J.J.: Learn Keras for Deep Neural Networks: A Fast-Track Approach to Modern Deep
Learning with Python, 1st edn. Apress, New York (2019)
5. Brown, C., Chauhan, J., Grammenos, A., Han, J., Hasthanasombat, A., Spathis, D., Xia, T.,
Cicuta, P., Mascolo, C.: Exploring Automatic Diagnosis of COVID-19 from Crowdsourced
Respiratory Sound Data. In: KDD’20 (Health Day), San Diego, CA, USA (virtual event) (2020)
6. Rabiner, L.R., Schafer, R.W.: Digital Processing of Speech Signals, 4th edn. AT&T, Prentice-
Hall, Inc., Englewood Cliffs, New Jersey
7. Laguarta, J., Hueto, F., Subirana, B.: COVID-19 artificial intelligence diagnosis using only cough
recordings. IEEE Open J. Eng. Med. Biol. (2020)
8. Hussain, M., Bird, J.J., Faria, D.R.: A study on CNN transfer learning for image classification.
In: 18th Annual UK Workshop on Computational Intelligence, Nottingham (2018)
Biomedical Text Summarization:
A Graph-Based Ranking Approach

Supriya Gupta, Aakanksha Sharaff, and Naresh Kumar Nagwani

Abstract The latest and precise information regarding the biomedical and health-
care domain is required in the current pandemic situation. The world has turned
into a small place where everyone wants quick and relevant medical data to prevent
contagious diseases. Doctors, nursing staff, medical practitioners, frontline Covid19
epidemic fighters, and even the common man requires updates and summarized
biomedical statistics. A study on Graph-based biomedical text summarization with
different similarity measures and ranking of sentence embeddings is presented in
this paper. Cosine and Dice similarities and the pre-trained BERT model providing
context via sentence embeddings are combined with TextRank and PageRank algo-
rithms resulting in an opulent extractive text summarization of biomedical Cord19
Pubmed articles. Rouge-1 and Rouge-L scores are empirically calculated, providing
a comparison between the average F-score, precision, and recall values for various
graph-based sentence extraction methods. It has been observed that Cosine similarity
and BERT sentence embeddings are equally effective when used with graph-based
ranking algorithms. The significant contribution is the proposed TextRank with the
BERT embedding model, which is evaluated as the preferred choice for short biomed-
ical document summarization. But for large documents, the BERT model behaves
heavy and causes latency in execution whereas, LexRank including Cosine measure
still works efficiently for mid-size document summarization.

Keywords Sentence extraction · Biomedical extractive text summarization ·


NLP · Graph-based method · TextRank · PageRank · LexRank · Cosine · BERT
word embedding

S. Gupta (B) · A. Sharaff · N. K. Nagwani


National Institute of Technology Raipur, Raipur, India
e-mail: sgupta.phd2018.cs@nitrr.ac.in
A. Sharaff
e-mail: asharaff.cs@nitrr.ac.in
N. K. Nagwani
e-mail: nknagwani.cs@nitrr.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 147
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_14

1 Introduction

The enormous volume of textual data in the biomedical area is consistently a
challenge that drives analysts to create new domain-related text processing methods.
In the past, biomedical summarization techniques have been broadly examined to
serve clinicians and specialists. Biomedical data is accessible as various kinds of
documents. Biomedical reviews provide clinicians and specialists vital information
to evaluate the most recent advances in a specific field of study, create and validate
new theories and hypotheses, perform experimental analysis, and interpret their
outcomes. Readers need to combine accumulated information from their own
experience with new knowledge to comprehend and assess a biomedical article [1].
A new researcher cannot easily figure out which of the concurrent sentences are
significant and which preliminary information the authors intend to introduce.
Numerous sources in the existing literature can be accessed over the Web; the Cord19
and Pubmed repositories openly index scholarly documents within the healthcare
and biomedical fields. The motivation is to retrieve informative sentences from the
bulk of the biomedical articles, which is a significant challenge. Text summarization
condenses vast data into a precise length while retaining informative content.
The proposed methodology uses a TextRank graph-based approach for extracting
top-ranked sentences to summarize biomedical text articles (Cord19, Pubmed).
Automated text summarization is difficult yet significant because human summarizers
bring knowledge and language capabilities that are not directly comparable to
computer logic. By and large, ATS approaches are of two types: extractive or
abstractive [2]. In extractive summarization, significant sentences from the text are
selected and included in the shortened outline; extraction relies on text strings taken
from the original content only. In contrast, abstractive summarization techniques
aim to reproduce the significant content in a new form, as humans do.
A brief history of work done in this field is presented in Sect. 2. Section 3 contains
the proposed methodology, system architecture, working approach, and algorithm.
Section 4 highlights the experimental setup, analysis, and results. Section 5 concludes
the paper.

2 Literature Review

Mainly, the three fundamental stages of extractive text summarization are as follows:
(1) build an intermediate representation of the input text; (2) evaluate similarity
scores of the tokenized text strings using various methods; (3) select the significant
text strings to form the summarized content [2]. At the start of the ATS era, Luhn [3]
introduced ATS that distinguishes critical text via scores derived from statistical
word-frequency data: a word's significance is estimated by how often it appears in
the document, and repeated words then decide the significance of a sentence.
WordNet [4] represents the famous generic ontology. Nagwani et al. presented Dice
and Cosine similarities for sentence similarity measurement [5].
Chen and Verma [6] built a clinical text summarization framework that used
keywords from the original report as a query. The set of keywords was expanded
with matching concepts from the UMLS knowledge base. The sentences were scored
1 point when an original keyword was present in a sentence and 0.5 point for an
expanded keyword. Finally, the top k sentences with the highest scores were chosen
for the summarized content. Likewise, Sharaff and Nagwani [7] identified email
threads and evaluated this with precision, recall, and F-score. Mihalcea et al. [8]
introduced TextRank, which likewise utilized PageRank for scoring the list of
sentences but assembled and represented the graph using a co-occurrence relation
derived from the similarity of each sentence pair. Text processing for SMS threads
is reported in [9]; the authors split the document into tokenized sentences and
mapped the text strings via an ontology, resulting in an ontology-based sentence
tree built by assessing scores, where higher-ranked nodes represented better scores.
Zahir et al. [10] summarized generic documents utilizing a graph-based strategy.
First, they constructed vertices and edges generated from the sentences with a word
frequency of more than one. Scores were then expressed in the form of a symmetric
matrix. This work did not consider the relative relationships within the sentences to
reduce confusion, relied on the framework, and also lacked an evaluation of cosine
similarity. Moradi and Ghadiri [11, 12] introduced summarization based on Bayesian
principles utilizing data from the Pubmed repository, strategically integrated with
the UMLS ontology; the sentence vectors are then processed for distance estimation
and concept retrieval. Mohamed and Oussalah [13] presented Semantic Role
Labeling (SRL) to parse and weight individual sentences, afterward utilizing
PageRank to score the sentences.
Sharaff and Nagwani [14] gave document summarization by an agglomerative
method. To assess summarization performance, there are several measurements,
i.e., F-measure, Recall, Precision, and ROUGE values [15]. ROUGE values depend
on the N-gram overlap between the system-generated summary and the reference
summary (otherwise called the gold standard). The gold-standard summarization
may be a human-made synopsis or an abstract from the original text. ROUGE scores
lie in the range 0–1, and a higher score represents better performance. Our paper
introduces an automatic extraction-based summarization framework focusing on
biomedical survey and research papers via a graph structure. The PageRank
algorithm combined with BERT word embeddings and Cosine and Dice similarities
is proposed. The proposed technique is evaluated by calculating ROUGE
measurements using multiple similarity measures and comparing the average
F-score, precision, and recall attributes.
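The ROUGE-N computation described above reduces to clipped n-gram counting; a minimal sketch (whitespace tokenization is an assumption):

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision, recall and F-score from clipped n-gram overlap."""
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    overlap = sum((cand & ref).values())   # n-grams matched, clipped per gram
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f
```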

3 Proposed Methodology

After initial preprocessing, a blend of the PageRank algorithm with various syntactic
and word embedding models is proposed for extractive text summarization over
biomedical datasets. Similarity matrices are generated using Cosine, Dice, and a
combination of the Cosine and Dice similarities. Separately, a pre-trained BERT [16, 17]
model is used to produce word embeddings that provide semantic and contex-
tual representation. A sentence graph is then created via the TextRank algo-
rithm, which is based on the PageRank algorithm [18]. The top-ranked sentences are
selected to produce the summarized content.
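The two lexical similarity measures can be sketched as follows; the combined score is shown here as a simple average of the two, which is one plausible reading of the Cosine-Dice combination (the exact blending rule is not specified in the text), and the example sentences are illustrative:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two term-frequency vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def dice_similarity(a, b):
    """Dice coefficient over the word sets of two sentences."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

s1 = "graph based text summarization".split()
s2 = "graph based ranking for text".split()
vocab = sorted(set(s1) | set(s2))
v1 = np.array([s1.count(w) for w in vocab], dtype=float)
v2 = np.array([s2.count(w) for w in vocab], dtype=float)

cos = cosine_similarity(v1, v2)
dic = dice_similarity(s1, s2)
combined = (cos + dic) / 2    # simple average as one way to blend the scores
```

Either score fills one cell of the sentence-similarity matrix that the graph is built from.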

3.1 System Architecture

The TextRank algorithm is based upon the PageRank algorithm. In the planned archi-
tecture, sentences are represented as nodes, and edges are identified
from the scores of different similarity measures. Biomedical articles
are used for extractive graph-based text summarization with a ranked approach. The
Cord19 dataset contains Covid19-related articles, and the Pubmed dataset contains
research articles and literature in the biomedical domain.
Figure 1 highlights the basic building blocks of the developed system and model
for extractive summarization of biomedical textual data. Biomedical documents
are fed to Json [19] and XML parsers (Pubmed Parser) [20] to gather text data,
which is convenient for text processing. Documents are tokenized and preprocessed;
sentences are converted into vectors, and words are weighed using cosine angle and
dice weighting schemes. In addition to the weighting mechanism, BERT is used for
word embedding, which provides the similarity scores between two sentences with
the help of the knowledgebase. According to similarity matrices, nodes, edges, and
linkage are identified for creating a text graph. The different sentences are ranked,
and then top-ranked sentences are picked to form the summary document.
PageRank
PageRank (Brin and Page, 1998) is perhaps the most well-known ranking
algorithm and was designed as a technique for Web link analysis. In
contrast to other ranking algorithms, PageRank integrates the effect of both
incoming and outgoing links into a single model, and in this way, it produces
just one set of scores:
PR(Vm) = (1 − d) + d · Σ_{Vn ∈ In(Vm)} PR(Vn) / |Out(Vn)|

Here, d is the damping factor, which is set somewhere in the range 0–1.
Starting from arbitrary values assigned to each node in the graph, the algorithm
iterates until convergence below a given threshold is achieved.

Biomedical Text Summarization: A Graph-Based Ranking Approach 151

Fig. 1 Extractive graph-based ranking approach model

After running the algorithm, a score is associated with every vertex, which
indicates the "importance" or "power" of that vertex within the graph.
It has been observed that the choice of initial values does not influence the final
scores; only the number of iterations to convergence may differ.
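This iteration can be sketched over a weighted sentence-similarity matrix (the matrix values below are illustrative, not taken from the datasets):

```python
import numpy as np

def pagerank_scores(sim, d=0.85, tol=1e-5, max_iter=100):
    """Iterate the PageRank equation on a symmetric similarity matrix."""
    sim = sim.copy()
    np.fill_diagonal(sim, 0.0)      # no self-loops
    out = sim.sum(axis=0)           # total outgoing edge weight per node
    out[out == 0] = 1.0             # guard isolated nodes
    pr = np.ones(sim.shape[0])      # arbitrary initial scores
    for _ in range(max_iter):
        new = (1 - d) + d * sim @ (pr / out)
        if np.abs(new - pr).max() < tol:   # convergence threshold reached
            return new
        pr = new
    return pr

sim = np.array([[0.0, 0.5, 0.2],
                [0.5, 0.0, 0.4],
                [0.2, 0.4, 0.0]])
scores = pagerank_scores(sim)
ranked = np.argsort(-scores)        # highest-scoring sentence first
```

The node with the strongest connections (here the second sentence) receives the highest score, regardless of the initial values, consistent with the convergence property noted above.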

3.2 Text Preprocessing

As explained in Fig. 2, the biomedical content is parsed before the processing steps.


The abstract and article text from the Cord19 [19] and Pubmed repository archives are
extracted. The document content is processed using nltk and other Python-based
NLP libraries. Standard data cleanup steps are carried out, including sentence and
word tokenization, removal of stop words and noise (such as symbolic tokens), and
removal of words with the same stem.

Fig. 2 Preprocessing stages for data clean up
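The cleanup steps can be sketched in plain Python (the actual system uses nltk; the stop-word list here is abbreviated for illustration):

```python
import re

# Abbreviated stop-word list; nltk's full English list is used in practice.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "in", "to", "and", "for"}

def preprocess(document):
    """Split into sentences, lowercase, keep word tokens, drop stop words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    return [[w for w in re.findall(r"[a-z]+", s.lower())
             if w not in STOP_WORDS]
            for s in sentences]

doc = "BERT captures context. The graph ranks sentences for the summary."
tokens = preprocess(doc)
# tokens == [['bert', 'captures', 'context'],
#            ['graph', 'ranks', 'sentences', 'summary']]
```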

3.3 Sentence-Scoring

In Fig. 3, the sentences are marked as nodes and their interconnectivity as edges:
the nodes correspond to the sentences, while the edges are the links between any two
nodes. The significant contributions combined here are (1)
understanding the semantic context of the document via the BERT model, resulting
in the extraction of critical biomedical concepts, and (2) evaluation and comparison
of the standard weighting mechanisms over the Cord19 and Pubmed datasets.

Fig. 3 Sentences in the form of nodes with their linkage

3.4 Sentence-Selection

The top-ranked sentences are selected via the TextRank algorithm based on different
similarity measures. The stepwise sequence for processing the text data and producing
the summarized content based on various similarity measures and word embeddings is
depicted in Algorithm 1.

Algorithm 1: Graph-based ranking using different similarity measures and embedding

Input: Biomedical article as a single text


Output: Precise length summary

Parse Cord19 dataset and Pubmed dataset with Json parser and an XML parser
1: for each Document D, do
2: Identify and fragment sentences via Tokenization
3: for each sentence (Sn), do
4: Execute preprocessing steps (Tokenize words, Filter stop words, Convert sentence in
lowercase, Lemmatize)
5: Attain filtered sentences (Sn)
6: Set Damping factor as .85
7: Set Convergence threshold as 1e-5
8: Set Iterations as 100
9: Convert text to vector
10: Evaluate Cosine measures
11: Evaluate Dice measures
12: Evaluate and combine Cosine and Dice measures
13: Evaluate sentence embedding with Bert
14: Construct similarity matrix Graph with the node as sentences and edge as similarity scores
15: Execute Page Rank algorithm
16: Return top-ranked sentences
17: end for
18: Generate summary with given no. of top-ranked sentences
19: end for
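Steps 10–16 and 18 of Algorithm 1 can be sketched end to end; Dice similarity stands in for the full set of measures, and the example sentences are illustrative:

```python
import numpy as np

def dice(a, b):
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

def summarize(sentences, tokens, top_n=2, d=0.85, iters=100):
    """Similarity graph -> PageRank -> top-ranked sentences."""
    n = len(sentences)
    sim = np.array([[0.0 if i == j else dice(tokens[i], tokens[j])
                     for j in range(n)] for i in range(n)])
    out = sim.sum(axis=0)
    out[out == 0] = 1.0                      # guard isolated nodes
    pr = np.ones(n)
    for _ in range(iters):
        pr = (1 - d) + d * sim @ (pr / out)
    top = sorted(np.argsort(-pr)[:top_n])    # restore original order
    return " ".join(sentences[i] for i in top)

sentences = ["Graph ranking scores sentences.",
             "BERT gives context vectors.",
             "Top ranked sentences form the summary."]
tokens = [s.lower().rstrip(".").split() for s in sentences]
summary = summarize(sentences, tokens)
```

The two lexically connected sentences are ranked above the isolated one and kept in their original order in the generated summary.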

4 Results and Discussions

The environment was set up by executing an open-source Python notebook
on Google Colaboratory, using extended libraries such as nltk, BERT
transformers, and numpy. The Pubmed and Cord19 biomedical text repositories are
processed with an XML parser [20] and a Json parser [21] to obtain plain-text documents
and abstracts. ROUGE scores are evaluated by comparing the summary generated by
the proposed system with the gold summary created from the abstracts of the original
biomedical text articles. We then used Python ROUGE [22] to measure the
generated summaries in terms of ROUGE scores. The TextRank-Cosine summarizer is deployed
from Sumy [23]. The summarization tasks (TextRank with BERT, LexRank Cosine,
TextRank with Dice, and TextRank with Cosine and Dice) are our practical implementations.
Table 1 shows average F-scores, precision, and recall values from the ROUGE evalua-
tion of the proposed methods. It has been observed that our proposed technique
performed well, with the best scores highlighted.
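For reference, ROUGE-N precision, recall, and F-score reduce to n-gram overlap counts; a minimal sketch (an illustration, not the Python ROUGE package itself):

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision, recall, and F-score from n-gram overlap."""
    def ngrams(text):
        words = text.lower().split()
        return Counter(tuple(words[i:i + n])
                       for i in range(len(words) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())     # clipped n-gram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f

p, r, f = rouge_n("the model ranks sentences",
                  "the model selects top sentences")
```

Here three of the four candidate unigrams match the reference, giving precision 0.75, recall 0.6, and F-score 2/3.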

Table 1 Text summarization graph-based ranking using similarity measures


Dataset Graph-based approach F-Score Precision Recall
R1 RL R1 RL R1 RL
CORD19 TextRank with BERT 0.206 0.122 0.176 0.099 0.323 0.196
LexRank Cosine 0.192 0.109 0.141 0.081 0.399 0.212
TextRank with Dice 0.151 0.101 0.127 0.085 0.241 0.163
TextRank with Cos-Dice 0.143 0.111 0.143 0.104 0.169 0.150
Pubmed TextRank with BERT 0.316 0.214 0.340 0.208 0.300 0.220
LexRank Cosine 0.311 0.214 0.307 0.199 0.325 0.234
TextRank with Dice 0.126 0.139 0.250 0.191 0.088 0.111
TextRank with Cos-Dice 0.124 0.140 0.249 0.194 0.086 0.111

Fig. 4 Similarity-based graph approach with sentence ranking text summarization

Figure 4 depicts the quality of the generated summaries for biomedical articles
from the Cord19 and Pubmed repositories. The x-axis represents the various similarity
models implemented on the two datasets, and the y-axis contains the values of the
parameters that quantify precise information retrieval in the summaries.

5 Conclusion

In our paper, results are presented for biomedical extraction-based text summa-
rization computed with different similarity measures, using the TextRank algorithm
to extract the top-ranked sentences into a summary. The proposed method contributes
by extracting top-ranked sentences with additional syntactic and word embedding
methods for graph-based matrix generation. The PageRank algorithm is used to rank the
highest-scoring sentences, and the BERT- and Cosine-based TextRank algorithms perform
well compared to the baseline. BERT-based ranking is efficient for short docu-
ments but is time-consuming for large text corpora. The Cosine-based LexRank
performs efficiently for short and mid-length document summarization. In future work,
graph-based approaches for finding significant vertices can be enhanced,
and the proposed method can be further integrated with different knowledge bases
like UMLS for biomedical datasets (Pubmed, Cord19).

References

1. Mishra, R., Weir, C.R., Bian, J., Jonnalagadda, S., Fiszman, M., Mostafa, J., Del Fiol, G.: Text
summarization in the biomedical domain: a systematic review of recent research. J. Biomed.
Inform. 52, 457–67 (2014)
2. Allahyari, M., Trippe, E.D., Pouriyeh, S., Safaei, S., Assefi, M., Kochut, K., Gutierrez, J.B.:
Text summarization techniques: a brief survey (2017). arXiv preprint arXiv:1707.02268
3. Sharaff, A., Roy, S.R.: Comparative analysis of temperature prediction using regression
methods and back propagation neural network. In: ICOEI (2018).
4. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38(11), 39–41 (1995)
5. Nagwani, N.K., Singh, P.: Weight similarity measurement model based object oriented
approach for bug databases mining to detect similar and duplicate bugs. In: International
Conference on Advances in Computing, Communication and Control (ICAC3’09) (2009)
6. Chen, P., Verma, R.: A query-based medical information summarization system using ontology
knowledge. pp. 37–42.
7. Sharaff, A., Nagwani, N.K.: Email thread identification using latent Dirichlet allocation and
non-negative matrix factorization based clustering techniques. J. Inf. Sci. 42(2), 200–212 (2016)
8. Mihalcea, R., Tarau, P.: TextRank: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404–411 (2004)
9. Sharaff, A., Shrawgi, H., Arora, P., Verma, A.: Document summarization by agglomerative
nested clustering approach. In: IEEE International Conference on Advances in Electronics,
Communication and Computer Technology (2016)
10. Zahir, S., Cenek, M., Fatima, Q.: New graph-based text summarization method. pp. 396–401
(2015)
11. Moradi, M., Ghadiri, N.: Different approaches for identifying important concepts in proba-
bilistic biomedical text summarization. Artif. Intell. Med. (2017)
12. Moradi, M., Ghadiri, N.: Quantifying the informativeness for biomedical literature summa-
rization: an itemset mining method. Comput. Methods Progr. Biomed. 146, 77–89 (2017)
13. Mohamed, M., Oussalah, M.: An iterative graph-based generic single and multi document
summarization approach using semantic role labeling and wikipedia concepts. pp. 117–120
14. Sharaff, A., Nagwani, N.K.: SMS spam filtering and thread identification using bi-level text
classification and clustering techniques. J. Inf. Sci. 1–13 (2015)
15. Lin, C.: Rouge: a package for automatic evaluation of summaries. In: Text Summarization
Branches Out (2004)
16. Beltagy, I., Cohan, A., Lo, K.: SciBERT: a pretrained language model for scientific text. In:
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing,
pp. 3615–3620. Association for Computational Linguistics, Hong Kong, China (2019)
17. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-
networks. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language
Processing, pp. 3982–3992. Association for Computational Linguistics, Hong Kong, China
(2019)

18. Page, L., Winograd, T., Motwani, R., Brin, S.: The PageRank citation ranking: bringing order
to the web. Stanford InfoLab (1999)
19. Moen, S., Ananiadou, T.S.S.: Distributional semantics resources for biomedical text processing.
pp. 39–43 (2013)
20. Achakulvisut, T., Acuna, D.E.: Pubmed Parser. (2015). https://doi.org/10.5281/zenodo.159504
21. GitHub, Inc.: Open source data. Retrieved from https://github.com/deepset-ai/COVID-QA (2020); Tulkens, S.: "humumls". GitHub repository. Retrieved from https://github.com/clips/humumls (2018)
22. Yuya Taguchi, “pythonrouge”. GitHub repository. Retrieved from https://github.com/tagucci/
pythonrouge (2018)
23. Mišo Belica, “sumy”. GitHub repository. Retrieved from https://github.com/miso-belica/sumy
(2018)
EEG-Based Diagnosis of Alzheimer’s
Disease Using Kolmogorov Complexity

Digambar Puri, Sanjay Nalbalwar, Anil Nandgaonkar, and Abhay Wagh

Abstract Alzheimer’s disease (AD) is the most common and fastest


growing neurodegenerative disorder of the brain due to dementia in old age
people in Western countries. Detection and identification of AD patients from
normal subjects using EEG biomarkers is a research problem. This study has
developed an automatic detection of AD patients using Spectral Entropy (SE) and
Kolmogorov Complexity (KC) feature sets. It is observed that (i) the SE value
is low in AD patient’s EEG signals compared to normal controlled subjects. (ii)
AD patients’ EEG is more regular compared to normal controlled subjects, as
shown by KC features. These feature sets have been computed and compared based
on statistical measures of classifiers. We have used six different supervised and
unsupervised classifiers in this research. Support Vector Machine classifier had
performed well compared to others and achieved more than 95% accuracy when
we provided both SE and KC feature sets. This work suggests that nonlinear EEG
signal analysis can contribute to enhancing insights into brain dysfunction in AD.

Keywords Spectral entropy · Kolmogorov complexity · Electroencephalogram

1 Introduction

Alzheimer’s disease (AD) is the most common and significant public health issue
worldwide. The impact of AD on the aging population is growing at an alarming rate.
At present, the number of people suffering from AD and its cognitive impairments is
estimated to be more than 50 million, and it is predicted that it will double by 2030
and triple by 2050 [1]. AD is a chronic neurological disorder that destroys
synapses due to the deposition of tau protein neurofibrillary tangles and amyloid

D. Puri (B)
Department of Electronics and Telecommunication, R.A.I.T, Mumbai, India
S. Nalbalwar · A. Nandgaonkar
Department of Electronics and Telecommunication, Dr. B. A.T. University, Lonere, India
A. Wagh
Directorate of Technical Education, Maharashtra State, Mumbai, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 157
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_15
158 D. Puri et al.

plaque, and eventually, the death of neurons will occur [2]. In recent years, several
research groups have investigated the potential of EEG for the diagnosis of AD over
other brain imaging techniques such as fMRI, SPECT, and PET [3]. Since EEG
recording systems are non-invasive, of low cost, mobile, and provide high temporal
resolution, EEG can be used as a tool to screen people for the risk of AD. Many
researchers have shown that AD has three significant effects in EEG: slowing, loss
of complexity, and perturbation in EEG [4].
The study in [5] proposed relative power to investigate the slowing effect and
Lempel–Ziv complexity for measuring irregularity in the EEG of AD patients, with a
final accuracy of 85%. The study in [6] proposed approximate entropy and auto-mutual infor-
mation for detecting AD patients from NC subjects and achieved accuracy of up to 90%. In
this paper, we have used Spectral Entropy (SE) and Kolmogorov Complexity (KC)
to investigate the slowing effect and the regularity present in the EEG of AD patients. These
features have been provided to various classifiers to compare their performance. SVM
performed better than the other supervised and unsupervised classifiers.
The rest of the paper is described as follows: Sect. 2 provides the details of EEG
recordings and gives SE and KC feature information. Section 3 presents the analysis
and discussion of the results, followed by Conclusion in Sect. 4.

2 Material and Methods

2.1 Subjects Details and Acquisitions of EEG

EEG datasets used in this experiment consist of two groups: AD patients and Normal
Controlled (NC) subjects. EEG signals have been captured from the 11 NC subjects
(4 men and 7 women) and 12 (5 men and 7 women) AD patients with age 72.8 ±
8.0 (mean ± standard deviation) years. These AD patients have been recruited from
Alzheimer’s patients Relatives Association of Valladolid (AFAVA). These patients
have fulfilled the criteria of probable AD. To assess the cognitive ability, all participants
went through clinical evaluation, including physical and neurological examination,
brain scanning, and the most significant test, the Mini-Mental State Examination
(MMSE) [7]. The mean value of the MMSE score was 13.1 ± 5.9 (mean ± SD)
for AD patients. 5 AD patients (out of 12) had MMSE scores less than 12, which
denotes severe dementia. All EEG signals have been recorded at the University
Hospital of Valladolid (Spain). The NC group contains 11 subjects (4 men and 7
women) and an age group of 72.5 ± 6.1 (mean ± SD), having no present symp-
toms and history of dementia or any neurological disorder. This NC group had more
than 30 MMSE values. All participants willingly participated in the EEG recording
activity, and written informed consent was taken from NC subjects and caregivers of
demented patients. The local ethical committee approved this acquisition process of
the Hospital Clinic Universitario de Valladolid.
EEG-Based Diagnosis of Alzheimer’s Disease … 159

Fig. 1 Three electrode (P3, P4, and O2) sample EEG signal of a Controlled person, b AD Patient

More than 5 min of EEG signals were recorded from each subject using a Profile
Study Room 2.3.411 EEG system (Oxford Instruments) at electrodes O1, O2, Fz, Cz,
Pz, F3, F4, F7, F8, Fp1, Fp2, C3, C4, T3, T4, T5, and T6 of the International 10–20
electrode placement system with linked earlobe reference points [6, 7]. During the
recording process, all participants were at rest, awake, and with eyes closed under
vigilance control. The 12-bit A-D converter was used to sample the EEG signals at
256 Hz sampling frequency. A specialist physician had checked the EMG activity,
eye movement, and other artifacts in EEG segments.
Thus, we selected only the EEG segments free from electro-oculographic artifacts
and with minimal EMG activity for the nonlinear analysis. Afterward, the EEG
data was arranged in 5 s artifact-free epochs (1280 points). The average number of
epochs selected was 28.8 ± 15.5 (mean ± SD) per electrode per subject. Figure 1
shows a sample EEG from three electrodes (P3, P4, and O2) for an AD patient and
an NC subject.
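The epoching step can be sketched as follows (placeholder signal; 256 Hz × 5 s = 1280 samples per epoch):

```python
import numpy as np

FS = 256        # sampling frequency (Hz)
EPOCH_SEC = 5   # 5 s epochs -> 1280 samples each

def to_epochs(signal, fs=FS, epoch_sec=EPOCH_SEC):
    """Cut a 1-D EEG channel into non-overlapping fixed-length epochs."""
    samples = fs * epoch_sec
    n_epochs = len(signal) // samples            # drop any incomplete tail
    return signal[:n_epochs * samples].reshape(n_epochs, samples)

eeg = np.zeros(5 * 60 * FS)                      # 5 min of placeholder signal
epochs = to_epochs(eeg)                          # shape (60, 1280)
```

A 5-minute recording yields 60 artifact-screening candidates of 1280 points, matching the epoch length used above.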

2.2 Spectral Entropy

The most common method to measure spectral power distribution is spectral
entropy (SE). SE is derived from information entropy, or Shannon entropy, in
information theory. SE calculates the Shannon entropy of the signal and assumes the
signal’s normalized power distribution as a probability distribution in the frequency
domain [8]. This SE can be used for feature extraction in various applications, such as
diagnosing disease and fault detection. SE is extensively utilized as a unique feature
in audio, speech recognition, and biomedical signal applications.
The SE is obtained from a probability distribution and the power spectrum
of the signal. Given a signal x(n), the power spectrum is denoted as Sp(k) =
|X(k)|^2, where X(k) indicates the discrete Fourier transform of x(n). The probability
distribution P(k) is derived as

P(k) = Sp(k) / Σ_i Sp(i)    (1)

The normalized SE is given as

H = − [ Σ_{k=1}^{N} P(k) · log2 P(k) ] / log2 N    (2)

where N is the finite number of frequency samples. The denominator, log2 N, is the
maximal spectral entropy, attained by white noise, whose power is uniformly distributed
over the given set of frequencies in the frequency domain. If a time–frequency power
spectrogram S(t, f) is provided, the probability distribution is given by

P(k) = Σ_t Sp(t, k) / Σ_t Σ_f Sp(t, f)    (3)

To evaluate SE at a given time of a time–frequency power spectrogram S(t, f), the
probability distribution at time t can be given as

P(t, k) = Sp(t, k) / Σ_f Sp(t, f)    (4)

Then the normalized spectral entropy at time t is

H(t) = − [ Σ_{k=1}^{N} P(t, k) · log2 P(t, k) ] / log2 N    (5)

In this work, SE has been applied to all the EEG data of the AD patients and NC
subjects.
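Equations (1)–(2) can be sketched directly in numpy; the synthetic signals below illustrate why a more regular, narrow-band signal yields lower SE, mirroring the lower SE observed in AD patients' EEG:

```python
import numpy as np

def spectral_entropy(x):
    """Normalized Shannon entropy of the power spectrum (Eqs. 1-2)."""
    spec = np.abs(np.fft.rfft(x)) ** 2            # power spectrum Sp(k)
    p = spec / spec.sum()                          # distribution P(k)
    p = p[p > 0]                                   # log2(0) is undefined
    return float(-np.sum(p * np.log2(p)) / np.log2(len(spec)))

fs, n = 256, 1280
noise = np.random.default_rng(0).standard_normal(n)   # broadband, irregular
tone = np.sin(2 * np.pi * 10 * np.arange(n) / fs)     # regular 10 Hz rhythm

se_tone = spectral_entropy(tone)       # power in one bin -> near 0
se_noise = spectral_entropy(noise)     # power spread out -> near 1
```

The pure rhythm concentrates its power in a single frequency bin and scores near zero, while broadband noise approaches the white-noise maximum of 1.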

2.3 Kolmogorov Complexity

In 1965, Kolmogorov proposed measuring information in terms of complexity,
suggesting that random sequences have relatively high complexity. KC's goal is
to provide a measure of the complexity or "randomness" of an object.
If a finite sequence x can be produced by a program p whose length L(p) is the
smallest among all such programs, then that minimal length defines the KC, which
is given as

KC(x) := min_p {L(p) : f(p) = x}    (6)



KC was designed to operate on binary digit sequences. To apply KC
to the EEG dataset, a string of binary values is computed from each EEG
epoch: the median of each EEG epoch is evaluated for every electrode, and each
sample greater than the median is labelled binary 1, otherwise binary 0 [9].
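KC itself is not computable exactly; as an illustration, the median binarization above paired with Lempel–Ziv phrase counting (a common practical surrogate for sequence complexity, cf. [5]) can be sketched as:

```python
import numpy as np

def binarize(epoch):
    """Median-threshold an EEG epoch into a 0/1 string (Sect. 2.3)."""
    med = np.median(epoch)
    return "".join("1" if v > med else "0" for v in epoch)

def lz_complexity(s):
    """Count the phrases of the Lempel-Ziv (1976) parsing of s."""
    i, c, n = 0, 0, len(s)
    while i < n:
        k = 1
        # grow the phrase while it already occurs in the preceding text
        while i + k <= n and s[i:i + k] in s[:i + k - 1]:
            k += 1
        c += 1      # a new phrase ends here
        i += k
    return c

rng = np.random.default_rng(1)
irregular = binarize(rng.standard_normal(1280))   # noise-like epoch
regular = "01" * 640                              # strictly alternating
c_irr, c_reg = lz_complexity(irregular), lz_complexity(regular)
# The alternating sequence parses into only 3 phrases; the noise-like
# epoch needs far more, i.e., it is "less regular".
```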

3 Results and Discussion

Herein, we have used Spectral Entropy and Kolmogorov Complexity as feature
sets to identify AD patients from NC subjects. Spectral Entropy (SE) was estimated
from the 16 channels O1, O2, P3, P4, C3, C4, F3, F4, F7, F8, T3, T4, T5, T6, Fp1,
and Fp2, and the mean and standard deviation (SD) of SE were computed for each
electrode's EEG signal from AD patients and NC subjects. The results are
summarized in Table 1. It has been observed that the SE values are significantly lower
in AD patients compared to normal controlled (NC) subjects in all 16 electrodes, with a
significant difference between the two classes (p < 0.01) at the O1, O2, P3, and P4
electrodes, as shown in Fig. 2. The comparison of the Kolmogorov Complexity of the
AD and NC classes for all 16 EEG electrodes is depicted in Fig. 3. This suggests that
the EEG signal activity is more regular in AD patients than in the NC subjects.
We estimated the ability of both feature extraction methods (SE and KC) to
discriminate NC subjects from AD patients at the electrodes, where considerable

Table 1 Average spectral entropies of AD patient and controlled subjects for each electrode with
their (mean ± SD) values
Electrode AD patients Controlled subject p
F3 0.574504193 ± 0.1121 0.579422229 ± 0.1122 0.1114
T5 0.575884202 ± 0.1123 0.581471674 ± 0.1123 0.0192
F7 0.574658548 ± 0.1137 0.578638296 ± 0.1137 0.6354
Fp1 0.574861246 ± 0.1127 0.581081475 ± 0.1127 0.0632
Fp2 0.577233983 ± 0.1141 0.580057193 ± 0.1141 0.1244
T3 0.575677362 ± 0.1141 0.579304739 ± 0.1141 0.7663
F4 0.574568333 ± 0.1122 0.578557822 ± 0.1122 0.8242
T4 0.577530426 ± 0.1134 0.582928486 ± 0.1134 0.9701
C3 0.575470097 ± 0.1142 0.579007141 ± 0.1142 0.1819
T6 0.576247037 ± 0.1135 0.58030197 ± 0.1134 0.0322
F8 0.576602097 ± 0.1146 0.578504554 ± 0.1146 0.4426
C4 0.575781958 ± 0.1123 0.579298901 ± 0.1122 0.3199
P3 0.575143866 ± 0.1132 0.582246802 ± 0.1132 0.0014
O1 0.576824677 ± 0.1138 0.579224846 ± 0.1138 0.0027
O2 0.582924038 ± 0.1114 0.588796275 ± 0.1113 0.0086
P4 0.57875928 ± 0.1156 0.580618155 ± 0.1156 0.0031


Fig. 2 Comparison of spectral entropies of AD and NC classes for all 16 EEG electrodes


Fig. 3 Comparison of Kolmogorov complexity of AD and NC classes for all 16 EEG electrodes

differences were found, using the Area Under Curve (AUC) of the ROC plot. Precision,
recall, F1-score, and accuracy of each classifier are also reported for three
different feature sets: (a) SE only, (b) KC only, and (c) SE and KC combined.
Firstly, we evaluated the six different classifiers' performance parameters
using only the SE feature sets, as shown in Table 2. SVM and KNN provide
maximum accuracies of 90.8% and 90.6%, whereas the other classifiers RF, MLPNN,

Table 2 Performance parameters of various classifiers using only spectral entropy feature sets with
tenfold cross-validation technique
Accuracy F1-score Precision Recall AUC
SVM 90.8 90.7 91 90.8 95.1
RF 89.7 89.6 89.9 89.7 94.9
MLPNN 90.6 90.6 90.6 90.6 95.4
NB 89.7 86.8 86.9 86.7 93.3
KNN 90.6 90.5 90.7 90.6 95.3
AdaBoost 82.6 82.5 82.6 82.6 81.7

Table 3 Performance parameters of various classifiers using only Kolmogorov complexity feature
sets with tenfold cross-validation technique
Accuracy F1-score Precision Recall AUC
SVM 92.9 92.8 93 92.9 96.6
RF 91.2 91.1 91.6 91.2 96.5
MLPNN 93.1 93 93.2 93.1 96.9
NB 87.3 87.3 87.5 87.3 93.7
KNN 92.1 92.1 92.3 92.1 96.8
AdaBoost 84.9 84.9 84.9 84.9 84.1

Table 4 Performance parameters of various classifiers using spectral entropy (SE) and Kolmogorov
complexity feature sets with tenfold cross-validation technique
Accuracy F1-score Precision Recall AUC
SVM 95.6 95.1 95.2 95.2 98.3
RF 90.5 90.4 90.5 90.5 96.1
MLPNN 94.1 94.1 94.1 94.1 97.7
NB 79 79 79 79 86.8
KNN 95.2 95.6 95.6 95.6 98.5
AdaBoost 88.1 88.1 88.1 88.1 87.5

Naive Bayes (NB), and AdaBoost achieve 89.7%, 90.6%, 89.7%, and 82.6%, respectively.
Secondly, we applied the KC feature sets to the same classifiers already
used for the SE feature sets; again, the maximum accuracy of 92.9% was obtained
from the SVM classifier, and the other classifiers also performed well. The accuracy,
F1-score, precision, recall, and AUC for all classifiers with the KC feature sets are
provided in Table 3. Thirdly, we applied the combination of the SE and KC feature
sets to the same classifiers; all classifier performance parameters improved, as shown
in Table 4, with SVM providing the maximum classification rate of 95.6%. In all three
experiments, SVM performed well compared to all the other classifiers used to evaluate
the feature sets. The comparison of classification accuracies from all classifiers is
shown in the bar chart in Fig. 4; it is clear that the combination of the SE and KC
feature sets performs better than either individual set. Performance estimation is
done by the tenfold cross-validation method in all scenarios. From these evaluations,
it has been observed that the EEG of AD patients is more regular than that of the NC
subjects, as captured by the KC feature sets. The spectral entropy and KC values are
significant biomarkers that can identify AD patients from NC subjects. A comparison
of the present method with other similar approaches is shown in Table 5.
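The evaluation protocol of Tables 2–4 can be sketched with scikit-learn on synthetic stand-in features (the real features are the per-electrode SE and KC values; the class shift below merely mimics the reported trend of lower SE and complexity in AD):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-ins: 16 SE values + 16 KC-like values per sample,
# with a small downward shift for the AD class.
n = 200
X_nc = rng.normal(loc=0.58, scale=0.02, size=(n, 32))
X_ad = rng.normal(loc=0.56, scale=0.02, size=(n, 32))
X = np.vstack([X_nc, X_ad])
y = np.array([0] * n + [1] * n)   # 0 = NC, 1 = AD

# Tenfold cross-validated SVM accuracy, as in Tables 2-4.
acc = cross_val_score(SVC(), X, y, cv=10, scoring="accuracy").mean()
```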


Fig. 4 Performance evaluation of various classifiers for three different input feature sets a SE only,
b KC, c SE, and KC

Table 5 A comparison table of the present method with other similar methods

Reference | Extracted features | Classifiers | Accuracy (%)
Guilia et al. [4] | FFT and DWT | Decision tree | 91.00
Charles et al. [10] | Parallel factor analysis | ANN | 74.70
Datta et al. [11] | Wavelet energy and entropy | KNN | 79.52
Thibaut [12] | Tsallis Entropy | Decision tree | 93.75
Proposed method | SE + KC | SVM | 95.60

4 Conclusion

In our framework, the diagnosis of AD patients versus NC subjects has been performed
on the basis of Spectral Entropy and Kolmogorov Complexity measures. We
found that SE values are significantly lower in the EEG of AD patients than in NC subjects,
and that the EEG of AD patients is more regular, as captured by the KC values.
There are some limitations to this work. Firstly, the available dataset was small;
to use this technique as a tool for diagnosing AD, it must be extended
to larger AD patient samples. In our next work, we will concentrate on
studying EEG synchrony with various entropy and complexity measures for the diagnosis of
AD patients versus mild cognitive impairment patients and healthy controlled subjects
of the same age group. We will also apply the method described in this work to EEG
data collected at various hospitals to verify its correctness.

References

1. Alzheimer’s disease facts and figures the journal of alzheimer’s association, Chicago, vol. 13
(2020)
2. Lopez-Martin, M., Nevado, A. and Carro, B.: Detection of early stages of Alzheimer’s disease
based on MEG activity with a randomized convolutional neural network. Artif. Intell. Med.
107 (2020). ISSN 0933-3657, https://doi.org/10.1016/j.artmed.2020.101924
3. Puri, D., Ingle, R., Kachare, P., Awale, R.: Wavelet packet sub-band based classification of
alcoholic and controlled state EEG signals. In: International Conference on Communication
and Signal Processing (ICCASP), Atlantis Press, pp. 562–567 (2016). https://doi.org/10.2991/
iccasp-16.2017.82
4. Fiscon, G., Weitschek, E., Cialini, A., Felici, G., Bertolazzi, P., De Salvo, S., Bramanti, A.,
Bramanti, P. and De Cola, M.C.: Combining EEG signal processing with supervised methods
for Alzheimer’s patients classification. BMC Med. Inform. Decis. Mak. 18(35) (2018). https://
doi.org/10.1186/s12911-018-0613-y
5. Dauwels, J., Srinivasan, K., Ramasubba Reddy, M., Musha, T., Vialatte, F.-B., Latchoumane,
C., Jeong, J., Cichocki, A.: Slowing and loss of complexity in Alzheimer’s EEG: two sides of
the same coin? Int. J. Alzheimer’s Dis. 539621 (2011). https://doi.org/10.4061/2011/539621
6. Abasolo, D., Hornero, R., Escudero, J., Gomez, C., Garcia, M., Lopez, M.: Approximate
entropy and mutual information analysis of the electroencephalogram in alzheimer’s disease
patients. In: IET 3rd International Conference On Advances in Medical, Signal and Information
Processing (MEDSIP), (2006), pp. 1–4. https://doi.org/10.1049/cp:20060347
7. Folstein, M.F., Folstein, S.E., McHugh, P.R.: Mini-mental state: a practical method for grading
the cognitive state of patients for the clinician. J. Psychiatry Res. 12(3), 189–198 (1975). https://
doi.org/10.1016/0022-3956(75)90026-6
8. Vakkuri, A., Yli-Hankala, A., Talja, P., Mustola, S., Tolvanen-Laakso, H., Sampson, T., Viertiö-
Oja, H.: Time-frequency balanced spectral entropy as a measure of anesthetic drug effect in
central nervous system during sevoflurane. Propofol, Thiopental Anesth., Acta Anaesthesiol.
Scand. 48(2), 145–153 (2004)
9. Petrosian, A.: Kolmogorov complexity of finite sequences and recognition of different preictal
EEG patterns. In: Proceedings Eighth IEEE Symposium on Computer-Based Medical Systems,
Lubbock, TX, USA, pp. 212–217 (2015). https://doi.org/10.1109/CBMS.1995.465426
10. Latchoumane, C.F.V., Vialatte, F.B., Jeong J., Cichocki, A.: EEG Classification of mild and
severe alzheimer’s disease using parallel factor analysis method. In: Ao, S.I., Gelman, L. (eds.)
Advances in Electrical Engineering and Computational Science. Lecture Notes in Electrical
Engineering, vol. 39. Springer, Dordrecht (2009). https://doi.org/10.1007/978-90-481-2311-
7_60
11. Datta, A., Chatterjee, R.: Comparative study of different ensemble compositions in EEG signal
classification problem. In: Abraham, A., Dutta, P., Mandal, J., Bhattacharya, A., Dutta, S. (eds.)
Emerging Technologies in Data Mining and Information Security. Advances in Intelligent
Systems and Computing, vol. 813. Springer, Singapore (2019). https://doi.org/10.1007/978-
981-13-1498-8_13
12. De Bock, T.J., et al.: Early detection of Alzheimer’s disease using nonlinear analysis of EEG via
tsallis entropy. In: Biomedical Sciences and Engineering Conference. Oak Ridge, TN, pp. 1–4
(2010). https://doi.org/10.1109/BSEC.2010.5510813
Quantification of Streaking Effect Using
Percentage Streak Area

Sajjad Ahmed and Saiful Islam

Abstract The streaking effect is an artifact or characteristic fingerprint left by the
application of the median filter. Many studies have examined the application of
median filters to digital images. Recently, the percentage streak area (PSA) has been
used as a metric to measure streaking in images. This paper performs an analysis of the
percentage streak area (PSA). The work investigates the streaking effect using PSA in
natural images and when commonly applied filters such as the Gaussian filter, the Average
filter, and Unsharp masking are used for image filtering. The standard image datasets
UCID, BOSS, and Dresden have been used in the study, and results are presented.
The investigation shows that the streaking effect can be quantified by the percentage
streak area.

Keywords Median filter · Median filtering detection · Streak area · Percentage


streak area

1 Introduction

In today's world, where information is communicated through multimedia such as digital pictures, audio, and video, establishing the authenticity of such digital multimedia is an absolute necessity. The authenticity check becomes even more relevant and essential when such media has to be presented before a court of law. For this purpose, researchers in digital forensics, and specifically digital image forensics (DIF), are developing methods to establish trust in digital images.
An image forger in the modern world has access to a variety of sophisticated hardware and software to tamper with an image. After tampering with the image, the next step is to remove the traces left by the tampering. The most common types of tampering are image composition, copy–move [1], resampling [2], and JPEG compression [3]. To cover such forgeries, an operation such as median filtering or contrast enhancement is usually applied, in order to hide evidence that may be visible to the naked eye or to the sophisticated tools used for the purpose.

S. Ahmed (B)
Baba Ghulam Shah Badshah University, Rajouri, India
e-mail: sajjad@bgsbu.ac.in

S. Islam
ZHCET, Aligarh Muslim University, Aligarh, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_16
To remove the evidence left by image manipulation techniques, the median filter (MF) is the tool of choice. The filter is effective against techniques that rely on the assumption that neighboring pixels are linearly correlated.
The median filter is widely used for removing impulse noise from digital images. Because it is nonlinear, it efficiently eliminates the traces left by other manipulation methods. Thus, detecting that a median filter has been applied to an image raises suspicion about the authenticity of the image. Numerous methods have been proposed for median filter detection. Many of them use the streaking effect as an artifact left by the application of the median filter; the reliability of such work rests on the fact that median filtered images contain a streaking effect.
The purpose of this work is to investigate the streaking effect in the natural images of standard datasets, which are used in digital image forensics to evaluate the performance of such detectors. The work provides an in-depth analysis of how much streaking is present in natural images and of what happens to streaks when an original, unaltered, and uncompressed image is median filtered with different window sizes. The amount of streaking introduced by other operators, such as the Gaussian filter, the average filter, and unsharp masking, is also studied. To quantify the streaking effect in an image, the percentage streak area (PSA) proposed in [4] was employed.
The rest of the paper is organized as follows: Sect. 2 covers background and prior work, Sect. 3.1 describes the experimental setup, Sect. 3.2 presents and analyzes the results, and the conclusion is presented in Sect. 4.

2 Background and Prior Work

2.1 Median Filter

A median filter is a popular nonlinear filter used in image processing. Because it is nonlinear, image forgers use it to remove the traces of linear image manipulations. The median filter operates on an image pixel by pixel, replacing each pixel on which it operates; usually, an odd-sized pixel window is employed. Mathematically, let $I$ be a grayscale image of $M \times N$ dimensions and $\hat{I}^w$ the median filtered version of $I$. As the median filter operates pixel by pixel, its output $\hat{I}^w_{i,j}$ at pixel $(i, j)$, using a square window of $w \times w = (2Z + 1) \times (2Z + 1)$ pixels, may be defined as

$$\hat{I}^w_{i,j} = \mathrm{med}_w(I_{i,j})$$

where

$$\mathrm{med}_w(I_{i,j}) = \mathrm{Median}\{\, I_{i+r,\, j+s} \,\}, \quad \forall\, (i, j) \in \{1, \dots, M\} \times \{1, \dots, N\} \tag{1}$$

and $r, s \in \{-Z, \dots, 0, \dots, +Z\}$. The commonly used filter sizes are square windows of $3 \times 3$ and $5 \times 5$; the window size is the only parameter available with the median filter.
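As a minimal illustration of Eq. (1), the following sketch applies a 3 × 3 median filter with SciPy's `median_filter` (an assumed implementation choice, not one stated in the paper; the input values are illustrative only) and checks that no new gray level is created:

```python
import numpy as np
from scipy.ndimage import median_filter

# Small grayscale image I of M x N dimensions with impulse-like outliers.
I = np.array([[10, 12, 11, 90],
              [11, 95, 12, 13],
              [12, 11, 10, 11],
              [13, 12, 94, 12]], dtype=np.uint8)

# 3 x 3 median filter (w = 3, Z = 1): each output pixel is the median
# of its w x w neighbourhood (borders handled by reflection here).
I_hat = median_filter(I, size=3)

# The median of an odd number of values is always one of those values,
# so the filter suppresses the outliers 90, 94, 95 without creating
# any gray level that was not already present in the input.
assert set(np.unique(I_hat)) <= set(np.unique(I))
```

Because the median of an odd-sized window is always one of its inputs, impulse-like outliers are removed while step edges are largely preserved, which is exactly why the filter is popular for impulse-noise removal.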

2.2 Streaking Effect

When a median filter is applied to one-dimensional data, the output value is always one of the input values; no new value is created. Thus, the same value may be chosen as the output over several shifts of the filter window, producing runs of constant output with no visual counterpart in the input. Such artifacts are called streaks, and the phenomenon is called the streaking effect, first mentioned in [5]. Applying a median filter to an image produces regions of the same, or almost the same, gray level. Such regions exhibit different shapes, such as star or square shaped, depending on the size and type of the filter applied. These regions, introduced into the image, may take the form of a streak or of a two-dimensional rough patch. The streaking effect for images was analyzed by Bovik in [6].
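The one-dimensional case described above can be reproduced in a few lines (a sketch using SciPy's `medfilt` as an assumed implementation; the specific input values are illustrative only):

```python
import numpy as np
from scipy.signal import medfilt

x = np.array([5, 9, 3, 8, 2, 7, 4], dtype=float)

# 1-D median filter with window 3 (medfilt zero-pads at the borders).
y = medfilt(x, kernel_size=3)   # -> [5, 5, 8, 3, 7, 4, 4]

def equal_adjacent_pairs(v):
    """Crude proxy for streaking: count of equal consecutive values."""
    return int(np.sum(v[1:] == v[:-1]))

print(equal_adjacent_pairs(x))  # 0: all input samples differ
print(equal_adjacent_pairs(y))  # 2: the filter has created runs (streaks)
```

Every output value is one of the input values (or the boundary padding), and identical outputs over neighbouring window positions form exactly the streaks described above.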

2.3 Median Filter Detector Based on Streaking Artifact

Among the various methods used to detect median filter application on digital images, the most frequently used are based on the streaking effect, which was first mentioned in [5]. Bovik [6] performed a probability-based analysis of the effect in median filtered signals, and the results showed that a median filter with a square window produces blotches in two-dimensional signals. The streaking effect is thus an undesirable side effect of applying a median filter to an image: it produces streaks or blotches, which are runs of equal or near-equal values.
The problem with the streaking effect is that, by creating blotches and streaks, it introduces artifacts such as false lines and contours. However, the effect becomes useful when testing whether an image has been median filtered, and many researchers have exploited it for this purpose. The work of Kirchner et al. [7] is based on the streaking effect; they measured it by taking the first-order difference of the image and observed that the ratio of the number of zeros to the number of ones may be used as a feature. In the same paper, the authors also explored the application of SPAM [8] to median filter detection; they successfully used a second-order SPAM feature vector of 686 dimensions to detect median filtering in high-quality JPEG compressed images. The work by Cao et al. [9] is also based on streaking artifacts; they quantified the effect by measuring the number of zeros in the textured regions of the first-order difference image. The work
considered both horizontal and vertical streaks introduced after the application of a
median filter. The work by Yuan [10] is based on the fact that the median filter does not introduce new values; only a redistribution of values occurs. Yuan presented the median filtering forensics (MFF) feature, a combination of five subfeatures calculated using order statistics to measure the local dependence introduced by the median filter. Kang et al. [11] used an autoregressive model for median filter detection. Li et al. [12] proposed a one-dimensional feature based on the observation that the frequency residual obtained from a repeatedly median filtered image decreases monotonically. Ahmed and Islam [4] introduced the percentage streak area to measure the streaking characteristics of the median filter and, further, proposed a median filter detector based on machine learning methods. That detector uses the percentage streak area as a quantitative measure to construct a feature vector based on the increase in the percentage of pixels involved in streaking after an image is median filtered.
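The first-order difference feature of Kirchner et al. [7] can be sketched as follows (a simplified illustration in their spirit, not their exact detector; the function name and the restriction to horizontal differences are our assumptions):

```python
import numpy as np

def zeros_to_ones_ratio(I):
    """Ratio of zeros to (+/-)ones in the horizontal first-order
    difference image; streaky (median-filtered-like) content has many
    zero differences, so higher values suggest median filtering."""
    d = np.diff(I.astype(int), axis=1)
    zeros = np.count_nonzero(d == 0)
    ones = np.count_nonzero(np.abs(d) == 1)
    return zeros / max(ones, 1)  # guard against division by zero

# Rows containing runs of equal values give a larger ratio.
I = np.array([[1, 1, 2, 2],
              [3, 3, 3, 4]])
print(zeros_to_ones_ratio(I))  # 2.0 (4 zero differences, 2 ones)
```

A real detector would compare this statistic against a threshold learned from known original and median filtered images.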

2.4 Percentage Streak Area

The percentage streak area (PSA) is a metric proposed in [4] to quantify the streaking effect. The PSA is the percentage of pixels involved in streaks in an image. We denote a gray-level image as $I$, of $M \times N$ dimensions, and assume that

$$\overrightarrow{\mathrm{psa}}(I) = \overleftarrow{\mathrm{psa}}(I) \qquad \text{and} \qquad \downarrow\!\mathrm{psa}(I) = \uparrow\!\mathrm{psa}(I).$$

Here $\overrightarrow{\mathrm{psa}}$ is calculated in the horizontal direction and $\downarrow\!\mathrm{psa}$ in the vertical direction; the two are merged into $\mathrm{psa}(I)$ by taking the weighted mean of $\overrightarrow{\mathrm{psa}}(I)$ and $\downarrow\!\mathrm{psa}(I)$ as follows:

$$\mathrm{psa}(I) = \frac{\overrightarrow{\mathrm{psa}}(I)\,\sqrt{2} + \downarrow\!\mathrm{psa}(I)\,\sqrt{2}}{2} \tag{2}$$

The above equation may also be written as

$$\mathrm{psa}(I) = \left(\overrightarrow{\mathit{StreakArea}}(I) + \downarrow\!\mathit{StreakArea}(I)\right) \times \frac{100}{MN\sqrt{2}} \tag{3}$$

where $\overrightarrow{\mathit{StreakArea}}(I)$ is the total number of pixels in row-wise streaks in image $I$, and $\downarrow\!\mathit{StreakArea}(I)$ is the total number of pixels involved in column-wise streaks in image $I$. For an image $I^w$, median filtered with a square window of size $w$, $\mathrm{psa}(I^w)$ may be calculated using Eq. 3 as

$$\mathrm{psa}(I^w) = \left(\overrightarrow{\mathit{StreakArea}}(I^w) + \downarrow\!\mathit{StreakArea}(I^w)\right) \times \frac{100}{MN\sqrt{2}} \tag{4}$$
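A direct reading of Eq. (3) can be sketched as follows (our own illustrative implementation: it counts exact-equality runs of length ≥ 2 as streaks, whereas [4] also admits near-equal values):

```python
import numpy as np

def row_streak_area(I):
    """Total number of pixels in row-wise streaks, a streak being a
    horizontal run of >= 2 consecutive equal pixels."""
    total = 0
    for row in I:
        run = 1
        for prev, cur in zip(row[:-1], row[1:]):
            if cur == prev:
                run += 1
            else:
                if run >= 2:
                    total += run
                run = 1
        if run >= 2:          # close a streak ending at the row border
            total += run
    return total

def psa(I):
    """Percentage streak area of Eq. (3): row-wise plus column-wise
    streak pixels, normalised by M*N*sqrt(2) and scaled to percent."""
    M, N = I.shape
    area = row_streak_area(I) + row_streak_area(I.T)  # -> and | streaks
    return area * 100.0 / (M * N * np.sqrt(2))

I = np.array([[1, 1, 2],
              [3, 4, 4],
              [5, 6, 7]])
print(round(psa(I), 2))  # 4 streak pixels -> 400/(9*sqrt(2)) = 31.43
```

Applying the same function to a median filtered version of the image and comparing the two values reproduces the increase in PSA that the paper measures.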

3 Results and Discussion

3.1 Experimental Setup

To analyze the presence of streaks and how they are affected by the median filter, streaks were studied both in original images and in median filtered images. For this purpose, a total of 12,826 digital images from the standard image datasets UCID [13], BOSS [14], and Dresden [15] were taken to construct a dataset of natural images, DS = {UCID, BOSS, Dresden}. DS consists of 1338 images from UCID, 10,000 images from BOSS, and 1488 images from Dresden. The median filtered dataset DS^w was constructed by median filtering the original dataset DS with different filter window sizes w = {3, 5, 7, 9, 11}. To study the effect of average filtering, DS was filtered using the same window sizes w, generating DS_avg^w. Similarly, the datasets DS_gf and DS_usm were generated from DS using the Gaussian filter and unsharp masking, respectively. The percentage streak area (PSA), as described in [4], was employed as the quantification parameter for the streaking effect. In this study, a streak is defined as a run of pixels with the same, or almost the same, intensity value; a streak of length one consists of 2 pixels of the same or almost the same intensity. psa(I) measures the percentage of pixels of an image involved in streaks and may be applied to measure the streaking effect.
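The construction of the filtered dataset variants can be sketched with standard SciPy filters (an assumed implementation; the function name and the unsharp `amount` parameter are ours, and the paper does not state which software it used for filtering):

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter, gaussian_filter

def filtered_variants(I, w=3, sigma=1.0, amount=1.0):
    """Return the DS^w, DS_avg^w, DS_gf and DS_usm counterparts of image I."""
    I = I.astype(float)
    blurred = gaussian_filter(I, sigma=sigma)
    return {
        "median":   median_filter(I, size=w),       # DS^w
        "average":  uniform_filter(I, size=w),      # DS_avg^w
        "gaussian": blurred,                        # DS_gf
        "unsharp":  I + amount * (I - blurred),     # DS_usm
    }

rng = np.random.default_rng(0)
variants = filtered_variants(rng.integers(0, 256, size=(8, 8)))
print(sorted(variants))  # ['average', 'gaussian', 'median', 'unsharp']
```

Mapping this function over every image in DS, for each window size in w = {3, 5, 7, 9, 11}, yields the filtered datasets used in the experiments.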

3.2 Results and Discussion

To study the streaks in original, unaltered images, the dataset DS was processed to extract streaks and the PSA using Eq. 3. Figure 1 highlights the horizontal streaks in the original and median filtered versions of one image, UCID000107.tif from the UCID dataset. Figure 1a shows the original image and Fig. 1b a median filtered version; Fig. 1c shows the horizontal streaks in the original image and Fig. 1d the streaks in the median filtered image. The visual difference between Fig. 1c, d clearly shows an increase in streaking after applying the median filter. Table 1 gives statistics of the streaks in UCID000107.tif. Similar statistics were computed for every image in dataset DS and indicate an increase in streak area after median filtering for all but a very small number of images; these outliers are counted per dataset in Table 2. Upon investigation,

Fig. 1 Horizontal streaks in UCID000107.tif: a original image, b image median filtered with a 3 × 3 filter, c horizontal streaks in the original image, d horizontal streaks in the image median filtered with a 3 × 3 window

it was found that these images contain highly saturated regions, such as a black sky with a moon in the foreground. Figure 3 shows the mean percentage streak area for the dataset DS. The mean percentage streak area for natural images is minimal compared with that for median filtered images of any window size. Also, the percentage streak area for images filtered with a larger filter window is higher than that for images filtered with smaller windows. It can be inferred from Fig. 3 that significant streaks are present in original, unaltered images, but that the increase in the number of pixels involved in streaks after median filtering is also significant. The dataset DS was also studied for the effect of repetitive application of the median filter on the PSA, using a window size of 3 × 3. The first five differences in PSA between consecutive repetitions of the median filter, for all images in the UCID dataset, are plotted in Fig. 2: diff(1) is the difference between the PSA of the original images and the PSA of their 1-time median filtered versions; similarly, diff(2) is the difference in PSA between the 2-times and 1-time median filtered versions, and so on. All images from the datasets show the same trend.
For original, unfiltered images, diff(1) is very large compared with diff(2), diff(3), and so on. Similar results are obtained for the Dresden

Table 1 Increase in streaks and PSA for image UCID000107.tif

                          Original   Median filtered   Increase
No. of streaks            9,885      36,734            26,849
No. of pixels in streaks  21,756     87,411            65,655
PSA (%)                   11.07      44                32.93

Table 2 Outliers
Datasets No. of 3×3 5×5 7×7 9×9 11 × 11
images
BOSS 10,000 2 0 3 5 12
UCID 1338 0 0 2 0 0
Dresden 1488 0 0 0 0 0
Total 12,826 2 0 5 5 12

Fig. 2 Difference in percentage streak area

Fig. 3 Percentage streak area versus window size

Fig. 4 Average percentage streak area after application of various operations on dataset

dataset and the BOSS dataset. For all three datasets, the increase in percentage streak area (PSA) is monotonic. The rise in PSA is large when an image is median filtered for the first time, but on further applications of the median filter the PSA increases only slowly. Table 2 lists the few images that do not follow this monotonic behavior when median filtered with window sizes of 3 × 3, 5 × 5, 7 × 7, 9 × 9, and 11 × 11; all other images in the datasets considered for the study show a monotonic increase in the streaking effect on repeated median filtering.
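The diminishing diff(k) behaviour can be reproduced in simulation (a self-contained sketch on a synthetic random image; the streak-pixel count below is our simplified proxy for the full PSA):

```python
import numpy as np
from scipy.ndimage import median_filter

def h_streak_pairs(I):
    """Proxy for pixels in horizontal streaks: count of positions that
    equal their right-hand neighbour."""
    return int(np.sum(I[:, 1:] == I[:, :-1]))

rng = np.random.default_rng(1)
I = rng.integers(0, 256, size=(64, 64)).astype(np.uint8)

prev, diffs = h_streak_pairs(I), []
for k in range(1, 4):
    I = median_filter(I, size=3)   # k-times median filtered image
    cur = h_streak_pairs(I)
    diffs.append(cur - prev)       # diff(k)
    prev = cur

print(diffs)  # diff(1) dwarfs the later differences
```

Repeated median filtering drives the image toward a "root" signal that the filter no longer changes, which is why the increments shrink after the first pass.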
When an image is processed with other popular image-processing filters, such as the average filter, the Gaussian filter, and the unsharp masking filter, the streaking also changes. Figure 4 plots the mean percentage streak area for the UCID dataset when filtered using the median filter, the average filter, the Gaussian filter, and the unsharp masking filter. The results clearly show that the mean percentage streak area increases significantly more for the median filter than for the average filter and the Gaussian filter. The increase is small for the Gaussian filter, and the trend is reversed for unsharp masking, for which the percentage streak area decreases. The percentage streak area is thus largest for the median filter and is a characteristic feature for detecting its application. Further study of the percentage streak area as a quantification measure of streaking is needed, including a qualitative and quantitative comparison with other measures.

4 Conclusions

The paper investigates the streaking effect in natural images. The percentage streak-
ing area (PSA) has been used as a metric to quantify streaking in an image. Stan-
dard image datasets UCID, BOSS, and Dresden have been used in the study. Our

work shows that significant streaking is present in original, unaltered images as well. The investigation indicates that although authentic images contain a considerable amount of streaking, median filtering increases the streaking significantly, and this increase can be detected through the percentage streak area (PSA). The percentage streak area also increases on application of the average filter and the Gaussian filter, but the increase is significantly smaller than that produced by median filtering the same image; for the unsharp masking filter, the percentage streak area decreases. In conclusion, the streaking effect can be quantified using the percentage streak area, which is a promising feature for future studies in the area.

References

1. Ferrara, P., Bianchi, T., Rosa, A.D., Piva, A.: Image forgery localization via fine-grained anal-
ysis of CFA artifacts. IEEE Trans. Inf. Forensics Secur. 7(5), 1566–1577 (2012)
2. Cao, G., Zhao, Y., Ni, R.: Forensic identification of resampling operators: a semi non-intrusive
approach. Forensic Sci. Int. 216(1), 29–36 (2012)
3. Neelamani, R., De Queiroz, R., Fan, Z., Dash, S., Baraniuk, R.G.: Jpeg compression history
estimation for color images. IEEE Trans. Image Process. 15(6), 1365–1378 (2006)
4. Ahmed, S., Islam, S.: Median filter detection through streak area analysis. Digit. Invest. 26, 100–
106 (2018). [Online]. https://www.sciencedirect.com/science/article/pii/S1742287617303109
5. Justusson, B.I.: Median Filtering: Statistical Properties, pp. 161–196. Springer Berlin Heidel-
berg, Berlin, Heidelberg (1981). [Online]. http://dx.doi.org/10.1007/BFb0057597
6. Bovik, A.C.: Streaking in median filtered images. IEEE Trans. Acoust. Speech Signal Process.
ASSP-35(4), 181–194 (1987)
7. Kirchner, M., Fridrich, J.: On detection of median filtering in digital images. In: Proc. SPIE 7541, Media Forensics and Security II, 754110 (2010)
8. Pevny, T., Bas, P., Fridrich, J.J.: Steganalysis by subtractive pixel adjacency matrix. IEEE Trans.
Inf. Forensics Secur. 5(2), 215–224 (2010)
9. Cao, G., Zhao, Y., Ni, R., Yu, L., Tian, H.: Forensic detection of median filtering in digital images. In: IEEE International Conference on Multimedia and Expo (ICME), pp. 89–94 (2010)
10. Yuan, H.-D.: Blind forensics of median filtering in digital images. IEEE Trans. Inf. Forensics Secur. 6(4), 1335–1345 (2011)
11. Kang, X., Stamm, M.C., Peng, A., Liu, K.J.R.: Robust median filtering forensics based on
the autoregressive model of median filtered residual. In: Proceedings of the 2012 Asia Pacific
Signal and Information Processing Association Annual Summit and Conference, Dec 2012,
pp. 1–9 (2012)
12. Li, W., Ni, R., Li, X., Zhao, Y.: Robust median filtering detection based on the difference of frequency residuals. Multimed. Tools Appl. 1–19 (2018)
13. Schaefer, G., Stich, M.: UCID: an uncompressed colour image database. In: Proc. SPIE 5307, Storage and Retrieval Methods and Applications for Multimedia 2004, pp. 472–480 (2003)
14. Bas, P., Filler, T., Pevny, T.: Break our steganographic system: the ins and outs of organizing
boss. In: International Workshop on Information Hiding, pp. 59–70 (2011)
15. Gloe, T., Böhme, R.: The Dresden image database for benchmarking digital image forensics.
J. Digit. Forensic Pract. 3(2–4), 150–159 (2010)
Improving Topographic Features
of DEM Using Cartosat-1 Stereo Data

Litesh Bopche and Priti P. Rege

Abstract In the current study, we have generated a Digital Elevation Model (DEM) using high spatial resolution stereo images (2.5 m spatial resolution) of the Cartosat-1 satellite and examined the terrain's quantitative topographic features. First, the DEM is generated and its topographic features, such as elevation, slope gradient, aspect, hill shade, and contour map, are derived. We then performed a comparative evaluation of the accuracy of the topographic features of the DEM generated from stereo images and of the freely accessible Cartosat-1 DEM data (30 m spatial resolution) against other reference DEMs, namely the Shuttle Radar Topography Mission (SRTM) DEM and the ALOS global DSM (AW3D30). A visual analysis of all the DEMs was carried out through surface profile maps. The surface profile map of the DEM generated from stereo images shows a good correlation with the reference DEMs in all regions of the profile. This study reveals that the Cartosat-1 DEM generated from stereo images gives better accuracy than the freely accessible Cartosat-1 DEM.

Keywords ALOS global DSM · Cartosat-1 DEM · Filters · SRTM DEM

1 Introduction

The simplest and most popular arrangement for representing terrain in 3D is the Digital Elevation Model (DEM). A DEM and its derivatives (for example, slope gradient, aspect, hill shade, landscape roughness, contour, drainage, and curvature) are inputs to numerous studies, such as geomorphological and geological studies, watershed analysis, drainage network characterization, soil mapping and characterization, and landslide hazard analysis [1–3]. DEMs are particularly useful in areas devoid of comprehensive topographic maps. A DEM is a quantitative and qualitative model of a part of the Earth's terrain in digital form. Hence,

L. Bopche (B) · P. P. Rege


College of Engineering, Pune, India
e-mail: bp18.extc@coep.ac.in
P. P. Rege
e-mail: ppr.extc@coep.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 177
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_17

there is an immense requirement for precise and accurate DEMs covering the world’s
entire surface.
DEMs can be extracted from numerous data sources and techniques such as
the Interferometric Synthetic Aperture Radar (InSAR) methods, photogrammetric
methods, aerial laser scanning, and ground surveying methods [4, 5]. DEM is also
referred to as Digital Terrain Model (DTM) and Digital Surface Model (DSM) in
scientific journals. DSM of the topography characterizes the Earth’s terrain and
contains all objects like plants, ground surface, vegetation, and buildings. DSM
is very beneficial for landscape modeling: land use land cover planning, environ-
mental application, construction purpose, and many others [6]. The existing DEMs
like Cartosat DEM, Shuttle Radar Topography Mission (SRTM) DEM, and ALOS
global DSM (AD3D30) are used for obtaining the topographic features map.
In this paper, a high spatial resolution (SR) Cartosat-1 DEM (10 m SR) has been generated from stereo pair images of the Cartosat-1 satellite. A detailed comparison of the DEM generated from stereo images and of the freely accessible Cartosat-1 DEM has been carried out against reference DEMs, namely the SRTM DEM and the ALOS global DSM (AW3D30), to estimate the quality and accuracy of the generated DEM [7, 8]. Freely available Cartosat-1 DEMs show considerable drawbacks in consistency, availability, degree of resolution, and coverage. This study shows that the accuracy of the DEM generated from stereo pair images is closer to the reference DEMs than that of the Cartosat-1 DEM (30 m resolution).
The remaining sections of the paper are organized as follows: Sect. 2 explains the topographic region and the data-set under study in detail, Sect. 3 presents the methodology used in this work, Sect. 4 covers a discussion of the results, and the paper finishes with Sect. 5, the Conclusion. Henceforth, the DEM extracted from stereo pair images and the freely accessible Cartosat-1 DEM will be designated DEM-CART and DEM-CART1, respectively.

2 The Background

In the current study, the DEM comparison has been performed for the Ambegaon taluka of the Pune district of Maharashtra, India. The northwest part of the study area contains rugged mountains, undulating regions, and high slope gradients. The region of interest is about 1039 km². The study area lies between latitudes 19° 6′ 0″ N and 19° 3′ 0″ N and longitudes 73° 24′ 0″ E and 73° 27′ 0″ E, as shown in Fig. 1.
The data-sets were downloaded from the following websites:
(1) DEM-CART1 {Bhuvan—geoportal of the Indian Space Research Organisation (ISRO)}
(2) SRTM DEM {www.usgs.org}
(3) ALOS World 3D global DSM {https://www.eorc.jaxa.jp/ALOS/en/aw3d30}.

Fig. 1 The geographic area of Ambegaon taluka of Pune district (Pune district map)

2.1 Cartosat-1 DEM

Cartosat-1 stereo images, with 2.5 m SR and about 7.5 m vertical accuracy, are used to extract the DEM. The Rational Polynomial Coefficient (RPC) files and geometric models are the essential components required for extracting a DEM from stereo pair images of the Cartosat-1 satellite. The Cartosat-1 satellite was launched by ISRO on May 5, 2005. It carries two panchromatic cameras for stereo image viewing and a Global Positioning System (GPS) receiver to position the areas under consideration. The panchromatic cameras of the Cartosat-1 sensor are tilted 5° (backward viewing) and 26° (forward viewing) from the ground axis, respectively, and the time difference between capturing the two stereo images is 52 s. The Cartosat-1 satellite has along-track stereo viewing capability with a swath width of 30 km.
Stereo images of the Ambegaon area from the Cartosat-1 sensors were acquired from the National Remote Sensing Centre (NRSC), Hyderabad, India. The DEM extracted using the stereo pair images is shown in Fig. 2a, and the DEM-CART1 of the study region downloaded from the Bhuvan website is shown in Fig. 2b.

2.2 SRTM DEM

The SRTM was the foremost mission to use a space-borne single-pass InSAR instrument to create a worldwide DEM of the Earth's terrestrial surface, with moderate horizontal and vertical accuracies of ±30 m and ±16 m, respectively. The SRTM mission was a revolution in remote sensing (RS) of topography, creating the most comprehensive high-resolution DEM of the world at the time. The mission was a collaboration between the National Geospatial-Intelligence Agency (NGA) and NASA; the gathered interferometric sensor data were processed by the Jet Propulsion Laboratory (JPL) to produce a near-worldwide (80% of Earth's land) DEM for latitudes below 60°. The SRTM DEM of the area under consideration is shown in Fig. 2c.

Fig. 2 Study area: a Cartosat-1 DEM generated from stereo pair images (DEM-CART), b Cartosat-1 DEM (DEM-CART1) downloaded from the Bhuvan web portal, c SRTM DEM, and d ALOS World 3D global DSM

2.3 ALOS World 3D Global DSM

Since 2014, the Japan Aerospace Exploration Agency (JAXA) has conducted a project to produce an accurate worldwide digital 3D model, "ALOS World 3D" (AW3D), covering the global land regions, using about 3 million scene records obtained by the PRISM panchromatic optical sensor on the Advanced Land Observing Satellite "DAICHI" (ALOS).
The digital 3D model contains a DEM/DSM that can characterize land topography with approximately 5 m SR, together with orthorectified PRISM nadir-viewing images. The AW3D DSM data-set was further processed and released as the "ALOS World 3D-30 m" (AW3D30) DSM data-set, which has approximately 30 m SR. The digital 3D model has been used in various applications, such as damage estimation after natural calamities, map development, infrastructure planning, and water resource studies. The ALOS DSM of the area under consideration is shown in Fig. 2d.

3 Methodology

3.1 Extraction of DEM-CART

The DEM-CART extraction from the stereo images of the Cartosat-1 sensors is accomplished using the LPS module of the ERDAS IMAGINE software. Some noise, such as random noise, speckle noise, and sensor-inherent noise, is present in the Cartosat-1 stereo images. Conventional noise reduction filters, such as the weighted average filter, median filter, sharpening filter, Lee sigma filter, and local sigma filter, are applied to the stereo images in the preprocessing step to remove this noise. After that, the DEM is created from the stereo images. The process of extracting DEM-CART in the LPS software includes multiple actions: generating a block file, adding and editing the frame, providing the RPCs (exterior and interior positioning), initiating automatic tie points, block triangulation, and generation of the DEM.
The extraction of DEM procedure in LPS software, using stereo images, starts
with generating a block file, describing the geometric model, and then a raster image
is included. The new raster image is corrected by providing RPC. Cartosat-1 sensor
stereo images have corresponding RPC files. The Rational Polynomial sensor models
relate the image space (row and column) to the latitude, longitude, and altitude of
the terrain. We have used the automatic tie point collection to select the tie points
on the stereo pair image. The LPS software selects a point in one image and sets the
corresponding point in the second image.
The block triangulation step of DEM extraction defines the mathematical relation between the images of the sensor model and the terrain. Block triangulation of the frame is carried out using the automatic tie points of the image, and DEM-CART is extracted after processing the block file.
In this work, the DEM is extracted using the automatic tie point selection method only. DEM extraction using automatic tie point selection produces a good quality DEM from the stereo images without gathering any extra information (such as Ground Control Points (GCPs)) for the area under study.

3.2 Generation of the Topographic Maps

Slope, aspect, hill shade, and contour maps are crucial topographic features that effectively represent the landscape structure and relief of the terrestrial surface, and they strongly influence the assessment of the accuracy and quality of DEMs. The effects of the spatial distribution of vertical errors on the topographic feature maps were examined for the resultant DEM-CART by comparing elevation inconsistencies, the mean value, and the standard deviation (SD) of the DEMs with respect to the topographic features obtained from the reference DEMs. The topographic feature maps of all the DEMs were generated using the spatial analyst toolbox of the ArcGIS software.
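The surface derivatives that such GIS tools compute can be sketched with NumPy (a Horn-style approximation; the function name, parameters, and conventions such as azimuth measured from north are our assumptions, not the exact ArcGIS algorithm):

```python
import numpy as np

def terrain_derivatives(dem, cellsize=10.0, azimuth=315.0, altitude=45.0):
    """Slope (deg), aspect (deg) and hill shade (0-255) from a DEM grid."""
    # np.gradient returns derivatives along rows (y) then columns (x).
    dzdy, dzdx = np.gradient(dem.astype(float), cellsize)
    slope = np.arctan(np.hypot(dzdx, dzdy))              # radians
    aspect = np.arctan2(-dzdx, dzdy)                     # radians
    az, alt = np.radians(azimuth), np.radians(altitude)
    shade = (np.sin(alt) * np.cos(slope)
             + np.cos(alt) * np.sin(slope) * np.cos(az - aspect))
    return (np.degrees(slope),
            np.degrees(aspect) % 360.0,
            np.clip(shade, 0.0, 1.0) * 255.0)

# A flat DEM has zero slope and uniform hill shade sin(45 deg) * 255.
slope, aspect, shade = terrain_derivatives(np.full((5, 5), 700.0))
```

The `cellsize` parameter should match the DEM's ground sampling distance (10 m for DEM-CART, 30 m for DEM-CART1) so that the slope values are in comparable units.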

4 Results and Discussion

The quality and accuracy of DEM-CART, DEM-CART1, SRTM DEM, and ALOS
World 3D global DSM are compared through elevation value and topographic
features produced in ArcGIS 10.8 software, as shown in Fig. 3.

Fig. 3 Topographic features maps of all the DEMs a slope map, b aspect map, and c hill shade
map

For comparison purposes, statistical parameters were generated and compared. The elevation values are compared through the minimum elevation, maximum elevation, mean, and SD of each DEM, as given in Table 1. The DEM-CART elevation statistics are closer to those of the reference DEMs (the SRTM DEM and the ALOS DSM) than those of DEM-CART1.

Table 1 Statistical parameter values of all the DEMs


Elevation Map (in meter) Minimum Maximum Mean SD
DEM-CART1 198 1199 702.66 144.94
DEM-CART 225 1270 723.42 144.58
SRTM 296 1277 773.62 144.66
ALOS 298 1281 774.22 145.02
Slope Map (in degree) Minimum Maximum Mean SD
DEM-CART1 0 80.79 7.17 7.27
DEM-CART 0 79.72 9.06 8.84
SRTM 0 75.39 8.6 8.69
ALOS 0 72.15 8.87 9.53
Hill Shade Map Minimum Maximum Mean SD
DEM-CART1 0 254 176.75 22.41
DEM-CART 0 180 60.20 72.90
SRTM 0 180 54.81 69.44
ALOS 0 180 52.68 68.58

To compare the quality and accuracy of DEM-CART, DEM-CART1, SRTM DEM, and ALOS World 3D global DSM through topographic features, slope, aspect, and hill shade maps were created for all the DEMs individually in ArcGIS 10.8 software. The surface tool of the Spatial Analyst toolbox was used to generate the topographic feature maps of all the DEMs. A comparison of statistical parameters for the topographic features is given in Table 1. The statistical values of the topographic features for DEM-CART are closer to the reference DEM values than those of DEM-CART1.
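As an illustration of how one such topographic feature is derived, a slope map can be computed from an elevation grid with a finite-difference gradient. The sketch below uses NumPy instead of the ArcGIS Spatial Analyst tool employed in this study, and the DEM array and cell size are purely illustrative:

```python
import numpy as np

def slope_degrees(dem, cellsize):
    """Slope map in degrees from a DEM grid, via finite-difference gradients
    (an illustrative stand-in for the ArcGIS surface tool used in the paper)."""
    dzdy, dzdx = np.gradient(dem.astype(float), cellsize)
    return np.degrees(np.arctan(np.hypot(dzdx, dzdy)))

# A tilted plane rising 1 m per 10 m of easting has a uniform slope of
# arctan(0.1), i.e., about 5.71 degrees.
dem = np.tile(np.arange(5, dtype=float), (5, 1))  # synthetic DEM: 1 m rise per cell
slope = slope_degrees(dem, cellsize=10.0)
```

Aspect and hill shade maps follow the same pattern from the two gradient components.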
The visual comparison of DEM-CART, DEM-CART1, SRTM DEM, and ALOS World 3D global DSM was done with the help of contour maps and surface profile maps of the DEMs. Contour maps of the study area with a 50 m contour interval were generated using ArcGIS software for all the DEMs, as shown in Fig. 4. The contours of DEM-CART are comparatively similar to those of the reference DEMs. The counts of the different contour lines also validate the DEM-CART accuracy (Table 2).

Fig. 4 Contour map of Cartosat-1 DEM (DEM-CART1), DEM-CART, SRTM DEM, and ALOS
World 3D global DSM

Table 2 Statistical values of different contour lines for all the DEMs
Contours DEM-CART1 DEM-CART SRTM DEM ALOS DSM
200–550 596 378 147 112
600–800 1396 677 555 587
850–1200 508 329 431 373
Total counts 2500 1384 1133 1072

Fig. 5 Surface profile maps of Cartosat-1 DEM (DEM-CART1), SRTM DEM, ALOS World 3D
global DSM, and generated Cartosat-1 DEM (DEM-CART), respectively

The study region’s surface profile maps are generated using ArcGIS software for
all the DEMs, as shown in Fig. 5. The surface profiles were plotted (elevation value
versus distance in km) to check the height variation along the profile lines. The surface profile lines show that some regions (red circles) of DEM-CART1 are less similar to the reference DEMs than the elevation profile of DEM-CART. The elevation profile of DEM-CART shows a good correlation with the reference DEMs in all regions of the profile map.

5 Conclusion

In this study, DEM’s topographic features extracted through Cartosat-1 stereo images
(DEM-CART) and DEM-CART1 are compared and validated against a reference
DEM for the area under consideration. The statistics of the terrain-related attributes
are calculated with the help of the ArcGIS software. It is observed that the eleva-
tion value and topographic feature attribute values of DEM-CART are closer to the
reference DEM. The main reasons are (i) DEM-CART is generated from high-resolution Cartosat-1 stereo images, (ii) a sufficient number of tie points is used,

and (iii) noise reduction filters are used for removal of the noise in the preprocessing
steps.
DEM-CART1 shows a larger variation from the reference DEM's elevation values and topographic feature attribute values. DEM-CART provides useful and realistic information about the area's topography and appears virtually the same as the reference DEM. The visual analysis of the DEMs also confirms the quality and accuracy of DEM-CART. The graphical illustration of the 50 m contour interval map of DEM-CART is similar to that of the reference DEM.
The surface profile graph of DEM-CART1 shows a larger difference in the elevation values against distance compared to the reference DEM. The
surface profile map of DEM-CART shows a good correlation with reference DEM
in all regions of the profile map. The DEMs’ topographic feature maps are handy for
several studies like hydrology, drainage network, groundwater mapping, landslide
hazards mapping, runoff modeling, and watershed analysis.

References

1. Yin, Z.Y., Wang, X.: A cross-scale comparison of drainage basin characteristics derived from
digital elevation models. Earth Surf Process. Landf. 24, 557–562 (1999)
2. Bhatt, S., Ahmed, S.A.: Morphometric analysis to determine floods in the Upper Krishna basin
using Cartosat DEM. Geocarto Int. 29, 878–894 (2014)
3. Gopinath, G., Swetha, T.V., Ashitha, M.K.: Automated extraction of watershed boundary and
drainage network from SRTM and comparison with Survey of India toposheet. Arab. J. Geosci.
7, 2625–2632 (2014)
4. Giribabu, D., Kumar, P., Mathew, J., Sharma, K.P., Krishna Murthy Y.V.N.: DEM generation
using Cartosat-1 stereo data: issues and complexities in Himalayan terrain. Eur. J. Remote Sens.
46, 431–443 (2013)
5. Singh, V.K., Ray, P.K.C., Jeyaseelan, A.P.T.: Orthorectification and digital elevation model
(DEM) generation using Cartosat-1 satellite stereo pair in Himalayan Terrain. J. Geogr. Inf.
Syst. 2, 85–92 (2010)
6. Hobi, M.L., Ginzler, C.: Accuracy assessment of digital surface models based on WorldView-2
and ADS80 stereo remote sensing data. Sensors 12, 6347–6368 (2012)
7. Pakoksung, K., Takagi, M.: Assessment and comparison of digital elevation model (DEM) products in varying topographic, land cover regions and its attribute: a case study in Shikoku Island, Japan. Model Earth Syst. Environ. (2020)
8. Agarwal, R., Sur, K., Rajawat, A.S.: Accuracy assessment of the CARTOSAT DEM using robust
statistical measures. Model Earth Syst. Environ. 6, 471–478 (2020)
Active Noise Cancellation System
in Automobile Cabins Using
an Optimized Adaptive Step-Size FxLMS
Algorithm

Arinjay Bisht and Hemprasad Yashwant Patil

Abstract Active noise cancellation systems (ANCs) are employed to reduce or


virtually eliminate the noise produced in the subject's vicinity. The Filtered-X Least Mean Square (FxLMS) algorithm is a convenient choice for realizing this goal, as it is highly regarded for its low computational complexity and robustness when used as the controller of an adaptive filter. An adaptive filter differs from a conventional filter in that it has a dynamic
mode of operation, which involves the use of adaptive algorithms. These algorithms
are used in applications ranging from system identification to the cancellation of
unwanted noise. Here, we have performed a comparative study between our proposed
design and the conventional design. In this paper, we endeavor to apply the FxLMS
algorithm to tackle noise generated by vehicular traffic, including vehicle combustion
engines, which usually lie in the narrowband frequency spectrum. The proposed
system uses adaptive learning parameters such as adaptive step size to increment
the rate of convergence and the speed of reduction of noise. We have made use of
narrowband internal combustion engine white noise as our source noise signal for
simplicity and convenience, which has been randomly generated during simulations.

Keywords FxLMS · ASSFxLMS · MSD · NRR · ANC · FIR

1 Introduction

The noise generated by an Internal Combustion (IC) engine comprises several compo-
nents from various sources. In its normal operating condition, the noise generated by
combustion mostly ranges from 100 to 1000 Hz, which falls in the narrowband spectrum.
Hence, this justifies applying a single-channel FxLMS algorithm to minimize the
complexity of computations and installation costs. Another critical issue is related to
specific nonlinear characteristics of the noise generated by the IC Engine system. The
origin of the primary nonlinearity effects can be from the following three sources:
the primary source of noise, actuators and system-based sensors, and the paths of

A. Bisht (B) · H. Y. Patil


School of Electronics Engineering, Vellore Institute of Technology, Vellore, TN, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 187
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_18

acoustic propagation. With all these factors and implementation considerations, such
as the physical constraints, complexity of the system design, and cost reduction, we
have decided to go with an upgraded Adaptive Step-Size FxLMS algorithm. The
FxLMS algorithm has been aptly described as perhaps the most popular of all adaptive algorithms for updating the controller weights. The fixed step-size version provides satisfactory performance, but at the expense of increased computational complexity. To acquire an algorithm
that assures a high convergence rate in both dynamic and stationary environments,
using an adaptive step-size function in the original algorithm seems more appro-
priate and lucrative [1]. For the algorithm’s application, an approximation of the
second propagation path is performed, which is then used to filter the reference noise
signal. Following this, it is used in a generalized secondary propagation path and
static non-deterministic noise field. These effects have been analyzed only under the
assumption that the secondary path is perfect [2, 3]. The complexity created as a
result of the x-filtering blocks poses a challenging problem and leads to disruptions
in implementing the system.
The above problem has provided ample motivation to seek a new advanced struc-
ture for the system where the disruptions and discrepancies caused by this may be
mitigated or eliminated [4]. Active Noise Cancellation (ANC) of narrowband white
noise will be instrumental in reducing noise signals with frequency characteristics
of a discrete nature. Considering the case of narrowband white noise, which ranges
from a few hundred hertz to below a few kilohertz, it is difficult to eliminate the interference by purely passive methods, but it can be removed more effectively using state-of-the-art ANC techniques based on destructive interference [5]. An algorithm with a fixed step size might fail to give an optimized response
to a time-variant channel’s parameters, which may result in poor performance. For
the sake of overcoming this limitation, methods involving variable step size have
been developed, and many of the adaptive step-size functions have been built to
overcome the limitations of existing structures. For this reason, a small step size is usually maintained in a bid to ensure better convergence [6]. Here, we have envisioned developing and implementing an advanced
algorithm for an optimized variable step-size and tap-length active noise cancella-
tion system using FxLMS for randomly generated narrowband internal combustion
engine noise. Using this approach, we endeavor to achieve a much better convergence
rate than previous system structures [7, 8] and an increase in overall performance
while also respecting the power and computational cost constraints. Section 2 reviews the design features of a standard feedforward ANC system. Section 3 discusses the intricacies of existing methodologies involving the FxLMS algorithm, while Section 4 describes the various design parameters we have taken into consideration while constructing our proposed design. In Section 5, we analyze the simulation results of our contribution, and Section 6 draws concluding remarks concerning the proposed system, which reaffirm and validate the feasibility and functionality of the design.

2 Active Noise Cancellation System Design

The transfer function involving the secondary propagation path has a critical role in
generating anti-noise in avenues where ANC applications are in high demand since
it has nonlinearity. As a result, it causes a delay, which further leads to instability in
the LMS algorithm. This problem has been dealt with with the help of the robust and
efficient FxLMS algorithm [9], as it also takes into account an estimate of the second
propagation path. There is also the advantage of flexibility because the algorithm can
be used in both feedback and feedforward structures. In the case of a feedforward
ANC system, P(z) denotes the primary propagation path, which entails the acoustic
response of source reference noise to the system identification error sensor, and S(z)
represents the secondary propagation path. Since the effect of the secondary propagation path needs to be canceled out, we need an estimate of the secondary impulse response, denoted by Ŝ(z). The secondary signal y(i) is represented by (1) [9].

y(i) = w^T(i) x(i);    (1)

where the coefficient w(i) and signal vector x(i) have length L, and the FIR filter
W(z) exists at discrete time interval i. By the FxLMS algorithm, these coefficients
are updated in the following manner:

w_l(i + 1) = w_l(i) + μ x′(i − l) e(i),    (2)

l = 0, 1, . . . , L − 1,  μ > 0

where μ denotes the step size, and

x′(i) = ŝ(i) ∗ x(i)    (3)

is the filtered reference signal.
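A minimal simulation of Eqs. (1)–(3) is sketched below. The primary and secondary path impulse responses are illustrative toy values, and the secondary path estimate ŝ is assumed to be perfect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4000
x = rng.standard_normal(n)            # reference noise x(i)
p = np.array([0.6, 0.3, 0.1])         # toy primary path P(z) (assumption)
s = np.array([0.8, 0.2])              # toy secondary path S(z) (assumption)
d = np.convolve(x, p)[:n]             # disturbance d(i) at the error sensor

L, mu = 8, 0.01                       # controller length and fixed step size
w = np.zeros(L)                       # controller coefficients W(z)
x_f = np.convolve(x, s)[:n]           # filtered reference x'(i), Eq. (3)
y = np.zeros(n)                       # secondary signal y(i)
e = np.zeros(n)                       # residual noise

for i in range(L, n):
    y[i] = w @ x[i:i - L:-1]              # y(i) = w^T(i) x(i), Eq. (1)
    y_s = s[0] * y[i] + s[1] * y[i - 1]   # y(i) passed through S(z)
    e[i] = d[i] - y_s                     # residual at the error microphone
    w += mu * e[i] * x_f[i:i - L:-1]      # coefficient update, Eq. (2)

# The residual power falls well below its initial level as W(z) converges
# toward the truncated expansion of P(z)/S(z).
```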

3 Existing Methodologies

3.1 Eriksson’s Method

This method of online secondary propagation path modeling was proposed in [10].
This basic method generates a random noise signal for training purposes. An adaptive filter is used to generate a secondary impulse response that will

model S(z) while the ANC system is in operation. As per this algorithm, the signal
e(i), which denotes the noise residue, is expressed as

e(i) = d(i) − y′(i) + v′(i);    (4)

y′(i) = s(i) ∗ y(i);  v′(i) = s(i) ∗ v(i);    (5)

where v(i) is the random AWGN signal generated internally and then injected at the output of W(z), which denotes the control filter. Here, a finite impulse response (FIR) filter of length M, represented by Ŝ(z), is responsible for modeling the secondary impulse response:

v̂′(i) = ŝ^T(i) v_M(i).    (6)
 
Here, v̂′(i) generates the error signals, which are correspondingly sent to the modeling FIR filter Ŝ(z) and the controller denoted by W(z), and both are represented, respectively, as follows:

f(i) = d(i) − y′(i) + [v′(i) − v̂′(i)]    (7)

g(i) = [d(i) − y′(i)] + v′(i)    (8)


The modeling FIR filter Ŝ(z) has its coefficients updated as follows:

ŝ(i + 1) = ŝ(i) + μ_e(i) f(i) v(i)    (9)
s (i + 1) = s (i) + μe (i) ∗ f (i) ∗ v(i) (9)

where μ_e(i) is the step-size parameter. Following this, the controller W(z) coefficients are updated in the given manner:

w(i + 1) = w(i) + μ_e(i) f(i) x′(i)    (10)

The reference signal is filtered through Ŝ(z) to derive the LMS algorithm's input:

x′(i) = ŝ^T(i) x_K(i)    (11)

where

x_K(i) = [x(i), x(i − 1), . . . , x(i − K + 1)]^T    (12)
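Eriksson's online modeling step amounts to LMS identification of S(z) driven by the injected noise v(i). In the sketch below, the path coefficients, noise level, and step size are illustrative, and the controller's contribution to the sensor signal is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
s = np.array([0.8, 0.2, 0.1])       # toy secondary path S(z) (assumption)
M = 3                               # modeling filter length
s_hat = np.zeros(M)                 # coefficients of S-hat(z)
mu_e = 0.02                         # modeling step size (illustrative)

v = rng.standard_normal(n)          # injected auxiliary noise v(i)
d = 0.1 * rng.standard_normal(n)    # residual primary noise at the sensor (assumption)
vp = np.convolve(v, s)[:n]          # v'(i) = s(i) * v(i), Eq. (5)

for i in range(M, n):
    vi = v[i:i - M:-1]              # recent samples of v(i)
    v_hat = s_hat @ vi              # v-hat'(i), Eq. (6)
    e = d[i] + vp[i]                # sensor signal (controller output omitted), cf. Eq. (4)
    f = e - v_hat                   # modeling error f(i), cf. Eq. (7)
    s_hat += mu_e * f * vi          # modeling filter update, Eq. (9)

# s_hat now closely matches the true path s, since v(i) is uncorrelated
# with the remaining disturbance d(i).
```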



3.2 Akhtar’s Method

Several other techniques have since been presented in a bid to outperform Eriksson’s
method [10–15]. Among these, one of the current methods presented in [12] shows
great promise. Akhtar’s approach can be described as a worthy upgrade over
Eriksson’s system [10]. This method performs modeling of the filter using the VSS-
LMS algorithm and makes use of f(i) as the designated error signal in the case of both
 

W(z) and S (z). This algorithm updates the coefficients of the modeling filter S (z).
The step-size parameter of the algorithm (μe (i)) then updates the filter in question,
and the appropriate calculations of the parameter are performed using the following
steps:
• In the beginning, the power computation of error signals f(i) and e(i) is performed:

P_e(i) = λ P_e(i − 1) + (1 − λ) e^2(i);    (13)

P_f(i) = λ P_f(i − 1) + (1 − λ) f^2(i),    (14)

0.9 < λ < 1

• Following this, we acquire the estimated power ratio:

p(i) = P_f(i)/P_e(i)    (15)

p(0) ≈ 1;  lim_{i→∞} p(i) → 0.

• And finally, we perform the calculation of the step size as follows:

μ_s(i) = p(i) μ_smin + (1 − p(i)) μ_smax    (16)

where the values of μ_smin, μ_smax, and λ are determined experimentally. This algorithm has demonstrated an increase in the accuracy of the modeling process, which correspondingly improves system performance.
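The schedule of Eqs. (13)–(16) can be sketched as follows; the forgetting factor, the step-size bounds, and the synthetic error sequences are illustrative values, not the experimentally determined ones:

```python
import numpy as np

def akhtar_step(e, f, lam=0.99, mu_min=1e-4, mu_max=1e-2):
    """Variable step-size schedule of Eqs. (13)-(16): the step size grows
    from mu_min toward mu_max as the power ratio p(i) = Pf/Pe decays."""
    Pe, Pf = 1.0, 1.0                          # estimators initialized to unity
    mu = np.zeros(len(e))
    for i in range(len(e)):
        Pe = lam * Pe + (1 - lam) * e[i] ** 2  # Eq. (13)
        Pf = lam * Pf + (1 - lam) * f[i] ** 2  # Eq. (14)
        p = Pf / Pe                            # Eq. (15)
        mu[i] = p * mu_min + (1 - p) * mu_max  # Eq. (16)
    return mu

# As modeling improves, f(i) shrinks relative to e(i), so mu rises toward mu_max.
e = np.ones(2000)                       # residual power held constant (synthetic)
f = np.exp(-np.arange(2000) / 300.0)    # decaying modeling error (synthetic)
mu = akhtar_step(e, f)
```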

4 Proposed Methodology

We have selected a Feedforward Adaptive Step-Size Filtered-x LMS (ASSFxLMS) algorithm after carefully considering the necessary parameters and conditions of
operation. Kalman filtering-based algorithms and other algorithms like the Recursive
Least Squares (RLS) are known to generate a better noise reduction performance,

which unfortunately is offset by their high computational costs, which is where the FxLMS algorithm comes into the picture. It is widely preferred since it finds use in a range of industrial and commercial applications. In a real-time environment, there is a secondary propagation path between the secondary loudspeaker and the error microphone. The error microphone has the
option of being in either static mode or dynamic mode. This secondary propagation
path introduces an unwanted timing delay in the error signal, and this leads to an
error in the synchronization process between the reference signal and error signal.
The FxLMS algorithm has specifically been developed to compensate for this delay to
nullify this effect of attenuation in the reference signal. Once this has been achieved,
it is used as the adaptive filter input.
When we deal with a reference signal with uncorrelated disturbance, Feedforward
ANC has immense utility. The same is not directly available in the case of feedback
ANC reference signal. So, it has to be internally generated to be used effectively as the
adaptive filter input. FxLMS algorithm also has the advantage of faster convergence,
which it owes to the output signal’s prefiltering before it is sent through the secondary
propagation path. With a static step size, however, the algorithm imposes a heavy burden in the form of increased computational complexity. Hence, in a bid
to acquire an algorithm that promises to offer a fast convergence rate in both static
and dynamic environments, we have made dedicated efforts to incorporate the use
of the adaptive step-size function in the proposed design.
Here, we have made use of FIR filters to model the primary propagation path P(z), the controller C(z), the secondary propagation path S(z), and the estimated secondary propagation path Ŝ(z). In the
system given in Fig. 1, we have represented the source noise signal as x(k), which is
then propagated to the sensor through the primary propagation path, which is a fluid
medium and is represented by P(z). The arriving noise signal is measured by the
sensor as y p (k). To mitigate the effects of and correspondingly cancel out this noise,
another ‘noise’ signal yω (k) is generated using the controller C(z) and noise signal
x(k). In other words, we need to model the Controller on the lines of the primary
propagation path denoted by P(z).

Fig. 1 Block diagram of proposed feedforward active noise control system



A least mean square adaptive step-size algorithm is applied to adjust the controller
coefficient dynamically. However, another fluid secondary propagation medium is
represented by S(z) between the sensor and the actuator. This is more commonly
referred to as the secondary propagation path, as described earlier. Therefore, in a
bid to acquire a practical solution and ensure efficient cancellation of narrowband noise, there is an evident need to compensate the adjustment process using Ŝ(z), a modeling FIR filter that offers an estimate of S(z). The main objective here is to ensure that this newly generated noise signal destructively interferes with the original noise signal x(k).
• In Fig. 1, which illustrates the design of the proposed feedforward active
noise cancellation system: P(z) represents the primary propagation path used for
modeling the acoustic response between the reference and error microphones in S(z)
or secondary propagation path.
• The controller function represented by C(z) is convolved with S(z) in the secondary propagation path to eliminate d(k). C(z)'s objective is to reduce the mean square error of e(k), which is essentially a significant determinant of the algorithm's accuracy.
• Background noise, x(k), which in this instance is additive white Gaussian noise
(AWGN), is random and passes through the primary propagation path P(z) and since
its characteristics are uncorrelated with those of the unwanted signals in the channel
medium, the signal y p (k) is generated at the other end of P(z), from which we can
acquire the residual error signal e(k) as per Eqs. (17, 18):

e(k) = y_p(k) − y_s(k);    (17)

y_s(k) = s(k) ∗ y_ω(k).    (18)

• The acoustic path passing through S(z) and C(z) is estimated using a suitable adaptive filter by injecting the same white Gaussian noise x(k) at the control filter's (C(z)) input. In Fig. 1, Ŝ(z), which is the modeling FIR filter of length K, generates x_s′(k) as per Eq. (19):

x_s′(k) = ŝ^T(k) x_sN(k).    (19)

Here, x_s′(k) generates a response error signal y_ω(k) after convolving with the error residue e(k) in the FxLMS filter, after which it is sent to the controller C(z). Coefficients of the modeling filter Ŝ(z) are updated as shown:

ŝ(k + 1) = ŝ(k) + μ_s(k) e(k)    (20)



where μ_s(k) is the step-size parameter for the modeling process. Finally, we update the coefficients of the controller C(z) in the manner represented by Eqs. (21, 22):

C(k + 1) = C(k) + μ_s(k) f(k) x′(k);    (21)

f(k) = e(k) − x_s′(k).    (22)

The reference signal passing through Ŝ(z) is filtered to derive the LMS algorithm input, which is given by Eqs. (23, 24):

x′(k) = ŝ^T(k) x_N(k)    (23)

where

x_N(k) = [x(k), x(k − 1), . . . , x(k − N + 1)]^T    (24)
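The cancellation objective behind Eqs. (17) and (18) — the controller output, once filtered through S(z), should replicate y_p(k) so that e(k) vanishes — can be sanity-checked by setting C(z) to its ideal value satisfying C(z)S(z) = P(z). The toy paths below are assumptions for illustration, not measured responses:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1000)          # source noise x(k)
p = np.array([0.5, 0.25])              # toy primary path P(z) (assumption)
s_gain = 1.0                           # toy secondary path S(z): a pure gain (assumption)

y_p = np.convolve(x, p)[:len(x)]       # noise arriving at the sensor through P(z)
c_ideal = p / s_gain                   # ideal controller: C(z)S(z) = P(z)
y_w = np.convolve(x, c_ideal)[:len(x)] # controller output
y_s = s_gain * y_w                     # after the secondary path, cf. Eq. (18)
e = y_p - y_s                          # residual, cf. Eq. (17): identically zero here
```

With a perfect controller the anti-noise matches the disturbance in amplitude and opposes it in phase, so the residual is zero; the adaptive updates of Eqs. (20)–(22) exist to reach this state when P(z) and S(z) are unknown.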

Here, harmonic sources are controlled by the system through adaptive filtering of a non-synthesized reference signal, which uses an adaptive step size containing parameters α(k) and β(k): α(k) controls the speed and shape of the adaptive step-size algorithm, and β(k) controls the range of values of the functional response of S(z). If the tap length, i.e., the adaptive filter length, is denoted by K, we refer to Eq. (24).
Then we update the modeling filter using the step-size parameter of the algorithm
(μs (k)) and the appropriate calculations of the parameter are performed using the
following steps:
• In the beginning, the power computation of the error signals f(k) and e(k) is executed:

P_e(k) = α P_e(k − 1) + (1 − α) e^2(k);    (25)

P_f(k) = α P_f(k − 1) + (1 − α) f^2(k),    (26)

0.9 < α < 1

• Following this, the estimated power ratio is acquired by Eq. (15), with the corresponding limits for k = 0 and k → ∞.
• And finally, the calculation of the step size is performed as given in Eq. (27):

μ_s(k) = p(k) μ_s(0) + (1 − p(k)) μ_s(k − 1)    (27)

where the values of μ_s(0) and α are determined experimentally, and μ_s(0) denotes the initial step size at the beginning of the path modeling process, which involves varying the step size at discrete time intervals given by k. These values are specifically chosen to ensure that the adaptation process does not slow down or become unstable, that the initial value of μ_s(k) corresponds to μ_s(0), and that the estimators given in Eqs. (25, 26) are initialized with identical values, which for convenience are taken as unity, i.e., P_e(0) = P_f(0) = 1. It is also recommended that the same value of α be used in both estimators. Using the proposed Optimized Adaptive Step-Size FxLMS (OASSFxLMS) algorithm increases the accuracy of modeling, which is instrumental in improving the system's performance.
a phase opposite to that of the unwanted noise that it cancels out while traveling
through the secondary propagation path source. The convergence rate analysis and
the magnitude of noise reduction will be critical in realizing our ambition of achieving
better performance than conventional algorithms. Using a feedforward system, we can avoid acoustic feedback, which is undesirable. The
traditional ANC algorithms involving FxLMS, which use a fixed tap length, usually
need a control filter with a predetermined long tap length for different environments.
As a result, the convergence rate slows down because the maximum value of step
size has a set limit.
The proposed algorithm is designed to self-adjust the required tap length to adapt
to the environment so that the noise cancellation system can achieve faster conver-
gence than conventional methods. For applications involving ANC, primary and
secondary propagation paths have asymmetric impulse responses, which help attain
the desired output response for the sake of canceling out undesired noise. Thus,
the new OASSFxLMS algorithm has been developed with a generalized dynamic
step-size function for a response model, which promotes the exponential decay of
noise residue by optimizing the filter’s coefficients. Specific issues concerning the
application of the proposed algorithm have also been addressed. Hence, we expect
the proposed OASSFxLMS algorithm to offer better performance and faster conver-
gence when compared to its conventional variable step size and other fixed step-size
counterparts.

5 Results and Discussions

Here, we have analyzed the performance of the Optimized Adaptive Step-Size


FxLMS (OASSFxLMS) algorithm by demonstrating its characteristic error iden-
tification and performance curves on two parameters, namely Mean Square Devia-
tion (MSD) and Noise Reduction Ratio (NRR). We will compare the same with the
following conventional and existing algorithms, namely the fixed step-size FxLMS
algorithm, FxRLS algorithm, and conventional adaptive step-size algorithm. The
analysis of parameters like the convergence rate and the magnitude of noise reduc-
tion proved critical and helped us realize our ambition to achieve better performance
than a conventional algorithm. With this technique’s help, we were successfully able
to avoid any undesirable acoustic feedback. The internally generated reference signal removes the causality constraint, and each harmonic can be managed independently.
The simulation results for the proposed algorithm are given below. The input
signal is composed of narrowband random additive white Gaussian noise. One can
observe in the figures below that the Active Noise Cancellation process manages to
generate anti-noise, which has an amplitude equal to and a phase opposite to that
of the unwanted noise, which it successfully cancels out while traveling through
the secondary propagation path source, which acts as the control signal as shown in
Fig. 3.
The noise residue left after carrying out the destructive interference using the variable-step OASSFxLMS algorithm is significantly less than that of the conventional ASSFxLMS applied to the same narrowband white noise. In this particular instance, the noise is passed through a finite impulse response (FIR) filter to acquire the best fit to model the internal combustion engine noise that we desire to cancel out.
The noise reduction increases slightly with the proposed OASSFxLMS algorithm’s
help by varying the step and tap length of the secondary path coefficients.
The convergence speed of OASSFxLMS is also observed to be better when compared with that of the conventional ASSFxLMS. The same can be observed in
Figs. 2 and 4. The System Identification Error parameter shown in Fig. 4 represents
the accuracy of secondary impulse response by comparing its characteristics with
that of the input noise signal. This particular figure has again demonstrated that our
proposed OASSFxLMS algorithm has outperformed the conventional ASSFxLMS
algorithm by significantly tuning down the system identification error. Figure 5 shows
the amplitude levels of filter taps of the secondary and secondary impulse path coef-
ficients, which have been compared to show variation in terms of amplitude. As
observed in Fig. 5, the OASSFxLMS algorithm initially approaches the modeled
secondary impulse response with a larger step size and smaller tap length, which we

Fig. 2 Comparison of noise reduction using OASSFxLMS and ASSFxLMS algorithms



have obtained by reducing the MSD of the coefficients of the filter taps. Following
this, we have used varied filter taps to acquire the MSD for all the algorithms while
adjusting the step size according to Eq. (27). In this process, we have developed a
recursive algorithm for optimizing the estimation of step size and tap length, which
helps keep the algorithm’s computational complexity in check. Since the MSD of
the two-sided exponential decay model can be proved as a convex function of the
tap lengths and step sizes, the new algorithm has the property of global optimality.
As mentioned earlier, our experimental analysis involves evaluating MSD and the
Noise Reduction Ratio (NRR) comparison for the chosen ANC algorithms.
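The paper does not spell out its exact formulas for these two figures of merit, so the sketch below uses commonly assumed definitions: MSD as the mean-squared deviation between estimated and true filter coefficients in dB, and NRR as the ratio of disturbance power to residual power in dB:

```python
import numpy as np

def msd_db(w_hat, w_true):
    """Mean square deviation between estimated and true coefficients, in dB
    (assumed definition; the paper does not state its exact formula)."""
    return 10 * np.log10(np.mean((w_hat - w_true) ** 2))

def nrr_db(d, e):
    """Noise reduction ratio: disturbance power over residual power, in dB
    (assumed definition)."""
    return 10 * np.log10(np.mean(d ** 2) / np.mean(e ** 2))

d = np.ones(100)        # disturbance before ANC (synthetic)
e = 0.1 * np.ones(100)  # residual after ANC (synthetic)
nrr = nrr_db(d, e)      # a 100x power reduction corresponds to 20 dB
```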
Simulation results are shown in Figs. 6 and 7 which conclusively prove that our
proposed OASSFxLMS algorithm has a substantially faster convergence and better
noise reduction performance than other conventional and existing algorithms, even
assuming the tap length and step-size is known a priori. The same has been illustrated
in the table representation given above, where the OASSFxLMS has a comparatively
higher NRR than its counterparts while also maintaining a lower MSD, which is
highly desirable for accurate estimation of the step size and an optimized convergence
rate. Thus, our proposed OASSFxLMS algorithm, which has been developed with

Fig. 3 Anti-noise signal generation by the controller

Fig. 4 System identification error measurement using secondary impulse response



Fig. 5 Comparison of filter taps of coefficients of Ŝ(z) and S(z)

Fig. 6 Comparison of the MSD performance for different ANC algorithms in case of narrowband
IC Engine noise

Fig. 7 Comparison of NRR performance for different ANC algorithms in case of narrowband IC
Engine noise

Table 1 Average of different values of performance parameters of Figs. 6 and 7


Performance parameter OASSFxLMS (dB) ASSFxLMS (dB) FxLMS (dB) FxRLS (dB)
Mean square deviation −33.125 −30.5 −27.051 −28.375
Noise reduction ratio 16.416 14.33 11.75 15.25

a generalized form of variable step sizes for a secondary impulse response model,
has achieved the minimum MSD in the computer simulations performed for optimal
coefficients of filter taps and offers the highest noise reduction ratio when compared
to the conventional ASSFxLMS, FxLMS, or FxRLS algorithms (Table 1).

6 Conclusion

In this paper, a modified variable step-size and tap-length FxLMS algorithm has been proposed. The proposed algorithm has an advantage over others because of its simplicity and robust performance, making it a decent contender for practical applications. The computational complexity of a given algorithm is usually determined
by considering the number of required multiplication operations per iteration for
the given algorithm (Eq. (27)). After performing the relevant computer simulations, we
obtained a computational complexity of O(N log N), which is feasible to implement
within the pre-existing industrial norms. All the existing methodologies generally
involve three adaptive filters and have the same level of computational complexity.
Although we have introduced a linear computational complexity in the proposed
method while updating the step size instead of using the one used in Akhtar’s method
with constant computational complexity, the higher convergence rate and reduction
levels have more than compensated for the same. The following has been demon-
strated with the help of computer simulations, through which we can observe that the
proposed method offers much higher convergence, good stability, and robustness for
ANC of narrowband internal combustion engine noise. Hence, we have successfully
managed to develop a high-performance feedforward ANC design that promises low
power consumption due to low computational complexity and would be viable in
reducing the internal combustion engine noise in automobiles. Experimental results
show that the proposed design can attenuate most of the narrowband combustion
noise between 100 and 1000 Hz.
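For orientation, the core of the basic feedforward FxLMS loop discussed above (without the proposed variable step-size and tap-length extensions) can be sketched as follows. This is an illustrative NumPy sketch; the function and signal names are ours, not taken from the paper:

```python
import numpy as np

def fxlms(x, d, s_hat, taps=16, mu=0.05):
    """Basic feedforward FxLMS loop (fixed step size and tap length).
    x: reference (noise) signal, d: disturbance at the error microphone,
    s_hat: FIR estimate of the secondary path. Returns the residual error."""
    w = np.zeros(taps)                 # adaptive filter weights
    xbuf = np.zeros(taps)              # recent reference samples
    fbuf = np.zeros(taps)              # recent filtered-reference samples
    ybuf = np.zeros(len(s_hat))        # anti-noise history for the secondary path
    e = np.zeros(len(x))
    for n in range(len(x)):
        xbuf = np.roll(xbuf, 1); xbuf[0] = x[n]
        y = w @ xbuf                   # anti-noise sample
        ybuf = np.roll(ybuf, 1); ybuf[0] = y
        e[n] = d[n] - s_hat @ ybuf     # residual after the secondary path
        fbuf = np.roll(fbuf, 1)
        fbuf[0] = s_hat @ xbuf[:len(s_hat)]   # reference filtered through S-hat
        w = w + mu * e[n] * fbuf       # FxLMS weight update
    return e
```

With an identity secondary path (s_hat = [1.0]) this reduces to plain LMS; the proposed algorithm additionally adapts the step size and tap length at every iteration.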

References

1. Huang, B., Xiao, Y., Sun, J., Wei, G.: A variable step-size FXLMS algorithm for narrowband
active noise control. IEEE Trans. (2013)
200 A. Bisht and H. Y. Patil

2. Ardekani, I.T., Abdulla, W.H.: Effects of imperfect secondary path modelling on adaptive active noise control systems. IEEE Trans. Control Syst. Technol. 20(5), 1252–1262 (2012)
3. Akhtar, M.T., Mitsuhashi, W.: Improving performance of hybrid active noise control systems for uncorrelated narrowband disturbances. IEEE Trans. Audio Speech Lang. Process. 19(7), 2058–2066 (2011)
4. Pathak, B., Hirave, P.P.: FXLMS algorithm for feed forward active noise cancellation. In: Universal Association of Computer and Electronics Engineers, pp. 18–22. IEEE (2011)
5. Meller, M., Niedzwiecki, M.: Multi-channel self-optimizing narrowband interference canceller. Signal Process. 98, 396–409 (2013)
6. Ang, W.P., Farhang-Boroujeny, B.: A new class of gradient adaptive step-size LMS algorithms. IEEE Trans. Signal Process. 49(4), 805–810 (2001)
7. Manzano, E.A., Tafur, J.: Optimal step size for a delayed FxLMS algorithm applied in a
prototype of active noise control system. In: 2018 IEEE 14th International Conference on
Control and Automation (ICCA)
8. Chang, D.C., Chu, F.T.: Feedforward active noise control with a new variable tap-length and
step-size filtered-X LMS algorithm. IEEE/ACM Trans. Audio, Speech, Lang. Process. 22(2)
(2014)
9. Kuo, S.M., Morgan, D.R.: Active noise control: a tutorial review. Proc. IEEE 8(6), 943–973
(1999)
10. Eriksson, L.J., Allie, M.C.: Use of random noise for online transducer modeling in an adaptive
active attenuation system. J. Acoust. Soc. Am. 85(2), 797–802 (1989)
11. Kuo, S.M., Vijayan, D.: A secondary path modeling technique for active noise control systems.
IEEE Trans. Speech Audio Process. 5(4), 374–377 (1997)
12. Akhtar, M.T., Abe, M., Kawamata, M.: A method for online secondary path modeling in active
noise control systems. In: Proceedings of the IEEE 2005 International Symposium Circuits
Systems (ISCAS2005), 23–26, pp. I-264–I-267 (2005)
13. Akhtar, M.T., Abe, M., Kawamata, M.: Modified filtered-x LMS algorithm based active noise
control system with improved online secondary path modeling. In: Proceedings of the IEEE
2004 International Midwest Symposium Circuits Systems, 25–28, pp. I-13–I-16 (2004)
14. Kuo, S.M., Vijayan, D.: Optimized secondary path modeling technique for active noise control
systems. In Proceedings of the IEEE Asia-Pacific Conference on Circuits and Systems, pp. 370–
375. Taipei, Taiwan (1994)
15. Zhang, M., Lan, H., Ser, W.: Cross-updated active noise control system with online secondary
path modelling. IEEE Trans. Speech, Audio Proc., 9(5) (2001)
FFT-Based Robust Video Steganography
over Non-dynamic Region in Compressed
Domain

Rachna Patel , Kalpesh Lad , and Mukesh Patel

Abstract The proposed research work presents a novel data hiding method for video
steganography in the compressed domain. In this method, randomly numbered
secret frames are selected from the RGB cover video sequence. The method increases
the complexity level of video steganography by considering a specific host to
conceal confidential data. It extracts a specific non-dynamic region from each secret
frame and transforms the pixel values to the frequency domain using the Fast Fourier
Transform (FFT). Using a random Least Significant Bit (LSB) of the real part of the
FFT as a carrier object leads to good video quality and secret data-carrying capacity.
Furthermore, the secure compressed stego video is reconstructed using the H.264
video compression technique. The proposed method is evaluated on some well-
known video datasets by considering RGB images of different resolutions as the
secret message. Performance evaluation parameters assess the proposed method's
efficiency, imperceptibility, robustness, and embedding capacity, and the improved
results are compared with reported methodologies. The results show a significant
improvement in imperceptibility, as the Peak Signal-to-Noise Ratio (PSNR) value
reaches infinity (Inf) in some cases. At the same time, the similarity between the
embedded and extracted message is close to 1, the Bit Error Rate (BER) is negligible
at less than 0.1%, and the embedding capacity greater than 0.5% in all cases
indicates an excellent capability of carrying a large amount of confidential data.

Keywords Compressed video steganography · H.264 · FFT · Imperceptibility · Robustness · Embedding capacity

R. Patel (B)
Computer Engineering Department, CGPIT, UTU, Bardoli 394350, Gujarat, India
e-mail: rachu.cuty@gmail.com
K. Lad
SRIMCA, Uka Tarsadia University, Bardoli 394350, Gujarat, India
M. Patel
Department of Mathematics, Uka Tarsadia University, Bardoli 394350, Gujarat, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 201
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_19
202 R. Patel et al.

1 Introduction

Steganography is an emerging trend in data security in which secret information
is protected from unauthorized or illegal access while being shared over an unsecured
network channel. Steganography involves two objects: one is a cover, and the other
is a message that is to be hidden inside the cover object. Steganography can
be applied to text, picture, audio, or video objects for different purposes. In the
recent era, steganography has mainly used video as a cover to hide large amounts
of secret information, known as video steganography. It can be classified into two
basic domains based on video compression, namely the uncompressed and compressed
video domains [1, 2], and can be further classified into spatial and frequency
domains. Much work has been carried out in uncompressed video steganography
with high payload capacity but less robustness against compression, added noise, and
decryption. Thus, video steganography in the compressed domain is widely in
researchers' focus [2].
Video steganography has wide real-world applications, viz., military, medical or intel-
ligence communication, and personal information systems. In the military, digital
information such as video related to location tracking of the enemy, military operations,
rescue operations, surgical strikes, electronic signature video of war, etc., is highly
evidential and must be communicated securely with the control office. In the
medical field, the biometric, bioinformatic, or psychological and behavioral informa-
tion of a patient must be protected against leaks into the public domain while being
communicated between medical centers. Similarly, it is essential to maintain the secrecy
of an individual's personal video against disclosure in the public domain [1, 3].
In this research work, steganography is implemented on video in the
compressed domain, known as compressed video steganography. A compressed video
is used as the cover object, while the secret message is an RGB image. The forthcoming
section reviews related work in the same domain using different transform coef-
ficients; the results obtained by other associated methods are also briefly explained
with their respective quality assessment parameters. After that, the
proposed methodology is elaborated with embedding and extracting algorithms,
followed by experimental results and discussion using well-known video datasets.

2 Literature Review

Video steganography in the transform domain, using transform coefficients, is a well-
known data hiding approach in the compressed video bit-stream of H.264/Advanced Video
Coding (AVC). Different transform coefficients, viz., the Discrete Cosine Transform
(DCT), Discrete Sine Transform (DST), Discrete Wavelet Transform (DWT), Quan-
tized DCT (QDCT), and Quantized DST (QDST), convert the input video frame from
the spatial to the frequency domain. The components of these coefficients are used as
carrier objects to conceal secret information. The literature on video steganog-
raphy using these components, carried out by different researchers, is reviewed as
follows.
Li et al. [4] have used recoverable privacy protection to distribute video content.
DWT sub-bands of the ROI are used to create the secret message and the carrier.
The DWT coefficients having middle and high frequency are taken as carrier data,
and the DWT sub-band having a low frequency is taken as secret information. The
performance of the method is evaluated with the PSNR value greater than 42.1 dB.
Liu et al. [5] have performed video steganography in the intra-frame domain to
hide secret information using H.264 coding. In this method, the current block's data
is predicted from the encoded adjacent blocks using the boundary pixels of the left
and upper blocks. The distortion introduced by embedding propagates to the current
block, and the distortion drift increases toward the lower-right intra-frame blocks.
They used the QDCT coefficients of luminance for data embedding. This method has
low embedding capacity because only luminance intra-frame blocks under specific
conditions are used, and the average PSNR value is 40.73 dB. The authors [6] further
extended the work by applying a BCH code to the secret information before
embedding it to enhance performance; this approach resulted in an average PSNR
value of 46.25 dB.
Mstafa et al. [7] have adopted a DCT-based robust video steganography using BCH
Error-Correcting Codes by considering cover as a YUV video and secret message as a
large text file. In this work, the secret message is encrypted by BCH before embedding
it into the YUV-DCT components of the cover video frame. The experimental results
show that the PSNR is obtained between 38.95 and 42.73 dB with similarity (Sim)
1 and hiding ratio of 27.53%.
Mstafa et al. [8] have reported the method for robust and secure video steganog-
raphy in the DCT domain based on Multiple Object Tracking and Error Correcting
Code (ECC) using the cover as an RGB video and text as a secret message. Hamming
and BCH codes pre-process the secret message before embedding, and after that, it is
hidden into the DCT component of all motion regions of a cover video frame. Here,
PSNR is obtained between 35.95 and 48.67 dB, and the hiding ratio is obtained as
3.46%. The BER varies from 0 to 11.7%, and the similarity is nearer to 1.

Liu and Xu [9] have introduced a robust steganography method for HEVC based
on secret sharing in which the secret message was encoded by threshold secret
sharing and embedded into a 4 × 4 luminance DST block. The average PSNR value
was obtained between 34.4 and 46.38 dB with an average Bit Error Rate (BER) of
20.41%–22.59%. This method achieves high performance in the context of visual
quality and robustness performance based on HEVC.
Yang and Li [10] have proposed a recent video steganography method using
High-Efficiency Video Coding (HEVC), based on motion vector space encoding for
the HEVC process. In this method, motion vector components are selected from N/2
prediction units (PUs) of smaller sizes in a coding tree unit (CTU) as the secret
information carrier object. The embedding capacity is higher than LSB under similar
motion vectors and lower than LSB under identical carriers. The empirical results
show the PSNR varying from 30 to 41.50 dB.
Despite these video steganography methods, it remains necessary to improve the
PSNR value and achieve high robustness by selecting well-secured carrier objects. In
this consideration, the characteristics of the transform coefficient (FFT) components
in the frequency domain play a vital role. Furthermore, selecting a secret frame and a
specific region of that frame as the carrier object to conceal confidential data
enhances the robustness of video steganography.

3 Proposed Methodology

The proposed methodology, named "FFT-Secret Bit Positions of Non-Dynamic
Region for Message" (FFT-SBPNRM), for video steganography in the
compressed domain is processed in two stages: embedding and extracting
of the secret message.

3.1 Embedding Method

The system architecture of the proposed embedding method is shown in Fig. 1.
Initially, a compressed video is separated into a sequence of video frames, and
secret frames are selected using the stego key. The stego key also extracts the non-
dynamic region from these secret frames, where the secret message is concealed.
The extracted non-dynamic region is converted from the spatial to the frequency domain
using the transform coefficient FFT [11]. The FFT components are complex
numbers, a combination of real and imaginary parts. The real part of the FFT components
is separated, and its random Least Significant Bits (LSBs) are used as the carrier object.
The secret message (an RGB image) is divided into R, G, and B components, converted
into binary form, and finally concealed into random LSBs of the real part of the FFT
components using an embed key.
By reversing the above process, the Inverse Fast Fourier Transform (IFFT) produces the
stego non-dynamic region, which is replaced at its respective position in the secret
video frames, now known as stego frames. The H.264 video encoder then encodes the
compressed stego video, in which the stego frames replace the secret frames in the
sequence of video frames. The embedding process has been comprehensively
described in Algorithm 1.

Fig. 1 Block diagram of proposed FFT-SBPNRM embedding stage



Algorithm 1. FFT-SBPNRM Embedding Algorithm
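The embedding core — FFT of the non-dynamic region, LSB substitution in the real part, then IFFT — can be sketched as follows. This is a minimal NumPy illustration for one colour channel; the function and variable names are ours, not the paper's, and it additionally mirrors each change to the Hermitian-symmetric FFT position (assuming the key avoids conjugate-pair collisions) so that the inverse transform stays real:

```python
import numpy as np

def embed_bits(region, bits, positions):
    """Hide secret bits in the LSBs of the real part of the region's 2-D FFT.
    region: 2-D array (one colour channel of the non-dynamic region)
    bits: iterable of 0/1 secret bits
    positions: (i, j) FFT indices chosen by the embed key, one per bit."""
    F = np.fft.fft2(region.astype(np.float64))
    m, n = F.shape
    for bit, (i, j) in zip(bits, positions):
        v = int(np.rint(F.real[i, j]))
        v = (v & ~1) | bit                  # force the chosen LSB
        F.real[i, j] = v
        F.real[-i % m, -j % n] = v          # keep Hermitian symmetry -> real IFFT
    return np.fft.ifft2(F).real             # stego non-dynamic region
```

Reading the bits back is the mirror operation: take the FFT of the stego region and test the LSB of the rounded real part at the same key positions.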



3.2 Extracting Method

The extracting process of the proposed FFT-SBPNRM technique recovers
the compressed secret message from the compressed stego video. The processing
steps of extraction are shown in Fig. 2.
The H.264-encoded compressed stego video is separated into a sequence of frames,
from which the secret message is extracted. The same stego key used in the embedding
stage is applied to select the stego frames from the stego video frame sequence.
It also extracts the non-dynamic region from the secret stego frames in which the
secret message has been concealed. Moreover, the FFT is applied to this extracted
stego non-dynamic region to transform it into the frequency domain. The R, G, and

Fig. 2 Block diagram of proposed FFT-SBPNRM extracting stage

B component of the secret message is extracted from the random LSBs of the real part of
the obtained FFT components. Finally, the secret message is reconstructed by combining
the R, G, and B components. The extracting process is briefly described in Algorithm
2.

Algorithm 2. FFT-SBPNRM Extracting Algorithm
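The extraction core can likewise be sketched in NumPy. This is a minimal, single-channel illustration with hypothetical names, assuming the embedded values were written as integers into the real part of the FFT:

```python
import numpy as np

def extract_bits(stego_region, positions):
    """Recover hidden bits from the real part of the stego region's 2-D FFT.
    positions: the same (i, j) indices the embed key selected."""
    F = np.fft.fft2(np.asarray(stego_region, dtype=np.float64))
    # the embedded values are integers, so round before testing the LSB
    return [int(np.rint(F.real[i, j])) & 1 for (i, j) in positions]
```

The rounding step matters: after the IFFT/FFT round trip the stored integer comes back with a tiny floating-point error, and truncation instead of rounding would occasionally flip a recovered bit.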



4 Experimental Results and Discussion

The proposed method has been evaluated on different cover videos of different sizes
(number of frames), resolutions (dimensions of frames), and frame rates (frames per
second). The quality of video steganography is measured using the following quality
assessment parameters.

4.1 Imperceptibility

Imperceptibility measures the variation between the original and stego data and is
quantified by two parameters: MSE and PSNR. The lower the MSE (and, correspondingly,
the higher the PSNR), the higher the level of imperceptibility. The MSE and PSNR are
calculated using Eqs. (1) and (2), respectively.
MSE = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{k=1}^{h}[F(i,j,k) - S(i,j,k)]^2}{m \times n \times h}   (1)

PSNR = 10 \times \log_{10}\left(\frac{MAX_F^2}{MSE}\right) \text{ (dB)}   (2)

where F is the original cover frame and S is the stego frame, m × n is the frame's
dimension, and h denotes the RGB components of the frame (k = 1, 2, and 3). The
highest pixel value of the frame F is denoted by MAX_F.
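Equations (1) and (2) translate directly into code; a NumPy sketch (the function name is ours, and identical frames are reported as PSNR = Inf, matching the "Inf" entries of Table 1):

```python
import numpy as np

def mse_psnr(original, stego):
    """MSE and PSNR between an original frame F and a stego frame S (Eqs. 1-2)."""
    f = np.asarray(original, dtype=np.float64)
    s = np.asarray(stego, dtype=np.float64)
    mse = np.mean((f - s) ** 2)              # averages over all m x n x h values
    if mse == 0:
        return mse, np.inf                   # identical frames -> 'Inf'
    return mse, 10 * np.log10(f.max() ** 2 / mse)
```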

4.2 Robustness

The robustness of video steganography is assessed by two parameters: (i) Sim, the
similarity between the embedded and extracted data, and (ii) BER, the error between
the bit positions of the original and stego objects. Sim and BER are calculated
using Eqs. (3) and (4), respectively.
Sim = \frac{\sum_{i=1}^{x}\sum_{j=1}^{y}[N(i,j) \times N'(i,j)]}{\sqrt{\sum_{i=1}^{x}\sum_{j=1}^{y}N(i,j)^2} \times \sqrt{\sum_{i=1}^{x}\sum_{j=1}^{y}N'(i,j)^2}}   (3)

BER = \frac{\sum_{i=1}^{x}\sum_{j=1}^{y}[N(i,j) \oplus N'(i,j)]}{x \times y} \times 100\%   (4)


where N and N′ are the concealed and extracted hidden data, and x and y are
the dimensions of the hidden data.
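Equations (3) and (4) can be computed directly; a NumPy sketch (function names are ours):

```python
import numpy as np

def similarity(n, n_ext):
    """Normalised correlation between embedded data N and extracted N' (Eq. 3)."""
    n = np.asarray(n, dtype=np.float64)
    n_ext = np.asarray(n_ext, dtype=np.float64)
    return (n * n_ext).sum() / np.sqrt((n ** 2).sum() * (n_ext ** 2).sum())

def ber(bits, bits_ext):
    """Percentage of mismatched bit positions between N and N' (Eq. 4)."""
    bits = np.asarray(bits, dtype=np.uint8)
    bits_ext = np.asarray(bits_ext, dtype=np.uint8)
    return np.mean(bits ^ bits_ext) * 100.0
```

For identical embedded and extracted data, Sim is exactly 1 and BER is 0%, which is the ideal case reported in Table 1.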

4.3 Embedding Capacity

Video steganography's capacity to hide the maximum amount of data in a cover object
is known as the embedding payload or hiding capacity. It is measured by the Hiding
Ratio (HR), which is calculated by Eq. (5) [1, 2]:

HR = \frac{\text{Size of the embedded message}}{\text{Video size}} \times 100\%   (5)

Empirical results show that if HR ≥ 0.5%, the embedding capacity is
considered significantly high [1, 2].
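Equation (5) and the 0.5% threshold amount to a one-line check; a small sketch (sizes in bytes, or any consistent unit; names are ours):

```python
def hiding_ratio(message_size, video_size):
    """Hiding Ratio (Eq. 5): embedded message size as a percentage of video size."""
    return message_size / video_size * 100.0

def capacity_is_high(message_size, video_size):
    """Empirical criterion from the text: HR >= 0.5 % counts as significantly high."""
    return hiding_ratio(message_size, video_size) >= 0.5
```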
The proposed FFT-SBPNRM method is implemented in MATLAB on well-
known databases (Elecard Videos, Remega Videos) [15, 16]. An RGB image is
used as the secret information, and a video as the cover carrier object.
The experimental results are also compared, based on the quality assessment parameters,
with the reported methods described in the literature review section. The results
for imperceptibility, robustness, and the proposed method's hiding capacity are
shown in Table 1.
Table 1 illustrates the efficiency of video steganography applied to RGB
compressed videos with different sizes, frame rates, and resolutions. In this experi-
ment, RGB images of varying sizes are used as the secret message. Compared to the
reported methods, the imperceptibility parameter PSNR is significantly improved: the
best reported method has a maximum Average PSNR (APSNR) value of 46.38 dB,
while the proposed method's APSNR varies from 81.005 dB to Inf
(infinity). In the proposed method, the Average MSE (AMSE) is almost zero,
indicating a significant imperceptibility improvement.
Table 1 also illustrates the robustness of the proposed video steganography method,
in which the Similarity (Sim) is almost 1 and the BER varies from 0.0006 to 0.0766%,
indicating that the quality of video steganography is satisfactory. Also, the hiding ratio
varies between 0.58353% and 2.69598%, suggesting that using non-dynamic regions
as carrier objects improves the hiding capacity of video steganog-
raphy; as discussed above, an HR of more than 0.5% indicates remarkable
payload capacity.
Different numbers of secret frames, viz., 3, 6, and 9, are selected by the stego
key to conceal the secret information in the various experimental cases. Increasing
the number of frames provides more space to carry more secret message bits,
leading to improved hiding capacity. However, there is a trade-off between the number
of secret frames and the quality of steganography. Increasing the number of cover
frames while keeping the secret message's size constant increases the PSNR value;
thus, the imperceptibility is improved while the hiding capacity is reduced. On the
Table 1 Imperceptibility, robustness, and embedding capacity using proposed FFT-SBPNRM method

| Sr. no. | Cover video name | Cover video size (H × W) | Frame rate | Total no. of frames | No. of frames selected | Secret message (SM) size (H × W) | Proposed AMSE | Proposed APSNR (dB) | APSNR (dB), method [9] | APSNR (dB), method [10] | Sim | BER (%) | HR (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Basketball drive | 1080 × 1920 | 50 | 108 | 3 | 242 × 150 | 0.0000000 | Inf* | – | 42.30 | 0.9772 | 0.0034 | 0.58353 |
| 2 | BQ terrace | 1080 × 1920 | 60 | 143 | 3 | 339 × 210 | 0.0004221 | 81.877 | – | 42.75 | 1.0000 | 0.0006 | 1.14439 |
| 3 | Cactus | 1080 × 1920 | 50 | 132 | 6 | 260 × 420 | 0.0000595 | 90.387 | – | 40.00 | 0.8627 | 0.0567 | 0.87770 |
| 4 | Kristen and Sara | 720 × 1280 | 60 | 165 | 6 | 203 × 328 | 0.0000007 | 109.773 | 46.38 | – | 0.9418 | 0.0414 | 1.20414 |
| 5 | Slide editing | 720 × 1280 | 30 | 167 | 9 | 236 × 381 | 0.0005159 | 81.005 | – | – | 0.8207 | 0.0766 | 1.08406 |
| 6 | Traffic | 800 × 1280 | 30 | 109 | 3 | 362 × 224 | 0.0000017 | 105.771 | – | 40.16 | 1.0000 | 0.0007 | 2.63958 |
| 7 | Party scene | 480 × 832 | 50 | 228 | 6 | 323 × 200 | 0.0000008 | 109.338 | 40.65 | 36.05 | 0.8632 | 0.0588 | 2.69598 |
| 8 | Basketball drill | 480 × 832 | 50 | 197 | 3 | 129 × 208 | 0.0000506 | 91.089 | 41.61 | – | 0.9799 | 0.0028 | 2.23958 |
| 9 | People on street | 800 × 1280 | 30 | 150 | 3 | 139 × 224 | 0.0000000 | Inf* | – | 41.96 | 0.9668 | 0.0031 | 1.01354 |
| 10 | China speed | 768 × 1024 | 30 | 233 | 3 | 125 × 202 | 0.0000000 | Inf* | – | 40.44 | 0.9938 | 0.0014 | 1.07023 |

*Inf: there is no significant difference between the original video and the stego video
contrary, increasing both the number of cover frames and the secret message's size
improves video steganography's hiding capacity.

5 Conclusion and Future Scope

The proposed FFT-SBPNRM method for compressed video steganography works
on the components of the transform coefficient FFT. The selection of several secret cover
video frames, the non-dynamic region, random LSBs of the FFT components, and the secret
bit positions of the LSBs are the key points of robust video steganography that improve
the security level. Moreover, the quality assessment parameters, especially the APSNR,
show significantly enhanced values varying from 81.005 dB to Inf (infinity). An Inf
value means there is no significant difference between the original video and the stego
video, which reflects the highest imperceptibility. Also, a similarity equal to 1 and a
near-zero BER reflect the increased robustness of the proposed video steganography
method. Furthermore, a hiding ratio greater than 0.5% in each case is a sign of
adequate embedding capacity.
In the future, video steganography can be performed on motion regions across
the video frames detected by well-defined motion extraction methods. Further-
more, different transform coefficients can be used as the carrier object, and H.265/High
Efficiency Video Coding (HEVC) can enhance video compression without losing
much quality of the video.

References

1. Mstafa, R.J., Elleithy, K.M.: Compressed and raw video steganography techniques: a compre-
hensive survey and analysis. Multimed. Tools Appl. 76(20), 21749–21786 (2017). https://doi.
org/10.1007/s11042-016-4055-1 (Springer)
2. Mstafa, R.J., Elleithy, K.M., Abdelfattah, E.: Video steganography techniques: taxonomy, chal-
lenges, and future directions. In: Applications and Technology Conference (LISAT), 2017.
IEEE Long Island, pp. 1–6, IEEE (2017). https://doi.org/10.1109/LISAT.2017.8001965
3. Balu, S., Babu, C.N.K., Amudha, K.: Secure and efficient data transmission by video steganog-
raphy in medical imaging system. Cluster Comput. 4057–4063 (2018). https://doi.org/10.1007/
s10586-018-2639-4 (Springer)
4. Li, G., Ito, Y., Yu, X., Nitta, N., Babaguchi, N.: Recoverable privacy protection for video
content distribution. EURASIP J. Inf. Secur. 1–11 (2010). https://doi.org/10.1155/2009/293031
(Springer)
5. Liu, Y.X., Li, Z., Ma, X., Liu, J.: A novel data hiding scheme for H.264/AVC video streams
without intra-frame distortion drift. In: IEEE 14th International Conference on Communication
Technology, pp. 824–828. IEEE (2012). https://doi.org/10.1109/ICCT.2012.6511318
6. Liu, Y., Li, Z., Maa, X., Liu, J.: A robust data hiding algorithm for H.264/AVC video streams.
J. Syst. Soft. 86(8), 2174–2183 (2013). https://doi.org/10.1016/j.jss.2013.03.101
7. Mstafa, R.J., Elleithy, K.M.: A DCT-based robust video steganographic method using bch
error correcting codes. In: 2016 IEEE Long Island Systems, Applications and Technology
Conference (LISAT). IEEE (2016). https://doi.org/10.1109/LISAT.2016.7494111

8. Mstafa, R.J., Elleithy, K.M., Abdelfattah, E.: A robust and secure video steganography method
in DWT-DCT domains based on multiple object tracking and ECC. IEEE Access 5, 5354–5365
(2017). https://doi.org/10.1109/ACCESS.2017.2691581
9. Liu, S., Xu, D.: A robust steganography method for HEVC based on secret sharing. Cognitive
Syst. Res. 59, 207–220 (2020). https://doi.org/10.1016/j.cogsys.2019.09.008 (Elsevier)
10. Yang, J., Li, S.: An efficient information hiding method based on motion vector space encoding
for HEVC. Multimed. Tools Appl. 77(10), 11979–12001 (2017). https://doi.org/10.1007/s11
042-017-4844-1 (Springer)
11. Khan, A., Sarfaraz, A.: FFT-ETM based distortion less and high payload image steganog-
raphy. Multimed. Tools Appl. 25999–26022 (2019). https://doi.org/10.1007/s11042-019-
7664-7 (Springer)
12. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 3rd ed. ISBN: 0-13-168728-x 978-0-
13-168728-8. Pearson Education (2008)
13. Tutatchikov, V.S.: Two-dimensional fast Fourier transform Batterfly in analog of Cooley-Tukey
algorithm. In: 11th International Forum on Strategic Technology (IFOST), pp. 495–498, IEEE
(2016). https://doi.org/10.1109/IFOST.2016.7884163
14. Hussein, A.A., Al-Thahab, O.Q.J.: Design and simulation a video steganography system by
using FFTturbo code methods for copyrights application. Eastern-Eur. J. Enterprise Technol.
2(9), 43–55 (2020). https://doi.org/10.15587/1729-4061.2020.201010
15. https://www.elecard.com/videos. Video compression Guru, Elecard Video, June 2019
16. https://github.com/remega/video_database/tree/master/videos. Remega video database,
November 8, 2017
An Improved Approach for Devanagari
Handwritten Characters Recognition
System

Rajdeep Singh, Arvind Kumar Shukla, Rahul Kumar Mishra, and S. S. Bedi

Abstract Recognizing handwritten characters and scanned data/images has remained a
very complicated task in recent years. The different sizes and writing styles of the
characters play a critical role in clearly identifying handwritten characters. This
script's massive prevalence must be addressed using advanced technologies
to connect it to the real world at greater depth. Machine learning is one of the
most popular technologies and has attracted much of the recent research on handwritten
character recognition using A.I. techniques. Various new technologies have been
developed to execute fast neural networks with little exhaustive knowledge require-
ment. Here, we use the Keras and Python libraries to build our model.
The main aim of a CNN is to learn from the training data and fit that training data
into models that can help human beings. In this paper, an attempt has been made
to construct and evaluate individual learning algorithms (such as k-means and
SVM) alongside a Keras model to recognize isolated Devanagari handwritten character
datasets and to assess the impact of parameter variations in the learning phase. The
proposed methodology gives a better result: its accuracy is better than that of the
individual algorithms.

Keywords Handwritten character recognition · OCR · Keras · Python 3.0 · K-means · SVM · CNN

1 Introduction

Handwritten character recognition aims to replicate human understanding
capabilities so that the computer can interpret and recognize handwriting in
formats understandable to humans. Image processing and pattern recognition in
handwritten character recognition systems are among the most critical and demanding
study areas.

R. Singh · A. K. Shukla (B) · R. K. Mishra


School of Computer Science & Applications, IFTM University, Moradabad, India
S. S. Bedi
Department of CS&IT, MJP Rohilkhand University, Bareilly, UP, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 217
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_20
218 R. Singh et al.

It provides exceptionally advanced automation procedures and develops the inter-
face between human and machine in numerous applications. Devanagari is the
principal and most widely used script for Indian languages such as Hindi. Indo-Aryan
and Sanskritic languages are written in such scripts as natural languages of India,
and Hindi is the third most widely spoken language in the world [1–4].
The handwritten character recognition system is a crucial field used in
several applications for various purposes. Optical Character Recognition (OCR) and
Handwritten Character Recognition (HCR) are the two main approaches to handwritten
character recognition. The OCR system suits many applications of everyday
life. A handwritten character recognition system would build a paperless setting
by digitizing and presenting paper credentials [5, 6]. OCR is a converter that
interprets handwritten content (data/images) into machine-readable text. The handwritten
character recognition process is differentiated in two ways: offline and online; in the
offline case, a scanner captures the written text in electronic format.
The handwritten character recognition process is useful for the automatic trans-
lation of handwritten text into computerized text [7]. This paper presents an
implementation of a "CNN Approach for Devanagari Characters Recognition Using
Python" within the field of handwritten character recognition. The Devanagari
script consists of 13 swars, 36 vyanjans, and 10 ankas (Figs. 1, 2, 3 and 4).
There are several particular signs/symbols and punctuation symbols in the Devanagari
script, including the swar–vyanjan and vyanjan–vyanjan combinations [8–13].
Swar modifiers play an essential role in the Hindi script. Each vowel except the first (अ)
corresponds to a modifier symbol. Figure 4 depicts the swars with their corresponding
modifiers in the Hindi script.

Fig. 1 Swars-Devanagari

Fig. 2 Vyanjan-Devanagari

Fig. 3 "Anka in Devanagari"

Fig. 4 "Swars with modifiers in Devanagari"

Fig. 5 Semi-form of Vyanjans

Figure 5 shows the semi-forms of the vyanjans of Fig. 2. The vacant
positions in this figure reveal that the corresponding vyanjan does not possess a half
shape/form [10].
The handwritten character recognition process is discussed in the following
sections.

2 Problems in Character Recognition

Humans have been scripting various notes and documents, spiritual notes on papers,
and posted handwritten cards to their relatives for information storage and communi-
cation from ancient times to date. Historical scripts/books and other past documents
are also written in different styles and manners for a long time. Although we are
using computers to write the many types of documents for different purpose of life
now. The use of paper is the finest choice to write letters. To search the data from
past handwritten scripts is challenging to understand quickly, but a computerized
220 R. Singh et al.

form of the data can be easily accessed by the user. Therefore, we have to find the
best way to convert handwritten documents into a form the computer can recognize easily.
This improves human efficiency in understanding handwritten data with the help of
computers. Handwriting in different methods and forms makes a document more
difficult to understand because every person has a unique way of writing. The
Devanagari script is, therefore, more difficult than English due to its writing
styles [14–17].
This research is focused on solving some of the issues related to Machine Learning
and Image Processing so that it is useful to recognize handwritten characters and
numeric digits for various areas of life. Specifically, the research objectives are as
follows.
• To design a human-friendly handwritten character recognition system that helps
classify the information using machine learning techniques.
• To recommend an appropriate solution for the Devanagari character recognition
system.
• To publish the research work for information dissemination to computer
researchers and different kinds of organizations.

3 The Proposed Model

The handwritten character recognition method is divided into the following steps,
as shown in Fig. 6.

3.1 Image Acquisition

Electronic images and pictures are acquired as input with the help of electronic
devices such as scanners and cameras. The images were obtained by people writing
the words in their own hand, by creating images of the words in many styles, and
from scanned data [18, 19].

3.2 Pre-processing

Pre-processing is an essential part of handwritten character detection for a reliable
recognition rate of the characters of interest. Its purpose is to normalize strokes
and remove variations that would otherwise complicate recognition and scale back
the recognition rate. These variations include the varying size of content, points
lost during pen-stroke collection, noise in the text, a left or right slant in
writing, and uneven separation of points from adjacent locations.

Fig. 6 Handwritten character recognition system

It has simple phases: size normalization and centering, interpolation of lost points,
smoothing, slant correction, and resampling of points [20].
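Two of these phases (binarization and size normalization/centering) can be sketched in a few lines of NumPy. The 32 × 32 target size, the threshold value, and nearest-neighbour resampling are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Threshold a grayscale image: dark ink becomes 1, background 0."""
    return (np.asarray(gray) < threshold).astype(np.uint8)

def normalize_size(binary, size=32):
    """Crop to the ink bounding box (centering), then resample the crop
    to size x size using nearest-neighbour indexing (size normalization)."""
    ys, xs = np.nonzero(binary)
    crop = binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    rows = (np.arange(size) * crop.shape[0] / size).astype(int)
    cols = (np.arange(size) * crop.shape[1] / size).astype(int)
    return crop[np.ix_(rows, cols)]

# toy 100x100 "scan" containing one dark vertical stroke
img = np.full((100, 100), 255, dtype=np.uint8)
img[20:80, 45:55] = 0
norm = normalize_size(binarize(img))
print(norm.shape)  # (32, 32)
```

In a real pipeline, the same normalization would be applied to every segmented character before feature extraction.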

3.3 Segmentation

This phase is responsible for extracting the individual characters from an image.
The document is processed in a well-defined order. First, text lines are
separated using a row histogram. From every line, words are extracted using a
column histogram, and at last, letters are separated from the words.
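The row/column histogram idea can be illustrated on a synthetic binary page; the page contents and the zero-gap criterion are assumptions made only for this sketch.

```python
import numpy as np

def split_by_gaps(profile):
    """Return (start, end) index runs where the ink profile is non-zero."""
    on = (profile > 0).astype(np.int8)
    edges = np.flatnonzero(np.diff(np.concatenate(([0], on, [0]))))
    return list(zip(edges[::2], edges[1::2]))

# synthetic binary page: two text lines, the first containing two words
page = np.zeros((60, 80), dtype=np.uint8)
page[5:15, 5:30] = 1    # line 1, word 1
page[5:15, 40:70] = 1   # line 1, word 2
page[30:40, 10:60] = 1  # line 2

lines = split_by_gaps(page.sum(axis=1))          # row histogram -> lines
print(lines)                                     # [(5, 15), (30, 40)]
r0, r1 = lines[0]
words = split_by_gaps(page[r0:r1].sum(axis=0))   # column histogram -> words
print(words)                                     # [(5, 30), (40, 70)]
```

Character separation repeats the column-histogram step inside each word, usually with a smaller gap threshold.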

3.4 Feature Extraction

In this phase, the images and data are transformed by the feature extraction
technique into a representation that is more suitable for arranging and grouping the
data. Methods such as binary, linear, and graph-based analysis are used for feature
extraction to mine the unique traits of the characters. The choice of scheme depends
on the nature of the input data.

3.5 Classification

Whenever input images are fed to the proposed handwritten character recog-
nition system, the extracted feature vectors are given as input to trained
classifiers such as SVM or ANN. The classifier compares the input attributes with
the learned prototypes and resolves the most closely matching class for the input data.

3.6 Post Processing

This phase carries out the process of removing invalid outputs by applying
semantic knowledge. It is the procedure of obtaining the final result from shape
identification of the handwritten data. For Devanagari (Lipi) data records/images,
it amplifies the correctness gained by providing a clean output for acknowledgment.
For script or handwritten input, some shape recognizers output a single sequence of
characters, whereas others output a set of candidate values for each character,
typically with a measure of confidence [21, 22].
The following model clarifies the machine learning process for character recog-
nition. The different phases of Machine Learning Classifiers are working, as shown
in Fig. 7.

Fig. 7 “Working process of machine learning classifiers”

Fig. 8 SVM calculation



Fig. 9 Experimental analysis and discussion: accuracy comparison of K-Means, SVM, and the proposed model for dataset sizes of 150, 750, 1500, 3000, and 4500 (bar chart, vertical axis 0–90)

4 The Proposed Methodology

4.1 Dataset

For the character recognition process in this paper, an appropriate and accurate
dataset is required. The first step of the proposed system is to train the system;
after data is given to the system, its performance is evaluated. A dataset is
required for the machine learning algorithm. For the training data, we take images
and specially scanned data. A total of 1000 images collected from different writing
styles are used to train the system, and 2000 images are gathered to test the system
operation.

4.2 Augmentation

After collecting the dataset, a supervised Convolutional Neural Network (CNN) is
used for feature extraction on the above dataset. For this, a considerable amount of
fundamental information is taken: a large amount of data yields an extensive and
accurate set of feature attributes in a CNN. All the dataset images are divided into
their respective categories. The network is divided into three convolutional blocks
with max-pooling blocks in between. First, an image is taken as input. The input is
given to the initial convolutional block, which filters the entered image with 36
kernels of 3.5 × 3.5. After the first max-pooling block, the first convolutional
block's result is given to the second convolutional block as input. In the second
convolutional block, the image is filtered with 64 kernels of size 4 × 4. This
output is given to the second max-pooling layer as input; after that, the max-pooling
result goes to the final convolutional block. It filters the images/data with 128
kernels of size 1 × 1 and gives the output to a fully connected layer of 512 neurons.
That result is given to a softmax function, which provides a probability distribution
over the four result categories. The last layer is connected to an MLP. Every
convolutional layer output uses the ReLU activation function and is fully connected
to the following layers. The system is trained using Adam with a batch size of 100
for 1000 epochs. Thus, we collect the features of the image dataset using the CNN
algorithm [23].
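As a sanity check on the layer stack described above, the spatial sizes can be traced layer by layer. The 32 × 32 input resolution, 'valid' convolutions, 2 × 2 non-overlapping pooling, and the rounding of the quoted 3.5 × 3.5 kernel to 3 × 3 are all assumptions made only for this arithmetic.

```python
def conv_out(size, kernel, stride=1):
    """Spatial output size of a 'valid' convolution."""
    return (size - kernel) // stride + 1

def pool_out(size, window=2):
    """Spatial output size of non-overlapping max pooling."""
    return size // window

size = 32  # assumed input resolution
for kernel, n_filters in [(3, 36), (4, 64), (1, 128)]:
    size = pool_out(conv_out(size, kernel))
    print(f"conv {kernel}x{kernel} ({n_filters} maps) + pool -> {size}x{size}")

flat = size * size * 128  # flattened input to the 512-neuron dense layer
print(flat, "-> 512 -> softmax")
```

Under these assumptions, the three blocks shrink the map to 15 × 15, 6 × 6, and 3 × 3, so the dense layer receives a 1152-dimensional vector.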

4.3 Classification Principle

Clustering is one of the data structure management techniques in data analysis and
database management. A large amount of data can be subdivided into subgroups, and
data of the same type can be placed in the same group. Using this method, we can
define the identification task: find homogeneous subparts inside the data points.
Euclidean distance or correlation-based distance is used in this method; the choice
is application-specific. Sub-grouping cluster analysis is performed based on the
features [2].
Clustering is an unsupervised machine learning method and can be done in different
ways. The dataset features taken from the Convolutional Neural Network (CNN) are
partitioned into non-overlapping clusters, where every feature point belongs to only
one group. First, the total number of clusters is decided; the centroids are
initialized at random data points, and the data is iterated over. If the centroids
do not change, the iteration stops, and the data points keep their cluster
assignments. After that, the sum of the squared distances between the data points
and their centroids is calculated (Fig. 8).
After that, the SVM algorithm is called to evaluate the k clusters. The sort
number is denoted as T. A condition is created where every value can be evaluated as
a newly generated solution, which yields a kSVM solution:
ksvm-model = {(c1, Lsvm1), (c2, Lsvm2) … (ck, Lsvmk)};
where k = number of local models (clusters);
γ = the hyperparameter of the RBF kernel function;
c = the error-penalty parameter of the SVM.
At the very last step, the global best solution is returned. This is repeated until
every cluster is pruned, which gives the final classification. Then the characters
and numeric values can be classified and identified very efficiently, and the
accuracy of the character recognition is calculated [8].
In machine learning, k-means is perhaps the best-known and most studied method
for cluster analysis [8, 9].
K-means is a clustering method that helps organize newly scanned data or images
into the required blocks of handwritten character categories. Using a desktop GUI
form/web portal, a client can request and visualize the historical images together
with the data previously gathered on the server.
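The clustering stage of this pipeline can be sketched in NumPy as follows. The two-blob data is a synthetic stand-in for CNN feature vectors, and training one local SVM per cluster (the kSVM step) is indicated only by a comment, since it needs a full SVM implementation.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain k-means: random initial centroids, iterate until stable."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign every feature vector to its nearest centroid (Euclidean)
        labels = np.argmin(((X[:, None, :] - centroids) ** 2).sum(-1), axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):  # centroids unchanged -> converged
            break
        centroids = new
    return labels, centroids

# synthetic stand-in for CNN feature vectors: two well-separated groups
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
labels, centroids = kmeans(X, k=2)
# next, one local SVM per cluster would be trained: {(c_j, Lsvm_j)}
print(sorted(set(labels.tolist())))
```

Each resulting cluster then becomes the training set for its own local classifier, which is what gives the described method its multilevel structure.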

5 The Results and Discussions

The performance of the handwritten character recognition system is calculated here.
The overall performance is measured by how much time the recognition process takes.
For this experiment, the Anaconda Navigator software environment is used; Python
programming in a Jupyter notebook makes the work more efficient. With the proposed
method, the total loss is 0.02313. The efficiency is calculated using the multilevel
k-means and SVM algorithm: after clustering, passing the data through the SVM yields
a better multilevel classification result. 3500 data images are used to train the
proposed system. The test accuracy with the best parameter set is 0.9890 (Fig. 9).
From Fig. 9, we observe that the individual algorithms give efficiencies of 87.9%
for k-means and 90% for SVM, but the proposed methodology gives a better result,
with a performance of 98.9%. This accuracy is better than the individual algorithms'
performance.

6 Conclusion

The proposed method gives better performance. The k-means algorithm does not work
well on universal clusters, or on clusters of different data sizes and densities.
Therefore, after clustering, feeding the clusters into multiple SVM classes provides
better classification. With this method, it is found that a large dataset can be
easily trained and tested to recognize different handwritten characters, and this
kind of approach is beneficial in daily life. Future work can develop the algorithm
with better segmentation techniques, so there is scope for improving the methods.

References

1. Pal, U., Chaudhuri, B.B.: Indian script character recognition: a survey. Pattern Recognit. 37,
1887–1899 (2004)
2. Singh, H., Sharma, R.K.: Moment in online handwritten character recognition. In: National
Conference on Challenges & Opportunities in Information Technology (COIT- 2017,
Gobindgarh. March 23 (2007)
3. Hanmandlu, M., Ramana Murthy, O.V.: Fuzzy model-based recognition of handwritten
numerals. Pattern Recognit. 40, 1840–1854 (2007)
4. Arora, S.: Combining multiple feature extraction techniques for handwritten devnagari char-
acter recognition. In: IEEE Region 10 Colloquium and the Third ICIIS Kharagpur, India
(2008)
5. Arica, N.: An overview of character recognition focused on offline handwriting, C99-06-C-203,
IEEE (2000)
6. Cardona, G., Jain, D.: The Indo-Aryan Languages. Routledge, pp. 68–69 (2003). ISBN 978-
0415772945

7. Ramteke, R.J., Mehrotra, S.C.: Recognition of handwritten devnagari numerals. Int. J. Comput.
Process. Orient. Lang. (2008)
8. Das, N., Sarkar, R., Basu, S., Saha, P.K., Kundu, M., Nasipuri, M.: Handwritten bangla character
recognition using a soft computing paradigm embedded in two pass approach. Pattern Recogn.
48, 2054–2071 (2015)
9. Sarkhel, R., Das, N., Saha, A.K., Nasipuri, M.: A multi-objective approach towards cost effec-
tive isolated handwritten bangla character and digit recognition. Pattern Recogn. 58, 172–189
(2016)
10. Indian, A.: A survey of offline handwritten hindi character recognition. IEEE (2017). 978-
15090-6403-8/17
11. Shamim, S.M., Neural, D.: Glob. J. Comput. Sci. Technol., Artif. Intell. 18(1) Version 1.0 Year
2018, Type: Double Blind. Peer Rev. Int. Res. J., Publisher: Global Journals, Online ISSN:
0975-4172 & Print ISSN: 0975-4350 (2018)
12. Saha, M.: Int. J. Adv. Sci. Technol. 29(9), 2900–2910 (2020)
13. Shukla, A.K.: Patient diabetes forecasting based on machine learning approach. In: Pant, M.,
Kumar Sharma, T., Arya, R., Sahana, B., Zolfagharinia, H. (eds.) Soft Computing: Theories and
Applications. Advances in Intelligent Systems and Computing, vol. 1154. Springer, Singapore
(2020). https://doi.org/10.1007/978-981-15-4032-5_91
14. Das, N., Basu, S., Sarkar, R., Kundu, M., Nasipuri, M., Basu, D.: Handwritten bangla compound
character recognition: potential challenges and probable solution. In: IICAI, pp. 1901–1913
(2009)
15. Das, N., Das, B., Sarkar, R., Basu, S.: Handwritten banglabasic and compound character
recognition using MLP and SVM classifier. J. Comput. 2(2), 109–115 (2010)
16. Bag, S., Bhowmick, P., Harit, G.: Recognition of bengali handwritten characters using skeletal
convexity and dynamic programming in emerging applications of information technology
(EAIT). In: Second International Conference, pp. 265–268 (2011)
17. Aggarwal, A., Rani, R., Dhir, R.: Handwritten Devanagari character recognition using
gradient features. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2(5), 85–90 (2012)
18. Pradeepa, J., Srinivasan, E., Himavathi, S.: Neural network based recognition system inte-
grating feature extraction and classification for english handwritten. Int. J. Eng. 25(2), 99–106
(2012)
19. Aggarwal, A., Rani, R., Dhir, R.: Handwritten devanagari character recognition using gradient
features. Int. J. Adv. Res. Comput. Sci. Softw. Eng. 2(5), 85–90 (2012). (ISSN: 2277-128X)
20. Pathan, I.K., Ali, A.A., Ramteke, R.J.: Recognition of offline handwritten isolated Urdu
character. Int. J. Adv. Comput. Res. 4(1), 117–121 (2012)
21. Vaidya, S.A., Bombade, B.R.: A novel approach of handwritten character recognition using
positional feature extraction. Int. J. Comput. Sci. Mob. Comput. 2(6), 179–186 (2013)
22. Wu, X., Tang, Y., Bu, W.: Offline text-independent writer identification based on scale invariant
feature transform. IEEE Trans. Inf. Forensics Secur. 9, 526–536 (2014)
23. Dholakia, K.: A survey on handwritten character recognition techniques for various indian
languages. Int. J. Comput. Appl. 115(1), 17–21 (2015)
PSO-WT-Based Regression Model
for Time Series Forecasting

P. Syamala Rao, G. Parthasaradhi Varma, and Ch. Durga Prasad

Abstract Regression models are used in engineering applications to improve the


accuracy of the prediction studies. The linear regression (LR) and multilinear regres-
sion (MLR) models are simple models recommended for several applications. In this
paper, the LR and MLR coefficients’ optimal values are identified using the particle
swarm optimization (PSO) algorithm. Further, the time series is processed through
wavelet transform (WT), and the decomposed coefficients are incorporated in the
regression models to improve the accuracy of the prediction. These PSO-WT-based
MLR models comparatively produce better results than PSO-MLR models.

Keywords Linear regression · Wavelet transform · PSO · Forecasting

1 Introduction

Identification of forecasting models plays a crucial role in present engineering appli-


cations. Irrespective of the domain, extensive data is available globally, and these fore-
casting and prediction models help improve the planning, operation, and scheduling
of future pricing. Accurate prediction increases the net revenue and reduces the
overall loss in abnormal conditions of the systems. For forecasting, linear regression
models were extensively used in all domains of engineering [1].
Some of the available applications in the literature are presented in this section.
PM10 concentrations are predicted in [2] using an MLR approach with the help of time
series error models. The concept of non-asymptotic error bounds was utilized along
similar lines in [3] to fit linear regression models to time-series data. When the input and
output variations vary in time bounds, normal LR models are not suitable. Therefore,

P. S. Rao (B)
Department of CSE, Acharya Nagarjuna University, Guntur, India
G. P. Varma
Department of CSE, Chaitanya Bharathi Institute of Technology, Hyderabad, India
Ch. D. Prasad
Department of EEE, SRKR Engineering College, Bhimavaram, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 227
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_21
228 P. S. Rao et al.

segmented linear regression modeling is proposed [4]. One of the critical applica-
tions of regression models in the present smart cities concept is forecasting load
and renewable generation inputs. This MLR approach is used for short-term load
forecasting to provide better power system planning available in [5]. In [6], wind
forecasting is presented with a deep learning model due to wind speed’s volatile
nature. Recently, the application of these prediction studies has emerged in civil
engineering. Several optimized MLR models are presented in [7–10] to identify the
unknown quantities. However, these MLR models produce large errors when the
data is more volatile. Therefore, nonlinear regression models were developed for
accurate prediction studies. Artificial neural networks (ANN) [11, 12], fuzzy logic
[13], support vector machine (SVM) [14, 15], and deep neural networks (DNN) [16]
fall under intelligent approaches. However, these models are complex and require heavy
preprocessing.
In this paper, the accuracy of the prediction results is enhanced with wavelet
transform (WT). Using WT, the detailed and approximate frequency components of
the time series are extracted. These coefficients are used in PSO-assisted MLR model
for best fit. This approach predicts the future data more accurately than standard
MLR models. The complete procedure of algorithm implementation and results are
presented in the following sections: Sect. 2 includes applying wavelet coefficients
in the linear regression model with PSO assistance. Results and discussions are
presented in Sect. 3, and conclusions are reported in Sect. 4.

2 Proposed Method

Galton introduced the concept of regression analysis to identify the relationship


between the independent and dependent variables. This analysis yields a mathemat-
ical model, either linear or nonlinear, to predict or forecast future trends. Among
these models, MLR is extensively used in engineering applications. In MLR, the
decision variable is linearly expressed with independent variables. The generalized
expression for MLR is given by


y = \sum_{i=1}^{n} \beta_i x_i + \beta_0    (1)

In Eq. (1), y is the output variable and x_i (i = 1, 2, …, n) are the input variables. The
coefficients of MLR are identified to fit the data correctly. In this paper, the PSO
algorithm is used to find the optimal values of the coefficient. These models are not
suitable with available data in complex processes where identifying the dependent
variables is difficult. Therefore, more features are required for close prediction. For
this purpose, the input time data is processed through WT to extract the approximate
and detailed coefficients. Further, these coefficients are used in MLR models as input
variables. WT is used to find multiple time data features to find accurate prediction
PSO-WT-Based Regression Model for Time Series Forecasting 229

models of engineering applications [10, 11]. By using WT, time series x(t) can be
analyzed at various levels of decomposed coefficients using Eq. (2)

w_f(a, b) = \int_{-\infty}^{\infty} x(t) \cdot \varphi_{a,b}(t) \, dt    (2)

In Eq. (2), ϕ_{a,b}(t) is the mother wavelet function. There are several mother wavelets
available in the literature. Among them, the Daubechies wavelets are used for more
useful feature extraction. This WT analysis decomposed the given signal into low-
and high-frequency components known as the approximation coefficients a_l and the
detailed coefficients d_l. When these coefficients are substituted in Eq. (1) with single
input data to obtain better forecasting outputs, the modified expression is given by


y = \sum_{i=1}^{k} \beta_i a_{l,i} + \sum_{i=1}^{k} \gamma_i d_{l,i} + \beta_0    (3)

Equation (3) coefficients’ optimal value is also identified by using PSO [18]. Since
selecting the coefficients influences prediction, population search-based techniques
are suitable for finding the best-fit values. Among these techniques, PSO is a simple
population search-based algorithm introduced in 1995 by Kennedy and Eberhart.
This algorithm is used in this paper to find the best-fit regression coefficients. The
mechanism of PSO is based on two equations known as position ( p) and velocity
(v). The solutions of the problem (coefficients of MLR) are randomly generated
within the limits of the variables called positions. Using these randomly generated
initial values, the fitness of the particles is calculated and, based on the best fitness
solutions, updated using Eqs. (4) and (5) given by
   
v_n^{i+1} = \omega v_n^{i} + a_1 r_1 (pbest^{i} - p_n^{i}) + a_2 r_2 (gbest^{i} - p_n^{i})    (4)

p_n^{i+1} = p_n^{i} + v_n^{i+1}    (5)

In Eqs. (4) and (5), pbest and gbest are the individual and group best positions.
The rest of the variables are the control and standard parameters of the PSO. The
objective function used for the PSO-WT-MLR approach is given by
Fitness function = (y_{actual} - y_{predicted})^2    (6)

This fitness value is calculated for every solution of PSO to find the best optimal
coefficients of MLR.
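Eqs. (1) and (4)–(6) can be sketched together in a short NumPy program. The synthetic series, the swarm size, and the control parameters ω, a1, a2 are illustrative assumptions, and the fitness of Eq. (6) is taken as summed over all samples. The same loop would fit the wavelet-coefficient model of Eq. (3) simply by enlarging the coefficient vector.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)        # stand-in for the lagged input x1
y = 2.0 * x + 1.0                  # series generated by y = 2*x1 + 1

def fitness(c):
    """Eq. (6): squared prediction error of y = b1*x + b0, summed."""
    b1, b0 = c
    return np.sum((y - (b1 * x + b0)) ** 2)

n, dims = 30, 2
w, a1, a2 = 0.7, 1.5, 1.5          # assumed textbook PSO settings
p = rng.uniform(-5.0, 5.0, (n, dims))   # positions = candidate (b1, b0)
v = np.zeros((n, dims))
pbest = p.copy()
pbest_f = np.array([fitness(q) for q in p])
gbest = pbest[pbest_f.argmin()].copy()

for _ in range(300):
    r1, r2 = rng.random((n, dims)), rng.random((n, dims))
    v = w * v + a1 * r1 * (pbest - p) + a2 * r2 * (gbest - p)   # Eq. (4)
    p = p + v                                                   # Eq. (5)
    f = np.array([fitness(q) for q in p])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = p[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()

print(np.round(gbest, 3))  # approaches the true coefficients (2, 1)
```

On this noiseless convex problem the swarm settles close to the generating coefficients; real series would of course leave a residual fitness.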

3 Results and Discussions

Initially, an LR model has been considered in which the output depends on a single
input. The corresponding financial series data is fit with the linear expression given
by

y = β1 x1 + β0 (7)

Figure 1 shows the time series data for the implementation of regression models.
This data is divided into three segments known as pre-data, training data, and testing
data [17, 18]. The total samples in the data set are 768, out of which 1–10 samples
are used as pre-data samples, 11–720 samples for training, and 721–768 samples for
testing. For the data shown in Fig. 1, the optimal LR model obtained by PSO is given
by

y = 0.9916x1 + 4.1517 (8)

In Eq. (8), x_1 is the one-sample-delayed value of y. Similarly, the MLR model
for the data using two inputs identified by PSO is given by

y = 1.0627x1 − 0.0719x2 + 4.5164 (9)

The prediction outputs of these traditional regression models are further improved
with the help of WT coefficients. For this purpose, the input x_1 is processed through
WT, and level-1 db4 detailed and approximation coefficients are extracted. Figure 2
shows these coefficients. With the help of these values, the modified PSO-assisted
MLR model is given by

y = 0.9964a1 − 0.4701d1 + 1.8923 (10)

The regression model coefficients are identified using PSO with the fitness func-
tion values 24.8059, 24.6779, and 15.9623. Using the PSO regression models, the

Fig. 1 Time series data consisting of 768 samples



Fig. 2 Db4 wavelet decomposed coefficients a approximate, b detailed

absolute errors of testing data are shown in Fig. 3. This result shows the improve-
ment in the proposed PSO-WT-MLR approach. The R 2 values for various regression
models obtained by PSO are reported in Table 1. These results show the improvement
of WT-assisted PSO-optimized MLR approach for accurate prediction of future data
samples.

Fig. 3 Errors of test data for all LR models

Table 1 R² values for various regression models

Approach      R² in training phase   R² in testing phase
PSO-LR        0.9809                 0.7531
PSO-MLR       0.9811                 0.7535
PSO-WT-MLR    0.9877                 0.8539
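The R² metric reported in Table 1 can be computed from the actual and predicted series with a small helper; the numbers in the example below are illustrative only, not the paper's data.

```python
import numpy as np

def r2(actual, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    ss_res = np.sum((actual - predicted) ** 2)          # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)      # total sum of squares
    return 1.0 - ss_res / ss_tot

score = r2([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print(round(score, 3))  # 0.98
```

The training- and testing-phase columns of Table 1 correspond to evaluating this function on the two data segments separately.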

4 Conclusions

In this paper, the PSO-WT-MLR approach is presented for time series forecasting.
The regular LR and MLR models obtained using PSO produce larger
errors in forecasting the data. Therefore, the prediction results are improved by
using wavelet coefficients. The wavelet coefficients extracted after data processing
provided better prediction models. The statistical measures and percentage errors of
the test data provided the efficacy of the proposed method.

References

1. Hurvich, C.M., Tsai, C.L.: Regression and time series model selection in small samples.
Biometrika 76(2), 297–307 (1989)
2. Ng, K.Y., Awang, N.: Multiple linear regression and regression with time series error models
in forecasting PM 10 concentrations in Peninsular Malaysia. Environ. Monit. Assess. 190(2),
63 (2018)
3. Alaeddini, A., Alemzadeh, S., Mesbahi, A., Mesbahi, M.: Linear model regression on time-
series data: non-asymptotic error bounds and applications. In: 2018 IEEE Conference on
Decision and Control (CDC), pp. 2259–2264. IEEE (2018)
4. Valsamis, E.M., Husband, H., Chan, G.K.: Segmented linear regression modelling of time-
series of binary variables in healthcare. Comput. Math. Methods Med. (2019)
5. Amral, N., Ozveren, C.S., King, D.: Short term load forecasting using multiple linear regression.
In: 2007 42nd International Universities Power Engineering Conference, pp. 1192–1198. IEEE
(2007)
6. Liu, H., Mi, X., Li, Y.: Smart multi-step deep learning model for wind speed forecasting based
on variational mode decomposition, singular spectrum analysis, LSTM network and ELM.
Energy Convers. Manage. 159, 54–64 (2018)
7. Egbe, J.G., Ewa, D.E., Ubi, S.E., Ikwa, G.B., Tumenayo, O.O.: Application of multilinear
regression analysis in modeling of soil properties for geotechnical civil engineering works in
Calabar South. Niger. J. Technol. 36(4), 1059–1065 (2017)
8. Nagaraju, T.V., Prasad, C.D., Murthy, N.G.: Invasive weed optimization algorithm for predic-
tion of compression index of lime-treated expansive clays. In: Soft Computing for Problem
Solving, pp. 317–324. Springer, Singapore (2020)
9. Nagaraju, T.V., Prasad, C.D.: Swarm-assisted multiple linear regression models for compres-
sion index (Cc) estimation of blended expansive clays. Arab. J. Geosci. 13(9) (2020)
10. Shen, Y.X., Yang, J.G.: Temperature measuring point optimization and thermal error modeling
for NC machine tool based on ridge regression. Mach. Tool Hydraulic. 40(5), 1–3 (2012)
11. Pradeep Kumar, D., Ravi, V.: Forecasting financial time series volatility using particle swarm
optimization trained quantile regression neural network. Appl. Soft Comput. 58, 35–52 (2017)
12. Ghazvinian, H., Bahrami, H., Ghazvinian, H., Heddam, S.: Simulation of monthly precipitation
in semnan city using ANN artificial Intelligence model. J. Soft Comput. Civil Eng. 4(4), 36–46
(2020)
13. Yuan, K., Liu, J., Yang, S., Wu, K., Shen, F.: Time series forecasting based on kernel mapping
and high-order fuzzy cognitive maps. Knowl.-Based Syst. 206, 106359 (2020)
14. Singh, V., Poonia, R.C., Kumar, S., Dass, P., Agarwal, P., Bhatnagar, V., Raja, L.: Prediction
of COVID-19 corona virus pandemic based on time series data using support vector machine.
J. Discrete Math. Sci. Cryptograp. 1–5 (2020)
15. Sahoo, B.B., Jha, R., Singh, A., Kumar, D.: Application of support vector regression for
modeling low flow time series. KSCE J. Civil Eng. 23(2), 923–934 (2019)

16. Vidal, A., Kristjanpoller, W.: Gold volatility prediction using a CNN-LSTM approach. Expert
Syst. Appl. 113481 (2020)
17. Gupta, D., Pratama, M., Ma, Z., Li, J., Prasad, M.: Financial time series forecasting using twin
support vector regression. PLoS ONE 14(3), e0211402 (2019)
18. Rao, P.S., Varma, G.P., Prasad, C.D.: Identification of linear and nonlinear curve fitting models
using particle swarm optimization algorithm. In: AIP Conference Proceedings, vol. 2269, no.
1, p. 030040. AIP Publishing LLC (2020)
Leaf Diagnosis Using Transfer Learning

Prashant Udawant and Pravin Srinath

Abstract Machine learning in agriculture is used to improve plant yield and crop
quality. The key challenge faced by farmers is the assault of bacterial infec-
tions, fungal and viral diseases, and worm attacks, or unnoticed decay of
leaves. The application of transfer learning and tweaking of state-of-the-art models can
be used to diagnose plant and crop diseases. Here, a state-of-the-art method with
Faster RCNN and single shot detector (SSD) is used to propose a hybrid method for
detecting plant diseases. This hybrid method senses the leaves of the plant and deter-
mines the affected area. Experimental studies indicate that healthy and unhealthy
plant leaves are accurately classified.

Keywords Artificial intelligence · Cotton disease identification · Transfer


learning · Faster RCNN · Single shot detector

1 Introduction

The biggest problem farmers face is low yields and poor crop quality due to insects and
pests. Insects, rodents, fungi, and weeds decrease yields by up to 20% or more
during the early, mid-, and post-harvest periods. Farmers can use pesticides to keep
the insect and rodent populations under control. Lack of quality control, high prices,
noise, untimely availability, lack of education, and the use of defective machinery due
to an untrained labor force are the main constraints behind pesticide inefficiency. Cotton
plays a crucial role in the Indian economy, as its textile industry is primarily cotton-
based. The Indian textile industry contributes about 5% to its gross domestic product
(GDP), 14% to industrial production, and 11% to overall export revenues [1]. The
non-application of new technology leads to low yields relative to the global average.

P. Udawant (B)
Assistant professor SVKM’s NMIMS MPSTME, Shirpur, India
e-mail: Prashant.udawant@nmims.edu
P. Srinath
Associate professor SVKM’s NMIMS MPSTME, Mumbai, India
e-mail: pravin.srinath@nmims.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 235
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_22
236 P. Udawant and P. Srinath

Infusion of modern farming practices to increase productivity is essential to allow


farmers to switch from subsistence to market-driven farming. Detection of disease
in crops is one such new technology that will help farmers detect different diseases
that are not readily recognizable with the naked eye. Deep learning implementations
of artificial intelligence (A.I.) that classify plant pathogens by appearance and
visual signals, mimicking human judgment, are discussed here. While several state-
of-the-art machine learning models have been accessible and studied, applying these
models to the agricultural sector has not yet been seen on a wide scale. Transfer learning
and alteration of state-of-the-art systems can be used to classify plant and crop
diseases. The previous study has shown that the detection of crop diseases based
on a computerized image scheme by extracting features has shown positive results.
Still, the extraction of features is computationally expensive and involves professional
expertise for successful representation. Many studies have demonstrated that CNN-based
detectors such as Faster RCNN and SSD have been used for object detection. There are
only a few large, curated image repositories for crop disease. According to
the report, the CNN [2] model trained with open-source Plant Village Dataset [3]
achieved an accuracy of 99.35% [4], but when tested with photos taken under
conditions different from those of the images used during model training, the model achieved
an accuracy of 31.4%. Therefore, many healthy and diseased images taken from other
infected parts of plants and developing under different environmental conditions are
needed to create reliable and precise detection models. This study aims to implement
state-of-the-art techniques to develop a hybrid model to detect disease and pests on
cotton plants.

2 Literature Survey

Deep learning models such as Convolutional Neural Networks [5] were used to develop
models for detecting and diagnosing plant diseases from leaf images of diseased
and healthy plants. In previous work, deep learning models for the identification
and diagnosis of plant diseases were validated on five simple CNN architectures:
AlexNet, AlexNetOWTBn, GoogLeNet, Overfeat, and VGG [6–9].
These models were trained, tested, and implemented with the aid of the
Torch7 framework [10]. The findings showed that the highest performance rates
were achieved by the VGG and AlexNetOWTBn architectures. These two models were
later trained and tested on the original pictures; when trained with the actual
pictures, VGG showed a maximum success rate of 99.53%.
The Deep Learning approach was used to establish an image-based system to
detect plant diseases [11]. The proposed method used a publicly available dataset
containing 54,306 images of both diseased and stable plant leaves. The dataset
included 14 species of plants and 26 diseases. Three experiments were performed
with various versions of the dataset: the first consisted of original, i.e.,
colored, images; the second of grayscale images; and in the third, the leaves were
segmented from the image, thus eliminating the extra background from the photos.
The performance of two architectures, AlexNet [7] and GoogLeNet [8], on the
Plant Village dataset [3] was studied by training the models in two separate cases.
In the first case, the model was learned from scratch; in the second, a transfer
learning approach was used in which pre-trained models were fine-tuned. The average
accuracy was 85.53% for the model trained from scratch and 99.34% for the model
trained with transfer learning.
A comparative analysis of machine learning algorithms was undertaken to classify
healthy and unhealthy plant leaves [12]. Three kinds of plants were selected for
this analysis, namely cabbage, sorghum, and citrus. Three feature groups were taken
into account to classify healthy and unhealthy plants: color-based features such
as pixels; descriptors such as the histogram of oriented gradients (HOG) [13]; and
statistical features such as mean, standard deviation, min, and max. The dataset
used to train the models consisted of 382 cabbage images, 262 sorghum images, and
539 citrus images. Three machine learning techniques, namely support vector machine
(SVM) [14], supervised ANN [15], and Random Forest [16], were used for the
classification of these photos. 60% of the dataset images were used for training,
while the remaining 40% were used for testing. The efficiency of all three models
was compared with the F1 score [17]. According to the findings, SVM obtained the
highest F1 score for diseased sorghum but did not produce positive results in
detecting diseased citrus, while Random Forest achieved an average F1 score of 0.954.
An in-field automatic diagnosis system for the detection of wheat diseases has been
proposed [18]. The suggested approach uses deep multi-instance learning to detect
wheat diseases and map disease regions with only image-level annotations for the
training pictures. A dataset containing in-field images of the wheat crop was
compiled and used for verification purposes. The wheat disease dataset consisted of
9,230 photographs covering seven classes, one of which was the healthy class. A
fully convolutional network (FCN) [19] is used for local feature extraction and
disease prediction. The network generates spatial disease score maps, where each
score map corresponds to a local window of the raw picture. These score maps are
then integrated into a multi-instance learning (MIL) framework, and bounding box
approximation (BBA) is performed to better localize disease positions. Two models,
VGG-CNN-S and VGG-CNN-VD16, which are considered baseline models, were trained for
60 epochs with 0.0001 as the initial learning rate and a batch size of 45 instances,
while the advanced models VGG-FCN-VD16 and VGG-FCN-S were trained for around 20
epochs with 0.00005 as the initial learning rate and a batch size of 2 instances.
Three aggregation functions of VGG-FCN-S and VGG-FCN-VD16 were used to compare the
constructed models with the standard Convolutional Neural Networks. The findings
reveal that VGG-FCN-S outperforms VGG-CNN-S and VGG-FCN-VD16 outperforms
VGG-CNN-VD16 in all categories except Black Chaff.

3 Proposed Method

This research aims to detect a plant's diseased region and provide useful
information to farmers. Much research is ongoing to prevent pest attacks and to
identify disease early, so that further losses can be avoided.

3.1 Image Acquisition

More than 1,500 images were collected directly from fields in Maharashtra and
Gujarat, varying in disease, health, color, etc. The images were gathered using
current high-quality smartphone cameras, and the collected dataset was verified by experts.

3.2 Image Pre-Processing

As the images of plant leaves are acquired from fields, they may contain
dust, water spots, or noise. Also, the collected dataset consists of images taken
with different cameras, which results in differences in the exact pixel values. The
sole purpose of pre-processing is to reduce noise and other irrelevancies to make
the data consistent throughout the project. Each image was manually analyzed to find
defects such as differences in leaf color, blurriness, and shadows (Fig. 1).
The processed images were fewer than needed for the project, so to get more data, we
augmented the collected dataset with random rotation, scaling, skewing, shearing,
blurring, etc., growing our dataset from 1,500+ images to more than 10,000. To train
the object detection model, we need to label, or annotate, the data. The dataset
images were labeled as “Healthy” and “Unhealthy” for the first detection. Image
annotation took place before the augmentation process; once annotations were done,
both the annotations and the images were augmented, producing more labeled data.
The annotation was done using the LabelImg software, which helps create bounding
boxes, label them, and store them in the Pascal VOC XML format, which can be read
while training and transfer learning object detection models.
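As a sketch of how the Pascal VOC XML files produced by LabelImg can be read back during training, the standard library's ElementTree suffices; the file name, label, and coordinates below are hypothetical values, not records from the actual dataset.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC XML layout produced by LabelImg.
SAMPLE_XML = """
<annotation>
    <filename>leaf_001.jpg</filename>
    <object>
        <name>Unhealthy</name>
        <bndbox>
            <xmin>34</xmin>
            <ymin>52</ymin>
            <xmax>210</xmax>
            <ymax>198</ymax>
        </bndbox>
    </object>
</annotation>
"""

def parse_voc_annotation(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append((obj.findtext("name"),
                      int(bb.findtext("xmin")), int(bb.findtext("ymin")),
                      int(bb.findtext("xmax")), int(bb.findtext("ymax"))))
    return filename, boxes
```

Each parsed record carries exactly the fields (label plus the four box coordinates) that the later CSV/TFRecord conversion step needs.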

3.3 Image Segmentation

Image segmentation is the next step in the system. It represents the image in a
form that is more understandable and easier to analyze. Under the segmentation
process, a digital image is subdivided into multiple segments; the main objective
is to dig out meaningful information from the digital image.

Fig. 1 The proposed image processing-based disease detection solution

The techniques which could be used under the segmentation process are Region
Based, Edge Based, Threshold Based, Feature-Based Clustering, and Color Based.
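Of these techniques, the threshold-based one is the simplest to illustrate. The sketch below marks dark pixels as foreground on a hypothetical 4 × 4 grayscale patch; the pixel values and the threshold of 128 are illustrative only.

```python
def threshold_segment(gray, thresh):
    """Mark pixels darker than `thresh` (e.g., a brown lesion on an
    otherwise bright leaf) as foreground (1), the rest as background (0)."""
    return [[1 if px < thresh else 0 for px in row] for row in gray]

# Hypothetical 4x4 grayscale patch: the low values form a small lesion.
patch = [
    [200, 210, 205, 198],
    [202,  60,  55, 201],
    [199,  58,  62, 204],
    [203, 207, 209, 200],
]
mask = threshold_segment(patch, 128)
```

The resulting binary mask is the input that the feature extraction stage operates on.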

3.4 Feature Extraction

Feature extraction demarcates the characteristics of the picture that are crucial
for interpretation and classification. Extraction can be performed on parameters
such as color, shape, or texture: shape-oriented features such as eccentricity,
solidity, area, and perimeter are measured, while texture-oriented extraction
calculates homogeneity, energy, correlation, and mean.
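Two of the shape-oriented features, area and perimeter, can be computed directly from a binary segmentation mask. A pure-Python sketch (the mask below is a hypothetical example, with perimeter counted as exposed 4-connected edges):

```python
def region_area(mask):
    """Area = number of foreground pixels."""
    return sum(sum(row) for row in mask)

def region_perimeter(mask):
    """Count exposed edges of foreground pixels (4-connectivity)."""
    h, w = len(mask), len(mask[0])
    per = 0
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if ny < 0 or ny >= h or nx < 0 or nx >= w or not mask[ny][nx]:
                    per += 1
    return per

# A 2x2 foreground square inside a 4x4 mask: area 4, perimeter 8.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
```

Derived quantities such as solidity (area divided by convex-hull area) build on these two primitives.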

3.5 Classification, Training, and Testing

Once we got labeled data in Pascal VOC XML format, we corrected any errors in the
labels and then converted them into a single CSV file containing the bounding box
values (X-min, Y-min, X-max, Y-max) and the labels. The CSV file was then converted
into TFRecords, which package the bounding box values, labels, and image pixel
values together. TFRecords are a file type read by the TensorFlow Python library,
which can unpack the underlying values while training. The object detection model
used to detect the “Healthy” and “Unhealthy” regions is the Single Shot Detector.
Using transfer learning, the MobileNet Single Shot Detector model was trained on
our dataset for over 30,000 epochs and reached a testing accuracy of approximately
91%. Once the model was trained, its weights were frozen and saved for further use.
The frozen model can be tested on new data, validated, and compared with other
models; if any changes are required, training continues from the last checkpoint
and the model is validated again.
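The intermediate CSV step can be sketched with the standard library; the records below are hypothetical annotations, with the bounding box columns named as in the text.

```python
import csv
import io

# Hypothetical annotation records, as parsed from the Pascal VOC XML files.
rows = [
    {"filename": "leaf_001.jpg", "label": "Unhealthy",
     "xmin": 34, "ymin": 52, "xmax": 210, "ymax": 198},
    {"filename": "leaf_002.jpg", "label": "Healthy",
     "xmin": 10, "ymin": 12, "xmax": 120, "ymax": 110},
]

def annotations_to_csv(records):
    """Serialize all annotation records into one CSV string."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["filename", "label", "xmin", "ymin", "xmax", "ymax"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

In the actual pipeline this CSV is then consumed by a TFRecord-generation script before SSD training.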

4 Architectural Design

Five significant steps are used to diagnose plant diseases. The processing scheme
consists of image acquisition, image enhancement and noise reduction, image
annotation, segmentation (where the affected and usable areas are separated),
feature extraction, and classification. In the end, the presence of diseases on
plants can be observed and recorded. RGB photos of leaf samples were obtained in
the initial stage (Fig. 2).
The Single Shot Detector consists of two parts: a backbone that extracts feature
maps and convolution filters that detect objects. SSD uses VGG16 to extract the
feature maps and then detects objects using the Conv4_3 layer. Multiple layers
(multi-scale feature maps) are used to detect objects independently. As the CNN
gradually reduces the spatial dimensions, the resolution of the feature maps also
decreases, so SSD uses the lower resolution layers to detect larger objects. Six
auxiliary convolution layers are added after the VGG16, five of which are used for
object detection; in three of these layers, six predictions per location are made
instead of four. Each added feature layer (or existing feature layer from the base
network) produces a fixed set of detection predictions using a series of
convolutional filters [20] (Fig. 3).
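The number of predictions this multi-scale scheme produces can be counted per feature map. For the canonical SSD300 configuration described in [20] (the layer sizes below are those of SSD300, not necessarily of the MobileNet variant used here), the sketch yields the well-known 8,732 default boxes per image:

```python
def total_default_boxes(feature_maps):
    """Each cell of an m x m feature map proposes `k` default boxes."""
    return sum(m * m * k for m, k in feature_maps)

# Canonical SSD300 configuration: six feature maps; Conv4_3 and the last
# two layers use 4 default boxes per cell, the other three use 6.
ssd300 = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]
```

This is why three of the layers contribute six predictions per location while the others contribute four.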

5 Result and Discussions

The layer-wise output shapes and parameter counts of the baseline CNN classification
model are summarized in Table 1.

Fig. 2 Architecture design for expert

Fig. 3 Single shot detector architectural design



Table 1 Layer-wise output shapes and parameter counts

Layer (type)                    Output Shape           Param #
conv2d (Conv2D)                 (None, 254, 254, 16)   448
max_pooling2d (MaxPooling2D)    (None, 127, 127, 16)   0
conv2d_1 (Conv2D)               (None, 125, 125, 32)   4,640
max_pooling2d_1 (MaxPooling2D)  (None, 62, 62, 32)     0
conv2d_2 (Conv2D)               (None, 60, 60, 64)     18,496
max_pooling2d_2 (MaxPooling2D)  (None, 30, 30, 64)     0
conv2d_3 (Conv2D)               (None, 28, 28, 128)    73,856
max_pooling2d_3 (MaxPooling2D)  (None, 14, 14, 128)    0
conv2d_4 (Conv2D)               (None, 12, 12, 128)    147,584
max_pooling2d_4 (MaxPooling2D)  (None, 6, 6, 128)      0
flatten (Flatten)               (None, 4608)           0
dense (Dense)                   (None, 512)            2,359,808
dense_1 (Dense)                 (None, 15)             7,695

Total params: 2,612,527
Trainable params: 2,612,527
Non-trainable params: 0

We previously built a classification model using a Convolutional Neural Network
trained on the Plant Village dataset, an open-source dataset covering 15 plant
disease classes. The CNN model had 11 layers and a total of 2,612,527 parameters,
all of which were trainable.
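The parameter counts in Table 1 follow from the standard formulas for Conv2D and Dense layers; the sketch below reproduces the 2,612,527 total, assuming 3 × 3 kernels and an RGB input, with the layer widths as listed in the table.

```python
def conv2d_params(kernel, in_ch, out_ch):
    # (kernel_h * kernel_w * in_channels + 1 bias) weights per output filter
    return (kernel * kernel * in_ch + 1) * out_ch

def dense_params(in_units, out_units):
    # (in_units + 1 bias) weights per output unit
    return (in_units + 1) * out_units

# Layer widths as listed in Table 1 (3x3 kernels, 3-channel RGB input).
total = (conv2d_params(3, 3, 16)       # 448
         + conv2d_params(3, 16, 32)    # 4,640
         + conv2d_params(3, 32, 64)    # 18,496
         + conv2d_params(3, 64, 128)   # 73,856
         + conv2d_params(3, 128, 128)  # 147,584
         + dense_params(4608, 512)     # 2,359,808
         + dense_params(512, 15))      # 7,695
```

Pooling and flatten layers contribute no parameters, which is why only the convolution and dense layers appear in the sum.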
The images were first converted into grayscale and LAB color spaces, and noise
reduction was performed; two models were built, one on the grayscale images and one
on the LAB color space images. The model was trained for 591 epochs and reached a
best testing accuracy of 96.288%. While validating the model, however, we noticed
that it correctly identified the plant in the image but struggled with disease
identification. Several algorithms for extracting the diseased area from the image
were tried, such as k-means and hierarchical clustering, but they failed to cluster
out the diseased area. The objective of the project is to identify the infected area
within the complete leaf image, which becomes possible with the help of an object
detection model. Transfer learning reuses the knowledge of a model previously
trained on millions of images for thousands of hours; using it, an object detection
model was built that could return the region of the diseased area quickly and
accurately (Fig. 4).
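For reference, the k-means idea mentioned above can be sketched on scalar pixel intensities. This is a toy one-dimensional Lloyd's iteration with k = 2, not the actual clustering code used in the project; the sample intensities are hypothetical.

```python
def kmeans_1d(values, iters=20):
    """Toy Lloyd's k-means (k = 2) on scalar pixel intensities:
    cluster 0 collects dark (lesion-like) pixels, cluster 1 bright ones."""
    centers = [float(min(values)), float(max(values))]  # simple initialization
    for _ in range(iters):
        clusters = [[], []]
        for v in values:
            # assign each value to the nearest center
            idx = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[idx].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
              for v in values]
    return labels, centers
```

On real leaf images the same iteration runs over RGB triples rather than scalars, which is where the approach struggled to isolate the diseased area.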

Fig. 4 CNN model description, k-means clustering, and results

5.1 Image Annotation

The software used for labeling the images in this project is called LabelImg. It
helped us annotate images by drawing bounding boxes over the regions of interest
and storing them in the Pascal VOC XML format. Figure 5 shows the procedure to
label images using the tool. It provides a convenient GUI for managing labels and
finding errors, with color coding: the color codes distinguish the different
labels, the right panel shows the total annotations done, and the left panel is
used for I/O as well as for configuring the XML files.

Fig. 5 Feature maps for layer 3



The model used in the proposed method is an object detection model, a Single Shot
Detector with a MobileNet backbone, configured to detect a cotton leaf image as
diseased or healthy. The Single Shot Detector object detection model was trained
for more than 30,000 epochs on Google’s Colaboratory server with an Nvidia Tesla
K80 GPU, resulting in a final loss close to 0.00.

5.2 Result for Test Image

The images in Fig. 6 are the predicted results for the test images used to evaluate
the trained model. They show the percentage of the healthy and unhealthy portions
of the cotton plant leaves.

Fig. 6 Results for Healthy and unhealthy leaf identification



6 Conclusion

Identifying the contaminated region with the help of machine learning was a big
challenge. The transfer learning technique is used to transfer the knowledge of a
model previously trained on millions of images. With transfer learning, an object
detection model was developed that returns the diseased region as a result quickly
and precisely. The results show that healthy and unhealthy cotton plant leaves from
the testing image dataset have been identified accurately.

References

1. Dixit, P., Lal, R.C.: A critical analysis of indian textile industry: an insight into inclusive growth
and social responsibility. Russ. J. Agric. Socio-Econ. Sci. 88(4) (2019)
2. Khan, A., Sohail, A., Zahoora, U., Qureshi, A.S.: A survey of the recent architectures of deep
convolutional neural networks. Artif. Intell. Rev. 53(8), 5455–5516 (2020)
3. Hughes, D., Salath’e, M.: An open access repository of images on plant health to enable the
development of mobile disease diagnostics (2015). arXiv preprint arXiv:15110.080
4. Mohanty, S.P., Hughes, D., Salath’e, M.: Using deep learning for image-based plant
disease detection. arxiv, 1604, 25 April 2016
5. Xin, M., Wang, Y.: Research on image classification model based on deep convolution neural
network. EURASIP J. Image Video Process. 2019(1), 40 (2019)
6. Ferentinos, K.P.: Deep learning models for plant disease detection and diagnosis. Comput.
Electron. Agric. 145, 311–318 (2018)
7. Alom, M.Z., Taha, T.M., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, M.S., Van Esesn, B.C.,
Awwal, A.A.S., Asari, V.K.: The history began from alexnet: a comprehensive survey on deep
learning approaches. arXiv preprint arXiv:1803.01164 (2018)
8. Jasitha, P., Dileep, M.R., Divya, M. (2019) Venation based plant leaves classification using
GoogLeNet and VGG. In: 2019 4th International Conference on Recent Trends on Electronics,
Information, Communication & Technology (RTEICT), pp. 715–719. IEEE
9. Wulandhari, L.A., Gunawan, A.A.S., Qurania, A., Harsani, P., Tarawan, T.F., Hermawan, R.F.:
Plant nutrient deficiency detection using deep convolutional neural network. ICIC Express Lett.
13(10), 971–977 (2019)
10. Nguyen, G., Dlugolinsky, S., Bob’ak, M., Tran, V., Garc´ıa, A.L., Heredia, I., Mal´ık: Machine
Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey.
Artif. Intell. Rev. 52(1), 77–124 (2019)
11. Mohanty, S.P., Hughes, D.P., Salath’e, M.: Using deep learning for image-based plant disease
detection. Front. Plant Sci. 7 Article 1419, September 2016.
12. Rahman, H., Jabbar Ch, N., Manzoor, S., Najeeb, F., Siddique, M.Y., Khan, R.A.:A comparative
analysis of machine learning approaches for plant disease identification. Adv. Life Sci. 4(4)
(2017)
13. Islam, M.A., Yousuf, M.S.I., Billah, M.M.: Automatic plant detection using HOG and LBP
features with SVM. Int. J. Comput. (IJC) 33(1), 26–38 (2019)
14. Cervantes, J., Garcia-Lamont, F., Rodr´ıguez-Mazahua, L. and Lopez, A., : A comprehen-
sive survey on support vector machine classification: Applications, challenges and trends.
Neurocomput. 408, 189–215 (2020)
15. Walczak, S.: Artificial neural networks. In: Advanced Methodologies and Technologies in
Artificial Intelligence, Computer Simulation, and Human-Computer Interaction, pp. 40–53.
IGI Global

16. Herrera, V.M., Khoshgoftaar, T.M., Villanustre, F., Furht, B.: Random forest implementation
and optimization for Big Data analytics on LexisNexis’s high performance computing cluster
platform. J. Big Data 6(1), 68 (2019)
17. Vakili, M., Ghamsari, M., Rezaei, M.: Performance Analysis and Comparison of Machine
and Deep Learning Algorithms for IoT Data Classification (2020). arXiv preprint arXiv:2001.
09636
18. Lu, J., Hu, J., Zhao, G., Mei, F., Zhang, C.:An In-field Automatic Wheat Disease Diagnosis
System. arxiv:1710.08299v1, 26 Sep 2017.
19. Chen, G., Zhang, X., Wang, Q., Dai, F., Gong, Y., Zhu, K.: Symmetrical dense-shortcut deep
fully convolutional networks for semantic segmentation of veryhigh-resolution remote sensing
images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 11(5), 1633–1644 (2018)
20. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: SSD: Single shot
multibox detector. In European Conference on Computer Vision, pp. 21–37. Springer, Cham
(2016)
Attendance System Using Face
Recognition Library

Bhavna Patel, Vedika Patil, Onkar Pawar, Omkar Pawaskar, and J. R. Mahajan

Abstract Among the several methods used for monitoring student attendance, facial
recognition is not widely adopted. Emerging image processing technology is not a
prevailing part of regular attendance monitoring systems despite its numerous
benefits. To eliminate manual data handling, an intelligent system is required that
detects a student’s face and verifies it against a database. This paper proposes a
system that uses TensorFlow for face identification and verification and displays
students’ attendance on a web-based/local GUI. The system is capable of generating
real-time output from a video feed obtained in the classroom; the output is labeled
with the name of the student as entered in the database. The system runs on the
Google Colab platform on Graphics Processing Units (GPUs). In its preliminary
stage, a local dataset of a student under diverse light conditions was experimented
upon to study the behavior of the face recognition algorithm under changing
illumination. The results suggest that the algorithm is effective under low light
conditions as well. This paper primarily demonstrates advances in image processing
through a facial recognition library, highlighting machine learning applications in
everyday circumstances.

Keywords Machine learning · Face recognition library · GPU · Illumination

Abbreviations

GUI Graphical User Interface
GPU Graphics Processing Unit
HOG Histogram of Oriented Gradients
SVM Support Vector Machine

B. Patel (B) · V. Patil · O. Pawar · O. Pawaskar · J. R. Mahajan
Department of Electronics & Telecommunication, MCT’s Rajiv Gandhi Institute of Technology,
Mumbai, Maharashtra, India
J. R. Mahajan
e-mail: jayant.mahajan@mctrgit.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 247
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_23

PSNR Peak Signal-to-Noise Ratio
RGB Red Green Blue
SQL Structured Query Language
CNN Convolutional Neural Network
CVPR Computer Vision and Pattern Recognition
VGGF Very Deep Convolutional Network for Large-Scale Face Recognition
Dataset
PCA Principal Component Analysis
LBPH Local Binary Pattern Histogram
LDA Linear discriminant analysis
API Application Programming Interface

1 Introduction

Attendance tracking is presently administered through traditional methods, which
demand time and effort to execute. Several techniques such as fingerprint
identification, Radio Frequency Identification cards, and eyeball detection have
disadvantages that reduce the effectiveness of such attendance tracking
methodologies [1–3]. Advanced face recognition techniques have been introduced to
overcome the intrusive nature and time ineffectiveness of these frameworks. The
design and working of a precursive structure are presented in this paper, which
introduces image processing techniques using a face recognition library into daily
events. In this method, a student’s appearance is labeled by comparing the images
captured during the class with the accumulated database.
This paper presents an approach to design a viable framework that labels students’
attendance via an automated algorithm using standard machine learning procedures
and displays the attendance on an interface accessible to the administrator. The
process of detecting a face using Convolutional Neural Networks involves
identifying a face in the captured image using HOG feature extraction, in which
images are broken into cells and the landmarks and their orientations are observed
[4]. The training data undergoes a facial verification process where features among
different faces are classified and compared. After computing facial
characteristics, they are differentiated to identify the person in the image. The
real-time images of persons verified from the stored database are labeled with
names after completing the procedure. Along with real-time attendance tracking of
students, this research attempts to address the drawbacks deduced from previous
work on face recognition technology and machine learning in attendance tracking
systems. As a subdivision of the subject, this study analyzes the functioning of
the dlib face recognition library in varying light conditions, the results of which
are discussed in this paper.
This paper presents detailed structural information about the proposed attendance
tracking framework: its working principle mapped through related work performed in
this field in Sect. 2, the proposed system in Sect. 3 along with an experiment, its
results and discussion in Sect. 4, and the conclusion in Sect. 5. The study
considers multiple factors that comprehensively justify the selection of the
necessary components for designing such a system: the machine learning platform,
algorithm, image processing method, dataset, and camera.

2 Related Work

Among the manual attendance tracking practices in use, face recognition is not
widely adopted. Machine learning tools have been proposed for improved results
[3, 5, 6]; the overall system enables the algorithm to make decisions with minimum
human intervention. Image processing techniques are used to enhance facial
recognition and produce high accuracy. Digital image processing consists of
numerous methods to detect faces in an image composed of a finite number of
elements referred to as picture elements. Generally, image processing treats images
as 2-dimensional signals while applying established signal processing methods to
them [6]. HOG feature detection [7, 8], Haar cascades using eigenfaces [9], Local
Binary Patterns [6, 10], Principal Component Analysis [11], and Linear Discriminant
Analysis are some approaches used to detect faces in an image and extract the
requisite features from it. In some cases, a combination of techniques has also
been used for enhanced results [11, 12].
A prototype system is proposed in [13] where feature extraction is performed
through 128-d facial embeddings from FaceNet, and its implementation uses libraries
originating from OpenCV and dlib. Another system used the Viola–Jones algorithm
and HOG features along with an SVM classifier to perform a quantitative analysis
of recorded images based on PSNR values [8]; the computed results suggest that the
higher the PSNR, the better the compressed or reconstructed image.
The TensorFlow platform will be used in the proposed system because it gives the
best result and accuracy for large datasets [8, 9]. Several image processing methods
can be used for face detection and verification, but based on primary research, the
most suitable is HOG feature extraction. Along with flexible programming, TensorFlow
allows easy data representation in terms of dataflow graphs that can be used
for the computation of accurate results. The datasets to be used are created from
students’ images in vivid poses, donning accessories like spectacles, masks, etc.,
and in varying light conditions.
In the HOG feature extraction method [7], the images are converted from RGB to
grayscale. These images are divided into cells, and gradients are computed within
each cell, capturing the orientation of edges to determine the weights. The cells
are grouped into blocks, and normalization over overlapping spatial blocks yields
the descriptor. Moreover, the algorithm uses Support Vector Machines (SVMs) that
examine the data and count gradient orientation appearances in parts of an image.
The training and testing of images are carried out in such binary classifiers to
determine a person’s attendance.
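The per-cell computation described above can be sketched in pure Python. This toy version uses central differences and unsigned orientations in nine bins, a common HOG setting; it is an illustration of the idea, not the dlib implementation itself.

```python
import math

def cell_orientation_histogram(cell, bins=9):
    """Unsigned gradient-orientation histogram (0-180 degrees) for one cell,
    the core quantity HOG accumulates before block normalization."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical gradient
            mag = math.hypot(gx, gy)
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(ang // (180.0 / bins)) % bins] += mag
    return hist
```

A strong vertical edge, for example, produces purely horizontal gradients and so piles all of its magnitude into the first (0-degree) bin.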

3 Proposed System

The proposed system incorporates a database of images labeled with students’ names
in a classroom environment, as shown in Fig. 1. This data is passed through a
neural network to train the system to identify facial features from the image.
Later, the camera input is provided to the face recognition algorithm in the form
of a video feed to obtain the total number of students and their name labels as
real-time output. Further, the attendance is registered on a web-based/local GUI,
which presents it to the administrator in tabular form. This system ensures
effortless storage of attendance data in a classroom.
The working of the proposed design is as follows.
1. Training the Model: A database of labeled students’ images is created. Some of
the images are sent to the Face Recognition Model as training data. This data
must include images in vivid poses and gestures.
2. Camera Input: After the model is trained, testing occurs on video data obtained
from the classroom.
3. Face Recognition Model: The system is trained with images in Support Vector
Machines that analyze the data and determine the face by processing the oriented
gradient features. This block of the procedure determines students’ presence by
comparing the camera input with the accumulated database.

Fig. 1 Block diagram of the proposed system



4. Real-Time Output: The output obtained labels the images and stores the record
simultaneously.
5. Attendance Database Management: The appearances of the students are
recorded in a tabular format. A system using SQLite is generated for database
management.
The model verifies faces by differentiating the test data from the original
database and displays the recorded attendance. HOG features, being computed over a
dense overlapping grid, give outstanding results for person detection [4] and are
therefore used in the Face Recognition Model.
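The SQLite-backed attendance store mentioned in step 5 can be sketched with the standard sqlite3 module; the table and column names below are illustrative, not the project's actual schema, and the student names are hypothetical.

```python
import sqlite3

def record_attendance(conn, names):
    """Insert one row per recognized student for the current session."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS attendance ("
        "  name TEXT NOT NULL,"
        "  marked_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)")
    conn.executemany(
        "INSERT INTO attendance (name) VALUES (?)", [(n,) for n in names])
    conn.commit()

def present_students(conn):
    """Return the distinct names marked present, for the GUI's table view."""
    rows = conn.execute("SELECT DISTINCT name FROM attendance ORDER BY name")
    return [r[0] for r in rows]
```

A web-based or local GUI can then query this table to render the attendance in tabular form.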

3.1 Experiment

As a segment of the proposed system, experimentation was performed on a Google
Colab GPU in which images of students under varying light conditions were tested.
For the development and training of neural networks in Colab notebooks, original
datasets of students were used to study the working of the algorithm and determine
its effectiveness. The illumination levels were set to high, medium, and low, so
that the face recognition algorithm could be evaluated under all of them.
The camera plays a crucial role in the working of the system; hence, the camera’s
image quality and performance in a real-time scenario must be tested thoroughly
before actual implementation [14]. The resolution of the camera plays a crucial
role in recognizing the image: the better the camera, the more accurately it will
identify the face [15]. After considering the research on the hardware components
to be used, the built-in laptop camera was used for detection and recognition in
this experiment. The camera has 0.922 megapixels with a resolution of 1280 × 720.
The algorithm deployed in the Face Recognition Model, which holds the system’s
image processing attributes, uses the following procedure.
1. Face Identification: This step involves detecting an object in the image, i.e., the
student’s face. For any object, the HOG image descriptor is used for training
and creating an object classifier with the use of HOG feature extraction.
2. Landmark Detection: To make the system more accurate, the landmarks of faces
are distinguished. The faces are aligned and centered such that the observed
output thus obtained is normalized.
3. Training: The system is trained with images in Support Vector Machines that
analyze the data and determine the face by processing the oriented gradient
features.
4. Face Recognition: During training, the system learns to distinguish between
different faces considering their landmarks weight votes. The aligned images
undergo facial verification.

5. Testing: The real-time images of persons verified from the stored database are
labeled with names after completing the procedure. A new image can be
distinguished by computing its face features; after computing the face
embeddings, they are differentiated to identify the person in the image.
HOG features are used due to advantages such as proper orientation binning,
fine-scale gradients, relatively coarse spatial binning, and high-quality local
contrast normalization, which are essential for good performance [4].
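The final comparison of embeddings can be sketched as a nearest-neighbor search with a distance threshold. dlib's face embeddings are 128-dimensional and a threshold of 0.6 is commonly used with them; the toy vectors and names below are 3-dimensional and hypothetical, for brevity.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(probe, known, threshold=0.6):
    """Return the enrolled name whose embedding is nearest to `probe`,
    or None when every distance exceeds the decision threshold."""
    name, dist = min(((n, euclidean(probe, e)) for n, e in known.items()),
                     key=lambda t: t[1])
    return name if dist <= threshold else None
```

Matched names are what the attendance database records; an unmatched probe (None) is simply ignored.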

4 Results and Discussions

The experimental findings show that the HOG feature extraction method exhibits
satisfactory face identification and verification; the features and landmarks were
detected successfully, providing positive output. Although the hardware deployed in
this experiment was of low resolution, it proved successful. The images tested were
shot under high, medium, and low light conditions, and the dlib face recognition
library successfully identified faces in all the light conditions, as shown in
Fig. 2. The model’s working remains the same, but its interpretation changes with
the dataset and image classification [9]. It is inferred that using normalized HOG
features yields good results for face identification, and the presented framework
functions under low light conditions to produce real-time output.
From this experiment, we also conclude that the use of GPUs has reduced the
impact of common error factors such as false acceptance, low lighting, and
occlusion, which tend to lower the system's reliability. Using video as the
input source is advantageous because it requires less storage for the same
number of frames than individual images. This paper demonstrates the relevance
of machine learning in everyday situations to improve supervision and overcome
the drawbacks of traditional methods.
The experiment has been carried out on the TensorFlow platform because of the
advantages summarized in Table 1.
The complete study of the proposed system puts forth an accessible framework for
attendance monitoring with real-time output. Implementing

Fig. 2 The output of face recognition library for high, medium, and low light conditions
Attendance System Using Face Recognition Library 253

Table 1 Comparison among machine learning platforms


Parameters      OpenCV                        TensorFlow                   Keras
Configurations  CUDA Runtime API,             CPU, non-GPU                 TensorFlow GPU,
                NVIDIA GPUs                                                TensorFlow CPU
Training data   Large datasets, positive      Large datasets,              Very large datasets
                and negative images           high-performance models
Recognition     Manual feature extraction     Neural networks              Softmax activation
                                                                           function
Accuracy        Poor                          Good                         Excellent

this structured system, it is suggested to run TensorFlow with datasets of
students in diverse poses and gestures. For larger datasets, the Viola-Jones
algorithm can be used in parallel with GPU-based data processing on TensorFlow.
A high-quality laptop camera was used for experimentation and suffices to run
the proposed system. Building on this experiment's results, the proposed
methodology's performance under unfavorable light conditions can be improved,
and the system's accuracy can thus be enhanced considerably.
Cloud and the Internet of Things (IoT) systems have become very popular due to their
location-independent services, seamless connectivity and scalability, and portability
with significantly less energy consumption [16–18].

5 Conclusion

Under unfavorable light conditions, this algorithm has delivered accurate
output. Attendance tracking during classroom sessions can therefore be executed
effectively through this system. As discussed, the use of CNNs extends the
working environment to ensure attendance monitoring in low light conditions.
Face recognition technology appears highly effective in everyday situations,
eliminating incidental factors that affect the system's reliability.
This work will be extended in the machine learning arena with novel algorithms
that can increase the system's learning rate and efficiency. With the TensorFlow
platform, an advanced system will be designed that can handle larger datasets.
Furthermore, the system's data can be made available through a live feed,
improving real-time operation. A database management system can be designed to
maintain attendance logs that can be accessed on demand. This experiment studies
illumination through photo and video feeds; further research can address pose
and gesture alterations and angle variations. With these advancements, an
innovative system can be developed that is effortlessly usable in everyday
situations.

References

1. Varadharajan, E., Dharani, R., Jeevitha, S., Kavinmathi, B., Hemalatha, S.: Automatic atten-
dance management system using face detection. In: Coimbatore, 2016 Online International
Conference on Green Engineering and Technologies (IC-GET)
2. Hoo, S., Ibrahim, H.: Biometric-based attendance tracking system for education sectors: a
literature survey on hardware requirements. J. Sens. (2019)
3. Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint face detection and alignment using multi-task
cascaded convolutional networks. IEEE Signal Process. Lett.
4. Kazemi, V., Sullivan, J.: One-millisecond face alignment with an ensemble of regression
trees. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
pp. 1867–1874 (2014)
5. Khan, S., Akram, A., Usman, N.: Real-time automatic attendance system for face recognition
using Face API and OpenCV. Springer Science+Business Media LLC (2020)
6. Chintalapati, S., Raghunadh, M.V.: Automated attendance management system based on face
recognition algorithms. In: 2013 IEEE International Conference on Computational Intelligence
and Computing Research
7. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: 2005 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’05), vol.
1, pp. 886–893. IEEE (2005, June)
8. Rathod, H., Ware, Y., Sane, S., Raulo, S., Pakhare, V., Rizvi, I.A.: Automated attendance system
using machine learning approach. In: Navi Mumbai, 2017 International Conference on Nascent
Technologies in Engineering (ICNTE)
9. Sripathi, V., Savakhande, N., Pote, K., Shinde, P., Mahajan, J.: Face recognition based
attendance system. 2020 Int. Res. J. Eng. Technol. (IRJET)
10. Salim, O.A.R., Olanrewaju, R.F., Balogun, W.A.: Class attendance management system using
face recognition. In: Kuala Lumpur, 2018 7th International Conference on Computer and
Communication Engineering (ICCCE)
11. Patil, M.N., Iyer, B., Arya, R.: Performance evaluation of PCA and ICA algorithm for facial
expression recognition application. In: Pant, M., Deep, K., Bansal, J., Nagar, A., Das, K.
(eds.) Proceedings of Fifth International Conference on Soft Computing for Problem Solving.
Advances in Intelligent Systems and Computing, vol. 436, pp. 965–976. Springer, Singapore
(2016). https://doi.org/10.1007/978-981-10-0448-3_81
12. Borkar, N.R., Kuwelkar, S.: Real-time implementation of the face recognition system. In: Erode,
2017 International Conference on Computing Methodologies and Communication (ICCMC)
13. Handaga, B., Murtiyasa, B., Wantoro, J.: Attendance system based on deep learning face
recognition without queue. In: Semarang, Indonesia, 2019 Fourth International Conference on
Informatics and Computing (ICIC)
14. Apoorva, P., Impana, H.C., Siri, S.L., Varshitha, M.R., Ramesh, B.: Automated criminal iden-
tification by face recognition using open computer vision classifiers. In: Erode, India, 2019 3rd
International Conference on Computing Methodologies and Communication (ICCMC)
15. Khan, S., Akram, A., Usman, N.: Real time automatic attendance system for face recognition
using face API and OpenCV. Wireless Pers. Commun. 113, 469–480 (2020)
16. Deshpande, P., Iyer, B.: Research directions in the Internet of Every Things (IoET). In: 2017
International Conference on Computing, Communication and Automation (ICCCA), Greater
Noida, 2017, pp 1353–1357. https://doi.org/10.1109/CCAA.2017.8230008
17. Iyer, B., Patil, N.: IoT enabled tracking and monitoring sensor for military applications. Int. J.
Syst. Assur. Eng. Manag. 9, 1294–1301 (2018). https://doi.org/10.1007/s13198-018-0727-8
18. Deshpande, P.: Cloud of everything (CLeT): the next-generation computing paradigm. In:
Advances in Intelligent Systems and Computing, vol. 1025, pp. 207–214, Springer, Singapore
(2020). https://doi.org/10.1007/978-981-32-9515-5_20
Studies on Performance of Image
Splicing Techniques Using Learned
Self-Consistency

Bhukya Krishna Priya, Anup Das, Shameedha Begum,


and N. Ramasubramanian

Abstract In recent years, improper image manipulation has become a significant
issue across many sectors, such as education, politics, entertainment, and
social media. Images play a crucial role in establishing facts, and the
detection of image manipulation is therefore gaining more and more attention.
Detecting such manipulated images is a major challenge because spliced images
are largely unavailable as training data. A learning algorithm has been proposed
for recognizing visually spliced images and manipulations within an image. The
proposed algorithm utilizes the automatically recorded Exchangeable Image File
Format (EXIF) metadata as a supervisory signal for training the model to
determine whether an image is self-consistent, i.e., whether the content of the
photo was generated by a single imaging pipeline. The algorithm is applied to
the task of identifying and localizing image splices. An excellent performance
improvement has been observed on several benchmarks, despite not using any
spliced image data during training. This model can serve as a vital tool for
detecting morphed images in various industries.

Keywords Self-Consistent · Exchangeable image file format · Imaging pipeline

1 Introduction

Malicious image manipulation has become a widespread form of attack.
Image-editing tools are accessible to almost everyone, so a great deal of fake
content is generated on internet platforms and social media such as Facebook
[1-3]. Elementary photo-editing skills have

B. K. Priya (B) · A. Das · S. Begum · N. Ramasubramanian


Department of Computer Science and Engineering, National Institute of Technology,
Tiruchirappalli 620015, Tamil Nadu, India
e-mail: Shameedha@nitt.edu
N. Ramasubramanian
e-mail: nrs@nitt.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 255
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_24

been used to produce realistic composite images, replace significant image
regions, and generate unreal images.
Most of the existing techniques use standard supervised learning approaches for
image detection problems. The problem with standard supervised learning is that,
despite being effective for many detection tasks, it is not well suited to image
splice detection: the domain of manipulated images is so broad that it is nearly
impossible to obtain enough manipulated training data for a supervised method to
work well. In fact, detecting visual manipulation can be viewed as an anomaly
detection problem, i.e., the aim is to mark something that seems different and
out of the ordinary. No suitable solution to the image splice detection problem
has been found in the literature [4-7]. Hence, a method is proposed that does
not need any manipulated training data and can function in a self-supervised
setting. The method exploits image EXIF metadata as a comprehensive source of
information. EXIF metadata tags are camera specifications digitally embedded in
the image at capture time and are pervasively available. Figure 1 shows that at
first glance an image might appear original, but on detailed observation, it is
noticed that the person on the left (Modi) has been spliced into the image. The
spliced region's content comes from a separate image, shown next to it. This
type of manipulation is known as a spliced image, and it is a common way of
producing fakes. Given access to the source images, one can see from their
respective EXIF metadata that they come from different imaging pipelines. Hence,
the model utilizes the automatically recorded Exchangeable Image File Format
(EXIF) metadata as the supervisory signal for training the model to verify
whether an image is self-consistent, i.e., whether a single imaging pipeline
generated the photo content.
Figure 1 shows an anatomization of a spliced image. A typical method of
producing forged images is to splice content from two separate authentic source
images. The proposed model relies on the fact that the patches of a spliced
image were generated with different EXIF metadata attributes, that is, by
different imaging pipelines. Note that when testing an image for authenticity,
the model has no access to the EXIF metadata of the two source images.
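The self-supervised labeling idea can be sketched as follows; the attribute names and values below are illustrative assumptions, not taken from the paper.

```python
# Illustrative EXIF dictionaries for two source photos; attribute names and
# values are assumptions for the sketch.
exif_a = {"Model": "NIKON D90", "FocalLength": "50mm", "Flash": "Off",
          "ExposureTime": "1/200", "WhiteBalance": "Auto"}
exif_b = {"Model": "iPhone 4", "FocalLength": "3.85mm", "Flash": "Off",
          "ExposureTime": "1/200", "WhiteBalance": "Auto"}

ATTRIBUTES = ["Model", "FocalLength", "Flash", "ExposureTime", "WhiteBalance"]

def consistency_labels(meta1, meta2, attributes=ATTRIBUTES):
    """Per-attribute binary labels: 1 if both images report the same value.

    During training, one patch is drawn from each image and this vector is
    the free supervisory signal; no EXIF data is needed at test time.
    """
    return [int(meta1.get(a) == meta2.get(a)) for a in attributes]

print(consistency_labels(exif_a, exif_b))   # prints: [0, 0, 1, 1, 1]
```

Two patches from the same photo trivially get all-ones labels, which is exactly why the pairwise re-balancing described later is needed.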

Fig. 1 Anatomization of the splice



2 Related Work

Salloum et al. [8] proposed a technique that learns to detect spliced regions
using a fully convolutional network trained on labeled data. Related techniques
target specific manipulation cues, such as double JPEG compression and contrast
amplification. Mayer et al. [9] introduced a Siamese-network model to establish
whether pairs of image patches come from the same camera model, a special case
of our EXIF consistency model, though their results are preliminary. These
methods also estimate whether a photo's semantic content matches its metadata.
Agarwal and Farid [10] proposed a methodology that exploits subtle differences
between imaging pipelines to detect image splices, in particular, the way
different cameras truncate numbers during JPEG quantization. Such approaches are
helpful due to their easy interpretability. Bondi et al. [11] define an
algorithm for tampering identification and localization that uses the
characteristic footprints left on images by different camera models. For
pristine images, the algorithm detects at every pixel that the image was shot
with a single device; conversely, traces of multiple devices are often seen if
an image was obtained through composition. Their algorithm uses a CNN to extract
camera-model characteristics from image patches, which are then evaluated with
iterative clustering to verify whether a picture has been forged and to localize
the tampered region. Doersch et al. [12] introduce a model trained to predict
the relative positions of pairs of patches from a picture. In doing so, it
learns low-level artifacts such as chromatic lens aberration and image noise.
Our algorithm similarly learns properties of the imaging pipeline rather than
image semantics.

3 Proposed Architecture

Figure 2 describes the flow of splice detection and localization for a spliced
image. Two random patches from an image are considered, and the model predicts
whether they have consistent metadata. Each metadata attribute has an associated
consistency metric at training and test time.

3.1 Training Network

A Siamese network is used for training the proposed model, with ResNet50 as the
subnetwork for extracting features from the image patches; ResNet50 has a depth
of 50 layers and a 224 × 224 input size. In ResNet50, the image patches pass
through several convolution layers

Fig. 2 Flow diagram of splice detection and localization



Fig. 3 Siamese network with ResNet50 subnetwork

and other operations such as batch normalization. ReLU is applied to the
resulting block to obtain a feature vector for each image patch. The feature
vectors are passed through a three-layer MLP (multilayer perceptron) and then
through a sigmoid function. The sigmoid output indicates whether the features of
the two input patches match, and from it a similarity score is generated showing
whether the two patches share the same EXIF attributes. The proposed model is
thus trained using a Siamese neural network with a ResNet50 subnetwork, as shown
in Fig. 3.

3.2 Attribute Consistency (First Phase)

Given two image patches of size 128 × 128, the model predicts, for each EXIF
metadata attribute, the probability that the two patches share the same value.
Flickr images are used for identifying the features, considering attributes that
occur in a large number of images (more than 50,000). For each EXIF attribute,
only values that occur more than 100 times are retained. During training, the
Siamese network uses ResNet50 as a subnetwork, producing a 4096-dimensional
feature vector per patch. The two feature vectors are concatenated and passed
through a three-layer MLP with 4096, 2048, and 1024 units, followed by a final
output layer that generates the similarity score. In this way, for a particular
EXIF attribute, the model predicts whether two patches will have the same value.
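The prediction head described above can be sketched as an untrained forward pass. The layer widths (4096, 2048, 1024) and the 83 binary outputs follow the text; the weight initialization and the scaled-down demo dimensions are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def similarity_head(feat1, feat2, hidden=(4096, 2048, 1024), n_attributes=83):
    """Sketch of the Siamese similarity head: the two patch feature vectors
    are concatenated and passed through a three-layer ReLU MLP, then a
    sigmoid output per binary consistency attribute. Weights are random
    here (untrained); a real model would learn them."""
    x = np.concatenate([feat1, feat2])
    for n_out in hidden:
        W = rng.normal(0.0, np.sqrt(2.0 / x.size), size=(n_out, x.size))
        x = np.maximum(0.0, W @ x)                    # ReLU hidden layer
    W_out = rng.normal(0.0, np.sqrt(1.0 / x.size), size=(n_attributes, x.size))
    return 1.0 / (1.0 + np.exp(-(W_out @ x)))         # per-attribute probability

# Scaled-down demo (real patch features are 4096-d ResNet50 outputs).
p = similarity_head(rng.normal(size=64), rng.normal(size=64), hidden=(64, 32, 16))
print(p.shape)   # prints: (83,)
```

Each of the 83 outputs is trained against one binary EXIF/consistency label, so the head is a bank of per-attribute logistic classifiers over the shared pair representation.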

Two main challenges arise in this technique: some EXIF attributes are too rare
to learn, and randomly selected pairs may share the same value for an attribute
even when they come from different images.
These issues are overcome by unary and pairwise re-balancing. In unary
re-balancing, rare EXIF values are merged, and a minibatch is constructed for
each attribute. In pairwise re-balancing, each minibatch is constructed so that
fifty percent of the pairs share the same value for the attribute and the rest
do not.
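Pairwise re-balancing can be sketched as follows; the photo list, the attribute, and the batch size are illustrative assumptions.

```python
import random

def pairwise_balanced_batch(photos, attribute, batch_size=8, seed=0):
    """Sketch of pairwise re-balancing: build a minibatch of photo pairs in
    which half the pairs share the same value for `attribute` and half do
    not. `photos` is a list of (photo_id, exif_dict) tuples."""
    rng = random.Random(seed)
    same, diff = [], []
    while len(same) < batch_size // 2 or len(diff) < batch_size // 2:
        a, b = rng.sample(photos, 2)
        label = int(a[1][attribute] == b[1][attribute])
        bucket = same if label else diff
        if len(bucket) < batch_size // 2:          # keep the batch balanced
            bucket.append(((a[0], b[0]), label))
    batch = same + diff
    rng.shuffle(batch)
    return batch

# Toy pool: 20 photos whose Flash value alternates On/Off.
photos = [(i, {"Flash": "On" if i % 2 else "Off"}) for i in range(20)]
batch = pairwise_balanced_batch(photos, "Flash")
print(sum(label for _, label in batch))   # prints: 4  (half of 8 pairs match)
```

Without this rejection-sampling step, the natural rate of matching pairs depends on how common each attribute value is, and the classifier could score well by always predicting the majority label.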

3.3 Post-processing Consistency (Second Phase)

Three operations are performed in this phase: random image resizing, re-reading
(re-compressing) the image, and Gaussian blurring. For each operation, the
parameters are chosen randomly from a discrete set of values. These three
operations let the model check whether two image patches underwent the same
parameters for a particular augmentation. This post-processing consistency helps
identify manipulation even when the spliced region carries the same metadata
attributes as the image into which it was inserted. With these three operations
added, 83 (80 + 3) binary attributes are present. The next step is to combine
the consistency predictions into the overall consistency of the image, as shown
in Fig. 4.
Self-supervised training: at training time, the model predicts for two random
image patches whether they have consistent metadata.
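The three post-processing operations can be sketched as extra binary consistency labels; the specific parameter sets below are assumptions, not the values used in the paper.

```python
import random

# Discrete parameter sets for the three post-processing operations; the
# particular values here are illustrative assumptions.
RESIZE_FACTORS = [0.5, 0.75, 1.0, 1.25, 1.5]
JPEG_QUALITIES = [50, 70, 90]
BLUR_SIGMAS = [0.5, 1.0, 1.5]

def sample_postprocessing(rng):
    """Randomly pick one parameter per operation, as applied to one image."""
    return {"resize": rng.choice(RESIZE_FACTORS),
            "jpeg_quality": rng.choice(JPEG_QUALITIES),
            "blur_sigma": rng.choice(BLUR_SIGMAS)}

def postprocessing_labels(params1, params2):
    """Three extra binary attributes: did the two patches undergo the same
    resize / re-JPEG / blur parameters? These join the 80 EXIF attributes
    to give the 83 binary consistency targets."""
    keys = ("resize", "jpeg_quality", "blur_sigma")
    return [int(params1[k] == params2[k]) for k in keys]

rng = random.Random(1)
p1, p2 = sample_postprocessing(rng), sample_postprocessing(rng)
print(postprocessing_labels(p1, p1))   # prints: [1, 1, 1]
print(postprocessing_labels(p1, p2))   # 0/1 per operation, depends on draws
```

Applying the sampled operations to the actual pixels (with PIL, OpenCV, or similar) is omitted here; the point is only how the augmentation parameters become consistency targets.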

Fig. 4 EXIF comparison of two image patches



3.4 Combining Consistency Predictions of Two Image Patches (Third Phase)

The proposed model is trained in a self-supervised way, so there is no
supervision on the spliced part of the image; the model sees only unmanipulated
images during training. The EXIF consistency predictions are therefore fed to a
simple classifier that predicts whether two image patches come from the same
image.

3.5 Aggregate Image Consistency from Patch Consistency (Fourth Phase)

The next step is aggregation: the pairwise consistency probabilities obtained in
the third phase are combined into a global self-consistency score for the entire
image. For a particular image, patches are sampled in a grid. This striding
results in a maximum of 625 patches (for the standard 4:3 aspect ratio, we
sample 25 × 18 = 450 patches). For a given patch, a response map depicts its
consistency with every other patch in the image. Overlapping patch predictions
are averaged to increase the spatial resolution of each response map. If the
image is manipulated, patches from the spliced portion will ideally have lower
consistency with the rest of the image than patches from the non-spliced area.
To generate a final response map for an input image, the most consistent mode
among all patch response maps is found using the mean shift technique. The
resulting map naturally segments the image into two parts, consistent and
inconsistent regions. This aggregated response map is the consistency map.
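The aggregation step can be sketched on a toy pairwise-consistency matrix. Averaging each patch's row is a simplified stand-in for the mean-shift mode finding described above, and the numbers are illustrative.

```python
import numpy as np

def consistency_map(pairwise, spliced_threshold=0.5):
    """Aggregate an N x N matrix of pairwise patch-consistency probabilities
    into one score per patch. Averaging each row is a simplified stand-in
    for the mean-shift mode finding used in the full method."""
    score = pairwise.mean(axis=1)            # mean consistency with all patches
    return score, score < spliced_threshold  # True marks a suspected splice

# Toy example: 6 grid patches, the last two spliced in. Patches agree
# strongly within their own source image and weakly across sources.
spliced = np.array([False] * 4 + [True] * 2)
pairwise = np.where(spliced[:, None] == spliced[None, :], 0.9, 0.1)

score, flagged = consistency_map(pairwise)
print(np.round(score, 2))   # authentic patches score ~0.63, spliced ~0.37
print(flagged)              # prints: [False False False False  True  True]
```

Because the authentic patches form the majority, their mutual agreement dominates the scores, which is why the minority spliced region falls below the threshold and gets segmented out.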

4 Results and Discussions

4.1 Dataset

Three different datasets have been used for testing and evaluating:
• Columbia dataset: 180 relatively simple spliced images.
• Realistic Tampering (RT) dataset: 220 images combining splicing and
post-processing operations; some other manipulations, such as copy-move, are
also included.
• In-the-wild dataset: 201 images collected from The Onion, a news website, and
Reddit Photoshop Battles. Since ground-truth labels for internet splices are
unavailable, images were annotated manually to obtain approximate ground truth.

Fig. 5 Spliced images and the EXIF consistency outputs

4.2 Results of Splice Localisation

Figure 5 shows spliced images and the EXIF consistency outputs produced by the
proposed model. Figure 6 shows real authentic images and their EXIF consistency
outputs; the model correctly indicates that these images are consistent, as they
are original authentic images.

5 Conclusion

Images play a significant role in establishing facts in the many fraud incidents
occurring in society. To identify image splices, a learning-based algorithm has
been proposed for recognizing visually spliced images and manipulations within
an image. The proposed model shows a strong performance improvement and can
therefore be integrated into various sectors to curb the spread of fake content
and prevent fraud. The model localizes the spliced region with an accuracy of
86%, a precision of 87%, and a specificity of 97%. With additional copy-move
forgery detection, it could detect an even wider range of manipulations in
visual fakes.

Fig. 6 Real authentic images and their EXIF consistency outputs

References

1. Zhou, P., Han, X., Morariu, V.I., Davis, L.S.: Two-stream neural networks for tampered face
detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops
(CVPRW), pp. 1831–1839. IEEE (2017). https://doi.org/10.1109/CVPRW.2017.229
2. Ghosh, P., Morariu, V.I., Davis, L.S.: Detection of metadata tampering through discrepancy
between image content and metadata using multi-task deep learning. In: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 60–68 (2017).
https://doi.org/10.1109/CVPRW.2017.234
3. Kniaz, V.V., Knyaz, V., Remondino, F.: The point where reality meets fantasy: Mixed adversarial
generators for image splice detection. In: Advances in Neural Information Processing Systems,
pp. 215–226 (2019)
4. de Sa, V.R.: Learning classification with unlabeled data. In: Advances in Neural Information
Processing Systems, pp. 112–119 (1994). https://doi.org/10.5555/2987189.2987204

5. Ferrara, P., Bianchi, T., De Rosa, A., Piva, A.: Image forgery localization via fine-grained
analysis of CFA artifacts. IEEE Trans. Inf. Forensics Secur. 7(5), 1566–1577 (2012). https://
doi.org/10.1109/TIFS.2012.2202227
6. Ye, S., Sun Q., Chang, E.-C.: Detecting digital image forgeries by measuring inconsistencies
of blocking artifact. In: 2007 IEEE International Conference on Multimedia and Expo, pp.
12–15. IEEE (2007). https://doi.org/10.1109/ICME.2007.4284574
7. Cun, X., Pun, C.-M.: Image splicing localization via semi-global network and fully connected
conditional random fields. In: Proceedings of the European Conference on Computer Vision
(ECCV) (2018)
8. Salloum, R., Ren, Y., Kuo, C.-C.J.: Image splicing localization using a multi-task fully
convolutional network (MFCN). J. Vis. Commun. Image Represent. 51, 201–209 (2018).
https://doi.org/10.1016/j.jvcir.2018.01.010
9. Mayer, O., Stamm, M.C.: Learned forensic source similarity for unknown camera models. In:
2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
pp. 2012–2016. IEEE (2018). https://doi.org/10.1109/ICASSP.2018.8462585
10. Agarwal, S., Farid, H.: Photo forensics from JPEG dimples. In: 2017 IEEE Workshop on
Information Forensics and Security (WIFS), pp. 1–6. IEEE (2017). https://doi.org/10.1109/
WIFS.2017.8267641
11. Bondi, L., Lameri, S., Güera, D., Bestagini, P., Delp, E.J., Tubaro, S.: Tampering detection
and localization through clustering of camera-based CNN features. In: 2017 IEEE Conference
on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1855–1864. IEEE
(2017). https://doi.org/10.1109/CVPRW.2017.232
12. Doersch, C., Gupta, A., Efros, A.A.: Unsupervised visual representation learning by context
prediction. In: Proceedings of the IEEE International Conference on Computer Vision, pp.
1422–1430 (2015). https://doi.org/10.1109/ICCV.2015.167
Random Forest and Gabor Filter Bank
Based Segmentation Approach for Infant
Brain MRI

Vinodkumar R. Patil and Tushar H. Jaware

Abstract A precise study of infant brain development during the first year of
life is crucial for research on rapid neurological growth. Non-invasive
neuroimaging techniques such as MRI are essential for connecting the brain to
behavioral changes in neonates and infants. Magnetic resonance (MR) imaging of
the developing brain offers a description of the developmental process following
gestation. Characterizing normality is challenging, since the appearance of the
normal brain changes almost weekly, and infant MR images generally show lower
tissue contrast than adult images. Therefore, current computational techniques
designed for adult brains are not appropriate for processing neonatal MR images.
A few analytical tools for neuroimaging of the infant brain have been proposed
to overcome these problems. This article presents state-of-the-art empirical
approaches for MRI diagnosis and analysis of infant brains that help in
understanding neonatal neurodevelopment. We employ the BM3D image denoising
method in the preprocessing stage. A hybrid combination of 32 Gabor filter banks
and Canny edge, Sobel, Prewitt, Scharr, Gaussian (with σ = 3 and 7), median
(with σ = 3 and 7), and Roberts operators is used for effective feature
extraction. Finally, a Random Forest classifier is utilized for tissue
segmentation and classification.

Keywords Random forest · Infant brain MRI · Gabor filter · BM3D · Neonatal
phase · Segmentation

1 Introduction

A computerized morphologic study of the infant's brain is essential for a
critical understanding of natural growth and brain development [1].
Neurodevelopmental disorders are strongly correlated with brain defects, which
provides a frame for clinical assessment. Structural representations of the
infant and neonatal brain can be visualized at the tissue level through advances
in MRI [2]. Quantitative neuroimaging studies using MRI have been increasingly
employed
V. R. Patil (B) · T. H. Jaware


R. C. Patel Institute of Technology, Shirpur, Maharashtra, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 265
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_25

in the neonatal phase to examine brain development and growth. Brain
segmentation is a fundamental requirement for evaluating brain tissue structure
through computational MRI. Manual analysis of MR images, however, is extremely
time-consuming and tedious. Moreover, manual marking is vulnerable to intra- and
inter-rater heterogeneity, which decreases its efficacy. These shortcomings make
it difficult to process the larger samples of subjects needed for
population-level assessment. Thus, automated approaches are required to
delineate the brain tissues. Automatic segmentation of newborn and infant brains
is much more challenging than that of the adult brain. Infant brain MRI has
relatively low CNR and SNR because of the small head size, and acquisitions are
often affected by significant motion artifacts. The primary objective of this
research is an accurate, automated infant brain MRI segmentation framework.
Challenges
Automatic infant brain MRI segmentation remains difficult despite advances in MR
image acquisition. Critical constraints of the MR images adversely impact
segmentation regardless of the framework used. The intensity of a given tissue
type is non-uniform and varies gradually across the input image. RF coils and
magnetic interference produce intensity inhomogeneity, and higher applied
magnetic field strengths contribute to even larger intensity variations. The
mixing of various tissue types within a single voxel, the partial volume (PV)
effect, presents inherent problems for accurately delineating tissue borders
[3]. Because the resulting images have low resolution, voxels containing several
tissues exhibit mixed intensities. In addition, noise is always present in the
image, mainly due to electrical noise from the body and subtle irregularities in
the receiving devices [4, 5]. Automated tissue segmentation of infant brain MRI
is therefore much harder than for the adult brain.
The paper is structured as follows: Sect. 2 reviews related work, Sect. 3
presents the proposed hybrid segmentation method, Sect. 4 presents simulation
results and discussion, and Sect. 5 concludes the work.

2 Related Work

Several segmentation approaches for automated categorization of newborn brain
MRI have been suggested over the past decades. All such strategies aim to
identify objects of interest at varying granularities: the brain, tissues, or
more specific structures. These methods perform segmentation of brain tissues
and can be grouped into atlas-fusion-based, unsupervised, parametric, and
deformable-model-based approaches. Atlases are widely used for tissue
segmentation [6]. Difficulties related to image quality, accelerated brain
growth, and limited availability of imaging data complicate the segmentation
operation. In

Fig. 1 Infant brain MRI segmentation approaches

this section, a brief overview of segmentation methods used for infant brain MRI
is presented. The methods are classified into brain/non-brain extraction, tissue
segmentation, and more detailed structure delineation, as depicted in Fig. 1.
Research Gaps
Brain development in an infant's first two years is complex and closely tied to
the emergence of neural skills, lifelong behavior, and the likelihood of
neurodevelopmental disease. It is essential to recognize and measure normal
developmental brain patterns as early and as specifically as possible, so that
deviating progression, developmental issues, and disturbances can be highlighted
[7]. Prevailing methodological constraints on comprehensive and accurate MRI
measurement of infant brain structure and function are partly responsible for
this critical research gap [8]. Adequate frameworks for accurate and automatic
segmentation of early brain MRIs are not currently available, yet such a method
is necessary for a comprehensive study of early brain MRIs. Unless infant
segmentation is precise and automatic, MRI studies remain labor-intensive,
reducing sample sizes, reproducibility, and robustness. Initiatives toward early
detection of growth defects or developmental disabilities, and toward tracking
the impact of interventions, substantially raise the demands on first-year
infant segmentation [9].

3 Proposed Method

3.1 Input Image

The T1- and T2-weighted brain MRI images of infants are used as the input images.
The images used in this work are taken from a standard database of infant brain
MRI [10]. The input images are acquired using an MRI scanner with different
268 V. R. Patil and T. H. Jaware

scanning times, protocols, and sequences. The input images have low contrast and are
affected by noise arising from the RF coils and magnets. For accurate segmentation of brain
tissues, it is necessary to preprocess these images and improve their quality
for further analysis. In healthcare, particularly in medical image analysis, edges are
of great importance; the preservation of edges and the improvement of image quality are
significant concerns for accurate diagnosis.

3.2 Image Denoising

BM3D is a denoising approach relying on the assumption that an image has
a locally sparse representation in the transform domain. By stacking similar 2D image
patches into 3D groups, this sparsity is enhanced [11]. BM3D is an advanced
technique: owing to very precise block-matching near strong edges, its
denoising results there can be better than in smoother or weak-edge areas. Image
denoising is further improved by using adjustable block sizes in different
image regions. The BM3D grouping and filtering process is known as collaborative
filtering and is performed in four phases [12]:
• Find image patches similar to a reference patch and stack them into a 3D block
• Apply a 3D linear transform to the block
• Shrink the transform coefficients
• Apply the inverse 3D linear transform.
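The four phases can be sketched in miniature. The following pure-Python toy is a 1-D simplification, not the actual BM3D algorithm: the patch length, matching threshold, and shrinkage threshold are illustrative assumptions. It groups similar patches, splits each group into its mean (low-pass) and per-patch residuals (high-pass, a one-level Haar-style transform across the group), hard-thresholds the residuals, and aggregates the overlapping estimates:

```python
import random

def collaborative_filter(noisy, patch_len=4, match_thr=4.0, shrink_thr=0.6):
    """Toy 1-D sketch of BM3D-style grouping and collaborative shrinkage."""
    n = len(noisy)
    patches = [noisy[i:i + patch_len] for i in range(n - patch_len + 1)]
    acc, cnt = [0.0] * n, [0] * n
    for ref in patches:
        # Phase 1: group patches similar to the reference (the "3D block")
        group = [(j, p) for j, p in enumerate(patches)
                 if sum((a - b) ** 2 for a, b in zip(ref, p)) < match_thr]
        # Phase 2: one-level Haar-style split across the group:
        # group mean (low-pass) plus per-patch residuals (high-pass)
        mean = [sum(p[k] for _, p in group) / len(group) for k in range(patch_len)]
        for j, p in group:
            # Phase 3: hard-threshold (shrink) small residual coefficients
            filt = [mean[k] + ((p[k] - mean[k]) if abs(p[k] - mean[k]) > shrink_thr else 0.0)
                    for k in range(patch_len)]
            # Phase 4: inverse transform is implicit; aggregate overlapping estimates
            for k in range(patch_len):
                acc[j + k] += filt[k]
                cnt[j + k] += 1
    return [a / c for a, c in zip(acc, cnt)]

# Piecewise-constant "image row" plus small deterministic noise
rng = random.Random(1)
clean = [1.0] * 8 + [5.0] * 8
noisy = [v + rng.uniform(-0.3, 0.3) for v in clean]
denoised = collaborative_filter(noisy)
mse = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

Because patches from the flat regions and patches spanning the step fall into separate groups, the shrinkage removes the noise without blurring the edge, which is the point of the collaborative approach.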

3.3 Feature Extraction

In medical image analysis, features play a crucial role. Different
preprocessing operations are performed on the input brain MR image before
obtaining features [13, 14]. Afterward, feature extraction methods are used to acquire the
features essential for medical image segmentation. Here, a bank of 32 Gabor filters
together with the Canny, Sobel, Scharr, and Roberts edge operators and Gaussian and
median filters (each with σ = 3 and 7) is used for effective feature detection and edge
preservation.
The expression for the Gabor filter bank is given as follows:

g(x, y; λ, θ, ψ, σ, γ) = exp(−(x′^2 + γ^2 · y′^2) / (2σ^2)) · exp(i(2π x′/λ + ψ))    (1)

where x′ = x cos θ + y sin θ and y′ = −x sin θ + y cos θ are the coordinates rotated by the orientation θ.

The novelty of this research work, of significance for medical image analysis, lies in the
hybrid combination of all the above methods; the optimum features are selected to
improve segmentation accuracy.
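A minimal sketch of building such a filter bank follows (pure Python; the kernel size and the λ, σ, γ values below are illustrative assumptions, not the parameters used in this work). It samples the real part of Eq. (1) on a square grid:

```python
import math

def gabor_kernel(size, lam, theta, psi, sigma, gamma):
    """Real part of the Gabor filter of Eq. (1), sampled on a size x size grid."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)   # rotated coordinates
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lam + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# A small bank: 8 orientations at 4 wavelengths gives 32 filters
bank = [gabor_kernel(9, lam, k * math.pi / 8, 0.0, 2.0, 0.5)
        for lam in (4, 6, 8, 10) for k in range(8)]
```

Convolving the image with each kernel yields one response map per filter, and the 32 maps form the Gabor part of the feature vector.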

Fig. 2 Proposed infant brain MRI segmentation scheme

3.4 Image Segmentation

Recently, random forests (RFs) have gained interest in infant brain MR image
analysis [15–17]. RFs have proved to be accurate and stable for several brain
tissue segmentation challenges, as they handle large volumes of high-dimensional
multiclass data. The proposed segmentation framework is depicted in Fig. 2, whereas
Fig. 3 represents the random forest approach for brain MR image segmentation.
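The random-forest idea — bootstrap resampling plus majority voting over weak trees — can be sketched with depth-1 trees (decision stumps) on a single intensity feature. This is a toy illustration only, not the classifier configuration used in this work:

```python
import random

def train_stump(xs, ys):
    # Exhaustively pick the threshold/orientation with the fewest training errors
    best_err, best = len(xs) + 1, None
    for t in sorted(set(xs)):
        for lo, hi in ((0, 1), (1, 0)):
            err = sum((lo if x < t else hi) != y for x, y in zip(xs, ys))
            if err < best_err:
                best_err, best = err, (t, lo, hi)
    return best

def predict_stump(stump, x):
    t, lo, hi = stump
    return lo if x < t else hi

def train_forest(xs, ys, n_trees=15, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]       # bootstrap resample
        forest.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def predict_forest(forest, x):
    votes = [predict_stump(s, x) for s in forest]        # majority vote
    return max(set(votes), key=votes.count)

# Toy voxel "intensities": below 0.5 -> tissue class 0, above -> class 1
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
forest = train_forest(xs, ys)
```

A real RF segmenter applies the same recipe per voxel over the full multi-channel feature vector (Gabor responses, edge maps, intensities) with deep trees and random feature subsets at each split.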

4 Results and Discussions

Figure 4 presents the simulation results of the proposed method. Each segmentation
approach is evaluated using the DSC and accuracy, computed by comparing its results
with the manual segmentation; higher DSC and accuracy values indicate superior
performance. With manual interpretations considered the ‘gold standard,’ the efficiency
of the segmentation approaches is investigated systematically. Efficiency is usually
approximated with overlap tests or object displacement measurements. The Dice
similarity coefficient is the most common indicator of overlap used in infant brain MRI
and is given by the following expression:

DSC = 2|X ∩ Y| / (|X| + |Y|)    (2)

Fig. 3 Random forest segmentation approach

where X represents the segmentation obtained using the proposed algorithm and Y the
manual segmentation. The coefficient takes a value of 1 in the case of complete agreement
between the two segmentations and 0 when there is no overlap.
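For label masks, computing Eq. (2) reduces to set arithmetic over segmented voxel coordinates; a minimal sketch:

```python
def dice(pred, truth):
    """Dice similarity coefficient between two sets of segmented voxel coordinates."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0   # convention: two empty segmentations agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))

# 3 of 4 voxels overlap -> DSC = 2*3 / (4 + 4) = 0.75
X = {(0, 0), (0, 1), (1, 0), (1, 1)}
Y = {(0, 1), (1, 0), (1, 1), (2, 2)}
```

For multi-class tissue maps, the coefficient is computed per tissue label and the per-label scores are reported or averaged.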
Table 1 represents the values of performance metrics, which include DSC and
accuracy. Table 2 depicts the performance comparison with existing methods.

5 Conclusions

Infant brain MRI is becoming more attractive as higher quality images are obtained
and newborn growth receives growing attention. We proposed a system for infant
brain MRI segmentation in this research paper. Initially, segmentation challenges
and research gaps were discussed, followed by a short overview of the preprocessing
methods used to minimize image artifacts related to movement, noise, and volume.
Together with BM3D image denoising and hybrid feature extraction strategies, the
Random Forest classifier is employed for accurate brain tissue segmentation. The
recent iSeg2017 segmentation challenge database is used for the segmentation of
neonatal brain tissues. The method outperforms many existing methods.

Fig. 4 Simulation results (T1-weighted input images, segmented images, and ground truth)

Table 1 Performance metrics

Image   DSC    Accuracy (%)
1       0.92   91.56
2       0.90   90.47
3       0.88   90.87
4       0.91   91.48
5       0.90   90.68

Table 2 Performance comparison

Method for segmentation   DSC
Proposed                  0.902
Moeskops et al. [18]      0.857
Choi et al. [19]          0.820

References

1. Hack, M., Fanaroff, A.A.: Outcomes of children of extremely low birth weight and gestational
age in the 1990s. Seminars Neonatol. 5(2), 89–106 (2000)
2. Marlow, N., Wolke, D., Bracewell, M.A., Samara, M.: Neurologic and developmental disability
at six years of age after extremely preterm birth. N. Engl. J. Med. 352(1), 9–19 (2005)
3. Makropoulos, A., Gousias, I., Ledig, C., Aljabar, P., Serag, A., Hajnal, J., Edwards, A., Counsell,
S., Rueckert, D.: Automatic whole brain MRI segmentation of the developing neonatal brain.
IEEE Trans. Med. Imaging 33(9), 1818–1831 (2014)
4. Belaroussi, B., Milles, J., Carme, S., Zhu, Y.M., Benoit-Cattin, H.: Intensity nonuniformity
correction in MRI: existing methods and their validation. Med. Image Anal. 10(2), 234–246
(2006)
5. Tofts, P.: Quantitative MRI of the Brain: Measuring Changes Caused by Disease. Wiley (2003)
6. Weishaupt, D., Froehlich, J.M., Nanz, D., Kochli, V.D., Pruessmann, K.P., Marincek, B.:
How Does MRI Work?: An Introduction to the Physics and Function of Magnetic Resonance
Imaging. Springer (2008)
7. Xue, H., Srinivasan, L., Jiang, S., Rutherford, M., Edwards, A.D., Rueckert, D., Hajnal, J.V.:
Automatic segmentation and reconstruction of the cortex from neonatal MRI. NeuroImage
38(3), 461–477 (2007)
8. Rutherford, M.A.: MRI of the Neonatal Brain. W.B. Saunders (2002)
9. Prastawa, M., Gilmore, J.H., Lin, W., Gerig, G.: Automatic segmentation of MR images of the
developing newborn brain. Med. Image Anal. 9(5), 457–466 (2005)
10. Makropoulos, A., Counsell, S.J., Rueckert, D.: A review on automatic fetal and neonatal brain
MRI segmentation. Neuroimage 170, 231–248 (2018)
11. Gilmore, J.H.: Understanding what causes schizophrenia: a developmental perspective. Am. J.
Psychiatry 167(1), 8–10 (2010)
12. Wang, Y., Haghpanah, F., Aw, N., Laine, A., Posner, J.: A transfer-learning approach for first-
year developmental infant brain segmentation using deep neural networks (2020). https://doi.
org/10.1101/2020.05.22.110619
13. Wang, L., et al.: Benchmark on automatic six-month-old infant brain segmentation algorithms:
the iSeg-2017 challenge. IEEE Trans. Med. Imaging 38(9), 2219–2230 (2019)
14. Smith, S.M.: Fast robust automated brain extraction. Hum. Brain Mapp. 17(3), 143–155 (2002)
15. Shattuck, D.W., Sandor-Leahy, S.R., Schaper, K.A., Rottenberg, D.A., Leahy, R.M.: Magnetic
resonance image tissue classification using a partial volume model. Neuroimage 13(5), 856–876
(2001)
16. Magar, V.M., Christy, T.B.: Gabor filter based classification of mammography images using
LS-SVM and random forest classifier. In: 2nd International Conference on Recent Trends in
Image Processing and Pattern Recognition, pp. 69–83. Springer, India (2018)
17. Mahsa, D.D., Louis, C.: BISON: brain tissue segmentation pipeline using T1-weighted
magnetic resonance images and a random forest classifier. Magn. Reson. Imaging 85(4),
1881–1894 (2021)
18. Moeskops, P., Viergever, M.A., Benders, M.J., Isgum, I.: Evaluation of an automatic brain
segmentation method developed for neonates on adult MR brain images. In: Proceedings of
SPIE Medical Imaging, vol. 9413 (2015)
19. Choi, U.S., Kawaguchi, H., Matsuoka, Y., Kober, T., Kida, I.: Brain tissue segmentation based
on MP2RAGE multi-contrast images in 7 T MRI. PLoS ONE 14(2) (2019)
Sensory-Motor Cortex Signal
Classification for Rehabilitation Using
EEG Signal

Vinay Kulkarni, Yashwant Joshi, and Ramchandra Manthalkar

Abstract Brain–Computer Interface (BCI) is a vibrant topic in rehabilitation engi-


neering. It is essential to have informative features and practical classification algo-
rithms for the appropriate communication between humans and machines. This paper
focuses on improving accuracy in detecting elbow movements and analyzing the
effect of kinematic movement variability over the sensory-motor cortex with 10–20
EEG system. The EEG data from healthy volunteers was acquired by training them
for a proposed protocol. These healthy volunteers were asked to play a PC game
intended for rehabilitation with his/her elbow movement utilizing the ArmeoSpring
treatment instrument. The input raw EEG signal is passed through 8–30 Hz band-
pass filter. The Common Spatial Pattern (CSP), Event-Related Desynchronization
(ERD)/Synchronization (ERS), and Autoregressive Moving Average (ARMA) mod-
eling features are estimated and tested with SVM classifier. The proposed framework
can differentiate and classify the kinematic movements of the elbow with an aver-
age accuracy of 84.61, 92.77, and 97.26% for CSP, ARMA, and ERD/ERS features,
respectively. The experimental results on the proposed dataset demonstrate that the
combination of feature extraction techniques and classifiers improves the classifica-
tion performance, which would benefit the BCI rehabilitation community.

Keywords Electroencephalogram (EEG) · Brain–computer interface (BCI) ·


Event-related synchronization (ERS)/desynchronization (ERD) · Common spatial
pattern (CSP) · ARMA modeling · Armeo spring · Rehabilitation

1 Introduction

A Brain–Computer Interface (BCI) or Mind–Machine Interface (MMI) using EEG


is well-known terminology. It may be formulated as an assembled block that transduces
the brain activity or thoughts of the subject. Thoughts with specific patterns
are translated into commands or messages for a respective external device [1].
BCIs are most useful for patients who cannot generate signals from their motor region

V. Kulkarni (B) · Y. Joshi · R. Manthalkar


Shri Guru Gobind Singhji Institute of Engineering and Technology, Nanded, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 273
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_26
274 V. Kulkarni et al.

resources and cannot control their limbs or any body parts. A BCI empowers these
subjects to control a PC with their imagination for correspondence, portability, and
different purposes [2]. Practically speaking, movement-related signals are mostly
used for BCI applications since they are very much defined and unbiased to the
subject [3]. Movement-based BCIs enable the subject to demonstrate a decision by
performing or just thinking of a few already defined movements. If BCI is used
practically, it tries to convert the rhythmic signals related to human sensorimotor
procedures in a noninvasive way have been seriously studied. In this field, researchers
are mostly concentrated on power changes in alpha (α) and beta (β) bands [4]. The
benefit of the BCI procedure is the feasibility of predicting imaginary or actual
movement and finding which EEG signals would be best for controlling such an
instrument.
As the standard BCI paradigm, Motor Execution (ME) or Motor Imagery (MI) is
a unique mental state in which one performs a psychological rehearsal of a
sensory-motor activity with or without motor execution. Hence, motor imagery is most
often used to convey the intent of a disabled person [5]. The decrements and increments
of power in a specific band associated with a particular movement are known as
Event-Related Desynchronization (ERD) and Event-Related Synchronization (ERS),
respectively [6]. The actual movement of the upper limbs induces changes in EEG
signals over the sensory-motor zone in specific frequency ranges, for example,
the alpha and beta bands.
To detect this task-induced EEG activity, many feature extraction methods
have been developed for MI and ME BCIs. For instance, Power Spectral Density (PSD) and
AR modeling algorithms have been utilized to describe ERD and ERS [7, 8]. Of
late, spatial filters like CSP have been demonstrated to enhance the contrasts

Fig. 1 Block diagram of proposed EEG signal classification system


Sensory-Motor Cortex Signal Classification for Rehabilitation Using EEG Signal 275

between the features of actual movements, which makes the classes easier to classify
[9]. After feature extraction, it is important to classify the signal; many recent
machine learning algorithms, for example, Linear Discriminant Analysis (LDA) and
Support Vector Machines (SVM), have been utilized to distinguish the objective
of ME and interpret the motor execution at the output [10].
Figure 1 gives the block diagram of the proposed system for the classification
of sensory-motor signal while the subject is performing the suggested task. The
paper is arranged in the following manner. Section 2 gives a detailed explanation
of the materials and methods implemented during work. Section 3 gives the results
obtained during experimentation. Finally, Sect. 4 concludes the work presented in
the paper.

2 Materials and Methods

2.1 Experiment Design

The participants were instructed to sit in a chair, and their right hand was fixed in an
ARMEO Spring rehabilitation device (Hocoma, Switzerland). The ARMEO Spring
is an exoskeleton that supports the subject’s arm against gravity to prevent muscle
fatigue.

2.2 Experimental Paradigm

The motor activity is best reflected over the central, parietal, and frontal areas,
dominantly over the central portion. Therefore, EEG data were recorded from 20 active
dry scalp electrodes (CZ, P4, F8, Pz, P3, P8, O1, O2, T8, P7, C4, F4, Fp2, Fz, C3, F3, Fp1,
T7, F7, Oz, i.e., EXT) attached to an EEG cap as per the internationally accepted
10–20 system, as shown in Fig. 2a (top view) and b (side view).
Figure 3 shows the experimental setup for the designed protocol, in which the subject
plays the High Flyer game while brain signals are simultaneously recorded by the
Enobio-20 machine designed by Neuroelectrics. The Enobio machine’s advantage
is that it functions both wirelessly and wired and has a sampling rate of 500 Hz.
The experimental setup and the experimental paradigm are shown in Figs. 3
and 4, respectively. A rest period of 0–4 s occurs at the start and the end of each run,
with a 1-s gap for mental preparation at the actual beginning of the run. Each
trial ended 1 s after the success cue, and after every run a break of 5 s followed. Each
run consisted of 30 trials (15 trials for each target, randomly distributed). Six runs
were recorded, i.e., three for the left hand and three for the right hand. In addition, four
different weight conditions A, B, C, and D were applied to the forearm in weight

Fig. 2 10–20 electrode placement system: a top view, b side view

Fig. 3 Experimental setup for protocol with ENOBIO and ArmeoSpring

Fig. 4 Experimental paradigm



increasing order. Two trials of right-hand movement were recorded with loads A and B.
Similarly, two trials of left-hand movement were recorded with loads C and D.
Hence, we obtain data for the actual movements of both the right and the left hand.

2.3 EEG Data Analysis and Preprocessing

The EEG signal is very noisy, as it contains artifacts added by muscle movements
of different body parts, 50 Hz line noise, etc. Hence, preprocessing of the recorded raw
EEG data is an important step. To remove the noise present in the raw EEG signal,
a bandpass filter from EEGLAB is used with the following specifications: an FIR
filter with a transition width of 0.5 Hz, passband edges of [8 30] Hz, and cutoff
frequencies of [7.75 30.25] Hz. Independent Component Analysis (ICA) is used
to eliminate artifactual components from the recorded data; clean EEG data were
generated by removing such ICs. An example EEG channel before and after artifact
removal is shown in Fig. 5.
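EEGLAB performs this FIR design internally; as an illustrative stand-in (not the EEGLAB implementation — 401 taps gives a transition of roughly 4 Hz at 500 Hz, far wider than the 0.5 Hz specified above), a band-pass FIR can be built as the difference of two Hamming-windowed-sinc low-pass filters:

```python
import math, cmath

def sinc(t):
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)

def lowpass_fir(fc, fs, taps):
    """Hamming-windowed sinc low-pass FIR, normalized to unit DC gain."""
    m = taps - 1
    h = [2 * fc / fs * sinc(2 * fc / fs * (n - m / 2)) *
         (0.54 - 0.46 * math.cos(2 * math.pi * n / m)) for n in range(taps)]
    s = sum(h)
    return [x / s for x in h]

def bandpass_fir(f_lo, f_hi, fs, taps):
    """Band-pass as the difference of two low-pass designs."""
    lo, hi = lowpass_fir(f_lo, fs, taps), lowpass_fir(f_hi, fs, taps)
    return [a - b for a, b in zip(hi, lo)]

def gain(h, f, fs):
    """Magnitude of the filter's frequency response at f (Hz)."""
    return abs(sum(c * cmath.exp(-2j * math.pi * f * n / fs) for n, c in enumerate(h)))

# 8-30 Hz band-pass at the Enobio's 500 Hz sampling rate
h = bandpass_fir(8.0, 30.0, fs=500.0, taps=401)
```

Because both low-pass prototypes are normalized to unit DC gain, their difference has exactly zero DC gain, which suppresses slow drift in the EEG along with the out-of-band noise.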

2.4 Time Series Modeling Approach for EEG Signal

The ARMA model is the combination of the Autoregressive (AR) and Moving
Average (MA) models, and ARMA modeling is among the most prominent parametric
techniques [11]. It predicts the future output as a linear combination of previous EEG
samples plus an innovation term. The forward prediction of the filtered EEG data is
accomplished using the following

Fig. 5 ICA filtering on raw EEG data



derivation

p[k] = Σ_{n=1}^{m} a_n p[k − n] + q[k]    (1)

where q[k] is the prediction error (the new information contained in the present EEG
sample), a_n are the model parameters, and p[k] is the time series fed to the model.
For the evaluation, the recorded EEG data have 20 electrodes; each elbow movement
is trialed for 1 s, and the filtered EEG data are sampled at 500 Hz.
The ARMA(p, q) model is

y_t = − Σ_{c=1}^{p} a_t^{(c)} y_{t−c} + Σ_{d=1}^{q} b_t^{(d)} e_{t−d} + e_t    (2)

where y_t is a time-domain sample belonging to one channel, a_t^{(c)} are the AR
coefficients and b_t^{(d)} the moving-average coefficients, the indices c and d run
over the poles and zeros of the model, and e_t is white Gaussian noise.
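Equation (1) can be exercised directly. In this sketch the AR coefficients are assumed known and the noise term is omitted, so the one-step prediction error is exactly zero; fitting the coefficients from real EEG is a separate estimation step:

```python
def ar_predict(history, coeffs):
    """One-step forward prediction of Eq. (1): p[k] ~ sum_n a_n * p[k - n]."""
    return sum(a * history[-n] for n, a in enumerate(coeffs, start=1))

# Synthesize a noise-free AR(2) series: p[k] = 0.5*p[k-1] - 0.25*p[k-2]
a = [0.5, -0.25]
p = [1.0, 0.8]
for _ in range(50):
    p.append(ar_predict(p, a))

# With the true coefficients, the prediction error q[k] vanishes
pred = ar_predict(p[:-1], a)
err = p[-1] - pred
```

In the feature-extraction stage, the fitted coefficients (rather than the predictions) serve as the per-channel feature vector passed to the classifier.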

2.5 Event-Related Desynchronization/Synchronization

In the ERD/ERS methodology, the EEG power calculated in the alpha and beta bands
is expressed relative to the power of the same frequency band when the subject is idle
[12]; this reference power is called the baseline power.
The technique is as follows: (1) apply the bandpass filter to all trials; (2) calculate
the power by squaring the amplitude of each sample, i.e., the voltage at each time
sample; (3) average the power over all trials.
The ERD/ERS is calculated as the relative power increase or decrease with respect to the
baseline power, where P_R is the baseline power and P_j is the power of the jth sample.

ERD/ERS = (P_j − P_R) / P_R    (3)

Figure 6 shows the ERS and ERD patterns due to left-hand movement at the C3 and C4
electrodes.
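The three steps and Eq. (3) can be checked on a synthetic band-limited signal (the amplitudes and the 10-trial count below are illustrative assumptions). A sinusoid of amplitude A sampled over whole cycles has mean-square power A²/2, so halving the amplitude during movement yields an ERD of −0.75:

```python
import math

def band_power(samples):
    """Step 2: mean of the squared sample amplitudes."""
    return sum(s * s for s in samples) / len(samples)

def erd_ers(trial_powers, baseline_power):
    """Eq. (3): relative power change versus the idle baseline."""
    p_avg = sum(trial_powers) / len(trial_powers)   # step 3: average over trials
    return (p_avg - baseline_power) / baseline_power

N = 100                                   # samples per window (whole cycles)
tone = lambda amp: [amp * math.sin(2 * math.pi * 5 * k / N) for k in range(N)]

baseline = band_power(tone(2.0))          # idle: amplitude 2 -> power 2.0
trials = [band_power(tone(1.0)) for _ in range(10)]   # movement: amplitude halves
erd = erd_ers(trials, baseline)           # power drops to 0.5 -> ERD = -0.75
```

The negative sign marks desynchronization (a power drop relative to rest); a positive value would mark synchronization.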

2.6 Spatial Domain Analysis

The Common Spatial Pattern (CSP) method [13] finds spatial filters that maximize the
variance of one class while simultaneously minimizing that of the other. Suppose the matrix X ∈ R^{E×P}

Fig. 6 ERD patterns for left, right-hand movement at C3 and C4 electrodes

captures a trial of filtered EEG data corresponding to an actual movement, where E
stands for the number of electrodes and P is the number of data points in a single run. If there
are N_j trials in the training set for class C_j, then the mean covariance matrix for C_j
is calculated from the trial covariance matrices M as
C_j = (1/N_j) Σ_{M ∈ Ω_j} M    (4)

CSP finds the spatial filter w ∈ R^{E×1} that maximizes the variance of C_1 while
minimizing the variance of C_2 by maximizing the following objective:

max_w J(w) = (w^T C_1 w) / (w^T (C_1 + C_2) w)    (5)

The solution of Eq. (5) is cast as the generalized eigenvalue problem C_1 w = λ(C_1 +
C_2)w, where w is an eigenvector and λ is a generalized eigenvalue. This gives E
eigenvectors w_k and eigenvalues λ_k, k = 1, …, E. The designed spatial filter then gives
the output y_k:

y_k = w_k^T X    (6)

The vectors w_k are the CSPs, which serve as the basis of time-invariant
feature vectors. The features used for classification are obtained by filtering the EEG
as in Eq. (6).
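Rather than solving the generalized eigenvalue problem, the objective of Eq. (5) can be maximized by brute force for a toy two-electrode case; the class covariance matrices below are assumed for illustration, with class 1 varying mostly along electrode 1 and class 2 along electrode 2:

```python
import math

def J(w, C1, C2):
    """Rayleigh quotient of Eq. (5) for a 2-electrode filter w."""
    quad = lambda C: (w[0] * (C[0][0] * w[0] + C[0][1] * w[1]) +
                      w[1] * (C[1][0] * w[0] + C[1][1] * w[1]))
    Csum = [[C1[i][j] + C2[i][j] for j in range(2)] for i in range(2)]
    return quad(C1) / quad(Csum)

C1 = [[4.0, 0.0], [0.0, 1.0]]   # class-1 mean covariance
C2 = [[1.0, 0.0], [0.0, 4.0]]   # class-2 mean covariance

# Brute-force scan over unit-norm filters w(theta) = (cos t, sin t)
best = max(((math.cos(math.pi * k / 180), math.sin(math.pi * k / 180))
            for k in range(180)), key=lambda w: J(w, C1, C2))
```

The best filter aligns with electrode 1 and attains J = 4/(4 + 1) = 0.8, exactly the largest generalized eigenvalue the eigendecomposition would return.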

2.7 Classification Performance

One experiment was conducted three times at different speeds (slow, medium, and fast),
and every classification algorithm was tested on it. Every subject underwent six trials
for each elbow movement at each speed level. The results obtained by applying the
different approaches are compared in the following section.

3 Results and Discussion

To test the classification results on features extracted from the filtered data, SVM
and KNN classifiers with different kernels from Matlab 2018b were applied. After applying
all classifiers to the filtered EEG data, the quadratic SVM (Q-SVM) [14] was found to
work best for EEG and BCI. In the holdout validation method, 30% of the total data is
used for testing and the remaining 70% is kept to train the classifier; the 30% test
set is selected randomly on every run of the classifier.
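The holdout scheme can be sketched as follows (Python's stdlib shuffle stands in here for Matlab's partitioner; the sample count and seed are illustrative):

```python
import random

def holdout_split(n_samples, test_frac=0.30, seed=None):
    """Random holdout: returns (train_idx, test_idx) index lists."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)      # test set re-selected randomly each run
    n_test = round(n_samples * test_frac)
    return idx[n_test:], idx[:n_test]

train, test = holdout_split(100, seed=42)
```

Repeating the split with a fresh seed on every run, as described above, averages out the luck of any single partition when reporting accuracy.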

3.1 Comparison of Feature Extraction Methods with Q-SVM


Classifier

This section describes the quality of the collected EEG dataset; the performance
of the different feature extraction methods is summarized in Table 1.
The comparison also shows the importance of the frequency bands during
actual movements over the sensory-motor cortex area. Among the three methodologies
ARMA, CSP, and ERD/ERS, ERD/ERS gives the best results, with an average classification
accuracy of 97.26 ± 0.3. From Table 1, it can be seen that hand movements
are better captured in the alpha band (8–12 Hz) than in the beta band (13–30 Hz). The
performance of every subject was studied over three different experiments, as shown in
Table 1; the bold values mark the best classification results among all trials. The
classification accuracy for subject 3 is the best among all trials from all subjects.
The average classification accuracy of the proposed methodology is compared
with state-of-the-art performances in Table 2. Nicolas-Alonso et al. [15] focused on
the classifier and used Regularized Linear Discriminant Analysis (RLDA); they obtained
lower accuracy (75%) because only bandpass filtering was performed in the preprocessing
stage. To overcome this, ICA is included in our preprocessing. Ghaemi and Rashedi [16]
reported 76.02% by implementing Blind Source Separation (BSS) to remove artifacts,
which is not very powerful when the data are too noisy, whereas ICA cleans EEG data
irrespective of the rawness of the recording. Tang et al. [17] achieved a classification
accuracy of 87.37% using ERD/ERS features and an LDA classifier. In 2019, Sun et al.
[18] worked on variations based on the segmented bispectrum (VBSB) as features, which
rely on an averaging technique, and achieved an average CA of 93.10%.

Table 1 Comparison of feature extraction methods within different frequency bands using the Q-SVM classifier

                  Alpha                          Beta
              ARMA    CSP    ERD/ERS        ARMA    CSP    ERD/ERS
SUB 1  Exp1   90.9    82.9   96.1           94.2    89.7   96.5
       Exp2   93.2    84.0   97.6           95.4    91.6   96.2
       Exp3   91.5    82.9   97.1           94.4    89.1   96.6
SUB 2  Exp1   92.9    86.8   97.0           94.2    89.7   96.3
       Exp2   88.9    80.0   95.8           92.1    87.5   96.2
       Exp3   94.6    87.7   97.0           96.2    92.9   97.2
SUB 3  Exp1   95.4    88.3   98.7           96.6    94.1   96.6
       Exp2   93.4    83.8   97.4           95.6    91.7   97.7
       Exp3   94.1    85.1   98.6           96.6    93.4   97.2
Mean ± SD     92.77 ± 0.6  84.61 ± 0.8  97.26 ± 0.3    95.03 ± 0.4  91.08 ± 0.6  96.72 ± 0.1

Table 2 Comparison of proposed method with state of art methodologies


Authors Method Average CA (%)
Nicolas-Alonso [15] CSP + regularized LDA 75
Ghaemi and Rashedi [16] BSS + ERS + SVM 76.02
Tang et al. [17] ERD/ERS+LDA 87.37
Sun et al. [18] BP + VBSB + SVM/NN 93.10
Proposed method BP + ICA + CSP + Q-SVM 84.61
BP + ICA + ARMA + Q-SVM 92.77
BP + ICA + ERD/ERS + Q-SVM 97.26

This paper attempts to develop a robust and generalized algorithm to classify left-
and right-hand movement without subject-specific models and tuning, implementing
three different approaches to achieve higher accuracy than the other
methodologies.

4 Conclusions

This paper proposes different ways to deal with non-stationary time series EEG
data. It works on recorded and filtered EEG to classify the movements of the upper
limb, and the separability of the EEG features of the complete BCI dataset has
been assessed after recording. A unique methodology was followed to collect raw
EEG data using a carefully designed, efficient protocol. The results presented above
clearly show that the ERD/ERS algorithm performs much better than the other
techniques, CSP and ARMA. The ERD/ERS maps show contralateral as

well as ipsilateral reflections over the sensory-motor cortex area when the volunteer
performs actual movement tasks. The effect of hand movements is better captured
in the alpha (mu) frequency band. The proposed framework would lead to the
advancement of real-world BCI technology strategies and assist healthy and
motor-disabled persons in everyday life.

References

1. Clerc, M., Bougrain, L., Lotte, F.: Brain-Computer Interfaces, vol. 1. Wiley-ISTE (2016)
2. Fok, S., Schwartz, R., et al.: An EEG-based brain computer interface for rehabilitation and
restoration. In: 2011 Annual International Conference of the IEEE Engineering in Medicine
and Biology Society, pp. 6277–6280. IEEE (2011)
3. Vidaurre, C., Ramos-Murguialday, A., Haufe, S., Gómez-Fernández, M., Müller, K.-R.,
Nikulin, V.V.: Enhancing sensorimotor BCI performance with assistive afferent activity: an
online evaluation. NeuroImage (2019)
4. Pfurtscheller, G., Brunner,C., Schlögl, A., Da Silva, F.H.L.: Mu rhythm (de) synchronization
and EEG single-trial classification of different motor imagery tasks. NeuroImage 31(1), 153–
159 (2006)
5. Liang, S., Choi, K.S.: Improving the discrimination of hand motor imagery via virtual reality
based visual guidance. Comput. Methods Programs Biomed. 132, 63–74 (2016)
6. Pfurtscheller, G.: Induced oscillations in the alpha band: functional meaning. Epilepsia 44, 2–8
(2003)
7. Penny, W.D., Roberts, S.J., Curran, E.A., Stokes, M.J.: EEG-based communication: a pattern
recognition approach. IEEE Trans. Rehabil. Eng. 8(2), 214–215 (2000)
8. Shibata, E., Kaneko, F.: Event-related desynchronization possibly discriminates the kinesthetic
illusion induced by visual stimulation from movement observation. Exp. Brain Res. 237(12),
3233–3240 (2019)
9. Song, X., Yoon, S.-C.: Improving brain-computer interface classification using adaptive com-
mon spatial patterns. Comput. Biol. Med. 61, 150–160 (2015)
10. Guger, C., Edlinger, G., Harkam, W., Niedermayer, I., Pfurtscheller, G.: How many people
are able to operate an EEG-based brain-computer interface (BCI). IEEE Trans. Neural Syst.
Rehabil. Eng. 11(2), 145–147 (2003)
11. Tseng, S.-Y., Chen, R.-C., Chong, F.-C., Kuo, T.-S.: Evaluation of parametric methods in EEG
signal analysis. Med. Eng. Phys. 17(1), 71–78 (1995)
12. Nagai, H., Tanaka, T.: Action observation of own hand movement enhances event-related
desynchronization. IEEE Trans. Neural Syst. Rehabil. Eng. (2019)
13. Feng, J.K., Jin, J.: An optimized channel selection method based on multi frequency CSP-rank
for motor imagery-based BCI system. Comput. Intell. Neurosci. 2019 (2019)
14. Hortal, E., Planelles, D.: SVM-based brain-machine interface for controlling a robot arm
through four mental tasks. Neurocomputing 151, 116–121 (2015)
15. Nicolas-Alonso, L.F.: Adaptive stacked generalization for multiclass motor imagery-based
brain computer interfaces. IEEE Trans. Neural Syst. Rehabil. Eng. 23(4), 702–712 (2015)
16. Ghaemi, A., Rashedi, E.: Automatic channel selection in EEG signals for classification of left
or right hand movement in brain computer interfaces using improved binary gravitation search
algorithm. Biomed. Signal Process. Control. 33, 109–118 (2017)
17. Tang, Z., Sun, S., Zhang, S., Chen, Y., Li, C., Chen, S.: A brain-machine interface based on
ERD/ERS for an upper-limb exoskeleton control. Sensors 16(12), 2050 (2016)
18. Sun, L., Feng, Z., Lu, N., Wang, B., Zhang, W.: An advanced bispectrum features for EEG-based
motor imagery classification. Expert Syst. Appl. 131, 9–19 (2019)
D-CNN and Image Processing Based
Approach for Diabetic Retinopathy
Classification

Armaan Khan, Nilima Kulkarni, Ankit Kumar, and Anirudh Kamat

Abstract People with diabetes risk developing an eye disease called diabetic
retinopathy. It happens when high blood glucose levels damage the blood
vessels within the retina. These blood vessels may swell, leak, or close, stopping
blood from passing through; sometimes new blood vessels grow on the retina.
All of these outcomes can steal eyesight. Generally, skilled professionals must
diagnose and detect this disease using images of the patient's retina, but due to
recent developments in deep learning, this task can be done very efficiently and
easily using advanced deep learning techniques. We have implemented multiple
state-of-the-art DNN architectures such as InceptionV3, VGGNet, and ResNet with
transfer learning. Gaussian blur with some filters is used to preprocess the images,
which is found to give better results and also helps to remove unwanted noise from
the image. In this work, a dataset containing images of five different D.R. classes
(No D.R., Mild, Moderate, Severe, Proliferative D.R.) is used. After training multiple
models, InceptionV3 had the best result, with an accuracy of 81.2% on training data
and 79.4% on testing data, so we chose it.

Keywords Diabetic retinopathy (D.R.) · Deep neural network (DNN) · Gaussian


blur · Transfer learning · InceptionV3

1 Introduction

Diabetic retinopathy usually has no early warning signs, yet it can cause rapid
vision loss. In general, a person with this disease is likely to have blurred vision,
making it very difficult to do things like read or drive; in some cases, the vision may
improve or worsen throughout the day. It is the leading cause of blindness in adults
and the most common cause of blindness among people with diabetes. It can arise
due to the high blood sugar levels that diabetes causes: having too much
A. Khan (B) · N. Kulkarni · A. Kumar · A. Kamat


Department of Computer Science and Engineering, MIT School of Engineering, MIT-ADT
University, Pune 412201, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 283
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_27
284 A. Khan et al.

sugar in the blood can damage blood vessels throughout the body, including the
retina. If sugar blocks the tiny blood vessels connected to the retina, it can cause
them to leak or bleed. As a result, the eye may grow new blood vessels that are much
weaker and leak or bleed more easily than those of a normal retina. If the eye starts to grow
new blood vessels, the condition is known as proliferative D.R., which experts consider a more
advanced stage; the first stage is non-proliferative diabetic retinopathy. The eye may
accumulate fluid during long periods of high blood sugar. This fluid accumulation
changes the shape and curve of the lens, causing vision changes; as a result, the
person may become blind.
More than 20% of people with diabetes suffer from diabetic retinopathy. India is
considered the world's diabetes capital: it is projected that by 2025 India will have
69.9 million cases of diabetes, and by 2030, 80 million patients, which is around a
266% increase. Diabetes is now the fifth major cause of blindness around the world
and the primary cause of blindness among diabetic patients worldwide [1].
This work aims to make it easier for professionals and patients to check the
severity of diabetic retinopathy and obtain a second opinion. To achieve this, we
have built a D-CNN model that can predict the level of D.R. given an image of the
patient's retina. This paper is organized as follows: Sect. 2 discusses the method
used and the proposed system. Experimental results are discussed and analyzed, and
future work is explained, in Sect. 3. Then, in Sect. 4, our proposed work is
concluded (Fig. 1 and Table 1).

Fig. 1 A comparison is shown between the vision of an average person and a person with diabetic
retinopathy (Ref. [2])
D-CNN and Image Processing Based Approach … 285

Table 1 Literature survey

Dutta et al. [3]: Used 3 different algorithms for classification on images and 2
algorithms for classification using statistical data. Result: classification on an
image using DNN achieved the highest accuracy; statistical data achieved
significantly lower accuracy.

Arcadu et al. [4]: Used transfer learning with a random forest algorithm to predict
the progression of diabetic retinopathy over a period of months. Result: used AUC
for individual field-specific DCNNs.

Chakrabarty et al. [5]: Converted RGB images to greyscale and used a DCNN model
with 3 convolutional layers. Result: classified images into binary classes with an
accuracy of 91.67% on training data.

Rajalakshmi et al. [6]: Images were classified using the A.I. D.R. software EyeArt
on a smartphone. Result: the A.I. software achieved 95.8% sensitivity and 80.2%
specificity.

Verma et al. [7]: A median filter was used to remove noise, and a random forest was
used for classification. Result: data were classified into 3 classes with a top
accuracy of 90%.

Li et al. [8]: Used a tenfold cross-validation set to optimize the model, and
InceptionV3 was used for deep transfer learning. Result: 3 experts' accuracy was
also measured (93-95%), and the model's accuracy reached 94.25%.

Athira et al. [9]: R-CNN was used, which divides the whole image, and only regions
of interest were used for classification. Result: a binary classifier classified
images into two classes, D.R. and No-DR.

Jiang et al. [10]: Ensemble model which used InceptionV3. Result: with the Adaboost
algorithm, a model was created which integrates the performance of all three
models.

Bajaj et al. [11]: A model with InceptionV3 as a base model was used. Result:
categorical accuracy of 60.82% was achieved.

2 Proposed Method

2.1 System

Currently, the manual detection system is used for diabetic retinopathy (Fig. 2).
Due to recent developments in CNNs, their efficiency and accuracy have increased
significantly, and they are therefore used in various computer vision applications.
The basic idea behind a CNN is that a filter is convolved over an image, different
features are captured with additional filters, and these are then further processed
for our needs (as shown in Fig. 3).

Fig. 2 Current manual system of detecting D.R.

Fig. 3 Proposed system overview
As can be seen, the existing system requires many steps and can take a couple of
weeks to complete, while our model can finish the same task in a few hours with the
same results.

2.2 Image Dataset and Preprocessing

The dataset used for this model (Ref. [12]) has 3700 retinal images labeled in 5
different classes: 1-No-DR, 2-Mild, 3-Moderate, 4-Proliferate D.R., 5-Severe D.R.
Figure 4 shows images representing each class of D.R.
As severity increases and D.R. reaches level 4, white spots form on the surface of
the retina, and the objective of our model is to detect these spots and classify
them into the different classes of D.R.
For preprocessing, Gaussian blur with some filters is used to remove unwanted and
unnecessary noise from the images, which makes our model more accurate and makes
the images easier to classify. Cropping is also used to align the retina to the
center of the image so that the essential features of the images are in the same
spot.

Fig. 4 Retinal images of different stages or levels of DR (Ref. [13]). a Level 0: no D.R., b level 1:
mild, c level 2: moderate, d level 3: severe, e level 4: proliferate D.R.

Fig. 5 Effect of applying Gaussian blur on the retinal image with some filters

2.3 Gaussian Blur

Gaussian blur is the result of blurring an image by convolving it with a Gaussian
function. The effect of applying Gaussian blur to a retinal image can be seen in
Fig. 5: the image becomes much clearer for feature extraction. It is a low-pass
filter that removes the high-frequency components of an image, and the formula for
the Gaussian kernel in 2-D is

G(x, y) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}    (1)
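To make Eq. (1) concrete, the kernel and a basic convolution can be sketched in
plain NumPy. The kernel size and sigma below are illustrative choices, not values
reported in the paper; a production pipeline would typically call a library routine
such as OpenCV's GaussianBlur instead of this explicit loop.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Build a 2-D Gaussian kernel from Eq. (1) and normalize it."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return g / g.sum()  # normalize so blurring preserves overall brightness

def gaussian_blur(image, size=5, sigma=1.0):
    """Convolve a grayscale image with the Gaussian kernel (edge padding)."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image, dtype=float)
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = (padded[i:i + size, j:j + size] * k).sum()
    return out
```

Because the kernel is normalized, blurring a constant image leaves it unchanged;
only high-frequency detail is smoothed away.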

2.4 Model Architecture

The first step in our model is image preprocessing for the training and test images.
The preprocessed images are then fed into the InceptionV3 model. With the help of
transfer learning, the Inception model parameters are trained, and the features we
get from InceptionV3's last layer are fed to a fully connected dense layer connected
to a dropout layer to minimize any overfitting that may occur. Finally, the softmax
layer takes the previous layer's output and converts it to a one-hot encoded vector
that can later be interpreted for the prediction. A diagrammatic representation of
our model's architecture is shown in Fig. 6.

Fig. 6 The architecture of our proposed model

2.5 InceptionV3

InceptionV3 is a state-of-the-art convolutional neural network that is 48 layers
deep. It consists of multiple Inception blocks stacked on top of each other. With
the help of transfer learning, one can load weights trained on more than 1 million
images from 1000 different categories. This architecture has 11 Inception blocks, 5
convolutional layers, two max-pooling layers, one average pooling layer, and one
fully connected layer. The idea behind transfer learning is to use the weights of a
network that has been previously trained on millions of images, which may have
captured features that our model would not otherwise be able to capture (Fig. 7).

Fig. 7 The architecture of InceptionV3 [14]



Table 2 Comparison with some already developed models

Paper                 Accuracy (%)
Wang et al. [13]      63.23
Alban et al. [16]     41.68
Bajaj et al. [11]     60.82
Li et al. [8]         94.25
Proposed system       81.2

3 Result and Discussion

3.1 Performance Metric

A performance metric measures how well our model performs its intended task. We
have used accuracy as our performance metric, which calculates the percentage of
examples classified correctly:

Accuracy = \frac{1}{m} \sum_{i=1}^{m} X_i    (2)

where X_i = 1 if and only if the predicted label ŷ(i) is equal to the true label
y(i), and X_i = 0 otherwise. The training accuracy of the system is 81.2%, and the
test accuracy is 79.4%.
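Eq. (2) amounts to a one-line computation over the predicted and true labels:

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Eq. (2): the fraction of samples whose predicted label matches."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_true == y_pred)
```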
A comparison of our system with some existing systems is shown in Table 2. It can
be seen that [8] achieved 94.25% accuracy, which is very good compared to the
proposed approach. The reason for our system's lower accuracy may be that the
dataset was smaller. Other than that, our method performed reasonably well compared
to the other approaches discussed in Table 2. As future work, the performance can
be improved by training the model with more data and advanced algorithms. An
application can be built on this model so that any professional can predict D.R.
with ease. In the future, the medical data collected may be of enormous volume and
become a big data management problem. Prescriptive and predictive analytics can be
the best candidates for dealing with massive data management [15] and will help
generate user-specific information (Fig. 8).

4 Conclusion

Multiple state-of-the-art DNN architectures like InceptionV3, VGGNet, and ResNet
with transfer learning are implemented in this paper. Gaussian blur is used for
preprocessing, which helps improve the performance of the system. The proposed
system reached 81.2% accuracy. The system is compared with other state-of-the-art
systems, as shown in Table 2. It can be seen that the proposed system performed

Fig. 8 Accuracy and loss of our model

well, but achieved lower accuracy than [8]. The reason could be the size of the
dataset and the image size. As future work, the performance can be improved by
training the model with more data and advanced algorithms.

References

1. Pandey, S., Sharma, V.: World diabetes day 2018: Battling the emerging epidemic of diabetic
retinopathy. Ind. J. Ophthalmol. 66(11), 1652 (2018)
2. topconhealth.com: Diabetic retinopathy: an eye disease with 4 stages. https://www.topconhealth.com/diabetic-retinopathy-an-eye-disease-with-4-stages/
3. Dutta, S., Manideep, B.C., Basha, S.M., Caytiles, R.D., Iyengar, N.C.: Classification of diabetic
retinopathy images by using deep learning models. Int. J. Grid Distrib. Comput. 11(1), 99–106
(2018). https://doi.org/10.14257/ijgdc.2018.11.1.09
4. Arcadu, F., Benmansour, F., Maunz, A., Willis, J., Haskova, Z., Prunotto, M.: Deep learning
algorithm predicts diabetic retinopathy progression in individual patients. Npj Digit. Med. 2(1)
(2019). https://doi.org/10.1038/s41746-019-0172-3
5. Chakrabarty, N.: A deep learning method for the detection of diabetic retinopathy. In: 2018 5th
IEEE Uttar Pradesh Section International Conference on Electrical, Electronics and Computer
Engineering (UPCON) (2018). https://doi.org/10.1109/upcon.2018.8596839
6. Rajalakshmi, R., Subashini, R., Anjana, R.M., Mohan, V.: Automated diabetic retinopathy
detection in smartphone-based fundus photography using artificial intelligence. Eye 32(6),
1138–1144 (2018). https://doi.org/10.1038/s41433-018-0064-9
7. Verma, K., Deep, P., Ramakrishnan, A.G.: Detection and classification of diabetic retinopathy
using retinal images. In: 2011 Annual IEEE India Conference (2011). https://doi.org/10.1109/
indcon.2011.6139346
8. Li, F., Liu, Z., Chen, H., Jiang, M., Zhang, X., Wu, Z.: Automatic detection of diabetic
retinopathy in retinal fundus photographs based on deep learning algorithm. Transl. Vis. Sci.
Technol. 8(6), 4 (2019). https://doi.org/10.1167/tvst.8.6.4
9. Athira, T.R., Sivadas, A., George, A., Paul, A., Gopan, N.R.: Automatic detection of diabetic
retinopathy using R-CNN. Int. Res. J. Eng. Technol. (IRJET) 5595–5600

10. Jiang, H., Yang, K., Gao, M., Zhang, D., Ma, H., Qian, W.: An interpretable ensemble deep
learning model for diabetic retinopathy disease classification. In: International Conference of
IEEE Engineering in Medicine and Biology Society (EMBC) (2019)
11. Bajaj, R., Kulkarni, N., Garg, S.: Diabetic retinopathy stage classification. SSRN Electron. J.
(2020). https://doi.org/10.2139/ssrn.3645460
12. Diabetic retinopathy resized from: https://www.kaggle.com/sohaibanwaar1203/diabetic-retinopathy-full
13. Wang, X., Lu, Y., Wang, Y., Chen, W.: Diabetic retinopathy stage classification using convo-
lutional neural networks. In: 2018 IEEE International Conference on Information Reuse and
Integration (IRI) (2018). https://doi.org/10.1109/iri.2018.00074
14. Milton-Barker, A.: Inception V3 deep convolutional architecture for classifying acute
myeloid/lymphoblastic leukemia (2019)
15. Deshpande, P.S., Sharma, S.C., Peddoju, S.K.: Predictive and prescriptive analytics in big-data
era. In: Security and Data Storage Aspect in Cloud Computing. Studies in Big Data, vol. 52,
pp. 71–81. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-6089-3_5
16. Alban M., Gilligan T.: Automated detection of diabetic retinopathy using fluorescein angiog-
raphy photographs. Stanford Tech. Rep. (2016)
Pothole Detection Using YOLOv2 Object
Detection Network and Convolutional
Neural Network

R. Sumalatha, R. Varaprasada Rao, and S. M. Renuka Devi

Abstract Bad road conditions, such as cracks and potholes, can cause passenger
discomfort, vehicle damage, and accidents. The condition of roads indirectly affects
the growth of the country. Hence, there is a need for a system that can detect
potholes. It would allow vehicles to issue alerts identifying potholes so that
drivers can reduce speed, avoid them, and make the ride smooth. Many researchers
have developed various algorithms to detect potholes on roads. In this paper, the
proposed system detects potholes using You Only Look Once version 2 (YOLOv2) and a
convolutional neural network (CNN). The predefined CNN, namely ResNet-50, is used
to extract the features of testing and training images. A Kaggle data set is used
to evaluate the proposed algorithm. The experimental results are evaluated in terms
of precision rate and recall rate. The proposed approach achieves a precision rate
of 94.04% on test images.

Keywords Potholes · YOLOv2 · CNN · Precision rate · Recall rate

1 Introduction

In the rainy season, roads are often covered with flood water, so it is not easy to
identify the potholes underwater. The existence of potholes on roads is a
significant cause of road accidents. By installing a pothole detection method in
vehicles, accidents can be minimized and drivers' safety enhanced. Detection of
potholes is a difficult task compared with detection of objects like signboards,
cars, and pedestrians, because potholes have a wide range of geometries. In our
daily life, manual detection of

R. Sumalatha (B)
Vardhaman College of Engineering, Hyderabad, Telangana, India
e-mail: r.sumalatha@vardhaman.org
R. V. Rao
St. Peters Engineering College, Hyderabad, Telangana, India
S. M. R. Devi
G. Narayanamma Institute of Technology & Science for Women, Hyderabad, Telangana, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 293
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_28
294 R. Sumalatha et al.

potholes is a significant problem that also consumes much time. This paper proposes
a pothole detection system using YOLOv2 and a convolutional neural network.
Youngtae Jo et al. suggested a new pothole detection method with a black-box
camera [1]. Artis Mednis et al. discussed accelerometer-data-based pothole
detection using smartphones [2]. Kulwant Singh et al. discussed real-time pothole
detection using image processing techniques [3]. Rui Fan et al. contributed a
disparity transformation algorithm for pothole detection [4]. Hsiu-Wen Wang et al.
proposed a real-time pothole detection method based on mobile sensing techniques
[5]. Lim Kuoy Suong et al. proposed a method for detecting potholes using CNN [6].
Kashish Bansal et al. proposed a machine-learning-based pothole detection system to
pinpoint potholes present on roads [7]. Kiran Kumar et al. developed pothole
detection and depth estimation using lasers for intelligent vehicle systems [8].
Hadistian Muhammad Hanif et al. proposed a proximity-sensor-based pothole detection
system for vehicles to avoid accidents on roads [9]. Aditya Anand et al. proposed
safe driving by setting a maximum speed limit, detecting potholes, and sending
information to drivers using GPS technology [10]. Zhaojian Li et al. describe
pothole detection based on a multiphase model [11].
This paper is organized as follows: the proposed model is explained in Sect. 2,
the experimental results are discussed in Sect. 3, and Sect. 4 concludes the work
presented in this paper.

2 Proposed Model

The proposed model is as shown in Fig. 1.


• Divide the dataset into training data, validation data, and test data.
• Construct a YOLOv2 Object Detection Network
• A predefined network, ResNet-50, is used for feature extraction.
• Add different combinations to the training data using data augmentation.
• Evaluate the pothole detector using precision and recall rate.

2.1 Database

The pothole dataset consists of two folders, namely normal and potholes. ‘Normal’
contains 352 images of smooth roads from different angles, and ‘Potholes’ includes
329 images of streets with potholes in them. In this paper, only pothole images are
used for pothole detection [12]. Figure 2 shows sample pothole images.
Pothole Detection Using YOLOv2 Object Detection Network … 295

Fig. 1 Proposed system

Fig. 2 Sample Pothole database images

2.2 YOLOv2 Object Detection Network

YOLOv2 is more precise and faster than YOLO [13]. The YOLOv2 architecture is shown
in Fig. 3. It adds batch normalization and anchor boxes.
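A core computation behind anchor boxes is intersection-over-union (IoU), which
YOLOv2 uses both to estimate anchor shapes (via k-means over training boxes) and to
match predictions to ground truth. The sketch below assumes the common
(x, y, width, height) box convention; this is illustrative, not code from the paper.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh
    # Overlap extents; clamp at zero when the boxes do not intersect.
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0
```

A detection is typically counted as a true positive when its IoU with a ground-truth
pothole box exceeds a threshold such as 0.5.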

Fig. 3 YOLOv2 Object Detection Network

Fig. 4 Resnet50 architecture

2.3 Resnet50

In this paper, the ResNet-50 CNN model is used to detect potholes on the road. The
ResNet-50 architecture consists of convolutional, pooling, Rectified Linear Unit
(ReLU), and fully connected layers, and is shown in Fig. 4.

2.4 Convolutional Layer

The input color images are fed to the convolutional layer, which extracts features
from the input images. The first convolutional layer extracts low-level features,
and subsequent layers extract middle- and high-level features. The first layer
performs a convolution operation between the input image and a filter to produce a
feature map, which is given as input to the next layer.

2.5 Pooling Layer

The pooling layer reduces the dimensionality of the feature map. In this category,
there are two options:
1. Max pooling
2. Average pooling.
Max pooling finds the maximum value in the part of the image covered by the filter.
Average pooling computes the average of all the values in the part of the image
covered by the filter.
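Both pooling variants can be sketched in a few lines of NumPy; a square window with
stride equal to the window size is assumed here for illustration.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Pool a 2-D feature map (H and W must be divisible by `size`)."""
    h, w = x.shape
    # Group the map into non-overlapping size x size patches.
    patches = x.reshape(h // size, size, w // size, size)
    if mode == "max":
        return patches.max(axis=(1, 3))   # max pooling
    return patches.mean(axis=(1, 3))      # average pooling
```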

2.6 ReLU Layer

ReLU is a non-linear operation. This layer removes every negative value from the
filtered images and replaces it with zero:

f(x) = x if x > 0
     = 0 if x ≤ 0    (1)
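Eq. (1) maps directly to an element-wise maximum:

```python
import numpy as np

def relu(x):
    """Eq. (1): pass positives through, clamp non-positives to zero."""
    return np.maximum(x, 0)
```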

2.7 Fully Connected Layer

The fully connected (FC) layer connects each unit in the previous layer to every
unit in the subsequent layer. The FC layer provides the resulting feature vector to
the softmax activation function for classification.

2.8 Softmax Activation Function

The softmax activation function converts the final feature vector into the
probabilities of the input image belonging to each of the different classes.
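A minimal, numerically stable sketch of softmax; the shift by the maximum is a
standard implementation trick (softmax is shift-invariant), not something the paper
specifies.

```python
import numpy as np

def softmax(z):
    """Convert raw scores to class probabilities."""
    z = z - z.max()        # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()     # probabilities sum to 1
```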

3 Results and Discussions

In this paper, 329 pothole images are used to evaluate the performance of the
proposed method in terms of precision rate and recall rate. To create the YOLOv2
pothole detector network, a mini-batch size of 16, an initial learning rate of
0.001, and a maximum of 20 epochs were chosen for the training process. Figure 5
shows the experimental results of the proposed method. Table 1 provides a
qualitative comparative analysis of the proposed method. The proposed method
achieves a 94.04% precision rate and a 0.1181 recall rate. The precision rate and
recall rate formulas are given below.

Fig. 5 Experimental results of potholes detection using YOLOv2 and CNN



Fig. 5 (continued)

Table 1 Qualitative comparative analysis of the proposed method

Algorithm    No. of potholes    Precision rate    Recall rate
[4]          79                 98.15             0.7709
[6]          203                82.43             0.8372
Proposed     329                94.04             0.1181

\text{Precision} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalsePositive}}    (2)

\text{Recall} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}}    (3)
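Eqs. (2) and (3) reduce to the following computation from raw detection counts (the
counts in the test are hypothetical, not figures from the paper):

```python
def precision_recall(tp, fp, fn):
    """Eqs. (2) and (3) from true-positive, false-positive, and
    false-negative counts; returns 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```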

4 Conclusions

In this paper, we proposed a pothole detection system using YOLOv2 and CNN. The
proposed approach achieves a 94.04% precision rate on test data. The pretrained
ResNet-50 network is used to extract features from images, and the proposed system
efficiently detects potholes using this model.

References

1. Jo, Y., Ryu, S.: Pothole detection system using A black-box camera. Sensors 15, 29316–29331
(2015)
2. Mednis, A., Strazdins, G., Zviedris, R., Kanonirs, G., Selavo, L.: Real time pothole detection
using android smartphones with accelerometers. IEEE Conf. (2011)
3. Singh, K., Hazra, S., Chandramukherjee, S.G., Gowda, S.: IoT based real time potholes detection
system using image processing techniques. Int. J. Sci. Technol. Res. 9(02), 785–789 (2020).
ISSN 2277-8616
4. Fan, R., Ozgunalp, U., Hosking, B., Pitas, M.L.: Pothole detection based on disparity
transformation and road surface modeling. IEEE Trans. Image Process. 1–12 (2019)
5. Wang, H.-W., Chen, C.-H., Cheng, D.-Y., Lin, C.-H., Lo, C.-C., A real-time pothole detection
approach for intelligent transportation system. Math. Prob. Eng. 2015, 1–7
6. Suong, L.K., Jangwoo, K.:Detection of potholes using a deep convolutional neural network. J.
Univ. Comput. Sci. 24(9), 1244–1257
7. Bansal, K., Mittal, K., Ahuja, G., Singh, A., Gill, S.S.: Deepbus: machine learning based
real time pothole detection system for smart transportation using Iot. Internet Technol. Lett.
3(E156), 1–6 (2020)
8. Vupparaboina, K.K., Tamboli, R.R., Shenu, P.M., Jana, S.: Laser-based Detection and Depth
Estimation of Dry and Water-Filled Potholes: A Geometric Approach. IEEE (2015)
9. Hanif, H.M., Lie, Z.S., Astuti, W., Tan, S.: Pothole detection system design with proximity
sensor to provide motorcycle with warning system and increase road safety driving. In: The
3rd International Conference on Eco Engineering Development IOP Conference Series: Earth
and Environmental Science, vol. 426, pp. 1–9 (2020). IOP Publishing
10. Anand, A., Gawande, R., Jadhav, P., Shahapurkar, R., Devi, A., Kumar, N.: Intelligent vehicle
speed controlling and pothole detection system. In: E3S Web of Conferences 170, EVF'2019,
pp. 1–5 (2020)
11. Li, Z., Kolmanovsky, I., Atkins, E., Jianbo, L., Filev, D.: Road anomaly estimation: model
based pothole detection. In: American Control Conference Palmer House Hilton, 1–3 July
2015, Chicago, IL, USA, pp. 1315–320
12. https://www.Kaggle.Com/Atulyakumar98/Pothole-Detection-Dataset
13. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: 2017 IEEE Conference on
Computer Vision and Pattern Recognition (CVPR). IEEE (2017)
A New Machine Learning Approach
for Malware Classification

G. Shruthi and Purohit Shrinivasacharya

Abstract Cloud computing involves a lot of data being shared and secured between
the cloud service provider and the client. The cloud environment is accessed
through the Internet, and any data leak or poor network configuration can allow
malware to enter the cloud computing environment. In this scenario, the cloud
environment is easy for malware to access from both outside and inside. In this
paper, a new machine learning approach is used effectively, where significance is
given to data analysis, feature engineering, and modeling. This helps us quickly
differentiate between a genuine file and a malware type based on its
characteristics before it enters the cloud environment. The designed system is
tested, and results are tabulated with performance metrics. Our system gives the
best results, with a multi-class log loss error of 0.03 and 97% accuracy, which is
acceptable for a malware detection system.

Keywords Machine learning · Classification · Performance · Malware · Cloud ·
Training · Test algorithm

1 Introduction

Cloud computing involves hosting services over the Internet. These services are
categorized into three types: Platform as a Service (PaaS), Infrastructure as a
Service (IaaS), and Software as a Service (SaaS). PaaS is used to host
applications; providers such as Amazon Web Services and Google offer virtual
computing, storage, and computing stacks known for their on-demand services, for
example, Simple Storage Service (S3) and Elastic Compute Cloud (EC2). Applications
can be developed in an IaaS with a scalable environment and additional networking,
storage, caching, and content delivery. Security in cloud computing can be
classified into two categories: security

G. Shruthi (B)
Department of Computer Science and Engineering, Siddaganga Institute of Technology,
Tumakuru, Karnataka, India
P. Shrinivasacharya
Department of Information Science and Engineering, Siddaganga Institute of Technology,
Tumakuru, Karnataka, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 301
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_29
302 G. Shruthi and P. Shrinivascharya

issues faced by cloud providers and security issues faced by their customers.
Insecure interfaces and APIs, data loss and data leakage, and hardware failure are
the top three threats in the cloud, accounting for 29%, 25%, and 10% of all cloud
security outages, respectively [1, 2]. Hackers will try to gain access to the cloud
and control a large amount of information through a single attack. This is
popularly called "hyperjacking". Examples are the iCloud 2014 leak and the Dropbox
security breach.
Hackers can breach this information, hack millions of passwords and private data,
and make money through bitcoin [3]. According to a recent study, the insider attack
is the biggest threat in the cloud computing environment. Cloud service providers
should check physical access to the servers in the data center to avoid any
suspicious activity. Proper data isolation, logical data segregation, and
virtualization should be managed to protect against malware, data leakage, and
exploited vulnerabilities that cause cloud outages. The paper is organized as
follows. Section 2 summarizes various methods and work done on the classification
of malware. In Sect. 3, the distribution of malware classes in the data set is
analyzed. In Sect. 4, the technique used in classifying malware is discussed. In
Sect. 5, experimentation on the datasets is conducted, and the result is analyzed.
Finally, a conclusion is drawn in Sect. 6.

2 Background

In this section, research work on the classification and detection of malware is
reviewed. In [4, 5], the authors discussed static and dynamic approaches to malware
analysis. Static and string analysis is used to develop a detection system with
interpretable and semantic strings extracted from API execution calls [6]; the
authors also use a Support Vector Machine ensemble approach to construct detectors
and discuss classifying malware with text categorization techniques. In [7], text
categorization procedures for malware classification are also discussed: the
authors extracted all n-grams from the train data set, ranked the top 55,500
features by their frequency score, applied a selection method like the Fisher
score, and then tried various machine learning algorithms on the results. The most
commonly trained algorithms are Support Vector Machines, Bayes algorithms, decision
trees, and artificial neural networks. The paper "Detection of Malicious Code by
Applying Machine Learning Classifiers on Static Features" by Shabtai [8] focuses on
techniques such as the Fisher score, document frequency, gain ratio, hierarchical
feature selection, and feature classification algorithms. In [9], a different
n-gram model using classifiers such as Bayes, IB, decision trees, and random
forests is proposed.
To reduce the feature space, class-wise document frequency is also used when the
extracted n-gram count is enormous. N-gram features were extracted for n from one
to eight. KNN with the Euclidean distance metric is also reported as a linear
clustering algorithm for detecting malware [10]. GIST features were extracted from
A New Machine Learning Approach for Malware Classification 303

images (grayscale) of the binary content. Their algorithm applies agglomerative
hierarchical clustering on prototypes to avoid a large volume of data. Shabtai et
al. [8] conducted many experiments to identify the best term description, find the
n-gram size, select the feature selection method and top n-grams, and evaluate the
performances of various machine learning algorithms. In [11–13], detection, feature
extraction, feature selection, and classification steps are discussed by the
authors, but essential features like entropy, opcode, and API are missed.

3 Methodology

3.1 Data Set

Our method is evaluated on the data set from the malware classification challenge
posted on the Kaggle website by Microsoft [1–3, 14]. To correctly classify malware
samples, the system uses only .asm files. It also extracts as many categories of
features as possible from the extensive dataset collection [15] (21,736 file
types). The data set has a group of known malware files representing a mixture of
nine different families. Each malware file has a unique identifier (Id), a
20-character hash value that uniquely identifies the file, and a class label, an
integer representing one of the nine family names to which the malware may belong.
The nine families of malware are as follows: 1. Lollipop, 2. Ramnit, 3. Vundo,
4. Simda, 5. Obfuscator.ACY, 6. Kelihos_ver3, 7. Tracur, 8. Kelihos_ver1, 9. Gatak.
The metrics used to evaluate performance are the logarithmic loss and the confusion
matrix (accuracy). For every file, we generate a group of predicted probabilities
(one for each class), and the log loss of the model is:

\text{logloss} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log(p_{ij})    (1)

where N is the number of files in the test set, M is the number of class labels,
log is the natural logarithm, y_ij = 1 if observation i is in class j and 0
otherwise, and p_ij is the predicted probability that observation i is in class j
[14]. The confusion matrix (or error matrix) computes the performance of a
classifier on given test data. It provides a visualization of the performance
measures and counts the correct and incorrect predictions for every class.
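Eq. (1) can be sketched directly in NumPy. The clipping constant eps is a standard
guard against log(0) (the convention Kaggle's evaluation also uses), not a value
from the paper.

```python
import numpy as np

def multiclass_logloss(y_true, probs, eps=1e-15):
    """Eq. (1): y_true holds class indices; probs is an (N, M) matrix of
    predicted class probabilities."""
    probs = np.clip(probs, eps, 1 - eps)  # avoid log(0)
    n = len(y_true)
    # Pick each row's probability for its true class, then average.
    return -np.mean(np.log(probs[np.arange(n), y_true]))
```

Perfectly confident, correct predictions give a loss near zero, while uniform
predictions over M classes give log(M).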

Fig. 1 Distribution of malware class in .asm data

4 Data Preparation

To control the database samples, a python script is completed. Our objective


is to predict every data-point’s probability of nine malware classes using multi-
classification problems. We separated the first database into three parts; train, cross-
validation, and test with 64, 16, and 20% of knowledge, respectively, to enhance
our model’s effectiveness. The first task functioned on *.byte files. System separates
*.byte files from *.asm files, as we would do all the modeling and data analysis on
those files separately. To verify if our dataset was balanced, it was vital to know the
frequency of every class. It is shown on the subsequent histogram.
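The 64/16/20 split can be sketched with a shuffled index split. The random seed is
illustrative, and the paper does not say whether the split was stratified by class,
so plain shuffling is assumed here.

```python
import numpy as np

def split_64_16_20(n_samples, seed=0):
    """Shuffle sample indices and split them 64/16/20 into
    train, cross-validation, and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.64 * n_samples)
    n_cv = int(0.16 * n_samples)
    return (idx[:n_train],                 # 64% train
            idx[n_train:n_train + n_cv],   # 16% cross-validation
            idx[n_train + n_cv:])          # 20% test
```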
The next task is feature engineering on *.byte files. The first feature to be
considered is the size of the file. A box plot, which displays the range of file
sizes per class, is one way to see whether size is a useful feature/variable. File
size has some usefulness, helping us detect and differentiate between some classes
and predict malware classes (Fig. 1).

5 Data Preprocessing

Each malware file contained assembly-level code, and 52 crucial features were
extracted with parallel processing on all the .asm files. Features of assembly-level
files were extracted by preprocessing the Header and opcode fields from the code
section. We need to create numerical feature vectors out of the existing structure
to use this data in the classification process. In this work, the extraction process
was done on .asm files by randomly distributing the files into a separate folder. The
different static properties such as opcodes, bytes, segments, prefixes, a function call,
and Malware API were collected.

A New Machine Learning Approach for Malware Classification 305

Fig. 2 Results of normalized data set (.asm file)

The .asm files contain many instructions at the assembly level, but only the vital
occurrences of prefixes, keywords, and opcodes were counted with the "bag of words"
model and thread processing. These item
counts are important in the instruction, which will define the behavior of Malware.
The following segment prefixes proved the most valuable to count in the .asm files:
HEADER, text, idata, pav, data, bss, e-data, r-data. The following opcodes gave the best
results when their occurrences were counted: jmp, mov, retf, push, xor, pop, nop, retn,
sub, or, ror, rol, jnb, rt.
Similarly, registers such as edx, eip, esp, esi, eax, ebx, ecx, edi, and ebp are the most
useful to count. The extracted features are available as binary data. The machine
learning model accepts numeric data, so we converted this extracted feature data into a
vector representation of each malware file. The algorithm shows the iterative method of
extracting the features from .asm files. This experiment was run on an Intel 2 GHz
processor with 8 GB RAM; extracting the unigram features from 150 GB of .asm files took
48 h. Thread processing was applied to perform this operation by splitting the .asm files
into five folders, with the data of each folder assigned to a thread. Each thread counts
the .asm files and the occurrences of prefixes. Algorithm 1 shows each thread's process
of collecting features from each .asm file. The resulting data frame is as follows
(Fig. 2):

Algorithm 1. Extraction of Features from .asm Files

Purpose: To extract the important features from the .asm file
Input: Prefixes, opcodes, keywords, registers
Output: Total counts of prefixes, opcodes, keywords, and registers in vector format

START
STEP 1: Read the .asm file
STEP 2: For every line, count the prefixes, opcodes, keywords, and registers;
        append each count to the array Feature[]; write the array values to
        the output file; loop until end of file
STEP 3: Save the output in asmoutputfile.csv
STOP
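Each thread's counting step in Algorithm 1 can be sketched in Python with a simple bag-of-words counter (a hypothetical sketch; the token lists below are small subsets of the prefixes and opcodes named above, and the tokenization is our own assumption):

```python
from collections import Counter

PREFIXES = ["HEADER", "text", "idata", "data"]   # subset, for illustration
OPCODES = ["jmp", "mov", "push", "pop", "retn"]  # subset, for illustration

def asm_feature_vector(asm_lines):
    """Count prefix and opcode occurrences over the lines of one .asm
    file and return them as a fixed-order numeric feature vector."""
    counts = Counter()
    for line in asm_lines:
        for token in line.replace(":", " ").split():
            if token in PREFIXES:
                counts[token] += 1
            elif token.lower() in OPCODES:
                counts[token.lower()] += 1
    return [counts[p] for p in PREFIXES] + [counts[o] for o in OPCODES]

sample = ["text:00401000 push ebp",
          "text:00401001 mov ebp, esp",
          "text:00401003 jmp loc_401010"]
print(asm_feature_vector(sample))  # → [0, 3, 0, 0, 1, 1, 1, 0, 0]
```

Running one such counter per folder of files, as the paper does with five threads, simply parallelizes this loop.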

6 Machine Learning Model

6.1 Classification

Malware classification is performed using machine learning algorithms because they
provide higher efficiency than heuristic-based analysis. The training process involves
building a model and making predictions on a given input data set. The training model's
output depends on the data used and the way the model is trained. From a machine learning
point of view, malware detection is a problem of classification or clustering.
Classification involves training a model on a massive dataset of malicious files, which
reduces the problem to matching known or already classified malware families.

6.2 Classification Model

The first machine learning model used is the k-nearest neighbor (KNN) classification
algorithm. In this model, X is the training data and Y the target value; the model
predicts the class labels for the provided data and returns the probability estimates for
the test data X. It gave a log loss of 0.137 on test data and a misclassified-points
percentage of 2.25. The second model is logistic regression, which produces results in a
binary format and predicts a discrete outcome variable. It gave a log loss on test data
of 0.69 and a misclassified-points percentage of 14.2. The third algorithm used is the
random forest classifier, an ensemble of decision trees. Its accuracy is high and its
training time low, and it maintains accuracy even when much of the data is missing. It
gave a log loss on test data of 0.034 and a misclassified-points percentage of 0.597. The
system is trained on the training data, its hyperparameters are optimized and tuned using
the cross-validation data, and finally its log loss and accuracy are computed on the test
data. The effectiveness of the approach is measured with the results in the table below
(Table 1).

Table 1 Log-loss measures of the three classification algorithms

Test/algorithm             KNN algorithm      Logistic regression   Random forest
Train data                 0.05919525589361   0.6739563901316       0.019491389585678
Cross-validation data      0.17759066815017   0.6917739497022       0.05099454127585
Test data                  0.13750783501141   0.6962483779350       0.03424590999589
Misclassified points (%)   2.2539098436062    14.213431462741       0.5979760809567
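The three-model comparison in Table 1 can be reproduced in outline with scikit-learn (a sketch on synthetic data, not the paper's Weka/Kaggle pipeline; the dataset sizes and model parameters here are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic multi-class stand-in for the nine-class malware data
X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {"KNN": KNeighborsClassifier(n_neighbors=5),
          "Logistic regression": LogisticRegression(max_iter=1000),
          "Random forest": RandomForestClassifier(n_estimators=100, random_state=0)}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    loss = log_loss(y_te, model.predict_proba(X_te))
    print(f"{name}: test log loss = {loss:.3f}")
```

As in the paper, the comparison rests on each model exposing class probabilities, from which the same log-loss metric is computed for all three.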

The graphs show the results obtained by the three models (KNN, logistic regression, and
random forest) trained on the system and used to predict malware classes (Figs. 3 and 4).

Fig. 3 Log loss of the different classification results

Fig. 4 Cross-validation error per hyperparameter alpha



Fig. 5 The Malware class prediction model

6.3 Exploratory Data Analysis

The final model we chose is an optimized one: the XGBoost gradient boosting machine
learning algorithm, which gives fast and accurate results. Different hyperparameters were
tried, and the best parameters were selected. The log loss for train and test data is
0.0173 and 0.033, respectively; therefore, XGBoost is the final model, with very low test
loss, and it works well with unseen data.
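A hyperparameter search of the kind described above can be sketched as follows (the paper uses XGBoost; here scikit-learn's GradientBoostingClassifier stands in, and the grid values, data sizes, and scoring choice are our illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
# Tune a small grid by cross-validated (negative) log loss
grid = GridSearchCV(GradientBoostingClassifier(random_state=0),
                    param_grid={"n_estimators": [50, 100],
                                "max_depth": [2, 3]},
                    scoring="neg_log_loss", cv=3)
grid.fit(X, y)
print("best params:", grid.best_params_)
print("cross-validated log loss:", round(-grid.best_score_, 3))
```

Picking the parameters that minimize cross-validated log loss mirrors the tuning step the paper performs on its cross-validation split.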
The trained model is ready to test the given data and classify the malware class. A
sample was given as input to test the malware class prediction shown in Fig. 5. The data
set above shows that our model correctly predicts the Malware as class 1.

7 Conclusion

This paper presents a multiclass classification process on Malware, from modeling to
feature engineering. Four machine learning algorithms were compared to predict the
malware class from a massive collection of datasets. The system extracted 52 assembly
features for classification and was trained with various classification models on
training and test data. Finally, XGBoost was found to be the best fit as the final model,
with lower log loss than the other models, as already discussed. In future work, Fisher
score and n-gram techniques can be used along with deep learning methods to improve the
prediction and classification of Malware.

References

1. Deshpande, P., Sharma, S.C., Sateesh Kumar, P.: Security threats in cloud computing. In:
International Conference on Computing, Communication & Automation, pp. 632–636 (2015)
2. Deshpande, P., Sharma, S.C., Peddoju, S.K., et al.: Security and service assurance issues in
Cloud environment. Int. J. Syst. Assur. Eng. Manag. 9, 194–207 (2018). https://doi.org/10.
1007/s13198-016-0525-0
3. Deshpande, P., Sharma S.C., Peddoju S.K.: Data storage security in cloud paradigm. In: Pant,
M., Deep, K., Bansal, J., Nagar, A., Das, K. (eds) Proceedings of Fifth International Conference

on Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing, vol.
436, pp.247–259 (2016). https://doi.org/10.1007/978-981-10-0448-3_20
4. Gibert, D., Mateu, C., Planes, J.: The rise of machine learning for detection and classification
of malware: research developments, trends and challenges. J. Netw. Comput. Appl. 153 (2020)
5. Ahmadi, M., Ulyanov, D., Semenov, S., Trofimov, M., Giacinto, G.: Novel feature extraction,
selection and fusion for effective malware family classification. In: Proceedings of the Sixth
ACM Conference on Data and Application Security and Privacy, pp.183–194 (2016)
6. Ye, Y., Chen, L., Wang, D., Li, T., Jiang, Q., Zhao, M.: SBMDS: an interpretable string based
malware detection system using SVM ensemble with bagging. J. Comput. Virol. (2), 283 (2009)
7. Shabtai, A., Moskovitch, R., Elovici, Y., Glezer, C.: Detection of malicious code by applying
machine learning classifiers on static features: a state-of-the-art survey. Inf. Secur.
Tech. Rep. 14(1), 16–29 (2009)
8. Shabtai, A., Moskovitch, R., Feher, C., Dolev, S., Elovici, Y.: Detecting unknown malicious
code by applying classification techniques on opcode patterns. Secur. Inform. 1(1), (2012)
9. Jain, S., Meena, Y.K.: Byte level n–gram analysis for malware detection. In: International
Conference on Information Processing, Springer, Berlin, Heidelberg, pp. 51–59 (2011)
10. Nataraj, L., Karthikeyan, S., Jacob, G., Manjunath, B.S.: Malware images: visualization and
automatic classification. In: Proceedings of the 8th International Symposium on Visualization
for Cyber Security, pp. 1–7 (2011)
11. Naderi, H., Vinod, P., Conti, M., Parsa, S., HadiAlaeiyan, M.: Malware signature generation
using locality sensitive hashing. In: International Conference on Security & Privacy, pp.115–
124. Springer, Singapore (2019)
12. Ye, Y., Li, T., Adjeroh, D., Iyengar, S.S.: A survey on malware detection using data mining
techniques. ACM Comput. Surv. (CSUR) 50(3), 1–40 (2017)
13. Grini, S., Shalaginov, A., Franke, K.: Study of soft computing methods for large-scale multi-
nomial malware types and families detection. In: Proceedings of the 6th World Conference on
Soft Computing (2016)
14. Microsoft malware classification challenge (big 2015). https://www.kaggle.com/c/malware-classification (2017). Accessed 30 Sept 2019
15. Bazrafshan, Z., Hashemi, H., HazratiFard, S.M., Hamzeh, A.: A survey on heuristic malware
detection techniques. In: The 5th Conference on Information and Knowledge Technology,
pp. 113–120. IEEE (2013)
Analysis of Feature Selection Techniques
to Detect DoS Attacks Using Rule-Based
Classifiers

Atharva Vaidya and Deepak Kshirsagar

Abstract Denial of Service (DoS) attacks are emerging as a security threat, which,
when ignored, may result in enormous losses for the organizations. Such attacks
lead to the unavailability of the services provided by the organizations to legitimate
users. The detection of such attacks with lower computation and minimization of
errors is an ongoing research area. This paper focuses on analyzing different feature
selection methods for feature selection in the detection of DoS attacks. The analysis
of feature selection methods provides relevant and noisy feature subsets based on
the score obtained by each method. The obtained relevant feature subset is tested on
the CICIDS-2017 DoS dataset and achieves higher accuracy of 99.9591% with the
PART classifier.

Keywords Denial of service (DoS) · Rule-based classifiers · Feature selection ·


Noisy features

1 Introduction

As technology moves forward at a fast rate, many devices are being connected to the
internet, and the dependence on remote servers is increasing substantially. It is
becoming crucial that these servers be available on an on-demand basis; even a small
downtime may incur huge losses to organizations.
Denial of Service (DoS) is one such type of attack [1] that might result in the
unavailability of services provided by the organizations when performed on an unse-
cured network. DoS attacks leverage the resource handling vulnerabilities due to
logical and programmatical errors in handling network packets. A robust system
is needed to detect and prevent such attacks and ensure that these services remain
available to the end-users.
Recently, DoS attacks have been revolutionized to become a massive threat to large
businesses and governments. Botnets or IoT devices act as one common platform for DoS
attacks.

A. Vaidya (B) · D. Kshirsagar
Department of Computer Engineering and IT, College of Engineering Pune, Pune, Maharashtra,
India
e-mail: vaidyaan19.comp@coep.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 311
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_30

Recently, a DoS attack was performed on Amazon Web Services (AWS) [2], a multi-purpose
cloud computing platform, using a technique called "Connectionless Lightweight Directory
Access Protocol" (CLDAP) Reflection. The network traffic was generated from these
vulnerable CLDAP servers, amplified by a factor of 56–70, and sustained for over three
days.
Feature selection plays a dominant role in intrusion detection by drastically decreasing
the build time of any machine learning model. It helps remove unnecessary features and
reduces the complexity of the model. Fewer features result in a simpler model, thereby
decreasing the computational time required to detect DoS attacks. Feature selection
algorithms fall into three main types: filter, wrapper, and embedded. Filter methods
return a subset of features based on generic characteristics of the data, with lower
computational overhead, but are comparatively inaccurate. Wrapper methods use a
pre-trained learning algorithm's probabilistic accuracy to judge the quality of selected
feature subsets.
Information Gain (IG) is a filter method [3] generally used to select features by
measuring the significance of each dataset attribute. Gain Ratio (GR) is an extension of
the information gain algorithm [4] that resolves its bias towards features with a broader
set of values. The Correlation-based Feature Selection (CFS) method [5] implements a
search algorithm and a function to evaluate the merit of feature subsets. Mutual
Information Feature Selection (MIFS) measures the dependence between two random variables
[6] and is symmetric. ReliefF is an extension of the Relief algorithm that overcomes the
drawbacks of the former, which cannot deal with incomplete data and is limited to
two-class problems.
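For instance, information gain can be computed from entropies with plain Python (a toy sketch; the two-feature example below is ours, not drawn from the dataset):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H of a label sequence, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(class; feature) = H(class) - H(class | feature)."""
    n = len(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

labels = ["attack", "attack", "benign", "benign"]
print(information_gain(["a", "a", "b", "b"], labels))  # perfect split → 1.0
print(information_gain(["a", "b", "a", "b"], labels))  # uninformative → 0.0
```

A feature that perfectly separates the classes scores the full class entropy, while one that carries no class information scores zero, which is exactly the property the zero-score "noisy feature" rule later exploits.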
Multiple types of classifiers exist to solve a variety of problems [7], each suited to a
specific type of situation. The contributions of our work are as follows:
1. This work analyses different feature selection methods for the detection of DoS
attacks at the application layer.
2. This work obtains relevant feature subsets based on the particular feature
selection method’s score and is tested on the CICIDS-2017 DoS dataset.
Section 2 of the paper gives an overview of the literature. Section 3 describes the
proposed methodology. Section 4 talks about implementing the proposed model and
its results, and finally, Sect. 5 concludes the paper.

2 Related Work

Obaid et al. [8] described in detail the DoS attacks at different layers of the open
systems interconnection (OSI) model. According to the authors, HTTP GET and POST attacks
are popular at the OSI application layer. DoS attacks at the presentation and session
layers include Secure Sockets Layer (SSL) misuse and telnet DoS attacks. Transport-layer
DoS attacks target a transport-layer protocol and can be further classified as flooding
and desynchronization attacks. Ping of death, smurf, and internet control message
protocol (ICMP) flood are common attacks at the network layer. Unfairness, collision, and
exhaustion are DoS attacks occurring at the data link layer.
Mohammadi et al. [9] proposed a novel multivariate-mutual-information-based
feature selection method to select essential intrusion detection features. The selected
features are used to build a model using the least-square-SVM algorithm. The
proposed model outperforms other approaches that use MIFS, modified MIFS, and
linear-LCFS techniques. The analysis is carried out on the KDD Cup '99, Kyoto 2006+, and
NSL-KDD datasets.
Dua et al. [10] proposed an ensemble classifier approach for the detection of intru-
sions. The model proposed by the authors is a combination of a random tree (RT),
REP Tree, k-nearest neighbors (KNN), J48 Graft, and random forest (RF) classi-
fiers. The model is validated on NSL-KDD. 99.72% accuracy is achieved for binary
classification, and 99.68% accuracy is achieved for multiple-class classification.
Umar et al. [11] (2020) propose a hybrid IDS Model which uses a wrapper-based
feature selection algorithm with a Decision Tree used as a feature evaluator. ANN,
SVM, KNN, RF, and NB models are built using the selected features. These models
are compared with baseline models, which are built using all features. The proposed
model achieves a detection rate of 97.05% when built with RF.
Ployphan et al. [12] proposed a hybrid intrusion detection model using the Adap-
tive AdaBoost classifier. It uses a correlation feature reduction technique, and a
hybrid classifier includes MLP, KNN, C4.5, linear discriminant analysis (LDA), and
support vector machines (SVM) using the concept of adaptive boosting. The model
is tested on UNSW-NB15, NSL-KDD, and KDD Cup ‘99 datasets and achieves an
accuracy of 99.96%.
Azar et al. [13] proposed an intrusion detection model based on selection methods
of a specific feature combination. These methods include GR, CFS, and IG. The target
system uses KNN, naive bayes (NB), and multilayer perceptron (MLP) for intrusion
detection. Validation of the proposed model is carried out on KDD Cup ‘99 dataset
and achieves a detection rate of 98.9%.
Taha et al. [14] proposed a lightweight Intrusion Detection System (IDS) for DoS
attacks, which uses IG and CFS for feature selection and employs C4.5, NB, RF,
REP Tree classifiers. Features for detecting DoS attacks are reduced to 9 from 41.
The proposed model has been validated on KDD Cup ‘99 dataset and results in a
99.6% detection rate.
Pattawaro, A. and Polprasert, C. [15] proposed a network IDS using attribute
ratio (AR) as a feature reduction strategy with a threshold value of 0.01. The reduced
feature subset is given to k-means clustering and the XGBoost classification algo-
rithm. The model is validated on the NSL-KDD dataset. Accuracy and true positive
rate are equal to 84.41% and 86.36%, respectively. However, FPR is 18.20%, and
the area under the curve is 0.84.
Aljawarneh et al. [16] proposed hybrid IDS using feature reduction analysis. IG
and the concept of the vote are used for feature selection. It uses a hybrid classifier
with a combination of RT, REPTree, J48, AdaBoost - M1, NB, decision stump, and
Meta-Paging. The model results in 99.9% of accuracy on NSL-KDD for DoS attack
detection.
Pullagura et al. [17] proposed an IDS with a robust feature reduction technique.
The system uses a combination of feature selection techniques using euclidean
distance, CFS, and chi-square. SVM is used to train the model. The model is vali-
dated on the KDD Cup ‘99 dataset. The proposed method reduced the features from
41 to 5. Further, the system has an accuracy of 96.25%, precision of 80.20%, and
recall of 78.96%.
Most of the research is carried upon packet-based datasets—KDD Cup ‘99
and NSL-KDD, which contains network and transport layer attacks. However, the
CICIDS-2017 DoS dataset contains Layer-7 (Application Layer) attacks. It encour-
ages us to apply different feature selection techniques to make an effort to reduce
the count of features required to build models without reducing the accuracy.

3 Proposed Model

In this section, we present an overview of the machine learning architecture components
for detecting DoS attacks. The proposed model for the detection of DoS attacks is shown
in Fig. 1. The CICIDS-2017 DoS dataset is a labeled flow-based dataset containing layer 7
(application layer) DoS attacks, namely Slowloris, SlowHTTPTest, Hulk, and GoldenEye. It
includes 77 features generated by the CICFlowMeter tool [18]. The dataset contains noise
[19] such as infinity and N/A values, which degrade the model's efficiency. Hence, the
infinity and N/A values are replaced with appropriately large finite values and zeros,
respectively, in the data preprocessing stage.
The feature selection methods IG, GR, CFS, SU, MIFS, and ReliefF assign appropriate
weights to all the dataset features. Based on the assigned weights, the features are
segregated into relevant and noisy features: features with nonzero weights are termed
relevant, and those with weights equal to zero are termed noisy. The relevant features
are used as input to the rule-based classifiers PART, OneR, ZeroR, JRip, and Decision
Table, and 10-fold cross-validation is used to build models with these classifiers. The
rule-based classifiers then classify whether the traffic is legitimate or malicious. The
performance of the models is measured with the general performance metrics used in
machine learning.
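The preprocessing step described above can be sketched with Pandas (a minimal sketch; the column names and the choice of the float32 maximum as the "appropriately large" value are our assumptions):

```python
import numpy as np
import pandas as pd

# Toy frame with the kinds of noise found in the dataset
df = pd.DataFrame({"Flow Bytes/s": [1.5, np.inf, np.nan, 3.0],
                   "Flow Packets/s": [2.0, 4.0, np.inf, np.nan]})

LARGE = float(np.finfo(np.float32).max)
# Replace infinities with a large finite value and N/A values with zeros
clean = df.replace([np.inf, -np.inf], LARGE).fillna(0)
print(clean.isna().sum().sum(), np.isinf(clean.to_numpy()).sum())  # → 0 0
```

After this step every value is finite and numeric, which the downstream Weka classifiers require.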

4 Results and Analysis

The proposed model given in Sect. 3 is implemented using the Waikato Environment for
Knowledge Analysis (Weka 3.8.3) tool on a workstation with 32 GB RAM and an Intel Xeon
CPU E3-1271 v3 @ 3.60 GHz processor. The Pandas Python library is used to replace the
infinity and N/A values with appropriately large values and zeros, respectively. A
duplicate column named "Fwd Header Length" is removed. Weka is used to apply the IG, GR,
CFS, SU, and ReliefF algorithms on the preprocessed dataset, while the scikit-learn
Python framework is used for calculating the mutual information of the features. The
different feature selection techniques are applied to the dataset consisting of 77
features, and each technique assigns a specific score to each feature. The range of
scores assigned by each feature selection algorithm is shown in Table 1.
Some of the features have been assigned a score equal to zero by each feature
selection algorithm. Such features having a score of zero assigned by each feature
selection algorithm are mentioned in Table 2. These features are termed noisy features
and can be discarded for DoS attack detection.

Table 1 Score ranges assigned by the feature selection algorithms

Feature selection   Range
IG                  0–0.879
GR                  0–0.430
CFS                 0–0.624
MIFS                0–0.684
ReliefF             0–0.283
SU                  0–0.495

Table 2 List of noisy features

Feature number   Feature name
32               Bwd PSH Flags
33               Fwd URG Flags
34               Bwd URG Flags
50               CWE Flag Count
56               Fwd Avg Bytes/Bulk
57               Fwd Avg Packets/Bulk
58               Fwd Avg Bulk Rate
59               Bwd Avg Bytes/Bulk
60               Bwd Avg Packets/Bulk
61               Bwd Avg Bulk Rate
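The zero-score segregation rule can be expressed directly (hypothetical scores for two features shown; the actual work applies it to all 77 features and six selectors):

```python
# Each feature maps to the scores assigned by several selection methods
scores = {
    "Flow Duration": {"IG": 0.41, "GR": 0.22, "ReliefF": 0.03},
    "Bwd PSH Flags": {"IG": 0.0, "GR": 0.0, "ReliefF": 0.0},
}

# Noisy = scored zero by every method; everything else is relevant
noisy = [f for f, s in scores.items() if all(v == 0 for v in s.values())]
relevant = [f for f in scores if f not in noisy]
print(noisy, relevant)  # → ['Bwd PSH Flags'] ['Flow Duration']
```

Requiring a zero score from every selector, rather than any single one, makes the discard decision conservative: a feature survives if even one method finds it informative.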

Rule-based classifiers will be applied for the evaluation of the model. The following
performance metrics will be used for assessment:

Accuracy = (TN + TP) / (TN + TP + FP + FN) * 100    (1)

Incorrectly Classified % = 100 − Accuracy    (2)

The abbreviations listed above are defined as follows:
• TP: True Positive—Attack Packets correctly classified as Attack.
• TN: True Negative—Genuine/Benign Packets correctly classified as Benign.
• FP: False Positive—Genuine/Benign Packets misclassified as Attack.
• FN: False Negative—Attack Packets misclassified as Genuine/Benign.
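Equations (1) and (2) translate directly into code (a trivial sketch; the function names are ours, and Accuracy is treated as a percentage throughout):

```python
def accuracy_pct(tp, tn, fp, fn):
    """Eq. (1): percentage of correctly classified instances."""
    return (tp + tn) / (tp + tn + fp + fn) * 100

def incorrect_pct(tp, tn, fp, fn):
    """Eq. (2): percentage of misclassified instances."""
    return 100 - accuracy_pct(tp, tn, fp, fn)

print(accuracy_pct(90, 890, 5, 15), incorrect_pct(90, 890, 5, 15))  # → 98.0 2.0
```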
The rule-based classifiers have been applied to the dataset consisting of 77
features. Table 3 shows the results of the application of different Rule-based Classi-
fiers without using any feature selection techniques. It can be observed that, out of
the mentioned Rule-based Classifiers, PART performs better in terms of accuracy.
The rule-based classifiers have now been applied to the preprocessed dataset
consisting of 67 features. Table 4 shows the results of the application of rule-based

Table 3 Performance of rule-based classifiers using 77 features

Algorithm        Accuracy (%)   Build time (s)   Incorrectly classified instances (%)
PART             99.9591        502.58           0.0409
JRip             99.9539        2545.46          0.0461
OneR             93.7917        13.38            6.2083
ZeroR            63.5238        0.07             36.4762
Decision table   99.7913        1614.34          0.2087

Table 4 Performance of rule-based classifiers using 67 features

Algorithm        Accuracy (%)   Build time (s)   Incorrectly classified instances (%)
PART             99.9591        457.71           0.0409
JRip             99.9577        2370.78          0.0423
OneR             93.7917        11.95            6.2083
ZeroR            63.5238        0.08             36.4762
Decision table   99.7913        1551.45          0.2087

classifiers after removing the noisy features. It can be again observed that PART
performs better than other rule-based classifiers in terms of accuracy.
Table 5 shows a brief comparison with existing intrusion detection systems. The work in
[9, 10] is carried out on the KDD Cup '99 and NSL-KDD datasets, which consist of layer 3
(network layer) and layer 4 (transport layer) attacks. The work in [11] is carried out on
the UNSW-NB15 dataset, consisting of layer 4 DoS attacks; however, the types of DoS
attacks are absent in this dataset. The work presented in this paper is carried out on
CICIDS-2017, a reasonably new dataset that consists of modern layer 7 (application layer)
attacks. The results reported in [9] and [10] achieve accuracies of 94.31% and 99.72%,
respectively, for detecting DoS attacks at the network and transport layers. The work
presented in this paper achieves a higher accuracy of 99.95% for the detection of DoS
attacks at the application layer compared to the studies presented in [9, 10], and [11].

Table 5 A brief comparison with existing work

Work       Dataset       Attack layer   DoS attack types                           Accuracy (%)
[9]        KDD Cup '99   3 and 4        Back, Land, Neptune, Ping of Death,        94.31
                                        Smurf, Teardrop
[10]       NSL-KDD       3 and 4        Back, Land, Neptune, Ping of Death,        99.72
                                        Smurf, Teardrop
[11]       UNSW-NB15     4              Not applicable                             86.41
Our work   CICIDS-2017   7              SlowHTTPTest, Slowloris, Hulk, GoldenEye   99.95

5 Conclusions

Filter-based feature selection techniques (Information Gain, Gain Ratio, Symmetric
Uncertainty, ReliefF, and Mutual Information) were applied to the CICIDS-2017 DoS
dataset. Features 32, 33, 34, 50, 56, 57, 58, 59, 60, and 61 are termed noisy features in
this work, as the weights assigned to them by all the feature selection techniques are
equal to 0. A 13% reduction in the total count of features is achieved. The rule-based
classifiers PART, OneR, ZeroR, JRip, and Decision Table were used to build models using
all features and excluding the noisy features. Among these rule-based classifiers, PART
yielded the best accuracy of 99.9591%.
An effort will be made to decrease the count of relevant features by applying
different combinations of feature selection techniques in the future.

References

1. Wankhede, S., Kshirsagar, D.: Dos attack detection using machine learning and neural
network. In: 2018 Fourth International Conference on Computing Communication Control
and Automation (ICCUBEA), pp. 1–5. IEEE (2018)
2. Nicholson, P.: 5 most famous dos attacks (2020). https://www.a10networks.com/blog/5-most-
famous-ddos-attacks/
3. Li, J., Cheng, K., Wang, S., Morstatter, F., Trevino, R.P., Tang, J., Liu, H.: Feature selection: a
data perspective. ACM Comput. Surv. (CSUR) 50(6), 1–45 (2017)
4. Ibrahim, H.E., Badr, S.M., Shaheen, M.A.: Adaptive layered approach using machine learning
techniques with gain ratio for intrusion detection systems (2012). arXiv:1210.7650
5. Deshpande, P., Aggarwal, A., Sharma, S.C., Kumar, P.S., Abraham, A.:Distributed port-scan
attack in cloud environment. In: Fifth International Conference on Computational Aspects of
Social Networks (CASoN), pp. 27–31 (2013)
6. Ambusaidi, M.A., He, X., Nanda, P., Tan, Z.: Building an intrusion detection system using a
filter-based feature selection algorithm. IEEE Trans. Comput. 65(10), 2986–2998 (2016)
7. Kshirsagar, D., Kumar, S.: An ensemble feature reduction method for web attack detection. J.
Discret. Math. Sci. Cryptogr. 23(1), 283–291 (2020)
8. Pandey, V.C., Peddoju, S.K., Deshpande, P.S.: A statistical and distributed packet filter against
DDoS attacks in cloud environment. Sādhanā 43, 32 (2018). https://doi.org/10.1007/s12046-
018-0800-7
9. Mohammadi, S., Desai, V., Karimipour, H.: Multivariate mutual information based feature
selection for cyber intrusion detection. In: 2018 IEEE Electrical Power and Energy Conference
(EPEC), pp. 1–6. IEEE (2018)
10. Dua, M., et al.: Attribute selection and ensemble classifier based novel approach to intrusion
detection system. Procedia Comput. Sci. 167, 2191–2199 (2020)
11. Umar, M.A., Zhanfang, C., Liu, Y.: Network intrusion detection using wrapper-based decision
tree for feature selection. In Proceedings of the 2020 International Conference on Internet
Computing for Science and Engineering, pp. 5–13 (2020)

12. Sornsuwit, P., Jaiyen, S.: A new hybrid machine learning for cybersecurity threat detection
based on adaptive boosting. Appl. Artif. Intell. 33(5), 462–482 (2019)
13. Salih, A.A., Abdulrazaq, M.B.: Combining best features selection using three classifiers
in intrusion detection system. In: 2019 International Conference on Advanced Science and
Engineering (ICOASE), pp. 94–99. IEEE (2019)
14. Tchakoucht, T.A.I.T., Mostafa Ezziyyani, M.: Building a fast intrusion detection system for
high-speed-networks: probe and dos attacks detection. Procedia Comput. Sci. 127, 521–530
(2018)
15. Pattawaro, A., Polprasert, C.: Anomaly-based network intrusion detection system through
feature selection and hybrid machine learning technique. In: 2018 16th International Confer-
ence on ICT and Knowledge Engineering (ICT&KE), pp. 1–6. IEEE (2018)
16. Aljawarneh, S., Aldwairi, M., Yassein, M.B.: Anomaly-based intrusion detection system
through feature selection analysis and building hybrid efficient model. J. Comput. Sci. 25,
152–160 (2018)
17. Priyadarsini, P.I., Sai, M.S.S., Suneetha, A., Santhi, M.V.B.T.: Robust feature selection
technique for intrusion detection system. Inter. J. Control Autom. 11(2), 33–44 (2018)
18. Kshirsagar, D., Kumar, S.: Identifying reduced features based on ig-threshold for dos attack
detection using part. In: International Conference on Distributed Computing and Internet
Technology, pp. 411–419. Springer (2020)
19. Shaikh, J.M., Kshirsagar, D.: Feature reduction-based dos attack detection system. In: Next
Generation Information Processing System, pp. 170–177. Springer, Berlin
Botnet Detection Using Bayes Classifier

Prapti Kolpe and Deepak Kshirsagar

Abstract In today's connected world, the risk of being attacked over the internet has
increased, which plays a major role in infecting devices over the internet. The internet
is flooded with different malwares, but we have focused on the harmful effects of
botnets. A botnet is a group of devices controlled by a single device to attack and
infect other devices over the internet. The controlled devices are called bots and can be
any internet-connected device, while the single controlling device is called a botmaster
or bot driver. It is crucial to detect botnets quickly, since they can perform various
malicious activities. We performed different experiments to detect botnets, using the
CICIDS2017 dataset and different machine learning algorithms from Weka. With the ML
algorithms, we achieved the highest accuracy of 98.9146% with the
NaiveBayesMultinomialText algorithm.

Keywords Botnet · Machine learning · Bayes · Malware detection

1 Introduction

A botnet is a system of internet-connected devices, such as phones, computers, and IoT
devices [1], with compromised security. The devices in the botnet system are
hacked/compromised and controlled by someone else without the knowledge of the device
owner. These hacked devices are called "bots", and the one who controls them is called
the "botmaster". A bot acts as a carrier, and whoever comes in contact with it becomes a
potential bot. In this way, the network of bots grows, forming a group that draws other
devices in a network into it.
Compared to other types of malware such as Trojans, Blended Threats, Spyware,
Adware, Key-loggers, and Rootkits, the bot has the advantage of communicating with the

P. Kolpe (B) · D. Kshirsagar


College of Engineering Pune, Pune, India
e-mail: praptidk17.is@coep.ac.in
D. Kshirsagar
e-mail: ddk.comp@coep.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 321
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_31

attacker through an established [2] command and control (C&C) channel. Using the
established C&C channel, the botmaster can easily manipulate a bot’s behavior on
the victim’s machine in his own interest. A Botnet, controlled and supervised by the
botmaster, has become a distributed platform for performing malicious and illegal
activities on victims’ machines, such as sending SPAM emails, distributing malware,
stealing identities, and attacking organizational networks and critical infrastructure.
Botnets allow attackers to perform different attacks such as Phishing, DDoS,
Cryptojacking, Snooping, Bricking, and Spam-bots. Botnets can be categorized [3]
as Centralized and Decentralized.

1.1 Machine Learning

ML is a way of analyzing data to predict a desired outcome. There is lots of data
around us, but usually we don’t analyze it to conclude anything. Consider a disease
for which there could be thousands of patients with multiple symptoms. If we observe
those symptoms and try to find the common ones, we can find a pattern. Machine
learning does precisely the same thing: it processes multiple data inputs and finds the
desired pattern. In this way, an ML algorithm matures as it gets more data and
scenarios to process.

1.2 Feature Selection

Feature selection can be used to remove unwanted or irrelevant attributes from the
dataset. It makes it easier for ML algorithms to analyze and understand the data in a
huge dataset. Feature selection methods are classified into filter and wrapper methods
[4] according to their feature evaluation measures. Filter methods directly select a
feature subset according to data characteristics and then apply different classification
algorithms to evaluate the selected subset. Wrapper methods use a predefined learning
algorithm to choose a feature subset for evaluation. Wrapper methods require more
computation, and hence are more expensive and complicated than the filter-based
approach. So, when there is substantial data to process, filter-based methods are the
preferred choice. There are different types of feature selection methods:
1. Correlation: In this method the algorithm selects a feature subset that is highly
correlated with the output class and not much correlated with another class
feature [5]. The correlation can be calculated as

a_{xy} = \frac{m \, a_{xe}}{\sqrt{m + m(m-1) \, a_{ee}}} \qquad (1)

where a_{xy} is the correlation (dependence) between the feature subset and the class
variable, m is the number of features, a_{xe} is the average feature-class correlation,
and a_{ee} is the average inter-correlation between features.
2. Information Gain: A feature selection technique in which a feature is selected
based on the amount of information it provides about the output class. It is based
on entropy and is calculated as a reduction in entropy. The value of IG lies between
0 and 1, where 0 represents no information and 1 represents maximum information.
The features more relevant to the output class have higher IG values and get selected.
Information gain and entropy [6] are inversely related: the value of IG increases as
the value of entropy decreases. The value of IG is used to split the dataset in ID3
while building the decision tree. We can calculate the value of IG for a single variable
as follows:

IG(D, z) = H(D) − H(D \mid z) \qquad (2)

where IG(D, z) represents the information gain for the dataset D, H(D) represents the
entropy of dataset D, and H(D|z) represents the conditional entropy of the dataset
D given variable z.
3. Gain Ratio: The gain ratio is an improvement on information gain; it overcomes
IG’s drawback [7] that IG favors many-valued attributes for the decision, which
may not be useful. The gain ratio helps in selecting those features which are
essential for the outcome in the decision tree. The gain ratio can be calculated
as
GainRatio(a) = \frac{InformationGain(a)}{ImportantValue(a)} \qquad (3)

where

ImportantValue_a(D) = -\sum_{i=1}^{n} \frac{|D_i|}{|D|} \log_2 \frac{|D_i|}{|D|} \qquad (4)

Equation (4) measures the information obtained by dividing the training dataset D
into n partitions on attribute a.
4. Symmetrical Uncertainty: IG is biased by the context of the attributes being
selected; this is its drawback. Symmetrical uncertainty (SU) overcomes the drawback
of IG [8] by normalization. SU is restricted to the range 0 to 1, where an SU of
0 means there is no relation between the two attributes, and an SU of 1 means the
two attributes are dependent on each other and can be taken into consideration. It
can be calculated as
SU = \frac{2 \cdot IG}{H(a) + H(b)} \qquad (5)

5. ReliefF: It calculates a score for each feature by evaluating instances from a
dataset and ranks the features in order of their score/weight. The score of a feature
decreases if its value differs in a nearby instance of the same class and increases
if its value differs in a nearby instance of another class. Accordingly, the
importance of each feature is calculated based [9] on its score, and the feature is
selected if its relevance is greater than a threshold value. ReliefF is an extension
of the Relief algorithm and is more robust than Relief.
6. Chi-Square: The Chi-Square algorithm selects features based on their Chi-Square
score, which can be calculated as follows:
z^2 = \frac{(O_f - E_f)^2}{E_f} \qquad (6)

where O_f is the observed frequency, that is, the number of observations in the class,
and E_f is the expected frequency [10], that is, the number of observations expected
in the class if there were no connection between the feature and the target.
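Assuming discrete feature values, the scoring measures above can be sketched in plain Python (a minimal illustration, not the Weka implementations; the toy labels and feature vectors below are hypothetical):

```python
from math import log2, sqrt
from collections import Counter

def entropy(xs):
    """Shannon entropy of a list of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def info_gain(labels, feat):
    """Eq. (2): IG(D, z) = H(D) - H(D|z) for one discrete feature z."""
    n = len(labels)
    cond = sum((feat.count(v) / n) *
               entropy([y for y, x in zip(labels, feat) if x == v])
               for v in set(feat))
    return entropy(labels) - cond

def gain_ratio(labels, feat):
    """Eqs. (3)/(4): IG divided by the split information of the attribute
    (the split information equals the entropy of the attribute's values)."""
    return info_gain(labels, feat) / entropy(feat)

def symmetrical_uncertainty(a, b):
    """Eq. (5): 2*IG / (H(a) + H(b)), normalized to [0, 1]."""
    return 2 * info_gain(a, b) / (entropy(a) + entropy(b))

def chi_square(observed, expected):
    """Eq. (6): sum of (O_f - E_f)^2 / E_f over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def cfs_merit(m, a_xe, a_ee):
    """Eq. (1): merit of an m-feature subset with average feature-class
    correlation a_xe and average inter-feature correlation a_ee."""
    return (m * a_xe) / sqrt(m + m * (m - 1) * a_ee)

# Toy data: z1 perfectly predicts the class, z2 is uninformative.
y  = ['bot', 'bot', 'benign', 'benign']
z1 = [1, 1, 0, 0]
z2 = [1, 0, 1, 0]
```

Here `info_gain(y, z1)` is 1.0 and `info_gain(y, z2)` is 0.0, matching the 0-to-1 range described above, while `cfs_merit` rewards subsets that correlate with the class but not with each other.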

1.3 Bayes Classifier

Bayes classifiers are a collection of classification algorithms based on Bayes’ Theorem.


1. Naive Bayes algorithm: The labels consist of sequential numbers. The features
selected during the evaluation process all work on the same approach [11]; however,
their outcomes are treated as completely independent of each other, and the final
result is a combined outcome of the selected features. Consider a dataset describing
weather and ground conditions used to decide whether to play cricket. Given the
conditions, each feature identifies its own outcome, and at the end of the process
the class is predicted from the features’ combined result.
2. Bayesian network classifier: A classifier in which there is strong independence
between the features selected for evaluation. Essentially, the overall outcome is not
dependent on one particular feature; that is, the presence/absence of one feature does
not affect the other features in predicting the outcome. Bayesian networks are ideal
for predicting the best possible outcomes of an event that occurred [12] by permuting
the most relevant inputs. For example, they could be used to compute the possible
diseases a patient could have by looking at his symptoms.
3. Multinomial naive Bayes: In this model, the feature vector represents the number
of times particular events have occurred. A feature vector Y = (Y1, Y2, …, Yn) [13]
represents a histogram, with Yi counting the number of times event i has occurred
in a particular instance. This model is typically used in document classification,
with features representing the number of times a particular word occurs in a single
document.
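As a rough sketch (not the Weka NaiveBayesMultinomialText implementation), a multinomial naive Bayes decision over such event-count histograms can be written as follows; the toy counts and class names are hypothetical:

```python
from math import log

def multinomial_nb(train_counts, labels, x, alpha=1.0):
    """Laplace-smoothed multinomial naive Bayes over event-count vectors:
    score(c) = log P(c) + sum_i x_i * log P(event_i | c)."""
    n_feat = len(x)
    best, best_score = None, float('-inf')
    for c in set(labels):
        rows = [r for r, y in zip(train_counts, labels) if y == c]
        # Per-class totals for each event, smoothed by alpha.
        totals = [sum(r[f] for r in rows) for f in range(n_feat)]
        grand = sum(totals)
        score = log(len(rows) / len(labels))
        for f, cnt in enumerate(x):
            score += cnt * log((totals[f] + alpha) / (grand + alpha * n_feat))
        if score > best_score:
            best, best_score = c, score
    return best

# Toy histograms: counts of two events per instance.
docs = [[3, 0], [4, 1], [0, 5], [1, 4]]
ys   = ['bot', 'bot', 'benign', 'benign']
pred = multinomial_nb(docs, ys, [2, 0])
```

Since event 0 dominates in the `bot` training instances, an unseen histogram dominated by event 0 is classified as `bot`.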

The rest of the paper is organized into four sections. The next section presents a
literature survey, which gave us direction in selecting a classifier and performing
botnet detection. The following section describes our proposed model for the
detection of botnets, and the results are presented and discussed in the section
after that.

2 Related Work

Narang et al. [14] proposed a conversation-based mechanism for the detection of
botnets. They used Discrete Fourier Transforms (DFTs) and information entropy as
measures, with the Correlation-based Feature Selection (CFS) algorithm and the
Consistency-based Subset Evaluation (CSE) search algorithm for feature selection.
The classifiers they used are Random Forests, Reduced Error Pruning (REP) trees,
the Naive Bayes and decision tree hybrid classifier known as the Naive Bayes tree
(NB tree), the K-nearest neighbors algorithm, Support Vector Machines (SVM), and
the ‘stacking’ ensemble learning technique (also known as ‘stacked generalization’).
They selected a total of 23 features.
Kirubavathi and Anitha [15] proposed an approach for botnet detection based on
network traffic performance analysis. The classifiers they used are the boosted
decision tree (AdaBoostM1+J48) ensemble classifier, the Naive Bayesian (NB)
statistical classifier, and the Support Vector Machine (SVM) discriminative classifier.
They used publicly available datasets: the Conficker dataset from CAIDA, the ISOT
Botnet dataset from the University of Victoria, a dataset from the University of
Georgia, four datasets from CVUT University, three IRC botnet datasets from Centro
University, Argentina, and the Citadel botnet and Alexa benign datasets from
Dalhousie University. Among the classifiers they used, the Naive Bayesian classifier
gives the highest accuracy of 99% with a 0.02% false positive rate.
Haq and Singh [16] proposed a hybrid approach for detection based on the false
positive rates of signature-based and anomaly-based detection. The hybrid approach
is a combination of a classification technique and a clustering technique. They used
the Weka tool for pre-processing, training, testing, and cross-validation of features.
The classifiers used are the Naive Bayes classifier, the IBk classifier, the Rule
Decision Table, trees, and the J48 classifier; they also used k-means clustering. The
publicly available datasets they used are ISOT, ISCX, and CTU-13. According to the
analysis of the results obtained with the various classifiers, the J48 tree algorithm
gave the highest accuracy of 90.2723%.
Garre et al. [17] proposed a system using honeypots with SSH sensors to capture
the data coming through the network and a machine learning technique for the
detection of attacks. They selected features based on the commands captured during
the SSH session. The classifiers used are Decision Tree, Random Forest, SVM, and
Naive Bayes. They created their own dataset with a total of 93 features, 72 commands,
7 session states, and 14 statistics. The Random Forest classifier achieved a precision of

95.7% and a recall of 93.9%, which indicates that the random forest gives less false
negative. Due to this, they have chosen Random Forest classifier. They have tested
their model on 20% of training dataset achieving the highest accuracy of 99.59%,
precision of 96.87%, recall of 100%, giving zero false negatives.
Narang et al. [18] applied feature selection techniques to peer-to-peer botnet
traffic. They used the publicly available ISOT dataset and their own dataset, with a
total of 23 features. They used three feature selection techniques: CFS
(Correlation-based Feature Selection), CSE (Consistency-based Subset Evaluation),
and PCA (Principal Component Analysis). For the experiments, three different
classifiers were used: C4.5, Naive Bayes, and Bayes Network. They obtained 5
features from CFS, 8 features from CSE, and 12 features from PCA.
Al Janabi et al. [8] used a real-time dataset with 18,266 records and 5 features to
understand which technique is best. Feature selection techniques fall into two types,
the wrapper model and the filter model, which are further classified into different
subtypes. They performed a comparative study of these types of feature selection
techniques.

3 Proposed Model

3.1 Dataset

In our proposed model, there are different components, as shown in Fig. 1. The first
component of our model is the dataset: we used the dataset [19] named CICIDS
2017 Friday Morning Botnet, which is publicly available from the University of New
Brunswick (UNB). The dataset has 184045 instances, of which 182096 are benign
and 1948 are bots. Since the dataset is a collection of raw data, it needs to be
pre-processed, which is performed by the next component of our model, as shown
in Fig. 1.

3.2 Data Pre-processing

Initially, the dataset needs to be pre-processed to transform it into a machine-readable
format. The raw dataset may contain null, NaN, and duplicate values, which
constitute noisy data, so it becomes necessary to remove them. This can be done
using different techniques; in our case, we used Python code for data pre-processing,
transforming our dataset into a clean, machine-readable pre-processed dataset. The
pre-processed dataset is used for experimentation.
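A minimal sketch of this cleaning step in plain Python (our actual pre-processing code is not reproduced here; the toy rows below are hypothetical):

```python
import math

def preprocess(rows):
    """Drop rows containing None/NaN/inf values and duplicate rows,
    i.e. the 'noisy data' described above."""
    seen, clean = set(), []
    for row in rows:
        if any(v is None or (isinstance(v, float) and
                             (math.isnan(v) or math.isinf(v)))
               for v in row):
            continue  # row contains a missing or non-finite value
        key = tuple(row)
        if key not in seen:  # keep only the first copy of each row
            seen.add(key)
            clean.append(row)
    return clean

# Hypothetical raw flow records: a duplicate, a NaN row, and an inf row.
raw = [[1.0, 2.0], [1.0, 2.0], [float('nan'), 3.0],
       [float('inf'), 4.0], [5.0, 6.0]]
cleaned = preprocess(raw)
```

In practice the same effect is commonly obtained with a dataframe library's drop-missing and drop-duplicates operations.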

Fig. 1 System architecture

3.3 Feature Selection

Feature selection is a technique for selecting distinct features from a dataset. We
used the open-source software Weka to select useful features by applying different
machine learning feature selection algorithms. On the basis of the selected features,
the dataset is classified; this task is performed by the next component, the classifier.
We are using filter-based feature selection algorithms for selecting distinct features.
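The weight-based ranking produced by such filter algorithms, and the later removal of zero-weighted features (see Sect. 4), can be sketched as follows, with hypothetical feature names and weights:

```python
def nonzero_ranked(weights):
    """Rank features by weight (descending) and drop zero-weighted ones,
    mirroring the non-zero/zero split used in the experiments."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, w in ranked if w > 0]

# Hypothetical weights as a filter algorithm might assign them.
scores = {'flow_duration': 0.42, 'dst_port': 0.0, 'pkt_len_mean': 0.17}
kept = nonzero_ranked(scores)
```

Only the non-zero weighted features (`flow_duration`, `pkt_len_mean` here) would be passed on to the classifier.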

3.4 Classifier

Different ML classifiers are used for the classification of data. In classification, a
class is predicted on the basis of the input data; the class can also be called the
target output. Of the different types of classifiers, we are using a binary classifier,
which gives the class as bot or benign. Among the machine learning-based classifiers
available in Weka, we are using the Bayes classifiers.
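For numeric flow features, one member of the Bayes family makes the binary bot/benign decision by combining class priors with per-class Gaussian likelihoods. A simplified sketch (not Weka's implementation; the class statistics below are hypothetical):

```python
from math import pi, exp, sqrt

def gaussian_nb_predict(stats, priors, x):
    """stats[c][f] = (mean, std) of numeric feature f for class c;
    picks the class maximizing prior * product of Gaussian likelihoods."""
    def pdf(v, mean, std):
        return exp(-((v - mean) ** 2) / (2 * std ** 2)) / (sqrt(2 * pi) * std)

    best, best_p = None, -1.0
    for c, prior in priors.items():
        p = prior
        for f, v in enumerate(x):
            mean, std = stats[c][f]
            p *= pdf(v, mean, std)
        if p > best_p:
            best, best_p = c, p
    return best

# Hypothetical per-class statistics for one flow feature (e.g. mean packet
# length): bot flows cluster near 100, benign flows near 500.
stats  = {'bot': [(100.0, 20.0)], 'benign': [(500.0, 80.0)]}
priors = {'bot': 0.01, 'benign': 0.99}
label = gaussian_nb_predict(stats, priors, [110.0])
```

Even with a small bot prior, a flow whose feature value sits near the bot cluster is classified as `bot`.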

4 Results and Discussions

In our experimentation, we used a labeled dataset collected from [19], which consists
of 78 features including 1 label. This dataset was pre-processed with the help of
Python code, and the pre-processed data was used for feature selection. Features
were chosen using filter-based feature selection algorithms such as Information Gain,
Correlation, Gain Ratio, Symmetrical Uncertainty, Chi-Square, and ReliefF. As
shown in Fig. 1, with the help of the ML feature selection algorithms, the features
are listed based on their weights and ranked accordingly. We observed that the
selected feature list contains zero-weighted and non-zero weighted feature subsets;
however, the zero-weighted features do not have any significance, so we applied the
different classification algorithms to the lists of non-zero weighted features. The
non-zero weighted feature subsets are shown in Table 1, and the zero-weighted
feature subsets are shown in Table 2.
We applied the Naive Bayes classifiers to the selected features with tenfold cross-
validation; the results are shown in Table 3. From Table 3, we observed that the
Naive Bayes Multinomial Text classifier gives the highest accuracy of 98.9416%, so
we selected this classifier for further experimentation and applied it to the non-zero
weighted feature subsets from Table 1. The results after applying the Naive Bayes
Multinomial Text classifier to the different lists of features selected by the different
feature selection algorithms are shown in Table 4. It is observed that Correlation
and ReliefF selected the same list of features; similarly, Gain Ratio, Chi-Square,
Information Gain, and Symmetrical Uncertainty selected the same features.

Table 1 List of non-zero weighted features


Subset Features Feature No.
CR and ReliefF 67 47, 49, 39, 12, 1, 69, 13, 55, 9, 54, 8, 38, 11, 43, 10, 7,
53, 2, 21, 14, 26, 41, 19, 24, 76, 74, 77, 23, 29, 18,
63, 5, 48, 31, 45, 28, 17, 22, 27, 42, 72, 30, 25, 67, 71,
70, 37, 52, 75, 44, 73, 66, 15, 20, 40, 46, 51, 16, 35,
36, 3, 62, 4, 64, 6, 65, 68
GR, IG, Chi, SU 65 1, 13, 55, 43, 42, 14, 41, 35, 67, 5, 63, 11, 40, 53, 69,
10, 24, 7, 66, 30, 21, 29, 28, 65, 6, 26, 27, 22, 4, 64,
36, 9, 54, 23, 12, 2, 19, 18, 25, 38, 17, 37, 16, 15, 49,
3, 62, 68, 39, 8, 47, 20, 76, 77, 74, 73, 70, 72, 75, 71,
31, 45, 44, 48, 52

Table 2 List of zero-weighted features


Subset Features Feature No.
CR and ReliefF 10 33, 34, 59, 32, 60, 56, 61, 57, 50, 58
GR, IG, Chi, SU 12 34, 46, 33, 51, 50, 61, 57, 32, 58, 60, 59, 56

Table 3 Performance analysis of Bayes classifiers

Algorithm Accuracy (%) Incorrectly classified Model build time (s)
instances (%)
BayesNet 96.3835 3.6165 25.06
Naive Bayes 84.2125 15.7875 3.13
Naive Bayes 98.9416 1.0584 0.08
multinomial text
Naive Bayes 84.2125 15.7875 5.8
updateable

Table 4 Result of NaiveBayesMultinomialText (non-zero weighted features)

Algorithm Features Accuracy (%) Incorrectly Model build time
classified (s)
instances (%)
CR and ReliefF 67 98.9416 1.0584 0.07
GR, IG, Chi, SU 65 98.9416 1.0584 0.06
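The tenfold cross-validation used for Tables 3 and 4 can be sketched generically; the trivial majority-class learner below is a hypothetical stand-in for the actual Bayes classifiers:

```python
def kfold_accuracy(X, y, train_fn, predict_fn, k=10):
    """k-fold cross-validation: hold out every k-th instance per fold,
    train on the rest, and report overall accuracy in percent."""
    n = len(X)
    correct = 0
    for i in range(k):
        held_out = set(range(i, n, k))
        train_x = [x for j, x in enumerate(X) if j not in held_out]
        train_y = [yy for j, yy in enumerate(y) if j not in held_out]
        model = train_fn(train_x, train_y)
        correct += sum(predict_fn(model, X[j]) == y[j] for j in held_out)
    return 100.0 * correct / n

# Hypothetical stand-in learner: always predicts the majority training class.
majority_train = lambda xs, ys: max(set(ys), key=ys.count)
majority_pred  = lambda model, x: model

X = list(range(10))
y = ['benign'] * 9 + ['bot']
acc = kfold_accuracy(X, y, majority_train, majority_pred)
```

Each instance is tested exactly once while the model trains on the remaining folds, which is why a single accuracy figure summarizes the whole dataset.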

5 Conclusions

We proposed a model for botnet detection using filter-based feature selection
algorithms and Naive Bayes classifiers. The bots are detected with the highest accuracy
of 98.9416% with the help of the Naive Bayes Multinomial Text classifier on the
CICIDS 2017 dataset. In the future, we will try to reduce the features further by using
different feature reduction techniques while retaining high accuracy, and we will test
our model against different datasets.

References

1. https://www.akamai.com/uk/en/resources/what-is-a-botnet.jsp
2. Chen, C.-M., Lin, H.-C.: Detecting botnet by anomalous traffic. J. Inf. Secur. Appl. 21, 42–51
(2015)
3. https://www.crowdstrike.com/epp-101/botnets/
4. Chandrashekar, G., Sahin, F.: A survey on feature selection methods. Comput. Electr. Eng.
40(1), 16–28 (2014)
5. Hall, M.A.: Correlation-based feature selection for machine learning (1999)
6. https://machinelearningmastery.com/information-gain-and-mutual-information/#:~:
text=Information
7. Trabelsi, M., Meddouri, N., Maddouri, M.: A new feature selection method for nominal clas-
sifier based on formal concept analysis. Proc. Comput. Sci. 112, 186–194 (2017)

8. Al Janabi, K.B.S., Kadhim, R.: Data reduction techniques: a comparative study for attribute
selection methods. IJACST 2249–3123
9. Pupo, O.G.R., Morell, C., Soto, S.V.: ReliefF-ML: an extension of ReliefF algorithm to multi-
label learning. In: Ruiz-Shulcloper, J., Sanniti di Baja, G. (eds.) Progress in Pattern Recognition,
Image Analysis, Computer Vision, and Applications. CIARP 2013. Lecture Notes in Computer
Science, vol. 8259. Springer, Berlin, Heidelberg (2013). https://doi.org/10.1007/978-3-642-
41827-36
10. Deshpande, P., Sharma, S.C., Peddoju, S.K., et al.: Security and service assurance issues in
cloud environment. Int. J. Syst. Assur. Eng. Manag. 9, 194–207 (2018). https://doi.org/10.
1007/s13198-016-0525-0
11. Mukherjee, S., Sharma, N.: Intrusion detection using naive Bayes classifier with feature reduc-
tion. Proc. Technol. 4, 119–128 (2012)
12. Ugochukwu, C.J., Bennett, E.O.: An intrusion detection system using machine learning algo-
rithm. Int. J. Comput. Sci. Math. Theory 4(1), 39–47 (2018)
13. https://scikit-learn.org/stable/modules/naive_bayes.html
14. Narang, P., Hota, C., Sencar, H.T.: Noise-resistant mechanisms for the detection of stealthy
peer-to-peer botnets. Comput. Commun. 96, 29–42 (2016)
15. Kirubavathi, G., Anitha, R.: Botnet detection via mining of traffic flow characteristics. Comput.
Electr. Eng. 50, 91–101 (2016)
16. Haq, S., Singh, Y.: Botnet detection using machine learning. In: 2018 Fifth International Con-
ference on Parallel, Distributed and Grid Computing (PDGC), pp. 240–245. IEEE (2018)
17. Garre, J.T.M., Pérez, M.G., Ruiz-Martínez, A.: A novel machine learning-based approach for
the detection of SSH botnet infection. Future Gen. Comput. Syst. 115, 387–396
18. Narang, P., Reddy, J.M., Hota, C.: Feature selection for detection of peer-to-peer botnet traffic.
In: Proceedings of the 6th ACM India Computing Convention, pp. 1–9 (2013)
19. Canadian Institute of Cybersecurity. https://www.unb.ca/cic/datasets/ids-2018.html
Insider Attack Prevention using
Multifactor Authentication Protocols - A
Survey

Siranjeevi Rajamanickam, N. Ramasubramanian and Satyanarayana Vollala

Abstract Technological progress brings our needs to our doorstep, letting us access
applications through a palmtop or computer system. Users share credentials with
application servers to access web applications, so it is mandatory to secure user
credentials from illegal access or usage. Multifactor authentication protocols are
designed to make the use of web applications secure. At the same time, a study of
these proposed protocols is necessary to assess their strength in preventing several
Security attacks, especially Insider attacks, which are crucial. This paper presents
a comparative study of different Security protocols based on their strength in
preventing Security attacks, including their performance and the other parameters
involved in each protocol. It is inferred from the study that Insider attacks can be
prevented only when the Security protocol is free from various other attacks. The
comparative study of performance reveals that the computation cost increases with
the strength of the cryptosystem used in designing the protocol.

Keywords Authentication · Security attacks · Insider attacks · Elliptic curve


cryptosystem (ECC)

1 Introduction

The growth of the digital world demands well-built web applications with high-level
Security. Authentication of the parties involved in communication is predominant in
deciding the level of Security. There are several applications on the web that are most

S. Rajamanickam (B) · N. Ramasubramanian


Department of Computer Science and Engineering, National Institute of Technology,
Tiruchirappalli, Tamil Nadu, India
N. Ramasubramanian
e-mail: nrs@nitt.edu
S. Vollala
Department of Computer Science and Engineering, Dr. Shyama Prasad Mukherjee International
Institute of Information Technology, Naya Raipur, India
e-mail: satya@iiitnr.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 331
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_32

widely used by many web users. In this paper, a few of the multifactor authentication
protocols suitable for applications like single servers, distributed servers, remote
users, e-healthcare systems, and the Internet of Things are discussed. Users prefer to
use smart cards, for which numerous authentication protocols for a single-server
environment have been proposed. This idea has been extended to the multi-server
environment, with protocols designed so that users can access all the servers’ services
using a single identity and password pair. The increasing spread of diseases and the
high population dictate the necessity of Telecare Medical Information Systems
(TMIS). Such a system creates a comfortable communication environment, linking
remote patients with medical practitioners and providing instant access to the
required data. Advancing technology demands wireless sensor networks in medical
applications, which are used to improve the care provided to people of different
categories. The dependence of consumer and business applications on cloud and IoT
platforms has increased, leading to many technologies [9]. However, the applications
mentioned above turn out to be fruitful only when users feel secure while accessing
them. Generally, users prefer to use applications frequently only when strong
authentication protocols are used for communication. These protocols are
recommended for usage only when they are free from several Security attacks.
Researchers have proposed several multifactor authentication protocols, which claim
to prevent Security attacks and preserve the Security properties discussed in the
forthcoming sections.
1. Password guessing attacks: A brute-force attack in which a fraudster tries to
obtain the password by trying different possible combinations of letters or
symbols. It is, in turn, classified into offline and online guessing attacks.
a. Offline password guessing attacks: An attacker tries to compute the exact
text from a hash value obtained by him. The adversary attempts this with a list
of passwords and corresponding hash values stored for this purpose.
b. Online password guessing attacks: Attempting to guess the correct password,
the illegal user ventures with several username and password pairs against the
user login portal.
2. Man-in-the-middle attack: The attacker finds a suitable position between the
communicating parties and eavesdrops or impersonates them, carrying on
seemingly regular communication between the entities.
3. Impersonation attacks: On obtaining a legitimate user’s or server’s identity, the
attacker forges communication with the parties in the protocol.
4. Replay attack: Data transmission by a malicious attacker who repeats a
previously transmitted message, causing a delay or an attack.

5. Session key disclosure attack: Obtaining the credentials from the messages
collected from the session keys transmitted during the key agreement phase.
6. Denial of Service attack: The accessibility of services is denied to legitimate
users due to traffic flooding caused by the attackers.
7. Server spoofing attack: A malicious user who obtains the server’s credentials
forges being a legitimate server and communicates with the user.
8. Stolen verifier attack: The attacker obtains the users’ credential information,
which is stored in the database maintained by the server.
9. Cookie theft attack: The attack is caused by obtaining the data from a cookie
on a smart device.
10. Modification attack: An illegitimate user attacks by altering valid data
transmitted between the communicating parties.
11. Smart card stolen attack: The attacker retrieves the valid credentials stored in
the smart card during the registration process by executing a power analysis
attack and uses the data for illegal access to web applications.
12. Insider attack: An Insider attack is predominant over all other attacks, since
access to the user’s significant credentials is easy and comfortable for Insiders,
who may be current employees or employees who previously worked in the
organization.
The occurrence of Insider attacks may fall into one of the following cases:
Case 1: Insiders obtain the user credentials from the user while he/she registers with
the service organization.
Case 2: Insiders can retrieve the user credentials from the messages shared with the
service provider.
Case 3: Insiders have a chance of correctly guessing a valid username and
password if they are of low entropy.
Case 4: Insider attacks come into action when careless users use a single
username and corresponding unique password to access all web applications.
Insider attacks are to be treated seriously, since they cause financial loss and
reputational damage to the organization. Moreover, Insider attacks are challenging to
detect, so the better option for organizations is to use Security protocols free from
Insider attacks. Apart from being free from Security attacks, a Security protocol
must guarantee that the following properties are preserved:
1. User/Device anonymity: User or device credentials should be confidential and
unknown to illegitimate users.
2. Forward secrecy: Session keys are to be preserved even when long-term
confidential information is compromised during the key exchange.
3. Backward secrecy: The inability of the attacker to discover the preceding group
keys from a contiguous subset that contains the group keys.
4. Key independence: The incapability of a passive adversary to obtain another
set of group keys from any known proper set of group keys.

5. Mutual authentication: The two parties authenticate simultaneously; the entities
authenticate by verifying the values shared between the authenticating parties.
6. Robustness: The ability of the entities involved in authentication to work
correctly without errors during the authentication process.
7. Freedom to select identity and password: Users are permitted to select their
username and password randomly, of any entropy.
8. Inefficiency of multiple secret keys: Usage of many secret keys to protect
user credentials is to be avoided, since it increases the system’s complexity.
Several multifactor authentication protocols have been proposed that claimed to
prevent Insider attacks but failed when cryptanalyzed later; these are discussed in the
forthcoming section.
The remaining part of the article is structured as follows. The literature survey
section describes the several schemes proposed by various authors to prevent Insider
attacks. The next section depicts the different vulnerable Security parameters that
are accountable for causing Insider attacks. The Security analysis section briefs the
different tools currently used by researchers to analyze the protocols. The features
of different Security protocols are compared and briefed in the comparative study
section. The article is concluded and future directions are mentioned in the
Conclusion section.

2 Literature Survey

In [8], a user authentication protocol proposed for medical applications using smart
cards and passwords as the significant authentication parameters suffers from Insider
and several other attacks, as seen in [3]. A key agreement scheme used for
authentication, specifically for SIP, is proposed in [2], which by the cryptanalysis
done in [7] is proved to suffer from other Security attacks, inclusive of Insider
attacks. A protocol to authenticate a user to multiple servers based on biometrics,
with the password and biometric information of the user stored in a smart card using
ECC, is proposed in [10] and has various drawbacks, as stated in [16]. An
authentication scheme for the benefit of remote users accessing multiple servers
using a key agreement scheme is proposed in [15] and is found to be insecure against
several Security attacks, as proved in [12]. Kalra and Sood proposed an
authentication protocol using ECC for embedded devices and HTTP clients in [5],
which is found to be insecure against Security attacks, as in [6]. Users who prefer
roaming services in global mobility networks can use an anonymous authentication
scheme proposed by Mun et al. in [9], which is cryptanalyzed in [13] and found to
have various Security drawbacks. A key agreement scheme that uses three factors for
authentication, applicable to Telecare Medicine Information Systems, is proposed by
Nikooghadam et al. in [1] and is found to be insecure, as stated in [4]. In common,
all these protocols are insecure against Insider attacks. The vulnerabilities causing
Insider attacks are discussed in the
Insider Attack Prevention using Multifactor Authentication Protocols - A Survey 335

forthcoming section. In 2019, Siranjeevi Rajamanickam et al. proposed a protocol,


that is free of Insider attacks in [11], with few Security drawbacks, as stated in [14].

3 Vulnerable Security Parameters Causing Insider Attacks

Based on the Security loopholes, the cause of Insider attacks is categorized into the
following cases:
1. Case 1: Direct share of credentials
Liu–Chung scheme: As mentioned in [3], the Liu–Chung scheme [8] cannot prevent
Insider attacks. The user's identity and password are shared with the trusted
authority to personalize the smart card. In this way, an Insider can misuse the
credentials and mount attacks.
2. Case 2: Credentials obtained by offline guessing attacks
Wang et al. scheme: As cryptanalyzed in [12], the Wang et al. scheme [15] suffers
from Insider attacks: Insiders obtain the credentials by offline guessing attacks.
3. Case 3: Weakness in the server verification message
Arshad et al. scheme: In [7], the authors cryptanalyzed the scheme [2] proposed by
Arshad et al.: an Insider can forge a legal client by selecting a random number dc
and computing the message Vc = h(IDi ‖ Qs ‖ realm ‖ dcQs ‖ Vi), where Vi =
h(IDi ‖ PWi ‖ Nc). In this way, Insiders can mount a forgery attack.
4. Case 4: Intentional action of the attacker
Odelu et al. scheme: The weakness of the Odelu et al. scheme [10] is revealed in [16]:
the user shares IDi with the organization, and the hashed value h(IDi ‖ k) and
ri stored in the table can be deleted intentionally by an attacker, who can then
re-register with the same IDi and mount a forgery attack.
5. Case 5: Credentials obtained by guessing and stolen verifier attacks
Mun et al. scheme: In [13], the Mun et al. scheme [9] is cryptanalyzed, and it is found
that Insiders can execute a stolen verifier attack by obtaining the information
from the database, obtain the credentials of the user through a guessing attack,
and misuse them in several ways.
6. Case 6: Credentials generated by the Insider
Kalra and Sood scheme: In [5], the cloud server CS itself generates a unique
password for every device registered with the CS. Any Insider can misuse this
information and cause several attacks, as proved in [6].
7. Case 7: Credentials obtained by power analysis and smart card stolen attacks
Arshad and Nikooghadam scheme: The cryptanalysis done in [4] proves that the
Arshad and Nikooghadam scheme [1] is prone to an Insider attack if the attacker
steals the smart card and, in turn, obtains the credentials through a power analysis
attack.
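The forged verifier in Case 3 is just a chain of hashes over public and insider-known values. A minimal sketch of the computation (SHA-256 as the hash and all field values are illustrative stand-ins; the cited scheme uses ECC points and its own encoding):

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """Hash the concatenation of the parts (the '||' operator in the scheme)."""
    return hashlib.sha256(b"".join(parts)).digest()

# Illustrative values only -- stand-ins for the scheme's real parameters.
ID_i = b"alice"          # client identity
PW_i = b"secret-pw"      # client password
N_c = b"nonce-123"       # client nonce
Q_s = b"server-pubkey"   # server public key (an EC point in the real scheme)
realm = b"sip:example"   # SIP realm
d_cQ_s = b"77*Q_s"       # scalar multiple d_c * Q_s (EC multiplication in reality)

V_i = h(ID_i, PW_i, N_c)                 # V_i = h(ID_i || PW_i || N_c)
V_c = h(ID_i, Q_s, realm, d_cQ_s, V_i)   # V_c = h(ID_i || Q_s || realm || d_c*Q_s || V_i)

print(V_c.hex()[:16])  # the forged verification message an insider could submit
```

The point of the case study is that nothing in the chain is secret from an insider who already knows IDi and PWi, so the verifier can be recomputed at will.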
Table 1 Characteristic features of the security protocols

[8]  Application: Wireless health care sensor networks. Factors used for authentication: User id, password. Participants: Client, trusting authority or server, sensor node. Merits: Instant data access for the users; low computation cost. Cryptographic algorithms/operations: Bilinear pairing, hashing. Formal Security analysis tool: Not done.
[2]  Application: IP-based telephony networks. Factors: User id, password. Participants: Client, server. Merits: Avoids illegal usage of Voice over Internet Protocol. Algorithms: ECC, hashing. Tool: Not done.
[10] Application: Battery-limited mobile devices. Factors: User id, password, biometric. Participants: User, server, registration center. Merits: Provides strong user anonymity. Algorithms: ECC, hashing. Tool: BAN logic, AVISPA tool.
[15] Application: Remote distributed networks. Factors: User id, password, biometric. Participants: User, server, registration center. Merits: Easy re-registration procedure for the user; preserves biometric information. Algorithms: Hashing, Key Exchange Protocol (IKEv2). Tool: Oracle reveal.
[5]  Application: IoT and cloud servers. Factors: Device id, password. Participants: Embedded device, server. Merits: Expands the coverage of capabilities offered by IoT, making them reliable. Algorithms: ECC, hashing. Tool: AVISPA.
[9]  Application: Global mobility networks. Factors: User id. Participants: Mobile user, home agent, foreign agent. Merits: Free from the synchronization problem. Algorithms: Elliptic Curve Diffie–Hellman (ECDH), hashing. Tool: Not done.
[1]  Application: Telecare Medical Information System. Factors: User id, password, biometric. Participants: User, Telecare server. Merits: Provides strong user anonymity. Algorithms: ECC, hashing. Tool: Not done.
[11] Application: Suitable for all web applications. Factors: User id, password. Participants: User, service providing server, password management server. Merits: Free from Insider attacks. Algorithms: ECC, hashing. Tool: Scyther.
S. Rajamanickam et al.

4 Security Analysis

All the protocols mentioned above claim to be free from these attacks through informal
Security analysis, in which the authors cryptanalyze their own proposed protocols;
such analysis, however, tends to be incomplete. Formal Security analysis is also done
through different tools such as BAN logic and the AVISPA tool.

5 Comparative Study

A comparative study of all these protocols is presented in Table 1, exposing their
different characteristic features. A survey of the behavior of the protocols under
different attacks is presented in Fig. 1. The performance of the different protocols is
analyzed, with particular attention to the computation cost incurred in the login and
authentication phase along with the key agreement and generation phase, and is
presented in Table 3. The descriptions of the necessary notations used for the
performance analysis are listed in Table 2.

Fig. 1 The behavior of security protocols for different security attacks



Table 2 Important notations

Notation   Description
Th Time taken to compute a hash value
TPM Time taken for Elliptic curve point multiplication
TPA Time taken for Elliptic curve point addition
TK Time taken for generating key
TINV Time taken for inversion function
TR Time taken to generate a random number

Table 3 The computation time of the security protocols for the login and authentication and key
generation phases

Scheme                           Computation cost (login and authentication)   Computation time (ms)
Liu–Chung                        1Th                                           0.023
Arshad et al.                    8Th + 4TPM + 1TINV + 1TM + 3TR                10.86
Odelu et al.                     24Th + 6TPM + 6TK                             13.43
Wang et al.                      4Th                                           0.032
Kalra and Sood                   9Th + 7TPM                                    15.602
Mun et al.                       10Th + 4TM + 2TK                              0.042
Arshad and Nikooghadam           15Th + 4TPM + 1TINV + 2TM                     9.258
Siranjeevi Rajamanickam et al.   8Th + 4TPA + 1TK                              0.265
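The totals in Table 3 follow from simple bookkeeping: count each primitive operation and multiply by a measured unit timing. A minimal sketch of that computation (the unit timings below are illustrative assumptions, not the authors' measured values, which depend on platform and library):

```python
# Unit timings (ms) for each primitive -- illustrative assumptions only.
unit_ms = {"Th": 0.008, "TPM": 2.2, "TPA": 0.05,
           "TK": 0.06, "TINV": 0.4, "TM": 0.01, "TR": 0.005}

# Operation counts per scheme, transcribed from Table 3.
schemes = {
    "Liu-Chung": {"Th": 1},
    "Wang et al.": {"Th": 4},
    "Odelu et al.": {"Th": 24, "TPM": 6, "TK": 6},
    "Siranjeevi Rajamanickam et al.": {"Th": 8, "TPA": 4, "TK": 1},
}

def total_cost_ms(counts: dict) -> float:
    """Sum count * unit time over all primitives used by a scheme."""
    return sum(n * unit_ms[op] for op, n in counts.items())

for name, counts in schemes.items():
    print(f"{name}: {total_cost_ms(counts):.3f} ms")
```

The table's broad pattern is visible even with rough unit costs: schemes dominated by hashing stay well under a millisecond, while every EC point multiplication (TPM) adds milliseconds.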

6 Conclusion

Several multifactor authentication protocols suitable for different applications are
studied comparatively on factors like performance, Security attacks, advantages, and
other protocol features. It is inferred from the cases discussed in the above section
that the Insider attacks can be prevented only when the Security protocol is free
from various other attacks. The comparative study on performance reveals that the
computation cost increases with the cryptosystem’s strength used in designing the
protocol. The cause of Insider attacks can also be studied by analyzing the strength
of various cryptographic algorithms used in several Security protocols to prevent
Insider attacks, which is suggested as the future work.

References

1. Arshad, H., Nikooghadam, M.: Three-factor anonymous authentication and key agreement
scheme for telecare medicine information systems. J. Med. Syst. 38(12), 136 (2014)
2. Arshad, H., Nikooghadam, M.: An efficient and secure authentication and key agreement
scheme for session initiation protocol using ECC. Multimed. Tools Appl. 75(1), 181–197
(2016)
3. Challa, S., Das, A.K., Odelu, V., Kumar, N., Kumari, S., Khan, M.K., Vasilakos, A.V.: An
efficient ecc-based provably secure three-factor user authentication and key agreement protocol
for wireless healthcare sensor networks. Comput. Electr. Eng. 69, 534–554 (2018)
4. Das, A.K.: A secure user anonymity-preserving three-factor remote user authentication scheme
for the telecare medicine information systems. J. Med. Syst. 39(3), 30 (2015)
5. Kalra, S., Sood, S.K.: Secure authentication scheme for iot and cloud servers. Pervasive Mob.
Comput. 24, 210–223 (2015)
6. Kumari, S., Karuppiah, M., Das, A.K., Li, X., Wu, F., Kumar, N.: A secure authentication
scheme based on elliptic curve cryptography for iot and cloud servers. J. Supercomput. 74(12),
6428–6453 (2018)
7. Lin, H., Wen, F., Chunxia, D.: An anonymous and secure authentication and key agreement
scheme for session initiation protocol. Multimed. Tools Appl. 76(2), 2315–2329 (2017)
8. Liu, C.-H., Chung, Y.-F.: Secure user authentication scheme for wireless healthcare sensor
networks. Comput. Electr. Eng. 59, 250–261 (2017)
9. Mun, H., Han, K., Lee, Y.S., Yeun, C.Y., Choi, H.H.: Enhanced secure anonymous authentica-
tion scheme for roaming service in global mobility networks. Math. Comput. Model. 55(1–2),
214–222 (2012)
10. Odelu, V., Das, A.K., Goswami, A.: A secure biometrics-based multi-server authentication
protocol using smart cards. IEEE Trans. Inf. Forensics Secur. 10(9), 1953–1966 (2015)
11. Rajamanickam, S., Vollala, S., Amin, R., Ramasubramanian, N.: Insider attack protection:
lightweight password-based authentication techniques using ECC. IEEE Syst. J. (2019)
12. Reddy, A.G., Yoon, E.J., Das, A.K., Odelu, V., Yoo, K.Y.: Design of mutually authenticated
key agreement protocol resistant to impersonation attacks for multi-server environment. IEEE
Access 5, 3622–3639 (2017)
13. Reddy, A.G., Yoon, E.J., Das, A.K, Yoo, K.Y.: Lightweight authentication with key-agreement
protocol for mobile network environment using smart cards. IET Inf. Secur. 10(5), 272–282
(2016)
14. Shamshad, S., Mahmood, K., Kumari, S., Khan, M.K.: Comments on “insider attack protection:
lightweight password-based authentication techniques using ECC”. IEEE Syst. J. (2020)
15. Wang, C., Zhang, X., Zheng, Z.: Cryptanalysis and improvement of a biometric-based multi-
server authentication and key agreement scheme. Plos one 11(2), (2016)
16. Zhang, M., Zhang, J., Tan, W.: Remote three-factor authentication protocol with strong robust-
ness for multi-server environment. China Commun. 14(6), 126–136 (2017)
Link Scheduling in Wireless Mesh
Network Using Ant Colony Optimization

Makarand D. Wangikar and Balaji R. Bombade

Abstract Recent years have shown rapid growth in wireless communication
technologies. When a network is established across a given geographical distance,
the biggest issue that arises is its effective sharing among all the stakeholders.
Nowadays, the internet-based network spans the globe; therefore, it must be shared
among nations without any loss to anyone operating the network. This can be achieved
by several greedy-based network-sharing approaches, which compute the network
busyness along with the routing information, and this collectively affects the
throughput. To avoid this, we present a novel approach of applying Genetic
Algorithm-based Ant Colony Optimization to heavily trafficked networks to improve
network performance. We further predict the future network traffic and accordingly
schedule the network routing mechanism. Our experiments on the NS2 simulator show
that the results are far better than those of other presented methods of greedy-based
computation.

Keywords Network traffic analysis · Routing of packets · Link weight
computation · Scheduling algorithm

1 Introduction

All of us know of DARPA's very first experiment in the history of the computer
network. The speedy development caused immense pressure to share resources
like network bandwidth, the available hardware for the routing process, and many more
such resources [1]. Hence, innovation led to the prevalent network traffic scheduling
methods using greedy techniques [2]. Although greedy approaches provide
good results, the network is still not utilized to its full extent. Hence, we propose

M. D. Wangikar (B)
School of Computational Sciences, Swami Ramanand Teerth Marathwada University, Nanded
431605, India
B. R. Bombade
Shri Guru Gobind Singhji Institute of Engineering and Technology, Nanded 431605, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 341
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_33

a novel approach using Genetic Algorithms with a further add-on of unique predictive
analysis techniques. Genetic Algorithms are inspired by the genetic modeling of human
genes and have been successfully incorporated into system design and development [3].
It is widely known that the Genetic Algorithm is used as an optimization technique.
Here we use this idea and further improve the network performance through
predictive analysis [4]. The wireless mesh network is complex and has multiple
hybrid-category devices attached to its service points. The studies and experimental
results with the greedy approach carried out earlier by different researchers yielded
better network performance, but with some lacunae. Hence, we direct our research at
better scheduling of network traffic and use the concept of the Ant Colony
Optimization technique in our proposed method. Ants use a special chemical to trace
their paths. When an ant finds a portion of food in the nearby vicinity, it marches back
to its colony, and during this march back, it lays down a particular chemical called a
Pheromone. A Pheromone is a secreted chemical that triggers a predefined social
response among the members of a species. Various Pheromones exist, e.g., alarm
Pheromones, food-trail Pheromones, escape Pheromones, etc. Here, we use the
food-trail Pheromone concept [5].
With food-trail Pheromones, when an ant finds a portion of food, it lays down
Pheromone, and accordingly a social response is generated to trace back to the
food. According to their intelligence, different ants try different paths to reach the
same food. But food-trail Pheromones vanish due to atmospheric conditions like heat,
humidity, wind flow, the activity of other species, etc. Therefore, in such situations,
it is observed that the chemical trail along the shortest path to the located food will
always be the strongest and hold the most crucial track details. Every ant tries its best,
but the group of ants will choose only the shortest route, and all other available paths
will be discarded [6], as shown in Fig. 1. A similar concept can be applied to network
traffic scheduling [7]. The network may be a LAN, WAN, or MAN, wired or wireless,
terrestrial or extra-terrestrial, of the same kind or hybrid. The proposed method uses
GA to avoid delay in case of congestion, irrespective of the type of network.
Further, the predictive technique is applied to avoid delays by marking the
intermediate nodes used as routers and ranking them according to their busy/available
bandwidths. This helps us predict the availability of bandwidth in advance.
No doubt, the predictions may sometimes provide values that are not reached in
practice, but they give fair values based on which routing/scheduling of packets can be
done effectively [8].
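The food-trail mechanism just described (probabilistic path choice in proportion to pheromone, evaporation, and reinforcement of short tours) can be sketched in a few lines; the toy topology, evaporation rate, and iteration count are illustrative choices, not the paper's simulation setup:

```python
import random

# Toy topology: two routes from "S" to "T" with different total delays.
graph = {"S": ["A", "B"], "A": ["T"], "B": ["T"]}
delay = {("S", "A"): 1, ("A", "T"): 1,   # short route, total cost 2
         ("S", "B"): 5, ("B", "T"): 5}   # long route, total cost 10
pheromone = {e: 1.0 for e in delay}

def walk():
    """One ant walks S -> T, choosing edges in proportion to pheromone."""
    node, path = "S", []
    while node != "T":
        nxt = random.choices(graph[node],
                             weights=[pheromone[(node, n)] for n in graph[node]])[0]
        path.append((node, nxt))
        node = nxt
    return path

random.seed(1)
for _ in range(200):
    path = walk()
    cost = sum(delay[e] for e in path)
    for e in pheromone:            # evaporation: old trails fade
        pheromone[e] *= 0.95
    for e in path:                 # reinforcement: shorter tours deposit more
        pheromone[e] += 1.0 / cost

best = max(graph["S"], key=lambda n: pheromone[("S", n)])
print(best)  # pheromone concentrates on the short route via "A"
```

The positive feedback (more pheromone attracts more ants, which deposit more pheromone) combined with evaporation is exactly why the longer trails in Fig. 1 fade while the shortest one survives.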

2 Proposed Methodology

It is observed that when more than two users use a network, network scheduling is
required, and the resource-sharing algorithm comes into the picture. To date, many
researchers have tried to apply various algorithms to resource-sharing problems. But
networks have kept growing throughout the globe, and hence it has become complex
to resolve the dispute of network resource sharing. Everyone needs to find the best
optimal network resource-sharing policy across the world.

Link Scheduling in Wireless Mesh Network Using Ant … 343

Fig. 1 Ants colony optimization technique for finding the shortest path using pheromones
A brief comparison of the scheduling mechanisms with other similar methods is given
in Table 1. The scheduling methods considered are divided into two main types,
node scheduling and link scheduling. A further comparative analysis is made on the
network type, whether centralized or distributed in nature, along with the interference
threshold, the medium of access, and whether a single or multi-channel medium is
used (Table 1).

Table 1 Comparative analysis using scheduling mechanism

Scheduling type   Centralized/distributed   Interference                       Medium access   Single/multi-channel   Author
Node              Centralized               Threshold-based                    STDMA           Multi-channel          Koutsonikolas et al. [9]
Node              Distributed               Decrease multi-user interference   TDMA            Single channel         Wang et al. [10]
Link              Centralized               Physical model                     STDMA           Single channel         Tran and Hong [11]
Link              Distributed               N/A                                STDMA           Multi-channel          Salem and Hubaux [12]

2.1 Application of Genetic Algorithms to Wireless Mesh Network

Selecting a Network: First, we chose a small network of 110 computers within our
institute. We then experimented on networks from 200+ nodes up to 100,000 nodes,
available virtually on the NS2 simulator [13]. Later, we ran the proposed algorithm
on a real-time network too. Results differed between the simulator and the real-time
environment due to clock synchronization issues: global and local timestamps on the
packets varied greatly during network testing. It is observed from the experiments
that several real-time problems are not incorporated in the simulated environment,
e.g., loss of connectivity during transmission, network bandwidth going down due to
physical infrastructure loss, and other such conditions that occur only in real-time
systems and cannot be augmented.
The parameters defined for the NS2 simulator are shown in Table 2.
Creating a Test Packet Acting Like a Pheromone: The so-called Pheromone packet is
a test packet that always moves through the network at a predefined interval and
finds out the busiest routes along the network [14]. Such a packet is nothing but the
analog of the chemical Pheromone used by ants. Its timestamp acts as a Pheromone
clock, which helps to know the exact condition of the network links and their
availability at a given time. The data received helps achieve better scheduling of the
wireless mesh network.

Table 2 NS2 setup parameter details

Parameter                 Symbol   Value
Transmission power        P        10 mW
Noise power               N0       −90 dBm
Communication threshold   γc       20 dB
Interference threshold    γi       10 dB
Frame duration            Tf       10 ms
Path loss exponent        β        4
Area covered              R × R    886 × 886 m²

Proposed Algorithm:

Step 1: Fix the network parameters: network type, bandwidth, data intervals, test
        packets, and test-packet intervals.
        Δ → n1, n2, n3, …, nm (number of nodes)
        δ → 5 (routers/gateways)
        S(n) → S{Ø} = S1, S2, S3, …, Sn (n ≥ 25)
        G(V, E) → E(G) = (Δ1, Δ2, Δ3, …, Δn)
        Pheromone packets P (P = 1, 2, 3, …, n)
Step 2: Sort G(V, E) in ascending or descending order.
Step 3: Create traffic {S1, Δ, δ1, δ2, δ3, δ4, δ5, P = Δ2}.
Step 4: Deploy the Pheromone packets and check the link status periodically.
Step 5: Compute and find the busiest route and the available route for the given
        time-slot value of S.
Step 6: Rank the routers and links according to their busyness and availability for
        the given time.
Step 7: /* Ranking gives predictive awareness and a self-deciding system to
        route/reroute the packets for better bandwidth management; hence effective
        scheduling is achieved [15]. */
Step 8: Repeat and compute for all G(V, E) across all Δ nodes; increment S = S + 1,
        Δ = Δ + 1 for all values of S.
Step 9: End.

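Steps 4 to 6 of the algorithm (probe, measure, rank) can be sketched as follows; the probe round-trip times are simulated here, whereas in the real system they come from timestamped Pheromone packets on the NS2 or live network:

```python
import random

routers = ["R1", "R2", "R3", "R4", "R5"]
busyness = {"R1": 0.2, "R2": 0.9, "R3": 0.5, "R4": 0.1, "R5": 0.7}  # simulated load

def probe(router: str) -> float:
    """Round-trip time (ms) of one Pheromone (test) packet through a router.
    Simulated: a busier router answers more slowly."""
    return 1.0 + 10.0 * busyness[router] + random.uniform(0, 0.5)

random.seed(0)
rtt = {r: probe(r) for r in routers}        # Step 4: deploy probes, record timings
ranking = sorted(routers, key=rtt.get)      # Step 6: rank routers by availability
print(ranking[0])                           # least busy router: schedule traffic here first
```

The ranking is the raw material for the predictive step: once refreshed at every probe interval, the scheduler can route the next batch of packets around the routers at the bottom of the list.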

3 Results and Discussions

Once Conditions A and B are satisfied, we initiate the system to run our small program,
in which the packet is transmitted on the available network. Once a batch of successful
transmissions is achieved from more than five nodes, the mean time required for the
packets to reach the destination is calculated. Accordingly, we varied the batches of
data to be sent and tested the result. After every batch of packets, a Pheromone packet
is transmitted to check the network availability. The results received are tabulated in
Table 3. They show that the Pheromone packet provides insight into the network
traffic, from which the network's busyness is obtained. Once the traffic is analyzed, it
becomes easier to rank the intermediate routers and the available free paths. For a
given packet size, data size, available bandwidth, number of nodes, and number of
routers, the scheduling can thus be done using Pheromone packets.

Table 3 Comparative analysis using NS2, LAN, and WAN


N/W type No. of nodes No. of routers Time required by pheromone packet (ms)
NS2 25 5 01
50 5 01
100 5 02
110 5 03
200 5 03
LAN 110 5 01
200 5 03
WAN 50,000 5 700
100,000 5 1300

In the above technique, we fixed the bandwidth at 10 MHz for all types of networks.
This helps us define a uniform solution to network scheduling problems. For a more
significant analysis and comparison, the experiment is run on 110 nodes executed on
NS2. A minimum signal-to-noise ratio is always expected for a given test, as the
impact of interference may lead to a more marginal output than expected [16].
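With the bandwidth fixed at 10 MHz and the 20 dB communication threshold of Table 2, the Shannon capacity C = B log2(1 + SNR) gives an upper bound for a single interference-free link; a quick check (treating the threshold as the operating SNR is a simplifying assumption):

```python
import math

B = 10e6                      # channel bandwidth, Hz (the 10 MHz fixed above)
snr_db = 20                   # communication threshold from Table 2, dB
snr = 10 ** (snr_db / 10)     # 20 dB -> linear ratio of 100

C = B * math.log2(1 + snr)    # Shannon capacity, bits/s
print(f"{C / 1e6:.1f} Mbit/s")  # upper bound for an interference-free link
```

Any interference that pushes the SNR toward the 10 dB interference threshold cuts this bound sharply, which is why the observed output is marginal under interference.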

3.1 Application of Predictive Analysis to Wireless Mesh Network Using
Pheromone-Based Table Values for Scheduling

We deployed numerous Pheromone packets across the network in proportion to the
network's size, as shown in Table 4. These packets impersonate ants laying down the
Pheromone chemical across the network space, trying to find the shortest route for
better scheduling and faster access [13, 17]. If the time taken by the Pheromone
packets to travel and reach their destinations is observed closely, it is quite clear that
the insight gained can help tune the
Table 4 The varying number of pheromone packets versus the size of the network

No. of nodes   Number of pheromone packets
               NS2    LAN   WAN
25             50     50    0
50             100    100   0
100            200    200   0
110            220    220   0
200            400    400   0
500,000        5000   0     5000

Fig. 2 Comparison of throughput value of the proposed method with existing algorithms

scheduling of network traffic effectively. In the first experiment, the NS2 simulator
has been used with 25, 50, 100, 110, 200, 50,000, and 100,000 nodes. The number of
Pheromone packets is zero here, as we want a baseline for comparison; this gave the
throughput values shown in Fig. 2. The second experiment slot relates to the LAN
network. Here only two slots, with 110 and 200 nodes, are experimented with due to
infrastructure constraints; the number of Pheromone packets is 10% of the total nodes,
and the throughput values are shown in Fig. 2.
Similarly, the experiment with the WAN-based system was unique: here the numbers
of nodes were 50,000 and 100,000, respectively, and the Pheromone packets were
10% of the actual node value. The resultant throughput is shown in Fig. 2.
Based on the Pheromone packets' travel times from source to destination, the system
gets auto-tuned to the shortest path, i.e., the one with the lowest congestion value
or the highest bandwidth availability [18]. Similarly, from Fig. 3 we can conclude
that the proposed algorithm's resultant scheduling performance is far better than that
of the previously used algorithms, owing to the Pheromone-based Ant Colony
Optimization algorithm [5]. The scheduling performance is considerably increased
compared to greedy algorithms, and the system performs better as the number of nodes
increases. The proposed algorithm is also found to have good time complexity [19].
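The predictive ranking can be as simple as an exponentially weighted moving average over successive Pheromone probe delays, so that a link's rank reflects its recent trend rather than a single noisy sample; the smoothing factor and the delay traces below are illustrative:

```python
def ewma(samples, alpha=0.3):
    """Exponentially weighted moving average of probe delays (ms).
    Recent samples dominate, so the estimate tracks rising congestion."""
    est = samples[0]
    for s in samples[1:]:
        est = alpha * s + (1 - alpha) * est
    return est

# Probe-delay traces (ms) for two links: A is steady, B is getting congested.
link_a = [2.0, 2.1, 1.9, 2.0, 2.1]
link_b = [2.0, 2.5, 3.5, 5.0, 7.0]

pred = {"A": ewma(link_a), "B": ewma(link_b)}
best = min(pred, key=pred.get)
print(best)  # schedule the next batch over link A
```

Even though link B's first probe matched link A's, its rising trend dominates the estimate, so the scheduler shifts traffic to A before B's congestion peaks.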

Fig. 3 Comparative scheduling performance of the proposed algorithm with existing algorithms

4 Conclusion

Network scheduling is an issue when traffic is generated, and it becomes difficult
to schedule at an optimal level due to bottlenecks. Therefore, our experiments have
proposed Ant Colony Optimization-based network scheduling. Figure 2 shows that
the highest throughput value is achieved when the network uses the Pheromone
packet analysis and tunes itself to schedule the network routes. Keeping the number
of Pheromone packets at 10% avoids the network congestion that would otherwise be
caused by the test packets themselves. Any value below ten percent was not found
to yield results, as the network is very dynamic and executes at higher rates.
Therefore, at regular micro intervals, we need to send Pheromone packets to obtain
the network status and get the shortest path for effective and predictive network
scheduling.

References

1. Simaribba, O., et al.: Robust STDMA, scheduling in multi-hop wireless networks for single
node position perturbation, pp. 566–571. IEEE (2009)
2. Brar, G., et al.: Computationally efficient scheduling with the physical interference model for
throughput improvement in wireless mesh networks. In: Proceeding MobiCom’06 Proceedings
of the 12th Annual International Conference on Mobile Computing and Networking, pp. 2–13
(2006)
3. Mang, K.F., et al.: Genetic algorithms, concept and application in engineering design. IEEE
Trans. Ind. Eng. 1, 519–534 (1996)
4. Martins, D., et al.: Classification with ant colony optimization. IEEE Trans. Evol. Comput. 11,
651–665 (2007)

5. Goyal, M., Agrawal, M.: Optimize workflow scheduling using hybrid ant colony optimization
and particle swarm optimization algorithm in cloud environment. Int. J. Adv. Res. Ideas Innov.
Technol. (IJARIIT) 181–189 (2017)
6. Nayyar, A. et al.: Ant colony optimization—computational swarm intelligence technique. In:
3rd International Conference on Computing for Sustainable Global Development, pp. 392–398
(2016)
7. Bruno, R., et al.: Mesh networks: commodity multihop adhoc networks. IEEE Commun. Mag.
123–131 (2005)
8. Nayyar, A., et al.: Ant colony optimization—computational swarm intelligence technique. In:
3rd International Conference on Computing for Sustainable Global Development, pp. 392–398
(2016)
9. Koutsonikolas, D., Das, S.M., Hu, Y.C.: An interference-aware fair scheduling for multicast in
wireless mesh networks. J. Parallel Distrib. Comput. 68, 372–386 (2008)
10. Wang, K., Chiasserini, C.F., Rao, R.R., Proakis, J.G.: A distributed joint scheduling and power
control algorithm for multicasting in wireless ad hoc networks. In: Proc. of IEEE Int. Conf. on
Communications, pp. 725–731 (2003)
11. Tran, N.H., Hong, C.S.: Fair scheduling for throughput improvement in wireless mesh networks.
pp. 1310–1312
12. Salem, N.B., Hubaux, J.-P.: A fair scheduling for wireless mesh networks. In: Proc. of 1st IEEE
Workshop on Wireless Mesh Networks (WiMesh) (2005)
13. Manickavasagan, V., et al.: Online resource scheduling using ants colony optimization for cloud
computing. Int. J. Eng. Sci. Comput. (IJESC) 5430–5432 (2017)
14. Hsiao, Y.T., Chuang, C.L.: Ant colony optimization for best path planning. In: IEEE Inter-
national Symposium on Communication and Information Technology, ISCIT, pp. 668–678
(2004)
15. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Technol. J. 623–656
(1948)
16. Jain, K., et al.: Impact of interference on multi-hop wireless network performance. In: ACM
Proceedings on Network, pp. 66–80 (2003)
17. Adubi, S.A., et al.: A comparative study on the ant colony optimization. In: 2014 IEEE 11th
International Conference on Electronics, Computer and Computation, (ICECCO), pp. 215–228
(2014)
18. Luo, W., Lin, D., Feng, X.: An improved ant colony optimization and its application on TSP
problem. In: IEEE International Conference on Internet of Things and IEEE Green computing
and Communications (greencom) and IEEE Cyber, Physical And Social Computing (cpscom)
and IEEE Smart Data (smart data), pp. 175–188 (2016)
19. Gore, A.D., et al.: Link scheduling algorithms for wireless mesh network. IEEE Commun.
Surv. Tutorials 13(2), 258–273 (2011)
Development of an Integrated Security
Model for Wireless Body Area Networks

K. R. Siva Bharathi and R. Venkateswari

Abstract Security is a critical perspective to be considered while designing any
network. Wireless Body Area Networks (WBANs) are networks that provide an
information-based diagnosis of diseases, thus enabling early treatment. Once
intruders enter the network, the entire network becomes compromised. This paper
proposes an integrated security model that utilizes biometric and digital signature
applications to overcome intruder attacks and enhance security, thus making the
network reliable and stable. Experimental results indicate that the proposed method
achieves a higher level of security and stability, leaving behind an optimal residual
energy of 20 J with the number of dead nodes below 50 for 2000 rounds of data
communication.

Keywords Attacks · Biometric authentication · Cryptographic protocols · RSA ·
HMAC-SHA · Security

1 Introduction

Wireless Body Area Networks (WBANs), wireless networks comprising numerous
wearable and computing devices, are categorized under special-purpose Wireless
Sensor Networks (WSNs) for remotely monitoring and tracking people's health care.
These networks' functionality depends on the communication of these wearable or
external devices through the internet. WBANs play a crucial role in keeping track of
health parameters such as sugar level, blood pressure, heartbeat rate, and cholesterol
levels. WBANs involve several physiological sensors implanted in the human body,
enabling continuous tracking and monitoring of the person's health parameters. With
advancements in wireless technology, sensors, and communication, WBANs can
provide cost-effective healthcare to society.

K. R. Siva Bharathi (B)
Department of ECE, Sri Krishna College of Engineering and Technology, Coimbatore, India
R. Venkateswari
Department of ECE, PSG College of Technology, Coimbatore, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 351
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_34

A typical WBAN node consists of a sensor or actuator, a controller, a power unit,


memory, and a transceiver device. Since WBAN nodes are resource-constrained, it
necessitates simple cryptographic primitives to compensate for the computing power
and save energy. IEEE 802.15.6 is a widely accepted standard for WBAN, including
seven different security protocols [1]. IEEE 802.15.6 defines three PHY layers, the
Narrowband (NB), the Ultrawideband (UWB), and Human Body Communication
(HBC) [2], and it also ensures three different levels of security.
Attacks may occur in all the layers of the OSI model. These attacks cause
very severe security issues and degrade the performance of the network. Some of
the attacks include jamming, tempering, collision, Hello flood attacks, Wormhole
attacks, Blackhole attacks, Sybil attacks, flooding, Desynchronization attacks, and
DoS attacks [3]. Besides these attacks, WBANs are prone to several other security threats.
Moreover, the data stored within WBAN face threats due to node or device compro-
mise and network dynamics [4]. These facts enable numerous challenges in WBAN,
such as ensuring data quality, data management, data consistency, data originality,
and Data privacy for secure communication.
A study on the WBAN security mechanisms explains the various cryptographic
protocols and key agreement protocols proposed for Wireless Body Area Networks’
security. A 1-round anonymous authentication protocol [5] establishes security to the
transmitted and stored data, prone to further diagnosis and treatment. Elliptic curve
cryptography and bilinear mapping techniques enhance communication between
first- and second-level nodes, thus improving anonymity [6]. After client validation,
a Certificate Authority (CA) issues certificates to the clients and maintains a pool
of them [7].
Various IBC-based systems are proposed wherein a client uses his identity as a
public key, and the key generation center generates a private key [8]. Partial private
keys are generated based on the master key and user identity by the Key Generation
Center as an alternative to IBC systems [9]. Reference [10] provides a complete
survey and analysis of various authentication schemes implemented in WBANs
such as cryptography-based authentication, hash-based authentication, anonymous
authentication, and biometric-based authentication.
A trusted third party that generates user secret keys is proposed in [11]; it identifies
whether a client is valid, thus enhancing secure data transfer. An authentication
scheme for a network with three elements, namely a Medical WBAN Sensor
(MWS), a Personal Device (PD), and a Medical Server, with three phases, is explained
in [12]. A hybrid security scheme using physiological values and preloading of the
secret key is described in [13], which spins around the biokeys to update the master
key of the hub node. Reference [14] explains an identity-based authentication scheme using
Elliptic Curve Cryptosystems. This method affords client-based anonymity along
with mutual authentication.
The scheme implemented in [15] resolves replay attack problems and provides mutual
authentication using Elliptic Curve Cryptography. An efficient anonymous
authenticated key agreement scheme is explained in [16]. A reduced computation
cost is attained during the authentication phase as the session keys are generated and
kept secretly during the registration phase.
Development of an Integrated Security Model for Wireless … 353

Fig. 1 A three-level architecture of WBAN

A lightweight anonymous mutual authentication and key agreement scheme is
proposed in [17]. In this scheme's authentication phase, a sensor node authenticates
with the hub node and establishes anonymous mutual authentication by generating a
session key. A secure and energy-efficient framework for WBANs that reduces
communication overhead and improves energy efficiency is explained in [18], with
security measures to
safeguard the medical data against malicious nodes to enhance integrity and privacy.
The general three-tier architecture is illustrated in Fig. 1. Section 2 elucidates the
proposed methodology, Sect. 3 describes the results and discussions, and Sect. 4
concludes the paper.

2 Proposed Methodology

WBANs are usually deployed in a limited area to collect sensitive data, including
patient identity and health status, and transmit it over an open network. This sensitive
information is captured in the same way from numerous patients and directed to
a centralized data center. This increases the network burden and exposes the data to
innumerable security threats in the open network. In this paper, a practical methodology is
proposed to reduce network traffic and fortify network security.
A random WBAN network is generated initially, and then a clustering approach
is applied to the sensor nodes in the network. Each node in the cluster transmits its
data to the cluster head, usually a border router from which the data is communicated
to the centralized node where it is analyzed. The clustering approach thus avoids
every node in the network communicating directly with the centralized node, reducing
network traffic. Once this is done, an integrated hybrid security model is applied to
this clustered architecture to provide a high-security level to the network. The steps
involved in generating an integrated secure WBAN model are illustrated in Fig. 2.

2.1 Generation of Random WBAN Network

Generating a random WBAN architecture involves deploying body sensors on
patients, athletes, etc. Sensor-deployed patients or athletes can be observed in parallel

Fig. 2 Integrated secure clustered WBAN architecture

Table 1 Features of a WBAN network

WBAN features             Capacity
Individual WBAN size      10 (max)
Total no. of WBAN nodes   100
Network topology          Random
Total WBAN energy         Sum of all body sensors

within the coverage region. The outstanding features of a WBAN network are listed
under Table 1.

2.2 Building a Balanced Clustered Architecture

Clustering plays a vital role in reducing the network load. In general, WBANs are
resource-constrained networks with limited node energy and sensing range. Hence,
clustering helps improve the network lifetime and thus the stability of the
network. Each patient is considered a miniature WBAN network or a single
cluster. Every patient has a border-router sensor that collects information
from all the nodes on his body and communicates it to the centralized node. The
network is generated, and each node is checked for energy and node probability.
A particular node is selected as the cluster head if it satisfies the energy and node
probability thresholds. Further, the nodes that fall within the selected cluster head's
coverage become cluster members. Communication is initiated between the nodes
in the cluster and the border router for reliable data delivery to the centralized node.
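The cluster-head selection and member assignment described above can be sketched as follows. The node fields, the LEACH-style random-draw rule for the node probability, and all threshold values are illustrative assumptions, not details taken from the paper:

```python
import math
import random

def select_cluster_heads(nodes, energy_threshold, p_threshold, coverage):
    """Pick cluster heads and assign members (sketch of Sect. 2.2).

    nodes: list of dicts with 'x', 'y', 'energy' (field names are
    assumptions). A node becomes a cluster head when its residual
    energy meets the energy threshold and a random draw falls within
    the node probability threshold (a LEACH-style rule, assumed here);
    the remaining nodes join the nearest head inside the coverage radius.
    """
    heads = [n for n in nodes
             if n['energy'] >= energy_threshold
             and random.random() < p_threshold]
    members = {id(h): [] for h in heads}
    for node in nodes:
        if any(node is h for h in heads):
            continue
        # Join the closest cluster head within coverage, if any.
        best, best_d = None, coverage
        for head in heads:
            d = math.hypot(node['x'] - head['x'], node['y'] - head['y'])
            if d <= best_d:
                best, best_d = head, d
        if best is not None:
            members[id(best)].append(node)
    return heads, members
```

Each cluster head then acts as the border router, forwarding its members' data to the centralized node.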

Table 2 Obtaining palm/thumb image

2.3 Applying Integrated Security Model

A triad of security levels is applied to incorporate high security into the generated
clustered architecture. The first level involves biometric authentication of whoever
requests access to the network at any level. The second level uses a cryptographic
algorithm to ensure security at the individual cluster level. The third level involves
security at the data level. In this work, the cryptographic algorithm RSA is applied
as the second-level security, and HMAC-SHA-256 is used as the security
primitive for the third level.
Several biological features of human beings can be used for biometric authentication
[19–21]. The palm/thumb image is used here for access to the network. The
procedure for obtaining and processing the biometric feature is illustrated in
Table 2.
The palm/thumb biometric features of the medical practitioners, patients, athletes,
and others involved in the network are taken beforehand and stored in the network
database. Once the biometric matches the stored identity, access is granted to
the network. After network access is granted, one has to choose the miniature
network for accessing the data. At this level, the RSA algorithm is implemented as
the next level of security. The Cluster Head (CH) acts as the controller node
of that miniature network and performs key generation, distribution, and management.
The algorithm for this digital signature-specific authentication is illustrated in
Table 3.
Once the public and private key pairs are generated, the message block can be
encoded using (p, e) on the WBAN node. The encryption process can be exemplified
as

E_Txt = (Txt)^e mod p (1)

where (p, e) is the public key pair, Txt is the original message block, and E_Txt is
the generated encoded message.
This E_Txt is transmitted to the cluster head by the WBAN node. With the private
key it possesses, the cluster head decodes the message, which can be exemplified as

Table 3 Level 2 security—RSA

Txt = (E_Txt)^x mod p (2)

where (p, x) is the private key pair, E_Txt is the encoded message, and Txt is the
retrieved message block.
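Equations (1) and (2) correspond to textbook RSA and can be sketched as below. Note that the paper writes the modulus as p (conventionally written n); the small primes, the fixed exponent, and the absence of padding make this an illustration only, not a secure implementation:

```python
def make_keys(q1=61, q2=53, e=17):
    """Generate a toy RSA key pair matching Eqs. (1)-(2).

    The paper's modulus p is the product of two primes; q1, q2, and e
    are small illustrative values (real deployments use large primes
    and padded messages).
    """
    p = q1 * q2                      # modulus (3233 for the defaults)
    phi = (q1 - 1) * (q2 - 1)        # Euler's totient
    x = pow(e, -1, phi)              # private exponent: e*x = 1 (mod phi)
    return (p, e), (p, x)            # public pair, private pair

def encrypt(txt, public):
    # Eq. (1): E_Txt = Txt^e mod p
    p, e = public
    return pow(txt, e, p)

def decrypt(e_txt, private):
    # Eq. (2): Txt = E_Txt^x mod p
    p, x = private
    return pow(e_txt, x, p)
```

The modular inverse via `pow(e, -1, phi)` requires Python 3.8+; the message block must be an integer smaller than the modulus.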
To incorporate the third level of security in our network, we use the
Hash-based Message Authentication Code (HMAC) with the SHA-256 cryptographic
algorithm. The message block data is encoded and decoded using the Secure Hash
Algorithm (SHA-256), which yields a high level of security for the data.
HMAC is a Message Authentication Code that utilizes a cryptographic hash function
and a secret key. HMAC does not encrypt the message; instead, the message is
transmitted alongside its hash. Nodes holding the secret key compute the hash
themselves, and if the message is authentic, the hashes will match. Inner and
outer padding is applied to derive a block-sized key. The HMAC
function is defined [22] as

HMAC(x) = F((k' ⊕ opad) ∥ F((k' ⊕ ipad) ∥ x)) (3)

where F is the cryptographic hash function, k is the secret key, k' is the block-sized
key derived from the secret key, x is the message to be authenticated, ⊕ is the
bit-wise XOR operation, ∥ is concatenation, opad is the outer padding, and ipad is
the inner padding.
With HMAC, the communicating nodes that share the secret key can produce the
hash and access the required data. The above-elucidated methods are applied to
the balanced clustered network to accomplish a high level of security in the
network. The simulation environment and the results obtained are discussed in the
next section.
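Equation (3) is what Python's standard `hmac` module computes internally, so the third security level can be sketched as follows; the shared key and message values are illustrative assumptions:

```python
import hashlib
import hmac

SECRET_KEY = b"shared-cluster-key"   # illustrative pre-shared key k

def tag_message(message: bytes) -> bytes:
    """Compute HMAC-SHA-256 over the message per Eq. (3).

    hmac.new derives the block-sized key k' and performs the inner
    (ipad) and outer (opad) hash passes internally.
    """
    return hmac.new(SECRET_KEY, message, hashlib.sha256).digest()

def verify_message(message: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(tag_message(message), tag)
```

The sender transmits `(message, tag_message(message))`; the receiver accepts the data only if `verify_message` succeeds.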

Fig. 3 Deceased node analysis

3 Results and Discussions

The proposed integrated security model is designed to improve the performance and
stability of the clustered WBAN. It also provides authenticated and secure communication.
The network's performance is analyzed using deceased node analysis,
packet communication analysis, energy residue analysis, and network scalability
analysis. Biometric security is used to provide overall security, RSA is the asymmetric
cryptographic method for node-level authentication, and HMAC-SHA-256
is used to attain safe symmetric communication over the network. Transmission is
made for 500, 1000, 1500, and 2000 rounds at different periods for the different models,
and the obtained results are illustrated and discussed below. Network reliability is
analyzed based on the number of nodes effectively performing communication in
the network. Figure 3 shows the comparative performance in terms of the existence
of dead nodes in the network.
Improving the transmission rate is the ultimate aim of any network. Communica-
tion failures are widespread in the clustered network as intermediate malicious nodes
occur in the network. Figure 4 shows the comparative performance of the network
in terms of packet communication.
The above graphs and the tabulation in Table 4 show that the proposed integrated
security model effectively enhances network performance and scalability.
The comparison of the proposed method with similar models is summarized in
Table 5.

4 Conclusions

WBANs suffer from numerous security attacks, both external and internal. Since
these networks deal with life-critical sensitive data, they require strict security principles.
In this paper, a three-level integrated security model is proposed
and analyzed. Biometric security, the RSA cryptographic algorithm, and HMAC-SHA

Fig. 4 Packet communication analysis

Table 4 Residue energy analysis

No. of rounds   Residual energy (J)
                SC_WBAN   SC_WBAN_RSA   Proposed WBAN
500             28.058    48.625        50.31
1000            8.658     30.659        40.896
1500            2.123     20.586        25.636
2000            0         15.656        20.235

Table 5 Comparison of the performance metrics of the proposed model with existing methods

Metric               SC_WBAN                               SC_WBAN_RSA   Proposed model
Deceased nodes       Highest                               Medium        Lowest
Packet transmission  Minimum                               Medium        Maximum
Residual energy      Minimum; drops to 0 at 2000 rounds    Average       Maximum

algorithms are used to accomplish security in this model at the identity, node, and
data levels. The network is simulated for 500, 1000, 1500, and 2000 cycles to
evaluate its performance and reliability. Our future work focuses on extending the
model to satisfy the demanding security requirements of heterogeneous applications.

References

1. Toorani, M.: Security analysis of the IEEE 802.15.6 standard. Int. J. Commun. Syst. 29(17),
2471–2489 (2016)
2. Kwak, K.S., Ullah, S., Ullah, N.: An overview of IEEE 802.15.6 standard. In: International
Symposium on Applied Sciences in Biomedical and Communication Technologies (ISABEL) (2010)
3. Niksaz, P., Branch, M.: Wireless body area networks: attacks and countermeasures. Int. J. Sci.
Eng. Res. 6(9), 556–568 (2015)
4. Li, M., Lou, W., Ren, K.: Data security and privacy in wireless body area networks. IEEE
Wirel. Commun. 17(1) (2010)
5. Liu, J., Zhang, L., Sun, R.: 1-raap: an efficient 1-round anonymous authentication protocol for
wireless body area networks. Sensors 16(5), 728 (2016)
6. Liu, J., Zhang, Z., Chen, X., Kwak, K.S.: Certificateless remote anonymous authentication
schemes for wireless body area networks. IEEE Trans. Parallel Distrib. Syst. 25(2), 332–342
(2014)
7. Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public
key cryptosystems. Commun. ACM 21(2), 120–126 (1978)
8. Choi, K.Y., Hwang, J.Y., Lee, D.H., Seo, I.S.: ID based authenticated key agreement for low
power mobile devices, pp. 494–505. Springer (2005)
9. Al-Riyami, S.S., Paterson, K.G.: Certificateless public key cryptography, pp. 452–473. Springer
(2003)
10. Masdari, M., Ahmedzadeh, S.: Comprehensive analysis of authentication methods in wireless
body area networks. Secur. Commun. Netw. 9(17), 4777–4803 (2016)
11. He, D., Zeadally, S., Kumar, N., Lee, J.H.: Anonymous authentication for wireless body area
networks with provable security. IEEE Syst. J. 11(4), 2590–2601 (2017)
12. Yeh, C.K., Chen, H.M., Lo, J.W.: An authentication protocol for ubiquitous health monitoring
systems. J. Med. Biol. Eng. 33(4), 415–419 (2013)
13. Koya, A.M., Deepthi, P.: Anonymous hybrid mutual authentication and key agreement scheme
for wireless body area networks. Comput. Netw. 140, 138–151 (2018)
14. Zaho, Z.: An efficient anonymous authentication scheme for wireless body area networks using
elliptic curve cryptosystems. J. Med. Syst. 38(2), 13 (2014)
15. Shankar, S.K., Tomar, A.S., Tak, G.K.: Secure medical data transmission by using ECC with
mutual authentication in WSNs. Procedia Comput. 70, 455–461 (2015)
16. Li, T., Zheng, Y., Zhou, T.: Efficient, anonymous, authenticated key agreement scheme for
wireless body area networks. Secur. Commun. Netw. (2017)
17. Li, X., Ibrahim, M.H., Kumari, S., Sangaiah, A.K., Gupta, V., Choo, K.K.R.: Anonymous
mutual authentication and key agreement scheme for wearable sensors in wireless body area
networks. Comput. Netw. 129, 429–443 (2017)
18. Saba, T., Haseeb, K., Ahmed, I., Rehman, A.: Secure and energy efficient framework using
internet of medical things for e-healthcare. J. Infect. Public Health 13(10), 1567–1575 (2020)
19. Jain, A.K., Ross, A., Prabhakar, S.: An introduction to biometric recognition. IEEE Trans.
Circuits Syst. Video Technol. 14(1) (2004)
20. Delac, K., Grgic, M.: A survey of biometric recognition methods. in: ELMAR, pp. 16–18.
(2004)
21. Bhattacharya, D., Ranjan, R., Alisherov, S., Choi, M.: Biometric authentication: a review. Int.
J. u- and e-Service Sci. Technol. 2(3) (2009)
22. Canale, M.: Comparison of authentication schemes for wireless sensor networks as applied to
secure data aggregation. (2010)
An Improved Node Mobility Pattern in Wireless Ad Hoc Network

Manish Ranjan Pandey, Rahul Kumar Mishra, and Arvind Kumar Shukla

Abstract This paper reports an improved process for an optimized and effective
node management model for mobile wireless ad hoc networks. The improved
technique is based on optimized routing and route maintenance of the network. The proposed
method aims to overcome the problem of node movement during
the routing process. The Mobility Models' performance has been estimated using parameters
like Packet Delivery Ratio (PDR), Average Latency, and Throughput using
NS-3.0.

Keywords MANET · Mobility models · Mobility sample · NS-3.0 · Performance
constraint · BonnMotion 2.0

1 Introduction

The main concern in an ad hoc wireless network is ad hoc routing, because of the
network's ad hoc nature: a dynamic (frequently changing) topology, shared-medium
limited bandwidth, multihop character, etc. There is a need for an efficient
mobility management scheme. Node mobility models are frequently used for simulation
purposes when new communication or routing methods are designed. Node
mobility means that nodes in the network are free to travel in any
direction. This free movement can cause links between nodes to change quite
frequently, so the topology is dynamic and irregular. Access to data held by
freely traveling nodes is essential for the normal working of an ad hoc wireless network.
Constructing and maintaining links between nodes is a challenging task and
a hot research topic in ad hoc networks, so an improved node management
scheme is needed.
In mobility management, there are two directions of research. One approach is
designing a new mobility model that captures a new class of mobility. Another
is to enhance mobility handling by tuning routing protocol parameters such
as delay, jitter, and throughput. The network routing protocols are affected by

M. R. Pandey · R. K. Mishra · A. K. Shukla (B)
Department of Computer Applications, IFTM University, Moradabad, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 361
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_35

node movement, link failures, bit error rate degradation, increased routing
overhead, etc. As mobile nodes' velocity increases, the number of mobile nodes
within any transmission range decreases [1–3].
This paper is focused on enhanced node mobility patterns in wireless ad hoc
networks. It aims to define an effective node movement pattern based on
Random Waypoint, one of the efficient node mobility management models for wireless
ad hoc networks. The ultimate purpose of designing a mobility model is to depict the movement
patterns of people in motion and to calculate how their velocity, location, and acceleration
change over time. Ideally, mobility models should capture the movement pattern of
the targeted practical application in a realistic way. Movement patterns play a vital role
in determining protocol performance. When evaluating a wireless ad hoc network
protocol, it is essential to pick a suitable underlying mobility management model
[4, 5]. For instance, nodes under the Random Waypoint Model behave quite differently
from nodes in a cluster or group model. It is not appropriate to use the Random
Waypoint Model to evaluate applications where nodes tend to move together.
Therefore, a method is needed to improve the proper understanding of mobility
management models and their effect on protocol performance.
This paper is organized as follows: the problem statement is discussed in Sect. 2,
the proposed model is reported in Sect. 3, and Sect. 4 reports the adopted methodology.
The results and discussions are presented in Sect. 5, and Sect. 6 concludes the
paper.

2 Problem Statement

The node movement pattern is the main problem in wireless ad hoc networking
and plays an essential role in the available throughput, PDR, and Quality of Service
(QoS). The function of a mobility model is to express a common node
movement pattern so that analysis for these purposes can be made
with respect to that model. Thus, node mobility plays a vital role in the
overall performance evaluation of ad hoc wireless networks. The most frequently
used mobility model is the Random Waypoint mobility model, so the next subsection
explains this model. It has been shown why it is not
appropriate for modeling the motion of human beings or means of transportation. Therefore,
new mobility models are very much needed.

2.1 Random Waypoint Mobility Model Applied

The Random Waypoint Mobility Model (RWP) includes pause times between
changes in direction and speed. A node starts by staying in one location
for a specified time (pause time). As soon as this time expires, the node chooses
a random destination inside the simulation region and a velocity uniformly
distributed between [least-speed, utmost-speed]. Then, it travels toward the newly
chosen destination at the selected speed. Upon arrival, the node pauses for a
designated duration before beginning the process again. However, since
its behavior is independent of previous movement (memoryless), it creates
very impractical, non-realistic displacements. The traveling pattern followed
by a node using this mobility management model can be plotted. The
mobility of RWP continuously causes topology change [1, 4–7]. The pause time
and the maximum speed influence the mobility behavior of the nodes. If the
maximum speed is low and the pause time is long, the network topology
becomes relatively stable. On the contrary, if the maximum velocity is high and the
pause time is short, the topology is highly dynamic. Two basic issues of the
Random Waypoint mobility management model are the sudden turn and the sudden
stop [8]. A sharp turn happens whenever there is a direction change within the
range. A sudden stop occurs whenever there is a change of velocity that is not
related to the previous speed. These issues are often minimized
by allowing the previous speed and course to affect the subsequent speed and route
[9–11]. Most research has characterized the individual mobility models
followed by the nodes. However, considering the routing of a single node is rare, as
most of the nodes' traffic shows unity property in wireless ad hoc networks.
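The Random Waypoint behavior described above can be sketched as a simple waypoint generator. The default region size, speed range, and pause time loosely follow Table 1 (600 m × 600 m, 0.5–30 m/s, 15 s), but the function itself is an illustrative sketch, not the NS-3.0/BonnMotion implementation:

```python
import math
import random

def random_waypoint(steps, region=(600.0, 600.0),
                    speed=(0.5, 30.0), pause=15.0):
    """Yield (time, x, y) waypoints for one RWP node.

    The node pauses at its current spot, picks a uniform random
    destination in the region and a uniform random speed in
    [least-speed, utmost-speed], then travels there in a straight
    line; each leg is independent of the last (memoryless).
    """
    t = 0.0
    x, y = random.uniform(0, region[0]), random.uniform(0, region[1])
    for _ in range(steps):
        t += pause                            # pause (recess) time
        nx = random.uniform(0, region[0])     # random destination
        ny = random.uniform(0, region[1])
        v = random.uniform(*speed)            # random travel speed
        t += math.hypot(nx - x, ny - y) / v   # straight-line travel time
        x, y = nx, ny
        yield t, x, y
```

Plotting the yielded points reproduces the characteristic zig-zag RWP trace with its sudden turns and sudden speed changes.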

3 Proposed Model

After improving the mobility model, the problem of obtaining more practical
mobile node movement can be solved. The mathematical hypothesis and complete analysis of
this model are explained alongside the above-said limitations and requirements. We aim
to provide a solution to the random movement of nodes, which may cause links
to break. Here, we propose our node movement method for realistic scenarios
like university sites, shopping malls, etc. Our main research objectives
are to improve Random Waypoint performance in terms of delay, latency, throughput,
and reliability and to reduce overhead by finding the best path for transferring packets to
their destination.

The mobility model plays a significant role in the assessment of wireless network
protocols. Wireless mobility models differ from those of other existing
networks. The connectivity and capacity of the network repeatedly depend on the
nodes' mobility performance. Compared to other models that require Base
Stations (BSs), wireless ad hoc mobility models need two or more
communicating nodes to cooperate [5, 7]. Although separate models exist for other
networks and for ad hoc wireless mobility, there are some resemblances between
the two categories. The Random Waypoint Model is one of the most extensively
used ad hoc wireless models for simulation and has been implemented in many
network simulators. The movements of nodes are self-regulated in many mobility
models, as described in past research papers, but the movements of nodes depend
on one another in group mobility models.

Fig. 1 Mobility model

mobility models. In this paper, a new mobility model is proposed as a replacement


for the ad hoc wireless network. The mobility model is tested and analyzed with
a real-life setup. First, people move toward definite destinations as an alternative
to arbitrarily deciding the destination. Second, there are some barriers to the setup.
Third, the human tendency usually is to pass through a pathway and select the shortest
paths; they do not stroll alongside unsystematic trajectories [12, 15]. The developed
improved mobility model would disagree with the graph. Movement sample of nodes
from a supply place to a rest spot one, every node has to discover apposite passageway
via the surroundings. The barrier-free node moments are allowed by using a direction
discovering algorithm. This algorithm uses a ray launching method that includes an
optimized line algorithm for the quick beam meeting point search calculation (Fig. 1).

3.1 The Path Finding Algorithm

A route is a set of location points that form adjoining segments, where no
segment overlaps with an obstacle in the environment. Here is an algorithm
to satisfy this requirement.
(a) In the first step, we initialize the starting and ending points. After initializing
the source and target points, we draw a tracing line between the source and the rest point.
(b) Now we observe the struck objects. If an object is struck, draw another
tracing line from the striking or hitting position to the rest position, i.e.,
the destination position.
(c) Else we add the rest position to the path and stop the procedure.
(d) Now we check the first edge of the obstacle struck.
(e) If there is any strike, then add the hit position to the path.
(f) Else again observe the first struck object.

Initially, the current position and the source position are equivalent. We launch a
beam from the source to the rest spot and look for the first obstacle struck
by this beam. We insert the first strike point into the trail and try to skirt this
obstacle. To do this, we look for the first edge hit along this barrier. If an edge
is struck, the beam is traced to the meeting point on this edge. We opt for the
nearest end of the struck edge so as to minimize the final path length. We repeat
until the obstacle is no longer encountered, i.e., the beam from the node position
to the rest spot no longer strikes this barrier's edges.
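A simplified sketch of this ray-launching search is given below. Obstacles are represented as a list of edge segments, and the "nearest end" detour rule is approximated by walking to the struck edge's nearer endpoint; the geometry representation and helper names are assumptions for illustration:

```python
def _intersect(p, q, a, b):
    """Parameter t along ray p->q where segment pq crosses segment ab,
    or None if they do not cross (2-D cross-product test)."""
    (px, py), (qx, qy), (ax, ay), (bx, by) = p, q, a, b
    d = (qx - px) * (by - ay) - (qy - py) * (bx - ax)
    if abs(d) < 1e-12:
        return None                               # parallel segments
    t = ((ax - px) * (by - ay) - (ay - py) * (bx - ax)) / d
    u = ((ax - px) * (qy - py) - (ay - py) * (qx - px)) / d
    return t if 1e-9 < t < 1 - 1e-9 and 0 <= u <= 1 else None

def find_path(src, dst, edges, max_steps=20):
    """Trace a beam src->dst; on striking an edge, detour via its
    nearer endpoint (the 'nearest end' rule), then retry."""
    path, cur = [src], src
    for _ in range(max_steps):
        hits = [(t, a, b) for a, b in edges
                for t in [_intersect(cur, dst, a, b)] if t is not None]
        if not hits:                              # beam reaches the rest spot
            path.append(dst)
            return path
        _, a, b = min(hits)                       # first edge struck
        corner = min((a, b),
                     key=lambda c: (c[0] - cur[0])**2 + (c[1] - cur[1])**2)
        path.append(corner)                       # add hit corner to the trail
        cur = corner
    return path + [dst]
```

For a single wall between source and destination, the returned path detours around the wall's nearer endpoint; more complex obstacle layouts would need the full edge-walking logic described above.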

4 Simulation Methodology

The NS-3.0 simulator [1] is used for the simulation and analysis of the proposed algorithm,
running on Ubuntu 14.04 LTS. The performance configuration is described in Table 1.
BonnMotion 2.0 is the fundamental mobility scenario generation tool [11, 13, 14]. For
the results given below, we have produced mobility scenarios for RWP and the enhanced
mobility model using NS-3.0, integrated via TCL scripts. Random CBR traffic
connections can be grouped between mobile nodes with the use of a traffic
scenario generator script. Our study used the Random Waypoint Model and the enhanced
mobility model with a node pause time of 15 ± 3 s and speed varying between

Table 1 Performance parameters

Constraints                     Value
Type of channel                 Wireless channel
Simulator                       NS 3.0 (Version 3.0)
Protocols                       DSR routing protocol
Time duration for simulation    300 s
Amount of nodes                 20, 30, 40, 50
Range of transmission           250 m
Movement management model       Random Waypoint
MAC layer protocol              802.11
Break time (s)                  15 ± 3 s
Utmost speed                    30
Least speed                     0.5
Packet rate                     Four packets
Type of traffic                 CBR (Constant Bit Rate)
Data payload                    512 bytes/packet
Max of CBR connections          (10, 20, 40, 60, 80)
Size of environment             (600 m * 600 m)

0 and 100 m/s, with a minimum speed of 5 m/s and a maximum speed of 20 m/s, for a
simulation time of 300 s. Table 1 demonstrates the performance constraints.
For each simulation run, the nodes' positions, their movements, and the traffic between
them are assigned arbitrarily. BonnMotion 2.0 is accountable for the random
properties of the nodes' locations and movements; for the traffic, NS-3.0 random
variables are utilized. Seeding the random variables is a key factor, as
otherwise one may end up with excessive simulations without any meaningful results.
(a) Packet Delivery Ratio (PDR): PDR is the proportion of data packets
delivered to the destination to those produced at the sources. It
is estimated by dividing the number of packets received at the destination
by the number of packets originated at the source [11, 13].

PDR = (Packets received/Packets sent) * 100

(b) Throughput: It is the average number of messages successfully delivered per unit
time, i.e., the number of bits delivered per second [13, 14].

Throughput = Total Received Packets/Total Simulation Time (Kbits/s)

(c) Average End-to-End Delay: This consists of every possible delay caused
by buffering during route discovery, queuing at the interface queue,
retransmission delays at the MAC, and propagation and transfer times. It is
defined as the time taken for a data packet to be transmitted across the ad hoc
network from source to destination [11, 13, 14].

Delay D = Receive Time − Send Time
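The three metrics above can be computed from a per-packet trace as follows. The log format (send time, receive time or None for a lost packet) is an assumption for illustration; the 512-byte payload is taken from Table 1:

```python
def evaluate(log, sim_time, packet_bits=512 * 8):
    """Compute PDR (%), throughput (kbit/s), and average end-to-end delay (s).

    log: list of (send_time, recv_time) pairs, with recv_time None if
    the packet was lost; sim_time is the total simulation time in
    seconds; packet_bits defaults to the 512-byte payload of Table 1.
    """
    sent = len(log)
    delivered = [(s, r) for s, r in log if r is not None]
    pdr = 100.0 * len(delivered) / sent if sent else 0.0
    # Throughput counts only successfully delivered bits.
    throughput = len(delivered) * packet_bits / sim_time / 1000.0
    delay = (sum(r - s for s, r in delivered) / len(delivered)
             if delivered else 0.0)
    return pdr, throughput, delay
```

In practice these values are extracted from NS-3.0 trace files; this sketch only shows the arithmetic behind the three formulas.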

5 Results and Analysis

For the overall performance evaluation, we consider the general performance
constraints above. In Figs. 2, 3, and 4, the simulations focus on inspecting the
overall performance in terms of end-to-end delay, throughput, and packet delivery ratio.
The results compare the two mobility models that we chose, i.e., the Random
Waypoint Model and the Enhanced Mobility Model. The outcomes exhibit the
overall performance of the mobility models with respect to the DSR protocol
under the distinct mobility models, as shown in Figs. 2, 3, and 4.

5.1 Dynamic Source Routing (DSR)

DSR is a routing protocol for wireless networks. It uses source routing instead
of relying on a routing table at every intermediate device. We can say that
Dynamic Source Routing (DSR) is an autonomous routing protocol for wireless
networks. In Dynamic Source Routing, every source determines the route to transmit
its packets to chosen destinations. There are two predominant components, known
as route discovery and route maintenance.

5.2 Packet Delivery Ratio (PDR)

The PDR is the number of packets successfully delivered to the target or sink node,
relative to the whole set of data packets transmitted by the different sensor nodes. In Fig. 2,
at 20 nodes, the PDR is 0.18 for RWP and 0.21 for the proposed model; at 30 nodes,
the PDR is 0.27 for RWP and 0.30 for the proposed mobility model. The PDR continues
to increase up to 40 nodes. However, at 50 nodes, the PDR is 0.37 for RWP and 0.22
for the proposed model, i.e., there is a decrease in PDR compared to RWP. It can be seen
that between 20 and 40 nodes, the PDR of the proposed model is higher than that of the
Random Waypoint Model. At 50 nodes it decreases somewhat, but as the number of
nodes increases further, it will increase again.

Fig. 2 Packet delivery ratio versus number nodes



Fig. 3 Throughput versus number nodes

5.3 Throughput

Throughput can be described as the ratio of data packets sent out successfully,
calculated in bits/s. It is to be noted that higher values of throughput indicate
better performance. In Fig. 3, at 20 nodes, the throughput is 0.15 bits/s for RWP
and 0.3 bits/s for the proposed mobility model, and at 30 nodes, the throughput is
0.22 bits/s for RWP and 0.33 bits/s for the proposed mobility model. Here, we can
see that in the improved model the throughput continuously increases as the nodes
increase, compared to the Random Waypoint Model.

5.4 End-to-End Delay

An end-to-end delay is the amount of time a packet requires to arrive at its target
location after leaving its source. In Fig. 4, at 20 nodes the end-to-end delay is 0.220 s
for RWP and 0.219 s for the proposed mobility model, and at 30 nodes it is 0.178 s
for RWP and 0.169 s for the proposed model. Here, we can see that the time taken
by a packet from source to target is equal or slightly lower for the proposed model.
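The delay metric can be computed from per-packet send and receive timestamps; the sketch below uses hypothetical timestamps, not values from the simulation.

```python
def mean_end_to_end_delay(send_times, recv_times):
    """Average time (in seconds) for packets to travel from source to destination.
    Each pair (send, recv) corresponds to one successfully delivered packet."""
    delays = [r - s for s, r in zip(send_times, recv_times)]
    return sum(delays) / len(delays)

# Hypothetical timestamps (seconds) for three delivered packets:
sends = [0.00, 1.00, 2.00]
recvs = [0.25, 1.20, 2.21]
print(round(mean_end_to_end_delay(sends, recvs), 3))  # 0.22
```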

6 Conclusion

In this paper, we considered a wireless ad hoc routing protocol, DSR, together with
the RWP and the proposed mobility models.
An Improved Node Mobility Patten in Wireless Ad Hoc Network 369

Fig. 4 End-to-end delay versus number of nodes

Here, we observed that the performance of mobility models can change drastically
for different ad hoc protocols. Our experimental outcomes highlight the differing
performance of an ad hoc network routing protocol under dissimilar mobility models.
According to our outcomes, the performance of the protocol is affected by the mobility
model, so the performance of a mobility model should be estimated together with
the wireless ad hoc network protocol (the DSR routing protocol, in our experiments)
that most closely matches the expected real-world scenario. We compared three
parameters in this paper: end-to-end delay, throughput, and PDR; the routing protocol
considered for this comparative study is DSR.
The proposed mobility model produced improved outcomes compared to the
random waypoint mobility model on the chosen metrics of packet delivery ratio,
end-to-end delay, and throughput for the node movement pattern. Based on the
evaluation of the two models, the throughput and PDR of our proposed model are
better at 20, 30, 40, and 50 nodes, while its end-to-end delay is equal or slightly
better. Based on these performance parameters, we expect the model to give better
results when applied in small organizations. The outcome also illustrates that a
previously used setup of a wireless ad hoc network is not adequate for investigating
its real-life performance with a particular mobility model; the preference of mobility
pattern has a significant impact on performance.

References

1. Network simulator Homepage. https://www.nsnam.org/release/ns-3.0-pre-releases/


2. Althunibat, S., Badarneh, O.S., Mesleh, R.: Random waypoint mobility model in space
modulation systems. IEEE Commun. Lett. 23(5) (2019)

3. Soltani, M.D., Purwita, A.A., Zeng, Z., Chen, C., Haas, H., Safari, M.: An orientation-
based random waypoint model for user mobility in wireless networks. In: IEEE International
Conference on Communications Workshops, ICC, Dublin, Ireland, Ireland (2020)
4. Bhusal, N.: A review on impact of mobility model of routing protocols in ad-hoc network. ISTP
J. Res. Electr. Electron. Eng. (ISTP-JREEE). In: 1st International Conference on Research in
Science, Engineering & Management (IOCRSEM ) (2014)
5. Manzoor, A., Sharma, V.: A survey of routing and mobility models for wireless ad hoc network.
SSRG Int. J. Comput. Sci. Eng. 46–50 (2015)
6. Ribeiro, A., Sofia, R.C.: A survey on mobility models for wireless networks. SITI Technical
Report SITI-TR-11-01, February (2011)
7. Pullin, A.: Techniques for Building Realistic Simulation Models for Mobile Ad Hoc Network
Research. Ph.D. thesis, Leeds Beckett University, Leeds, UK (2014)
8. Shukla, A.K., Jha, C.K., Arya, R.: A simulation study with mobility models based on routing
protocol. In: Proceedings of Fifth International Conference on Soft Computing for Problem
Solving, pp. 867–875 (2016)
9. Bai, F., Helmy, A.: A survey of mobility models. In: Wireless Ad-hoc Networks, pp. 1–30
(2004)
10. Agashe, A.A., Bodhe, S.K.: Performance evaluation of mobility models for wireless ad hoc
networks. In: Proceedings of the IEEE First International Conference on Emerging Trends in
Engineering and Technology, pp. 172–175 (2008)
11. Carofiglio, G., Chiasserini, C.F., Garettoy, M., Leonard, E.: Route stability in MANETs under
the random direction mobility model. IEEE Trans. Mobile Comput. 8(9), 1167–1179 (2009)
12. Shukla, A.K., Kapil, M., Garg, S.: Int. J. Eng. Res. Ind. Appl. (IJERIA). 5(III), 1–10 (2012).
ISSN 0974-1518
13. Vetrivelan, N., Reddy: Impact and performance of analysis of mobility models on stressful
mobile WiMax environments. Int. J. Comput. Netw. Secur. (IJCNS) 2 (2010)
14. Gerharz, M., de Waal, C.: BonnMotion—A Mobility Scenario Generation Tool. University of
Bonn [Online]. www.cs.uni-bonn.de/IV/BonnMotion/
15. Bekmezci, Sahingoz, O.K., Temel, S.: Flying Ad-Hoc Networks (FANETs): a survey. Ad-Hoc
Netw. 11(3), 1254–1270 (2013)
IGAN: Intrusion Detection Using
Anomaly-Based Generative Adversarial
Network

Jui Shah and Maniklal Das

Abstract We present IGAN, an anomaly-based Generative Adversarial Network
architecture that detects malicious strings with decent accuracy. The proposed IGAN
comprises an encoder, a decoder, and a discriminator. The encoder and decoder form
the generative unit, which tries to reconstruct the input while mapping the input
and output to a latent space variable. IGAN exploits this latent space, together with
adversarial training against the discriminator, to enhance the learning of the normal
distribution. The discriminator is used as a classifier as well as a feature extractor.
To identify an instance as an anomaly or normal, its anomaly score is calculated after
passing it through the trained model. We have performed a detailed analysis of the
existing and the proposed architectures using reliable metrics such as AUC score,
Precision, Recall, and F_score. The experimental results show that the proposed
IGAN outperforms other existing models in detecting anomalies with high accuracy.

Keywords Anomaly detection · Autoencoders · Generative adversarial networks ·
Intrusion detection

1 Introduction

With the advancement of modern digital transformation, cybersecurity has become a
crucial concern for individuals and organizations. Various measures have been taken
to defend the system and network perimeter in different layers of computing systems.
However, attackers have also upskilled themselves to bypass such
perimeters to exploit malicious intentions in the target systems. An Intrusion Detec-
tion System (IDS) is used to monitor a network or systems to detect potential threats,
malicious activities, and policy violations. These systems are primarily classified
according to the systems they monitor or the method/approach they use to detect.
The former group is further classified into a Network Intrusion Detection System
(NIDS), which tracks network traffic and consists primarily of a network tap or port
mirroring, and a Host Intrusion Detection System (HIDS), which analyzes an individual
system by inspecting networks and logging events. Based on these approaches,

J. Shah (B) · M. Das
DA-IICT, Gandhinagar, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 371
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_36
they are classified as either a Signature or an Anomaly-based IDS. Conventionally,
signature-based techniques were used to detect existing patterns and thereby restrict
the ability to see new attacks in a developing world. Therefore, there is a recent shift
to anomaly-based detection where the machine is trained to learn the entity’s nor-
mal behavior to detect any abnormal behavior that deviates from the normal without
having to pre-train the unknown attacks.
Anomaly detection, as an unsupervised learning approach, does not rely on narrowly
labeled datasets. This feature makes it ideal for an IDS, as present network
traffic databases do not have all kinds of attacks, and many are outdated. Addition-
ally, anomaly-based detection is not limited to applications for Intrusion Detection.
It is used for disease detection, sensor networks for event identification, device con-
trol, fault and fraud detection, ecosystem disturbance detection, and also in medical
imaging.
Generative Adversarial Networks [1] have gained immense prominence and have
reached nearly all fields, including the detection of anomalies. The use of GANs for
identifying anomalies is still relatively unexplored, though. Schlegl et al. [2] proposed
AnoGAN, which is focused on medical imaging; it is noted that AnoGAN is
computationally expensive. The work [3] further developed the architecture and training
based on the original AnoGAN. Their approach, however, has some significant
drawbacks. Both of them train on the 10% KDD dataset, which does not indicate
the larger picture and quickly yields deceptive outcomes. Furthermore, the KDD-99
dataset has redundant entries and the same entries in the train and test dataset, adding
to the system’s enormous bias. They train the network with standard data samples
considered as anomalies, ignoring the anomaly-based detection concept whose sole
aim is to train on the usual data so as to detect any new anomalies. In [4], an
encoder-based adversarial training maps data into a Gaussian distribution space using an
encoder, and uses a discriminator, trained on the encoder's latent space, to test whether
an input comes from the standard latent space or is an anomaly. However, it is arguable
whether the Gaussian space will distinguish between the trained standard samples
and the anomaly samples that could be encountered during testing. Subsequently, the
authors of [2] proposed an optimized AnoGAN framework for faster computation. In
[5], an additional encoder is used to minimize the distance between the images during
training and the latent variables.
In this paper, we present a novel adversarial generative training architecture,
termed as IGAN, for a NIDS that minimizes the mapping error without the use
of an external encoder, and constitutes the following salient points:
• IGAN trains data on the normal training samples only so that the system knows
how to handle the usual network traffic and, thus, identify any anomalies.
• IGAN uses NSL-KDD dataset to prevent the system from any bias and the results
from being deceptive.

• IGAN provides a complete analysis, with appropriate metrics, of various existing
architectures that address the problem, and gives clear experimental results showing
why IGAN performs better.
The paper is organized as follows: Sect. 2 discusses related works. Section 3
presents the proposed IGAN. Section 4 provides the performance of the proposed
IGAN model along with the experimental results. The paper is concluded in Sect. 5.

2 Related Work

2.1 Generative Adversarial Network

A generative adversarial network [1] comprises a generator unit and a discriminator
unit. The generator tries to mimic the input distribution, and the discriminator tries
to differentiate between the real data and the generated data fed into it. Adversarial
training is carried out in which the generator tries to fool the discriminator while
discriminators tend to train in such a way that it does the opposite.
There has been a lot of research on GANs, and we can see many variants of
GANs in the literature. An efficient way of training is described in [14]. The work
in [15] discusses how the latent space of GANs can be optimized, and [16] pro-
poses an adversarial autoencoder, which has been used widely for anomaly detection
architectures (Fig. 1).

2.2 F-AnoGAN

The method proposed in [2] performs anomaly detection for medical images much
faster than the previously proposed AnoGAN. It comprises two training steps, namely the
GAN training and the encoder training. The GAN is trained on latent representation
using a generator and a discriminator. After this stage, encoder training is carried out,
which maps the normal version of the input image to the latent space variables fed as
input to the generator. It is built on the assumption that in normal image samples, the
conversion to latent representation by an encoder and the consequent mapping back
to the image space via generator should be an identity transform, and the degree of

Fig. 1 Generative
Adversarial Network

deviation is used for anomaly scoring. The W-GAN [11, 12] architecture was used
in the generator specific for image input.
However, the training for the GAN involves initiating the latent representation by
sampling from noise. This introduces a potential flaw in the method. The mapping
between the input images and the latent space variables could be better achieved if the
initial latent variables were close to optimal than being generated from random noise.
Additionally, the two-step training method does not guarantee an overall identity
transformation since the mapping will not be linear.

2.3 Ganomaly

Ganomaly architecture was proposed in [5], which developed on the original idea
of AnoGAN by adding an encoder. This semi-supervised learning technique uses
the same anomaly-based approach and uses contextual, encoder, and adversarial
losses to train the entire system of two encoders, one decoder, and a discriminator.
The main focus of this method is the new encoder loss, which tries to minimize the
distance between the bottleneck features of the input, z, and the encoded features of
the generated image, z', using the L2 norm. The second encoder has the same
architecture as the first encoder unit but different parametrization. Therefore, the
normal reconstructed image passed to the next encoder cannot be expected to be
identical to the initial latent encoding. Using two different encoders to achieve a
perfect reconstruction of the generated image and the encoding adds the non-linear
complexity that each neural network brings, and consequently the L2 norm will not
be able to discard the added random non-linear noise. Furthermore,
all the units use the DCGAN [20] architecture. The generator also has a convolutional
transpose layer, ReLU activation, and batch-norm with a tanh layer at the end. This
structure is suited only for image datasets, and they have tested the results on MNIST
[21], CIFAR [22], and X-ray security screening datasets (Fig. 2).

Fig. 2 a F-GAN and b Ganomaly



3 The Proposed IGAN Model

In an intrusion detection system, suppose that m instances of requests in network
traffic are described as X = {x_1, x_2, x_3, ..., x_m}, with each entry having n = 122
features as dimensions. They are mapped to y = {y_1, y_2, y_3, ..., y_m}, where each
y_i is either 0 or 1 depending on whether the instance is normal traffic or an intruder
(anomaly). The key idea of the proposed method is to predict y' such that it is equal
to y. We use the anomaly score given by Eq. (5) and identify a threshold above which
all samples are classified as intruders or anomalies.
The proposed IGAN consists of the following attributes–
The proposed IGAN consists of the following attributes:
Generative Adversarial Autoencoder: An autoencoder is used as the generator; its
internal representation of the features, z, is known as the latent representation [6]
and is an intermediate output produced before the final output of the generator, as
can be seen in Fig. 3.
Latent space consistency: IGAN uses the latent space representation to optimize the
training of the generative adversarial network. The latent space is a lower dimensional
compact space that preserves all non-redundant features of the input data. One of
the main goals of IGAN is to reconstruct all normal samples perfectly, which is
achieved by ensuring that a faithful reconstruction is produced from the latent variable
encoded by the encoder. We accomplish this using the following two losses:
The Reconstruction Loss: It minimizes the distance between the reconstructed
feature vector and the input one using the L1 norm:

$$L_r = \sum_{i=1}^{n} |x_i - x_i'| \qquad (1)$$

The Latent Reconstruction Loss: This loss increases the autoencoder's effectiveness.
It is built on the logic that if the reconstructed data is the same as the input passed
to the autoencoder, then, when it is passed again into the encoder, the generated
latent variable should be the same as before:

$$L_{lr} = \sum_{i=1}^{n} |z_i - z_i'| \qquad (2)$$
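As a sketch (not the authors' implementation), the two L1 losses of Eqs. (1) and (2) can be written with NumPy; the toy vectors x_hat and z_hat stand for the reconstructed input and the re-encoded latent variable, and their values are hypothetical.

```python
import numpy as np

def l1_loss(a, b):
    """Element-wise L1 distance, summed over features: sum_i |a_i - b_i|."""
    return np.abs(a - b).sum()

x = np.array([0.2, 0.9, 0.0])      # input features (toy 3-dim example)
x_hat = np.array([0.1, 1.0, 0.0])  # autoencoder reconstruction of x
z = np.array([0.5, -0.3])          # latent code of x
z_hat = np.array([0.4, -0.3])      # latent code of x_hat, re-encoded

L_r = l1_loss(x, x_hat)    # reconstruction loss, Eq. (1)
L_lr = l1_loss(z, z_hat)   # latent reconstruction loss, Eq. (2)
print(L_r, L_lr)
```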

Fig. 3 Proposed training method in 4 steps



Discriminator as a feature extractor: In addition to its normal function as a classifier,
the discriminator is also used as a feature extractor. We use its second-to-last
layer for this purpose:

$$L_{fe} = \sum_{i=1}^{k} |f(x_i) - f(x_i')| \qquad (3)$$

where f(·) is the intermediate output of the discriminator, with k = 64.


Adversarial Loss: We use the original GAN loss [1] for the adversarial training:

$$L_{adv} = \mathbb{E}_{x \sim p_X}[(Dis(x) - 1)^2] + \mathbb{E}_{x' \sim p_{X'}}[(Dis(x'))^2] \qquad (4)$$

Once the training is done, we test the data after passing it through the autoencoder
only, and determine its anomaly score using the following formula:

$$Anomaly\_score = \frac{1}{n}\sum_{i=1}^{n}(x_i - x_i')^2 \qquad (5)$$

A threshold is determined using the optimal metrics on the test set, so that all input
instances whose score crosses the threshold are classified as anomalies.
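A minimal sketch of the anomaly scoring of Eq. (5); the feature vectors and the threshold value below are hypothetical, chosen for illustration only.

```python
import numpy as np

def anomaly_score(x, x_hat):
    """Mean squared reconstruction error over the n features, Eq. (5)."""
    x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
    return np.mean((x - x_hat) ** 2)

# A normal sample reconstructs well (low score); an anomaly does not.
normal = anomaly_score([0.2, 0.9, 0.0], [0.21, 0.88, 0.0])
anomalous = anomaly_score([0.2, 0.9, 0.0], [0.9, 0.1, 0.7])

threshold = 0.05  # hypothetical value, chosen from the TPR/1-FPR intersection
print(normal < threshold, anomalous > threshold)  # True True
```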
Objective function: The overall loss function is the sum of these four losses:

$$L_{total} = L_{adv} + L_r + L_{lr} + L_{fe} \qquad (6)$$

4 Analysis and Experimental Results

We have used the NSL-KDD dataset for evaluating the proposed model; it was
derived from the original KDD [10] dataset by [7] through:
• Removing redundant entries in train-set.
• Keeping no duplicate records in train and test set.
• Selecting records of each difficulty group in inverse proportion to their percentage
of records.
The number of records in the dataset is reasonable so that it is feasible to run the
method on the entire set without randomly selecting a smaller chunk and therefore,
the results obtained become consistent and comparable across all algorithms. The
dataset contains 41 features of the network traffic, of which 34 are continuous and 7
are symbolic. We one-hot encode the symbolic features, giving a final input dimension
of 122. The type of attack is listed in a one-word format in the last column of the
dataset and is transformed into binary form: 0 for a normal sample and 1 otherwise,
making the output one-dimensional. We

Fig. 4 a Autoencoder loss and b TPR v 1-FPR

use the CSV files KDDTrain+ and KDDTest+ for implementation. The KDDTrain+
dataset has 67,343 normal entries, which are used for training. The KDDTest+ data
has 9,711 normal entries. We have a total of 71,463 anomalous entries, comprising
12,833 from the KDDTrain+ data and 58,630 from KDDTest+. As a result, we test
on 81,174 entries in total.
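A minimal sketch of the one-hot preprocessing step; the category set and record below are hypothetical toy values, not actual NSL-KDD feature values.

```python
def one_hot(value, categories):
    """Expand one symbolic feature into len(categories) binary indicators."""
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return [1 if value == c else 0 for c in categories]

# Toy record: one continuous feature kept as-is, one symbolic feature expanded.
protocols = ["tcp", "udp", "icmp"]   # hypothetical category set
record = [0.37, "udp"]
encoded = [record[0]] + one_hot(record[1], protocols)
print(encoded)  # [0.37, 0, 1, 0]
```

Expanding every symbolic feature this way is what grows the 41 raw features into the 122-dimensional input vector.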
The proposed IGAN architecture comprises an encoder, a decoder, and a
discriminator. The encoder and the decoder each have four hidden layers with the ReLU
activation function. The encoder has hidden layers of dimensions 512, 256, 128, and 64,
with input and output dimensions of 122 and 32. The decoder has hidden layers of
dimensions 64, 128, 256, and 512, with input and output dimensions of 32 and 122,
respectively. The discriminator has four hidden layers of dimensions 512, 256, 128,
and 64, with input and output dimensions of 122 and 1. Its first hidden layer uses the
Leaky ReLU activation function, the next three layers use ReLU, and the output layer
uses the sigmoid activation function. The IGAN is trained for 100 epochs with a
learning rate of 0.0001, and the parameters are optimized using the Adam optimizer [8].
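The layer dimensions above can be sketched as a NumPy forward pass with random weights. This is an illustration only: biases are omitted, ReLU is used throughout where the paper's discriminator uses Leaky ReLU in its first hidden layer, and no training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(dims):
    """Random weight matrices for consecutive layer sizes (biases omitted)."""
    return [rng.standard_normal((i, o)) * 0.01 for i, o in zip(dims, dims[1:])]

relu = lambda v: np.maximum(v, 0.0)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

encoder = mlp([122, 512, 256, 128, 64, 32])
decoder = mlp([32, 64, 128, 256, 512, 122])
discriminator = mlp([122, 512, 256, 128, 64, 1])

def forward(layers, v, last=lambda x: x):
    for w in layers[:-1]:
        v = relu(v @ w)           # hidden layers with ReLU
    return last(v @ layers[-1])   # output layer with optional activation

x = rng.standard_normal(122)                         # one traffic record
z = forward(encoder, x)                              # latent code, 32-dim
x_hat = forward(decoder, z)                          # reconstruction, 122-dim
score = forward(discriminator, x_hat, last=sigmoid)  # real/fake probability
print(z.shape, x_hat.shape, score.shape)  # (32,) (122,) (1,)
```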
We compare the proposed IGAN with several other architectures, which we discuss
below.
The training of IGAN steadily reduces the autoencoder loss, as can be seen in Fig. 4.
The continuous and gradual decrease implies effective training of the model, which
learns to mimic the normal traffic distribution.
The effectiveness of the anomaly-based intrusion detection was analyzed using the
metrics of AUC, Precision, Recall, F_score, and accuracy [17–19]. Because the test
set is heavily imbalanced, accuracy is not the best measure of performance: with far
more anomaly-class samples than normal samples, a model that predicts every sample
to be an anomaly still attains high accuracy. The best measure for our binary
classification into anomaly or normal is the AUC (Area Under the ROC Curve). A ROC
curve is generated by plotting the True Positive Rate (TPR) against the False Positive
Rate (FPR) at various threshold settings [9]. We determine the best classification
threshold from the intersection of the two curves TPR and 1-FPR; it is the point
of the optimal trade-off between the true positive rate and the false positive rate.
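The threshold search can be sketched by sweeping candidate thresholds over the anomaly scores and picking the one where TPR and 1-FPR are closest; the scores and labels below are synthetic, not the paper's data.

```python
import numpy as np

def best_threshold(scores, labels):
    """Pick the threshold where TPR is closest to 1 - FPR (labels: 1 = anomaly)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    best_t, best_gap = None, np.inf
    for t in np.unique(scores):
        pred = scores >= t
        tpr = (pred & (labels == 1)).sum() / (labels == 1).sum()
        fpr = (pred & (labels == 0)).sum() / (labels == 0).sum()
        gap = abs(tpr - (1 - fpr))
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t

# Synthetic scores: anomalies (label 1) tend to score higher than normals.
scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
labels = [0,   0,   0,   1,   1,   1]
print(best_threshold(scores, labels))  # 0.6
```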

Table 1 Comparison of models with respect to Feature Extraction

Model      Without FE training   With FE training
f-AnoGAN   0.922619              0.948453
Ganomaly   0.954145              0.9577306
IGAN       0.939880              0.950905

The AUC score is the highest for the proposed IGAN, as seen from the table below.
Even a small percentage of increase makes a significant difference, since real-world
network traffic is large in volume and we want the lowest possible rates of false
positives and false negatives. The AUC is better than that of f-AnoGAN by 1.09692%
and that of Ganomaly by 0.13764%. The model outperforms the other variants, as
shown in Table 1.

Model      AUC      Precision  Recall   F_score   Accuracy
Enc-Dec    0.96351  0.98711    0.91209  0.94812   91.21270
f-AnoGAN   0.94853  0.98482    0.89807  0.93945   89.80831
E1-Dec-E2  0.95769  0.98221    0.88235  0.92961   88.23638
Ganomaly   0.95773  0.98249    0.88388  0.930580  88.39037
E1-D-E1    0.95719  0.98228    0.88263  0.92980   88.26595
IGAN       0.95905  0.98363    0.89066  0.93483   89.06916

We have also analyzed whether using the discriminator as a Feature Extractor (FE),
rather than just as a simple classifier, improves performance. The results showed a
significant increase for all baseline and proposed architectures, as can be seen in
Table 1. We obtained the results without FE training by using the discriminator only
as a classifier for the adversarial training of the Generative Adversarial Network. For
IGAN, the FE loss training led to an increase in AUC score of 1.15942%.

5 Conclusion

We proposed an anomaly-based network intrusion detection model (IGAN) using
a generative adversarial network. IGAN detects anomalies with high accuracy. We
have tested IGAN on the NSL-KDD dataset, a refined dataset that tackles the
drawbacks of the widely used KDD-99 dataset. We have performed a detailed analysis
of the existing and the proposed architectures using reliable metrics such as AUC
score, Precision, Recall, and F_score. The experimental results show that the proposed
IGAN outperforms other existing models in detecting anomalies with high accuracy.

References

1. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.,
Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems,
pp. 2672–2680 (2014)
2. Schlegl, T., Seeböck, P., Waldstein, S.M., Langs, G., Schmidt-Erfurth, U.: f-AnoGAN: fast
unsupervised anomaly detection with generative adversarial networks. Med. Image Anal. 54, 30–44 (2019)
3. Zenati, H., Romain, M., Foo, C.S., Lecouat, B., Chandrasekhar, V.R.: Adversarially learned
anomaly detection. In: Proceedings of IEEE International Conference on Data Mining, pp.
727–736 (2018)
4. Gherbi, E., Hanczar, B., Janodet, J.C., Klaudel, W.: An encoding adversarial network for
anomaly detection. In: Proceedings of Asian Conference on Machine Learning, pp. 188–203
(2019)
5. Akcay, S., Atapour-Abarghouei, A., Breckon, T.P.: Ganomaly: semi-supervised anomaly detec-
tion via adversarial training. In: Proceedings of Asian Conference on Computer Vision, pp.
622–637 (2018)
6. Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., Frey, B.: Adversarial autoencoders (2015).
arXiv: 1511.05644
7. Dhanabal, L., Shantharajah, S.P.: A study on NSL-KDD dataset for intrusion detection system
based on classification algorithms. Int. J. Adv. Res. Comput. Commun. Eng. 4(6), 446–452
(2015)
8. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2014). arXiv:1412.6980
9. Hanley, J.A., McNeil, B.J.: The meaning and use of the area under a receiver operating char-
acteristic curve. Radiology 143(1), 29–36 (1982)
10. Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.: A Detailed Analysis of the KDD CUP 99
data set. In: Proceedings of IEEE Symposium on Computational Intelligence for Security and
Defense Applications (2009)
11. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN (2017). arXiv:1701.07875
12. Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C.: Improved training of
wasserstein GANs. In: Proceedings of Advances in Neural Information Processing Systems,
pp. 5767–5777 (2017)
13. Yi, X., Walia, E., Babyn, P.: Generative adversarial network in medical imaging: a review. Med.
Image Anal. 58 (2019)
14. Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved
Techniques for Training GANs. In: Proceedings of Advances in Neural Information Processing
Systems, pp. 2234–2242 (2016)
15. Bojanowski, P., Joulin, A., Paz, D.L., Szlam, A., Optimizing the latent space of generative
networks (2017). arXiv:1707.05776
16. Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., Frey, B.: Adversarial autoencoders (2015).
arXiv:1511.05644
17. Davis, J., Goadrich, M.: The Relationship between Precision-Recall and ROC curves. In: Pro-
ceedings of the international Conference on Machine learning, pp. 233–240 (2006)
18. Goutte, C., Gaussier, E.: A probabilistic interpretation of precision, recall and f-score, with
implication for evaluation. In: Proceedings of European Conference on Information Retrieval,
pp. 345–359 (2005)
19. Hanley, J., McNeil, B.J.: The meaning and use of the area under a receiver operating charac-
teristic (ROC) curve. Radiology 143(1), 29–36 (1982)
20. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolu-
tional generative adversarial networks (2015). arXiv:1511.06434
21. LeCun, Y., Cortes, C.: MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/
22. Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images. https://
www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
CodeScan: A Supervised
Machine Learning Approach to Open
Source Code Bot Detection

Vipul Gaurav, Shresth Singh, Avikant Srivastava, and Sushila Shidnal

Abstract Enhancing software productivity would help companies to cut their costs
and increase profits. Software metrics rely heavily on the personal experiences and
skills of managers in pattern recognition and rewards. Differentiating between actual
human effort and machine-generated code can help drive an organization’s decision-
making process that is rewarding its employees and provide an assistive tool to the
managers allowing effective monitoring without micromanagement that has a wide
application in managing work from home and other virtual environments. The paper
explores the insight into the quality of machine-generated bot code compared to
actual human coding efforts. It uses machine learning techniques to identify patterns
and gives intelligent insights that can be used as a performance metric for versioning
systems and business intelligence. We successfully distinguished between a bot and
human-written code with an F1-score of 0.945 using the Light Gradient Boosting
Method.

Keywords Software productivity · Machine learning · Business intelligence ·
Software metrics · Versioning systems · Light gradient boosting

1 Introduction

Programming productivity has been an extensive subject of study for software engi-
neers and product managers. Collaboration through versioning systems has become
essential in modern software development. It comes with its own set of new
challenges, including machine-generated bot code, which leads to code quality issues
and memory complexity problems. Machine-generated code can replicate human
coding efforts to a certain degree, but it often creates merge conflicts, and some
developers use it to inflate their contribution to the project, which can lead to
incorrectly rewarding a developer who did not contribute as
V. Gaurav (B) · S. Singh · A. Srivastava · S. Shidnal


Sir MVIT, Bengaluru, India
S. Shidnal
e-mail: sushila_cs@sirmvit.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 381
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_37

much as another team member. Bot-generated code comes in many forms, ranging
from database updates, pull-request evaluation automation, and certifications to
many other code snippets used by developers. The paper provides insight into
previous work on identifying bot automation for coding, maintenance, and fault testing,
and presents a methodology spanning data acquisition, business understanding, and
data exploration, through the development of new features that capture bot
characteristics, to feature engineering and modeling. It ends with a brief discussion of the
evaluation of each model and a conclusive future scope on whether bot code can
replace actual human coding efforts.

2 Literature Survey

Software that enables developers to interact and be aware of what their colleagues
are doing has been used to successfully build systems [1], which can be achieved
using versioning systems. Open-source projects can develop software quality
prediction models that may become the state of the art for detecting defects in programs
[2]. Software faults and failures lead to customer dissatisfaction, and traditional
software metrics need to be adapted to versioning systems to catalyze the process of
detecting bugs [3]. A source file is more fault-prone when the developers' contributions
to the file are more imbalanced (lower entropy); this can be useful for predicting
fault-prone programs, which are generally characterized by a high amount of
machine-generated code [4]. Bence Kollanyi studied the nature of bot-generated
code and its influence on GitHub repositories, identifying characteristics that differ
from human coding effort [5]. Nagappan et al. [6] found that failure-prone software
entities are statistically correlated with code complexity measures, and that automation
makes maintenance work easier at the cost of introducing unnecessary code.
Idreos and Callaghan [7] designed key-value storage engines and studied design
characteristics useful for understanding the nature of bots in complex pipelines.
Bavota et al. [8] studied the specific cases in which refactoring tends to introduce
bugs into project repositories, which are not accounted for by automated bots.

3 Methodology

3.1 Business Oriented Features and Data Acquisition

The choice of software metrics without hypothesis testing may prove to be a setback
for product teams, and the failure history of similar projects can give validation and
help extract business value. To resolve this, we studied metrics with business value
such as team velocity, which determines the team’s collaboration quality, set by the
deliverables promised by them in every sprint cycle [9]. It can be quantified as the
CodeScan: A Supervised Machine Learning Approach to Open Source … 383

quotient of the number of relevant commits and time of commit for the project.
We combined it with Halstead complexity [10] to create a business-driven metric
that incorporates the program size and errors reported with the number of features
delivered in a sprint cycle. Escape defects refer to the number of bugs present in
the project before the first release, which the managers are supposed to track and
resolve until it becomes coherent with the customer requirements [11]. We combined
this with cyclomatic complexity [12] to provide the number of errors and study
modularity flow control of the programs committed to the project repository. Thus,
we can formalize an intelligent system to identify actual human coding efforts with a
mix of software metrics and business value. We scraped 1.3 million different project
repositories with the help of Selenium and Git and created programs to calculate
metrics such as cyclomatic complexity, Halstead complexity, fan-out complexity
and their change per commit, and other parameters such as data abstraction coupling
[13], number of methods, time of commit, filetype, filesize, and number of comments.
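Several of these metrics can be computed per file during acquisition. As an illustration, the following is a minimal, stdlib-only sketch of a simplified cyclomatic complexity counter for Python sources; the branch-node set is our simplification, not the exact tooling used in the study:

```python
import ast

# Simplified McCabe rule: each decision point adds one independent path.
# (ast.With and async constructs are deliberately ignored in this sketch.)
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                 ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity of a Python source file:
    1 + the number of decision points found in its AST."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, _BRANCH_NODES)
                   for node in ast.walk(tree))

# One `for` loop plus one `if` branch -> complexity 3
snippet = "for x in range(3):\n    if x % 2:\n        print(x)\n"
print(cyclomatic_complexity(snippet))  # -> 3
```

The same per-file pass can be repeated per commit to obtain the change-per-commit variants of the metrics.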

3.2 Data Understanding and Exploration

The data collected using the GitHub API had a mix of human- and bot-coded files; a
few repositories had proper bot labeling in the code, while others were manually
labeled (1: Bot, 0: Human), turning this into a binary classification problem. The
problem is to identify the actual human coding efforts from the data. Programs were
written to calculate the standard metric features mentioned above. The details of all
the features collected are provided as follows (Tables 1 and 2; Fig. 1).
The data collected had a higher percentage of missing entries for bot code than
for human-coded files. Bots do not populate many of these features, creating higher
missing-value rates, which we identify as the major characteristic differentiating bot
code from human code. The number of code lines written by humans was about twice that of a bot and

Table 1 List of code evaluation metrics used and their description

Cyclomatic complexity: a software metric used to indicate the complexity of a program.
Halstead’s complexity: Halstead’s measures can be used to evaluate the testing time, the size of the program, the difficulties and errors encountered, and the effort required to develop the software.
Fan-out complexity: an association of one class with another class.
Data abstraction coupling: measures the complexity caused by abstract data types; it examines the coupling between classes, reflecting a significant aspect of object-oriented design.
Number of commits: the frequency of relevant commits to the project.
Number of errors: the number of issues labeled as an error in the project.
384 V. Gaurav et al.

Table 2 List of regular parameters associated with source code

Number of methods: humans tend to use more methods while programming than bots, making the code more modular and reusable.
Lines of code: the number of code lines can be combined with other evaluation metrics to obtain a ratio of complexity per unit line of code, giving a standard basis for comparing source code of different sizes and languages.
Number of comments: comparative analysis shows that bots tend to write fewer comments than humans; the ratio of comments to code lines can separate human-written from bot-written code.
File type: the language or framework the source code belongs to. A modern high-level language may have fewer lines but still score higher on evaluation metrics.
File size: file size correlates directly with the number of code lines, so changes in other parameters can be observed per unit file size.
Time of commit: time of commit with week, date, and time details.

Fig. 1 Workflow diagram

showed a similar trend in the file size, as it was directly proportional to the number of
code lines. Most of the bots were programmed for languages and formats like pom.xml
(used in Apache Maven), JavaScript, Java, XML, and SQL. On the other hand, humans
were more active in high-level object-oriented programming languages such as Java,
JavaScript, and Python (Fig. 2).
The number of methods was found to be higher for humans than for bots, implying
that human code is characterized by higher modularity. Human-written code was found
to have more comments than bot code, indicating better readability, more efficient
debugging, and fewer escape defects, thus raising team velocity and accelerating the
development process. Total data abstraction coupling (DAC) is a metric that measures
the number of instantiations of other classes within the given class and is concerned
with the reuse degree [13]. Human code had a higher DAC value, signifying substantial
complexity and requiring more significant maintenance

Fig. 2 Target variable distribution and Feature-wise missing value analysis

and testing efforts. Likewise, reusability and understandability are negatively influenced
by coupling, which means more money and time would be invested in debugging, which
may frustrate the customer, affect deadlines, and impede team velocity. Bots have the
edge over humans in this case, as their code has better reusability and is less complicated,
making it easier to debug and maintain. A significant drawback of high DAC is stability:
as coupling makes the different objects and classes interrelated, a faulty component can
lead to instability, resulting in intangible losses for the organization. Fan-out complexity
[14] is defined as the number of functions called within a given position; its value is
lower for code written by bots than by humans, and it correlates highly with DAC.
Halstead complexity is significantly higher for humans than for bots, signifying the
greater logical and mathematical sophistication of human-written code (Fig. 3).

Fig. 3 Programming languages used by humans (left) and bots (right)



3.3 Feature Engineering and Modeling

The data collected was preprocessed with Single Centered Imputation using Multiple
Chained Equation (SICE) [15] to handle missing values: for columns with more than
20% null values, continuous two-level data are imputed randomly, with consistency
between imputations maintained through passive imputation; otherwise, the median is
used. Z-score normalization [16] was applied to transform the data into a standard
normal distribution and make features comparable. This step standardizes the range of
the continuous initial variables so that each one contributes equally to the analysis. We
add our own set of new features, combining the metrics described above, defined as
follows:
Cyclomatic Escape Defect Complexity:

M = E − N + 2P + Q (1)

where
E = the number of edges in the control flow graph
N = the number of nodes in the control flow graph
P = the number of connected components
Q = the number of relevant issues reported for the repository
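Eq. (1) transcribes directly into code; a minimal sketch (the graph counts are assumed to come from the metric-extraction step described in Sect. 3.1):

```python
def cyclomatic_escape_defect_complexity(edges, nodes, components, issues):
    """Eq. (1): M = E - N + 2P + Q, the cyclomatic complexity of the
    control flow graph extended by the repository's relevant issues."""
    return edges - nodes + 2 * components + issues

# A control flow graph with 9 edges, 8 nodes, 1 connected component,
# and 2 relevant issues reported for the repository:
print(cyclomatic_escape_defect_complexity(9, 8, 1, 2))  # -> 5
```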
This is combined with the number of issues reported for merge conflicts and
bugs in the repository. We combine Halstead complexity with the team’s velocity
to capture the change in the program’s size alongside its increased complexity.
Mathematically, we can define it as follows:
Halstead Team Velocity Program Length:

N^ = n1 log2 n1 + n2 log2 n2 (2)

N = N1 + N2 + Team Velocity (3)

Team Velocity = (Number of commits + Number of Employees)/Time of commit (4)

where n1 = number of distinct operators, n2 = number of distinct operands, N1 =
total number of operators, and N2 = total number of operands.
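Eqs. (2)–(4) can be sketched as straightforward functions (the input counts are assumed to be extracted per commit; the example values are illustrative only):

```python
from math import log2

def team_velocity(num_commits, num_employees, time_of_commit):
    """Eq. (4): commits plus employees per unit commit time."""
    return (num_commits + num_employees) / time_of_commit

def halstead_estimated_length(n1, n2):
    """Eq. (2): N^ = n1*log2(n1) + n2*log2(n2), for n1 distinct
    operators and n2 distinct operands."""
    return n1 * log2(n1) + n2 * log2(n2)

def halstead_team_velocity_length(N1, N2, num_commits, num_employees,
                                  time_of_commit):
    """Eq. (3): observed program length N1 + N2 extended by the
    team's velocity."""
    return N1 + N2 + team_velocity(num_commits, num_employees,
                                   time_of_commit)

# 4 distinct operators, 8 distinct operands:
print(halstead_estimated_length(4, 8))                   # -> 32.0
# 20 operators, 30 operands, 10 commits, 5 employees, 3 time units:
print(halstead_team_velocity_length(20, 30, 10, 5, 3))   # -> 55.0
```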
The fan-in-out metric is the sum of both the Fan-in and Fan-out complexity, and
thus mathematically, we can represent them as follows:

Fan-in Complexity = LR + GR + PR (5)

where LR = Local Variable Reading, GR = Global Variable Reading, and PR =
Parameter Reading.

Table 3 Comparison of new and old features with Human/Bot classification

Feature                                  Correlation with target variable
Cyclomatic complexity                    0.21
Cyclomatic escape defect complexity      0.45
Halstead complexity                      0.81
Halstead team velocity program length    0.95
Fan-out complexity                       0.54

Fan-out Complexity = RPW + GW + LW (6)

where RPW = Reference Parameter Written, GW = Global Variable Written, and LW =
Local Variable Written. We compared the proposed features’ correlation with the target
variable and found them stronger than the original metrics (Table 3).
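Eqs. (5) and (6) and the combined fan-in-out metric reduce to simple sums; a minimal sketch with illustrative counts:

```python
def fan_in_complexity(local_reads, global_reads, param_reads):
    """Eq. (5): LR + GR + PR."""
    return local_reads + global_reads + param_reads

def fan_out_complexity(ref_param_writes, global_writes, local_writes):
    """Eq. (6): RPW + GW + LW."""
    return ref_param_writes + global_writes + local_writes

def fan_in_out(reads, writes):
    """The fan-in-out metric is the sum of fan-in and fan-out."""
    return fan_in_complexity(*reads) + fan_out_complexity(*writes)

# 3 local reads, 1 global read, 2 parameter reads;
# 1 reference-parameter write, 0 global writes, 4 local writes:
print(fan_in_out((3, 1, 2), (1, 0, 4)))  # -> 11
```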
Synthetic Minority Oversampling Technique (SMOTE) [17] is applied to balance the
minority class and prevent overfitting of the model. We further take a stratified sample
using fivefold cross-validation [18]. We apply dimensionality reduction using principal
component analysis [19], which projects the data onto the orthogonal directions of
maximum variance derived from the eigenvectors of the covariance matrix. The data
had its fair share of linear and nonlinear features; hence, Logistic Regression [20] failed
to find an optimal decision boundary. Support Vector Machine (SVM) [21] had a slight
advantage over the former; however, due to the inherent imbalance in the
high-dimensional data, it failed to construct an optimal hyperplane. Ridge Classifier
[22] yielded results similar to SVM. Hence, the data appeared largely linear but
contained too many non-linearities and a few outliers to handle. Decision Trees [23]
overfitted, as evidenced by the mismatch between train and test accuracy. Random
Forest [24] gave better results than its counterparts but saturated at higher dimensions.
XGBoost [25] and Gradient Boosting [26] performed well on the data, with the former
doing slightly better, and LightGBM [27] matched XGBoost with reduced training
time. Upon further fine-tuning with randomized grid search [28], we obtained the best
results without overfitting. Because F1-score [29] is well suited to class-imbalance
problems, we selected it as our evaluation metric.
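The oversampling step can be illustrated with a minimal, stdlib-only SMOTE sketch (brute-force neighbour search over numeric feature tuples; a production pipeline would use an established implementation such as the one discussed in [17]):

```python
import math
import random

def smote(minority, k=2, n_synthetic=4, seed=0):
    """Minimal SMOTE sketch: for each synthetic sample, pick a random
    minority point, find its k nearest minority neighbours (brute
    force), and interpolate a new point between the pair."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi)
                               for xi, ni in zip(x, nb)))
    return synthetic

# Three minority-class points in a 2-D feature space:
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote(minority)
print(len(new_points))  # -> 4
```

Each synthetic point lies on a segment between two real minority samples, which is why SMOTE tends to generalize better than plain duplication.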

4 Results and Discussions

Performance evaluation by segregating actual human code from machine-generated
bot code proves to be a promising capability for versioning systems. Light Gradient
Boosting proved to be the optimal model, with data abstraction coupling and
Halstead team velocity program length being the features that contributed most to the

Table 4 Tabulated results of modeling

Model algorithm       Precision  Recall  F1-score
Logistic regression   0.79       0.80    0.795
SVM classifier        0.81       0.82    0.815
Ridge classifier      0.82       0.81    0.815
Gradient boosting     0.92       0.91    0.915
XGBoost               0.93       0.95    0.94
Light GBM             0.94       0.95    0.945

model. We observed that bot code suffers from code-quality issues and weaker logic;
however, it is a good alternative for maintenance work, whereas actual human effort is
highly modular and logical (Table 4).
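For reference, each F1 value in Table 4 is the harmonic mean of the tabulated precision and recall:

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Light GBM row of Table 4:
print(round(f1_score(0.94, 0.95), 3))  # -> 0.945
```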

5 Conclusions

This paper provides insight into machine-generated code, which still has a long way to
go before it can replace human coding effort in software development and can, for now,
only assist managers. It is more suitable for maintenance work and, if overused, can
lead to more problems than benefits. Identifying human coding effort will serve as a
cutting-edge metric for evaluating an employee’s performance, giving managers better
ideas for incentive distribution and team quality. With automated bots handling
maintenance and testing, developers can shift their focus, to a limited degree, to the
more creative aspects of product design.

References

1. Treude, C., Storey, M.: Awareness 2.0: staying aware of projects, developers and tasks using
dashboards and feeds. In: IEEE International Conference on Software Engineering (2010)
2. Canaporo, M., Ronchieri, E.: Data mining techniques for software quality prediction in open
source software: an initial assessment. In: European Physical Journal Conference (2019)
3. Punitha, K., Chitra, S.: Software defect prediction using software metrics: a survey. In:
International Conference on Information Communication and Embedded Systems (2013)
4. Yamauchi, K., Aman, H., Amasaki, S., Yokogawa, T., Kawahara, M.: An entropy-based metric
of developer contribution in open source development and its application to fault-prone program
analysis. Int. J. Netw. Distrib. Comput. 6(3) (2018)
5. Kollanyi, B.: Where do bots come from? An analysis of bot codes shared on GitHub. Int. J.
Commun. (2016)
6. Nagappan, N., Ball, T., Zeller, A.: Mining metrics to predict component failures. In: International
Conference on Software Engineering (2006)
7. Idreos, S., Callaghan, M.: Key-value storage engines. In: ACM SIGMOD International
Conference on Management of Data (2020)

8. Bavota, G., De Carluccio, B., De Lucia, A., Di Penta, M., Oliveto, R., Strollo, O.: When does
a refactoring introduce bugs? An empirical study. In: IEEE International Workshop on Source
Code Analysis & Manipulation (2012)
9. Abouelela, M., Benedicenti, L.: Bayesian network based XP process modelling. Int. J. Softw.
Eng. Appl. (2010)
10. Chang, Z., Son, R.G., Sun, Y.: Validating Halstead metrics for Scratch program using process
data. In: IEEE International Conference on Consumer Electronics (2018)
11. Kapur, R., Sodhi, B.: A defect estimator for source code: linking defect reports with
programming constructs usage metrics. In: ACM Transactions in Software Engineering and
Methodology (2020)
12. Misra, S., Fernandez-Sanz, L., Adewumi, A., Crawford, B., Soto, R.: Applicability of cyclo-
matic complexity on WSDL. In: International Conference on Soft Computing, Intelligent
Systems, and Information Technology (2015)
13. Arora, R., Kumar, M.: Dynamic coupling metrics for object oriented software. Int. J. Res. Anal.
Rev. 5(2) (2018)
14. Murgia, A., Tonelli, R., Marchesi, M., Concas, G., Counsell, S., McFall, J., Swift, S.: Refac-
toring and its relationship with fan-in and fan-out: an empirical study. In: IEEE European
Conference on Software Maintenance and Engineering (2012)
15. Khan, S.I., Latiful Hoque, A.S.Md.: SICE: an improved missing data imputation technique. J.
Big Data (2020)
16. Mohsin, M.F.M., Hamdan, A.R., Bakar, A.A.: The effect of normalization for real value negative
selection algorithm. In: International Multi-Conference on Artificial Intelligence Technology
(2013)
17. Fernández, A., García, S., Herrera, F., Chawla, N.V.: SMOTE for learning from imbalanced
data: progress and challenges, marking the 15-year anniversary. J. Artif. Intell. Res. (2018)
18. Yadav, S., Shukla, S.: Analysis of k-fold cross-validation over hold-out validation on colossal
datasets for quality classification. In: IEEE International Conference on Advanced Computing
(2016)
19. Sehgal, S., Singh, H., Agarwal, M., Bhasker, V., Shantanu: Data analysis using principal compo-
nent analysis. IEEE International Conference on Medical Imaging, m-Health and Emerging
Communication Systems (MedCom) (2014)
20. Lv, C., Chen, D.-R.: Interpretable functional logistic regression. In: International Conference
on Computer Science and Application Engineering (2018)
21. Zhang, Y.: Support vector machine classification algorithm and its application. In: International
Conference on Information Computing and Applications (2012)
22. Zhang, L., Suganthan, P.N.: Benchmarking ensemble classifiers with novel co-trained kernel
ridge regression and random vector functional link ensembles. IEEE Comput. Intell. Mag.
(2017)
23. Zhong, Y.: The analysis of cases based on decision tree. In: IEEE International Conference on
Software Engineering and Service Science, Beijing (2016)
24. Biau, G.: Analysis of a random forests model. J. Mach. Learn. Res. (2012)
25. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: ACM SIGKDD
International Conference (2016)
26. Priyadarshini, R.K., Banu, A.B., Nagamani, T.: Gradient boosted decision tree based classifica-
tion for recognizing human behavior. In: International Conference on Advances in Computing
and Communication Engineering (2019)
27. Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., Liu, T.-Y.: LightGBM: a highly
efficient gradient boosting decision tree. In: International Conference on Neural Information
Processing Systems (NeurIPS) (2017)
28. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn.
(2012)
29. Lipton, Z.C., Elkan, C., Narayanaswamy, B.: Optimal thresholding classifiers to maximize F1
measure. In: Joint European Conference on Machine Learning and Knowledge Discovery in
Databases (2014)
Green Internet of Things: The Next
Generation Energy Efficient Internet
of Things

Navod Neranjan Thilakarathne, Mohan Krishna Kagita, and W. D. Madhuka Priyashan

Abstract The Internet of Things (IoT) is seen as a novel technical paradigm aimed
at enabling connectivity between billions of interconnected devices around the
world. IoT serves various domains, such as smart healthcare, traffic surveillance,
smart homes, smart cities, and many industries. Its main functionality includes
sensing the surrounding environment, collecting data from it, and transmitting those
data to remote data centers or the cloud. Sharing vast volumes of data among billions
of IoT devices creates a large energy demand and increases energy wastage in the
form of heat. Green IoT envisages reducing the energy consumption of IoT devices
and keeping the environment safe and clean. Motivated by the goal of a sustainable
next-generation IoT ecosystem and a healthy, green planet, we first offer an
overview of Green IoT (GIoT), and then the challenges and future directions
regarding GIoT are presented in our study.

Keywords IoT · Green IoT · GIoT · Green computing · Green IT

1 Introduction

Day by day, IoT-related technologies are getting closer to our lives in various forms.
It is believed that IoT will become a revolutionizing technology that can change
the face of our world [1–7]. This IoT is capable of facilitating the connection of
billions of digital devices. It can also be seen as an advanced version of Machine

N. N. Thilakarathne (B)
Department of ICT, Faculty of Technology, University of Colombo, Colombo, Sri Lanka
e-mail: navod.neranjan@ict.cmb.ac.lk
M. K. Kagita
School of Computing and Mathematics, Charles Sturt University, Melbourne, Australia
W. D. M. Priyashan
Department of Mechanical and Manufacturing Engineering, Faculty of Engineering, University of
Ruhuna, Galle, Sri Lanka
e-mail: madhuka.p@mme.ruh.ac.lk

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 391
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_38
392 N. N. Thilakarathne et al.

to Machine (M2M) communication, where each machine or digital object communicates
with another machine or digital object without human interference [1, 2, 8].
IoT and related technologies encompass devices such as actuators, sensors,
network gateways, and mobile devices linked through the Internet. These
things or objects can sense the environment, transfer information, and interact with
each other in various ways [1, 8, 9]. Like any other device, IoT devices utilize
energy to function and operate, but in some cases they use more energy than
required, wasting it by generating unnecessary heat. This waste of energy and excess
heat should be reduced for the benefit of the economy and our environment’s safety.
The latest technological advancements and the exponential growth of working IoT
devices have also increased the energy demand required for device functionality.
This created the need for low-power IoT, also known as GIoT [2, 3]. It is claimed
that by adopting low-energy-consumption strategies, IoT will play a key role in
mitigating the climate crisis in the forthcoming years [10].
Energy consumption rates have reached distressing levels due to various daily
use of energy-hungry digital devices [1, 8, 9, 11, 12]. With the increased use of
connected IoT devices and the amount of data generated and transmitted across the
IoT infrastructure, scientists expect a tremendous data rate and a considerable content
size at the price of exceptional carbon emissions into the environment [1, 10]. The
latest report [13] has shown that carbon dioxide emissions from cellular networks will
be 345 million tons by 2020 and are projected to rise annually. As a result, due to these
massive carbon dioxide emissions and the environmental and health challenges, clean
or green technologies are becoming an enticing research field [1]. On the other hand,
the GIoT ecosystem itself faces various challenges, such as security and quality-of-service
concerns, the complexity of adopting a universal architecture, heterogeneous
devices, etc. As a result, researchers and industry are developing innovative GIoT
solutions and integrating them with enabling technologies like cloud computing and
fog computing to overcome such challenges. This work’s main objective is to provide
readers with a brief understanding of GIoT and the utilization of GIoT techniques
toward achieving an eco-friendly, sustainable world. In the remainder of this paper,
section two provides an overview of IoT’s architecture and various IoT applications.
In section three, we discuss GIoT and the approaches for achieving GIoT. The
challenges and future directions are discussed in section four, and finally, the paper is
concluded in section five.
Green Internet of Things: The Next Generation … 393

2 Architecture and the Applications of IoT

2.1 Architecture of IoT

Many researchers have proposed different IoT architectures, but no single IoT
architecture is generally agreed upon [2, 14, 15]. The most well-known IoT
architecture comprises three layers: the perception, network, and application layers
[16–20], as depicted in Fig. 1.
• The perception layer comprises physical IoT devices that consist of various
actuators and sensors for sensing the environment and collecting information.
• The network layer supports the transmission and the processing of sensor data
gathered by the perception layer. It is mainly used for connecting to other smart
things, network devices, and servers.
• The application layer holds the responsibility for supplying the user with
application-specific services. It consists of various applications that facilitate
smart cities, smart homes, smart healthcare, and other IoT domains.
IoT’s functionality comprises several key stages: identification, sensing, communication,
computing, services, and semantics [15, 21]. The identification stage
ensures that the information or service required reaches the correct address. The
sensing stage deals with collecting data from different resources and sending them to
data centers or the cloud; IoT sensing devices can capture various attributes such as
air quality, humidity, and temperature. IoT communication enables devices to provide
specific services to users and is mostly carried out over wireless media such as
Bluetooth, BLE (Bluetooth Low Energy), and Wi-Fi. Computation is performed by
various microcontrollers, microprocessors, and software applications. Depending on
the context and domain in which the IoT devices reside, the services provided to
end-users can vary. Finally, semantics deals with gathering intelligent knowledge to
make quality decisions.

Fig. 1 The architecture of IoT



2.2 Applications of IoT

There is no doubt that, by monitoring different situations and making smart decisions
to optimize our lifestyle, IoT is revolutionizing our daily lives. In the following, we
discuss a few application domains of IoT [1, 7–10, 14, 15, 20, 22–24].
• Smart Homes: Integrating the home environment with various IoT devices and
technologies, like smart TVs, home security systems, and heating systems, can
facilitate tracking inhabitants’ activities and controlling the home environment [3, 25].
• Food Supply Chains (FSC): Integrating IoT technologies in the supply chain
allows the vendors to keep track of their products, from the farm to the consumers
[3, 26].
• IoT in Mining Industry: IoT technologies can ensure miners’ safety and provide
valuable information regarding the mining process to the mining companies.
Besides, it facilitates communication and allows companies to track down the
location of miners.
• IoT in Transportation: IoT allows for tracking vehicles and products using various
tracking devices (for example, Radio Frequency Identification (RFID) tags).
• Smart Cities: The smart city is an amalgamation of different smart domains like
smart homes, intelligent transportation, and intelligent surveillance to facilitate
residents in the town to have a quality, decent lifestyle [3, 11, 15, 25].
• IoT in Healthcare: IoT in healthcare encompasses various devices (e.g., heart rate
monitors, ventilators, pulse oximetry monitors) used for patient treatment,
disease diagnosis, remote health monitoring, and emergency patient care [4, 16,
17, 19].
• Smart Grid: This is known as the next-generation power grid, which emerged as a
replacement for outdated power systems in the twenty-first century. It is integrated
with advanced communication and computing capabilities that help control and
manage energy resources [3, 15, 18, 25].
• Smart vehicles: This is a novel research area that the automotive industry is
currently focused on, developing an automobile capable of driving itself powered
by electricity or other environmentally safe power sources [27].
• Smart farming: This technology can change the concept of farming, as remote
farming locations can be monitored and livestock can be tracked, although this
area has not been adequately researched yet [28, 29].

3 Green Internet of Things

Although IoT has many problems, such as security, privacy, and interoperability
issues [4–6, 17], energy usage will be the most critical obstacle in implementing
it. As the number of IoT devices such as RFIDs, sensors, actuators, and mobile
devices connected to the Internet rises rapidly [30], energy needs will also grow. If
billions of IoT devices are constantly working, they will require massive amounts of

Fig. 2 Green internet of things

energy daily and will generate a large volume of data that magnifies energy
consumption. Transporting and storing these data also increase the energy
requirement. The scarcity of traditional energy sources deepens the crisis further. A
side effect of this massive energy consumption is uncontrolled carbon dioxide (CO2)
emission into the environment. To solve these problems, GIoT has been proposed
[15, 31, 32]. It can be described as the set of energy-efficient IoT procedures
(hardware-, software-, and policy-based) that reduce the greenhouse effect [2, 15].
Figure 2 showcases the ecosystem of GIoT.
Before moving on to the GIoT approaches, readers need to understand what is
meant by Green Computing (GC), as GIoT is fundamentally based on GC techniques.
GC, more popularly known as Green IT (GIT), is the study and practice of
environmentally sustainable computing or IT. It encompasses the research and
practice of designing, manufacturing, using, and disposing of computing components
efficiently and effectively with minimal or no effect on the environment, and it
consists of four phases that assist in adopting green practices.
• Green Use: This focuses on reducing energy consumption and promotes the
sustainable use of computers and other information systems.
• Green Disposal: This focuses on refurbishing and re-using obsolete computers
and recycling unwanted computer items.

• Green Design: This focuses on designing computer components that are energy
efficient and environmentally friendly.
• Green Manufacturing: This focuses on developing electronic parts, digital devices
with low impact or no impact on the environment.
Eco-friendliness and energy efficiency are the two defining features of GIoT.
They are accomplished by incorporating hardware-, software-, and policy-based
energy-efficient procedures and techniques to minimize energy consumption, CO2
emission, and the greenhouse effect [1, 3, 8]. Most IoT devices are not optimized for
energy efficiency; hence, they waste energy by staying active even when they do not
need to be. To curb this consumption and wastage, GIoT ensures that an IoT device
is ON only when required and idle or OFF otherwise. GIoT focuses on the smart
operation of devices with a decrease in energy waste. Energy-efficient ventilation of
the heat generated by servers and data centers and intelligent energy-conserving
techniques are among the strategies for conserving energy when implementing GIoT.
Several key green technologies, such as green RFID, green sensing networks, and
green cloud computing, have been implemented to achieve GIoT. RFID is a tiny,
compact electronic device comprising a variety of RFID tags and small tag readers
[8, 9]. It stores data about the objects to which it is attached. The transmission range
of RFID systems is, in general, a few meters. There are two types of RFID tags,
known as passive and active tags. Active tags have batteries to transmit their signal
continuously, while passive tags have no battery and must harvest energy instead.
Another leading technology enabling GIoT is the green Wireless Sensor Network
(WSN). Many sensor nodes with minimal power and storage space are used in
WSNs [1, 8, 9]. Cloud computing is fundamentally based on virtualization, aiming
to reduce energy consumption compared to having multiple servers in data centers.
Green cloud computing encompasses various policies for making the cloud more
energy-efficient. In the following, we discuss the critical GIoT techniques [23, 32, 33].
• Green Internet Technology
Green Internet technologies require special hardware and software designed to
consume less energy without reducing performance, including gateways, routing
devices, communication protocols, etc.
• Green RFID Tags
Active RFID tags have built-in batteries for continuously transmitting their signal,
while passive RFID tags have no battery. Reducing the size of an RFID tag reduces
the amount of non-degradable material it contains, and various strategies have been
proposed to reduce the energy consumption of RFID tags. Interested readers are
encouraged to refer to [1, 23, 33] for a better understanding of Green RFID and
associated technologies.
• Green Wireless Sensor Network
Green WSN can be achieved by green energy conservation techniques, radio
optimization techniques, and green routing techniques, which leads to a reduction

of mobility energy consumption in WSNs. Smart data algorithms can also be
devised to reduce the storage capacity and the size of the data content passing
through the WSNs. Also, to reduce energy consumption, sensor nodes in the WSN
can be activated only when necessary.
• Green Cloud Computing
In green cloud computing, hardware and software are used in ways that reduce
energy consumption, and additional policies are applied to make the underlying
processes more energy-efficient [1, 32].
• Green Data Centers
Data centers are responsible for storing, managing, processing, and disseminating
all types of data and applications. Data centers should be designed to use
renewable energy resources. In addition, energy-efficient ventilation techniques
and energy-efficient communication protocols need to be devised to reduce energy
consumption.
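The consolidation idea behind the green cloud and data-center techniques above can be sketched with a toy model: if virtual machines are packed onto as few physical servers as possible (here with a simple first-fit-decreasing heuristic), the idle servers can be powered down. The VM loads and server capacity below are made-up numbers for illustration only.

```python
# Toy illustration of server consolidation in green cloud computing.
# The workloads and capacity are hypothetical, not from the cited works.

def consolidate(vm_loads, server_capacity):
    """First-fit-decreasing bin packing: returns per-server load lists."""
    servers = []
    for load in sorted(vm_loads, reverse=True):
        for s in servers:
            if sum(s) + load <= server_capacity:
                s.append(load)
                break
        else:
            servers.append([load])  # no room anywhere: power on a new server
    return servers

if __name__ == "__main__":
    vms = [0.2, 0.5, 0.4, 0.7, 0.1, 0.3]          # CPU shares of six VMs
    packed = consolidate(vms, server_capacity=1.0)
    # Without consolidation: one server per VM -> six active servers.
    print(f"Active servers after consolidation: {len(packed)}")
```

With these numbers the six virtual machines fit on three servers, so the other three can be put into a low-power state, which is exactly the kind of saving virtualization-based green cloud policies target.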
In addition to the above-mentioned GIoT technologies, energy efficiency can be
improved by reducing transmission power to the minimum required level and by
carefully applying algorithms to design practical communication protocols. Also,
activity scheduling, whose purpose is to move specific nodes into a low-power
(sleeping) mode, can further strengthen the energy efficiency of networks: only a
subset of connected nodes remains active while the network keeps working [8, 9].
Additionally, there are huge concerns over the toxic pollution and E-waste generated
by the IoT ecosystem, which place new stress on achieving a sustainable,
eco-friendly world. There is a growing desire to shift to GIoT, as it concerns and
cares for the entire pervasive IoT ecosystem. To do this, a range of steps need to be
taken to reduce CO2 emissions and E-waste, and device manufacturers and
end-users should be encouraged to devise effective energy-efficient techniques
[30].
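The benefit of activity scheduling can be quantified with a back-of-the-envelope duty-cycle model. The power figures below are illustrative assumptions, not measurements from the cited works.

```python
# Back-of-the-envelope model of activity scheduling in a WSN.
# p_active_mw and p_sleep_mw are illustrative numbers, not measurements.

def avg_power_mw(active_fraction, p_active_mw=60.0, p_sleep_mw=0.1):
    """Average node power when only a fraction of the time is spent active."""
    return active_fraction * p_active_mw + (1 - active_fraction) * p_sleep_mw

always_on = avg_power_mw(1.0)
duty_cycled = avg_power_mw(0.05)   # node active 5% of the time
print(f"Energy saving: {100 * (1 - duty_cycled / always_on):.1f}%")
```

Even this crude model shows why sleep scheduling is so effective: a node active only 5% of the time draws roughly 3 mW on average instead of 60 mW, a saving of about 95%.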

3.1 GIoT Approaches

This subsection intends to provide readers with a brief understanding of recent models
and techniques developed and proposed towards achieving GIoT. We have
categorized them based on the devised GIoT approach, that is, Hardware-Based (HB),
Software-Based (SB), and Policy-Based (PB) (Table 1).

4 Challenges and Future Directions

398 N. N. Thilakarathne et al.

Table 1 GIoT approaches

Reference | Technology | Type | Description
[34] | GIoT network | SB | An energy-efficient scheme is proposed in this study to extend the life of IoT networks
[35] | GIoT network | SB | An energy management scheme for IoT is introduced in this study
[36] | Wireless sensor network-assisted IoT network | SB | An energy-efficient data routing protocol for data transfer is introduced and experimented with in this study
[37] | Virtualization framework for energy-efficient IoT networks | HB | An energy-efficient cloud computing platform for IoT is introduced in this study
[38] | Controlling greenhouse effect for precision agriculture | HB/SB | An IoT- and cloud-based system for precision agriculture is introduced in this study
[39] | Data center | SB | A methodology for context-aware server allocation in energy-efficient data centers is introduced in this study
[40] | Smart home automation | PB/SB | In this study, the authors propose various strategies to track different energy-consumption parameters and reduce energy wastage in smart home environments
[41] | IoT sensors | SB | A method for improving energy efficiency in IoT sensors is proposed in this study

In GIoT, there are several problems associated with transforming from IoT to GIoT.
These can be grouped by different parameters: hardware-based, software-based,
routing-algorithm-based, policy-based, and so on. Hardware-based concerns cover
processors, sensors, servers, ICs, RFID devices, etc.; software-based concerns cover
cloud services, virtualization, data centers, etc.; and policy-based concerns cover
smart metering systems, prediction of energy usage, and so on [3, 14, 15]. GIoT
technology is currently in its infancy, but immense research activities are underway
to achieve green technology and keep the environment safe. As of now, many
difficulties and issues still have to be tackled urgently. In the following, we discuss
the key challenges blocking the way toward achieving GIoT.
• Universal GIoT Architecture for IoT: Various vendors and standardization organizations
are trying to enable links between heterogeneous networks and a huge variety
of IoT devices in order to introduce an energy-efficient architecture that applies
universally to pervasive IoT ecosystems. Due to the heterogeneity of devices and
networks, however, this has become a tedious task.
• Green Infrastructure: Offering energy-efficient infrastructure is considered a vital
issue. However, due to the complexity of deploying significantly new infrastructure,
this research area receives less focus and requires more attention.
• Green Security and Quality of Service (QoS): Executing security encryption
algorithms puts an extra load on IoT devices, which causes high energy and power
consumption. In the case of GIoT, safety and security are becoming a high
priority [5]. Along with achieving security for GIoT, we also need to look for
solutions that provide the optimal required QoS for its users.
• GIoT Applications: This is a less focused research area, although application-layer
services can be made more energy efficient. This can be achieved by incorporating
various methods when designing applications such as web applications
(e.g., the Blackle energy-saving Internet search uses a black background as its
search background).
• Reliability and Context Awareness: Reliability and context awareness should
be enhanced for green IoT energy-consumption models, leading to reliable and
trustworthy GIoT solutions.
• The Complexity of IoT Infrastructure: There is no doubt that the IoT ecosystem
comprises various complex devices. Depending on the vendor and the underlying
technologies, devices are getting more complex, and there should be a proper way
to reduce this complexity so that more efficient energy-saving mechanisms can be
introduced.
• Green Design in Practice: When designing IoT devices, concerns such as reducing
energy consumption and making the devices more energy efficient with less
E-waste need to be considered and incorporated as part of the design process.
This saves both time and cost.

4.1 Future Directions

There is no doubt that the quality of life and the environment can be enhanced by
GIoT by making the related technologies and infrastructure more environment
friendly. Recent GIoT research has mainly focused on green IoT applications and
services, devising advanced energy-efficient RFIDs, energy-efficient models and
planning, and localizing GIoT devices [1, 9, 42]. It is also expected that most IoT
devices will be recycled again and again to reduce the toxic and hazardous
materials emitted into the environment. Besides, we can expect that incorporating
enabling technologies such as cloud computing [43, 44], fog computing, edge
computing, and blockchain will become more common in GIoT solutions, as those
technologies are good at providing more scalability, security, and high performance
for the underlying IoT ecosystem [2, 11, 15, 45, 46].

5 Conclusion

Inspired by the goal of achieving a sustainable green smart world, our study
provides an overview of GIoT and of the various integrated technologies and
challenges of GIoT. Future research directions and open problems regarding GIoT
have also been presented. Based on our review, we noted that GIoT can offer many
advantages, such as environmental sustainability and protection, end-user
satisfaction in different IoT domains, and minimization of the harmful effects on the
environment and human health. We also noted that even though GIoT is currently
in its infancy, a lot of GIoT-based research activities are being conducted to keep
the environment safe and reduce the harmful effects of using IoT. As IoT constitutes
the main part of digital infrastructure globally, the benefits we can gain from
adopting green practices will be immense. We believe this study will help
researchers, academics, students, and other key stakeholders interested in making a
safer, greener world.

References

1. Albreem, M.A., El-Saleh, A.A., Isa, M., Salah, W., Jusoh, M., Azizan, M.M., Ali, A.: Green
internet of things (IoT): an overview. In: 2017 IEEE 4th International Conference on Smart
Instrumentation, Measurement and Application (ICSIMA), pp. 1–6 (2017)
2. Shaikh, F.K., Zeadally, S., Exposito, E.: Enabling technologies for green internet of things.
IEEE Syst. J. 11(2), 983–994 (2015)
3. Ahmad, R., Asim, M.A., Khan, S.Z., Singh, B.: Green IoT—issues and challenges. In: Proceed-
ings of 2nd International Conference on Advanced Computing and Software Engineering
(ICACSE) (2019)
4. Kagita, M.K., Thilakarathne, N., Gadekallu, T.R., Maddikunta, P.K.R.: A review on security
and privacy of internet of medical things (2020). arXiv:2009.05394
5. Kagita, M.K., Thilakarathne, N., Gadekallu, T.R., Maddikunta, P.K.R., Singh, S.: A review on
cyber crimes on the internet of things (2020). arXiv:2009.05708
6. Kagita, M.K., Thilakarathne, N., Rajput, D.S., Lanka, D.S.: A detail study of security and
privacy issues of internet of things (2020). arXiv:2009.06341
7. Al-Turjman, F., Kamal, A., Husain Rehmani, M., Radwan, A., Khan Pathan, A.S.: The green
internet of things (G-IoT) (2019)
8. Prasad, S.S., Kumar, C.: A green and reliable internet of things. Commun. Netw. 5(1), 44–48
(2013)
9. Huang, J., Meng, Y., Gong, X., Liu, Y., Duan, Q.: A novel deployment scheme for green internet
of things. IEEE Internet Things J. 1(2), 196–205 (2014)
10. Green IoT.: https://www.telekom.com/en/company/topic-specials/internet-of-things/green-iot.
Last accessed 07 Nov 2020
11. Dogan, O., Gurcan, O.F.: Applications of big data and green IoT-enabling technologies for
smart cities. In: Handbook of Research on Big Data and the IoT, pp. 22–41 (2019)
12. Varjovi, A.E., Babaie, S.: Green internet of things (GIoT): vision, applications and research
challenges. Sustain. Comput.: Inform. Syst. 28, 100448 (2020)
13. Green Power for Mobile.: The global telecom tower ESCO market, Technical Report (2015)
14. Khan, N., Sajak, A., Alam.: Analysis of green IoT (2020)
15. Arshad, R., Zahoor, S., Shah, M.A., Wahid, A., Yu, H.: Green IoT: an investigation on energy
saving practices for 2020 and beyond. IEEE Access 5, 15667–15681 (2017)

16. Thilakarathne, N.N., Kagita, M.K., Gadekallu, T.R.: The role of the internet of things in health
care: a systematic and comprehensive study. Int. J. Eng. Manag. Res. 10(4), 145–159
(2020)
17. Thilakarathne, N.N.: Security and privacy issues in IoT environment. Int. J. Eng. Manag. Res.
10 (2020)
18. Thilakarathne, N.N., Kagita, M.K., Lanka, D., Ahmad, H.: Smart grid: a survey of architectural
elements, machine learning and deep learning applications and future directions (2020). arXiv:
2010.08094
19. Thilakarathne, N.N., Kagita, M.K., Gadekallu, T.R., Maddikunta, P.K.R.: The adoption of ICT
powered healthcare technologies towards managing global pandemics (2020). arXiv:2009.
05716
20. Bashar, D.A.: Review on sustainable green internet of things and its application. J. Sustain.
Wireless Syst. 1(4), 256–264 (2020)
21. Green IoT way to save the environment.: https://www.techiexpert.com/green-iot-way-to-save-
the-environment/. Last accessed 04 Nov 2020
22. Maksimović, M.: Transforming educational environment through green internet of things (G-
IoT). Trend 2017(23), 32–35 (2017)
23. The Internet of Things: Green Living.: https://en.reset.org/knowledge/internet-things-05032017.
Last accessed 07 Nov 2020
24. Green IoT.: How the internet of things is improving the environment. https://banyanhills.com/
green-iot-how-the-internet-of-things-is-improving-the-environment/. Last accessed 07 Nov
2020
25. Zhu, C., Leung, V.C., Shu, L., Ngai, E.C.H.: Green internet of things for smart world. IEEE
Access 3, 2151–2162 (2015)
26. Li, J., Liu, Y., Zhang, Z., Ren, J., Zhao, N.: Towards green IoT networking: performance
optimization of network coding based communication and reliable storage. IEEE Access 5,
8780–8791 (2017)
27. Smart Car.: https://en.wikipedia.org/wiki/Smart_car. Last accessed 07 Nov 2020
28. Nandyala, C.S., Kim, H.K.: Green IoT agriculture and healthcare application (GAHA). Int. J.
Smart Home 10(4), 289–300 (2016)
29. Ferrag, M.A., Shu, L., Yang, X., Derhab, A., Maglaras, L.: Security and privacy for green IoT-
based agriculture: review, blockchain solutions, and challenges. IEEE Access 8, 32031–32053
(2020)
30. Alsamhi, S.H., Ma, O., Ansari, M.S., Meng, Q.: Greening internet of things for greener and
smarter cities: a survey and future prospects. Telecommun. Syst. 72(4), 609–632 (2019)
31. Maksimović, M., Omanović-Mikličanin, E.: Green internet of things and green nanotechnology
role in realizing smart and sustainable agriculture (2017)
32. Solanki, A., Nayyar, A.: Green internet of things (G-IoT): ICT technologies, principles, applica-
tions, projects, and challenges. In: Handbook of Research on Big Data and the IoT, pp. 379–405
(2019)
33. Gapchup, A., Wani, A., Wadghule, A., Jadhav, S.: Emerging trends of green IoT for smart
world. Int. J. Innov. Res. Comput. Commun. Eng. 5(2), 2139–2148 (2017)
34. Abedin, S.F., Alam, M.G.R., Haw, R., Hong, C.S.: A system model for energy efficient green-
IoT network. In: 2015 International Conference on Information Networking (ICOIN), pp. 177–
182 (2015)
35. Said, O., Al-Makhadmeh, Z., Tolba, A.: EMS: an energy management scheme for green IoT
environments. IEEE Access 8, 44983–44998 (2020)
36. Lenka, R.K., Rath, A.K., Sharma, S.: Building reliable routing infrastructure for green IoT
network. IEEE Access 7, 129892–129909 (2019)
37. Al-Azez, Z.T., Lawey, A.Q., El-Gorashi, T.E., Elmirghani, J.M.: Virtualization framework
for energy efficient IoT networks. In: 2015 IEEE 4th International Conference on Cloud
Networking (CloudNet), pp. 74–77 (2015)
38. Vatari, S., Bakshi, A., Thakur, T.: Green house by using IOT and cloud computing.
In: 2016 IEEE International Conference on Recent Trends in Electronics, Information &
Communication Technology (RTEICT), pp. 246–250 (2016)

39. Peoples, C., Parr, G., McClean, S., Scotney, B., Morrow, P.: Performance evaluation of green
data centre management supporting sustainable growth of the internet of things. Simul. Model.
Pract. Theory 34, 221–242 (2013)
40. Zamora-Izquierdo, M.A., Santa, J., Gómez-Skarmeta, A.F.: An integral and networked home
automation solution for indoor ambient intelligence. IEEE Pervasive Comput. 9(4), 66–77
(2010)
41. Eteläperä, M., Vecchio, M., Giaffreda, R.: Improving energy efficiency in IoT with re-
configurable virtual objects. In: 2014 IEEE World Forum on Internet of Things (WF-IoT),
pp. 520–525 (2014)
42. Yaacoub, E., Kadri, A., Abu-Dayya, A.: Cooperative wireless sensor networks for green internet
of things. In: Proceedings of the 8th ACM Symposium on QoS and Security for Wireless and
Mobile Networks, pp. 79–80 (2012)
43. Maksimovic, M.: Greening the future: green internet of things (G-IoT) as a key technological
enabler of sustainable development. In: Internet of Things and Big Data Analytics Toward
Next-Generation Intelligence, pp. 283–313 (2018)
44. Thilakarathne, N.N., Wickramaaarachchi, D.: Improved hierarchical role based access control
model for cloud computing (2020). arXiv:2011.07764
45. Jalali, F., Khodadustan, S., Gray, C., Hinton, K., Suits, F.: Greening IoT with fog: a survey. In:
2017 IEEE International Conference on Edge Computing (EDGE), pp. 25–31 (2017)
46. Sharma, P.K., Kumar, N., Park, J.H.: Blockchain technology toward green IoT: opportunities
and challenges. IEEE Netw. (2020)
iGarbage: IoT-Based Smart Garbage
Collection System

Zofia Noorain, Mohd. Javed Ansari, Mohd. Shahnawaz Khan,


Tauseef Ahmad, and Md. Asraful Haque

Abstract These days, rapid population growth leads to growing amounts of
garbage and waste material in urban areas and territories. There are numerous
issues with garbage collection, mostly in metropolitan cities. In this paper, we
propose an IoT-based Smart Garbage Collection System. The designed system
takes care of the previously mentioned issues and spares both the garbage
collectors and the residents unnecessary effort. The practical implementation of the
developed system is very efficient and accurate in its operation, and the accuracy
achieved in real-time operation is very encouraging.

Keywords Internet of Things · IR sensor · ThingSpeak · Arduino UNO · Waste
management

1 Introduction

It is a generally accepted fact that waste and trash are growing quickly in this day and
age. Governments spend an enormous sum of money just on the collection,
transportation, and management of garbage, and still it is not sufficient [1].
Environmental issues are raised in modern cities by garbage collection and
disposal [2]. Hence, smart waste management frameworks have become essential
for cities that plan to reduce costs and manage resources and time [3]. A typical
dustbin can store trash in it; whether the dustbin is full or not is left to the users to
determine. Garbage upkeep in dustbins is an issue in many homes, especially in
metropolitan cities, mainly due to the busy schedules of city life [4].

Z. Noorain · Mohd. J. Ansari · Mohd. S. Khan · T. Ahmad (B)


Department of Information Technology, Rajkiya Engineering College Azamgarh, Deogaon, India
e-mail: tauseefahmad@zhcet.ac.in
Md. A. Haque
Department of Computer Engineering, Aligarh Muslim University, Aligarh, India
e-mail: md_asraf@zhcet.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 403
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_39

The project “IoT-based Smart Garbage Collection System” presents a framework that
can be incorporated into a dustbin, allowing the dustbin to detect the level of
garbage in it and send the data to the cloud for continuous monitoring. The
framework uses “ThingSpeak” cloud storage, which acts as both server and controller.
In this model, an Arduino is used as the microcontroller, an IR sensor is used
to detect the level of garbage in the dustbin, and an ESP8266 Wi-Fi module is
incorporated to send the gathered data to the “ThingSpeak” cloud for monitoring
and processing. After data processing, based on the triggers and the specified
actions set up by the administrator, an SMS is sent to the concerned personnel.
The concerned person then collects the household’s garbage using the dustbin’s
ID number.
The rest of the paper is organized as follows. Section 2 presents a brief
description of the previous works. Section 3 presents the proposed architecture of
the iGarbage system. Section 4 discusses the results to analyze the system. Section 5
concludes the paper with suggestions for future enhancements.

2 Related Work

The authors in [1] analyzed that 85% of the total municipal solid waste management
budget is spent on garbage collection and transportation. They therefore implemented
an IoT-based smart bin that detects the garbage level in the container and sends
the information to a database, from which the relevant information is forwarded to
the concerned authority to take appropriate action.
The authors in [4] examined the number of bins and the population in an area
and implemented a smart dustbin built on an Arduino Uno microcontroller board
interfaced with a GSM module and ultrasonic sensors. It can notify the municipal
authority of the filling status of the waste in the dustbin.
The authors in [5] attempted to reduce the amount of solid waste by implementing
a smart dustbin that detects the threshold level of garbage in a bin and
then uses a compressor to reduce the volume of garbage in the container; when the
garbage cannot be compressed further, the concerned authority empties the bin.
The authors in [6] observed that people, especially in urban society, have become
less aware of their responsibility for neatness and cleanliness. They therefore
attempted to limit the overflow of dustbins by developing a framework that detects
the garbage level in a dustbin and rewards users if they throw their garbage into an
empty or partially filled dustbin.
In [7], the authors developed a framework that detects the garbage level in the
dustbin and sends a signal when 70% of the dustbin is filled. After the signal is sent,
a compressor is used to compact the garbage, and a GSM module identifies the
location of the dustbin.
The authors in [8] built a framework in which sensor nodes are connected with
an Arduino-board control station that sends sensor information via SMS, using a
GSM module, to the garbage-collecting vehicle and the server. The sensor nodes
use ultrasonic sensors to detect the level of waste against a previously set threshold
level. Also, a GPS module is integrated to get the exact location of the bin. An
Amica R2 NodeMCU microcontroller acts as the controller for the GPS module
and the ultrasonic sensors. This board has a built-in Wi-Fi module that is used to
send data to the server.
The authors in [9] built a system in which an accelerometer sensor is integrated
to detect the opening and closing of the dustbin. Also, a temperature and humidity
sensor is used to check the temperature and humidity of the waste material, and an
ultrasonic sensor is used to check the level of garbage in the dustbin. A ZigBee Pro
microcontroller is used to control the sensors. The microcontroller board has a
built-in Wi-Fi module that sends the sensor data to the gateway, and the gateway
communicates with the server over GPRS. The database management system used
is Caspio. Many more similar works are presented in [10–13].

3 The Proposed Architecture

The Internet of Things (IoT) is defined as a network of electronic components
embedded with sensors, software, and other technologies to exchange information
over the Internet [14]. It is a technology or framework that interconnects computing
devices, mechanical machines, digital machines, objects, animals, or human beings,
enabling them to communicate with each other and transfer data using a Unique
Identifier (UID), with little or no direct human interaction. Figure 1 shows an
overview of IoT. The important hardware and tools used for our system are listed
below.
– Arduino UNO
Arduino is an open-source, easy-to-use microcontroller platform. It provides an
inexpensive and flexible way to interact with other devices using sensors and
actuators [15].
– IR Sensor
An infrared sensor is an electronic instrument used to sense certain characteristics
of its surroundings by emitting and/or detecting infrared radiation [16].
– ESP8266 Wi-Fi Module
The ESP8266 Wi-Fi Module is a self-contained System on Chip (SoC) with an
integrated TCP/IP protocol stack that allows any microcontroller to access a
Wi-Fi network. It supports APSD for VoIP applications and Bluetooth co-existence
interfaces.
– ThingSpeak Cloud
ThingSpeak is an open-source IoT API used to analyze live data streams in
the cloud. It is often used for creating sensor-logging and location-tracking
applications.
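For context, ThingSpeak's documented HTTP update endpoint takes a channel's Write API key and field values as query parameters. The sketch below only builds the request URL the ESP8266 would issue; the API key shown is a placeholder, not a real credential.

```python
# Building a ThingSpeak channel-update request of the kind this system
# uses. The Write API key below is a placeholder.
from urllib.parse import urlencode

THINGSPEAK_UPDATE = "https://api.thingspeak.com/update"

def update_url(write_api_key, garbage_level):
    """URL that writes the garbage level into Field 1 of the channel."""
    query = urlencode({"api_key": write_api_key, "field1": garbage_level})
    return f"{THINGSPEAK_UPDATE}?{query}"

print(update_url("XXXXXXXXXXXXXXXX", 1))
```

Issuing an HTTP GET on the printed URL (e.g. with `urllib.request.urlopen`) would append one data point to the channel, which is what the Wi-Fi module does each time the sensor state changes.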

Fig. 1 Internet of Things [3]

– SMS API
An SMS API is a dedicated code that allows existing platforms to integrate Short
Message Service (SMS) service. Here the same twilio SMS API is used to send
messages. It has been configured with the Thingspeak cloud.

3.1 iGarbage Architecture

Sensing is performed with the help of an IR sensor integrated into the dustbin. The
sensed data is continuously transferred to the ThingSpeak cloud over the
Internet. If the data at ThingSpeak reaches a specific defined value, an
SMS is sent to the garbage collector. Figure 2 shows the working of the proposed
model.

3.2 Global Functioning of Proposed Architecture

Our framework consists of three modules:


– Sensing Module
– Networking Module
– Communication Module.

Fig. 2 Proposed architecture

3.2.1 Sensing Module

The sensing module consists of a sensor integrated with the dustbin. This module
is responsible for detecting the trash within the threshold limit by emitting infrared
radiation and receiving it back.

3.2.2 Networking Module

The networking module is a layer that establishes a connection between the sensing
module and the communication module. It consists of a Wi-Fi ESP8266 module and
Internet.

3.2.3 Communication Module

The communication module consists of ThingSpeak Cloud and SMS alerts with SMS
API integration.

3.3 Flowchart of the Working Model

The status of garbage in the dustbin is continuously observed. Once the garbage
reaches the threshold level, the data is sent to the ThingSpeak cloud. Further, a
message with the dustbin ID is sent to the staff so that they can collect the garbage
from that particular dustbin. After the garbage is collected, the status of the dustbin
is refreshed. The flowchart for the proposed model is presented in Fig. 3.

Fig. 3 Flowchart of the proposed system
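The flow described above can be sketched as a tiny state machine. The dustbin ID and the simulated sensor readings below are hypothetical, and the `notify` callback stands in for the ThingSpeak-plus-SMS path.

```python
# Sketch of the monitoring loop in the flowchart: watch the bin, report
# once when the threshold is reached, and reset after collection.
# Sensor reads are simulated; on the device this logic runs on the Arduino.

FULL = 1    # IR sensor output when garbage is within the threshold distance
EMPTY = 0

def step(sensor_value, bin_state, notify, dustbin_id="BIN-07"):
    """One pass of the monitoring loop; returns the new bin state."""
    if sensor_value == FULL and bin_state == "OK":
        notify(f"Dustbin {dustbin_id} reached threshold - please collect")
        return "AWAITING_COLLECTION"
    if sensor_value == EMPTY and bin_state == "AWAITING_COLLECTION":
        return "OK"    # staff emptied the bin; status refreshed
    return bin_state

messages = []
state = "OK"
for reading in [0, 0, 1, 1, 0]:    # simulated IR readings over time
    state = step(reading, state, messages.append)
print(messages)    # one SMS per fill cycle, not one per full reading
```

Note that the state guard sends a single alert per fill cycle even though the sensor keeps reporting FULL until collection, which matches the behavior the flowchart implies.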

4 Implementation of the Proposed System

The proposed system was implemented for a colony with a few residential flats in
order to obtain results and check the proposed model’s performance.
An IR sensor is integrated with the Arduino board to detect the level of the garbage.
To make the IR sensor functional, digital pin 12 of the Arduino board is
connected to the IR sensor’s output pin, and a 5 V power supply is provided to the IR
sensor through the USB cable connected to the Arduino board.
Whenever any object comes within approximately 6 cm, the IR sensor
detects it and sends this signal to the microcontroller.
To upload this data to the ThingSpeak cloud, the ESP8266 Wi-Fi module is
integrated. To make the Wi-Fi module functional, all six pins are used: digital pin
0 (RX) of the Arduino board is connected to the Wi-Fi module’s RX pin, digital pin
1 (TX) is connected to its TX pin, the 3.3 V pin of the Arduino is connected to the
power (3.3 V) and EN pins of the Wi-Fi module, and the remaining ground pin is
connected to ground. When the microcontroller reads the IR sensor’s signal, the
data gets uploaded to the ThingSpeak server. As soon as the garbage level reaches
the threshold limit, the connected LED turns from green to red. The LED is
integrated into the dustbin; two pins of the RGB LED are used, one connected to
digital pin 13 of the Arduino board as the output pin and the other connected to
ground. The circuit design for iGarbage is presented in Fig. 4.

Fig. 4 Circuitry design

4.1 Software Implementation

4.1.1 SMS API

The SMS API is used to send messages to the respective staff whenever the dustbin
reaches the threshold value. Here, the Twilio SMS API is used to send messages;
it has been configured with the ThingSpeak cloud. The official website of the SMS
API used here is https://www.twilio.com.
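For orientation, Twilio's REST API sends one SMS per POST to the account's Messages resource. The sketch below only builds that request; the account SID, phone numbers, and dustbin ID are placeholders, and an actual send would POST the form to the URL with the account's auth token.

```python
# Shape of the Twilio REST call behind the SMS alert. This only builds
# the request; SID, numbers, and dustbin ID below are placeholders.

ACCOUNT_SID = "ACXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"   # placeholder

def sms_request(to_number, from_number, dustbin_id):
    """URL and form body for one outgoing SMS via Twilio's Messages API."""
    url = (f"https://api.twilio.com/2010-04-01/Accounts/"
           f"{ACCOUNT_SID}/Messages.json")
    form = {
        "To": to_number,
        "From": from_number,
        "Body": f"Dustbin {dustbin_id} is full. Please collect the garbage.",
    }
    return url, form   # POST `form` to `url` with basic auth to send

url, form = sms_request("+10000000000", "+10000000001", "DB-12")
print(form["Body"])
```

In the deployed system this POST is issued by ThingHTTP rather than by our own code, but the request it fires has the same shape.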

Fig. 5 ThingSpeak cloud real-time data monitoring

4.1.2 ThingSpeak Cloud Platform

The ThingSpeak cloud platform is used to track the real-time garbage level of the
dustbin. This platform is also used to integrate the SMS API via the ThingHTTP
and React apps for sending SMS messages according to the prescribed condition.
As soon as the prescribed condition triggers, an SMS gets delivered to the specified
number.
A field chart, presented in Fig. 5, is used to display the real-time data. Write
and Read API keys are provided to send data to the ThingSpeak server from
a microcontroller. The figure shows the Read and Write API keys as well as the
key URLs to write and read data.

4.1.3 ThingHTTP

ThingHTTP is a service provided by ThingSpeak. The ThingHTTP app enables a
microcontroller to connect over the Internet to any web server using HTTP. With
this app’s help, we create an HTTP object and then manage the object with basic
API commands. The username and password authentication provided by Twilio is
used to create the ThingHTTP request.

4.1.4 REACT

React is an Application that is used to define the trigger for SMS sending automation.
Here, a condition is provided that, if the IR sensor’s value (value in Field 1) reaches
1, action triggers, and with the help of ThingHttp, an SMS is sent to the respective
staff.
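The React rule amounts to a simple condition over the latest channel entry; a minimal restatement follows, where the field name and threshold mirror the setup described above.

```python
# The React rule of this setup, restated in code: when Field 1 of the
# latest channel entry equals 1 (bin full), the ThingHTTP request that
# sends the SMS should fire.

def should_trigger(feed_entry, field="field1", threshold=1):
    """Numeric condition on a ThingSpeak channel field, React-style."""
    try:
        return int(feed_entry.get(field)) == threshold
    except (TypeError, ValueError):
        return False   # missing or non-numeric data is ignored

print(should_trigger({"field1": "1"}))   # bin full -> fire ThingHTTP
print(should_trigger({"field1": "0"}))   # not full -> no action
```

ThingSpeak evaluates this kind of condition server-side on each new feed entry, which is why no polling code is needed on our end.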

4.1.5 Web Application

A website is used to get the details of the dustbins. The site has three roles, Guest,
Staff, and Admin, as shown in Fig. 6. A guest can see the details of a dustbin
by providing the dustbin ID and password. A staff member can list all the dustbins
in a specific state or city. The administrator has full control over the site and can
change the data of the dustbins.

Fig. 6 User interface web application



5 Results and Discussion

The proposed system is cheap compared to previous systems consisting of
cameras or RFID tags.
The dustbin was able to detect the garbage level with the help of the IR sensor.
The data was successfully uploaded to the ThingSpeak cloud, and the graph was
plotted correctly on the ThingSpeak dashboard. The SMS was successfully sent to
the staff, and all the website controls functioned properly.
Figure 7 shows the proposed IoT-based Smart Garbage Collection System. All
the parts, namely the Arduino UNO, Wi-Fi module, IR sensor, and LED, are
incorporated on the dustbin. In the first picture, the dustbin is not filled to the given
threshold limit, so the LED is not glowing. The second picture shows a glowing
LED after the dustbin is filled to the threshold limit.
Figure 8 shows the Smart Bin web application, which is used to get information
related to the dustbins and provides detailed information about the locations at
which the dustbins are installed in a given city or area. The dustbin status is also
provided to the administrator so that appropriate action can be taken.
To test the proposed model’s efficiency, its accuracy is cross-validated against
the real-time data collected physically from dustbins with different garbage levels.

Fig. 7 iGarbage: Smart Bin working model

Fig. 8 User interface web application

Fig. 9 Distance values obtained from sensors and real-time

Figure 9 shows the proposed model’s accuracy when compared with the real-time
data received from the dustbin. The developed model is quite efficient: the data
values reported by the model are very close to the real-time data received from the
dustbin. A brief comparison of some essential features incorporated in our model
is presented in Table 1.
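The cross-validation behind Fig. 9 boils down to comparing sensor-reported distances with manually measured ones; a mean absolute error summarizes how close they are. The readings below are illustrative stand-ins, not the paper's data.

```python
# Cross-validating sensor distances against manually measured ones, as in
# Fig. 9. These readings are illustrative, not the paper's measurements.

sensor_cm = [5.8, 12.1, 20.4, 29.5, 41.2]   # distances reported by the sensor
actual_cm = [6.0, 12.0, 20.0, 30.0, 40.0]   # distances measured by hand

def mean_absolute_error(pred, truth):
    """Average absolute deviation between two equal-length series."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)

mae = mean_absolute_error(sensor_cm, actual_cm)
print(f"Mean absolute error: {mae:.2f} cm")
```

A small MAE relative to the 6 cm detection threshold is what justifies the claim that the sensor readings track the real garbage level closely.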

Table 1 Comparison of important features incorporated in our model


S. No. Problem Proposed solution in iGarbage
1 GSM module used for SMS [5, 7] Twilio API for sending SMS
2 SIM card for each garbage bin [7] Single API is used to send SMS
3 Costly email charges [14] Cheaper SMS charges
4 No central server [13] ThingSpeak cloud as a central server
5 Absence of management portal [7] Web application for dustbin management
6 No analysis tools [5, 7] Matlab analysis tools

6 Conclusion and Future Work

The framework named “IoT-based Smart Garbage Collection System” has been
experimentally demonstrated to work adequately by integrating the various
components controlled by the microcontroller. The IR sensor was tested on
numerous occasions by placing trash at various levels. The results obtained were
quite encouraging, and the efficiency and accuracy achieved are outstanding.
Future enhancements for the proposed framework can be as follows:
– Including an AI-enabled framework that automatically separates dry and wet
waste.
– Adding a component to open the dustbin automatically via a voice-recognition
control framework.
– Improving the battery life of the iGarbage system, since the proposed system is
battery operated and currently requires considerable maintenance cost.

References

1. Zeb, A., Ali, Q., Saleem, M.Q., Awan, K.M., Alowayr, A.S., Uddin, J., Iqbal, S., Bashir, F.:
A proposed IoT-enabled smart waste bin management system and efficient route selection.
Hindawi J. Comput. Netw. Commun. 2019, 1–9 (2019)
2. Chowdhury, B., Chowdhury, M.U.: RFID-based real-time smart waste management system. In:
Australasian Telecommunication Networks and Applications Conference, pp. 175–180 (2007)
3. Zanella, A., Bui, N., Castellani, A., Vangelista, L., Zorzi, M.: Internet of Things for smart
cities. IEEE Internet Things J. 1(1), 22–32 (2014)
4. Sinha, T., Kumar, K.M., Saisharan, P.: Smart dustbin. Int. J. Ind. Electron. Electr. Eng. 03(05)
(2017)
5. Nagaraju, U., Mishra, R., Kumar, C., Rajkumar: Smart dustbin for economic growth. Project
report
6. Parikh, P.A., Vasani, R., Raval, A.: Smart dustbin—an intelligent approach to fulfill Swatchh
Bharat Mission. Int. J. Eng. Res. Electron. Commun. Eng. 4(10) (2017)
7. Thorat, S., Kanase, S., Bhingardeve, P.: Smart dustbin container using IoT notification. Int.
Res. J. Eng. Technol. 6(4) (2019)

8. Omar, M.F., Termizi, A.A.A., Zainal, D., Wahap, N.A., Ismail, N.M., Ahmad, N.: Implemen-
tation of spatial smart waste management system in Malaysia. IOP Conf. Ser.: Earth Environ.
Sci. 37 (2016)
9. Longhi, S., Marzioni, D., Alidori, E., Buo, G.D., Prist, M., Grisostomi, M., Pirro, M.: Solid
waste management architecture using wireless sensor network technology. In: 5th International
Conference on New Technologies, Mobility and Security (NTMS), pp. 1–5 (2012)
10. Ahmad, T., Abbas, A.M.: EEAC: an energy efficient adaptive cluster based target tracking in
wireless sensor networks. J. Interdiscip. Math. 23(2), 379–392 (2020)
11. Murugaanandam, S., Ganapathy, V., Balaji, R.: Efficient IOT based smart bin for clean envi-
ronment. In: International Conference on Communication and Signal Processing (ICCSP), pp.
0715–0720 (2018)
12. Ahmad, T., Haque, M., Khan, A.M.: An energy-efficient cluster head selection using artifi-
cial bees colony optimization for wireless sensor networks. In: Advances in Nature-Inspired
Computing and Applications. EAI/Springer Innovations in Communication and Computing
(2019)
13. Nehete, P., Jangam, D., Barne, N., Bhoite, P., Jadhav, S.: IoT based garbage monitoring system.
In: Second International Conference on Electronics, Communication and Aerospace Technol-
ogy (ICECA), pp. 1454–1458 (2018)
14. Zouai, M., Kazar, O., Bellot, G.O., Haba, B., Kabachi, N., Krishnamurhty, M.: Ambiance
intelligence approach using IoT and multi-agent system. Int. J. Distrib. Syst. Technol. 10(1)
(2019)
15. Louis, L.: Working principle of Arduino and using it as a tool for study and research. Int. J.
Control Autom. Commun. Syst. 1(2) (2016)
16. Karim, A., Andersson, J.Y.: Infrared detectors: advances, challenges and new technologies.
IOP Conf. Ser.: Mater. Sci. Eng. 51 (2013)
IoT-Based Smart Home Surveillance
System

Shruti Dash and Pallavi Choudekar

Abstract Most surveillance system prototypes that have been developed to date
utilize sensors for motion detection and require a memory card for storing data.
Issues such as camera cost, flexibility, and user-friendliness need to be addressed
in a surveillance system. This paper aims to develop an affordable surveillance
system that does not need external memory devices, by deploying cloud storage. It
uses computer vision for motion detection, which allows for facial recognition
based on preloaded data. The final model incorporating all these features has been
developed successfully and verified through multiple testing processes.

Keywords Surveillance · RaspberryPi · IoT

1 Introduction

In the face of the increasing number of crimes, it has become essential to ensure
the safety of one’s home by continually monitoring and remaining alert about any
trespassers that try to enter the premises illegitimately. Even though the market
is swamped with different kinds of surveillance systems, there is still scope for
enhancing these systems in terms of user-oriented features.
Many surveillance systems today depend on a physical memory component, like
an external SD card, to store data from the surveillance video stream [1]. This
increases cost for the user in the long term. Further, hardly any CCTV cameras are
equipped for motion detection [2], and those that are can be more expensive than
the existing CCTV cameras in use. Facial recognition is not yet prevalent or widely
seen in the CCTV systems available in the market [3]. This surveillance system
model solves the above difficulties by integrating cloud storage, allowing motion
detection as well as facial recognition during the video stream, and keeping the
cost low. The objectives of this research are
1. To design an intelligent surveillance system using Raspberry Pi

S. Dash · P. Choudekar (B)


Amity University, Noida 201301, UP, India
e-mail: pachoudekar@amity.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 417
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_40
418 S. Dash and P. Choudekar

2. To implement the intelligent surveillance system
3. To test the intelligent surveillance system.

This surveillance system is developed keeping in mind the requirement of a CCTV
camera set up in homes and offices. It is used in places where a clear distinction is to
be made between trusted and unknown people. The proposed modules of this system
are summarized in Table 1.

Table 1 Description of prototype modules

| Module | Description |
|---|---|
| Live video feed | Show the live footage of surroundings without any delay on the user's computer screen |
| Motion detection | Detect motion using computer vision, simply known as image processing |
| Facial detection | Recognize the faces of known people based on a predefined database for improved security |
| Cloud storage | Capture images when motion is detected and store these images to DropBox, a cloud storage client |
| User notification | As soon as motion is detected, send an email to the user without delay |
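The user-notification module in Table 1 amounts to composing and sending an email whenever motion is detected. A minimal sketch of the message-building step using Python's standard library is shown below; the addresses, subject line, and file name are illustrative assumptions, not values from the paper, and actual delivery would go through smtplib (e.g., SMTP_SSL) on the Raspberry Pi:

```python
from email.message import EmailMessage


def build_alert(sender, recipient, snapshot_name):
    """Build the intrusion-alert email for the notification module.

    Only the message construction is shown; in a deployed system the
    returned message would be handed to smtplib.SMTP_SSL for sending.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Motion detected by home surveillance system"
    msg.set_content(
        f"Motion was detected. Snapshot {snapshot_name} was uploaded "
        "to the DropBox folder."
    )
    return msg
```

Building the message separately from sending it keeps the notification logic testable without network access.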
This paper outlines the design methodology required to build the surveillance
system and complete testing and verification of all functions of individual modules
and the entire system. Further scope of this paper and possible modifications have
also been covered at the end of the article.

2 Literature Review

Raspberry Pi was developed in the United Kingdom by the Raspberry Pi Foundation


as a series of small single-board computers, in collaboration with Broadcom. Today,
it is the preferred choice of processor for most hobbyists looking to develop usable
and handy computer projects. The board used in this project is the Raspberry Pi 3
Model B and is the latest development offered by the RPi foundation. It features an
ethernet port, 4 USB ports, headphone jack, HDMI port, camera port, and 40 GPIO
pins. The processor used is the Broadcom BCM2837 with 1 GB RAM, the highest
capacity offered on Pi 3 models [4–6].
The camera used for this project is the official camera module offered by RPi foun-
dation, PiCamera. This camera can be easily controlled using the python interface
and is most commonly used for home automation projects involving RPi. Previous
models of surveillance system developed by various researchers have used both RPi
and PiCamera and are in many ways similar to the proposed prototype, but there are
some striking differences as summarized below (Table 2).

Table 2 Comparison between proposed and existing systems

| Property | Existing systems [7–11] | Proposed system |
|---|---|---|
| Cloud storage | Not available; SD cards used for storage | Available; eliminates the need for external memory cards |
| Facial recognition | Not implemented | Implemented using computer vision |
| Hardware requirement for motion detection | PIR sensors used | Image processing used; a difference in consecutive frames triggers motion detection |
| Power consumption | High | Low; RPi only needs a 5A supply |

3 Proposed Methodology

The methodology used for this paper is the object-oriented design and analysis,
or OOAD methodology. The first step in this process is to demarcate the system’s
different modules, as done already, and then design the system required. Raspberry Pi
is the heart of this surveillance system and is the primary microcomputer used to run
all programs. The official RPi camera module, PiCamera, has been used to capture
images and stream video. Setting up this surveillance system requires executing a
simple python script that gives a continuous live feed of the surroundings. The setup
only consists of the small RPi board and its attached camera module fixed on top of
the board connected to the nearest power supply.
Figure 1 represents the numerous components in this surveillance system and their
connections.

Fig. 1 Block diagram of surveillance system

The central element, the Raspberry Pi, is a small hand-sized microcomputer that
acts as the brain of this surveillance system. The RPi is easy to obtain and offers a
very intuitive programming environment thanks to its Linux distribution, the
Raspbian OS [12, 13]. Further, it has its own camera module, the PiCamera, and can
also work with a separately bought USB camera. This offers ease of setup to the
programmer and user. The other interconnected components (DropBox, the host
computer, and computer vision) all work together in sync to capture and store images
once motion is detected. DropBox is a popular cloud-based storage service, used here
to store snapshots of detected motion. Any comparable service, such as Amazon Web
Services or Google Cloud, could be used for this purpose. DropBox makes the
prototype cost-effective and is a good replacement for traditional memory cards. It
also makes it easier to send updates to the user in case of any motion detection.
Motion detection and facial recognition are accomplished by using Python
programming to implement computer vision. The computer vision component is
handled by OpenCV, a dedicated library primarily employed for image processing
tasks [14, 15]. For motion detection, each frame recorded through the video stream is
processed, and when there is a change between successive frames, motion is detected
by the RPi. If a face is visible to the camera for a certain period, facial recognition is
also performed. To use facial recognition successfully, pictures of known people
must first be loaded into the RPi's internal memory (Fig. 2).
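The frame-differencing rule described above can be sketched in a few lines. In the actual prototype this is done with OpenCV on the PiCamera stream (e.g., cv2.absdiff followed by thresholding on consecutive frames); the pure-Python version below, with illustrative threshold values, only demonstrates the logic on grayscale frames represented as lists of pixel rows:

```python
def detect_motion(prev_frame, frame, diff_threshold=25, min_changed=10):
    """Flag motion when enough pixels change between consecutive frames.

    prev_frame and frame are grayscale images given as lists of rows of
    integer pixel values. The thresholds here are illustrative; the real
    system would tune them and use OpenCV's optimized routines instead.
    """
    changed = 0
    for row_a, row_b in zip(prev_frame, frame):
        for a, b in zip(row_a, row_b):
            # A pixel counts as "changed" if its intensity jumps enough.
            if abs(a - b) > diff_threshold:
                changed += 1
    return changed >= min_changed
```

Requiring a minimum number of changed pixels, rather than any single pixel change, keeps sensor noise from triggering false alarms.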

Fig. 2 Flowchart of surveillance system

The entire setup can be divided into distinct modules, each working separately for
live video feed, motion detection, facial recognition, storage, and user notification.
As discussed, this project follows the object-oriented analysis and design (OOAD)
methodology: analyzing system requirements, implementing a design that satisfies
these requirements, and finally testing to ensure proper working. The requirement
analysis can be summarized as below (Table 3).
Further, the functionality of the prototype is also tested, as summarized below
(Table 4).

Table 3 Requirement analysis (Input–Output Requirement)

| Module | Input | Process | Output |
|---|---|---|---|
| Motion detection | Live video feed | A change in consecutive frames triggers motion detection | Image captured as soon as motion detected |
| Facial recognition | Live video feed | Name of a person displayed in case the person is trusted (his image is stored in the database) | The captured photo has a name stamp |
| Storage | Motion detection | Image captured is immediately sent to DropBox | DropBox account can be accessed to check all stored images |
| Live video feed | None | A feed of surroundings shown without lag | The video feed can be seen on the user's computer screen |
| User notification | Image capturing | The captured image is emailed to the user | User can get alert of intrusion |

Table 4 Functional testing results

| S. No | Process | Expected result | Actual result | Pass/fail |
|---|---|---|---|---|
| 1 | Live video feed has no delay | Delay of less than 5 s expected | No noticeable delay | Pass |
| 2 | Images captured are being stored in DropBox | DropBox folder updates immediately with pictures | All captured images can be viewed via DropBox | Pass |
| 3 | Motion is detected without lag | Lag of less than 5 s expected | 1–2 s of lag noticed | Pass |
| 4 | Motion detection triggers a camera | PiCamera captures the image as soon as motion is detected | PiCamera captures the image as soon as motion is detected | Pass |
| 5 | User gets notification | Delay of less than 5 s expected | Delay of 2–3 s seen | Pass |

4 Result and Discussion

The final step in the model development procedure was to conduct both unit and
system tests. It is necessary to test the prototype to catch any functional errors and
ensure it works as envisioned during the conception stage. For testing, the entire
hardware setup consisting of Raspberry Pi and PiCamera is connected to a 5-A power
supply and connected to the same internet network as the host computer. Through
this, RPi’s graphical user interface can be seen and operated via the host computer
screen.
Figure 3 shows the physical arrangement of the surveillance system model. The
initial step was to perform unit functionality tests for each module to check various
aspects like camera setting, the video display on the host screen, motion detection and
facial recognition by RPi, and storage of PiCamera photographs into the DropBox.
Python code for all these tasks is first written separately and tested one by one.
Once it is confirmed that all the above features are working, the code for these
individual tasks is combined into a final surveillance system program. This comes
under system testing. The camera can capture images, and these images are stored
successfully on DropBox.
Figure 4 shows the successful setup of the PiCamera module. Room status is
unoccupied when no motion is detected.
Figure 5 shows the successful setup of the video stream of surroundings. As soon
as RPi detects motion, the image is captured and uploaded to DropBox, as shown
here. Room status changes to occupied.
Figure 6 shows that motion detection is successfully enabled. All snapshots are
available in the DropBox account, along with timestamp and date.

Fig. 3 Hardware setup



Fig. 4 Camera setup

Fig. 5 Video stream setup



Fig. 6 Motion detection

Figure 7 shows that the surveillance system has been successfully connected to the
cloud. Facial recognition is seen to be working correctly as well, using the preloaded
images on RPi memory.
Figure 8 shows that the home surveillance system successfully identifies faces. For
the final system test, the hardware setup was connected to a power supply and
stationed in front of the entrance of a home to alert the user of any intrusion. As
already established through unit testing, the snapshots were uploaded to DropBox,
and faces were recognized for the individuals whose pictures existed in the RPi
database. Several upgrades to the proposed work are possible: substituting the
camera module with the infrared PiCamera module so that the RPi can capture
pictures and identify faces in dark settings; extending the face recognition feature to
learn recurrently seen faces by itself instead of relying on a preloaded database;
streaming the video feed to a website accessible only to the user; and adding an
alarm or beeper to form an improved alert system. All these modifications are
summarized in Table 5. Recently, Internet of Everything (IoET) and cloud-based
computing systems have become very popular due to their inherently
location-independent operation, low power requirements, portability, and high
scalability [16, 17]. The proposed work can also be expanded using these
technologies.

Fig. 7 Cloud integration

Fig. 8 Facial recognition



Table 5 Future Scope of Proposed Surveillance System

| Property | Developed model | Future model |
|---|---|---|
| Night vision | Not available; PiCamera is used | Can be incorporated using an infrared model |
| Facial recognition | Requires preloaded data | Machine learning can be used to recognize frequently seen faces |
| Live video stream | Available on the host computer | Can be made available on a website |
| User alert | Email used to alert the user | An alarm can be added for extra security |

5 Conclusion

This surveillance system prototype was planned keeping in mind the requirements of
a small form factor, low cost, ease of use in terms of positioning, and flexibility for
the user. The Raspberry Pi used here runs the Raspbian OS; since this is a Linux
distribution, the system is easily modifiable. As the Raspberry Pi is readily available,
either in physical markets or online, the surveillance system also proves
cost-effective. Novel features not earlier seen in other surveillance systems have also
been incorporated: using a cloud storage client (DropBox in this case), using
computer vision for motion detection and facial recognition, and optimizing the time
delay in sending user updates and the live video stream.

References

1. Hou, J., Wu, C., Yuan, Z., Tan, J., Wang, Q., Zhou, Y.: Research of intelligent home secu-
rity surveillance system based on ZigBee. In: 2008 International Symposium on Intelligent
Information Technology Application Workshops (2008)
2. Keat, L.H., Wen, C.C.: Smart indoor home surveillance monitoring system using Raspberry
Pi, vol 2. International Journal on Informatics Visualization (2018)
3. Pi, R.: Raspberry pi. Raspberry Pi 1, 1 (2013)
4. Richardson, M., Wallace, S.: Getting started with raspberry PI. O’Reilly Media, Inc. (2012)
5. Upton, E., Halfacree, G.: Raspberry Pi User Guide. John Wiley & Sons (2014)
6. Keval, H.: CCTV control room collaboration and communication: does it work? In: Proceedings
of Human-Centered Technology Workshop, pp. 11–12 (2006)
7. Iyer, B., Pathak, N.P., Ghosh, D.: RF sensor for smart home application. Int. J. Syst. Assur.
Eng. Manag. 9, 52–57 (2018). https://doi.org/10.1007/s13198-016-0468-5
8. Poole, N.R., Zhou, Q., Abatis, P.: Analysis of CCTV digital video recorder hard disk storage
system. Digit. Invest. 5(3), 85–92
9. Gerrard, G., Parkins, G., Cunningham, I., Jones, W., Hill, S., Douglas, S.: National CCTV
strategy. Home Office, London (2007)
10. Boghossian, B.A., Velastin, S.A.: Motion-based machine vision techniques for the manage-
ment of large crowds. In: Electronics, Circuits and Systems, 1999. The 6th IEEE International
Conference on Proceedings of ICECS’99, vol. 2, pp. 961–964. IEEE
11. Quadri, S.A.I., Sathish, P.: IoT based home automation and surveillance system. In: 2017
International Conference on Intelligent Computing and Control Systems (ICICCS) (2017).

12. Sruthy, S., George, S.N.: Wi-Fi enabled home security surveillance system using Raspberry Pi
and IoT module. In: 2017 IEEE International Conference on Signal Processing, Informatics,
Communication and Energy Systems (SPICES) (2017)
13. Noble, F.K.: Comparison of OpenCV’s feature detectors and feature matchers. In: 2016 23rd
International Conference on Mechatronics and Machine Vision in Practice (M2VIP) (2016)
14. Lin, C., Tang, Y.: Research and design of the intelligent surveillance system based on
DirectShow and OpenCV. In: 2011 International Conference on Consumer Electronics,
Communications, and Networks (CECNet) (2011)
15. Ahmad Razimi, U.N., Alkawaz, M.H., Segar, S.D.: Indoor intrusion detection and filtering
system using raspberry Pi. In: 2020 16th IEEE International Colloquium on Signal Processing
& Its Applications (CSPA) (2020)
16. Deshpande, P., Iyer, B.: Research directions in the Internet of Every Things (IoET). In: 2017
International Conference on Computing, Communication and Automation (ICCCA), Greater
Noida, 2017, pp. 1353–1357. https://doi.org/10.1109/CCAA.2017.8230008
17. Deshpande, P., Sharma, S.C., Peddoju, S.K., Abhrahm, A.: Efficient multimedia data storage
in cloud environment. Inf. Int. J. Compu. Inform. 39(4), 431–442 (2015)
Optimized Neural Network for Big Data
Classification Using MapReduce
Approach

Sridhar Gujjeti and Suresh Pabboju

Abstract This paper proposes a big data classification approach based on the
MapReduce framework. In the mapper phase, feature selection is carried out based
on Principal Component Analysis (PCA). The selected features are then passed to
the reducer phase, where classification is done by the proposed Rider Neural
Network (RideNN), which categorizes the data into two classes, normal and
abnormal. The proposed RideNN classifier achieves a high accuracy of 0.932,
maximal sensitivity of 0.831, and maximal specificity of 0.958 on the Cleveland
dataset.

Keywords Big data classification · MapReduce framework · Neural network ·


Principal component analysis · Rider optimization algorithm

1 Introduction

One of the significant challenges involved in classification as well as machine learning


approaches is the extraction of knowledge from the enormous databases [1–3].
Big data has been under significant consideration in several fields, like electronic
commerce, e-health, the Internet of Things (IoT), bioinformatics, and Online Social
Networks (OSN) [3–5]. The MapReduce framework manages the issues related to
massive datasets [6–8]. In general, the MapReduce framework is implemented based
on the robust parallel programming framework known as Hadoop [3, 9, 10]. Also,
deep learning approaches are very advantageous while dealing with substantial unsu-
pervised data, and the data are represented typically using a greedy layer-wise manner
[11–14].

S. Gujjeti (B)
Computer Science & Engineering, Kakatiya Institute of Technology & Science, Bheemaram,
Hanamkonda 506015, India
S. Pabboju
Information Technology, Chaitanya Bharathi Institute of Technology, Gandipet, Hyderabad
500075, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 429
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_41
430 S. Gujjeti and S. Pabboju

Several techniques have been developed in the literature for big data classification
[15, 16]. Gerardo et al. [17] developed two hybrid neural architectures that combine
perceptrons and morphological neurons. Hassib et al. [3] presented a machine
learning approach for classifying imbalanced datasets. Lin et al. [18] developed an
improved Cat Swarm Optimization algorithm for feature selection to solve big data
classification issues. Elkano et al. [19] modeled a distributed learning method,
termed CFM-BD, for constructing accurate fuzzy rule-enabled classification
systems for big data.
Proposed RideNN for disease prediction over big data: The proposed RideNN
is introduced for big data classification in the reducer phase in which training is
carried out based on Rider Optimization Algorithm (ROA) such that the classification
accuracy is enhanced.
The rest of the paper is organized as follows: Sect. 2 describes the proposed
MapReduce-based Rider Neural Network (RideNN) for big data classification.
Section 3 provides the results and discussion. The conclusion is provided in Sect. 4.

2 Proposed MapReduce-Based Rider Neural Network


for Classifying Big Data

Figure 1 illustrates the disease prediction based on RideNN. The developed model
processes upon two functions, like feature selection and classification.

2.1 Mapper Phase

Let us consider the big input data G with different attributes, expressed as

G = {d_uv}; (1 ≤ u ≤ P); (1 ≤ v ≤ H)  (1)

where the uth data point's vth attribute is denoted as d_uv, P represents the total
number of data points, and H indicates the total number of attributes per data point.
In the mapper phase, the dimension of the features is reduced based on PCA. The big
data is initially divided into data subsets, expressed by

G_j = {r_j}; (1 ≤ j ≤ M)  (2)

where the total sub-sets of data are denoted by M. The total data subsets are then fed
to the mapper phase, expressed by
 
N = {N_1, N_2, ..., N_j, ..., N_M}  (3)

Fig. 1 Schematic view of proposed RideNN for big data classification

Principal component analysis for selecting the features


PCA [13] is utilized to select the best features for big data classification. The
expression of PCA is given by
 
D_j = PCA(G_j)  (4)

The PCA output is denoted as D j , which is passed to the reducer phase for big
data classification.
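The mapper's PCA step can be illustrated on a small 2-D data subset. The sketch below is not the paper's implementation: it is a pure-Python, closed-form version for the 2x2 covariance case that centers a subset G_j, finds the dominant eigenvector of its covariance matrix, and returns the projected one-dimensional features (the analogue of D_j):

```python
import math


def pca_1d(points):
    """Project 2-D points onto their first principal component.

    Centers the data, builds the 2x2 covariance matrix, finds its
    dominant eigenvector in closed form, and returns the 1-D scores.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Covariance matrix entries (population covariance for simplicity).
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n
    # Dominant eigenvalue of [[sxx, sxy], [sxy, syy]].
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = tr / 2 + math.sqrt(max(tr * tr / 4 - det, 0.0))
    # A corresponding eigenvector, normalized to unit length.
    v = (lam - syy, sxy) if abs(sxy) > 1e-12 else (
        (1.0, 0.0) if sxx >= syy else (0.0, 1.0))
    norm = math.hypot(v[0], v[1])
    v = (v[0] / norm, v[1] / norm)
    # Score of each point: projection onto the principal direction.
    return [x * v[0] + y * v[1] for x, y in centered]
```

In a real deployment each mapper would run a library PCA (e.g., on its subset of the full attribute matrix) and emit the reduced features to the reducer.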

2.2 Reducer Phase

Once the features are selected based on PCA, the concatenated feature is forwarded
to the NN [20] for classification and is given by
  
B = D_1 ‖ D_2 ‖ ... ‖ D_j ‖ ... ‖ D_M  (5)

The reducer phase input is represented as


 
B = {B_jl}; (1 ≤ j ≤ P); (1 ≤ l ≤ M × Q)  (6)

Classification based on Rider Optimization Algorithm-enabled neural network:


Let us consider the input layer B in the network as given by

B = {B_1, B_2, ..., B_x}  (7)

where the total input neurons are denoted as Bx . The hidden layers representation is
given by
 
K = {k_1, k_2, ..., k_t, ..., k_q}  (8)

where q indicates the total hidden neurons in NN and kt denotes the tth hidden
neuron, which has the output calculated as

$k_t = \frac{1}{s} \sum_{e=1}^{s} R_e B_e$  (9)

where R_e refers to the eth weight between the input neurons and the hidden neuron,
and the total number of such weights is denoted s. The NN output is calculated using
the below equation:


$T_a = \sum_{a=1}^{x} k_a B_a$  (10)

When the output layer is rewritten, the solution becomes Y_i = H(B_a, B); that is,
the output layer is a function of the input layer together with the weights. The
expression for the weights in the NN is given by

W = {ω_1, ω_2, ..., ω_d}  (11)

where the term d represents the number of weights in NN.
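Equations (9) and (10) describe a plain weighted-sum forward pass (an averaged weighted sum at each hidden neuron, then a weighted sum of hidden activations at the output, with no explicit activation function in the text). A minimal sketch with illustrative weight values:

```python
def hidden_activation(inputs, weights):
    """Eq. (9): a hidden neuron's output as the averaged weighted
    sum of the input-layer values."""
    s = len(inputs)
    return sum(w * b for w, b in zip(weights, inputs)) / s


def network_output(inputs, input_hidden_w, hidden_output_w):
    """Eq. (10): the network output as the weighted sum of the
    hidden-layer activations k_1 ... k_q."""
    hidden = [hidden_activation(inputs, w) for w in input_hidden_w]
    return sum(k * w for k, w in zip(hidden, hidden_output_w))
```

For example, with inputs [1, 2], hidden weight rows [1, 1] and [2, 0], and output weights [1, 1], the hidden activations are 1.5 and 1.0 and the output is 2.5.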


NN training using the Rider Optimization Algorithm: The ROA [20] follows a
fictional computing scheme, in which imaginary groups of riders race toward a
target and a winner is decided, to tackle optimization problems.
Step 1: Initialization: In this step, the algorithm is initialized by four-rider groups
and their locations, and the equation is given by
 
Z_g = {Z_g(y, z)}; (1 ≤ y ≤ E); (1 ≤ z ≤ J)  (12)

where the count of riders is denoted as E and the total coordinates are indicated as
J. Z_g(y, z) refers to the location of rider y at time interval z. The bypass rider,
overtaker, follower, and attacker are indicated as F, A, L, and I, respectively. The
coordinate, steering, and location angle of rider vehicle y are denoted as φ, S_of,w,
and α_i, respectively. The gear, accelerator, and brake of rider y are indicated as
v_s, l_s, and z_s, respectively. The gear value ranges between 0 and 4; meanwhile,
the accelerator and brake take values between 0 and 1. Here, Z ∈ W.

Step 2: Computation of fitness: The objective function is the error of the classifier,
and the solution with the minimum error is considered the best solution.

$\text{MSE} = \frac{1}{n} \sum_{p=1}^{n} \left[ R_{\text{target}} - T_a \right]^2$  (13)

where the number of training samples is n, and T_a and R_target are the estimated
and target outputs of the classifier.
Step 3: Update the leading rider's location: The fitness is computed for all the
riders, and the rider with maximal fitness is considered the leader, since the leading
rider is nearest to the target. If the leading rider is not fixed, the update is carried
out at the end of the iteration based on the fitness rate.
Step 4: Update the overtaker’s location: The overtaker upgrades the position
using a direction indicator, coordinate selector, and the success rate. The update of
overtaker position is given by

Z_g^A(y, z) = Z_g(y, z) + OZ_g(y) · Z_L(L, z)  (14)

where the term Z_g^A(y, z) refers to the position of the yth rider at the zth
coordinate, and the direction indicator of the yth rider is denoted as OZ_g(y).
Step 5: Re-compute the fitness rate: Once the rider location is updated, every
rider’s fitness rate gets updated. Therefore, the maximal rider fitness rate is chosen
as the leading rider.
Step 6: Update rider parameters: The steering angle, gear, accelerator ride
off-time, and the brake, along with the activity counter, are updated.
Step 7: End: Steps 1–6 are repeated until the final iteration. Finally, the
optimization yields the best solution (weights and biases) for tuning the RideNN
classifier. Thus, the RideNN classifier determines the classes as normal or abnormal.
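Steps 1–7 can be sketched as a population search that tracks a leading rider and moves the others toward it while minimizing the MSE of Eq. (13). The version below is a deliberately simplified stand-in, not the paper's algorithm: it keeps the leader and fitness logic but replaces the bypass, follower, overtaker, and attacker update rules with a single pull-toward-leader step plus random perturbation, on a linear model for brevity:

```python
import random


def mse(weights, samples):
    """Eq. (13)-style fitness: mean squared error of a linear model."""
    err = 0.0
    for features, target in samples:
        pred = sum(w * f for w, f in zip(weights, features))
        err += (target - pred) ** 2
    return err / len(samples)


def rider_style_search(samples, dim, riders=20, iters=200, seed=1):
    """Simplified rider-style population search over weight vectors.

    Each iteration: every rider proposes a move halfway toward the
    current leader plus Gaussian noise, keeps the move only if it
    lowers the MSE, and the leader is re-selected by best fitness.
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(riders)]
    leader = min(pop, key=lambda w: mse(w, samples))
    for _ in range(iters):
        new_pop = []
        for w in pop:
            cand = [wi + 0.5 * (li - wi) + rng.gauss(0, 0.05)
                    for wi, li in zip(w, leader)]
            # Greedy acceptance: a rider only moves if fitness improves.
            new_pop.append(min(w, cand, key=lambda v: mse(v, samples)))
        pop = new_pop
        leader = min(pop + [leader], key=lambda w: mse(w, samples))
    return leader
```

In RideNN proper, the searched vector would hold the network's weights and biases, and the four rider groups would each use their own update rule.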

3 Results and Discussion

The developed model’s implementation is carried out in the JAVA tool with windows
10OS, 4-GB RAM, and the Intel I3 processor. The experimentation is carried out
based on datasets, such as the Cleveland dataset [21] and the diabetic dataset [22].
The metrics utilized for the analysis are accuracy, sensitivity, and specificity.
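The three metrics follow directly from the confusion matrix of the two classes: accuracy is the fraction of correct predictions, sensitivity is TP / (TP + FN), and specificity is TN / (TN + FP). A minimal sketch (class label strings are illustrative):

```python
def classification_metrics(y_true, y_pred, positive="abnormal"):
    """Accuracy, sensitivity, and specificity for binary labels,
    treating `positive` as the positive (abnormal) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity
```

Reporting all three together, as the paper does, guards against a classifier that scores well on accuracy alone by favoring the majority class.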

3.1 Comparative Analysis

The comparative analysis is performed with the existing methods, such as NN [20],
Support Vector Neural Network (SVNN) [23], Support Vector Machine (SVM) [24],
and RideNN.
Comparative analysis based on Diabetic dataset: Figure 2 portrays the analysis
using diabetic dataset. Figure 2a demonstrates the analysis of the accuracy parameter.
When 90% of training data is considered, accuracy values measured by NN, SVNN,
SVM, and RideNN are 0.685, 0.827, 0.840, and 0.841, respectively. Figure 2b demon-
strates the analysis of sensitivity. For 90% training data, sensitivity values obtained
by NN, SVNN, SVM, and RideNN are 0.599, 0.698, 0.717, and 0.718, respectively.
Figure 2c demonstrates the analysis of specificity parameter. When the training data
percentage is 90, corresponding specificity values obtained by NN, SVNN, SVM,
and RideNN are 0.595, 0.885, 0.897, and 0.899, respectively.
Comparative analysis using the Cleveland dataset: Figure 3 illustrates the analysis
of methods based on the Cleveland dataset. Figure 3a represents the analysis based on
accuracy. When training data is 90%, the accuracy values obtained by NN, SVNN,
SVM, and RideNN are 0.735, 0.867, 0.917, and 0.932, respectively. The analysis
of the sensitivity parameter is depicted in Fig. 3b. When 90% of training data is
considered, the sensitivity obtained by NN, SVNN, SVM, and RideNN are 0.337,
0.679, 0.794, and 0.830, respectively. The analysis based on the specificity parameter
is illustrated in Fig. 3c.

Fig. 2 Analysis based on Diabetic dataset by changing training data percentage a accuracy
b sensitivity, c specificity

Fig. 3 Analysis of methods by varying the training data percentage based on Cleveland dataset
a accuracy b sensitivity, c specificity

3.2 Comparative Discussion

The comparative discussion of the proposed method against the existing methods,
based on their best performance, is provided in Table 1. From the analysis, it is
evident that the proposed RideNN performs big data classification more effectively.

Table 1 Comparative discussion

| Method | Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| Diabetic dataset | | | |
| NN | 0.686 | 0.599 | 0.595 |
| SVNN | 0.828 | 0.699 | 0.886 |
| SVM | 0.902 | 0.821 | 0.938 |
| Proposed RideNN | 0.931 | 0.827 | 0.957 |
| Cleveland dataset | | | |
| NN | 0.743 | 0.609 | 0.839 |
| SVNN | 0.867 | 0.679 | 0.911 |
| SVM | 0.918 | 0.795 | 0.949 |
| Proposed RideNN | 0.932 | 0.831 | 0.958 |
436 S. Gujjeti and S. Pabboju

4 Conclusion

In this paper, an effective data classification method based on the MapReduce frame-
work is presented. The proposed technique involves two steps: feature selection and
classification. Feature selection is carried out in the MapReduce framework's mapper
function using PCA, and classification is performed in the reducer using RideNN.
The developed model is evaluated on two datasets, the Cleveland and the Diabetic
dataset. The proposed model achieves a maximal accuracy of 0.932, a maximal
sensitivity of 0.831, and a maximal specificity of 0.958 on the Cleveland dataset. In
future work, the method will be extended with additional analysis on different datasets.

References

1. Storey, V.C., Song, I-Y.: Big data technologies and management: what conceptual modeling
can do. Data Knowl. Eng. 108, 50–67 (2017)
2. Sivarajah, U., Kamal, M.M., Irani, Z., Weerakkody, V.: Critical analysis of Big Data challenges
and analytical methods. J. Bus. Res. 70, 263–286 (2017)
3. Hassib, E.M., El-Desouky, A.I., Labib, L.M., El-kenawy, E-S.M.: WOA+ BRNN: an imbal-
anced big data classification framework using Whale optimization and deep neural network.
Soft Comput. 1–20 (2019)
4. Gupta, B.B.: Computer and Cyber Security: Principles, Algorithm, Applications, and Perspec-
tives. CRC Press (2018)
5. Manogaran, G., Thota, C., Lopez, D.: Human-computer interaction with big data analytics. In:
HCI Challenges and Privacy Preservation in Big Data Security IGI Global, pp. 1–22 (2018)
6. Triguero, I., Peralta, D., Bacardit, J., García, S., Herrera, F.: MRPR: a MapReduce solution for
prototype reduction in big data classification. Neurocomputing 150, 331–345 (2015)
7. Banchhor, C., Srinivasu, N.: Integrating cuckoo search-grey wolf optimization and correlative
Naive Bayes classifier with map reduce model for big data classification. Data Knowl. Eng.
101788 (2019)
8. Tsai, C.-F., Lin, W.-C., Ke, S.-W.: Big data mining with parallel computing: a comparison of
distributed and MapReduce methodologies. J. Syst. Softw. 122, 3–92 (2016)
9. Dean, J., Ghemawat, S.: MapReduce: a flexible data processing tool. Commun. ACM 53(1),
72–77 (2010)
10. Din, S., Paul, A., Ahmad, A., Gupta, B.B., Rho, S.: Service orchestration of optimizing contin-
uous features in industrial surveillance using big data based fog-enabled internet of things.
IEEE Access 6, 21582–21591 (2018)
11. Hinton, G.E., Osindero, S., Teh, Y.-W.: A fast learning algorithm for deep belief nets. Neural
Comput. 18(7), 1527–1554 (2006)
12. Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep
networks. In: Advances in Neural Information Processing Systems, pp. 153–160 (2007)
13. Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., Seliya, N., Wald, R., Muharemagc, E.:
Deep learning techniques in big data analytics. In: Big Data Technologies and Applications.
Springer, pp.133–156 (2016)
14. Zhou, L., Pan, S., Wang, J., Vasilakos, A.V.: Machine learning on big data: opportunities and
challenges. Neurocomputing 237, 350–361 (2017)
15. Hassib, E.M., El-Desouky, A.I., Labib, L.M., El-kenawy, E.-S.M.: WOA + BRNN: an imbal-
anced big data classification framework using Whale optimization and deep neural network.
Soft Comput. 24, 5573–5592 (2020)

16. Hernández, G., Zamora, E., Sossa, H., Téllez, G., Furlán, F.: Hybrid neural networks for big
data classification. Neurocomputing 390, 327–340 (2020)
17. Gerardo, H., Zamora, E., Sossa, H., Téllez, G., Furlán, F.: Hybrid neural networks for big data
classification. Neurocomputing (2019)
18. Lin, K.-C., Zhang, K.-Y., Huang, Y.-H., Hung, J.C., Yen, N.: Feature selection based on an
improved cat swarm optimization algorithm for big data classification. J. Supercomput. 72(8),
3210–3221 (2016)
19. Elkano, M., Sanz, J.A.A., Barrenechea, E., Bustince, H., Galar, M.: CFM-BD: a distributed rule
induction algorithm for building compact fuzzy models in Big Data classification problems.
IEEE Trans. Fuzzy Syst. (2019)
20. Binu, D., Kariyappa, B.S.: RideNN: a new rider optimization algorithm-based neural network
for fault diagnosis in analog circuits. IEEE Trans. Instrum. Meas. 68(1), 2–26 (2018)
21. Cleveland dataset taken from https://archive.ics.uci.edu/ml/datasets/Heart+Disease. Accessed
March 2020
22. Diabetic dataset taken from https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospit
als+for+years+1999-2008. Accessed March 2020
23. Mukkamala, S., Janoski, G., Sung, A.: Intrusion detection using neural networks and support
vector machines. In: Proceedings of International Joint Conference on Neural Networks, vol.
2, pp. 1702–1707 (2002)
24. Demidova, L., Nikulchev, E., Sokolova, Y.: Big Data classification using the SVM classifiers
with the modified particle swarm optimization and the SVM ensembles. Int. J. Adv. Comput.
Sci. Appl. 7(5) (2016)
Impact of Deployment Schemes
on Localization Techniques in Wireless
Sensor Networks

Prateek, Aakansha Garg, and Rajeev Arya

Abstract The localization efficiency of a wireless sensor network largely depends
upon the coverage provided by the deployment scheme. The demands of each
deployment scheme and its optimization goals must be determined before it can be
judged suitable for the intended environment. In the current work, we apply different
deployment techniques to sensor nodes in a scenario afflicted by errors due to dust
and debris. Through careful evaluation of the proposed approach, a detailed analysis is carried
out to improve the critical parameters such as positioning error, network connectivity,
and average neighbor anchor nodes. Based on numerical computations, the proposed
technique is robust and resilient to faults due to dust and debris. Simulation results
validate the theoretical fundamentals discussed here.

Keywords Dust–debris errors · Localization estimation · Sensor node deployment scheme · Wireless sensor networks

1 Introduction

Han et al. [1] proposed a three-dimensional deployment scheme in an underwater
scenario such that the localization error could be reduced. The simulations showed
that a tetrahedral deployment scheme is better suited than cubical or random
deployment methods in terms of localization ratio when the average number of neigh-
boring anchor nodes is maintained. A uniform-sea surface circumference (USC)
deployment scheme was presented in [2], which yielded superior localization results
than both the cube and random deployment schemes while also considering the
bending of acoustic rays due to sound speed profile underwater. To address localiza-
tion errors due to known sensor attacks such as Aligned Beacon Position (ABP) attack
and inside-attack, the authors in [3] developed a novel beacon placement strategy and
a filtering technique by a localization algorithm that incorporates the two mitigation

Prateek · A. Garg · R. Arya (B)
National Institute of Technology Patna, Patna, Bihar, India
e-mail: rajeev.arya@nitp.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 439
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_42
440 Prateek et al.

techniques. In this case, the beacon deployment was shown to improve localiza-
tion performance. A spatially circular deployment configuration was presented in
work by [4] to address the tracking method and the trajectory position algorithm
for localization of autonomous underwater vehicles (AUVs) in the marine environ-
ment [5]. Authors in [6] formulated a novel technique based on underestimating
non-convex Maximum Likelihood (ML) function to achieve appreciable execution
time and convergence rate compared to Barzilai–Borwein and Nesterov’s optimal
method. Some challenges faced by the works mentioned above are poor localization
accuracy due to unfavorable node deployment patterns, low localization ratio in case
of sparse anchor nodes, and high RMS errors associated with obstructed target nodes.
The significant contributions of the work presented in this paper are as follows:
• A square random sensor node deployment is carried out for a set of 300 nodes,
of which 60 are anchor nodes. The nodes have a communication radius of 200 m,
and either anchor or non-anchor nodes may occupy the vertices.
• A “C” shaped random node deployment is also carried out where the sequence of
anchor node and sensor node is not fixed.
• A square-shaped regular deployment pattern is carried out, which differs from
random square deployment in the sense that the position of sensor nodes and
anchor nodes follows a specified sequence. Vertices are occupied with anchor
nodes.
• A “C” shaped regular deployment pattern is also done in which the anchor nodes
and sensor nodes are evenly distributed.
• An “O” shaped random deployment scenario is performed in which nodes are
distributed evenly in a circular fashion.
• Parabola-shaped node deployment enables us to approach position coordinates
analytically, wherein the parabolic equation shall govern the deployment pattern.
• A circumcircle is also tried as a node deployment scenario. It can be accomplished
by considering triplets of nodes, Centroid as the center at a given time. The rest
of the nodes are deployed on its circumference.
The rest of the paper is organized as follows: Sect. 2 describes the signal and
the system's localization model. The derivation of the Cramér–Rao lower bound is
discussed in Sect. 3. Numerical findings, simulations, and computations are detailed
in Sect. 4. The paper is concluded in Sect. 5.

2 Localization Model

The method for the PIT test and the consequent APIT algorithm is discussed in
[7]. Based on the centroid formula, the localization depends upon N anchor node
information (X i , Yi ) to estimate coordinates of the target (X est , Yest ), as shown in
Eq. (1)
Impact of Deployment Schemes on Localization … 441
 
[X_est, Y_est] = [(X_1 + X_2 + ··· + X_N)/N, (Y_1 + Y_2 + ··· + Y_N)/N]    (1)
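Equation (1) amounts to averaging the anchor coordinates; a minimal NumPy sketch (illustrative anchor positions, not from the experiments):

```python
import numpy as np

def centroid_estimate(anchors):
    """Eq. (1): the target estimate is the mean of the N neighbouring
    anchor coordinates (X_i, Y_i)."""
    anchors = np.asarray(anchors, dtype=float)  # shape (N, 2)
    return anchors.mean(axis=0)                 # (X_est, Y_est)

# e.g. three anchors at the corners of a triangle
x_est, y_est = centroid_estimate([(0, 0), (200, 0), (100, 150)])  # → (100.0, 50.0)
```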

Centroid algorithm is preferred for its simplicity. Another technique utilizing the
DV-Hop mechanism is also a range-free technique. The average single-hop distance
is estimated by ith anchor using the following formula:


γ_k = ( Σ_{ij} √((x_i − x_j)² + (y_i − y_j)²) · h_j ) / ( hp_ki + hp_kj )    (2)

Here, h denotes the hop-count notation, with the subscripts i, j, k denoting the ith
and jth anchor nodes and the kth target node, respectively. This hop count information
is propagated to nearby nodes and indicates the approximate position of the target
node. The RSSI-based [8] cooperative localization technique, which uses a log-distance
path loss model, is governed by the equation:

P_ij^rx = P_j^tx − 10 η log_10(d_ij) + n_ij    (3)

The power received [9] at the sensor node of interest from the transmitting node
is attenuated by path loss, which depends primarily on the separation distance d_ij
between the source and destination sensor nodes. n_ij is additive noise, Gaussian
distributed with zero mean and variance σ².
The localization scheme involves the deployment of nodes using one of the deter-
ministic ways, such that we get the coordinates based on the said scheme. Then
the stated localization techniques, namely APIT, DV-hop, centroid-based, and RSSI-
based techniques, are implemented to estimate the node coordinates. Subsequently,
localization error is computed for each of the methods. The further analysis involves
the computation of neighbor anchor nodes and average connectivity of the nodes for
different configurations.

3 Computation of Fisher Information

Let us consider a 3D terrestrial RF localization scenario where target debris is located
with the help of N anchor nodes. Let A = {1, 2, 3, …, N} be the anchor nodes, and the
actual position of debris be deb = [x, y, z]T . Let the position of the ith anchor node
be ai = [xi , yi , z i ]T such that the exact length of signal travel is denoted as

l_i = ‖deb − a_i‖    (4)

The presence of debris in and around the sensor node is common in industrial,
military, and arid areas, to name a few [10]. Due to debris, the RF signal undergoes
scattering when the signal wavelength matches the debris's physical dimensions. For

current work, we assume that while locating a piece of debris, the neighboring debris
particles act as noise sources by restricting signal or altering its natural form. Let the
signal between the ith anchor node and the target debris be scattered by an angle of
αi . Then, αi can be expressed as

α_i = tan⁻¹( √((x − x_i)² + (y − y_i)²) / (z − z_i) )    (5)

Let the angle made by the signal with respect to the horizontal axis be given by
θi , computed as

θ_i = tan⁻¹( (z + z_i) / √((x − x_i)² + (y − y_i)²) )    (6)

Based on the traveling speed of the signal, the actual time required for a signal to
propagate from the ith anchor node to the target debris is
   
t_i = ln[(1 + sin(θ_i + α_i)) / cos(θ_i + α_i)] − ln[(1 + sin(θ_i − α_i)) / cos(θ_i − α_i)]    (7)

Due to the presence of debris, the estimated time for a signal to propagate from
the ith anchor node to the target debris is

t̂_i = t_i + n_i    (8)

where n_i is the measurement error due to surrounding debris particles, taken to be
Gaussian with n_i ∼ N(0, σ_i²). The vector form of the true time for all the anchor nodes
shall be denoted by t(deb) = [t_1, t_2, …, t_N]^T, and the vector form of the measurement
error due to surrounding debris particles is represented by n = [n_1, n_2, …, n_N]^T.
Thus, the vector representation of the estimated time is t̂ such that

t̂ = t(deb) + n = [t_1, t_2, …, t_N]^T + [n_1, n_2, …, n_N]^T    (9)

The covariance matrix of errors in the measurement of debris positions is given by

Cov(n) = E[(n − E(n))(n − E(n))^T] = diag(σ_1², …, σ_N²)    (10)

The Fisher information Matrix is thus calculated with the help of the Jacobian
Matrix J . The general expression of Jacobian is expressed as
 
J = [ ∂t_i/∂x   ∂t_i/∂y   ∂t_i/∂z ], ∀ i ∈ {1, 2, …, N}    (11)

The Fisher Matrix is then expressed as

FIM = J^T Cov(n)⁻¹ J    (12)

Computing the inverse of the Fisher Matrix yields the Cramér–Rao lower bound,
which indicates the minimum possible variance of the measurement errors
achievable by deploying various sensor node configurations and different
localization techniques.
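The pipeline of Eqs. (5)–(12) can be evaluated numerically. The sketch below is illustrative only: it uses NumPy, replaces the closed-form Jacobian of Eq. (11) with central finite differences, and assumes an anchor geometry of our own choosing (anchors below the target, so the logarithm arguments in Eq. (7) remain positive):

```python
import numpy as np

def travel_time(deb, anchor):
    """Signal travel time between an anchor and the debris target, per Eqs. (5)-(7)."""
    x, y, z = deb
    xi, yi, zi = anchor
    horiz = np.hypot(x - xi, y - yi)
    alpha = np.arctan2(horiz, z - zi)     # scattering angle, Eq. (5)
    theta = np.arctan2(z + zi, horiz)     # angle with the horizontal axis, Eq. (6)
    f = lambda a: np.log((1.0 + np.sin(a)) / np.cos(a))
    return f(theta + alpha) - f(theta - alpha)   # Eq. (7)

def crlb(deb, anchors, sigmas, h=1e-6):
    """CRLB = (J^T Cov(n)^-1 J)^-1, Eqs. (10)-(12), with the Jacobian of
    Eq. (11) approximated by central finite differences."""
    deb = np.asarray(deb, dtype=float)
    J = np.zeros((len(anchors), 3))
    for i, a in enumerate(anchors):
        for k in range(3):
            dp, dm = deb.copy(), deb.copy()
            dp[k] += h
            dm[k] -= h
            J[i, k] = (travel_time(dp, a) - travel_time(dm, a)) / (2.0 * h)
    c_inv = np.diag(1.0 / np.asarray(sigmas, dtype=float) ** 2)  # Eq. (10)
    fim = J.T @ c_inv @ J                                        # Eq. (12)
    return np.linalg.inv(fim)

# Illustrative geometry only: anchors placed below the target (negative z).
anchors = [(0, 0, -100), (1000, 0, -120), (0, 1000, -80), (1000, 1000, -150)]
bound = crlb(deb=(400, 300, 20), anchors=anchors, sigmas=[0.1, 0.1, 0.1, 0.1])
```

The diagonal of the resulting 3×3 matrix lower-bounds the variance of any unbiased estimate of the x, y, and z coordinates for that node configuration.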

4 Numerical Findings

The localization problem is solved using four range-free techniques: DV-Hop local-
ization, the APIT technique, RSSI-based localization, and centroid-based localiza-
tion. The performance of these methods is compared by considering fully connected
networks in a 1000 × 1000 m² area with 300 sensor nodes and 60 anchor nodes.
The communication radius of each anchor node is taken to be 200 m. The commu-
nication model used here is the regular channel model, which consists of the log-
distance path loss model with debris and dust factors taken into account. The
parameters are summarized in Table 1. Figure 1 represents the positioning error
of the deployment schemes.
The localization error associated with each sensor node is the average positioning
error, which is the ratio of the Euclidean distance between the estimated and actual
positions of the sensor node to the sensor node's communication radius. Because of
the effect of dust and debris, the hardest-hit method is RSSI-based localization [11].
The DV-Hop method also performs poorly under a random deployment scheme.
APIT has the advantage of triangulating the target precisely, but the centroid method
is the best for the said configuration (Fig. 1).
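The normalized positioning error described above can be computed as follows (a NumPy sketch with positions of our own choosing; array names are not from the paper):

```python
import numpy as np

def avg_positioning_error(true_pos, est_pos, comm_radius=200.0):
    """Mean per-node localization error: Euclidean distance between estimated
    and actual positions, divided by the communication radius, averaged
    over all sensor nodes."""
    true_pos = np.asarray(true_pos, dtype=float)
    est_pos = np.asarray(est_pos, dtype=float)
    errors = np.linalg.norm(est_pos - true_pos, axis=1) / comm_radius
    return errors.mean()

# two nodes, each estimated 20 m off → normalized error 20/200 = 0.1
err = avg_positioning_error([(0, 0), (100, 100)], [(20, 0), (100, 120)])  # → 0.1
```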

Table 1 Table of parameters used in the computations

Sl. No   Parameter                                  Value
1        Number of nodes                            300
2        Number of anchor nodes                     60
3        Communication radius of the anchor node    200 m
4        Communication model                        Regular model

Fig. 1 Comparison graph of positioning error of sensor nodes

Fig. 2 Comparison of average connectivity of sensor nodes of different deployment schemes

The mean value of the count of anchor nodes surrounding the target node is found
as the ratio of the number of sensor nodes that can sense nearby anchor nodes to the
total number of nodes [12]:

A_nbr = S_nbr / S    (13)

where S_nbr is the number of sensor nodes that can communicate with anchor nodes
and S is the total number of sensor nodes. For the said configuration, the average
connectivity is close to 300 for parabola-shaped nodes, whereas all other deployment
shapes have average connectivity below 50 nodes (Fig. 2).
It is essential to judge the localization algorithm's network connectivity, i.e., the
fraction of sensor nodes communicating with other sensors. The requisite expression
is given by [13]

S_cnv = S_s_cnv / S    (14)

where S_s_cnv denotes the count of sensor nodes in communication with other
sensor nodes and S is the overall count of sensor nodes. The mean count of neighbor
anchor nodes is the maximum for the parabola shape, whereas all other deploy-
ment shapes have average neighbor anchors below 20, as shown in Fig. 3. A tabular

Fig. 3 Comparison of average neighbor anchor nodes of different deployment schemes (bar chart: average neighbor anchors in the network for the DV-Hop, APIT, RSSI, and centroid techniques under sqrandom, crandom, sqregular, cregular, orandom, parabola-shape, and circumcircle deployments)
Table 2 Summary of different parameters, node configurations, and localization techniques

Parameter/Node configuration/     Advantages                          Remarks
Localization technique
Parabolic node deployment         Exhibits the best network           May be used for emergency
                                  connectivity as well as             applications that prioritize
                                  neighborhood anchor count           network cohesiveness over
                                                                      other parameters
RSSI-based localization           Favors symmetric anchor node        The RSSI method is susceptible
technique                         deployments such as the square      to RF obstructions, hence best
                                  arrangement                         suited for LOS communications
APIT and centroid                 Both exhibit comparably lower       The APIT and centroid methods
localization methods              localization errors than            are less sensitive to sensor node
                                  DV-Hop and RSSI                     deployment patterns
Average connectivity and          Their values do not vary            Connectivity and neighborhood
average neighbor anchor           drastically with the change of      anchor count depend upon
                                  localization technique              anchor nodes with respect to
                                                                      time

comparison may be referred to in Table 2 to summarize the findings of the current work.
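Both ratios of Eqs. (13) and (14) can be evaluated from pairwise distances. A sketch (NumPy assumed; the node positions and radius below are illustrative, not the simulated deployment):

```python
import numpy as np

def coverage_ratios(sensors, anchors, radius=200.0):
    """Eq. (13): fraction of sensors within radio range of at least one anchor.
    Eq. (14): fraction of sensors within range of at least one other sensor."""
    sensors = np.asarray(sensors, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    # sensor-to-anchor distance matrix, shape (num_sensors, num_anchors)
    d_sa = np.linalg.norm(sensors[:, None, :] - anchors[None, :, :], axis=2)
    a_nbr = np.mean((d_sa <= radius).any(axis=1))
    # sensor-to-sensor distance matrix, ignoring self-distances
    d_ss = np.linalg.norm(sensors[:, None, :] - sensors[None, :, :], axis=2)
    np.fill_diagonal(d_ss, np.inf)
    s_cnv = np.mean((d_ss <= radius).any(axis=1))
    return a_nbr, s_cnv

a_nbr, s_cnv = coverage_ratios(sensors=[(0, 0), (150, 0), (900, 900)],
                               anchors=[(100, 0)])
# → a_nbr = 2/3 (two sensors hear the anchor), s_cnv = 2/3
```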
Recently, the Internet of Everything has become very popular. It has many advan-
tages over traditional systems [14]. In the future, the proposed work can be explored
in this direction.

5 Conclusion

The present work aimed to achieve a rigorous comparison between different methods
of sensor node deployment. Simulations were carried out to determine the perfor-
mance of techniques such as DV-Hop algorithm, APIT, RSSI-based localization, and
centroid-based localization. While the RSSI method would be highly dependent on
the received signal strength, it was erroneous whenever the communication range

was obstructed by dust or debris. Though inferior to APIT, the DV-Hop algorithm
is better than the RSSI method because the sensor hop count is more related to
node connectivity than signal strength variations. This work’s future scope would
include specific terrains (such as underwater sensor networks) and the possibility of
using advanced computational techniques to address more targeted sensor network
localization aspects.

References

1. Zhang, H., Liu, Y., Lei, H.: Localization from incomplete euclidean distance matrix: perfor-
mance analysis for the SVD-MDS approach. IEEE Trans. Signal Process. 67, 2196–2209
(2019). https://doi.org/10.1109/TSP.2019.2904022
2. Won, J., Bertino, E.: Robust sensor localization against known sensor position attacks. IEEE
Trans. Mob. Comput. 18, 2954–2967 (2019). https://doi.org/10.1109/TMC.2018.2883578
3. Dai, L., Wang, B., Yang, L.T., Deng, X., Yi, L.: A nature-inspired node deployment strategy
for connected confident information coverage in industrial internet of things. IEEE Internet
Things J. 6, 9217–9225 (2019). https://doi.org/10.1109/JIOT.2019.2896581
4. Han, G., Zhang, C., Shu, L., Rodrigues, J.J.P.C.: Impacts of deployment strategies on localization
performance in underwater acoustic sensor networks. IEEE Trans. Ind. Electron. 62, 1725–1733
(2015). https://doi.org/10.1109/TIE.2014.2362731
5. Li, Y., Cai, K., Zhang, Y., Tang, Z., Jiang, T.: Localization and tracking for AUVs in marine
information networks: research directions, recent advances, and challenges. IEEE Netw. 78–85
(2019). https://doi.org/10.1109/MNET.2019.1800406
6. Zhang, Y., Li, Y., Zhang, Y., Jiang, T.: Underwater anchor-AUV localization geometries with
an isogradient sound speed profile: a CRLB-based optimality analysis. IEEE Trans. Wirel.
Commun. 17, 8228–8238 (2018). https://doi.org/10.1109/TWC.2018.2875432
7. He, T., Huang, C., Blum, B.M., Stankovic, J.A., Abdelzaher, T.: Range-free localization
schemes for large scale sensor networks. In: Proceedings of the Annual International Confer-
ence on Mobile Computing and Networking, MOBICOM (2003). https://doi.org/10.1145/938
985.938995
8. Poduri, S., Sukhatme, G.S.: Constrained coverage for mobile sensor networks. In: Proceed-
ings—IEEE International Conference on Robotics and Automation (2004). https://doi.org/10.
1109/robot.2004.1307146
9. Hou, Y.T., Shi, Y., Sherali, H.D., Midkiff, S.F.: On energy provisioning and relay node place-
ment for wireless sensor networks. IEEE Trans. Wirel. Commun. (2005). https://doi.org/10.
1109/TWC.2005.853969
10. Kim, H.S., Abdelzaher, T.F., Kwon, W.H.: Minimum-energy asynchronous dissemination to
mobile sinks in wireless sensor networks. In: SenSys’03: Proceedings of the First International
Conference on Embedded Networked Sensor Systems (2003). https://doi.org/10.1145/958491.
958515
11. Bulusu, N., Heidemann, J., Estrin, D.: GPS-less low-cost outdoor localization for very small
devices. IEEE Pers. Commun. (2000). https://doi.org/10.1109/98.878533
12. Cho, H., Lee, J., Kim, D., Kim, S.W.: Observability-based selection criterion for anchor nodes
in multiple-cell localization. IEEE Trans. Ind. Electron. (2013). https://doi.org/10.1109/TIE.
2012.2213557
13. Al-Turjman, F.M., Hassanein, H.S., Ibnkahla, M.: Quantifying connectivity in wireless sensor
networks with grid-based deployments. J. Netw. Comput. Appl. (2013). https://doi.org/10.
1016/j.jnca.2012.05.006
14. Deshpande, P., Iyer, B.: Research directions in the internet of every things (IoET). In: 2017
International Conference on Computing, Communication and Automation (ICCCA), Greater
Noida, 2017, pp. 1353–1357. https://doi.org/10.1109/CCAA.2017.8230008
A Survey on 5G Architecture
and Security Scopes in SDN and NFV

Jehan Hasneen and Kazi Masum Sadique

Abstract 5G is an emerging technology and is not going to be a mere update to its
predecessors. Researchers intend to achieve a head-turning advancement in all-round
performance, such as data rates, network reliability, massive connectivity, mobility,
energy efficiency, latency, secure channels, spectral efficiency, etc. 5G
is going to be an end-to-end system that will provide hyper-connectivity to its users.
It is supposed to support roughly three use cases, i.e., Enhanced Mobile Broadband
(eMBB), Ultra-Reliable and Low Latency Communication (URLLC), and massive
Machine-Type Communication (mMTC). All these create convergence among wire-
less communication and computer networking that incorporates Software-Driven
Network (SDN), Network Functions Virtualization (NFV), Service-Based Architec-
ture (SBA), 5G new radio technologies (M-MIMO, mmWave, UDN, FD), massive
IoT techniques. This hyper-convergence will introduce new trust and security threats
and relevant threat management challenges. In this paper, we tried to summarize why
and how 5G is evolved and the technology requirements for 5G revolutions, and the
steps adopted for technology change over to 5G. We focused on potential threats
and challenges and their suggested mitigation techniques. Several open issues are
identified, and possible future research directions are also discussed in this paper.

Keywords 5G · Software-defined network (SDN) · Security · Network functions virtualization (NFV) · M-MIMO

J. Hasneen (B)
Institute of Information and Communication Technology (IICT), Bangladesh University of
Engineering and Technology (BUET), Dhaka 1000, Bangladesh
K. M. Sadique
Department of Computer and Systems Sciences, Stockholm University, Borgarfjordsgatan 8, 164
07 Kista, Sweden
e-mail: sadique@dsv.su.se

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 447
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_43
448 J. Hasneen and K. M. Sadique

1 Introduction

Cellular phone technology made us visualize the dream of mobile telephony. It
evolves so fast that every decade brings a revolution in performance, connectivity, data
speed, etc. We have experienced 4G technology so far. Now 5G is brewing. 5G
is visualized to come up with high flexibility. It is to be designed with maximum
scalability and dynamic adaptability, so the architecture becomes more resilient. The
desired 5G network will render advanced performances in data rates, network reli-
ability, massive connectivity, energy efficiency, latency, etc. [1]. New technology
needs requirement analysis, development of standardization, and extensive tech-
nology trials. All of these steps are done in a series. But the market demands concur-
rent execution of the processes as mentioned earlier so that 5G can be furnished to
the market soon. 5G is believed to solve the last mile problem. It is going to be an
end-to-end system that will provide hyper-connectivity to its users. It is supposed to
support roughly three use cases, i.e., eMBB, URLLC, and mMTC [2, 3]. All these
create convergence among wireless communication and computer networking that
incorporates Software-Driven Network (SDN), Network Functions Virtualization
(NFV), and edge cloud technology.
The rest of our paper will emphasize 5G infrastructure components, their devel-
opment, newly arising security threats, and their possible mitigation techniques.
Section 2 will provide a short overview of 5G development motivations and their
background, along with 5G capabilities, spectrum requirements, and 5G technology
enablers. 5G use cases are also covered in Sect. 2. A few related works are discussed in Sect. 3.
Section 4 contains various security threats to 5G enabling technologies. Mitigation
techniques to those threats are addressed in Sect. 5. In Sect. 6, we put two tables
representing a quantitative analysis of the 5G security scopes we found during the
survey. We have listed some open issues and future work directions also in this
section.

2 Background and Motivation of 5G Technology

In the era of 4G, the technology has bloomed to its fullest. The convenience of 4G and its
technical capacity made today’s high-speed communication possible. The outreach
functionality of 4G enabled us to develop sprawling remote sensor networks and
generated the idea of IoT [4]. But the extensive engagement of IoT devices cannot
be met with present 4G capabilities. 4G technology has few limitations like user
privacy leakage, weak home network control, limitations of architecture, and risk of
radio interfaces, and a few more [4, 5]. All these limit proper deployment of IoT and
other lower latency services properly, thus arises the demand for 5G technology. 5G
technology has to evolve to render a more scalable, adaptive, flexible, agile, secure,
and trustworthy programmable network platform on which different applications and
services with alternating needs could be installed and operated according to preset
A Survey on 5G Architecture and Security Scopes in SDN and NFV 449

standardization [6]. With the proliferation of data volume and diversified service require-
ments, 5G is expected to handle massive data, massive control, enormous resilience,
and massive IoT connectivity [7]. All these 5G capabilities will be discussed in the
following section.

2.1 5G Capabilities

5G presents a paradigm shift in mobile wireless communication. It will render
higher bandwidth, lower latency, and greater throughput than existing tech-
nology to support a massive range of services with new capabilities. 5G capabilities
according to market demand in [1, 4, 7–10] are
• Massive data: The usage of cellular networks is enormous. Following that, the
cellular network data rate is increasing. Increment in the cellular network data
rate follows a pattern, and it is getting ten times higher every 5 years. Thus, 5G is
expected to handle this massive amount of data.
• Massive IoT: Implementation of IoT will increase numerous device attachments
to cellular networks. The realization of the massive Internet of Things (IoT) is
expected to leap 5G deployment.
• Tremendous control: Remotely controlling tactile internet requires infrastructure
support. Deployment of 5G will enable us to build secure infrastructure to handle
massive controlling needs.
• Excellent resilience: Massive IoT implementations need a highly flexible and
resilient network. Technologies like SDN, NFV, Cloud Radio Access Network
(RAN), and Mobile Edge Cloud (MEC) will increase network resiliency.

2.2 5G Spectrum Requirements and Small Cell Networks

The minimum technical requirements for 5G were approved in [11]. The key
performance parameters and their values per use case can be found in tabular form
in [11]. The current microwave bands in use are below 6 GHz, with small-scale
capacity due to high traffic caused by heavy usage. To meet the growth, 5G is planned
to be introduced in different frequency bands [4]. Several candidate frequency bands
in the mm-wave ranges from 24 to 100 GHz have been approved for investigation
purposes by WRC-15. Additionally, the spectrum in the unlicensed 60-GHz bands
can also be used for study purposes [12]. We can wisely choose lower frequency bands
for comprehensive area network coverage while combining all the above bands. High-
range mm-wave bands will be selected for LAN and personal area communications,
and comparably shorter range links of the unlicensed spectrum of the mm-wave
bands [2, 12, 13]. The subsequent sections will give a precise idea about SDN and
NFV, the two critical technological enablers of the 5G core network. The network
slicing concept is also studied and depicted in detail.

Fig. 1 Pictorial view of a small cell 5G wireless network [14]

The 5G wireless network is designed to function in a high-frequency band of wireless
spectrum and is desired to provide higher capacity and real-time connectivity, which
will enable IoT. IoT users will need massive bandwidth for real-time connectivity
and proper functionality, which will help implement MEC. Thus, including IoT and
other real-time users, 5G network will have a high density (of users and devices),
distributed network of base stations in the small cell infrastructure (see Fig. 1). 5G
networks will run on small cells due to its high (mm-Wave)-frequency operation
bandwidth. Small cells will render higher data capacity for 5G. They will need more
geographically dispersed small-sized antennas, ensuring less power consumption
and longer battery life for geographically distanced IoT devices [14, 15].

2.3 Technologies Involved in 5G

SDN and NFV are the two leading technologies required to implement the services
and applications supported by 5G. Both operate through network slicing. Table 1
lists the 5G technology enablers identified by different papers. As illustrated
in Table 1, cloud computing is also recognized as a technology enabler for 5G. But

Table 1 5G technology enablers

  Technology enablers for 5G    Reference papers
  Cloud computing               [4, 16–20]
  SDN                           [16–18, 20–26]
  NFV                           [12, 16–18, 22–27]
  Network slicing               [3, 12, 16, 22–25]
A Survey on 5G Architecture and Security Scopes in SDN and NFV 451

we have only discussed SDN, NFV, and network slicing in detail because SDN and
NFV are part of the cloud computing paradigm.
Software-Defined Networks (SDN): Networking devices conventionally handle both the
control plane, which manages the network, and the data plane, which forwards network
traffic. SDN is a technology that separates control plane management from the data plane
of network devices [25, 26, 29, 30]. The separation of the control plane from the data
plane enables more automation and policy-based governance [8, 29, 31]. An SDN
controller controls information flow within a data center [3, 12, 31]. It identifies
larger data flows and prioritizes them, thus rendering optimized data flow. SDN
controllers are set to track frequent and infrequent traffic patterns and optimize them
according to demand [17, 31]. Since its inception, SDN has provided a programmable
networking protocol that allows the system to manage and propagate network traffic
among routing and switching devices vendor-independently [32].
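The control-plane/data-plane split can be illustrated with a minimal, self-contained sketch (the class and method names below are our own illustrative choices, not any real SDN controller's API): the controller holds the policy and installs flow rules, while the switch's data plane only matches packets against its flow table.

```python
# Minimal sketch (illustrative only): a controller installs forwarding rules
# into a switch's flow table; the switch's data plane only matches and
# forwards, making no routing decisions of its own.

class Switch:
    """Data plane: match packets against controller-installed flow rules."""
    def __init__(self, name):
        self.name = name
        self.flow_table = {}          # (src, dst) -> output port

    def forward(self, packet):
        key = (packet["src"], packet["dst"])
        if key in self.flow_table:
            return self.flow_table[key]   # fast path: rule already installed
        return None                       # table miss: defer to the controller

class Controller:
    """Control plane: centralized policy, pushed down to switches."""
    def __init__(self):
        self.policy = {}              # (src, dst) -> output port

    def add_policy(self, src, dst, port):
        self.policy[(src, dst)] = port

    def handle_table_miss(self, switch, packet):
        key = (packet["src"], packet["dst"])
        port = self.policy.get(key)
        if port is not None:
            switch.flow_table[key] = port   # program the rule on the switch
        return port

controller = Controller()
controller.add_policy("10.0.0.1", "10.0.0.2", port=3)

sw = Switch("edge-1")
pkt = {"src": "10.0.0.1", "dst": "10.0.0.2"}

assert sw.forward(pkt) is None            # first packet: table miss
controller.handle_table_miss(sw, pkt)     # controller programs the switch
assert sw.forward(pkt) == 3               # later packets hit the installed rule
```

The table-miss path mirrors how OpenFlow-style switches raise packet-in events to a centralized controller rather than deciding routes locally.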
SDN and NFV introduce some extra network components, namely the SDN
controller, orchestrator, hypervisor, security function virtualization, and a few
more [8, 25]. Including these extra components causes new security issues and unknown
risks [8]. Beyond the SDN/NFV components, some network functions like the cloud
radio access network (Cloud RAN/C-RAN/centralized RAN), Mobile Edge Cloud
(MEC), and network slicing are obligatory to enable the system for resource sharing
optimization and to support real-time (low latency) services [8, 25]. All these new
functions also make the system more prone to security risks. These security issues
are described in detail in later sections of this paper.
Network Functions Virtualization (NFV): Conventional network functions are
embedded in hardware appliances. NFV enables sharing of common physical resources
through virtualization software and virtual machines. NFV implies that network functions
will run on cloud computing infrastructure located in a data center [5, 17], but the NFV
infrastructure will not be analogous to a commercial or enterprise cloud [5]. NFV will
ensure the highest possible use of enterprise cloud resources, yet the two will not be
interchangeable [12, 26]. Usually, various networking functions run on a range of
industry-standard hardware. The idea of NFV in 5G technology came into the scene
to aggregate several network functions onto software appliances. NFV decouples
software from hardware, reducing operating cost, increasing scalability, and making
network services more resilient [24, 26].
Network Slicing: Network slicing is introduced to accommodate multiple logical
self-contained networks [25]. It will allow customized services, orchestrated in
different ways, to be offered to different users [7]. The new core network architecture
determined for 5G enables network operators to specify network slices [2, 25, 28].
These network slices are tuned to and associated with particular service-level agreements
by network operators and can be denoted as Logically Isolated Network Partitions
(LINPs) [2]. It is assumed that the Network Slice Selection Function (NSSF) is placed
in the Radio Access Network (RAN) link to regulate network slices properly. Slicing
a network space into chunks gives diversified users with varying requirements parallel
self-contained logical networks [3, 6, 12]. Once tuned, the user attains full control over
all the 5G infrastructure's vertical layers, namely the physical layer, the virtualization
layer, and the service layer [6, 22]. Slicing networks can pose trust and security
threats, which are discussed in a later section of this paper.

2.4 5G Use Cases

5G is envisaged to support a diverse variety of use cases [2, 3, 7, 9, 12, 24–26,
33–36]. These are classified broadly into three categories, namely Enhanced Mobile
Broadband (eMBB), Ultra-Reliable and Low Latency Communication (URLLC),
and massive Machine-Type Communication (mMTC).
Enhanced Mobile Broadband (eMBB): A range of use cases fall into eMBB,
including wide-area networks and mobile hotspots. Wide-area network users need
seamless coverage, high mobility, and a high data rate. Hotspots, on the other hand,
must support connectivity for a high density of users and higher traffic capacity, but
hotspot users need only low-speed (pedestrian) mobility and a much higher data
rate than wide-area users.
Ultra-Reliable and Low Latency Communication (URLLC): A few applications
have a strong demand for highly reliable, low-latency, real-time interaction.
URLLC use cases need fast access, strict authentication protocols, and high-speed
cryptographic algorithms to meet the low-latency and high-reliability requirements.
Tactile Internet applications, intelligent transport systems, V2X transportation safety,
wireless control of industrial manufacturing, and remote surgery fall into this category.
Potential security threats include eavesdropping, man-in-the-middle attacks, rogue
devices, and DoS attacks.
Massive Machine-Type Communication (mMTC): After mMTC deployment,
there will be an ample number of devices generating high or low volumes of
non-delay-sensitive data. Thus, mMTC devices should be low-cost and low-power,
with long-lasting batteries for long-range machine-type communication.
Lightweight cryptographic algorithms and key management protocols are crucial
for these resource-constrained IoT devices. Wearable sensors and remote sensor
networks for agriculture are examples of mMTC use cases. In addition, several
mMTC services and applications are not fully defined yet. Potential security
threats in mMTC include device cloning, data manipulation, rogue devices, and
DoS attacks.

2.5 Global Standardization Effort for 5G Technology

This section portrays a few standardization efforts on 5G technology from several
telecommunication industries and standardization bodies. At present, the
telecommunication industries and standardization bodies found in [5, 25, 37] are
focusing on conceptualizing 5G technology [12], 5G network design principles,
evolving and adapting current technology toward 5G, and pre-commercial technology
trials. All the above steps must be executed, monitored, and kept compatible
with defined industry standards. Standardization bodies (3GPP, ITU, IEEE),
associations (ETSI, TIA), and alliances (NGMN and the Wireless World Research
Forum, WWRF) are running numerous research projects to monitor and
establish 5G standards aligned with market demands for the successful commercial
deployment of 5G technology.

2.6 Relevant Works to Our Survey

Much recent research has addressed 5G technology and its security issues.
In [2], Shafi et al. provided a tutorial overview of 5G technology requirements,
5G use cases, and 5G enabling technologies like network slicing, SDN, and NFV.
The authors discussed 5G use cases in detail. Dutta et al. [8] identified 5G supporting
technologies and their potential security issues and opportunities, and suggested solutions.
Li et al. [38] focused on SDN techniques for enhancing IoT inclusion in
5G technology. In [6], Yousaf et al. presented an overview of SDN and NFV as
5G network enablers. All the preceding papers and a few more surveys discussed the
5G architecture, 5G enabling software and technologies, their limitations, potential
challenges, and mitigation countermeasures. Some surveys present the overall
picture, while others emphasize particular areas. This paper attempts a bird's-eye
view spanning the 5G background, requirements, and use cases through the
technological innovations included in 5G technology, their potential threats, and
probable mitigation measures.

3 5G Security Threats

While setting up a plan of action for a new technology, some points of hindrance
come forward. As the development process progresses, some impeding factors get
resolved, and a few more get added. This section describes various security threats
and potential risks that come with 5G supporting technologies like SDN, NFV,
and network slicing; their mitigation techniques are explained later.

3.1 SDN Security Threats

SDN controllers have a vital role in SDN security and threat management, as they
control the data flow through SDN devices [12, 16, 32]. Malicious attackers can thus
easily take hold of precious and sensitive data if the SDN controller is compromised.
SDN controllers can be exploited by various security threats, namely API flood attacks,
REST API parameter exploitation, denial-of-service attacks, man-in-the-middle
(MiTM) attacks, SDN controller impersonation, spoofing, and protocol fuzzing [8,
24, 38]. Attackers usually take over SDN controllers through their outbound APIs
[16, 23].

3.2 NFV Security Threats

As noted earlier, various networking functions usually run on a range of
industry-standard hardware, and NFV aggregates several such functions onto
software appliances, decoupling software from hardware [7]. However, the
development of NFV also brings new challenges and security threats.
Threats could arise from NFV Management and Orchestration (MANO)
and the NFV infrastructure [5, 7, 25]. Interfaces among Virtualized Network Functions
(VNFs), the NFV Infrastructure (NFVI), and the NFV Orchestrator (NFVO) can also
be sources of security risks [5, 7, 20]. NFV MANO could encounter several security
risks, such as single-point failure from a DoS attack, lack of consistent policies,
inappropriate policy configuration, and malicious insiders [23, 24]. Interface security
could be at risk due to insecure management of interfaces, routing loops caused by
malicious attacks, flaws in standard interface development, integration flaws, data loss,
and information leakage. VNF security risks include DoS attacks on a VNF external
interface or a VNF instance, and VNF software vulnerabilities. The NFV infrastructure
can be a source of security threats too, namely hypervisor hijacking, unauthorized
changes to the hypervisor, resource isolation failure, infected VM images, VM escape
or hopping, malicious guest VMs, and a few more [7, 16, 27].

3.3 Network Slicing Security Threats

Network slicing is introduced to render purpose-built services in 5G. It is developed
using cloud computing facilities. Hence, it has a few security threats and challenges
[6, 7, 23, 24, 28, 37, 39], which are listed below:
• If any user equipment (UE) can gain simultaneous entry to more than one network
slice, an attacker can abuse it as a bridge and launch security attacks from one
slice on another.
• Multiple network slices use a common Network Slice Selection Function (NSSF),
which creates a convenient hole for attackers. Attackers can eavesdrop on a target
slice's data by taking an illicit hold of the common NSSF from another slice.
• If different network slices do not have good isolation, attackers may misuse one
slice's capacity flexibility to grab resources from another slice. This could put
the victim slice out of service.
• Attackers may listen to security or privacy information exchanged between the UE
and the network when network slice selection occurs. Attackers could forge this
information to gain illegal access to network resources.

4 Countermeasures for 5G Threats Mitigation

Some potential threats and possible risk factors in 5G supporting technologies were
discussed earlier. Several threat mitigation techniques are summarized below:

4.1 SDN Risk Mitigation

To prevent threats imposed on the SDN architecture, early detection mechanisms can
be deployed, a dedicated channel can be established between the SDN controller and
SDN devices, or the SDN controller can be placed in a secure location [16, 20]. There
has to be a stringent access policy, an initial establishment of a trust relationship, and
a secure communication channel between the SDN controller and SDN devices to
mitigate these threats and attacks and assure more dependable software-defined
networking functions. SDN controllers gather intelligence from outbound APIs,
which can be used to enable dynamic security control among routers and switches
to make the network more resilient [23, 32, 38].

4.2 NFV Risk Mitigation

Several mitigation measures for NFV security risks are suggested in [5, 7,
16, 20, 24]; they are listed below:
NFV MANO risk mitigation: Single-point failure can be prevented by separating
administration duties and introducing distributed control. Security monitoring
can check DoS attacks. Malicious insiders' entry can be controlled by fine-grained
access control.
Interface risk mitigation: Improvements in confidentiality management and
integrity protection will eliminate sensitive data leakage problems. Network
topology validation can prevent malicious routing-loop attacks.
VNF security risk mitigation: Signing a VNF image cryptographically can
prevent VNF images from getting infected. To stop malicious VNFs, hypervisor
introspection and abnormality detection can be performed. Deployment of a flexible
VNF scaling strategy can prevent DDoS attacks.
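As an illustration of the image-signing countermeasure above, the sketch below shows how a tampered VNF image fails verification before deployment. It is a simplification using a symmetric HMAC with an assumed shared key; production NFV platforms would use asymmetric signatures (e.g., X.509 certificates with RSA or Ed25519) managed by the orchestrator.

```python
# Sketch: detect a tampered VNF image via a keyed hash over the image bytes.
# The key and image contents below are hypothetical examples.
import hmac
import hashlib

SIGNING_KEY = b"operator-secret-key"   # assumption: operator-held signing key

def sign_image(image_bytes: bytes) -> str:
    """Compute an HMAC-SHA256 signature over the image."""
    return hmac.new(SIGNING_KEY, image_bytes, hashlib.sha256).hexdigest()

def verify_image(image_bytes: bytes, signature: str) -> bool:
    """Recompute and compare in constant time; reject modified images."""
    expected = sign_image(image_bytes)
    return hmac.compare_digest(expected, signature)

image = b"vnf-firewall-v1.2 binary contents"
sig = sign_image(image)

assert verify_image(image, sig)                    # untampered image accepted
assert not verify_image(image + b"backdoor", sig)  # modified image rejected
```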
NFVI risk mitigation: By keeping the hypervisor up to date, VM escape
and hopping can be avoided. Installing crash protection mechanisms will
help reduce data loss due to VM crashes. Fine-grained authentication and
authorization will stop unauthorized changes to the hypervisor.

4.3 Network Slicing Risk Mitigation

Network slicing threats can be mitigated by strictly imposing a few strategies, as
described in [7, 20, 24, 37, 39]. There must be stringent authentication and
authorization rules for a UE to enter any network slice. Simultaneous access to more
than one network slice by a single UE can be impeded by implementing rigorous
authentication; thus, a UE being abused as a bridge can be avoided. Strict slice
access control should be implemented to prevent slicing risks. Each network slice
can have service-oriented security mechanisms to render customized security
services. Service-oriented security mechanisms include unique authentication
protocols, particular security functions, cryptographic algorithms, and security
policy configurations. Things to consider while configuring security policies
include key length, key update period, etc.
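A toy sketch of the strict slice access control described above (the UE identifiers and the policy table are hypothetical): each UE is authorized per slice, and a UE already attached to one slice is refused simultaneous attachment to another, closing the bridging path between slices.

```python
# Illustrative slice access control: per-slice authorization plus a rule
# refusing simultaneous attachment to more than one slice per UE.

ALLOWED = {                    # assumption: operator-provisioned policy
    "ue-001": {"eMBB"},
    "ue-002": {"URLLC"},
}

attachments = {}               # ue_id -> currently attached slice

def attach(ue_id, slice_id):
    """Return True only if the UE is authorized and not attached elsewhere."""
    if slice_id not in ALLOWED.get(ue_id, set()):
        return False           # not authorized for this slice
    if ue_id in attachments and attachments[ue_id] != slice_id:
        return False           # already attached elsewhere: no bridging
    attachments[ue_id] = slice_id
    return True

assert attach("ue-001", "eMBB")        # authorized attach succeeds
assert not attach("ue-001", "URLLC")   # simultaneous second slice refused
assert not attach("ue-003", "eMBB")    # unknown UE rejected
```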

5 Open Issues and Future Research Directions

5G technology is now in its pre-commercial trial phase, and the process is accelerating
each day in a race toward early deployment. Such a situation compels us to address
5G technological challenges and security threats urgently. While doing the survey,
we tried to address those security challenges and developed the following table.
Table 2 portrays our findings on several security threats toward the new technology
enablers presented in Table 1. Our findings on security threats in SDN, NFV, and
network slicing are identified from [2, 7, 16, 17, 20, 23–28, 30, 32, 38, 40] and
presented in Table 2. We also put relevant target points for solutions in Table 2,
which opens new sets of research opportunities. Further study on target network
elements like centralized control points, the SDN controller and switches, the
hypervisor, SDN controller–switch communication, SDN virtual switches and
routers, shared cloud resources, and data centers in the cloud will shed light on
solutions to the threats targeting those elements.
There will be more practical threats when 5G is implemented and many more
end-user service-specific applications are using 5G. Automation of defense
mechanisms, threat monitoring, and mitigation processes can be used to address this
issue. In the future, anomaly-detection machine learning algorithms can be
employed in the SDN control plane for trait analysis and pattern recognition in the SDN
architecture [3, 16, 30]. Efficient assignment of network slices over NFV and their
management is a vast area for further research [12]. URLLC use cases that require
Table 2 Security challenges in 5G technology and target solution points

  Threats toward security          SDN  NFV  Network slicing  Target network element
  DoS attack                        √    √                    Centralized control points
  DDoS attack                       √    √                    Centralized control points
  Saturation attack                 √                         SDN controller and switches
  Resource (network slice) theft    √    √         √          Shared cloud resources, hypervisor
  Configuration attack              √    √                    SDN virtual switches and routers
  Hijacking attack                  √    √                    SDN controller, hypervisor
  TCP-level attack                  √                         SDN controller–switch communication
  Man-in-the-middle attack          √                         SDN controller communication
  Penetration attack                     √                    A data center in cloud
  Eavesdropping                                    √          5G core networking element

highly reliable, ultra-low-latency, real-time interaction need an end-to-end latency of
less than 10 ms, which is quite a critical challenge to solve in future wireless networks [30].

6 Conclusions

This paper surveyed and summarized the key points of 5G technology, 5G network
requirements and analysis, 5G network architecture, and the new components required
to realize 5G technology. We also focused on some open research challenges in the
5G architecture reinforced by SDN and NFV. A shortlist of potential research issues
and their future research directions is given in this paper. The ideas of SDN and
NFV and their possible implementations are discussed in some detail. The network
slicing technique, built upon SDN and NFV, is also explained. Potential trust
issues, security risks, and threats are also identified, and their possible mitigation
techniques are also discussed. 5G use cases, their expansion, and potential future
applications of 5G are also identified. Following the 5G use cases, 5G capabilities are
also highlighted. All possible risks related to trust and security are expected to be
addressed during the design phase while setting up the 5G network architecture and
security model. These security issues are crucial and are being addressed by
standardization authorities like 3GPP, IEEE, and ETSI.

References

1. Pirinen, P.: A brief overview of 5G research activities. In: Proc. 2014 1st Int. Conf. 5G Ubiq-
uitous Connect. 5GU 2014, vol. 5, pp. 17–22 (2014). https://doi.org/10.4108/icst.5gu.2014.
258061
2. Shafi, M., Molisch, A.F., Smith, P.J., Haustein, T., Zhu, P., Silva, P.D., Tufvesson, F.,
Benjebbour, A.: 5G: a tutorial overview of standards, trials, challenges, deployment,
and practice. IEEE J. Sel. Areas Commun. 35, 1201–1221 (2017)
3. Fourati, H., Maaloul, R., Chaari, L.: A survey of 5G network systems: challenges and machine
learning approaches. Springer Berlin Heidelberg (2020). https://doi.org/10.1007/s13042-020-
01178-4.
4. Gupta, A., Jha, R.K.: A survey of 5G network: architecture and emerging technologies. IEEE
Access 3, 1206–1232 (2015). https://doi.org/10.1109/ACCESS.2015.2461602
5. Andrews, J.G., Buzzi, S., Choi, W., Hanly, S.V., Lozano, A., Soong, A.C.K., Zhang, J.C.: What
will 5G be? IEEE J. Sel. Areas Commun. 32, 1065–1082 (2014). https://doi.org/10.1109/JSAC.
2014.2328098
6. Yousaf, F.Z., Bredel, M., Schaller, S., Schneider, F.: NFV and SDN-key technology enablers
for 5G networks. IEEE J. Sel. Areas Commun. 35, 2468–2478 (2017). https://doi.org/10.1109/
JSAC.2017.2760418
7. Zhang, S., Wang, Y., Zhou, W.: Towards secure 5G networks: a survey. Comput. Netw. 162
(2019). https://doi.org/10.1016/j.comnet.2019.106871
8. Dutta, A., Hammad, E.: 5G security challenges and opportunities: a system approach. In: 2020
IEEE 3rd 5G World Forum, 5GWF 2020—Conf. Proc. pp. 109–114 (2020). https://doi.org/10.
1109/5GWF49715.2020.9221122
9. Wen, F., Wymeersch, H., Peng, B., Tay, W.P., So, H.C., Yang, D.: A survey on 5G massive
MIMO localization. 94, 21–28 (2019)
10. Li, S., Xu, L.D., Zhao, S.: 5G internet of things: a survey. J. Ind. Inf. Integr. 10, 1–9 (2018).
https://doi.org/10.1016/j.jii.2018.01.005
11. Mohyeldin, E.: Minimum requirements related to technical performance for IMT-2020
radio interface(s), document ITU-R M. [IMT-2020. TECH PERF REQ]. https://www.itu.
int/en/ITU-R/study-groups/rsg5/rwp5d/imt-2020/Documents/S01-1_Requirements%20for%
20IMT-2020_Rev.pdf (2020). Last accessed 5 Dec 2020
12. Morgado, A., Huq, K.M.S., Mumtaz, S., Rodriguez, J.: A survey of 5G technologies: regulatory,
standardization and industrial perspectives. Digit. Commun. Netw. 4, 87–97 (2018). https://
doi.org/10.1016/j.dcan.2017.09.010
13. Hansen, C.: WiGig: multi-gigabit wireless communications in the 60 GHz band. 60–61 (2011)
14. Nguyen, T.: Small cell networks and the evolution of 5G (Part 1). https://www.qorvo.com/des
ign-hub/blog/small-cell-networks-and-the-evolution-of-5g. Last accessed 4 Jan 2020
15. Liu, F., Peng, J., Zuo, M.: Toward a secure access to 5G network. In: Proc.—17th IEEE Int.
Conf. Trust. Secur. Priv. Comput. Commun. 12th IEEE Int. Conf. Big Data Sci. Eng. Trust,
pp. 1121–1128 (2018). https://doi.org/10.1109/TrustCom/BigDataSE.2018.00156
16. Ahmad, I., Kumar, T., Liyanage, M., Okwuibe, J., Ylianttila, M., Gurtov, A.: 5G security:
analysis of threats and solutions. In: 2017 IEEE Conf. Stand. Commun. Networking, CSCN
2017, pp. 193–199 (2017). https://doi.org/10.1109/CSCN.2017.8088621
17. Neves, P., Calé, R., Costa, M., Gaspar, G., Alcaraz-Calero, J., Wang, Q., Nightingale, J., Bernini,
G., Carrozzo, G., Valdivieso, Á., Villalba, L.J.G., Barros, M., Gravas, A., Santos, J., Maia, R.,
Preto, R.: Future mode of operations for 5G—the SELFNET approach enabled by SDN/NFV.
Comput. Stand. Interfaces 54, 229–246 (2017). https://doi.org/10.1016/j.csi.2016.12.008
18. Panwar, N., Sharma, S., Singh, A.K.: A survey on 5G: the next generation of mobile
communication. Phys. Commun. 18, 64–84 (2016). https://doi.org/10.1016/j.phycom.2015.
10.006
19. Singh, S., Saxena, N., Roy, A., Kim, H.S.: A survey on 5G network technologies from social
perspective. IETE Tech. Rev. (Institution Electron. Telecommun. Eng. India) 34, 30–39 (2017).
https://doi.org/10.1080/02564602.2016.1141077
20. Ahmad, I., Kumar, T., Liyanage, M., Okwuibe, J., Ylianttila, M., Gurtov, A.: Overview of 5G
security challenges and solutions. IEEE Commun. Stand. Mag. 2, 36–43 (2018). https://doi.
org/10.1109/MCOMSTD.2018.1700063
21. Krishnan, P., Najeem, J.S.: A review of security, threats and mitigation approaches for SDN
architecture. Int. J. Innov. Technol. Explor. Eng. 8, 389–393 (2019)
22. Gohil, A., Modi, H., Patel, S.K.: 5G technology of mobile communication: a survey. In: 2013
Int. Conf. Intell. Syst. Signal Process. ISSP 2013, pp. 288–292 (2013). https://doi.org/10.1109/
ISSP.2013.6526920
23. Khettab, Y., Bagaa, M., Dutra, D.L.C., Taleb, T., Toumi, N.: Virtual security as a service for
5G verticals. In: IEEE Wirel. Commun. Netw. Conf. WCNC (2018). https://doi.org/10.1109/
WCNC.2018.8377298.
24. Ji, X., Huang, K., Jin, L., Tang, H., Liu, C., Zhong, Z., You, W., Xu, X., Zhao, H., Wu, J.,
Yi, M.: Overview of 5G security technology. Sci. China Inf. Sci. 61 (2018). https://doi.org/10.
1007/s11432-017-9426-4
25. Blanco, B., Fajardo, J.O., Giannoulakis, I., Kafetzakis, E., Peng, S., Pérez-Romero, J.,
Trajkovska, I., Khodashenas, P.S., Goratti, L., Paolino, M., Sfakianakis, E., Liberal, F., Xilouris,
G.: Technology pillars in the architecture of future 5G mobile networks: NFV MEC and SDN.
Comput. Stand. Interfaces 54, 216–228 (2017). https://doi.org/10.1016/j.csi.2016.12.007
26. Akpakwu, G.A., Silva, B.J., Hancke, G.P., Abu-Mahfouz, A.M.: A survey on 5G networks for
the internet of things: communication technologies and challenges. IEEE Access 6, 3619–3647
(2017). https://doi.org/10.1109/ACCESS.2017.2779844
27. Lal, S., Taleb, T., Dutta, A.: NFV: security threats and best practices. IEEE Commun. Mag. 55,
211–217 (2017). https://doi.org/10.1109/MCOM.2017.1600899
28. Cunha, V.A., da Silva, E., de Carvalho, M.B., Corujo, D., Barraca, J.P., Gomes, D., Granville,
L.Z., Aguiar, R.L.: Network slicing security: challenges and directions. Internet Technol. Lett.
2, e125 (2019). https://doi.org/10.1002/itl2.125
29. Thomas, M.: 24 top internet-of-things (IoT) examples you should know. https://builtin.com/
internet-things/iot-examples. Last accessed 5 Dec 2020
30. Agiwal, M., Roy, A., Saxena, N.: Next generation 5G wireless networks: a comprehen-
sive survey. IEEE Commun. Surv. Tutorials 18, 1617–1655 (2016). https://doi.org/10.1109/
COMST.2016.2532458
31. Alqarni, M.A.: Benefits of SDN for big data applications. In: 2017 14th Int. Conf. Smart Cities
Improv. Qual. Life Using ICT IoT, HONET-ICT 2017, pp. 74–77 (2017). https://doi.org/10.
1109/HONET.2017.8102206
32. Zhong, H., Fang, Y., Cui, J.: LBBSRT: an efficient SDN load balancing scheme based on server
response time. Futur. Gener. Comput. Syst. 68, 183–190 (2017). https://doi.org/10.1016/j.fut
ure.2016.10.001
33. Ullah, H., Gopalakrishnan Nair, N., Moore, A., Nugent, C., Muschamp, P., Cuevas, M.: 5G
communication: an overview of vehicle-to-everything, drones, and healthcare use-cases. IEEE
Access 7, 37251–37268 (2019). https://doi.org/10.1109/ACCESS.2019.2905347
34. Storck, C.R., Duarte-Figueiredo, F.: A survey of 5G technology evolution, standards, and
infrastructure associated with vehicle-to-everything communications by internet of vehicles.
IEEE Access 8, 117593–117614 (2020). https://doi.org/10.1109/ACCESS.2020.3004779
35. Jahng, J.H., Park, S.K.: Simulation-based prediction for 5G mobile adoption. ICT Express 6,
109–112 (2020). https://doi.org/10.1016/j.icte.2019.10.002
36. Wang, C.X., Bian, J., Sun, J., Zhang, W., Zhang, M.: A survey of 5G channel measurements
and models. IEEE Commun. Surv. Tutorials 20, 3142–3168 (2018). https://doi.org/10.1109/
COMST.2018.2862141
37. Barakabitze, A.A., Ahmad, A., Mijumbi, R., Hines, A.: 5G network slicing using SDN and
NFV: a survey of taxonomy, architectures, and future challenges. Comput. Netw. 167 (2020).
https://doi.org/10.1016/j.comnet.2019.106984
38. Li, Y., Su, X., Ding, A.Y., Lindgren, A., Liu, X., Prehofer, C., Riekki, J., Rahmani, R.,
Tarkoma, S., Hui, P.: Enhancing the internet of things with knowledge-driven software-defined
networking technology: future perspectives. (2020)
39. Sattar, D., Matrawy, A.: Towards secure slicing: using slice isolation to mitigate DDoS attacks
on 5G core network slices. arXiv. pp. 82–90 (2019)
40. Cao, J., Ma, M., Li, H., Ma, R., Sun, Y., Yu, P., Xiong, L.: A survey on security aspects for
3GPP 5G networks. IEEE Commun. Surv. Tutorials 22, 170–195 (2020). https://doi.org/10.
1109/COMST.2019.2951818
Study and Analysis of Hierarchical
Routing Protocols in Wireless Sensor
Networks

Ankur Choudhary, Santosh Kumar, and Harshal Sharma

Abstract A Wireless Sensor Network (WSN) is an interconnected system of sensors
capable of sensing environmental changes and communicating these perceived
changes to a centralized location, where the sensed data can be processed for
further decision-making. This makes WSNs suitable for deployment in various
domains like home monitoring, vehicle tracking, environmental monitoring,
agricultural field monitoring, military applications, and many more. For successful
deployment, efficiency is one of the critical concerns associated with WSNs, and
along with other factors, efficiency largely depends on the routing protocols used
to deploy WSNs. The hierarchical routing protocol is one of the routing strategies
used for this, the others being location-based and traditional flat routing protocols.
This paper discusses the well-established and appreciated hierarchical routing
protocols and presents the results of the two main hierarchical routing protocols,
namely Low-Energy Adaptive Clustering Hierarchy (LEACH) and Power-Efficient
Gathering in Sensor Information Systems (PEGASIS). The paper will help budding
researchers gain a quick insight into the discussed protocols for further investigation.

Keywords Wireless sensor networks · Hierarchical routing protocols · LEACH
protocol · PEGASIS protocol

1 Introduction

Wireless Sensor Networks (WSNs) have proven their mettle and have gained enormous
popularity over time. Their practical application areas include, but are not limited to,
environmental monitoring [1], traffic control [2], medical health care [3], home
automation [4], field monitoring, military applications and border surveillance [5],
and other fields [6]. WSNs are deployable in friendly and hostile environments [7,

A. Choudhary · S. Kumar (B) · H. Sharma
Department of Computer Science and Engineering, Graphic Era Deemed to be University,
Dehradun, Uttarakhand, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 461
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_44
462 A. Choudhary et al.

8]. Hostile environmental deployment of these networks has always been a challenge
and remains a concern. As the sensors accumulate readings from the deployed
region and forward them to a central location commonly known as the sink, a
significant amount of energy is expended in the collection and transmission process.
The overall setup will work as long as sufficient energy is left for the process [9–12].
Energy dissipation depends mostly on the routing protocols used, i.e., the more adept
the routing protocol, the better the efficiency, leading to an extended network lifetime
[13]. Hence, prolonging battery life and improving efficiency remain a challenge.
Different routing protocols are designed for prolonging network life in WSNs.
Depending on how the sensors are interconnected and the route they follow to
communicate the sensed data toward the base station, routing protocols are generally
classified as classical flat, location-based, and hierarchical routing protocols.
Flat Routing Protocols—deployment of the sensors is uniform, each node is a peer
of every other node, and there is no organizational or segmentation structure among
the nodes. Based on the routing technique they implement, these protocols can be
further categorized into proactive and reactive routing protocols [14], e.g., DSDV,
AODV, FSR, etc.
Location-Based Routing Protocols—as the name suggests, the sensors are categorized
based on their location in the network. Here, received signal strength is the basis for
determining the distance among the sensors: the greater the signal strength, the closer
the sensors, and vice versa [15], e.g., GAF, MECN, GEAR, GPSR, etc.
Hierarchical Routing Protocols—deployed sensors are arranged into groups. Each
group or cluster is governed by an elected sensor commonly known as the Cluster
Head (generally the node with maximum energy). The node with maximum energy is
preferred for cluster head duty simply because its work is doubled: it acts as a normal
data-collecting node while coordinating the rest of the nodes in the cluster and
communicating with other CHs or the BS (depending on the algorithm used).
The Cluster Head (CH) generally receives the environmental values from the sensors
deployed in its cluster, removes redundancy from the received data, and forwards
it to another CH or the Base Station [16, 17], e.g., LEACH, PEGASIS, SEP, EAP,
REAP, TEEN, APTEEN, etc. Figure 1 depicts the broad categorization of routing
protocols, and Fig. 2 illustrates a typical clustered sensor network.
This paper discusses hierarchical routing protocols and aims to benefit researchers
who intend to start work on cluster-based routing protocols.

Fig. 1 Routing protocols categorisation

Study and Analysis of Hierarchical Routing Protocols … 463

Fig. 2 A typical hierarchical sensor network

The remaining work is structured as follows: Sect. 2 presents a literature review and
provides an overview of past surveys on hierarchical routing protocols. Section 3
covers the most prominent hierarchical protocols. Section 4 performs an experimental
evaluation and discusses the results of two well-known hierarchical routing protocols,
LEACH and PEGASIS. Finally, Sect. 5 presents a conclusion based on the study.

2 Literature Review

There are various quality surveys available about improving the network lifetime
[18], optimization techniques [19, 20], congestion control [21], and other domains.
However, this work focuses on hierarchical routing protocols, and the following
literature is reviewed. Akkaya and Younis [22] studied various routing protocols,
discussed and categorized them into location-based, hierarchical, and data-centric
routing protocols. The paper also talks about the quality of service modeling methods.
Deosarkar et al. [23], in the survey, evaluated the cluster head selection mech-
anism. The survey classified the cluster head selection mechanism into different
categories: deterministic, adaptive, hybrid, and combined metric clustering.
Ramesh and Somasundaram [24] present a survey on clustering techniques,
discuss, and compare different cluster head selection methodologies. Liu [25]
surveyed clustering routing protocols and categorized clustering attributes into clus-
tering process, characteristics of the cluster, cluster head, and total proceeding of the
technique used. The paper also presents their goals and capabilities. In the survey,
Sha et al. [26] discussed the multipath routing technique based on wireless sensor
networks’ design structure. The paper categorizes these techniques into infrastruc-
ture, non-infrastructure, and coding-based methods with a discussion under each
category. The paper also compares different categories’ approaches in load balancing,
464 A. Choudhary et al.

energy efficiency, route setup, and reliability. Guo and Zhang [27] surveyed intel-
ligent routing protocols. This paper categorized algorithms into Neural Networks
(NN), Genetic Algorithms (GA), Ant Colony Optimization (ACO), Reinforcement
Learning (RL), and Fuzzy Logic (FL). Afsar and Tayarani-Najaran [28] survey cate-
gorize the clustering methods into equal- and unequal-sized clustering algorithms.
This paper also compares each category method in terms of cluster size, cluster
count, mobility, etc. Singh and Sharma [29] surveyed cluster-based routing proto-
cols. The paper focuses on three classifications: block cluster, chain cluster, and
grid cluster-based classification. This paper also evaluates the methods on various
parameters like stability, efficiency, scalability, etc. Arora et al. [30] surveyed LEACH
and its variants, covering C LEACH, MODLEACH, Heterogeneous LEACH, Two-
Level LEACH, Multi-hop LEACH, Vice LEACH, and other hierarchical routing
protocols, including PEGASIS. They also presented modifications over hierarchical
routing protocol.
Shokouhi et al. [31] categorized the clustering methods as homogeneous and
heterogeneous. This survey also compares various methods according to different
features like cluster head count, cluster count, intercluster communication, etc.
Fanian and Rafsanjani [32] discussed various cluster-based routing protocols from a
methodology perspective. This literature classifies the methods based on the method-
ologies used in classical approaches, metaheuristic-based strategies, fuzzy-based
approaches, and hybrid metaheuristic-based approaches.

3 Hierarchical Routing Protocols

The sensor network is grouped into various clusters, and elected nodes (cluster head)
from different clusters (preferable nodes with more energy) are responsible for peri-
odically communicating with the member nodes to collect the data, perform some
local computations, remove redundancies from the collected data, and communicate
it to the central location [33]. Clustered routing protocols avoid long-distance data
transmission between C.H.s and B.S. The normal neighbor sensing nodes are at the
bottom level, above which are the elected cluster heads of different clusters respon-
sible for aggregating data collected from the bottom level sensors. Eventually, the
aggregated data is communicated to the centralized station. Now, this aggregated
data can be analyzed for decision-making purposes. This makes it a two-level hier-
archical routing protocol. Similarly, there are three-level hierarchical routing protocols,
where another level of C.H. nodes sits above the second-level C.H.s. These top-level
C.H. nodes aggregate the data received from the second-level C.H.s, which is then
finally communicated to the centralized station.
3.1 Low-Energy Adaptive Clustering Hierarchy (LEACH)

LEACH is a self-organizing, adaptive hierarchical protocol; it uses randomization
to reduce a wireless sensor network's energy expenditure. Initially proposed by
[16, 34], it to date stands as the basis for most hierarchical
routing protocols. The normal data-sensing nodes sense and forward the data to
elected cluster heads of their respective clusters for further necessary processing, as
shown in Fig. 2. A sensor nominates and declares itself as a cluster head according to
formula 1 [16]: a sensor n selects an arbitrary value between 0 and 1, and if the selected
value is below the threshold T(n), it becomes the cluster head for that particular
round. LEACH dictates that the nodes that have served as cluster heads in a round
cannot serve as cluster heads again for the next 1/P rounds, where P is the expected
percentage of cluster heads. The sensor that nominated itself as a cluster head
advertises a message to the remaining sensors using a CSMA MAC protocol. All cluster
heads are assumed to use equal transmission energy for the advertisement broadcast;
the remaining nodes then choose which cluster head to join based on the received signal
strength.
 
$$
T(n) =
\begin{cases}
\dfrac{P}{1 - P\left(r \bmod \frac{1}{P}\right)} & \text{if } n \in G \\[4pt]
0 & \text{otherwise}
\end{cases}
\tag{1}
$$

P—estimated percentage of cluster heads in the network.
r—current round.
G—set of sensors that have not served as cluster heads in the last 1/P rounds.
Thus, LEACH works in two phases, namely: Setup Phase and Steady-State Phase.
Setup Phase—nodes organize themselves into a cluster.
Steady-State Phase—after the cluster formation, the C.H. creates a TDMA
schedule for the member nodes to transmit the data to the C.H.
Figure 2 shows two clusters; the sensing nodes of each cluster sense the data
and forward it to the cluster head of their respective clusters. Cluster
heads, in turn, aggregate the data and forward it to the Base Station.
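To make the election rule concrete, the following is a minimal Python sketch of the self-election in formula 1. The node dictionary shape and function names are illustrative, not taken from [16]; eligibility is tracked with a simple flag rather than a per-node round counter.

```python
import random

def leach_threshold(P, r, eligible):
    """T(n) from formula 1: nodes still in G use the threshold, others get 0."""
    if not eligible:
        return 0.0
    return P / (1 - P * (r % round(1 / P)))

def elect_cluster_heads(nodes, P, r):
    """Each node draws a uniform value in [0, 1); drawing below T(n) makes it
    a cluster head for this round, after which it leaves G for 1/P rounds."""
    heads = []
    for node in nodes:
        if random.random() < leach_threshold(P, r, node["eligible"]):
            node["eligible"] = False  # cannot serve again for the next 1/P rounds
            heads.append(node["id"])
    return heads

nodes = [{"id": i, "eligible": True} for i in range(100)]
heads = elect_cluster_heads(nodes, P=0.05, r=0)
```

Note how the threshold rises as r approaches 1/P − 1 (for P = 0.05 it reaches 1.0 at r = 19), forcing every still-eligible node to serve before the eligibility set resets.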
Modified LEACH (MODLEACH)
MODLEACH, Modified LEACH, uses two transmitting power levels, depending on whether
a node is a cluster head or not. Here, the cluster head does not change after each round;
a node remains cluster head until its energy drops below the minimum energy
requirement to lead the cluster. This is an improvement over LEACH as it
helps in conserving the energy that otherwise would be utilized in cluster formation
after each round [35]. Here, after every round, the remaining energy of the cluster
head is taken into consideration. If it qualifies the threshold value set, then it delays the
new cluster head selection procedure. It divides the communication into intracluster,
intercluster, and communication from cluster head to base station, requiring different
amplification, which is not the case in LEACH, where all communication requires
the same amplification.
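The two ideas in MODLEACH can be sketched as follows. The function names and the numeric amplification levels are illustrative assumptions, not values from [35]; the point is only that re-clustering is deferred while the head still qualifies, and that amplification is chosen per link type.

```python
# Relative amplification levels for the three link types (illustrative values).
AMPLIFICATION = {"intra_cluster": 0.1, "inter_cluster": 0.5, "ch_to_bs": 1.0}

def retain_cluster_head(residual_energy, min_required):
    """MODLEACH keeps the current C.H. (skipping a new election) while its
    residual energy still meets the minimum requirement to lead the cluster."""
    return residual_energy >= min_required

def amplification(link_type):
    """Unlike LEACH's single amplification level, MODLEACH picks one per link."""
    return AMPLIFICATION[link_type]
```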
Centralized LEACH (LEACH-C)
One limitation of the LEACH protocol is that there is no centralized control over
the selection of cluster heads: sensors elect themselves as cluster heads based on
their remaining energy as per formula 1, so there is no assurance of the cluster heads'
locations, which may affect the protocol's efficiency in specific rounds. Centralized
LEACH, on the other hand, lets the base station select the cluster heads, thus providing
centralized control over cluster head selection. For this, each node is required to share its current location
and current residual energy information with the base station. This helps the base
station decide load-balanced clusters uniformly distributed across the network. After
determining the clusters, the B.S. broadcasts the I.D. of the elected cluster head in
the network; every node that receives this I.D. compares it with its own, and if it
matches the broadcasted I.D., that node becomes the next C.H. [34]. The
steady-state phase remains the same as in LEACH.
Multi-Hop LEACH (MH-LEACH)
Multi-hop LEACH adopts the same technique for cluster head selections as LEACH.
The focus here is on using other cluster heads that are on the way toward the B.S. to
forward the data, instead of directly communicating with the B.S. This helps extend
the network lifetime as it reduces the cluster heads’ energy requirements, which
was high while making a direct connection with the B.S. This same methodology
is adopted in intracluster mode also, i.e., the nodes forward the sensed data via
neighboring nodes towards the cluster head rather than sending data directly to the
cluster head. This sometimes creates an overhead for the intermediate nodes (in the
intracluster case) or the cluster heads (in the intercluster case), as their energy might get
depleted faster. Thus, these intermediate nodes, depending on their residual energy
levels, can reject the data packets, in which case the data is forwarded to some other
node on the way to the destination [36]. Routing tables are created and maintained
to route data packets both at intercluster and intracluster levels.
Two Level LEACH (LEACH-TL)
In LEACH protocol, the cluster head is responsible for aggregating the data, removing
redundancy among the collected data, and then transmitting it to the base station.
This overhead drains the cluster head energy faster, initiating a new cluster head
formation procedure. LEACH-TL divides this cluster head task into a primary and
secondary task [37]. The secondary cluster head performs the data aggregation task,
while the primary cluster head transmits the data to the B.S. This methodology helps in
delaying the new cluster head selection and ultimately tries to prolong the network
life. LEACH-TL imposes two conditions: first, the energy of the current cluster
head should not be less than the average energy of the nodes, and second, the distance
between the primary cluster head and the base station should not be larger than the
average distance. If either condition is not met, the normal node with maximum
energy in the cluster is selected as the secondary cluster head; if both conditions
hold, there is no need to elect a secondary cluster head.
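The two-condition rule reads naturally as a small decision function. This is a sketch of the rule as described above, not code from [37]; the member-dict shape and all names are assumptions.

```python
def secondary_cluster_head(ch_energy, avg_energy, ch_bs_distance, avg_distance, members):
    """LEACH-TL rule: if the current C.H. satisfies both conditions (energy not
    below average, B.S. distance not above average), no secondary head is needed;
    otherwise the cluster member with maximum energy is elected."""
    if ch_energy >= avg_energy and ch_bs_distance <= avg_distance:
        return None
    return max(members, key=lambda m: m["energy"])["id"]
```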
Study and Analysis of Hierarchical Routing Protocols … 467

In [38], the performance of LEACH-TL was evaluated with the following simulation
parameters: area = 200 m × 200 m, number of nodes = 200, initial node energy = 0.5 J,
CH proportion = 7%, B.S. location = (100, 100) m, packet size = 4000 bits,
Eelec = 50 nJ/bit, εfs = 10 pJ, ETX = ERX = 50 pJ, εmp = 0.0013 pJ, and
EDA = 5 nJ/bit. The energy efficiency and network lifetime were evaluated to be
better than those of the LEACH protocol.
There are various other variants of LEACH like LEACH with distance-based
threshold (LEACH-DT), Density of sensor LEACH (DS-LEACH), Vice Cluster
Head LEACH (Vice LEACH), etc.

3.2 Power-Efficient Gathering in Sensor Information System (PEGASIS)

PEGASIS follows a chain-based approach in which nodes organize themselves to
form a chain; if any node dies in the process, the chain is reconstructed by
bypassing that node, so that the system does not fail or break.
The chain formation task is accomplished by the sensor nodes at the local level using a
greedy algorithm or is decided at the B.S. level and then broadcasted to the network’s
participating nodes. Sensors take the lead, in turn, to accept and communicate the
data to their close neighbors toward B.S. This helps in even load distribution in
the network and thus prolonging the deployed system’s lifetime. One head node is
elected, which transmits the data to the sink. Each node performs data fusion, fusing
neighbor’s data with its own to create a single packet and then forward this packet to
its neighbor toward the B.S. [39]. Simulation results showed that PEGASIS performs
better than classical LEACH as the network size increases. This is due to balanced
energy dissipation in the network.
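The local greedy chain construction mentioned above can be sketched as follows. This is an illustrative implementation of a greedy nearest-neighbor chain (starting from the node farthest from the B.S., a common convention for PEGASIS-style chains); the exact construction in [39] may differ in detail.

```python
import math

def greedy_chain(positions, base_station):
    """Build a PEGASIS-style chain: start from the node farthest from the B.S.
    and repeatedly append the nearest not-yet-chained neighbor."""
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    remaining = list(range(len(positions)))
    current = max(remaining, key=lambda i: dist(positions[i], base_station))
    chain = [current]
    remaining.remove(current)
    while remaining:
        nxt = min(remaining, key=lambda i: dist(positions[i], positions[current]))
        chain.append(nxt)
        remaining.remove(nxt)
        current = nxt
    return chain
```

Starting from the farthest node keeps the long initial hop out of the per-round data path; each subsequent hop is the locally shortest one, which is what balances transmission energy along the chain.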

3.3 Threshold Sensitive Energy-Efficient Sensor Network (TEEN)

TEEN is a protocol developed for reactive networks: during a cluster change, the
current C.H. broadcasts a Hard Threshold (H.T.) and a Soft Threshold (S.T.) along with
the other attributes [40], which helps in controlling the amount of data transmission.
The nodes sense the attributes continuously but transmit them to the cluster head only
when the sensed characteristic exceeds the H.T. value. The first time a parameter from
the attribute set reaches its hard threshold value, the sensed attribute gets stored in
an internal variable called the sensed value (S.V.). The nodes transmit only when both
of the following conditions hold: the value of the sensed attribute is greater than the
hard threshold value, and the sensed attribute differs from S.V. by an amount equal
to or greater than the S.T.
This helps in reducing the data transmission frequency by ignoring small changes
in the sensed attribute. The S.T. value is set according to the application requirement:
the smaller the S.T. value, the more accurate the network, but at the cost of increased
energy consumption. Thus, there is a tradeoff between accuracy and energy
consumption.
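The TEEN transmit rule above reduces to a short predicate. This is a sketch of the two-condition rule as described, with illustrative names; `last_sent` plays the role of S.V. (None before the first hard-threshold crossing).

```python
def should_transmit(sensed, hard_threshold, soft_threshold, last_sent):
    """TEEN rule: transmit only when the attribute exceeds H.T. and has changed
    from the last transmitted value (S.V.) by at least S.T."""
    if sensed <= hard_threshold:
        return False
    if last_sent is None:  # first crossing of the hard threshold
        return True
    return abs(sensed - last_sent) >= soft_threshold
```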

4 Experimental Evaluation of Classical LEACH and PEGASIS

To assess both protocols' performance, a comparison is done based on the overall
network lifetime, assuming an initial node energy of 1 J and then 2 J; the simulation
is done using MATLAB (Figs. 3 and 4).

4.1 Types of Energies Required in the Process

Energy consumption occurs in the transmission and reception of data. The energy
requirements are calculated using the formula below [16]:

ETx(k, d) = Eelec ∗ k + εamp ∗ k ∗ d²

ERx(k) = (Eelec + EDA) ∗ k

Fig. 3 Nodes deployed in WSN using LEACH
Fig. 4 Nodes deployment in WSN using PEGASIS

where ETx is the energy consumed in transmission (J/bit), ERx is the energy consumed
in reception (J/bit), εamp is the energy consumed by the power amplifier (J/bit/m²),
and EDA is the energy consumed in data aggregation (J/bit).
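The two formulas, with the constants from Table 1 plugged in, can be expressed directly in code (function names are illustrative):

```python
E_ELEC = 50e-9    # electronics energy, J/bit (Table 1, ETX/ERX)
E_AMP = 100e-12   # power amplifier energy, J/bit/m^2 (Table 1, eps_amp)
E_DA = 5e-9       # data aggregation energy, J/bit (Table 1, EDA)

def e_tx(k_bits, d_m):
    """Energy to transmit k bits over distance d: Eelec*k + eps_amp*k*d^2."""
    return E_ELEC * k_bits + E_AMP * k_bits * d_m ** 2

def e_rx(k_bits):
    """Energy to receive and aggregate k bits: (Eelec + EDA)*k."""
    return (E_ELEC + E_DA) * k_bits
```

For a 4000-bit packet sent over 100 m, the d² amplifier term (4 mJ) already dominates the electronics term (0.2 mJ), which is why clustered protocols try to avoid long-distance hops.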

4.2 Simulation and Result Discussion

In this paper, a 100 m * 100 m area is considered for the deployment of 100 nodes,
and simulation is done in MATLAB using the following parameters shown in Table
1 (Figs. 5, 6, 7 and 8).
A comparative study of the two algorithms LEACH and PEGASIS based on the
overall network lifetime was done in the paper and shown in Table 2. The paper
considered initial energy to be 1 J/node and 2 J/node for analysis (rest parameters
being same). The simulation analyzed the number of rounds it took for both the
algorithms to reach 1, 10, 20, 50, 70, and 100% nodes to go down. The simulation
showed that both LEACH and PEGASIS almost doubled their rounds when the energy
per node was doubled while the rest of the parameters were kept the same.
In some simulations, the first node was exhausted earlier in PEGASIS than in LEACH,
but the overall network lifetime of PEGASIS was much better than that of
LEACH for the same initial energy per node. This indicates that the algorithms' perfor-
mance will largely depend on how efficiently the random deployment is achieved
by the respective algorithms each time.
As the network size increased, the performance of PEGASIS got better over
LEACH for the same parameters. PEGASIS is more energy efficient due to better
energy dissipation distribution and stability for the deployment of WSN.
Table 1 Simulation parameters
Area: 100 m * 100 m
Number of nodes: 100
Sink location: (50, 150)
CH probability: 5%
Initial energy of node: 1 and 2 J
Data packet size: 4000 bits
Energy consumption, power amplifier (εamp): 100 * 10^−12 J/bit/m²
Energy consumption, transmission (ETX): 50 * 10^−9 J/bit
Energy consumption, receiving (ERX): 50 * 10^−9 J/bit
Energy consumption, aggregation (EDA): 5 * 10^−9 J/bit

Fig. 5 LEACH network lifetime with 1 J/node initial energy

5 Conclusion

The paper discussed some of the existing surveys on clustered routing protocols.
Energy consumption is one of the significant constraints in WSNs, and different proto-
cols have been studied for decades to address the issue. Hierarchical routing protocols
have shown excellent results in prolonging the network lifetimes by efficient energy
distribution. The paper also discusses the famous LEACH protocol along with its
prominent variants developed. Other hierarchical routing protocols like PEGASIS
and TEEN have also been discussed. A simulation-based comparison is also shown
between LEACH and PEGASIS to understand the two routing protocols better.
Fig. 6 PEGASIS network lifetime with 1 J/node initial energy

Fig. 7 LEACH network lifetime with 2 J/node initial energy

However, we have seen that PEGASIS outperforms LEACH in terms of network
lifetime due to better energy distribution, although it also induces an additional delay
as it creates a chain in the network. Moreover, we should not forget that LEACH
remains the basis for newer algorithms. LEACH was a turning point and outperformed
the clustering protocols of its time by introducing adaptive clusters and changing
cluster heads after each round, which led to a better distribution of energy dissipation
across the network. But random cluster head positioning was one of the restricting
issues in LEACH, which has been addressed in many different versions of LEACH
that followed over time.
Fig. 8 PEGASIS network lifetime with 2 J/node initial energy

Table 2 LEACH and PEGASIS performance comparison
Percentage of dead nodes (%)  LEACHa  PEGASISa  LEACHb  PEGASISb
1 698 950 1336 1546
10 837 1670 1650 2862
20 916 1775 1721 3339
50 1037 1924 2012 3738
70 1108 1975 2204 3994
100 1475 2106 3367 4161
a Initial Node Energy 1 J
b Initial Node Energy 2 J. Other parameters considered the same

References

1. Xu, G., Shen, W., Wang, X.: Applications of wireless sensor networks in marine environment
monitoring: a survey. Sensors 14(9), 16932–16954 (2014)
2. Mini, S., Udgata, S.K., Sabat, S.L.: Sensor deployment and scheduling for target coverage
problem in wireless sensor networks. IEEE Sens. J. 14(3), 636–644 (2014)
3. Wu, F., Li, X., Sangaiah, A.K., Xu, L., Kumari, S., Wu, L., Shen, J.: A lightweight and robust
two-factor authentication scheme for personalized healthcare systems using wireless medical
sensor networks. Futur. Gener. Comput. Syst. 82, 727–737 (2018)
4. Fathany, M.Y., Adiono, T.: Wireless protocol design for smart home on mesh wireless sensor
network. In: Proceedings of the International Symposium on Intelligent Signal Processing and
Communication Systems (ISPACS), Nusa Dua, Indonesia, pp. 462–467, 9–12 November 2015
(2015)
5. Butun, I., Morgera, S.D., Sankar, R.: A survey of intrusion detection systems in wireless sensor
networks. IEEE Commun. Surv. Tutor. 16(1), 266–282 (2014)
6. Mohamed, R.E., Saleh, A.I., Abdelrazzak, M., Samra, A.S.: Survey on wireless sensor network
applications and energy efficient routing protocols. Wirel. Pers. Commun. 101(2), 1019–1055
(2018)
7. Raghavendra, C.S., Sivalingam, K.M., (eds.): Wireless Sensor Networks. Kluwer Academic,
New York (2004)
8. Znati, T., Raghavendra, C., Sivalingam, K.: Special issue on wireless sensor networks, guest
editorial. Mob. Netw. Appl. 8 (2003)
9. Avci, B., Trajcevski, G., Tamassia, R., Scheuermann, P., Zhou, F.: Efficient detection of motion-
trend predicates in wireless sensor networks. Comput. Commun. 101, 26–43 (2017)
10. Khelladi, L., Djenouri, D., Rossi, M., Badache, N.: Efficient on-demand multi-node charging
techniques for wireless sensor networks. Comput. Commun. 101, 44–56 (2017)
11. Rashid, B., Rehmani, M.H.: Applications of wireless sensor networks for urban areas: a survey.
J. Netw. Comput. Appl. 60, 192–219 (2016)
12. Wang, D., Lin, L., Xu, L.: A study of subdividing hexagon-clustered WSN for power saving:
analysis and simulation. Ad Hoc Netw. 9(7), 1302–1311 (2011)
13. Gnanambigai, J., Rengarajan, N., Anbukkarasi, K.: Leach and its descendant protocols: a
survey. Int. J. Commun. Comput. Technol. 1(3)(2), 15–21 (2012)
14. Arce, J., Pajares, A., Lazaro, O.: Performance evaluation of video streaming over Ad hoc
networks of sensors using FLAT and hierarchical routing protocols. Mobile Netw. Appl. 13(3–
4), 324–336 (2008)
15. Savvides, A., Han, C.-C., Srivastava, M.B.: Dynamic fine-grained localization in Ad-Hoc
networks of sensors. In: Proceedings of the Seventh ACM Annual International Conference on
Mobile Computing and Networking (MobiCom), pp. 166–179, July 2001 (2001)
16. Heinzelman, W.R., Chandrakasan, A., Balakrishnan, H.: Energy-efficient communication
protocol for wireless microsensor networks. In: Proceedings of the 33rd Hawaii International
Conference on System Sciences (ICSS), Washington, USA, vol. 2, pp. 1–10, 04–07 Jan 2000
(2000)
17. Handy, M.J., Haase, M., Timmermann, D.: Low energy adaptive clustering hierarchy with
deterministic cluster-head selection. In: Proceedings of 4th IEEE Conference on Mobile and
Wireless Communications Networks, Stockholm, vol. 1, pp. 368–372, Sep. 9–11, 2002 (2002)
18. Jung, J.W., Weitnauer, M.A.: On using cooperative routing for lifetime optimization of multi-
hop wireless sensor networks: analysis and guidelines. IEEE Trans. Commun. 61(8), 3413–
3423 (2013)
19. Curry, R.M., Smith, J.C.: A survey of optimization algorithms for wireless sensor network
lifetime maximization. Comput. Ind. Eng. 101, 145–166 (2016)
20. Fei, Z., Li, B., Yang, S., Xing, C., Chen, H., Hanzo, L.: A survey of multi-objective optimization
in wireless sensor networks: metrics, algorithms, and open problems. IEEE Commun. Surv.
Tutor. 19(1), 550–586 (2017)
21. Sergiou, C., Antoniou, P., Vassiliou, V.: A comprehensive survey of congestion control protocols
in wireless sensor networks. IEEE Commun. Surv. Tutor. 16(4), 1839–1859 (2014)
22. Akkaya, K., Younis, M.: A survey on routing protocols for wireless sensor networks. Ad Hoc
Netw. 3(3), 325–349 (2005)
23. Deosarkar, B.P., Yadav, N.S., Yadav, R.: Cluster head selection in clustering algorithms for wire-
less sensor networks: a survey. In: Proceedings of the International Conference on Computing,
Communication and Networking, (ICCCN), VI, USA, pp. 1–8, 18–20 December 2008 (2008)
24. Ramesh, K., Somasundaram, D.K.: A comparative study of cluster head selection algorithms
in wireless sensor networks. Int. J. Comput. Sci. Eng. Surv. 2(4), 153–164 (2011)
25. Liu, X.: A survey on clustering routing protocols in wireless sensor networks. Sensors 12(8),
11113–11153 (2012)
26. Sha, K., Gehlot, J., Greve, R.: Multipath routing techniques in wireless sensor networks: a
survey. Wirel. Pers. Commun. 70(2), 807–829 (2013)
27. Guo, W., Zhang, W.: A survey on intelligent routing protocols in wireless sensor networks. J.
Netw. Comput. Appl. 38(1), 185–201 (2014)
28. Afsar, M.M., Tayarani-Najaran, M.H.: Clustering in sensor networks: a literature survey. J.
Netw. Comput. Appl. 46, 198–226 (2014)
29. Singh, S.P., Sharma, S.: A survey on cluster based routing protocols in wireless sensor networks.
Procedia Comput. Sci. 45, 687–695 (2015)
30. Arora, V.K., Sharma, V., Sachdeva, M.: A survey on LEACH and other’s routing protocols in
wireless sensor network. Optik-Int. J. Light Electron. Opt. 127(16), 6590–6600 (2016)
31. Shokouhi Rostami, A., Badkoobe, M., Mohanna, F., Hosseinabadi, A.A.R., Sangaiah, A.K.:
Survey on clustering in heterogeneous and homogeneous wireless sensor networks. J.
Supercomput. 74(1), 277–323 (2018)
32. Fanian, F., Rafsanjani, M.K.: Cluster-based routing protocols in wireless sensor networks: a
survey based on methodology. J. Netw. Comput. Appl. 142, 111–142 (2019)
33. Kaur, R., Sharma, D., Kaur, N.: Comparative analysis of LEACH and its descendant protocols in
wireless sensor network. Int. J. P2P Netw. Trends Technol. 3(1), 22–27 (2013)
34. Heinzelman, W.B., Chandrakasan, A.P., Balakrishnan, H.: An application-specific protocol
architecture for wireless microsensor networks. IEEE Trans. Wirel. Commun. 1(4), 660–670
(2002)
35. Mahmood, D., Javaid, N., Mahmood, S., Qureshi, S., Memon, A.M., Zaman, T.: MODLEACH:
a variant of LEACH for WSNs. In Proceedings of the International Broadband and Wireless
Computing, Communication and Applications (BWCCA), Compiegne, France, pp. 158–163,
Oct 28–30 2013 (2013)
36. Neto, A.S., Cardoso, A.R., Celestino, J.: MH-LEACH: a distributed algorithm for multi-hop
communication in wireless sensor networks. In: ICN, The Thirteenth International Conference
on Networks, pp. 55–61, 23–27 February 2014 (2014)
37. Peng, H., Dong, H., Li, H.: LEACH protocol based two-level clustering algorithm. Int. J. Hybrid
Inf. Technol. 8(10), 15–26 (2015)
38. Fu, C., Jiang, Z., Wei, W., Wei, A.: An energy balanced algorithm of LEACH protocol in WSN.
Int. J. Comput. Sci. 10(1), 354–359 (2013)
39. Lindsey, S., Raghavendra, C.S.: PEGASIS: power-efficient gathering in sensor information
systems. In: Proceedings of the IEEE Aerospace Conference Proceedings, vol. 3, pp. 1125–
1130, Big Sky, Mont, USA, 9–16 March 2002 (2002)
40. Manjeshwar, A., Agrawal, D.P.: TEEN: a routing protocol for enhanced efficiency in wireless
sensor networks. In: Proceedings of the 15th International Parallel and Distributed Processing
Symposium (IPDPS), pp. 2009–2015, San Francisco, CA, USA, 23–27 April 2001 (2001)
Circularly Polarized 1 × 4 Antenna
Array with Improved Isolation
for Massive MIMO Base Station

Ravindra S. Bakale, Anil B. Nandgaonkar, S. B. Deosarkar, and R. Bhadade

Abstract This paper proposes a circularly polarized 1 × 4 antenna array with
improved isolation for the Massive MIMO base station application. Massive MIMO
plays an important role in the design and implementation of 5G. An antenna array
is designed using a hexagonal microstrip antenna; the proposed antenna has eight
ports in the design process. Circular polarization is incorporated using a dual coaxial
probe-feed technique with equal amplitude and a 90-degree phase shift. The antenna
array is simulated at spacings of 0.50λ, 0.55λ, and 0.60λ, and improved isolation is
achieved at a spacing of 0.55λ. The proposed antenna is simulated using HFSS 13.0v
at 3.7 GHz and fabricated on a Rogers RT/duroid 5880. The designed antenna
has an impedance bandwidth of 160 MHz (at S11 = −10 dB), a gain of 4.97 dB per
port, and an axial ratio of 0.27 (<3 dB). The inter-element spacing of the 1 × 4 antenna
array is analyzed using HFSS so that isolation is greater than 20 dB. Measured
and simulated results are found in good agreement.

Keywords Massive MIMO · Microstrip antenna · Axial ratio · Impedance bandwidth

1 Introduction

In 5G, there is a need for a 10-Gbps data rate, 1 ms of latency, and more than
101 devices connected to the base station compared to 4G. Massive MIMO will
ensure maximum coverage and low power consumption of the devices [1]. The

R. S. Bakale (B)
Department of Electronics and Telecommunication Engineering, College of Engineering,
Ambajogai, Beed, Maharashtra, India
A. B. Nandgaonkar · S. B. Deosarkar
Department of Electronics and Telecommunication Engineering, DBATU Technological
University, Lonere, Raigad, India
e-mail: sbdeosarkar@dbatu.ac.in
R. Bhadade
MIT College of Engineering Pune, Pune, India
e-mail: raghunath.bhadade@mitpune.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 475
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_45
476 R. S. Bakale et al.

multi-user MIMO system's performance is improved by increasing the number of
antennas at the base station compared to the number of users. The performance of
Massive MIMO depends on spatial correlation and mutual coupling between antenna
elements [2]. 5G antenna systems consist of an antenna array at both the base station
and mobile handset. Gain and impedance bandwidth will decide coverage area and
channel capacity for base station and mobile device [3]. Multi-port Multi-antenna
elements are used to design a Massive MIMO system to achieve a high data rate in the
Gbps. Four-port per antenna elements are designed with optimum spacing between
them to improve isolation and envelope correlation coefficient [4]. Massive MIMO
antennas are used for LTE 42/43 (3.4–3.8 GHz) and LTE 46 band (4.8–5.925 GHz)
[5]. Massive MIMO antennas are designed using a microstrip patch antenna. A
single port is designed using 2 × 2 planar antennas with proper phase excitation
to be available in the required direction [6]. Pentagonal microstrip antenna on a
suspended substrate technique is used for designing a massive MIMO base station at
2.45 GHz. Circular polarization is incorporated to minimize the effect of multipath
effects [7]. Real-time testbeds are used in massive MIMO. Based on software radio
technology, the base station has a 100 coherent radio frequency trans-receiver chain.
High throughput and low latency are the features of this system [8]. Massive MIMO
antenna arrays are designed for Long-Term Evolution (LTE) 42/43/46 bands. It finds
application in the mobile handset for 5G [9]. Active multi-beam antenna systems
are designed using 256 antenna and 64 channels for massive MIMO applications
in 5G. It provides a verification platform for enormous MIMO channel estimation
in wireless communication and digital beamforming algorithm [10]. Full-dimension
MIMO is a new technology for LTE system in 5G. FD-MIMO uses multiple antennas
in a 2D antenna array panel to support multi-user MIMO (MU-MIMO) transmission
[11].
Channel properties are measured and analyzed for a vast antenna array using a
measurement setup of 128 antenna arrays at the base station with 26 line-of-sight
(LOS) users and 10 non-line-of-sight (NLOS) users [12]. An extensive antenna array
system is designed for 3.5–5.2 GHz with maximum gain, low correlation, and VSWR
<1.5 [13]. The unmanned aerial vehicles (UAVs) are commonly referred to as drones
and require control and connectivity over a wireless network to transfer massive
data from high-resolution cameras to ground base stations using massive MIMO
antennas [14]. The fundamental challenge of the existing massive MIMO system
is high computational complexity and complicated spatial structures. We focus on
channel estimation and direction of arrival (DOA) estimation to solve the problem that
integrates massive MIMO into deep learning [15]. The future broadband network will
be energy-efficient, secure, and robust. Massive MIMO will help connect the internet
of things, internet of people with clouds through digital networks [16]. In recent times,
efforts are on to achieve more gain and better signal processing techniques, massive
MIMO systems are proposed [17].
The proposed hexagonal microstrip antenna is simulated using HFSS 13.0v, fabricated,
and validated at 3.7 GHz. The second section of the paper contains the
proposed antenna design. The third section deals with results at different
spacings between antenna array elements and discusses the prototype's performance in
Circularly Polarized 1 × 4 Antenna Array … 477
terms of simulated gain, bandwidth, isolation between the ports and array elements, and
axial ratio. The fourth section deals with linear and planar antenna array geometry
for the massive MIMO base station. The final section concludes on the 1 × 4 linear
antenna array for massive MIMO base station application.
The authors' contribution is the design and development of a circularly polarized
hexagonal microstrip antenna for massive MIMO base station applications with a gain
of 4.97 dB per port. The 1 × 4 array antenna gain is 11.37 dB for an element spacing
of 0.55λ. An impedance bandwidth of 160 MHz and an axial ratio of less than 3 dB
are achieved for the given antenna at 3.7 GHz.

2 Proposed Antenna Design

A hexagonal microstrip antenna is used as the radiating element in designing a 1 × 4
antenna array at 3.7 GHz. The antenna is created on a Rogers RT/duroid
5880 substrate with a dielectric permittivity of 2.2, a thickness of 1.57 mm, and a loss
tangent of 0.0009. The size of the ground plane is 40.5 mm × 40.5 mm. The antenna
is excited with the coaxial probe-feed technique. The inner and outer conductors of
the probe have dimensions of 0.65 mm and 1.5 mm, respectively, and the dimension
of the dielectric is 0.85 mm. This feeding technique is popular and has low spurious
radiation. The hexagonal microstrip antenna is simulated with HFSS 13.0v, which uses
the finite element method. The dimensions of the antenna are calculated using the
following equations:

    a = 1.8412 c / (2π fr √εr)                    (1)

    S = 2c / (n fr √εr)                           (2)

where a is the radius of the circular microstrip antenna, c is the velocity of the EM
wave in free space, fr and εr are the resonant frequency and the relative dielectric
constant of the substrate, respectively, S is the side length of a regular polygon, and
n is the number of sides of the polygon.
For a hexagon, with n = 6, we can calculate the side length. The area of a regular
polygon is given by

    Area = S²n / (4 tan(180°/n))                  (3)

The hexagonal microstrip antenna is designed at 3.7 GHz; the side length is calculated
using Eq. (2), and the final dimensions of the antenna are listed in Table 1.
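Equations (1)–(3) can be evaluated directly; a minimal Python sketch follows (the design values fr = 3.7 GHz, εr = 2.2, and n = 6 are from the paper; c = 3 × 10⁸ m/s). Note that Eq. (2) with n = 6 reproduces the measured side length of 18.2 mm listed in Table 1:

```python
import math

C = 3e8  # speed of light in free space (m/s)

def circular_patch_radius(fr, eps_r):
    """Eq. (1): radius of the equivalent circular microstrip patch."""
    return 1.8412 * C / (2 * math.pi * fr * math.sqrt(eps_r))

def polygon_side_length(fr, eps_r, n):
    """Eq. (2): side length of a regular n-sided polygonal patch."""
    return 2 * C / (n * fr * math.sqrt(eps_r))

def polygon_area(s, n):
    """Eq. (3): area of a regular n-sided polygon of side s (180°/n in radians)."""
    return s ** 2 * n / (4 * math.tan(math.pi / n))

fr, eps_r, n = 3.7e9, 2.2, 6          # design values from the paper
s = polygon_side_length(fr, eps_r, n)
a = circular_patch_radius(fr, eps_r)
print(f"side length S = {s * 1e3:.1f} mm")   # ≈ 18.2 mm, the measured value in Table 1
print(f"radius a = {a * 1e3:.1f} mm")
```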
Circular polarization is achieved using a dual feed in which the applied magnitudes
are equal and the phases are in quadrature. This improves the performance against
478 R. S. Bakale et al.

Table 1 The final dimensions of the proposed antenna

Parameter                        Value
Measured side length (S), mm     18.2
Simulated side length (S), mm    16.5
Dielectric constant (εr)         2.2
Substrate thickness (h), mm      1.57
Loss tangent                     0.0009
Ground plane (Ls × Ws), mm       40.5 × 40.5

multipath fading. A circularly polarized antenna in a massive MIMO BS can serve many
tens of terminals in the same time–frequency resource. The performance of CP
antennas is measured by the axial ratio, which should be less than 3 dB over
the operating frequency range. The Right-Hand Circularly Polarized (RHCP) hexagonal
microstrip antenna with dual feed is shown in Fig. 1.
FP1 and FP2 are fed with equal amplitudes and a phase shift of 90 degrees
between them. The antenna is simulated over the 3.0–4.5 GHz frequency
range with 3.7 GHz as the center frequency. A return loss (S11) better than −25 dB is
achieved at 3.7 GHz, and isolation between the ports (S12) of more than 36 dB is achieved.
The impedance bandwidth at −10 dB is approximately 160 MHz. Results are shown
in Fig. 2.
A maximum gain of 4.97 dB per port is achieved for the simulated antenna; the
E-plane and H-plane radiation patterns obtained are shown in Fig. 3. The hexagonal
microstrip antenna is used to design a 1 × 4 antenna array for massive MIMO base
station applications. The simulated results show that an impedance bandwidth of
160 MHz and an axial ratio of <3 dB are achieved. The 1 × 4 antenna array is
fabricated and tested using a vector network analyzer. The axial ratio is a quality metric

Fig. 1 Two-port one-element RHCP hexagonal microstrip antenna with dual
feed. Ls—length of the substrate, Ws—width of the substrate, S—side length of
the hexagon, FP1—feed point 1 along the −X axis, FP2—feed point 2 along the +Y
axis

Fig. 2 S11 and S12 parameters of simulated Hexagonal microstrip antenna at 3.7 GHz

Fig. 3 The radiation pattern of simulated Hexagonal microstrip antenna with E plane and H plane

used in circular polarization. The axial ratio value is 0.27 at the center frequency of
3.7 GHz, as shown in Fig. 4.
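The axial ratio can be related to the two quadrature feed components through the standard polarization-ellipse relation (a general textbook formula, not taken from this paper; a and b are the amplitudes of the two orthogonal field components and delta is their phase difference):

```python
import math

def axial_ratio_db(a, b, delta_deg):
    """Axial ratio (dB) of the polarization ellipse formed by two orthogonal
    components a*cos(wt) and b*cos(wt + delta)."""
    d = math.radians(delta_deg)
    s = math.sqrt(a ** 4 + b ** 4 + 2 * a ** 2 * b ** 2 * math.cos(2 * d))
    oa = math.sqrt(0.5 * (a ** 2 + b ** 2 + s))  # semi-major axis
    ob = math.sqrt(0.5 * (a ** 2 + b ** 2 - s))  # semi-minor axis
    return 20 * math.log10(oa / ob) if ob > 0 else float("inf")

print(axial_ratio_db(1.0, 1.0, 90.0))   # equal amplitudes, quadrature phase: 0 dB (perfect CP)
print(axial_ratio_db(1.0, 1.0, 80.0))   # a 10-degree phase error degrades the axial ratio
```

Equal amplitudes with exact 90° quadrature give a 0 dB axial ratio; amplitude or phase imbalance in the dual feed raises it, which is why the sub-3 dB target constrains the feed network.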

Fig. 4 The axial ratio of simulated Hexagonal microstrip antenna at 3.7 GHz

3 Results and Discussions

The 1 × 4 antenna array is designed using the hexagonal microstrip antenna with a
center-to-center spacing of 0.5λ, where λ is the wavelength of the electromagnetic
wave at 3.7 GHz. The antenna is simulated using HFSS 13.0v. Isolation between
elements is less than 20 dB. The maximum achievable gain for the 1 × 4 antenna
array is 11.10 dB, and the size of the array is 40.5 mm × 162 mm × 1.57 mm.
The mutual couplings S13, S15, and S17 for the array are shown in Fig. 5.
The hexagonal microstrip antenna is also used to design a 1 × 4 array with a
center-to-center spacing of 0.6λ. The antenna is simulated using HFSS 13.0v.
Isolation between elements is much improved compared with the 0.5λ spacing, and the
maximum achievable gain for the 1 × 4 antenna array is 11.43 dB. Gain and isolation
are improved, but the size of the antenna is increased: the antenna array measures
40.5 mm × 186.3 mm × 1.57 mm. The mutual couplings S13, S15, and S17 for the array
are shown in Fig. 6.

Fig. 5 Mutual coupling between antenna elements of 1 × 4 antenna array with spacing between
antenna elements is 0.5λ

Fig. 6 Mutual coupling between antenna elements of 1 × 4 antenna array with spacing between
antenna elements is 0.6λ

The proposed 1 × 4 antenna array is designed with an element spacing of
0.55λ, where λ is the wavelength of the electromagnetic wave at 3.7 GHz. The isolation
obtained is more than 22 dB, with a maximum gain of 11.37 dB. The size of the
antenna array is 40.5 mm × 175.5 mm × 1.57 mm. The comparative simulated mutual
coupling between array elements is given in Table 2.
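The reported array footprints can be roughly cross-checked (a sketch, not from the paper; it assumes the array length equals one element's ground-plane width plus three center-to-center spacings for four elements):

```python
C = 3e8                       # free-space speed of light (m/s)
f = 3.7e9                     # design frequency (Hz)
lam_mm = C / f * 1e3          # wavelength, approximately 81.1 mm
ground_mm = 40.5              # single-element ground plane width from the paper (mm)

for s in (0.50, 0.55, 0.60):
    d = s * lam_mm                   # center-to-center element spacing (mm)
    length = ground_mm + 3 * d       # 4 elements -> 3 spacings plus one ground width
    print(f"{s:.2f} lambda: d = {d:.1f} mm, array length = {length:.1f} mm")
```

This yields about 162 mm and 186 mm for 0.5λ and 0.6λ, matching the reported sizes; the 0.55λ case comes out near 174 mm, close to the reported 175.5 mm (consistent with the spacing being rounded to 45 mm in the layout).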
The RHCP hexagonal microstrip 1 × 4 antenna array is designed and fabricated
on a Rogers RT/duroid substrate with a dielectric permittivity of 2.2. The
dimension of the array is 40.5 mm × 175.5 mm × 1.57 mm. The fabricated prototype
is shown in Fig. 7.
The top and bottom views of the fabricated antenna are shown in Fig. 7a, b. The
measured S parameters are the return loss (S11) and the mutual coupling between
ports of a single element (S12/S21), as shown in Fig. 8a, b.
The measured return loss is −29.08 dB at a center frequency of 3.696 GHz,
matching the simulated results. The vector network analyzer used is a ROHDE &
SCHWARZ instrument that measures up to 6 GHz. The experimental setup for measuring
S parameters such as S11, S22, VSWR, S12, S21, and the phase shift between the ports
is shown in Fig. 9a. The gain obtained for the port of a single element is 4.97 dB.
For massive MIMO, a 1 × 4 linear antenna array is used to increase the
gain; the array factor of the 1 × 4 array increases the gain to 11.37 dB at a spacing
of 0.55λ, as shown in Fig. 9b. The measured results match the simulated
ones for feed points 1 and 2, as shown in Table 3.
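As a rough sanity check (not from the paper), an ideal lossless 1 × 4 array adds 10·log10(4) ≈ 6.02 dB to the per-port element gain, close to the reported 11.37 dB; the normalized array factor of a uniform linear array can be sketched as:

```python
import math

def array_factor_db(N, d_over_lambda, theta_deg, beta=0.0):
    """Normalized array factor (dB) of an N-element uniform linear array.
    theta is measured from the array axis; beta is the progressive phase shift."""
    psi = 2 * math.pi * d_over_lambda * math.cos(math.radians(theta_deg)) + beta
    if abs(math.sin(psi / 2)) < 1e-12:          # main-beam limit sin(Nx)/(N sin x) -> 1
        af = 1.0
    else:
        af = abs(math.sin(N * psi / 2) / (N * math.sin(psi / 2)))
    return 20 * math.log10(af)

element_gain_db = 4.97                                  # measured single-port gain
array_gain_db = element_gain_db + 10 * math.log10(4)    # ideal lossless 1x4 estimate
print(f"estimated array gain = {array_gain_db:.2f} dB")       # close to the 11.37 dB simulated
print(f"broadside AF = {array_factor_db(4, 0.55, 90.0):.1f} dB")
```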
The results obtained for the proposed antenna element are compared with the
literature, as shown in Table 4.

4 Antenna Array Geometry for Massive MIMO Base Station

The antenna array element spacing decides the performance of MIMO and
massive MIMO for base station applications. Transmission and reception of
independent signals are analyzed in a rich scattering environment. In a linear array
geometry, radiating elements are placed along one axis; in a planar array geometry,
elements are placed along both axes. The performance of the linear array is measured
using software such as HFSS and SystemVue in terms of maximum directivity
(D0), half-power beamwidth (HPBW), and side lobe level (SLL). The HPBW (θh) and D0
of a linear array are given by Eqs. (4) and (5), respectively.
    θh = 2 [π/2 − cos⁻¹(1.391λ / (π N d))]        (4)

    D0 = 2N (d / λ)                               (5)

Table 2 Analysis of mutual coupling at different spacings between array elements

Spacing d     S13 (dB)   S15 (dB)   S17 (dB)
0.50λ         −19.0      −36.2      −53.3
0.55λ         −22.3      −42        −63.8
0.60λ         −25        −50.4      −75

(a) Top View of designed antenna array (b) Bottom view of the designed antenna array

Fig. 7 Fabricated hexagonal microstrip antenna array of 1 × 4 size

(a) Return Loss (S11/S22) (b) Mutual coupling (S21/S12)

Fig. 8 Measurement of S parameter

(a) Experimental Setup (b) Frequency vs Gain plot

Fig. 9 Experimental set up for measurement of S parameter of the fabricated antenna

where N is the number of elements, d is the inter-element spacing, and λ is the operating
wavelength. For different element spacings such as 0.5λ, 0.55λ, and 0.6λ, the
performance can be analyzed in terms of 2D and 3D radiation patterns. Hence, linear
and planar array geometries are validated for the design of the massive MIMO
base station.
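Equations (4) and (5) can be evaluated for the three spacings considered above; a minimal sketch for the N = 4 array (values follow directly from the equations, not from measurement):

```python
import math

def hpbw_broadside(N, d_over_lambda):
    """Eq. (4): half-power beamwidth (radians) of a broadside uniform linear array."""
    return 2 * (math.pi / 2 - math.acos(1.391 / (math.pi * N * d_over_lambda)))

def max_directivity(N, d_over_lambda):
    """Eq. (5): maximum directivity of a uniform linear array (dimensionless)."""
    return 2 * N * d_over_lambda

for d in (0.50, 0.55, 0.60):
    th = math.degrees(hpbw_broadside(4, d))
    d0 = max_directivity(4, d)
    print(f"d = {d:.2f} lambda: HPBW = {th:.1f} deg, "
          f"D0 = {d0:.2f} ({10 * math.log10(d0):.2f} dBi)")
```

Wider spacing narrows the beam and raises D0 (4.0 at 0.5λ versus 4.8 at 0.6λ), mirroring the gain/size trade-off reported in Sect. 3.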

Table 3 Comparison of designed and fabricated antenna parameters

                      Feed point 1              Feed point 2
Parameter             Simulated   Measured      Simulated   Measured
Frequency (GHz)       3.70        3.69          3.70        3.69
S11 (dB)              −27.60      −29.13        –           –
S22 (dB)              –           –             −27.5       −29.0
S12 or S21 (dB)       −36.0       −42.57        −37.0       −42.57
Phase (degree)        90.2        −91.37        90.2        −91.28
VSWR                  1.09        1.18          1.10        1.18
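The S11/VSWR pairs above are linked by the standard return-loss relation (a general formula, not from the paper); for instance, the simulated S11 of −27.6 dB corresponds to the listed VSWR of about 1.09:

```python
def s11_to_vswr(s11_db):
    """VSWR from return loss: |Gamma| = 10^(S11/20), VSWR = (1+|Gamma|)/(1-|Gamma|)."""
    gamma = 10 ** (s11_db / 20)
    return (1 + gamma) / (1 - gamma)

print(round(s11_to_vswr(-27.60), 2))   # matches the simulated VSWR of 1.09 in Table 3
```

(The measured VSWR of 1.18 implies a slightly higher reflection than the measured resonance-dip S11, as VSWR is typically quoted across the band rather than at the dip.)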

Table 4 Benchmarking results of the antenna element against the literature

Reference element   Frequency (GHz)   Bandwidth (MHz)   Gain (dB)
[6]                 3.6               230               5.4/port (2 × 2 antenna array)
[7]                 2.45              171.9             6.17/port (no antenna array)
[10]                5.8               200               13/port (1 × 4 antenna array)
[11]                2.596             194               10/port (1 × 4 antenna array)
Proposed element    3.696             160               4.97/port (1 × 4 antenna array)

Recently, concurrent multiband systems have become very popular [18–21]. The
proposed antenna prototype can be extended in this direction; this approach would
reduce the dimensions of the prototype and support multiple bands of operation
simultaneously.

5 Conclusions

This paper proposes a massive MIMO antenna for mobile base station applications
designed at 3.7 GHz. The measured antenna S parameters, such as S11 and
S21, are found to be close to the simulated ones. The 1 × 4 antenna array has a simulated
gain of 11.37 dB and an impedance bandwidth of 160 MHz at a spacing of 0.55λ. At 0.60λ
spacing, isolation is improved but at the cost of increased antenna array size, and
at 0.50λ spacing, isolation is less than 20 dB. For the proposed design, 0.55λ spacing
is therefore selected, providing higher isolation than 0.5λ without the size penalty of
0.6λ. The antenna array is designed with circular polarization by feeding two ports
with equal amplitude and quadrature phase to achieve an axial ratio of less than 3 dB.

References

1. Shaikh, A., Kaur, M.J.: Comprehensive survey of massive MIMO for 5G communications. In:
2019 Advances in Science and Engineering Technology International Conferences (ASET),
Dubai, United Arab Emirates, pp. 1–5 (2019). https://doi.org/10.1109/ICASET.2019.8714426
2. Artiga, X., Devillers, B., Perruisseau-Carrier, J.: Mutual coupling effects in multi-user massive
MIMO base stations. In: Proceedings of the 2012 IEEE International Symposium on Antennas
and Propagation, Chicago, IL, pp. 1–2 (2012). https://doi.org/10.1109/APS.2012.6349354
3. Gampala, G., Reddy, C.J.: Massive MIMO—beyond 4G and a basis for 5G. In: 2018 Inter-
national Applied Computational Electromagnetics Society Symposium (ACES), Denver, CO,
pp. 1–2 (2018). https://doi.org/10.23919/ROPACES.2018.8364192
4. Manteuffel, D.: Compact multi-port multi element antenna for Massive MIMO. In: 2016 IEEE
International Symposium on Antennas and Propagation (APSURSI), Fajardo, pp. 11–12 (2016).
https://doi.org/10.1109/APS.2016.7695714
5. Li, Y., Zou, H., Peng, M., Wang, M., Yang, G.: Hybrid 12-antenna array for quad-band 5G/Sub-
6GHz MIMO in micro wireless access points. In: 2018 International Conference on Microwave
and Millimeter Wave Technology (ICMMT), Chengdu, pp. 1–3 (2018). https://doi.org/10.1109/
ICMMT.2018.8563780
6. Al-Tarifi, M.A., Faouri, Y.S., Sahrawi, M.S.: A printed 16 ports massive MIMO antenna system
with directive port beams. In: 2016 IEEE 5th Asia-Pacific Conference on Antennas and Prop-
agation (APCAP), Kaohsiung, pp. 125–126 (2016). https://doi.org/10.1109/APCAP.2016.7843130
7. Bhadade, R., Mahajan, S.: High gain circularly polarized pentagonal microstrip for massive
MIMO base station. AEM 8(3), 83–91 (2019)
8. Vieira, J., et al.: A flexible 100-antenna testbed for Massive MIMO. In: 2014 IEEE Globecom
Workshops (GC Wkshps), Austin, TX, pp. 287–293 (2014). https://doi.org/10.1109/GLOCOMW.2014.7063446
9. Li, Y., Sim, C., Luo, Y., Yang, G.: 12-Port 5G massive MIMO antenna array in sub-6GHz
mobile handset for LTE bands 42/43/46 applications. IEEE Access 6, 344–354 (2018). https://
doi.org/10.1109/ACCESS.2017.2763161
10. Xingdong, P., Wei, H., Tianyang, Y., Linsheng, L.: Design and implementation of an active
multi-beam antenna system with 64 RF channels and 256 antenna elements for massive MIMO
application in 5G wireless communications. China Commun. 11(11), 16–23 (2014). https://
doi.org/10.1109/CC.2014.7004520
11. Kim, Y., et al.: Full dimension mimo (FD-MIMO): the next evolution of MIMO in LTE systems.
IEEE Wirel. Commun. 21(2), 26–33 (2014). https://doi.org/10.1109/MWC.2014.6812288
12. Payami, S., Tufvesson, F.: Channel measurements and analysis for very large array systems at
2.6 GHz. In: 2012 6th European Conference on Antennas and Propagation (EUCAP), Prague,
pp. 433–437 (2012). https://doi.org/10.1109/EuCAP.2012.6206345
13. Yuan, H., Wang, C., Li, Y., Liu, N., Cui, G.: The design of array antennas used for Massive
MIMO system in the fifth generation mobile communication. In: 2016 11th International
Symposium on Antennas, Propagation and EM Theory (ISAPE), Guilin, pp. 75–78 (2016).
https://doi.org/10.1109/ISAPE.2016.7833881
14. Geraci, G., Garcia-Rodriguez, A., Galati Giordano, L., López-Pérez, D., Björnson, E.: Under-
standing UAV cellular communications: from existing networks to massive MIMO, IEEE
Access 6, 67853–67865 (2018). https://doi.org/10.1109/ACCESS.2018.2876700
15. Huang, H., Yang, J., Huang, H., Song, Y., Gui, G.: Deep learning for super-resolution channel
estimation and DOA estimation based massive MIMO system. IEEE Trans. Vehicular Technol.
67(9), pp. 8549–8560 (2018). https://doi.org/10.1109/TVT.2018.2851783
16. Larsson, E.G., Edfors, O., Tufvesson, F., Marzetta, T.L.: Massive MIMO for next genera-
tion wireless systems. IEEE Commun. Mag. 52(2), 186–195 (2014). https://doi.org/10.1109/
MCOM.2014.6736761

17. Lu, L., Li, G.Y., Swindlehurst, A.L., Ashikhmin, A., Zhang, R.: An overview of massive MIMO:
benefits and challenges. IEEE J. Select. Top. Signal Process. 8(5), 742–758 (2014). https://doi.
org/10.1109/JSTSP.2014.2317671
18. Iyer, B., Pathak, N.P., Ghosh, D.: Dual-input dual-output RF sensor for indoor human occu-
pancy and position monitoring. IEEE Sens. J. 15(7), 3959–3966 (2015). https://doi.org/10.
1109/JSEN.2015.2404437
19. Iyer, B., Pathak, N.P., Ghosh, D.: Concurrent dualband patch antenna array for non-invasive
human vital sign detection application. In: 2014 IEEE Asia-Pacific Conference on Applied Elec-
tromagnetics (APACE), Johor Bahru, pp. 150–153 (2014). https://doi.org/10.1109/APACE.
2014.7043765
20. Iyer, B., Pathak, N.P., Ghosh, D.: Reconfigurable multiband concurrent RF system for non-
invasive human vital sign detection. In: 2014 IEEE Region 10 Humanitarian Technology
Conference (R10 HTC), Chennai, pp. 111–116 (2014). https://doi.org/10.1109/R10-HTC.
2014.7026309
21. Rathod, B., Iyer, B.: Concurrent triband filtenna design for WLAN and WiMAX applications.
In: Hitendra Sarma, T., Sankar, V., Shaik, R. (eds.) Emerging Trends in Electrical, Commu-
nications, and Information Technologies. Lecture Notes in Electrical Engineering, vol. 569,
pp. 775–784. https://doi.org/10.1007/978-981-13-8942-9_66
Analysis of Rectangular Microstrip
Array Antenna Fed Through Microstrip
Lines with Change in Width

Tarun Kumar Kanade, Alok Rastogi, Sunil Mishra, and Vijay D. Chaudhari

Abstract This paper presents a detailed investigation of a microstrip array antenna
with step discontinuities at its feed lines. In the proposed configuration,
antenna arrays at 2.45 GHz are designed, simulated, and fabricated to demonstrate
the concept of step discontinuities in the feed lines. A four-element rectangular
patch array is fully characterized, and its performance is critically assessed for no-step,
single-step, and double-step microstrip feed lines. The return loss S11 [dB] is
better for microstrip array antennas with double-step feed lines than for array antennas
with no-step and single-step feed lines. Appropriate impedance matching and higher
isolation between the patches and feed lines were obtained using step discontinuities at
the feed lines. FR4 substrates were used to design, simulate, and fabricate the microstrip
array antennas. The simulated S11 [dB] values for no-step, single-step, and double-step
feed lines for the rectangular microstrip array antennas are −8.78 dB,
−16.48 dB, and −17.15 dB, respectively. Prototypes of these antennas are then fabricated
and measured to validate the analysis and design experimentally. The simulated
and measured results agree with each other.

Keywords Rectangular patch · Array · 2.45 GHz · Microstrip feed lines ·
Dual-polarized antenna · Narrowband antenna

T. K. Kanade (B)
Assistant Professor, Department of Science, The Bhopal School of Social Science, Bhopal, MP,
India
A. Rastogi · S. Mishra
Professor, Department of Physics & Electronics, Institute for Excellence in Higher Education,
Bhopal, MP, India
e-mail: akrastogi_bpl@yahoo.co
V. D. Chaudhari
Assistant Professor, E & TC Engineering Department, G.F.’s Godavari College of Engineering,
Jalgaon, MS, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 487
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_46

1 Introduction

Printed antennas are promising candidates for microwave and millimeter-wave
communications, where the dimensions of the antenna should be kept to a
minimum. In the twenty-first century, planar antennas have found applications
in cellular communication systems, digital communication systems, wireless LANs,
and personal communication systems. In modern wireless devices, microstrip
patch antennas have been progressively in demand because of their smart performance,
low profile, light weight, ease of construction, and conformability in microwave and
millimeter-wave circuits. The microstrip antenna has some limitations, such as narrow
bandwidth and somewhat lower gain. A microstrip antenna consists of three parts: a
metal layer or patch, a dielectric substrate, and a ground metal layer; the substrate is
sandwiched between the patch and the ground layer. Both the single patch
antenna and the microstrip patch antenna array have their benefits in respective
domains. A microstrip array antenna consists of microstrip patch antenna elements
interconnected and fed using microstrip transmission lines. Array configurations are
extensively used in microwave and millimeter-wave communication systems where a
narrow beam is required. The commonly used feeding techniques in microstrip array
antennas are parallel and series feeding. In a parallel feed network, all the patches are
coupled by single transmission lines, while in a series feed network, the radiating
elements are organized in a line and connected to a planar transmission line. The feed
networks are to be designed carefully to curtail any adverse effects on array perfor-
mance. As the feed line itself radiates, it must be properly optimized to obtain the
appropriate return loss, gain, and directivity [1–4]. Section 2 describes the antenna
array design and fabrication, followed by Sect. 3, which deals with simulation and
measurement results. Conclusions are drawn in the last Sect. 4.

2 Antenna Array Design

The design and fabrication of microstrip patch antennas require empirical
formulas and parameters such as the dielectric constant (εr) and height of the substrate
material, along with the required frequency (fr). The microstrip patch antenna's width
and length are determined by empirical formulae [3–7]. A single-element microstrip
patch antenna designed for a fixed frequency and gain has a relatively wide radiation
pattern with low directivity or gain. It is essential to design antennas with
specific directive features or large gain to meet long-distance communication needs in
various applications. The directivity and gain may be increased by enlarging the
antenna's electrical size, but a size increase alone does not fulfill the desired require-
ments. Another technique to increase the antenna's effective dimensions without
increasing the individual patch elements' size is to form an assembly of radiating patch
elements in an electrical and structural configuration. Thus, an array antenna is formed
by combining more than one patch element [8–10].
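The empirical width and length formulae cited from [3–7] are the standard transmission-line-model design equations; a sketch for 2.45 GHz on FR4 follows (the permittivity 4.4 and substrate height 1.6 mm are assumed typical FR4 values, not stated in the paper):

```python
import math

C = 3e8  # free-space speed of light (m/s)

def patch_dimensions(fr, eps_r, h):
    """Transmission-line-model rectangular patch design:
    width, effective permittivity, fringing extension, resonant length."""
    w = C / (2 * fr) * math.sqrt(2 / (eps_r + 1))
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 / math.sqrt(1 + 12 * h / w)
    dl = 0.412 * h * ((eps_eff + 0.3) * (w / h + 0.264)) / (
        (eps_eff - 0.258) * (w / h + 0.8))
    l = C / (2 * fr * math.sqrt(eps_eff)) - 2 * dl   # physical length minus fringing
    return w, l

w, l = patch_dimensions(2.45e9, 4.4, 1.6e-3)  # eps_r and h are assumed FR4 values
print(f"W = {w * 1e3:.1f} mm, L = {l * 1e3:.1f} mm")
```

This gives a patch of roughly 37 mm × 29 mm, the usual ballpark for a 2.45 GHz FR4 patch.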

In microwave and millimeter-wave circuit design, a straight, uninterrupted,
or continuous transmission structure is of little use; in any case, junctions or
discontinuities are a must. All practical microwave and millimeter-wave propagation
structures inherently contain discontinuities. The commonly occurring disconti-
nuities in transmission lines are bends, open circuits, changes in width, and tran-
sitions in planar transmission lines. Discontinuities also play a significant role in
the feeding structure of a single-patch microstrip antenna or an array of microstrip
antennas. At discontinuities or junctions, the electric and magnetic fields are altered:
the altered electric field distribution is responsible for the change in capacitance, and
the altered magnetic field distribution is responsible for the change in inductance.
The analysis of microstrip discontinuities for the estimation of inductance and
capacitance is carried out by quasi-static analysis, and scattering parameters are
studied through full-wave analysis [11, 12].
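To illustrate why a step change in feed-line width matters, the standard Hammerstad closed-form microstrip relations (general microstrip theory, not taken from this paper; FR4 εr = 4.4 and h = 1.6 mm, plus the 3 mm and 1 mm widths, are assumed illustrative values) show how the width sets the characteristic impedance on either side of the step:

```python
import math

def microstrip_z0(w, h, eps_r):
    """Hammerstad closed-form characteristic impedance of a microstrip line."""
    u = w / h
    if u <= 1:
        eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 * (
            1 / math.sqrt(1 + 12 / u) + 0.04 * (1 - u) ** 2)
        return 60 / math.sqrt(eps_eff) * math.log(8 / u + u / 4)
    eps_eff = (eps_r + 1) / 2 + (eps_r - 1) / 2 / math.sqrt(1 + 12 / u)
    return 120 * math.pi / (
        math.sqrt(eps_eff) * (u + 1.393 + 0.667 * math.log(u + 1.444)))

# A width step from 3 mm to 1 mm on 1.6 mm FR4 (assumed values)
z_wide = microstrip_z0(3e-3, 1.6e-3, 4.4)
z_narrow = microstrip_z0(1e-3, 1.6e-3, 4.4)
print(f"{z_wide:.1f} ohm -> {z_narrow:.1f} ohm")
```

Narrowing the line raises its impedance (here roughly 50 ohm to near 90 ohm), which is the matching mechanism the stepped feed lines exploit.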
A microstrip array antenna with step discontinuities shows better performance
in terms of return loss, gain, or directivity than a microstrip antenna with uninter-
rupted feed lines. In this paper, microstrip array antennas are investigated with a
straight feed line, a single-step feed line, and double-step feed lines. In all three cases,
the microstrip array antennas are designed, simulated, and fabricated to study and
compare the performance based on the feed lines [13, 14]. Using the design formulations,
the structures of the microstrip array antennas with the various feed lines are shown
in Figs. 1, 2, and 3.

Fig. 1 Structure of rectangular microstrip patch array with no-step feed line

Fig. 2 Structure of rectangular microstrip patch array with a single-step feed line

Fig. 3 Structure of rectangular microstrip patch array with a double-step feed line

3 Simulation and Measurement Results

The microstrip patch array antennas are designed and simulated using FEM-based
HFSS software and fabricated on an FR4 substrate. The fabricated microstrip patch
array antennas with three different feed lines are shown in Figs. 4, 5, and 6. The
resulting parameters, such as return loss, VSWR, and radiation patterns, were analyzed.
Figures 7, 8, and 9 present the simulated reflection coefficient versus frequency. The

Fig. 4 Fabricated PCB of the rectangular microstrip patch array—no-step feed line

Fig. 5 Fabricated PCB of the rectangular microstrip patch array—single-step feed line

graphical analysis shows that S11 [dB] is enhanced for the microstrip array antenna
with a double-step feed line compared to the array antennas with single-step and no-
step feed lines. For the array of microstrip patch antennas, the simulated S11 [dB] is
−8.79 dB at 2.45 GHz for the no-step feed line, −16.48 dB at 2.55 GHz for the single-step
feed line, and −17.15 dB at 2.45 GHz for the double-step feed line.
The measured reflection coefficients versus frequency for the three
microstrip patch arrays with no-step, single-step, and double-step feed lines are
shown in Figs. 10, 11, and 12, respectively. The measured S11 [dB] for a microstrip

Fig. 6 Fabricated PCB of the rectangular microstrip patch array—double-step feed line

Fig. 7 S11 [dB] of the rectangular microstrip patch array—no-step feed line

patch array with the no-step feed line is −12.756 dB at 2.51 GHz, with the single-step feed
line it is −15.199 dB at 2.52 GHz, and with the double-step feed line it is −15.207 dB at
2.48 GHz. The microstrip patch array with a double-step feed line is
resonant at 2.48 GHz, near the required frequency. The simulated and
measured results nearly agree with each other; the variance between them is within
about 2.0 dB. A slight deviation is also observed between the measured and simulated
operating frequencies due to inaccuracies in the fabrication process and measurement
errors. Table 1 compares our implemented array patch with the earlier implemented
single patch [14].

Fig. 8 S11 [dB] of the rectangular microstrip patch array—single-step feed line

Fig. 9 S11 [dB] of the rectangular microstrip patch array—double-step feed line

Recently, concurrent multiband systems have become very popular [15–20]. The
proposed antenna prototype can be extended in this direction; this approach would
reduce the prototype's dimensions and support multiple operation bands
simultaneously with significantly lower power requirements.

Fig. 10 Measured S11 [dB] of the rectangular microstrip patch array—no-step feed line

Fig. 11 Measured S11 [dB] of the rectangular microstrip patch array—single-step feed line

Fig. 12 Measured S11 [dB] of the rectangular microstrip patch array—double-step feed line

Table 1 Comparison of single-patch and array-patch simulated and experimental S11 results (dB) at
2.45 GHz

               No-step feed            Single-step feed        Double-step feed
               Single     Array        Single     Array        Single     Array
Simulation     −11.91     −8.79        −14.32     −16.47       −15.91     −17.15
Experimental   −11.77     −12.76       −10.44     −15.20       −19.96     −15.21

4 Conclusions

In this paper, four-element microstrip patch array antennas with three different feed
lines have been presented for wireless devices operating at 2.45 GHz. A new strategy
was proposed and analyzed through simulations, fabrication, and measurements to inves-
tigate the role of step discontinuities in a feed line. The simulated and measured
results for the microstrip patch array antennas reveal that the array antenna with
double-step feed lines performs better than the single-step and no-step
feed line array antennas.

References

1. Lamminen, A., Säily, J., Ala-Laurinaho, J., de Cos, J., Ermolov, V.: Patch antenna and antenna
array on multilayer high-frequency PCB for D-band. IEEE Open J. Ant. Propagat. 1, 396–403
(2020)
2. Wang, L., En, Y.-F.: A wideband circularly polarized microstrip antenna with multiple modes.
IEEE Open J. Ant. Propagat. 1, 413–418 (2020)
3. Balanis, C.A.: Antenna Theory: Analysis and Design, 3rd edn. Wiley, New York (1997)
4. Waterhouse, R.B.: Microstrip Patch Antennas: A Designer’s Guide, 1st edn. Springer Science
+ Business Media, New York (2003)
5. Abohmra, A., Abbas, H., Al-Hasan, M., Mabrouk, I.B., Alomainy, A., Imran, M.A., Abbasi,
Q.H.: Terahertz antenna array based on a hybrid perovskite structure. IEEE Open J. Ant.
Propagat. 1, 464–471 (2020)
6. Chiu, C.-Y., Lau, B.K., Murch, R.: Bandwidth enhancement technique for broadside tri-modal
patch antenna. IEEE Open J. Ant. Propagat. 1, 524–533 (2020)
7. Gupta, C., Gopinath, A.: Equivalent circuit capacitance of microstrip step change in width.
IEEE Trans. Microwave Theory Tech. MTT-25, 819–822 (1977)
8. Easter, B.: The equivalent circuit of some microstrip discontinuities. IEEE Trans. Microwave
Theory Tech. MTT-23, 655–660 (1975)
9. Horton, R.: Equivalent representation of an abrupt impedance step in microstrip line. IEEE
Trans. Microwave Theory Tech. MTT-21, 562–564 (1973)
10. Thompson, F., Gopinath, A.: Calculation of microstrip discontinuity inductances. IEEE Trans.
Microwave Theory Tech. MTT-23, 648–655 (1975)
11. Krage, M.K., Haddad, G.I.: Frequency dependent characteristics of microstrip transmission
lines. IEEE Trans. Microwave Theory Tech. MTT-20, 678–688 (1975)
12. Raicu, D.: Universal taper for compensation of step discontinuities in microstrip lines. IEEE
Microwave Guided Lett. 1, 249–251 (1991)
13. Koster, N.H.L., Jansen, R.H.: The microstrip step discontinuity: a revised description. IEEE
Trans. Microwave Theory Tech. MTT-34, 213–223 (1986)
14. Kanade, T.K., Rastogi, A.K., Mishra, S.: Design simulation and experimental investigations of
microstrip patch antennas and its feed line. Int. J. Eng. Res. Technol. 4, 25–28 (2015)
15. Iyer, B., Pathak, N.P., Ghosh, D.: Dual-input dual-output RF sensor for indoor human occu-
pancy and position monitoring. IEEE Sens. J. 15(7), 3959–3966 (2015). https://doi.org/10.
1109/JSEN.2015.2404437
16. Iyer, B., Pathak, N.P., Ghosh, D.: Concurrent dualband patch antenna array for non-invasive
human vital sign detection application. In: 2014 IEEE Asia-Pacific Conference on Applied
Electromagnetics (APACE), Johor Bahru, 2014, pp. 150–153. https://doi.org/10.1109/APACE.
2014.7043765
17. Iyer, B., Pathak, N.P., Ghosh, D.: Reconfigurable multiband concurrent R.F. system for non-
invasive human vital sign detection. In: 2014 IEEE Region 10 Humanitarian Technology
Conference (R10 HTC), Chennai, 2014, pp. 111–116. https://doi.org/10.1109/R10-HTC.2014.
7026309
18. Rathod, B., Iyer, B.: Concurrent triband filtenna design for WLAN and WiMAX applications.
In: Hitendra Sarma, T., Sankar, V., Shaik, R. (eds.) Emerging Trends in Electrical, Commu-
nications, and Information Technologies. Lecture Notes in Electrical Engineering, vol. 569,
pp. 775–784 (2020). https://doi.org/10.1007/978-981-13-8942-9_66
19. Iyer, B.: Characterisation of concurrent multiband RF transceiver for WLAN applications. Adv.
Intell. Syst. Res. 834–846 (2016). https://doi.org/10.2991/iccasp-16.2017.112
20. Iyer, B., Garg, M., Pathak, N., Ghosh, D.: Contactless detection and analysis of human vital
signs using concurrent dual-band R.F. system. Procedia Eng. 64, 185–194 (2013)
Parametric Study of Electromagnetic
Coupled MSA Array for PAN Devices
with RF Survey

Shilpa Nandedkar, Shankar Nawale, and Anirudha Kulkarni

Abstract This paper presents a parametric study and design of an electromagnetically coupled microstrip patch array. The proposed design operates at 2.4 GHz. The array consists of four elements each at the top and base layers. The measured bandwidth of the eight-element planar array is 279 MHz, and the return loss is 30 dB. The antenna is simulated in CST Microwave Studio and fabricated using FR-4 substrate material. The fabricated antenna shows a bandwidth increase of 155 MHz and a return-loss change of 9 dB.

Keywords MSA array · Electromagnetic coupling · PAN Devices

1 Introduction

The array antenna has become an essential part of today's wireless communication world, in which connectivity and bandwidth are significant factors. Microstrip antenna (MSA) arrays are very popular, with gains up to 30 dB. They are available in two forms, linear and planar; though both have their merits and demerits, the planar array antenna occupies comparatively less space. Feeding techniques play an essential role in the radiation characteristics of the array antenna. The literature shows that the aperture-coupled microstrip array antenna has undergone rapid development in MSA technology [1], as it gives high gain and low sidelobe levels [3].
The only difference between aperture-coupled and electromagnetically coupled array antennas is that in aperture coupling the top and base layers are coupled through a slot in the base layer, where the feeding network lies, whereas in electromagnetic

S. Nandedkar (B)
Maharashtra Institute of Technology, Aurangabad, MS 431005, India
S. Nawale
N B Navale Sinhgad College of Engineering, Solapur, MS 413255, India
A. Kulkarni
Team Leader Cyronics Instruments Pvt Ltd, Pune, MS 411009, India
e-mail: anirudha@cyronics.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 497
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_47

coupling the top and base layers are coupled without a slot. For bandwidth enhancement,
the broadband proximity fed gap coupled RMSAs can be used [4]. Series fed network
for microstrip array antenna minimizes feedline length and radiation from feedline
[5, 6]. A corporate feed network is preferred when space is limited. It also has advantages such as equal power delivery to all elements, larger bandwidth, and a modular nature [7–11]. The microstrip-line feed is easy to fabricate, and if the inset position is selected correctly, impedance matching becomes easy [12].
Wireless PAN and its applications are in increasing demand because of advantages such as a high data rate over a small coverage area of up to 10 m. Devices operating with Wi-Fi include desktops, laptops, smartphones, printers, smartwatches, personal digital assistants, etc. These devices may be interconnected through a Personal Area Network (PAN), Bluetooth, or Ethernet [13]. To achieve a high data rate over a small area, antennas with different structures are used, such as ultra-wideband and multiple-input multiple-output (MIMO) antennas, and planar rather than linear configurations [14]. The MIMO channel has been characterized for handheld devices in PANs [15–17]. As planar configuration is one of the requirements of PAN, planar array antennas are proposed in this work. The array antenna is simulated using CST Microwave Studio, fabricated on FR4 substrate, and its operating parameters are measured with a network analyzer. Simulated and measured results are compared for the analysis of the antenna.
The remainder of this article is organized as follows. Section 2 gives the single-patch design procedure, the array antenna with the base layer (with microstrip line feed and coaxial feed), the top layer, and their parametric studies. The fabricated antenna and measured results, along with testing for PAN applications, are presented in Sect. 3. Finally, a comparison of measured and simulated results and the conclusions of the paper are given in Sect. 4.

2 Antenna Design and Parametric Study

To realize higher bandwidth and gain, the electromagnetically coupled planar RMSA array antenna is designed in three stages: single patch, base-layer antenna, and top-layer antenna. These are explained in detail as follows.

2.1 Design of Single Patch Antenna

A single patch antenna is designed for a resonant frequency of 2.4 GHz on FR-4 substrate with a dielectric constant of 4.3 and a thickness of 1.6 mm, as shown in Fig. 1, which also gives the dimensions of the patch and microstrip line. The calculated patch dimensions are 38.0 mm × 29 mm. A microstrip line of length 23.35 mm with an inset feed of depth 8.85 mm is used. The gap between the patch and the inset feed (Gpf) is taken as 1 mm. The calculated width of the

Fig. 1 Single patch antenna with inset feed

microstrip line is 3.14 mm. The ground plane and patch are of copper, with a thickness of 0.035 mm.
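The quoted patch dimensions can be reproduced approximately from the standard transmission-line-model design equations for a rectangular patch (the paper does not list these formulas; this is a sketch assuming εr = 4.3, h = 1.6 mm, and f = 2.4 GHz as stated above):

```python
import math

C = 3.0e8  # speed of light, m/s

def patch_dimensions(f_hz: float, eps_r: float, h_m: float):
    """Rectangular patch width and length from the transmission-line model."""
    # patch width for efficient radiation
    w = C / (2.0 * f_hz) * math.sqrt(2.0 / (eps_r + 1.0))
    # effective dielectric constant for this width
    eps_eff = (eps_r + 1.0) / 2.0 + (eps_r - 1.0) / 2.0 / math.sqrt(1.0 + 12.0 * h_m / w)
    # fringing-field length extension on each edge
    dl = 0.412 * h_m * ((eps_eff + 0.3) * (w / h_m + 0.264)) / \
         ((eps_eff - 0.258) * (w / h_m + 0.8))
    l = C / (2.0 * f_hz * math.sqrt(eps_eff)) - 2.0 * dl
    return w * 1e3, l * 1e3  # millimetres

w_mm, l_mm = patch_dimensions(2.4e9, 4.3, 1.6e-3)
print(round(w_mm, 1), round(l_mm, 1))  # close to the 38.0 mm x 29 mm quoted above
```

The result (roughly 38.4 mm × 29.8 mm) agrees with the 38.0 mm × 29 mm quoted above; the small differences come from rounding and the particular length-extension model used.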

2.2 Design of Baselayer of an Array

The array antenna's base layer is designed on FR-4 substrate material, which has a dielectric constant of 4.3 and a thickness of 1.6 mm. All patches are placed in a planar configuration to reduce the antenna's size and make it more compact. Initially, a two-by-two (2 × 2) array with microstrip line feed is designed. To achieve impedance matching between the line feed and each individual patch, a quarter-wave transformer is used [16]. The input impedance (Zi) is 50 Ω. The line impedance Zc is calculated using the quarter-wave transmission-line impedance equation:

Zc = √(Z1 × Z2) (1)

where Z1 = 50 Ω and Z2 = 100 Ω. Therefore, Zc = √(50 × 100) = 70.71 Ω.
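Equation (1) can be checked numerically; a minimal sketch:

```python
import math

def quarter_wave_impedance(z_in: float, z_load: float) -> float:
    """Characteristic impedance of a quarter-wave transformer that matches
    a load impedance z_load to an input impedance z_in: Zc = sqrt(z_in * z_load)."""
    return math.sqrt(z_in * z_load)

print(round(quarter_wave_impedance(50.0, 100.0), 2))  # 70.71
```

A 70.71 Ω quarter-wave section thus matches the 50 Ω input line to the 100 Ω impedance seen at the feed junction.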
Figure 2 shows the base layer of the array antenna with a feeding network.
Figure 2a shows the microstrip line, and Fig. 2b shows the array antenna with a
coaxial feed network matched with the quarter-wave transformer.

2.3 Design of Top Layer Array Antenna

The array antenna’s top layer is designed with a similar substrate material and has a
thickness of 0.11 mm. This array is without a feed line and ground plane. Spacing
between two layers of the array is varied to get optimum bandwidth. Figure 3 shows
the top layer of the array antenna.

Fig. 2 a Array with a microstrip line feed. b Array with coaxial feed

Fig. 3 The top layer array antenna

2.4 Parametric Study

The array antenna's base layer is simulated with microstrip line feed, yielding a main lobe magnitude of 4 dBi, a main lobe direction of 11°, a half-power (angular) beamwidth of 58.7°, and a sidelobe level of −2.6 dB. A base-layer antenna with coaxial feed is also simulated, as is the electromagnetically coupled array (with both base and top layers). These results are compared and presented in Sect. 4. Figures 4 and 5 show the far-field directivity and return-loss curves for the electromagnetically coupled array antenna.

3 Results and Discussions

3.1 Array with a Microstrip Line Feed

The fabricated antenna (base layer with microstrip line feed) is shown in Fig. 6. Parameters are measured with a network analyzer: return loss at the resonant

Fig. 4 Directivity of the array antenna

Fig. 5 Return loss plot

frequency, bandwidth, VSWR, and impedance are measured for the base antenna. The response shows two peaks: one at 5.5 GHz with a return loss of −21.19 dB and another at 5.09 GHz with a return loss of −21.39 dB. Bandwidths of 124.9 and 203 MHz are measured around the resonant frequency of 5.09 GHz, as shown in Fig. 7. Similarly, the VSWR is 1.20 at 5.5 GHz and 1.19 at 5.09 GHz, and the measured impedance is 47.729 Ω.

3.2 An Electromagnetically Coupled Array Antenna

The radiating microstrip patch elements (four elements) are etched on the antenna's top layer, and the base layer carries a coaxial feed line. The thicknesses of the two substrates are chosen independently to optimize the distinct electrical functions of radiation and circuitry [4]. The electromagnetically coupled array antenna, with a top layer without feed and a base layer with coaxial feed, is shown in Fig. 8.

Fig. 6 Array with the microstrip line feed

Fig. 7 Measurement of bandwidth

Measurement of this array antenna's various parameters is done by keeping a minimum distance between the top and base layers. Figure 9 shows the setup and results of the bandwidth measurement for the base antenna. The electromagnetically coupled array antenna is used as a transmitter; a standard printed dipole antenna and a single rectangular microstrip patch antenna are used as receiving antennas for PAN applications.

Fig. 8 Electromagnetic coupled array antenna: a Base antenna b top antenna

Fig. 9 Bandwidth measurement of base antenna

Measurement of various parameters

Master board impedance testing: The main role of the Master board is to act as a transmitter. It consists of an electromagnetically coupled MSA array antenna with both layers. An impedance test is performed for this board.

Slave board impedance testing: The main role of the Slave board is to act as a receiver. It consists of the receiving antenna and the necessary components. Impedance and SWR testing are performed for this board.
Figure 10 shows impedance and SWR plots for the same.

Fig. 10 Impedance and SWR plot for the slave device

Table 1 Impedance testing report for slave


Impedance Test S11 in dB Impedance in ohm Resonant BW in MHz Observations
With 2.4 GHz −16 77 190 Resonance is
stable
With 865 MHz −11 52 10 Graph is stable

Table 2 Outdoor RF survey


Tx antenna Tx height Rx antenna Rx height Max range Level of vegetation
3.5 ft 3.5 ft 3.5 ft 3.5 ft 52 m Low

Table 1 compares two different slave devices’ measured results, one with 2.4 GHz
and another with 865 MHz. The parameters compared are return loss, bandwidth, and measured impedance. These slave results are compared against the standard open RF survey test. The Master transmits −90 dB at 1 m, Slave 1 receives −89 dB at 3 m,
and Slave 2 receives −80 dB at 10 m. Table 3 shows a comparison of measured and
simulated results for electromagnetic coupled array antenna with various parameters.
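The received levels in the open RF survey above fall off roughly as free-space path loss at 2.4 GHz. This is a sketch for orientation only; the actual survey includes vegetation, antenna gains, and multipath effects that the free-space model ignores:

```python
import math

def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss, 20*log10(4*pi*d*f / c), in dB."""
    c = 3.0e8  # speed of light, m/s
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / c)

# every tenfold increase in distance adds 20 dB of path loss
print(round(fspl_db(1.0, 2.4e9), 1))   # ~40.0
print(round(fspl_db(10.0, 2.4e9), 1))  # ~60.0
```

The 20 dB-per-decade slope is consistent with the roughly 10 dB drop observed between the 1 m and 10 m measurements once antenna and environment effects are allowed for.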
Recently, concurrent multiband systems have become very popular [18–23]. The proposed antenna prototype can be extended in this direction; this approach would reduce the dimensions of the prototype and support multiple bands of operation simultaneously.

4 Conclusions

A parametric study has been carried out for an electromagnetically coupled array antenna for PAN devices. The proposed antenna is placed at the transmitter side; a 2.4 GHz input is given from the transmitter, and the received signal is observed at the receiver for various devices at distances from 60 cm to 10 m. The proposed antenna is also tested with a vector network analyzer, and the following conclusions are drawn. It gives wider bandwidth as well as reasonable return loss. When the distance between the top and base antennas is less than 1 mm, it provides more bandwidth, with an increase of up to 84%. When the top layer is placed horizontally, it gives wider bandwidth than vertical placement: bandwidth increases from 279 to 440 MHz, i.e., approximately 57.7%, and the return loss changes from −16 to −33 dB. After the master–slave study at two different ranges, the received power is observed, which shows a good bandwidth response for PAN applications, and the range is observed and verified with the standard open RF test survey procedure.
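The 57.7% figure quoted above follows directly from the measured bandwidths; a quick arithmetic check:

```python
def percent_increase(old: float, new: float) -> float:
    """Relative increase of `new` over `old`, as a percentage."""
    return (new - old) / old * 100.0

# bandwidth growth from 279 MHz to 440 MHz
print(round(percent_increase(279.0, 440.0), 1))  # 57.7
```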

References

1. Pozar, D.M.: A review of aperture coupled microstrip antennas: history, operation, develop-
ment, and applications. University of Massachusetts at Amherst
2. Poduval, D., Ali, M.: Wideband aperture coupled patch array antennas high gain, low side lobe
design. Prog. Electromagnet. Res. 160, 71–87 (2017)
3. Amita, A., Ray, K.P.: Proximity fed gap-coupled half E-shaped microstrip antenna array.
Sadhana Acad. Proc. Eng. Sci. 40:75–87 (2015)
4. Wu, K.L., Spenuk, M., Litva, J., Fang, D.G.: Theoretical and experimental study of feed network
effects on the radiation pattern of series-fed microstrip antenna arrays. IEE Proc. H Microw.,
Antennas Propag. 138, 238–242 (1991)
5. Honari, M.M., Abdipour, A., Moradi, G., Mirzavand, R., Mousavi, P.: Design and analysis of
a series-fed aperture-coupled antenna array with wideband and high-efficient characteristics.
IEEE Access 6, 22655–22663 (2018)
6. Sahu, A.K., Das, M.R.: 4×4 rectangular patch array antenna for bore sight application of
conical scan S-band tracking radar. In: 2011 IEEE Indian Antenna Week—Work Advanced
Antenna Technology IAW (2011). https://doi.org/10.1109/IndianAW.2011.6264931
7. Alam, M.M., Sonchoy, M.R., Goni, O.: Design and performance analysis of microstrip array
antenna. Prog. Electromagn. Res. Symp. Proc. 1837–1842 (2019)
8. Nataraj, A.N., Sujatha, M.N.: Analysis and design of microstrip antenna array for S-band applications. In: International Conference on Communication and Signal Processing (ICCSP), pp. 2023–2027 (2016)
9. Gunasekaran, T., Veluthambi, N., Ganeshkumar, P., Kumar, K.R.S.: Design of edge fed
microstrip patch array antenna configurations for WiMAX. In: 2013 IEEE International
Conference on Computer Computational Intelligence Research, IEEE ICCI, pp. 1–4 (2013)
10. Hadzic, H. Verzotti, W., Blazevic, Z., Skiljo, M.: 2.4 GHz microstrip patch antenna array with
suppressed sidelobes. In: 2015 23rd International Conference Software Telecommunication
Comput Networks, SoftCOM, pp. 96–100 (2015)
11. Balanis, C.A. Antenna Theory: Analysis and Design, 3rd edn.
12. Seol, K., Choi, S.: A study on design of antenna for PAN application. In: Proceedings of the 18th
International Zurich Symposium on Electromagnetic Compatibility, EMC, vol. 4, pp. 221–223
(2007)
13. Mallahzadeh, A.R., Es’haghi, S., Alipour, A.: Design of an E-shaped MIMO antenna using IWO algorithm for wireless application at 5.8 GHz. Prog. Electromagn. Res. 90, 187–203 (2009)
14. Aredal, J., Johansson, A.J., Tufvesson, F., Molisch, A.F.: Characterization of MIMO channels
for handheld devices in personal area networks at 5 GHz. Eur. Signal Process. Conf. (2006)
15. Sahoo, R., Vakula, D.: Gain enhancement of conformal wideband antenna with parasitic
elements and low index metamaterial for WiMAX application. AEU 105, 24–35 (2019). https://
doi.org/10.1016/j.aeue.2019.03.014

16. Farserotu, J., Hutter, A., Platbrood, F., Ayadi, J., Gerrits, J., Pollini, A.: UWB transmission and
MIMO antenna systems for nomadic users and mobile PANs. Wirel Pers Commun 22, 297–317
(2002)
17. Sipal, D., Abegaonkar, M.P., Koul, S.K.: UWB MIMO USB dongle antenna for personal area network applications. In: Asia-Pacific Microwave Conference (APMC) (2016)
18. Iyer, B., Pathak, N.P., Ghosh, D.: Dual-input dual-output RF sensor for indoor human occupancy
and position monitoring. IEEE Sens. J. 15(7), 3959–3966 (July 2015). https://doi.org/10.1109/
JSEN.2015.2404437
19. Iyer, B., Pathak, N.P., Ghosh, D.: Concurrent dualband patch antenna array for non-invasive human vital sign detection application. In: 2014 IEEE Asia-Pacific Conference on Applied Electromagnetics (APACE), Johor Bahru, pp. 150–153 (2014). https://doi.org/10.1109/APACE.2014.7043765
20. Iyer, B., Pathak, N.P., Ghosh, D.: Reconfigurable multiband concurrent RF system for non-invasive human vital sign detection. In: 2014 IEEE Region 10 Humanitarian Technology Conference (R10 HTC), Chennai, pp. 111–116 (2014). https://doi.org/10.1109/R10-HTC.2014.7026309
21. Rathod, B., Iyer, B.: Concurrent triband filtenna design for WLAN and WiMAX applications.
In: Hitendra Sarma, T., Sankar, V., Shaik, R. (eds.), Emerging Trends in Electrical, Commu-
nications, and Information Technologies. Lecture Notes in Electrical Engineering, vol. 569,
pp. 775–784. https://doi.org/10.1007/978-981-13-8942-9_66
22. Iyer, B.: Characterisation of concurrent multiband RF transceiver for WLAN applications. Adv.
Intell. Syst. Res. 834–846 (2016). https://doi.org/10.2991/iccasp-16.2017.112
23. Iyer, B., Garg, M., Pathak, N., Ghosh, D.: Contactless detection and analysis of human vital
signs using concurrent dual-band RF system. Proc. Eng. 64 185–194 (2013)
Fractal Tree Microstrip Antenna Using
Aperture Coupled Ground

Sanjay Khobragade, Sanjay Nalbalwar, and Anil Nandgaonkar

Abstract A fractal tree microstrip antenna is proposed with a non-contacting slotted ground, known as aperture coupling. This design results in an expansion of the resonant-frequency bandwidth and the gain. The proposed antenna is designed for the complete UWB band with roughly a 1-GHz extension on both sides (i.e., 2.7–12.5 GHz). The antenna is printed on an epoxy substrate having a relative permittivity of 4.4, the thickness of the structure is 1.6 mm, and its fixed size is 49 × 65.7 mm. The antenna is simulated in HFSS, fabricated in a university lab, and tested using a vector network analyzer.

Keywords Microstrip antenna · Multiband · Array · Fractal · Resonant freq.

1 Introduction

The microstrip antenna (MSA) is used for various applications because of its essential characteristics: light weight, compact size, and ease of fabrication, reproduction, and testing. However, it has the limitations of low gain and small bandwidth. To overcome these disadvantages, the aperture-coupled technique is utilized in this paper. In an aperture-coupled-feed, tree-shaped microstrip antenna, separate dielectric substrates are used for the feed and the patch. The essential difference is that the two substrates are separated by a ground plane, which contains a coupling aperture, or slot, between the feed and the patch. The proposed antenna design is shown in Fig. 1. The arrangement of the layers and the correct choice of slot size and position are fundamental in controlling the antenna impedance. Air gaps between the dielectric substrate layers can change the

S. Khobragade (B) · S. Nalbalwar · A. Nandgaonkar


Dr. Babasaheb Ambedkar Technological University, Lonere 402103, India
S. Nalbalwar
e-mail: slnalbalwar@dbatu.ac.in
A. Nandgaonkar
e-mail: abnandgaonkar@dbatu.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 507
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_48

Fig. 1 Proposed design of MSA using aperture coupled feeding

input impedance values. The absence of abrupt current discontinuities in an aperture-coupled feed (ACF) makes it relatively simple to design the antenna precisely. References [1, 3] focus on aperture coupling, while references [2, 4, 5] cover the general theory of the topic. References [7, 8] are the most recent papers on the aperture-coupled microstrip antenna [1–8].

1.1 Description of Design

The MSA with a tree-shaped structure is shown in Fig. 1, and the details of the proposed design are given in Table 1. The tree-shaped MSA is designed to resonate at multiple frequency bands using this structure. The design involves two

Table 1 Design specification


S. no. Title Specification
1 Patch (iteration 1) 22 × 8 mm
2 Patch (iteration 2) 14 × 6 mm
3 Patch (iteration 3) 11 × 4 mm
4 Patch (iteration 4) 08 × 3 mm
5 Patch (iteration 5) 4.2 × 2 mm
6 Ground (size) 49 × 65.7 mm
7 Substrate 1 and 2 49 × 65.7 mm
8 Substrate height (h) 1.6 mm
9 Sub. die. constant 4.4
10 Loss tangent 0.019
11 Ground addl. patch 12
12 Radius 1 (aperture) 22.1266 mm
13 Radius 2 (aperture) 22.31 mm
14 Patch (aperture) 6.5 × 8 mm

layers of FR4 substrate, which has a dielectric constant of 4.4 and a dielectric loss tangent of 0.019; each substrate layer has a thickness of 1.6 mm.
The design discussed in this paper is considered in three stages. The first uses a simple ground of the given size, 49 × 65.7 mm, with an FR4 substrate of dielectric constant 4.4, loss tangent 0.019, and height 1.6 mm. The second stage consists of a ground with a mirror-image slot whose size is a replica of the main patch; for proper feeding, an additional patch of size 12 mm × 6 mm is inserted. The design is then modified by inserting one more substrate on top of the previous one; both substrates have the same size, as given in Table 1.

2 Antenna Design

As per the specifications provided in Table 1, the antenna is simulated in three stages, all shown in Fig. 2. Figure 2a represents the proposed design with a single substrate and complete ground. The ground size is 49 × 65.7 mm as specified, and the substrate is FR4 with a height of 1.6 mm, carrying the symmetrical fractal tree structure. This tree is iterated up to the fifth stage.
The antenna represented in Fig. 2 is the basic fractal tree shape with a ground having a mirror-image slot, an exact replica of the main patch design. One more patch, of size 12 × 6 mm, is added to the design so that a smooth ground is obtained. The substrate used for the design is of the same size, with a thickness of 1.6 mm. Similarly, the last figure shows the final design with two

Fig. 2 Proposed design of aperture coupling MSA using mirror slotted ground with stages 1, 2,
and 3

substrates, with the same details as provided in Table 1. The design is arranged so that the middle slot, upper substrate, and patch form an aperture-coupled feed structure.

3 Result and Discussion

This section discusses all three design stages. The detailed specifications of the three stages were discussed in the introduction and are provided in Table 1.

3.1 Simulation Result of Stage One

The proposed antenna is simulated for stage one, as discussed in the earlier section, and the results are presented in Table 2.
Five resonating bands are obtained: 7.2533, 9.68, 10.9244, 11.6089, and 12.6356 GHz, all with good VSWR and return loss (S11). The resonant frequency of 9.68 GHz shows the best result in terms of both VSWR bandwidth and return-loss bandwidth. Figure 3 shows the directive gain of stage one.
The radiation pattern's main lobe is inclined toward +30° and −30°. The proposed design is perfectly balanced and symmetrical, which is reflected in the directive-gain pattern.

Table 2 Simulation results of stage 1


Res. freq. (GHz) VSWR VSWR (B/W) Perc. (B/W) (%) S11 (dB) S11 (B/W) (GHz)
(GHz)
7.253 1.099 0.21 2.89 −26.48 0.2
9.68 1.36 0.47 4.85 −16.38 0.45
10.92 1.34 0.40 3.66 −16.74 0.37
11.61 1.658 0.22 1.89 −12.13 0.22
12.63 1.23 0.46 3.64 −19.74 0.42
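The VSWR and S11 columns of Table 2 are linked by the standard reflection-coefficient relation, and the percentage bandwidth is the absolute bandwidth normalized to the resonant frequency. A quick consistency check (a sketch, using values from the first row of the table):

```python
import math

def vswr_to_s11_db(vswr: float) -> float:
    """Return loss (dB) implied by a VSWR value: |Gamma| = (VSWR - 1)/(VSWR + 1)."""
    gamma = (vswr - 1.0) / (vswr + 1.0)
    return 20.0 * math.log10(gamma)

def percent_bw(abs_bw_ghz: float, f_res_ghz: float) -> float:
    """Fractional bandwidth as a percentage of the resonant frequency."""
    return abs_bw_ghz / f_res_ghz * 100.0

print(round(vswr_to_s11_db(1.099), 1))  # close to the -26.48 dB in row 1
print(round(percent_bw(0.21, 7.253), 2))  # close to the 2.89% in row 1
```

The small residual differences against the tabulated values are consistent with rounding in the reported VSWR figures.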

Fig. 3 Stage 1 directive gain



Table 3 Simulation results for second stage design


Res. freq. (GHz) VSWR VSWR (B/W) Perc. (B/W) (%) S11 (dB) S11 (B/W) (GHz)
(GHz)
1.936 1.564 0.11 5.68 −13.15 0.12
2.967 1.149 0.11 3.7 −23.16 0.08
3.341 1.113 0.099 2.96 −25.41 0.08
5.214 1.221 0.24 4.6 −20.04 0.24
6.806 1.528 0.13 1.9 −13.59 0.12
8.023 1.379 0.18 2.24 −15.94 0.18
9.1 1.590 0.53 5.8 −12.84 0.41
10.03 1.348 0.77 7.67 −16.46 0.65
12.52 1.362 1.74 18.9 −16.29 1.66

3.2 Simulation Result of Stage Two

The proposed antenna is simulated for stage two, as discussed in the earlier section, and the results are presented in Table 3.
Nine resonant frequency bands are obtained in the range of 1.9–12.5 GHz: 1.9365, 2.9666, 3.3411, 5.214, 6.8060, 8.0234, 9.1003, 10.0368, and 12.5184 GHz. The best VSWR percentage bandwidth is 18.9%, almost 3.9 times that of stage one; similarly, the S11 bandwidth increases 3.69 times compared with stage one. The directive gain of this stage is shown in Fig. 4.
The directive gain shown in Fig. 4 is distributed in all directions. The best results are at the resonant frequencies of 8.0234 and 9.1003 GHz, where the directive-gain pattern is well directed.

3.3 Simulation Result of Stage Three

The proposed antenna is simulated for stage three, as discussed in the earlier section, and the results are presented in Table 4. The feed is placed with two patches, with R1 = 22.1266 mm and R2 = 22.31 mm, as given in Table 1.
Seven frequency bands are obtained in the range of 2.732–12.05 GHz: 2.7324, 3.9967, 5.214, 8.0234, 9.2876, 10.1773, and 12.0502 GHz. The best VSWR percentage bandwidth is 9.33%, about double (1.92 times) that of stage one. Similar changes occur for the S11 bandwidth. The stage three directive gain is shown in Fig. 5.

Fig. 4 Stage 2 directive gain

The radiation pattern for stage three shows the lobe distributed in all directions. The resonant frequency of 2.7324 GHz gives the best-directed pattern; for the remaining frequencies, the lobe is inclined according to the geometric structure.
The current distribution for all stages is shown in Fig. 6, which shows that the distribution is balanced in nature. In the last iteration, because of geometric limitations, the current intensity is significantly lower; we are working on this limitation.

3.4 Experimental Result and Discussion

We designed, simulated, and tested the antenna in the university lab. Testing results
are plotted in Fig. 7.

Fig. 5 Stage 3 directive gain

We compared both VSWR and S11 between the simulated and experimental results for the proposed design. Figure 7b, c show that the simulation and experimental results match well in all three stages. Results are tabulated in Tables 2, 3, and 4. The highest percentage bandwidth is 4.85% for stage one, 18.9% for stage two, and 9.33% for stage three; the corresponding S11 bandwidths are 450 MHz, 1.66 GHz, and 930 MHz. This confirms that stages two and three provide the VSWR and S11 bandwidth increases, as verified by the testing results.

Fig. 6 Current distribution pattern for all the stages

4 Conclusion

The most powerful characteristic of fractal geometry is self-similarity, and the authors took advantage of this property to obtain multiband behavior. The tree shape is one of the simplest fractal geometries, in terms of its generating algorithm, among all fractals found in nature. We implemented this architecture using the aperture-coupling feeding method to enhance bandwidth, and the results validate the enhancement in both bandwidths. The proposed antenna covers the range from 2.7 to 12.05 GHz. Aperture coupling improves the bandwidth because of its non-contacting nature. Results show that the VSWR bandwidth increases up to 3.9 times for the second stage and 1.92 times for the third stage. Corresponding changes occur for the S11 bandwidth, which reaches 1.66 GHz for stage two and 930 MHz for stage three, compared to 450 MHz for stage one.

Fig. 7 Proposed design with VSWR and S11 comparison for all stages

Table 4 Simulation results for third stage design


Res. freq. (GHz) VSWR VSWR (B/W) Perc. (B/W) (%) S11 (dB) S11 (B/W) (GHz)
(GHz)
2.732 1.315 0.06 2.19 −17.328 0.08
3.997 1.559 0.08 2.0 −13.216 0.08
5.214 1.476 0.18 3.45 −14.331 0.18
8.023 1.723 0.25 3.11 −11.521 0.21
9.288 1.137 0.35 3.77 −23.873 0.32
10.17 1.057 0.95 9.33 −31.685 0.93
12.05 1.594 0.23 1.9 −12.738 0.2

References

1. Liu, L., Lu, Q., Ghassemlooy, Z., Korolkiewicz, E.: Investigation of transformer turn ratio and
design procedure for an aperture coupled slot antenna. IET J. 61–65 (2011)
2. Feresidis, A.P., Konstantinos, K., Lancaster, M.J., Peter, S.: Waveguide fed high gain antenna at
submillimeter wave frequencies. IET J.
3. Kirov, G.S., Mihaylova, D.P.: Circularly polarized aperture coupled microstrip antenna with
resonant slot and screen smith. Radio Eng. 19(1) (2010)
4. Lai, C.H., Han, T.Y., Chen, T.R.: Broadband aperture coupled microstrip antenna with low cross
polarization and back radiation. Prog. Electromagn. Res. Lett. 5, 187–197 (2008)
5. Kumar, G., Ray, K.P: Broadband Microstrip Antennas. Antennas and Propagation, pp. 1–167.
Artech House, Boston London (2003)
6. Ansoft HFSS12.1 Simulation software
7. Satyanarayana, D.S.S., Bathula, A.: Aperture coupled microstrip antenna design and analysis
using MATLAB. Int. J. Eng. Res. Technol. (IJERT) 8(06) (2019). ISSN: 2278-0181
8. Sujatha, C.N., Murti Sarma, N.S.: Design of aperture coupled microstrip planar array. IJIREEICE
5(6) (2017). ISSN: 2321-2004 (online)
Wind Speed at Hub Height (Using
Dynamic Wind Shear) and Wind Power
Prediction

Rohit Kumbhare, Suraj Sawant, Sanand Sule, and Amit Joshi

Abstract Predicting hub-height wind speed from the ground-level (10 m) wind speed is difficult because wind is chaotic. Several forecasters provide wind speed forecasts, but due to variations in hub heights, conversion to a hub-height wind speed is challenging. At present, much research focuses on predicting wind speed using mathematical formulae and statistics, and biologically inspired computing has also been used to predict wind speed at a particular height. Weather parameters affect the accuracy and increase the error band. To solve this issue, models have been created based on the Decision Tree Regressor and Keras neural network ML techniques, which use weather parameters and the ground-level wind speed to predict the wind shear. These attributes help in predicting the wind speed at a particular hub height at least 1.5–3 h ahead. In addition, there are two power forecast models (Decision Tree Regressor/Keras neural network) which take the hub-height wind speed and weather parameters as input and forecast the power generation for a given power plant. The paper also briefly describes the power-law method used to calculate the wind shear coefficient. This model will help wind power plants understand the capabilities of present wind prediction models, and it will allow the prediction of hub-height wind speed and power generation for their specific wind farms.

Keywords Decision regressor tree (DRT) · Keras neural network · Wind shear
coefficient · Recursive and non-recursive modeling · MAE (mean absolute error) ·
MAPE (mean absolute percentage error) · WS (wind shear)

R. Kumbhare (B) · S. Sawant · S. Sule · A. Joshi


Department of Computer Engineering, IT College of Engineering Pune, Pune, Maharashtra, India
e-mail: kumbhareru18.comp@coep.ac.in
S. Sawant
e-mail: sts.comp@coep.ac.in
S. Sule
e-mail: sanand.sule@climate-connect.com
A. Joshi
e-mail: adj.comp@coep.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 519
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_49

1 Introduction

The wind has always been dynamic; it does not follow any pattern for long, so good forecasts are unavailable, and predicting wind is a challenging task. Predicting wind in a particular region is very beneficial; it can help wind farms, the aviation industry, etc. [1, 7, 9]. Although physical laws can describe the wind shear, wind speed is unstable and fluctuates randomly. Wind speeds differ across zones, latitudes, longitudes, and continents; sometimes the same zone or place may have different wind speeds simultaneously, and wind speed can also differ with height and area [13]. Features like humidity, temperature, air density, pressure, and seasonality need to be considered, as these parameters cause the wind shear to change [15]. Hence, forecasting wind speed, particularly at hub height, remains significantly difficult.
Wind shear is a microscale unpredictable meteorological event/phenomenon that
is very useful for predicting height wind speed using the ground-level wind speed
[2, 6]. Even after 30–40 years of research, there is no dynamic technique to solve
the forecasting problem due to the instability of weather phenomena and complex
terrains. Recent research in wind prediction is mostly focused on short-term wind
predictions with a range from minutes to a few days [14]. The forecast should be
accurate and updated for wind energy production. The power plant owner needs to
plan and schedule grids to initiate the power generation [8, 11, 14]. Hence, a model
is required to forecast the wind speed and power generation at least 1.5–3 h ahead.
Generally, to obtain a wind speed forecast, meteorologists use wind extrapolation
formulae such as the Hellmann coefficient equation [12], the logarithmic equation [16],
etc. Various methods, such as machine learning models, have been introduced to
estimate wind speed from historical data, but these techniques require different weather
parameters at equal time intervals [10]. The data required are the wind direction,
atmospheric pressure, temperature, and derived parameters such as the wind shear
exponent. In most cases, however, the wind shear is taken as constant due to the
unavailability of other weather parameters. To predict dynamic wind shear at each
15-min interval, we have implemented a decision tree regressor/Keras neural network
that takes weather parameters and the forecasted ground-level wind speed and predicts
the hub-height wind speed 1.5–3 h ahead.
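The power-law relation behind the Hellmann approach can also be inverted to estimate the shear exponent from speeds measured at two heights. A minimal sketch (the 10 m and 120 m heights and the speed readings are illustrative, not values from the paper's dataset):

```python
import math

def shear_exponent(v_ref: float, z_ref: float, v_hub: float, z_hub: float) -> float:
    """Shear exponent alpha from the power law v_hub/v_ref = (z_hub/z_ref)**alpha."""
    return math.log(v_hub / v_ref) / math.log(z_hub / z_ref)

def extrapolate(v_ref: float, z_ref: float, z_hub: float, alpha: float) -> float:
    """Extrapolate a ground-level speed to hub height using the power law."""
    return v_ref * (z_hub / z_ref) ** alpha

# Illustrative readings: 5 m/s at 10 m, 7 m/s at 120 m.
alpha = shear_exponent(v_ref=5.0, z_ref=10.0, v_hub=7.0, z_hub=120.0)
```

By construction, plugging the estimated exponent back into `extrapolate` recovers the hub-height reading.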
The calculated wind shear is applied to the ground-level wind speed to generate the
hub-height wind speed. This hub-height wind speed and the other weather parameters
are then taken as input by the power forecast model (based on Decision Tree
Regressor/Keras Neural Network), which produces the 1.5–3 h ahead forecast.
Wind Speed at Hub Height (Using Dynamic Wind Shear) … 521

The decision tree regressor is used because its implementation is fast and it groups
most predictions under the same parent node, yielding similar wind shear values. The
Keras neural network is used for its accuracy: it outperforms other machine learning
models by adjusting the weights of the features.
The methodology present in this work describes how the hub height, wind speed,
and power generation are predicted using the dynamic wind shear, weather parame-
ters, and the historical data present. Keras neural network and the decision regressor
tree are used as the machine learning model. These models use recursive and non-
recursive modeling for prediction based on the recent historical data. The compar-
ison between constant wind shear and dynamic wind shear for wind speed and power
prediction has been analyzed and shown in results and discussions.

2 Proposed Methodology

The machine learning model is created as per the requirements for predicting wind
shear and power generation. Figure 1 shows the architecture of the proposed
methodology and describes the exact execution flow used in this application. First,
the data is fetched and preprocessed; the hub-height (turbine-level) wind speed and
the ground-level wind speed are used to calculate the wind shear, which serves as the
target for the training phase. The features are the ground-level wind speed and other
weather parameters, and the same features are used in the prediction (testing) phase.
Feature engineering is also performed for the power forecasting model.
Input: The input data contains the actual ground-level wind speed, hub-height
wind speed, wind gust, wind bearing, power generation, and parameters (of weather)
such as humidity, air pressure, and temperature.

Fig. 1 Proposed methodology



Data Preprocessing: The data coming from the forecasters is hourly, so it must be
resampled into 15-min values. The data also contains many outliers to be removed,
such as exponential values, infinite values, and NaN values.
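The resampling step described above can be sketched with plain linear interpolation; a real pipeline would typically use a time-series library, and the cleaning rule here (carry the last finite value forward) is an assumption, since the paper does not state how outliers are replaced:

```python
import math

def to_quarter_hourly(hourly: list[float]) -> list[float]:
    """Linearly interpolate an hourly series onto a 15-min grid:
    three interpolated points between consecutive hourly samples."""
    out: list[float] = []
    for a, b in zip(hourly, hourly[1:]):
        out.extend(a + (b - a) * k / 4 for k in range(4))
    out.append(hourly[-1])
    return out

def clean(series: list[float], fallback: float = 0.0) -> list[float]:
    """Replace NaN/inf entries with the previous finite value (or a fallback)."""
    out, last = [], fallback
    for x in series:
        if math.isfinite(x):
            last = x
        out.append(last)
    return out
```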
Feature Engineering: Two datasets are created, first to predict wind shear and then to
predict wind power generation. For wind shear, the features are wind speed lags, wind
bearing, wind chill, pressure, humidity, wind change rate, and wind speed day lags (if
required); for power generation, the features are wind speed (at hub height), wind
bearing, wind chill, pressure, humidity, power lags, and power change rate.
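The lag and change-rate features can be sketched with a small helper; the exact lag count and the definition of the change rate are assumptions for illustration, as the paper does not specify them:

```python
def lag_features(series: list[float], lags: int = 3) -> list[list[float]]:
    """Build one feature row [x_{t-1}, ..., x_{t-lags}, change_rate_t]
    per usable timestamp t (assumes lags >= 2 so the change rate exists)."""
    rows = []
    for t in range(lags, len(series)):
        row = [series[t - k] for k in range(1, lags + 1)]
        row.append(series[t - 1] - series[t - 2])  # change rate over the last step
        rows.append(row)
    return rows
```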
Modeling: This is the primary phase in which the prediction is made; two models
(decision tree regressor/Keras neural network) are run, and their hyperparameters
must also be tuned.
Decision Tree Regressor: This machine learning model builds a tree based on its
features, splitting the data into similar subsets. The most important feature is selected
at the top of the tree, and the number of branches and the depth grow with the tree. In
this model, we have used the standard deviation reduction technique: first the standard
deviation of the output is calculated, then the standard deviation within each candidate
split (known as the standard deviation for target and predictor) is estimated. The
standard deviation for target and predictor is subtracted from the standard deviation
of the output, and the result is known as the standard deviation reduction.
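The standard deviation reduction described above can be computed directly. This sketch scores one candidate split given the target values falling into each branch; the size weighting of the branch deviations is the usual convention, assumed here:

```python
from statistics import pstdev

def sdr(target: list[float], groups: list[list[float]]) -> float:
    """Standard deviation reduction: the standard deviation of the target
    minus the size-weighted standard deviation within each branch."""
    n = len(target)
    weighted = sum(len(g) / n * pstdev(g) for g in groups if g)
    return pstdev(target) - weighted
```

A perfect split (each branch internally constant) scores the full standard deviation of the target; the splitter would pick the feature split with the largest score.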
Keras Neural Network: Keras is the high-level API of TensorFlow 2.0. It provides easy
abstractions and essential building blocks for developing and shipping machine
learning models with high iteration velocity. In this model, we have used the basic
Keras sequential model.
Parameter Tuning: The machine learning model contains many hyperparameters that
can be changed according to the dataset to increase its accuracy. The simplest method
is a grid search, which selects parameters by checking their performance on the model.
Recursive Model: In this type of modeling, the wind lags/power lags are included in
the features, and the model executes one row at a time, treating each row as a feature
input; the predicted value is fed back as a feature for the next iteration of the model.
This type of modeling can be beneficial for short-term forecasting.
Non-recursive Model: In this type of modeling, no lags are taken as input; the model
takes the whole dataset at once and predicts it. It is mostly used for day-ahead
forecasting.
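The recursive scheme can be sketched as a loop in which each prediction is pushed into the lag window for the next 15-min step; here `model` stands for any fitted regressor exposed as a plain function, an assumption made purely for illustration:

```python
from typing import Callable

def recursive_forecast(model: Callable[[list[float]], float],
                       last_lags: list[float], steps: int) -> list[float]:
    """Recursive forecasting: each prediction becomes the newest lag
    feature for the following step."""
    lags, preds = list(last_lags), []
    for _ in range(steps):
        y = model(lags)          # model maps the lag vector to one prediction
        preds.append(y)
        lags = [y] + lags[:-1]   # shift the window: newest value in, oldest out
    return preds
```

A non-recursive model would instead call `model` once per row of a pre-built feature matrix, with no feedback between rows.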
Predicted Wind Shear: The wind shear model's output is substituted, together with the
ground-level wind speed, into the logarithmic formula to calculate the hub-height
wind speed.
Calculating Wind Speed: Hub-height wind speed is calculated using the
predicted wind shear and wind speed at ground level.

Predicting Power Generation: It is the output coming from the power forecast
model.
Data Used: The weather data used in the model comes from an external weather
forecaster for a city in Tamil Nadu near a wind power plant; it contains weather
parameters such as pressure, temperature, humidity, ground-level wind speed, wind
gust, wind bearing, and wind chill. The data also contains the hub-height wind speed
collected from a wind power plant located at the same place in Tamil Nadu.
The data provided by the external weather forecaster is hourly, while the data provided
by the wind power plant is at 15-min intervals; the forecaster data is therefore
interpolated onto the 15-min grid.

3 Results and Discussions

This section shows the following results:
• Wind speed hub-height conversion using constant and dynamic wind shear with the
decision tree regressor and Keras neural network.
• Power generation of the plant from the calculated (hub-height) wind speed using the
decision tree regressor and Keras neural network.
Data Input: 4–5 months of data (15-min interval) is taken as input for training the
model, and 10–15 days of data (not included in the training set) is taken for testing.
1. Wind Speed and Power Generation Through Constant Wind Shear Using
Decision Regressor Tree and Keras Neural Network: Here the logarithmic formula
is used with a constant wind shear of 0.18 (Figs. 2, 3, 4):
Hub-height wind speed = Wind speed (ground level) × (120/10)**0.18, where 120 m
is the hub height and 10 m the ground-level measurement height.
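The constant-shear baseline above is a one-line power-law conversion; heights are in metres, as in the formula, and the sample input speed is illustrative:

```python
def hub_height_speed(v_ground: float, shear: float = 0.18,
                     z_hub: float = 120.0, z_ground: float = 10.0) -> float:
    """Power-law conversion used in the constant-shear baseline."""
    return v_ground * (z_hub / z_ground) ** shear

v_hub = hub_height_speed(5.0)  # 5 m/s at 10 m -> roughly 7.8 m/s at 120 m
```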
2. Wind Speed and Power Generation Through Dynamic Wind Shear Using
Decision Regressor Tree: Here the same formula is used, but the constant exponent
is replaced by the dynamic wind shear (Figs. 5, 6).

Fig. 2 Constant WS wind speed forecast, MAE: 1.32



Fig. 3 Constant WS DRT power forecast, MAPE: 15.90%

Fig. 4 Constant WS Keras power forecast, MAPE: 12.32%

Fig. 5 Dynamic WS DRT wind speed forecast, MAE: 0.75

Fig. 6 Dynamic WS DRT power forecast, MAPE: 10.46%



Fig. 7 Dynamic WS Keras wind speed forecast, MAE: 0.67

Fig. 8 Dynamic WS Keras power forecast, MAPE: 9.02%

Hub-height wind speed = Wind speed (ground level) × (120/10)**(Dynamic Wind Shear).
3. Wind Speed and Power Generation Through Dynamic Wind Shear Using Keras
Neural Network: Here again the same formula is used, with the dynamic wind shear
as the exponent (Figs. 7, 8):
Hub-height wind speed = Wind speed (ground level) × (120/10)**(Dynamic Wind Shear).
The difference between the MAE of the two calculations is small, but when calculating
the power generation of a power plant with 100–200 turbines, even a minor error
makes a large difference in the power generation, and the dynamic model provides the
proper bandwidth of speeds at which the power is generated. Another important point
is that the predicted wind speed should follow the pattern of the actual wind speed;
with the dynamic wind shear model, the pattern matches better than with the constant
wind shear value. Recently, the Internet of Things (IoT) and big data analytics have
gained popularity due to location-independent services, portability, and the ability to
process huge data quickly [3–5]. In the future, the proposed system can be expanded
in this direction (Figs. 9, 10).
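The MAE and MAPE values reported in the figure captions follow the usual definitions; a minimal sketch (the MAPE form assumes strictly positive actual power values):

```python
def mae(actual: list[float], pred: list[float]) -> float:
    """Mean absolute error, as reported for the wind speed forecasts."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def mape(actual: list[float], pred: list[float]) -> float:
    """Mean absolute percentage error, as reported for the power forecasts
    (assumes every actual value is positive)."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, pred)) / len(actual)
```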

Fig. 9 Forecasted wind speed MAE

Fig. 10 Forecasted power generation with penalties

4 Conclusion

This work designs a machine learning technique that predicts the wind shear
coefficient and the power generation every 15 min for 1.5–3 h ahead, as a 1.5–3 h
ahead forecast of wind and power is essential for wind farms and aviation. The
technique uses various features, including weather parameters such as humidity,
temperature, pressure, ground-level wind speed, power lags, and wind lags. The result
is a dynamic wind shear coefficient used to convert to the hub-height wind speed (up
to a particular range) and to predict power generation. The model shows increased
accuracy compared to the power-law and the Panofsky and Dutton models. The
technique can be beneficial in terrain areas where the climate is dynamic and the wind
usually changes over a short period.

References

1. Albani, A., Ibrahim, M.Z., Yong, K.H.: Wind shear data at two different terrain types. Data
Brief 25, 104306 (2019)
2. Ambach D., Vetter, P.: Wind speed and power forecasting-a review and incorporating asym-
metric loss. In: 2016 Second International Symposium on Stochastic Models in Reliability
Engineering, Life Science and Operations Management (SMRLO), pp. 115–123. IEEE (2016)
3. Deshpande, P., Sharma, S.C., Peddoju, S.K. et al.: Security and service assurance issues in
Cloud environment. Int. J. Syst. Assur. Eng. Manag. 9, 194–207 (2018). doi:https://doi.org/10.
1007/s13198-016-0525-0
4. Deshpande P.S., Sharma S.C., Peddoju S.K.: Predictive and prescriptive analytics in big-data
era. In: Security and Data Storage Aspect in Cloud Computing. Studies in Big Data, vol. 52,
pp. 71–81. Springer, Singapore (2019). doi:https://doi.org/10.1007/978-981-13-6089-3_5
5. Deshpande, P., Sharma, S.C., Sateesh Kumar, P.: Security threats in cloud computing. In:
International Conference on Computing, Communication & Automation, pp. 632–636. (2015)

6. Gao, J., Zhao, Y.: Simulation research on wind shear prediction of airborne weather radar. In:
2014 International Conference on Virtual Reality and Visualization, pp. 435–438. IEEE (2014)
7. Gualtieri, G.: Atmospheric stability varying wind shear coefficients to improve wind resource
extrapolation: a temporal analysis. Renew. Energy 87, 376–390 (2016)
8. Huang, C.-J., Kuo, P.-H.: A short-term wind speed forecasting model by using artificial neural
networks with stochastic optimization for renewable energy systems. Energies 11(10), 2777
(2018)
9. Jiang, Z., Jia, Q.-S., Guan, X.: Review of wind power forecasting methods: from multi-spatial
and temporal perspective. In: 2017 36th Chinese Control Conference (CCC), pp. 10576–10583.
IEEE (2017)
10. Kulkarni, M.A., Patil, S., Rama, G.V., Sen, P.N.: Wind speed prediction using statistical
regression and neural network. J. Earth Syst. Sci. 117(4), 457–463 (2008)
11. Kumar, T.B., Sekhar, O.C., Ramamoorty, M., Rao, S.K., Rao, D.V.B.: Comparitive study on
wind forecasting models for day ahead power markets. In: 2017 IEEE International Conference
on Signal Processing, Informatics, Communication and Energy Systems (SPICES), pp. 1–5.
IEEE (2017)
12. Li, J., Wang, X., Yu, X.B.: Use of spatio-temporal calibrated wind shear model to improve
accuracy of wind resource assessment. Appl. Energy 213, 469–485 (2018)
13. Qawasmi, A., Kiwan, S.: Effect Weibull distribution parameters calculating methods on energy
output of a wind turbine: a study case. Int. J. Thermal Environ. Eng. 14(2), 163–173 (2017)
14. Singh, A., Gurtej, K., Jain, G., Nayyar, F., Tripathi, M.M.: Short term wind speed and power
forecasting in Indian and UK wind power farms. In: 2016 IEEE 7th Power India International
Conference (PIICON), pp. 1–5. IEEE (2016)
15. Tizgui, I., Bouzahir, H., El Guezar, F., Benaid, B.: Wind speed extrapolation and wind power
assessment at different heights. In: 2017 International Conference on Electrical and Information
Technologies (ICEIT), pp. 1–4. IEEE (2017)
16. Werapun, W., Tirawanichakul, Y., Waewsak, J.: Wind shear coefficients and their effect on
energy production. Energy Procedia 138, 1061–1066 (2017)
Modeling and Simulation of Microgrid
with P-Q Control of Grid-Connected
Inverter

Nasir Ul Islam Wani, Anupama Prakash, and Pallavi Choudekar

Abstract The microgrid consists of a group of interconnected loads and various
energy sources, such as wind and solar, operated in combination with the main grid
to share the connected load. The unified operation of different energy sources
increases the overall reliability and operational efficiency of the whole system. The
microgrid always contains a main source responsible for supplying the main power.
Thus, the microgrid has the primary grid and other DGs connected to it, giving the
microgrid various modes of operation, such as grid-connected mode, islanded mode,
and dual mode. The microgrid can be switched between these modes, and this
switching requires a proper control pattern. The paper describes the modes of
operation and the control strategies required for proper switching between them. The
variation of the irradiance value affects the active and reactive power at the PCC or
the bus. At low irradiance, the load is fed by both the grid and the solar PV. At high
irradiance, the solar PV's output power increases, and the load demand is mostly met
by the solar PV.

Keywords Solar photovoltaic array · Maximum power point tracking · Microgrid

1 Introduction

The advent of DGs has made a revolution in microgrids. The microgrid consists of
interconnected loads and various energy sources, such as wind and solar, operated in
combination with the main grid to share the connected loads. This conjugated
operation can increase the reliability of the system [1, 2]. The overall system can be
operated in a grid-connected mode, where the load is shared among the DGs and the
main grid, or in islanded mode, where the main grid is switched off and supply is
provided by the DGs. The changeover between grid-connected and islanded modes
requires a proper
N. U. I. Wani (B) · A. Prakash · P. Choudekar
EEE Department, Amity University, Noida, Uttar Pradesh, India
P. Choudekar
e-mail: pachoudekar@amity.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 529
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_50
530 N. U. I. Wani et al.

control scheme. When operating in grid-linked mode, the microgrid sources are used
to provide active (P) and reactive power (Q) control, and in islanded mode, the
sources are used to provide voltage (V) and frequency (f) control.
Different types of sources may be used in the microgrid, such as converter-based
sources and rotating-machine-based sources. The different kinds of sources can lead
to various control problems: converter-based sources may respond quickly, whereas
rotating-machine-based sources may respond too slowly. In Sect. 1, an introduction
to the microgrid is provided. Section 2 describes the microgrid model and its modules.
In Sect. 3, the various operating modes of a microgrid are described, and a way of
detecting the operating mode is provided. The importance of control strategies, and
some of them, are covered in Sect. 4. Section 5 describes the simulation model and
the inverter control algorithm. The results are discussed and analyzed in Sect. 6.
Section 7 concludes the paper.

2 Microgrid Model

A microgrid is a network consisting of various sources, loads, and storage units
operated in combination. The energy sources can be wind, solar, etc. The unified
operation of different energy sources increases the overall reliability and operational
competence of the whole system. The microgrid always contains a primary source
responsible for supplying the main power [1]. The basic layout of the microgrid model
consists of several renewable energy sources, various commercial and residential
loads, storage devices, and electric vehicles operated together, as shown
in Fig. 1. Figure 1 indicates that there are multiple loads, renewables in the form of
solar and wind, storage devices in the form of fuel cells or batteries, and the central
generating unit called the utility grid, all connected [2].

Fig. 1 Microgrid model

Fig. 2 Model of microgrid

2.1 Modules of Microgrid

The microgrid incorporates various components. The microgrid to be modeled is
shown in Fig. 2. It consists of two converter-based energy sources connected to the
common bus at the PCC. It contains a designed filter and a star-delta transformer that
connects the primary grid to the two generating stations or DGs. When the PCC
breaker is closed, the microgrid operates in grid-connected mode, as in Fig. 2. If the
PCC circuit breaker is opened, removing the connection between the primary grid
and the DGs, the microgrid operates in the islanded or stand-alone mode [3–5].

3 Modes of Operation

The microgrid is linked to the network at the PCC. The flow of P and Q through the
PCC indicates the mode of operation. When the P and Q flow through the PCC is
zero, the power is balanced and no power is traded between the DGs and the network.
These operating conditions are considered the best suited and most economical
operating conditions of the microgrid. Any imbalance in P and Q corresponds to a
trade of power between the DGs and the network, which is not the desired operating
condition. Figure 3 shows the different modes of operation.

3.1 Grid-Connected Mode

This mode of operation has the primary grid along with all DGs connected to the
microgrid. Thus, in this mode, the microgrid supplies and draws power according to
the generation and load demand. The primary grid maintains voltage and frequency
control [1]. The distributed generators are dedicated to supplying the P and Q. The
main aim of this mode of operation is to maximize efficiency and increase the overall
utilization of renewable sources. In this mode, the microgrid keeps the grid voltage,
power factor, and bus voltage within permissible operating limits. The primary grid
is linked to the distribution system at a point known as the PCC. A microgrid in
grid-connected mode should operate in constant P-Q mode, which is ensured only
when the inverter is governed in the constant-current mode [6].

Fig. 3 Different modes of operation

3.2 Islanded Mode

In this mode, the primary grid is disconnected, and hence this mode is also called
isolated mode. Here, the DGs function to supply the loads independently of the grid
while supervising the voltage (V) and frequency (f). The islanded mode copes with
any increase or decrease in V and f by generator tripping and load shedding,
respectively, to keep them in the permissible operating region. The principal grid is
decoupled from the distribution network [4, 7].

3.3 Detection of Mode of Operation

The presence of a particular mode of operation is determined using a supervisory
system [8]. The supervisory system checks for the presence of current, from which
the operating mode can be concluded:
• If current is present at the PCC (point of common coupling), the microgrid is
grid-connected;
• If there is no current at the PCC, the microgrid is in the islanded mode of operation.
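The supervisory check above reduces to a threshold test on the PCC current; the threshold value used below is an illustrative assumption:

```python
def operating_mode(pcc_current: float, threshold: float = 1e-3) -> str:
    """Supervisory rule: current flowing at the PCC implies grid-connected
    operation, otherwise the microgrid is islanded."""
    return "grid-connected" if abs(pcc_current) > threshold else "islanded"
```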
Modeling and Simulation of Microgrid with P-Q Control … 533

4 Control Strategies

The microgrid has an advantage over other distribution networks in terms of better
controllability. Microgrid control is required mainly for:
(a) the upstream network interface, to check whether it works in grid-linked mode or
the isolated mode;
(b) supervision and security;
(c) local protection.
Microgrid control can be operated in a centralized mode, where the main focus is on
optimizing the microgrid, or in a decentralized mode, where the main focus is on
maximizing power production and selling additional generated power.
The control strategies in a microgrid depend on the mode of operation [9, 10].

4.1 P-Q Control

This strategy's primary goal is to maintain the active and reactive power when V and
f variations occur due to changing load. The active power is held at its reference by
the active power controller, and the reactive power controller stabilizes the reactive
power at the given reference [1].
The primary grid maintains the V and f. In this control scheme there are two loops,
an inner current loop and an outer power loop. The inner current loop responds to
disturbances caused by the voltages.
In this control strategy, the three-phase voltages and currents at the grid side are
converted into rotating-frame components by employing the Park transformation. Let
I be the output current of the inverter, Id its d-axis component, and Iq its q-axis
component; then

Id = P/U (1)

Iq = Q/U (2)

where
P is the reference active power,
Q is the reference reactive power.
The synchronous-reference-frame phase-locked loop is used to find the voltage phase;
by employing a simple PI controller, the values Vd and Vq are determined. After that,
the dq-abc transformation is used to find the inverter voltage in the abc domain.
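Equations (1)-(2) and the dq-abc step can be sketched as follows; the inverse Park transform is written in its amplitude-invariant form, an assumed convention since the paper does not state one:

```python
import math

def dq_current_refs(p_ref: float, q_ref: float, u: float) -> tuple[float, float]:
    """Current references from Eqs. (1) and (2): Id = P/U, Iq = Q/U."""
    return p_ref / u, q_ref / u

def dq_to_abc(d: float, q: float, theta: float) -> tuple[float, float, float]:
    """Inverse Park transform from the rotating d-q frame (at PLL angle
    theta) back to the three-phase abc frame."""
    a = d * math.cos(theta) - q * math.sin(theta)
    b = d * math.cos(theta - 2 * math.pi / 3) - q * math.sin(theta - 2 * math.pi / 3)
    c = d * math.cos(theta + 2 * math.pi / 3) - q * math.sin(theta + 2 * math.pi / 3)
    return a, b, c
```

For a balanced system, the three returned phase quantities sum to zero at any angle.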

4.2 V-F Control

This control strategy’s primary goal is to restore voltage (v) and frequency (f) at their
nominal value regardless of active and reactive power variations. The frequency is
stabilized by the frequency controller enduring the active power, and the voltage
controller manages steady voltage at given reference. The disconnection with the
primary grid leads to active and reactive power imbalances at the load terminal. The
variation in generation and demand causes fluctuations of voltage and frequency
and hence frequency settles at a different value. Thus, for voltage and frequency
settlement, the DGs need to increase the supply. In this mode of the control scheme,
the outer loop is responsible for maintaining the voltage as an inner current loop that
keeps the current by acting as a servo meter. The dual loop carries a high dynamic
level of precision [1, 2].
The V/f control is necessary for the smooth operation of various sources as it is
responsible for marinating up to the constant flux.

4.3 Droop Characteristics

The droop characteristics may be P-Q control characteristics or V-f control
characteristics, and they include active power control and voltage control; Fig. 4
shows the relations. Droop control requires no communication links, so decentralized
control can be achieved. However, the distortion associated with the voltage is very
high, and synchronization with the primary grid is not maintained.

Fig. 4 Droop control characteristics
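One common form of the droop relation shown in Fig. 4 ties frequency to active power and voltage magnitude to reactive power; the slopes and operating points below are illustrative assumptions, not values from the paper:

```python
def droop(p: float, q: float, p0: float = 0.0, q0: float = 0.0,
          f0: float = 50.0, v0: float = 1.0,
          kp: float = 0.01, kq: float = 0.05) -> tuple[float, float]:
    """P-f / Q-V droop: frequency falls as active power rises above its
    set point, voltage magnitude falls as reactive power rises (per-unit
    voltage, Hz frequency; slopes kp, kq are illustrative)."""
    f = f0 - kp * (p - p0)
    v = v0 - kq * (q - q0)
    return f, v
```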



5 P-Q Control of Solar-Based Microgrid

The simulated model of the microgrid consists of two DGs, as in Fig. 5; the DGs are
converter based and thus require inverters. The inverter is built from a universal
bridge. Since we are using the topology of an inverter directly connected to the PV
cell, we use the grid-connected inverter's P-Q control strategy in the microgrid
[11–14]. In the inverter's P-Q control, the grid current and the inverter output current
are compared. The reference current is generated by giving the voltage and current of
the PV to an MPPT algorithm. Comparing the currents directly would require three
separate controllers, which are difficult to tune; thus, we use the abc-dq transformation
to obtain the currents in the d-axis and q-axis. The currents are then transformed back
using the dq-abc transformation, and from these currents, we generate the gate pulses
for the inverter using a PWM generator.

5.1 PV Array

The inputs to the PV array are signals describing the irradiance and the temperature.
A signal builder generates these signals, and the variation of irradiance and
temperature is shown in Fig. 6, where the irradiance increases along with the
temperature.

5.2 Control of Inverter

The inverter is designed from IGBTs. Since we are using the topology of an inverter
directly connected to the PV cell, we use the P-Q control strategy of the grid-connected
inverter in the microgrid. The RC block is used to match the load line at the PV
terminal so that maximum power is drawn from the PV array. In this work, the P-Q
control scheme for the inverter has been used. In this scheme, the terminal current and
voltage of the PV are given to an MPPT algorithm. The current from the inverter side
and the voltage from the grid side are transformed using the Park transformation.
Comparing the currents directly would require three controllers, which are difficult
to tune; thus, we use the abc-dq transformation to obtain the currents in the d-axis and
q-axis. The transformed voltages and currents in d-q are compared using a PI
controller. The d-q components of the voltage are combined, and by applying the
inverse Park transformation, Vabcref is generated. This is given to a PWM generator
to provide the necessary PWM signals to the inverter, as in Fig. 8. The low-frequency
transformer, as used in Fig. 7, eliminates the harmonics caused by the switching of
the inverter.
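The paper does not name its MPPT algorithm; perturb and observe is a common choice and can be sketched as a single update step on the PV operating voltage (the step size is an illustrative assumption):

```python
def perturb_and_observe(v: float, p: float, v_prev: float, p_prev: float,
                        step: float = 1.0) -> float:
    """One perturb-and-observe MPPT iteration: keep perturbing the
    operating voltage in the direction that increased extracted power,
    reverse direction when power dropped."""
    if (p - p_prev) * (v - v_prev) >= 0:
        return v + step   # power rose with the last move: keep going
    return v - step       # power fell: reverse direction
```

The returned voltage becomes the new reference; repeating the step makes the operating point oscillate around the maximum power point.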
Fig. 5 Overall simulation

Fig. 6 Variation of Irradiance and temperature

Fig. 7 P-Q control of an inverter

6 Results and Discussion

The microgrid's simulated model consists of a PV array at various irradiances of 10,
500, and 1000 W/m2. The PV is connected to the bus using an inverter. The primary
utility grid is connected through a transmission feeder, and various rated loads are
also connected, as in Fig. 5.
The variation of the irradiance value affects the active and reactive power at the PCC
or the bus. The variation of the power output at various irradiance values is shown in
Fig. 8: at an irradiance of 1000 W/m2, a maximum power of 50 kW is supplied by the
PV array, and at 10 W/m2 the maximum power is 2.313 kW.

Fig. 8 V-I and P-V variation of PV with irradiance (curves at 1, 0.5, and 0.01 kW/m2)
As seen from Figs. 9 and 10, the power increases with irradiance. At high irradiance,
if the load rating matches the PV array's power, the whole of the power is supplied by
the PV array; if the load rating is higher, the two sources share the power.
It is evident from the figures that when the solar panel output is low, the grid's output
power dominates and the utility supplies the majority of the load demand. With the
increase in irradiance, the solar PV output increases and the PV supplies the power
demand. At full irradiance, all of the load is supplied by the solar panel independently,
and the utility supplies maximum reactive power equal to its rating. It can be seen
that the power at the common bus increases with irradiance, as power from both
sources is fed in (Fig. 11).
Thus, the variation in the irradiance value affects the active and reactive power
observed at the PCC or the bus. At low irradiance, both the grid and the solar PV feed
the load. At high irradiance, the solar PV's output power increases, and the load
demand is mostly met by the solar PV.

Fig. 9 Variation of P of the grid with irradiance

Fig. 10 Variation of P of the inverter with irradiance

Fig. 11 Variation of P of the bus with irradiance
The simulation model with the converter-based source has been developed. The
inverter has been designed, and P-Q control in the DC grid model has also been
simulated. Simulation of various control strategies and control algorithms in
grid-connected and islanded operation modes remains future work. Further, Internet
of Things and cloud-based technologies [15, 16] can also improve the utilization of
the proposed circuits.

7 Conclusion

A microgrid is a network consisting of many energy sources, loads, and storage
units connected together. The energy sources can be wind, solar, etc. The unified
operation of various energy sources increases the overall reliability and operational
efficacy of the system.

540 N. U. I. Wani et al.

The microgrid always contains a primary source, which is responsible for supplying
the main power. Thus, the microgrid comprises the main grid and other DGs
connected to it, which gives rise to the microgrid's various modes of operation,
such as grid-connected mode, islanded mode, and dual mode. The microgrid can be
switched between multiple modes, and this switching requires a good control pattern.
The various modes of operation of a microgrid each have their benefits and flaws.
Moreover, a proper control pattern must be followed while switching between modes
of operation.

References

1. Haider, S., Li, G., Wang, K.: A dual control strategy for power sharing improvement in islanded
mode of AC microgrid. Prot. Control. Mod. Power Syst. 3(10) (2018)
2. Das, D., Gurrala, G., Shenoy, U.: Transition between grid-connected mode and islanded mode
in VSI-fed microgrids. Indian Acad. Sci. 42(8), 1239–1250 (2017)
3. Vignesh, S.S., Sundaramoorthy, R.S., Megallan, A.: The combined V-F, P-Q and droop control
of PV in microgrid. Int. J. Res. Appl. Sci. Eng. Technol. (IJRASET) 4(III) (2016)
4. Adhikari, S., Li, F.: Coordinated V-f and P-Q control of solar photovoltaic generators with
MPPT and battery storage in microgrids. IEEE Trans. Smart Grid 5(3), 1270–1281 (2014)
5. Lasseter, R.H.: Micro grids. In: Proceedings of IEEE Power Engineering Society Winter
Meeting, vol. 1, pp. 305–308 (2002)
6. Chandorkar, M., Divan, D., Adapa, R.: Control of parallel connected inverters in standalone ac
supply systems. IEEE Trans. Ind. Appl. 29(1), 136–143 (1993)
7. Najy, W., Zeineldin, H., Woon, W.: Optimal protection coordination for micro grids with
grid-connected and islanded capability. IEEE Trans. Ind. Electron. 60(4), 1668–1677 (2013)
8. Wasynczuk, O., Anwah, N.A.: Modeling and dynamic performance of a self-commutated
photovoltaic inverter system. IEEE Trans. Energy Convers. 4, 322–328 (1989)
9. Hatziargyriou, N., Asano, H., Iravani, R., Marney, C.: Microgrids. IEEE Power Energy Mag.
5, 78–94 (2007)
10. Loix, T., Wijnhoven, T., Deconinck, G.: Protection of microgrids with a high penetration of
inverter-coupled energy sources. In: Proceedings of 2009 CIGRE/IEEE PES Joint Symposium:
Integration of Wide-Scale Renewable Resources into the Power Delivery System, July 2009
11. Bose, B., Tayal, V.K., Moulik, B.: Solar-based electric vehicle charging infrastructure with grid
integration and transient overvoltage protection. Bentham Science Publishers (2020)
12. Sahu, A.R., Bose, B., Kumar, S., Tayal, V.K.: A review of various power management schemes
in HEV. In: 2020 8th International Conference on Reliability, Infocom Technologies and
Optimization (Trends and Future Directions) (ICRITO), pp. 1296–1300. IEEE (2020)
13. Bose, B.: Modelling of microinverter and push-pull flyback converter for SPV application.
In: 2020 8th International Conference on Reliability, Infocom Technologies and Optimization
(Trends and Future Directions) (ICRITO), pp. 458–462. IEEE (2020)
14. Bose, B., Kumar, S.: Design of push-pull flyback converter interfaced with solar PV system. In:
2020 First International Conference on Power, Control and Computing Technologies (ICPC2T),
pp. 117–121. IEEE (2020)
15. Deshpande, P., Iyer, B.: Research directions in the internet of every things (IoET). In: 2017
International Conference on Computing, Communication and Automation (ICCCA), Greater
Noida, pp. 1353–1357 (2017). https://doi.org/10.1109/CCAA.2017.8230008
16. Deshpande, P., Sharma, S.C., Peddoju, S.K., Abraham, A.: Efficient multimedia data storage
in cloud environment. Inform. Int. J. Comput. Inform. 39(4), 431–442 (2015)
Smart Student Assessment System
for Online Classes Participation

Sudheer Kumar Nagothu

Abstract During the COVID-19 pandemic, students could not attend classes
regularly in physical form. Online classes have come to the students' rescue, as the
technology has been taken to their homes. In certain instances, students misuse this
option and attend the classes just for the sake of participation. Generally, the
institution allocates certain internal marks for student attendance. In this paper,
we assess the students' participation through various parameters such as the duration
of the class for which the student is present, the number of polls responded to, the
number of chats and talks, and the number of times the student raised doubts.
Students' participation has been categorized into four levels: active, normal, poor,
and very poor. When the student's participation is very poor repeatedly, he will be
marked absent from that class. The student participation is assessed with an adaptive
neuro-fuzzy inference system using test and training data, with satisfactory results.

Keywords ANFIS · Student participation · Student assessment

1 Introduction

Participation in online classes refers to many activities, such as discussions,
interaction, and the exchange of thoughts and ideas, either in written or oral formats,
during class hours with the instructor. Student participation cannot be evaluated by
mere attendance. In academics, attendance and participation may be related to
each other, but they are different. Even though attendance is mandatory, the partic-
ipation assessment cannot be based entirely upon the student's sheer presence. In
certain instances, the student just logs in to online classes without any class
participation. Participation in the class is a measure of student engagement in course
discussions and lectures.
The students should not be passive consumers of knowledge but should be
active participants in discussing the course and topics related to the

S. K. Nagothu (B)
RVR & JC College of Engineering, Chowdavram, Guntur 522019, Andhra Pradesh, India
e-mail: nsudheerkumar@rvrjc.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 541
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_51

course. Various parameters such as the number of forum questions read and posted,
the number of chat sessions participated in, and the number of chat messages
submitted are considered to assess students' participation [1]. A certain weightage
is allocated for every field, and when it is above a specific value, the participation
is recognized [2]. Thorough investigations have proposed methods to encourage
student participation and achieve academic excellence in large classes [3].
Research has been done to measure students' physical presence in the classroom
using a GPS sensor [4, 5]. The physical location is sent to the server and checked
against the student's predefined area [6, 7]. When the student is present at the
predefined location at the scheduled time, attendance is marked [8]. This research
has concentrated only on the attendance of the student, not on participation [9]. In
this research paper, it is proposed to measure the participation index of the student
by considering various parameters [10, 11].

2 Materials and Methods

A robust system to assess students' participation in online classes requires a
combination of fuzzy logic and neural network technologies, known as an adaptive
neuro-fuzzy inference system (ANFIS). The ANFIS model generates fuzzy rules
logically from the training data. For the present problem, the ANFIS model is chosen
because of its ability to optimize until the given input matches the desired output.
ANFIS consists of five layers: the input layer, input membership function layer,
normalization layer, output membership function layer, and output layer. The ANFIS
model for the present system can be seen in Fig. 1.
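The forward pass through these five layers can be illustrated with a short first-order Sugeno sketch in Python; the membership centers, widths, and rule coefficients below are illustrative assumptions, not values from the trained model.

```python
import numpy as np

def gauss_mf(x, c, sigma):
    """Gaussian membership function (layer 1)."""
    return np.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def sugeno_forward(x1, x2, rules):
    """First-order Sugeno inference over two inputs."""
    # Layers 1-2: firing strength of each rule = product of memberships
    w = np.array([gauss_mf(x1, r["c1"], r["s1"]) * gauss_mf(x2, r["c2"], r["s2"])
                  for r in rules])
    # Layer 3: normalize the firing strengths
    w_norm = w / w.sum()
    # Layer 4: each rule output is a linear function of the inputs
    f = np.array([r["p"] * x1 + r["q"] * x2 + r["r"] for r in rules])
    # Layer 5: weighted sum gives the crisp output
    return float(np.dot(w_norm, f))

# Two illustrative rules over two inputs (e.g. total duration, poll votes)
rules = [
    {"c1": 0.2, "s1": 0.3, "c2": 0.2, "s2": 0.3, "p": 0.1, "q": 0.2, "r": 0.0},
    {"c1": 0.9, "s1": 0.3, "c2": 0.9, "s2": 0.3, "p": 0.4, "q": 0.5, "r": 0.1},
]
print(sugeno_forward(0.9, 0.8, rules))
```

In the trained system, the premise parameters (centers and widths) and the consequent coefficients (p, q, r) are adjusted during ANFIS training until the output matches the training data.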

2.1 ANFIS Implementation

The proposed system is implemented using MATLAB simulation. Parameters
considered for the assessment of student participation are collected using the
CodeTantra platform. The data is collected at various time intervals and is used to
measure students' involvement. Scenarios like low participation, average participation,
and optimal participation are considered as outputs in the current research. An Excel
file has been created with the input parameters and the output student assessment. A
set of data is provided for both testing and training the ANFIS model.
The ANFIS technique is used to convert the raw input data to the desired output.
Gaussian membership functions are used for the given five inputs, and the inference
model can be seen in Fig. 2. The Sugeno engine model is used for the implementation
of the ANFIS model in the current research. Each parameter's membership function
plot is shown in Figs. 3, 4, 5, 6, and 7, respectively, for the total duration, votes
polled, chats, talks, and raise hand. Gaussian membership functions are used here
because of their simplicity and fast processing capability.

Fig. 1 The ANFIS structure

The proposed system's rules to assess student participation are automatically
generated after the input and output membership functions are defined. With the
training data, the ANFIS has generated 243 rules, which are shown in Fig. 8. It can
be observed that the rules use the AND operator. A sample data set for the
measurement of student participation is given in Table 1. For the given input
parameters (total duration, poll votes, chats, talks, raise hand), the students'
participation assessment is provided in Table 1.

3 Results and Discussion

Figure 9 shows the visualization of rules for the input parameters to measure the
output student assessment. The rule bar for each input can be slid from one end to
the other to observe the output parameters. Using training and testing data sets, the
system becomes robust and adaptive. After training, the probability of student
participation assessment for various input parameters can be seen in Fig. 10.
Figure 10 shows that the average testing error is very low, confirming that the
current model is well suited for assessing student participation. For testing, a data set

Fig. 2 ANFIS Sugeno engine

Fig. 3 Total duration MF plot

of 108 samples is used. The data has been collected from various faculty members
who are handling the subject, at multiple time intervals and days. The measured
student participation varies from 0 to 1. Students' involvement has been categorized
into four levels: active, normal, poor, and very poor. The range of data for each label
is given in Table 2.

Fig. 4 Poll votes MF plot

Fig. 5 Chats MF plot

Fig. 6 Talks MF plot

When the student's participation is very poor in the current class and was also poor
or very poor in previous classes, he will be marked absent. If the student's involvement
is poor or very poor in the current class but was normal or active in previous classes,
the student will be alerted.
The relationship between the input and output parameters is analyzed in detail using
3D surface plots. Figures 11, 12, 13, and 14 show how student participation varies
with changes in pairs of input parameters, such as poll votes and total duration, chats
and total duration, chats and poll votes, and talks and poll votes. Figure 11 shows that

Fig. 7 Raise hand MF plot

Fig. 8 ANFIS generated rules

with an increase in polled votes, student participation increases. Figure 12 shows
that chats and total duration play an important role in participation assessment. From
Figs. 13 and 14, it can be observed that the votes polled by a student play a dominant
role in participation assessment.

Table 1 Sample input data and student participation assessment


Total duration Poll votes Chats Talks Raise hand Participation assessment
1 0.5 1 0.2 0.6 0.64
1 0.6 0.1 0.2 0.8 0.63
0.79 0.8 0.7 0.2 0.2 0.60
0.95 0.4 0.5 0.2 0.8 0.58
0.69 0.8 0.6 0.2 0.2 0.57
0.70 0 0.1 0.6 0.4 0.29
0.67 1 0.2 1 0.4 0.73
0.96 0.4 0.8 0.2 0.5 0.55
0.70 0 0.1 0.6 0.4 0.29

Fig. 9 ANFIS rules viewer

4 Conclusion and Future Work

A smart and intelligent student participation assessment system has been proposed
in this paper. The proposed model does not concentrate on the physical presence of
the student but on his participation. The proposed research was implemented using
ANFIS, which makes the system reliable and robust. The system alerts the students
when their participation is poor or very poor, so that they can improve their class
participation. Using the proposed model, the student becomes an active participant
rather than a passive listener, as his participation is evaluated and marks are awarded
only when his participation is satisfactory.

Fig. 10 ANFIS test results

Table 2 Student participation assessment ranges with a label

SL. No.  Participation assessment range  Category
1        0–0.19                          Very poor
2        0.2–0.49                        Poor
3        0.5–0.69                        Normal
4        0.7–1                           Active
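The thresholds of Table 2 and the absent/alert policy described above can be expressed as a small helper in Python; the function names are illustrative, and the policy encoding reflects one reading of the paper's description (very poor now and poor/very poor before gives "absent"; poor or very poor now but normal/active before gives "alert").

```python
def categorize(score):
    """Map a participation score in [0, 1] to the labels of Table 2."""
    if score < 0.2:
        return "Very poor"
    if score < 0.5:
        return "Poor"
    if score < 0.7:
        return "Normal"
    return "Active"

def action(current, previous):
    """Decide absent/alert from current and previous-class scores."""
    cur = categorize(current)
    prev = [categorize(p) for p in previous]
    if cur == "Very poor" and all(p in ("Poor", "Very poor") for p in prev):
        return "absent"
    if cur in ("Poor", "Very poor") and any(p in ("Normal", "Active") for p in prev):
        return "alert"
    return "ok"

print(categorize(0.64))  # "Normal", matching the first row of Table 1
```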

Fig. 11 Poll votes versus total duration surface plot

Fig. 12 Chats versus total duration surface plot

Fig. 13 Chats versus poll votes surface plot

Fig. 14 Talks versus poll votes surface plot

References

1. Chan, A.Y.K., Chow, P.K., Cheung, K.S.: Student participation index: student assessment in
online courses. In: Liu, W., Shi, Y., Li, Q. (eds.) Advances in Web-Based Learning—ICWL
2004. ICWL 2004. Lecture Notes in Computer Science, vol. 3143. Springer, Berlin, Heidelberg
(2004). https://doi.org/10.1007/978-3-540-27859-7_58
2. Bergmark, U., Westman, S.: Student participation within teacher education: emphasising demo-
cratic values, engagement and learning for a future profession. Higher Educ. Res. Dev. 37(7),
1352–1365 (2018). https://doi.org/10.1080/07294360.2018.1484708
3. Kumaraswamy, S.: Promotion of students participation and academic achievement in large
classes: an action research report. Int. J. Instruct. 12(2), 369–382 (2019). https://doi.org/10.
29333/iji.2019.12224a
4. Nagothu, S.K., Kumar, O.P., Anitha, G.: Autonomous monitoring and attendance system
using inertial navigation system and GPRS in predefined locations. In: 2014 3rd International
Conference on Eco-friendly Computing and Communication Systems, Mangalore, pp. 261–265
(2014). https://doi.org/10.1109/Eco-friendly.2014.60
5. Nagothu, S.K., Anitha, G., Annapantula, S.: Navigation aid for people (joggers and runners)
in the unfamiliar urban environment using inertial navigation. In: 2014 Sixth International
Conference on Advanced Computing (ICOAC), Chennai, pp. 216–219 (2014). https://doi.org/
10.1109/icoac.2014.7229713
6. Nagothu, S.K., Kumar, O.P., Anitha, G.: GPS aided autonomous monitoring and attendance
system. Procedia Comput. Sci. 87, pp. 99–104 (2016). https://doi.org/10.1016/j.procs.2016.
05.133. ISSN 1877-0509
7. Nagothu, S.K.: Automated toll collection system using GPS and GPRS. In: 2016 International
Conference on Communication and Signal Processing (ICCSP), Melmaruvathur, Tamilnadu,
India, pp. 0651–0653 (2016). https://doi.org/10.1109/ICCSP.2016.7754222
8. Nagothu, S.K., Anitha, G.: INS-GPS integrated aid to partially vision impaired people using
Doppler sensor. In: 2016 3rd International Conference on Advanced Computing and Commu-
nication Systems (ICACCS), Coimbatore, pp. 1–4 (2016). https://doi.org/10.1109/ICACCS.
2016.7586386
9. Nagothu, S.K., Anitha, G.: INS-GPS enabled driving aid using Doppler sensor. In: 2015 Inter-
national Conference on Smart Sensors and Systems (IC-SSS), Bangalore, pp. 1–4 (2015).
https://doi.org/10.1109/SMARTSENS.2015.7873619
10. Nagothu, S.K., Anitha, G.: Low-cost smart watering system in multi-soil and multi-crop envi-
ronment using GPS and GPRS. In: Proceedings of the First International Conference on
Computational Intelligence and Informatics, vol. 507. Advances in Intelligent Systems and
Computing, pp. 637–643. https://doi.org/10.1007/978-981-10-2471-9_61

11. Nagothu, S.K.: Weather based smart watering system using soil sensor and GSM. In: 2016
World Conference on Futuristic Trends in Research and Innovation for Social Welfare (Startup
Conclave), Coimbatore, pp. 1–3 (2016). https://doi.org/10.1109/STARTUP.2016.7583991
Recommendation System
for Location-Based Services

Ritigya Gupta, Ishani Pandey, Kritika Mishra, and K. R. Seeja

Abstract Location-based services encompass a spectrum of services. Today, it is
easier to locate or search for our favorite restaurant, shop, etc., under these services.
They help users get access to important and up-to-date information about their
surroundings in a single tap. This research proposes two location-based recommen-
dation systems using the collaborative and content-based filtering recommendation
techniques. The first one is a personalized location-based recommender that uses
the content-based filtering technique. In this recommender, behavioral patterns are
extracted from the user's location history, and personalized recommendations are
then provided based on these patterns. The Apriori algorithm has been used to extract
user-specific behavioral patterns based on time zone, weekday, and location type.
The second one is a generalized location-based recommender that uses the collab-
orative filtering technique. It employs the K-means clustering algorithm, with the
silhouette metric and elbow method used to find the optimal number of clusters K.

Keywords Location-based recommendation system · Collaborative


recommendation · Content-based recommendation · Location-based services

1 Introduction

Location-oriented recommendation services help to gauge and selectively filter items
that are of interest to the users. Selectively returned items can include venues, travel
routes, friends, etc., along with the consideration of relevant spatial information.
This helps us to build technologies that empower the realm of personal preferences.
There are broadly two different approaches used by recommendation systems—
collaborative and content-based filtering techniques. In the collaborative filtering
approach [1–5], user behavior is used to recommend places. This method can give
generalized recommendations to the user and can be helpful for a user whose visited
locations have not been traced. For example, if a place has received high ratings by

R. Gupta · I. Pandey · K. Mishra · K. R. Seeja (B)


Department of Computer Science & Engineering, Indira Gandhi Delhi Technical University for
Women, New Delhi 110006, Delhi, India
e-mail: seeja@igdtuw.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 553
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_52

users, the recommendation will include that place. In the content-based filtering
approach [6, 7], suggestions are similar to user preferences in the past. This method
can give personalized recommendations to the user that could be more useful to
them. For example, if a user likes to visit a temple, then his/her recommendations
will include a temple. Bao et al. [8] propose a location-based recommendation
system that provides users with local recommendations based on geo-location,
according to the observed user's behavioral pattern.
Some research works suggest geo-fencing [3, 4, 8, 9] and tracking in area-based
recommendation programs. Huming [10] proposes a method for analyzing user
location data to find ways to use it; this kind of information is useful for building
a personalized advertising plan. Hlaing [11] has researched mobile devices as a
source of user data and proposes a customized recommendation system built on a
map showing user preferences. Babur et al. [12] proposed a data mining technique
to extract latent data patterns, which is of utmost importance when making decisions.
As found in the literature, the existing services give recommendations primarily
based on reviews, ratings, and vicinity to the user. A static record of location
timelines, rather than a dynamic record, is used to track user location history. Thus,
the objective of this research was to create dynamic recommendations through our
location-based recommender. There is also a need for more personalized and user-
centric recommendations, which can be achieved by focusing on sub-classifications
of the previously classified categories. The proposed models make recommendations
that are more personalized after deeply studying patterns in the user's location
history. This will be useful to users moving to a new location, as it helps the user see
all the places of interest in his/her vicinity. Tagging the locations into several cate-
gories, such as banks and temples, and analyzing each category's frequency will help
identify the user's top places of interest and thus provide a better user experience by
drilling deeper into the previously classified categories (Fig. 1).
The sections to follow discuss the proposed recommendation systems in detail.
Section 2 deals with the steps involved in personalized and generalized location
recommendation; data processing and timestamp processing are among the steps
required before applying the algorithms. Sections 3 and 4 deal with the

Fig. 1 Proposed workflow



results and conclusions extracted from both recommendation techniques, plotted on
graphs; respective case studies are presented to better explain the flow of the
recommendation process.

2 Proposed Location-Based Recommendation Systems

This paper proposes two location-based recommender system models. The first
model is based on content filtering techniques, and the second model is based on
the collaborative filtering technique.

2.1 Personalized Location-Based Recommendation System

In this model, the behavioral patterns from the user's location history are extracted,
and personalized recommendations are then provided based on the extracted patterns.
The complete methodology is as follows:
Step 1: Data Collection and pre-processing
The data related to the travel history of a user is collected for building the model.
The data is then pre-processed to remove duplicate locations and then annotate the
data into different categories like hotels, banks, temples, etc.
Step 2: Data Reduction
In this step, clustering is used to group nearby locations into a set of representative
stay points.
Step 3: Processing time stamp information
Timestamp information is processed to add new properties like weekday and period
of the day.
Step 4: Behavioral pattern extraction using Association rule mining
The user’s behavioral patterns are extracted by applying association rule mining
techniques. The rules are like the following: (day, time) -> Category.
Step 5: Knowledge base creation
The interesting association rules satisfying the specified support and confidence
thresholds are selected to create the knowledge base.
Step 6: Providing recommendation to the user
In the last step, based on the current day and time, the model will provide
recommendations of categories in the current location.
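At recommendation time, steps 4–6 reduce to a lookup of the mined (day, period) → category rules; the knowledge-base contents below are hypothetical examples of such rules, not mined output.

```python
# Knowledge base from step 5: (weekday, period) -> location categories,
# populated by the association rules mined in step 4 (contents illustrative).
knowledge_base = {
    ("Thursday", "Noon"): ["restaurant", "bank", "atm"],
    ("Sunday", "Morning"): ["temple", "park"],
}

def recommend_categories(weekday, period):
    """Step 6: return the mined categories for the current day and time."""
    return knowledge_base.get((weekday, period), [])

print(recommend_categories("Thursday", "Noon"))
```

A places API would then be queried with these categories and the user's current coordinates to produce concrete venues.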

2.2 Generalized Location-Based Recommendation System

In this model, clusters of hotels are made with the help of the clustering technique,
and then, based on the cluster to which the user belongs, the top 5 hotels from that
cluster are recommended. The complete methodology is as follows:
Step 1: Finding the optimal value of cluster number
In the first step, the optimal value of cluster number is selected based on graphical
results and plots.
Step 2: Sorting of the dataset
The dataset is sorted in descending order based on some useful properties of a hotel.
Step 3: Cluster creation and assignment to hotels
In this step, clusters of hotels are created and assigned to the hotels.
Step 4: Selection of a cluster
Based on a user’s input, an appropriate cluster is selected.
Step 5: Providing recommendation to the user
The model will provide the top 5 hotels’ recommendations in the current location
based on the selected cluster.
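The five steps can be sketched end to end with scikit-learn's K-means; the feature choice (star rating and image count) anticipates the implementation described later, the hotel data is synthetic, and the assumption that the user's input can be mapped into the same feature space is ours.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic hotels: columns = (hotel_star_rating, image_count)
features = rng.uniform([1, 0], [5, 500], size=(200, 2))

# Steps 1 and 3: fit K-means (K chosen beforehand) and assign clusters
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_

def top5_for_user(user_point):
    """Steps 4-5: pick the user's cluster, return its 5 best hotels."""
    cluster = kmeans.predict(np.asarray(user_point, dtype=float).reshape(1, -1))[0]
    idx = np.where(labels == cluster)[0]
    # Step 2: sort descending by star rating, breaking ties by image count
    order = np.lexsort((-features[idx, 1], -features[idx, 0]))
    return idx[order][:5]

print(top5_for_user([4.5, 400]))
```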

3 Results and Discussions

3.1 Personalized Location-Based Recommendation System

We have used our Google location history data for building this model. The raw
location history data, available in JSON format, is cleaned and converted to data
frame format. Then the duplicate latitude–longitude pairs are removed from the data
as part of pre-processing. Semantic annotation of the data is done using a reverse
geocoding API, which converts geo-coordinates to human-readable addresses.
In the next step, the data points are further reduced by clustering the nearby
points. The DBSCAN clustering algorithm [13] is used to convert the nearby points
to spatially representative points, or stay points. In DBSCAN, clustering is done based
on the distance between the points and the cluster size. In this implementation, a point
is assigned to a cluster if the physical distance is less than 100 m and the minimum
cluster size is 1. Apart from this, we have used the haversine metric [14] (to calculate
pointwise distance) and the ball tree algorithm (to find nearest neighbors of points)
to estimate great-circle distances between points in DBSCAN. The clustering
output is shown in Fig. 2.
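This stay-point reduction can be reproduced with scikit-learn's DBSCAN; the 100 m radius and minimum cluster size of 1 follow the text, while the coordinates below are made-up GPS fixes.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000  # haversine distances come out in radians of arc

# Illustrative GPS fixes: two nearby points plus one several km away
coords_deg = np.array([
    [28.6045, 77.2233],
    [28.6048, 77.2236],   # a few tens of metres from the first fix
    [28.7000, 77.3000],   # far from both
])

# eps must be expressed in radians when metric="haversine"
db = DBSCAN(eps=100 / EARTH_RADIUS_M, min_samples=1,
            metric="haversine", algorithm="ball_tree")
labels = db.fit_predict(np.radians(coords_deg))
print(labels)  # first two fixes share a cluster; the third is separate
```

With `min_samples=1` every point is a core point, so no fix is discarded as noise; each isolated fix simply becomes its own stay point.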

Fig. 2 Full data set versus DBSCAN reduced set

After that, the timestamp information is integrated into the data by introducing
new columns like period and weekday. The Apriori algorithm [15] is then used
to extract user-specific behavioral patterns based on time zone (period), weekday,
and location type, and a rule base is created. When latitude, longitude, and timestamp
information is provided to the recommender system, it first finds the user's location
types as per the association rules/patterns in the knowledge base. Then, the Google
Places API is used to suggest suitable locations based on the current latitude and
longitude information.
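The rule extraction can be mimicked without external libraries for rules of the form (weekday, period) → location type; the visit records and thresholds below are illustrative, and this is a single-level frequent-pattern sketch rather than the full Apriori candidate-generation procedure.

```python
from collections import Counter

# Each visit record: (weekday, period, location_category) -- illustrative data
visits = [
    ("Thursday", "Noon", "restaurant"),
    ("Thursday", "Noon", "restaurant"),
    ("Thursday", "Noon", "bank"),
    ("Sunday", "Morning", "temple"),
    ("Sunday", "Morning", "temple"),
]

def mine_rules(visits, min_support=0.2, min_confidence=0.5):
    """Return (weekday, period) -> category rules meeting both thresholds."""
    n = len(visits)
    full = Counter(visits)                         # counts of each full triple
    antecedent = Counter((d, p) for d, p, _ in visits)
    rules = []
    for (d, p, cat), cnt in full.items():
        support = cnt / n
        confidence = cnt / antecedent[(d, p)]
        if support >= min_support and confidence >= min_confidence:
            rules.append(((d, p), cat, support, confidence))
    return rules

for rule in mine_rules(visits):
    print(rule)
```

Here ("Thursday", "Noon") → "bank" is pruned: its support passes the threshold but its confidence (1/3) does not.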

3.1.1 Case Study 1

If the weekday is Thursday and the time zone is Noon, the top three location types
retrieved from the knowledge base are:

Applying the Google Places API, the recommendations provided are shown in Fig. 3.

3.2 Generalized Location-Based Recommendation System

We have used a dataset of hotels provided by GoIbibo. This pre-built dataset, one of
the most extensive available (over 33,344 hotels), was generated by data extraction
from goibibo.com, a leading travel site from India.
The K-means clustering technique [15] has been used to assign clusters to the hotels
based on user rating and image count. The elbow method, along with the silhouette
metric [16], is used to find the optimal value of the cluster number K. In the elbow
method, the within-cluster sum of squared errors (WSS) is calculated for a range of
values of K, and the value of K at which WSS first begins to diminish is selected. WSS
is defined as the sum of the squared errors for all the points. The distance metric used
for clustering is Euclidean distance. WSS (distortion) values for cluster sizes 1–19
are shown in Fig. 4. An elbow in the curve represents the optimal value of K. In the
silhouette metric, the silhouette value gives the similarity within a cluster, and a large
value indicates successful clustering. The silhouette value ranges between +1 and
−1. Silhouette values for cluster sizes 2–19 are shown in Fig. 4. A peak in the
curve represents the globally optimal value of K. As per the graphical results from
both methods, the optimal value of K (cluster count) is 4.
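The K selection by WSS (inertia) and silhouette score can be reproduced with scikit-learn; synthetic, well-separated blobs stand in for the hotel features here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for (user rating, image count) with 4 true groups
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

wss, sil = {}, {}
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wss[k] = km.inertia_               # within-cluster sum of squared errors
    sil[k] = silhouette_score(X, km.labels_)

best_k = max(sil, key=sil.get)        # peak of the silhouette curve
print(best_k)
```

The elbow in the WSS curve is read off visually, while the silhouette peak gives a direct numeric choice; on cleanly separated data the two agree.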
We have added the cluster number as a feature in our dataset. We have sorted (in
descending order) the hotels based on the image_count and hotel_star_rating
attributes. Image count represents the number of images uploaded by the visitors
of a specific hotel. Hotel star rating is the star rating that a customer gives to
a hotel based on its service satisfaction. Using the K-means clustering technique,

Fig. 3 Sample recommendations



Fig. 4 Silhouette Method versus Elbow Method for optimal K

clusters of hotels are created based on the Euclidean distance metric, and each hotel
is assigned to its cluster. When latitude and longitude information is provided to
the recommender system, it first predicts the cluster number to which the user
belongs and then recommends the top 5 hotels.

3.2.1 Case Study 2

If the latitude and longitude are given as 28.604700 and 77.223300, the
recommendations are shown in Table 1.
The above results outline the top 5 hotels that are recommended based on location-
based vicinity. The result also features the facilities, address, and geo-location
information in the form of latitude and longitude.

3.3 Mobile Application Development

We have developed two mobile applications for providing different location-based
services. In the first application, the Google location history of the user is fetched
to extract the patterns. This app obtains the current day/time and the user location
to provide personalized recommendations of various services around the user. This
is done by identifying patterns at the back end. Based on the patterns observed, areas
of similar interest are determined using the Google Maps Places API.
The second application uses the GoIbibo dataset and pushes recommendations
based on reviews/ratings in the user's vicinity. This app is developed using the Flutter
framework to create a cross-platform mobile app (Android/iOS). In the back end,
the app calls the clustering algorithm deployed as an API using a Heroku service.

Table 1 Sample recommendations

Property_name: The Taj Mahal Hotel (A Taj Hotel); Address: Number One Mansingh Road; Hotel_facilities: Beauty shop hairdresser/doctor on call; Latitude: 28.604700; Longitude: 77.223300
Property_name: Hotel Raj; Address: 8495, Arakashan Road, Paharganj; Hotel_facilities: Airport transfer available/surcharge|business; Latitude: 28.645871; Longitude: 77.215558
Property_name: Hotel Singh International; Address: 46, Padam Singh Road, Karol Bagh; Hotel_facilities: Airport transfer available/surcharge|babysitting; Latitude: 28.648933; Longitude: 77.188719
Property_name: Treebo Natraj Yes Please; Address: 1750, Laxmi Narayan Street, Chuna Mandi; Hotel_facilities: Airport transfer available/surcharge|currency; Latitude: 28.640776; Longitude: 77.209130
Property_name: Hotel Shivam International; Address: 1335 Gali Sangatrashan near Punjab and Sindh Bank; Hotel_facilities: Airport transfer available/surcharge|beauty; Latitude: 28.643564; Longitude: 77.23108

4 Conclusions

This research proposes two personalized location-based service models. The
proposed models are implemented as mobile applications. First, a generalized hotel
recommendation system based on reviews and image count is implemented; it is then
used to enhance and build a user-centric personalized recommender system. The
recommender derives patterns from the user's Google location travel history and
suggests the best locations based on the time and the day of the week.
Our present implementation is based on identifying the user's stay points over a
period of time and extracting the user's habits. However, information extraction could
be further improved by also analyzing the user trajectories. Studying frequently
traveled sites and the general trajectories the user adopts will help find the user's
tendency to travel on specific dates and at specific times.

References

1. Sahoo, S.: Location-based personalized recommendation systems for the tourists in India. Int.
J. Res. Appl. Sci. Eng. Technol. 1167–1177 (2017)
2. Bao, J., Zheng, Y., Wilkie, D., Mokbel, M.F.: A survey on recommendations in location-based
social networks. ACM Trans. Intell. Syst. Technol. 1–30 (2013)
3. Cumbreras, M.Á., Ráez, A.M., Díaz-Galiano, M.C.: Pessimists and optimists: improving
collaborative filtering through sentiment analysis. Expert Syst. Appl. 40, 6758–6765 (2013)
Recommendation System for Location-Based Services 561

4. Fenza, G., Fischetti, E., Furno, D., Loia, V.: A hybrid context aware system for tourist guidance
based on collaborative filtering. In: 2011 IEEE International Conference on Fuzzy Systems
(FUZZ-IEEE 2011), pp. 131–138. IEEE (2011)
5. Sarwar, B.: Item-based collaborative filtering recommendation algorithms. (2001)
6. Liu, S., Meng, X.: A location-based business information recommendation algorithm. Math.
Probl. Eng. (2015)
7. Tung, H., Soo, V.: A personalized restaurant recommender agent for mobile e-service. In: IEEE
International Conference on e-Technology, e-Commerce and e-Service. (2004)
8. Bao, J., Zheng, Y., Mokbel, M.F.: Location-based and preference-aware recommendation using
sparse geo-social networking data. In: Proceedings of the 20th International Conference on
Advances in Geographic Information Systems, pp. 199–208 (2012)
9. Mavalankar, A., Gupta, A., Gandotra, C., Misra, R.: Hotel recommendation system (2019).
arXiv:1908.07498
10. Huming, G., Weili, L.: A hotel recommendation system based on collaborative filtering and
rankboost algorithm. In: 2010 Second International Conference on Multimedia and Information
Technology, vol. 1, pp. 317–320. IEEE (2010)
11. Hlaing, H.H., Ko, K.T.: Location-based recommender system for mobile devices on University
campus. In: Proceedings of 2015 International Conference on Future Computational Technolo-
gies (ICFCT’2015); International Conference on Advances in Chemical, Biological & Envi-
ronmental Engineering (ACBEE) and International Conference on Urban Planning, Transport
and Construction Engineering (ICUPTCE’15), p. 7. (2015)
12. Babur, I.H., Ahmad, J., Ahmad, B., Habib, M.: Analysis of dbscan clustering technique on
different datasets using weka tool. Sci. Int. 27, 5087–5090 (2015)
13. Wang, F., Franco-Penya, H.H., Kelleher, J.D., Pugh, J., Ross, R.: An analysis of the application
of simplified silhouette to the evaluation of k-means clustering validity. In: International Confer-
ence on Machine Learning and Data Mining in Pattern Recognition, pp. 291–305. Springer,
Cham (2017)
14. Swara, G.Y.: Implementation of Haversine formula and best first search method in searching
of tsunami evacuation route. In: E&ES, vol. 97, no. 1, p. 012004 (2017)
15. Yuan, C., Yang, H.: Research on K-value selection method of K-means clustering algorithm.
Multidiscip. Sci. J. 2(2), 226–235 (2019)
16. Yabing, J.: Research of an improved apriori algorithm in data mining association rules. Int. J.
Comput. Commun. Eng. 2(1), 25 (2013)
17. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters
in large spatial databases with noise. In: Kdd, vol. 96, no. 34, pp. 226–231. (1996).
Optimal and Higher Order Sliding Mode
Control for Systems with Disturbance
Rejection

Ishwar S. Jadhav and Gajanan M. Malwatkar

Abstract This paper presents a higher order sliding mode control for a typical
unstable process to maintain the system's stability with disturbance rejection. The
control of uncertain systems with disturbance rejection is a difficult task in control
engineering applications. Non-linear uncertain systems have been studied by many
researchers in the control engineering field. In this paper, a second-order integral
sliding mode control (SMC) surface is chosen to derive the switching surface control.
The proposed controller design depends on the calculation of the poles of the system,
irrespective of whether they are stable or unstable, gives a practical value for
the control input signal, and is implemented for the system's nominal model. In
the optimal controller, the gains computed from the system poles are used to
derive the SMC law. In the presented work, the system's stable or unstable
poles give the proper value for the control input signal. The proposed technique's
significant advantages include disturbance rejection, insensitivity to variation in plant
variables, and ease of implementation. The simulation results show the advantage of
the designed SMC approach in stabilizing the system and its output responses.

Keywords Disturbance rejection · Higher order SMC · Robustness · Simulation ·


Uncertain system

I. S. Jadhav (B)
Department of Electronics & Telecomm Engineering, Godavari Foundation’s Godavari
College of Engineering, Jalgaon, India
G. M. Malwatkar
Department of Instrumentation Engineering, Government College of Engineering,
Jalgaon, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 563
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_53
564 I. S. Jadhav and G. M. Malwatkar

1 Introduction

The literature shows that the SMC approach has gained increasing attention in the
last few decades for controlling both certain and uncertain systems [1, 2].
Sliding mode control was first studied by Utkin [2]; the sliding behavior is achieved
by changing the controller's structure. The sliding mode control concept is mainly an
extension of the variable structure systems (VSS) based control strategy, in which the
control input is switched between two control signals. The system state trajectory is
driven onto a selected frame in the state space, called the sliding manifold, by
generating a proper VSS control signal. SMC is a powerful tool to design a very
robust and stable system; it eliminates all external disturbances that satisfy the
matching conditions [3]. SMC has been developed in different forms, such as dynamic SMC,
higher order SMC, and optimal SMC. These methods emphasize SMC's primary
advantages and focus on accuracy, robustness, and specific performances [4, 5]. The
approach proposed in this paper designs the sliding surface using the eigenvalues of
the input state variable matrix, with guidelines for some tuning parameters [6].
Due to their robustness properties and excellent invariance, VSS concepts
have been applied in real time, mainly to the control of servo motors [7], robotic
manipulators [8], permanent magnet synchronous servo motors, induction motors,
aircraft, spacecraft, and flexible space structures [9]. These experimental
examples confirm the theoretical results regarding the robustness of VSS with
sliding modes. However, the resulting control strategy is discontinuous, and the
chattering phenomenon therefore leads to lower accuracy in control applications [10].
These problems can be overcome by replacing the discontinuous term (a sign function)
in the control input computation with a continuous one. Larger errors generally occur
due to the discontinuity function in the controller [11]. Also, in small error regions
the system behaves as a high-gain system, the same as with the discontinuous control
strategy. Hence, SMC's high-gain effect based on VSS tolerates the uncertainties arising
from parameter variation, external disturbances, and load changes [12]. In the
literature, various VSS approaches, such as those in [13–15], and optimal approaches,
given in [16, 17], have been developed for acceptable and better performance in
robotic and industrial applications.
The remainder of the paper is organized as follows. In Sect. 2, the problem
statement is discussed. In Sect. 3, controller design approaches in state-space
form are emphasized. Section 4 describes the stabilizing control design in the SMC
technique with switching control and equivalent control. The stabilization concept,
with magnetic levitation (maglev) system applications, and concluding remarks are
included in Sects. 5 and 6.
Optimal and Higher Order Sliding Mode Control for Systems … 565

2 Problem Formulation

Consider an uncertain system subjected to parameter variation and external disturbances, described mathematically by the following state model:

dx(t)/dt = [A + ΔA(t)] x(t) + [B + ΔB(t)] u(t) + δ(t)    (1)

y(t) = [C] x(t)    (2)

In the above equations, x(t) ∈ ℝⁿˣ¹ denotes the state vector, u(t) ∈ ℝ is the system's
control input signal, and δ(t) ∈ ℝⁿ is the external disturbance vector. The matrices
A, B, and C are real constant matrices of proper dimensions, while ΔA(t) and ΔB(t)
represent the parametric uncertainties present in the system. The system uncertainty
and the disturbance are assumed to be unknown but bounded, together with their
derivatives. These uncertainties, along with the disturbances present in the system,
fulfill the matching condition, so the system can be written as

dx(t)/dt = A x(t) + B u(t) + d(t)    (3)

y(t) = C x(t)

where d(t) represents the disturbance arising in system (1). The main objective
of designing a robust controller is to reject the disturbance, track despite the
uncertainty in plant parameters, and stabilize the process. To achieve this objective,
SMC is combined with an optimal controller. In SMC, the combined controller signal u(t) is

u(t) = u₁(t) + u₂(t)    (4)

In Eq. (4), u₁(t) is the equivalent controller used to bring the state of the system
onto the sliding manifold (surface), and u₂(t) is the control input signal enforced
to keep the system state variables on the sliding manifold once they reach it.

3 The Integral Sliding Mode Controller

As per the linear transformation concept, it is known that the system disturbance and
uncertainties fulfill the matching conditions after the transformation. Mathematically,
the transformation is written as

x(t) = T z(t)    (5)



In Eq. (5), T represents the transformation matrix. After transformation of the
system, it can be written as

ΔÂ(t) z(t) + ΔB̂(t) u(t) + δ̂(t) = T⁻¹[ΔA(t) x(t) + ΔB(t) u(t) + δ(t)] = T⁻¹ B d(t) = B̂ d(t)    (6)

where

Â = T⁻¹AT = [0 1 0 … 0; 0 0 1 … 0; … ; a₁ a₂ a₃ … aₙ],  B̂ = T⁻¹B = [0, 0, …, bᵢ]ᵀ,  and Ĉ = CT
The introduced robust optimal controller converts the trajectory tracking
problem into a regulation control problem by calculating the error in the
given system. Hence, the optimal controller is useful to eliminate the effects of
system uncertainties and to reject disturbances, and it satisfies the matching condition
for both these cases. It is investigated that, for minimum control
input, the tracking error is minimized while the system state x₁(t) tracks a known
signal with the desired state. Let x_d(t) be the desired trajectory, expressed in
the transformed domain as z_d(t). In this case, the error e(t) can be represented as
follows:

eᵢ(t) = zᵢ(t) − z_{di}(t)    (7)

where z_d^{(1)}(t), …, z_d^{(n−1)}(t) denote the (n − 1) derivatives of z_d(t). Now, writing Eq. (1)
in transformed form and representing it in the error coordinates of
the system,

de(t)/dt = (Â + ΔÂ) e(t) + (B̂ + ΔB̂) u(t) + δ̂(t) + A_d(t)    (8)

where Â and B̂ are as represented in Eq. (6), and here


ΔÂ(t) = [0 0 0 … 0; 0 0 0 … 0; … ; Δa₁(t) Δa₂(t) Δa₃(t) … Δaₙ(t)],  ΔB̂ = [0, 0, …, Δb₁]ᵀ,  δ̂ = [0, 0, …, δ₁]ᵀ,

and A_d(t) = [0, 0, …, Δa₁ z_d(t) + Δa₂ z_d^{(1)}(t) + Δa₃ z_d^{(2)}(t) + ⋯ + Δaₙ z_d^{(n−1)}(t)]ᵀ.

Now, writing the expression in terms of the external disturbances and uncertainties
present in the system,

ΔÂ(t) e(t) + ΔB̂ u(t) + δ̂(t) + A_d(t) = B̂ φ(t)    (9)

where φ(t) denotes an unknown function which fulfills the matching condition,
together with its time derivatives. From Eqs. (8) and (9), with de(t)/dt = ė(t),

ė(t) = Â e(t) + B̂ (u₁(t) + u₂(t)) + B̂ φ(t)    (10)

4 The Proposed Controller Design Approach

The main focus is to derive the regulating control input u₁(t). Under nominal
conditions, ignoring the uncertain part, Eq. (10) becomes

ė(t) = Â e(t) + B̂ u₁(t)    (11)

The stabilizing control law u₁(t) is designed as

u₁(t) = −R⁻¹ B̂ᵀ P e(t) = −K e(t)    (12)

where K = [k₁, k₂, …, kₙ] is the gain matrix of tuning gains for stabilization
and robust performance of the system. The gain K is a function of the poles
of the system and can be related as

K ∈ eig(A) = (Pᵢ + j Qᵢ)    (13)

where i = 1, 2, …, n, and P₁ > P₂. The sub-gains are computed using the formulation

Kᵢ ∈ (Pᵢ² + Qᵢ²)    (14)

In this paper, the tuning parameters λᵢ ∈ (0.01, 0.99) are used for the system's
smoothness and robustness without compromising its performance. The various tuning
parameters can be designed and calculated as

K₁ = √(P₁² + Q₁²)/λ₁,  K₂ = (P₁² + Q₁²)/λ₂,  and, when unstable poles exist, K₃ = K₁/λ₃.

The parameters obtained using the poles' locations are the optimal values of the gains,
and these gains are used to get the desired performance of the system.
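The gain rule above can be sketched numerically. The matrix A and the λ values below are purely illustrative; only the K₁, K₂, K₃ formulas come from the text:

```python
import numpy as np

def gains_from_poles(A, lam1, lam2, lam3):
    """Sub-gains K1, K2, K3 from the dominant pole P1 + jQ1 of A, following
    the tuning rule of Sect. 4 (each lambda_i chosen in (0.01, 0.99))."""
    dom = max(np.linalg.eigvals(A), key=lambda p: p.real)  # most unstable pole
    P1, Q1 = dom.real, dom.imag
    K1 = np.sqrt(P1**2 + Q1**2) / lam1
    K2 = (P1**2 + Q1**2) / lam2
    K3 = K1 / lam3            # branch used when the system has unstable poles
    return K1, K2, K3

# Illustrative companion-form matrix with poles at +2 (unstable) and -1.
A = np.array([[0.0, 1.0],
              [2.0, 1.0]])
K1, K2, K3 = gains_from_poles(A, 0.5, 0.5, 0.5)
```

Smaller λᵢ yield larger gains, trading control effort for faster stabilization.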

4.1 SMC Design

The proposed controller with SMC applies minimum control effort to track despite the
uncertainty in the system. The above optimal controller strategy can be combined
with the proposed sliding surface based SMC described as follows. Assume an integral
sliding surface s(t) written as

s(t) = G [ e(t) − e₀ − ∫₀ᵗ φ̇(τ) dτ ]    (15)

where G is the design parameter matrix, selected in such a fashion that the matrix
G B̂ is non-singular, and φ̇(t) = Â e(t) + B̂ u₁(t). Let e₀ be the initial error condition,
which is constant; therefore

ṡ(t) = G ( ė(t) − φ̇(t) ).    (16)

This is represented as

ṡ(t) = G [ Â e(t) + B̂ (u₁(t) + u₂(t)) + B̂ φ(t) − Â e(t) − B̂ u₁(t) ]    (17)

or

ṡ(t) = G [ B̂ u₂(t) + B̂ φ(t) ].    (18)

In this integral sliding mode control (ISMC), observe that the reaching phase is
removed and the system states reach the sliding manifold within a short time
interval. In ISMC, the controller design based on achieving the reaching condition
is written as

ṡ(t) < ρ sgn(s(t))    (19)

where the constant ρ > 0 and sgn(s(t)) = 1, −1, 0 for s(t) > 0, s(t) < 0, s(t) = 0,
respectively. From the above three equations,

u₂(t) < (G B̂)⁻¹ [ ρ sgn(s(t)) + G B̂ φ(t) ].    (20)

The above equation shows that, due to the sign function, the switching control input
u₂(t) is influenced and becomes oscillatory; this is known as controller chattering.
To remove this chattering effect in ISMC, a second-order sliding manifold is designed
in the following two steps:

s(t) = G [ e(t) − ∫₀ᵗ φ̇(τ) dτ ]    (21)
0

and it follows that

ṡ(t) = G [ B̂ u₂(t) + B̂ φ(t) ].    (22)

The main advantage is that no initial condition is required for the sliding surface
designed above. Hence, driving all system states onto the sliding surface s(t)
requires a non-singular terminal sliding surface. In this case, the non-singular
terminal sliding surface is written as

σ(t) = s(t) + δ ṡ(t)^(α/β).    (23)

In the above equation, δ is the switching gain, generally selected as

δ > 0    (24)

and the terms α, β are chosen to fulfill the following conditions:

α, β ∈ {2n + 1}    (25)

where n denotes any integer value (i.e., α and β are odd), and

1 < α/β < 2    (26)

The second-order SMC combines the linear sliding surface with the non-linear terminal
sliding manifold σ(t). The constant-plus-proportional reaching law can be defined
as

σ̇(t) = −η₁ sgn(σ(t)) − γ₁ σ(t)    (27)

where η₁ > 0 and γ₁ > 0.
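The terminal surface (23) and the reaching law (27) can be sketched numerically; the default gains below are illustrative, and the sign/abs form of the fractional power is written out so negative ṡ is handled explicitly:

```python
import numpy as np

def terminal_surface(s, s_dot, delta=0.25, alpha=7, beta=5):
    """Non-singular terminal sliding variable sigma = s + delta * s_dot^(alpha/beta).
    alpha/beta is a ratio of odd integers; the sign/abs form keeps the
    odd-power behavior well defined for negative s_dot."""
    return s + delta * np.sign(s_dot) * np.abs(s_dot) ** (alpha / beta)

def reaching_law(sigma, eta1=1.0, gamma1=1.0):
    """Constant-plus-proportional reaching law for sigma_dot: the constant term
    (eta1) gives finite-time reaching, the proportional term (gamma1) speeds it up."""
    return -eta1 * np.sign(sigma) - gamma1 * sigma
```

For σ > 0 the reaching law is negative, and vice versa, so σ is always pushed toward zero.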

5 Results and Discussion

5.1 Magnetic Levitation (Maglev) System

To demonstrate the effectiveness of the proposed controller and compare it with Das and
Mahanta [3], simulation studies are conducted for vertical displacement tracking
of the maglev model. MathWorks MATLAB R2019b and Simulink are used to
implement the sliding mode controller for the magnetic levitation (maglev) system.
The position control of a maglev vehicle model has been extensively studied
in [3, 16]. The displacement in the vertical direction of the maglev system is shown in Fig. 1.
The aim of this simulation is to control the position of the ball in the vertical direction, x(t).

Fig. 1 Setup of a maglev suspension system

The system model with uncertain parameters is written as

dx(t)/dt = [A + ΔA(t)] x(t) + [B + ΔB(t)] u(t)    (28)

y(t) = C x(t)    (29)

where

A = [0 1 0; 0 0 1; 57000 1938 −16]    (30)

ΔA(t) = [0 0 0; 0 0 0; 57000 L_r(t)  1624 L_r(t)  16 L_r(t)]    (31)

and

x(t) = [x₁(t), ẋ₁(t), ẍ₁(t)]ᵀ    (32)

with

B = [0, 0, 14.25]ᵀ    (33)

ΔB(t) = [0, 0, 14.25 L_r(t)]ᵀ    (34)

C = [1, 0, 0]    (35)

and L_r(t) = 0.5 sin t is the uncertainty considered. The problem is to track x₁(t),
which is expected to follow x_d(t); that is, the sliding surface

s(t) = G [ e(t) − e₀ − ∫₀ᵗ φ̇(τ) dτ ]    (36)
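Before tuning, it is worth confirming the instability of the nominal model; a small sketch (not the authors' code) computing the open-loop poles of A from Eq. (30):

```python
import numpy as np

# Nominal maglev matrices from Eqs. (30) and (33).
A = np.array([[0.0,     1.0,    0.0],
              [0.0,     0.0,    1.0],
              [57000.0, 1938.0, -16.0]])
B = np.array([[0.0], [0.0], [14.25]])

poles = np.linalg.eigvals(A)
unstable = [p for p in poles if p.real > 0]
```

One real pole lies deep in the right half-plane, which is why the gain rule of Sect. 4 includes the unstable-pole branch for K₃.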

5.2 Case 1: Performance of Maglev with Cosine Trajectory x_d(t) = cos(t)

In this simulation, x_d(t) = cos(t) is the desired trajectory. The proposed method
is applied with x_d(t) = cos(t) and the results are obtained. During the simulation,
the required parameters are selected as α = 7, β = 5, σ = 0.25, η = 29300, G = 0.3. The
gain matrix computed by the proposed guideline is K = [2.945 ×
10⁴, 171.604, 6.9563]. The results are shown in Figs. 2 and 3, and it is clear that
the proposed method tracks the desired position while the method given by Das
and Mahanta [3] produces an offset. Systems like the maglev are very sensitive to
disturbances; therefore, any offset in tracking leads to instability of the system. The
proposed method is effective, as any change in the desired position is precisely
tracked. The precise tracking is achieved due to the optimal control law
designed using stable and unstable poles.


Fig. 2 Case 1: Vertical position of maglev. SO-SMC = second-order SMC [3]




Fig. 3 Case 1: Total control signal of maglev. SO-SMC = second-order SMC [3]

5.3 Case 2: Performance of Maglev with Cosine Trajectory x_d(t) = 2cos(2t)

During the simulation, the required parameters are selected as α = 7, β = 8, σ = 0.25, η =
29300, G = 0.3. The gain matrix computed by the proposed guideline is
K = [1.6827 × 10⁴, 162.1504, 6.9563]. It is found that the proposed stabilizing
sliding mode controller can properly track the desired trajectory x_d(t) = 2 cos(2t),
and the robust output tracking control performs better than the methodology designed
by Das and Mahanta [3]. The robustness and performance of the proposed controller
are verified against the matched disturbances and state-dependent uncertainties.
Figures 4 and 5 show that the presented method gives proper tracking of the desired
signal and rejection of the disturbances.


Fig. 4 Case 2: Vertical position of maglev. SO-SMC = second-order SMC [3]




Fig. 5 Case 2: Total control signal of maglev. SO-SMC = second-order SMC [3]

6 Conclusions

In this paper, an advanced integral stabilizing SMC and chattering-free control
scheme is designed for a typical system. A uniform dynamic maglev model is used
in the simulation study to analyze the position control tasks and their variation
from the desired trajectory. In the analysis and simulation, non-linear control
principles have been used extensively and effectively to ensure robustness. In this
work, the poles' locations are used in the design of the control input; therefore, the
chattering has been effectively addressed.
Further, excitation of the dynamic system without high-frequency oscillations
has been achieved. The presented control solution is close to the best possible control
function in the SMC technique. The proposed approach utilizes only the information
about the distance from the sliding mode manifold to obtain the minimized variation
of the control signal. It can be concluded that the gains obtained from the poles or
characteristics of the system are useful values for the system's robust and stable
performance. The gains obtained by the proposed algorithm show the best trade-off
between stability and robustness for the uncertain system.

References

1. Utkin, V.I.: Variable structure systems with sliding modes. IEEE Trans. Autom. Control 22(2),
212–222 (1977)
2. Utkin, V.I.: Sliding mode control design principles and applications to electric drives. IEEE
Trans. Ind. Electron. 40(1), 23–36 (1993)
3. Das, M., Mahanta, C.: Optimal second order sliding mode control for linear uncertain systems.
ISA Trans. 53(6), 1807–1815 (2014)
4. Emel’yanov, S.V.: Variable-Structure Control Systems. Nauka, Moscow (1967)

5. Khandekar, A.A., Malwatkar, G.M., Kumbhar, S.A., Patre, B.M.: Continuous and discrete
sliding mode control for systems with parametric uncertainty using delay ahead prediction. In:
Twelfth IEEE Workshop on Variable Structure Systems. Mumbai, India (2012)
6. Gao, W.: Variable structure control of non-linear systems: a new approach. IEEE Trans. Indus.
Electron. 40 (1993)
7. Wai, R.J., Lin, F.J.: Adaptive recurrent neural network control for linear induction motor. IEEE
Trans. Aerosp. Electron. Syst. 37(4) (2001)
8. Slotine, J.J., Sastry, S.S.: Tracking control of non-linear systems using sliding surfaces with
applications to robot manipulators. Int. J. Control 38, 465–492 (1983)
9. Takahashi, I., Koganezawa, T., Su, G., Ohyama, K.: A super high speed PM motor drive system
by a quasi-current source inverter. IEEE Trans. Ind. Appl. 30, 683–690 (1994)
10. Jezernik, K., Curk, B., Harnik, J.: Discrete-time chattering free sliding mode control. In: Pro-
ceedings of the Workshop on Robust Control via Variable Structure & Lyapunov Techniques,
pp. 319–324. Benevento (1994)
11. Roh, Y.H., Oh, J.H.: Sliding mode control with uncertainty adaptation for uncertain input-delay
systems. Int. J. Control (2000)
12. Bianchi, N., Bolognani, S., Jang, J.H., Sul, S.K.: Comparison of PM motor structures and sen-
sorless control techniques for zero-speed rotor position detection. IEEE Trans. Power Electron.
22, 2466–2475 (2007)
13. Gao, W.B., Wang, Y., Homaifa, A.: Discrete-time variable structure control systems. IEEE
Trans. Ind. Electron. 42(2), 117–122 (1995)
14. Liu, Z.Z., Chen, W., Lu, J., Wang, H., Wang, J.: Formation control of mobile robots using
distributed controller with sampled-data and communication delays. IEEE Trans. Control Syst.
Technol. 24(6), 2125–2132 (2016)
15. Ding, S., Park, J.H., Chen, C.-C.: Second-order sliding mode controller design with output
constraint. Automatica 112, 108704 (2020)
16. Shieh, N., Liang, K., Mao, C.: Robust output tracking control of an uncertain linear system via
a modified optimal linear-quadratic method. J. Optim. Theory 117(3), 649–59 (2003)
17. Malwatkar, G.M., Khandekar, A.A., Nikam, S.D.: PID controllers for higher order systems
based on maximum sensitivity function. In: 3rd International Conference on Electronics, vol.
1, pp. 259–263 (2011)
Synchronization and Secure
Communication of Chaotic Systems

Ajit K. Singh

Abstract Chaos synchronization generally demands two coupled chaotic systems,
with either one driving the other or the two mutually coupled. The objective is to
guide a stable system to synchronization even when the drive signal is chaotic. A
data signal can be embedded in a transmitter that produces a chaotic signal, which
remains impracticable to recover for anyone who has no knowledge of the transmitter;
if the receiver is a model of the chaotic transmitter, the data signal can be recovered.
As in every practical execution of a communication system, transmitter and receiver
circuits work under somewhat distinct conditions, so it is crucial to review the case
of discrepancy between the transmitter and receiver frameworks. These ideas have
been implemented for Chua's circuit and for an electronic circuit, together with its
numerical model, that is especially appropriate for digital communications. The
computation of a stability measure indicates the convergence of the directed system
to its stable state. The numerical simulation results reveal the viability of the
synchronization of chaotic systems and its implementation for secure communication.

Keywords Secure communication · Chaotic signal · Synchronization · Chaotic


system

1 Introduction

The synchronization of chaotic systems has received considerable recognition
among researchers working in many fields due to its inherent properties. One of
the crucial areas is secure communications, in which a primary approach is chaos
masking [1, 2]. Chaotic systems have been used in secure communications and as
random number generators on account of these properties, and they also appear in
other fields in the literature. In the last few decades, many independent techniques
related to synchronization have been used, where the aim is to construct a controller to
attain synchronization of the nonlinear circuit system [3]. For the synchronization

A. K. Singh (B)
Department of Mathematics, Amity University Maharashtra, Mumbai 410206, India
e-mail: ajit.brs@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 575
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_54
576 A. K. Singh

problem, a chaotic system is studied in a drive-response configuration. The main aim is
to synchronize the response system with the drive system by adding a controller with
a signal [4, 5].
The rest of the article is organized in the following way. A literature review is
presented in Sect. 2. In Sect. 3, the proposed methodology deals with the mathematical
model and the synchronization of the model in general; it also includes the circuit
description as well as the application of synchronization to secure communication.
Synchronization of Chua's circuit and the numerical simulation are described under
results and discussion in Sect. 4. Finally, conclusions are drawn in Sect. 5.

2 Literature Review

Secure communication based on the synchronization of chaotic systems has
appeared as a standard design for transmitting information using chaotic systems, in
which an informative signal is embedded in the transmitter system. This produces a
chaotic signal, and the informative signal is then retrieved by the receiver system [6–8].
Designs for chaotic communication include masking, shift keying, and modulation.
In chaos masking, the informative signal is added directly to the transmitter output.
In shift keying, the informative signal is assumed to be binary and is represented
through the transmitter and receiver. Modulation depends on drive-response
synchronization, in which the informative signal is introduced into the transmitter through
a nonlinear filter. If the transmitter and the receiver are synchronized, in all three
instances the receiver will retrieve the information signal [9].
Yang and Chua [10] initiated an important secure communication method, in
which an encryption rule encrypts the information signals. Synchronization of
chaotic systems using this method was discussed in [11–13]. After that, these systems
and synchronization techniques were applied in secure communication based
on such systems [14, 15].

3 The Proposed Methodology

3.1 Mathematical Model

Let us consider a chaotic drive system as

Transmitter: dX₁(t)/dt = B X₁(t) + f₁(X₁(t))    (1)
and a chaotic response system as
Synchronization and Secure Communication of Chaotic Systems 577

Receiver: dX₂(t)/dt = C X₂(t) + f₂(X₂(t)) + u(t)    (2)

Controller: u(t),

where X₁ = [x₁, y₁, …, z₁]ᵀ ∈ ℝⁿ and X₂ = [x₂, y₂, …, z₂]ᵀ ∈ ℝⁿ are the state
vectors, B, C ∈ ℝⁿˣⁿ are constant matrices, f₁, f₂ : ℝⁿ → ℝⁿ are nonlinear
functions, and u(t) ∈ ℝⁿ is the control function to be designed.

3.2 Synchronization

Defining the synchronization error state vector as

e(t) = X₂(t) − X₁(t)

the error dynamics are obtained as

de(t)/dt = C e(t) + F(X₁(t), X₂(t)) + u(t)    (3)

where F(X₁(t), X₂(t)) = f₂(X₂(t)) − f₁(X₁(t)) + (C − B) X₁(t). To stabilize
the error system (3), an appropriate control function is chosen by the synchronization
method.
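One common way to choose u(t) (an illustrative active-control choice, not necessarily the specific law used later in the paper) is to cancel F and impose stable linear error dynamics:

```python
import numpy as np

# Active-control sketch for the error system (3): with
#   u(t) = -F(X1, X2) - (C + K) e(t),  K positive definite,
# the closed error dynamics become e_dot = -K e, which decays to zero.
# (C, K and the mismatch term F below are illustrative.)
C = np.array([[0.0, 1.0],
              [-1.0, 0.5]])
K = np.diag([5.0, 5.0])

def e_dot(e, F):
    u = -F - (C + K) @ e
    return C @ e + F + u          # algebraically equal to -K @ e

# Euler-integrate the error under an arbitrary bounded mismatch term F.
e = np.array([1.0, -2.0])
for _ in range(2000):
    F = 0.1 * np.array([np.sin(e[0]), np.cos(e[1])])   # toy nonlinearity
    e = e + 0.01 * e_dot(e, F)
```

Whatever bounded F acts on the system, the cancellation leaves ė = −Ke, so the error converges regardless of the nonlinearity.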

3.3 Circuit Description

Both transmitter and receiver are uniform circuits involving an integrator-based
second-order RC resonance loop, a comparator, an exclusive-OR (XOR) gate, an input
for the external source, and a buffer to avoid overloading the XOR gate. A message
to be encoded can be an arbitrary sequence of square impulses of some interval or an
even more complicated signal. The transmitter and receiver system
is displayed in Fig. 1.
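The XOR-based encoding of Fig. 1 can be sketched at bit level; this is an illustration of the principle only, since the real circuit operates on chaos-derived analog pulses:

```python
def xor_mask(bits, key_bits):
    """Transmitter: message XOR chaos-derived key.  Receiver: XOR again with
    the synchronized copy of the key to recover the message."""
    return [b ^ k for b, k in zip(bits, key_bits)]

message = [1, 0, 1, 1, 0, 0, 1, 0]
key     = [0, 1, 1, 0, 1, 0, 0, 1]   # stand-in for digitized chaotic pulses
cipher  = xor_mask(message, key)
recovered = xor_mask(cipher, key)    # works only with a synchronized key
```

Recovery succeeds only when the receiver's chaotic key stream is synchronized with the transmitter's, which is exactly what the synchronization scheme provides.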

3.4 Application of Synchronization to Secure


Communication

The entire structure's reliability depends on the frameworks and chaotic systems
applied to the transmitter and receiver systems. The structure of the chaotic systems
could be altered by plugging in distinct circuits to attain the tremendous security
required in military exercises.

Fig. 1 Transmitter-receiver system diagram

In commercial applications, the structure of the chaotic systems could remain
identical in each mobile station. Different frameworks of chaotic systems could be
applied at non-identical mobile stations to attain a significant degree of security. Even
if one can retrieve all the digitized synchronization impulses, the characteristics of a
chaotic system cannot be restored from the synchronization impulses without knowing
the frameworks. Therefore, the several chaotic cryptanalysis results on low-dimensional
secure communication systems created the impression that such communication schemes
were not very secure. It can still be sensible to utilize chaos-oriented secure
communication systems, and systems of this type may pose additional problems for
synchronization.

Using a standard method with a chaotic system, the security of low-dimensional
secure communication schemes can be enhanced in other ways: make the transmitted
signal more complicated and shrink the redundancy in the transmitted signal. To
achieve these aims, synchronization provides an up-and-coming technique [16–18].

4 Results and Discussions

4.1 Synchronization

Chua's circuit [19, 20] is made up of two capacitors, one inductor, one piecewise-
linear nonlinear resistor, and one linear resistor. The mathematical description of the
circuit is given as

dx/dt = α (y − x − f(x))
dy/dt = x − y + z    (4)
dz/dt = −β y

where f(x) = b x + 0.5 (a − b) [|x + 1| − |x − 1|]. The voltages over the two capacitors
are represented by the variables x and y, and the current through the inductor is
represented by the variable z. For the parameter values a = −8/7, b = −5/7, α = 9,
and β = 100/7, system (4) exhibits chaotic behavior.
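System (4) can be reproduced numerically; a sketch using the fourth-order Runge-Kutta scheme mentioned in Sect. 4.2 (step size 0.001):

```python
import numpy as np

a, b, alpha, beta = -8/7, -5/7, 9.0, 100/7

def chua(state):
    """Right-hand side of Chua's circuit, Eq. (4)."""
    x, y, z = state
    f = b * x + 0.5 * (a - b) * (abs(x + 1) - abs(x - 1))
    return np.array([alpha * (y - x - f), x - y + z, -beta * y])

def rk4_step(state, h=0.001):
    """One classical fourth-order Runge-Kutta step."""
    k1 = chua(state)
    k2 = chua(state + 0.5 * h * k1)
    k3 = chua(state + 0.5 * h * k2)
    k4 = chua(state + h * k3)
    return state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

state = np.array([0.5, 0.5, -0.5])   # drive-system initial condition below
for _ in range(10000):               # 10 s of trajectory
    state = rk4_step(state)
```

The trajectory stays bounded on the double-scroll attractor while never settling to a fixed point, which is the chaotic behavior exploited below.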
Consider two Chua's circuit chaotic systems: a drive system, with its three state
variables denoted by subscript d, and a response system with the same equations,
denoted by subscript r. The initial condition of the drive system is x_d(0) = 0.5, y_d(0) = 0.5,
z_d(0) = −0.5, which is different from the initial condition of the response system
x_r(0) = 1, y_r(0) = 1, z_r(0) = −0.2. The two Chua's circuit systems are then
represented, respectively, by the equations

d xd
= α (yd − xd − f (xd ))
dt
dyd
= xd − yd + z d
dt (5)
dz d
= −βyd
dt
f (xd ) = bxd + 0.5 (a − b) [|xd + 1| − |xd − 1|]
580 A. K. Singh

and

dxr/dt = α(yr − xr − f(xr)) + u1
dyr/dt = xr − yr + zr + u2                               (6)
dzr/dt = −βyr + u3
f(xr) = bxr + 0.5(a − b)[|xr + 1| − |xr − 1|].

Three control functions u1, u2, and u3 are inserted into system (6). Phase portraits of the chaotic system are depicted in Fig. 2.

4.2 Numerical Simulation

The chaotic systems are solved using the fourth-order Runge–Kutta method with a time step size of 0.001. In the simulations, the parameters are taken as a = −8/7, b = −5/7, α = 9, and β = 100/7, for which Chua’s circuit exhibits chaotic behavior in the absence of the control functions. The initial condition of the drive system is xd(0) = 1, yd(0) = 4, zd(0) = −4, and that of the response system is xr(0) = 4, yr(0) = 1, zr(0) = 4. Hence, the error system starts from e1(0) = −3, e2(0) = 3, e3(0) = −8. Figure 3 exhibits that the Chua’s circuit systems have been asymptotically synchronized. The error plot of the drive and response systems is presented in Fig. 4.
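The simulation just described can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the paper does not give the control law in this section, so the sketch assumes simple linear error feedback, u_i = k·e_i with an arbitrarily chosen gain k = 20, together with the stated parameters, step size 0.001, and initial conditions.

```python
def f(x, a=-8/7, b=-5/7):
    # Chua's piecewise-linear nonlinearity f(x) = bx + 0.5(a - b)(|x+1| - |x-1|)
    return b * x + 0.5 * (a - b) * (abs(x + 1) - abs(x - 1))

def chua(s, u=(0.0, 0.0, 0.0), alpha=9.0, beta=100/7):
    # Right-hand side of systems (4)/(6); u carries the controls u1, u2, u3
    x, y, z = s
    return (alpha * (y - x - f(x)) + u[0],
            x - y + z + u[1],
            -beta * y + u[2])

def rk4_step(deriv, s, dt):
    # One classical fourth-order Runge-Kutta step on a state tuple
    def ax(a, b, c):  # componentwise a + c*b
        return tuple(ai + c * bi for ai, bi in zip(a, b))
    k1 = deriv(s)
    k2 = deriv(ax(s, k1, 0.5 * dt))
    k3 = deriv(ax(s, k2, 0.5 * dt))
    k4 = deriv(ax(s, k3, dt))
    return tuple(si + dt / 6 * (p + 2 * q + 2 * r + w)
                 for si, p, q, r, w in zip(s, k1, k2, k3, k4))

# Initial conditions from Sect. 4.2
drive = (1.0, 4.0, -4.0)
resp = (4.0, 1.0, 4.0)
dt, gain = 0.001, 20.0  # gain is an assumed feedback strength, not from the paper

for _ in range(int(10 / dt)):                         # integrate t in [0, 10]
    e = tuple(d - r for d, r in zip(drive, resp))     # errors e1, e2, e3
    u = tuple(gain * ei for ei in e)                  # assumed control u_i = k * e_i
    drive = rk4_step(chua, drive, dt)
    resp = rk4_step(lambda s: chua(s, u), resp, dt)

print(max(abs(a - b) for a, b in zip(drive, resp)))   # max componentwise error
```

Running the sketch, the error components shrink toward zero, mirroring the asymptotic synchronization seen in Figs. 3 and 4.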

5 Conclusions

Synchronization of the transmitter–receiver systems is achieved, as demonstrated through numerical simulation of the considered mathematical model. Even when the unknown variables of the drive and response systems are non-identical, the scheme remains robust with respect to the employed unknown variables, and the information signal can be retrieved reliably from the approximated variable values. The nonlinear Chua’s circuit, which has great importance in electrical engineering and physics, was taken as the illustrative chaotic system, and the simulations pointed out the possibility of observing synchronization of chaos numerically. This study thus shows that this simple circuit can be effectively utilized in secure communication.

Fig. 2 Phase portraits of the chaotic system



Fig. 3 State trajectories of the drive and response systems



Fig. 4 Error plot of drive and response systems

Acknowledgements The author, Dr. A. K. Singh, is extending his gratitude to the Ph.D. thesis
supervisor Prof. S. Das, Department of Mathematical Sciences, Indian Institute of Technology
(BHU), Varanasi-221005, India, for the continuous guidance.

References

1. Runzi, L., Yinglan, W.: Finite-time stochastic combination synchronization of three different
chaotic systems and its application in secure communication. Chaos: An Interdisc. J. Nonlinear
Sci. 22(2), 023109 (2012)
2. Singh, A.K., Yadav, V.K., Das, S.: Synchronization of time-delay chaotic systems with uncer-
tainties and external disturbances. Discontin. Nonlinear. Complex. 8(1), 13–21 (2019)
3. Miliou, A.N., Antoniades, I.P., Stavrinides, S.G., Anagnostopoulos, A.N.: Secure communi-
cation by chaotic synchronization: Robustness under noisy conditions. Nonlinear Anal. Real
World Appl. 8(3), 1003–1012 (2007)
4. Singh, A.K., Yadav, V.K., Das, S.: Dual combination synchronization of the fractional order
complex chaotic systems. J. Comput. Nonlinear Dyn. 12(1), 011017 (2017)
5. Dasgupta, T., Paral, P., Bhattacharya, S.: Fractional order sliding mode control based chaos
synchronization and secure communication. In: 2015 International Conference on Computer
Communication and Informatics (ICCCI). pp. 1–6. IEEE (2015)
6. Martínez-Guerra, R., García, J.J.M., Prieto, S.M.D.: Secure communications via synchroniza-
tion of Liouvillian chaotic systems. J. Franklin Inst. 353(17), 4384–4399 (2016)
7. Singh, A.K., Yadav, V.K., Das, S.: Synchronization between fractional order complex chaotic
systems with uncertainty. Optik 133, 98–107 (2017)
8. Naderi, B., Kheiri, H.: Exponential synchronization of chaotic system and application in secure
communication. Optik 127(5), 2407–2412 (2016)
9. Kwon, O., Park, J.H., Lee, S.: Secure communication based on chaotic synchronization via
interval time-varying delay feedback control. Nonlinear Dyn. 63(1–2), 239–252 (2011)
10. Yang, T., Chua, L.O.: Impulsive stabilization for control and synchronization of chaotic sys-
tems: theory and application to secure communication. IEEE Trans. Circuits Syst. I: Fundam.
Theory Appl. 44(10), 976–988 (1997)

11. Gao, X., Hu, H.: Adaptive-impulsive synchronization and parameters estimation of chaotic
systems with unknown parameters by using discontinuous drive signals. Appl. Math. Model.
39(14), 3980–3989 (2015)
12. Yang, J., Chen, Y., Zhu, F.: Associated observer-based synchronization for uncertain chaotic
systems subject to channel noise and chaos-based secure communication. Neurocomputing
167, 587–595 (2015)
13. Singh, A.K., Yadav, V.K., Das, S.: Synchronization between fractional order complex chaotic
systems. Int. J. Dyn. Control 5(3), 756–770 (2017)
14. Al-Hussaibi, W.: Effect of filtering on the synchronization and performance of chaos-based
secure communication over rayleigh fading channel. Commun. Nonlinear Sci. Numer. Simul.
26(1–3), 87–97 (2015)
15. Singh, A.K., Yadav, V.K., Das, S.: Nonlinear control technique for dual combination synchro-
nization of complex chaotic systems. J. Appl. Nonlinear Dyn. 8(2), 261–277 (2019)
16. Tsimring, L.S., Sushchik, M.M.: Multiplexing chaotic signals using synchronization. Phys.
Lett. A 213(3–4), 155–166 (1996)
17. Martinez-Guerra, R., Yu, W.: Chaotic synchronization and secure communication via sliding-
mode observer. Int. J. Bifurcat. Chaos 18(01), 235–243 (2008)
18. Lian, K.Y., Chiang, T.S., Chiu, C.S., Liu, P.: Synthesis of fuzzy model-based designs to syn-
chronization and secure communications for chaotic systems. IEEE Trans. Syst. Man, Cybern.
Part B (Cybernetics) 31(1), 66–83 (2001)
19. Chua, L.O., Itoh, M., Kocarev, L., Eckert, K.: Chaos synchronization in Chua’s circuit. J.
Circuits, Syst. Comput. 3(01), 93–108 (1993)
20. Murali, K., Lakshmanan, M.: Chaotic dynamics of the driven Chua’s circuit. IEEE Trans.
Circuits Syst. I: Fundam. Theory Appl. 40(11), 836–840 (1993)
Improvement in Ranking Relevancy
of Retrieved Results from Google Search
Using Feature Score Computation
Algorithm

Swati Borse and B. V. Pawar

Abstract Websites with a higher position in search engine ranking results directly and positively affect the number of visitors to such sites. Search engine optimization (SEO) has become a thriving business that attempts to improve websites’ rankings. Sometimes, search engine results contain undeserving websites at top ranks because SEO techniques have been used in an unethical way. This misleads the search engine and thereby increases the page rank of unfit websites. Such results downgrade the performance of search engines and frustrate users, so these irrelevant pages must be moved down from the top of the search results to improve search engine quality. This paper analyzes Google results and proposes a novel approach to move down the top-ranking irrelevant Google search engine results. A ‘Feature Score Computation’ algorithm is presented to compute scores based on the features found in pages; using these scores, the pages are re-ranked to move down irrelevant results and uplift the relevant ones. The relevancy accuracy of the corpus results was 88%, and after applying the algorithm it improved to 99%. This work thus improves the ranking of relevant results efficiently.

Keywords Google · Search engine optimization · Rank · Feature · Score

1 Introduction

The motivation behind designing a search engine is to return relevant search results to a user. Generally, the user enters a query into a search engine and expects a list of the most relevant websites. To determine which pages are most appropriate, search engines match the search keywords against their database, select exact or partial keyword matches, and display the ranked search results. Users are interested in the top-ranking results: if a website gets a place in the top-ranking results, it can gain more visitors, and more visitors mean business growth. The website owners

S. Borse (B)
S.S.V.P.S’s L.K. Dr. P.R. Ghogrey Science College, Dhule, India
B. V. Pawar
School of Computer Sciences, North Maharashtra University, Jalgaon, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022 585
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9_55

compete for higher rankings in search engines so that more people visit their websites and more revenue is generated; that is also the reason search engines keep their ranking algorithms secret. To promote a website into the top search results, website owners use other means. The most prominent method of promoting a website in the search engine result list is the use of search engine optimization (SEO) techniques [1]. SEO is the process of increasing visitors to a website from the search engine’s listing for selected keywords [2]. SEO engineers try to find innovative ways to influence search rankings, using ethical (white-hat SEO) tactics or sometimes even unethical (black-hat SEO) techniques. Manipulation using unethical SEO tactics can be done with irrelevant content on a page, excessive and unnecessary links, redirection, cloaking, click fraud, and manipulation of tag contents. Search engines emphasize certain top-ranking factors, and web page features are among them. Specific properties of a web page that are used to mislead a search engine into pushing undeserving sites to the top of the result page are called features; they include the number of links pointing to other pages, the frequency or location of keywords, and the presence of keywords in the title tag, meta description tag, H1 tag, anchor text, etc. [3]. Such manipulated sites do not benefit the user but instead lead to the problem of webspam.
Although search engines continuously invest a lot of money and effort in fighting this, search results still contain irrelevant links, which waste users’ time by lowering the quality of search results and unnecessarily increasing the traffic load.
It is therefore necessary to identify irrelevant sites in the top search results and move them downwards so that relevant results get the positions they deserve. For this purpose, we proposed a ‘Feature Score Computation’ algorithm [4]. The algorithm reads the source code of web pages, checks for the presence of each feature, assigns weights according to specific parameters, and computes a feature score accordingly. A total score is then calculated for each page by adding all its feature scores. This procedure was repeated for the top 20 search results. The web pages were then rearranged in descending order of total score, and a predicted ranking was assigned to each page. Google search results were collected as a database corpus, and the actual Google results were compared with the results after re-ranking. The predicted relevancy of the Google results improved from 88 to 99%.

2 Related Work

Search engines continually attract many visitors searching for information, and a search engine is also an essential website promotion channel for commercial sites. Kehoe and Pitkow report that 80% of people use a search engine as a starting point [5]. Among all search engines on the web, Google is widely favored [6]. Several researchers have evaluated search engine performance on parameters such as precision, recall, relevance, duplication, degree of overlap, and others. Chu and Rosenthal compared the search engines Alta Vista, Excite, and Lycos, taking ten queries and examining the top 10 results of each with categories such as irrelevant, somewhat relevant, and relevant; they claim that Alta Vista performs better than Excite and Lycos with respect to precision and search facilities [7]. Gordon and Pathak evaluated
eight search engines using thirty-three queries over the top two hundred result links, categorizing results as highly irrelevant, somewhat irrelevant, somewhat relevant, and highly relevant. Among the selected eight search engines, Open Text, Lycos, and Alta Vista performed best with higher precision and higher recall [8]. Shang and Longzhuang evaluated six leading search engines with 300 queries; they computed relevance scores for the hits and ranked the search engines based on a statistical comparison of those scores, reporting that Google performs the best among the six [6]. Su evaluated four search engines over the top 20 links and proposed 16 performance measures, with evaluation parameters such as efficiency, user satisfaction, relevance, utility, and connectivity; Alta Vista had the highest precision among the four [9]. Griesbaum evaluated three search engines, Google, Lycos, and Alta Vista (German), for the accuracy of the top 20 results on 50 randomly selected queries; the results of Google were significantly better than those of Alta Vista, while Google and Lycos showed no significant difference [10]. Vaughan and Thelwall compared three search engines with four queries over the first 20 links; human evaluation was used to check ranking quality, top-ranked pages were retrieved, and the results were stable over ten weeks. These evaluations used multiple search engines for comparison, and the relevancy of search results was determined using various methods to examine the search engines’ usability and precision [11]. Olakekan evaluated the performance of five popular search engines based on the quantity of documents retrieved, response time, accuracy, and advert content: Google, MSN, and Yahoo showed high document retrieval quality and capacity with low response time, good precision, and suitable advert content, while Alta Vista was excellent in both relevance and advertisement [12]. Singh and Sharan compared the semantic search performance, measured by precision ratio, of keyword-based search engines (Yahoo, Google) and semantic-based search engines (DuckDuckGo, Bing, Hakia), classifying the top 20 documents for ten selected queries as relevant or non-relevant. Among the selected search engines, Bing retrieved the most relevant results; for one of the queries, Hakia and Google performed better [13].
All of the above works compare search engines with respect to relevance. No study could be located that deals explicitly with non-relevant results appearing in the top-ranked list due to the unethical use of SEO techniques, and no previous work has been found that moves irrelevant results down the search engine’s result list to improve search engine performance. As search engines grow exponentially, much research is still required to return relevant and accurate results in the top-ranking list.

Table 1 Ten randomly chosen queries used for evaluation


Beauty PSP Insurance iPod Pasta pizza
3D TV House plan Graphic design Deluxe room Search engine optimization

3 Methodology

3.1 Query Selection and Relevancy Checking

Google performs the best with respect to relevancy, quality, natural language processing, and popularity [6, 10, 12, 13], so in this research Google was selected to collect search results. Ten queries made up of 1, 2, or 3 words were chosen randomly, as given in Table 1.
After entering each query into the Google search engine, the resulting URLs were displayed; each URL was clicked, and the resulting web page was stored in a file. All result pages were collected for each query. Generally, the user is not interested in search results beyond the top 10–20 results, so the top 20 results were selected for the initial study. The top 20 weblinks were evaluated manually to check relevancy, and every document was classified as “relevant” or “non-relevant”. A sample of the results for the query “beauty” is shown in Table 2.

3.2 Parameter Setting

Each website was studied manually, and 52 features that play a significant role in ranking were found. From these, the ten most important features [3] were selected. SEO techniques help websites improve their ranking in search results through keyword formatting, high-quality backlinks, and quality content. The major keyword-formatting features selected were: keyword present in the H1 tag, title tag, meta tag, meta description tag, anchor text, URL path, and domain name; keyword position in the title tag; keyword density in the title tag; as well as the outgoing links count and total links. A precise weight is defined for each feature according to its location and occurrence.

3.3 Processing of Parameters

The Feature Score Computation algorithm [4] was implemented in C# in the .NET environment, and a system was developed that reads the source code of the web pages stored in files and checks whether each selected feature is present or absent. Accordingly, a weight was assigned to each feature, and a score was computed for each feature present.

Table 2 Evaluation of relevancy or non-relevancy of the query result


Rank URL Relevant (Y/N)
1 http://www.google.co.in/images Yes
2 http://en.wikipedia.org/wiki/Beauty Yes
3 http://en.wikipedia.org/wiki/American_Beauty_(film) No
4 http://www.webindia123.com/women/tips/beauty.htm Yes
5 http://www.webindia123.com/women/Beauty/skintip.htm Yes
6 http://www.youtube.com Yes
7 http://www.indiaparenting.com/beauty/index.cgi Yes
8 http://www.imdb.com/title/tt0169547/ No
9 http://www.imdb.com/title/tt0101414/ No
10 http://www.beautytipshub.com Yes
11 http://www.hindustanlink.com/beautytex/beauty.htm Yes
12 http://www.hindustanlink.com Yes
13 http://www.beautyindia.in/ Yes
14 http://beauty.iloveindia.com/ Yes
15 http://beauty.iloveindia.com/basic-tips/index.html Yes
16 http://beauty.about.com/ Yes
17 http://www.mag4you.com/health/ Yes
18 http://www.beautycarehealth.com/ Yes
19 http://www.bwcindia.org/ No
20 http://www.indiatogether.org/manushi/issue145/lovely.htm Yes

The total score of each page was then calculated from the scores of the features present on that page, and the web pages were arranged in descending order of total score. The predicted ranking of each web page was compared with the original Google ranking to carry out the analysis. The criteria for assigning precise weights to each feature are shown in Table 3.
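As a concrete illustration of this step, the scoring and re-ranking can be sketched as follows. The paper's actual system was written in C# on .NET and uses all ten features and thresholds of Table 3; this Python sketch is hypothetical, reducing the checks to a handful of binary keyword-placement features, each worth the weight 10 from Table 3, and the example pages are invented.

```python
import re

WEIGHT = 10  # weight per feature present (Table 3)

def feature_score(html, url, keyword):
    # Simplified binary presence checks for a few of the Table 3 features
    html, kw = html.lower(), re.escape(keyword.lower())
    checks = [
        re.search(rf"<title>[^<]*{kw}", html),     # keyword in title tag
        re.search(rf'<meta[^>]*description[^>]*content="[^"]*{kw}', html),
        re.search(rf"<h1[^>]*>[^<]*{kw}", html),   # keyword in H1 tag
        keyword.lower() in url.lower(),            # keyword in URL path / domain
        re.search(rf"<a [^>]*>[^<]*{kw}", html),   # keyword in anchor text
    ]
    return sum(WEIGHT for hit in checks if hit)

def re_rank(results, keyword):
    # results: list of (url, page_html); sort by descending total score
    return sorted(results, key=lambda p: -feature_score(p[1], p[0], keyword))

# Hypothetical mini-corpus for the query "beauty"
pages = [
    ("http://example.org/film-review", "<title>A film review</title>"),
    ("http://example.org/beauty-tips",
     '<title>beauty tips</title><h1>beauty basics</h1>'
     '<a href="/x">beauty advice</a>'),
]
for url, _ in re_rank(pages, "beauty"):
    print(url)  # the beauty-tips page is ranked first
```

The real system additionally applies the threshold rules of Table 3 (keyword position and density in the title, mean link counts), which would replace several of the binary checks above.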

4 Results and Discussions

The precision value was compared for the search results of the selected queries. Precision is computed as the fraction of retrieved documents that are relevant according to the user’s perception:

Precision = (Total number of relevant documents) / (Total number of retrieved documents)
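For instance, the precision values in Table 4 follow directly from this definition; a one-line check for the query “PSP”, which had 5 irrelevant results among its top 20:

```python
def precision(relevant_retrieved, total_retrieved):
    # Fraction of retrieved documents that are relevant
    return relevant_retrieved / total_retrieved

# Query "PSP": 5 of the top 20 results were irrelevant (Table 4)
print(precision(20 - 5, 20))  # → 0.75
```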

Table 3 Defining weight to each feature


Sr. no Feature Weight (Wi )
1 Search keyword present in the title tag 10
2 Search keyword present in Meta description 10
3 Search keyword present in the H1 tag 10
4 Search keyword present in the URL path 10
5 Search keyword in Domain name 10
6 At the first position in the title tag, search keyword present 10
At the second position in the title tag, search keyword found 09
Search keyword present after the second position in the title tag 00
7 Density of search keyword present in title tag <= mean density 10
Density of search keyword present in title tag > mean density 00
8 Search keyword count in anchor text <= mean count 10
Search keyword count in anchor text > mean count 00
9 Search keyword Mean outgoing links >= Total outgoing links 10
Search keyword Mean outgoing links < Total outgoing links 00
10 Total links in page <= total mean links (keyword present) 10
Total links in page > total mean links (keyword present) 00

Table 4 depicts the total number of irrelevant results found and the precision value
computed for the top 20 results for each of the queries.
The precision value is 1 for the queries “house plan” and “insurance” because all of their top 20 results are relevant. The query “PSP” has the lowest precision value, 0.75. A graphical representation of the number of irrelevant results and the precision values is shown in Fig. 1.

Table 4 Precision for results of search engine Google

Query No. of irrelevant results Precision
3d TV 2 0.9
Beauty 3 0.85
Deluxe room 3 0.85
House plan 0 1.0
Graphic design 2 0.9
Insurance 0 1.0
Ipod 1 0.95
Pasta pizza 3 0.85
PSP 5 0.75
SEO 1 0.95

Fig. 1 Retrieval of irrelevant results and their precision value

According to Fig. 1, Google returned the most irrelevant results for the query “PSP”: among its top twenty results, five were irrelevant, giving the lowest precision value of 0.75. When each page returned for “PSP” was examined, it was realized that the query is ambiguous, since “PSP” is an abbreviation with multiple meanings. The query was intended for the “PlayStation Portable.” However, the retrieved results served various purposes, containing web pages regarding the “Personal Software Process,” “Progressive Supranuclear Palsy,” the “Dynamic PSP” technology for Oracle 11g, and “PSP Video Express” software for PSP video conversion. Such results lower the performance of search engines.
Search engine performance thus degrades when web pages contain ambiguous natural language words, so that the same query carries multiple meanings. Search results can be improved if web pages contain precise semantic annotations [14], or if the search engine generates a list of the possible meanings of a submitted query; from this list, the user can select the intended meaning, and the search engine can then show the correct results without ambiguity.
From the result analysis, it was found that the top search results of other queries also contained irrelevant results. The next step was therefore to develop a system that moves irrelevant results down from the top of the list. The keyword to be searched and the retrieved web pages are the two inputs given to the system. Using the given search keyword, the system identifies features in the retrieved web pages, and for each feature a score is computed using the precise weights. A web page’s total score is calculated by adding the weights of the features found on the page, and the resulting pages are re-ranked using this total score. For the search keyword “3d TV,” the system returns the actual results and the results after re-ranking; a snapshot is depicted in Fig. 2. The positions of the actual search results and the predicted results were compared. It can be seen that for

Fig. 2 Snapshot of a system for query “3d tv”

most of the queries, the results that are not relevant to the query are moved down after re-ranking, as given in Table 5. For the search keyword “3d TV,” the irrelevant results at ranking positions 2 and 7 move down to positions 18 and 20, respectively, so the relevant results are placed at the top. For the query “iPod,” the irrelevant result at position 2 likewise moves down to position 20, and the same happened for the other queries. Table 5 shows the positions of the irrelevant results in Google’s original results and after processing by the system; its graphical representation is in Fig. 3. Almost all the irrelevant results are placed below and the relevant results are moved to the top, because the system computed a high score for relevant pages and a low score for non-relevant pages.
A sample result for the query “beauty” after computation of the total score is shown in Table 6, which contains the search results returned by Google for the query. The sites at ranking positions 3, 8, and 9 are not relevant to the query. These results are re-ranked in descending order of web page score; due to the re-ranking, the results at positions 3, 8, and 9 move down to positions 17, 11, and 10, respectively, as shown in Table 7. We can observe that three non-relevant sites had been placed in the top 10 search results by Google. Still,

Table 5 Position of irrelevant results after re-ranking

Sr. No Query Original results Result after re-ranking
1 Beauty 03 17
08 11
09 10
2 3d TV 02 18
07 20
3 Deluxe room 04 11
10 18
16 15
4 Graphic design 09 17
15 16
5 Pasta pizza 03 18
19 17
20 16
6 IPod 02 20
7 PSP 04 16
05 19
16 15
19 12
20 17
8 Search engine optimization 13 18

Fig. 3 Position of irrelevant results



Table 6 Results after computing score for query “beauty”


Rank URL Score Relevant
1 http://www.google.co.in/images 40 Yes
2 http://en.wikipedia.org/wiki/Beauty 29 Yes
3 http://en.wikipedia.org/wiki/American_Beauty_(film) 31 No
4 http://www.webindia123.com/women/tips/beauty.htm 44 Yes
5 http://www.webindia123.com/women/Beauty/skintip.htm 57 Yes
6 http://www.youtube.com 35 Yes
7 http://www.indiaparenting.com/beauty/index.cgi 18 Yes
8 http://www.imdb.com/title/tt0169547/ 40 No
9 http://www.imdb.com/title/tt0101414/ 42 No
10 http://www.beautytipshub.com 34 Yes
11 http://www.hindustanlink.com/beautytex/beauty.htm 52 Yes
12 http://www.hindustanlink.com 50 Yes
13 http://www.beautyindia.in/ 82 Yes
14 http://beauty.iloveindia.com 47 Yes
15 http://beauty.iloveindia.com/basic-tips/index.html 44 Yes
16 http://beauty.about.com/ 54 Yes
17 http://www.mag4you.com/health/ 38 Yes
18 http://www.beautycarehealth.com/ 73 Yes
19 http://www.bwcindia.org/ 40 Yes
20 http://www.indiatogether.org/manushi/issue145/lovely.htm 17 Yes

after executing the system, only one non-relevant site remained in the first ten search results.
The original top 10 results and the search results after processing and re-ranking were compared for the selected queries, as shown in Table 8. It was observed that the occurrence of relevant results in the top pages increases compared with the actual results. For the queries “beauty,” “3d TV,” “deluxe room,” and “PSP,” the performance increased by 20%; for the queries “graphic design,” “iPod,” and “pasta pizza,” it increased by 10% over the original. The comparative performance of relevant results is shown graphically in Fig. 4.

5 Conclusion

A system was designed and developed that extracts features from the retrieved search results and assigns a precise weight to each feature. The features found on a page, together with their precise weights, are used to compute each page’s total score. Using these

Table 7 Predicted results with the score for query “beauty”


Rank Re-rank URL Score Relevant
13 1 http://www.beautyindia.in/ 82 Yes
18 2 http://www.beautycarehealth.com/ 73 Yes
5 3 http://www.webindia123.com/women/Beauty 57 Yes
16 4 http://beauty.about.com/ 54 Yes
11 5 http://www.hindustanlink.com/beautytexbeauty.htm 52 Yes
12 6 http://www.hindustanlink.com 50 Yes
14 7 http://beauty.iloveindia.com/ 47 Yes
4 8 http://www.webindia123.com/women/tips/beauty.htm 44 Yes
15 9 http://beauty.iloveindia.com/basic-tips/index.html 44 Yes
9 10 http://www.imdb.com/title/tt0101414 42 No
8 11 http://www.imdb.com/title/tt0169547 40 No
19 12 http://www.bwcindia.org/ 40 Yes
1 13 http://www.google.co.in/images 40 Yes
17 14 http://www.mag4you.com/health/ 38 Yes
6 15 http://www.youtube.com 35 Yes
10 16 http://www.beautytipshub.com 34 Yes
3 17 http://en.wikipedia.org/wiki/American_Beauty_(film) 31 No
2 18 http://en.wikipedia.org/wiki/Beauty 29 Yes
7 19 http://www.indiaparenting.com/beauty/index 18 Yes
20 20 http://www.indiatogether.org/manushi/issue145/lovely.htm 17 Yes

Table 8 Comparative performance of retrieved result


Query Original result Predicted result
Relevant (%) Irrelevant Relevant (%) Irrelevant
Beauty 70 30% 90 10%
3d tv 80 20% 100 0
Deluxe room 80 20% 100 0
Graphic design 90 10% 100 0
House plan 100 0 100 0
Insurance 100 0 100 0
iPod 90 10% 100 0
Pasta pizza 90 10% 100 0
PSP 80 20% 100 0
SEO 100 0 100 0
Average 88 12% 99 1%
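The averages in the last row of Table 8 can be verified directly from the per-query percentages:

```python
# Relevant-result percentages per query, in Table 8 row order
original = [70, 80, 80, 90, 100, 100, 90, 90, 80, 100]
predicted = [90, 100, 100, 100, 100, 100, 100, 100, 100, 100]

avg = lambda xs: sum(xs) / len(xs)
print(avg(original), avg(predicted))  # → 88.0 99.0
```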

Fig. 4 Relevancy of results of Google

calculated scores, the search result pages are re-ranked in descending order of score. The ranking positions of the original Google results and the predicted results were duly compared, and it was successfully concluded that the re-ranking improves on the actual results, raising the relevancy of the Google search results from 88 to 99% for the corpus prepared and compiled.

References

1. Patil, S.P., Pawar, B.V., Patil, A.S.: Search engine optimization: a study. Res. J. Comput. Inf.
Technol. Sci. 1(1), 10–13 (2013)
2. Patil Swati, P., Pawar, B.V.: Study of website promotion techniques and role of SEO in search
engine results. Int. J. Recent Innov. Trends Comput. Commun. 3(11), 6229–6234 (2015)
3. Pawar, B.V., Patil Swati, P.: System for identification of ranking terms from retrieved results
of major search engines. Int. J. Inf. Retri. 8(2), 201–207 (2015)
4. Patil Swati, P., Pawar, B.V.: Removing non-relevant links from top search results using feature
score computation. Bull. Pure Appl. Sci. 37E(2), 311–320 (2018)
5. Pitkow, J.E., Kehoe, C.M.: Emerging trends in the WWW user. Commun. ACM 39(6), 106–108
(1996)
6. Shang, Y., Longzhuang, L.: Precision evaluation of search engines. World Wide Web 5(2),
159–179 (2002)
7. Chu, H., Rosenthal, M.: Search engines for the World Wide Web: a comparative study and
evaluation methodology. Proc. ASIS Ann. Meet. 33, 27–35 (1996)
8. Gordon, M., Pathak, P.: Finding information on the World Wide Web: the retrieval effectiveness
of search engines. Inf. Process. Manage. 35(2), 141–180 (1999)
9. Su, L.T.: A comprehensive and systematic model of user evaluation of Web search engines: I.
Theory and background. J. Am. Soc. Inf. Sci. Technol. 54(13), 1175–1192 (2003)
10. Griesbaum, J.: Valuation of three German search engines: Altavista.de, Google.de, and
Lycos.de. Inf. Res. Int. Electr. J. 9(4) (2004)

11. Vaughan, L., Thelwall, M.: Search engine coverage bias: evidence and possible causes. Inf.
Process. Manage. 40(4), 693–707 (2004)
12. Olakekan, A.: Comparative study of some popular web search engines. Afr. J. Comp. Sci. ICT
3(1), 3–20 (2010)
13. Singh, J., Sharan, A.: A comparative study between keyword and semantic-based search
engines. In: International Conference on Cloud, Big Data and Trust, pp 130–134 (2013)
14. Inkpen, D.: Information retrieval on the internet. Ph.D. thesis, University of Toronto (2006)
Author Index

A
Agrawal, Vaishnavi, 137
Ahire, Vijaya, 83
Ahmad, Tauseef, 403
Ahmed, Sajjad, 167
Ansari, Mohd. Javed, 403
Arya, Rajeev, 439
Ashok, M., 129
Ashok, Umadevi, 129

B
Bakale, Ravindra S., 475
Bedi, S. S., 217
Begum, Shameedha, 255
Bhadade, R., 475
Bisht, Arinjay, 187
Bombade, Balaji R., 341
Bopche, Litesh, 177
Borse, Swati, 83, 585

C
Chattopadhyay, Abir, 93
Chaudhari, Vijay D., 487
Chavan, Satishkumar, 121
Choudekar, Pallavi, 417, 529
Choudhary, Ankur, 461

D
Das, Anup, 255
Dash, Shruti, 417
Das, Maniklal, 371
Das, Sumanta, 93
Datta, Aniruddha, 137
De Ghosh, Ishita, 93
Deosarkar, S. B., 475
Devi, S. M. Renuka, 293
Dhekane, Shariva, 137

G
Garg, Aakansha, 439
Gaurav, Vipul, 381
Gujjeti, Sridhar, 429
Gupta, Ritigya, 553
Gupta, Supriya, 147

H
Haque, Md. Asraful, 403
Hasneen, Jehan, 447
Hoang, Vinh Truong, 1
Ho, Toan Pham, 1

I
Islam, Saiful, 167
Iswarya, N., 39

J
Jadhav, Ishwar S., 563
Jagtap, Abhishek, 63
Jaware, Tushar H., 265
Joshee, Minita, 121
Joshi, Amit, 519
Joshi, Yashwant, 273

K
Kagita, Mohan Krishna, 391
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Singapore Pte Ltd. 2022
B. Iyer et al. (eds.), Applied Information Processing Systems, Advances in Intelligent
Systems and Computing 1354, https://doi.org/10.1007/978-981-16-2008-9
Kamat, Anirudh, 283
Kanade, Tarun Kumar, 487
Khan, Armaan, 283
Khan, Mohd. Shahnawaz, 403
Khobragade, Sanjay, 507
Kolpe, Prapti, 321
Kshirsagar, Deepak, 311, 321
Kulkarni, Anirudha, 497
Kulkarni, Kunal, 137
Kulkarni, Nilima, 63, 73, 283
Kulkarni, Niranjan S., 103
Kulkarni, U. V., 11, 25
Kulkarni, Vinay, 273
Kumar, Ankit, 283
Kumaria, Aayush, 63
Kumar, Santosh, 461
Kumbhare, Rohit, 519

L
Lad, Kalpesh, 201

M
Madhusudanan, N., 39
Mahajan, J. R., 247
Mahindrakar, Manisha S., 11, 25
Malwatkar, Gajanan M., 563
Manthalkar, Ramchandra, 273
Mishra, Kritika, 553
Mishra, Rahul Kumar, 217, 361
Mishra, Sunil, 487

N
Nagothu, Sudheer Kumar, 541
Nagwani, Naresh Kumar, 147
Nalbalwar, Sanjay L., 103, 157, 507
Nandedkar, Shilpa, 497
Nandgaonkar, Anil B., 103, 157, 475, 507
Nawale, Shankar, 497
Nizam, Amaan, 121
Noorain, Zofia, 403

P
Pabboju, Suresh, 429
Pandey, Ishani, 553
Pandey, Manish Ranjan, 361
Pandian, Revathy, 129
Patel, Bhavna, 247
Patel, Mukesh, 201
Patel, Rachna, 201
Patil, Hemprasad Yashwant, 187
Patil, Vedika, 247
Patil, Vinodkumar R., 265
Pawar, B. V., 585
Pawar, Onkar, 247
Pawaskar, Omkar, 247
Pillai, Abhiram, 121
Pinto, Anne, 121
Prakash, Anupama, 529
Prasad, Ch. Durga, 227
Prateek, 439
Priya, Bhukya Krishna, 255
Priyashan, W. D. Madhuka, 391
Puri, Digambar, 157

R
Rajamanickam, Siranjeevi, 331
Rajkar, Ajinkya, 73
Ramasamy, Kumar, 129
Ramasubramanian, N., 255, 331
Ramteke, Ramyashri B., 51
Rao, P. Syamala, 227
Rao, R. Varaprasada, 293
Rastogi, Alok, 487
Raut, Aniket, 73
Rege, Priti P., 177

S
Sadique, Kazi Masum, 447
Sawant, Suraj, 519
Seeja, K. R., 553
Shah, Jui, 371
Sharaff, Aakanksha, 147
Sharma, Harshal, 461
Shetty, Balaji S., 11, 25
Shidnal, Sushila, 381
Shrinivasacharya, Purohit, 301
Shruthi, G., 301
Shukla, Arvind Kumar, 217, 361
Singh, Ajit K., 575
Singh, Rajdeep, 217
Singh, Shresth, 381
Siva Bharathi, K. R., 351
Srinath, Pravin, 235
Srivastava, Avikant, 381
Sule, Sanand, 519
Sumalatha, R., 293

T
Thilakarathne, Navod Neranjan, 391
Thool, Vijaya R., 51
U
Udawant, Prashant, 235

V
Vaidya, Atharva, 311
Varma, G. Parthasaradhi, 227
Venkateswari, R., 39, 351
Vollala, Satyanarayana, 331

W
Wagh, Abhay, 157
Wangikar, Makarand D., 341
Wani, Nasir Ul Islam, 529
