
Lecture Notes in Networks and Systems 494

Vikrant Bhateja
K. V. N. Sunitha
Yen-Wei Chen
Yu-Dong Zhang Editors

Intelligent
System
Design
Proceedings of INDIA 2022
Lecture Notes in Networks and Systems

Volume 494

Series Editor
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences,
Warsaw, Poland

Advisory Editors
Fernando Gomide, Department of Computer Engineering and Automation—DCA,
School of Electrical and Computer Engineering—FEEC, University of
Campinas—UNICAMP, São Paulo, Brazil
Okyay Kaynak, Department of Electrical and Electronic Engineering,
Bogazici University, Istanbul, Turkey
Derong Liu, Department of Electrical and Computer Engineering, University of
Illinois at Chicago, Chicago, USA
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Witold Pedrycz, Department of Electrical and Computer Engineering, University of
Alberta, Alberta, Canada
Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
Marios M. Polycarpou, Department of Electrical and Computer Engineering,
KIOS Research Center for Intelligent Systems and Networks, University of Cyprus,
Nicosia, Cyprus
Imre J. Rudas, Óbuda University, Budapest, Hungary
Jun Wang, Department of Computer Science, City University of Hong Kong,
Kowloon, Hong Kong
The series “Lecture Notes in Networks and Systems” publishes the latest
developments in Networks and Systems—quickly, informally and with high quality.
Original research reported in proceedings and post-proceedings represents the core
of LNNS.
Volumes published in LNNS embrace all aspects and subfields of, as well as new
challenges in, Networks and Systems.
The series contains proceedings and edited volumes in systems and networks,
spanning the areas of Cyber-Physical Systems, Autonomous Systems, Sensor
Networks, Control Systems, Energy Systems, Automotive Systems, Biological
Systems, Vehicular Networking and Connected Vehicles, Aerospace Systems,
Automation, Manufacturing, Smart Grids, Nonlinear Systems, Power Systems,
Robotics, Social Systems, Economic Systems, and others. Of particular value to
both the contributors and the readership are the short publication timeframe and
the world-wide distribution and exposure which enable both a wide and rapid
dissemination of research output.
The series covers the theory, applications, and perspectives on the state of the art
and future developments relevant to systems and networks, decision making, control,
complex processes and related areas, as embedded in the fields of interdisciplinary
and applied sciences, engineering, computer science, physics, economics, social, and
life sciences, as well as the paradigms and methodologies behind them.
Indexed by SCOPUS, INSPEC, WTI Frankfurt eG, zbMATH, SCImago.
All books published in the series are submitted for consideration in Web of Science.

For proposals from Asia please contact Aninda Bose (aninda.bose@springer.com).


Vikrant Bhateja · K. V. N. Sunitha · Yen-Wei Chen ·
Yu-Dong Zhang
Editors

Intelligent System Design


Proceedings of INDIA 2022
Editors

Vikrant Bhateja
Department of Electronics and Communication Engineering
Shri Ramswaroop Memorial College of Engineering and Management (SRMCEM)
Dr. A. P. J. Abdul Kalam Technical University
Lucknow, Uttar Pradesh, India

K. V. N. Sunitha
BVRIT HYDERABAD College of Engineering for Women
Hyderabad, Telangana, India

Yen-Wei Chen
College of Information Science and Engineering
Ritsumeikan University
Kusatsu, Shiga, Japan

Yu-Dong Zhang
Department of Informatics
University of Leicester
Leicester, UK

ISSN 2367-3370 ISSN 2367-3389 (electronic)


Lecture Notes in Networks and Systems
ISBN 978-981-19-4862-6 ISBN 978-981-19-4863-3 (eBook)
https://doi.org/10.1007/978-981-19-4863-3

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
Conference Organization Committees

Chief Patron

Sri. K. V. Vishnu Raju, Chairman, SVES

Patrons

Sri. Ravichandran Rajagopal, Vice-Chairman, SVES


Sri K. Aditya Vissam, Secretary, SVES

Conference Chairs

Dr. K. V. N. Sunitha, Principal, BVRIT, Hyderabad, India


Dr. Suresh Chandra Satapathy, KIIT, Bhubaneswar, Odisha, India

Organizing Chair

Dr. J. Naga Vishnu Vardhan, Professor, ECE, and Professor-Incharge Academics, BVRIT, Hyderabad, India

Publication Chair

Dr. Vikrant Bhateja, Shri Ramswaroop Memorial College of Engineering and Management, Lucknow, Uttar Pradesh, India


Organizing Committee

Dr. Ch. Sunil Kumar, Vice-Principal and HoD, EEE


Dr. K. Srinivasa Reddy, HoD, CSE
Dr. S. L. Aruna Rao, HoD, IT
Dr. Anwar Bhasha Pattan, HoD, ECE
Dr. L. Lakshmi, HoD, CSE (AI&ML)
Dr. M. Anita, HoD, BSH
Prof. Murali Nath, Prof. Incharge, Accreditations
Dr. G. Naga Satish, Professor, CSE
Dr. K. Adinarayana Reddy, Professor, IT
Dr. J. Manoj Kumar, Prof. Incharge, Admissions
Ms. M. Praveena, Associate Professor, ECE
Mr. R. Guruswamy, Associate Professor, EEE

Website and Poster Committee

Ms. M. Shanmuga Sundari, Assistant Professor, CSE


Ch. Anil Kumar, Assistant Professor, IT
Mr. R. Priyakanth, Associate Professor, ECE

Publicity Committee

Dr. P. Kayal, Associate Professor, IT and R&D Incharge


Dr. V. Rajeswari, Professor, EEE
Dr. M. Parvathi, Professor, ECE
Dr. V. Madhavi, Associate Professor, BSH
Dr. M. Indrasena Reddy, Associate Professor, CSE
Dr. P. Anji Reddy Polu, Assistant Professor, BSH
Dr. A. Sudharshan Chakravarthy, Assistant Professor, CSE

Advisory Committee

Aimé Lay-Ekuakille, University of Salento, Lecce, Italy


Amira Ashour, Tanta University, Egypt
Aynur Unal, Stanford University, USA
Bansidhar Majhi, IIIT Kancheepuram, Tamil Nadu, India
Dilip Kumar Sharma, Vice-Chairman, IEEE, U.P. Section

Yu-Dong Zhang, University of Leicester, UK


Ganpati Panda, IIT Bhubaneswar, Odisha, India
Govardhan, Professor in CSE and Rector, JNTUH, Hyderabad, India
Wenxian Yang, Senior Lecturer, University of Newcastle
Jagdish Chand Bansal, South Asian University, New Delhi, India
Muruganandam, Lecturer, Department of Engineering (Electrical Engineering
Section), University of Technology and Applied Sciences—Ibri, Sultanate of Oman
João Manuel R. S. Tavares, Universidade do Porto (FEUP), Porto, Portugal
Jyotsana Kumar Mandal, University of Kalyani, West Bengal, India
K. C. Santosh, University of South Dakota, USA
Le Hoang Son, Vietnam National University, Hanoi, Vietnam
Naeem Hanoon, Multimedia University, Cyberjaya, Malaysia
V. Vijaya Kumar, Dean, Department of CSE and IT, Anurag University, Hyderabad,
India
Nilanjan Dey, JIS University, Kolkata, India
Noor Zaman, Universiti Tecknologi, PETRONAS, Malaysia
Roman Senkerik, Tomas Bata University, Zlin, Czech Republic
C. Krishana Mohan, Professor, CSE, IIT Hyderabad, India
P. Radha Krishna, Professor, CSE, NIT Warangal, India
Swagatam Das, Indian Statistical Institute, Kolkata, India
Vijayalakshmi Saravanan, University of South Dakota, Department of Computer
Science, Buffalo, New York, USA
Thanikanti Sudhakar Babu, Post Doctoral Fellow Institute of Power Engineering,
Department of Electrical Power Engineering, Universiti Tenaga Nasional (UNITEN),
Malaysia
Siba K. Udgata, University of Hyderabad, Telangana, India
K. M. Prasad, Senior R&D Engineer, Enginia Research Inc. Winnipeg, Canada
Meenalosini Vimal Cruz, Assistant Professor, Computer Science, Georgia Southern
University, Port Wentworth, Georgia, USA
Shri. Nivas Singh, MMMUT, Gorakhpur, Uttar Pradesh, India
Tai Kang, Nanyang Technological University, Singapore
Valentina Balas, Aurel Vlaicu University of Arad, Romania
Anil Kumar V., Associate Professor, ECE, IIIT Hyderabad, India

Technical Program Committee

Abdul Rajak A. R., Department of Electronics and Communication Engineering, Birla Institute of Technology and Science
Dr. Nitika Vats Doohan, Indore, India
Ahmad Al-Khasawneh, The Hashemite University, Jordan
Alexander Christea, University of Warwick, London, UK
Amioy Kumar, Biometrics Research Lab, Department of Electrical Engineering, IIT
Delhi, India
Anand Paul, The School of Computer Science and Engineering, South Korea

Apurva A. Desai, Veer Narmad South Gujarat University, Surat, India


Avdesh Sharma, Jodhpur, India
A. K. Chaturvedi, Department of Electrical Engineering, IIT Kanpur, India
Bharat Singh Deora, JRNRV University, India
Bhavesh Joshi, Advent College, Udaipur, India
Brent Waters, University of Texas, Austin, Texas, USA
Chhaya Dalela, Associate Professor, JSSATE, Noida, Uttar Pradesh
Dan Boneh, Computer Science Department, Stanford University, California, USA
Feng Jiang, Harbin Institute of Technology, China
Gengshen Zhong, Jinan, Shandong, China
Harshal Arolkar, Immd. Past Chairman, CSI Ahmedabad Chapter, India
Amlan Chakrabarti, Head, Professor, and Director, A. K. Choudhury School of IT,
University of Calcutta
H. R. Vishwakarma, Professor, VIT, Vellore, India
Jayanti Dansana, KIIT University, Bhubaneswar, Odisha
Jean Michel Bruel, Departement Informatique IUT de Blagnac, Blagnac, France
Jeril Kuriakose, Manipal University, Jaipur, India
Jitender Kumar Chhabra, NIT Kurukshetra, Haryana, India
Kalpana Jain, CTAE, Udaipur, India
Komal Bhatia, YMCA University, Faridabad, Haryana, India
Krishnamachar Prasad, Department of Electrical and Electronic Engineering, Auck-
land, New Zealand
K. C. Roy, Principal, Kautaliya, Jaipur, India
Lorne Olfman, Claremont, California, USA
Martin Everett, University of Manchester, England
Meenakshi Tripathi, MNIT, Jaipur, India
Mukesh Shrimali, Pacific University, Udaipur, India
Murali Bhaskaran, Dhirajlal Gandhi College of Technology, Salem, Tamil Nadu,
India
Nilay Mathur, Director, NIIT Udaipur, India
Ngai-Man Cheung, Assistant Professor, University of Technology and Design,
Singapore
Philip Yang, Price Water House Coopers, Beijing, China
Pradeep Chouksey, Principal, TIT College, Bhopal, Madhya Pradesh, India
Prasun Sinha, Ohio State University Columbus, Columbus, OH, USA
Rajendra Kumar Bharti, Assistant Professor, Kumaon Engineering College,
Dwarahat, Uttarakhand, India
Pradeep Chowriappa, Assistant Professor at Louisiana Tech University Ruston,
Louisiana, USA
R. K. Bayal, Rajasthan Technical University, Kota, Rajasthan, India
Sami Mnasri, IRIT Laboratory Toulouse, France
Savita Gandhi, Professor, Gujarat University, Ahmedabad, India
Nishanth Dikkala, Research Engineer, MIT, Massachusetts, USA
Soura Dasgupta, Department of TCE, SRM University, Chennai, India

Sushil Kumar, School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, India
Prasad Mavuduri, CEO, University of Emerging Technologies, USA
S. R. Biradar, Department of Information Science and Engineering, SDM College
of Engineering and Technology, Dharwad, Karnataka
Ting-Peng Liang, National Chengchi University, Taipei, Taiwan
Xiaoyi Yu, National Laboratory of Pattern Recognition, Institute of Automation,
Chinese Academy of Sciences, Beijing, China
Yun-Bae Kim, Sungkyunkwan University, South Korea
Karuppanan, Professor, Electronics and Communication Engineering, Motilal Nehru
National Institute of Technology, Allahabad
Raghava Rao Mukkamala, Director of the Centre for Business Data Analytics,
CBDA, CBS, Denmark
Anil Kumar Vuppala, Assistant Professor, IIIT Hyderabad
P. Bala Prasad, Technology Head, TCS Hyderabad
M. Asha Rani, Professor, ECE, JNTUH, Hyderabad
P. Radha Krishna, Professor and Head, Computer Science and Engineering Depart-
ment, National Institute of Technology, Warangal
Narayana Prasad Padhy, Dean of Academic Affairs, Professor and Institute Chair,
Department of Electrical Engineering, IIT Roorkee
Sourav Mukhopadhyay, Professor, IIT Kharagpur
Sukumar Mishra, Professor, Department of Electrical Engineering, IIT Delhi
Siva Rama Krishna Vanjari, Associate Professor, Department of Electrical Engi-
neering, IIT Hyderabad
Durga Prasad Mohapatra, Professor, Department of Computer Science and Engi-
neering, National Institute of Technology Rourkela
Preface

This book is a collection of high-quality peer-reviewed research papers presented at the "7th International Conference on Information System Design and Intelligent Applications (INDIA-2022)", held at BVRIT HYDERABAD College of Engineering for Women, Hyderabad, India, during February 25–26, 2022.
The INDIA conference series was initiated in the year 2012, when it was first organized by the Computer Society of India (CSI), Vizag Chapter. Its sequel, INDIA-2015, was organized by Kalyani University, West Bengal, followed by INDIA-2016, organized by ANITS, Vizag; INDIA-2017, organized by Duy Tan University, Da Nang, Vietnam; INDIA-2018, organized by Université des Mascareignes, Mauritius; and INDIA-2019, organized by LIET, Vizianagaram. The papers of all past INDIA editions have been published by Springer Nature as the publication partner. Building on the success of these six editions, INDIA-2022 provided a platform for academicians, researchers, scientists, professionals, and students to share their knowledge and expertise in the diverse domains of intelligent computing and communication.
INDIA-2022 received a number of submissions in the fields of information system design, intelligent applications, and their prospective applications in different spheres of engineering. The papers received underwent a rigorous peer-review process with the help of the technical program committee members of the conference, drawn from various parts of the country as well as abroad. The review process was stringent, with a minimum of two reviews per paper along with due checks on similarity and content overlap. The conference featured theme-based special sessions in the domains of Blockchain 4.0, AI in IoT, Bio-Inspired Computing, etc., along with the main track.
The conference featured distinguished keynote addresses by eminent speakers. Prof. T. Bheemarjuna Reddy, Professor, CSE, IIT Hyderabad, India, addressed the gathering on V2X technologies toward connected and autonomous navigation: regulations, challenges, and research opportunities. Prof. Amlan Chakrabarti, University of Calcutta, delivered a talk on IoT for Societal Applications. Prof. Sanjay Ranka, Distinguished Professor in the Department of Computer and Information Science and Engineering at the University of Florida, delivered a talk on AI/ML for Smart Transportation.


These keynote lectures and talks drew a large audience of students, faculty members, budding researchers, and delegates. The editors thank the General Chair, TPC Chair, and Organizing Chair of the conference for providing valuable guidance and inspiration in overcoming the various difficulties encountered while organizing this conference. The editors also thank BVRIT HYDERABAD College of Engineering for Women for their whole-hearted support in organizing this edition of the INDIA conference.
The editorial board takes this opportunity to thank the authors of all the submitted papers for their hard work, adherence to the deadlines, and patience during the review process. The quality of a refereed volume depends mainly on the expertise and dedication of the reviewers. We are indebted to the TPC members, who not only produced excellent reviews but also did so within short time frames.

Bhubaneswar, India Suresh Chandra Satapathy
Hyderabad, India K. V. N. Sunitha
Lucknow, India Vikrant Bhateja
Contents

A Framework for Early Recognition of Alzheimer’s Using Machine Learning Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Lolla Kiran Kumar, P. Srinivasa Rao, and S. Sreenivasa Rao
On the Studies and Analyzes of Facial Detection and Recognition
Using Machine Learning Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Navya Thampan and Senthil Arumugam Muthukumaraswamy
IPL Analysis and Match Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Anjali Singhal, Deepanshi Agarwal, Esha Singh, Rajat Valecha,
and Ritika Malik
Application of ANN Combined with Machine Learning for Early
Recognition of Parkinson’s Disease . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Bharathi Uppalapati, S. Srinivasa Rao, and P. Srinivasa Rao
People Count from Surveillance Video Using Convolution Neural
Net . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
L. Lakshmi, A. Naga Kalyani, G. Naga Satish, and R. S. Murali Nath
Detection of Pneumonia and COVID-19 from Chest X-Ray Images
Using Neural Networks and Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Jeet Santosh Nimbhorkar, Kurapati Sreenivas Aravind, K. Jeevesh,
and Suja Palaniswamy
Plant Leaf Disease Detection and Classification Using Deep
Learning Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
S. S. Bhoomika and K. M. Poornima
Breast Mass Classification Using Convolutional Neural Network . . . . . . . 85
Varsha Nemade, Sunil Pathak, Ashutosh Kumar Dubey,
and Deepti Barhate


Deep Generative Models Under GAN: Variants, Applications, and Privacy Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Remya Raveendran and Ebin Deni Raj
Fusion-Based Celebrity Profiling Using Deep Learning . . . . . . . . . . . . . . . . 107
K. Adi Narayana Reddy, Naveen Kumar Laskari,
G. Shyam Chandra Prasad, and N. Sreekanth
DeepLeaf: Analysis of Plant Leaves Using Deep Learning . . . . . . . . . . . . . 115
Deepti Barhate, Sunil Pathak, Ashutosh Kumar Dubey,
and Varsha Nemade
Potential Assessment of Wind Power Generation Using Machine
Learning Algorithms for Southern Region of India . . . . . . . . . . . . . . . . . . . 125
P. Upendra Kumar, K. Lakshmana Rao, and T. S. Kishore
OCR-LSTM: An Efficient Number Plate Detection System . . . . . . . . . . . . 135
M. Indrasena Reddy, K. Srinivasa Reddy, B. Rakesh, and K. Prathima
Artificial Neural Network Alert Classifier for Construction
Equipments Telematics (CET) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Mohan Gopal Raje Urs, S. P. Shiva Prakash, and Kirill Krinkin
Hybrid Approach of Modified IWD and Machine Learning
Techniques for Android Malware Detection . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Ravi Mohan Sharma and Chaitanya P. Agrawal
Intuitionistic Fuzzy 9 Intersection Matrix for Obtaining
the Relationship Between Indeterminate Objects . . . . . . . . . . . . . . . . . . . . . 171
Subhankar Jana and Juthika Mahanta
A Hybrid Model of Latent Semantic Analysis with Graph-Based
Text Summarization on Telugu Text . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
Aluri Lakshmi and D. Latha
A Combined Approach of Steganography and Cryptography
with Generative Adversarial Networks: Survey . . . . . . . . . . . . . . . . . . . . . . . 187
Kakunuri Sandya and Subhadra Kompella
Real-Time Accident Detection and Intimation System Using Deep
Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
K. Padma Vasavi
Design of Cu-Doped SnO2 Thick-Film Gas Sensor for Methanol
Using ANN Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
Amit Gupta, Shashi Kant Dargar, A. V. Nageswara Rao,
and B. Raghavaiah
Detect Traffic Lane Image Using Geospatial LiDAR Data Point
Clouds with Machine Learning Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
M. Shanmuga Sundari, M. Sudha Rani, and A. Kranthi

Classification of High-Dimensionality Data Using Machine Learning Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
D. Padmaja Usharani, G. Sridevi, Rambabu Pemula,
and Sagenela Vijaya Kumar
To Detect Plant Disease Identification on Leaf Using Machine
Learning Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
P. Praveen, Mandala Nischitha, Chilupuri Supriya, Mitta Yogitha,
and Aakunoori Suryanandh
Association and Correlation Analysis for Predicting the Anomaly
in the Stock Market . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
R. Ravinder Reddy, M. Venkata Krishna Reddy,
and L. Raghavender Raju
Early Identification of Diabetic Retinopathy Using Deep Learning
Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
Sachin Sharma, Sakshi Zanje, and Dharmesh Shah
Performance Evaluation of MLP and CNN Models for Flood
Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
Ippili Saikrishna Macharyulu, Deba Prakash Satapathy,
Abinash Sahoo, Sandeep Samantaray, Nihar Ranjan Mohanta,
and Arkajyoti Ray
Bidirectional LSTM-Based Sentiment Analysis
of Context-Sensitive Lexicon for Imbalanced Text . . . . . . . . . . . . . . . . . . . . 283
P. Krishna Kishore, K. Prathima, Dutta Sai Eswari,
and Konda Srikar Goud
Improving Streamflow Prediction Using Hybrid BPNN Model
Combined with Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . 299
Nagarampalli Manoj Kumar, Ippili Saikrishnamacharyulu,
Abinash Sahoo, Sandeep Samantaray, Mavoori Hitesh Kumar,
Akash Naik, and Srinibash Sahoo
Prediction of Pullout Resistance of Geogrids Using ANN . . . . . . . . . . . . . . 309
Ippili Saikrishna Amacharyulu, Balendra Mouli Marrapu,
and Vasala Madhava Rao
Simulation of Water Table Depth Using Hybrid CANFIS Model:
A Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
Ippili Saikrishnamacharyulu, Nihar Ranjan Mohanta,
Mavoori Hitesh Kumar, Sandeep Samantaray, Abinash Sahoo,
Prameet Kumar Nanda, and Priyashree Ekka

Monthly Runoff Prediction by Support Vector Machine Based on Whale Optimisation Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
Aiswarya Mishra, Abinash Sahoo, Sandeep Samantaray,
Deba Prakash Satapathy, and Suresh Chandra Satapathy
Application of Adaptive Neuro-Fuzzy Inference System and Salp
Swarm Algorithm for Suspended Sediment Load Prediction . . . . . . . . . . . 339
Gopal Krishna Sahoo, Abinash Sahoo, Sandeep Samantara,
Deba Prakash Satapathy, and Suresh Chandra Satapathy
Maturity Status Estimation of Banana Using Image Deep Feature
and Parallel Feature Fusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
Ashoka Kumar Ratha, Prabira Kumar Sethy, Nalini Kanta Barpanda,
and Santi Kumari Behera
Application of a Combined GRNN-FOA Model for Monthly
Rainfall Forecasting in Northern Odisha, India . . . . . . . . . . . . . . . . . . . . . . . 355
Deba Prakash Satapathy, Harapriya Swain, Abinash Sahoo,
Sandeep Samantaray, and Suresh Chandra Satapathy
Guided Image Filter and SVM-Based Automated Classification
of Microscopy Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
Vikrant Bhateja, Disha Singh, and Ankit Yadav
Application of Machine Learning Algorithms for Creating a Wilful
Defaulter Prediction Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
B. Uma Maheswari, Hari Shankar Chandran, R. Sujatha, and D. Kavitha
Design of Metamaterial-Based Multilayer Dual Band Circularly
Polarized Microstrip Patch Antenna . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
Chirag Arora
Heart Disease Prediction in Healthcare Communities by Machine
Learning Over Big Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
Lingala Thirupathi, B. Srinivasulu, Unnati Khanapurkar, D. Rambabu,
and C. M. Preeti
A Novel Twitter Sentimental Analysis Approach Using Naive
Bayes Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
Lingala Thirupathi, G. Rekha, S. K. Shruthi, B. Sowjanya,
and Sowmya Jujuroo
Recognition and Adoption of an Abducted Child Using Haar
Cascade Classifier and JSON Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
Ghousia Begum, C. Kishor Kumar Reddy, and P. R. Anisha
Automatic Brain Tumor Detection Using Convolutional Neural
Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
Amtul B. Ifra and Madiha Sadaf

Deep Learning and Blockchain for Electronic Health Record in Healthcare System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
Ch. Sravanthi and Smitha Chowdary
Artificial Neural Networks in Improvement of Spatial Resolution
of Thermal Infrared Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
Mallam Gurudeep, Gaddam Samatha, Sandeep Ravikanti,
and Gopal Rao Kulkarni
Facial Micro-expression Recognition Using Deep Learning . . . . . . . . . . . . 447
Nasaka Ravi Praneeth, Godavarthi Sri Sai Vikas,
Ravuri Naveen Kumar, and T. Anuradha
Precision Agriculture with Weed Detection Using Deep Learning . . . . . . 455
I. Deva Kumar, J. Sai Rashitha Sree, M. Devi Sowmya, and G. Kalyani
An Ensemble Model to Detect Parkinson’s Disease Using MRI
Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
T. Sri Lakshmi, B. Lakshmi Ramani, Rohith Kumar Jayana,
Satwik Kaza, Soma Sai Surya Teja Kamatam, and Bhimala Raghava
Classification of Diabetic Retinopathy Using Deep Neural Networks . . . . 475
J. Hyma, M. Ramakrishna Murty, S. Ranjan Mishra, and Y. Anuradha
A Deep Learning Model for Stationary Audio Noise Reduction . . . . . . . . 483
Sanket S. Kulkarni, Ansuman Mahapatra, and T. Bala Sundar
Optimizing Deep Neural Network for Viewpoint Detection
in 360-Degree Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
Surya Raj and Ansuman Mahapatra
ConvNet of Deep Learning in Plant Disease Detection . . . . . . . . . . . . . . . . . 501
J. Gajavalli and S. Jeyalaksshmi
Recognition of Iris Segmentation Using CNN and Neural Networks . . . . 515
S. Jeyalaksshmi and P. J. Sai Vignesh
Popularity of Optimization Techniques in Sentiment Analysis . . . . . . . . . 523
Priyanka and Kirti Walia
Predominant Role of Artificial Intelligence in Employee Retention . . . . . 535
Ravinder Kaur and Hardeep Kaur
Semantic Segmentation of Brain MRI Images Using Squirrel
Search Algorithm-Based Deep Convolution Neural Network . . . . . . . . . . . 547
B. Tapasvi, E. Gnana Manoharan, and N. Udaya Kumar
Top Five Machine Learning Libraries in Python: A Comparative
Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559
Mothe Rajesh and M. Sheshikala

A Novel Technique of Threshold Distance-Based Vehicle Tracking System for Woman Safety . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567
B. V. D. S. Sekhar, V. V. S. S. S. Chakravarthy, S. Venkataramana,
Bh. V. S. R. K. Raju, N. Udayakumar, and S. Krishna Rao

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579


Editors and Contributors

About the Editors

Vikrant Bhateja is an associate professor in the Department of ECE, SRMGPC, Lucknow (U.P.), and also the Dean (Academics) of the same college. He holds a doctorate in ECE (Biomedical Imaging) and has a total academic teaching experience of 18 years, with around 180 publications in reputed international conferences, journals, and online book chapters, of which 31 papers are published in SCIE-indexed high-impact-factor journals. He has been instrumental in chairing/co-chairing around 30 international conferences in India and abroad as publication/TPC chair and has edited 45 book volumes from Springer Nature as a corresponding/co-editor/author to date. He has delivered nearly 20 keynotes and invited talks at international conferences, ATAL, TEQIP, and other AICTE-sponsored FDPs and STTPs. He has been Editor-in-Chief of the IGI Global International Journal of Natural Computing and Research (IJNCR), an ACM- and DBLP-indexed journal, since 2017. He has guest-edited special issues in reputed SCIE-indexed journals published by Springer Nature and Elsevier.

K. V. N. Sunitha completed her B.Tech. in ECE from ANU, her M.Tech. in CS from REC Warangal in 1993, and her Ph.D. in CSE from JNTU Hyderabad in 2006. She has 29 years of teaching experience and 13 years of research experience. She has been working as the Founder Principal of BVRIT HYDERABAD College of Engineering for Women, Hyderabad, since August 2012. She received the "Academic Excellence Award" from G. Narayanamma Institute of Technology and Science in 2005, the "Best Computer Science Engineering Teacher Award for the year 2007" from ISTE in 2008, the "Best Faculty Award" at the Academic Brilliance Awards 2013 in New Delhi, the "Distinguished Principal Award" from CSI Mumbai in 2017 at IIT Bombay, and the Dewang Mehta "Women in Education Award" in 2017. She was felicitated for outstanding contributions and achievements in the field of engineering at the Women Engineers Meet of the 29th Indian Engineering Congress (IEI), held at Visveswaraiah Bhavan, Hyderabad, on 18 December 2014. She received the "Engineer of the Year 2019 Award" from IEI, Telangana, in 2019. She received the "Acharya Ratna" National Award for lifetime achievement from Indian Servers in association with the IT Association of AP and the Telangana IT Association in 2019. She has guided 9 Ph.D.s and is currently guiding 8 research scholars. She has authored 5 textbooks and published more than 150 papers.

Yen-Wei Chen received the B.E. degree in 1985 from Kobe University, Kobe, Japan,
the M.E. degree in 1987 and the D.E. degree in 1990, both from Osaka University,
Osaka, Japan. He was a research fellow with the Institute for Laser Technology,
Osaka, from 1991 to 1994. From October 1994 to March 2004, he was an associate
professor and a professor with the Department of Electrical and Electronic Engi-
neering, University of the Ryukyus, Okinawa, Japan. He is currently a professor
with the college of Information Science and Engineering, Ritsumeikan University,
Japan. He is also a visiting professor with the College of Computer Science, Zhejiang
University, China. He was a visiting professor with Oxford University, Oxford,
UK, in 2003 and a visiting professor with Pennsylvania State University, USA, in
2010. His research interests include medical image analysis, computer vision and
computational intelligence. He has published more than 300 research papers in a
number of leading journals and leading conferences including IEEE Transactions on
Image Processing, IEEE Transactions on SMC, Pattern Recognition. He has received
many distinguished awards including ICPR2012 Best Scientific Paper Award, 2014
JAMIT Best Paper Award, Outstanding Chinese Oversea Scholar Fund of Chinese
Academy of Science. He is/was a leader of numerous national and industrial research
projects.

Yu-Dong Zhang received his Ph.D. degree from Southeast University, China, in
2010. He worked as a postdoc from 2010 to 2012 and as a research scientist from
2012 to 2013 at Columbia University, USA. He served as a professor from 2013 to
2017 at Nanjing Normal University, where he was the director and founder of the
Advanced Medical Image Processing Group at NJNU. Since 2017, he has served as
a full professor in the Department of Informatics, University of Leicester, UK. His
research interests are
deep learning in communication and signal processing, medical image processing.
He was included in “Most Cited Chinese researchers (Computer Science)” from 2015
to 2018. He won “Emerald Citation of Excellence 2017”, and “MDPI Top 10 Most
Cited Papers 2015”. He was included in top scientist list in “Guide2Research”. He is
now the editor of Scientific Reports, Journal of Alzheimer’s Disease, International
Journal of Information Management, etc. He is a senior member of IEEE and
ACM. He has conducted and joined many successful academic grants and industrial
projects, such as NSFC, NIH, EPSRC, etc.

Contributors

Adi Narayana Reddy K. BVRIT HYDERABAD College of Engineering for
Women, Hyderabad, Telangana, India

Agarwal Deepanshi Computer Science and Engineering, Inderprastha Engineering
College, Ghaziabad, India
Agrawal Chaitanya P. Department of Computer Science and Applications,
Makhanlal Chaturvedi University, Bhopal, Madhya Pradesh, India
Amacharyulu Ippili Saikrishna Department of Civil Engineering, GIET Univer-
sity, Bhubaneswar, India
Anisha P. R. Stanley College of Engineering and Technology for Women, Hyder-
abad, Telangana, India
Anuradha T. Department of Information Technology, Velagapudi Ramakrishna
Siddhartha Engineering College, Vijayawada, India
Anuradha Y. Department of CSE, G.V.P College of Engineering (A), Visakhap-
atnam, India
Aravind Kurapati Sreenivas Department of Computer Science and Engineering,
Amrita School of Engineering, Bengaluru, India
Arora Chirag KIET Group of Institutions, Delhi-NCR, Ghaziabad, Uttar Pradesh,
India
Bala Sundar T. National Institute of Technology Puducherry, Puducherry, India
Barhate Deepti SVKM’s NMIMS MPSTME Shirpur, Dhule, Maharashtra, India;
Amity School of Engineering & Technology, Department of Computer Science &
Engineering, Amity University Rajasthan, Jaipur, India
Barpanda Nalini Kanta Department of Electronics, Sambalpur University, Burla,
Odisha, India
Begum Ghousia Stanley College of Engineering and Technology for Women,
Hyderabad, Telangana, India
Behera Santi Kumari Department of Computer Science and Engineering, VSSUT
Burla, Burla, Odisha, India
Bhateja Vikrant Department of Electronics and Communication Engineering, Shri
Ramswaroop Memorial College of Engineering and Management (SRMCEM),
Lucknow, Uttar Pradesh, India;
Dr. A. P. J. Abdul Kalam Technical University (AKTU), Lucknow, Uttar Pradesh,
India
Bhoomika S. S. Department of Information Science and Engineering, GM Institute
of Technology, Shimoga, Karnataka, India
Chakravarthy V. V. S. S. S. Raghu Institute of Technology, Dakamari, Visakhap-
atnam, Andhra Pradesh, India
Chandran Hari Shankar PSG Institute of Management, Coimbatore, Tamil Nadu,
India

Chowdary Smitha Koneru Lakshmaiah Educational Foundation, Vijayawada,
Andhra Pradesh, India
Dargar Shashi Kant Department of Electronics and Communication Engineering,
Kalasalingam Academy of Research and Education, Krishnankoil, Tamil Nadu, India
Deva Kumar I. Velagapudi Ramakrishna Siddhartha Engineering College,
Vijayawada, Andhra Pradesh, India
Devi Sowmya M. Velagapudi Ramakrishna Siddhartha Engineering College,
Vijayawada, Andhra Pradesh, India
Dubey Ashutosh Kumar Chitkara University School of Engineering and Tech-
nology, Chitkara University, Himachal Pradesh, India
Eswari Dutta Sai Department of IT, BVRIT Hyderabad College of Engineering
for Women, Hyderabad, India
Gajavalli J. Vels Institute of Science, Technology and Advanced Studies, Chennai,
Tamil Nadu, India
Gnana Manoharan E. ECE Department, Annamalai University, Chidambaram,
India
Goud Konda Srikar Department of IT, BVRIT Hyderabad College of Engineering
for Women, Hyderabad, India
Gupta Amit Department of Electronics and Communications Engineering, NEC,
Narasaraopeta, Andhra Pradesh, India
Gurudeep Mallam Department of ECE, MCET Hyderabad, Hyderabad, India
Hyma J. Department of CSE, GITAM University (Deemed to Be), Visakhapatnam,
India
Ifra Amtul B. Shadan Women’s College of Engineering and Technology, Hyder-
abad, Telangana, India
Indrasena Reddy M. Computer Science & Engineering Department, BVRIT
HYDERABAD College of Engineering for Women, Hyderabad, Telangana, India
Jana Subhankar Department of Mathematics, National Institute of Technology
Silchar, Silchar, Assam, India
Jayana Rohith Kumar Prasad V. Potluri, Siddhartha Institute of Technology,
Vijayawada, Andhra Pradesh, India
Jeevesh K. Department of Computer Science and Engineering, Amrita School of
Engineering, Bengaluru, India
Jeyalaksshmi S. Vels Institute of Science, Technology and Advanced Studies,
Chennai, Tamil Nadu, India

Jujuroo Sowmya CSE Department, Methodist College of Engineering & Technology,
Hyderabad, Telangana, India
Kalyani G. Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada,
Andhra Pradesh, India
Kamatam Soma Sai Surya Teja Prasad V. Potluri, Siddhartha Institute of Tech-
nology, Vijayawada, Andhra Pradesh, India
Kaur Hardeep University Business Schools, Chandigarh University, Mohali, India
Kaur Ravinder University Business Schools, Chandigarh University, Mohali,
India
Kavitha D. PSG Institute of Management, Coimbatore, Tamil Nadu, India
Kaza Satwik Prasad V. Potluri, Siddhartha Institute of Technology, Vijayawada,
Andhra Pradesh, India
Khanapurkar Unnati CSE Department, Methodist College of Engineering &
Technology, Hyderabad, Telangana, India
Kishor Kumar Reddy C. Stanley College of Engineering and Technology for
Women, Hyderabad, Telangana, India
Kishore T. S. GMR Institute of Technology, Vizianagaram, Andhra Pradesh, India
Kompella Subhadra GITAM Deemed to Be University, Vishakhapatnam, India
Kranthi A. BVRIT Hyderabad College of Engineering for Women, Hyderabad,
Bachupally, India
Krinkin Kirill Department of Software Engineering and Computer Applications,
Saint Petersburg Electrotechnical University “LETI”, Saint Petersburg, Russia
Krishna Kishore P. Department of IT, BVRIT Hyderabad College of Engineering
for Women, Hyderabad, India
Kulkarni Gopal Rao IIT BOMBAY, Mumbai, India
Kulkarni Sanket S. National Institute of Technology Puducherry, Puducherry,
India
Kumar Lolla Kiran MVGR College of Engineering (A), Vizianagaram, Andhra
Pradesh, India
Kumar Mavoori Hitesh Department of Civil Engineering, NIT Tiruchirappalli,
Tiruchirappalli, Tamil Nadu, India
Kumar Nagarampalli Manoj Department of Civil Engineering, GIET University
Gunpur, Bhubaneswar, Odisha, India
Kumar Ravuri Naveen Department of Information Technology, Velagapudi
Ramakrishna Siddhartha Engineering College, Vijayawada, India

Kumar Sagenela Vijaya Department of Computer Science and Engineering,
School of Technology, GITAM (Deemed to be University), Hyderabad, India
Lakshmana Rao K. GMR Institute of Technology, Vizianagaram, Andhra Pradesh,
India
Lakshmi Aluri Adikavi Nannaya University, Rajamahendravaram, Andhara
Pradesh, India
Lakshmi L. BVRIT HYDERABAD College of Engineering for Women, Hyder-
abad, India
Lakshmi T. Sri Prasad V. Potluri, Siddhartha Institute of Technology, Vijayawada,
Andhra Pradesh, India
Laskari Naveen Kumar BVRIT HYDERABAD College of Engineering for
Women, Hyderabad, Telangana, India
Latha D. Adikavi Nannaya University, Rajamahendravaram, Andhara Pradesh,
India
Macharyulu Ippili Saikrishna Department of Civil Engineering, GIET Univer-
sity, Bhubaneswar, Odisha, India
Mahanta Juthika Department of Mathematics, National Institute of Technology
Silchar, Silchar, Assam, India
Mahapatra Ansuman National Institute of Technology Puducherry, Karaikal,
Puducherry, India
Malik Ritika Computer Science and Engineering, Inderprastha Engineering
College, Ghaziabad, India
Marrapu Balendra Mouli Department of Civil Engineering, AITAM, Visakhap-
atnam, India
Mishra Aiswarya Department of Civil Engineering, OUTR Bhubaneswar,
Bhubaneswar, Odisha, India
Mohanta Nihar Ranjan Department of Civil Engineering, NIT Raipur, Raipur,
Chhattisgarh, India
Murali Nath R. S. BVRIT HYDERABAD College of Engineering for Women,
Hyderabad, India
Muthukumaraswamy Senthil Arumugam School of Engineering and Physical
Sciences, Heriot-Watt University, Dubai, United Arab Emirates
Naga Kalyani A. BVRIT HYDERABAD College of Engineering for Women,
Hyderabad, India
Naga Satish G. BVRIT HYDERABAD College of Engineering for Women,
Hyderabad, India

Nageswara Rao A. V. Department of Electronics and Communications Engineering,
NEC, Narasaraopeta, Andhra Pradesh, India
Naik Akash Department of Civil Engineering, GIET University Gunpur,
Bhubaneswar, Odisha, India
Nanda Prameet Kumar Department of Civil Engineering, GIET University
Gunpur, Gunpur, Odisha, India
Nemade Varsha Department of Computer Science & Engineering, Amity School
of Engineering & Technology, Amity University Rajasthan, Jaipur, India;
SVKM’s NMIMS MPSTME Shirpur, Dhule, Maharashtra, India
Nimbhorkar Jeet Santosh Department of Computer Science and Engineering,
Amrita School of Engineering, Bengaluru, India
Nischitha Mandala Department of Computer Science Engineering, SR Engi-
neering College, Warangal, Telangana, India
Padma Vasavi K. Shri Vishnu Engineering College for Women, Bhimavaram, India
Padmaja Usharani D. Department of Computer Science and Engineering, Raghu
Engineering College, Visakhapatnam, India
Palaniswamy Suja Department of Computer Science and Engineering, Amrita
School of Engineering, Bengaluru, India
Pathak Sunil Department of Computer Science & Engineering, Amity School of
Engineering & Technology, Amity University Rajasthan, Jaipur, India
Pemula Rambabu Department of Computer Science and Engineering, Raghu
Engineering College, Visakhapatnam, India
Poornima K. M. Department of Computer Science and Engineering, JNN College
of Engineering, Shimoga, Karnataka, India
Praneeth Nasaka Ravi Department of Information Technology, Velagapudi
Ramakrishna Siddhartha Engineering College, Vijayawada, India
Prathima K. Computer Science & Engineering Department, BVRIT HYDER-
ABAD College of Engineering for Women, Hyderabad, Telangana, India
Praveen P. Department of Computer Science and Artificial Intelligence, SR Univer-
sity, Warangal, Telangana, India
Preeti C. M. CSE Department, Institute of Aeronautical Engineering, Hyderabad,
Telangana, India
Priyanka University Institute of Computing, Chandigarh University Gharuan,
Punjab, India
Priyashree Ekka Department of Civil Engineering, GIET University Gunpur,
Gunpur, Odisha, India

Raghava Bhimala Prasad V. Potluri, Siddhartha Institute of Technology,
Vijayawada, Andhra Pradesh, India
Raghavaiah B. Department of Electronics and Communications Engineering,
NEC, Narasaraopeta, Andhra Pradesh, India
Raghavender Raju L. Department of Computer Science and Engineering,
Matrusri Engineering College, Hyderabad, India
Raj Ebin Deni Indian Institute of Information Technology, Kottayam, Kerala, India
Raj Surya National Institute of Technology Puducherry, Karaikal, Puducherry,
India
Rajesh Mothe SR University, Warangal, Telangana, India
Raju Bh. V. S. R. K. S R K R Engineering College, Bhimavaram, Andhra Pradesh,
India
Rakesh B. Computer Science & Engineering Department, BVRIT HYDERABAD
College of Engineering for Women, Hyderabad, Telangana, India
Ramakrishna Murty M. Department of CSE, Anil Neerukonda Institute of Tech-
nology & Sciences (ANITS), Visakhapatnam, India
Ramani B. Lakshmi Prasad V. Potluri, Siddhartha Institute of Technology,
Vijayawada, Andhra Pradesh, India
Rambabu D. CSE Department, Sreenidhi Institute of Science & Technology,
Hyderabad, Telangana, India
Ranjan Mishra S. Department of CSE, Anil Neerukonda Institute of Technology &
Sciences (ANITS), Visakhapatnam, India
Rao S. Krishna Sir C R R Engineering College, Eluru, Andhra Pradesh, India
Rao Vasala Madhava Department of Civil Engineering, GIET University,
Bhubaneswar, India
Ratha Ashoka Kumar Department of Electronics, Sambalpur University, Burla,
Odisha, India
Raveendran Remya Indian Institute of Information Technology, Kottayam, Kerala,
India
Ravikanti Sandeep Methodist College of Engineering & Technology, CSE, Hyder-
abad, India
Ravinder Reddy R. Department of Computer Science and Engineering, Chaitanya
Bharathi Institute of Technology(A), Gandipet, Hyderabad, India
Ray Arkajyoti Department of Civil Engineering, GIET University, Bhubaneswar,
Odisha, India

Rekha G. CSE Department, Kakatiya Institute of Technology & Science, Warangal,
Telangana, India
Sadaf Madiha Chaitanya Bharathi Institute of Technology, Hyderabad, India
Sahoo Abinash Department of Civil Engineering, NIT Silchar, Silchar, Assam,
India
Sahoo Gopal Krishna Department of Civil Engineering, OUTR Bhubaneswar,
Bhubaneswar, Odisha, India
Sahoo Srinibash Department of Civil Engineering, GIET University Gunpur,
Bhubaneswar, Odisha, India
Sai Rashitha Sree J. Velagapudi Ramakrishna Siddhartha Engineering College,
Vijayawada, Andhra Pradesh, India
Sai Vignesh P. J. Rajalakshmi Engineering College, Thandalam, Chennai, India
Saikrishnamacharyulu Ippili Department of Civil Engineering, GIET University
Gunpur, Bhubaneswar, Odisha, India;
Department of Civil Engineering, GIET University Gunpur, Gunpur, Odisha, India
Samantaray Sandeep Department of Civil Engineering, OUTR Bhubaneswar,
Bhubaneswar, Odisha, India
Samatha Gaddam Department of ECE, JBIET Hyderabad, Hyderabad, India
Sandya Kakunuri GITAM Deemed to Be University, Vishakhapatnam, India
Satapathy Deba Prakash Department of Civil Engineering, OUTR Bhubaneswar,
Bhubaneswar, Odisha, India
Satapathy Suresh Chandra Department of Computer Science and Engineering,
KIIT University, Bhubaneswar, Odisha, India
Sekhar B. V. D. S. S R K R Engineering College, Bhimavaram, Andhra Pradesh,
India
Sethy Prabira Kumar Department of Electronics, Sambalpur University, Burla,
Odisha, India
Shah Dharmesh Faculty of Engineering and Technology, Sankalchand Patel
University, Visnagar, India
Shanmuga Sundari M. BVRIT Hyderabad College of Engineering for Women,
Hyderabad, Bachupally, India
Sharma Ravi Mohan Department of Computer Science and Applications,
Makhanlal Chaturvedi University, Bhopal, Madhya Pradesh, India
Sharma Sachin Department of Engineering and Physical Sciences, Institute of
Advanced Research, Gandhinagar, India

Sheshikala M. SR University, Warangal, Telangana, India


Shiva Prakash S. P. Department of Information Science and Engineering, JSS
Science and Technology University, Mysuru, Karnataka, India
Shruthi S. K. CSE Department, Methodist College of Engineering & Technology,
Hyderabad, Telangana, India
Shyam Chandra Prasad G. Matrusri Engineering College, Hyderabad, Telan-
gana, India
Singh Disha Department of Electronics and Communication Engineering, Shri
Ramswaroop Memorial College of Engineering and Management (SRMCEM),
Lucknow, Uttar Pradesh, India;
Dr. A. P. J. Abdul Kalam Technical University (AKTU), Lucknow, Uttar Pradesh,
India
Singh Esha Computer Science and Engineering, Inderprastha Engineering College,
Ghaziabad, India
Singhal Anjali Computer Science and Engineering, Inderprastha Engineering
College, Ghaziabad, India
Sowjanya B. CSE Department, Methodist College of Engineering & Technology,
Hyderabad, Telangana, India
Sravanthi Ch. G. Narayanamma Institute of Technology & Science, Hyderabad,
Telangana, India
Sreekanth N. BVRIT HYDERABAD College of Engineering for Women, Hyder-
abad, Telangana, India
Sreenivasa Rao S. Department of Computer Science & Engineering, MVGR
College of Engineering (A), Vizianagaram, Andhra Pradesh, India
Sridevi G. Department of Computer Science and Engineering, Raghu Engineering
College, Visakhapatnam, India
Srinivasa Rao P. Department of Computer Science & Engineering, MVGR College
of Engineering (A), Vizianagaram, Andhra Pradesh, India
Srinivasa Rao S. Department of CSE, MVGR College of Engineering (A), Viziana-
garam, Andhra Pradesh, India
Srinivasa Reddy K. Computer Science & Engineering Department, BVRIT
HYDERABAD College of Engineering for Women, Hyderabad, Telangana, India
Srinivasulu B. CSE Department, Vidya Jyothi Institute of Technology, Hyderabad,
Telangana, India
Sudha Rani M. BVRIT Hyderabad College of Engineering for Women, Hyder-
abad, Bachupally, India

Sujatha R. PSG Institute of Management, Coimbatore, Tamil Nadu, India


Supriya Chilupuri Department of Computer Science Engineering, SR Engineering
College, Warangal, Telangana, India
Suryanandh Aakunoori Department of Computer Science Engineering, SR Engi-
neering College, Warangal, Telangana, India
Swain Harapriya Department of Civil Engineering, OUTR Bhubaneswar,
Bhubaneswar, Odisha, India
Tapasvi B. ECE Department, Annamalai University, Chidambaram, India
Thampan Navya School of Engineering and Physical Sciences, Heriot-Watt
University, Dubai, United Arab Emirates
Thirupathi Lingala CSE Department, Stanley College of Engineering and Tech-
nology for Women, Hyderabad, Telangana, India
Udaya Kumar N. ECE Department, SRKR Engineering College, Bhimavaram,
Andhra Pradesh, India
Uma Maheswari B. PSG Institute of Management, Coimbatore, Tamil Nadu, India
Upendra Kumar P. GMR Institute of Technology, Vizianagaram, Andhra Pradesh,
India
Uppalapati Bharathi Department of CSE, MVGR College of Engineering (A),
Vizianagaram, Andhra Pradesh, India
Urs Mohan Gopal Raje Department of Information Science and Engineering, JSS
Science and Technology University, Mysuru, Karnataka, India
Valecha Rajat Computer Science and Engineering, Inderprastha Engineering
College, Ghaziabad, India
Venkata Krishna Reddy M. Department of Computer Science and Engineering,
Chaitanya Bharathi Institute of Technology(A), Gandipet, Hyderabad, India
Venkataramana S. S R K R Engineering College, Bhimavaram, Andhra Pradesh,
India
Vikas Godavarthi Sri Sai Department of Information Technology, Velagapudi
Ramakrishna Siddhartha Engineering College, Vijayawada, India
Walia Kirti University Institute of Computing, Chandigarh University Gharuan,
Punjab, India
Yadav Ankit Department of Electronics and Communication Engineering, Shri
Ramswaroop Memorial College of Engineering and Management (SRMCEM),
Lucknow, Uttar Pradesh, India;
Dr. A. P. J. Abdul Kalam Technical University (AKTU), Lucknow, Uttar Pradesh,
India

Yogitha Mitta Department of Computer Science Engineering, SR Engineering
College, Warangal, Telangana, India
Zanje Sakshi Department of Engineering and Physical Sciences, Institute of
Advanced Research, Gandhinagar, India
A Framework for Early Recognition
of Alzheimer’s Using Machine Learning
Approaches

Lolla Kiran Kumar, P. Srinivasa Rao, and S. Sreenivasa Rao

Abstract Alzheimer’s disease is a neurological disorder of the brain that primarily
affects the blood nuclear cells in our brain. Early detection of Alzheimer’s disease
is extremely crucial for disease prevention. Recently, a pre-selection technique was
proposed for measuring image similarity; however, this method has a high computation
time and a lengthy process. As a result, we propose a novel machine learning
framework for classifying Alzheimer’s disease. In this paper, several machine learning
algorithms, including random forest, SVM, decision tree, and the XGB classifier, are
used to classify Alzheimer’s disease in order to predict it at an early stage. Based on
these algorithms, we propose a CatBoost classifier for the highest accuracy. All
algorithms are applied to the OASIS dataset, on which the CatBoost classifier
achieves 85.7% accuracy.
The findings show that this framework can be used to identify and treat Alzheimer’s
disease in healthcare at an early stage.

Keywords Alzheimer’s disease · Magnetic resonance imaging · Machine learning

1 Introduction

Alzheimer’s disease is a neurological brain disorder that gradually degrades memory
and learning abilities and causes us to forget even minor tasks [1, 2]. In the majority
of people who have it, the disease must be identified from symptoms appearing in
their mid-60s, although early-onset Alzheimer’s disease strikes people between the
ages of 30 and 60 without warning [3, 4]. Alzheimer’s disease is the most common
cause of mental illness
in people over 65. The disease was named after Dr. Alois Alzheimer. In 1906, Dr.
Alzheimer observed changes in the brain of a woman who died of mental illness.

L. K. Kumar (B)
MVGR College of Engineering (A), Vizianagaram, Andhra Pradesh, India
e-mail: kirankumarsharma369@gmail.com
P. Srinivasa Rao · S. Sreenivasa Rao
Department of Computer Science & Engineering, MVGR College of Engineering (A),
Vizianagaram, Andhra Pradesh, India
e-mail: siringisrao@mvgrce.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 1
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_1

Her symptoms include memory loss, language difficulties, and erratic behavior [5,
6]. Following her death, he examined her brain and discovered numerous abnormal
or unusual clumps and tangled bundles of natural fibers. These plaques and tangles
in the brain are thought to be some of the most significant symptoms of Alzheimer’s
disease [7–9]. The loss of nerve cell neural connections is another distinguishing
feature. Neurons in the body communicate with muscles and organs by sending
messages from various parts of the brain. Many other complex brain problems are
on the verge of developing Alzheimer’s disease. This damage first manifests itself in
the main part of the brain known as the hippocampus [10–12]. Our brain’s memory
is stored in the hippocampus. As neurons die in our brain, other parts of the brain
suffer. By the end stage of Alzheimer’s disease, the damage is extensive, and brain
tissue begins to shrink slowly. It is a neurodegenerative disease that causes our brain
to gradually shrink. As the disease attacks the body, patients experience symptoms
such as difficulty with language, mood swings, loss of motivation, self-neglect, and
behavioral changes [13–15]. Gradually, bodily functions deteriorate, eventually
leading to death. A proper Alzheimer’s disease diagnosis is made based on the
patient’s medical history, together with medical imaging and possibly blood tests to
rule out other potential causes. Initial symptoms are often misdiagnosed as normal
age-related changes in behavior. A detailed assessment and understanding of the
brain tissue is required for a definitive diagnosis, but this can only be done after death.
Good nutrition, physical activity, and social connections are known to be beneficial
in general in growing older, and these may assist in lowering the risk of dementia
and Alzheimer’s disease [16–18].
Alzheimer’s disease is the world’s most common brain disorder. The mini-mental
state, age, and gender are examined here, along with the brain structure, to see how
much risk is associated with the disease. Deep learning approaches for the
classification process have been demonstrated using MRI tests [19–21]. Alzheimer’s
disease includes a phase of mild cognitive impairment (MCI); because MCI may or
may not develop into AD, detecting it is one of the methods for appropriately
diagnosing patients. This disease has been identified
in a large number of people. This disease primarily affects people over the age of 65.
People frequently lose their identity at this stage. They gradually lose their memory
power [22–24]. Patients with Alzheimer’s disease and patients with mild cognitive
impairment have been classified with efficient accuracy; prior work was primarily
interested in differentiating between people with AD and those with MCI [25–27].
The most common form of dementia is Alzheimer’s disease, and it is manageable
if mild cognitive impairment symptoms are detected early on. Work continues to
improve the classification and prediction accuracy of this neurodegenerative disease.
The dataset from the Neuroimaging Initiative is being used to develop a novel method
for classifying AD, MCI, and normal controls using structural magnetic resonance
imaging [28–30]. By analyzing the functional and anatomical changes in
the brain, a computer-based diagnosis of Alzheimer’s disease can be made. Multi-
spectral image fusion considers fusing complementary information while removing
excess data to produce a single image that contains both spatial and spectral
characteristics [31]. The CDR (Clinical Dementia Rating) is a clinical rating scale
used to characterize the severity of dementia: a CDR value of 0 indicates no dementia,
while a value of 3 indicates severe dementia [32–34]. The mini-mental state
examination (MMSE) is the most well-known tool for assessing cognitive impairment
in clinical development [35–37].
The remainder of this paper is organized as follows: Sect. 2 describes related work,
Sect. 3 demonstrates the proposed methodology, Sect. 4 relates to the environmental
setup, Sect. 5 explains the results and discussion, and Sect. 6 details the conclusion
and future directions.

2 Related Work

Many researchers are working to create a model for detecting Alzheimer’s disease
early. Some researchers employ various techniques to detect the presence of
Alzheimer’s disease, such as developing a model and classifying the model to achieve
the best results.
Dill et al. [38] explained that registering the input image to a template is a difficult
process, particularly for brain images. Three metadata points, such as age, gender,
and range, can be used to improve accuracy, and hippocampus segmentation,
supported by statistical analysis, can be used to achieve better results. Cao et al. [39]
noted that MR images contain hippocampus-derived features that are used in
computer-aided disease diagnosis. Previously, human annotations were used for
hippocampus segmentation, and the associated preprocessing techniques carry a high
computational cost. To resolve these problems, a multi-deep-learning method is used
for segmentation, together with a regression method, achieving high accuracy; the
advantage of this method is that it is not time-consuming. Their study creates a
classification for hippocampus segmentation and regression.
Yalcin et al. [40] proposed a rough set model, a mathematical technique
for analyzing clinical data. Physiological characteristics, diagnostics, and
neurological function values are included in this report, which is mainly aimed at ill
patients. In terms of clinical characteristics, classification techniques include support
vector machines, logistic regression, random forests, and decision trees. The data
sets include genome sequences, images, demographic information, diagnostic tests,
and environmental data. In order to refer patients to long-term care, the chronological
aspect of the disease progression model must also be addressed.
Shipe et al. [41] described how prediction models are employed during the
diagnostic testing and therapy phases to help healthcare professionals and patients.
The patient’s state is estimated using risk prediction algorithms; for instance, the
TREAT model can forecast whether or not a patient will develop lung cancer, while
a prognostic model such as the ACS surgical risk calculator predicts the risk factors
that will exist after surgery. Lin et al. [42] surveyed contemporary machine learning
proposals for predicting Alzheimer’s disease in biological domains such as
proteomics, genotyping, and systems biology. Their study employed statistical and
spectral methods to aid in the diagnosis of arterial hypertension. Classifiers such as
the J48 decision tree, random forest, Bayes Net, and Ripper rule-based induction are used

to extract features from BPW. Machine learning techniques, such as neuroimaging
and speech analysis, have been used to aid in the diagnosis of dementia. These types
of machine learning algorithms commonly include decision trees, support vector
machines, and artificial neural networks. The study’s goal is to compare the
effectiveness of differentiating Alzheimer’s patients from healthy people. Jo
et al. [43] proposed that deep learning, a cost-cutting machine learning strategy,
outperforms traditional machine learning. Deep learning is being used to detect and
classify Alzheimer’s disease in its early stages. Steps for preprocessing or
architectural design must be predefined. The four steps of machine learning classification
are as follows: feature extraction, selection, dimensionality reduction, and feature-
based classification algorithm selection. Deep learning is a new subfield of machine
learning research that generates features from high-dimensional medical imaging
data using raw neuroimaging data. Oh et al. [6] proposed a volumetric convolutional
neural network (CNN) model for end-to-end learning on binary classification tasks.
The AD versus NC classification task is solved using unsupervised convolutional
autoencoder (CAE) learning, and the pMCI versus sMCI classification task is solved
using supervised transfer learning.
All of these techniques are used to handle the data. Based on these findings, this
paper uses a variety of machine learning techniques to classify Alzheimer’s disease
in order to detect it early. The section that follows proposes a methodology and
framework for classifying Alzheimer’s disease.

3 Proposed Methodology

The proposed model is depicted in the diagram below. In this case, the data set was
obtained from the repository. The first step is to pre-process the data. The second
step is to apply all classifiers to pre-processed data. The third step is for classifiers to
divide the pre-processed data set into train data set and test data set. Finally, predict
the data based on the results.
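The four steps above can be sketched with scikit-learn. The feature matrix below is a random stand-in for the preprocessed OASIS data (the column semantics, shapes, and labels are purely illustrative), so this is a skeleton of the workflow, not the paper's exact pipeline.

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import numpy as np

# Hypothetical stand-in for preprocessed OASIS features and labels
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))     # e.g. age, MMSE, and other derived features
y = rng.integers(0, 2, size=200)  # 0 = nondemented, 1 = demented (illustrative)

# Step 3: split the preprocessed data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Steps 2 and 4: fit a classifier on the train split and predict on the test split
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
preds = clf.predict(X_test)
print(preds.shape)  # (50,)
```

Any of the other classifiers discussed in this paper could be substituted for the random forest without changing the surrounding steps.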

3.1 Model Diagram

Preprocessing is the first step in any data classification process. Data cleaning, data
integration, data transformation, and data reduction are some of the techniques used in
preprocessing. Incomplete, noisy, and inconsistent data are common characteristics
of real-world data. Data cleaning and cleansing methods aim to fill in missing values,
smooth out noise while identifying outliers, and fix data errors. Data can be noisy,
and attribute values can be erroneous: the data collection instruments may be faulty,
data entry mistakes could have been made by humans or computers, and data transfer
errors are possible as well. As in data warehousing,
data integration is used in data analytic tasks that combine data from numerous
A Framework for Early Recognition of Alzheimer’s … 5

Fig. 1 Schematic representation of proposed model

sources into a coherent data store. Data transformation includes smoothing techniques such as binning, grouping, and regression. Aggregation applies summary operations to the data. In generalization, low-level or primitive/raw data are replaced with higher-level concepts using concept hierarchies (Fig. 1).
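A small pandas illustration of these cleaning, smoothing, and generalization steps (the column names and bin boundaries are invented for the example):

```python
import pandas as pd

# Toy records exhibiting the problems described above: a missing value.
df = pd.DataFrame({"Age": [68, 74, None, 81, 96],
                   "MMSE": [29, 27, 23, 30, 15]})

# Data cleaning: fill the missing value with the column mean.
df["Age"] = df["Age"].fillna(df["Age"].mean())

# Binning as smoothing: replace each Age by the mean of its bin.
df["AgeBin"] = pd.cut(df["Age"], bins=[59, 70, 85, 100])
df["AgeSmoothed"] = df.groupby("AgeBin", observed=True)["Age"].transform("mean")

# Generalization via a concept hierarchy: raw MMSE scores are replaced
# by higher-level severity concepts.
df["Severity"] = pd.cut(df["MMSE"], bins=[0, 20, 26, 30],
                        labels=["severe", "mild", "normal"])
```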
To categorize the OASIS data set, various machine learning classifiers are used in this paper. The dataset is classified using random forest, SVM, decision tree, and XGB classifiers, and a CatBoost-based framework is proposed to obtain the best outcome. Applying all methods to the OASIS dataset allowed us to determine the classification accuracy of each for future use. Random forest, support vector machine, decision tree, and XGB classifiers are compared against the CatBoost classifier to determine which has the highest accuracy.

3.2 CatBoost Classifier

The CatBoost algorithm has a wide range of parameters for fine-tuning features during the processing stage. Gradient boosting is a machine learning technique for classification and regression problems that builds a strong classifier from a collection of weak prediction models, typically decision trees. CatBoost can boost model performance while decreasing overfitting and tuning time, and it exposes several parameters that can be tweaked.
6 L. K. Kumar et al.

3.2.1 Algorithm Description

Input: training set {(X_k, y_k)}, k = 1, …, p; number of iterations I

1. Draw a random permutation of the training examples:
   σ = random permutation of [1, p]
2. Initialize the models:
   N_i = 0 for i = 1, …, p
3. For t = 1 to I:
   (a) Compute the unshifted residuals:
       r_i = y_i − N_{σ(i)−1}(X_i) for i = 1, …, p
   (b) Learn a model on the residuals:
       ΔN = LearnModel((X_j, r_j) : σ(j) ≤ i) for i = 1, …, p
   (c) Update the models:
       N_i = N_i + ΔN for i = 1, …, p
4. Return N_p

Output: the model N_p for the target values

The CatBoost algorithm is a new gradient boosting implementation with high performance and a greedy nature. CatBoost is a critical algorithmic advancement that implements an innovative ordered boosting algorithm together with a novel scheme for processing categorical features. In classical boosting, gradients are estimated at each step using the same data points that the current model was fit on. This shifts the distribution of estimated gradients in any domain of feature space relative to the true gradient distribution in that domain, resulting in overfitting. As a solution, CatBoost in effect samples a new dataset independently at each step of the boosting process, obtaining residuals by applying the current model to examples it was not trained on. CatBoost maintains a collection of models that differ in the examples they were trained on, and it computes the residual of an example using a model that was not trained on it. A random permutation of the training examples accomplishes this: only the first i examples in the permutation are used to train the ith model. At each stage, CatBoost applies the N_{σ(i)−1} model to compute the residual for the ith example and then updates the models toward the target values. The final model N_p is returned for classification.
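One iteration of this ordered scheme can be rendered in a few lines of NumPy. A constant mean predictor stands in for the tree learner, and the 0.1 shrinkage factor is an illustrative addition, so this is a sketch of the prefix idea rather than CatBoost itself:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 8
y = rng.normal(loc=3.0, size=p)     # toy regression targets
sigma = rng.permutation(p)          # step 1: random permutation
M = np.zeros(p + 1)                 # step 2: M[i] is the model kept for the
                                    # prefix of length i (a constant here)

# Step 3(a): unshifted residuals -- each example is scored by the model
# trained only on the examples that precede it in the permutation.
r = np.empty(p)
for rank, idx in enumerate(sigma):
    r[idx] = y[idx] - M[rank]

# Steps 3(b)-(c): learn a correction on each prefix's residuals and
# update that prefix's model ("LearnModel" = mean residual here).
for i in range(1, p + 1):
    prefix = sigma[:i]
    M[i] += 0.1 * r[prefix].mean()

# Step 4: the returned model is M[p]; no example's residual was ever
# computed with a model that had been trained on that example.
```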

3.3 Model Evaluation

The classifier's performance is assessed using a confusion matrix, precision, recall, and F1-score. A confusion matrix is a table that displays the true classes, the classes predicted by the classifier, and the various types of errors the classifier made. From these counts, accuracy, precision, recall, and F1-score are calculated.

4 Environmental Setup

Python is a widely used and popular programming language in which various machine learning tasks are carried out, so Jupyter notebook was used to run the Python modules. Classification algorithms retrieve data from the dataset, and each algorithm is applied to determine the best accuracy. Python 3.7.6 is used here. Jupyter notebook is a free and open-source web application for producing documents. The application was developed on the Windows 10 operating system and implemented on an Intel Core processor with 8 GB RAM. This paper makes use of the OASIS dataset. This data set includes 150 individuals ranging in age from 60 to 96; each subject was scanned two or more times, yielding the 373 image collections in the data set. Seventy-two individuals are labeled non-demented, 64 are labeled demented, and 14 are labeled converted. Various classification algorithms are applied to this data set to determine the best classification accuracy for separating two groups, demented and non-demented. A data set is a group of fields and records; this one is made up of real-world data, most of it raw, and several operations must be performed on it before classification.
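The two-group labeling described above can be reproduced from the reported subject counts; how the 14 "converted" subjects were merged is an assumption of this sketch (here they are folded into the demented group):

```python
from collections import Counter

# Subject counts reported for the OASIS longitudinal data set.
groups = ["Nondemented"] * 72 + ["Demented"] * 64 + ["Converted"] * 14

# Fold the small "Converted" group into "Demented" to obtain two classes.
binary = ["Demented" if g in ("Demented", "Converted") else "Nondemented"
          for g in groups]

counts = Counter(binary)
```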

5 Results and Discussion

A classifier's performance is evaluated using the confusion matrix. When evaluating a confusion matrix, accuracy, precision, and recall are critical metrics.

Accuracy = (True Positive + True Negative) / (True Positive + False Positive + True Negative + False Negative)   (1)

Precision = True Positive / (True Positive + False Positive)   (2)

Recall = True Positive / (True Positive + False Negative)   (3)

F1-score = (2 × Precision × Recall) / (Precision + Recall)   (4)

Error Rate = 1 − Accuracy   (5)
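Equations (1)–(5) follow directly from the four confusion-matrix counts; the counts below are illustrative, not taken from the paper's experiments:

```python
# Toy confusion-matrix counts.
tp, fp, tn, fn = 40, 5, 45, 10

accuracy = (tp + tn) / (tp + fp + tn + fn)            # Eq. (1)
precision = tp / (tp + fp)                            # Eq. (2)
recall = tp / (tp + fn)                               # Eq. (3)
f1 = 2 * precision * recall / (precision + recall)    # Eq. (4)
error_rate = 1 - accuracy                             # Eq. (5)
```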



Table 1 Accuracy of the machine learning classifiers
Classifier  Accuracy  Error rate
Random forest  0.8035  0.1965
Support vector machine  0.7767  0.2233
Decision tree  0.7946  0.2054
Extreme gradient boost  0.8392  0.1608
CatBoost  0.8571  0.1429

Fig. 2 Accuracy chart for the machine learning classifiers

5.1 Accuracy

The accuracy of a classification model refers to how well it classifies data samples. According to Table 1, the CatBoost classifier has an accuracy of 85.71%. The accuracies of the random forest and support vector machine are 80.35% and 77.67%, respectively, and those of the decision tree and extreme gradient boosting are 79.46% and 83.92%, respectively. This demonstrates that the CatBoost classifier produces the best results. Table 1 shows how the different machine learning classification algorithms compare in accuracy. Figure 2 plots the accuracy and error rate of the machine learning classifiers; CatBoost shows the highest accuracy in the chart.

5.2 Precision

A classification model's precision refers to the percentage of positive predicted values that are actually positive.

Table 2 Precision of the machine learning classifiers
Classifier  Precision
Random forest  0.83
Support vector machine  0.84
Decision tree  0.84
Extreme gradient boost  0.86
CatBoost  0.89

Fig. 3 Precision chart for the machine learning classifiers

Table 2 shows how the different machine learning classification algorithms compare in precision, and Fig. 3 plots the precision of each classifier; CatBoost shows the highest precision in the chart. According to Table 2, the CatBoost classifier has a precision of 89%. The precision of the support vector machine and decision tree is 84% each, and the random forest and extreme gradient boosting reach 83% and 86%, respectively. This demonstrates that the CatBoost classifier produces the best results.

5.3 Recall

A classification model's recall refers to the percentage of the total positive values that are predicted positive. Figure 4 plots the recall of the machine learning classifiers; CatBoost and extreme gradient boost show the highest recall in the chart. According to Table 3, the CatBoost classifier has a recall of 83%. The recall of the random forest, support vector machine, decision tree, and extreme gradient boost are 80%,

Fig. 4 Recall chart for the machine learning classifiers

Table 3 Recall of the machine learning classifiers
Classifier  Recall
Random forest  0.80
Support vector machine  0.72
Decision tree  0.77
Extreme gradient boost  0.83
CatBoost  0.83

72%, 77%, and 83%, respectively. This demonstrates that the extreme gradient boost and CatBoost classifiers produce the best results. Table 3 shows how the different machine learning classification algorithms compare in recall.

5.4 F1-Score

The F1-score of a classification model is the harmonic mean of its precision and recall. Figure 5 plots the F1-score of the machine learning classifiers; CatBoost shows the highest F1-score in the chart. Table 4 shows how the different machine learning classification algorithms compare in F1-score. According to Table 4, the CatBoost classifier has an F1-score of 86%. The F1-scores of the random forest, support vector machine, decision tree, and extreme gradient boost are 81%, 77%, 80%, and 85%, respectively. This demonstrates that the CatBoost classifier produces the best results.

Fig. 5 F1-score chart for the machine learning classifiers

Table 4 F1-score of the machine learning classifiers
Classifier  F1-score
Random forest  0.81
Support vector machine  0.77
Decision tree  0.80
Extreme gradient boost  0.85
CatBoost  0.86

6 Conclusion and Future Direction

Machine learning techniques can aid in the early diagnosis and detection of a variety of diseases in medicine and health-care studies. According to the findings of this study, the CatBoost classifier achieves an accuracy of 85.71% and divides the data into two categories, demented and non-demented; the precision, recall, and F1-score results support the same conclusion. To detect Alzheimer's disease early, the demented group identified by the classification algorithm can be examined with additional fields, such as the MMSE and the clinical dementia rating, which were used to determine the disease status of the patients. Machine learning techniques can thus be used successfully in disease detection, prediction, and diagnosis. Identifying people with Alzheimer's at an early stage makes it possible to recommend yoga, daily exercise, healthy food, and counseling to help them maintain mental stability, so that they can avoid reaching a critical stage and live a healthy life.

References

1. Shahbaz M, Ali S, Guergachi A, Niazi A, Umer A (2019) Classification of Alzheimer’s disease


using machine learning technique
2. Silva MVF, de Mello Gomide Loures C, Alves LCV, de Souza LC, Borges KBG, das Graças
Carvalho M (2019) Alzheimer's disease: risk factors and potentially protective measures. J Biomed Sci
3. Greenwood N, Smith R (2016) The experiences of people with young-onset dementia: a meta-
ethnographic review of the qualitative literature. Maturitas
4. McKhann G, Drachman D, Folstein M, Katzman R, Price D, Stadlan EM (1984) Clinical
diagnosis of Alzheimer’s disease. Neurology
5. Schachter AS, Davis KL (2000) Alzheimer's disease. Dialogues Clin Neurosci 2
6. Oh K, Chung Y-C, Kim KW, Kim W-S, Oh I-S (2019) Classification and visualization of
Alzheimer’s disease using volumetric convolutional neural network and transfer learning. Sci
Rep
7. Hippius H, Neundörfer G (2003) The discovery of Alzheimer's disease. Dialogues Clin Neurosci 5(1)
8. Madhusudhana Rao TV, Latha Kalyampudi PS (2020) Iridology based vital organs malfunc-
tioning identification using machine learning techniques. Int J Adv Sci Technol 29(5):5544–
5554
9. Davatzikos C, Fan Y, Wu X, Shen D, Resnick SM (2008) Detection of prodromal Alzheimer’s
disease via pattern classification of magnetic resonance imaging. Neurobiol Aging 29(4)
10. Chupin M, Gérardin E, Cuingnet R, Boutet C, Lemieux L, Lehéricy S, Benali H, Garnero L,
Colliot O (2010) Fully automatic hippocampus segmentation and classification in Alzheimer’s
disease and mild cognitive impairment applied on data from ADNI. HHS Public Access
11. Madhusudhana Rao TV, Srinivas Y (2017) A secure framework for cloud using map reduce. J
Adv Res Dyn Control Syst (IJARDCS) 9(Sp-14):1850–1861. ISSN: 1943-023x
12. Nguyen M, He T, An L, Alexander DC, Feng J, Thomas Yeo BT (2020) Predicting Alzheimer’s
disease progression using deep recurrent neural networks
13. Melinosky C (2021) Understanding Alzheimer’s disease: the basics. WebMD
14. Srinivasa Rao P, Sushma Rani N (2017) An efficient statistical computation technique for health
care big data using R. IOP Conf Ser Mater Sci Eng 225:012159. ISSN: 1757-8981
15. Krishna Prasad MHM, Thammi Reddy K (2015) An efficient semantic ranked keyword search
of big data using map reduce. IJDTA 8(6):47–56
16. Oleksiw B (2019) What are the early signs of Alzheimer’s and am I at risk? Jackson Laboratory
17. Muppdi S, Rama Krishna Murthy M (2019) Identification of natural disaster affected area
using twitter. In: 2nd international conference on cyber security, image processing, graphics,
mobility and analytics, NCCSIGMA-2019, Advances in decision sciences, image processing,
security and computer vision. Springer Nature, pp 792–801
18. Toshkhujaev S, Lee KH, Choi KY, Lee JJ, Kwon G-R, Gupta Y, Lama RK (2020) Classification
of Alzheimer’s disease and mild cognitive impairment based on cortical and subcortical features
from MRI T1 brain images utilizing four different types of datasets. J Healthc Eng 2020
19. Qiu S, Joshi PS, Miller MI, Xue C, Zhou X, Karjadi C, Chang GH, Joshi AS, Dwyer B,
Zhu S (2020) Development and validation of an interpretable deep learning framework for
Alzheimer’s disease classification. J Neurol 143(6)
20. Bheemavarapu P, Latha Kalyampudi PS, Madhusudhana Rao TV (2020) An efficient method
for coronavirus detection through X-rays using deep neural network. J Curr Med Imaging
[online Available with ISSN: 1875-6603]
21. Vidya Sagar Appaji S, Lakshmi PV (2020) Maximizing joint probability in visual question
answering models. Int J Adv Sci Technol 29(3):3914–3923
22. Srinivasa Rao P, Krishna Prasad PESN (2017) A secure and efficient temporal features based
framework for cloud using MapReduce. In: 17th international conference on intelligent systems
design and applications (ISDA 2017), vol 736, ISSN 2194-5357. Held in Delhi, India, pp
114–123

23. Gupta Y, Lama RK, Kwon G-R (2019) Prediction and classification of Alzheimer’s disease
based on combined features from apolipoprotein-E genotype, cerebrospinal fluid, MR, and
FDG-PET imaging biomarkers. Front Comput Neurosci
24. Billeci L, Badolato A, Bachi L, Tonacci A (2020) Machine learning for the classification
of Alzheimer’s disease and its prodromal stage using brain diffusion tensor imaging data: a
systematic review. MDPI
25. Krishna Prasad MHM, Thammi Reddy K (2014) An efficient data integration framework
in Hadoop using MapReduce. In: Computational intelligence techniques for comparative
genomics. Springer Briefs in Applied Sciences and Technology, pp 129–137. ISSN: 2191-530X
26. Li Q, Wu X, Xu L, Chen K, Yao L (2018) Classification of Alzheimer’s disease, mild cognitive
impairment, and cognitively unimpaired individuals using multi-feature kernel discriminant
dictionary learning. Front Comput Neurosci
27. Liu M, Zhang D, Shen D (2012) Ensemble sparse classification of Alzheimer’s disease.
NeuroImage
28. Latha Kalyampudi PS, Swapna D (2019) An efficient digit recognition system with an
improved pre-processing technique. In: ICICCT 2019—system reliability, quality control,
safety, maintenance and management. Springer Nature Singapore, pp 312–321
29. Khan RU, Tanveer M, Pachori RB (2020) A novel method for the classification of Alzheimer’s
disease from normal controls using magnetic resonance imaging. Expert Syst
30. Vidya Sagar Appaji S, Srinivasa Rao P (2018) A novel scheme for red eye removal with image
matching. J Adv Res Dyn Control Syst 10(13)
31. Bhateja V, Moin A, Srivastava A, Bao LN, Lay-Ekuakille A, Le D-N (2016) Multispectral
medical image fusion in contourlet domain for computer based diagnosis of Alzheimer’s
disease. Rev Sci Instrum 87(7):074303
32. Vadaparhi N, Yarramalle S (2014) A novel clustering approach using Hadoop distributed
environment. (Appl Sci Technol) 9:113–119, ISSN: 2191-530X
33. Vos SJB, Xiong C, Visser PJ, Jasielec MS, Hassenstab J, Grant EA, Cairns NJ, Morris
JC, Holtzman DM, Fagan AM (2014) Preclinical Alzheimer’s disease and its outcome: a
longitudinal cohort study. HHS Public Access
34. Zhang D, Wang Y, Zhou L, Yuan H, Shen D (2011) Multimodal classification of Alzheimer’s
disease and mild cognitive impairment. Neuro Image Sci Direct 55(3)
35. Maram B, Gopisetty GKD (2019) A framework for data security using cryptography and image
steganography. Int J Innov Technol Explor Eng (IJITEE) 8(11). ISSN: 2278-3075
36. Arevalo-Rodriguez I, Smailagic N, Figuls MRI, Ciapponi A, Sanchez-Perez E, Giannakou
A, Pedraza OL, Cosp XB, Cullum S (2015) Mini-mental state examination (MMSE) for the
detection of Alzheimer’s disease and other dementias in people with mild cognitive impairment
(MCI). Cochrane Library
37. Calero M, Gómez-Ramos A, Calero O, Soriano E, Avila J, Medina M (2015) Additional
mechanisms conferring genetic susceptibility to Alzheimer’s disease. Front Cell Neurosci 9
38. Dill V, Klein PC, Franco AR, Pinho MS (2018) Atlas selection for hippocampus segmentation:
relevance evaluation of three meta-information parameters. Comput Biol Med 95
39. Cao L, Li L, Zheng J, Fan X, Yin F, Shen H, Zhang J (2018) Multi-task neural networks for
joint hippocampus segmentation and clinical score regression. Springer Science
40. Yalcin A, Barnes LE, Centeno G, Djulvegovic B, Fabri P, Kaw A, Tsalatsanis A (2013)
Classification models in clinical decision making. University of Florida
41. Shipe ME, Deppen SA, Farjah F, Grogan EL (2019) Developing prediction models for clinical
use using logistic regression: an overview. J Thorac Dis (4)
42. Lin S-K, Hsiu H, Chen H-S, Yang C-J (2021) Classification of patients with Alzheimer’s disease
using the arterial pulse spectrum and a multilayer-perceptron analysis. Sci Rep 11
43. Jo T, Nho K, Saykin AJ (2019) Deep learning in Alzheimer’s disease: diagnostic classification
and prognostic prediction using neuroimaging data. Front Aging Neurosci
On the Studies and Analyzes of Facial
Detection and Recognition Using
Machine Learning Algorithms

Navya Thampan and Senthil Arumugam Muthukumaraswamy

Abstract This paper compares practical machine learning-based detection and recognition algorithms, such as the Haar cascade classifier and the local binary pattern histogram (LBPH) method, against GoogLeNet, which uses a convolutional neural network (CNN) architecture with transfer learning. The comparative analyses and studies show that LBPH and Haar cascade are computationally efficient, but the CNN is more accurate despite its longer computational time.

Keywords Convolutional neural network · Haar cascade · Transfer learning ·


OpenCV · MATLAB · Facial detection · Facial recognition · Local binary pattern
histogram

1 Introduction

In terms of computer vision, one of the most widely researched topics would be
detection and recognition. This concept is implemented in various fields like safety,
education, automobile, social media, etc. Facial detection and recognition are widely
implemented concepts, and one such example in our daily life is Facebook, where
the app automatically detects and recognizes the people in a particular photograph.
The working of different algorithms implemented is the initial base in understanding
the process of detection and recognition. The facial recognition market is predicted
to be estimated at 8.5 billion dollars by the year 2025 [1]. This proves that the market
expands with the increasing demand of employing facial detection and recognition
in almost all aspects and different industries.

N. Thampan (B) · S. A. Muthukumaraswamy


School of Engineering and Physical Sciences, Heriot-Watt University, Dubai, United Arab
Emirates
e-mail: nt37@hw.ac.uk
S. A. Muthukumaraswamy
e-mail: m.senthilarumugam@hw.ac.uk

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_2
16 N. Thampan and S. A. Muthukumaraswamy

2 Related Study

From unlocking smartphones to aiding in recognizing criminals or finding missing persons, facial detection and recognition play a vital role and must therefore be reliable and secure. A good database is required to achieve efficient prediction and recognition. The object is first detected with certain detection algorithms, and the result is then analyzed with recognition algorithms and relevant datasets to identify the object. The objects can differ, so this process can be used for the detection and recognition of faces and even other objects such as license plates and vehicles.
The Haar cascade algorithm was first implemented by Viola and Jones and is capable of detecting objects quickly with high detection rates [2]. Combined with AdaBoost, this classifier becomes accurate and sturdier. Gopi Krishna et al. [3] proposed a system for face detection centered on the AdaBoost algorithm with Haar-like features. They devised the classifier in a pipelined structure and employed a triple classifier to increase the speed of the facial detection system [3]. One drawback of the Haar cascade algorithm is that illumination affects its performance; with proper illumination, detection rates, and therefore accuracy, improve.
Convolutional neural networks are used almost everywhere to carry out large-scale learning and training-based tasks. Human detection ability has been compared against a trained network: the human achieved 73.1% accuracy, whereas the trained network reached about 64% [4]. Nevertheless, on applying a CNN to this network, the accuracy went up to 74.9% [4], showing that the CNN-trained network outperformed a human's natural ability of detection. Sharma et al. concluded from experiments that increasing the number of layers in the net improves training efficiency and, as a result, yields greater prediction precision [5]. Some recognition algorithms are CNN classifiers used in deep learning, which can process input images and assign learnable weights and biases to distinctive parts of an image in order to distinguish one image from another.

3 Facial Detection and Recognition Methods

Object recognition is considered a series of tasks:

• Image/object classification: The image is classified by the algorithm into its class, e.g., images containing only the subject.
• Localization: Localization refers to identifying the location of a particular object in the image or real-time frame by placing bounding boxes around the specific object.
On the Studies and Analyzes of Facial Detection … 17

• Recognition: This is the last step of the process, where the object is recognized based on what the model has been trained on; in other words, it is the identification of various categories of objects.

3.1 Machine Learning Approach

This implementation does not require a large dataset and hence is a better choice if the datasets are limited. Features of the images or objects can be extracted by various feature extraction methods and fed into the machine learning model, where they are subsequently categorized and classified. This mode of implementation also offers flexibility, since it chooses the best outcome from the specified features and is less intricate than deep learning, and it can fetch accurate results regardless of the size of the dataset.

3.1.1 Haar Cascade Object Detection Algorithm

One of the oldest and most powerful object detection algorithms is the Haar cascade detection algorithm (HC algorithm), illustrated in Fig. 1; facial features are extracted as Haar-like features. This machine learning algorithm was proposed by Paul Viola and Michael Jones in 2001. The method involves training on various numbers and sorts of positive and negative images, with which the Haar cascade classifier learns to distinguish whether a face is present or not. The cascade structure furthermore condenses computational time and simplifies the algorithm.
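What makes Haar-like features fast is the integral image: any rectangle sum costs four array lookups, so a two-rectangle "edge" feature such as the ones in Fig. 1 is evaluated in constant time. A minimal NumPy sketch of this mechanism (not the full Viola–Jones cascade):

```python
import numpy as np

def integral_image(img):
    # ii[y, x] holds the sum of img[:y, :x]; the zero padding makes
    # empty prefix sums come out as 0.
    return np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, y, x, h, w):
    # Sum over img[y:y+h, x:x+w] using only four lookups.
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

# Toy image with a vertical edge: dark left half, bright right half.
img = np.zeros((6, 6))
img[:, 3:] = 1.0

ii = integral_image(img)
# Two-rectangle edge feature: right-half sum minus left-half sum.
feature = rect_sum(ii, 0, 3, 6, 3) - rect_sum(ii, 0, 0, 6, 3)
```

A strong response (here 18) signals an edge at that window position; the cascade thresholds many such features in sequence.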

Fig. 1 HC algorithm working with features [6]


18 N. Thampan and S. A. Muthukumaraswamy

3.1.2 LBPH Face Recognizer

LBPH uses four parameters to analyze a face for recognition: radius, neighbors, grid X, and grid Y [7]. The idea is to train the LBPH algorithm on the training datasets so that it yields the ID describing the recognized face [7]. Its working is simple yet efficient: in a 3 × 3 window, each neighboring pixel is assigned 0 (lower than the center pixel) or 1 (equal or higher) [7]. The resulting binary pattern is converted to a decimal value, and these values are collected into histograms [7]. In this approach, histogram values for each face in the training datasets can be retrieved, as shown in Fig. 2.
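The 3 × 3 thresholding step can be written out directly (the clockwise bit ordering below is one common convention; implementations vary in ordering and in how ties are handled):

```python
import numpy as np

def lbp_code(window):
    # 3x3 window -> 8-bit LBP code: each neighbor becomes 1 if it is
    # greater than or equal to the center pixel, else 0.
    center = window[1, 1]
    neighbors = [window[0, 0], window[0, 1], window[0, 2], window[1, 2],
                 window[2, 2], window[2, 1], window[2, 0], window[1, 0]]
    bits = [1 if n >= center else 0 for n in neighbors]
    # Read the bits as one binary number (first bit = most significant).
    return sum(b << i for i, b in enumerate(reversed(bits)))

window = np.array([[6, 5, 2],
                   [7, 6, 1],
                   [9, 8, 7]])
code = lbp_code(window)          # 0b10001111 = 143

# Codes over a grid cell are then pooled into a 256-bin histogram.
hist = np.bincount([code], minlength=256)
```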
Compared with other face recognition algorithms such as Eigenfaces and Fisherfaces, which can be influenced by lighting and illumination conditions, the LBPH algorithm obtains results regardless of the lighting circumstances in the dataset. In this study, the Haar cascade classifier was considered for face detection and LBPH for face recognition as the machine learning approach.

3.2 Deep Learning Approach

Deep learning, a subfield of machine learning, involves networks of many layers working beneath one another. This mode of implementation can be done either from scratch or with a pre-trained model. In the former case, a large number of datasets is required for the algorithm to train on and build up its confidence score when recognizing the object, and weights and biases must be initialized manually. In the latter case, since the model is already trained, the only addition is to supply the new data; this is a comparatively less drawn-out method, since computational time is speedy. Despite the complexity and time involved, deep learning recognition can promise highly accurate results. In this study, GoogLeNet was considered as the deep learning approach.

Fig. 2 LPBH output and its histogram [7]



3.2.1 GoogLeNet

This deep learning algorithm based on neural networks was proposed by Google in the 2014 paper 'Going Deeper with Convolutions'. The model can detect objects in various images included in the dataset. This CNN model does not require bounding boxes around the detected object, and the paper [8] gives evident signs that using a sparse architecture is practical and much more convenient. The overall function is to retrieve information through convolution and pooling layers with unalike window shapes so as to reduce model complexity. This layout can be seen in Fig. 3.

4 Implementation

The right algorithms need to be used for effective results. For small-scale learning, the machine learning approach is ideal; for large-scale or industrial applications, the deep learning approach is preferred. Each approach thus has its own merits and demerits, and the user must choose based on needs and other requirements. The initial preparation of installing OpenCV, Python, VSCode, PyCharm, etc., was done, and with these libraries readily available, the experiments were set up.

4.1 Face Detection: Haar Cascade Detection Algorithm

The detection algorithm evaluated first is the Haar cascade object detection algorithm; in this case, the object is a face. Along with the importation of the main library, OpenCV, the algorithm file 'Haar cascade for frontal face' in its .xml extension was downloaded beforehand. The training dataset, consisting of three people with 8 data samples each, was stored for use in training to recognize them. The images were converted to gray scale to minimize noise in the image as much as possible, and bounding boxes were drawn around the detected faces. Figure 4 represents the Haar cascade algorithm's path of work.

Fig. 3 GoogLeNet architecture layout



Fig. 4 Haar cascade classifier path of work

4.2 Face Recognition: Local Binary Pattern Histogram


(LBPH) Algorithm

The next step of the evaluation was the recognition of faces. Again using the OpenCV library, the Haar cascade classifier for frontal-face detection and the LBPH face recognizer were introduced in the initial lines of the code. In the later stages, the images were converted to gray scale and resized for training purposes. Once the execution was done, the algorithm trained itself and was saved under 'trainner.yml'. This file stores all the data, such as the coordinates of the object in each iteration over the different images. Figure 5 shows the working path of this algorithm.
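At recognition time, the query face's histogram is compared against every stored training histogram, and the ID of the nearest one is returned, with the distance serving as a confidence score (lower is better). The chi-square distance below is one common choice; OpenCV's recognizer has its own internals, and the 8-bin histograms are toy stand-ins for the real per-cell 256-bin histograms:

```python
import numpy as np

def chi_square(h1, h2, eps=1e-10):
    # Chi-square distance between two histograms.
    return np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

# Stored per-person histograms (toy: 8 bins instead of 256 per cell).
train = {"alice": np.array([4., 0., 1., 3., 0., 0., 1., 1.]),
         "bob":   np.array([0., 5., 0., 0., 3., 1., 1., 0.])}
query = np.array([3., 1., 1., 3., 0., 0., 1., 1.])

# Predict the ID with the smallest histogram distance.
pred = min(train, key=lambda name: chi_square(query, train[name]))
dist = chi_square(query, train[pred])
```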

Fig. 5 LBPH path of work



4.3 Detection and Recognition by GoogLeNet

After importing GoogLeNet into MATLAB, the pre-trained network was loaded. Prior to this, the required image datasets were saved and reserved for the network to train on and validate its learning. In this method, two of the task-specific layers of the network were replaced with user-specified objects for training. Using the network analyzer, it is possible to find out which layer is responsible for which task; in this case, layers 142 and 144 were responsible for training and classification, respectively. Figure 6 shows the network analysis in MATLAB.
The training technique used was 'SGDM', stochastic gradient descent with momentum, a well-regarded optimization algorithm in deep learning. It incorporates specified training options such as the learning rate and mini-batch size [9] and is commonly used for training deep neural networks. Values were chosen by trial and error.
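The SGDM update itself is two lines: a velocity accumulates an exponentially decayed sum of past gradients, and the weights move along the velocity. A toy run on a one-dimensional quadratic loss (the 0.1 learning rate and 0.9 momentum are illustrative values, not the ones used in the MATLAB experiment):

```python
# Minimise f(w) = (w - 4)^2 with SGDM-style updates (the "mini-batch"
# gradient is exact here for simplicity).
w, v = 0.0, 0.0
lr, momentum = 0.1, 0.9

for _ in range(200):
    grad = 2 * (w - 4)            # gradient of the toy loss at w
    v = momentum * v - lr * grad  # velocity update
    w = w + v                     # parameter update
```

The momentum term smooths oscillations across iterations, which is why SGDM typically converges faster than plain gradient descent on ill-conditioned losses.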

4.3.1 Transfer Learning (CNN)

The pre-trained GoogLeNet convolutional neural network can be fine-tuned to give the anticipated outcome. When it comes to deep learning, using transfer learning to train for a task is very common. Its advantage is that it is much easier to train a network this way than by initializing weights and biases from scratch, which also makes the work quicker. The two layers of the network

Fig. 6 Network analysis


22 N. Thampan and S. A. Muthukumaraswamy

Fig. 7 Rephrasing pre-trained network [10]

are to be replaced by user-specified layer for classification and training purposes [10].
Figure 7 denotes the process of transfer learning in a pre-trained network.
By doing so, the weights and biases of the pre-trained model are frozen: as the
network trains, those values are not updated. The network re-initializes only the
weights required for the new task. On loading the pre-trained network, the final
layers are replaced; the fully connected layer is substituted with a new fully
connected layer, which carries information such as class probabilities and
predicted labels. The GoogLeNet used in this study was trained for 6 epochs with
a mini-batch size of 5.
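The effect of freezing can be mimicked with a toy stand-in: a fixed feature extractor whose parameters never change, plus a single new output layer that is the only trainable part. This is a hypothetical sketch (tiny hand-made features and a logistic output, nothing like GoogLeNet's real layers), meant only to show that gradient updates touch the replaced layer alone:

```python
import math

def frozen_features(x):
    """Stand-in for the pre-trained layers: parameters are fixed
    and never updated during fine-tuning."""
    return [x[0] * x[1], x[0] + x[1], 1.0]   # last entry acts as a bias input

def train_final_layer(samples, labels, lr=0.5, epochs=500):
    """Logistic-regression training of only the new output layer."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            f = frozen_features(x)
            p = 1.0 / (1.0 + math.exp(-sum(wi * fi for wi, fi in zip(w, f))))
            w = [wi + lr * (y - p) * fi for wi, fi in zip(w, f)]
    return w

def predict(w, x):
    f = frozen_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0

samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]          # a tiny linearly separable toy task
w = train_final_layer(samples, labels)
print([predict(w, x) for x in samples])
```

Only `w` ever changes during training; `frozen_features` plays the role of the reused pre-trained layers, which is why transfer learning converges with far less data and time than training from scratch.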

5 Results and Discussion

With 18 training samples for the face recognizer and around 40 for the CNN
model, the codes were compiled. Python with the OpenCV library was used for
the Haar cascade and LBPH face recognizer, whereas MATLAB was used to
explore the CNN-based algorithm, GoogLeNet.

5.1 Haar Cascade Face Detection (Machine Learning Approach)

This algorithm performed well in terms of computational time, providing output in
about 2 s: it took 2.11 s of compiling time to detect a face in real-time video and
only 2.2 s to detect a face in an image (inclusive of computational time). In the
initial runs, the algorithm was only able to detect a single face in a crowd, or it
detected all faces except one; some of the results also contained false positives.
These discrepancies can be seen in Fig. 8.
Tuning the parameters was found to be the ideal solution for obtaining all faces in
a picture with no false positives. Hence, this issue was rectified by fine-tuning
constraints such as ‘scale factor’ and ‘minimum neighbors’ according to the
desired requirements, as can be noticed in Fig. 9.

(a) (b)
Fig. 8 a Person not detected, b false detection in a picture of a group of people using HC algorithm

Fig. 9 Haar cascade object detection, detecting all faces after final tuning
The needed output was obtained by trial and error. It was also discovered that
with a larger scale factor, faces could be detected from a greater distance, but
accuracy was put at risk.
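The speed/accuracy trade-off of the scale factor can be made concrete by counting how many pyramid scales the detector must evaluate between its smallest and largest search window (a back-of-the-envelope sketch with assumed window sizes, not OpenCV's exact scheduling):

```python
import math

def pyramid_levels(min_size, max_size, scale_factor):
    """Number of detection scales when each level enlarges the
    search window by `scale_factor`."""
    return int(math.log(max_size / min_size, scale_factor)) + 1

# Hypothetical 24-px base window scanned up to a 480-px face.
for sf in (1.05, 1.1, 1.3):
    print(sf, pyramid_levels(24, 480, sf))  # 1.05 -> 62, 1.1 -> 32, 1.3 -> 12
```

A larger scale factor cuts the number of scales (and hence the runtime) sharply, but the coarser steps can jump past the size at which a face matches best, consistent with the accuracy loss observed above.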

5.2 LBPH Face Recognition in Real Time (Machine Learning Approach)

OpenCV also supports the LBPH face recognizer. The training datasets consisted
of 8 sets of pictures of three people. In the first run, this algorithm was not able to
recognize faces. This would have been due to the presence of other objects/accessories in the
training images, such as sunglasses, a factor that can affect the algorithm. After
replacing those training images with images containing only the person’s face (no
accessories, no photos of the person with other people), the algorithm was able
to recognize some faces upon training. Initially there were false outcomes, which
were rectified by
• Readjusting the size of the training images
• Including an even number of training samples to avoid faulty learning/training.
Despite tuning certain parameters, the algorithm was still unable to identify a
third person. This could have been due to occurrences of shadow in the test
image: shadows on a face can change the confidence score and encourage false
recognition. Another reason the third person was not recognized correctly could
be that his training images included a non-uniform mix of his younger self and
older self. These problems were rectified by including a good number of
additional training images.
Later, different test images of the third person were put to validation, and finally
the algorithm was able to recognize that person accurately. After resolving those
inconsistencies, the LBPH algorithm was able to train in about 2.5 s, with an
overall computational time of 3.4 s, and within this interval it recognized test
images with an accuracy of about 85%.

5.3 GoogLeNet Image-Based Analysis (Deep Learning Approach)

This pre-trained model, loaded in MATLAB, uses the CNN architecture designed
by Google; it has been trained on more than a million images and can classify
images into roughly a thousand object categories. Here it was trained for facial
detection and recognition by the method of ‘transfer learning’, replacing the last
two layers of the structure with user-defined layers. This gives the user flexibility
in training the network to produce the desired results. The dataset consisted of
training images of four people, each with a set of 10 images, of which a common
split of 70% was used for training and the remaining 30% for validation. With the
pixel and scaling ranges defined, this deep learning algorithm was allowed to
pass over the datasets and train itself. As can be seen in Fig. 10, the accuracy
was low initially but gradually climbed as the loss decreased. The training was
completed with a learning rate of 0.0003.
At the start, the network recognized the test images incorrectly. This was
because some of the test images were not in the same path as the network
directory. Other errors arose from poor resizing of images and uneven data
samples. After those rectifications, some of the samples still scored a confidence
rate of less than 80%. To resolve this, more training data was supplied, and
finally the algorithm achieved a better confidence score, as seen in Fig. 11.

Fig. 10 Training GoogLeNet based on CNN
For (c) of Fig. 11, the network was extremely confident that a black-and-white
test image was of Einstein. Although Einstein was included in the dataset, the
test image is not of him. A likely reason is that more black-and-white photos were
supplied in Einstein’s dataset. This could be rectified by supplying Einstein’s
training set with colored samples as well, giving uniformity in both learning and
training. In this case, GoogLeNet can be made more accurate by supplying a
larger amount of data to train on.
Table 1 summarizes the efficiency of the algorithms obtained during the tests.
The only shortcoming of GoogLeNet is that it takes more time to train (around
40 s), unlike LBPH, which is much quicker. Directly inputting black-and-white
images for training could save computational time, but it would not be ideal, since
some edges might not be expressed, thereby affecting the algorithm.

(a) (b) (c)
Fig. 11 a Correct recognition with a 0.84 confidence score, b correctly recognized image with a
confidence score of 0.94, c wrongly guessed picture with a confidence score of 1

Table 1 Summary of resulting efficiency of the algorithms used in the analyzes

Algorithm               No. of images (test)  Accuracy (%)  Computational time (s)
Haar cascade detection  10                    98            2.2
LBPH face recognition   13                    84            3.5
GoogLeNet               14                    95            40

6 Conclusion

From evaluating the Haar cascade classification for face detection and LBPH
recognition against the GoogLeNet convolutional neural network, it is evident that
the Haar cascade classifier with LBPH, for detection and recognition respectively,
computes much quicker and is ideal when the dataset is small. With some
parameters tuned, the facial detection algorithm works effectively, and the user
can manually tune out false positives. LBPH trains just as quickly and provides
accurate results. In comparison, GoogLeNet requires a large dataset to train and
validate on and takes much more time to train, but its accuracy is much higher;
achieving even higher accuracy requires still more data. One should opt for the
right approach based on their preference: if small datasets and quick
computation are the priority, the machine learning approach is advised;
otherwise, the CNN does the job with much higher accuracy, with the downside
of longer computational time. In any case of facial recognition, the age difference
between the test images and the training set impacts the algorithm and its result;
algorithms can find it difficult to correctly recognize a person across their younger
and older selves, but including a suitably large dataset of such faces may open
up a powerful recognition tool.

References

1. Statista (2020) Facial recognition global market size 2025. https://www.statista.com/statistics/1153970/worldwide-facial-recognition-revenue/
2. Viola P, Jones M (2004) Robust real-time object detection. Int J Comput Vision 57(2):137–154
3. Gopi Krishna M, Srinivasulu A, Basak T (2012) Face detection system on ada boost algorithm
using haar classifiers. Int J Modern Eng Res (IJMER) 2(6):3996–4000. www.ijmer.com [online]
4. Yang Y, Hospedales TM (2015) Deep neural networks for sketch recognition
5. Sharma N et al (2018) An analysis of convolutional neural networks for image classification.
Proc Comput Sci 132:377–384. https://doi.org/10.1016/j.procs.2018.05.198
6. OpenCV: https://docs.opencv.org/3.4/db/d28/tutorial_cascade_classifier.html

7. Face Recognition: Understanding LBPH Algorithm. Towards Data Science (2017). http://www.towardsdatascience.com/face-recognition-how-lbph-works-90ec258c3d6b
8. Szegedy C et al (2015) Going deeper with convolutions. In: IEEE conference on computer
vision and pattern recognition (CVPR), pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594
9. MathWorks, Update parameters using stochastic gradient descent with momentum (SGDM)
MATLAB sgdmupdate. https://www.mathworks.com/help/deeplearning/ref/sgdmupdate.html
10. MathWorks, Transfer learning using pretrained network. https://in.mathworks.com/help/deeplearning/ug/transfer-learning-using-pretrained-network.html
IPL Analysis and Match Prediction

Anjali Singhal, Deepanshi Agarwal, Esha Singh, Rajat Valecha, and Ritika Malik

Abstract A comprehensive analysis of the complete IPL dataset and visualization
of the different highlights necessary for IPL assessment is performed. Several
machine learning (classification) algorithms are used to compare and predict the
winner of a match. Every game has its own requirements; the T-20 game
likewise has requirements that are not satisfied by current models. Using Python,
the intricacy of data analysis is reduced, as the analysis results are shown as
visual portrayals. The dataset is loaded and pre-processed, followed by feature
selection. Four machine learning (classification) algorithms, decision tree,
K-nearest neighbour, SVM, and random forest, are applied, and their outcomes
are compared. The best of the four classification techniques is then applied to
anticipate the winner of the match, and the results are visualized as graphs.

Keywords Cricket prediction · Decision trees · KNN · SVM · Random forest ·
Sports analysis

1 Introduction

With technology growing abundantly over recent years, gathering data has
become fairly straightforward. Consequently, machine learning is becoming a
significant trend in sports analytics, given the availability of live as well as
historical data. Sports analytics is the procedure of gathering previous game data
and investigating it to extract essential information, with the expectation that it
supports effective and dynamic decision-making. The decision could be whether
to buy a player (not just in the auction), whom to field in the coming match, or a
more ambitious task like setting up strategies for future matches based on
predictions made using various factors

A. Singhal · D. Agarwal · E. Singh · R. Valecha (B) · R. Malik
Computer Science and Engineering, Inderprastha Engineering College, Ghaziabad, India
e-mail: rajatvalecha126@gmail.com
A. Singhal
e-mail: anjali.singhal@ipec.org.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 29
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_3
30 A. Singhal et al.

from past matches. Benefits of the proposed system include: (i) it offers both
visualization and tabular output for several functions, (ii) it can help captains and
coaches make the right pre-match decisions, (iii) it can be of great assistance to
individuals who are interested in the IPL and its statistics, (iv) it can help in
putting resources into the right team for betting.
The dataset utilized in this work is a collection of matches: around 817 match
records with complete information about the match winner, location, toss winner,
team names, and other important attributes. The matches span 2008–2020. This
dataset has helped us accomplish the main aim of our project.

2 Literature Survey

In the last few years, many models have been created for sports analysis. In one
of them, Banasode et al. [1] built an application that analyzes the data by picking
attributes from the dataset and anticipating the fate of the match as well as of the
players: which player will perform well in the upcoming matches, which team will
win the toss, and even which team will win the match. Anticipating the winner of a
cricket match depends on many factors such as batsmen’s performances, team
strength, venue, and weather conditions. In one such study, the algorithms used
are Naive Bayes, decision tree, and SVM; the resulting prediction model benefits
cricket boards in studying a team’s strength and in cricket analysis [2].
Others have tried mining models [3]. To build the mining model, it is streamlined
by selecting parameters and iterating; the parameters are then fed over the
dataset to extract actionable patterns and detailed statistics. That work centres
on observing significant data about the IPL teams by using the functions of the R
package. R reduces the complexity of data analysis, as it shows the results using
visual portrayals. The dataset is loaded, and a batch of pre-processing is done,
followed by feature selection.
Others have attempted the KNIME tool [4]. In that model, prediction is done using
Euler’s strength calculation formula, the KNIME tool, and a Naive Bayes network.
The datasets and previous statistics are trained on in order to cover all
dimensions and important factors such as the toss, venue, captains, favourite
players, previous battles, and previous statistics.
Amala Kaviya et al. [5] reported results using a detailed ball-by-ball dataset of all
the matches played in the history of the IPL and performed a comprehensive
analysis of various measures associated with the game, alongside pragmatic
visualizations. They ranked all the players on the basis of a player ranking index.
Some use a multivariate regression-based methodology to measure the points of
each team in the league table. The past performance of every team decides its
likelihood of dominating a game against a specific rival. Finally, a set of seven
factors or attributes is recognized that can be used for predicting the IPL match
winner [6]. Some use logistic regression as well; experimental results show that
accuracy is very low when logistic regression is used [7].
Lamsal and Choudhary [8] built a model after recognizing seven variables which
impact the result of an IPL match, using a multi-layered perceptron.
Sankaranarayanan and Sattar [9] use clustering methods as well for match
prediction: they applied linear regression, nearest neighbour, and clustering
methods, and presented numerical outcomes that exhibit the performance of all
the algorithms used in the model’s result prediction.
Cricket carries many similarities with baseball, on which a great deal of work and
discussion is already available. Sabermetrics deals with the use of statistical
methods to make predictions about the sport of baseball, and similar
methodologies and procedures can be applied to the sport of cricket.
Performance analysis using bowling and batting averages, economy rates, and
strike rates was proposed by Lemmer [10]. Normal batting averages have a
drawback concerning players who are not out in a match; to overcome it, Kimber
and Hansford [11] came up with alternative batting-average methods to handle
circumstances where the batsman has not been out in one-day matches.
This paper attempts to fulfil all those needs by providing an interactive and
user-friendly portal with advanced functionality for performing detailed
exploratory analysis on all dimensions of the matches.

3 Proposed Methodology

The proposed methodology is described compactly by this architecture. The first
step is processing the datasets and loading them in the back end. The analysis
and prediction of the match are then performed. Finally, a user interface with
different functionalities is provided, which can be used for match analysis and
prediction (Fig. 1).
For analysis, prediction, and visualization, we have implemented the following
modules.
• Processing the datasets
• Match analysis
• Visualization
• Match prediction
• Creating user interface

Fig. 1 Flow of proposed methodology

3.1 Processing the Datasets

The two datasets we are using are IPL_Matches_2008–2020.csv (IPL matches
data from 2008 to 2020), whose size is 150,820 bytes, and
IPL_Ball-by-Ball_2008–2020.csv (IPL ball-by-ball data from 2008 to 2020); both
were taken from the Kaggle repository.
After taking the dataset, data cleaning is performed on the dataset which helps
to transform the raw data into useful data. Feature selection is also performed to
maximize efficiency and performance.
We created XML files to store filtered datasets. The number of attributes in these
files is 17 and 18, and the total number of records is 817 and 193,469, respectively.
These files embody the heart of the project.

3.2 Match Analysis

This module analyzes the datasets completely. Apart from basic functionalities
like previous battles, it is also integrated with advanced analysis and visualization
functionalities. A subset of them includes:
Head on Head Analysis of Teams: In this, a comparison of the two teams is
performed by analyzing the matches they played in the past against each other.
This feature offers great help in predicting the winning team; it covers, among
other things, the captain’s decision after winning the toss and the winning
percentage of the toss-winning team.

Team Performance: In this, the team’s performance will be analyzed on the basis of three factors:
Toss: The decision after winning the toss and the percentage of winning the match
after winning the toss will be analyzed to determine the chances of winning of that
team.
Venue: The team’s overall performance and winning percentage on that venue will
be analyzed to determine the chances of winning of that team.
Defending/Chasing: The team’s overall performance on the basis of defending the
target and chasing the target will be analyzed to determine the chances of winning
of that team.
After analyzing all the factors mentioned above, an overall winning percentage
of teams will be calculated to predict the match winner.
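One simple way to combine the three factors into a single overall winning percentage is a weighted average; the weights below are illustrative assumptions, not values fixed by this work:

```python
def overall_win_percent(toss_pct, venue_pct, chase_defend_pct,
                        weights=(0.3, 0.3, 0.4)):
    """Weighted combination of the per-factor winning percentages.
    The weights are assumed values for illustration only."""
    factors = (toss_pct, venue_pct, chase_defend_pct)
    return sum(w * f for w, f in zip(weights, factors))

# Hypothetical team: wins 60% after winning the toss, 55% at this venue,
# and 70% of its matches when chasing.
print(overall_win_percent(60, 55, 70))  # -> 62.5
```

In practice such weights could themselves be fitted from historical data rather than fixed by hand.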

3.3 Visualization

After analyzing the datasets, visualization of the data is performed (Figs. 2, 3, and 4).

Fig. 2 Matches played in every season



Fig. 3 Decision after winning the toss

Fig. 4 Total matches played and matches wins

3.4 Match Prediction

For prediction, we analyze all the factors affecting the results of the match.
Prediction is made on three sets of data:
Set 1: Training Data: Season 1 to Season 10 IPL data; Testing Data: Season 11.
Set 2: Training Data: Season 1 to Season 11 IPL data; Testing Data: Season 12.
Set 3: Training Data: Season 1 to Season 12 IPL data; Testing Data: Season 13.
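The three sets amount to a rolling split on the season attribute, which can be sketched as follows (toy records carrying only a hypothetical `season` field, where the real rows hold 17 attributes):

```python
def season_split(matches, last_train_season):
    """Train on all seasons up to `last_train_season`, test on the next one."""
    train = [m for m in matches if m["season"] <= last_train_season]
    test = [m for m in matches if m["season"] == last_train_season + 1]
    return train, test

matches = [{"season": s} for s in (1, 5, 10, 11, 12, 13)]
train, test = season_split(matches, 10)   # Set 1: seasons 1-10 vs season 11
print(len(train), len(test))  # -> 3 1
```

Splitting by season (rather than randomly) keeps the evaluation honest: the model only ever predicts matches played after everything it was trained on.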
We use various classification algorithms for predicting the match winner. In
machine learning, classification is a supervised learning method in which the
program learns from the training data and uses this to figure out how to classify
new data. Here, four different classification algorithms are applied, namely SVM,
decision tree, K-nearest neighbour, and random forest.

Decision Tree: A decision tree is a supervised learning algorithm that is utilized
for both regression and classification. It is fundamentally a graph that uses a
tree-based technique to exhibit each conceivable outcome of a decision.
SVM: An SVM is a supervised machine learning algorithm that can be used for
both classification and regression problems. Classification is performed by
finding the hyperplane that best differentiates the two classes.
K-Nearest Neighbour: KNN is a machine learning algorithm used for both
regression and classification. It classifies a data point based on how close it is to
the groups of data points around it.
Random Forest: Random forest is a powerful and versatile supervised machine
learning algorithm that can be used for both regression and classification. It
develops and consolidates multiple decision trees to create a “forest”.

3.5 User Interface Creation

A Web application is created using front-end and back-end frameworks, React
and Django, to make user interaction more efficient. Web templates are
designed to make the output more attractive to the user, and all the ML
algorithms are connected through Django in the back end.

4 Results and Discussion

The IPL dataset, which includes all match details from 2008 to 2020, was
prepared and trained with different machine learning algorithms; the accuracies
achieved by the algorithms are discussed below (Tables 1 and 2). Some of the
predictions made by our model are shown below (Table 3).

5 Conclusion

The T-20 format of cricket carries a lot of randomness as it is the shortest format, and
the whole game can be changed in just one over. Therefore, predicting the winner of
these formats is very challenging and complex. But with the help of ML algorithms,
prediction can be made more efficiently and easily.
In this study, we identified various factors that influence the results of any IPL
match like match venue, toss, and decision after winning the toss.

Table 1 Accuracy of algorithms


Algorithm Train dataset Test dataset Accuracy (%)
Decision tree IPL 2008–2017 IPL 2018 80.48
IPL 2008–2018 IPL 2019 81.95
IPL 2008–2019 IPL 2020 82.66
SVM IPL 2008–2017 IPL 2018 44.39
IPL 2008–2018 IPL 2019 43.29
IPL 2008–2019 IPL 2020 45.13
K-nearest neighbour IPL 2008–2017 IPL 2018 82.58
IPL 2008–2018 IPL 2019 78.41
IPL 2008–2019 IPL 2020 80.13
Random forest IPL 2008–2017 IPL 2018 88.78
IPL 2008–2018 IPL 2019 86.43
IPL 2008–2019 IPL 2020 88.4

Table 2 Average accuracy of algorithms


Algorithm SVM KNN Decision tree Random forest
Average accuracy (%) 44.27 80.37 81.69 87.87

Table 3 Predictions

Match                   Result  Decision tree  SVM   KNN   Random forest
MI versus CSK (2018)    CSK     CSK            CSK   CSK   CSK
KXIP versus DD (2018)   KXIP    KXIP           DD    DD    KXIP
RCB versus KKR (2018)   KKR     KKR            RCB   KKR   KKR
RCB versus CSK (2019)   CSK     CSK            RCB   RCB   CSK
KKR versus SRH (2019)   KKR     KKR            SRH   SRH   KKR
MI versus DC (2019)     DC      MI             MI    MI    DC
MI versus CSK (2020)    CSK     CSK            CSK   MI    CSK
DC versus KXIP (2020)   DC      DC             KXIP  DC    DC
KKR versus RCB (2020)   RCB     RCB            RCB   RCB   RCB
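Each algorithm's agreement with the actual results in Table 3 can be tallied with a short script (team names copied verbatim from the table):

```python
results = ["CSK", "KXIP", "KKR", "CSK", "KKR", "DC", "CSK", "DC", "RCB"]
predictions = {
    "Decision tree": ["CSK", "KXIP", "KKR", "CSK", "KKR", "MI", "CSK", "DC", "RCB"],
    "SVM":           ["CSK", "DD", "RCB", "RCB", "SRH", "MI", "CSK", "KXIP", "RCB"],
    "KNN":           ["CSK", "DD", "KKR", "RCB", "SRH", "MI", "MI", "DC", "RCB"],
    "Random forest": ["CSK", "KXIP", "KKR", "CSK", "KKR", "DC", "CSK", "DC", "RCB"],
}
# Count how many of the nine listed matches each algorithm got right.
for name, preds in predictions.items():
    correct = sum(p == r for p, r in zip(preds, results))
    print(f"{name}: {correct}/{len(results)} matches predicted correctly")
```

Random forest agrees on all nine listed matches and SVM on only three, consistent with the accuracy ordering in Table 2.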

Different machine learning (classification) algorithms were trained on the training


datasets for this work. The algorithms used in our work to find the final evaluation
are K-nearest neighbour, SVM, decision tree, and random forest.
Amongst these techniques, random forest provided the highest accuracy of
87.87%.
For the future, we plan to expand our work on International Matches as well, which
includes International T-20 Matches, ODI Matches, and Test Matches by extending

the dataset. We also plan to make our model more accurate by using more attributes
like player’s performance, etc.

References

1. Banasode P, Patil M, Verma S (2021) Analysis and predicting results of IPL T20 matches. IOP
Conf Ser Mater Sci Eng 1065:012040
2. Srikantaiah KC, Khetan A, Kumar B, Tolani D, Patel H (2021) Prediction of IPL match outcome
using machine learning techniques. In: Proceedings of the 3rd international conference on
integrated intelligent computing communication & security (ICIIC). Atlantis highlights in
computer sciences, vol 4
3. Sudhamathy G, Raja Meenakshi G (2020) Prediction on IPL data using machine learning
techniques in R package. ICTACT J Soft Comput 11(01)
4. Bhutada S, Team (2020) IPL match prediction using machine learning. Int J Adv Sci Technol
29(5):3438–3448
5. Amala Kaviya VS, Mishra AS, Valarmathi B (2020) Comprehensive data analysis and
prediction on IPL using machine learning algorithms. Int J Emerg Technol 11(3):218–228
6. Sai Abhishek Ch, Patil KV, Yuktha P, Meghana KS, Sudhamani MV (2019) Predictive analysis
of IPL match winner using machine learning techniques. Int J Innov Technol Explor Eng
(IJITEE) 9(2S). ISSN: 2278-3075
7. Vistro DM, Rasheed F, David LG (2019) The cricket winner prediction with application of
machine learning and data analytics. Int J Sci Technol Res 8(09)
8. Lamsal R, Choudhary A (2018) Predicting outcome of Indian premier league (IPL) matches
using machine learning
9. Sankaranarayanan, Sattar J (2014) Auto-play: a data mining approach to ODI cricket simulation
and prediction. In: Proceedings of SIAM conference on data mining, pp 1–7
10. Lemmer H (2004) A measure for the batting performance of cricket players. S Afr J Res Sport
Phys Educ Recreation 26:55–64
11. Kimber AC, Hansford AR (1993) A statistical analysis of batting in cricket. J R Stat Soc
156:443–455
12. Rupai AAA, Mukta M, Islam AKMN (2020) Predicting bowling performance in cricket from
publicly available data. In: International conference on computing advancements, pp 1–6
13. Passfield L, Hopker JG (2017) A mine of information: can sports analytics provide wisdom
from your data? Int J Sports Physiol Perform 12(7):851–855
14. Gupta S, Jain H, Gupta A, Soni H (2017) Fantasy league team prediction. Int J Res Sci Eng
6(3):97–103
15. Deep Prakash Dayalbagh C, Patvardhan C, Vasantha Lakshmi C (2016) Data analytics based
deep mayo predictor for IPL-9. Int J Comput Appl 152(6):6–11
16. Kampakis S, Thomas W (2015) Using machine learning to predict the outcome of English
county twenty over cricket matches. arXiv preprint arXiv:1511.05837
17. Hajgude J, Parameshwaran A, Nambi K, Sakhalkar A, Sanghvi D (2015) IPL dream team-
A prediction software based on data mining and statistical analysis. Int J Comput Eng Appl
9(4):113–119
18. Freitas AA (2014) Comprehensible classification models—a position paper. SIGKDD Explor
15(1)
19. Halvorsen P, Sægrov S, Mortensen A, Eichhorn A, Stenhaug M, Dahl S, Stensland HK, Gaddam
VR, Griwodz C et al (2013) Bagadus: an integrated system for arena sports analytics: a soccer
case study. In: Proceedings of the 4th ACM multimedia system conference. ACM, pp 48–59
20. Saikia H, Bhattacharjee D (2011) A Bayesian classification model for predicting the
performance of all-rounders in the Indian premier league. Vikalpa 36(4):51–66

21. Lewis A (2008) Extending the range of player performance measures in one-day cricket. J
Oper Res Soc 59:729–742
22. Bandulasiri A (2008) Predicting the winner in one day international cricket. J Math Sci Math
Educ 3(1):6–17
23. Saikia H, Bhattacharjee D, Bhattacharjee A (2003) Performance based market valuation of
cricketers in IPL. Sport Bus Manage Int J 3(2):127–146
24. Ho TK (1995) Random decision forests. In: Proceedings of 3rd international conference on
document analysis and recognition, vol 1. IEEE, pp 278–282
25. https://www.rediff.com/cricket
26. https://www.iplt20.com
Application of ANN Combined with Machine Learning for Early Recognition of Parkinson’s Disease

Bharathi Uppalapati, S. Srinivasa Rao, and P. Srinivasa Rao

Abstract Parkinson’s disease (PD) is a progressive neurological illness brought
on by reduced dopamine production. Dopamine acts as a transmitter for neuron
communication in the brain; when a sufficient amount of dopamine is not
released, the brain starts to develop movement-related problems (motor
symptoms), which lead to PD. The progression of PD in people moves from
non-motor symptoms to motor symptoms, so many researchers are trying to
detect Parkinson’s disease at a preliminary stage in order to halt the
advancement of the disorder. A hybrid neuronal fuzzy classifier is proposed for
early detection of PD by associating Artificial Neural Networks (ANN) with
Random Forest (RF). Different classification algorithms such as K-Nearest
Neighbor (KNN), Gradient Boost (GB), Support Vector Machine (SVM), Decision
Tree (DT), and Random Forest (RF) are employed for performance analysis. For
testing, the UCI Machine Learning Repository’s Parkinson’s Disease
Classification Data Set was employed. The results showcase that the proposed
neuronal fuzzy classifier outperformed other competing traditional classifiers with
a 96.23% overall accuracy.

Keywords Machine learning · Parkinson’s disease · Artificial neural networks ·
Random forest · Feature selection · Classification · Computer-aided detection ·
Speech features

1 Introduction

Parkinson’s disease is a neurological condition that impairs the functioning of the
central nervous system and brings about significant alterations in physical
mobility, posture, cognitive abilities, and neuro reflexes [1, 2]. PD is surfacing as
a common problem with advancing age, and 10% of patients are diagnosed before
B. Uppalapati (B) · S. Srinivasa Rao · P. Srinivasa Rao
Department of CSE, MVGR College of Engineering (A), Vizianagaram, Andhra Pradesh, India
e-mail: jayauppalapati.123@gmail.com
S. Srinivasa Rao
e-mail: siringisrao@mvgrce.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 39
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_4

the age of 50 [3]. Men are 1.5 times more likely than women to be affected by
Parkinson’s disease. Parkinson’s disease symptoms are split into two types:
motor and non-motor. Postural difficulty, gait freezing, tremor, and tiredness are
all motor signs; Rapid Eye Movement (REM) sleep behavior disorder, cognitive
impairment, and mental problems are examples [4, 5] of non-motor symptoms
[6, 7]. The actual cause of this disease is still unknown, but some elements, such
as genetic inheritance and environmental factors, are more likely to cause PD.
Researchers have observed gene mutations in only about 10–15% of patients.
Parkinson's disease symptoms are mostly unnoticeable at the beginning, but they
gradually become severe over time. As the severity increases, people experience
major changes such as difficulty in body movement, difficulty speaking, depression,
and memory problems [8–10]. Parkinson's disease mainly involves the degeneration
of dopaminergic neurons, which produce less dopamine and thereby weaken
neurotransmission. This causes shakiness, muscle stiffness, and movement problems.
The breakdown or death of neurons in the basal ganglia, the area of the brain
that controls body movement, causes Parkinson's disease [11]. PD patients also suffer
from loss of the hormone norepinephrine, which acts as a messenger between nerve
endings and controls non-motor features such as heart rate and blood pressure. The
diagnosis of Parkinson's disease is difficult since its early symptoms are also observed
in various other health issues [12–14]. CT and MRI scans are used to detect disorders
that cause similar symptoms. Although there is no cure for Parkinson's disease,
medicines and other therapies can help people manage their symptoms, and exercise
helps significantly in controlling them [15–17].
Deep learning is a branch of Machine Learning (the ability of a machine to learn
from large amounts of data instead of a fixed set of instructions) that allows us to
train machines to predict an output for a given set of inputs. It can learn from both
structured and unstructured data. The biggest advantage of deep learning is that, with
continuous training, the architecture becomes adaptive and can handle complex
problems. It works like a human brain, managing data and mapping patterns for
future reference. The larger the dataset, the more efficient the decision-making [18].
One of the hardships of deep learning is the cost of computational power: the larger
the dataset, the more computation is required. Deep models also lack transparency,
which makes fault revision difficult [19, 20].
Deep learning is based on neural networks, which are layers of nodes that function
similarly to neurons in the human brain. Nodes in each layer are linked to those of
contiguous layers; the more layers a network has, the more complex it is considered
[21–23]. Signals move between nodes in an artificial neural network, and weights
are applied to them. A node with a higher weight has a greater impact on the nodes
in the next layer. The last layer assembles the weighted inputs into a result [24].
Because deep learning systems handle a vast quantity of data and perform many
difficult mathematical computations, they demand strong hardware. The real-world
Application of ANN Combined with Machine Learning … 41

applications of deep learning are virtual assistants (Siri and Alexa), language transla-
tion applications, chatbots used in banking and health applications, and facial recog-
nition methods to recognize a person in pictures [25, 26]. The contributions of this
paper are:
1. To distinguish between healthy and PD patients, a Parkinson’s disease classifying
system is being developed.
2. Design of a hybrid classifier by associating deep learning ANN with a Machine
Learning classification algorithm.
The remaining portion of the paper is structured as follows: Sect. 2 goes over the
literature work, Sect. 3 deals with the methodology, Sect. 4 talks about the experi-
mental setup, Sect. 5 reviews the performance analysis and experimentation results,
and Sect. 6 discusses the conclusion and future work.

2 Literature Work

At present, few medical diagnosis methods are available for Parkinson's disease,
which has led many researchers to look for effective solutions for detecting PD at an
early phase. Mostafa et al. [27] developed a method using multiple feature evaluations
and Machine Learning classification approaches, based on the study of voice
problems, to enhance the diagnosis of PD. A multi-feature evaluation approach and
Machine Learning classifiers are used to determine the best solution to the problem.
The results revealed that Naive Bayes and Random Forest both improved the
accuracy and speed of PD detection. Sivaranjini and Sujatha [28] designed an
approach for classifying MR images of Parkinson's disease patients and healthy
participants using Deep Learning Neural Networks. For data categorization, the
AlexNet architecture of the Convolutional Neural Network is used. Transfer Learning
techniques were used to train on the MR images, which were then tested for accuracy.
The proposed approach obtained an accuracy of 88.9%.
Senturk [29] suggested a method for identifying Parkinson's disease using
Machine Learning classifiers and feature selection approaches. The Recursive
Feature Elimination approach and the Feature Importance method were used in the
feature selection process. The Support Vector Machine classifier with the Recursive
Feature Elimination method obtained 93.85% accuracy for detecting PD. Celik
and Omurca [30] experimented by analyzing voice datasets of PD patients and healthy
control subjects. They applied Principal Component Analysis (PCA) and Information
Gain (IG) techniques to analyze the extracted features, and Machine Learning
classification methods were implemented for better prediction of PD. Pahuja and
Nagabhushan [31] experimented by processing speech datasets and implemented
Machine Learning classifiers to find the most efficient and accurate classifier for PD
classification. Levenberg-Marquardt with Artificial Neural Networks was found to be
the most accurate, with a 95.89% measure for Parkinson's disease. A comparison
of related works is listed in Table 1.

Table 1 Related work comparison


Paper, Year Dataset used Classifier Accuracy (%)
[32], 2018 Telemonitoring voice dataset Deep neural network 81.66
[33], 2018 Parkinson’s voice dataset Multiple ANNs 86.47
[28], 2020 MR image dataset Alexnet 88.90
[29], 2020 Parkinson’s speech dataset SVM 93.84
[34], 2019 Parkinson’s speech dataset SMOTE and RF 94.89
[35], 2020 Non-motor features Gaussian Kernel-based Nu-SVM 95.00
[7], 2019 Parkinson’s voice dataset XGBoost 95.39
[31], 2021 Voice dataset ANN 95.89

Wang et al. [36] suggested a deep learning model and compared it with twelve
Machine Learning models to find the most accurate model for early prediction
of PD based on premotor indicators. The proposed deep learning model
achieved a superior accuracy of 96.45%. Byeon [35] developed a depression model
for early diagnosis of PD by implementing eight Support Vector Machine models,
where two types of SVM were combined with four algorithms. The results
indicated that the Nu-SVM model with a Gaussian kernel achieved the highest
accuracy, 95.0%. Grover et al. [32] developed a Deep Neural Network model for
predicting the severity of PD by analyzing a speech dataset of PD patients using the
TensorFlow library. The proposed model achieved accuracies of 83.36% and 81.66%
on the training and testing data, respectively.
Lahmiri and Shmuel [37] presented a Machine Learning model to identify
Parkinson’s disease according to the voice patterns. This model focuses on eval-
uating the effectiveness of eight alternative pattern ranking approaches when used
in conjunction with the Nonlinear SVM to distinguish between PD patients and
healthy subjects. Shahid and Singh [38] developed a Deep Neural Network model
by analyzing a speech dataset for the prediction of PD. Principal Component
Analysis is implemented to reduce the input feature space, and the Unified Parkinson's
Disease Rating Scale (UPDRS) score is used to assess Parkinson's disease in
the proposed model. Nilashi et al. [39] designed a method to predict UPDRS metrics
from voice signals, devising a hybrid technique for predicting the Total UPDRS and
Motor UPDRS clinical scales of Parkinson's disease. The findings showed that the
suggested method is effective in forecasting PD development by lowering calculation
time and enhancing efficiency.
As per the literature review, many researchers have addressed the problem of detecting
Parkinson's disease using state-of-the-art and advanced Machine Learning techniques
[40]. This paper focuses on designing and developing a hybrid classifier by combining
Artificial Neural Network (ANN) with ML classifier to detect PD using speech
features.

3 Methodology

To detect Parkinson's disease, a neurologist may suggest a Single-Photon
Emission Computerized Tomography (SPECT) scan called a Dopamine Transporter
scan (DaTscan) [41]. Although this helps to strengthen the suspicion of Parkinson's
disease, the symptoms and the neurologic and physical examinations eventually lead
to the proper diagnosis. Early detection helps patients receive effective treatment and
slows down the progression of the disease by providing symptomatic relief. To detect
Parkinson's disease at an early stage, an automated PD classification model using a
neuronal fuzzy classifier is explored in this paper.
The architecture of the proposed neuronal fuzzy inference classification system
is presented in Fig. 1. The dataset is split into 80% training and 20% testing data.
As a preprocessing step, the Correlation Coefficient is employed to eliminate irrelevant
features in the data [42]. The filtered data is passed on to the proposed neuronal fuzzy
classifier for training the model. The trained model is then tested for its performance
in classifying healthy and PD patients using the testing data.
The proposed neuronal fuzzy classifier is built up of five layers, as shown in Fig. 2.
The hybrid architecture is stacked with an input layer of size 139 × 1 and four hidden
layers of M_nodes nodes each, with ReLU as the activation function. All the hidden
layers are dense and fully connected. The features extracted by the hidden layers are
passed on to the final layer for classification, which is a Random Forest classifier.

M_nodes = (N_features × 2) / 3 + 2                    (1)

The neurons in each hidden layer are fired upon satisfying the ReLU activation
function, where the input to the activation function is calculated as follows:

Fig. 1 System architecture



Fig. 2 Neuronal fuzzy classifier network model

ReLU: f(h_θ(x)) = h_θ(x)⁺ = max(0, h_θ(x))                    (2)

where h_θ(x) is obtained using Eq. (3):

h_θ(x) = Σ_{i=1}^{n} w_i x_i + bias = w_1x_1 + w_2x_2 + … + w_nx_n + bias                    (3)

where w_i is the weight of the ith neuron, x_i signifies the input at neuron i, and the
bias ensures that the decision boundaries do not pass through the origin.
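A minimal sketch of the forward pass defined by Eqs. (1)–(3), with a Random Forest as the final classification layer. The weights here are random (the training procedure is omitted), the data are synthetic, and flooring the non-integer result of Eq. (1) is an assumption, since the paper does not state a rounding rule.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def m_nodes(n_features):
    """Eq. (1): M_nodes = (N_features * 2) / 3 + 2, floored (assumption)."""
    return n_features * 2 // 3 + 2

def relu(z):
    """Eq. (2): f(z) = max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def hidden_forward(X, weights, biases):
    """Eq. (3) per layer: h_theta(x) = sum_i w_i x_i + bias, then ReLU.
    Returns the features fed to the final Random Forest layer."""
    h = X
    for W, b in zip(weights, biases):
        h = relu(h @ W + b)
    return h

n_in, M = 139, m_nodes(139)               # 139 inputs -> 94 nodes per hidden layer
rng = np.random.default_rng(1)
sizes = [n_in, M, M, M, M]                # input plus four hidden layers
weights = [rng.normal(0, 0.05, (a, b)) for a, b in zip(sizes, sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

X = rng.normal(size=(200, n_in))          # synthetic stand-in for speech features
y = rng.integers(0, 2, size=200)

features = hidden_forward(X, weights, biases)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(features, y)
pred = rf.predict(features)
```

In the actual model the hidden-layer weights would be learned by backpropagation before their activations are handed to the Random Forest.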

4 Experimental Setup

The entire experiment was carried out on a (64-bit) Windows 10 Operating System
with an Intel Core i7 processor running at 2.20 GHz, 16GB of RAM, and a 2TB
hard drive. The platform is set up with Anaconda, supported by Machine Learning
and deep learning packages, with Python as the programming interface.
The UCI Machine Learning Repository provided the dataset “Parkinson’s Disease
Classification” [43]. The dataset consists of 754 characteristics and 756 samples from
188 individuals, with 564 samples from Parkinson’s disease patients and 192 samples
from healthy controls. The dataset description is shown in Table 2.

Table 2 Description of dataset

Dataset                                     Features   Samples                                       Size
Parkinson's disease classification dataset  754        756 (564 PD patients, 192 healthy controls)   5.18 MB

5 Performance Analysis and Experimentation Results

The proposed neuronal fuzzy classifier is assessed for its performance using
different evaluation metrics: precision, MSE, recall, RMSE, F1-Score, and accuracy,
defined as follows:

Precision = T_positive / (F_positive + T_positive)                    (4)

MSE = (1/n) Σ_{i=1}^{n} (Y_i − Ŷ_i)²                    (5)

Recall = T_positive / (F_negative + T_positive)                    (6)

RMSE = √[ (1/n) Σ_{i=1}^{n} (Y_i − Ŷ_i)² ]                    (7)

F1-Score = (2 × Precision × Recall) / (Precision + Recall)                    (8)

Accuracy = (T_positive + T_negative) / (T_positive + F_positive + T_negative + F_negative)                    (9)

where T_positive refers to the samples correctly classified as PD patients, F_positive
represents the samples incorrectly classified as PD patients, T_negative refers to the
samples correctly classified as healthy, F_negative denotes the samples incorrectly
classified as healthy, Y_i is the actual output, Ŷ_i is the predicted output of the
classifier, and n is the total number of samples. The evaluation metrics of the proposed
Neuronal Fuzzy Inference Classifier are shown in Fig. 3.
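Eqs. (4)–(9) translate directly into code; the confusion-matrix counts and prediction vectors below are illustrative, not the paper's results.

```python
import numpy as np

def classification_metrics(tp, fp, tn, fn):
    """Eqs. (4), (6), (8), (9) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return precision, recall, f1, accuracy

def regression_errors(y_true, y_pred):
    """Eqs. (5) and (7): mean squared error and its square root."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    mse = np.mean(diff ** 2)
    return mse, np.sqrt(mse)

# Illustrative counts: 90 true positives, 5 false positives, 95 true
# negatives, 10 false negatives
p, r, f1, acc = classification_metrics(tp=90, fp=5, tn=95, fn=10)
mse, rmse = regression_errors([1, 0, 1, 1], [1, 0, 0, 1])
```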
The related work comparison derived from the accuracies of PD classification is
presented in Table 3. The model accuracy during the training and testing phases is
shown in Fig. 4.

6 Conclusion and Future Work

Parkinson's disease is the second most prevalent neurodegenerative illness and
movement disorder associated with aging. Reduced production or loss of the
neurotransmitter dopamine is the underlying cause of PD. Neurologists face a
difficult challenge in diagnosing Parkinson's

Fig. 3 Performance metrics for the proposed neuronal fuzzy classifier (bar chart:
Precision 98.65, Recall 95.36, F-Score 94.65, Accuracy 96.23, MSE 14.24, RMSE 2.02)

Table 3 Accuracy comparison for various classifiers

Paper, Year   Dataset used                  Classifier                  Accuracy (%)
[32], 2018    Telemonitoring voice dataset  Deep Neural Network         81.66
[28], 2020    MR image dataset              AlexNet                     88.90
[34], 2019    Parkinson's speech dataset    SMOTE and RF                94.89
[7], 2019     Parkinson's voice dataset     XGBoost                     95.39
[31], 2021    Voice dataset                 ANN                         95.89
This paper    Parkinson's speech dataset    Neuronal fuzzy classifier   96.23

Fig. 4 Model accuracy



disease (PD) at an early stage before the condition progresses. This research tests an
automated end-to-end classification approach for early identification of Parkinson’s
disease. The Voice-based Parkinson’s disease Classification dataset is used in this
approach. A hybrid neuronal fuzzy classifier is developed by combining an Artificial
Neural Network and a Random Forest classifier to classify the healthy and PD
patient classes. The proposed neuronal fuzzy classifier is compared with GB, SVM,
KNN, DT, and RF for performance analysis. Among all the compared
classification algorithms, the hybrid neuronal fuzzy classifier achieved a superior
accuracy of 96.23% in classifying healthy and PD patients. The design and
implementation of different deep learning architectures and the study of large-volume
datasets for improved classification performance in detecting PD patients
are considered future work.

References

1. Gokul S, Sivachitra M, Vijayachitra S (2013) Parkinson’s disease prediction using machine


learning approaches. In: 2013 fifth international conference on advanced computing (ICoAC),
pp 246–252. https://doi.org/10.1109/ICoAC.2013.6921958
2. Krishna Prasad MHM, Thammi Reddy K (2015) An efficient semantic ranked keyword search
of big data using map reduce. IJDTA 8(6):47–56
3. Gao C, Sun H, Wang T et al (2018) Model-based and model-free machine learning techniques
for diagnostic prediction and classification of clinical outcomes in Parkinson’s disease. Sci Rep
8:7129. https://doi.org/10.1038/s41598-018-24783-4
4. Vidya Sagar Appaji S, Srinivasa Rao P (2018) A novel scheme for red eye removal with image
matching. J Adv Res Dyn Control Syst 10(13)
5. Muppdi S, Rama Krishna Murthy M (2019) Identification of natural disaster affected area
using twitter. In: 2nd international conference on cyber security, image processing, graphics,
mobility and analytics, NCCSIGMA-2019. Advances in decision sciences, image processing,
security and computer vision. Springer Nature, pp 792–801
6. Prashanth R, Dutta Roy S, Mandal PK, Ghosh S (2016) High-accuracy detection of early
Parkinson’s disease through multimodal features and machine learning. Int J Med Inform
90:13–21. https://doi.org/10.1016/j.ijmedinf.2016.03.001
7. Nissar I, Rizvi D, Masood S, Mir A (2018) Voice-based detection of Parkinson’s disease through
ensemble machine learning approach: a performance study. EAI Endorsed Trans Pervasive
Health Technol 5:162806. https://doi.org/10.4108/eai.13-7-2018.162806
8. Campbell MC, Myers PS, Weigand AJ, Foster ER, Cairns NJ, Jackson JJ, Lessov-Schlaggar CN,
Perlmutter JS (2020) Parkinson disease clinical subtypes: key features and clinical milestones.
Ann Clin Transl Neurol 7(8):1272–1283. https://doi.org/10.1002/acn3.51102
9. Madhusudhana Rao TV, Srinivas Y (2017) A secure framework for cloud using map reduce. J
Adv Res Dyn Control Syst (IJARDCS) 9(Sp-14):1850–1861. ISSN: 1943-023x
10. Mostafa SA, Mustapha A, Khaleefah SH, Ahmad MS, Mohammed MA (2018) Evaluating the
performance of three classification methods in diagnosis of Parkinson’s disease. In: Ghazali
R, Deris M, Nawi N, Abawajy J (eds) Recent advances on soft computing and data mining.
SCDM 2018. Advances in intelligent systems and computing, vol 700. Springer, Cham. https://
doi.org/10.1007/978-3-319-72550-5_5
11. Selvaraj S, Piramanayagam S (2019) Impact of gene mutation in the development of Parkinson’s
disease. Genes Dis 6(2):120–128. https://doi.org/10.1016/j.gendis.2019.01.004
12. Srinivasa Rao P, Sushma Rani N (2017) An efficient statistical computation technique for health
care big data using R. IOP Conf Ser Mater Sci Eng 225:012159. ISSN: 1757-8981

13. Vidya Sagar Appaji S, Lakshmi PV (2020) Maximizing joint probability in visual question
answering models. Int J Adv Sci Technol 29(3):3914–3923
14. Madhusudhana Rao TV, Latha Kalyampudi PS (2020) Iridology based vital organs malfunc-
tioning identification using machine learning techniques. Int J Adv Sci Technol 29(5):5544–
5554
15. Delaville C, Deurwaerdère PD, Benazzouz A (2011) Noradrenaline and Parkinson’s disease.
Front Syst Neurosci 5:31. https://doi.org/10.3389/fnsys.2011.00031
16. Bhat S, Rajendra Acharya U, Hagiwara Y, Dadmehr N, Adeli H (2018) Parkinson’s disease:
cause factors, measurable indicators, and early diagnosis. Comput Biol Med 102
17. Srinivasa Rao P, Krishna Prasad PESN (2017) A secure and efficient temporal features based
framework for cloud using MapReduce. In: 17th international conference on intelligent systems
design and applications (ISDA 2017), vol 736. Springer, pp 114–123. ISSN 2194-5357 Held
in Delhi, India, December 14–16, 2017
18. Lauzon FQ (2012) An introduction to deep learning. In: 2012 11th international conference on
information science, signal processing and their applications (ISSPA), pp 1438–1439. https://
doi.org/10.1109/ISSPA.2012.6310529
19. Vásquez-Correa JC, Arias-Vergara T, Orozco-Arroyave JR, Eskofier B, Klucken J, Nöth E
(2019) Multimodal assessment of Parkinson’s disease: a deep learning approach. IEEE J
Biomed Health Inform 23(4):1618–1630. https://doi.org/10.1109/JBHI.2018.2866873
20. Krishna Prasad MHM, Thammi Reddy K (2014) A efficient data integration framework in
Hadoop using MapReduce. Published in Computational Intelligence Techniques for Compar-
ative Genomics, Springer Briefs in Applied Sciences and Technology, pp 129–137. ISSN:
2191-530X
21. Wodzinski M, Skalski A, Hemmerling D, Orozco-Arroyave JR, Nöth E (2019) Deep learning
approach to Parkinson’s disease detection using voice recordings and convolutional neural
network dedicated to image classification. In: 2019 41st annual international conference of the
IEEE engineering in medicine and biology society (EMBC), pp 717–720. https://doi.org/10.
1109/EMBC.2019.8856972
22. Kaur S, Aggarwal H, Rani R (2020) Hyper-parameter optimization of deep learning model for
prediction of Parkinson’s disease. Mach Vis Appl 31. https://doi.org/10.1007/s00138-020-010
78-1
23. Vadaparthi N, Srinivas Y (2014) A novel clustering approach using Hadoop distributed
environment. In: Applied science and technology, vol 9. Springer, pp 113–119. ISSN:
2191-530X
24. Walczak S (2018) Artificial neural networks. In: Mehdi Khosrow-Pour DBA (ed) Encyclopedia
of information science and technology, 4th edn. IGI Global, pp 120–131. https://doi.org/10.
4018/978-1-5225-2255-3.ch011
25. Wingate J, Kollia I, Bidaut L, Kollias S (2019) A unified deep learning approach for prediction
of Parkinson’s disease
26. Maram B, Gopisetty GKD (2019) A framework for data security using cryptography and image
steganography. Int J Innov Technol Explor Eng (IJITEE) 8(11). ISSN: 2278-3075
27. Mostafa SA, Mustapha A, Mohammed MA, Hamed RI, Arunkumar N, Ghani MKA, Jaber
MM, Khaleefah SH (2019) Examining multiple feature evaluation and classification methods
for improving the diagnosis of Parkinson’s disease. Cogn Syst Res 54. ISSN 1389-0417
28. Sivaranjini S, Sujatha CM (2020) Deep learning-based diagnosis of Parkinson’s disease using
convolutional neural network. Multimedia Tools Appl
29. Senturk ZK (2020) Early diagnosis of Parkinson’s disease using machine learning algorithms.
Med Hypotheses 138
30. Celik E, Omurca SI (2019) Improving Parkinson’s disease diagnosis with machine learning
methods. In: Scientific meeting on electrical-electronics & biomedical engineering and
computer science (EBBT)
31. Pahuja G, Nagabhushan TN (2021) A comparative study of existing machine learning
approaches for Parkinson’s disease detection. IETE J Res

32. Grover S, Bhartia S, Akshama, Yadav A, Seeja KR (2018) Predicting severity of Parkinson’s
disease using deep learning. Proc Comput Sci 132
33. Berus L, Klancnik S, Brezocnik M, Ficko M (2019) Classifying Parkinson’s disease based on
acoustic measures using artificial neural networks. Sensors 19:16. https://doi.org/10.3390/s19
010016
34. Polat K (2019) A hybrid approach to Parkinson disease classification using speech signal:
the combination of SMOTE and random forests. In: 2019 scientific meeting on electrical-
electronics & biomedical engineering and computer science (EBBT), pp 1–3. https://doi.org/
10.1109/EBBT.2019.8741725
35. Haewon B (2020) Development of a depression in Parkinson’s disease prediction model using
machine learning. World J Psychiatry 10:19
36. Wang W, Lee J, Harrou F, Sun Y (2020) Early detection of Parkinson’s disease using deep
learning and machine learning. IEEE Access 8
37. Lahmiri S, Shmuel A (2019) Detection of Parkinson’s disease based on voice patterns ranking
and optimized support vector machine. Biomed Signal Process Control 49
38. Shahid AH, Singh MP (2020) A deep learning approach for prediction of Parkinson’s disease
progression. Biomed Eng Lett
39. Nilashi M, Ibrahim O, Samad S, Ahmadi H, Shahmoradi L, Akbari E (2019) An analyt-
ical method for measuring the Parkinson’s disease progression: a case on a Parkinson’s
telemonitoring dataset. Measurement 136
40. Bheemavarapu P, Latha Kalyampudi PS, Madhusudhana Rao TV (2020) An efficient method
for coronavirus detection through X-rays using deep neural network. J Curr Med Imag [online
Available with ISSN: 1875-6603]
41. Hustad E, Aasly JO (2020) Clinical and imaging markers of prodromal Parkinson’s disease.
Front Neurol 11. https://doi.org/10.3389/fneur.2020.00395
42. Latha Kalyampudi PS, Swapna D (2019) An efficient digit recognition system with an
improved preprocessing technique. In: ICICCT 2019—system reliability, quality control,
safety, maintenance and management. Springer Nature Singapore, pp 312–321
43. https://archive.ics.uci.edu/ml/datasets/Parkinson%27s+Disease+Classification
People Count from Surveillance Video
Using Convolution Neural Net

L. Lakshmi , A. Naga Kalyani , G. Naga Satish ,


and R. S. Murali Nath

Abstract People counting is used to count the number of people in a picture.
People counting is not an easy task if done manually, because we can lose count in
the middle of this laborious task, especially when dealing with objects that intersect
with each other or with a dense crowd. This project automates the counting process
by building a machine learning system that converts a video into frames and outputs
the number of objects in each frame. We built the model using the convolutional
neural network (CNN) technique. The system we built is capable of counting
pedestrians in a mall. The frames/images are generated from a CCTV camera placed
somewhere in the mall. From those frames/images, the system outputs how many
pedestrians are at that particular place in the mall. VGG16 is used to extract the
features of the image, and the structural similarity index (SSIM) is used to measure
the similarity between the given images. The similarity measure is then used in a
loss function combining Euclidean and local pattern consistency loss. The
experimental results show the predicted number of people and the exact number of
people in the image, with 90% accuracy using the convolutional neural net.

Keywords VGG16 · Convolution neural net · Structure similarity index · Frames ·


Surveillance video

L. Lakshmi (B) · A. Naga Kalyani · G. Naga Satish · R. S. Murali Nath


BVRIT HYDERABAD College of Engineering for Women, Hyderabad, India
e-mail: laxmi.slv@gmail.com
A. Naga Kalyani
e-mail: Nkalyani.a@gmail.com
G. Naga Satish
e-mail: gantinagasatish@gmail.com
R. S. Murali Nath
e-mail: muralinath.r.s@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 51
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_5
52 L. Lakshmi et al.

1 Introduction

Counting how many objects are in an image is essential for various industries and
researchers. Some of the use cases are (1) monitoring high-traffic roads or public
places, (2) preventing people from entering forbidden or dangerous places, and (3)
giving information about the favorite spots where a high number of people gather.
This shows the usefulness of crowd counting in real life. People counting [1] is not
an easy task if done manually, because we can lose count in the middle of this
laborious task, especially when dealing with objects that intersect with each other
or with a dense crowd.
To solve this problem, we can automate [2] the counting process by building a
machine learning system that receives an image containing the objects we want to
count as input; the model then outputs the number of objects in that image. The
system we build is capable of counting pedestrians in a mall. The images [3] are
generated from a CCTV camera placed somewhere in the mall. From those images,
the system tells us how many pedestrians are at that particular place in the mall. We
built the model using the convolutional neural network (CNN) technique.
The objective of the project is to count [4] the total number of people per frame
obtained from a video, i.e., CCTV footage. This will help to control the people
entering heavy-traffic areas and to monitor high-traffic roads or public places. There
are many other use cases [5] of crowd counting that are not mentioned here, which
shows its usefulness in real life. The model takes a video and converts it into
frames/images. The images are used to train a machine learning model that predicts
[6] the number of individuals in every frame/image.

2 Literature Review

Counting the number of people in surveillance applications is important in present
scenarios. A novel method for counting people in videos [7] uses SURF features
and an SVR regressor to estimate the number of people; the PETS 2009 database
was used for training. Object detection with grid size estimation [8], used for
counting people, shows good performance in predicting the count; preprocessing
of foreground objects was performed to eliminate small objects and noise in the
video.
A real-time people counting system with one video camera provides strong
estimation of the background scene by concentrating on grouped and ungrouped areas.
Numerous people at a single location can be under-counted due to shadows and
occlusions, so background [9] subtraction is performed using automatic thresholding
and adaptive models, after which the segmentation results are post-processed to
remove shadows. An adaptive Kalman filter is used for robust estimation of people
features even under heavy occlusion conditions [10].
People Count from Surveillance Video Using Convolution Neural Net 53

There are numerous algorithms for people counting, as the applications [11] range
from handling emergency situations in high-rise buildings to environmental
monitoring. A comparative study has been performed on various algorithms used for
people counting in surveillance videos, including gradient-based methods, frame
differencing, and circular frame transformations.
There are a huge number of real-life scenarios in which we need to detect and
count the number of people in surveillance videos. Even though people counting and
detection systems are available, there are still challenges in accurately predicting the
number of people in real-time scenarios. One line of work segments groups of people
[12] into individuals and tracks them over a period of time. Our literature survey
covered various techniques used for counting the number of people in a surveillance
video over a period of time; these works have used different regression and
classification models. In our proposed model, we use convolutional neural networks,
which, with the progression of deep learning, have shown tremendous performance
in various applications.

3 Dataset

The dataset used in the model, shown in Fig. 1, is a set of images generated from
a single CCTV camera placed at one spot in a mall, capturing pedestrians who walk
past the camera. Each image contains a different number of persons. The images are
generated from the video at a given time rate, i.e., frame rate. The video is 100 s
long and frames are sampled every 0.1 s, i.e., 1000 images/frames are generated in
the .jpg format from the video. These 1000 images extracted from the video are used
as the dataset.
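The sampling of one frame every 0.1 s can be sketched as below. The index arithmetic is separated from the I/O; the OpenCV calls are a common way to do the extraction but are an assumption, since the text does not name the tooling used.

```python
import os

def frame_indices(duration_s, interval_s, fps):
    """Frame numbers to save when sampling one frame every `interval_s`
    seconds from a `duration_s`-second clip recorded at `fps`."""
    n = round(duration_s / interval_s)        # 100 s / 0.1 s -> 1000 frames
    return [round(i * interval_s * fps) for i in range(n)]

def extract_frames(video_path, out_dir, interval_s=0.1):
    """Save the sampled frames as .jpg files (the OpenCV import is deferred
    so the index arithmetic stays testable without it)."""
    import cv2                                 # requires opencv-python
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    duration = cap.get(cv2.CAP_PROP_FRAME_COUNT) / fps
    os.makedirs(out_dir, exist_ok=True)
    for k, idx in enumerate(frame_indices(duration, interval_s, fps)):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            cv2.imwrite(os.path.join(out_dir, f"frame_{k:04d}.jpg"), frame)
    cap.release()
```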

4 Proposed Methodology

The architecture of the proposed model is shown in Fig. 2, where 'CONV a' denotes
convolution with kernel size a × a, and 'CONV T' denotes a transposed convolutional
layer. The structure of VGG16, shown in Fig. 3, consists of five convolution blocks
with the ReLU activation function; a batch normalization layer is added inside each
block to reduce the internal covariate shift of the model, which makes the model
train faster. An average pooling layer is added to reduce the feature map dimensions,
making the model easier to train, followed by a dropout layer to prevent overfitting.
Finally, global max pooling is added to reduce the depth of the feature map, and it is
connected to a final ReLU activation layer. Mean squared error is used as the loss
for this task, the Adam method performs the optimization, and mean absolute error
is used as the metric.
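The batch normalization and global max pooling steps can be illustrated in numpy; this is inference-style normalization without the learned scale/shift parameters, applied to synthetic activations.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize each feature over the batch axis to zero mean and unit
    variance (learned gamma/beta parameters omitted for brevity)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def global_max_pool(fmap):
    """Collapse each (H, W) map to its maximum: (N, H, W, C) -> (N, C)."""
    return fmap.max(axis=(1, 2))

rng = np.random.default_rng(2)
acts = rng.normal(5.0, 3.0, size=(32, 24, 32, 64))  # batch of shifted activations
normed = batch_norm(acts.reshape(32, -1))           # per-feature normalization
pooled = global_max_pool(acts)                      # (32, 64) feature vectors
```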

Fig. 1 Frames generated from CCTV video

Fig. 2 Architecture of the proposed system



Fig. 3 Convolution layers of VGG16

The dataset consists of images generated from the surveillance video. Suppose an image of size 96 × 128 is given to VGG16; the following steps are performed.
1. First, the first ten layers of VGG16 are used to extract features from the image. After feature extraction with these layers, we obtain an output of size 24 × 32, a quarter of the original size.
2. Second, the VGG16 output is fed to four filters of different sizes. No pooling layer is used in this set of convolutional layers, because many pooling layers would cause loss of spatial information from the feature map.
3. Third, a feature enhancement layer is used: the outputs of the four filters from the previous layers are concatenated (x_conct), and a flatten + MLP with a softmax function in the output layer produces a weight for each input filter. Thus, the model learns to give a high weight to the filter that best represents the image.
4. Last, the 24 × 32 map must be upsampled back to 96 × 128. A transposed convolutional layer is used as the upsampling method. The x_conct is concatenated with the filter generated by the convolution, and the weighted x_conct is also used. In the last layer, the filter size is set to 1, and that filter represents the predicted density map.
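Step 3, weighting the concatenated filter outputs by a softmax, can be sketched as follows. The MLP is mocked by fixed logits, and all shapes are illustrative.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def weight_filters(x_conct, logits):
    """Scale each of the concatenated filter outputs by a softmax weight:
    the branch that best represents the image gets the largest weight.
    x_conct has shape (H, W, n_filters)."""
    w = softmax(np.asarray(logits, dtype=float))  # one weight per filter
    return x_conct * w[None, None, :]             # broadcast over H, W

# Four 4x4 feature maps; the MLP (mocked by fixed logits) prefers filter 2.
x = np.ones((4, 4, 4))
weighted = weight_filters(x, logits=[0.0, 0.0, 2.0, 0.0])
assert weighted.shape == (4, 4, 4)
assert weighted[..., 2].max() == weighted.max()       # filter 2 dominates
assert np.allclose(weighted.sum(axis=-1), 1.0)        # weights sum to 1
```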
For every convolutional layer, batch normalization and the ReLU activation function are used. The local pattern consistency loss and the structural similarity index (SSIM) are computed as follows:
1. The structural similarity index measures the similarity between two images. This similarity measure is used in a loss function, named the local pattern consistency loss, together with the Euclidean loss.
2. For each predicted density map and actual density map, the similarity between small patches of the image is computed, with a Gaussian kernel of size 12 × 16 as the weight.
3. The model loss function is defined as: Euclidean loss (MSE) + alpha × local pattern consistency loss.
4. The local pattern consistency loss helps the model learn the similarity between small patches of the image.
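A minimal NumPy sketch of this combined loss, assuming the local pattern consistency term is 1 − SSIM, using uniform rather than Gaussian patch weighting for brevity, and with an illustrative alpha (the paper does not state its value):

```python
import numpy as np

def ssim(x, y, c1=1e-4, c2=9e-4):
    """Structural similarity between two equally sized patches
    (uniform weighting; the paper uses a 12 x 16 Gaussian kernel)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def total_loss(pred, target, alpha=0.001):
    """Euclidean (MSE) loss + alpha * local pattern consistency loss,
    where the latter is taken as 1 - SSIM, so identical maps give zero."""
    mse = np.mean((pred - target) ** 2)
    return float(mse + alpha * (1.0 - ssim(pred, target)))

rng = np.random.default_rng(0)
d = rng.random((12, 16))              # one 12 x 16 patch
assert abs(total_loss(d, d)) < 1e-9   # perfect prediction -> zero loss
assert total_loss(d + 0.5, d) > 0.2   # offset prediction is penalized
```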

5 Results

The convolutional neural network VGG16 is used to train the model. The dataset of images generated from the surveillance video is divided into training and test sets: 70% of the data is used for training, and 30% is used for testing. A random image is given as input to count the number of people; Fig. 4 shows such an image given to the model.
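The 70/30 split can be sketched as follows; the frame file names are illustrative.

```python
import random

def split_70_30(frames, seed=0):
    """Shuffle the extracted frames and split them 70/30 into
    training and test sets."""
    frames = list(frames)
    random.Random(seed).shuffle(frames)
    cut = int(0.7 * len(frames))
    return frames[:cut], frames[cut:]

frames = [f"frame_{i:04d}.jpg" for i in range(1000)]
train, test = split_70_30(frames)
assert len(train) == 700 and len(test) == 300
assert set(train).isdisjoint(test)    # no leakage between the two sets
```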
The model is evaluated with various convolutional neural networks, namely VGG16, ResNet50, and ResNetV2, with ReLU and ELU as activation functions (Table 1). We have evaluated our proposed model with different optimizers and activation functions. Loss and accuracy graphs of the ResNetV2 model with ELU and ReLU activation functions are shown in Fig. 5. Trained for 20 epochs, it attained train and test accuracies of 92% and 91%, respectively, which is reasonably good. Similarly, loss and accuracy graphs of the ResNet50 model with ELU and ReLU activation functions are shown in Fig. 6. Trained for 20 epochs, it attained train and test accuracies of 100% and 92%, respectively, so ResNet50 performed better than ResNetV2. Similarly, loss and accuracy graphs of the VGG16 model with ELU and ReLU activation functions are shown in Fig. 7. We

Fig. 4 Test image to count the number of people

Table 1 Comparison of loss and accuracy for VGG16, ResNet50, and ResNetV2 with ReLU and ELU activation functions

Activation   Model      Train loss (%)   Test loss (%)   Train acc. (%)   Test acc. (%)
ELU          ResNetV2   59               58              92               91
ReLU         ResNetV2   6                6               78               91
ELU          ResNet50   4                19              100              92
ReLU         ResNet50   1                1               100              100
ELU          VGG16      2                3               100              100
ReLU         VGG16      3                2               100              100

Fig. 5 Performance of the ResNetV2 model with ReLU and ELU activation functions

Fig. 6 Performance of the ResNet50 model with ReLU and ELU activation functions

have evaluated this model for 20 epochs as well, attaining train and test accuracies of 100% and 100%, respectively. The performance of VGG16 is superior to that of ResNetV2 and ResNet50.

6 Conclusion

The advancement of deep learning techniques for people counting in surveillance videos has shown significant impact across various application domains. In our proposed system, we have used ResNetV2, ResNet50, and


Fig. 7 Performance of the VGG16 model with ReLU and ELU activation functions

VGG16 models with ELU and ReLU activation functions. The structural similarity index (SSIM) is used to measure the similarity between two images. A transposed convolutional layer is used for upsampling rather than the conventional upsampling method. The proposed system shows superior performance with the VGG16 model in terms of train and test accuracy when counting the number of people in a surveillance video. Future work for this application includes detecting people directly from ongoing CCTV footage, so that the total number of people in the mall can be predicted; dividing the video into frames may yield the same number of people in 5–10 frames if a person stays in the same place for long.

References

1. Kowcika A (2017) People count from the crowd using unsupervised learning technique from
low resolution surveillance videos. In: 2017 international conference on energy, communica-
tion, data analytics and soft computing (ICECDS), August 2017, pp 2575–2582. https://doi.
org/10.1109/ICECDS.2017.8389919
2. Boominathan L, Kruthiventi SSS, Babu RV (2016) CrowdNet: a deep convolutional network for dense crowd counting. In: Proceedings of the 24th ACM international conference on multimedia. https://doi.org/10.1145/2964284.2967300. Accessed 9 Dec 2021
3. Pervaiz M, Jalal A, Kim K (2021) Hybrid algorithm for multi people counting and tracking
for smart surveillance. In: 2021 International Bhurban conference on applied sciences and
technologies (IBCAST), Jan 2021, pp 530–535. https://doi.org/10.1109/IBCAST51254.2021.
9393171
4. Pervaiz M, Ghadi YY, Gochoo M, Jalal A, Kamal S, Kim D-S (2021) A smart surveillance
system for people counting and tracking using particle flow and modified SOM. Sustainability
13(10), Art no. 10. https://doi.org/10.3390/su13105367
5. Park JH, Cho SI (2021) Flow analysis-based fast-moving flow calibration for a people-counting system. Multimed Tools Appl 80(21):31671–31685. https://doi.org/10.1007/s11042-021-11231-1
6. Lakshmi L, Reddy MP, Santhaiah C, Reddy UJ (2021) Smart phishing detection in web pages
using supervised deep learning classification and optimization technique ADAM. Wirel Pers
Commun 118(4):3549–3564. https://doi.org/10.1007/s11277-021-08196-7

7. Conte D, Foggia P, Percannella G, Tufano F, Vento M (2010) A method for counting moving
people in video surveillance videos. EURASIP J Adv Signal Process 2010(1), Art no. 1. https://
doi.org/10.1155/2010/231240
8. Agustin OC, Oh B-J (2012) People counting using object detection and grid size estimation.
In: Communication and networking. Berlin, Heidelberg, pp 244–253. https://doi.org/10.1007/
978-3-642-27192-2_29
9. Lefloch D, Alaya Cheikh F, Hardeberg J, Gouton P, Picot-Clemente R (2008) Real-time people
counting system using a single video camera. In: Proceedings of SPIE, pp 6811. https://doi.
org/10.1117/12.766499
10. Alekya L, Lakshmi L, Susmitha G, Hemanth S (2020) A survey on fake news detection in
social media using deep neural networks 9(03):4
11. Raghavachari C, Aparna V, Chithira S, Balasubramanian V (2015) A comparative study of
vision based human detection techniques in people counting applications. Proc Comput Sci
58:461–469. https://doi.org/10.1016/j.procs.2015.08.064
12. Liu X, Tu PH, Rittscher J, Perera A, Krahnstoever N (2005) Detecting and counting people in
surveillance applications. In: Proceedings of IEEE conference on advanced video and signal
based surveillance. Como, Italy, pp 306–311. https://doi.org/10.1109/AVSS.2005.1577286
Detection of Pneumonia and COVID-19
from Chest X-Ray Images Using Neural
Networks and Deep Learning

Jeet Santosh Nimbhorkar, Kurapati Sreenivas Aravind, K. Jeevesh, and Suja Palaniswamy

Abstract Early detection of pneumonia and COVID-19 is extremely vital in order to guarantee timely access to medical treatment. Hence, it is necessary to detect pneumonia/COVID-19 from X-ray images. In this paper, convolutional neural networks along with transfer learning are used to aid in the detection of the disease. A CNN model is proposed with four convolutional layers, four max pooling layers, and one flatten layer followed by one fully connected hidden layer and an output layer. Pre-trained models, namely AlexNet, InceptionV3, ResNet50, and VGG19, are implemented. The chest X-ray images (pneumonia), chest X-ray (COVID-19 and pneumonia), and COVID-19 radiography database datasets are used for implementation of all the models. Precision, recall, and accuracy are used as performance evaluation metrics, and the performance of all the models is compared. Experimental results show that the proposed CNN model outperforms all pre-trained models with improved accuracy and fewer trainable parameters. The highest accuracy achieved across all three datasets is 94.25%, for the chest X-ray (COVID-19 and pneumonia) dataset.

Keywords AlexNet · Chest X-ray · Convolution neural networks · COVID-19 · InceptionV3 · Transfer learning · Pneumonia · ResNet50 · VGG19

1 Introduction

Pneumonia is an infection which causes the air sacs in lungs to get filled up with fluid
or pus, causing cough, difficulty in breathing, and various other breathing problems.
Pneumonia can be caused due to a variety of organisms like bacteria, virus, and
fungi. Children under the age of 2, adults over the age of 65, people who have
received/receiving chemotherapy are considered to be in the high-risk zone, and it

J. S. Nimbhorkar · K. S. Aravind (B) · K. Jeevesh · S. Palaniswamy
Department of Computer Science and Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Bengaluru 560035, India
e-mail: arvind.kurapati@gmail.com
S. Palaniswamy
e-mail: p_suja@blr.amrita.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 61
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_6
62 J. S. Nimbhorkar et al.

could be life threatening for people in these age groups if pneumonia is not detected
and diagnosed in the early stages. Studies show that in India around 4 lakh children die every year due to pneumonia, which is almost fifty percent of the pneumonia-related deaths in India. Vaccines are available to prevent some types
of pneumonia to a large extent. There are various steps that a doctor might follow to diagnose pneumonia: (1) first, check the person’s medical history and smoking habits, and listen for crackling or bubbling sounds when the person inhales. (2) Blood tests are taken to check for signs of bacterial infection. (3) A pulse oximeter is used to measure the level of oxygen in the blood. (4) A CT scan gives a more detailed image of the person’s lungs, which is used to determine whether the person is suffering from pneumonia.
Recently, there have been many advancements in the field of neural networks, deep learning, and especially medical imaging using CNNs to solve real-world problems. CNN-based models have been used extensively in the recent past to classify tumors, segment images, and detect abnormalities. By training a CNN model, it is possible to detect abnormalities like missing tissue, a weak diaphragm, etc. Even experienced radiologists may take a lot of time to observe these minute details and may sometimes even miss them, which in turn leads to delay in diagnosis and treatment. These automated models are therefore really helpful in giving fast and accurate results, and would also provide aid in areas with limited availability of skilled radiologists. Training models from scratch requires a high computational cost. Hence, we adopt the method of transfer learning. We have contributed to the existing work by executing different pre-trained models on three similar datasets with different classes in order to obtain a generalized solution. The low computational cost of transfer learning becomes apparent in the large reduction in trainable parameters.
The remainder of this paper is organized as follows: the literature review is presented in Sect. 2, various CNN architectures are discussed in Sect. 3, the proposed CNN model is explained in Sect. 4, the experimentation and results are discussed in Sect. 5, and the conclusion is given in Sect. 6.

2 Related Work

An artificial neural network model is used to extract knowledge and can be used to identify redundant inputs and outputs and also to analyze the behavior of hidden neurons [1]. The authors of [2] used InceptionV3, Xception, and ResNeXt models to perform image classification of COVID-19, normal, and pneumonia images on a single dataset. One dense layer with 256 neurons was added to Xception and Inception, and 126 neurons were added to the extra dense layer in ResNeXt. LeakyReLU was used as the activation function instead of the originally used ReLU function. A convolutional neural network (CNN) with reduced parameters is used to detect pneumonia in [3]. Retinal disease is identified using deep learning models with transfer learning [4]. The authors have used Inception-ResNetV2,
Detection of Pneumonia and COVID-19 from Chest X-Ray Images … 63

Xception, DenseNet201, and VGG19 for pneumonia detection [5]. The input tensors
for these models were reduced. A basic CNN, VGG16, VGG19, and InceptionV3
were executed on a pediatric pneumonia dataset [6]. The convolutional layers of
the pre-trained models were frozen during training. All the models had an accu-
racy above 97%. Tuberculosis disease is classified using deep learning models with
transfer learning [7] on chest X-ray images.
In [8], a modified VGGNet was used to categorize chest X-ray images into four
different categories, namely COVID-19, bacterial pneumonia, viral pneumonia, and
normal X-ray. In order to obtain a higher classification rate, three different pooling
layers are used. In [9], the authors developed an algorithm from scratch using deep
convolutional neural networks. They included three convolutional layers with a
ReLU activation for each layer. For classification of emotions, fully connected layers,
softmax, and classification output layers have been used.
Deep learning models are used to identify tumor cells [10]. The authors have
used a deep learning-based model to recognize and characterize the inconsistencies
in a given chest X-ray sample and classify them as unaffected, COVID affected,
or pneumonia [11]. Support vector machine (SVM) is used to identify tumor from
X-ray images [12].

3 CNN Architectures

A convolutional neural network (CNN) is a model comprising multiple layers composed of artificial neurons. The model can read an input image and assign weights and biases to the objects/features in the image that differentiate them from each other. Multiple models are used in this project, namely AlexNet [13], InceptionV3 [14] (pre-trained), ResNet50 [15] (pre-trained), and VGG19 [16] (pre-trained) on ImageNet [13].

4 Proposed CNN Model

In this paper, we have proposed a CNN model with four convolutional layers, as shown in Fig. 1. The first layer has 32 filters, the second and third have 64 filters, and the fourth has 128 filters. The filter size is 3 × 3, and the input image size is 150 × 150 × 3. The activation function used is ReLU. A max pooling layer of size 2 × 2 is implemented after each convolutional layer to reduce the spatial dimensions of the output volume. The fully connected hidden layer has 64 neurons and is followed by the output layer.
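Under the assumption of 'same'-padded convolutions (the paper does not state the padding), the spatial dimensions through the four conv + pool stages can be traced as follows:

```python
def trace_shapes(size=150, filters=(32, 64, 64, 128)):
    """Trace the feature-map shape through four 3x3 conv layers (assumed
    'same' padding, so the spatial size is preserved) each followed by a
    2x2 max pool (which halves the spatial size, floor division)."""
    shapes = [(size, size, 3)]                 # 150 x 150 x 3 input
    for f in filters:
        shapes.append((size, size, f))         # after the convolution
        size //= 2                             # after the 2x2 max pool
        shapes.append((size, size, f))
    return shapes

shapes = trace_shapes()
assert shapes[0] == (150, 150, 3)
assert shapes[-1] == (9, 9, 128)   # 150 -> 75 -> 37 -> 18 -> 9
```

The flatten layer then sees 9 × 9 × 128 = 10,368 features before the 64-neuron hidden layer, under this padding assumption.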

Fig. 1 Architecture of our CNN model

5 Experimentation and Results

5.1 Dataset

In this experiment, three publicly available datasets from Kaggle are used. The chest X-ray images (pneumonia) [17] dataset comprises 5863 X-ray images divided into 2 categories (pneumonia/normal). The chest X-ray (COVID-19 and pneumonia) [18] dataset contains a total of 6432 X-ray images, with 20% of the data as test images, divided into 3 categories (COVID-19, pneumonia, normal). The COVID-19 radiography database [19] has 3616 COVID-19 positive cases, 10,192 normal, 6012 lung opacity (non-COVID lung infection), and 1345 viral pneumonia images.

5.2 Experiment Setup

The experiment is conducted on all three datasets using the online Kaggle Nvidia P100 GPU with 16 GB memory and 1.32 GHz memory clock. AlexNet, InceptionV3, ResNet50, VGG19, and our CNN model are executed using the parameters shown in Table 1.

Table 1 Parameters used for all the models

Output function: binary sigmoid / softmax
Loss function: binary / categorical cross entropy
Optimizer: SGD / Adam
  SGD(lr = 0.01, m = 0.9, w = 0.0005) for AlexNet
  SGD(lr = 0.045, decay rate = 0.94 every 2 epochs, clipvalue = 0.2) for Inception
  SGD(lr = 0.1, m = 0.9, w = 0.0001) for ResNet
  SGDW(lr = 0.01, m = 0.9, w = 0.0005) for VGG
  (where m = momentum and w = weight_decay)
Epochs: 20

5.3 Results

In this section, Table 2 shows the precision, recall, and accuracy of all five models on the different datasets, obtained by varying the parameters. It is observed from the results in Table 2 that the Adam optimizer gives better accuracy than SGD across all the datasets. Hence, we have implemented our proposed CNN model with the Adam optimizer for 20 epochs on all three datasets. The result of our proposed CNN model is shown in Table 3.

5.4 Performance Evaluation

We first built a CNN model from scratch, whose architecture is shown in Fig. 1. The accuracies obtained when we executed the model on dataset [17], dataset [18], and dataset [19] are 93.58%, 94.25%, and 84.32%, respectively. On the first dataset [17], we implemented three pre-trained models (InceptionV3, ResNet50, and VGG19) and one non-pre-trained model (AlexNet). As this dataset contains only two classes, binary sigmoid is used as the output function, and binary cross entropy is used as the loss function. First, the original versions of these four models were executed. The optimizer along with its parameters was chosen for each of the four models according to [13–16]. The results are displayed in Fig. 2. It is observed from the graphs that our proposed model outperforms all the other models. The accuracy, precision, and recall are computed using Eqs. 1, 2, and 3, respectively.

Accuracy = (TP + TN)/(TP + TN + FP + FN) (1)

Precision = TP/(TP + FP) (2)

Recall = TP/(TP + FN) (3)
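Eqs. 1–3 translate directly into code; the confusion counts below are illustrative, not results from the paper.

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, and recall exactly as in Eqs. 1-3."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Toy confusion counts for a binary classifier over 200 images.
acc, prec, rec = metrics(tp=80, tn=80, fp=20, fn=20)
assert (acc, prec, rec) == (0.8, 0.8, 0.8)
```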

The original ResNet50 [15] and InceptionV3 [14] models do not contain any fully connected dense layers. Hence, we fine-tuned these two models by adding hidden layers to get better results. A widely used rule of thumb is that a hidden layer should have fewer nodes than the input layer (usually capturing 70–90% of the variance in the input, according to [1]). Taking this into consideration, one hidden layer with 2048 neurons was added initially. The results are shown in Table 2. Comparing the results in Table 2, the accuracy of InceptionV3 reduced by less than 1%, but that of ResNet50 increased by around 5%. Following the above rule of thumb, we executed the same models but with 1024 neurons in the hidden layer. As seen in Table 2, this gave better results than the previous InceptionV3 and ResNet50 models. Hence, we proceeded with this architecture by adding one more hidden layer with 512 neurons. This structure gave the best results
Table 2 Precision (P), recall (R), and accuracy (A) of all the models on the three datasets. Per-class entries are P/R; A is the overall accuracy. Dataset-1 (D1) classes: Pneumonia, Normal. Dataset-2 (D2) classes: COVID, Normal, Pneumonia. Dataset-3 (D3) classes: COVID, Normal, Pneumonia, Lung opacity

AlexNet (SGD, 256 × 256, no added dense layers):
  D1: 0.63/1, 1/1, A 0.63 | D2: 0.93/0.96, 0.91/0.85, 0.94/0.96, A 0.93 | D3: 0.73/0.86, 0.89/0.77, 0.80/0.98, 0.90/0.64, A 0.81
Inception (SGD, 299 × 299, no added dense layers):
  D1: 0.83/0.98, 0.95/0.68, A 0.87 | D2: 1/0.91, 0.85/0.91, 0.96/0.95, A 0.93 | D3: 0.93/0.75, 0.86/0.90, 0.92/0.99, 0.83/0.88, A 0.88
ResNet (SGD, 224 × 224, no added dense layers):
  D1: 0.88/0.8, 0.71/0.81, A 0.80 | D2: –/–, 0.68/0.68, 0.77/0.88, A 0.75 | D3: –/–, 0.25/1, 0/0, 0/0, A 0.24
VGG (SGDW, 224 × 224, no added dense layers):
  D1: 0.92/0.89, 0.82/0.86, A 0.88 | D2: –/–, 0.95/0.79, 0.82/0.98, A 0.84 | D3: 0.86/0.14, 0.34/0.33, 0.58/0.99, 0.58/0.69, A 0.53
AlexNet (Adam, 256 × 256, no added dense layers):
  D1: 0.68/1, 1/0.22, A 0.71 | D2: 0.81/0.97, 1/0.01, 0.73/0.98, A 0.92 | D3: 0.76/0.89, 0.67/0.62, 0.77/0.95, 0.91/0.59, A 0.75
Inception (Adam, 299 × 299, no added dense layers):
  D1: 0.83/0.98, 0.96/0.67, A 0.87 | D2: 1/0.86, 0.83/0.91, 0.95/0.94, A 0.92 | D3: 0.91/0.84, 0.79/0.96, 0.97/0.98, 0.91/0.77, A 0.88
Inception (Adam, 299 × 299, 1 dense layer with 2048 neurons):
  D1: 0.82/0.99, 0.98/0.63, A 0.87 | D2: 1/0.95, 0.88/0.89, 0.95/0.95, A 0.93 | D3: 0.97/0.65, 0.79/0.96, 0.96/0.99, 0.79/0.86, A 0.86
Inception (Adam, 299 × 299, 1 dense layer with 1024 neurons):
  D1: 0.86/0.97, 0.93/0.73, A 0.88 | D2: 1/0.91, 0.83/0.93, 0.97/0.93, A 0.93 | D3: 0.98/0.69, 0.82/0.95, 0.95/0.99, 0.80/0.89, A 0.88
Inception (Adam, 299 × 299, 2 dense layers with 1024 + 512 neurons):
  D1: 0.86/0.98, 0.96/0.73, A 0.88 | D2: 1/0.94, 0.85/0.91, 0.97/0.95, A 0.93 | D3: 0.98/0.64, 0.7/0.95, 0.95/1, 0.84/0.79, A 0.84
Inception (Adam, 299 × 299, 2 dense layers with 1024 + 1024 neurons):
  D1: 0.85/0.97, 0.94/0.71, A 0.87 | D2: 0.86/0.92, 0.76/0.73, 0.86/0.93, A 0.93 | D3: 0.96/0.75, 0.77/0.97, 0.97/0.99, 0.84/0.8, A 0.87
ResNet (Adam, 224 × 224, no added dense layers):
  D1: 0.64/0.98, 0.96/0.67, A 0.64 | D2: 0.92/0.66, 0.86/0.67, 0.85/0.96, A 0.75 | D3: 0.46/0.86, 0.49/0.31, 0.52/0.77, 0.94/0.02, A 0.49
ResNet (Adam, 224 × 224, 1 dense layer with 2048 neurons):
  D1: 0.67/0.99, 0.96/0.19, A 0.69 | D2: 0.96/0.38, 0.88/0.68, 0.83/0.97, A 0.84 | D3: 0.68/0.48, 0.49/0.73, 0.74/0.9, 0.83/0.48, A 0.65
ResNet (Adam, 224 × 224, 1 dense layer with 1024 neurons):
  D1: 0.73/0.98, 0.92/0.39, A 0.75 | D2: 0.92/0.66, 0.86/0.67, 0.85/0.96, A 0.85 | D3: 0.84/0.33, 0.52/0.35, 0.57/0.97, 0.6/0.71, A 0.59
ResNet (Adam, 224 × 224, 2 dense layers with 1024 + 512 neurons):
  D1: 0.85/0.92, 0.84/0.72, A 0.84 | D2: 1/0.94, 0.85/0.91, 0.97/0.95, A 0.86 | D3: 0.82/0.27, 0.70/0.57, 0.67/0.94, 0.57/0.82, A 0.65
ResNet (Adam, 224 × 224, 2 dense layers with 1024 + 1024 neurons):
  D1: 0.74/0.98, 0.92/0.83, A 0.77 | D2: 0.94/0.66, 0.69/0.88, 0.92/0.86, A 0.85 | D3: 0.54/0.83, 0.75/0.70, 0.76/0.84, 0.75/0.29, A 0.66
VGG (Adam, 224 × 224, no added dense layers):
  D1: 0.91/0.97, 0.94/0.84, A 0.92 | D2: 1/0.9, 0.85/0.94, 0.95/0.97, A 0.93 | D3: 0.90/0.74, 0.84/0.95, 0.96/0.98, 0.82/0.85, A 0.88
Table 3 Performance of our CNN model (P = precision, R = recall; per-class entries are P/R)

Dataset 1 (150 × 150, binary sigmoid, binary cross entropy): Pneumonia 0.93/0.97, Normal 0.95/0.87, A 0.93
Dataset 2 (150 × 150, softmax, categorical cross entropy): Pneumonia 0.73/0.98, Normal 0.92/0.86, COVID 0.98/0.97, A 0.94
Dataset 3 (150 × 150, softmax, categorical cross entropy): Pneumonia 0.85/0.99, Normal 0.83/0.82, COVID 0.96/0.67, Lung opacity 0.77/0.89, A 0.84


Fig. 2 Accuracy of all the models across all the datasets

with a great increase in accuracy for ResNet50. Lastly, the 1024 + 1024 structure was executed, but it did not give satisfactory results compared to the 1024 + 512 structure.
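The effect of the added heads on trainable parameters can be sketched by simple counting. The head structures and the 3-class output below are illustrative; since the backbone is frozen, only the head contributes trainable parameters.

```python
def dense_params(n_in, n_out):
    """Weights + biases of one fully connected layer."""
    return n_in * n_out + n_out

def head_params(feature_dim, hidden, n_classes):
    """Trainable parameters of a classification head (a chain of dense
    layers) on top of a frozen backbone whose pooled feature vector has
    feature_dim entries (2048 for ResNet50 and InceptionV3)."""
    total, n_in = 0, feature_dim
    for h in hidden:
        total += dense_params(n_in, h)
        n_in = h
    return total + dense_params(n_in, n_classes)

# A bare single-output head over 2048 features has 2,049 parameters,
# consistent with the trainable-parameter figure reported in Table 4.
assert head_params(2048, [], 1) == 2049
# The 1024 + 512 head tried above, for a 3-class dataset:
assert head_params(2048, [1024, 512], 3) == 1024 * 2049 + 512 * 1025 + 3 * 513
```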
All these models were executed on the other two datasets as well. As there are
more than two classes, softmax was used as the output function, and categorical cross
entropy was used as the loss function for these two datasets. The precision and recall
for all the classes in each dataset are displayed in Table 2.
Out of all the models, VGG19 with Adam optimizer was the best model for the
first dataset, InceptionV3 with one dense layer (2048 neurons) was the best model
for the second dataset, and the original model of InceptionV3 with Adam optimizer
was the best model for the third dataset.
There were a few observations that were similar across all three datasets:
• There was a decrease in loss and a great increase in accuracy when the optimizer was changed from stochastic gradient descent to Adam in VGG19 (4.3% increase on the first dataset, 8.5% on the second, and 34.5% on the third).
• The best performance of ResNet50 was obtained when two hidden layers with 1024 and 512 nodes were added.
• Out of all the models, the performance of ResNet50 was the lowest.
In [11], two different datasets were used. The first dataset was processed into an improved LeNet model’s input; only transfer learning was used to experiment on this dataset. On the second dataset, hyperparameters were adjusted many times without changing the network structure, and the results were compared. In [5], the authors implemented Inception-ResNetV2, Xception, and DenseNet201 on a single dataset, using only the method of transfer learning. In [6], the authors implemented VGG16, VGG19, and InceptionV3 on a single dataset; two dense layers of 32 nodes each were added. In our paper, we experimented on three datasets

Table 4 Difference between actual and trainable parameters for all the models

Models        Actual parameters   Trainable parameters   Difference between actual and trainable parameters
AlexNet       58,290,945          58,288,193             2,752
InceptionV3   21,804,833          2,049                  21,802,784
ResNet50      23,589,761          2,049                  23,587,712
VGG19         139,574,337         119,549,953            20,024,384

using transfer learning with different pre-trained models, namely AlexNet, InceptionV3, ResNet50, and VGG19. Our own CNN model was also built from scratch, and all the results are compared. The architectures of the models were tweaked, and dense layers with varying numbers of nodes were added, which resulted in a reduction of trainable parameters. The reduction in parameters is shown in Table 4.
Adam is a combination of the best properties of the AdaGrad and RMSProp algorithms. According to [11], Adam is straightforward to implement, computationally efficient, and has low memory requirements. It performs best when there are a large number of parameters. Hence, the combination of Adam with VGG19 gave a great increase in accuracy, as VGG19 has the largest number of parameters among the models used.
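A minimal sketch of the Adam update, the standard formulation from the literature rather than code from the paper, combining a momentum-like first moment with an RMSProp-like second moment:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m)
    and squared gradient (v), both bias-corrected by the step count t."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x**2 (gradient 2x) starting from x = 5.0.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    x, m, v = adam_step(x, 2 * x, m, v, t)
assert abs(x) < 1.0   # has moved close to the minimum at 0
```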
The concept of skip connections makes the architecture of ResNet50 [15] different from the other models; it uses shortcut connections to solve the vanishing gradient problem. As a result of this architecture, the accuracy of this model may have been compromised.

6 Conclusion

Pneumonia and COVID-19 have been a major cause of a large number of deaths across various age groups. Early detection has been proven to aid faster diagnosis and treatment. In this paper, we have implemented transfer learning methods on existing CNN models to acquire faster results. The results were compared between AlexNet (non-pre-trained), InceptionV3, ResNet50, VGG19, and our proposed CNN model across three datasets. For datasets 1 and 2, the proposed CNN model achieved the highest accuracy, while for dataset 3 it was the original InceptionV3 model with Adam, but only by a minor difference.

References

1. Boger Z, Guterman H (1997) Knowledge extraction from artificial neural network models. In:
IEEE international conference on systems, man, and cybernetics. Computational cybernetics
and simulation, vol 4, pp 3030–3035. https://doi.org/10.1109/ICSMC.1997.63305

2. Jain R, Gupta M, Taneja S, Hemanth DJ (2020) Deep learning based detection and analysis of COVID-19 on chest X-ray images. Appl Intell, pp 1–11. PMCID: PMC7544769
3. Li X, Chen F, Hao H, Li M (2020) A pneumonia detection method based on improved convo-
lutional neural network. In: 2020 IEEE 4th information technology, networking, electronic and
automation control conference (ITNEC), pp 488-493. https://doi.org/10.1109/ITNEC48623.
2020.9084734
4. Kermany DS et al (2018) Identifying medical diagnoses and treatable diseases by image-based
deep learning. Cell 172(5):1122-1131.e9. https://doi.org/10.1016/j.cell.2018.02.010 PMID:
29474911
5. Jiang Z (2020) Chest X-ray pneumonia detection based on convolutional neural networks.
In: 2020 international conference on big data, artificial intelligence and internet of things
engineering (ICBAIE), Fuzhou, China, pp 341–344. https://doi.org/10.1109/ICBAIE49996.
2020.00077
6. Labhane G, Pansare R, Maheshwari S, Tiwari R, Shukla A (2020) Detection of pediatric
pneumonia from chest X-Ray images using CNN and transfer learning. In: Proceedings of
the 3rd international conference on emerging technologies in computer engineering: machine
learning and internet of things (ICETCE), Jaipur, India, 7–8 Feb 2020, pp 85–92
7. Seshu Babu G, Sachin Saj TK, Sowmya V, Soman KP (2021) Tuberculosis classifica-
tion using pre-trained deep learning models. In: advances in automation, signal processing,
instrumentation, and control, select proceedings of i-CASIC 2020, 2021, pp 767–774
8. Anand R, Sowmya V, Vijay krishnamenon, Gopalakrishnan EA, Soman KP (2021) Modified
Vgg deep learning architecture for Covid-19 classification using bio-medical images, IOP Conf
Ser Mater Sci Eng 1084:012001
9. Palaniswamy S, Suchitra (2019) A robust pose & illumination invariant emotion recognition
from facial images using deep learning for human-machine interface. In: 2019 4th international
conference on computational systems and information technology for sustainable solution
(CSITSS), pp 1–6. https://doi.org/10.1109/CSITSS47250.2019.9031055
10. Subbiah U, Kumar RV, Panicker SA, Bhalaje RA, Padmavathi S (2020) An enhanced deep
learning architecture for the classification of cancerous lymph node images. In: 2020 second
international conference on inventive research in computing applications (ICIRCA), pp 381–
386. https://doi.org/10.1109/ICIRCA48905.2020.9183250
11. Kishore SLS, Sidhartha AV, Reddy PS, Rahul CM, Vijaya D (2021) Detection and diagnosis
of covid-19 from chest X-ray images. In: 2021 7th international conference on advanced
computing and communication systems (ICACCS), pp 459–465. https://doi.org/10.1109/ICA
CCS51430.2021.9441862
12. Pooja A, Mamtha R, Sowmya V, Soman KP (2016) X-ray image classification based on tumour
using GURLS and LIBSVM. Int Conf Commun Signal Processing (ICCSP’16)
13. Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional
neural networks. Commun ACM 60(6):84–90. https://doi.org/10.1145/3065386
14. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception
architecture for computer vision. IEEE Conf Comput Vision Pattern Recognit (CVPR)
2016:2818–2826. https://doi.org/10.1109/CVPR.2016.308
15. He K, Zhang X, Ren S, Sun J (2016) Deep Residual Learning for Image Recognition. IEEE
Conf Comput Vision Pattern Recognit (CVPR) 2016:770–778. https://doi.org/10.1109/CVPR.
2016.90
16. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image
recognition. CoRR, abs/1409.1556
17. Chest X-ray images (Pneumonia). https://www.kaggle.com/paultimothymooney/chest-xray-
pneumonia
18. Chest X-ray (COVID-19 and Pneumonia). https://www.kaggle.com/prashant268/chest-xray-
covid19-pneumonia
19. COVID-19 Radiography Database. https://www.kaggle.com/tawsifurrahman/covid19-radiog
raphy-database
Plant Leaf Disease Detection
and Classification Using Deep Learning
Technique

S. S. Bhoomika and K. M. Poornima

Abstract Food is the main resource for humans; securing and taking care of plants is the number one priority. The rise in crop leaf diseases is becoming a major problem in agriculture. Taking care of a disease at an early stage prevents it from spreading between plants. Modern technology paves the way for the detection of crop leaf disease, and deep learning makes this detection easier. The dataset used for training is publicly available. The trained model can classify up to 15 diseases. The training accuracy reached 97.35%, which is more than enough to detect diseases accurately. The proposed project can detect crop leaf disease with high accuracy and can be utilized to detect disease in the real world.

Keywords CNN (Convolutional neural network) · Conv2D (Convolutional 2 dimensional) · KNN (K-nearest neighbor) · GLCM (Gray-level co-occurrence matrix)

1 Introduction

Modern technology has helped humans in useful ways, such as boosting crop yields and providing new ways of harvesting. Technology has increased food production so much that it can feed almost 7 billion people on the earth. Crop diseases arise from changes in external factors such as climate conditions, decreases in pollination, and increases in the spread of disease. Controlling even one of these external factors can help increase crop yield; the main factor that can be controlled is plant disease. Detecting the disease in an

S. S. Bhoomika (B)
Department of Information Science and Engineering, GM Institute of Technology, Shimoga,
Karnataka, India
e-mail: ssbhoomika8@gmail.com
K. M. Poornima
Department of Computer Science and Engineering, JNN College of Engineering, Shimoga,
Karnataka, India
e-mail: kmpoornima@jnnce.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 73
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_7

early stage can help save the crops. More than 80% of crop harvesting in developing countries is carried out by smallholder farmers. Statistical analysis shows a yield loss of more than 50%, which is not acceptable in modern days. This loss is mainly due to the spread of disease between leaves, and it affects smallholder farmers the most. Farmers are trying their best to minimize crop disease and its spread between plants, using pesticides and other preventive measures. But these alone are not effective against leaf disease, because the disease has to be analyzed before applying pesticides: the wrong pesticide on the wrong plant is ineffective and can even reduce crop yield. In rural areas, most people work in the agricultural sector. In plants, the leaf is the main organ for producing food, and diseases attacking the leaves, stems, and nodes are the major factor reducing yield. So, it is necessary to identify diseases at an early stage to prevent losses.
Identifying disease on plants is very difficult due to the variety of diseases spanning different crops. There are many agriculture centers that can analyze the disease and recommend the correct pesticides, but analysis is a major challenge since the crop has to be inspected manually. Manual analysis is based on the knowledge and experience of the person and may vary between individuals. This is where modern technology comes into play: it can detect the disease type within seconds by making use of deep learning. Deep learning is being applied in many fields and has proven to provide good and acceptable results.
This work implements a deep learning technique to detect and classify leaf disease in plants accurately. The main objectives are as follows:
• To extract the features from the leaf.
• To detect the type of leaf disease.
• To classify the type of leaf disease.

2 Literature Survey

Sardogan et al. [1] proposed classification of plant leaf diseases using CNN and the LVQ algorithm. Features are extracted from the image during each epoch of the training process. In the proposed system, the LVQ algorithm is used both for training and for the classification of data. The system achieves 86% accuracy.
Thejuswini et al. [2] proposed leaf disease detection with fertilizer recommendation. The system utilizes k-means clustering with the SVM algorithm for detection and classification of disease and reaches around 80% accuracy.
Jasim et al. [3] proposed the detection of plant leaf diseases using image processing and deep learning techniques. Convolutional neural networks are implemented to detect the diseases, with classification performed on the inputs from the previous layer. The system obtains 98.29% training accuracy and 98.02% testing accuracy.
Sholihati et al. [4] proposed a classification of potato leaf diseases. The system uses the VGG16 and VGG19 neural network models within a deep learning framework built on convolutional neural networks. The VGG16 model performs slightly better than the VGG19 architecture; the proposed system achieved a maximum accuracy of 91%.
Karol et al. [5] proposed using convolutional neural networks and image processing techniques to detect plant disease. A database stores the pesticides for the corresponding detected pests and diseases. The layers residing in the convolutional neural network are dense, dropout, activation, flatten, Convolution2D, and MaxPooling2D. The model provides an accuracy of 78%.
Haridas et al. [6] proposed diagnosis and severity measurement of tomato leaf disease. The algorithms used to analyze the performance are linear discriminant analysis (LDA), KNN, SVM, Naïve Bayes, and decision tree. Based on the final result, the support vector machine performed better than the other algorithms.
Rajesh et al. [7] proposed the classification of leaf disease using a decision tree. The image is first refined with a refinement filter, and decision tree classifiers are applied to distinguish between healthy and diseased leaves. However, trees cannot be split at nodes with insufficient supporting data.
Kumari et al. [8] proposed leaf disease detection using a k-means clustering algorithm and an ANN for classification, with GLCM used to generate statistical features. A backpropagation neural network was deployed to classify the data; after training, it displays the performance plot, confusion matrix, and error histogram. The proposed system detects leaf disease with 92% accuracy.
Robert et al. [9] implemented a deep learning-based automated image capturing system for detecting and recognizing tomato plant leaf disease. The network model used was AlexNet, alongside a Faster R-CNN model trained for up to 50 epochs with a modified fully connected layer. A Web server runs a webpage that the user can access to view the resulting output image. The proposed system works very well, with an accuracy of 91.6%.

3 Proposed System Architecture

Figure 1 describes the system architecture for the training phase. It consists of image acquisition, preprocessing, and feature extraction using CNN; finally, the trained model is stored. Training is the process of generating a trained model file from the given input. In the developed system, the dataset was split into a training set and a testing set, with 80% of the dataset used for training.

Fig. 1 System architecture for training: image acquisition → image preprocessing → feature extraction using CNN → training dataset

Figure 2 describes the system architecture for testing leaf disease on a plant. The proposed system is composed of image acquisition, preprocessing, and feature extraction using CNN. Testing checks whether the model has been trained properly: during the testing phase, the trained model goes through rigorous checking against the testing data set. The remaining 20% of the dataset is used for testing.

Fig. 2 System architecture for testing: select image for testing → image acquisition → image preprocessing → feature extraction using CNN → detect disease → classify disease, using the training dataset

3.1 Image Acquisition

Image acquisition is the initial stage of the process. The dataset of leaf disease images was collected from a Web site. It includes leaf disease images of three major crops, pepper bell, potato, and tomato, across 15 different classes.

3.2 Image Preprocessing

Preprocessing modifies the raw input image before passing it to the learning algorithm. Each selected input image was resized to 256 × 256 pixels to fit the network, which speeds up the training process and produces a model that can be tested realistically.
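As a concrete illustration of this resizing step, the following is a minimal nearest-neighbor sketch in NumPy; the 512 × 384 raw input size is an assumption for the example, since the paper specifies only the 256 × 256 target:

```python
import numpy as np

def resize_nearest(img, out_h=256, out_w=256):
    # map each output pixel to its nearest source pixel
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols]

# hypothetical raw leaf image of arbitrary size
leaf = np.random.randint(0, 256, size=(512, 384, 3), dtype=np.uint8)
resized = resize_nearest(leaf)
print(resized.shape)  # (256, 256, 3)
```

In practice, a library routine such as PIL's `Image.resize` or OpenCV's `cv2.resize` would typically be used instead.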

3.3 Feature Extraction Using CNN

CNN is among the best approaches in deep learning; its multiple layers allow the dataset to be trained efficiently. In the proposed system, the structure of the CNN consists of 8 layer types, as follows:
• Input layer
• Convolutional layer
• Pooling layer
• Normalization layer
• Nonlinear layer
• Fully connected layer
• Dropout layer
• Softmax layer

3.3.1 Input Layer

The input layer stores the input images as their raw pixel values.

3.3.2 Convolutional Layer

In convolutional neural networks, the major building elements are convolutional layers. The 2D convolution layer is the most frequent type of convolution and is generally abbreviated conv2D. In a conv2D layer, a filter slides over the 2D input data, executing element-wise multiplication. The convolutional layer extracts features from the input images by computing a convolution with kernel filters. An image can be represented as in Eq. 1:

Dim(image) = (nh, nw, nc) (1)



where
nh is the height,
nw is the width,
nc is the number of channels.
This layer develops the output tensor by convolving the layer input with the convolution kernel. The stride refers to the kernel's sliding step; here the stride value is set to 2.
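The spatial size produced by such a strided convolution can be checked with the standard output-size formula; the padding of 1 is an assumption chosen so that the sizes match the 128 × 128, 64 × 64, and 32 × 32 feature maps reported in Sect. 4:

```python
def conv_out(n, kernel=3, stride=2, pad=1):
    # floor((n + 2*pad - kernel) / stride) + 1
    return (n + 2 * pad - kernel) // stride + 1

size = 256
for layer in range(1, 4):
    size = conv_out(size)
    print(f"conv layer_{layer}: {size} x {size}")  # 128, then 64, then 32
```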

3.3.3 Pooling Layer

A pooling layer operates on the feature map to reduce its dimensions. In the neural network, the pooling layer comes after the convolution layer. It performs two main functions: first, it reduces the number of parameters in the feature map and in the weights; second, it controls overfitting of the model. Max pooling is used here: the pooling layer selects the maximum value of each region of the feature map covered by the filter.

Normalization Layer

The proposed system uses a batch normalization layer, which normalizes the layers in batches. Batch normalization is applied to each feature map using its mean and variance. The normalization formula is defined in Eq. 2:

x̂i = (xi − μB) / √(σB² + ε)    (2)

where
μB is the mean of the input,
xi is an instance of the input data,
σB² is the variance of the input,
ε is a smoothing term.
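Eq. 2 can be verified numerically; the following is a minimal NumPy sketch (without the learnable scale and shift parameters that full batch normalization adds):

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Eq. 2: subtract the batch mean, divide by sqrt(variance + epsilon)
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(1)
batch = rng.normal(loc=5.0, scale=3.0, size=(32, 8))  # batch of 32, 8 features
normed = batch_norm(batch)
print(np.allclose(normed.mean(axis=0), 0.0, atol=1e-9))  # True
```

After normalization, each feature has approximately zero mean and unit variance across the batch.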

3.3.4 Nonlinear Layer

The function of the nonlinear layer is to create activation maps from a given input. This layer takes the feature map generated by a convolutional layer and produces an activation map from it, using the rectified linear activation function (ReLU), determined in Eq. 3:

y = max(0, x) (3)

where x is the input to the neuron. Compared to other activation functions such as sigmoid or tanh, the rectified linear unit performs much better and is computationally efficient.
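Eq. 3 applied element-wise to a small feature map can be sketched as:

```python
import numpy as np

def relu(x):
    # Eq. 3: y = max(0, x), applied element-wise
    return np.maximum(0.0, x)

fmap = np.array([[-1.5, 0.2],
                 [ 3.0, -0.7]])
print(relu(fmap))  # negative activations become 0, positives pass through
```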

3.3.5 Fully Connected Layer

A fully connected layer connects to every neuron of the previous layer in the convolutional neural network. It performs its calculation in matrix form: the activations are multiplied by a weight matrix and offset by a bias. This matrix multiplication helps to classify the given image.
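The matrix form described above can be sketched as follows; the 128-dimensional input and the 15 output classes (one per disease class) are illustrative dimensions, not values given by the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(128)               # flattened activations from the previous layer
W = rng.standard_normal((15, 128)) * 0.01  # one row of weights per output class
b = np.zeros(15)                           # bias offset

scores = W @ x + b                         # activations multiplied by weights, offset by bias
print(scores.shape)  # (15,)
```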

3.3.6 Dropout Layer

The dropout layer nullifies part of the data by masking the outputs of some neurons, leaving the rest unmodified. It sets some values to 0, at a frequency set during each epoch of the training period, which helps prevent the model from overfitting.
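A sketch of this masking behavior (inverted dropout, which also rescales the surviving activations; the 0.5 rate is an illustrative value, not the one used by the paper):

```python
import numpy as np

def dropout(x, rate=0.5, rng=None):
    # set a fraction `rate` of values to 0; scale the rest so the
    # expected activation is unchanged between training and testing
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

acts = np.ones(10)
print(dropout(acts))  # roughly half the entries are 0, the rest are 2.0
```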

3.3.7 Softmax Layer

The softmax layer is the last layer in the CNN; using the softmax function, the correct disease is predicted. The softmax function converts a vector of k real values into values that sum to 1, which is necessary for classification in the network.
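A minimal sketch of the softmax function; shifting by the maximum is a standard numerical-stability trick:

```python
import numpy as np

def softmax(z):
    # exponentiate (shifted by the max for stability) and normalize to sum to 1
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical class scores
probs = softmax(scores)
print(round(probs.sum(), 10))  # 1.0
```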
Algorithm
The steps of CNN algorithm are as follows:
Step 1 : Input image.
Step 2 : Feature extraction using convolutional operation.
Step 3 : Max pooling layer to reduce the feature map.
Step 4 : Batch normalization normalizes using the mini-batch mean and standard
deviation.
Step 5 : Rectified linear unit creates the activation map.
Step 6 : Fully connected layer connects the neurons.
Step 7 : Softmax function detects and classifies the disease.
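The seven steps above can be sketched end-to-end as a single forward pass. This is a pure-NumPy illustration with one channel, one kernel, and randomly initialized weights (the filter count, sizes, and 15-class output are assumptions for the sketch), not the trained network itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, k, stride=2):
    # Step 2: valid convolution of a single-channel image with one kernel
    h, w = x.shape
    f = k.shape[0]
    oh, ow = (h - f) // stride + 1, (w - f) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i*stride:i*stride+f, j*stride:j*stride+f] * k)
    return out

def max_pool(x, s=2):
    # Step 3: maximum of each non-overlapping s x s region
    h, w = (x.shape[0] // s) * s, (x.shape[1] // s) * s
    return x[:h, :w].reshape(h // s, s, w // s, s).max(axis=(1, 3))

def batch_norm(x, eps=1e-5):     # Step 4
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def relu(x):                     # Step 5
    return np.maximum(0.0, x)

def softmax(z):                  # Step 7
    e = np.exp(z - z.max())
    return e / e.sum()

image = rng.random((256, 256))                # Step 1: grayscale input
x = relu(batch_norm(max_pool(conv2d(image, rng.standard_normal((3, 3))))))
W = rng.standard_normal((15, x.size)) * 0.01  # Step 6: fully connected, 15 classes
probs = softmax(W @ x.ravel())
print(probs.shape, round(probs.sum(), 6))  # (15,) 1.0
```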

3.4 Detect and Classify Disease

In the proposed system, the PlantVillage dataset is used. This dataset is trained, and a model file is generated. Using the trained model file, the system can classify leaf diseases of three types of plants: tomato, pepper, and potato. The user interface library used in this project is the Tk interface, in short also called Tkinter, which makes it easy to place buttons, edit text boxes, and display images.

4 Result and Analysis

In our project, we used the PlantVillage dataset collected from kaggle.com. The dataset comprises 900 images of three types of plant leaves (pepper bell, potato, and tomato), each of size 256 × 256 pixels, across 15 different classes of leaf disease. Of the 900 images, 750 were selected for training and 150 for testing. From each class, 50 leaf images are used for training and 10 for testing.
Figure 3a shows the original image and Fig. 3b the resized image of the tomato leaf disease. Figure 3c shows the layer_1 image of the convolutional layer of the tomato leaf; in layer_1, the filter is applied to produce 128 × 128 pixels. Figure 3d shows the layer_2 image and Fig. 3e the layer_3 image of the convolutional layer; layer_2 produces 64 × 64 pixels, and layer_3 produces 32 × 32 pixels. Figure 3f shows the visual image of the pooling layer and Fig. 3g that of the normalization layer.
In the pooling layer, max pooling is used, and batch normalization is applied in the normalization layer. Figure 3h shows the visual image of the tomato leaf at the nonlinear layer, Fig. 3i at the dropout layer, and Fig. 3j at the softmax layer. Figure 3k shows the detection and classification of disease on the tomato leaf; the predicted disease is tomato bacterial spot.
Table 1 shows the testing of pepper bell leaves. For training, 100 images are taken; for testing, 20 images. Bacterial spot and healthy pepper bell leaves are detected. Of the 10 bacterial spot images, 8 are classified correctly and the remaining 2 are predicted falsely; all 10 images of the healthy pepper bell leaf are classified correctly. So, out of 20 images, 18 are correctly classified, and the recognition rate is 90%.
Table 2 shows the testing result of potato leaves. For training, 150 images are taken, and for testing, 30 images. In potato leaves, two diseases are detected, early blight and late blight, and healthy potato leaves are also identified. In potato early blight, 9 images predict the disease correctly and the remaining 1 is predicted falsely; in potato late blight, all 10 leaf images detect the disease correctly. In potato


Fig. 3 a Original tomato leaf image, b Resized leaf image, c Image of convolutional layer_1, d
Image of convolutional layer_2, e Image of convolutional layer_3, f Image of pooling layer, g Image
of normalization layer, h Image of nonlinear layer, i Image of dropout layer, j Image of softmax
layer, k Detection and classification of tomato leaf disease

Table 1 Testing of pepper bell leaf

Sl. No. | Name of the leaf disease | Total number of leaves | Correctly classified | Wrongly predicted
1 | Pepper_bell_Bacterial_spot | 10 | 8 | 2
2 | Pepper_bell_healthy | 10 | 10 | 0

healthy leaf, all 10 input images are predicted correctly. So, out of 30 images, 29 are correctly classified, and the recognition rate is 96.67%.
Table 3 shows the testing result of tomato leaves. For training, 500 images are taken, and for testing, 100 images. In tomato leaves, 9 diseased classes and 1 healthy class are detected. In tomato bacterial spot, 8 images detect the disease correctly, and the remaining 2 give a wrong prediction; the same holds for tomato mosaic virus. All 10 images each of tomato early blight, target spot, late blight, septoria leaf spot, spider mite, and healthy leaf are classified correctly. In tomato leaf mold and yellow leaf curl virus, 9 images each are classified correctly, with 1 wrongly predicted. So, out of 100 images, 94 are recognized correctly, and the recognition rate is 94%.
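The recognition rates reported above follow directly from the per-crop counts in Tables 1, 2, and 3:

```python
# correctly classified vs. total test images per crop (from Tables 1-3)
results = {"pepper bell": (18, 20), "potato": (29, 30), "tomato": (94, 100)}

for crop, (correct, total) in results.items():
    print(f"{crop}: {100 * correct / total:.2f}%")
# pepper bell: 90.00%
# potato: 96.67%
# tomato: 94.00%
```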

Table 2 Testing of potato leaf

Sl. No. | Name of the leaf disease | Total number of leaves | Correctly classified | Wrongly predicted
1 | Potato_Early_blight | 10 | 9 | 1
2 | Potato_Late_blight | 10 | 10 | 0
3 | Potato_healthy | 10 | 10 | 0

Table 3 Testing of tomato leaf

Sl. No. | Name of the leaf disease | Total number of leaves | Correctly classified | Wrongly predicted
1 | Tomato_Bacterial_spot | 10 | 8 | 2
2 | Tomato_Early_blight | 10 | 10 | 0
3 | Tomato_Target_spot | 10 | 10 | 0
4 | Tomato_Late_blight | 10 | 10 | 0
5 | Tomato_Leaf_Mold | 10 | 9 | 1
6 | Tomato_Mosaic_Virus | 10 | 8 | 2
7 | Tomato_Septoria_Leaf_Spot | 10 | 10 | 0
8 | Tomato_Yellow_Leaf_Curl_Virus | 10 | 9 | 1
9 | Tomato_Spider_Mite | 10 | 10 | 0
10 | Tomato_Healthy | 10 | 10 | 0

5 Conclusion

The identification of leaf diseases is very important for the successful cultivation of crops, and the system is implemented using a deep learning technique. It identifies diseases of pepper, tomato, and potato leaves; the trained model can classify up to 15 different classes of disease, detecting both healthy and diseased leaves. The system detects and classifies the type of disease accurately, using a convolutional neural network to classify the different leaf diseases of a plant. The proposed method gives an accuracy of 97.35%.

References

1. Sardogan M, Tuncer A, Ozen Y (2018) Plant leaf disease detection and classification based on
CNN with LVQ algorithm. In: Proceeding of third international conference on computer science
and engineering, pp 382–385, Bosnia and Herzegovina, 20–23 Sept 2018
2. Indumathi R, Saagari N, Thejuswini V, Swarnareka R (2019) Leaf disease detection and fertilizer
suggestion. In: Proceeding of third international conference on systems computation automation
and networking, pp 1–7, India, 29–30 Mar 2019
3. Jasim MA, AL-Tuwaijari JM (2020) Plant leaf diseases detection and classification using image
processing and deep learning techniques. In: Proceeding of international conference on computer
science and software engineering, pp 259–265, Iraq, 16–18 Apr 2020
4. Sholihati RA, Sulistijono IA, Risnumawan A, Kusumawati E (2020) Potato leaf disease classi-
fication using deep learning approach. In: Proceeding of international electronics symposium,
pp 392–397, Indonesia, 29–30 Sept 2020
5. Karol AMA, Gulhane D, Chandi T (2019) Plant disease detection using CNN and remedy. Int
J Adv Res Electr Electron Instrum Eng 08(3):622–628, India, Mar 2019
6. Gadade HD, Kirange DK (2020) Tomato leaf disease diagnosis and severity measurement. In:
Proceeding of fourth world conference on smart trends in systems, security and sustainability,
pp 318–323, UK, 27–28 Jul 2020
7. Rajesh B, Vishnu Sai Vardhan M, Sujihelen L (2020) Leaf disease detection and classification
by decision tree. In: Proceedings of fourth international conference on trends in electronics and
informatics, pp 705–708, Tirunelveli, India, 15–17 June 2020
8. Kumari CU, Prasad SJ, Mounika G (2019) Leaf disease detection feature extraction with K-
means clustering and classification with ANN. In: Proceedings of third international conference
on computing methodologies and communication, pp 1095–1098, India, 27–29 Mar 2019
9. de Luna RG, Dadios EP, Bandala AA (2018) Automated image capturing system for deep
learning-based tomato plant leaf disease detection and recognition. In: Proceeding of tenth
international conference of electrical and electronics engineers, pp 1414–1419, Korea, 28–31
Oct 2018
Breast Mass Classification Using
Convolutional Neural Network

Varsha Nemade, Sunil Pathak, Ashutosh Kumar Dubey, and Deepti Barhate

Abstract Breast cancer is the most prevalent disease found in women's breast cells, and it can also lead to death. Early detection and diagnosis help to reduce this mortality rate. Different Artificial Intelligence (AI) techniques, such as Machine Learning (ML) and Deep Learning (DL), are being used in the medical industry to predict breast cancer, with different breast cancer images, such as mammograms, ultrasounds, and biopsies, used for analysis. This work provides a deep Convolutional Neural Network (CNN) architecture to analyze breast cancer through mammogram images; deep CNNs are widely used nowadays due to their strong performance. In this paper, we present a CNN model with five convolutional layers, five max pooling layers, four dropout layers and two fully connected layers, developed using the publicly available DDSM breast image dataset. The proposed CNN model achieved 89.46% accuracy for classifying breast masses as benign or malignant.

Keywords Deep learning · CNN · ML · AI

V. Nemade (B) · S. Pathak
Department of Computer Science & Engineering, Amity School of Engineering & Technology,
Amity University Rajasthan, Jaipur, India
e-mail: vvfegade@gmail.com
S. Pathak
e-mail: spathak@jpr.amity.edu
A. K. Dubey
Chitkara University School of Engineering and Technology, Chitkara University, Himachal
Pradesh, India
e-mail: ashutosh.dubey@chitkara.edu.in
D. Barhate
SVKM’s NMIMS MPSTME Shirpur, Dhule, Maharashtra, India
e-mail: deepti.barhate@nmims.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 85
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_8

1 Introduction

Nowadays, breast cancer is the most perilous disease for women. According to a 2019 report of the Indian Council of Medical Research (ICMR), new breast cancer cases in India are up to 1.5, and every year 70,000 women succumb due to delays in prediction and treatment [1]. This shows that cases in India are increasing; the rate can be reduced by awareness of early detection [2]. Imaging techniques such as Mammography, Magnetic Resonance Imaging (MRI), Ultrasound, and tomosynthesis are used for breast cancer study and diagnosis. The mammogram is one of the imaging methods most widely used for early diagnosis of breast cancer. For each breast, a radiologist produces two views: Mediolateral Oblique (MLO) and Craniocaudal (CC). Early diagnosis through mammogram screening increases the chance of survival [3], but diagnosis depends on the radiologist's experience and expertise, and some surveys have found that diagnostic errors may increase the cost of surgeries [4].
Benign and malignant are the two main classes of cancers; in this study, mammogram images are used to classify breast cancer into these two classes. Computer Aided Diagnosis (CAD) provides a second opinion to help the radiologist make decisions from mammogram image interpretations. Deep learning, particularly with CNNs, is a widely used technique in medical image analysis [5]. Several algorithms have already been applied to breast cancer and other cancer detection [6–10], but there is scope for improvement in this area.
The work is organized as follows: related work on breast mass categorization is
discussed in Sect. 2 followed by the proposed model in Sect. 3. Observations are
discussed in Sect. 4 which is followed by conclusions of the work in Sect. 5.

2 Related Works

Computer Aided Diagnosis (CAD) systems use computer technology to detect anomalies in mammograms, which helps to increase accuracy. Figure 1 shows the difference between conventional ML and DL techniques. Recently, many techniques for classification of masses have been proposed. In the traditional machine learning approach, features must first be extracted from the input image and then a classifier is applied. Beura et al. [11] used the 2-D discrete orthonormal S-transform (DOST) for feature extraction with the AdaBoost algorithm and RF as base classifier, achieving accuracies of 98.3% and 98.8%, respectively, on the MIAS and DDSM datasets. Li et al. [12] used the DDSM dataset, performed classification using contour features, and obtained a best accuracy of 99.66% with SVM. Mughal et al. [13] described a method for detecting tumors in breast masses and classifying them as normal or abnormal, benign or malignant, using a combination of the Hat transformation and GLCM with a backpropagation network. Khan et al. [14] proposed techniques for mass classification from mammograms using Gabor feature extraction and SELwSVM for classification. Textural features are extracted by using

Fig. 1 Difference between conventional machine learning and deep learning

the contourlet transform and GLCM, with SVM and KNN used for classification, achieving accuracies of 94.12% and 88.89%, respectively, on the MIAS dataset [15].
In deep learning, features are learned automatically and used for classification [16–22]; there is no need for handcrafted features. Suzuki et al. [23] showed results using a deep convolutional network (DCNN) trained by transfer learning, achieving 89.9% sensitivity. Ribli et al. [24] described a Faster R-CNN approach for detecting and classifying masses in mammograms, with an area under the curve of 0.95. Wang et al. [25] worked on a hybrid deep network using a Recurrent Neural Network (RNN) to extract features from multi-view data on the BCDR dataset and achieved an AUC of 0.89. Al-Masni et al. [26] described a method for finding and classifying masses in the DDSM dataset using the ROI-based CNN YOLO model and achieved a classification accuracy of 97%. On the DDSM dataset, Al-Antari et al. [27] suggested a system that combined a deep learning YOLO detector with an InceptionResNetV2 classifier and achieved 97.50% accuracy. Gnanasekaran et al. [28] proposed a CNN model with 8 convolutional layers, 4 max pooling layers and 2 fully connected layers, applied it to the MIAS and DDSM datasets, and obtained accuracies of 92.54% and 96.47%, respectively.
Deep learning CNNs automatically learn features for classification problems, which overcomes the burden of handcrafted feature extraction and selection, and deep learning has consistently shown improved and accurate performance. This motivated us to explore DL for breast cancer analysis. We designed a CNN model with five convolutional layers, five max pooling layers, four dropout layers and two fully connected layers for classifying mammogram images from the DDSM breast image dataset.

3 Proposed Methodology

We propose a deep CNN model and apply it to a mammogram dataset for classification of breast masses.

Fig. 2 Architecture of proposed CNN model

3.1 About Dataset

The publicly available mammogram datasets DDSM and CBIS-DDSM are used. The DDSM dataset contains more than 2600 images covering normal, benign and malignant cases, with CC and MLO views of breast images. It is a collaborative effort of Massachusetts General Hospital, the University of South Florida, and Sandia National Laboratories. The CBIS-DDSM dataset is a subset and updated version of DDSM containing ROI segmentations, bounding boxes and pathologic diagnosis data. Negative images are taken from DDSM and positive images from CBIS-DDSM. Preprocessing is applied to the extracted ROIs using random flips and rotations, after which they are resized.

3.2 Architecture of CNN

Deep learning has shown improvement on medical images [29]. Figure 2 shows the newly proposed model, which contains five convolutional layers, five max pooling layers, four dropout layers and two fully connected layers. The model uses a batch size of 32 with a dropout rate of 0.20. The convolutional layers use 3 × 3 kernels with 'ReLU' as the activation function; max pooling uses a pool size of (2, 2) with 'same' padding. Figure 3 shows the detailed model summary, with details of layers, output shapes and parameters.
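The downsampling through the five conv + max-pooling blocks can be sketched as a shape calculation. The 224 × 224 input resolution is an assumption for illustration (the paper specifies kernels, pooling, and dropout, but not the input size); with 3 × 3 'same'-padded convolutions, each block halves the spatial size only at its pooling step:

```python
def block_out(n, pool=2):
    # conv 3x3 with 'same' padding keeps n unchanged; 2x2 max pooling halves it
    return n // pool

n = 224  # assumed input resolution
sizes = []
for _ in range(5):  # five convolutional + max-pooling blocks
    n = block_out(n)
    sizes.append(n)
print(sizes)  # [112, 56, 28, 14, 7]
```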

4 Results and Discussion

For experimental analysis, the proposed model is evaluated on the mammogram dataset using the confusion matrix shown in Fig. 4; Table 1 shows the classification report. The confusion matrix shows how our model correctly classified the tuples. The proposed model achieved 89.46% accuracy. According to the classification report, Class 1 (Malignant) has the higher precision of 0.94 (5% higher than Class 0), while Class 0 (Benign) has the higher recall of 1.00. Figure 5 shows the ROC curve for the proposed model; the AUC is 0.59, as our dataset is highly imbalanced.

Fig. 3 Proposed model summary

The proposed CNN architecture was applied to the dataset created using images from the DDSM and CBIS-DDSM datasets. The classification report shows that the recall of Class 1 (Malignant) is very low because of the imbalanced dataset, which has a smaller number of malignant records. The problem of the imbalanced dataset can be addressed by K-fold cross-validation in future work.

Fig. 4 Confusion matrix

Table 1 Classification report

Class | Precision | Recall | F1-score
Benign | 0.89 | 1.00 | 0.94
Malignant | 0.94 | 0.20 | 0.33
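The F1-scores in Table 1 are the harmonic mean of precision and recall, which can be checked directly:

```python
def f1(precision, recall):
    # harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.89, 1.00), 2))  # 0.94  (Benign)
print(round(f1(0.94, 0.20), 2))  # 0.33  (Malignant)
```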

Fig. 5 ROC curve

5 Conclusions

Deep learning has proven highly effective across imaging modalities in the medical field. We presented a CNN architecture for classifying breast cancer images into malignant and benign classes, evaluated on images from the DDSM and CBIS-DDSM datasets. The CNN model used in this paper has four different types of layers, with the output taken from the last fully connected layer followed by a softmax layer. The model produces an accuracy of 89.46%. To enhance the performance of the proposed CNN model, one can add K-fold cross-validation. Also, to improve accuracy and recall, other techniques can be used to balance the dataset.

References

1. Josephine SP (2019) Evaluation of lymphedema prevention protocol on quality of life among breast cancer patients with mastectomy. Asian Pac J Cancer Prev 20(10):3077
2. Dubey AK, Gupta U, Jain S (2015) Breast cancer statistics and prediction methodology: a
systematic review and analysis. Asian Pac J Cancer Prev 16(10):4237–4245
3. Dubey AK, Gupta U, Jain S (2019) Computational measure of cancer using data mining
and optimization. In: International conference on sustainable communication networks and
application, Jul 30, Springer, Cham, pp 626–632
4. Namjoshi M, Khurana K (2021) A mask-RCNN based object detection and captioning
framework for industrial videos. Int J Adv Technol Eng Explor 8(84):1466–1478
5. Voulodimos A, Doulamis N, Doulamis A, Protopapadakis E (2018) Deep learning for computer
vision: a brief review. Comput Intell Neurosci 1:2018
6. Chaturvedi P, Jhamb A, Vanani M, Nemade V (2021) Prediction and classification of lung
cancer using machine learning techniques. IOP Conf Ser Mat Sci Eng 1099(1):012059
7. Rela M, Rao SN, Reddy PR (2022) Performance analysis of liver tumor classification using
machine learning algorithms. Int J Adv Technol Eng Explor 9(86):143–154
8. Melekoodappattu JG, Dhas AS, Kandathil BK, Adarsh KS (2022) Breast cancer detection in
mammogram: combining modified CNN and texture feature-based approach. J Ambient Intell
Humaniz Comput 24:1
9. Gonçalves CB, de Souza JR, Fernandes H (2022) CNN architecture optimization using bio-
inspired algorithms for breast cancer detection in infrared images. Comput Biol Med 5:105205
10. Joshi G, Kumar R, Chauhan AK (2021) Segmentation and classification of brain tumor images
using statistical feature extraction and deep neural networks. Int J Adv Technol Eng Explor
8(85):1585–1602
11. Beura S, Majhi B, Dash R, Roy S (2015) Classification of mammogram using two-dimensional
discrete orthonormal S-transform for breast cancer detection. Healthc Technol Lett 2(2):46–51
12. Li H, Meng X, Wang T, Tang Y, Yin Y (2017) Breast masses in mammography classification
with local contour features. Biomed Eng Online 16(1):1–2
13. Mughal B, Sharif M, Muhammad N, Saba T (2018) A novel classification scheme to decline
the mortality rate among women due to breast tumor. Microsc Res Tech 81(2):171–180
14. Khan S, Hussain M, Aboalsamh H, Bebis G (2017) A comparison of different gabor feature
extraction approaches for mass classification in mammography. Multimedia Tools Appli
76(1):33–57
15. Taifi K, Taifi N, Fakir M, Safi S, Sarfraz M (2020) Mammogram classification using nonsub-
sampled contourlet transform and gray-level co-occurrence matrix. In: Critical approaches to
information retrieval research, IGI Global, pp. 239–255
16. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
17. Jordan MI, Mitchell TM (2015) Machine learning: trends, perspectives, and prospects. Science
349(6245):255–260
18. Kooi T, Gubern-Merida A, Mordang JJ, Mann R, Pijnappel R, Schuur K, den Heeten A,
Karssemeijer N (2016) A comparison between a deep convolutional neural network and radi-
ologists for classifying regions of interest in mammography. In: International workshop on
breast imaging, Springer, Cham, pp 51–56
92 V. Nemade et al.

19. Chartrand G, Cheng PM, Vorontsov E, Drozdzal M, Turcotte S, Pal CJ, Kadoury S, Tang A
(2017) Deep learning: a primer for radiologists. Radiographics 37(7):2113–2131
20. Platania R, Shams S, Yang S, Zhang J, Lee K, Park SJ (2017) Automated breast cancer diagnosis
using deep learning and region of interest detection (BC-DROID). In: Proceedings of the ACM
international conference on bioinformatics, computational biology, and health informatics,
ACM, pp 536–543
21. Wang J, Ding H, Bidgoli FA, Zhou B, Iribarren C, Molloi S, Baldi P (2017) Detecting
cardiovascular disease from mammograms with deep learning. IEEE Trans Med Imaging
36(5):1172–1181
22. Patil S, Kirange DK, Nemade V (2020) Predictive modelling of brain tumor detection using
deep learning. J Crit Rev 7(4):1805–1813
23. Suzuki S, Zhang X, Homma N, Ichiji K, Sugita N, Kawasumi Y, Ishibashi T, Yoshizawa M
(2016) Mass detection using deep convolutional neural network for mammographic computer-
aided diagnosis. In: 2016 Annual conference of the society of instrument and control engineers
of Japan, Sep 20, IEEE, pp 1382–1386
24. Ribli D, Horváth A, Unger Z, Pollner P, Csabai I (2018) Detecting and classifying lesions in
mammograms with deep learning. Sci Rep 8(1):1–7
25. Wang H, Feng J, Zhang Z, Su H, Cui L, He H, Liu L (2018) Breast mass classification via deeply
integrating the contextual information from multi-view data. Pattern Recogn 1(80):42–52
26. Al-Masni MA, Al-Antari MA, Park JM, Gi G, Kim TY, Rivera P, Valarezo E, Choi MT,
Han SM, Kim TS (2018) Simultaneous detection and classification of breast masses in digital
mammograms via a deep learning YOLO-based CAD system. Comput Methods Programs
Biomed 157:85–94
27. Al-Antari MA, Han SM, Kim TS (2020) Evaluation of deep learning detection and classification
towards computer-aided diagnosis of breast lesions in digital X-ray mammograms. Comput
Methods Programs Biomed 1(196):105584
28. Gnanasekaran VS, Joypaul S, Sundaram PM, Chairman DD (2020) Deep learning algorithm
for breast masses classification in mammograms. IET Image Proc 14(12):2860–2868
29. Bhatt C, Kumar I, Vijayakumar V, Singh KU, Kumar A (2020) The state of the art of deep
learning models in medical science and their challenges. Multimedia Syst 25:1–5
Deep Generative Models Under GAN:
Variants, Applications, and Privacy
Issues

Remya Raveendran and Ebin Deni Raj

Abstract Deep learning has lately acquired a lot of attention in machine learning
because of its capacity to train features and classifiers at the same time, resulting in
a significant boost in accuracy. To attain a high level of accuracy, the models require
huge amounts of data and processing capacity, both of which are now available due
to the advancements in big data, the Internet of Things, and cloud computing. Even
so, some applications like medical diagnosis, image recognition, and biometric
authentication face the problem of data scarcity, which affects the predictive analytics
of deep learning. To tackle this issue, deep generative models like Generative
Adversarial Networks (GAN) have come into existence that are capable of artificially
generating synthetic data for specific problems. In this article, various GAN models
and their applications are explored, and a comparison of the models is also
given. As the amount of data increases, another issue faced by applications is data
privacy. With rising privacy concerns, more priority has to be given to privacy issues
while developing intelligent applications. GAN and its variants are nowadays used
both as attackers and as defenders against various privacy risks, which are also
presented in this review. As future work, GAN's potential to solve the issues of data
privacy and security has to be deeply explored.

Keywords Generative Adversarial Networks · Generative models · Privacy


issues · GAN in privacy · GAN models

1 Introduction

Deep structural learning, or deep learning, is a part of machine learning based on the
concepts of artificial neural networks. It differs from traditional machine learning
in its feature learning techniques, which allow the machine to automatically discover

R. Raveendran (B) · E. D. Raj


Indian Institute of Information Technology, Kottayam, Kerala, India
e-mail: remsanoop.phd2117@iiitkottayam.ac.in
E. D. Raj
e-mail: ebindeniraj@iiitkottayam.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_9

the patterns, replacing the manual feature engineering process, to perform a
specific task. To achieve a high level of accuracy, the models need access to massive
amounts of data and processing power. With the recent advancements in deep
learning and the Internet of Things, the amount of data has grown enormously, as
many new devices are employed in various fields for better communication. Deep
neural learning creates complicated statistical models utilizing its own repetitive
output from enormous amounts of unlabeled, unstructured data, resulting in accurate
predictive models. All forms of big data analytics applications, particularly those
focused on NLP, language translation, medical diagnostics, stock market trading
signals, network security, and image identification, now use deep learning.
Even though it has gained tremendous achievements, some applications like medical
diagnosis, image recognition, and biometric authentication face the problem of data
scarcity and data privacy. Insufficient data affects the predictive analytics of deep
learning techniques, and privacy issues affect the improvement of model robustness
by preventing the sharing of data. One way to tackle the unavailability of data
is to artificially generate synthetic data for the specific problem. Synthetic datasets
are automatically generated by extracting the statistical properties of features from
the original dataset, which increases the performance of algorithms and allows for
more generic models to be created. Nowadays, the two class of algorithms Genera-
tive Adversarial Networks (GAN) [1] and Variational Autoencoder’s (VAE) [2] have
gained importance due to their generative properties for creating sample data. Exten-
sive research and development have been done on these models, and many synthetic
data architectures have been built using these core methods, for generating images,
audio, tabular, and textual data.
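The "extracting the statistical properties of features" idea above can be illustrated with the simplest possible tabular generator: fit a Gaussian to each feature of the real data and sample from it. This is a naive sketch only; unlike a GAN or VAE, it ignores correlations between features:

```python
import random
import statistics

def synthesize(rows, n, seed=0):
    """Fit a Gaussian to each column of the real rows, then sample n
    synthetic rows from those per-feature distributions."""
    rng = random.Random(seed)
    cols = list(zip(*rows))
    params = [(statistics.mean(c), statistics.pstdev(c)) for c in cols]
    return [[rng.gauss(m, s) for m, s in params] for _ in range(n)]

real = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
fake = synthesize(real, n=5)
print(len(fake), len(fake[0]))  # 5 2
```

Generative models earn their keep precisely where this sketch fails: they learn the joint distribution, not just per-feature marginals.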
Although data shortage can be solved by generative modeling techniques, more
focus has to be given to preserving data privacy. Traditionally, machine
learning models for intelligent applications have been built by uploading the data
from all the connected devices to a centralized server in the cloud environment to
train a generic model. Since the clients here are distributing the data, there arises
a chance of data leakage which results in privacy concerns and even regulatory
and judicial issues. Machine learning has presented a new concept of collaborative
decentralized learning to address the problem in which the model is learnt locally
using the client’s real data and then disseminated to a remote/global server without
exposing the original sensitive data. Recent advancements in distributed learning
and Generative Adversarial Networks are emerging as a new solution for most of the
challenges faced in processing the heterogeneous data available from different
edge devices, resulting in efficient communication and predictive models.

2 Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs), first introduced by Goodfellow et al. [1],
have gained popularity in the field of machine learning using deep learning methods
such as convolutional neural networks. GAN can be considered a generative model
that can be applied to unsupervised and semi-supervised learning tasks. Due to its
generative capability, the foremost issue faced by deep learning methods, overfitting
of data, can be solved to a great extent. The basic architecture and the objective
function of the GAN model are explained in the sections below.

2.1 GAN Architecture

In Generative Adversarial Networks, the model uses two neural networks: a generator
G and a discriminator D. The two models compete with each other in a min–max
game. The generator is responsible for generating synthetic data, which the GAN uses
for training along with the real data, whereas the discriminator is a network that
acts as a binary classifier distinguishing between true and fraudulent data. The two
models work in a min–max optimization formulation. The generator attempts
to mislead the discriminator by producing more realistic data while focusing on
minimizing the objective function, while the discriminator attempts to maximize
the objective function in order to detect the fake data. Here a loss function is used as
the objective function, which is backpropagated to increase the accuracy of the model.
This ability lets the model act like a supervised model, which helps in most
classification and regression tasks. Figure 1 shows the architecture of the basic GAN.

Fig. 1 Structure of GAN model


96 R. Raveendran and E. D. Raj

2.2 Objective Function

The generator tries to generate fake but realistic data, while the discriminator tries to
tell the difference between artificial data (generated by the generator) and genuine
data. The discriminator is specified as D(x) → [0, 1], while the generator G maps
a random vector z in the latent space to synthetic data, G(z) → x, which the
discriminator should score close to 0. The following objective (loss) function is used
to train the network:

min_G max_D V(D, G) = E_{x∈X}[log D(x)] + E_{z∈Z}[log(1 − D(G(z)))]    (1)

where X denotes the set of actual images and Z denotes the latent space. The loss
function (1) is referred to as the adversarial loss. The generator tries to minimize
it while the discriminator tries to maximize it.
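Equation (1) can be evaluated numerically to see the direction of play. A toy sketch, illustrative only, with hand-picked discriminator outputs in place of real networks:

```python
import math

def gan_value(d_real, d_fake):
    """Monte-Carlo estimate of V(D, G) in Eq. (1) from discriminator
    outputs on real samples (d_real) and generated samples (d_fake)."""
    e_real = sum(math.log(d) for d in d_real) / len(d_real)
    e_fake = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return e_real + e_fake

# A confident discriminator (real ~ 1, fake ~ 0) pushes V toward its
# maximum of 0; a generator that fools it drives V down.
confident = gan_value(d_real=[0.9, 0.95], d_fake=[0.05, 0.1])
fooled = gan_value(d_real=[0.6, 0.55], d_fake=[0.45, 0.5])
assert fooled < confident < 0.0
```

At the game's equilibrium the discriminator outputs 0.5 everywhere, giving V = 2·log(0.5), the generator's best achievable value.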

3 Existing Models and Applications

Instead of being a rigorous formulation, GAN is more of an adversarial training
framework. As a result, it is more adaptable and extendable, allowing it to be turned
into a variety of versions to meet various needs. Here we discuss some popular
modified versions of GAN that have been developed on the basis of loss functions,
architecture, and latent space.

3.1 GAN Models

A taxonomy of the evolution of GAN models is given in Fig. 2.

CGAN Conditional GAN is a conditioned version of the original GAN. The model is
constructed by feeding in extra auxiliary information. The discriminator takes the
auxiliary information c together with real data to differentiate the generated data
D(G(z|c)) from real data. The generator takes the extra information, such as a class
label, text, or images, along with the latent vector to generate conditioned, realistic-
looking data G(z|c). Data creation can thus be controlled using CGAN.
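In practice the conditioning is often realized by simply concatenating the auxiliary information c, e.g., a one-hot class label, to the latent vector z before it enters the generator. A hypothetical sketch of that input construction (implementations vary; some use embeddings instead of one-hot vectors):

```python
import random

def cgan_generator_input(z_dim, num_classes, label, seed=None):
    """Latent vector z concatenated with a one-hot encoding of the
    conditioning label c, forming the generator input for G(z|c)."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(z_dim)]
    c = [1.0 if i == label else 0.0 for i in range(num_classes)]
    return z + c

x = cgan_generator_input(z_dim=100, num_classes=10, label=3)
print(len(x))  # 110
```

The discriminator receives the same label alongside its image input, so both networks play the min–max game per condition.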

DCGAN Deep convolutional GAN uses convolutional neural networks (CNN) for the
generator and discriminator. It is considered the first architecture to use de-
convolutional neural networks (de-CNN) in the generator for stabilizing GAN
training. Along with this, a newly proposed class of constraints has been added to the
network, including batch normalization and ReLU and Leaky ReLU activation
functions in the generator and discriminator, respectively.
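The two activations mentioned above differ only in how they treat negative inputs; a one-line sketch of each:

```python
def relu(x):
    """ReLU (used in the DCGAN generator): zero for negative inputs."""
    return max(0.0, x)

def leaky_relu(x, alpha=0.2):
    """Leaky ReLU (used in the DCGAN discriminator): a small slope alpha
    keeps a nonzero gradient for negative inputs."""
    return x if x >= 0 else alpha * x

print(relu(-1.0), leaky_relu(-1.0))  # 0.0 -0.2
```

The leak matters in the discriminator because a dead (all-zero) unit passes no gradient back to the generator, stalling the adversarial game.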
Deep Generative Models Under GAN: Variants, Applications… 97

Fig. 2 Taxonomy of GAN models

LapGAN Laplacian GANs are composed of a cascade of CGANs using the Laplacian
Pyramid framework with K levels. The model incorporates a series of GAN
processes that generate different levels of image detail in the LP representation,
each generation step distinct from the others. LapGAN's key modification to GAN is
up-scaling a low-resolution input image to a higher-resolution output image in
a coarse-to-fine pattern, resulting in more photo-realistic images than regular GAN.
The coarse-to-fine approach makes the model computationally expensive, and the
convergence rate is slower for deep LapGANs.
InfoGAN Information maximizing GAN, a variation of CGAN, learns interpretable
and meaningful disentangled representations in an unsupervised way. The
regularization term in this model maximizes the mutual information between
a predefined small subset of latent random variables and the observations.
EBGAN Energy-based GAN uses a combination of AutoEncoder (AE) and GAN
frameworks. Instead of utilizing a probability function to detect actual and fake data
as in the original GAN, the EBGAN discriminator uses an energy function, with low
energy indicating real data and high energy indicating fake data. Both G and D are
trained using two different losses in this model.
WGAN Wasserstein GAN (WGAN) is a loss-function variant of the GAN model which
uses the Earth Mover (EM) or Wasserstein distance as its cost function. Using this loss
function, GAN's vanishing gradient problem is avoided, and the mode collapse
impediment to stabilizing GAN training is partially removed. Similarly, WGAN-GP
was created by adding a gradient penalty (GP) term to the discriminator to improve
GAN training stability, resulting in high-quality samples with a better convergence
rate than WGAN.
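The Earth Mover distance behind WGAN has a simple closed form in one dimension: for two equally sized samples it is the mean gap between sorted values. A sketch of that special case (illustrative only; the WGAN critic merely approximates this distance via its dual form):

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-D Wasserstein-1 (Earth Mover) distance between two
    equally sized samples: the average gap between sorted values."""
    assert len(xs) == len(ys)
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)

print(wasserstein_1d([0.0, 1.0], [1.0, 0.0]))  # 0.0  (same distribution)
print(wasserstein_1d([0.0, 0.0], [1.0, 1.0]))  # 1.0  (all mass moved by 1)
```

Unlike the Jensen–Shannon divergence implicit in Eq. (1), this distance stays finite and informative even when the two distributions do not overlap, which is why its gradients do not vanish.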
PROGAN Progressively growing GAN emphasizes a multi-scale generation
process in which both the generator and the discriminator begin training with low-
resolution images (4 × 4), slowly increase depth by adding new layers, and
eventually produce high-resolution images (1024 × 1024). In comparison with
existing non-progressive GANs, the model enhances quality, stability, and variation.
However, because of the uneven training of the generator and discriminator, which
creates comparable samples, it still suffers from the mode collapse problem.
BigGAN Because of its vast scale and its indistinguishable, high-quality image-
production capability, BigGAN has become one of the best models. The model
surpasses large computational models with more parameters in terms of output
control and interpolation between pictures. Despite its great performance in large-
scale, high-fidelity, diversified image production, BigGAN has limited data
augmentation capacity on large datasets. Also, it cannot reproduce its outcomes from
scratch without sufficient data.
StyleGAN StyleGAN, an improved version of ProGAN, relies on the generator
network to enable reasonable control over specific features of the generated
image. The model uses an unconventional GAN architecture and Adaptive Instance
Normalization, which scales normalized input with style spatial statistics, to control
the correlation between input features such as coarse features (hair, face, pose, and
shape), medium features (facial features, eyes), and fine details without compro-
mising the high quality. StyleGAN2 enhances the normalization used in StyleGAN's
generator, which improves image quality, efficiency, diversity, and disentanglement.
MSGGAN The multi-scale gradients network is inspired by ProGAN, but here the
network is not trained progressively; instead, all layers are trained at the same
time.
Other Models Least squares GAN (LSGAN) and unrolled GAN (UGAN) are loss-
function variants of GAN introduced to solve the vanishing gradient and mode
collapse problems, respectively. Another model, loss-sensitive GAN (LS-GAN),
produces realistic samples by reducing the margins between the real data distribution
and the generated sample distribution. Traditional CNN-based GANs have difficulty
learning multi-class images, which can be solved by a self-attention mechanism; this
led to the development of SAGAN. CycleGAN is a very popular architecture for
image-to-image translation, and BicycleGAN, an enhanced version of CycleGAN, is
used for multimodal image-to-image translation.
A comparative analysis of the models based on architecture, learning method,
and merits and demerits is given in Table 1.

3.2 Applications

Due to various advancement in the GAN architecture, it has been widely used in
various domains like computer vision, medical diagnosis, cyber security, and natural
Table 1 Comparison of GAN models

GAN
  Learning: unsupervised; Architecture: MLP; Optimizer: SGD; Activation: sigmoid
  Merits: can generate different versions of text, video, and audio
  Demerits: harder to train; suffers from vanishing gradients and mode collapse

CGAN
  Learning: supervised; Architecture: MLP; Optimizer: SGD; Activation: ReLU
  Merits: prevents mode collapse and produces high-quality images
  Demerits: better performance only for labeled datasets

DCGAN
  Learning: unsupervised; Architecture: CNN; Optimizer: SGD; Activation: Leaky ReLU
  Merits: steadier in generating higher-quality samples and in training
  Demerits: the misclassification rate is higher than for other GAN-based models

InfoGAN
  Learning: unsupervised; Architecture: MLP; Optimizer: Adam; Activation: Leaky ReLU (discriminator), ReLU (generator)
  Merits: learns latent variables without labels in the data
  Demerits: performs well only when the data is small and not very complex

CycleGAN
  Learning: unsupervised; Architecture: CNN; Optimizer: Adam; Activation: ReLU and Leaky ReLU
  Merits: better for unpaired image-to-image translation tasks
  Demerits: poor performance when substantial geometric changes to the images are required

BigGAN
  Learning: supervised and unsupervised; Architecture: deep CNN; Optimizer: Adam; Activation: Leaky ReLU (discriminator), ReLU (generator)
  Merits: capable of generating large, high-quality images; suitable for large neural networks
  Demerits: cannot repeat the outcomes from scratch without sufficient data
100 R. Raveendran and E. D. Raj

language processing. This section details some of the related work in different
domains. The synthetic data generation, or data augmentation, capability of GAN has
enabled incredible development in fields facing data scarcity such as medical
diagnosis, target detection, and satellite imaging, and it can be used for generating
various data types such as images, videos, audio, and structured data.
In [3], GANs were used for synthetic image generation, with a CNN classifier as the
discriminator, for the classification of polyps into benign and malignant. WGAN [4],
which is more effective than GAN, has been used for multi-class classification of
cancer stages with a DNN classifier as the discriminator. Inspired by CGAN, a new
model, Conditional SinGAN [5], which combines SinGAN and CGAN, has been
proposed for generating constrained multi-target scene images; it makes the images
more realistic in spatial layout and semantic information and also improves the
controllability of the generated images. Deep residual GAN [6] can be used for image
denoising and defogging, applied to both grayscale and color images; it keeps the
main features without loss of perceptual detail. Rather than learning clean images
from noisy images as in traditional approaches, the complex-valued convolutional
neural network CVMIDNet [7] applies residual learning, which learns the noise
from noisy images and then removes it to produce clean images. The method has
shown high accuracy on chest X-ray images and can also be applied to MRI and CT
images. FA-GAN [8] and Res-WGAN [9] have been proposed to generate super-
resolution images from low-resolution images to reduce scanning time effectively.
Beyond medical images, in [10] the authors presented a new GAN framework for the
reconstruction of low-resolution satellite images. The term "image in-painting"
refers to the approximate replacement of a picture's missing pixels. It is a
sophisticated reconstruction technique used in photo and video editing software. For
this task, Exemplar GAN (Ex-GAN) [11] is employed. Another in-painting model
that achieves good results by mixing local and global data is PGGAN. Photo editing,
computer-aided design, and image synthesis are just a few of the uses for text-to-
image generation in computer vision. Attentional GAN (AttnGAN) [12], text ACGAN
(TAC-GAN), and KD-GAN [13] have been proposed for text-to-image manipulation.
GAN has also been used in music generation, dialog systems, and machine
translation. A ranker GAN was introduced in [14] for high-quality language
(sentence) generation. Another method, which integrates VAE and WGAN
(VAE-WGAN), has been used for voice conversion. In addition, GAN has been
utilized for music generation by creating continuous sequential data.
GAN models have also been used in a variety of video applications, including future
frame prediction, video retargeting, and learning disentangled image representations
from video.

4 GANs in Privacy

With rising privacy concerns among individuals, resisting security and privacy
risks has become a top priority when developing applications that share private data,
such as medical image and record analysis, street-view image sharing, face
recognition, and biometric authentication. Various GAN models can be used to
investigate privacy concerns without making any assumptions. The models can be
employed to launch an attack or to protect against powerful adversaries. In the attack
model, the generator takes on the role of an attacker to deceive the discriminator,
whereas in the defense model, the generator takes on the role of a defender to counter
a powerful attacker. GAN-based privacy issues can be related to data utilization and
model design, as shown in Fig. 3.

4.1 Privacy in Data

Image data, speech data, video data, textual data, graph data, and spatio-temporal
data [15–17] are the six types of data that GAN can safeguard. On the one hand,
the generator is created to hide private information and/or trained to generate data
that is privacy-preserving by one or more discriminators. The discriminator, on the
other hand, ensures data similarity so that the created privacy-preserving data can
be utilized in real applications while remaining difficult for attackers to distinguish
from genuine data.
Face and medical images [18, 19], which focus on a single object, as well as street-
view images, which deal with several objects, contain a variety of sensitive informa-
tion, resulting in privacy leaks, and hence have attracted a lot of research interest.
Different GAN techniques for anonymous text synthesis and privacy-preserving
public/medical records have been presented for textual data. The work on speech
data focuses on remote health monitoring and voice assistance in IoT systems. Due

Fig. 3 Privacy issues based on GAN



Table 2 Summary of GAN models for data privacy

GAN model       Application             Input           Output                    Data privacy
VGAN [20]       Expression recognition  Face images     Synthetic face image      Identity
PPGAN [21]      Face recognition        Face images     Synthetic face image      Soft biometric attributes
DCGAN [22]      Image analysis          Medical images  Synthetic medical image   Identity
DCGAN [23]      Image synthesis         Street images   In-painted street images  Private regions
MedGAN [24]     Record sharing          EHR records     Synthetic EHR records     Identity
CyclicGAN [25]  Voice assistance        Voice signal    Synthetic voice           Emotional states
GDGAN [26]      Graph embedding         Graph           Graph representations     Private attributes

to the popularity of edge computing devices, GPS data needs to be collected from
IoT devices, which includes users' sensitive information that must be protected.
The GAN approaches used for data privacy are summarized in Table 2.

4.2 Privacy in Model

If an adversary utilizes the model output to deduce the private features of the data used
to train the model, data breaches can occur not only through data but also through
learning models.

Privacy in Centralized Learning Systems Models for intelligent applications have
traditionally adopted centralized learning, in which a centralized server is responsible
for providing services to all connected devices. Since the clients here are distributing
their data, there arises a chance of data leakage, which results in privacy concerns
and even regulatory and judicial issues. Membership privacy and preimage privacy
(data reconstruction and model inversion attacks) can be maliciously inferred in
centralized machine learning systems. In membership inference attacks, the attacker
feeds data into a trained model to obtain predicted results, which can then be used in
black-box inference attacks. GAN, anonymization, and obfuscation techniques have
been used to protect against these attacks [27]. Model inversion and data
reconstruction attacks both aim to retrieve the raw data from a model using some
useful insights. Various GAN models have been proposed both for defending and for
attacking against these threats. In [28], a compressive privacy GAN has been
proposed that generates a compressive representation retaining utility, with an
additional defense mechanism against reconstruction attacks. Another GAN variant,
GANobfuscator [29],

achieves differential privacy under GAN by carefully designing the noise added to
gradients throughout the learning phase.
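Noise-on-gradients schemes like the one just described typically follow the DP-SGD recipe: bound each gradient's influence by clipping its L2 norm, then add calibrated Gaussian noise. A hypothetical sketch of that sanitization step (parameter names are illustrative, not GANobfuscator's actual API):

```python
import random

def sanitize_gradient(grad, clip_norm=1.0, noise_std=0.5, rng=random):
    """Clip the gradient to L2 norm <= clip_norm, then add Gaussian noise,
    so no single training example can dominate the update."""
    norm = sum(g * g for g in grad) ** 0.5
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale + rng.gauss(0.0, noise_std) for g in grad]

g = sanitize_gradient([0.0, 2.0], clip_norm=1.0, noise_std=0.0)
print(g)  # [0.0, 1.0] -- clipped from norm 2 down to clip_norm
```

The clipping bounds the sensitivity of each update, which is what lets the added Gaussian noise translate into a formal differential-privacy guarantee.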

Privacy in Distributed Learning Systems Machine learning has developed a new
notion of decentralized learning to address the issue of storing all sensitive data in a
single central server: geographically dispersed data is trained on locally by various
participants without exchanging the data. Distributed selective SGD (DSSGD)
and federated learning are two of the most common distributed learning systems. In
DSSGD, the local models exchange a tiny fraction of their parameters with a remote
server, whereas in federated learning, the parameters are shared by the client devices
and aggregated by a global server. Although a distributed learning system can protect
data privacy to a great extent, it is still not perfect. Because the GAN generator can
replicate the data distribution, a well-designed GAN-based model can compromise
data privacy in distributed learning scenarios. In addition to membership and preimage
privacy, additional difficult privacy challenges for distributed learning should be
addressed. For example, attackers can compromise a server or local users in order
to obtain parameters and launch malicious attacks. In such a setup, more sensitive
information must be protected, such as whose data belongs to which local user, which
user participates in the distributed training process, and how to identify/defend fake
servers or local users who appear to be trustworthy. GAN is employed as an attacker in
[30], where the attacker poses as a trustworthy local model and builds a local dynamic
model using parameters shared by other models without compromising the central
server. The main technique of the attacker is to submit a faked gradient to the central
server to induce a victim model to upload more local data. In distributed learning
systems, the central server may be untrustworthy as well. According to Wang et al.
[31], a malicious server can leak user-level privacy in distributed learning systems
without sacrificing system performance by training a multi-task GAN with auxiliary
identification (mGAN-AI). FedGP [32], a framework for privacy-preserving data
release in the federated learning scenario that employs GAN with differential privacy
to defend models from model inversion attacks, has also been presented.
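The aggregation step described above, where clients share parameters and a global server combines them, is commonly realized as federated averaging: the server takes a data-size-weighted mean of client parameters, so raw data never leaves the devices. A minimal sketch (illustrative; real systems add secure aggregation and the GAN-based defenses discussed here):

```python
def fed_avg(client_params, client_sizes):
    """Aggregate client parameter vectors into a global model as their
    weighted mean, with weights proportional to each client's data size."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[j] * n for p, n in zip(client_params, client_sizes)) / total
        for j in range(dim)
    ]

# Two clients; the second holds three times as much data.
print(fed_avg([[0.0, 4.0], [4.0, 0.0]], [1, 3]))  # [3.0, 1.0]
```

The attacks surveyed in this section exploit exactly these shared parameters: a malicious participant or server never needs the raw data if the updates themselves leak the data distribution.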

5 Future Works

GAN's potential for resolving privacy and security challenges should be thoroughly
investigated in the future. Lightweight GAN models are expected to secure data
privacy on connected devices before transmission while keeping computation costs
(e.g., time and energy) low for these systems, where resolving the trade-off between
computation performance and computation cost is an inherent difficulty. When faced
with a powerful attack, differential privacy has been considered an effective defense
mechanism, and federated learning can help ease privacy leaks. Integrating federated
learning and differential privacy can boost GAN's privacy-protection capabilities
even more.

6 Conclusion

Generative Adversarial Networks have been widely used in various domains due to
their generative capability, which makes them effective in overcoming data scarcity
as well as privacy issues. The review of current GAN models demonstrates GAN's
creative contributions in a variety of disciplines, such as image processing tasks,
audio and video synthesis, textual data synthesis, and graphical data synthesis.
Beyond these use cases, we have also reviewed various GAN models addressing
privacy issues in both centralized and decentralized machine learning systems.

References

1. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Info Process Syst 27
2. Pavan Kumar MR, Jayagopal P (2021) Generative adversarial networks: a survey on
applications and challenges. Int J Multimedia Info Retr 10(1):1–24
3. Sasmal P, Bhuyan MK, Sonowal S, Iwahori Y, Kasugai K (2020) Improved endoscopic polyp
classification using GAN generated synthetic data augmentation. In: 2020 IEEE applied signal
processing conference (ASPCON), IEEE, pp 247–251
4. Liu Y, Zhou Y, Liu X, Dong F, Wang C, Wang Z (2019) Wasserstein GAN-based small-sample
augmentation for new-generation artificial intelligence: a case study of cancer-staging data in
biology. Engineering 5(1):156–163
5. Xinwei L, Jinlin G, Jinshen D, Songyang L (2021) Generating constrained multi-target
scene images using conditional sinGAN. In: 2021 6th International conference on intelligent
computing and signal processing (ICSP), IEEE, pp 557–561
6. Wang Z, Wang L, Duan S, Li Y (2020) An image denoising method based on deep residual
GAN. J Phys Conf Ser 1550(3):032127
7. Rawat S, Rana KPS, Kumar V (2021) A novel complex-valued convolutional neural network
for medical image denoising. Biomed Signal Process Control 69:102859
8. Jiang M, Zhi M, Wei L, Yang X, Zhang J, Li Y, Wang P, Huang J, Yang G (2021) FA-GAN:
fused attentive generative adversarial networks for MRI image super-resolution. Comput Med
Imaging Graph 92:101969
9. Nan F, Zeng Q, Xing Y, Qian Y (2020) Single image super-resolution reconstruction based on
the ResNeXt network. Multimedia Tools Appl 79(45):34459–34470
10. Jiang K, Wang Z, Yi P, Wang G, Lu T, Jiang J (2019) Edge-enhanced GAN for remote sensing
image superresolution. IEEE Trans Geosci Remote Sens 57(8):5799–5812
11. Dolhansky B, Ferrer CC (2018) Eye in-painting with exemplar generative adversarial networks.
In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7902–
7911
12. Ak KE, Lim JH, Tham JY, Kassim AA (2020) Semantically consistent text to fashion image
synthesis with an enhanced attentional generative adversarial network. Pattern Recogn Lett
135:22–29
13. Peng J, Zhou Y, Sun X, Cao L, Wu Y, Huang F, Ji R (2021) Knowledge-driven generative
adversarial network for text-to-image synthesis. IEEE Trans Multimedia
14. de Rosa GH, Papa JP (2021) A survey on text generation using generative adversarial networks.
Pattern Recogn 108098
15. Yan X, Cui B, Xu Y, Shi P, Wang Z (2019) A method of information protection for collaborative
deep learning under gan model attack. IEEE/ACM Trans Comput Biol Bioinf
Deep Generative Models Under GAN: Variants, Applications… 105

16. Qu Y, Yu S, Zhou W, Tian Y (2020) Gan-driven personalized spatial-temporal private data sharing in cyber-physical social systems. IEEE Trans Netw Sci Eng 7(4):2576–2586
17. Chen Z, Zhu T, Xiong P, Wang C, Ren W (2021) Privacy preservation for image data: a
GAN-based method. Int J Intell Syst 36(4):1668–1685
18. Chlap P, Min H, Vandenberg N, Dowling J, Holloway L, Haworth A (2021) A review of
medical image data augmentation techniques for deep learning applications. J Med Imaging
Radiat Oncol
19. Wu Y, Yang F, Xu Y, Ling H (2019) Privacy-protective-GAN for privacy preserving face
de-identification. J Comput Sci Technol 34(1):47–60
20. Chen J, Konrad J, Ishwar P (2018) Vgan-based image representation learning for privacy-
preserving facial expression recognition. In: Proceedings of the IEEE conference on computer
vision and pattern recognition workshops, pp 1570–1579
21. Mirjalili V, Raschka S, Ross A (2020) PrivacyNet: semi-adversarial networks for multi-attribute
face privacy. IEEE Trans Image Process 29:9400–9412
22. Kim BN, Dolz J, Jodoin PM, Desrosiers C (2021) Privacy-net: an adversarial approach for
identity-obfuscated segmentation of medical images. IEEE Trans Med Imaging
23. Uittenbogaard R, Sebastian C, Vijverberg J, Boom B, Gavrila DM (2019) Privacy protection in
street-view panoramas using depth and multi-view imagery. In: Proceedings of the IEEE/CVF
conference on computer vision and pattern recognition, pp 10581–10590
24. Yale A, Dash S, Dutta R, Guyon I, Pavao A, Bennett KP (2020) Generation and evaluation of
privacy preserving synthetic health data. Neurocomputing 416:244–255
25. Aloufi R, Haddadi H, Boyle D (2019) Emotionless: privacy-preserving speech analysis for
voice assistants. arXiv preprint 1908.03632
26. Li K, Luo G, Ye Y, Li W, Ji S, Cai Z (2020) Adversarial privacy-preserving graph embedding
against inference attack. IEEE Internet Things J 8(8):6904–6915
27. Al-Rubaie M, Chang JM (2019) Privacy-preserving machine learning: threats and solutions.
IEEE Secur Priv 17(2):49–58
28. Tseng B-W, Wu P-Y (2020) Compressive privacy generative adversarial network. IEEE Trans
Inf Forensics Secur 15:2499–2513
29. Xu C, Ren J, Zhang D, Zhang Y, Qin Z, Ren K (2019) GANobfuscator: mitigating information
leakage under GAN via differential privacy. IEEE Trans Inf Forensics Secur 14(9):2358–2371
30. Hitaj B, Ateniese G, Perez-Cruz F (2017) Deep models under the GAN: information leakage
from collaborative deep learning. In: Proceedings of the 2017 ACM SIGSAC conference on
computer and communications security, pp 603–618
31. Wang Z, Song M, Zhang Z, Song Y, Wang Q, Qi H (2019) Beyond inferring class represen-
tatives: user-level privacy leakage from federated learning. In: IEEE INFOCOM 2019-IEEE
conference on computer communications, IEEE, pp 2512–2520
32. Triastcyn A, Faltings B (2020) Federated generative privacy. IEEE Intell Syst 35(4):50–57
Fusion-Based Celebrity Profiling Using
Deep Learning

K. Adi Narayana Reddy, Naveen Kumar Laskari, G. Shyam Chandra Prasad, and N. Sreekanth

Abstract Celebrity profiling predicts sub-profiles such as gender, occupation, birth
year, and fame from celebrity tweets. The task was introduced in the 2019 PAN
challenge. Most researchers have used stylistic and content-based features to predict
the sub-profiles, but the accuracies of the existing models are not satisfactory.
Stylistic features capture the writing style of the celebrity, while the embedding
vector captures the context of the words. In this work, we propose a fusion-based
deep learning technique that takes both the embedding vector and the stylistic
features as input. The proposed model is implemented to predict the sub-profiles
fame, gender, and occupation, and its accuracies improve over the existing models.

Keywords Author profiling · Celebrity profiling · Stylistic feature · Deep learning · Fusion

1 Introduction

In 2019, PAN laboratories organized the celebrity profiling challenge [1]. The celebrity
profiling task predicts the gender, fame, birth year, and occupation of celebrities.
Gender has male, female, and nonbinary as sub-profiles. Rising, star, and superstar
are the sub-profiles of the degree of fame. Birth year ranges between 1940 and 2012.
Sports, performer, creator, manager, science, politics, professional, and religion are
the sub-profiles under occupation. The task has 48,335 user profiles written in 50
languages. Among these, 33,836 profiles are used for training the model
and the remainder for testing. Celebrity profiling analyzes user tweets
and predicts user traits such as gender, degree of fame, birth year, and occupation.
Celebrity profiling is similar to author profiling. From 2013 to 2018, PAN laboratories
organized the author profiling task [2], which predicts demographics such as gender,

K. Adi Narayana Reddy (B) · N. K. Laskari · N. Sreekanth
BVRIT HYDERABAD College of Engineering for Women, Hyderabad, Telangana, India
e-mail: aadi.iitkgp@gmail.com
G. Shyam Chandra Prasad
Matrusri Engineering College, Hyderabad, Telangana, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 107
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_10
108 K. Adi Narayana Reddy et al.

age, native language, and personality. During those editions, the organizers provided
the demographic labels and datasets for author profiling; in 2019, PAN laboratories
added celebrity profiling.
The profiles of celebrities are used in applications such as marketing, forensics,
security services, and recommendation systems. Most celebrities follow a consistent
writing style when writing text on social media platforms; in general, an author's
writing style changes little over their lifetime. To analyze the profiles of
celebrities, researchers started using stylistic features such as word-level, character-level,
syntactic, and semantic features, and found different styles by analyzing
different datasets. Rangel Pardo et al. [2] analyzed tweets and identified
that male authors write more about politics, sports, and technology, whereas female
authors write more about lifestyle topics such as jewellery, shopping, and beauty. Koppel
et al. [3] identified that feature selection and the content of the text play a major
role in gender prediction. Newman et al. and Pennebaker et al. [4, 5] observed that as
authors age they tend to use more prepositions, idioms, and determiners. They
also found that younger authors used more articles and pronouns in their text, while
older authors wrote lengthier sentences. Celebrity profiling predicts class labels
such as gender, degree of fame, birth year, and occupation.
Researchers have represented the text data in different ways and trained different
machine learning algorithms on these representations to classify the data. Most
researchers used classification algorithms such as SVM, the Naïve Bayes classifier,
and random forest. The paper is organized into five sections. Section 2 covers the
related work. The methodology is discussed in Sect. 3. The results and discussion are
covered in Sect. 4. The final Sect. 5 concludes the paper.

2 Related Work

In author profiling and celebrity profiling, most researchers differentiated
the writing styles of authors using selected stylistic features. Argamon et al. [6]
proposed a technique that extracted corpus-derived, stylistic, and lexicon-based
attributes, which are useful for distinguishing the age range and gender of an author.
De-Arteaga et al. [7] proposed a method in which the feature vector was generated using
TFIDF and a random forest was trained on it. They observed that the model underperformed
because of the large number of features, which also consumed more time and memory.
Petrik and Chuda [8] proposed a model that uses n-gram features and is trained using
logistic regression. The model performed well in predicting the gender of the author, but
its performance was poor in predicting age range, fame, and occupation. The third-ranked
team in the PAN 2019 competition created four models, one per sub-profile, applied
mainly preprocessing to the tweets, and used n-gram features such as unigrams and
character-level tetragrams. Experiments were conducted with classifiers such as
SVM, random forest, gradient boosting, and logistic regression; logistic regression
gave good accuracy but did not outperform the state of the art.
Fusion-Based Celebrity Profiling Using Deep Learning 109

Martinc et al. [9] proposed transfer learning using ULMFiT. Four classifiers were
created to predict gender, fame, occupation, and birth year using ULMFiT, achieving
accuracies of 68, 51, 39, and 32 for the sub-profiles gender, occupation, fame,
and birth year, respectively. Pelzer [10] implemented SVM and logistic regression
on a TFIDF feature vector generated with n-grams; the performance of these
algorithms was best for bigrams compared with the other n-grams.
Radivchev et al. [11] proposed a model that uses word distance as the feature vector and
implemented six algorithms for each of the tasks gender, fame, occupation, and birth year:
decision tree, random forest, Naïve Bayes, KNN, logistic regression, and SVM were
the classification algorithms used to predict all four profiles.
Asif et al. [12] extracted socio-linguistic features from user tweets. Logistic
regression applied to this feature vector achieved accuracies of 88, 65, and 38.7
for gender, fame, and birth year, respectively, while the multinomial Naïve Bayes
classifier achieved an accuracy of 56.7 for occupation prediction. Kavadi et al. [13]
proposed sub-profile-based weightage (SPW) for feature representation, which outperformed
the existing models. In this paper, we represent the tweets as a combination of stylistic
features and word embeddings and propose a fusion-based deep learning algorithm to predict
the profiles gender, fame, occupation, and birth year.

3 Methodology

3.1 Dataset

The dataset is taken from the PAN 2019 challenge. The training set used here is in
English, and its details are presented in Table 1. The dataset has 48,835 user profiles,
with an average of 2181 tweets per user. The dataset is not balanced: some sub-profiles
have very few users, and those low-count sub-profiles were not considered while
building the model.

3.2 Evaluation Measure

The performance of the model is evaluated by F1 score, precision, recall, and accuracy.
Precision measures how often a prediction of True is actually True; recall measures
how often an actually True instance is predicted as True.
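These definitions can be made concrete with a short sketch (hypothetical helper functions, not the authors' evaluation code):

```python
def precision_recall_f1(y_true, y_pred, positive):
    """One-vs-rest precision, recall, and F1 for a single class label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def accuracy(y_true, y_pred):
    """Fraction of predictions matching the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For multi-class sub-profiles such as occupation, these per-class scores would typically be macro-averaged over the classes.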

Table 1 Train dataset
Profile name Sub-profile Number of user profiles
Fame Star 25,230
Rising 1490
Superstar 7116
Gender Male 24,221
Female 9583
Nonbinary 32
Occupation Creator 5475
Manager 768
Performer 9899
Politics 2835
Professional 525
Science 818
Sport 13,481
Religious 35

3.3 Stylistic Features

Initially, preprocessing is applied to the tweets: all words are converted to
lower case, and data not required for the stylistic features is removed. The
stylistic features are used to identify the author's writing style. In general, the
document-level features considered are word count, sentence count, average word count,
period count, average word length, count of exclamation marks, count of colons, count
of commas, and count of semicolons. Along with these, other features are extracted
from each tweet: count of @ mentions, count of hashtags, count of URLs, count of
re-tweets, positive and negative word counts, and POS tags. The tweets are thus
represented as vectors of stylistic features.
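A minimal sketch of this per-tweet feature extraction (an illustrative subset; the exact feature set and names are assumptions, not the authors' implementation):

```python
import re

def stylistic_features(tweet):
    """Extract simple stylistic counts from one tweet."""
    words = tweet.split()
    return {
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "mention_count": tweet.count("@"),          # @-mentions
        "hashtag_count": tweet.count("#"),          # hashtags
        "url_count": len(re.findall(r"https?://\S+", tweet)),
        "retweet": int(tweet.startswith("RT")),     # re-tweet marker
        "exclamation_count": tweet.count("!"),
        "comma_count": tweet.count(","),
    }
```

Stacking such dictionaries over all of a user's tweets yields the stylistic feature vector fed to the fully connected branch of the model.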

3.4 Word Embedding

Word embeddings play an important role in tasks such as text classification, text
summarization, question answering, and machine translation. The text is represented
as word embeddings using Word2Vec [14, 15], GloVe [16], and FastText [17].
Word2Vec Word2vec [14, 15, 18] is a popular word embedding technique that uses a
shallow two-layer neural network. It is trained on a large corpus and learns
the context of words. The vector representation is produced by two methods, CBOW
and skip-gram.

Fig. 1 Fusion-based deep learning architecture

GloVe GloVe [16], the global vector representation, uses the global co-occurrence of
words. It learns the structure of word co-occurrence and outputs the vector representation.
FastText FastText [17] is an extension of the Word2vec model. It represents each word
as a bag of character n-grams, so it can also generate vectors for unknown or out-of-
vocabulary words, and its word representations work well for rare words.
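The training-pair construction behind CBOW and skip-gram, and FastText's character n-gram decomposition, can be sketched schematically (a simplified illustration of the ideas, not the library internals):

```python
def cbow_pairs(tokens, window=2):
    """CBOW: predict the center word from its surrounding context words."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs

def skipgram_pairs(tokens, window=2):
    """Skip-gram: predict each context word from the center word."""
    return [(t, c) for (ctx, t) in cbow_pairs(tokens, window) for c in ctx]

def char_ngrams(word, n=3):
    """FastText represents a word via its character n-grams, using
    '<' and '>' as boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]
```

Because FastText sums n-gram vectors, `char_ngrams` can produce a representation even for a word never seen during training, which is why it handles out-of-vocabulary and rare words well.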

3.5 Method

In our proposal, we use both content-based features (word embeddings) and
stylistic features as input to the model and predict the profiles of the celebrities. The
architecture is presented in Fig. 1. The LSTM takes the word embeddings as input and
follows the standard LSTM gating formulation.
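The LSTM equations referred to here were not reproduced in this extraction; for reference, the standard LSTM gate formulation (assumed rather than taken from the paper) is:

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Here $x_t$ is the word-embedding vector at time step $t$, $h_t$ the hidden state, $c_t$ the cell state, and $i_t$, $f_t$, $o_t$ the input, forget, and output gates; the final hidden state is fused with the stylistic feature vector for classification.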

4 Results and Discussion

In the experiment, the word embeddings and stylistic features were given as input to
the LSTM and the fully connected network, respectively. The model was trained
separately for each profile, and the results are presented in Table 2. The accuracy of
the proposed model is higher than that of the existing models for the profiles gender,
fame, and occupation.

Table 2 Accuracy comparison (%)
Model/PAN team Gender Fame Occupation
Morenosandoval 64.4 56.3 46.9
Radivchev 72.6 55.1 51.5
IQCFTW 69.7 82.98 83.67
SUTW 77.3 87.76 91.54
Proposed fusion 78.1 88.2 84.3

The fusion of two different data representations is more effective than word
embeddings or stylistic features alone. The classification accuracies for gender, fame,
and occupation are 78.1%, 88.2%, and 84.3%, respectively. The test accuracies of
the proposed model are higher than those of the existing stylistic and content-based
models, and the fusion of content and stylistic features dominates the other existing models.
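A hypothetical plain-Python stand-in for the fusion step: pool the word-embedding sequence into one vector (in place of the LSTM's final hidden state), concatenate it with the stylistic vector, and classify the combined vector with a single softmax layer. All names and weights here are illustrative, not the paper's implementation.

```python
import math

def mean_pool(word_vectors):
    """Collapse a sequence of word vectors into one fixed-size vector
    (a simple stand-in for the LSTM's final hidden state)."""
    n = len(word_vectors)
    return [sum(v[i] for v in word_vectors) / n
            for i in range(len(word_vectors[0]))]

def fuse(embedding_vec, stylistic_vec):
    """Fusion step: concatenate the two representations."""
    return embedding_vec + stylistic_vec

def softmax_classify(fused, weights, biases):
    """One dense softmax layer over the fused vector; returns the
    index of the most probable class (sub-profile)."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return max(range(len(probs)), key=probs.__getitem__)
```

The design point is that the classifier sees both representations at once, so writing-style signals and word-context signals can jointly drive the prediction.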

5 Conclusion

Celebrity profiling predicts author demographics such as gender, occupation,
fame, and birth year by analyzing user-written text. In this proposal, we considered
the gender, fame, and occupation profiles using both content-based and
stylistic features. The fusion of these features improved the accuracy for the
gender, fame, and occupation profiles, and the best accuracies were achieved using
deep learning. In the future, we plan to propose an attention-based technique to
predict the profiles.

References

1. https://pan.webis.de/clef19/pan19-web/celebrity-profiling.html
2. Rangel Pardo F, Rosso P, Koppel M, Stamatatos E, Inches G (2013) Overview of the author
profiling task at PAN 2013. In: Forner P, Navigli R, Tufis D (eds) CLEF 2013 evaluation labs
and workshop—working notes papers, Valencia, Spain, Sept 2013. CEUR-WS.org, pp 23–26
3. Koppel M, Argamon S, Shimoni A (2003) Automatically categorizing written texts by author
gender. Lit Linguist Comput 401–412
4. Newman ML, Groom CJ, Handelman LD, Pennebaker JW (2008) Gender differences in lan-
guage use: an analysis of 14,000 text samples. Discourse Process 45(3):211–236
5. Pennebaker JW, Francis ME, Booth RJ (2001) Linguistic inquiry and word count: LIWC 2001,
vol 71, no 2001. Lawrence Erlbaum Associates, Mahwah, pp 2001–2009
6. Argamon S, Koppel M, Pennebaker JW, Schler J (2007) Mining the blogosphere: age, gender
and the varieties of self-expression. First Monday 12(9)
7. De-Arteaga M, Jimenez S, Duenas G, Mancera S, Baquero J (2013) Author profiling using
corpus statistics, lexicons and stylistic features-notebook for PAN at CLEF

8. Petrik J, Chuda D (2019) Twitter feeds profiling with TF-IDF-notebook for PAN at CLEF
2019. In: Cappellato L, Ferro N, Losada DE, Müller H (eds) CLEF 2019 labs and workshops,
notebook papers, Sept 2019. CEUR-WS.org
9. Martinc M, Škrlj B, Pollak S (2019) Who is hot and who is not? Profiling celebs on twitter-
notebook for PAN at CLEF 2019. In: Cappellato L, Ferro N, Losada DE, Müller H (eds) CLEF
2019 labs and workshops, notebook papers, Sept 2019. CEUR-WS.org
10. Pelzer B (2019) Celebrity profiling with transfer learning-notebook for PAN at CLEF 2019. In:
Cappellato L, Ferro N, Losada DE, Müller H (eds) CLEF 2019 labs and workshops, notebook
papers, Sept 2019. CEUR-WS.org
11. Radivchev V, Nikolov A, Lambova A (2019) Celebrity profiling using TF-IDF, logistic regres-
sion, and SVM-notebook for PAN at CLEF 2019. In: Cappellato L, Ferro N, Losada DE, Müller
H (eds) CLEF 2019 labs and workshops, notebook papers, Sept 2019. CEUR-WS.org
12. Asif MU, Naeem S, Ramzan Z, Najib F (2019) Word distance approach for celebrity profiling-
notebook for PAN at CLEF 2019. In: Cappellato L, Ferro N, Losada DE, Müller H (eds) CLEF
2019 labs and workshops, notebook papers, Sept 2019. CEUR-WS.org
13. Kavadi DP, Al-Turjman F, Adi Narayana Reddy K, Patan R (2021) A machine learning approach
for celebrity profiling. Int J Ad Hoc Ubiquitous Comput 38(1–3):111–126
14. Mikolov T, Chen K, Corrado GS, Dean J (2013) Efficient estimation of word representations
in vector space. ICLR
15. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of
words and phrases and their compositionality. Adv Neural Inf Process Syst 26
16. Bojanowski P, Grave E, Joulin A, Mikolov T (2016) Enriching word vectors with subword
information. Trans Assoc Comput Linguist 5. https://doi.org/10.1162/tacl_a_00051
17. Xu J, Du Q (2019) A deep investigation into fastText. In: 2019 IEEE 21st international
conference on high performance computing and communications; IEEE 17th international
conference on smart city; IEEE 5th international conference on data science and sys-
tems (HPCC/SmartCity/DSS), pp 1714–1719. https://doi.org/10.1109/HPCC/SmartCity/DSS.
2019.00234
18. Rong X (2014) word2vec parameter learning explained
DeepLeaf: Analysis of Plant Leaves Using
Deep Learning

Deepti Barhate, Sunil Pathak, Ashutosh Kumar Dubey, and Varsha Nemade

Abstract A growing number of scientists are examining the survival of plant
species under the adverse climate conditions caused by global warming. The
extinction of some plant species is a growing concern, and such species must be
protected; however, assessing a species manually requires experience and expertise
and is time consuming. Various scientific methods have evolved, such as image
processing, digital cameras, mobile devices, and pattern recognition, but they lag
in accuracy. A solution to this problem is to identify the correct plant species
using recent methods such as Convolutional Neural Networks (CNN), Visual Geometry
Group-16 (VGG16), deep learning, and machine learning. The proposed system comprises
CNN and VGG16 for fused feature extraction, extracting shape, texture, contour, and
margin. Finally, the results of each feature were combined and classified using a
Hyper Parameter Tuned Gradient Descent (HPTGD) classifier with the dimension-reduction
method PCA. This paper covers image collection, preprocessing, feature extraction
using deep learning methods, and classification on the Flavia dataset. The images
were preprocessed, augmented, and forwarded to CNN + VGG16 and the classifier. Our
model achieved an accuracy of up to 97%. It has been observed that the VGG16
architecture with the HPTGD classifier achieved better accuracy at a similar
execution time compared to other methodologies.

D. Barhate (B) · S. Pathak
Amity School of Engineering & Technology, Department of Computer Science & Engineering,
Amity University Rajasthan, Jaipur, India
e-mail: deepti1.barhate@gmail.com
S. Pathak
e-mail: spathak@jpr.amity.edu
A. K. Dubey
Chitkara University School of Engineering and Technology, Chitkara University, Himachal
Pradesh, India
e-mail: ashutosh.dubey@chitkara.edu.in
V. Nemade
SVKM’s NMIMS MPSTME Shirpur, Dhule, Maharashtra, India
e-mail: varsha.nemade@nmims.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 115
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_11
116 D. Barhate et al.

Keywords CNN · VGG16 · PCA · HPTGD

1 Introduction

Since ancient times, plants have been used for various purposes such as medicine,
decoration, the environment, and, mainly, agriculture. Owing to their diverse
properties, different plants are also used in the preparation of food items, scents,
beauty products, and medicines. Given the increasing demand for such plants, one
should know their names and properties, which can be supported by automatic species
recognition using deep learning. In this digital era, individuals no longer have
sufficient knowledge to distinguish the various natural plants that were used by our
ancestors for a long time [1]. At present, the recognition of herbal plants is based
purely on human perception and knowledge; a technical process is required to remedy
this. When a plant species is recognized, its leaf, fruit, and flower parts are
examined [2]. The leaves, which are available almost all year round, are considered
useful for the identification of plant species [3]. DL performs feature extraction
prior to classification, in addition to the preprocessing of images. In conventional
computer vision approaches to plant recognition, accuracies have been reported as 94%
[4], 76.3% [5], and 90% [6]. Deep learning and image processing methods were combined
in the CenterNet model [7], which detected vegetables by drawing rectangles around
them; the remaining green regions outside the drawn boxes were considered weeds. For
background removal, a color-index-based segmentation was used and evaluated through a
genetic algorithm. The CNN is a supervised deep learning (DL) approach with
convolution, pooling, and fully connected layers (FCLs) [8–14]. New architectures
such as AlexNet, ResNet50, VGG16, and Inception V3 have made it possible to identify
plant species.
Recognition is essentially the determination of the similarities or differences
between two items, i.e., whether two items are the same or different. In
this paper, CNN and VGG16 are introduced to extract and combine various features
such as shape, contour, margin, color, and texture. After extracting all
features, dimensionality reduction by PCA followed by the HPTGD classifier is used;
the flow of the process is depicted in Fig. 1. The performance on the Flavia dataset
is high compared with other methods, and it is observed that these methods with
feature fusion enhance the results compared with the other methodologies, as depicted
in Fig. 2. The final output of this system is the species name of the plant, and
images at various growth stages were used in the research.
DeepLeaf: Analysis of Plant Leaves Using Deep Learning 117

Fig. 1 Proposed methodology

Fig. 2 Comparison with existing system



2 Related Work

The authors of [15] used Semantic Annotation-Based Clustering (SABC) and
Semantic-Based Clustering (SBC) for images and web pages, respectively. Both the
images and the page content were retrieved in the proposed work, and factors such
as computation time, recall, and accuracy were investigated using the SABC
methodology. In [16], the authors presented a work based on a multilayer perceptron
and AdaBoost for the extraction and classification of morphological features such as
shape, color, margin, and texture. The system performed image preprocessing, feature
extraction, classification, and prediction, and achieved over 90% precision. In [17],
the authors proposed a method involving an IoT framework for fine-grained disease
detection; the analysis performed by the system was forwarded to farmers for further
action. They also worked on a multidimensional feature compensation residual neural
network (MDFC-ResNet) model for better results. In [18], the authors introduced a
novel system for plant recognition from the contour data of leaf images, so that
different plant species can be distinguished by observing contour information.
Classifying occluded plant leaves is more difficult than whole-leaf matching due to
the enormous variety and intricacy of leaf structures. In the article [19], the
authors proposed an overlapping-free individual leaf segmentation method. They
identified plant point clouds using 3D filtering to remove leaf images with covering
constraints; further 3D filtering methods, a Radius-based Outlier Filter (RBOF) and a
Surface Boundary Filter (SBF), were introduced to help isolate occluded leaves. A
concentric-circles-based technique to investigate the surface of the leaves was
presented in [20]. The color gradation effect in binary images is identified to
detect compound leaves, and the technique delivered very high accuracy in leaf
prediction. In [21], Convolutional Neural
Network (CNN), AlexNet, fine-tuned AlexNet, and D-Leaf were used for preprocessing
and the extraction of various features. The hybridization of these methods extracted
better features, and the subsequent classification provided better results compared
with other available methods. In [22], the authors proposed various CNN models such
as VGG16, VGG19, InceptionV3, and Xception with an ANN classifier, as well as
Xception-SVM and Xception-SVM-BO, two CNN models used with an SVM classifier. Of
these, the DeepHerb model (Xception + ANN) extracted features with an accuracy of
97.5% on the Deep Herb dataset. In [23], the authors used the Canny method for edge
detection, and the edge-detected images were used for further feature extraction.
They proposed three CNN models, ResNet101, InceptionV3, and VGG16, and used transfer
learning to avoid training the models on unwanted images. They found excellent
results with an accuracy of 97.32% using Inception V3, which is high compared with
the other models. In [24], MobileNet, Xception, and DenseNet-121 were used, with
homogeneous hybridizations (MoMoNet = MobileNet + MobileNet, XXNet, DEDeNet) and
heterogeneous models (MOXNet = MobileNet + Xception, XDeNet = Xception + DeNet,
MoDeNet = MobileNet + DenseNet), combined with a set of classifiers: Linear
Discriminant Analysis (LDA), multinomial Logistic Regression (MLR), k-Nearest
Neighbor (k-NN), Naïve Bayes (NB), Bagging Classifier (BC),

Random Forest Classifier (RF), Classification and Regression Tree (CART), Multilayer
Perceptron (MLP), and Support Vector Machine (SVM), of which MoDeNet + MLR
worked best with an accuracy of 98.71%. In [25], the authors worked on the shape and
texture features of leaves. They proposed the multiscale triangle descriptor (MTD)
and the local binary pattern histogram Fourier (LBP-HF); the former extracted shape
features while the latter extracted texture features, and both were combined to
obtain the final accuracy. They achieved an accuracy of 77.6% on the Flavia dataset.
From the literature, it is observed that traditional methods such as a simple CNN
with classifiers did not adopt a feature-fusion methodology, and no prior work used
leaves at various stages such as seedling, tiny, matured, and dried. In this
research, we have considered images of leaves at the above stages.

3 Methodology

Various computer algorithms, such as neural networks and deep learning algorithms,
detect the species of plant leaves with improved accuracy; Fig. 1 depicts the
proposed methodology. For this research, images of leaves at various stages were
considered, followed by preprocessing and augmentation. For feature extraction,
conventional deep learning methods such as CNN and VGG16 are used, which enhanced
the extraction in a single epoch. Dimension reduction was performed by enhanced
principal component analysis, and classification was performed by various methods
such as KNN, random forest, and the HPTGD classifier, of which the last generated
accurate results with the correct plant species name.

3.1 Feature Extraction

The most prominent feature of a leaf is its shape, with its various dimensions and
contour; apart from that, the margin, texture, venation, apex-to-centroid ratio, and
eccentricity can be calculated. For this research, a feature fusion method is
proposed that extracts shape, contour, texture, and venation features and combines
them to detect the exact plant species. CNN and VGG16 were used for feature
extraction; in addition, PCA was used for dimensionality reduction, followed by the
HPTGD classifier. The experiment was conducted on the Flavia dataset, which consists
of 33 species with 1907 images in total, including species such as Chinese rose,
pinewood, acer, barberries, and citrus.
In addition, an analysis is carried out on Flavia considering Local Directional
Patterns (LDP) and Local Binary Patterns (LBP). The results for the Flavia dataset
are shown in Table 1. For feature extraction, texture is considered and classified
using KNN. The existing method showed 96.03% accuracy for LBP and 96.94% for LDP.
120 D. Barhate et al.

Table 1 Analysis in terms of features and descriptor

Descriptor   Feature   Classifier   Dataset   Existing (%)   Proposed (%)
LBP          Texture   KNN          Flavia    96.03          97
LDP          Texture   KNN          Flavia    96.94          97

However, the proposed system showed a better accuracy rate (97%) for both LBP and LDP, as shown in Table 1.

3.2 CNN and VGG16

Our deep network consists of CNN and VGG16. The VGG model, a relatively new style of convolutional neural network architecture, investigates the relationship between a convolutional neural network's depth and its performance [21]. The VGG16 model has smaller kernels, though more parameters, than AlexNet. The first five layers, based on VGG16, are used for feature extraction and provide a high-resolution feature map. Finally, the fully connected layer reduces the parameters. Dimension reduction is done by enhanced principal component analysis, which removes unwanted parameters and irrelevant features without any information loss. Hyperparameter tuning with gradient descent (HPTGD) reduces the predefined loss function and classifies the species. The input layer is fed to a zero-padding layer to eliminate information loss; this is then fed into the Rectified Linear Unit (ReLU) activation function, which generates non-negative outputs. The result is then passed to max pooling, which extracts the maximum element, followed by average pooling for an averaged feature map. We have also compared our results with naive Bayes, random forest, and SVM classifiers, but our proposed solution gave more accurate results than the other methods.
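The layer operations mentioned above (ReLU activation, max pooling, average pooling) can be sketched directly in plain Python. The 4×4 feature map and non-overlapping 2×2 windows below are illustrative assumptions, not the network's actual configuration.

```python
def relu(feature_map):
    """Rectified Linear Unit: negative activations become zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def pool(feature_map, size=2, op=max):
    """Non-overlapping pooling with window `size`: op=max gives max pooling,
    op=mean (defined below) gives average pooling."""
    out = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(op(window))
        out.append(row)
    return out

def mean(values):
    return sum(values) / len(values)

fmap = [[1, -2, 3, 0],
        [-1, 5, -4, 2],
        [0, 1, 2, -3],
        [4, -1, 0, 6]]

activated = relu(fmap)
print(pool(activated, op=max))   # max pooling: [[5, 3], [4, 6]]
print(pool(activated, op=mean))  # average pooling: [[1.5, 1.25], [1.25, 2.0]]
```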

4 Results and Discussion

The performance of the model is analyzed on the Flavia dataset, having 32 species with 3857 images, by calculating accuracy, precision, recall, and F1-measure. Table 2 depicts the results of the performance analysis, compared with several methods. Our model achieved an accuracy of 97%, higher than the other models. Various performance metrics have been considered for better evaluation: accuracy, precision, recall, and F1-measure. Compared to the traditional classifiers, our system achieved good results, as shown in Table 2.
DeepLeaf: Analysis of Plant Leaves Using Deep Learning 121

Table 2 Performance analysis compared with other methods

Performance metrics   Random forest   Support vector machine   Proposed
Accuracy              84.11           79.05                    97
Precision             85.25           82.04                    88
Recall                83.52           79.05                    84
F1-measure            82.23           79.36                    83

Table 3 Comparison of other methods with proposed method

Methods                   Accuracy (%)
GIST                      84.7
MSRA                      86.56
SIFT                      87.5
Overall existing method   86.25
Proposed method           97

4.1 Comparative Analysis with Respect to Accuracy

For analysis of the proposed and existing systems on the Flavia dataset, the Scale-Invariant Feature Transform (SIFT), Global Image Descriptor (GIST), and Multiscale R-Angle (MSRA) have been considered. The obtained results are shown in Table 3.
Figure 2 shows the parameter extraction. The model consists of convolution, max pooling, and dropout layers: 4 convolution layers, 3 max pooling layers, and 2 dropout layers. The total number of parameters extracted is 1,653,568. The results were classified by HPTGD and PCA with an accuracy of 97%, as shown in Table 3.

5 Conclusion

CNN and other hybrid models work well in the field of agriculture but need to be more specific and effective. We proposed an improved CNN method for recognition of a huge dataset of images of different plant species. The proposed model is a hybrid of CNN and VGG16, which improved the classification accuracy. Unwanted parameters and irrelevant features were eliminated by the proposed hyperparameter tuning with gradient descent, and principal component analysis was used for dimension reduction; as a result, our model achieved the highest classification rate of 97% compared with other classifiers. We have performed the experiment using other classifiers such as random forest and SVM along with the proposed system, and it has been observed that our proposed method achieved state-of-the-art performance compared to the others.

References

1. Pravin A, Deepa C (2021) A identification of piper plant species based on deep learning
networks. Turk J Comput Math Educ (TURCOMAT) 12(10):6740–6749
2. Raj AP, Vajravelu SK (2019) DDLA: dual deep learning architecture for classification of plant
species. IET Image Proc 13(12):2176–2182
3. Aakif A, Khan MF (2018) Automatic classification of plants based on their leaves. Biosyst
Eng 139:66–75
4. Selvam L, Kavitha P (2020) Classification of ladies finger plant leaf using deep learning. J
Ambient Intell Humanized Comput 1–9
5. Jin T, Hou X, Li P, Zhou F (2015) A novel method of automatic plant species identification
using sparse representation of leaf tooth features. PloS one 10(10):e0139482
6. Wu SG, Bao FS, Xu EY, Wang YX, Chang YF, Xiang QL (2017) A leaf recognition algorithm for
plant classification using probabilistic neural network. In: 2007 IEEE international symposium
on signal processing and information technology 7 Dec 15, IEEE, pp 11–16
7. Jin X, Che J, Chen Y (2021) Weed identification using deep learning and image processing in
vegetable plantation. IEEE Access 8(9):10940–10950
8. Ashhar SM, Mokri SS, Abd Rahni AA, Huddin AB, Zulkarnain N, Azmi NA, Mahaletchumy
T (2021) Comparison of deep learning convolutional neural network (CNN) architectures for
CT lung cancer classification. Int J Adv Technol Eng Explor 8(74):126
9. Barhate D, Nemade V (2019) Comprehensive study on automated image detection by using
robotics for agriculture applications. In: 2019 3rd International conference on electronics,
communication and aerospace technology (ICECA), Jun 12. IEEE, pp 637–641
10. Kumar PY, Singh P, Pande S, Khamparia A (2022) Plant leaf disease identification and
prescription suggestion using deep learning. In: Proceedings of data analytics and management.
Springer, Singapore, pp 547–560
11. Minowa Y, Kubota Y (2022) Identification of broad-leaf trees using deep learning based on
field photographs of multiple leaves. J Forest Res 1–9
12. Tarek H, Aly H, Eisa S, Abul-Soud M (2022) Optimized deep learning algorithms for tomato
leaf disease detection with hardware deployment. Electronics 11(1):140
13. Senthil T, Rajan C, Deepika J (2021) An efficient CNN model with squirrel optimizer for
handwritten digit recognition. Int J Adv Technol Eng Explor 8(78):545
14. Mundada MR, Shilpa M (2022) Detection and classification of leaf disease using deep neural
network. In: Deep learning applications for cyber-physical systems. IGI Global, pp 51–77
15. Deepa C (2017) SABC-SBC: a hybrid ontology based image and webpage retrieval for datasets.
Automatic Control Comput Sci 51(2):108–113
16. Kumar M, Gupta S, Gao XZ, Singh A (2019) Plant species recognition using morphological
features and adaptive boosting methodology. IEEE Access 7:163912–163918
17. Chaudhury A, Barron JL (2018) Plant species identification from occluded leaf images.
IEEE/ACM Trans Comput Biol Bioinfo 17(3):1042–1055
18. Chouhan SS, Kaul A, Singh UP, Jain S (2018) Bacterial foraging optimization based radial
basis function neural network (BRBFNN) for identification and classification of plant leaf
diseases: an automatic approach towards plant pathology. IEEE Access 6:8852–8863
19. Li D, Cao Y, Shi G, Cai X, Chen Y, Wang S, Yan S (2019) An overlapping-free leaf segmentation
method for plant point clouds. IEEE Access 7:129054–129070
20. Chau AL, Hernandez RR, Mora VT, Canales JC, Mazahua LR, Lamont FG (2017) Detection
of compound leaves for plant identification. IEEE Latin Am Trans 15(11):2185–2190
21. Wei Tan J, Chang SW, Abdul-Kareem S, Yap HJ, Yong KT (2018) Deep learning for plant
species classification using leaf vein morphometric. IEEE/ACM Trans Comput Biol Bioinfo
17(1):82–90
22. Gu J, Yu P, Lu X, Ding W (2021) Leaf species recognition based on VGG16 networks
and transfer learning. In: 2021 IEEE 5th advanced information technology, electronic and
automation control conference (IAEAC), Mar 12. vol 5, IEEE, pp 2189–2193

23. Roopashree S, Anitha J (2021) DeepHerb: a vision based system for medicinal plants using
xception features. IEEE Access 9:135927–135941
24. TS SK, Prabalakshmi A (2021) Identification of indian medicinal plants from leaves using
transfer learning approach. In: 2021 5th international conference on trends in electronics and
informatics (ICOEI), Jun 3. IEEE, pp 980–987
25. Yang C (2021) Plant leaf recognition by integrating shape and texture features. Pattern Recogn
112:107809
Potential Assessment of Wind Power
Generation Using Machine Learning
Algorithms for Southern Region of India

P. Upendra Kumar, K. Lakshmana Rao, and T. S. Kishore

Abstract Nowadays, large-scale grid-interconnected wind power generation systems are increasing day by day, and the stable operation of the grid depends strongly on the amount of wind energy penetrating into the grid. This is essential not only for stable operation but also for generation allocation and load scheduling. In order to achieve this, a precise method for estimating the potential is necessary. In this paper, a modest attempt has been made to estimate the potential of wind power generation for the southern region of India. The methodology presented is based on efficient machine learning regression methods, viz. linear, support vector, K-nearest neighbour, and decision tree regression models, for prediction of the number of units generated and the output power. To evaluate the efficiency of these algorithms, key performance indicators such as mean absolute error, mean square error, root mean square error, and R2 score have been considered. It has been observed that the linear regression model performs better than all the other methods considered in this study, and the same is summarized in the results.

Keywords Wind power output · Grid interconnection · Potential assessment · Machine learning · Regression

1 Introduction

The global demand for power in developing countries is predicted to quadruple by 2030, growing at a rate of nearly 4–5% each year. In addition, emerging countries' share of global electricity demand is predicted to increase from 27% in 2000 to 43% in 2030. To meet this growth there is a need for enhancement of renewable energy

P. Upendra Kumar (B) · K. Lakshmana Rao · T. S. Kishore
GMR Institute of Technology, Rajam, Vizianagaram, Andhra Pradesh 532127, India
e-mail: upendrakumar.p@gmrit.edu.in
K. Lakshmana Rao
e-mail: lakshmanarao.k@gmrit.edu.in
T. S. Kishore
e-mail: kishore.ts@gmrit.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 125
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_12
126 P. Upendra Kumar et al.

sources in the country. As per the recent statistics of the Central Electricity Authority (CEA), the installed capacity in India was 392,017 MW as of November 2021. In the total generation, the share of renewable energy sources is around 26.5%, i.e., 104,031 MW, of which wind power generation is around 40,034 MW, i.e., 38.48% of all renewable energy sources, as per the CEA statistics of November 2021. The peak demand in India is around 203,014 MW, with a peak demand met of 200,539 MW, a deficit of 1.2%. With the associated environmental issues and depleting fossil fuel-based resources, the emphasis is now shifting to the utilisation of renewable energy. This necessitates the extension of the grid to renewable energy sources in order to capitalise on generation diversity and dispersed renewable resources [1].
To fulfil the ever-increasing energy demand, new generation capacity must be planned and built simultaneously. In this aspect, wind power generation technology is one of the most widely explored types of renewable energy technology for meeting rising load demand while reducing carbon emissions and protecting fossil fuels and natural resources with minimum operating cost. It also has the advantage that wind turbines and the other auxiliary equipment can be manufactured at industries and readily assembled at the project construction site, making installation and operation easy. It is evident that such a huge amount of grid-connected wind power generation needs proper planning in execution and operation to maintain grid stability. In this aspect, precise estimation of wind power generation is necessary for effective planning. However, the output power generation from a wind power plant is random in nature; it depends on climatic conditions, wind speed, wind direction, etc. One of the most reliable ways of meeting rising load demand by augmenting wind power generation is to perform a wind power generation assessment, so that prior information about the power that could be generated is available and other activities related to generation scheduling and O&M can be planned without affecting grid operations. In order to connect the wind power system to the grid, an efficient prediction mechanism is required to avoid grid instability problems [2].
Predicting wind energy is not an easy process, as it is highly dependent on climate and on atmospheric conditions that change over time. Earlier, the process of estimation depended on meteorological data from numerical weather prediction and satellite images illustrating cloud movement to predict wind speed, wind direction, and the other dependent parameters. The fundamental problem of the older methods is that the requisite meteorological data is not always accessible for the wind power site, nor always available at the required resolution level, limiting their applicability for extremely accurate forecasts. To overcome this difficulty, it is important to use new and intelligent methods to get valid and accurate results. Presently, advanced estimation algorithms and techniques for power output estimation, which combine the advantages of artificial intelligence and machine learning, are gaining importance because they can extract detailed information from wind power records and produce more reliable forecast results [2]. In the recent past, machine learning has brought radical changes in various domains. In this connection, most researchers have recently started integrating machine learning based prediction methods in the field of electrical engineering, e.g., in grid management, fault prediction, load balancing, output power prediction, and load prediction [3, 4]. Regression models such as
Potential Assessment of Wind Power Generation… 127

linear regression, support vector regression, K-nearest neighbour regression, and decision tree regression are some of the most popular supervised learning methods in the machine learning domain. In this paper, it is proposed to assess the potential of wind power generation using the previously mentioned regression based machine learning techniques for the southern region of India, and the results are summarized.

2 Wind Power Technology

Wind energy technology captures the natural wind in our environment and converts it into mechanical energy. Differences in air pressure generate wind, and wind speeds differ depending on location, topography, and season. The apparatus that converts air velocity into power is known as a turbine. Turbines are enormous structures with multiple spinning blades; when the wind drives the blades to spin, they generate electrical energy, since they are connected to an electromagnetic generator [5] (Figs. 1 and 2).
Kinetic energy is the energy associated with wind movement and is given by

KE = (1/2) m V²  (1)

where m is the air mass in kg and V is the velocity of air in m/s.

Fig. 1 Layout diagram of wind power system



Fig. 2 Typical parts in wind power systems

Table 1 Parameters affecting wind power generation

Name of the feature   Description of the feature   Units
Wind_speed            Wind speed                   m/s
Wind_direction        Direction of wind            deg
Wind_energy           Energy output                kWh
Wind_power            Power output                 kW

Power = dE/dt  W  (2)

Power = (1/2) ρ A V³  W  (3)

where ρ is the air density in kg/m³ and A is the area of cross-section of blade movement in m² (Table 1).
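Equations (1)–(3) translate directly into code. In the sketch below, the standard air density of 1.225 kg/m³ and the 40 m blade radius are illustrative assumptions, not values from the paper.

```python
import math

def kinetic_energy(mass_kg, velocity_ms):
    """KE = (1/2) m V^2, as in Eq. (1)."""
    return 0.5 * mass_kg * velocity_ms ** 2

def wind_power(air_density, swept_area, velocity_ms):
    """P = (1/2) rho A V^3, as in Eq. (3): power grows with the cube of wind speed."""
    return 0.5 * air_density * swept_area * velocity_ms ** 3

rho = 1.225              # assumed standard air density, kg/m^3
area = math.pi * 40**2   # swept area for an assumed 40 m blade radius, m^2
print(wind_power(rho, area, 10.0) / 1e6)  # roughly 3.08 MW at 10 m/s
print(wind_power(rho, area, 20.0) / wind_power(rho, area, 10.0))  # 8.0
```

The cubic dependence is the key takeaway: doubling the wind speed multiplies the available power by 2³ = 8, which is why small errors in predicted wind speed translate into large errors in predicted power.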

3 Methodology

The intensity of wind in a region depends on latitude, time of year, and atmospheric conditions. Problems arise from the velocity of wind, temperature, wind direction, expensive energy storage, grid stability, and continuous fluctuations due to seasonal effects. Also, integrating a wind power system into the power grid as an emergency power source to cover the increasing demand is not directly technically feasible; such a structure affects the stability of the network. In short, fluctuations in weather conditions lead to uncertainty in wind system performance. It is in this aspect that accurate and precise models are required for estimation of wind power generation, especially for grid-connected large-scale systems. Earlier, the process of estimation depended on meteorological data from numerical weather prediction and satellite images illustrating cloud movement to predict wind velocity and the other dependent parameters [6]. To overcome the demerits of the earlier forecasting methods, the present study focuses on various machine learning based regression methods for predicting the performance of a wind power system. The proposed machine learning algorithms are used to predict the number of units generated and the wind power output from a wind power plant, which depend upon various independent variables such as wind direction and wind speed [7–12].

3.1 Linear Regression

This model determines the best straight line that fits the given data with the least error. Finding a linear/straight line between the predictor variables (wind direction and wind speed) and the response variable (number of billable units generated or output power) is known as linear regression. If Y is the dependent variable and X is an independent variable, then the population regression line is given by

Y = B0 + B1 X  (4)

where B0 is a constant and B1 is the regression coefficient.
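A minimal ordinary-least-squares fit of Eq. (4) can be written in pure Python. The wind speed and units-generated samples below are made up for illustration and chosen to lie exactly on a line.

```python
def fit_linear(x, y):
    """Ordinary least squares for Y = B0 + B1*X (Eq. 4)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # B1 = covariance(x, y) / variance(x); B0 makes the line pass through the means.
    b1 = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
          / sum((xi - mean_x) ** 2 for xi in x))
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical samples: wind speed (m/s) vs. units generated.
speed = [3.0, 5.0, 7.0, 9.0]
units = [6.0, 10.0, 14.0, 18.0]   # exactly linear: units = 2 * speed
b0, b1 = fit_linear(speed, units)
print(b0, b1)          # 0.0 2.0
print(b0 + b1 * 6.0)   # predicted units at 6 m/s -> 12.0
```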

3.2 Support Vector Regression

This is another supervised learning approach, used for both classification and regression problems. It finds the best hyperplane with the least error and fixes positive and negative boundaries around that hyperplane in its training phase. Then, in the testing phase, it checks on which side of the hyperplane a new point lies in order to predict its value. The decision surface separating the classes is a hyperplane of the form

W^T X + b = 0  (5)

where W is the weight vector, X is the input vector, and b is the bias.
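Testing which side of the hyperplane a point lies on reduces to the sign of W·X + b from Eq. (5). The 2-D weight vector, bias, and points below are arbitrary illustrative values, not a trained model.

```python
def decision(w, x, b):
    """Evaluate W^T X + b (Eq. 5); the sign tells which side of the
    hyperplane the point x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

w, b = [1.0, -1.0], 0.5   # assumed weights and bias, for illustration only
print(decision(w, [3.0, 1.0], b))  # 2.5  -> positive side
print(decision(w, [1.0, 3.0], b))  # -1.5 -> negative side
```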

3.3 K-Nearest Neighbour Algorithm

The K-nearest neighbour algorithm is based on the supervised learning technique and is one of the most basic machine learning algorithms. For the distance metric, one can use the Euclidean metric, and the prediction for a query is formed from its K nearest neighbours:

d(a, a′) = √((a1 − a′1)² + (a2 − a′2)² + · · · + (an − a′n)²)

P(b = i | A = a) = (1/K) Σ_{j∈X} I(b_j = i)  (6)

In this algorithm, the value of K plays a vital role: small K values provide the most adaptable fit, with low bias but large variance.
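The Euclidean distance of Eq. (6) and the K-nearest-neighbour idea can be sketched as a small regressor that averages the targets of the K closest training points. The (wind speed, wind direction) → power samples below are invented for illustration.

```python
import math

def euclidean(a, b):
    """d(a, a') = sqrt(sum (a_i - a'_i)^2), as in Eq. (6)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_x, train_y, query, k=3):
    """KNN regression: average the targets of the k nearest training points."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: euclidean(p[0], query))
    nearest = [y for _, y in ranked[:k]]
    return sum(nearest) / k

# Hypothetical (wind_speed, wind_direction) -> power output samples.
X = [(4.0, 90.0), (4.5, 95.0), (5.0, 100.0), (9.0, 180.0), (9.5, 175.0)]
y = [100.0, 110.0, 120.0, 400.0, 420.0]
print(knn_predict(X, y, (4.6, 96.0), k=3))  # 110.0: mean of the three low-speed points
```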

3.4 Decision Trees Regression

Decision trees are a non-parametric supervised learning technique used for classification and regression. The goal is to build a model that predicts the value of the target variable by learning simple decision rules derived from the data characteristics. It builds models as tree structures, decomposing a data set into smaller and smaller subsets while gradually growing an associated decision tree. The end result is a tree with decision nodes and leaf nodes. The information gain is given by

F(b, a) = F(b) − F(b|a)  (7)

i.e., the amount of information obtained for a random variable (b) is the reduction in uncertainty when observing another existing variable (a).
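Taking F in Eq. (7) to be Shannon entropy, the information gain can be computed directly. The tiny binary dataset below is an invented example: one attribute separates the classes perfectly, the other not at all.

```python
import math
from collections import Counter

def entropy(labels):
    """F(b): Shannon entropy of a list of labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """F(b, a) = F(b) - F(b|a), where `groups` partitions `labels`
    by the value of attribute a."""
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - conditional

labels = ["yes", "yes", "no", "no"]
# A perfectly informative attribute separates the classes entirely:
print(information_gain(labels, [["yes", "yes"], ["no", "no"]]))  # 1.0
# An uninformative split leaves the entropy unchanged:
print(information_gain(labels, [["yes", "no"], ["yes", "no"]]))  # 0.0
```

A decision tree grows by repeatedly choosing the attribute whose split yields the highest information gain at each node.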

4 Performance Indices

The performance of the proposed model is evaluated with the help of certain performance indices called evaluation metrics [9]. Values of mean absolute error, mean square error, and root mean square error close to zero mean that the predicted values are similar to the actual values. The magnitude of the difference between a prediction and the true value of an observation is referred to as the absolute error. Mean absolute error computes the average of the absolute errors for a set of predictions and observations to determine the magnitude of error for the entire set. The mean square error measures the average squared difference between the estimated value and the actual value. The root mean square error is the standard deviation of the residuals, where a residual is a measure of the distance of a data point from the regression line; in other words, it tells us how concentrated the data is around the best-fit line. The R2 score describes how well the regression model fits the observed data; a rating nearer to 1 usually indicates a more suitable model.

4.1 Mean Absolute Error (MAE)

It calculates the absolute difference between each actual (Ai) and predicted (Pi) data point. This difference gives the absolute error (Ei) made by the model. The sum of all absolute errors Ei divided by the total number of data points is known as the mean absolute error. Its mathematical representation is shown in Eq. (8).

Mean Absolute Error = (1/n) Σ_{i=1}^{n} |Ai − Pi|  (8)

4.2 Mean Square Error (MSE)

This metric is used as a loss function. It finds the squared distance between actual and predicted data points; the square operation avoids the cancellation of negative terms. Its mathematical representation is shown in Eq. (9).

Mean Square Error = (1/n) Σ_{i=1}^{n} (Ai − Pi)²  (9)

4.3 Root Mean Square Error (RMSE)

It tells how closely the data is scattered around the line. It is measured as the square root of the MSE, as shown in Eq. (10).

Root Mean Square Error = √((1/n) Σ_{i=1}^{n} (Ai − Pi)²)  (10)

4.4 R2 Score

This metric describes the performance of the regression method and is the key output of a regression analysis. It is defined as the fraction of the dependent variable's variation that can be predicted from the independent variable. The coefficient of determination ranges from 0 to 1; a higher R2 score indicates that the model better fits the observed data points. The calculation of the R2 score divides the sum of squared residuals (SSR) by the total sum of squares (SST) and subtracts the result from 1, as shown in Eq. (11).

R2 Score = 1 − SSR/SST = 1 − [Σ_{j=1}^{n} (Aj − P̂j)²] / [Σ_{j=1}^{n} (Aj − Ā)²]  (11)
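Equations (8)–(11) can be implemented in a few lines of plain Python. The actual/predicted values below are made-up illustrative numbers.

```python
import math

def mae(actual, predicted):
    """Eq. (8): mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Eq. (9): mean square error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Eq. (10): root mean square error."""
    return math.sqrt(mse(actual, predicted))

def r2_score(actual, predicted):
    """Eq. (11): 1 - SSR/SST."""
    mean_a = sum(actual) / len(actual)
    ssr = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ssr / sst

actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 7.5, 9.0]
print(mae(actual, predicted))       # 0.25
print(mse(actual, predicted))       # 0.125
print(rmse(actual, predicted))      # ~0.3536
print(r2_score(actual, predicted))  # 0.975
```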

5 Results and Discussions

For performing the estimation, day-wise data at 10-minute intervals is considered from wind power plants operating in South India, with various practical parameters such as voltage, current, power in AC and DC quantities, and frequency. The dataset consists of two features and 3962 instances. Of the total data collected, 80% is used for training and 20% for testing. The dependent variables Wind_Energy (number of units generated) and Wind_Power (output power) rely on various independent variables such as wind direction and wind speed. Table 2 and Fig. 4 illustrate the mean absolute error, mean square error, root mean square error, and R2 score of all four proposed regression methods. From the performance indices one can observe that the mean absolute error, mean square error, and root mean square error of K-nearest neighbour regression are higher than those of all other regression methods, while the R2 score of support vector regression is lower than that of the other methods. Support vector regression performs better than decision tree regression and K-nearest neighbour regression. Of all four methods, it is evident that linear regression performs better than the other methods in this study. The error analysis for the linear regression model of the proposed machine learning algorithm is depicted in Fig. 8 in terms of actual and predicted quantities.
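The 80/20 train/test split described above can be sketched as follows; the seeded shuffle and the synthetic stand-in rows are illustrative choices, not the exact procedure or data used in the study.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the dataset reproducibly, then hold out the last
    `test_fraction` of rows for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Stand-in for the 3962-instance dataset of (wind_speed, wind_direction) rows.
dataset = [(i * 0.1, i % 360) for i in range(3962)]
train, test = train_test_split(dataset)
print(len(train), len(test))  # 3169 793
```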

Table 2 Performance indices values for regression models

S. No.   Regression name                  Mean absolute error   Mean square error   Root mean square error   R2 score
1        Linear regression                0                     0                   0                        1.000000
2        Support vector regression        0.04                  0.00                0.05                     0.997769
3        K-nearest neighbour regression   21.76                 1258.94             35.48                    0.999914
4        Decision trees regression        4.56                  73.29               8.56                     0.999995

Fig. 3 Performance indices illustration for regression models

6 Conclusions

In this paper, a modest attempt has been made to estimate the potential of wind power generation for the southern region of India. Since large-scale grid-interconnected wind power generation systems are increasing day by day, the stable operation of the grid depends strongly on the amount of wind energy penetrating into the grid. This is essential not only for stable operation but also for generation allocation and load scheduling, and achieving it requires a precise method for estimating the potential. The methodology used in this study presents efficient machine learning based regression methods, viz. linear, support vector, K-nearest neighbour, and decision tree regression models, for prediction of the number of units generated and the power output. The efficiency of these algorithms was evaluated using key performance indicators such as mean absolute error, mean square error, root mean square error, and R2 score. It has been observed that the linear regression model performs better than all the other methods considered in this study. The techniques and methodology used proved to be efficient and effective for potential assessment of wind energy systems. It is to be noted that the methodology presented can be extended to all renewable power generation sources for addressing the concerns regarding grid operation. Further, the proposed methodology is extremely useful during the planning stage for capacity fixation, generation, and load scheduling.

References

1. CEA, Load generation balance report, New Delhi, India, 2021
2. Peiris AT, Jayasinghe J, Rathnayake U () Forecasting wind power generation using artificial
neural network: "Pawan Danawi"—a case study from Sri Lanka. J Electr Comput Eng
3. Buturache AN, Stancu S (2021) Wind energy prediction using machine learning. Low Carbon
Economy 12:1–21
4. Singh U, Rizwan M, Alaraj M, Alsaidan I (2021) A machine learning-based gradient
boosting regression approach for wind power production forecasting: a step towards smart
grid environments. Energies 14:5196
5. Xiaoming W, Yuguang X, Bo G, Yuanjie Z, Fan C (2018) Analysis of factors affecting wind
farm output power. In: 2nd IEEE conference on energy internet and energy system integration
6. Elyasichamazkoti F, Khajehpoor A (2021) Application of machine learning for wind energy
from design to energy-water nexus: a survey. Energy Nexus 2
7. Deng YC, Tang XH, Zhou ZY, Y Yang, Niu F (2021) Application of machine learning
algorithms in wind power: a review. Energy Sources Part A 1–22
8. Eyecioglu O, Hangun B, Kayisli K, Yesilbudak M (2019) Performance comparison of different
machine learning algorithms on the prediction of wind turbine power generation. In: 8th Inter-
national conference on renewable energy research and applications, Brasov, ROMANIA, Nov.
pp 3–6
9. Khosravi A, Koury RNN, Machado L, Pabon JJG (2018) Prediction of wind speed and wind
direction using artificial neural network, support vector regression and adaptive neuro-fuzzy
inference system. Sustain Energ Technol Assess 25:146–160
10. Goh HH, He R, Zhang D, Liu H, Dai W, Lim CS, Kurniawan TA, Teo KTK, Goh KC
(2021) Short-term wind power prediction based on pre-processing and improved secondary
decomposition. J Renew Sustain Energy 13:053302
11. Deng X, Shao H, Hu C, Jiang D, Jiang Y (2020) Wind power forecasting methods based on deep
learning: a survey, computer modeling in engineering and sciences. CMES 122(1):273–301
12. Qureshi AS, Khan A, Zameer A, Usman A (2017) Wind power prediction using deep neural
network based meta regression and transfer learning. Appl Soft Comput 58:742–755
OCR-LSTM: An Efficient Number Plate
Detection System

M. Indrasena Reddy, K. Srinivasa Reddy, B. Rakesh, and K. Prathima

Abstract The traffic on roads is increasing day by day, and it is becoming very difficult to manually track the numbers of vehicles that violate traffic rules. To address this issue, researchers have proposed various methodologies to detect number plates automatically, but the major issue with the existing methodologies is their low accuracy. To overcome this issue, a new efficient methodology integrated with image processing is proposed. The proposed system captures the number plate from a video of the vehicle. After capturing the image of the number plate region, the long short-term memory (LSTM) algorithm, an optical character recognition algorithm, is used for recognizing the characters from the captured number plate. The recognized number is compared with the numbers in the database to find the details of the vehicle. The proposed system is implemented with python-tesseract and has been deployed in a real-time scenario at a busy security gate. The proposed system was compared with the existing methodologies and showed better accuracy.

Keywords OCR · LSTM · Deep learning · Segmentation · Recognition

1 Introduction

In a number plate detection system (NPDS), the computer detects vehicle numbers from digital images of the vehicle. The number plate detection system is integrated into various applications such as traffic cameras, security cameras at buildings,

M. Indrasena Reddy (B) · K. Srinivasa Reddy · B. Rakesh · K. Prathima
Computer Science & Engineering Department, BVRIT HYDERABAD College of Engineering
for Women, Hyderabad, Telangana, India
e-mail: indrasenareddy.m@bvrithyderabad.edu.in
K. Srinivasa Reddy
e-mail: srinivasareddy.k@bvrithyderabad.edu.in
K. Prathima
e-mail: prathima.k@bvrithyderabad.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 135
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_13
136 M. Indrasena Reddy et al.

industries, companies, and business areas. Any NPDS consists of three phases, namely detection of the number plate, segmentation of characters, and recognition of characters. Most commercial buildings also need a system in which only authorized vehicles are allowed to park inside the parking areas. These systems need to be efficient, accurate, and fast, as well as robust in nature.
After capturing the vehicle images, the images need to be processed in order to extract information such as the number of the vehicle. Processing of the image consists of several phases, such as segmentation of the image and enhancement of the image by filtering out unnecessary parts. Before processing, the image needs to be preprocessed by operations such as conversion of the RGB image to a grayscale image. After conversion to grayscale, a blurring operation is applied to the resultant image, after which thresholding and contouring operations are performed. A bilateral filter is used for filtering and removing unwanted parts of the image, such as redundant fragments, and the Canny edge detection algorithm is used for detecting the edges of the image. In an NPDS, the preprocessing operation is very important for extracting the text from the image.
Extracting text from documented images such as handwritten, skewed and typed
images is considerably faster than extracting text from number plates. Text can be
extracted from documented images easily because they have fixed parameters,
whereas number plates have variable parameters: the images are captured from
different distances, and their clarity is lower than that of document images. Hence,
extracting text from number plates is more difficult than from document images.
The preprocessed image is given to Tesseract for text extraction. Tesseract is a
package that contains an LSTM, a deep learning technique, for extracting text from
images; the LSTM detects text with greater accuracy than the existing algorithms.

1.1 Tesseract

The Tesseract package contains two components: libtesseract, an OCR engine, and
tesseract, a command line program. The package can recognize text in more than
100 languages and can be trained to recognize text in a new language. It can
produce its output in various formats such as HTML, plain text and PDF.
OCR-LSTM: An Efficient Number Plate Detection System 137

1.2 Opencv

OpenCV is an open-source tool for both commercial and academic use, aimed at real
time computer vision. It has several applications such as object identification,
facial recognition, motion recognition and motion tracking. Here, the OpenCV
package is used to convert the RGB image into a grayscale image, which is then
filtered with a bilateral filter to remove the unwanted parts of the image, also
called noise.

1.3 Lstm

The LSTM is a specialized version of the RNN; problems that cannot be solved with
an RNN can be solved with an LSTM. An RNN has only short-term memory, whereas an
LSTM also has long-term memory. The LSTM is a deep learning technique mainly used
for sequence detection.

2 Literature Review

Authors of [1] have proposed an ALPR methodology that uses the YOLO object
detection algorithm. A CNN is trained, and a fine-tuning operation is then performed
at each stage of the ALPR to make the methodology robust under different environ-
mental conditions such as low lighting, low camera clarity and unclear backgrounds.
For character recognition and segmentation, a two-step approach is proposed. The
methodology is evaluated on the SSIG dataset, which consists of 2000 frames, but it
achieves low accuracy.
Authors of [2] have proposed a real-time object detection methodology using a region
proposal network (RPN). The RPN detects the object bounds and generates
high-quality region proposals, but it does not work well on smaller images.
Authors of [3] have proposed an ALPR methodology for detecting Bangladeshi
vehicle number plates using a neural network and chain codes. In Bangladesh, no
common pattern is followed: some plates are in the Bangla language and some in
English, and some plates span two lines while others are on a single line. Character
recognition, segmentation and extraction are proposed for the Bangla language,
using a Sobel filter, with character recognition performed using stored knowledge
and chain code generation. The methodology can be used for only two languages.
Authors of [4] have proposed a methodology, DL-ALPR, for detecting Brazilian
number plates using a deep learning technique on public datasets. The methodology
shows high detection accuracy for number plates with 5 characters, but not for
plates with 7 characters.
Deep neural networks are generally very difficult to train. Authors of [5–7] have
proposed a learning framework that can be trained easily and used for object
detection, but it requires more epochs to achieve high accuracy.

2.1 Problem Statement

From the existing works, it is observed that most methods achieve low accuracy and
fail to detect the complete number on the number plate. Some methods are limited
to language-specific number plates, and some detect text that is not actually
present on the number plate.

2.2 Objective

To overcome the above limitations, the OCR-LSTM methodology is proposed, which
can:
. Enhance the detection accuracy.
. Reduce the detection time.
. Avoid recognizing unwanted text around the number plate.

3 System Model

The proposed OCR-LSTM is an efficient number plate recognition system. The
methodology first converts the RGB image into a gray scale image, which is then
filtered with a bilateral filter to remove the unwanted parts of the image, that is,
the noise. The edges of the number plate are detected using the Canny edge detection
algorithm, after which contours around the image are drawn using the contours
function. The contour points remove the unwanted parts of the image; in the
proposed methodology, the contour points are restricted to 20 points, so that only
the number plate is extracted from the total image. The image is then given to the
OCR-LSTM algorithm, which identifies the number on the number plate. Figure 1
describes the OCR-LSTM procedure.

Fig. 1 Block diagram of


OCR-LSTM methodology

3.1 Conversion of RGB Image to Gray Scale Image

The RGB image can be converted into a gray scale image in two ways: the average
method and the weighted method.

3.1.1 Average Method

In an RGB image, every pixel has three colors: red, green and blue, with values
ranging from 0 to 255. To convert the RGB image to a gray scale image, each pixel's
color value is divided by 3 and the resulting values are added. Equation (1) gives
the conversion formula for the average method:

Gray scale = R/3 + G/3 + B/3 (1)

3.1.2 Weighted Method

The weighted method is also called as luminosity method. Here the colors are
weighted according to the wave lengths. Figure 2 depicts the weighing method-
ology for conversion of RGB image to gray scale image. Here the average will not
be taken into the consideration. As per the Fig. 2, the red color is having higher wave
length than the green and blue color. So for converting the RGB image into gray
scale image 30% of red color, 53% of green color and 17% of blue color value will
be taken into the consideration.
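Both conversions can be sketched in a few lines of plain Python (the weighted percentages are the ones given above; the sample pixel is purely illustrative):

```python
def to_gray_average(r, g, b):
    # Average method: each channel contributes equally (Eq. 1).
    return (r + g + b) / 3

def to_gray_weighted(r, g, b):
    # Weighted (luminosity) method with 30% red, 53% green, 17% blue.
    return 0.30 * r + 0.53 * g + 0.17 * b

pixel = (200, 100, 50)               # one RGB pixel
avg = to_gray_average(*pixel)        # ≈ 116.67
lum = to_gray_weighted(*pixel)       # 121.5
```

Because green is weighted most heavily, the two methods can produce noticeably different gray levels for the same pixel.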

Fig. 2 Weighted method

Fig. 3 Bilateral filter


algorithm

Table 1 Bilateral algorithm filter parameters


Symbol Parameter Gray scale RGB image
W Window size 5 5
σd Spatial domain standard deviation 3 3
σr Intensity domain standard deviation 0.1 10
N Gaussian noise intensity 0.03 0.03

3.2 Bilateral Filter

The bilateral filter is an edge-preserving, non-linear, noise-removing filter. It
replaces each pixel intensity with a weighted average of the intensities of nearby
pixels, with the weights calculated from the Gaussian distribution, the Euclidean
distance, the depth distance and the color intensity. Figure 3 depicts the bilateral
filter algorithm for removing the noise from the image.
Table 1 describes the bilateral filter parameters for the gray scale and color
images.
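A minimal pure-Python sketch of the filter follows, using the grayscale parameters from Table 1 (window W = 5, σd = 3, σr = 0.1) and assuming intensities normalized to [0, 1]; the tiny test image is an illustrative assumption:

```python
import math

def bilateral_filter(img, window=5, sigma_d=3.0, sigma_r=0.1):
    """Edge-preserving smoothing: each pixel becomes a weighted average of
    its neighbours, with weights falling off with both spatial distance
    (sigma_d) and intensity difference (sigma_r)."""
    h, w = len(img), len(img[0])
    half = window // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num, den = 0.0, 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        # Spatial closeness term (Gaussian in distance).
                        ws = math.exp(-(dx * dx + dy * dy) / (2 * sigma_d ** 2))
                        # Intensity similarity term (Gaussian in grey-level gap).
                        diff = img[ny][nx] - img[y][x]
                        wr = math.exp(-(diff * diff) / (2 * sigma_r ** 2))
                        num += ws * wr * img[ny][nx]
                        den += ws * wr
            out[y][x] = num / den
    return out

# Two flat halves (0.2 and 0.8) with one noisy pixel: the noise is
# smoothed away while the step edge between the halves is preserved.
img = [[0.2] * 4 + [0.8] * 4 for _ in range(8)]
img[4][1] = 0.25
smoothed = bilateral_filter(img)
```

Because the intensity term nearly zeroes the weight of pixels across the 0.2/0.8 step, the edge stays sharp while the 0.25 outlier is pulled back toward its 0.2 neighbourhood.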

3.3 Lstm

The LSTM was introduced in 1997 by Hochreiter and Schmidhuber. It has solved a
variety of problems. In an LSTM, information is remembered for longer periods
than usual. Figure 4 depicts the LSTM cell architecture.
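To illustrate the gating mechanism that gives the LSTM cell its long-term memory, here is a minimal single-unit forward step in plain Python; the weight values are purely illustrative assumptions, not trained parameters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, w):
    """One LSTM time step (scalar toy version). The gates decide what to
    forget, what to write, and what to expose, which is what lets the cell
    state c carry information across long sequences."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    c = f * c_prev + i * g        # new cell state (long-term memory)
    h = o * math.tanh(c)          # new hidden state (short-term output)
    return h, c

# Illustrative weights only.
w = dict(wf=0.5, uf=0.1, bf=0.0, wi=0.6, ui=0.2, bi=0.0,
         wg=0.9, ug=0.3, bg=0.0, wo=0.4, uo=0.1, bo=0.0)
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:       # a short input sequence
    h, c = lstm_cell_step(x, h, c, w)
```

The cell state `c` is only ever scaled by the forget gate and incremented by the gated candidate, which is why gradients (and remembered information) survive over many steps.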

Fig. 4 LSTM cell architecture

4 Results and Discussion

The proposed OCR-LSTM methodology is simulated using python. Figure 5 depicts


the original image that is given as the input.
Figure 6 depicts the gray scaled image. The RGB image is converted into gray
scale image using the weighted method.
The gray scaled image will be passed as input to the bilateral filter algorithm.
The bilateral filter algorithm will remove the noise from image. Figure 7 depicts the
image after applying bilateral filter.

Fig. 5 Original image



Fig. 6 Gray scaled image

Fig. 7 Image after applying bilateral filter

The result of the bilateral filter will be given as the input to the canny edge detection
algorithm. Figure 8 depicts the result of canny edge detection algorithm.
The canny edged image will be given as input for drawing contours. Figure 9
depicts the resultant image after drawing contours.
After drawing contours, the number plate will be detected. Figure 10 depicts the
detected number plate.

Fig. 8 Canny edged image

Fig. 9 Image after drawing contours

The extracted number plate will be given as input to the LSTM algorithm. The
LSTM algorithm will extract the number from the image and extracted number will
be compared with the details in the database to display the information regarding the
vehicle.
The result of the proposed OCR-LSTM methodology is compared with the
existing methodologies in terms of extraction of the number plate region,
segmentation rate and recognition rate. Figure 11 depicts the extracted information
from the database, and Fig. 12 depicts the performance analysis of the proposed
OCR-LSTM compared with the existing methodologies.

5 Conclusions

In this paper, a deep learning technique is integrated to detect number plates.
The proposed OCR-LSTM uses the LSTM algorithm, which is a deep learning-based
algorithm. As the LSTM algorithm remembers information for a longer duration,
the proposed algorithm shows much better performance than the existing algorithms,
recognizing the number plate with high accuracy in less time. The use of the
bilateral filter and contours removes the unwanted space around the number plate,
which enhances the accuracy of number plate detection and reduces the time
consumption. The proposed OCR-LSTM methodology shows 98% detection accuracy,
which is much better than the existing methodologies, and it works efficiently
across all types of environments and images.

Fig. 10 Detected number plate



Fig. 11 Extracted information from the database

Fig. 12 Performance
analysis of OCR-LSTM

References

1. Laroca R, Severo E, Zanlorensi LA, Oliveira LS, Gonçalves GR, Schwartz WR et al. (2018) A
robust real-time automatic license plate recognition based on the YOLO detector. Int Joint Conf
Neural Netw (IJCNN) 1–10
2. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with
region proposal networks. Adv Neural Info Proc Syst 91–99
3. Ghosh AK, Sharma SK, Islam MN, Biswas S, Akter S (2019) Automatic license plate recognition
(alpr) for bangladeshi vehicles. Global J Comput Sci Technol
4. Montazzolli S, Jung CR (2017) Real-time Brazilian license plate detection and recognition using
deep convolutional neural networks. In: SIBGRAPI conference on graphics patterns and images,
pp 55–62
5. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

6. Li H, Wang P, Shen C (2017) Towards end-to-end car license plates detection and recognition
with deep neural networks. CoRR abs/1709.08828
7. Masood SZ, Shu G, Dehghan A, Ortiz EG (2017) License plate detection and recognition using
deeply learned convolutional neural networks. arXiv preprint 1703.07330
Artificial Neural Network Alert Classifier
for Construction Equipments Telematics
(CET)

Mohan Gopal Raje Urs, S. P. Shiva Prakash, and Kirill Krinkin

Abstract The Internet of Things (IoT) connects devices to one another via a cloud
or centralized platform. It is useful in many applications that deal with a variety
of services, such as sharing information from one device to another. Built on these
concepts is telematics, which deals with the long-distance transmission of
computerized information. It provides navigation, routing, or network-related
information for applications of many service providers, such as transportation,
logistics, and travel. It faces several challenges, namely prediction of failures
in the system, diagnostic analysis, etc. Therefore, predictive analysis of CET is
needed to analyse failures in the system. Hence, the proposed work uses an
artificial neural network (ANN) to alert the system. The experiment is conducted
using an ANN on a CET data set and obtains 100% accuracy. Various machine learning
(ML) algorithms, namely DT, KNN, and Naive Bayes classifiers, are also analysed
and obtain accuracies of 93.72%, 93.19%, and 62.57%, respectively.

1 Introduction

The Internet of Things (IoT) connects devices to one another via a cloud platform
or centralized platform. It is useful in many applications that deal with a variety
of services, such as sharing information from one device to another. Built on these
concepts is telematics, which deals with the long-distance transmission of
computerized information [1]. It provides navigation, routing, or
M. G. R. Urs · S. P. Shiva Prakash (B)


Department of Information Science and Engineering, JSS Science and Technology University,
Mysuru, Karnataka, India
e-mail: shivasp@jssstuniv.in
K. Krinkin
Department of Software Engineering and Computer Applications, Saint Petersburg
Electrotechnical University “LETI”, Saint Petersburg, Russia
e-mail: kirill@krinkin.com


network-related information for many applications of service providers such as
transportation, logistics, travelling, and more. Telematics systems deliver a broad
range of mission-critical benefits to heavy equipment and construction industry
owners and operators. Telematics has been used in the logistics industry for years
to determine the location of trucks, containers, and other assets. The technology
serves a similar purpose within the heavy equipment and construction industry, but
with additional benefits. Asset tracking offers improved security and protection
against theft, misuse, and misappropriation of vehicles and equipment.
However, it also provides the ability to determine whether equipment is being
intelligently and efficiently allocated and to make other decisions in CET.
Therefore, to support decision-making in telematics, intelligence needs to be
built into the system using an artificial intelligence model. Hence, the proposed
work concentrates on analysing telematics data using an artificial neural network
(ANN), and it also evaluates various machine learning algorithms such as decision
tree (DT), Naive Bayes (NB), and K-nearest neighbour (KNN) [2, 3].
The organization of the work is as follows: Sect. 2 covers related works, Sect. 3
the problem statement, Sect. 4 the system model, Sect. 5 the design and
methodology, Sect. 6 the results and discussion, and Sect. 7 the conclusion.

2 Related Works

Authors Aslan and Koo worked on optimizing operation planning and establishing
long-term strategic organization in telematics [1]. Authors Chan and Louis worked
on a novel use of telematics data, which is currently used only for equipment-centric
analysis [4]. Authors Lee et al. worked on a GPS-based fleet telematics system for
heavy earthwork equipment which can analyse time log information without utilizing
any other on-board sensors [5]. Authors Slaton et al. worked on automated activity
recognition systems for tracking and monitoring equipment [6]. Authors Lekan et al.
worked on a framework for sustainable innovation and a system for the inclusive
monitoring of innovations in the design and planning of construction maintenance
[7]. Authors Singh et al. worked on the realistic demand of handling and
manipulating humongous data arriving every few seconds from several vehicles
through IoT, the NoSQL CloudantDB database, and cloud computing [8]. Authors
Aldelaimi et al. worked on objects belonging to a community collaborating with
each other to collect, manipulate, and share interesting content and provide
services to enhance the quality of human interactions in smart cities [9]. Authors
Hussein et al. worked on the resource capabilities of context-awareness, in
addition to the user-friendliness and connectivity proposed as part of its
infrastructure [10]. Authors Barrett-Powell et al. worked on a lightweight platform
to facilitate experiments and demonstrations [11]. Authors Hao et al. worked on a
novel diversified top-k maximal clique detection approach based on formal concept
analysis [12]. Authors Bruno et al. worked on sensors that monitor a person's
position at the topological level and generate tracking signals [13]. Authors Huk
and Kurowski worked on analysing telematics systems used in transport and
forwarding and proposing improvements in the form of central solutions [14].
Authors Hu et al. worked on the spatio-temporal distributions of different
parameters including traffic speeds, fuel economy, and emissions [15].

3 Problem Statement

The usage of telematics in the field of construction equipment technology improves
the quality of services. The sensors connected to construction equipment generate
a huge volume of data that needs to be diagnosed to detect anomalies and alert the
system. Hence, there is a need for an efficient machine learning (ML) algorithm
that classifies alerts based on the sensor features, with its performance compared
against other ML algorithms. This work focuses on proposing an artificial neural
network (ANN) as the best classifier for CET data.

4 System Model

The CET system model SM has many equipments E, which provide services Sr and
applications Ap. The independent equipments E within the given set of networks
Nx are variously distributed in a space DNx. The homogeneous equipments Ho and
heterogeneous equipments Ht have sensors Sn that gather data including vehicle
location, driver behaviour, engine diagnostics, and vehicle activity, and visualize
this data on software platforms that help fleet operators manage their resources.
In construction equipment telematics, data is based on the communication of
services and applications via global positioning system (GPS) tracking T; using
this technology, supply chain management functions in construction services. The
variables and their descriptions used to model the system are given in Table 1.
The objects are distributed randomly in the environment. The equipments provide
services and applications in the given system network. Hence, the system model is
as shown in Eq. 1.
SM = lim(Ho,Ht) Sn(DNx(E)) : ∀(Sr, Ap) = f((DNx · E(Sr, Ap)) / (T · Sn)) (1)

4.1 Problem Formulation

In CET, the equipments in the given network are distributed randomly and obtain
data from sensors. The applications and services act as an interface to fetch the
provided services for each application from the equipments. These equipments
provide a service with respect to time for an application in CET.
150 M. G. R. Urs et al.

Table 1 Variables and its descriptions


Variables Descriptions
SM System model
E Equipments
Ho Homogeneous equipments
Ht Heterogeneous equipments
T Tracking of equipments
Nx Networks
DNx Distribution in a space
Sr Services
Ap Applications
Sn Sensors
AM Alert model

The CET data set contains environment data that helps the user make a decision
through an alert model Am, using decision support based on an artificial
intelligence model (ANN). The model provides the decision based on environment
information such as location status L, time status T, ignition status I, power
status P, speed status S, and fuel status F. The data is analysed through a
predictive modelling technique using the artificial neural network algorithm.
Hence, the problem is formulated as Eq. 2.
Am = lim(Ho,Ht) (1/T)(DNx, Sr, Ap) = f((L · S · I · P · F) / T) (2)

The objective function can be defined such that the proposed alert model (Am)
takes a decision with respect to time (T) in the network (DNx) for the services
(Sr) and applications (Ap). Therefore, the environment information E is as shown
in Eq. 3.

Am = lim(n→∞) Σ(n=Ho..Ht) fT(E) (3)

Subject to the environment information having location status L, time status T,
ignition status I, power status P, speed status S, and fuel status F, as shown
in Eq. 4.

E = (L + P + S + I + F) (4)

5 Proposed Design and Methodology

This section explains the proposed work carried out to conduct an experiment on
CET data.
Construction telematics is all about data. When a piece of construction equipment
or asset is called into service, it can be monitored by software solutions that
provide a whole host of information. These areas generate an immense amount of
information with wide-ranging implications and applications, from reducing engine
idling to identifying the need for further operator training or even investing in
alternative-energy machines such as electric vehicles. Further, under the concept
of predictive modelling, the data is analysed using an artificial neural network
(ANN) that produces the corresponding information to alert the system with respect
to time. Thus, the proposed ANN is based on the services in CET, helping to make
decisions according to the alerts raised by CET applications.
The artificial intelligence model provides the decision based on environment
information such as location status L, time status T, ignition status I, power
status P, speed status S, and fuel status F; the data is analysed through a
predictive modelling technique using the artificial neural network (ANN)
algorithm, as in Eq. 2. The ANN contains three hidden layers with the ReLU
activation and one sigmoid output function for the 18-dimensional data; it is
optimized using Adam, with the error measured as mean square error, and the
accuracy is obtained. The design and working methodology of the CET environment
are shown in Fig. 1.

Fig. 1 Proposed design and methodology



Fig. 2 Construction
equipment telematics
proposed methodology

Hence, the methodology starts with pre-processing of the CET data to normalize the
null values and missing values in the data set. Next, features are selected based
on the equipment, power, ignition, location, engine, and application models. The
decision is then made using the artificial neural network model, which has hidden
layers with the rectified linear unit (ReLU) activation and one output layer using
the sigmoid function to provide the alert system for the CET environment. The
methodology of the CET environment is shown in Fig. 2.
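The forward pass of such a network can be sketched in plain Python. This is not the trained CET model; the layer sizes, random weights, and the sample sensor vector are illustrative assumptions, with the six features matching L, T, I, P, S, and F from Eq. 2:

```python
import math
import random

random.seed(0)

def relu(v):
    return [max(0.0, x) for x in v]

def dense(v, weights, biases):
    # Fully connected layer: out_j = sum_i v_i * w[i][j] + b_j
    return [sum(v[i] * weights[i][j] for i in range(len(v))) + biases[j]
            for j in range(len(biases))]

def make_layer(n_in, n_out):
    w = [[random.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    b = [0.0] * n_out
    return w, b

# Three ReLU layers feeding a single sigmoid output, mirroring the
# architecture described above; the sizes are illustrative.
hidden = [make_layer(6, 8), make_layer(8, 8), make_layer(8, 4)]
out_w, out_b = make_layer(4, 1)

def predict_alert(features):
    v = features
    for w, b in hidden:
        v = relu(dense(v, w, b))
    z = dense(v, out_w, out_b)[0]
    return 1.0 / (1.0 + math.exp(-z))  # probability that an alert is raised

# One hypothetical normalized sensor vector: L, T, I, P, S, F.
p = predict_alert([0.4, 0.5, 1.0, 0.9, 0.3, 0.7])
```

In training, the sigmoid output would be compared against the alert label with a mean square error loss and the weights updated with Adam, as stated above.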

6 Results and Discussion

6.1 Data Set

The CET data set contains data from real IoT objects over a period of 36 h,
comprising 12,000 values categorized into several service data models, namely the
equipment, power, ignition, location, engine, and application models, together
with the alert system. The equipment model has equipment id and equipment name;
the power model has main power and status of power; the ignition model has
ignition status, vehicle status, digital input, speed status, and conditions; the
location model has time, latitude, and longitude; and the engine model has fuel
status, temperature, battery status, and battery alert.

6.2 Results

The proposed work predicts the services in the CET environment. This experiment
helps to decide the alerts based on a knowledge model using ANN models. The
conventional ANN is used to analyse the sensor data that is encoded with the
service. The data is categorized into two types, target and features: there are 6
features corresponding to the sensor values, and the alert target has 3 classes,
namely good, bad, and average. The classes and features are split into a training
set of 70% and a testing set of 30%. The neural network model has three ReLU
activation layers and one softmax activation output layer; training this network
obtains an accuracy of 100%. With the same 70%/30% training/testing split, the
DT, KNN, and Naive Bayes classifiers obtain accuracies of 93.72%, 93.19%, and
62.57%, respectively, as given in Table 2 (Fig. 3).
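The metrics reported in Table 2 can all be computed from the counts of correct and incorrect predictions. A minimal sketch on toy labels (binary for simplicity; these are illustrative values, not the actual CET predictions) is:

```python
def classification_metrics(y_true, y_pred, positive=1):
    # Count true positives, false positives, false negatives, and matches.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = correct / len(y_true)
    return precision, recall, f1, accuracy

# Toy labels: 1 = alert, 0 = no alert.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
prec, rec, f1, acc = classification_metrics(y_true, y_pred)
```

For multi-class targets such as good/bad/average, the same counts would be taken per class and then averaged.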

Table 2 Results
Algorithm Precision Recall F1-score Accuracy
DT 0.88 0.88 0.96 0.93
NB 0.58 0.98 0.92 0.62
KNN 0.78 0.58 0.77 0.93
ANN 1.00 1.00 1.00 1.00

Fig. 3 Accuracy

7 Conclusions

The proposed work is carried out on the CET data set using an artificial neural
network to alert the system. The experiment conducted using the ANN obtains an
accuracy of 100%. Various machine learning (ML) algorithms, namely DT, KNN, and
Naive Bayes classifiers, are also analysed and obtain accuracies of 93.72%,
93.19%, and 62.57%, respectively. In future work, the data set will be enlarged
to analyse the CET environment further, and the performance of various deep
learning techniques will also be analysed.

Acknowledgements This work was carried out under the “Development program of ETU ‘LETI’
within the framework of the program of strategic academic leadership” Priority-2030 No. 075-15-
2021-1318 on 29 Sept 2021.

References

1. Aslan B, Koo DH (2012) Productivity enhancement for maintenance equipment operations


using telematics technology. In: Construction research congress 2012: construction challenges
in a flat world, pp 971–980
2. Le LT, Nguyen H, Dou J, Zhou J (2019) A comparative study of PSO-ANN, GA-ANN, ICA-
ANN, and ABC-ANN in estimating the heating load of buildings’ energy efficiency for smart
city planning. Appl Sci 9(13):2630
3. Drewil GI, Al-Bahadili RJ (2021) Forecast air pollution in smart city using deep learning
techniques: a review. Multicult Educ 7(5)
4. Chan K, Louis J (2017) Leveraging telematics and real-time sensor data to increase safety of
equipment-intensive construction operations. In: Proceedings of the Canadian society for civil
engineering annual conference and general meeting
5. Lee SS, Park SI, Seo J (2018) Utilization analysis methodology for fleet telematics of heavy
earthwork equipment. Autom Constr 92:59–67
6. Slaton T, Hernandez C, Akhavian R (2020) Construction activity recognition with convolutional
recurrent networks. Autom Constr 113:103138
7. Lekan A, Clinton A, James O (2021) The disruptive adaptations of construction 4.0 and industry
4.0 as a pathway to a sustainable innovation and inclusive industrial technological development.
Buildings 11(3):79
8. Singh P, Suryawanshi MS, Tak D (2019) Smart fleet management system using IoT, computer
vision, cloud computing and machine learning technologies. In: 2019 IEEE 5th international
conference for convergence in technology (I2CT). IEEE, pp 1–8
9. Aldelaimi MN, Hossain MA, Alhamid MF (2020) Building dynamic communities of interest
for internet of things in smart cities. Sensors 20(10):2986
10. Hussein D, Han SN, Lee GM, Crespi N, Bertin E (2017) Towards a dynamic discovery of smart
services in the social internet of things. Comput Electr Eng 58:429–443
11. Barrett-Powell K, Furby J, Hiley L, Vilamala MR, Taylor H, Cerutti F et al (2020) An
experimentation platform for explainable coalition situational understanding. arXiv preprint
arXiv:2010.14388
12. Hao F, Pei Z, Yang LT (2020) Diversified top-k maximal clique detection in social internet of
things. Future Gener Comput Syst 107:408–417
13. Bruno B, Giuni A, Mastrogiovanni F, Reboscio E, Scalmato A, Sgorbissa A (2020) U.S. patent
no. 10,682,097. U.S. Patent and Trademark Office, Washington, DC

14. Huk K, Kurowski M (2022) The use of telematics systems in transport and forwarding manage-
ment. In: 5th EAI international conference on management of manufacturing systems. Springer,
Cham, pp 305–317
15. Hu S, Shu S, Bishop J, Na X, Stettler M (2022) Vehicle telematics data for urban freight
environmental impact analysis. Transp Res Part D Transp Environ 102:103121
Hybrid Approach of Modified IWD
and Machine Learning Techniques
for Android Malware Detection

Ravi Mohan Sharma and Chaitanya P. Agrawal

Abstract Mobile phones have become an indispensable part of our daily lives due
to the rapid improvement in smartphone technologies. The increased use of smart-
phones in online payments has attracted cybercriminals and is contributing to the
rise of malware infections. Many cyberattacks are caused by mobile application
vulnerabilities and malware. As a result, these attacks pose a significant threat to
smartphone security. In general, big datasets are employed for malware analysis, and
these datasets may contain numerous redundant, inappropriate, and noisy features,
causing misclassification and low detection rates. So, we have to choose the most
important features from the dataset. This research work presents a hybrid model
for malware detection, based on a modified intelligent water drop algorithm (IWD)
and ML techniques. To investigate the performance of the proposed techniques, we
used the DREBIN dataset. The results of the experiments reveal that this approach
effectively removes more than 60% of the irrelevant features from the dataset and
produces promising results.

Keywords Computational intelligent techniques · Android malware detection ·


IWD algorithm · Machine learning · Feature selection

1 Introduction

Smartphones are frequently utilized nowadays because of their portability and
multifunctional capabilities. They play a huge role in our daily lives and are
used for a variety of things like Web browsing, e-banking, e-learning, e-shopping,
social media, and so on. Over the past couple of decades, Android has risen to
prominence as a dominant mobile operating system. The ability to make digital payments

R. M. Sharma (B) · C. P. Agrawal


Department of Computer Science and Applications, Makhanlal Chaturvedi University, Bhopal,
Madhya Pradesh 462001, India
e-mail: ravi@mcu.ac.in
C. P. Agrawal
e-mail: cpa@mcu.ac.in


via mobile device makes it incredibly unique, and this characteristic makes it the
most attractive target for hackers. Mobile apps can be acquired from a variety of
sources, depending on the requirement and purpose. Malware and benign-ware are
the two main categories of android apps. Malware is a malicious program that is
purposely created to harm mobile functions. Malware infects mobile devices and
executes a variety of fraudulent operations on its own. Benign-ware is a program
designed to aid the user and does not harm system functions in any way. Signature-
based detection has several limitations, including the inability to detect new malware
and the requirement for malicious source code to build the signature. As a result,
behavior-based malware detection is becoming increasingly common. The proposed
work is dedicated to behavior-based malware detection. To study the behavior of
malware, a large number of attributes are extracted from the APK files of apps,
and progressively more attributes are included in the dataset. For this reason, a very
large dataset is built, which may contain many duplicate, useless, and noisy features
[1]. Feature selection reduces computational complexity and classification
time, and it is also used to eliminate inoperative features. In feature selection, the
optimal set N is determined from the entire set M, where N < M. The optimization
criteria function is set up in such a way that the best set is generated from the full
set. Many nature-inspired meta-heuristics algorithms, such as particle swarm opti-
mization (PSO), ant colony optimization (ACO), artificial bee colony (ABC), and
genetic algorithms (GA), have established their effectiveness in feature selection in
a variety of domains in recent decades [2]. For android malware detection, machine
learning-based methods with meta-heuristic methodologies are progressively being
explored and deployed [3]. We introduce a hybrid detection model in this paper
that combines a modified version of the intelligent water drop algorithm for optimal
feature selection with machine learning techniques for optimal set evaluation. The
following are the major contributions of the planned work.
• We modified the classical IWD by using the feature importance function instead
of the probability function for edge selection.
• After getting the subset from the first step, we evaluate the subset using six different
machine learning classifiers.
• To demonstrate the effectiveness of the proposed hybrid approach, we present the
results of various classifiers. In addition, we compare the results of the previous
work to the proposed work.
• To test the proposed method’s performance, we used a well-known android dataset
DREBIN.
The remainder of the paper is divided into the following sections. Section 2
discusses related work in this field, whereas Sect. 3 describes the modified IWD
algorithm, and Sect. 4 describes the feature selection procedure. Section 5 describes
the datasets, data preprocessing steps, and experimental environment in detail; Sect. 6
presents performance assessment metrics to assess the proposed approach’s perfor-
mance; Sect. 7 summarizes the proposed work’s findings, and Sect. 8 summarizes
the proposed approach with the future development.
Hybrid Approach of Modified IWD … 159

2 Related Works

We discuss previous work on feature selection and machine learning-based android


malware detection in this section. In the paper, Milosevic et al. [4] presented two
methods for malware detection: The first is based on permissions analysis, while the
second is based on source code analysis, and they use a bag-of-words to classify
malware. In another study, Sun et al. [5] proposed SigPID (significant permission
identification) for machine learning-based android malware prediction, with the
SVM classifier utilized for subset evaluation. The fine-grained
dangerous permission (FDP) technique is introduced by Jiang et al. [6]; it collects the
difference between malicious and benign apps and evaluates the performance using
KNN, SVM, J48, and NB classifiers. In another study, a two-layer malware detection
technique is proposed by Feng et al. in which the first layer is a fully connected neural
network (NN) that applies permission, intent, and component properties, while the
second layer is a CNN and autoencoder that detects malware. Zhang et al. presented
a DAMBA in their publication [7], which is a novel prototype system that lever-
ages object reference graph birthmarks (ORGBs) for similarity matching and the
TANMAD method for malware identification. The multi-view neural network tech-
nique for malware detection utilizing URLs visited by apps was proposed by Wang
et al. [7]. Arp et al. [8] used 123,453 benign apps and 5560 malware apps to develop a
DREBIN as a lightweight malware detection approach. APK auditor was created by
Talha et al. [9] to analyze permission features using the Genome, Contagio–Mobile,
and VirusShare datasets. Mehtab et al. [10] used Contagio Dump and the VirusShare
dataset to present a rule-based feature selection method. Another study used the
DREBIN dataset to offer a (DT + SVM) technique for android virus detection. As a
malware detection method, Jerlin and Marimuthu [11] proposed a multi-dimensional
Naive Bayes (MDNB) classifier that employs the API call sequence as a feature
set. The DL-Droid is proposed by Alzaylaee et al. [12] which is a deep learning-
based method that uses stateful input creation to perform dynamic analysis. Using
themes and sensitive data-flows characteristics as a feature set, Lou et al. [3] demon-
strated the TFDroid approach to identify android malware. PIndroid, proposed
by Idrees et al. [13], is a permission- and intent-based malware detection
system. In another study, Alam et al. [14] presented the DroidDomTree
technique, which mines the dominance tree of API calls to uncover similar patterns
in android apps for malware detection.

3 Proposed Modified Version of IWD Algorithm

This section explains the proposed IFWDA feature selection algorithm. The
suggested algorithm is a modified version of the IWD algorithm, first introduced
by Hosseini in 2007 [15]. Swarms of water droplets pursue the

Table 1 Static and dynamic parameter values

Dynamic parameters:
• NlD = {} (initially empty list of visited nodes)
• vD = 4 (initial velocity)
• sD = 0 (initial soil on the water drop)

Static parameters:
• ND = 215 (number of features)
• via = 1, vib = 0.01, vic = 1
• sia = 1, sib = 0.01, sic = 1
• ρIWD = 0.9 (constant)
• ρD = 0.9 (constant)
• Mx_It = 20
• It_S = 100

most efficient path from the source to the destination, avoiding obstacles and environ-
mental disturbances. This algorithm was created using data from the abovementioned
natural phenomenon. Based on previously acquired preserved solutions, this tech-
nique generates new solutions. It achieves an optimal path using artificially generated
intelligent water droplets (IWDs) and ambient factors. The problem is denoted by a
graph G (N, E), where N denotes the nodes of the graph and E represents the edges.
Each water drop builds a path gradually, roaming through edges and nodes until
its complete solution is reached. An iteration finishes when all IWDs have established
their complete solutions. The algorithm obtains the final solution after completing the
following steps.

3.1 Step 1: The Static and Dynamic Parameters are Initialized

The static parameters remain unchanged during the whole process. The number of
artificially generated IWDs is denoted by ND. The velocity of an IWD is governed by
via, vib, and vic. The soil values of the local path are governed by the three parameters
sia, sib, and sic. Mx_It represents the total number of iterations, and It_S denotes
the initial soil value of the local pathway. The dynamic parameters are initialized
at the beginning of the process and updated during it. The list of nodes visited by
each water drop, NlD, is initially blank and is updated whenever the IWD visits a
node. The initial velocity of an IWD is denoted by vD, and the initial soil, denoted
by sD, is set to zero. Table 1 presents the static and dynamic parameters.
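The initialization in Step 1 can be written out directly; the sketch below is illustrative (the paper publishes no code, and all identifier names are our own shorthand), with the numeric values taken from Table 1:

```python
# Illustrative initialization of the static and dynamic IWD parameters
# listed in Table 1. Names are our own shorthand for the paper's symbols.

def init_iwd_params(num_features=215):
    static = {
        "ND": num_features,                   # number of features / IWDs
        "v_a": 1.0, "v_b": 0.01, "v_c": 1.0,  # velocity-update constants
        "s_a": 1.0, "s_b": 0.01, "s_c": 1.0,  # soil-update constants
        "rho_IWD": 0.9,                       # reinforcement constant
        "rho_D": 0.9,                         # reinforcement constant
        "Mx_It": 20,                          # maximum number of iterations
        "It_S": 100.0,                        # initial soil on each local path
    }
    dynamic = {
        "NlD": [],        # list of visited nodes, initially empty
        "v_D": 4.0,       # initial velocity of the drop
        "s_D": 0.0,       # initial soil carried by the drop
    }
    return static, dynamic

static, dynamic = init_iwd_params()
```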

3.2 Step 2: Modified Edge Selection Process

The proposed modification has been applied to this step. In the IWD algorithm,
artificially generated water droplets select the edges which have less soil content. In

this way, the water droplets follow the optimal path from the source to the destination.
In the proposed work, the nodes represent the features of the dataset, and each edge
connects two nodes to form an undirected network. In the traditional algorithm, a
matrix generated by the probability function for each node is used to select the
connected path with less soil, and the node with the higher value is selected. Similarly,
in the proposed modification, the feature-importance matrix obtained from scikit-learn
is used to determine the optimal path and the choice of the important node. All the
nodes of the optimal path obtained in this way determine the optimal subset. If an
IWD k is presently at node i and moves to node j, then the feature importance is
calculated using Eq. (1):

F_p(j) = ( Σ_{i : node i splits on feature j} N_p(i) ) / ( Σ_{t ∈ all nodes} N_p(t) )    (1)

where N_p(i) denotes the importance of node i. The value of F_p(j) is standardized
between 0 and 1.

N_p(i) = W_z(i) H_z(i) − W_z,left(i) H_z,left(i) − W_z,right(i) H_z,right(i)    (2)

G_mp = Σ_{i=1}^{L} f_z(i) · (1 − f_z(i))    (3)

In Eq. (2), W_z(i) represents the weighted number of IWDs reaching node i, and
H_z(i) indicates the Gini impurity value of node i. Here, right(i) denotes the child
node obtained from the right split on node i, and left(i) denotes the child node
obtained from the left split on node i.
In Eq. (3), L denotes the number of labels, and f_z(i) represents the frequency of
label i. If the value of N_p(j) is higher than the previously calculated importance
values, then node j is added to the list of visited nodes NlD.
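The importance matrix "obtained from scikit-learn" can be produced by a tree-ensemble classifier: its Gini-based feature importances correspond to the normalized F_p(j) values of Eq. (1). The sketch below is an illustration under that assumption (synthetic data; the paper does not publish code):

```python
# Illustrative sketch: obtaining the normalized feature-importance values
# F_p(j) of Eq. (1) from scikit-learn's Gini-based importances.
# The synthetic dataset and the simple ranking are our own assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

importance = forest.feature_importances_   # one F_p(j) per feature
assert np.isclose(importance.sum(), 1.0)   # standardized, as below Eq. (1)

# A water drop at node i would move toward the connected node j with the
# highest importance; here we simply rank all nodes by F_p(j).
ranked_nodes = np.argsort(importance)[::-1]
```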

3.3 Step 3: Updating Velocity and Soil Values

The term v_D(t+1) represents the velocity of IWD k at time (t + 1); this parameter is
updated using Eq. (4):

v_D(t+1) = v_D(t) + v_ia / (v_ib + v_ic · S_lp(i, j))    (4)

S_lp(i, j) = (1 − ρ_in) · S_lp(i, j) − ρ_in · Δs(i, j)    (5)

The amount of soil on the local path is denoted by S_lp(i, j).

The soil values are updated using Eqs. (5), (6), and (7), respectively, where the value
of the constant ρ_in lies between 0 and 1.

s_D = s_D + Δs(i, j)    (6)

Δs(i, j) = s_ia / (s_ib + s_ic · t(i, j : v_D(t+1)))    (7)

t(i, j : v_D(t+1)) = HUD(i, j) / v_D(t+1)    (8)

Equation (8) defines the time function t(i, j : v_D(t+1)), the time needed for water
drop k to travel from node i to node j at time (t + 1), where HUD(i, j) denotes the
heuristic desirability function.
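Eqs. (4)-(8) can be transcribed into a single update routine; the sketch below uses the Table 1 constants as defaults and is illustrative rather than the authors' implementation:

```python
# Direct transcription of the velocity/soil updates, Eqs. (4)-(8).
# Parameter names mirror the paper's symbols; defaults come from Table 1.

def update_drop(v_d, s_d, soil_ij, hud_ij,
                v=(1.0, 0.01, 1.0), s=(1.0, 0.01, 1.0), rho_in=0.9):
    v_a, v_b, v_c = v
    s_a, s_b, s_c = s
    v_next = v_d + v_a / (v_b + v_c * soil_ij)             # Eq. (4)
    t_ij = hud_ij / v_next                                 # Eq. (8)
    delta_s = s_a / (s_b + s_c * t_ij)                     # Eq. (7)
    soil_next = (1 - rho_in) * soil_ij - rho_in * delta_s  # Eq. (5)
    s_next = s_d + delta_s                                 # Eq. (6)
    return v_next, s_next, soil_next

# One update on a fresh edge carrying the initial soil It_S = 100
v1, s1, soil1 = update_drop(v_d=4.0, s_d=0.0, soil_ij=100.0, hud_ij=1.0)
```

Note how an edge with less soil increases the velocity more (Eq. (4)), which in turn removes more soil from that edge, reinforcing good paths.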

3.4 Step 4: Reinforcement and Termination Phase

The iteration-best solution T_IBest is calculated using Eq. (9):

T_IBest = arg min/max_{∀ xp ∈ T_P} q(xp)    (9)

where T_P denotes the population of solutions and q(xp) is the fitness function used
to measure the quality of a solution. The soil of all edges in T_IBest is calculated
using Eq. (10):

S_gp(i, j) = (1 + ρ_D) · S_gp(i, j) − ρ_IWD · (1 / q(T_IBest)) · s_Bk    (10)

where S_gp(i, j) denotes the global soil, ρ_D denotes a constant, and s_Bk denotes the
soil value in the kth iteration. The global best solution T_GBest is calculated as follows:

T_GBest = T_IBest   if q(T_IBest) ≥ q(T_GBest),
          T_GBest   otherwise.    (11)

Equation (11) either substitutes T_GBest by T_IBest or preserves the same value.
The solution-building and reinforcement phases are repeated until the termination
state is reached: if the value of It_Cnt becomes equal to or higher than Mx_It, the
iteration process stops.
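The reinforcement logic of Eqs. (9)-(11) can be sketched as below; we assume, consistent with Eq. (11), that a higher fitness q is better, and the path/edge encoding is our own illustration:

```python
# Sketch of the reinforcement phase, Eqs. (9)-(11): pick the iteration-best
# path, reinforce the global soil on its edges, and keep the better of the
# iteration-best and global-best solutions.

def reinforce(population, q, global_soil, global_best, s_bk,
              rho_d=0.9, rho_iwd=0.9):
    t_ibest = max(population, key=q)                     # Eq. (9)
    for edge in zip(t_ibest, t_ibest[1:]):               # edges of the path
        global_soil[edge] = ((1 + rho_d) * global_soil.get(edge, 0.0)
                             - rho_iwd * (1.0 / q(t_ibest)) * s_bk)  # Eq. (10)
    if global_best is None or q(t_ibest) >= q(global_best):          # Eq. (11)
        global_best = t_ibest
    return global_best, global_soil

population = [(0, 1, 2), (0, 2, 1)]                      # two candidate paths
best, soil = reinforce(population, lambda p: p[-1], {}, None, s_bk=3.0)
```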

4 Feature Selection Procedure Using Modified IWD

The improved IWD feature selection process finds the best subset S of the complete
dataset U. The searching process of the suggested modified IWD is represented by
an undirected graph G(N, E), where N denotes the nodes (i.e., features) connected
by edges E. The selection of an edge indicates the next node to be selected. A small
amount of soil is present on each edge, signifying impediments along the nearby path.
Each water drop is randomly dispersed over the graph and serves as a search agent.
The iteration-best solution T_IBest is utilized to determine the global best solution
T_GBest. The path with the fewest barriers is the best solution, and the optimal feature
subset is the set of all nodes that are members of the optimal path.

5 Dataset Prepossessing and Experimental Environment

5.1 Dataset and Preprocessing

To examine the performance of the proposed approach, the Drebin-215 dataset is used
for evaluation; it contains 9476 benign and 5560 malware samples
from the DREBIN project. This dataset is extensively used by many researchers
[16]. In the data preprocessing phase, the duplicate occurrences are removed from
the dataset, and the entry containing a NaN value is also removed. Then, important
features are selected using modified IWD algorithms. Then, the selected subset is
evaluated using six machine learning classifiers. The flowchart of the proposed model
is given in Fig. 1.
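The two preprocessing steps described above map directly onto pandas; the toy frame below is illustrative (column names are our own stand-ins — Drebin-215 itself has 215 binary feature columns plus a class label):

```python
# Illustrative sketch of the preprocessing: drop duplicate rows, then drop
# rows containing NaN. The tiny frame stands in for the real Drebin-215 data.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "SEND_SMS":    [1.0, 1.0, 0.0, np.nan, 1.0],
    "RECEIVE_SMS": [0, 0, 1, 1, 0],
    "class":       ["B", "B", "M", "M", "B"],
})

df = df.drop_duplicates()   # remove duplicate occurrences
df = df.dropna()            # remove entries containing a NaN value
```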

5.2 Experimental Environment

The proposed approach was implemented in Anaconda Python 3.8 on a Jupyter
notebook; the system has an Intel(R) Core(TM) i7-8550U processor @ 1.80 GHz
and 8 GB of RAM.

6 Performance Evaluation Metrics

The confusion matrix is used to display the results of classification; it not only
provides insight into the performance of classifiers but also shows which classes are
correctly classified and which are not.

Fig. 1 Working model of the proposed approach



• True Positive (∂): a malware sample correctly classified as malware.
• False Positive (α): a benign sample incorrectly classified as malware.
• True Negative (ρ): a benign sample correctly classified as benign.
• False Negative (σ): a malware sample incorrectly classified as benign.
The following metrics are used to evaluate the usefulness of the proposed
technique.
• Accuracy (μ): The accuracy represents the ratio of correctly categorized samples
to the total number of samples. The accuracy can be defined as Eq. (12).

Accuracy(μ) = (∂ + ρ) / (∂ + α + ρ + σ)    (12)

• Recall (δ): The recall is the ratio of correctly predicted malware samples to the
total number of actual malware samples. The recall can be described as Eq. (13).

Recall(δ) = ∂ / (∂ + σ)    (13)

• Precision (λ): The precision is the ratio of true positive predictions to the total
number of positive predictions. The precision can be represented as Eq. (14).

Precision(λ) = ∂ / (∂ + α)    (14)

• F1-Score (τ): The F1-score is the harmonic mean of precision (λ) and recall
(δ); it delivers a better measure of wrongly classified occurrences than the
accuracy metric. The F1-score can be expressed as Eq. (15).

F1-Score(τ) = 2(λ · δ) / (λ + δ)    (15)
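The four metrics can be computed directly from the confusion-matrix counts; the sketch below writes out Eqs. (12)-(15), with malware treated as the positive class (the counts themselves are illustrative):

```python
# Eqs. (12)-(15) written out from the confusion-matrix counts, with malware
# as the positive class: tp = ∂, fp = α, tn = ρ, fn = σ. Counts are made up.

def metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)            # Eq. (12)
    recall = tp / (tp + fn)                               # Eq. (13)
    precision = tp / (tp + fp)                            # Eq. (14)
    f1 = 2 * (precision * recall) / (precision + recall)  # Eq. (15)
    return accuracy, recall, precision, f1

acc, rec, prec, f1 = metrics(tp=90, fp=6, tn=94, fn=10)
```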

7 Result and Discussions

To evaluate the subset performance of the proposed approach, six ML classification
approaches were implemented on the same platform, namely logistic
regression (LR), k-nearest neighbors (KNN), decision tree (DT), support vector

machine (SVM), random forest (RF), and multi-layer perceptron (MLP). The results
of all applied ML-classification methods are given in Table 2. Among the proposed
variants, (Modified IWD + RF) achieved 96% accuracy and 98% recall value on
benign-ware. The result also shows that (Modified IWD + KNN), (Modified IWD
+ DT), and (Modified IWD + MLP) performed better than other methods in terms
of precision. The variants (Modified IWD + RF), (Modified IWD + SVM), and
(Modified IWD + KNN) have achieved an F1-score of 98% and 97%, respectively.
The comparison of the proposed method with previous methods is given in Table
4. It is also clear from Table 4 that the proposed approach outperforms the others
except one [7]. The outcomes also demonstrate that models hybridized with meta-
heuristic methods give better performance. Since heuristic algorithms continually
try to reach an optimum solution by learning from their previous steps, these methods
do not reconsider paths that have already been covered. Instead, the meta-heuristic
uses the information from the preceding steps to discover new promising solutions.
Thus, one of the benefits of using meta-heuristic optimization algorithms is that
they considerably reduce the sizes of the dataset by choosing appropriate features,
thereby reducing the time and complexity. The modified IWD algorithm selected 71
optimal features from the dataset. The top 12 features selected from the DREBIN
dataset are shown in Table 3. This process reduces the size of the dataset by more
than 60%; thus, it can be established that the proposed method is more effective
and delivers better performance than many other existing models. Figure 2 depicts
the proposed model's F1-score, whereas Fig. 3 depicts its accuracy. The recall and
precision derived from subset evaluation are shown in Figs. 4 and 5, respectively.
Figure 6 shows a comparison of the proposed approach’s precision and recall.

Table 2 Performance of proposed method using six classifiers

Modified IWD + | Class type | Recall % | Prec. % | F1 % | Accuracy %
KNN | B | 97 | 97 | 97 |
    | M | 89 | 90 | 89 | 95
DTC | B | 95 | 96 | 95 |
    | M | 86 | 84 | 85 | 93
LR  | B | 97 | 97 | 97 |
    | M | 89 | 90 | 89 | 95
SVM | B | 98 | 96 | 97 |
    | M | 86 | 92 | 89 | 95
RF  | B | 98 | 97 | 98 |
    | M | 90 | 94 | 92 | 96
MLP | B | 97 | 97 | 97 |
    | M | 89 | 92 | 90 | 95

Table 3 Top twelve features selected by modified IWD

S. No. | Feature | S. No. | Feature
1 | onServiceConnected | 7 | CAMERA
2 | RECEIVE_SMS | 8 | WRITE_APN_SETTINGS
3 | MANAGE_ACCOUNTS | 9 | CHANGE_WIFI_MULTICAST_STATE
4 | DexClassLoader | 10 | MOUNT_FORMAT_FILESYSTEMS
5 | Ljava.lang.Class.getMethod | 11 | INTERNAL_SYSTEM_WINDOW
6 | android.intent.action.SEND | 12 | GET_TASKS

Table 4 Comparison of the proposed approach with earlier work

Reference | Approach | F1 | Recall | Dataset | Accuracy (%)
Proposed | Modified IWD + RF | 98% | 98 | DREBIN | 96
[8] | Linear SVM | 94% | – | DREBIN | 94
    |            | 95% | – | Malgenome | 95
[3] | Two-class SVM | – | – | DREBIN | 93.7
[17] | PSO + MLP, AdaBoost, RF, KNN, J48 | 91.6% (PSO + RF) | 95.6% (PSO + AdaBoost) | DREBIN, Androzoo | –
[7] | ORGB | 97.09% | 98.55% | Malgenome | 96.9

Fig. 2 Obtained F1-score from six classifiers (after feature selection)

Fig. 3 Obtained accuracy from six classifiers (after feature selection)

Fig. 4 Obtained recall from subset evaluation

Fig. 5 Obtained precision from subset evaluation

Fig. 6 Comparison of recall and precision (after feature selection)

8 Conclusion and Future Work

In this paper, we presented an efficient hybrid model of a modified intelligent water
drop (IWD) algorithm and machine learning techniques for malware detection, in
which the modified water drop algorithm is used for feature selection and six different

ML classifiers are used for subset examination. Among all the proposed variants,
(modified IWD + RF) achieved the highest accuracy of 96%. Several variants,
namely (modified IWD + RF), (modified IWD + SVM), and (modified IWD + KNN),
achieved F1-scores of 98% and 97%, respectively. The results also show that the
proposed approach reduced the size of the dataset by more than 60% and performed
better than many previous approaches. Future work will combine other meta-heuristics
for feature optimization with ML techniques for subset examination to attain a more
effective hybrid approach.

References

1. Acharya N, Singh S (2017) An IWD-based feature selection method for intrusion detection
system. Soft Comput 22(13):4407–4416
2. Shunmugapriya P, Kanmani S (2017) A hybrid algorithm using ant and bee colony optimization
for feature selection and classification (AC-ABC Hybrid). Swarm Evol Comput 36:27–36
3. Lou S, Cheng S, Huang J, Jiang F (2019) TFDroid: Android malware detection by topics
and sensitive data flows using machine learning techniques. In: 2019 IEEE 2nd international
conference on information and computer technologies (ICICT), pp 30–36.
https://ieeexplore.ieee.org. Accessed 12 Jan 2021
4. Sun L, Li Z, Yan Q, Srisa-An W, Pan Y (2017) SigPID: significant permission identification
for android malware detection. In: 2016 11th International conference on malicious unwanted
software, MALWARE 2016, pp 59–66
5. Jiang X, Mao B, Guan J, Huang X (2020) Android malware detection using fine-grained
features. Sci Prog 2020(5190138):1–13. https://www.hindawi.com
6. Zhang W, Wang H, He H, Liu P (2020) DAMBA: detecting android malware by ORGB analysis.
IEEE Trans Reliab 69(1):55–69
7. Wang W, Zhao M, Wang J (2019) Effective android malware detection with a hybrid model
based on deep autoencoder and convolutional neural network. J Ambient Intell Hum Comput
10(8):3035–3043
8. Arp D, Spreitzenbarth M, Hübner M, Gascon H, Rieck K (2014) Drebin: effective and
explainable detection of Android malware in your pocket. NDSS 14:1–15
9. Talha KA, Alper DI, Aydin C (2015) APK auditor: permission-based android malware detection
system. Digital Invest 13:1–14
10. Mehtab A et al. (2020) AdDroid: rule-based machine learning framework for android malware
analysis. Mob Netw Appl 25(1):180–192
11. Jerlin MA, Marimuthu K (2018) A new malware detection system using machine learning
techniques for API call sequences. J Appl Secur Res 13(1):45–62
12. Alzaylaee M, Yerima SY, Sezer S (2020) DL-Droid: deep learning based android malware
detection using real devices. Comput Secur 89:101663
13. Idrees F, Rajarajan M, Conti M, Chen TM, Rahulamathavan Y (2017) PIndroid: a novel Android
malware detection system using ensemble learning methods. Comput Secur 68:36–46
14. Alam S, Alharbi SA, Yildirim S (2020) Mining nested flow of dominant APIs for detecting
android malware. Comput Netw 167:107026
15. Hosseini HS (2007) Problem solving by intelligent water drops. In: 2007 IEEE congress on
evolutionary computation. IEEE, pp 3226–3231
16. Android malware dataset for machine learning 2. https://figshare.com/articles/dataset/And
roid_malware_dataset_for_machine_learning_2/5854653 Accessed 11 Sep 2021
17. Milosevic N, Dehghantanha A, Choo KKR (2017) Machine learning aided Android malware
classification. Comput Electr Eng 61:266–274
Intuitionistic Fuzzy 9 Intersection Matrix
for Obtaining the Relationship Between
Indeterminate Objects

Subhankar Jana and Juthika Mahanta

Abstract This paper defines intuitionistic fuzzy core (IFC), intuitionistic fuzzy
fringe (IFF), and intuitionistic fuzzy outer (IFO) of an intuitionistic fuzzy set (IFS)
in an intuitionistic fuzzy topological space (IFTS). It is shown that the IFC, IFF,
and IFO of an IFS are mutually disjoint. Further, the intuitionistic fuzzy 9 intersection
matrix (IF9IM) is defined, which can determine the topological relation between
any two IFSs. The IF9IM is an upgradation of the fuzzy 9 intersection matrix. Since
the IFS is capable of handling hesitancy or indeterminacy, the IF9IM determines the
relationship between two uncertain objects having any indeterminacy.

Keywords GIS · 9 intersection matrix · Intuitionistic fuzzy set · Intuitionistic
fuzzy 9 intersection matrix

1 Introduction

Relationship between geographical objects is an essential query in GIS. Topological
theory is commonly utilized to evaluate topological relationships between geograph-
ical entities. The topological models are mostly qualitative and invariant under any
topological transformations such as translation, rotation, and scaling. Egenhofer’s 9
intersection approach is the most well-known among existing topological models.
The model, however, fails when the boundary of geographical objects is not sharp
enough. The problem frequently arises because most geographical objects lack a
sharp boundary. The uncertainty may arise as a consequence of a crisp object’s
spatial inaccuracy, logical inconsistency, or data inconsistency. Because the circum-
stance of such ambiguity is relatively widespread, it drew the attention of scholars.
Different techniques such as the probabilistic method, fuzzy set theory, and egg yolk

S. Jana (B) · J. Mahanta


Department of Mathematics, National Institute of Technology Silchar, Silchar, Assam 788010,
India
e-mail: suvo.jana@gmail.com
J. Mahanta
e-mail: juthika@math.nits.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 171
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_16
172 S. Jana and J. Mahanta

have been used to deal with the uncertainty. The studies [2, 9, 17] based on fuzzy
set were introduced to make the GIS database system capable of dealing with uncer-
tainties. To determine the relationship between two uncertain geographical objects,
9 intersection matrix using broad boundary [3], egg yolk method [4], fuzzy 9 inter-
section matrix [15], unified fuzzy 9 intersection [6], fuzzy 9 intersection matrix in
a crisp fuzzy topological space [16], and α-induced 9 intersection matrix [11, 12]
were proposed.
Data accuracy in any database model is the primary and most crucial component.
As GIS is a database model, the aim is to make the model as accurate as possible.
But hesitancy or indeterminacy is a concern that occurs due to several facts such
as noise in the data or incapability of collecting data at a particular location. The
fuzzy models are capable of dealing with uncertainty though it fails to handle any
kind of indeterminacy. The generalized fuzzy sets serve better for that purpose. The
intuitionistic fuzzy set introduced by Atanassov [1] is one such generalization of the
fuzzy set that considers the membership and non-membership, also the measure of
the hesitancy of any object in the set. The elements of the set follow the condition
that the sum of the membership and non-membership is always less or equal to
one. The GIS modeling has already been studied in terms of the intuitionistic fuzzy
set. Malek [13] pointed out the shortcomings of the fuzzy framework and proposed
intuitionistic fuzzy framework and its several possible applications in the GIS. In
the viewpoint of point-set topology, the model describes an intuitionistic fuzzy set
in terms of the interior and boundary of membership and that of non-membership.
Interestingly, the study neither examines whether these parts are topological properties
nor verifies their mutual disjointness. In this paper, we introduce the intuitionistic fuzzy
fringe and the intuitionistic fuzzy outer and show that the fringe, outer, and core are
mutually disjoint topological properties in an intuitionistic fuzzy topological space
(IFTS). Finally, we propose the IF9IM for determining the relationship between two
objects in an IFTS.
The paper is organized as follows. Section 2 discusses the preliminary concepts
required for the study. Section 3 introduces intuitionistic fuzzy core, intuitionistic
fuzzy fringe and intuitionistic fuzzy outer and constructs the IF9IM. Finally, Sect. 5
concludes the study.

2 Preliminary Concepts

In this section, we briefly recall the preliminary concepts required for the study.

2.1 Intuitionistic Fuzzy Set

The intuitionistic fuzzy set (IFS), introduced by Atanassov [1], is a generalization of
the fuzzy set [18]. Each element of the set is assigned a membership value as well
Intuitionistic Fuzzy 9 Intersection Matrix for Obtaining the Relationship … 173

as a non-membership value. Let A be a subset of a non-empty fixed set X.
A = {⟨x, μ_A(x), ν_A(x)⟩} is said to be an IFS, where μ_A : X → I and ν_A : X → I
are, respectively, the membership and non-membership of each element x ∈ X for
the set A, with the condition that 0 ≤ μ_A(x) + ν_A(x) ≤ 1. For each x ∈ X, the
measure of hesitancy is evaluated as Hes(x) = 1 − μ_A(x) − ν_A(x). Some of the
basic operations [1] on intuitionistic fuzzy sets are defined as follows.
Let A = ⟨μ_A(x), ν_A(x)⟩ and B = ⟨μ_B(x), ν_B(x)⟩ be two IFSs in X; then
1. A ∧ B = ⟨min{μ_A, μ_B}, max{ν_A, ν_B}⟩
2. A ∨ B = ⟨max{μ_A, μ_B}, min{ν_A, ν_B}⟩
3. Complement of A: A^c = ⟨ν_A, μ_A⟩.
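For a finite universe, these three operations can be sketched directly, encoding an IFS as a mapping from elements to (μ, ν) pairs (an illustrative encoding of our own, not from the paper):

```python
# Illustrative encoding of a finite IFS as {element: (mu, nu)} together with
# the three basic operations above. Assumes 0 <= mu + nu <= 1 everywhere.

def ifs_and(A, B):        # A ∧ B: min of memberships, max of non-memberships
    return {x: (min(A[x][0], B[x][0]), max(A[x][1], B[x][1])) for x in A}

def ifs_or(A, B):         # A ∨ B: max of memberships, min of non-memberships
    return {x: (max(A[x][0], B[x][0]), min(A[x][1], B[x][1])) for x in A}

def ifs_complement(A):    # A^c: swap membership and non-membership
    return {x: (nu, mu) for x, (mu, nu) in A.items()}

A = {"x": (0.6, 0.3), "y": (0.2, 0.7)}
B = {"x": (0.4, 0.5), "y": (0.5, 0.4)}
```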

2.2 Intuitionistic Fuzzy Topological Spaces

The intuitionistic fuzzy topology was introduced by Çoker [5] in 1997 and defined
as follows:
Definition 2.2.1 Let X ≠ φ be any set, I = [0, 1], and τ ⊂ I^X be a collection
of IFSs such that τ satisfies the following conditions:

(i) 0_X, 1_X ∈ τ,
(ii) A, B ∈ τ ⇒ A ∧ B ∈ τ,    (1)
(iii) (A_j)_{j∈J} ∈ τ ⇒ ∨_{j∈J} A_j ∈ τ, where J is an index set,

where 1_X and 0_X are, respectively, the whole set X and the null set.
Then τ is called an intuitionistic fuzzy topology (IFT) for X.
Members of τ are called intuitionistic fuzzy open sets in τ, and complements of
elements of τ are said to be intuitionistic fuzzy closed sets. For an intuitionistic fuzzy
set A in an intuitionistic fuzzy topological space (IFTS), the supremum of all
intuitionistic fuzzy open sets contained in A is defined as the intuitionistic fuzzy
interior of A, denoted IFIntA, and the infimum of all intuitionistic fuzzy closed sets
containing A is defined as the intuitionistic fuzzy closure of A, denoted IFClA. The
exterior of A, denoted A^−, is defined as A^− = (IFClA)^c. The relation between IFCl
and IFInt can be obtained from the following theorem [5].
Theorem 2.2.1 For an IFS A in an IFTS X,
1. IFClA^c = (IFIntA)^c
2. IFIntA^c = (IFClA)^c.

The intuitionistic fuzzy boundary of an IFS A is defined by Hur et al. [10] as follows:

Definition 2.2.2 For an IFS B in an IFTS X, the intuitionistic fuzzy boundary of B
is defined as IFBdB = IFClB ∧ IFClB^c.

2.3 Related Studies

Crisp methods: The 4 intersection matrix [8] by Egenhofer was the first algebraic
method to obtain topological relations between geographical objects. The model was
later upgraded to the famous 9 intersection matrix model by Egenhofer and Franzosa [7].
For two crisp sets A and B, the 9 intersection method is defined [8] as follows:
⎛ ⎞
IntA ∩ IntB IntA ∩ BdB IntA ∩ B −
⎝BdA ∩ IntB BdA ∩ BdB BdA ∩ B − ⎠
A− ∩ IntB A− ∩ BdB A− ∩ B −
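On a finite (discrete) universe, the emptiness pattern of this matrix can be sketched with plain set operations; the example below reproduces the classic "A inside B" pattern (the sets and their parts are illustrative stand-ins for the topological interior, boundary, and exterior):

```python
# Sketch of the 9 intersection matrix over a finite universe, recording only
# whether each pairwise intersection is empty (0) or non-empty (1). The sets
# below are illustrative stand-ins for interior, boundary, and exterior.

def nine_intersection(parts_a, parts_b):
    """parts_* = (interior, boundary, exterior) given as Python sets."""
    return [[1 if pa & pb else 0 for pb in parts_b] for pa in parts_a]

# Region A strictly inside region B on a tiny discrete universe {1,...,5}
A = ({1}, {2}, {3, 4, 5})
B = ({1, 2, 3}, {4}, {5})
M = nine_intersection(A, B)   # the classic "A inside B" pattern
```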

Methods based on broad boundary: The above-discussed models were designed
to find the relation between two geographical objects, assuming objects have sharp
boundaries. In reality, geographical objects often consist of uncertain boundaries,
which inspired researchers to develop the idea of the minimal and maximal extent
of an area object. Based on that idea, 9 intersection matrix for areas with broad
boundary [3] and egg yolk method [4] were proposed. Though both the broad bound-
ary and egg yolk method can determine different relationships between two uncer-
tain area objects, the models were not based on point-set topology. Also, relations
between uncertain area-line objects and line-line objects are not deducible using
these methods.
Fuzzy topological based methods: The first fuzzy topology-based model was intro-
duced by Tang and Kainz [15]. The fuzzy 9 intersection matrix is an upgradation
of crisp 9 intersection matrix constructed using mutually disjoint topological prop-
erties. In a crisp 9 intersection matrix, the interior, boundary and closure act as the
mutually disjoint topological properties, which is generally not true in fuzzy cases.
Thus the mutually disjoint fuzzy topological properties, namely the core, outer, and
fringe [15] of a fuzzy set, were introduced to construct fuzzy 9 intersection matrix.
In 2005, Du et al. [6] introduced a unified fuzzy 9 intersection method that deter-
mines the relation between two fuzzy objects, fuzzy, and crisp objects or two crisp
objects. The α-induced fuzzy topology was introduced by Liu and Shi [11] in 2006,
where they computed the interior, boundary, and exterior depending upon the
different α values, 0 ≤ α ≤ 1. Later, a fuzzy 9 intersection matrix [12, 14] based
on the α-induced fuzzy topology was proposed.
Intuitionistic fuzzy model: An intuitionistic fuzzy-based intersection model was intro-
duced by Malek [13]. He pointed out the shortcomings of the fuzzy models and
proposed an intuitionistic fuzzy model using the interior and boundary of membership
and the interior and boundary of non-membership.
Intuitionistic Fuzzy 9 Intersection Matrix for Obtaining the Relationship … 175

3 Intuitionistic Fuzzy 9 Intersection Matrix

The components of a general 9 intersection matrix are interior, boundary and exterior.
These are the mutually disjoint topological properties of a set in a topological space.
For an IFS in an IFTS, the intuitionistic fuzzy interior, boundary, and exterior are
not mutually disjoint. We find three mutually disjoint topological parts here.
Definition 3.0.1 The intuitionistic fuzzy core (IFC) of an IFS A, denoted by Aθ, is
defined as Aθ = {x ∈ IFIntA : IFIntA(x) = ⟨1, 0⟩}.
Theorem 3.0.1 For an IFS A in an IFTS X, IFBdA ∧ Aθ = φ.
Proof Aθ = {x ∈ IFIntA : IFIntA(x) = ⟨1, 0⟩}.
By Theorem 2.2.1, IFClAc = (IFIntA)c; therefore, IFClAc(x) = ⟨0, 1⟩ for all x ∈ Aθ, so
(μIFBdA ∧ μAθ )(x) = (μIFClA ∧ μIFClAc ∧ μAθ )(x) = 0, and
(νIFBdA ∨ νAθ )(x) = (νIFClA ∨ νIFClAc ∨ νAθ )(x) = 1, for all x ∈ X,
which implies IFBdA ∧ Aθ = φ.
Theorem 3.0.2 For an IFS A in an IFTS X, IFBdA ∧ IFIntA = φ implies either
IFIntA = φ or IFIntA = Aθ.
Proof IFBdA ∧ IFIntA = φ ⇒ IFBdA = φ or IFIntA = φ.
If IFIntA = φ, the claim holds trivially.
Otherwise, IFBdA = φ ⇒ IFClA ∧ IFClAc = φ
⇒ either IFClA is empty or IFClAc is empty.
IFClA being empty would imply IFIntA is empty,
whereas IFClAc being empty ⇒ (IFClAc)c = ⟨1, 0⟩
⇒ IFIntA = ⟨1, 0⟩, i.e., IFIntA = Aθ.
Therefore, either IFIntA = φ or IFIntA = Aθ.
For a particular set A in a crisp topological space X, the whole space splits into
IntA, BdA and A−. For crisp sets, IntA ∪ BdA = ClA and ClA − IntA = BdA; thus,
we can say ClA splits into IntA and BdA. But in the case of an IFS, the intersection
between IFIntA and IFBdA is in general non-empty, and Theorem 3.0.1 suggests that
the decomposition of IFClA is due to Aθ. So, in an IFTS, IFClA splits into
Aθ and IFClA − Aθ.
Definition 3.0.2 The intuitionistic fuzzy fringe (IFF) of an intuitionistic fuzzy set
A, denoted by ΔA, is defined as ΔA = IFClA − Aθ.
It is clear from the definition of ΔA that ΔA ∨ Aθ = IFClA.
Theorem 3.0.3 For an IFS A in an IFTS X, the intersection between Aθ and ΔA is
empty.
Proof Aθ ∧ ΔA
= Aθ ∧ IFClA ∧ (Aθ)c
= φ, as Aθ is crisp.

Definition 3.0.3 Let A be an IFS in an IFTS X. We denote the intuitionistic fuzzy
outer (IFO) of A by A∗, defined as A∗ = {x ∈ X : (IFClA)c(x) = ⟨1, 0⟩}.

Theorem 3.0.4 IFClA ∧ A∗ is empty.

Proof A∗ is crisp, so either A∗(x) = ⟨0, 1⟩ or A∗(x) = ⟨1, 0⟩, for any x ∈ X.
When A∗(x) = ⟨0, 1⟩, then obviously (IFClA ∧ A∗)(x) = ⟨0, 1⟩,
and when A∗(x) = ⟨1, 0⟩ ⇒ (IFClA)c(x) = ⟨1, 0⟩ ⇒ IFClA(x) = ⟨0, 1⟩ ⇒ (IFClA ∧
A∗)(x) = ⟨0, 1⟩.
So, in each case (IFClA ∧ A∗)(x) = ⟨0, 1⟩, which implies IFClA ∧ A∗ is empty.

Corollary 3.0.1 For an IFS A in an IFTS X, ΔA ∧ A∗ is empty.

From the discussions in this section, we can conclude the following.


Corollary 3.0.2 Aθ , ΔA and A∗ are mutually disjoint.

Proof From the definitions of Aθ and A∗, it is obvious that the intersection between
them is empty. From Corollary 3.0.1, the intersection between ΔA and A∗ is empty,
and Theorem 3.0.3 proves that the intersection between Aθ and ΔA is empty.

4 Application of the Proposed Definitions

An IFS certainly gives a better description of objects, as it considers the membership,
non-membership, as well as any indeterminacy or hesitancy of objects. Therefore,
considering geographical elements as intuitionistic fuzzy sets would significantly
improve the accuracy of GIS models. Geographical elements modeled as intuition-
istic fuzzy sets need an upgraded 9 intersection matrix to obtain the relationships
between objects. As discussed earlier, the 9 intersection matrix was introduced by
Egenhofer and Franzosa [7] and later upgraded [11, 15] by many others for finding
the relationships between fuzzy objects.

4.1 Importance of the Proposed Definition

Given any set A in a topological space, the whole space decomposes into three
mutually exclusive topological parts, namely IntA, BdA and A−. But in the case
of an IFS A, the intersection between any two of IFIntA, IFBdA and A− can be
non-empty. Corollary 3.0.2 in the previous section shows that Aθ, ΔA and A∗ are
mutually disjoint, and for any IFS A in an IFTS X, the whole space splits into these
subsets of X. Figure 1 is the geometrical interpretation of an intuitionistic fuzzy
area object in the whole space and its decomposition into Aθ, ΔA and
A∗.

Fig. 1 Decomposition of an intuitionistic fuzzy area object

Now, to form a 9 intersection matrix, we must establish that Aθ, ΔA and A∗
are topological properties. By definition, any intuitionistic fuzzy homeomorphism


[5] is crisp preserving. Thus Aθ and A∗ are topological properties, and as ΔA =
IFClA − Aθ = IFClA ∧ (Aθ)c, ΔA is also a topological property.
Thus, a 9 intersection matrix can be formed to obtain the relationship between any
two IFSs in an IFTS, using this intuitionistic decomposition.

4.2 Intuitionistic Fuzzy 9 Intersection Matrix

The 9 intersection matrix between two IFSs A and B in an IFTS is defined as follows.
    ⎛ Aθ ∧ Bθ    Aθ ∧ ΔB    Aθ ∧ B∗ ⎞
I = ⎜ ΔA ∧ Bθ    ΔA ∧ ΔB    ΔA ∧ B∗ ⎟
    ⎝ A∗ ∧ Bθ    A∗ ∧ ΔB    A∗ ∧ B∗ ⎠
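A toy computation of this matrix can be sketched as follows, assuming the three mutually disjoint parts of each object (core Aθ, fringe ΔA, outer A∗) are already available as dicts mapping points to ⟨membership, non-membership⟩ pairs; the helper names and the finite point representation are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of the intuitionistic fuzzy 9 intersection matrix.
# Each part of an object is a dict {point: (membership, non-membership)};
# absent points default to the empty value <0, 1>.
EMPTY = (0.0, 1.0)

def meet(f, g):
    # Intuitionistic fuzzy intersection: min of memberships, max of non-memberships.
    return {x: (min(f.get(x, EMPTY)[0], g.get(x, EMPTY)[0]),
                max(f.get(x, EMPTY)[1], g.get(x, EMPTY)[1]))
            for x in set(f) | set(g)}

def nonempty(f):
    return any(mu > 0 for mu, _ in f.values())

def if9m(a_parts, b_parts):
    # a_parts / b_parts: (core, fringe, outer) triples of one object each.
    return [[int(nonempty(meet(r, c))) for c in b_parts] for r in a_parts]

a = ({1: (1, 0)}, {2: (0.6, 0.3)}, {3: (1, 0), 4: (1, 0)})  # core, fringe, outer of A
b = ({3: (1, 0)}, {2: (0.4, 0.5)}, {1: (1, 0)})             # core, fringe, outer of B
print(if9m(a, b))   # -> [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
```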

Theoretically, it is possible to obtain 2⁹ = 512 different relations from this matrix,
but certain rules have to be maintained while investigating the topological relation
between two geographical objects, which reduces the total number of relations
between objects. Further investigation is needed to find out the exact number of
relations that can be obtained using the proposed matrix.

5 Conclusion and Future Work

Uncertainty and hesitancy are unarguably vital aspects of modeling. Almost
every data model suffers from uncertainty and hesitancy, and the GIS models
are no exception. The existing fuzzy models of geographical objects were
introduced to counter uncertainty, whereas intuitionistic fuzzy sets were introduced to
deal with hesitancy. This paper has introduced a framework, viz. IF9M, to determine
the topological relationship between spatial objects having uncertainty as well as
hesitancy.

References

1. Atanassov KT (1986) Intuitionistic fuzzy sets. Fuzzy Sets Syst 20(1):87–96


2. Cheng T, Molenaar M, Lin H (2001) Formalizing fuzzy objects from uncertain classification
results. Int J Geogr Inf Sci 15(1):27–42
3. Clementini E, Di Felice P (1996) An algebraic model for spatial objects with indeterminate
boundaries. In: Geographic objects with indeterminate boundaries, vol 2, pp 155–169
4. Cohn AG, Gotts NM (2020) The ‘egg-yolk’ representation of regions with indeterminate bound-
aries. In: Geographic objects with indeterminate boundaries. CRC Press, pp 171–187
5. Çoker D (1997) An introduction to intuitionistic fuzzy topological spaces. Fuzzy Sets Syst
88(1):81–89
6. Du S, Qin Q, Wang Q, Li B (2005) Fuzzy description of topological relations I: a unified
fuzzy 9-intersection model. In: International conference on natural computation. Springer, pp
1261–1273
7. Egenhofer MJ, Franzosa RD (1991) Point-set topological spatial relations. Int J Geogr Inf Syst
5(2):161–174
8. Egenhofer MJ, Herring J (1990) Categorizing binary topological relations between regions,
lines, and points in geographic databases. Technical report 9(94-1). National Center for Geo-
graphic Information and Analysis, Santa Barbara, CA, p 76
9. Fisher P (1996) Boolean and fuzzy regions. In: Geographic objects with indeterminate bound-
aries, vol 2
10. Hur K, Kim JH, Ryou JH (2004) Intuitionistic fuzzy topological spaces. Pure Appl Math
11(3):243–265
11. Liu K, Shi W (2006) Computing the fuzzy topological relations of spatial objects based on
induced fuzzy topology. Int J Geogr Inf Sci 20(8):857–883
12. Liu K, Shi W (2009) Quantitative fuzzy topological relations of spatial objects by induced
fuzzy topology. Int J Appl Earth Obs Geoinf 11(1):38–45
13. Malek MR (2004) Spatial object modeling in intuitionistic fuzzy topological spaces. In: Inter-
national conference on rough sets and current trends in computing. Springer, pp 427–434
14. Shi W, Liu K (2007) A fuzzy topology for computing the interior, boundary, and exterior of
spatial objects quantitatively in GIS. Comput Geosci 33(7):898–915
15. Tang X, Kainz W (2002) Analysis of topological relations between fuzzy regions in a general
fuzzy topological space. In: Symposium on geospatial theory, processing and applications.
Citeseer, pp 1–15
16. Tang X, Kainz W, Wang H (2010) Topological relations between fuzzy regions in a fuzzy
topological space. Int J Appl Earth Obs Geoinf 12:S151–S165
17. Tao C, Molenaar M, Bouloucos T (1997) Identification of fuzzy objects from field observation
data. In: International conference on spatial information theory. Springer, pp 241–259
18. Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353
A Hybrid Model of Latent Semantic
Analysis with Graph-Based Text
Summarization on Telugu Text

Aluri Lakshmi and D. Latha

Abstract In this paper, we propose a hybrid model of latent semantic analysis with
graph-based extractive text summarization on Telugu text. Latent semantic anal-
ysis (LSA) is an unsupervised method for extracting and representing the contextual-
usage meaning of words by statistical computations applied to a corpus of text.
The Text rank algorithm is one of the graph-based ranking algorithms, based on
the similarity scores of the sentences. This hybrid method has been implemented
on Eenadu Telugu e-news data. The ROUGE-1 measures are used to evaluate the
summaries of the proposed model against human-generated summaries in this extractive
text summarization. The proposed LSA with Text rank method has an F1-score of
0.97 as against F1-scores of 0.50 for LSA and 0.49 for Text rank. The
hybrid model yields better performance compared with the individual algorithms of
latent semantic analysis and Text rank.

Keywords Text summarization · Latent semantic analysis · Text rank algorithm ·


Singular value decomposition · Telugu language

1 Introduction

In today's information technology environment, excessive and vast information is
available on Web resources, but it is very difficult to find useful and important infor-
mation. Automatic text summarization extracts the meaningful information from
a large corpus of text. “Latent semantic analysis (LSA) is a mathematical model that
extracts the conceptual relations among the terms that exists within a sentence or
within a document or documents” [1]. External vocabularies, human-constructed
dictionaries, information systems, grammar, and morphologies are not used in this
model because extraction in this model depends only on the input text. The working
principle of LSA is based on a mathematical model and not on the language of the
source text [2]. Telugu is a morphologically rich language. The morphology of Telugu

A. Lakshmi (B) · D. Latha


Adikavi Nannaya University, Rajamahendravaram, Andhra Pradesh 533296, India
e-mail: alurilakshmi@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 179
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_17
180 A. Lakshmi and D. Latha

verbs and nouns is more complex than that of English. Thus, adapting the existing
models of text summarization to the Telugu language is not feasible. This has
resulted in less reported work on Telugu text. So, in this paper, we are
proposing and implementing a generic evaluative extractive text summarization on
Telugu text using the LSA model.
This paper is organized into seven sections. Section 2 explains the related
work on LSA and Text rank. Section 3 describes the latent semantic analysis and
existing algorithms proposed by the authors Gong and Liu, Steinberger and Jezek,
Murray et al. and Ozsoy et al. Text rank algorithm is explained in Sect. 4. In Sect. 5,
the proposed algorithm and its implementation are discussed. Section 6 illustrates the
results of the proposed algorithm, its evaluation metrics, and comparative analysis.
Conclusion statements and future scope are specified in Sect. 7.

2 Related Work

The important and relevant studies on text summarization using LSA and Text rank
for documents in English and some Indian languages are reviewed in this section.
Gong and Liu [3] proposed two generic text summarization methods such as relevance
measures and LSA. Steinberger and Jezek [4] proposed two new evaluation methods
based on LSA. These methods measured the similarity between the summary and its
original document. Murray et al. [5] proposed an automatic speech summarization
using maximal marginal relevance (MMR) and latent semantic analysis. Ozsoy et al.
[6] proposed a text summarization using LSA on Turkish documents. They proposed
cross-method in this paper. Dokun and Celebi [7] proposed two approaches such as
avesvd and ravesvd on English documents using latent semantic analysis. Chowdary
[8] proposed a generic text summarization using latent semantic analysis on Bengali
text. Geetha et al. [9] proposed a text summarization using latent semantic analysis
on Kannada text. Two approaches (cross-method, Steinberger, and Jezek) are used
for generating the summary. In [10], survey of cross-domain text categorization
techniques has been presented. Improving classification accuracy and dimensionality
reduction of a large text data by least square support vector machines along with
singular value decomposition was implemented in [11]. Kumar [12] proposed a new
hybrid model based on fuzzy logic using two graph-based techniques, known as
Text rank and Lex rank, together with latent semantic analysis. Mandal and Singh [13] proposed
a generic and query-based text summarization using latent semantic analysis. Reddy
[1] proposed a hybrid model for text categorization using SVM classifier with latent
semantic analysis.
A Hybrid Model of Latent Semantic Analysis … 181

3 Latent Semantic Analysis

Latent semantic analysis (LSA) is an unsupervised method that combines statistical
and algebraic techniques to find the hidden structure of and between words,
sentences, and documents. The method uses singular value decomposition (SVD)
for analyzing the relationship between a set of documents and their terms, and also
for dimensionality reduction. The LSA algorithm is implemented [14] in 3 steps as
discussed below.

Step 1: Term–Document Matrix


The first step of the LSA algorithm is the construction of the term-by-document matrix.
The term–document matrix is based on term frequency–inverse document frequency
(TF–IDF) vectorization. Before calculating the TF–IDF matrix, preprocessing should
be done [13].

Step 2: Applying SVD to Term–Document Matrix


The second step of the LSA algorithm is applying singular value decomposition
(SVD) to the term–document matrix, which is decomposed into three matrices
U, S, and VT. The SVD exhibits the relationship between words and sentences
and is also used for dimensionality reduction. The SVD of a matrix A is
defined in Eq. (1).

A = USVT (1)

where U = [uij] is a term by concept (t × c) column-orthonormal matrix, S =
diag(S1, S2, …, Sn) is a concept by concept (c × c) diagonal matrix whose diagonal
elements are only positive singular values arranged in non-ascending order, and V = [vij]
is a sentence by concept (s × c) orthonormal matrix.
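Step 2 can be sketched on a tiny made-up term–document matrix; numpy's SVD plays the role of the decomposition in Eq. (1), and the matrix values are purely illustrative.

```python
# Sketch of Step 2: SVD of a tiny TF-IDF-style term-document matrix.
import numpy as np

A = np.array([[0.0, 1.2, 0.0],
              [0.9, 0.0, 0.4],
              [0.0, 0.7, 0.7]])           # rows: terms, columns: sentences

U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Singular values come back in non-ascending order, matching the definition above.
assert np.all(S[:-1] >= S[1:])
# The decomposition reconstructs A, i.e. A = U S V^T as in Eq. (1).
assert np.allclose(U @ np.diag(S) @ Vt, A)
```

The rows of Vt (the sentence-by-concept matrix) are what the cross method of Step 3 operates on.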

Step 3: Sentence Selection Process


Sentence selection is the last step of the implementation. Based on sentence
ranking, the top-ranked sentences are selected for the summary. There are
different algorithms (Gong and Liu; Steinberger and Jezek; Murray et al.; Ozsoy
et al.) for selecting the sentences. Makbule Gulcin Ozsoy introduced the
cross-method, which uses the VT matrix for sentence selection. In this paper, we used
the cross-method for selecting the best sentences because it gave better results
compared with all the above approaches [6].

Fig. 1 Text rank algorithm

4 Text Rank Algorithm

The Text rank method is one of the popular graph-based ranking algorithms. It is
used to extract sentences based on their scores and is based on the PageRank
algorithm, which ranks Web pages in search engine results [15]. The Text rank
algorithm is illustrated in Fig. 1 and implemented by the following steps.
Step 1: The input document is tokenized into sentences.
Step 2: Vectors are found for each sentence; here, term frequency–inverse document
        frequency (TF–IDF) vectorization is used. Term frequency measures a
        term's importance within a document, while inverse document frequency
        down-weights terms that occur in many documents of the collection.
Step 3: The similarity between sentence vectors is found using cosine similarity.
Step 4: The similarities, stored in matrix format, are represented as a graph. The
        nodes of this graph represent the sentences, and the edges represent the
        similarity scores between the sentences.
Step 5: The top-ranked sentences are selected to form a summary.
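The five steps above can be sketched minimally as follows; the toy bag-of-words vectors, the PageRank-style damping factor d = 0.85, and the fixed iteration count are illustrative assumptions rather than the paper's implementation.

```python
# Minimal Text rank sketch over pre-built sentence vectors (Steps 2-5).
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def textrank(vectors, d=0.85, iters=50):
    n = len(vectors)
    # Steps 3-4: the cosine-similarity matrix doubles as the weighted graph.
    w = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]          # total outgoing weight per node
    scores = [1.0 / n] * n
    for _ in range(iters):                 # PageRank-style power iteration
        scores = [(1 - d) / n
                  + d * sum(w[j][i] / out[j] * scores[j]
                            for j in range(n) if out[j] > 0)
                  for i in range(n)]
    return scores                          # Step 5: pick the highest scores

# Three equally similar "sentences" end up with equal scores.
print(textrank([[1, 1, 0], [1, 0, 1], [0, 1, 1]]))
```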

5 Proposed Algorithm

In the proposed algorithm, the hybrid model of the LSA and Text rank summarization
methods is used for extracting sentences based on their ranking. Figure 2
shows the detailed flowchart of the proposed algorithm (hybrid model).
In the preprocessing step, cleaning (removing unnecessary symbols) and tokeniza-
tion are done. From the preprocessed document, the term–document matrix
with TF–IDF is constructed. The LSA and Text rank algorithms discussed in the
earlier sections are implemented separately with the term–document matrix.
The two algorithms generate two different sets of the top “n” sentences with their
scores individually [12]. Sentences common to both the results are included in the

[Flowchart: Input Document → Pre-processing → create the term–document matrix
with TF–IDF. LSA branch: decompose the matrix into U, Sigma, VT → apply the
cross method on VT → select the top-ranked sentences → summary. Text rank
branch: find the cosine similarity of the vectors → construct the graph → select
the top-ranked sentences → summary. The two summaries are then arranged in
ascending order of length and the top “n” sentences are selected as the final
summary.]

Fig. 2 Proposed algorithm (hybrid model)

final summary. The common sentences are considered very important because
they are selected by both algorithms. If there are no common sentences, the top
n sentences from the merged list of sentences, sorted according to their length, are
selected as the final summary.
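The merge rule described above can be sketched as follows; the (sentence, score) list representation and the helper name are our own assumptions.

```python
# Sketch of the hybrid combination rule: prefer sentences chosen by both
# methods; otherwise merge, de-duplicate, and fall back to length ordering.
def combine_summaries(lsa_top, textrank_top, n):
    textrank_sentences = {s for s, _ in textrank_top}
    common = [s for s, _ in lsa_top if s in textrank_sentences]
    if common:                                   # selected by both algorithms
        return common[:n]
    merged = [s for s, _ in lsa_top] + [s for s, _ in textrank_top]
    unique = list(dict.fromkeys(merged))         # de-duplicate, keep order
    return sorted(unique, key=len)[:n]           # ascending order of length

lsa = [("A short one.", 0.9), ("Sentence picked by both methods.", 0.8)]
tr = [("Sentence picked by both methods.", 0.7), ("Another candidate here.", 0.6)]
print(combine_summaries(lsa, tr, 2))   # -> ['Sentence picked by both methods.']
```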

6 Evaluation and Experimental Results

We have used the unigram overlap method for evaluating the proposed summaries [8].
Precision, recall, and F1-score are the metrics used to analyze the efficiency of the

proposed method in text summarization. Precision, recall, and F1-score are calculated
using Eqs. (2)–(4).

Precision = |UH ∩ UM| / |UH|                          (2)

Recall = |UH ∩ UM| / |UM|                             (3)

F1 = (2 × Precision × Recall) / (Precision + Recall)  (4)

where |UH| is the number of unigrams in the human-generated summary, |UM| is
the total number of unigrams in the system-generated summary, and |UH ∩ UM| is
the number of unigrams common to the human-generated and system-generated
summaries.
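Eqs. (2)–(4) can be sketched directly, with whitespace tokenization as a simplifying assumption.

```python
# Unigram-overlap scores following Eqs. (2)-(4) exactly as defined above.
def unigram_scores(human_summary, system_summary):
    uh = set(human_summary.split())    # UH: human-summary unigrams
    um = set(system_summary.split())   # UM: system-summary unigrams
    overlap = len(uh & um)
    precision = overlap / len(uh)      # Eq. (2)
    recall = overlap / len(um)         # Eq. (3)
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1       # Eq. (4)

p, r, f1 = unigram_scores("a b c d", "b c e")
# overlap = 2, |UH| = 4, |UM| = 3  ->  P = 0.5, R = 2/3, F1 = 4/7
```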
The proposed hybrid model based on LSA and the Text rank method was implemented
on a sample dataset manually generated from the e-news data of Eenadu, one of the
popular daily newspapers. 100 news articles were collected to evaluate the
performance of the model. The proposed LSA with Text rank method is compared
with the LSA and Text rank algorithms in 5 categories of Telugu e-news data. The
F1-scores measured for the proposed LSA with Text rank method on Telugu e-news
data are shown in Fig. 3. The comparative analysis of the proposed LSA-Text rank
method with the LSA and Text rank algorithms is shown in Table 1. The comparison
shows that the proposed LSA with Text rank method has higher efficiency than the
LSA and Text rank methods in the Andhra Pradesh and Politics categories. In the
remaining categories, the proposed hybrid model's results are equivalent to either
the LSA or Text rank method.

[Bar chart: F1-scores (0–1.2) of LSA, Text rank, and the hybrid model (proposed
method) across five categories: Andhra Pradesh, Politics, Crime, Fresh news, and
Sports.]

Fig. 3 F1-scores of the proposed LSA with Text rank method



Table 1 Comparative analysis
Category          LSA    Text rank   Hybrid model (Proposed method)
Andhra Pradesh    0.50   0.49        0.97
Politics          0.54   0.34        0.82
Crime             0.60   0.58        0.60
Fresh news        0.78   0.71        0.78
Sports            0.42   0.41        0.42

7 Conclusion

In this paper, we proposed a hybrid model of latent semantic analysis with graph-
based text summarization on Telugu text. The Telugu e-news data of the daily news-
paper Eenadu were collected to evaluate the performance of the proposed LSA with
Text rank method. We evaluated our approach by computing precision, recall, and
F1-score using the ROUGE metrics. The results show that the proposed LSA with
Text rank method has an F1-score of 0.97, while the existing LSA and Text rank
methods have F1-scores of 0.50 and 0.49, respectively. The proposed method yields
better results when compared with the individual algorithms of latent semantic
analysis and Text rank. It has been observed to work well for small documents. The
future scope of this work will be to apply the model to large documents and extend
it to perform abstractive summarization on documents in the Telugu language.

References

1. Reddy PVP. A hybrid approach for Text categorization with LSA. 5, pp 20181–20188. ISSN
NO:1076-5131
2. Suleman RM, Korkontzelos I (2021) Extending latent semantic analysis to manage its syntactic
blindness. Expert Syst Appl 165:114130. https://doi.org/10.1016/j.eswa.2020.114130
3. Gong Y, Liu X (2001) Generic text summarization using relevance measure and latent semantic
analysis. In: Proceedings of the 24th annual international ACM SIGIR conference on research
and development in information retrieval, pp 19–25
4. Steinberger J, Jezek K (2017) Using latent semantic analysis in text summarization
5. Murray G, Renals S, Carletta J (2005) Extractive summarization of meeting recordings
6. Ozsoy MG, Cicekli I, Alpaslan FN (2010) Text summarization of Turkish texts using latent
semantic analysis. In: Proceedings of the 23rd international conference on computational
linguistics (Coling 2010) 2:869–876
7. Dokun O, Celebi E (2015) Single-document summarization using latent semantic analysis
1:1–13
8. Chowdhury SR (2017) An approach to generic Bengali text summarization using latent
semantic analysis. In: 2017 International conference on information technology, pp 11–16.
https://doi.org/10.1109/ICIT.2017.12
9. Kannada text summarization (2015), pp 1508–1512

10. Murty MR, Murthy JVR, Prasad Reddy PVGD, Satapathy SC (2012) A survey of cross-domain
text categorization techniques. In: 2012 1st international conference on recent advances in
information technology RAIT-2012, pp 499–504. https://doi.org/10.1109/RAIT.2012.6194629
11. Murty MR, Murthy JV, Prasad Reddy PVGD (2011) Text document classification based on
least square support vector machines with singular value decomposition. Int J Comput Appl
27:21–26. https://doi.org/10.5120/3312-4540
12. Kumar A (2020) Fuzzy logic based hybrid model for automatic extractive text summarization,
pp 7–15
13. Mandal S, Singh GK (2020) LSA based text summarization. Int J Recent Technol Eng 9:150–
156. https://doi.org/10.35940/ijrte.b3288.079220
14. Hussein A, Joan A, Qiang L (2019) An efficient framework of utilizing the latent semantic
analysis in text extraction. Springer, US. https://doi.org/10.1007/s10772-019-09623-8
15. Vijay R, Vangara B, Vangara SP (2020) A hybrid model for summarizing text documents using
text rank algorithm and term frequency
A Combined Approach of Steganography
and Cryptography with Generative
Adversarial Networks: Survey

Kakunuri Sandya and Subhadra Kompella

Abstract Secure transmission of data over public networks like the Internet requires
authenticity, secrecy, and confidentiality, and achieving these in data trans-
mission is now the primary concern. These issues may be solved by using data
hiding techniques. Steganography, cryptography, and watermarking techniques are
used to hide data and ensure its security during transmission. The objective of this
paper is to analyze and examine several methods of deep learning in image cryptog-
raphy and steganography. Steganography conceals the existence of the hidden
message, whereas cryptography alters its format. Steganography and cryptography
are both essential and robust techniques. This paper's primary goal is to explore
several ways of integrating steganography with encryption to create a hybrid system.
In addition, specific differences between cryptographic and steganographic
approaches are also given. This paper aims to help other researchers summarize
current trends, problems, and possible future directions in this area.

Keywords Image steganography · Cryptography · Deep learning · Data hiding ·


Encryption

1 Introduction

The Internet's inception and subsequent expansion made digital communication
easier but consequently raised the problem of information security over open
networks. It is not safe to transmit and receive data over email and Web browsers,
since sensitive information such as credit card details might
be intercepted via this medium [1]. Secure and private communication is necessary
for online users. Steganography and cryptography are two distinct approaches that
preserve data's authentication, confidentiality, and integrity. Steganography attempts

K. Sandya (B) · S. Kompella


GITAM Deemed to Be University, Vishakhapatnam, India
e-mail: skakunur@gitam.in
S. Kompella
e-mail: skompell@gitam.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 187
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_18
188 K. Sandya and S. Kompella

to hide messages in digital media in such a manner that they cannot be detected.
Steganography is mainly designed to communicate secret messages securely using
images. Steganography does not modify the secret data; instead, it hides the data
inside an image, video, or audio file so that it cannot be detected [2]. Messages are
encrypted using cryptography to keep them safe from unwanted access [3]. Stegano-
graphic techniques can be traced, or the steganography system broken, as long as
the encoding method is understood.
The steganographic technique conceals the transmission of messages through digital
media, so that the communication between senders and receivers [4] is invisible.
Cryptography scrambles information such that it cannot be decoded by anybody
other than the sender and receiver. Data integrity, entity authenticity, and data
authenticity are elements of information security connected to cryptography, a
mathematical discipline [5].

2 Background Works

This section explores data hiding techniques used primarily in industrial and military
applications, where the data require maximum security. Cryptography alters the
secret data, steganography conceals the secret data's existence, and watermarking
marks data ownership.

Traditional-Based Cryptography Methods


In [6], the authors use a key sequence combined from logistic and duffing maps to
encrypt and decrypt an image by shuffling adjacent pixels. The two key sequences
are joined using the XOR function to obtain a single key sequence {Ka}, which is
then utilized to encrypt the image.
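The combined-key idea of [6] can be sketched as follows; as a simplification, the logistic map stands in for both chaotic maps (the duffing map is omitted), and all parameters and seeds are illustrative.

```python
# Sketch: two chaotic key streams XOR-combined into {Ka}, then used as an
# XOR keystream over pixel values.
def logistic_stream(x0, r, n):
    x, out = x0, []
    for _ in range(n):
        x = r * x * (1 - x)                # logistic map iteration
        out.append(int(x * 256) & 0xFF)    # quantise to a key byte
    return out

k1 = logistic_stream(0.3141, 3.99, 8)
k2 = logistic_stream(0.6535, 3.97, 8)
ka = [a ^ b for a, b in zip(k1, k2)]                  # combined key {Ka}

pixels = [10, 20, 30, 40, 50, 60, 70, 80]
cipher = [p ^ k for p, k in zip(pixels, ka)]
assert [c ^ k for c, k in zip(cipher, ka)] == pixels  # XOR is self-inverse
```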
In [7], Blowfish is the cryptographic algorithm employed by the authors for
encryption; file security and the file size problem are also considered. Large files
are compressed by LZW, a dictionary-based compression algorithm.
The paper [8] suggested the notion of combining compression and data encryption
in a system. The first stage focused on data compression and cryptography; the next
phase emphasized compression cryptosystems. Finally, the recommended technique
for data compression and encryption was explained.
In [9], the authors discussed how to secure correspondence in an exchange so
that fraudsters are tackled in various circumstances, concluding that advanced
signatures reduce time and improve safety.
In [10], certain linear collision-resistant hash functions were converted to one-time
signatures using a generic method. The resulting signature schemes are provably
secure based on the worst-case hardness of approximating the shortest vector (and
other common lattice problems) in the corresponding class to within a polynomial factor.
A Combined Approach of Steganography … 189

Traditional-Based Steganography Methods


In [11], the authors introduced a leading and more secure strategy for least-
significant-bit (LSB) steganography with varying information-hiding capacities and
signal-to-noise ratios (SNR). Increasing the hiding capacity decreased the SNR and
vice versa, so a trade-off between capacity and SNR was established depending on
the situation in which the scheme was employed. Variable least significant bits
(VLSB) steganography was applied under this compromise to obtain the necessary
parameters.
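The basic LSB idea discussed above can be sketched as follows; this is the plain fixed-LSB scheme with one secret bit per 8-bit pixel, not the variable-capacity VLSB method of [11].

```python
# Generic LSB embedding sketch: overwrite each pixel's least significant bit
# with one secret bit, leaving the remaining carrier pixels untouched.
def embed(pixels, bits):
    stego = [(p & ~1) | b for p, b in zip(pixels, bits)]
    return stego + pixels[len(bits):]

def extract(pixels, n_bits):
    return [p & 1 for p in pixels[:n_bits]]

carrier = [100, 101, 102, 103]
secret = [1, 0, 1]
stego = embed(carrier, secret)
assert extract(stego, 3) == secret
# Each stego pixel differs from the carrier by at most one grey level.
assert all(abs(a - b) <= 1 for a, b in zip(stego, carrier))
```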
In [12], the authors published a new, hashing-based method for steganography in
grayscale images, giving a better way to transfer data. A VB.NET-coded prototype
tool implements the approach, which may be used successfully with file formats
like BMP and TIFF.
In [13], the authors developed a novel method of image steganography using the
most significant bits (MSBs) of pixels. Hidden bits are stored in bit 5 using the
difference between bits 5 and 6: bit 5's value is changed if bits 5 and 6 differ from
the secret data bit. The findings show a substantial improvement in the signal-to-
noise ratio with this technique.
In [14], the authors discussed several steganography methods suggested for hiding
image data, such as LSB, DCT, pixel value differencing, and DFT. However, these
approaches suffer from issues such as low hiding capacity, decreased image quality,
and weakened security of the hidden data when more data are hidden within.
In [15], the researchers suggested a novel quantum-image steganography method
using an efficient embedding technique based on modifying directions; for ease of
reference, this embedding method is called EMD embedding.

Hybrid Methods
In [16], the authors proposed a cloud architecture that allows safe data transfer
from the client’s organization to the servers of the cloud service provider (CSP).
The data are sent over the network using a hybrid technique that combines
cryptography and steganography.
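A minimal sketch of such a hybrid encrypt-then-embed pipeline is given below. To keep the example self-contained, a toy SHA-256 keystream stands in for a real cipher such as AES, and 1-bit LSB embedding stands in for the steganographic stage; all names are illustrative, and none of this is the specific scheme of [16].

```python
import hashlib


def _keystream(key: bytes, n: int) -> bytes:
    """Toy keystream from chained SHA-256 hashing; a stand-in for AES-CTR."""
    out, block = b"", key
    while len(out) < n:
        block = hashlib.sha256(block).digest()
        out += block
    return out[:n]


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR with the keystream; applying it twice restores the plaintext."""
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def hybrid_embed(pixels, secret: bytes, key: bytes):
    """Encrypt first, then hide the ciphertext bits in the pixels' LSBs."""
    bits = "".join(format(byte, "08b") for byte in xor_cipher(secret, key))
    if len(bits) > len(pixels):
        raise ValueError("cover too small")
    head = [(p & ~1) | int(b) for p, b in zip(pixels, bits)]
    return head + list(pixels[len(bits):])


def hybrid_extract(stego, n_bytes: int, key: bytes) -> bytes:
    """Collect the LSBs, then decrypt the recovered ciphertext."""
    bits = "".join(str(p & 1) for p in stego[:n_bytes * 8])
    cipher = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return xor_cipher(cipher, key)
```

The design point is that an eavesdropper who suspects LSB embedding still recovers only ciphertext, which is the combined security argument these hybrid schemes rely on.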
In [17], the authors reviewed current steganographic methodology for protecting
video against copying, forgery, and unauthorized access using LSB techniques,
RSA algorithms, and DNA steganography. Existing concealment techniques have
drawbacks such as increased key size, higher computation cost, decreased
performance, and larger input sizes. Compared to traditional cryptography
methods, the proposed HECC-based DNA steganography improves the encryption
and decryption processing times by 30% and 42%, respectively.
The work in [18] emphasized security, speed, and load. The suggested approach
begins by taking two inputs: medical image data and medical report data. The
suggested technique justifies its performance on example images with an average
PSNR of 55–70 dB, an MAE of 0.2–0.7%, and an average correlation coefficient
close to 1 (SSIM/correlation coefficient).
190 K. Sandya and S. Kompella

3 Advanced Techniques

Steganalysis Algorithms Based on Deep Learning


The CNN model is commonly used for steganalysis because its classification
pipeline mirrors the conventional steganalysis algorithm.
Before deep learning was widely applied to information hiding, a steganalysis
paradigm based on deep learning was proposed [19]. The algorithm’s structure is
shown in Fig. 1. Using a standard preprocessing step with a high-pass filter, the
algorithm first suppresses the smooth image content so that the (stego) noise stands
out. The preprocessed image is then fed into the CNN model for feature extraction.
Building on this paradigm of steganalysis, the authors in [20] proposed ways to
enhance transfer learning performance while also aiding the training of CNNs for
steganalysis. The technique performs better than the standard steganalysis method
based on the SRM feature set when detecting the WOW embedding algorithm
(Fig. 2).
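The residual-extraction idea behind this preprocessing can be illustrated with a small convolution. In the sketch below, a simple 3 × 3 Laplacian-style kernel stands in for the larger high-pass (KV) filter used in [19]; the point is only that convolving with a high-pass kernel suppresses smooth image content so that what remains is dominated by noise, including any stego noise.

```python
def highpass_residual(image, kernel=None):
    """Convolve a grayscale image (list of lists) with a high-pass kernel.

    Flat regions map to zero, so the output is dominated by noise; this is
    the residual that the steganalysis CNN then classifies. The 3x3
    Laplacian here is an illustrative stand-in for the 5x5 filter of [19].
    """
    if kernel is None:
        kernel = [[-1, -1, -1],
                  [-1,  8, -1],
                  [-1, -1, -1]]
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):          # skip the 1-pixel border
        for x in range(1, w - 1):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    acc += kernel[dy + 1][dx + 1] * image[y + dy][x + dx]
            out[y][x] = acc
    return out
```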

Fig. 1 Framework of steganalysis algorithm [19]

Fig. 2 Flowchart of the framework using learning and transferring representations [20]
A Combined Approach of Steganography … 191

CNN-Based Steganography Methods


A novel method of concealment is described in [21]. First, an improved Wasserstein
GAN with gradient penalty (WGAN-GP) model is constructed, and cover and hidden
pictures are created. After the model is stable, a cover picture is sent to the generator.
Finally, the generator produces an image visually similar to the secret image, having
the same effect on the recipient as sending the hidden image itself.
In [22], the authors proposed a coverless image steganography (CLS) scheme that
selects images already conveying the desired secrets; because the cover image is
never modified to conceal secret information, the natural appearance of the image
is maintained.
In [23], the authors introduced an all-encompassing steganalysis method based on
code-element (CE) embedding, Bi-LSTM, and CNN that incorporates attention
mechanisms. First, the CEs of each frame are transformed into a multi-hot vector.
In [24], the authors proposed a novel generative steganography by sampling (GSS)
method for information hiding. In this approach, a powerful generator samples the
stego image directly, instead of using an explicit cover as in conventional
steganography. Messages may be embedded and extracted using a secret key that
both parties share.
In [25], the authors proposed a new steganography method, generative
steganography with Kerckhoffs’ principle (GSK). GSK generates hidden messages
from a cover picture rather than embedding them in the cover, so the cover does not
change. The generators, based on generative adversarial networks (GANs), are
trained to adhere to Kerckhoffs’ principle: in the GSK system, all information
except the extraction key may be made public to receivers.
The authors of [26] made a first attempt to deal with the problem of cognitive
imperceptibility in linguistic steganography. Experimental results show that the
proposed methods can further constrain the semantic expression of the generated
steganographic text to enhance its cognitive imperceptibility, while ensuring a given
level of perceptual and statistical imperceptibility.
Steganography based on generative adversarial networks (GANs) was discussed
in [27] to combat detection by strong steganalysis through the minimax game
between a generator and a discriminator. Based on GANs, steganography without
embedding (SwE) demonstrates its contemporary steganographic capabilities using
stego pictures that conceal sensitive data. GAN-based SwE, on the other hand,
suffers from low information recovery accuracy, limited steganography capacity,
and poor natural presentation.
For embedding messages automatically, the authors in [28] proposed an audio
steganography system that offers a better steganographic cover for audio. The
proposed system’s training framework comprises a generator, a discriminator, and
a steganalyzer trained with deep learning.
In [29], earlier work is expanded, and the problem of invertible steganography in
the crypto space of deep neural networks is addressed. A classic study on invertible
crypto-space steganography is revisited, in which message decoding and image
recovery are treated as a form of binary classification.

Secure Image Steganography Based on GANs


The image used to hide data is termed the cover in this method, and the image
containing the embedded hidden information is known as the stego image. Because
GANs work naturally with images, they have also turned steganography into a full
security measure and an attractive study area. GANs may be used for secure image
steganography, and some of these efforts are summarized in Table 1.

4 Fallouts and Discussion

Experimentation with both grayscale and color images may be performed for
cryptography, steganography, and compression methods, together with a detailed
discussion of the reasons behind the improved performance of the recommended
approaches. Quantitative measures were generated and assessed to emphasize each
method’s characteristics. A significant effort has been made to establish adequate
quantitative metrics for security algorithms that can evaluate and compare different
conventional security algorithms. Experiments on natural (real-world) images and
benchmarks were performed to assess the algorithms objectively.

Quantitative Metrics
Four quantitative measures are used to evaluate the performance of the new
techniques presented in the study: peak signal-to-noise ratio (PSNR), mean square
error (MSE), number of pixel change rate (NPCR), and unified averaged changed
intensity (UACI). PSNR is used to measure image quality, MSE to measure image
distortion, and NPCR and UACI to evaluate image encryption techniques.
Peak Signal-to-Noise Ratio (PSNR): A high PSNR value indicates high
performance of the proposed system. PSNR performs well because it is based on the
pixel regions of the received image, and noise is eliminated during the image
recovery process rather than pixel by pixel. For secret images, it determines the
quality of the decoded image.

\[ \mathrm{PSNR} = 20 \log_{10} \frac{\mathrm{MAX}_f}{\sqrt{\mathrm{MSE}}} \]

where MAX_f is the maximum possible pixel value of the image (255 for an 8-bit image) and MSE is the mean square error.
Mean Square Error (MSE): When the estimated value of MSE is low, the perfor-
mance of the suggested system is high. Because of the low MSE, the proposed
technique is susceptible to multivariate images, and recovery noises are eliminated
throughout the image process.

\[ \mathrm{MSE} = \frac{1}{m\,n} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \lVert f(i,j) - g(i,j) \rVert^2 \]

Table 1 Secure image steganography related works based on GANs

Zhang et al. (2018), Vanilla GAN
Findings: invisibility is enhanced by hiding the secret image exclusively on the Y channel of the cover image
Algorithms/technologies: proposed ISGAN, a new CNN architecture that hides a secret gray image on the sender’s side and precisely extracts the secret image on the receiver’s side
Merits: realistic stego pictures and better secret images are shown
Research limitations: domain-less hidden information

Hayes et al. (2017), Vanilla GAN
Findings: the discriminator’s function is to determine whether an image contains secret information
Algorithms/technologies: adversarial training against the discriminator’s task of learning a steganographic algorithm
Merits: competitive versus traditional ways of steganography
Research limitations: computational complexity required

Shi et al. (2017), WGAN
Findings: a novel method that provides more appropriate and secure covers for steganography through adversarial learning
Algorithms/technologies: adversarial learning system with a re-designed S-net, called SSGAN
Merits: faster convergence, more consistent training, and better-quality images
Research limitations: not suited for every dataset type

Weixuan et al. (2017), DCGAN
Findings: the embedding alteration for each pixel of a given spatial cover image is automatically learned
Algorithms/technologies: an automated learning approach for steganographic distortion using a generative adversarial network
Merits: incorporates hidden bits in textured areas; good performance in security
Research limitations: prone to mistakes during transmission

where
‘m’ represents the secret image width,
‘n’ represents the secret image height,
‘f(i, j)’ represents the original secret binary image, and
‘g(i, j)’ represents the decoded secret binary image.
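The two metrics can be computed directly from their definitions; the sketch below assumes 8-bit images represented as lists of lists, with MAX_f = 255.

```python
import math


def mse(f, g):
    """Mean square error between two equal-size grayscale images."""
    m, n = len(f), len(f[0])
    return sum((f[i][j] - g[i][j]) ** 2
               for i in range(m) for j in range(n)) / (m * n)


def psnr(f, g, max_f=255):
    """Peak signal-to-noise ratio in dB; max_f is the peak pixel value."""
    e = mse(f, g)
    if e == 0:
        return float("inf")  # identical images: no distortion at all
    return 20 * math.log10(max_f / math.sqrt(e))
```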

NPCR and UACI: The decoded image quality is assessed using the number of pixel
change rate (NPCR) and the unified average changing intensity (UACI).
NPCR may be specified as

\[ \mathrm{NPCR} = \frac{1}{X \times Y} \sum_{i,j} K(i,j) \]

where k1(i, j) is the input image and k2(i, j) is the decoded image, and X and Y are
the image width and height, respectively.

\[ K(i,j) = \begin{cases} 1, & \text{if } k_1(i,j) = k_2(i,j) \\ 0, & \text{otherwise} \end{cases} \]

UACI may be specified as,

\[ \mathrm{UACI} = \frac{1}{X \times Y} \sum_{i,j} \frac{\lvert k_1(i,j) - k_2(i,j) \rvert}{255} \times 100\% \]
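Both measures follow directly from the formulas above. Note that the sketch keeps this document's convention, in which K(i, j) counts positions where the two images agree; the standard NPCR found elsewhere in the literature counts positions where they differ.

```python
def npcr(k1, k2):
    """Fraction of pixel positions where the two images agree
    (per the definition of K(i, j) given above)."""
    X, Y = len(k1), len(k1[0])
    same = sum(1 for i in range(X) for j in range(Y) if k1[i][j] == k2[i][j])
    return same / (X * Y)


def uaci(k1, k2):
    """Unified average changed intensity, as a percentage of full scale."""
    X, Y = len(k1), len(k1[0])
    total = sum(abs(k1[i][j] - k2[i][j]) / 255
                for i in range(X) for j in range(Y))
    return total / (X * Y) * 100
```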

5 Conclusion

Comparative study of the sciences of cryptography and steganography shows that
steganography should not be used as a mere substitute for cryptography, because
each has its own particularities. Cryptography is the process of secret writing
through the encoding and decoding of messages. Steganography refers to the
techniques in which a secret message is hidden inside a cover message. A system
using merely one of these approaches is susceptible to third parties. Thus, the
combination of steganography and cryptography provides better security and
robustness.

References

1. El-kenawy E-S, Saber M, Arnous R (2019) An integrated framework to ensure information


security over the internet. Int J Comput Appl 178:13–15. https://doi.org/10.5120/ijca20199
19117
2. Alqadi Z (2020) Image blocking to hide secret message. Int J Comput Sci Mob Comput 9.
https://doi.org/10.47760/ijcsmc.2020.v09i11.002
3. Asratian R (2019) Protected message processing in distributed systems on the basis of
cryptographic message syntax. Inform Tehnol 25:435–440. https://doi.org/10.17587/it.25.
435-440

4. Chandra P, Suhartana I (2021) Web-based image steganography application to hide secret


messages. JELIKU (J Elektr Ilmu Komput Udayana) 9(375). https://doi.org/10.24843/JLK.
2021.v09.i03.p08
5. Zaidan BB, Zaidan AA, Al-Frajat AK, Jalab HA (2010) On the differences between hiding
information and cryptography techniques: an overview. J Appl Sci (Faisalabad) 10:1650–1655
6. Gupta Vibhor, Metha G (2018) Medical data security using cryptography, pp 866–869. https://
doi.org/10.1109/CONFLUENCE.2018.8442712
7. Rahim R, Adyaraka D, Sallu S, Sarimanah E, Hidayat A, Sewang A, Hartinah S (2018) An
application data security with lempel-ziv welch and blowfish. Int J Eng Technol (UAE) 7:71–73
8. Singh A, Gilhotra R (2011) Data security using private key encryption system based on
arithmetic coding. Int J Netw Secur Appl (IJNSA) 3. https://doi.org/10.5121/ijnsa.2011.3305
9. Sheshasaayee A, Anandapriya B (2017) Digital signatures security using cryptography for
industrial applications, pp 379–382. https://doi.org/10.1109/ICIMIA.2017.7975640
10. Lyubashevsky V, Micciancio D (2017) Asymptotically efficient lattice-based digital signatures.
J Cryptol 31. https://doi.org/10.1007/s00145-017-9270-z
11. Khan S, Yousaf MH, Wahid M (2015) Variable least significant bits grayscale image
steganography. Pak J Sci 67:281–287
12. Bajwa I, Riasat R (2011) A new perfect hashing based approach for secure stegnography. In:
2011 6th International conference on digital information management, ICDIM 2011. https://
doi.org/10.1109/ICDIM.2011.6093325
13. Islam AU, Khalid F, Shah M, Khan Z, Mahmood T, Khan A, Ali U, Naeem M (2016) An
improved image steganography technique based on MSB using bit differencing. In: 2016 Sixth
international conference on innovative computing technology (INTECH), pp 265–269. https://doi.org/10.
1109/INTECH.2016.7845020
14. Singh A, Singh H (2015) An improved LSB based image steganography technique for RGB
images, pp 1–4. https://doi.org/10.1109/ICECCT.2015.7226122
15. Qu Z, Cheng Z, Liu W, Wang X (2019) A novel quantum image steganography algorithm based
on exploiting modification direction. Multimedia Tools Appl 78. https://doi.org/10.1007/s11
042-018-6476-5
16. Dhamija A, Dhaka V (2015) A novel cryptographic and steganographic approach for secure
cloud data migration. https://doi.org/10.1109/ICGCIoT.2015.7380486
17. Vijayakumar P, Vijayalakshmi V, Zayaraz G (2016) An improved level of security
for DNA steganography using hyperelliptic curve cryptography. Wireless Pers Commun 89.
https://doi.org/10.1007/s11277-016-3313-x
18. Balaji P, Murugan K, Srinivasan K, Shridevi S, Shamsudheen S, Hu Y-C (2021) Improved
authentication and computation of medical data transmission in the secure IoT using
hyperelliptic curve cryptography. J Supercomput. https://doi.org/10.1007/s11227-021-03861-x
19. Qian Y, Dong J, Wang W, Tan T (2015) Deep learning for steganalysis via convolutional neural
networks. In: Proceedings of SPIE—the international society for optical engineering, p 9409.
https://doi.org/10.1117/12.2083479
20. Qian Y, Dong J, Wang W, Tan T (2016) Learning and transferring representations for image
steganalysis using convolutional neural network, pp 2752–2756. https://doi.org/10.1109/ICIP.
2016.7532860
21. Duan X, Li B, Guo D, Zhang Z, Ma Y (2020) A coverless steganography method based on
generative adversarial network. EURASIP J Image Video Process. https://doi.org/10.1186/s13
640-020-00506-6
22. Chen X, Zhang Z, Qiu A, Xia Z, Xiong N (2020) A novel coverless steganography method
based on image selection and StarGAN. IEEE Trans Netw Sci Eng 1–1. https://doi.org/10.
1109/TNSE.2020.3041529
23. Li S, Wang J, Liu P, Wei M, Yan Q (2021) Detection of multiple steganography methods in
compressed speech based on code element embedding, Bi-LSTM and CNN with attention
mechanisms. IEEE/ACM Trans Audio Speech Lang Process 29:1556–1569. https://doi.org/
10.1109/TASLP.2021.3074752

24. Zhang Z, Liu J, Ke Y, Lei Y, Li J, Zhang M, Yang X (2019) Generative steganography by


sampling. IEEE Access 1–1. https://doi.org/10.1109/ACCESS.2019.2920313
25. Ke Y, Zhang MQ, Liu J, Su TT, Yang XY (2019) Generative steganography with Kerckhoffs’
principle. Multimedia Tools Appl 78(10):13805–13818. https://doi.org/10.1007/s11042-018-
6640-y
26. Yang Z, Xiang L, Zhang S, Sun X, Huang Y (2021) Linguistic generative steganography with
enhanced cognitive-imperceptibility. IEEE Signal Process Lett 1–1. https://doi.org/10.1109/
LSP.2021.3058889
27. Yu C, Hu D, Zheng S, Jiang W, Li M, Zhao Z-Q (2021) An improved steganography without
embedding based on attention GAN. Peer-to-Peer Netw Appl 14:1–12. https://doi.org/10.1007/
s12083-020-01033-x
28. Chen L, Wang R, Yan D, Wang J (2021) Learning to generate steganographic cover for audio
steganography using gan. IEEE Access 1–1. https://doi.org/10.1109/ACCESS.2021.3090445
29. Chang C-C (2021) Cryptospace invertible steganography with conditional generative adver-
sarial networks. Secur Commun Netw 2021:1–14. https://doi.org/10.1155/2021/5538720
Real-Time Accident Detection
and Intimation System Using Deep
Neural Networks

K. Padma Vasavi

Abstract In India, around 4.61 Lakh road accidents happened in 2017 out of which
1.49 Lakh led to fatality. It is estimated that Andhra Pradesh alone takes a share of
7416 deaths among them. Among the total accidents, the death tolls are about 55,336
only from two-wheeler crashes, which indicates the pathetic and alarming scenario
of road accidents in India. Many lives could be saved in such conditions if the
accident vehicle is detected and the information regarding the incident is sent to the
right people at the right time. This situation motivated us to take up this research,
which detects road accidents using a computer vision system built
around a Raspberry Pi and intimate the registered mobile numbers through IoT. The
vehicle accident detection system (VADS) is built around a Raspberry Pi interfaced
with a Web camera. The camera may be fixed in places like four road junctions, T-
junctions, and other important locations where the probability of accident occurrence
is high. The camera continuously captures the scene under consideration and gives
the input to the processor. A convolution neural network architecture is designed
and implemented to classify the severity of the accident into one among the three
categories: good, moderate, and worst. As and when the test image in the scene is
classified as a “moderate” class or a “Worst” class, the system identifies the situation
as a serious condition and immediately triggers an event in the Ubidots cloud by
using Wi-Fi interfaced with the controller. Once the event from the processor is
received, the cloud immediately delivers a text message to registered mobile
numbers, such as an ambulance service or a police control room, to ensure
immediate help that saves the life of the victim.

Keywords Accident · Classification · Deep neural networks · Raspberry Pi

K. Padma Vasavi (B)


Shri Vishnu Engineering College for Women, Bhimavaram, India
e-mail: padmavasaviece@svecw.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 197
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_19
198 K. Padma Vasavi

1 Introduction

According to a report given by the World Health Organization on road safety, India
accounts for more than 11% of the total number of road accidents with only 1% of the
world vehicles. In many cases, human lives are lost in road accidents due to delays
in emergency medical assistance [1]. According to golden hour principle [2], there
is a high probability that timely medical and surgical aid can avoid death during the
golden hour, which is the period after the traumatic injury. A decrease in the response
time of emergency medical care can reduce the probability of death by one-third on
an average. The percentage of people who die before reaching the hospital in low- and
middle-income countries is more than twice as compared to high-income countries
[3]. The rest of the paper is organized as follows: The literature related to detection
of vehicle detection is reviewed and described in Sect. 2; Sect. 3 gives the methods
used in the proposed vehicle accident detection system; Sect. 4 provides the results
and discussion of the proposed system. Finally, Sect. 5 concludes the paper.

2 Literature Review

Nowadays, rapid progress in wireless communication technology and information
broadcasting through Internet of Things (IoT) technologies is helping the victims of
road accidents by reducing the rescue time. IoT is like an umbrella that connects
millions of physical devices to the Internet to exchange information. IoT is a
prospective method for tracking and managing intelligent vehicles, as it can network
any associated physical object to a controller. A few researchers have worked on
developing methods that detect accidents and inform the victims’ families by
making use of IoT.
The automatic smart accident detection (ASAD) system proposed by Ali and Eid is
an auto-detection unit that immediately notifies an emergency contact through a text
message with the details of the location and time of the accident, when an instant
change in acceleration, rotation, or impact force is detected at an end of the vehicle
[4]. Most researchers have confined their work to improving the accuracy of accident
detection, estimating the severity of road accidents, or minimizing the rescue time
post occurrence of an accident [5, 6]. In general, all the systems that are developed
for accident detection and prevention are deployed in high-end vehicles and mid-
range vehicles to some extent. However, there are no such facilities for two or three
wheelers or any heavy vehicle carrying cargo. To overcome the abovementioned
problem and rescue the lives of the people who are met with serious road accidents,
this paper proposes a computer vision-based system that estimates the severity of the
accident and immediately notifies the nearby command and control room, nearest
hospital that can immediately send an ambulance to the accident spot to save the
life of the victim. The vehicle accident detection system (VADS) is built around a
Raspberry Pi interfaced with a Web camera. The camera may be fixed in places like
four road junctions, T-junctions, and other important locations where the probability
Real-Time Accident Detection … 199

of accident occurrence is high. The camera continuously captures the scene under
consideration and gives the input to the processor. A deep learning algorithm with
a convolution neural network architecture is designed and implemented to classify
the severity of the accident into one among the three categories: good, moderate, and
worst. As and when the test image in the scene is classified as a “moderate” class or a
“worst” class, the system identifies the situation as a serious condition and immedi-
ately triggers an event in the cloud by using Wi-Fi interfaced with the controller. On
triggered by the event received from the processor, the cloud immediately delivers
a text message to the registered mobile numbers like: ambulance or a police control
room to ensure immediate help that saves the life of the victim.

3 Methods

As an initial step, a dataset comprising vehicles in good condition, vehicles affected
by moderate accidents, and vehicles that suffered heavy damage in fatal accidents
is collected and labeled as “Good Vehicle,” “Moderate Vehicle,” and “Worst
Vehicle,” as shown in Fig. 1. A dataset of nine hundred images of the same size and
resolution is created by collecting images from various sources on the Internet, as
no benchmark dataset for accident vehicles is available, to the best of the author’s
knowledge. The preprocessing steps performed before training the DNN are frame
division, noise removal, background removal, contour detection, and morphological
processing. During frame division, the camera continuously captures the particular
road environment and divides the video into frames in which the accident-prone bikes
are present. A bilateral filter removes the noise from the frames without disturbing
the edge information. After applying adaptive Gaussian thresholding for background
elimination, contour detection is used to detect the edge map of the vehicle in the
scene. After detecting the contours of the vehicle image, a new database with the
contour images is created with the same labels. This new dataset is used to train the
deep neural network and is computationally efficient because of the reduction in the
memory space and faster computations.
The vehicle detection by the deep neural network is carried out in two phases:
training and testing. During the training phase, the network is trained with the three
categories of images. During the testing phase, the scene captured by the camera is
divided into frames, the frames are preprocessed, and the contour image is given as
input to the deep neural network, which then classifies the image into one of the
three classes of vehicles. If the detected class of the image is either “Moderate” or
“Worst,” the system triggers an event in the cloud connected to the base unit on
the road.
The event then enables the camera to capture the frame of the accident and sends
an email along with the captured frame to the nearby command and control room
and also sends an SMS alert to the nearest hospital with ambulance by sharing the
location of the incident and detailing the severity of the incident. The total method
described above is implemented in real time on a Raspberry Pi (R-Pi) processor. The

Fig. 1 Engineering method for vehicle accident detection system

block diagram for implementing the process on a Raspberry Pi processor is shown


in Fig. 2. The camera interfaced with the R-Pi processor is an 8 MP camera. The
R-Pi is connected to the cloud via the built in Wi-Fi module. The complete method
is implemented in the R-Pi using Open CV and Python programming. Pytorch is
used to implement the deep neural network architecture on the R-Pi. The heart of the
total method described till now is the deep neural network architecture that classifies
the accident severity into three different categories. Now, the details of the DNN
architecture used are presented in detail.

Deep Neural Network for Vehicle Accident Classification


The deep neural network architecture chosen for accident severity detection is a
customized convolutional neural network, as shown in Fig. 3. The customized
convolutional neural network of the proposed system is designed using four
convolution layers to extract the features and a layer each of max-pooling, ReLU,
and linear classification for dimensionality reduction, activation, and classification,
respectively. For training the CNN to classify the vehicle status, a dataset of 600
images is collected from various sources on the Internet. The dataset comprises 200
images for each category of vehicle: good, moderate, and worst status. Further, data augmentation

Fig. 2 Block diagram for the real-time implementation



Fig. 3 Customized convolution neural network

is done on the database to resize them to 110 × 110 × 3 to be suitable for the size of
the input layer of the proposed CNN. Also, the images in the database are subjected
to translation and rotation to improve the classification accuracy of the proposed
system. A zero-center normalization is also performed on the database to ensure the
training is completed at a quicker pace. The convolution layer in the network uses
three 3 × 3 filters with a stride of 1 to extract the features of each class of the vehicles.
The ReLU layer is used to compute the activations of the inputs. The Maxpool layer
is used to reduce the dimensions of the features calculated from convolution layer by
using a 3 × 3 filter with a stride of 1. The convolution layer, the ReLU layer, and the
max-pooling layer are repeated three times to calculate the features of the vehicles
from macro-level to micro-level. Finally, a fully connected layer is used to flatten the
features to one dimension and categorize the vehicles into one of the three classes.
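A PyTorch sketch of this architecture (PyTorch is the framework named later for the R-Pi deployment) might look as follows; the channel counts (8, 16, 32) are assumptions, since the text does not report them, while the spatial sizes follow from 3 × 3 convolutions and 3 × 3 max-pooling, both with stride 1, applied to a 110 × 110 × 3 input.

```python
import torch
import torch.nn as nn


class VehicleCNN(nn.Module):
    """Sketch of the customized CNN described above: three conv-ReLU-maxpool
    stages (3x3 kernels, stride 1) followed by a fully connected classifier
    for the three classes. Channel counts are illustrative assumptions."""

    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, stride=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=1),
            nn.Conv2d(8, 16, kernel_size=3, stride=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=1),
            nn.Conv2d(16, 32, kernel_size=3, stride=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=1),
        )
        # each conv and each pool shrinks the side by 2: 110 -> 98
        self.classifier = nn.Linear(32 * 98 * 98, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))
```

Training as described in Sect. 4 would then use stochastic gradient descent, e.g. `torch.optim.SGD(model.parameters(), lr=0.001)`.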
Till now, the implementation details of the proposed architecture for vehicle acci-
dent classification are presented in detail. The next section discusses the results
of implementing the proposed system using MATLAB simulations and real-time
implementation on a Raspberry Pi processor as well.

4 Results and Discussion

Initially, the dataset is presented to the convolutional network for accident
classification. A random set of images from the dataset chosen for vehicle accident
detection is shown in Fig. 4.
The proposed architecture was trained on an Nvidia (TM) GeForce GTX GPU
with 16 GB of memory. It took approximately two minutes to train the network. The
network was trained using stochastic gradient descent learning, with a learning rate
of 0.001. Among the six hundred images present in the database, 90% of the images

Fig. 4 Random set of images from image data store

are chosen for training the neural network, and the remaining 10% are used for
testing. The training accuracy obtained was 100% after eight epochs, with each
epoch running for 1500 iterations. The details of training the neural network are
shown in Fig. 5.

Fig. 5 Training results of neural network for classification of vehicles



Fig. 6 Testing results of neural network for classification of vehicles

After training, the neural network is tested by presenting the images in the
remaining 10% of the database, and the results are shown in Fig. 6. In all the given
instances, the neural network could correctly label the category of the vehicle, with
a validation accuracy of 94.7%. Till now, the MATLAB simulation results of the
neural network have been presented in detail.
However, to implement the system in real time, we deployed it on a Raspberry Pi
processor. The equivalent C code required for the Raspberry Pi is generated by using
the MATLAB Coder support package for the Raspberry Pi processor. The real-time
implementation is done by creating a simulated environment in front of our
institution’s main gate and deliberately keeping a moderately damaged vehicle on
the roadside. The camera interfaced with the Raspberry Pi processor continuously
captured the scene and divided it into frames, and these frames were given as test
images to the neural network deployed on the processor. The convolutional neural
network could label the vehicles in good condition as “good” and the vehicle we
deliberately kept on the roadside as “Moderate,” as shown in Fig. 7. When the
neural network identifies a vehicle with moderate damage, it triggers an event in the
cloud remotely connected to the Raspberry Pi processor through the Wi-Fi module
built into the processor.
The Raspberry Pi is connected to the “Ubidots” cloud through Wi-Fi connection.
When the neural network identifies any vehicle with moderate to worst damage, it
triggers two events on the cloud. One event corresponds to sending a short message
to nearby hospital with ambulance and another event to the nearby police command
and control room giving the notification about the incident as shown in Fig. 8.
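The event-triggering step can be sketched as a single HTTP request against the Ubidots REST API. Everything below (device label, token, the "accident" variable and its severity encoding) is a hypothetical configuration; the actual SMS text and recipients are configured server-side as Ubidots events.

```python
import json
import urllib.request

UBIDOTS_URL = "https://industrial.api.ubidots.com/api/v1.6/devices/{device}/"


def build_accident_event(device, token, severity, lat, lng):
    """Build the POST request that updates the 'accident' variable in the
    cloud; Ubidots then fires the SMS/email events server-side. Device
    label, token, and variable names are illustrative placeholders."""
    payload = {
        "accident": {"value": severity,              # 1 = moderate, 2 = worst
                     "context": {"lat": lat, "lng": lng}},
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        UBIDOTS_URL.format(device=device),
        data=body,
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )
```

On the R-Pi, sending the request would then be `urllib.request.urlopen(build_accident_event(...))` once the CNN reports a "Moderate" or "Worst" class.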

Comparisons
The performance of the proposed method is compared with other popular deep
learning architectures, Alex Net and Squeeze Net. Alex Net is chosen for
comparison because of its efficiency in terms of classification accuracy; Squeeze
Net is chosen because of its low computational complexity and small execution
time. All three architectures are compared in terms of classification accuracy and
execution time, and the comparison results are given in Table 1. From Table 1, it is
observed that the proposed method executes at a faster pace than Alex Net without
losing much of the classification accuracy. The faster computational speed helps
the victim to be saved within the golden hour.

Fig. 7 Real-time implementation of vehicle accident detection system

(a) (b)

Fig. 8 a Triggering event, b SMS with location

Table 1 Performance evaluation of proposed method

Architecture    Classification accuracy (%)    Execution time (min)
Alex Net        97                             8
Squeeze Net     96.8                           6
Proposed CNN    94.7                           2

5 Conclusion

A dataset of 600 images, containing good, moderate, and worst vehicles with 200
images in each category, is preprocessed using bilateral filtering and adaptive
Gaussian thresholding. A CNN architecture is designed and built with three
convolutional and max-pooling layers and one fully connected layer, and trained
with data from which 6,315,843 features were extracted with 100% training
accuracy. The testing accuracy obtained is 94.7%. The vehicle accident detection
system is implemented on a Raspberry Pi to detect the accident vehicle in a
real-world scenario and transmit the information to hospitals through the
processor’s built-in Wi-Fi. The message from the R-Pi is received by the registered
user’s mobile phone from the Ubidots cloud.

Acknowledgements The author would like to express her fond appreciation to Md. Imamunnisa
and her team for their active participation in the execution of the project and their enthusiasm in its
real-time implementation. The author would like to express her heartfelt gratitude and sincere
thanks to the Management of Shri Vishnu Engineering College for Women for their support and
encouragement to complete this research.

Design of Cu-Doped SnO2 Thick-Film
Gas Sensor for Methanol Using ANN
Technique

Amit Gupta, Shashi Kant Dargar, A. V. Nageswara Rao, and B. Raghavaiah

Abstract In this paper, the authors present an analysis of the sensitivity response
of a tin oxide-based, Cu-doped thick-film gas sensor using the neural computing method
of ANN simulation, which enables us to predict the response to methanol at
temperatures of 150 and 350 °C. The device’s sensitivity has been studied at entirely
different Cu doping concentrations, including the undoped case. Furthermore,
the minimum and maximum sensitivity at 150 and 350 °C have
been analyzed upon exposure to methanol. A unique approach has been adopted to
measure the sensitivity of the 1% Cu-doped SnO2 thick-film gas device with
ANN algorithms applied for three distinct network transfer functions. A feed-forward
training algorithm, gradient descent backpropagation
with adaptive learning rate (TRAINGD), was used. The performance of ANN
models with different algorithms is evaluated for the sensitivity response of the
device with different network transfer functions. By experimentation, we
find that the ANN model with this training rule is appropriate for modeling the
device sensitivity. The results presented in the paper show that ANN is
an efficient tool for designing SnO2-based thick-film gas
sensor devices.

Keywords Artificial neural network (ANN) · Thick-film gas sensor · Sensitivity ·
Neural network (NN)

1 Introduction

As environmental guidelines become stricter, the need to develop highly sensitive
gas sensors grows. A gas sensor must be advanced in adaptability, selectivity, and speed of response

A. Gupta · A. V. Nageswara Rao · B. Raghavaiah
Department of Electronics and Communications Engineering, NEC, Narasaraopeta, Andhra
Pradesh, India
S. K. Dargar (B)
Department of Electronics and Communication Engineering, Kalasalingam Academy of Research
and Education, Krishnankoil, Tamilnadu, India
e-mail: drshashikant.dargar@ieee.org

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 207
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_20
208 A. Gupta et al.

to meet the demand for low-level gas detection [1]. Furthermore, thick-film gas
sensors ought to be cost-effective and reliable over the long term [2]. Metal-oxide-
semiconductor (MOS) sensors based on electron conduction have been widely used
as a widespread process of gaseous detection. The sensing characteristics of SnO2
were reported by Neri et al. (2006) using the gel-combustion method [3]. Mendoza
et al. (2014) showed chemical sensors based on SnO2 -CNT films using the HF-CVD
technique [4]. Due to its low cost, durability, and reusability, SnO2 is the
most widely used material among semiconductor oxides for fabricating sensors [5,
6]. Furthermore, owing to its high sensitivity, compact design, light weight, low cost,
small particle size, and dopant properties, the thick-film SnO2 gas sensor device is
the most suitable and competent [7].
SnO2 exhibits n-type behavior due to non-stoichiometry caused
by oxygen vacancies. As an n-type semiconductor, it has a
forbidden bandgap of 3.6 eV. In addition, each anion in the unit cell is coordinated with
the cations in a planar–trigonal configuration so that the p orbitals of oxygen lie in
the four-atom plane [8, 9].
The thick-film gas sensor is conductive due to the non-stoichiometric composition
resulting from oxygen deficiency [10]. Its sensing property arises because adsorption
of the gas on the particles of its surface produces changes in its conductivity [11].
Freshly prepared SnO2 particles adsorb oxygen atoms on the surface when
exposed to air [12]. Every SnO2 particle is shielded by negatively charged ions
on the surface, while positive charges accumulate just below the particle surface
after atoms donate electrons, creating a depletion layer. When
the sensor is exposed to reducing gases at higher temperatures, the adsorbed oxygen
species react and release electrons back to the conduction band [13].
Consequently, the depth of the space–charge region shrinks, resulting in a
decline of the potential barrier height at the grain boundaries, which eases
conduction. ANN analysis emerged approximately fifty years ago; however, it has
been applied to hands-on problems only for the past 20 years [14].
An ANN is a collection of small, densely interconnected processing units. Information
is passed between these units along forward interconnections. An incoming connection
has two associated values: the input and a weight [15]. The
output of a unit is a function of the weighted sum of its inputs. An ANN is trained on
predefined input data and is then ready for prediction or classification. ANNs can
self-learn to distinguish patterns in real data. An ANN can handle many inputs and
thus provides a suitable tool for designers [16].
Feed-forward ANNs allow signals to travel one way only, from input to output.
There is no feedback loop, i.e., the output of any layer does not influence that same
layer. Feed-forward ANNs tend to be straightforward networks that associate inputs
with outputs [17]. They are widely used in pattern recognition. This type of network
is also referred to as bottom-up or top-down. The single-layer network is
the simplest form of a layered system, with only a single input layer that
connects directly to the output layer [18].
Design of Cu-Doped SnO2 Thick-Film Gas Sensor … 209

The perceptron is the simplest form of artificial neural network, used to classify
patterns that are linearly separable; linearly separable patterns lie on opposite sides of
a hyperplane. The model consists of a single neuron with adjustable synaptic weights
and a bias. A single-neuron perceptron is limited to pattern classification with
only two classes. For classification with more than two classes, the output layer of
the perceptron can include more than one neuron.
In multilayer feed-forward ANNs, hidden layers are available between the
input and output layers. In feedback or recurrent ANNs, there are connections from
later layers back to earlier layers of neurons. In this type of neural network, there
is a feedback loop: the activations of the network’s hidden units or its output
data are fed back into the network as input. This work presents the recognition
of sensitivity in a Cu-doped SnO2 sensor with a feed-forward network that can be
used to recognize the pattern. The feed-forward network utilizes the Gaussian activation
function, whose importance is that it is non-negative for all values
of x.
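The Gaussian-activation idea can be illustrated with a minimal forward pass; the weights, layer sizes, and inputs below are arbitrary and are not taken from the paper.

```python
# Minimal feed-forward pass with a Gaussian activation, illustrating the
# non-negativity property mentioned above. Weights are arbitrary.
import numpy as np

def gaussian(x):
    return np.exp(-x ** 2)          # non-negative for every real x

def forward(x, w_hidden, w_out):
    """One hidden layer of Gaussian units feeding a linear output unit."""
    h = gaussian(x @ w_hidden)       # hidden activations lie in (0, 1]
    return h @ w_out

rng = np.random.default_rng(0)
w_h = rng.normal(size=(1, 5))        # 1 input -> 5 hidden units
w_o = rng.normal(size=(5, 1))        # 5 hidden -> 1 output
x = np.linspace(-3, 3, 7).reshape(-1, 1)
h = gaussian(x @ w_h)
```

The activation peaks at 1 when its argument is zero and decays symmetrically, which is the non-negativity property the text highlights.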

2 Proposed Experiment

Methanol gas is highly toxic and unsafe for living beings; severe exposure can
produce instant bronchial contraction, narrowing of the airways, high pulmonary
resistance, and increased airway reactivity in experimental animals. Critical exposures
in experimental animals have also produced changes in metabolism and irritation
of the mucous membranes in the eyes. The calibration of the heater
element was carried out in air ambient. The temperature variation of the substrate
containing the heater, with external electrical power supplied, was recorded using
a thermistor. The toxic gases and liquids are injected into the chamber by a needle
from the top of the chamber. The base of the chamber is insulated and
isolated by a cotton bed. The chamber allows measurement of Ra, the sensor
resistance in air, and Rg, the resistance in the test gas or liquid. To perform the
experiment, a series of different concentrations of liquids and gases is necessary.
The sensor resistance started falling immediately due to the semiconducting nature of
the sensor; this decrease in resistance is exponential. It was followed by an increase
of the sensor resistance due to the adsorption of oxygen molecules on the sensor
surface. After some time, the sensor resistance stabilized at its clean-air value
for that temperature. One ml of the test gas was then introduced into
the enclosed chamber, and the resistance was noted down. The gas
concentration was increased by injecting more gas into the chamber ml by ml, and
the corresponding sensor reading was noted down. The concentration is measured in
parts per million (ppm); for liquids, 1 ml equals 100 ppm, and for gases,
1 ml equals 250 ppm. To measure the sensitivity of the SnO2-based 1% Cu-doped
thick-film gas sensor, the resistance of the sensor in air (Ra) is
measured with a digital multi-meter (DMM). Secondly, the resistance in the
sample gas (Rg) is measured with the DMM.
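The bookkeeping above can be sketched in a few lines. The ml-to-ppm factors follow the calibration stated in the text; the sensitivity formula S = (Ra − Rg)/Ra × 100 is one common convention for reducing gases and is an assumption here, since the paper does not spell out its definition.

```python
# Concentration bookkeeping and sensitivity for the measurement procedure
# described in the text: 1 ml of liquid ~ 100 ppm, 1 ml of gas ~ 250 ppm.
# The sensitivity formula S = (Ra - Rg)/Ra * 100 is an assumed convention,
# not stated in the paper.
def ml_to_ppm(ml, sample="liquid"):
    """Convert an injected volume to the chamber concentration in ppm."""
    factor = {"liquid": 100, "gas": 250}[sample]
    return ml * factor

def sensitivity(ra_ohm, rg_ohm):
    """Percent resistance change between clean air (Ra) and test gas (Rg)."""
    return (ra_ohm - rg_ohm) / ra_ohm * 100.0
```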


Fig. 1 Schematic of fabricated thick-film gas sensor

The schematic of the fabricated 1 × 1 thick-film gas sensor is shown in
Fig. 1. The artificial neural network (ANN) model may be used as an alternative
method for technological analysis and Matlab-based calculation. Artificial neural
networks have two main components: the processing elements, called neurons, and
the connections between them; each connection has its own weight. The neurons
are the information processors, and the connections are the information
storage. Each processing element first calculates a weighted sum of the input signals
and then applies the transfer function. The training method used is error
backpropagation: the error
between the desired and the actual output is computed and propagated backward
through the network. The input of the feed-forward NN model is
the concentration of methanol at 150 and 350 °C. The output of the NN model is the sensor’s
sensitivity to methanol at 150 and 350 °C. The ANN thus learns the mapping
between methanol’s concentration and output sensitivity at 150 and 350 °C.

3 Result and Discussion

The experimental data were first extrapolated with the Matlab tool, and 10 extrapolated
data points were obtained for the different concentrations of methanol at 150 and
350 °C. Of these, the first six were used for training and the remaining
four for validation. The validation set was used to stop the NN training when
the neural network began to overfit the data; the test dataset was not used during
model validation. A multilayer perceptron feed-forward ANN
was designed, trained, and tested using the gradient descent backpropagation (GDB)
algorithm. LEARNGDM was used as the adaptation
learning function, and the mean square error (MSE) was used as the performance
function and training goal for estimating network efficiency: the
smaller the MSE, the better the network’s performance and
accuracy. The tansig, logsig, and purelin transfer functions were used
for the neurons, one at a time, for each iteration set of input and output data,
after building the neural network in MATLAB, setting the network type and parameters,
and training for 1000 iterations with ten hidden neurons. When the sensitivity
was tested with the Matlab neural network tool using the Levenberg–Marquardt
feed-forward propagation algorithm, the maximum sensitivity with the tansig network
transfer function was 22.52% at 150 °C, compared with the various network transfer
functions in Fig. 2.
While analyzing the sensitivity of the sensor with the function extracted from the
Matlab neural network tool, it was observed that, for the purelin transfer function of the
gradient descent backpropagation with adaptive learning rate algorithm, the sensitivity
was maximal at 79.28% at 350 °C compared with the other
network transfer functions (Fig. 3a). The gradient descent backpropagation
(GDB) with adaptive learning rate algorithm yielded a regression parameter of
0.9899 for the training data and 0.98305 for the output target data (Fig. 3b).


Fig. 2 Schematic of a fabricated thick-film gas sensor, b characterization setup

Fig. 3 Response of 1% Cu-doped SnO2 sensor on exposure to methanol in different network
transfer functions

Fig. 4 Results of regression: a logsig transfer function, b purelin network transfer function

The gradient descent backpropagation with adaptive learning rate algorithm gave
a regression factor of 0.97678 for the training data and 0.96097 for the output target
data, as shown in Fig. 4a, and a regression parameter of 0.9996 for the training data
and 0.9723 for the output target data, as shown in Fig. 4b.
LEARNGDM is used as the adaptation learning function, and the mean square error
is used as the performance function at 350 °C. The Levenberg–Marquardt feed-forward
propagation algorithm gave a regression parameter of 0.99922 for the training data and
0.98369 for the output target data, as demonstrated in Fig. 5, and a regression
parameter of 0.99298 for the training data and 0.99004 for the output target data,
as shown in Fig. 6a.
For Fig. 6b, the Levenberg–Marquardt feed-forward propagation algorithm regression
parameter of the training data is 1.000, and that of the output target data is 0.99996.

4 Conclusion

The maximum sensitivity recorded for the 1% Cu-doped SnO2-based thick-film gas
sensor was 22.52% at 150 °C. The maximum sensitivity to methanol was also tested
with the Matlab neural network tool; with the gradient descent backpropagation with
adaptive learning rate algorithm and the logsig network transfer function, it was
established to be 0.9830 at 150 °C (Fig. 4).
Among the three transfer function networks, logsig is the most suitable function,
as the maximum validation performance is reached at epoch zero. The gradient
descent backpropagation with adaptive learning rate network function was found to
have the lowest error with the logsig transfer function network. Gradient descent
backpropagation with adaptive learning rate is a worthy approach in comparison
with the Levenberg–Marquardt feed-forward propagation algorithm. The
maximum sensitivity found for the 1% Cu-doped SnO2-based thick-film gas sensor was
79.33% at 350 °C.

Fig. 5 Results of regression: a tansig transfer function, b logsig transfer function

Fig. 6 Results of regression: a purelin transfer function, b tansig transfer function

The fitting capability of the 1% Cu-doped SnO2-based thick-film gas sensor model was
checked with the tansig transfer function for methanol at 350 °C.

The Levenberg–Marquardt feed-forward propagation algorithm regression
parameter of the training data was 1.000, and that of the output target data was
0.99996, as shown in Fig. 6. Among the three transfer function networks, the tansig
network transfer function was the most suitable, as maximum validation performance
was achieved at epoch zero. The Levenberg–Marquardt feed-forward propagation
algorithm with the tansig network transfer function trains the data regression function,
and the Levenberg–Marquardt algorithm is the preferred technique in comparison
with gradient descent backpropagation with an adaptive learning rate.

References

1. Butta N, Cinquegrani L, Mugno E, Tagliente A, Pizzini S (1992) A family of tin oxide-based
sensors with improved selectivity to methane. Sens Actuators B Chem 6(1–3):253–256
2. Carotta MC, Dallara C, Martinelli G, Passari L, Camanzi A (1991) CH4 thick-film gas sensors:
characterization method and theoretical explanation. Sens Actuators B Chem 3(3):191–196
3. Deorsola FA, Mossino P, Amato I, DeBenedetti B, Bonavita A, Micali G, Neri G (2006)
Gas sensing properties of TiO2 and SnO2 nanopowders obtained through gel combustion.
In: Advances in science and technology, vol 45. Trans Tech Publications, pp 1828–1833
4. Centeno FW (2014) Development of sensors based on advanced micro-and nanostructured
carbon materials. University of Puerto Rico, Rio Piedras (Puerto Rico)
5. Choe Y-S (2001) New gas sensing mechanism for SnO2 thin-film gas sensors fabricated by
using dual ion beam sputtering. Sens Actuators B Chem 77(1–2):200–208
6. Choudhary M, Mishra VN, Dwivedi R (2013) Effect of temperature on palladium-doped tin
oxide (SnO2) thick film gas sensor. Adv Sci Eng Med 5(9):932–936
7. Comini E, Ferroni M, Guidi V, Faglia G, Martinelli G, Sberveglieri G (2002) Nanostructured
mixed oxides compounds for gas sensing applications. Sens Actuators B Chem 84(1):26–32
8. Lee SC, Choi HY, Lee WS, Lee SJ, Kim SY, Ragupathy D, Lee DD, Kim JC (2011)
Improvement of recovery of SnO2-based thick film gas sensors for dimethyl methylphosphonate
(DMMP) detection. Sensor Lett 9(1):101–105
9. Gupta A, Srivastava JK, Bhaskar A (2013) ANN for the analysis of Pd-doped SnO2 sensor for
detection of Acetylene. Arni Univ Int J 2(1):26–30. ISSN: 2278-4241
10. Wang Y, Meng X, Yao M, Sun G, Zhang Z (2019) Enhanced CH4 sensing properties of Pd
modified ZnO nanosheets. Ceram Int 45(10):13150–13157
11. Li D, Tang Y, Ao D, Xiang X, Wang S, Zu X (2019) Ultra-highly sensitive and selective H2S
gas sensor based on CuO with sub-ppb detection limit. Int J Hydrogen Energy 44:3985–3992
12. Gupta A, Kannan D, Reddy GF (2020) Pd Dopant for SnO2 based thick film gas sensor for
detection of LPG using ANN technique. In: Proceedings of industry interactive innovations in
science, engineering & technology (I3SET2K19)
13. Dawson CW, Wilby R (1998) An artificial neural network approach to rainfall-runoff modelling.
Hydrol Sci J 43(1):47–66
14. Haykin S (1994) Neural networks: a comprehensive foundation. MacMillan College
Publishing Co., New York
15. Kalogirou SA (2003) Artificial intelligence for the modeling and control of combustion
processes: a review. Prog Energy Combust Sci 29(6):515–566
16. Nissen S (2003) Implementation of a fast artificial neural network library (fann). Rep Dept
Comput Sci Univ Copenhagen (DIKU) 31:29–26

17. Dargar SK, Srivastava VM (2019) Design and analysis of IGZO thin film transistor for
AMOLED pixel circuit using double-gate tri active layer channel. Heliyon 5(4):e01452
18. Gupta A, Kumar VR (2020) Machine learning technology using thick film gas sensor toxic
liquid detection for industrial IOT application. In: 2020 IEEE international conference on
electronics, computing and communication technologies (CONECCT)
Detect Traffic Lane Image Using
Geospatial LiDAR Data Point Clouds
with Machine Learning Analysis

M. Shanmuga Sundari , M. Sudha Rani, and A. Kranthi

Abstract Artificial intelligence is a challenging domain in geospatial technology.
It will raise various application domains to new heights while also displaying the
variance in the geographical concept. Artificial intelligence-based techniques are
crucial in LiDAR evaluation and in geospatial digital images for interpreting the
components of geospatial AI. The LiDAR point-cloud technique demonstrates the
feasibility of machine learning and deep learning approaches in the geospatial field.
We define a workflow based on machine learning/deep learning approaches that
turns LiDAR point clouds into spatial LiDAR models. A regionally weighted
regression includes the land-use/land-cover change indicator and the geographically
weighted regression (GWR). Machine learning and deep learning enable the LiDAR
technique in geospatial applications to build and maintain virtual models. We use
traffic images to detect conjunction, collision, and crowded traffic using the LiDAR
technique. This research reports the accuracy of the images using machine learning
concepts.

Keywords Geospatial image · LiDAR · Machine learning algorithm · Traffic lane

1 Introduction

Climate change will bring a slew of new dangers to the earth. Floods and other
hydrological hazards may cause the global map to alter geographically. Changes in land
use and coverage are also among the most difficult challenges in maintaining the
geographical area. The harmful consequences include surface runoff and varied
surface capabilities. Natural disasters can do significant harm to human life. Taking

M. Shanmuga Sundari (B) · M. Sudha Rani · A. Kranthi
BVRIT Hyderabad College of Engineering for Women, Hyderabad, Bachupally, India
e-mail: sundari.m@bvrithyderabad.edu.in
M. Sudha Rani
e-mail: sudharani.m@bvrithyderabad.edu.in
A. Kranthi
e-mail: kranthi.a@bvrithyderabad.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 217
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_21
218 M. Shanmuga Sundari et al.

into account and preserving the design of geological space is therefore critical. Many
researchers are interested in studying the connections between land use and
coverage and any changes in the landscape. These will have an impact on traffic
restrictions as well. The diverse characteristics of hydrological dangers will impact
the forest’s built-up space extensions.
Afforestation drives land changes owing to the high frequency of changes in the
land coverage. A range of study fields is related to geospatial analysis. All natural
climatic information and alterations are managed using geoinformation [1],
which refers to geographic information systems (GISs). The escalation of hydrological
risks sets the stage for spatial resolutions and land-cover changes. Satellite
images were obtained and used for enhancements. The GIS’s position is specified
using the statistics used to generate vector or raster inputs.
The geospatial images [2] will show the changes in the road map, such as traffic
rerouting. When there is an irregular traffic flow, our dynamic technique will regulate
it. The tremendous intensity of traffic flow constantly disrupts the usual routine.
Machine learning technologies are applied to improve the accuracy of traffic forecasts
and flow control.

2 Literature Survey

Nowadays, various tools and technologies are available in the geospatial domain
leading the industry to the next level. These research tools mostly employ machine
learning techniques using algorithms such as logistic regression (LR), support vector
machine (SVM), and stochastic gradient descent algorithms (SGD). Many studies
have been conducted on the early prediction of events and supporting decisions
[3]. Labs have developed an LR-based model for real-time heart prediction based
on machine learning analysis. The SVM algorithm is used to calculate the dependency
between attributes and analyze disease; it predicts acute cardiac effects [4]
and diastolic and systolic blood pressure [6]. In addition to carrying out contour
recognition for every signal, a linear regression model based
on a raster image is used [5]; it uses prior data to remove ground and building
points.
The technique for detecting highly elevated objects is based on the top
or border of the street in the MLS point cloud; these elevated
objects are clustered into traffic signal and light pole classes [6]. Big data analysis is ongoing
in geospatial fields for predicting traffic flow [7]. The LiDAR technique
is one of the efficient techniques that helps with image visualization [8].
Much research is ongoing in geospatial analysis, such as time series segmentation [9] and image
reproduction of the original image.
Thanh Ha and Chaisomphob [10] propose utilizing principal component
analysis (PCA) to find planar MLS data. Their paper classifies poles as different
objects, such as utility poles, lamps, and street signs. CNN features give the best
performance by providing visual data and image representation.

Detect Traffic Lane Image Using Geospatial... 219

Many algorithms are used in geospatial methods, but obtaining the data and images
is the most challenging part of this area; it is a major limitation of this research.
This research helps developing countries regulate traffic in necessary places, and it
will help to improve personal safety in the country.

3 Proposed System

Figure 1 represents the proposed system of our research. Geospatial analysis is carried
out using weighted regression. Preprocessing of the image passes through several
stages before regression. Correction and classification help obtain a figured
image suitable for regression. This research differs from previous
research in its algorithmic matrix prediction using the Markov matrix and in finding
the geometric points. This method is useful for discovering the interception points
in geospatial image recognition.

3.1 Land-Usage/Land-Coverage Change Analysis

The Markov matrix is calculated using the formula below and is used to find land-use
changes. The transitions were obtained in ArcGIS [11] using different datasets with
the help of the Raster Calculator tool. The obtained value is quantified and categorized
using Eq. (1), called TRDSDLUI, which derives the synthetic dynamic land-use
index from the land-use/land-cover change. LU is the land-usage value from source
to destination LiDAR points.

Fig. 1 Proposed model for LiDAR geospatial traffic lane

TRDSDLUI = ( Σ_{i=1}^{n} |ΔLU_{i−j}| / ( 2 Σ LU_i ) ) × 100 (%)   (1)
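The transition counting that underlies Eq. (1), done in ArcGIS with the Raster Calculator in the paper, can be sketched on toy rasters; the class codes below are hypothetical (0 = water, 1 = forest, 2 = built-up) and the rasters are invented.

```python
# Sketch of the land-use transition (Markov) matrix behind Eq. (1),
# computed from two toy land-cover rasters. Class codes are hypothetical:
# 0 = water, 1 = forest, 2 = built-up.
import numpy as np

def transition_matrix(raster_t0, raster_t1, n_classes):
    """Count pixels moving from class i (rows) to class j (columns)."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    for i, j in zip(raster_t0.ravel(), raster_t1.ravel()):
        m[i, j] += 1
    return m

t0 = np.array([[0, 1, 1], [2, 1, 0]])   # land cover at time 0
t1 = np.array([[0, 2, 1], [2, 1, 0]])   # land cover at time 1
M = transition_matrix(t0, t1, 3)
```

The off-diagonal counts of M are the |ΔLU| terms that Eq. (1) aggregates into a single change percentage.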

The modified Fournier index was calculated according to Eq. (2):

MFI = Σ_{i=1}^{12} P_i² / P   (2)

LiDAR points provide the numerous geometric points of the items observed in the
image; this has nothing to do with structural or other data. To forecast the analysis
of geospatial dangers under irregular or incomplete data, machine learning and deep
learning are applied. Due to their ability to predict items in images, LiDAR points
are extremely valuable for geospatial accounting.
Figure 2 explains the architecture of the sensing concept using a neural network.
The given inputs are treated as attributes, carried through the different layers, and
transformed into the output attribute.
The traffic locations and other land-cover details are identified and used to train the
MLP model. The classes/categories were calculated with the frequency ratio values
calculated using Eq. (3).
FR = [ N(FX_i) / Σ_{i=1}^{m} N(FX_i) ] / [ N(X_j) / Σ_{j=1}^{n} N(X_j) ]   (3)

where FR is the frequency ratio for parameters i and j, N(FX_i) is the number of
flash-flood locations in the image for class i, N(X_j) denotes the total pixels in a
variable X_j, m is the total number of classes in flash-flood predictor X_i, and n is
the number of flood-influencing factors in the image.
The FR coefficients are calculated, and the values are normalized between 0.1 and 0.9
using formula (4):

v = min(l) + (a − min(r)) × (max(l) − min(l)) / (max(r) − min(r))   (4)

where v represents the standardized value of a, a is the current value, r represents
the low and high limits of the values, and l gives the limits of the standardization
range.
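The scaling of Eq. (4) can be sketched as standard min–max normalization into [0.1, 0.9]; the sample FR values below are invented for illustration.

```python
# Min-max scaling of frequency-ratio (FR) coefficients into the
# [0.1, 0.9] range of Eq. (4). Sample values are invented.
def normalize(values, lo=0.1, hi=0.9):
    """Map each value linearly so min(values) -> lo and max(values) -> hi."""
    vmin, vmax = min(values), max(values)
    return [lo + (a - vmin) * (hi - lo) / (vmax - vmin) for a in values]

scaled = normalize([2.0, 5.0, 8.0])
```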

Fig. 2 LiDAR sensing architecture

4 Interpretation Concept

The processing of LiDAR point clouds using machine learning is based on the concept
of interpretation. LiDAR points are used to extract features and to provide
applications or services such as the search area in a geospatial image. ML
trains on the features and creates a model, and the analysis processes trigger the
evaluation of outcomes.
Figure 3 presents a comparison of LiDAR point cloud interpretation with the workflow:
(a) workflow and LiDAR point cloud, (b) semantic workflow with raw data and
features in the training data. The machine learning engine fulfills the following
functions:
. LiDAR point classification: Labels are computed and applied as per-point properties,
together with the likelihood of the category assignment, according to
established point categories.
. Cloud segmentation: Segmenting LiDAR point clouds is a key process that helps to
reduce fragmentation and subdivide big point clouds.

Fig. 3 Traffic lane-captured image

. Shape recognition: Understanding LiDAR environments requires the recognition
of shapes; a combined 2D–LiDAR technique was used to recognize them.
. Object classification: Most applications require the extraction of object-based
information from LiDAR point clouds.
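As a toy illustration of the per-point classification function listed above, the following labels LiDAR returns by height above ground; the thresholds and class scheme are invented for illustration and are not the paper's model.

```python
# Toy per-point classification: label each LiDAR return by height above
# ground. Thresholds and class codes are invented for illustration.
import numpy as np

def classify_points(xyz, ground_z=0.0):
    """Return a label per point: 0 = ground, 1 = vehicle-height, 2 = elevated."""
    h = xyz[:, 2] - ground_z
    return np.where(h < 0.2, 0, np.where(h < 2.5, 1, 2))

pts = np.array([[0.0, 0.0, 0.05],    # near the road surface
                [1.0, 2.0, 1.40],    # roughly car height
                [3.0, 1.0, 6.00]])   # elevated object (pole/sign)
labels = classify_points(pts)
```

A real pipeline would replace the fixed thresholds with a trained classifier and add per-point class probabilities, as the bullet list describes.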
Figure 3 shows the aerial view of the traffic on the highway. Three-dimensional
modeling of the LiDAR point clouds can be done in a classical way (without colors)
or in an innovative way (with RGB information). This image captured the coordinates
of the location of each vehicle.
Figure 4 shows the captured image of the highway with the LiDAR system, which
gives clarity about the geospatial view of the traffic lane in particular; Fig. 4 gives
an accurate detection of the traffic images.
Figure 5 shows the geospatial image of the traffic and the other lane cover surrounding
the traffic image. The accuracy of the image is assessed from the visualization
below.

Fig. 4 Traffic lane detection



Fig. 5 Geospatial data with land cover

Precision = True Positives / (True Positives + False Positives)

Recall = True Positives / (True Positives + False Negatives)

F1-measure = 2 × (Precision × Recall) / (Precision + Recall)

Quality = True Positives / (True Positives + False Positives + False Negatives)
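The reported figures are mutually consistent: hypothetical confusion counts of TP = 14, FP = 3, FN = 0 (invented for illustration; the paper does not give the raw counts) reproduce them exactly.

```python
# Computing the four metrics from hypothetical confusion counts.
# tp=14, fp=3, fn=0 are invented to illustrate the reported percentages;
# they are not taken from the paper.
def metrics(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    quality = tp / (tp + fp + fn)   # intersection-over-union style measure
    return precision, recall, f1, quality

p, r, f1, q = metrics(tp=14, fp=3, fn=0)
```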

Using the image with the LiDAR technique achieved a precision of 82.35%, a recall
of 100%, and an F1-score of 90%, with an overall quality of 82%. This performance
is attributed to the use of optimization with equality constraints to more accurately
classify each pixel as road or non-road, the use of deep features to better represent
the visual data of candidates, and the use of the sparse classifier optimization model
to more accurately classify each candidate as traffic lane or non-traffic lane.
The precision of this traffic lane detection is 82.35%, and the quality of the image
recognized by LiDAR is 82.35%, as shown in Fig. 6. These values depend on the
capture location and differ according to the coordinates.
Accuracy is measured using the geospatial image coordinates and the flow of each
vehicle using the interception concept. Thus, the traffic tracking process [12] is
successful using geospatial techniques.

Fig. 6 Accuracy prediction in high distance

5 Conclusion and Future Scope

In this research, we used airborne geo-referenced color images containing noisy
data on the traffic signs. The steps of this research are (1) road extraction, (2) traffic
sign candidate detection, and (3) traffic sign classification. Drones are also critical
for capturing LiDAR data, because the technique works over small areas and reaches
positions inaccessible to ground-based topographic equipment. It is recommended
for areas with gradual altitude differences, where traffic can be tracked at an average
altitude.
Local deep features are integrated with the sparse representation to optimize the
traffic sign candidates with different color images with all coordinate projections.
The proposed system shows the qualitative and quantitative effectiveness of the
traffic lane. The outcome showcased in our project is accurate detection in the traffic
lane and reduction of collisions using the optimization model for classification. In
the future, we plan to enhance our research in remote aerial vehicle technology for
high-resolution aerial images with LiDAR data.

References

1. Huang X, Gong J, Chen P, Tian Y, Hu X (2021) Towards the adaptability of coastal resilience:
Vulnerability analysis of underground gas pipeline system after hurricanes using LiDAR data.
Ocean Coast Manage 209:105694
2. Johnson KM, Ouimet WB (2021) Reconstructing historical forest cover and land use dynamics
in the northeastern United States using geospatial analysis and airborne LiDAR. Ann Am Assoc
Geogr 111(6):1656–1678
3. Padmaja B, Prasad VVR, Sunitha KVN, Reddy NCS, Anil CH (2019) Detectstress: a novel
stress detection system based on smartphone and wireless physical activity tracker. Adv Intell
Syst Comput 815. https://doi.org/10.1007/978-981-13-1580-0_7
4. Lakshmi L, Purushotham Reddy M, Praveen A, Sunitha KVN (2020) Identification of diabetes
with recursive partitioning algorithm using machine learning. Int J Emerg Technol 11(3)
5. Nelson JR, Grubesic TH (2020) The use of LiDAR versus unmanned aerial systems (UAS) to
assess rooftop solar energy potential. Sustain Cities Soc 61:102353

6. Ureta JC, Zurqani HA, Post CJ, Ureta J, Motallebi M (2020) Application of nonhydraulic
delineation method of flood Hazard areas using LiDAR-based data. Geosciences 10(9):338
7. Malik R, Nishi M (2021) Flexible big data approach for geospatial analysis. J Ambient Intell
Humaniz Comput 1–20
8. Lyu F, Xu Z, Ma X, Wang S, Li Z, Wang S (2021) A vector-based method for drainage network
analysis based on LiDAR data. Comput Geosci 156:104892
9. Anders K, Winiwarter L, Mara H, Lindenbergh R, Vos SE, Höfle B (2021) Fully automatic
spatiotemporal segmentation of 3D LiDAR time series for the extraction of natural surface
changes. ISPRS J Photogramm Remote Sens 173:297–308
10. Thanh Ha T, Chaisomphob T (2020) Automated localization and classification of expressway
pole-like road facilities from mobile laser scanning data. Adv Civ Eng 2020
11. Ahmed C, Mohammed A, Saboonchi A (2020) ArcGIS mapping, characterisations and
modelling the physical and mechanical properties of the Sulaimani City soils, Kurdistan Region,
Iraq. Geomech Geoengin 1–14
12. Sundari MS, Nayak RK (2021) Efficient tracing and detection of activity deviation in event log
using ProM in Health Care Industry. In: 2021 Fifth international conference on I-SMAC (IoT
in social, mobile, analytics and cloud)(I-SMAC), pp 1238–1245
Classification of High-Dimensionality
Data Using Machine Learning
Techniques

D. Padmaja Usharani, G. Sridevi, Rambabu Pemula, and Sagenela Vijaya Kumar

Abstract In the digitization world, a large volume of information is being produced
across several areas such as medical services, production, the Web, and organizations.
Machine learning techniques are utilized to reveal patterns in this information for
decision-making. Not all the features in the datasets produced are significant for
training the machine learning (ML) algorithms. Some features may not be important,
and some may not influence the result of the prediction. Disregarding or eliminating
these immaterial or less significant features reduces the load on ML algorithms. In
this work, principal component analysis (PCA) is investigated with the most popular
ML algorithms, the Naïve Bayes, support vector machine (SVM), and KNN
classifiers, using the freely accessible MNIST dataset. Experimentation results
demonstrate that ML algorithms with PCA produce better outcomes when the
dimensionality of the datasets is high.

Keywords Dimensionality reduction · KNN · Machine learning · Naïve Bayes ·
PCA · SVM

1 Introduction

Several research works have been done over the past two decades on a recognition
technique known as handwritten digit recognition. This process involves converting
handwritten text from an image or a scanned file into editable text. It is not possible
to identify all digits correctly, even for humans. Here, we implement dimensionality

D. Padmaja Usharani · G. Sridevi · R. Pemula (B)


Department of Computer Science and Engineering, Raghu Engineering College, Visakhapatnam,
India
e-mail: rpemula@gmail.com
G. Sridevi
e-mail: sridevi.gadde@raghuenggcollege.in
S. V. Kumar
Department of Computer Science and Engineering, School of Technology, GITAM (Deemed to be
University), Hyderabad, India
e-mail: vsagenel@gitam.edu

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 227
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_22
228 D. Padmaja Usharani et al.

reduction and various classification techniques in machine learning so as to examine
their suitability for digit recognition. One of the most traditional linear dimensionality
reduction algorithms is principal component analysis (PCA) [1]. The efficiency of
training ML algorithms suffers with high-dimensional information, and the accuracy
of ML-based classification falls with high-dimensional data. To increase the accuracy,
we need to reduce the high-dimensional data to low-dimensional data and perform
the classification on the reduced data.

2 Related Work

Handwritten digit recognition using machine learning algorithms and dimensionality
reduction (DR) techniques is a complicated task that is indispensable for building
computer vision applications. Handwritten digit recognition is one of the major
applications and has become significant in the modern world due to its use in our
everyday life [2]. In the literature, a large amount of time is spent on digit recognition
using explicit frameworks and learning algorithms. Numerous areas of research
specialize in combining and developing various machine learning methodologies to
improve the overall performance and accuracy of digit recognition. A famous dataset
for digit recognition is the Modified National Institute of Standards and Technology
(MNIST) dataset [3]. The DR technique plays a vital part when we work with a
high-dimensional dataset. DR technology is intended to diminish the dimensionality
of raw records and is pivotal to data mining and machine learning. By lowering the
number of data features, it can improve the classifier's overall performance and
reduce computational complexity [4]. To reduce dimensionality, linear and nonlinear
methods have been identified [5].
Sarowar et al. [6] worked on the MNIST dataset, and different accuracies were
obtained from various classifiers; the maximum accuracy of 80.84% was achieved
with PCA-based CNN with ACO. Thippa Reddy et al. [7] recognized that the overall
classifier performance with PCA is better than classification with LDA, considering
the Cardiotocography (CTG) dataset, and also suggested that the effectiveness of
these DR strategies may be examined on high-dimensionality data such as images,
textual data, etc. Yan and Tianyu [5] recognized that classification accuracy can be
improved efficiently by performing suitable DR before training. DR also reduces
the computational complexity and lowers the required storage for digit recognition
[8]. Adiwijaya et al. [9] proposed a cancer detection scheme based on microarray
data classification which uses the reduction approach PCA, and also compared it
with SVM and LMBP algorithms. Hartono [10] worked on the same MNIST dataset
to embed the class information in the low-dimensional representation of rRBF while
retaining the classification ability, and also worked on various kinds of dimensionality
reduction strategies like PCA, NCA, and t-SNE.
Classification of High-Dimensionality Data Using Machine … 229

Combining DR strategies can lead to better results than applying only one approach,
according to de Paula Rodrigues et al. [11]. Mardani et al. [12] worked on the World
Development Indicator (WDI) dataset, with SVD used for reduction. Zebari et al.
[13] applied feature selection as well as feature extraction methods and analyzed
that high dimensionality of data has a direct impact on the learning algorithm,
computational time, computer resources (memory), and model accuracy.
Ramakrishna Murty et al. [14] suggested dimensionality reduction of large text data
by least-squares SVM along with singular value decomposition, for text data
clustering with prediction of the optimal number of clusters. Saleem and Chishti
[15] suggested a lightweight CNN model on the MNIST dataset based on execution
time.

3 Machine Learning Techniques

In this section, the authors discuss different ML classification techniques like the
Naïve Bayes algorithm, SVM, and KNN, and the dimensionality reduction technique
PCA.

3.1 Naive Bayes Algorithm

This is a classification method based on Bayes' theorem with an assumption of
independence between predictors: a particular feature in a class is not related to
the presence of any other feature. Bayes' theorem finds the likelihood that an event
will happen given the likelihood that another event has already happened.

P(c|x) = P(x|c) × P(c) / P(x)                                              (1)

P(c|x) = P(x1|c) × P(x2|c) × · · · × P(xn|c) × P(c)                        (2)

• P(c|x): The posterior probability of class (c, target) given predictor (x, attributes).
• P(c): The prior probability of class.
• P(x|c): The likelihood which is the probability of predictor given class.
• P(x): The prior probability of predictor.

3.2 Support Vector Machine (SVM)

The SVM algorithm creates the best line or decision boundary that can separate
n-dimensional space into classes so that we can easily place a new data point
in the correct category later on. This best decision boundary is known as a hyperplane.
The goal is to find a hyperplane that classifies the data and maximizes the margin
in an n-dimensional space. SVM picks the extreme points/vectors that assist in
creating the hyperplane. These extreme cases are called support vectors, and hence
the algorithm is named SVM. SVMs are a set of supervised learning methods
utilized for classification, regression, and outlier detection.

3.3 K-Nearest Neighbor (KNN) Algorithm

K-nearest neighbor [16] is one of the supervised learning methods. The KNN
algorithm can be utilized for regression as well as for classification; however, it is
mostly used for classification problems. It is known as an instance-based or lazy
learner algorithm since it does not learn from the training set immediately; rather,
it stores the dataset, and at the time of classification it identifies which class a data
point belongs to based on how closely it matches its k closest neighbors, computed
via the Euclidean distance between the data points.
Three distance measures (Euclidean, Manhattan, and Minkowski distance) are only
valid for continuous variables. In the case of categorical variables, the Hamming
distance should be utilized.
Minkowski distance: a somewhat more complex measure than most. Minkowski
distance is a metric in a normed vector space which can be considered a
generalization of both the Euclidean distance and the Manhattan distance. This
measure has three requirements:
Zero vector: The zero vector has a length of zero, while every other vector has a
positive length. For instance, if we travel from one place to another, that distance
is always positive; but if we go from one place to itself, that distance is zero.
Scalar factor: When you multiply the vector by a positive number, its length is
scaled while its direction is kept. For instance, if we head a certain distance in one
direction and then add the same distance, the direction does not change.
Triangle inequality: The shortest distance between two points is a straight line.
D(x, y) = ( Σ_{i=1}^{k} |x_i − y_i|^q )^{1/q}                              (3)
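Equation (3) is straightforward to implement. A short sketch (the two points are arbitrary) showing that q = 1 recovers the Manhattan distance and q = 2 the Euclidean distance:

```python
# Eq. (3): Minkowski distance of order q between two points of equal dimension.
def minkowski(x, y, q):
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)

x, y = (0, 0), (3, 4)
print(minkowski(x, y, 1))  # 7.0 (Manhattan: |3| + |4|)
print(minkowski(x, y, 2))  # 5.0 (Euclidean: sqrt(9 + 16))
```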

3.4 Principal Component Analysis (PCA)

PCA [1] is a linear dimensionality reduction method that is used to reduce the high
dimensionality of large datasets, by transforming a large set of features (mostly
contains all features) into a smaller set of features that still contains most of the
information in the large set. We need to reduce the dimensionality because smaller
datasets are easier to visualize and explore and make analyzing data much faster and
easier for ML classification. This contains five steps as follows:
Standardization: We need to perform standardization prior to PCA. Equation (4)
computes it by subtracting the mean and dividing by the standard deviation for
each value of each variable.

x_ij = (x_ij − x̄_j) / σ_j,  ∀ j                                           (4)

All variables will be transformed onto the same scale. We then compute the
covariance matrix to identify the correlations.

Σ = (1/m) Σ_{i=1}^{m} (x^i)(x^i)^T,   Σ ∈ R^{n×n}                          (5)

In order to identify the principal components, we need to compute the eigenvectors
and eigenvalues of the covariance matrix:

Σ u = λu,   U = [ u_1  u_2  u_3  · · · ],   u_i ∈ R^n                      (6)

Feature vector: The n-dimensional data has to be represented in terms of a
k-dimensional subspace, so the top k eigenvectors have to be chosen.

x_i^new = [ u_1^T x^i,  u_2^T x^i,  …,  u_k^T x^i ]^T ∈ R^k                (7)

Recast the data on the principal components' axes. In the previous steps, we selected
the principal components to form the feature vector, but the input dataset always
remains in terms of the original axes.

final dataset = feature vector^T × standardized original dataset           (8)
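The five steps above can be sketched from scratch in NumPy. This is an illustrative implementation following Eqs. (4)-(8) on synthetic random data, not the authors' code:

```python
import numpy as np

def pca_reduce(X, k):
    """Reduce X (samples x features) to k dimensions via Eqs. (4)-(8)."""
    # Eq. (4): standardize each feature (guard against zero variance)
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / np.where(sigma == 0, 1, sigma)
    # Eq. (5): covariance matrix of the standardized data (n x n)
    C = (Z.T @ Z) / len(Z)
    # Eq. (6): eigendecomposition; eigh returns eigenvalues ascending, so reverse
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]
    U = vecs[:, order[:k]]     # Eq. (7): keep the top-k eigenvectors
    return Z @ U               # Eq. (8): recast onto the principal axes

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[:, 0] = 3 * X[:, 1] + 0.1 * X[:, 0]  # inject correlation so PCA has structure
X_red = pca_reduce(X, k=3)
print(X_red.shape)  # (200, 3)
```

The variance of the projected columns is ordered (largest first), since each column's variance equals the corresponding eigenvalue of the covariance matrix.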

4 Proposed Model

The proposed method is depicted in Fig. 1. In order to evaluate the performance of
this model, execute the following steps.

Fig. 1 Proposed model based on PCA dimensionality reduction and classification



Fig. 2 Sample image of MNIST dataset

Step 1: Collection of datasets.
Step 2: Normalize or preprocess the dataset.
Step 3: Train and test the ML algorithm using dataset and evaluate the
performance.
Step 4: Apply PCA technique on normalized data and train and test the ML
algorithm using reduced dataset.
Step 5: Compare the results of step 3 and step 4 based on parameters like precision,
recall, accuracy and f 1 -score.

i. The MNIST dataset has 42,000 labeled (28 × 28 pixel) grayscale images of
handwritten digits from 0 to 9 in the training set and 28,000 unlabeled test images.
In order to identify the digits correctly, we use different classification techniques of
machine learning. A sample image of the MNIST dataset is shown in Fig. 2.

ii. Data Normalization: The purpose of normalization is to transform data in a way
that they are either dimensionless and/or have similar distributions. This process
of normalization is known by other names such as standardization and feature
scaling. Normalization is an essential step in data preprocessing in any machine
learning application and model fitting. Here, we used the standard score method
for normalization.
z = (x − μ) / σ                                                            (9)

where z is the standard score, μ the population mean, and σ the standard deviation.
iii. The normalized data is experimented by using ML algorithms like Naive Bayes,
SVM and KNN. Performance of classifiers is then evaluated on the various
metrics like precision, recall, f 1 -score and accuracy.
iv. PCA is applied on the normalized data. The resultant reduced dataset is then
experimented by using the ML algorithms like Naive Bayes, SVM and KNN. The
obtained results are again evaluated using the aforementioned metrics precision,
recall, f 1 -score and accuracy.
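Steps i-iv can be sketched end to end. The NumPy sketch below is illustrative only: it substitutes a small synthetic three-class dataset for MNIST and a from-scratch k-NN for the library classifiers, so the printed accuracies demonstrate the workflow rather than reproducing the paper's results:

```python
import numpy as np

rng = np.random.default_rng(1)

# Step 1: a synthetic stand-in for MNIST (3 classes, 64 "pixel" features)
centers = rng.normal(scale=2.0, size=(3, 64))
X = np.vstack([c + rng.normal(size=(60, 64)) for c in centers])
y = np.repeat([0, 1, 2], 60)
idx = rng.permutation(len(y))
Xtr, Xte = X[idx[:120]], X[idx[120:]]
ytr, yte = y[idx[:120]], y[idx[120:]]

# Step 2: normalize with the training statistics (Eq. (9))
mu, sd = Xtr.mean(axis=0), Xtr.std(axis=0) + 1e-9
Xtr, Xte = (Xtr - mu) / sd, (Xte - mu) / sd

def knn_accuracy(Xtr, ytr, Xte, yte, k=3):
    # Step 3: k-NN by Euclidean distance with majority vote, then accuracy
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    nearest = ytr[np.argsort(d, axis=1)[:, :k]]
    pred = np.array([np.bincount(row).argmax() for row in nearest])
    return (pred == yte).mean()

# Step 4: PCA to 10 components via eigendecomposition of the covariance
vals, vecs = np.linalg.eigh(np.cov(Xtr, rowvar=False))
U = vecs[:, np.argsort(vals)[::-1][:10]]

# Step 5: compare accuracy with and without dimensionality reduction
acc_full = knn_accuracy(Xtr, ytr, Xte, yte)
acc_pca = knn_accuracy(Xtr @ U, ytr, Xte @ U, yte)
print(f"without DR: {acc_full:.3f}  with PCA: {acc_pca:.3f}")
```

The classes here are well separated, so both accuracies are high; on real high-dimensional data, the gap between the two runs is what the experiments in Sect. 6 measure.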

5 Performance Evaluation Metrics

Metrics like precision, accuracy, recall, and f1-score are used here to analyze the
performance of this method. We discuss these metrics below.
Accuracy: Accuracy is the proportion of total predictions that are correct.

Accuracy = (TP + TN) / (TP + TN + FP + FN)                                 (10)

True Positive (TP) is positive class identified correctly as positive. TP is an
outcome where the model predicts the positive class correctly.
False Negative (FN) is positive class identified incorrectly as negative. FN is an
outcome where the model predicts the positive class incorrectly.
False positive (FP) is negative class identified incorrectly as positive. FP is an
outcome where the model predicts the negative class incorrectly.
True Negative (TN) is negative class identified correctly as negative. TN is an
outcome where the model predicts the negative class correctly.

Precision: Precision is the ratio of the number of correctly classified positive (TP)
examples to the total number of all predicted positive examples (TP + FP). It gives
the correctness achieved in positive prediction.

Precision = TP / (TP + FP)                                                 (11)

Recall: Recall is the ratio of the number of correctly classified positive (TP)
examples to all positive examples that could have been identified (TP + FN).

Recall = TP / (TP + FN)                                                    (12)

F1-score: F1-score is a weighted average of sensitivity and precision. It may be
considered a good choice for balancing precision and recall.

F1-score = 2 × (Precision × Recall) / (Precision + Recall)                 (13)
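For a multiclass problem such as digit recognition, the per-digit values reported in Tables 2, 3, and 4 follow from a confusion matrix, where each class's FP and FN come from the off-diagonal entries. A sketch with an illustrative 3-class matrix (not the paper's data):

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision, recall, F1 and overall accuracy (Eqs. (10)-(13))
    from a confusion matrix with rows = true class, columns = predicted class."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as this class but actually another
    fn = cm.sum(axis=1) - tp   # actually this class but predicted as another
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, f1, accuracy

# Illustrative confusion matrix for three classes
cm = [[50, 2, 3],
      [4, 45, 1],
      [2, 3, 40]]
p, r, f1, acc = per_class_metrics(cm)
print(np.round(p, 3), np.round(r, 3), round(acc, 3))
```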

6 Result Analysis

Performance of dimensionality reduction and ML classification is evaluated based
on the aforementioned parameters: accuracy, precision, recall, and f1-score. Accuracy
of the different classifiers with and without DR is given in Table 1 and is evaluated using

Table 1 Accuracy of classifiers with and without DR

Classifiers    Without DR (n = 784)    With DR (PCA) (n = 70)
SVM            0.9171                  0.9381
NB             0.5447                  0.8754
KNN (k = 3)    0.9400                  0.9750

Eq. (10). The KNN classifier gives 94% accuracy without DR when k = 3. After
applying PCA, the dimensionality is reduced to 70, and KNN gives 97.5% accuracy.
SVM and NB also increase their accuracy with PCA, up to 93.8 and 87.5%,
respectively.
Precision of the different classifiers with and without DR is given in Table 2 and is
evaluated using Eq. (11). Precision to predict each digit (0–9) by the different
algorithms has been identified as follows. Precision values were low when we used
only the classification algorithms and increased when classification was used
together with PCA. Among all techniques, PCA + KNN gives the best result.
Recall of the different classifiers with and without DR is given in Table 3 and is
evaluated using Eq. (12). Recall to predict each digit (0–9) has been identified as
follows. Recall values were very low in the case of NB and increased when we used
classification together with PCA.
F1-score of the various classifiers with and without DR is given in Table 4 and is
evaluated using Eq. (13). F1-score to predict each digit (0–9) correctly has been
identified as follows. F1-score was very low when we used only classification; these
values increased when we used classification together with PCA.

Table 2 Precision of classifiers with and without DR

Digit  NB    SVM   KNN   PCA + NB  PCA + SVM  PCA + KNN
0 0.68 0.94 0.94 0.96 0.96 0.98
1 0.79 0.96 0.95 0.98 0.97 0.98
2 0.87 0.89 0.95 0.81 0.92 0.98
3 0.66 0.87 0.92 0.83 0.92 0.97
4 0.84 0.89 0.93 0.86 0.91 0.97
5 0.48 0.88 0.93 0.78 0.91 0.98
6 0.68 0.96 0.96 0.93 0.95 0.98
7 0.92 0.93 0.93 0.93 0.96 0.97
8 0.28 0.92 0.97 0.85 0.94 0.98
9 0.41 0.92 0.91 0.84 0.94 0.96

Table 3 Recall of classifiers with and without DR

Digit  NB    SVM   KNN   PCA + NB  PCA + SVM  PCA + KNN
0 0.91 0.97 0.99 0.94 0.98 1.00
1 0.95 0.98 0.99 0.93 0.98 0.99
2 0.20 0.90 0.92 0.86 0.92 0.97
3 0.33 0.89 0.94 0.85 0.92 0.97
4 0.08 0.95 0.94 0.86 0.97 0.98
5 0.03 0.85 0.91 0.85 0.90 0.96
6 0.92 0.95 0.97 0.91 0.95 0.99
7 0.26 0.93 0.93 0.86 0.94 0.98
8 0.72 0.86 0.89 0.87 0.91 0.95
9 0.94 0.87 0.91 0.83 0.90 0.96

Table 4 F1-score of classifiers with and without DR

Digit  NB    SVM   KNN   PCA + NB  PCA + SVM  PCA + KNN
0 0.77 0.96 0.96 0.95 0.97 0.99
1 0.86 0.97 0.97 0.95 0.97 0.98
2 0.33 0.90 0.93 0.83 0.92 0.97
3 0.44 0.88 0.93 0.84 0.92 0.97
4 0.15 0.92 0.94 0.86 0.94 0.97
5 0.06 0.87 0.92 0.81 0.90 0.97
6 0.78 0.96 0.97 0.92 0.95 0.99
7 0.41 0.93 0.93 0.90 0.95 0.97
8 0.41 0.89 0.93 0.86 0.93 0.97
9 0.58 0.89 0.91 0.83 0.92 0.96

7 Conclusion

In this paper, the effect of DR using PCA on ML classification algorithms has been
investigated. MNIST has 42,000 labeled (28 × 28 pixel) grayscale images, with 784
features in total. ML classification (Naïve Bayes, SVM, and KNN) was applied on
the raw dataset as well as the reduced dataset, and the results were compared. PCA
and ML classification algorithms together give better results. In the future, the
effectiveness of the DR technique can also be examined on other datasets such as
text data and image datasets (which contain high dimensionality). Other DR
techniques and classification algorithms can also be tested.

References

1. Jolliffe I (1986) Principal component analysis. Springer, New York
2. Beohar D, Rasool A (2021) Handwritten digit recognition of MNIST dataset using deep learning
state-of-the-art artificial neural network (ANN) and convolutional neural network (CNN). In:
2021 International conference on emerging smart computing and informatics (ESCI), 5–7 Mar
2021, AISSMS Institute of Information Technology, Pune, India
3. Le Cun Y et al (1998) Gradient-based learning applied to document recognition. Proc IEEE
86:2278–2324
4. Huang CL, Dun JF (2008) A distributed PSO–SVM hybrid system with feature selection and
parameter optimization. Appl Soft Comput 8(4):1381–1391
5. Yan H, Tianyu H (2017) Unsupervised dimensionality reduction for high-dimensional data
classification. Mach Learn Res 2(4):125–132. https://doi.org/10.11648/j.mlr.20170204.1
6. Sarowar Md G, Jamal AA, Saha A, Saha A (2020) Performance evaluation of feature extrac-
tion and dimensionality reduction techniques on various machine learning classifiers. In: 9th
International conference on advanced computing (IACC), 03 June 2020 UTC from IEEE Xplore
7. Thippa Reddy G, Praveen Kumar Reddy M, Lakshmanna K, Kaluri R, Rajput DS, Srivatava
G, Baker T (2020) Analysis of dimensionality reduction techniques on big data. IEEE Access
8:54776–54878
8. Alsaafin A, Elnagar A. A minimal subset of features using feature selection for handwritten
digit recognition. J Intell Learn Syst Appl 9:55–68
9. Adiwijaya, Wisesty UN, Lisnawati E, Aditsania A, Kusumo DS (2018) Dimensionality
reduction using principal component analysis for cancer detection based on microarray data
classification. J Comput Sci 14(11):1521–1530
10. Hartono P (2016) Classification and dimensional reduction using restricted radial basis function
networks, 17 Nov 2016. The natural computing applications forum 2016
11. de Paula Rodrigues GE, Marcílio WE Jr, Eler DM (2018) Data classification: dimension-
ality reduction using combined and non-combined multidimensional. In: 2018 7th Brazilian
conference on intelligent systems projection techniques, pp 403–407
12. Mardani A, Liao H, Nilashi M, Alrasheedi M, Cavallaro F (2020) A multi-stage method
to predict carbon dioxide emissions using dimensionality reduction, clustering, and machine
learning techniques. J Cleaner Prod. https://doi.org/10.1016/j.jclepro.2020.122942
13. Zebari RR, Abdulazeez AM, Zeebaree DQ, Zebari DA, Saeed JN (2020) A comprehensive
review of dimensionality reduction techniques for feature selection and feature extraction. J
Appl Sci Technol Trends (JASTT) 01(02):56–70. ISSN: 2708-0757
14. Ramakrishna Murty M, Murthy JVR, Prasad Reddy PVGD (2011) Text document classification
based on a LeastSquare support vector machines with singular value decomposition. Int J
Comput Appl (IJCA) 27(7):21–26
15. Saleem TJ, Chishti MA (2020) Assessing the efficacy of machine learning techniques for
handwritten digit recognition. Int J Comput Digital Syst 9(2). ISSN (2210-142X)
16. Cover TM, Hart PE (1987) Nearest neighbor pattern classification. Trans IEEE Inf Theory
IT-13:21–27
To Detect Plant Disease Identification
on Leaf Using Machine Learning
Algorithms

P. Praveen, Mandala Nischitha, Chilupuri Supriya, Mitta Yogitha,
and Aakunoori Suryanandh

Abstract India’s economy is based on agriculture. Farmers raise a range of crops


to meet their needs. Weather, soil conditions, disease, and other factors all have an
impact on crop production. Plant diseases are one of the leading causes agricultural
production and economic losses are both on the rise. Identifying plant disease is the
initial step in averting some losses in agricultural output quantities. Plant disease iden-
tification is crucial for long-term agricultural viability, yet it’s manually monitoring
plant diseases is tough. It necessitates a lot of labor, as well as knowledge of plant
diseases and long processing times. The convolution neural networks (CNNs) used
in this article performed exceptionally well in picture classification. The purpose of
this research is to provide a fresh perspective to constructing a leaf blight recognition
model based on leaf photo categorization. Deep convolution networks are utilized
to do this. The proposed model is capable of distinguishing between plant leaves
and their immediate environs, as well as healthy leaves and 13 various types of leaf
diseases. The study details all of the steps involved in putting this disease recog-
nition model into action, beginning with the collecting of pictures and the creation
of a database that has been verified by agricultural experts. For independent class
assessments, the precision of the experimental findings on the created model ranged
from 91 to 98%, with an average of 96.3%.

Keywords Classification · Clustering · CNN · Pattern recognition · Image
processing

P. Praveen (B)
Department of Computer Science and Artificial Intelligence, SR University, Warangal, Telangana
506371, India
e-mail: prawin1731@gmail.com
M. Nischitha · C. Supriya · M. Yogitha · A. Suryanandh
Department of Computer Science Engineering, SR Engineering College, Warangal,
Telangana 506371, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 239
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_23
240 P. Praveen et al.

1 Introduction

All of these issues might be solved with organic farming. Pest and disease
management, as well as fertilization, are the most important aspects of organic
farming. Disease identification is a complex undertaking that necessitates prior
experience. Infection symptoms such as colorful dots or streaks are frequently
visible on plant leaves. Microbes such as fungi, bacteria, and viruses, which are
widely found in the environment, are responsible for plant illnesses. The signs and
symptoms of plant illness vary depending on the disease's cause or etiology [1–3].
The current method for identifying plant diseases is basic naked-eye inspection,
which necessitates more staff, properly equipped laboratories, costly technologies,
and so on. Incorrect disease identification can lead to incorrect pesticide application,
which can contribute to the development of long-term pathogen resistance and a
decrease in the crop's ability to defend itself. Plant disease can be identified using a
variety of methods from the plant's infected leaves. Even skilled agricultural
professionals and plant pathologists commonly fail to diagnose specific diseases due
to this complication, as well as the enormous number of crops in development, their
current status, and phytopathogenic concerns, resulting in incorrect diagnoses and
remedies. Identifying plant infections is the only way to avert some losses in
agricultural output quantities [4–6].
To be sustainable, agriculture must rely on plant disease detection, yet physically
monitoring plant diseases is challenging. It necessitates a substantial amount
of work, as well as plant disease knowledge and a lengthy processing time. Our
project is based on a convolutional neural network-based system that detects cotton
leaf illnesses. It makes it easier to detect bacterial illnesses and their consequences
on the environment. It is difficult to pinpoint disease in crops in the early stages,
and this task requires farmers to be physically present. Detecting and identifying
diseases on the crop are really important [7–9].
Agriculture is essential for feeding the world's populations of humans and livestock,
and agriculture's involvement in the generation of clean energy through renewable
energy technology has grown. Agriculture also offers raw materials for textile,
chemical, and pharmaceutical manufacturing. Despite only about a 10% increase in
the amount of land utilized for agriculture between the 1960s and the early
twenty-first century, agricultural output increased threefold [1, 10].

2 Related Work

We have chosen a few papers that deal with employing advanced approaches to detect
plant leaf diseases, and we have included a few of them below.
Using K-means clustering, texture, and color analysis, the authors of paper [1, 11]
devised a method for identifying disease in Malus domestica. It makes use of textures
To Detect Plant Disease Identification on Leaf Using … 241

and colors that are unique and widespread in both healthy and diseased areas to
identify and distinguish between different agricultural types.
The author of paper [4] reviewed and studied a total of 40 research projects that
used deep learning approaches to address a variety of food and agricultural
production issues. They investigate the unique agricultural issues being studied, the
frameworks and models that were utilized, and the overall effectiveness attained
on the metrics for each task under inquiry. They also compare deep learning to other
well-known approaches to see if the classification or regression results differ.
Deep learning outperforms traditional image processing systems [2, 12]. In one
research paper, the implementation of an SVM-based regression approach resulted
in a more accurate model. The relationship between environmental factors and
disease severity is described, which could be useful in disease management.
[Sujatha R., Y. Sravan Kumar, and Garine Uma Akhil]
According to this study, detecting plant disease in the agriculture sector is
fairly challenging. There is a significant loss in agricultural production and
market economic value if the identification is incorrect. The detection of leaf
diseases demands a vast amount of labor, plant disease knowledge, and additional
processing time. As a result, MATLAB image processing may be used to detect leaf
disease. Image loading, contrast improvement, RGB-to-HSI conversion, feature
extraction, and SVM classification are all steps in the disease detection
process. This study uses image processing techniques to provide a method for
detecting and categorizing plant diseases that is both efficient and accurate.
K-means and GLCM algorithms are used to detect plant leaf disease [13, 14]. This
method automates the process, reducing detection time and labor expenses.
[Dr. Gagan Jindal Chandigarh, Simranjeet Kaur, Geetanjali Babbar, Navneet Sandhu]
As summarized in this research article, identifying plant leaf diseases is a
preventative step toward reducing yield loss and preserving overall crop
quantity in agriculture. Observing and recognizing the patterns engraved on the
leaves is the essence of plant disease research. Early diagnosis of any plant
disease, before it has a negative impact, is therefore crucial for long-term
agricultural sustainability. However, manually identifying, monitoring, and
drawing conclusions from plant leaf diseases is exceedingly difficult because
of the high costs involved, requiring a large amount of work, energy, talent,
and, last but not least, processing time. As a result, image processing concepts
come in handy and are used to diagnose diseases. The detection method includes
image acquisition, image preprocessing, segmentation, segment feature
extraction, and classification based on the findings [15–17].
242 P. Praveen et al.

3 Problem Statement

This is a study of leaf disease detection using various machine learning models,
which indicate whether or not a leaf has a disease. Figure 1 depicts the
pipeline for early detection of leaf disease. At the start of the convolution
layer, a convolution core is defined. The essential benefit of the convolution
neural network is the local receptive field. When the convolution core processes
input data, it slides over the feature map and extracts a piece of feature
information. After the convolution layer has retrieved the features, the neurons
are passed to the pooling layer. Current pooling methods calculate the mean,
maximum, or a random value of all values in the local receptive field.
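The sliding convolution core and the pooling step described above can be sketched in plain Python. This is an illustrative toy only (the 4 × 4 input and the kernel values are invented for the example), not the network used in this work:

```python
def conv2d(image, kernel):
    """Slide the convolution core over the input and collect local responses."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            # Sum of element-wise products over the local receptive field
            row.append(sum(image[i + u][j + v] * kernel[u][v]
                           for u in range(kh) for v in range(kw)))
        out.append(row)
    return out

def max_pool(fmap, size=2):
    """Down-sample a feature map by taking the maximum of each local window."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            row.append(max(fmap[i + u][j + v]
                           for u in range(size) for v in range(size)))
        out.append(row)
    return out

image = [[1, 0, 2, 1],
         [0, 1, 3, 0],
         [2, 1, 0, 1],
         [1, 0, 1, 2]]
edge_kernel = [[1, -1],
               [-1, 1]]              # toy "edge" filter (invented values)
fmap = conv2d(image, edge_kernel)   # 3x3 feature map
pooled = max_pool(fmap)             # max over the top-left 2x2 window -> [[2]]
```

A real CNN layer applies many such kernels in parallel and learns their values during training; the mean- and random-value pooling variants mentioned above differ only in the aggregation used inside each window.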
In this project, we initially gather a dataset from Kaggle that contains train
and valid sets of images of various diseased leaves. These data have ten classes
in total, where nine classes are diseased leaves and one class is healthy
leaves. After importing the datasets, we perform image processing, including
resizing and reshaping; to do this, we import ImageDataGenerator from
keras.preprocessing.image and then check the model to ensure all the parameters
are trainable.
In addition, we import and preprocess an image from the valid dataset, display
the imported image, and send it through the different layers of the convolution
neural network. The next step is visualization: here, we acquire the different
filters or features, for a given number of rows and columns, that result from
the CNN in the previous step. Later, we train the model with a certain number of
epochs and a specified learning rate to achieve better accuracy; generally, more
epochs yield better accuracy up to a point. After training, we plot a graph of
training and validation accuracy. The next step is saving the model by importing
load_model from keras.models

Fig. 1 Flow chart for leaf disease detection



and save it in any location on the disk. The final step is to import an image
from the valid dataset, preprocess it, and detect whether the provided input
leaf is diseased or healthy. We also find the probability of the given leaf
matching the leaves from the training dataset. Hence, we can determine the
status of a leaf, whether diseased or healthy. Convolution networks are a type
of neural network that has been shown to be particularly effective at image
recognition and categorization.

4 Dataset and Attributes

The data was gathered from the Kaggle website, which provides eleven character-
istics that can be used to detect the disease a leaf is suffering from. The
qualities of tomato leaves are investigated in this research: tomato bacterial
spot, tomato leaf mold, tomato spider mites (two-spotted spider mite), tomato
target spot, Tomato Yellow Leaf Curl Virus, and Septoria leaf spot, among
others. There are a total of 19,286 leaf images in the dataset, with ten records
being disease free.
The table of diseases is based entirely on the diseases occurring on different
types of tomato leaves: different spots are considered different types of
disease. A leaf can have different types of disease, and here we display those
types.
The local receptive field of the convolution neural network is its main benefit.
While processing data, the convolution core glides across the feature map to
retrieve pieces of feature information. The most common pooling algorithms today
calculate the mean, maximum, or a random value of all values in the local
receptive field. Convolution neural networks (CNNs) are a type of neural network
that has been found to be particularly effective in image recognition and
categorization. ConvNets have been used to recognize faces, objects, and traffic
signs, as well as to power robotics. By learning visual attributes and employing
small filters, it is possible to preserve the spatial relationship between
pixels. The convolution network is made up of many critical components.
Preprocessing of database images includes reshaping, resizing, array conversion,
and data transformation; a similar treatment is applied to the test image. A
database of roughly 10,000 plant images is compiled, and any image from that
collection can be utilized as a software test image. The model (CNN) is trained
to recognize the test image, and the ailment it is suffering from, using the
train database. CNN layers include Dense, Dropout, Activation, Flatten,
Convolution2D, and MaxPooling2D. If the plant species is in the database and the
model has been properly trained, the program can detect the illness. After
adequate training and preprocessing, the comparison is made: to predict the
disease, the test image is compared against the trained model.
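As one concrete possibility, the layers named above could be assembled into a small Keras model like the following. The filter counts, kernel sizes, and dropout rate here are illustrative assumptions, not the exact configuration used in this work:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     Dense, Dropout)

# Illustrative ten-class tomato-leaf classifier for 256x256 RGB input;
# layer sizes are assumptions for the sketch, not the paper's settings.
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(256, 256, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),                      # regularization against overfitting
    Dense(10, activation='softmax'),   # one output per disease/healthy class
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Such a model would then be fed by the ImageDataGenerator pipeline described in Sect. 3 and trained with model.fit.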

5 Application of the Outcomes

An algorithm processes the image as soon as it arrives on the server. By
convolving filters across the image, we extract image features using the
convolution operation, which produces feature maps capturing edges, texture,
spots, holes, and color. These feature maps are downsampled before being passed
to a fully connected layer acting as a classifier, where ReLU non-linearity is
used to solve a hard task like classification (Fig. 2).
• Step 1: Resize all of the photographs in the collection to 256 × 256.
• Step 2: The data is divided into two groups: training and testing.
• Step 3: Data augmentation: to minimize overfitting, the training set is
  augmented by rotating, scaling, and adding random noise to images.
• Step 4: Feature extraction: features are obtained using the convolution
  technique to generate layers in the CNN design.
• Step 5: Model training: in our case, we use the sequential model. The
  sequential model API allows you to create deep learning models by creating a
  Sequential class instance and layering model layers on top of it.
• Step 6: Evaluation: the model's correctness is tested using a test set.
• Step 7: Tuning: if the results are not as expected, fine-tune the model by
  changing architecture elements such as kernel size and number of nodes.
• Step 8: Save the weights: save the final model under the model name once
  training is complete.

Fig. 2 Process to extracting leaf from dataset



• In order to use it with new data, you will need to load the saved .h5 model
  file.
• Step 9: A web application based on Flask is built to upload images to the
  server and display the results.
• Step 10: These programs are responsible for preprocessing the user's uploaded
  image, categorizing it based on its features, and presenting results.
• Step 11: Photograph a scene, resize it, and upload it to the server.
• Step 12: Compare the extracted characteristics to the trained model.
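Steps 1 and 11 both involve resizing images to 256 × 256. In practice this is done with a library such as PIL or OpenCV; as a dependency-free illustration of the idea, a nearest-neighbour resize over a 2D pixel grid might look like this (the function name and toy data are ours):

```python
def resize_nearest(pixels, out_h=256, out_w=256):
    """Resize a 2D pixel grid with nearest-neighbour sampling."""
    in_h, in_w = len(pixels), len(pixels[0])
    return [[pixels[i * in_h // out_h][j * in_w // out_w]
             for j in range(out_w)]
            for i in range(out_h)]

tiny = [[0, 1],
        [2, 3]]
big = resize_nearest(tiny, 4, 4)   # each source pixel fills a 2x2 block
full = resize_nearest(tiny)        # 256x256 grid, as in Step 1
```

Real pipelines additionally interpolate (bilinear/bicubic) and handle the three color channels, but the index-mapping idea is the same.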

We considered ten tomato disease classes comprising around 10,000 images. This
work provides a genuine approach for detecting an affected leaf, and the farmer
who grows these crops receives a remedy, allowing agricultural production to
increase. Specialists in the agriculture department accept image processing
techniques for rapid disease detection, and as a result, image processing
technology has reached a significant milestone in a relatively short time frame.
The affected portion of the leaf is easily segmented and analyzed using the CNN
model, and the best possible result is provided instantly. As a result, farmers
who detect plant disease manually can save time and reduce their risk of
misdiagnosis. Our long-term goal is to create an open multimedia system and
software that can automatically detect and treat plant diseases (Fig. 3).
In this code sample, we train on the dataset and find its accuracy. After
running the code, we get the output as graphs: the red line represents the
validation accuracy, and the blue line represents the training accuracy.
Training accuracy can be calculated by dividing the number of correct
predictions by the total number of predictions.
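The accuracy computation just described (correct predictions divided by total predictions) is direct to implement; the label lists below are invented for illustration:

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

# Invented example: four of five predictions are correct
y_true = ['healthy', 'mosaic', 'healthy', 'leaf_mold', 'healthy']
y_pred = ['healthy', 'mosaic', 'leaf_mold', 'leaf_mold', 'healthy']
acc = accuracy(y_pred, y_true)   # 4 / 5 = 0.8
```

Keras reports exactly this quantity per epoch for the training set, and the same ratio over the held-out set gives the validation accuracy curve.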

Fig. 3 Training the dataset and finding the validation accuracy of the dataset

6 Result Analysis

After calculating the validation and training accuracy, we saved the model;
using from keras.models import load_model, we later loaded the saved model for
testing to get accurate results. This is followed by preprocessing the test data
using from keras.models import Sequential. Here, we set the image width and
height to 256 (256 × 256).
Tomato results (Fig. 4):
In this step, we validated the accuracy of the trained model and predicted
whether the leaf is diseased or not; the CNN model achieves good accuracy. After
preprocessing our images to size 256 × 256, using the keras.preprocessing.image
module we loaded the image with image.load_img and then converted it into an
array using image.img_to_array(img), where img is a temporary variable for a
particular image. There are two lists in this section: a key list is the first,
while a value list is the second. We can eventually determine which form of
sickness is affecting the leaf by using these classes. Because we had ten
distinct sorts of illnesses, we were able to figure out which one affected the
leaf. Here, we observe that the leaf belongs to the Tomato_mosaic_virus class,
which means it belongs to a diseased class. The word result in the code holds
the prediction for the image taken from one of the classes of the valid dataset.
Here, we use dictionaries to get the keys and values of the classes after
converting them into lists. Finally, we will display the index position

Fig. 4 Accuracy of the dataset



Fig. 5 Healthy leaf status

of the result value to show the class name; in this case, it is
Tomato_mosaic_virus (Fig. 5).
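The key-list/value-list lookup described above can be sketched as follows. The class dictionary (in the spirit of a Keras generator's class_indices attribute) and the probability vector are invented examples, trimmed to three of the ten classes:

```python
# Hypothetical mapping from class name to class number
class_indices = {
    'Tomato_healthy': 0,
    'Tomato_mosaic_virus': 1,
    'Tomato_Spider_mites_Two_spotted_spider_mite': 2,
}
keys = list(class_indices.keys())      # key list (class names)
values = list(class_indices.values())  # value list (class numbers)

# Invented softmax output for one test image
probs = [0.01, 0.97, 0.02]
predicted_index = max(range(len(probs)), key=probs.__getitem__)
result = keys[values.index(predicted_index)]
confidence = probs[predicted_index]    # 0.97, i.e. 97% for this example
```

The index of the largest probability selects the class number, and the dictionary lists translate that number back to the class name that is displayed to the user.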
After preprocessing, we predicted which class the output belongs to. We took a
test image from the validation dataset and used model.predict_classes. The
disease keyword in the code shows the test image figure. Hence, we predicted
that the leaf is healthy. After assigning the location of the image to result,
loading the image, and displaying it, we obtained the probability of the leaf
based on the result using np (NumPy), because it performs array operations
efficiently. In this case, the leaf belongs to the healthy class; as mentioned
earlier, there are ten classes in total, where nine of them contain diseased
leaves and one holds healthy leaves. Here, the output leaf belongs to the
healthy class with 99% confidence, so the leaf is displayed as healthy. In the
second test case, we took another leaf from the valid dataset using
model.predict_classes, as in the previous test case. The same process is
followed from preprocessing, and we predicted which class the output belongs to.
The disease keyword in the code shows the test image figure. Here, the input
leaf is compared with all the images from the training set, and the class of the
given leaf is predicted. In this test case, the leaf belongs to
Tomato_Spider_mites_Two_spotted_spider_mite, which means it is a diseased leaf,
as apart from the healthy class all other classes contain diseased leaves. The
probability that it belongs to that class is 99%.

7 Conclusion

Although there are several approaches for detecting and classifying plant
diseases using automated computer vision, this study area is still immature.
Furthermore, with the exception of those dealing with plant species recognition
based on leaf photos, no commercial solutions are available on the market. In
this paper, a deep learning algorithm was used to investigate a novel method-
ology for automatically categorizing and identifying plant illnesses from leaf
photos. The proposed model was successful in detecting the presence of leaves
and distinguishing between healthy leaves and 13 distinct disorders that could
be identified visually. From image collection for training and validation to
image preprocessing and augmentation, and eventually training and fine-tuning
the deep CNN, the full technique was detailed. The performance of the newly
designed model was evaluated using a series of tests.

References

1. Kulkarni AH, Ashwin Patil RK (2012) Applying image processing technique to detect plant
diseases. Int J Mod Eng Res 2(5):3661–3664
2. Revathi P, Hemalatha M (2012) Classification of cotton leaf spot diseases using image
processing edge detection techniques. In: IEEE International conference on emerging trends
in science, engineering and technology, Tiruchirappalli, Tamil Nadu, India, pp 169–173
3. Al-Tarawneh MS (2013) An empirical investigation of olive leave spot disease using auto-
cropping segmentation and fuzzy C-means classification. World Appl Sci J 23(9):1207–1211
4. Argenti F, Alparone L, Benelli G (1990) Fast algorithms for texture analysis using co-
occurrence matrices. IEE Proc Radar Signal Process 137(6):443–448
5. Wang H, Li G, Ma Z, Li X (2012) Image recognition of plant diseases based on back propagation
networks. In: 5th International congress on image and signal processing, Chongqing, China,
pp 894–900
6. Arivazhagan S, Newlin Shebiah R, Ananthi S, Vishnu Varthini S (2013) Detection of unhealthy
region of plant leaves and classification of plant leaf diseases using texture features. Comm Int
Genie Rural (CIGR) J 15(1):211–217
7. Jaware TH, Badgujar RD, Patil PG (2012) Crop disease detection using image segmentation. In:
National conference on advances in communication and computing, World Journal of Science
and Technology, Dhule, Maharashtra, India, pp 190–194
8. Zhang Y-C, Mao H-P, Hu B, Li M-X (2007) Feature selection of cotton disease leaves
image based on fuzzy feature selection techniques. In: Proceedings of the 2007 international
conference on wavelet analysis and pattern recognition, Nov 2007, Beijing, China, pp 124–129
9. Arivazhagan S, Newlin Shebiah R, Ananthi S, Vishnu Varthini S (2013) Detection of unhealthy
region of plant leaves and classification of plant leaf diseases using texture features. Agric Eng
Int CIGR 15(1):211–217
10. Shaik MA, Verma D (2020) Deep learning time series to forecast COVID-19 active cases in
INDIA: a comparative study. IOP Conf Ser Mater Sci Eng 981:022041. https://doi.org/10.1088/
1757-899X/981/2/022041
11. Praveen P, Shaik MA, Kumar TS, Choudhury T (2021) Smart farming: securing farmers using
block chain technology and IOT. In: Choudhury T, Khanna A, Toe TT, Khurana M, Gia Nhu N
(eds) Blockchain applications in IoT ecosystem. EAI/Springer innovations in communication
and computing. Springer, Cham. https://doi.org/10.1007/978-3-030-65691-1_15
12. Shaik MA, Verma D (2021) Agent-MB-DivClues: multi agent mean based divisive clus-
tering. Ilkogretim Online Elementary Educ 20(5):5597–5603. https://doi.org/10.17051/ilkonl
ine.2021.05.629
13. Pramod Kumar P, Sagar K (2019) A relative survey on handover techniques in mobility
management. IOP Conf Ser Mater Sci Eng 594:012027
14. Shaik MA, Verma D, Praveen P, Ranganath K, Yadav BP (2020) RNN based prediction of
spatiotemporal data mining. IOP Conf Ser Mater Sci Eng 981:022027. https://doi.org/10.1088/
1757-899X/981/2/022027

15. Kumar S, Manjula B, Shaik MA, Praveen P (2019) A comprehensive study on single sign on
technique. Int J Adv Sci Technol (IJAST) 127. ISSN: 2005-4238; E-ISSN: 2207-6360
16. Shaik MA, Verma D (2020) Enhanced ANN training model to smooth and time series forecast.
IOP Conf Ser Mater Sci Eng 981:022038. https://doi.org/10.1088/1757-899X/981/2/022038
17. Praveen P, Babu CJ, Rama B (2016) Big data environment for geospatial data analysis. In: 2016
International conference on communication and electronics systems (ICCES), Coimbatore, pp
1–6
18. Ravi Kumar R, Babu Reddy M, Praveen P (2019) An evaluation of feature selection algorithms
in machine learning. Int J Sci Technol Res 8(12):2071–2074. ISSN 2277-8616
Association and Correlation Analysis
for Predicting the Anomaly in the Stock
Market

R. Ravinder Reddy, M. Venkata Krishna Reddy, and L. Raghavender Raju

Abstract The stock market is volatile and fluctuates over time; with prices
changing rapidly, it is very difficult to predict the price of a stock. The
stock market price is mostly determined by the demand for the stock, which is
determined by gross purchases and sales. In the stock market, these are mostly
done by domestic institutional investors (DII) and foreign institutional
investors (FII). Their percentage of investment is very large compared to retail
investors in the market, and price changes are mostly determined by FII and DII
activity. Since the market price is dominated by the FII and DII, in this work
we identified the association and correlation between FII and DII activities.
The results show a suspicious anomaly between the FII and DII. In the Indian
stock market, an average of 6.43 billion shares is traded every day, based on
total composite volume. But surprisingly, over the last decade, the DII and FII
are negatively correlated.

Keywords Association analysis · Correlation · Stock market

1 Introduction

The most essential question for investors in the stock market is how to predict
future stock values. It is the most uncertain thing in the market; despite huge
developments and research across subjects such as mathematics, statistics,
machine learning, data mining, and deep learning, no model has predicted prices
accurately. Building a specific model or tool to identify the price of the market
R. Ravinder Reddy (B) · M. Venkata Krishna Reddy


Department of Computer Science and Engineering, Chaitanya Bharathi Institute of
Technology(A), Gandipet, Hyderabad, India
e-mail: rravinderreddy_cse@cbit.ac.in
M. Venkata Krishna Reddy
e-mail: krishnareddy_cse@cbit.ac.in
L. Raghavender Raju
Department of Computer Science and Engineering, Matrusri Engineering College, Hyderabad,
India
e-mail: lraghavenderraju@matrusri.edu.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 251
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_24
252 R. Ravinder Reddy et al.

or the momentum of the market is very difficult. In our observation of the
market over a period of a decade, we have come to one conclusion: these markets
mostly depend on FII and DII activities. These institutions can pour large
quantities of money into the market and easily change its direction. Various
methodologies, mathematical formulations, genetic algorithm (GA)-based models,
neural network models, machine learning-based techniques, and so on, have been
proposed and tested with varying degrees of success. Against this background,
the surprising behavior of the FII and DII is detected in this work. Using these
correlations, we can predict the price up to a certain level more accurately.
Mostly, traditional investors believe that the market movement is natural, but
it is deviated by the FII and DII. The strategies of the FII and DII have been
observed to be mostly negatively correlated, which influences the market
condition; this activity is heavily involved in diverting the actual move of the
market. The general public may feel that the market move follows the financial
conditions of the current scenario. But this may not be accurate and true,
because these big players, based on certain understandings, move the markets in
their direction to create panic situations among general investors [1].
Increased computational power makes it easy for users to determine prices and
study different strategies, and many existing tools identify the flaws and trade
values of different participants. The main aim of this work is to determine
whether the direction of the stock market is a real move or an anomalous move.
This work focuses mainly on FII and DII data, collected in real time. Most of
the time, the data shows that FII and DII activity is inversely proportional,
which influences the market over long periods of time.
The basic building blocks of the model consist of the following stages, each of
which is crucial in the decision-making process. Data collection is the most
challenging part; the data is taken in real time from the Indian stock market.
1. Data collection
2. Data analysis
3. Feature engineering
4. Model selection
5. Model building.

2 Related Work

Most researchers have contributed toward stock price prediction using machine
learning and deep learning [2]. Many people invest their money in the stock
market to get the maximum benefit, but often this does not happen, owing to
large corrections driven by different fund houses playing strategies to extract
the maximum money. However, this kind of investment carries a lot of risk.
Detecting such risk and anomalies in stock market prices is important before
smart players or smart investors such as FIIs and DIIs exit their positions.
Here, risk and the
Association and Correlation Analysis for Predicting … 253

anomaly are different perspectives: risk is attempting to gain money by trusting
other participants, while the anomaly is taking money by creating panic
environments through different strategies.
Wyckoff theory, defined by Richard Wyckoff [3], discusses the accumulation and
distribution phases of stocks, the distribution phase being the opposite of
accumulation. This is one of the oldest theories; as background, Richard Wyckoff
wrote in the Wall Street Journal long ago, and the theory holds that most fund
managers and smart investors profit while retailers do not. Big gambling happens
in the stock market because everyone wants maximum profit [1].
Retail traders often feel that prices have retraced and are heading to new
highs, but the market merely tests those levels, generally known as the LPSY
(last point of supply). This generally happens with gaps: we witness gap-downs,
re-tests of the gaps, prices going down again, and continued selling pressure.
During this distribution, the stock price forms a major top, which can hold for
weeks, months, and even years. Wyckoff theory really helps to find the right
exit position in a stock; most fund managers use it wisely to exit their
positions without affecting the selling price, and it can be applied to any
asset class that has volume data, so volumes play a key role here.
Other researchers used a histogram-based detector to detect irregularities,
which were then processed via association rule mining. The Apriori and FP-growth
algorithms were used to build the metadata rule collection. In that study, the
author compared the results of the Apriori technique with the FP-growth
algorithm, showing how FP-growth achieves better results in terms of lower time
and space complexity. The implementation of the FP-growth algorithm was
postponed as future work [5].

3 Data Preparation

The stock dataset contains several characteristics, including both continuous
and categorical data. These values should be made quantitative and categorized
to aid the optimization process: all alphanumeric values are first translated to
numeric values, and continuous values are then converted to categorical values
[6].
As shown in Table 1, the day-wise DII and FII purchase and sale activities were
collected for the month of January 2020. We collected data for the past 10 years
from NSE India for the Indian stock market. These data are preprocessed to
remove inconsistencies, redundancies, and missing values.
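The two conversions just described (alphanumeric values to numeric codes, continuous values to categories) might be sketched as follows; the bin edges and sample values are our own illustration, with the sample nets taken from Table 1:

```python
def encode_labels(column):
    """Map each distinct alphanumeric value to a numeric code."""
    mapping = {}
    for v in column:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in column], mapping

def discretize(values, edges):
    """Turn a continuous value into a category index based on bin edges."""
    return [sum(v > e for e in edges) for v in values]

codes, mapping = encode_labels(['FII', 'DII', 'FII', 'DII'])
# Net purchase/sale values binned into strong-sell / sell / buy / strong-buy
# (edges are invented for the example)
bins = discretize([-4179.12, -962.28, 659.11, 3816.44],
                  [-1000.0, 0.0, 1000.0])
```

After such preprocessing, every attribute is an integer code, which is the form required by the association rule mining in Sect. 4.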

Table 1 FII and DII day-wise gross purchase/sale activity


Date  FII Gross Purchase  FII Gross Sales  FII Net Purchase/Sales  DII Gross Purchase  DII Gross Sales  DII Net Purchase/Sales
31-Jan-20 5142.69 9321.81 −4179.12 7024.62 3208.18 3816.44
30-Jan-20 4674.24 5636.52 −962.28 3748.39 3456.04 292.35
29-Jan-20 5012.44 6026.71 −1014.27 4662.96 3142.06 1520.90
28-Jan-20 4871.86 6229.42 −1357.56 4857.64 4145.94 711.7
27-Jan-20 3025.73 3464.58 −438.85 3451.63 3441.12 10.51
24-Jan-20 4884.53 4225.42 659.11 3748.79 3330.83 417.96
23-Jan-20 7969.55 6617.42 1352.13 6729.56 7714.12 −984.56
22-Jan-20 5254.68 5431.11 −176.43 4286.22 4612.44 −326.22
21-Jan-20 6095.57 6145.65 −50.08 3255.07 3562.88 −307.81
20-Jan-20 5050.19 5044.32 5.87 3332.28 4752.13 −1419.85
17-Jan-20 6609.58 6,345.32 264.26 3615.43 4115.60 −500.17
16-Jan-20 4758.32 5153.56 −395.24 3686.43 3871.08 −184.65
15-Jan-20 6026.82 5,747.29 279.53 3897.86 4546.20 −648.34
14-Jan-20 4880.95 5086.51 −205.56 3934.36 4576.83 −642.47
13-Jan-20 4881.89 4761.10 120.79 595.5 1724.39 −1128.89
10-Jan-20 4679.66 4101.38 578.28 4438.75 4690.49 −251.74
09-Jan-20 4716.40 5147.51 −431.11 5020.49 4601.27 419.22
08-Jan-20 4109.04 4624.89 −515.85 5162.02 4413.62 748.4
07-Jan-20 3911.17 4593.40 −682.23 4205.16 3893.97 311.19
06-Jan-20 3732.00 3,835.84 −103.84 3778.78 3802.48 −23.7
03-Jan-20 4514.35 3251.30 1263.05 2750.87 3780.07 −1029.20
02-Jan-20 2670.78 1982.02 688.76 3490.16 3426.21 63.95
01-Jan-20 340.21 399.08 −58.87 1688.37 1479.90 208.47

4 Methodology

The collected market data is used to analyze the behavior of the FII and DII. To
analyze these data, we use the association and correlation of the DII and FII,
which show users how the market moves in real scenarios [7, 8]. The major task
is to analyze FII and DII behavior along with the market movement. Both positive
and negative values of the data are analyzed; here, a negative value represents
net sales in the market.

4.1 Data Mining Association Rule

Data mining for the most part refers to the method involved with mining information
from a huge data. In this process, the new information is predicted by comprehending
the current information. By large, data mining is classified into two classes: predictive
and descriptive. The general properties of the information in the dataset are depicted
by elucidating mining. Predictive mining performs derivation on present information
to make expectations [9, 10].
Specifically, two data mining approaches have been proposed [11] and used for
anomaly disclosure: association rules and recurrence episodes. Association rule algo-
rithms notice connections between elements or properties used to portray a dataset.
Association rules mining began a strategy for tracking down interesting rules from
value-based datasets.
Association rule mining was formally defined as follows: Let L = {i1, …, in} be
a set of literals, called items. Let dataset D be a set of transactions, where
each transaction T is a set of items such that T ⊆ L. Associated with each
transaction is a unique identifier, called its transaction id (TID).
A transaction T contains X, a set of items in L, if X ⊆ T. An association rule
is an implication of the form X → Y, where X ⊂ L, Y ⊂ L, and X ∩ Y = Ø. The rule
X → Y holds in the transaction set D with confidence c if c% of the transactions
in D that contain X also contain Y. The rule X → Y has support s in the
transaction set D if s% of the transactions in D contain X ∪ Y.
Given a set of items I = {I1, I2, …, Im} and a dataset of transactions D = {t1,
t2, …, tn}, where ti = {Ii1, Ii2, …, Iik} and Iij ∈ I, the association rule
problem is to find all association rules X → Y with a minimum support and
confidence. The support of a rule is the fraction of transactions that contain
both X and Y among all transactions, determined as |X ∪ Y|/|D|; it gauges the
significance of the correlation between item sets. The confidence is the
fraction of transactions containing X that also contain Y, defined as
|X ∪ Y|/|X|; it gauges the strength of the correlation between the item sets. In
short, support is a measure of the frequency of a rule, and confidence is a
measure of the strength of the relationship between the sets of items [11].
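The support and confidence definitions above can be computed directly over a transaction set. The toy "trading day" transactions below (FII-sell / DII-buy flags) are our own illustration, not the paper's actual encoding:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """Support of X U Y divided by support of X."""
    return support(x | y, transactions) / support(x, transactions)

# Each trading day reduced to a pair of flags (invented sample data)
days = [
    {'FII_sell', 'DII_buy'},
    {'FII_sell', 'DII_buy'},
    {'FII_buy', 'DII_sell'},
    {'FII_sell', 'DII_sell'},
]
s = support({'FII_sell', 'DII_buy'}, days)       # 2/4 = 0.5
c = confidence({'FII_sell'}, {'DII_buy'}, days)  # 0.5 / 0.75 = 2/3
```

A rule such as FII_sell → DII_buy would then be reported if both s and c clear the chosen minimum support and confidence thresholds.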
1. Correlation is a bivariate investigation that measures the strength of relationship
between two factors and the direction of the relationship. As far as the strength of
relationship, the value of the correlation coefficient changes among +1 and −1.
A value of ±1 shows an ideal level of association between the two factors. As the
correlation coefficient esteem goes toward 0, the relationship between the two
factors will be more vulnerable. The direction of the relationship is demonstrated
by the indication of the coefficient; a + sign shows a positive relationship, and
a− sign shows a negative relationship. As a rule, in insights, we measure four
kinds of correlations [12].
256 R. Ravinder Reddy et al.

2. Pearson correlation,
r_xy = (n Σxᵢyᵢ − Σxᵢ Σyᵢ) / [√(n Σxᵢ² − (Σxᵢ)²) · √(n Σyᵢ² − (Σyᵢ)²)]    (1)

r xy Pearson r correlation coefficient between x and y.


n number of observations.
xi value of x (for ith observation).
yi value of y (for ith observation).

3. Kendall rank correlation,

τ = (N_c − N_d) / [½ n(n − 1)]    (2)

N c Number of concordant.
N d Number of discordant.

4. Spearman correlation,
ρ = 1 − (6 Σdᵢ²) / [n(n² − 1)]    (3)

ρ Spearman rank correlation.


d i The difference between the ranks of corresponding variables.
n Number of observations.

5. Point-Biserial correlation,

r_pb = [(M₁ − M₀) / s_n] · √(pq)    (4)

M 1 Mean (for the entire test) of the group that received the positive binary
variable.
M 0 Mean (for the entire test) of the group that received the negative binary
variable.
S n Standard deviation for the entire test.
p Proportion of cases in the “0” group.
q Proportion of cases in the “1” group.
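Equations (1)–(3) can be implemented directly. The sketch below is a minimal pure-Python version with no tie handling; libraries such as SciPy provide robust implementations, including the point-biserial case of Eq. (4):

```python
import math

def pearson(x, y):
    # Eq. (1): product-moment correlation from raw sums.
    n = len(x)
    sx, sy = sum(x), sum(y)
    num = n * sum(a * b for a, b in zip(x, y)) - sx * sy
    den = math.sqrt(n * sum(a * a for a in x) - sx ** 2) * \
          math.sqrt(n * sum(b * b for b in y) - sy ** 2)
    return num / den

def kendall(x, y):
    # Eq. (2), tau-a: (concordant - discordant) / (n(n-1)/2); no tie correction.
    n = len(x)
    nc = nd = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                nc += 1
            elif s < 0:
                nd += 1
    return (nc - nd) / (n * (n - 1) / 2)

def spearman(x, y):
    # Eq. (3): rank the data, then 1 - 6*sum(d_i^2)/(n(n^2-1)); assumes no ties.
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

On perfectly linear data all three return +1; on a strictly decreasing pairing, Kendall's tau returns −1.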
Association and Correlation Analysis for Predicting … 257

Table 2 Correlation of the DII and FII

S. No.  Correlation method  Value
1       Kendall             −0.35212
2       Pearson             −0.505678
3       Spearman            −0.516246

5 Results and Discussion

The correlation was computed for the FII and DII net purchase/sale, and we identified
that the correlation between the two is negative (−0.35212). Analysis of the results
shows that this is the reverse of expected market action: the market should move upward
under heavy purchasing and downward under selling pressure. Instead of following
the demand and supply of the market, the FII and DII are artificially creating the
trends in the market.
We applied three correlation measures:
1. Kendall
2. Pearson
3. Spearman.
All three correlations show a negative relationship, as given in
Table 2.
As shown in Fig. 1, the correlation between the FII and DII is clearly negative, which
impacts the market hugely. The relationship indicates an anomaly in the behaviour of the
two parties [13–16]. What one would expect to arise as a casual relation turns out, over
the past decade of data, to be consistently negative. This may not hold in the general
market: from an individual investor's perspective, behaviour is usually positively
correlated among users, since most users read price movement from demand and supply.
That is not what happens here. The surprising finding is that the two behaviours are
negatively correlated: across billions of transactions, FII and DII behaviours are
negatively associated, and this association does not depend on only a few parameters.

6 Conclusion

To study the movement of the stock market and patterns among sectoral indices,
association rule mining and statistical correlation analysis were used. According
to the study's findings, various sectoral indices are interlinked. Another
intriguing discovery is that distinct industry indices have a time-lag relationship.
This correlation can be used to estimate the direction of future index movement with
a forecast horizon of d days, where d is the number of days lag considered.

Fig. 1 Correlation between the DII and FII

We tested our proposed method on a real-world dataset and discovered that it is capable of
capturing effective anomaly identification properties that PCA-like methods cannot.
The consistent negative correlation mainly indicates anomalous behaviour between the
FII and DII. As a result, many investors can use this model to balance their portfolios
and decide which industry to invest in next to minimise risk. Some industries are
completely unrelated, while others are significantly linked (positively or negatively),
with correlation coefficients greater than 0.8.

References

1. Coleman H (2021) Is the stock market gambling? Why trading in the stock market isn’t
gambling, February 2021
2. Marchai FL, Martin W, Suhartono D (2021) Stock prices prediction using machine learning.
In: 2021 8th international conference on information technology, computer and electrical
engineering (ICITACEE). IEEE
3. Louis S, McGraw G, Wyckoff RO (1993) Case-based reasoning assisted explanation of genetic
algorithm results. J Exp Theor Artif Intell 5(1):21–37
4. US Equities Historical Market Volume Data, February 2021, [online] Available: https://www.
cboe.com/us/equities/market_statistics/historical_market_volume/
5. Aung KMM, Oo NN (2015) Association rule pattern mining approaches network anomaly
detection. In: Proceedings of 2015 international conference on future computational
technologies (ICFCT’2015), Singapore
6. Jyothsna V, Rama Prasad VV (2016) FCAAIS: anomaly based network intrusion detection
through feature correlation analysis and association impact scale. ICT Express 2(3):103–116
7. Umer M, Awais M, Muzammul M (2019) Stock market prediction using machine learning
(ML) algorithms. ADCAIJ: Adv Distrib Comput Artif Intell J 8(4):97–116
8. Ding G, Qin L (2020) Study on the prediction of stock price based on the associated network
model of LSTM. Int J Mach Learn Cybern 11(6):1307–1317

9. Kamalov F (2020) Forecasting significant stock price changes using neural networks. Neural
Comput Appl 32(23):17655–17667
10. Henrique BM, Sobreiro VA, Kimura H (2018) Stock price prediction using support vector
regression on daily and up to the minute prices. J Finance Data Sci 4(3):183–201
11. Nalavade K, Meshram BB (2014) Finding frequent itemsets using apriori algorithm to detect
intrusions in large dataset. Proc 2014 IJCAIT 6(I):84–92
12. Kendall M, Gibbons JD (1990) Rank correlation methods. Edward Arnold (a division of
Hodder and Stoughton), a Charles Griffin title, London, pp 29–50
13. Su S et al (2019) A correlation-change based feature selection method for IoT equipment
anomaly detection. Appl Sci 9(3):437
14. Saboori E, Parsazad S, Sanatkhani Y (2010) Automatic firewall rules generator for anomaly
detection systems with Apriori algorithm. In: 2010 3rd international conference on advanced
computer theory and engineering (ICACTE), pp V6-57–V6-60. https://doi.org/10.1109/ICA
CTE.2010.5579365
15. Razaq A, Tianfield H, Barrie P (2016) A big data analytics based approach to anomaly detec-
tion. In: Proceedings of the 3rd IEEE/ACM international conference on big data computing,
Applications and Technologies
16. Mazel J, Casas P, Labit Y, Owezarski P (2011) Sub-space clustering, inter-clustering results
association and anomaly correlation for unsupervised network anomaly detection. In: 2011 7th
international conference on network and service management, pp 1–8
Early Identification of Diabetic
Retinopathy Using Deep Learning
Techniques

Sachin Sharma, Sakshi Zanje, and Dharmesh Shah

Abstract Diabetic retinopathy (DR) is a rapidly spreading disease that occurs
in diabetic patients. Patients who have diabetic retinopathy may suffer complete
vision loss. Even after scientific and medical advancement it is still incurable and
a big threat to humans, so early detection of DR is important to provide treatment
on time. Manual detection of DR consumes time, cost, and effort. A convolutional
neural network (CNN) is a method of deep learning and it is more widely used in
the medical field. This paper presents an idea of building an automated system for
detection and identification of DR using the Asia Pacific Tele-Ophthalmology Society
(APTOS) 2019 Kaggle dataset. Two CNN models, ResNet50 and VGG16, are used
for training and classification. The accuracy of the ResNet50 has been calculated to
be 81.7% and that of the VGG16 is calculated to be 80.5%.

Keywords Diabetic retinopathy (DR) · Convolution neural network (CNN)


models · Deep learning (DL) · Training · Classification

1 Introduction

The retina is the innermost, thin layer of tissue that is situated at the back of the eyeball
from inside. It is situated near the optic nerve. The retina receives the light that has
focused on the lens, converts it into the signals and transmits that signal to the brain for
enabling us to see [1]. The most common eye disease is diabetic retinopathy. It usually
affects people who have had diabetes for a significant number of years, whether
or not they have been diagnosed. Diabetic
retinopathy can affect any diabetic person, and if it is left untreated for a long time,
it may become dangerous and the risk of blindness may increase [2].

S. Sharma (B) · S. Zanje


Department of Engineering and Physical Sciences, Institute of Advanced Research, Gandhinagar,
India
e-mail: sharma.f@gmail.com
D. Shah
Faculty of Engineering and Technology, Sankalchand Patel University, Visnagar, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 261
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_25

Sometimes an increase in blood glucose levels causes changes in the retinal blood vessels
and may lead to diabetic retinopathy. These blood vessels occasionally swell and leak
fluid, or even close off completely. In other cases, abnormal new blood
vessels grow on the surface of the retina [3]. Patients suffering from diabetic
retinopathy may experience blurry vision, black spots in the visual field, or
complete vision loss. Early identification of diabetic retinopathy is therefore important
to preserve eyesight and provide treatment in time. Ophthalmologists perform the
identification of DR manually, which has drawbacks:
. It is a time-consuming task and expensive.
. Preclinical signs are not easily detected with manual grading.

1.1 Types of Diabetic Retinopathy

Generally, diabetic retinopathy is divided into two levels:


1. Proliferative diabetic retinopathy (PDR)
2. Non-proliferative diabetic retinopathy (NPDR)
NPDR is further subdivided into three stages:
1. Mild
2. Moderate
3. Severe non-proliferative diabetic retinopathy (Fig. 1).

2 Literature Review

In machine learning, selecting reliable features is important for classification, but DL
overcomes that problem; on the other hand, DL needs a huge dataset for
classification. Among the studies reviewed in [5], 73% classified fundus images into DR or no DR, while
27% classified images into one or more stages. On the other side, 70% of studies
did not detect the affected lesions, while 30% did; only 6% both classified images
and detected the affected lesions.

Fig. 1 Diabetic retinopathy [4]

In [6] authors have used machine
learning techniques for the detection of diabetic retinopathy. First preprocessing is
done on retinal fundus images using the green channel, histogram equalization, crop-
ping, and resizing techniques. Images were divided into two different datasets; one
was in normal retinal images and the other was in affected retinal images. A total
of 14 features of DR were extracted from both normal and diabetic retinal fundus
image datasets. These features were used for comparison and for classifying the
images as normal or diabetic fundus images. From the results obtained, the authors
observed that exudate is the best feature for primary diabetic detection;
after exudates, blood vessels and other features can be used for
detection. Authors in [7] covered a detailed survey about the identification of diabetic
retinopathy in light of almost 150 research articles, summarized with the collection
of retinal datasets, adoption of different kinds of methodologies to detect diabetic
retinopathy, and selection of the performance evaluation metrics for the representa-
tion of their outcomes. Initially, retinal datasets are discussed and then several kinds
of approaches have been explained to detect the retinal abnormalities including retinal
neovascularization, hemorrhages, microaneurysm, and exudates. Moreover, the role
of evaluation metrics for computer-aided diagnosis (CAD) systems has been briefly
discussed. Authors in [8] suggest that early detection of glaucoma is necessary and
that treatment should be done in time. Otherwise, in affected patients, it can cause
permanent blindness. For this he proposed morphological filtering algorithms for
preprocessing or enhancement of retinal images. In the morphological enhancement
module, input images are converted into gray scale images, then a channel extracts
the optic cup and optic disc from the images. Here, top-hat transform highlights the
bright objects on a dark background, and bottom-hat transform highlights the darker
regions of the image [9]. Then, in morphological operations, bottom-hat transform
is subtracted from top-hat transform and channel images are merged, and converted
into grayscale images.

3 Methodology

Figure 2 represents our proposed system architecture. First step includes acquiring
the input image. We have used the APTOS 2019 Kaggle dataset, which is freely available
online [10]. In the second step, various preprocessing steps like cropping, resizing,
converting to gray and Gaussian blur have been applied on the dataset so that training
of the model may improve. In the third step of architecture, feature extraction is
applied which includes various features like Microaneurysm, exudates, hemorrhages,
and blood vessel. Various features are extracted in the third step, and then in the fourth
step classification is done, determining whether the image shows DR or no DR.

Fig. 2 System architecture

4 Experimentation Setup

Environment: Python is a high-level programming language. It is the most popular


programming language used for machine learning and convolution neural network
algorithms. For model training, Google Collaboratory, a cloud-based service is used
as it provides a free online cloud-based Jupyter notebook environment that allows us
to train our machine learning and convolution neural network models and provides
free access to GPU.

5 Dataset

We have used fundus retinal images from the Kaggle challenge of APTOS 2019
blindness detection [10]. The dataset consists of 3662 train images, 1929 test images
and is divided into five different classes as no DR, mild, moderate, severe, and
proliferative DR. Class labels are given as numbers from 0 to 4. We have
used 2929 images for training and 733 images for testing. In this dataset, id_code
identifies each image, and the diagnosis level is given as shown in Fig. 3.

Fig. 3 Classes of retinal


fundus images

6 Image Processing

To enhance and extract useful features from images, image processing is used. Oper-
ations such as circular cropping, resizing, converting to grayscale, and applying Gaussian blur
are performed on the images.
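As an illustration of two of these steps, the sketch below converts an RGB pixel to gray with BT.601 luminance weights and blurs a grayscale image with a 3×3 Gaussian kernel. The paper does not specify its exact kernel size or weights, so those parameter choices here are assumptions:

```python
def to_gray(pixel_rgb):
    # ITU-R BT.601 luminance: one common choice for RGB -> gray conversion.
    r, g, b = pixel_rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)

# 3x3 Gaussian kernel (sigma close to 1), normalised by its sum of 16.
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
KSUM = 16

def gaussian_blur(img):
    # Blur a 2D grayscale image, clamping indices at the border.
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc / KSUM
    return out
```

A constant image is unchanged by the blur, while sharp edges are smoothed; in practice a library routine such as OpenCV's GaussianBlur would be used on the full fundus image.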

6.1 Input Fundus Images

See Fig. 4.

6.2 Gray Fundus Images

See Fig. 5.

6.3 Gaussian Blur

See Fig. 6.

Fig. 4 Sample fundus images



Fig. 5 Sample gray images

Fig. 6 Applying Gaussian blur

7 Convolution Neural Network Models

A convolution neural network (CNN) is used as a deep learning method. Some CNN
architectures can be used for image processing, object detection, segmentation, and
image classification. There are various types of CNN architecture with pre-trained
ImageNet weights like:
. ResNet50
. VGG16
. MobileNet
. Inception Net
. Dense Net.
We have used ResNet50 and VGG16 for training, testing, and classification.
Parameters such as batch size, learning rate, epochs, height, width, are used for
training. The image data generator function is used for data augmentation, and its flow
method is used to prepare the dataset and split it into training, validation, and testing
subsets; here, 2344, 585, and 733 images are used for training, validation, and testing, respectively.
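The 2344/585/733 partition of the 3662 training images can be reproduced with a simple positional split; the paper used Keras generators, so the ids below are only stand-ins:

```python
def three_way_split(items, n_val, n_test):
    # Deterministic split into train/validation/test by position.
    n = len(items)
    train = items[: n - n_val - n_test]
    val = items[n - n_val - n_test : n - n_test]
    test = items[n - n_test :]
    return train, val, test

ids = list(range(3662))  # stand-ins for APTOS image ids
tr, va, te = three_way_split(ids, n_val=585, n_test=733)
print(len(tr), len(va), len(te))  # 2344 585 733
```

The three subsets are disjoint by construction, so no image can leak from training into validation or testing.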

7.1 Training Using ResNet50 and VGG16

The dataset is ready for the training phase after initializing the above-mentioned steps.
We did transfer learning using standard ResNet50 and VGG16 CNN architecture with
pre-trained ImageNet weights. As the standard ImageNet weights classify the objects
into 1000 categories so we excluded the top layer and added some layers of our own

in the model. The standard input image size for ResNet50 and VGG16 is 224, 224,
3 but we have used 320, 320, 3.
We froze all the layers in the base model and then added the following layers to
the original model:
. GlobalAveragePooling2D
. Dropout Layer (50% dropout)
. Dense (2048 inputs and ReLu activation) and Dense (512 inputs and ReLu
activation) for ResNet50 and VGG16, respectively
. Dropout Layer (50 percent dropout)
. Dense Layer (Softmax activation with 5 classes).
The method works because the kind of information needed to distinguish
between all 1000 classes in ImageNet is often also useful for distinguishing between
new kinds of images (fundoscopic retinal images in our case).
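The final Dense layer with softmax activation maps the network's five outputs (logits) to class probabilities; the logit values below are made up for illustration:

```python
import math

def softmax(logits):
    # Numerically stable softmax: probabilities over the 5 DR grades.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 0.5, 1.0, -1.0, 0.0]   # hypothetical network outputs
probs = softmax(logits)
pred_class = probs.index(max(probs))  # here class 0 ("No DR" in the APTOS labelling)
```

The probabilities sum to 1, and the predicted grade is simply the index with the largest probability.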
Fine-Tuning
Fine-tuning is one approach to transfer learning. Here the pre-trained model
is first trained for 2 epochs on our new dataset; fine-tuning is then done by
unfreezing the whole base model (or a part of it) and retraining the whole model with
a very low learning rate of 0.0001. Here, binary cross-entropy and the Adam optimizer
are used, together with early stopping (mode minimum, monitoring val_loss, verbose 1).
Fine-tuning uses the same model again to tweak the parameters of the already trained
network, because initial layers learn very general features while layers higher up the
network tend to learn features more specific to the task being trained. We used the Adam
optimizer, 2 epochs, and categorical cross-entropy as the loss function in the first case
and binary cross-entropy in the second case. In fine-tuning, the model is trained for
12 epochs and 22 epochs for ResNet50 and VGG16, respectively, with a batch size of 8.
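Categorical cross-entropy, the loss used in the first case, compares the softmax output against a one-hot target; the vectors below are hypothetical:

```python
import math

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # -sum(t_i * log(p_i)) for a one-hot target over the 5 classes;
    # eps guards against log(0).
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 0, 1, 0, 0]                 # true grade: moderate (class 2)
y_pred = [0.05, 0.10, 0.70, 0.10, 0.05]  # hypothetical softmax output
loss = categorical_cross_entropy(y_true, y_pred)  # -log(0.70)
```

Only the probability assigned to the true class contributes, so the loss shrinks toward 0 as that probability approaches 1.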

8 Result

Keras is used to train a classifier for detecting whether a person is having DR or not.
During the process, we keep track of training, validation accuracy and loss.

8.1 Result of ResNet50

Model accuracy graph: the learning curves (training and validation accuracy and loss)
of the ResNet50 model, which took almost 26 min for 12 epochs, are shown below
(Figs. 7 and 8; Table 1).

Fig. 7 Training (blue line) and validation (orange line)

Fig. 8 Confusion matrix shows how accurate it is predicting using different shades

Table 1 Classification report of ResNet50


Class Precision Recall F1-score Support
0 0.97312 0.98638 0.97970 367
1 0.66667 0.59459 0.62857 74
2 0.70485 0.83770 0.76555 191
3 0.34043 0.44444 0.38554 36
4 0.80952 0.26154 0.39535 65
Accuracy 0.81719 733
Macro avg 0.69892 0.62493 0.63094 733
Weighted avg 0.82670 0.81719 0.80745 733
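The precision, recall, and F1 entries in Table 1 follow from confusion-matrix counts. The counts below are inferred so as to reproduce the class-0 row; they are not reported in the paper:

```python
def prf1(tp, fp, fn):
    # Precision, recall and F1 from raw true/false positive and false negative counts.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts consistent with the class-0 ("No DR") row of Table 1:
# 362 of the 367 class-0 images recovered, with 10 false positives.
p, r, f1 = prf1(tp=362, fp=10, fn=5)
print(round(p, 5), round(r, 5), round(f1, 5))  # 0.97312 0.98638 0.9797
```

The weighted averages in the table are then the per-class scores weighted by the support column.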

Fig. 9 Training (blue line) and validation (orange line)

Fig. 10 Confusion Matrix shows how accurate it is predicting using different shades

8.2 Result of VGG16

Model accuracy graph: the learning curves (training and validation accuracy and loss)
of the VGG16 model, which took almost 43 min for 22 epochs, are shown below
(Figs. 9 and 10; Table 2).

8.3 Comparison of Results

Table 3 shows the comparison of results of ResNet50 and VGG16. The training
accuracy of ResNet50 is higher than that of VGG16, and its test accuracy is slightly
higher as well. Since there is not much difference between the test accuracies of the two models,
we can say that ResNet50 predicts slightly better for some classes.

Table 2 Classification Report of VGG16


Class Precision Recall F1-score Support
0 0.98383 0.99455 0.98916 367
1 0.56098 0.62162 0.58974 74
2 0.66397 0.85864 0.74886 191
3 0.45000 0.25000 0.32143 36
4 0.46154 0.09231 0.15385 65
Accuracy 0.80491 733
Macro avg 0.62406 0.56342 0.56061 733
Weighted avg 0.78526 0.80491 0.77935 733

Table 3 Results of CNN models

Model     Train accuracy  Test accuracy
ResNet50  0.905           0.817
VGG16     0.836           0.805

9 Conclusion

An automated system for diabetic retinopathy will help everyone in aspects like
time and cost. The system helps ophthalmologists give fast and assured treatment
to their patients. Preprocessing is a very important part, and in our paper we showed
that it improved our accuracy to a great extent. In our paper, 2344, 585, and 733
images are used for training, validation, and testing of the model. We obtained the best
results using the ResNet50 and VGG16 models with ImageNet pre-trained weights
and a softmax layer with 5 output units at the end. The training accuracy of the ResNet50
model is higher than that of VGG16, but the test accuracy is almost the same for both.

10 Future Work

As there are very few images for classes 3 and 4, our model got less information
about these classes; therefore, we either need more data for these classes or need
to augment the data further. We can use bounding boxes to extract features of DR
in images while testing on individual images and can show how confident the
model's predictions are.

11 Competing Interest

The authors declare that they have no competing interest.



References

1. Healthline. https://www.healthline.com/human-body-maps/retina#1
2. https://www.medicalnewstoday.com/articles/183417?c=1338628189797
3. Webmd. https://www.webmd.com/diabetes/diabetic-retinopathy
4. Eye 7. https://www.eye7.in/retina/diabetic-retinopathy/
5. Alyoubi WL, Shalash WM, Abulkhair MF (2020) Diabetic retinopathy detection through deep
learning techniques: a review. Inf Med Unlocked 20(2020):100377
6. Sisodia DS, Nair S, Khobragade P (2017) Diabetic retinal fundus images: pre-processing and
feature extraction for early detection of diabetic retinopathy. Biomed Pharmacol J 10(2):615–
626
7. Mateen MM, Wen J, Hassan M, Nasrullah N, Sun S, Hayat S (2020) Automatic detection of
diabetic retinopathy: a review on datasets. Meth Eval 8
8. Johri A et al (2021) Enhancement of retinal images using morphological filters. Data
engineering and intelligent computing. Springer, Singapore
9. Bhadauria AS, Nigam M, Arya A, Bhateja V (2018) Morphological filtering-based enhance-
ment of MRI. In: Proceedings of 2nd international conference on computing, communication
and control technology (IC4T), Lucknow, (U.P.), India, pp 54–56
10. Kaggle. https://www.kaggle.com/c/aptos2019-blindness-detection/data?select=train_images
Performance Evaluation of MLP
and CNN Models for Flood Prediction

Ippili Saikrishna Macharyulu, Deba Prakash Satapathy, Abinash Sahoo,


Sandeep Samantaray, Nihar Ranjan Mohanta, and Arkajyoti Ray

Abstract Accurate and reliable forecasts with an appropriate lead time enable operational
flood control systems to make the required arrangements against flooding.
Developing a suitable artificial intelligence (AI) model for flood forecasting poses
a severe challenge in terms of interpretability and accuracy. Due to the nonlinearity
and uncertainty of floods, prevailing hydrological solutions consistently attain less
prediction robustness. Thus, present work developed a flood model utilising a convo-
lution neural network (CNN) to move forward from artificial neural network (ANN)
that has been broadly applied for developing flood models to secure diversity and
establish model’s suitability. The mean squared error (MSE) and Willmott index
(WI) of CNN were 1.743 and 0.9878, respectively, representing an excellent overall
model performance in flood prediction. The conclusive results indicated that CNN
generated improved forecasting results than MLP models and can be recommended
for monthly flood forecasting. Using commonly accessible data of the region crucial
for prediction, the outcomes would be helpful for real-time flood forecasting, evading
complexity of physical procedures.

Keywords Flood · MLP · CNN · Subarnarekha River

I. S. Macharyulu · A. Ray
Department of Civil Engineering, GIET University, Bhubaneswar, Odisha, India
e-mail: saikrishnaar@giet.edu
D. P. Satapathy · S. Samantaray (B)
Department of Civil Engineering, OUTR, Bhubaneswar, Odisha, India
e-mail: sandeep1139_rs@civil.nits.ac.in
D. P. Satapathy
e-mail: dpsatapathy@cet.edu.in
A. Sahoo
Department of Civil Engineering, NIT Silchar, Assam, India
N. R. Mohanta
Department of Civil Engineering, NIT Raipur, Raipur, Chhattisgarh, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 273
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_26

1 Introduction

Of several natural disasters, floods are the most dangerous since they often cause loss
of life and assets each year and destroy settlements, farmland, and roads worldwide
[1–3]. It was reported that during 2011–2012, flood disasters affected about
200 million people, and total indirect damages were around 95 billion dollars. India,
in particular, has been facing frequent floods for so long, and this disaster causes huge
amounts of property losses and fatalities. Hence, it is vital to predict floods and recog-
nise the susceptibility zones [4]. Floodings are a complex and nonlinear process; as
a result, it is impossible to prevent floods entirely [2, 5, 6]. However, we can predict
future flood occurrences and help in mitigating human and economic damages. ANNs
are nonlinear nonparametric regression systems [7–9] known as ‘universal approx-
imators’, i.e., if provided with an adequate number of hidden neurons, they have
the ability to approximate every continuous function when trained with an informa-
tive data set [10, 11]. CNN is among the most prevalent models utilised today. This
computational neural network model utilises a variation of MLPs and comprises one
or additional convolutional layers which can either be entirely linked or pooled.
Tiwari and Chatterjee [12] explored ability of bootstrapping and wavelet methods
by applying a hybrid bootstrap–wavelet–ANN (BWANN) model to develop a reliable
and accurate model for forecasting hourly flood magnitude of Mahanadi river basin,
India. Results revealed that robust BWANN model generated better outcomes than
other applied models in their study. Kim and Singh [13] developed and applied
MLP, GRNN, and SOM for flood forecasting at Sangye site of Bocheong stream
watershed, Republic of Korea. Their findings revealed that SOM forecasted flood
discharge more accurately than MLP and GRNN during testing period. Hong and
Hong [14] evaluated the usage of MLP neural network to forecast water levels of a
gauge station positioned at Kuala Lumpur city centre, Malaysia. Phitakwina et al.
[15] implemented MLP-CS (Cuckoo search) for predicting 7 h ahead water level for
developing a flood model of River Ping, Thailand. Results indicated that MLP-CS
model performed better than the simple MLP model. Le et al. [16] suggested a long
short-term memory (LSTM) model for flood forecasting using daily discharge and
rainfall of Da River basin, Vietnam, as input data. Wang et al. [17] introduced CNN to
evaluate flood susceptibility in Shangyou County, China and compared the obtained
results with conventional support vector machine (SVM) classifier. They concluded
that CNN could help manage and mitigate floods. Suddul et al. [18] proposed and
evaluated MLP, MLP-GA (genetic algorithm), MLP-BA (bat algorithm), and MLP-
BA-GA models for automated and real-time river flood prediction. They found that
MLP-BA-GA model provided improved river flood prediction with better accuracy
and reliability. Duan et al. [19] developed temporal CNN model for predicting long-
term streamflow in California within catchment characteristics. Results of developed
model were compared with LR, RNN, ANN and LSTM models, which showed the
ability and potential of temporal CNN model. Song [20] used CNN to develop a runoff
model for Heuk River, South Korea. He found great potential in implementing the
CNN model with better results and accuracy.

The objective of the present research is to study the potential of the CNN model in
forecasting flood magnitude using historical discharge data of the Jamsolaghat gauging
station of the Subarnarekha river basin, India.

2 Study Area

Subarnarekha river basin lies between 21° 33' N to 23° 32' N and 85° 09' E to 87°
27' E covering 19,300 km2 area and originating near Nagri village in Ranchi district
(Fig. 1). Present study area covers the central and lower watersheds of the river
basin. Subarnarekha flows through extreme southwestern regions of West Medinipur
district (West Bengal) and easternmost regions of Baleswar and Mayurbhanj districts
of Odisha. Average annual precipitation fluctuates between 1150 and 1500 mm, with
most rain experienced from June to October. In winter, minimum temperature is as
low as 8 °C, whereas during summer season, temperature ranges from 40 to 45 °C.

3 Methodology

3.1 MLP

Rumelhart et al. [21] developed a feed-forward MLP network usually applied for
problems associated with pattern mapping. The MLP network utilised in present
study comprises a sensory unit set that establishes an input layer, one or more hidden
layers with computational neurons, and an output layer [22]. A neuron contains a
single output with multiple inputs. The basic equation representing net in an MLP
network is as follows:
net = Σᵢ xᵢwᵢ − b    (1)

where w denotes the weights, b the bias, and x the inputs. The neuron's output, f(net), is
then given by an activation function that determines the node's response to the input signal it receives.
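Eq. (1) together with the activation step can be sketched as a single neuron. The logistic (sigmoid) activation used here is an assumption, since the text does not name the activation function:

```python
import math

def neuron(x, w, b):
    # net = sum(x_i * w_i) - b, squashed by a logistic activation (assumed).
    net = sum(xi * wi for xi, wi in zip(x, w)) - b
    return 1.0 / (1.0 + math.exp(-net))

out = neuron(x=[0.5, 1.0], w=[0.8, -0.2], b=0.1)  # net = 0.4 - 0.2 - 0.1 = 0.1
```

An MLP stacks layers of such neurons, with the weights and biases adjusted during training by back-propagation.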

3.2 CNN

CNNs are multilayer feed-forward NNs demonstrating robust performance in the


field of image processing and computer vision, extracting valuable features automatically
from raw data [23]. In recent times, prediction and classification using
CNN have been employed increasingly in various disciplines.

Fig. 1 Proposed study area

Using multiple layers, local connections, shared weights, and pooling, CNN is
differentiated from a conventional NN. CNN's primary idea is that the input data are
images or can be interpreted as images. This significantly reduces the number of
parameters, which results in faster processing. CNN is an optimal architecture designed
for detecting patterns in 1D and 2D data, as it can be customised based on the application's number and kind
of layers.

Fig. 2 Basic architecture of CNN

Architecture of CNN primarily constitutes an input layer, multiple hidden


layers, and an output layer. Hidden layers are composed of one or more pooling and
convolutional layers [24, 25]. These architectures can be categorised into two based
on classification and regression problems (Fig. 2).

3.3 Evaluating Constraint

Discharge data (Dt ) of monsoon season (June–October) are collected from CWC,
Guwahati, for a period of 1988–2019. The data collected from 1988–2011 (75%
of data) are utilised for training and from 2012–2019 (25%) are utilised for testing
the models. Three constraints, WI, R², and MSE, are applied to evaluate model
performance.

MSE = (1/n) Σᵢ₌₁ⁿ (Yᵢ − Ŷᵢ)²    (2)

WI = 1 − [ Σᵢ₌₁ᴺ (Yᵢ − Ŷᵢ)² ] / [ Σᵢ₌₁ᴺ (|Ŷᵢ − Ȳ| + |Yᵢ − Ȳ|)² ]    (3)

where Yᵢ is the observed value, Ŷᵢ the predicted value, and Ȳ the mean of the observed values.
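Equations (2) and (3) translate directly into code; the short observed/predicted series below are made up for illustration:

```python
def mse(obs, pred):
    # Eq. (2): mean squared error between observed and predicted values.
    n = len(obs)
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / n

def willmott(obs, pred):
    # Eq. (3): Willmott index of agreement,
    # 1 - sum(errors^2) / sum((|pred - mean| + |obs - mean|)^2).
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for o, p in zip(obs, pred))
    return 1 - num / den

obs = [10.0, 12.0, 14.0, 16.0]   # made-up observed discharges
pred = [10.5, 11.5, 14.5, 15.5]  # made-up model predictions
```

A perfect model gives MSE = 0 and WI = 1; WI values near 1, as reported for CNN in this study, indicate close agreement with the observations.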

4 Results and Discussions

The present study aims to demonstrate the practicability of the proposed AI model for flood
forecasting. Monthly data of the Jamsolaghat station were utilised to develop
the proposed CNN model, whose performance was then compared against MLP.
Before the forecasting procedure, correlation analysis is used to determine the relevant
lag times of the forecasting matrix for constructing the predictors. Table 1 summarises
five input combinations assimilated with four different lag times. For more descriptive
278 I. S. Macharyulu et al.

assessment of proposed prediction models, Fig. 3 demonstrated scatter plots amid


predicted (y-axis) and observed (x-axis) flood values in testing phase where deviation
between observed and prediction values was specified by formula of linear regression.
It is clear that predictions by CNN showed better agreement with observed monthly
flood over MLP. This shows the forecasting ability of CNN model which is far
superior than the MLP model.
The predicted flood values by the MLP and CNN models for scenario IV at the Jamsolaghat station are shown in Fig. 4. It is found that the predictions made by the CNN model are closer to the observed flood values, whereas the standalone MLP gave poor prediction results. The discrepancy in flood magnitudes is computed separately for the training and testing phases and displayed in the form of box plots in Fig. 5. The resultant box plot and its distribution for CNN are similar to those of the observed data. From Fig. 5, it is clear that for all flood scenarios considered, CNN performs better than MLP.
The values of the statistical performance evaluation and the graphical analysis indicate that the CNN model is capable of simulating floods better than the standalone MLP model.

Table 1 Performance of flood simulation models

Station name   Model name   Training          Testing
                            MSE      WI       MSE      WI
Jamsolaghat    MLP-1        12.08    0.9476   17.431   0.9264
               MLP-2        11.36    0.9498   16.227   0.928
               MLP-3        10.997   0.952    15.832   0.9302
               MLP-4        9.732    0.9527   14.84    0.9326
               CNN-1        3.559    0.9807   7.67     0.959
               CNN-2        2.961    0.9839   6.992    0.9613
               CNN-3        2.09     0.986    6.089    0.9636
               CNN-4        1.743    0.9878   5.863    0.9654

Fig. 3 Scatter plots of actual versus predicted flood



Fig. 4 Predicted versus actual monthly flood discharge of MLP and CNN models

Fig. 5 Boxplots of actual and predicted flood values by selected models



5 Conclusion

Flood forecasting is one of the most complex and challenging problems in hydrology. However, because of its critical contribution to reducing loss of life and economic losses, it is also one of the most significant aspects of hydrology. From the standpoint of providing reliable forecasts while avoiding the difficulty of modelling physical processes, CNN and ANN models were applied in this study to predict floods at a specific location. This work recommends the CNN model as a potential tool in the hydrological field for constructing and managing real-time flood warning systems. Results indicated that the accuracy of CNN was better than that of MLP for monthly flood forecasting in the study area. With this knowledge, estimating future floods in river systems by utilising past rainfall and water levels from gauging stations is promising, without any comprehensive data requirements.

References

1. Sahoo A, Ghose DK (2021) Flood frequency analysis for menace gauging station of Mahanadi
River, India. J Inst Eng (India): Series A, pp 1–12
2. Sahoo A, Samantaray S, Ghose DK (2021) Prediction of flood in Barak River using hybrid
machine learning approaches: a case study. J Geol Soc India 97(2):186–198
3. Samantaray S, Tripathy O, Sahoo A, Ghose DK (2020) Rainfall forecasting through ANN
and SVM in Bolangir watershed, India. In: Smart intelligent computing and applications, pp
767–774. Springer, Singapore
4. Samantaray S, Sahoo A, Agnihotri A (2021) Assessment of flood frequency using statistical and
hybrid neural network method: Mahanadi River basin, India. J Geol Soc India 97(8):867–880
5. Samantaray S, Sahoo A (2019) Estimation of flood frequency using statistical method:
Mahanadi River basin, India. H2Open J 3(1):189–207
6. Sahoo A, Samantaray S, Paul S (2021b) Efficacy of ANFIS-GOA technique in flood prediction:
a case study of Mahanadi river basin in India. H2Open J 4(1):137–156
7. Samantaray S, Ghose DK (2018) Dynamic modelling of runoff in a watershed using artifi-
cial neural network. In: Smart intelligent computing and applications, pp 561–568. Springer,
Singapore
8. Samantaray S, Ghose DK (2020) Modelling runoff in an arid watershed through integrated
support vector machine. H2Open J 3(1):256–275
9. Samantaray S, Ghose DK (2021) Prediction of S12-MKII rainfall simulator experimental runoff
data sets using hybrid PSR-SVM-FFA approaches. J Water Clim Change. https://doi.org/10.
2166/wcc.2021.221
10. Sahoo A, Samantaray S, Bankuru S, Ghose DK (2020) Prediction of flood using adaptive
neuro-fuzzy inference systems: a case study. In: Smart intelligent computing and applications,
pp 733–739. Springer, Singapore
11. Sahoo A, Singh UK, Kumar MH, Samantaray S (2021c) Estimation of flood in a river basin
through neural networks: a case study. In: Communication software and networks, pp 755–763.
Springer, Singapore
12. Tiwari MK, Chatterjee C (2010) Development of an accurate and reliable hourly flood
forecasting model using wavelet–bootstrap–ANN (WBANN) hybrid approach. J Hydrol
394(3–4):458–470
13. Kim S, Singh VP (2013) Flood forecasting using neural computing techniques and conceptual
class segregation. JAWRA J American Water Resour Assoc 49(6):1421–1435

14. Hong JL, Hong K (2016) Flood forecasting for Klang river at Kuala Lumpur using artificial
neural networks. Intl J Hybrid Inf Technol 9(3):39–60
15. Phitakwinai S, Auephanwiriyakul S, Theera-Umpon N (2016) Multilayer perceptron with
cuckoo search in water level prediction for flood forecasting. In: 2016 international joint
conference on neural networks (IJCNN). IEEE, pp 519–524
16. Le XH, Ho HV, Lee G, Jung S (2019) Application of long short-term memory (LSTM) neural
network for flood forecasting. Water 11(7):1387
17. Wang Y, Fang Z, Hong H, Peng L (2020) Flood susceptibility mapping using convolutional
neural network frameworks. J Hydrol 582:124482
18. Suddul G, Dookhitram K, Bekaroo G, Shankhur N (2020) An evolutionary multilayer percep-
tron algorithm for real time river flood prediction. In: 2020 zooming innovation in consumer
technologies conference (ZINC). IEEE, pp 109–112
19. Duan S, Ullrich P, Shu L (2020) Using convolutional neural networks for streamflow projection
in California. Frontiers Water 2:28
20. Song CM (2020) Hydrological image building using curve number and prediction and
evaluation of runoff through convolution neural network. Water 12(8):2292
21. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error
propagation. Parallel distributed processing, vol 1. MIT Press, Cambridge, pp 318–362
22. Mohanta NR, Panda SK, Singh UK, Sahoo A, Samantaray S (2022) MLP-WOA is a successful
algorithm for estimating sediment load in Kalahandi gauge station, India. In: Proceedings of
international conference on data science and applications, pp 319–329. Springer, Singapore
23. Zhang C, Sargent I, Pan X, Li H, Gardiner A, Hare J, Atkinson PM (2018) An object-based
convolutional neural network (OCNN) for urban land use classification. Remote Sens Environ
216:57–70
24. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
25. Ghorbanzadeh O, Blaschke T, Gholamnia K, Meena SR, Tiede D, Aryal J (2019) Evaluation
of different machine learning methods and deep-learning convolutional neural networks for
landslide detection. Remote Sens 11(2):196
Bidirectional LSTM-Based Sentiment
Analysis of Context-Sensitive Lexicon
for Imbalanced Text

P. Krishna Kishore, K. Prathima, Dutta Sai Eswari, and Konda Srikar Goud

Abstract The authors use a context-sensitive lexicon as a resource to investigate sentiment analysis of highly imbalanced text in this article. This strategy addresses two important issues in text sentiment classification: context interdependence and data corpus imbalance. First, it identifies subjective words in different contexts and computes weight rankings for vague generalisations and the full review. RNNs have recently been used to solve real-world problems, specifically natural language processing tasks; accordingly, the authors employ a context-based lexicon together with a bidirectional LSTM to perform text sentiment classification. Second, the method handles imbalanced data by generating new text samples with a text-based post-processing technique. The experimental results show that using a sentiment lexicon within the framework and combining a bidirectional LSTM with text-based sampling is beneficial for imbalanced text sentiment classification and yields state-of-the-art results when compared to deep neural learning model benchmarks.

Keywords Imbalanced data · Class imbalance · Long short-term memory · Sentiment classification · Supervised learning · Word embedding · Oversampling · Recurrent neural networks

P. Krishna Kishore (B) · D. S. Eswari · K. S. Goud


Department of IT, BVRIT Hyderabad College of Engineering for Women, Hyderabad, India
e-mail: krishna.boinapalli@gmail.com
D. S. Eswari
e-mail: saieswari.d@bvrithyderabad.edu.in
K. S. Goud
e-mail: kondasrikargoud@gmail.com
K. Prathima
Department of CSE, BVRIT Hyderabad College of Engineering for Women, Hyderabad, India
e-mail: prathimareddy61@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 283
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_27

1 Introduction

Every form of Internet communication, including messaging, chatting, tweeting, blogging, emailing, and reviewing, generates massive amounts of unstructured text data. Consumers, manufacturers, and online retail practitioners would benefit from processing and analysing this vast and rich data to gain useful insights. In general, Web customers take the guidance of many other Internet users in order to minimise the risk associated with online purchases. For instance, film reviews and rankings can be used to estimate a film's profitability and customer satisfaction [1]. Customers read reviews to learn about the satisfaction of customers who have already purchased a product [2]. News items can be investigated to develop a better understanding of the national sentiment towards political groups [3]. The mood of tweets can be used to forecast financial market efficiency [4]. As a result, it is evident that every form of online correspondence includes a subjective viewpoint [5]. Sentiment analysis is the procedure of extracting the viewpoint or sentiment hidden inside qualitative text. It therefore involves data mining, knowledge extraction, information retrieval, and natural language processing [6] in the computation and analysis of contextual text in order to acquire otherwise hidden perceptions.
Previously, machine learning algorithms were extensively researched and found to be effective for sentiment classification. These algorithms learn from massively labelled corpora [7]. The success of machine learning algorithms in classification is closely tied to feature extraction. Some of the traditional feature selection techniques used in machine learning include word embedding, multiple variations of the bag of words (unigram and n-gram features), term frequency, and inverse document frequency [8]. Later, similarity measure analysis and a positional counting-based model were used to retrieve extracted features. However, these feature extraction techniques are frequency dependent and do not account for word order, linguistic similarity, or grammatical rules between phrases. Recently, distributional text representation designs or context-predicting models such as word2vec [9], avg-word2vec, tf-idf weighted word2vec, and GloVe [10] feature extraction techniques have been introduced and proved to be efficient in resolving this issue. In distributional representations, each word is mapped to a continuous lower-dimensional feature space. The distributional hypothesis highlights that words appearing in comparable contexts have a closer connection, which is represented in their arithmetic representation [11]. Word co-occurrence and singular value decomposition methods are used to create count-based distribution methods. In the development of an n-gram framework, for example, the joint posterior distribution is calculated over a series of words. Context-predicting designs are machine learning models, and learning word interpretations requires massive data sets.
Although the task of text sentiment analysis in the balanced setting has been comprehensively and successfully researched, there are drawbacks in applying conventional methods in the case of a small, extremely imbalanced lexicon or data set [12]. First, there is insufficient training data for at least one class label in the imbalanced data corpus. Existing research has demonstrated the effectiveness of resampling methods, such as random oversampling, random undersampling, SMOTE, and others, in balancing numerical depictions of textual data [13]. Since these techniques do not take into account the semantic and syntactic relationships of statements, their efficiency is challenged in the setting of extremely imbalanced text. Moreover, these methods were particularly created to deal with imbalanced numerical applications. Second, they do not take into account the semantic variations of phrases with regard to context.
The authors propose a framework that addresses the context-sensitive lexicon and class imbalance issues described above. First, the method incorporates a bidirectional LSTM with a context-sensitive lexicon to collect long- and short-term relationships [14]. The rating of the review is then calculated as the weighted sum of the sentiment lexicon's sentiment variables. After that, in order to create new text sample data, a text-altering method is used. This process produces new text samples by inverting majority samples and imitating minority class samples.
The remainder of the paper is organised as follows. Section 2 provides an overview of classification tasks, methods, and techniques used in the literature to address sentiment classification and the nuts and bolts of imbalanced sentiment analysis. The proposed model is extensively discussed in Sect. 3. Section 4 describes the testing procedures and discusses the findings. Section 5 concludes the paper.

2 Related Work

This section presents background information on sentiment lexicon generation, sentiment analysis, semi-supervised learning, and imbalanced reviews.

2.1 Classification of Sentiments

It is well known that subjective text contains many opinions and is rarely objective. Sentiment analysis is skilled at evaluating whether qualitative text is positive, negative, or neutral in terms of its underlying assumptions. A review is a brief document; as a result, the authors regard review classification as paragraph-level sentiment analysis. This section first describes the machine learning techniques that have been used in the literature to manage sentiment classification. Machine learning techniques for sentiment analysis are classified into four categories: supervised, unsupervised, semi-supervised, and ensemble learning.

2.2 Techniques of Supervised Learning

When training data is used to identify an effective set of features for sentiment categorisation, machine learning classifiers are used. Supervised machine learning algorithms necessitate a substantial amount of labelled training data to do this. Many supervised algorithms, including Naive Bayes, support vector machines, and maximum entropy classification, are utilised to classify movie reviews. Each film review is treated as a separate document when determining how viewers feel about it [15]. The bag-of-words model, which tracks the recurrence of terms and phrases in feedback, represents each review as a feature vector. POS tags and classifier attributes are used to create feature-based representations of sentiment orientation [16]. The dominant collection of n-gram characteristics is selected using semantic information and syntactic interactions between n-grams [17]. Bespalov et al. [18] used latent n-gram characteristics to construct a word embedding model for sentiment categorisation. The authors compared support vector machines, character-based n-grams, and Naive Bayes for supervised machine learning.
A neural network-based approach for encoding tweet-specific characteristics into word representations has also been investigated [19]. Two supervised algorithms, SVM and ANN, are combined to classify movie reviews: first, SVM allocates a sentiment value to every feature and determines which ones are the most suitable; ANN [20] is then employed to determine the categorisation accuracy. Using four feature selection strategies (IG, GR, CHI, and DF) and five supervised designs (NB, KNN, DT, SVM, and RBFNN), Liu et al. [21] examined the performance of feature selection in the context of multi-class sentiment classification. Zhu and colleagues developed an architecture that recognises long-term associations without the use of syntax in sentence and document modelling [22]. A bidirectional long short-term memory approach has been used to capture global semantic meaning and compositional linkages [23]. Recent years have seen the use of CNN models [24, 25] and recurrent neural networks [26, 27] for sentiment analysis of sentences.

2.3 Techniques with No Supervision

Unsupervised sentiment classification methods are those that do not require labelled training data sets. Sentiment reviews were analysed using a sentiment lexicon that includes a list of well-known sentiment phrases together with their orientation; the authors refer to these strategies as lexicon-based methodologies. Turney and Littman [28] used the hit counts returned by a search engine to assess a message's sentiment orientation, while a WordNet [29] based approach used the distance between a word and the bi-polar seed terms ("outstanding" and "terrible"). By combining similarity interactions in WordNet for phrases and records, Missen and Boughanem [30] derived the sentiment score. It is also possible to utilise syntactic dependency structures to infer semantic relationships between tweets, and interconnections between such variables can improve sentiment analysis of text messages and customer feedback [31].

2.4 Techniques for Semi-Supervised Learning

Both labelled and unlabelled data are used in semi-supervised learning. These algorithms propagate sentiment labels by employing a semi-supervised methodology on a limited portion of the training data [32]. In order to extend adjectives, Hatzivassiloglou and McKeown [32] employed a graph-based classification system and a large number of associative rules. One of Zhu and Ghahramani's proposed algorithms keeps propagating labels throughout a data set. Using an all-encompassing lexicon, He and Zhou [34] developed a framework for merging preliminary classifier expertise with sentiment classifier information. By employing a simple majority strategy on successfully categorised instances, Zhang and He [35] reduce the number of misclassified cases. An approach based on multiple feature subspace-based self-training was proposed by Gao et al. [36] for identifying appropriate features and informative examples for autonomous labelling. Unsupervised and supervised data are combined in a semi-supervised framework described by Da Silva et al. [37]. A technique for calculating conceptual similarities between document sentences is presented by Tai and Kao [38]. According to Hamilton et al. [39], the best way to build sentiment lexicons from corpora is to combine label propagation with word representations that are specific to the topic in question.

2.5 Ensemble Techniques

In ensemble learning, multiple classifier systems are trained on the data. Research by Xia et al. [40] looked at the effectiveness of combining the ensemble technique and local feature approaches. POS-tagged features and classification algorithms based on word relations are combined with three machine learning approaches: Naive Bayes, maximum entropy, and support vector machine (SVM). A technique developed by Onan et al. [41] uses a combination of unsupervised clustering and random selection to prune sentiment analysis data. The ensemble architecture suggested by Onan et al. [42] blends a voting scheme with multi-objective evolutionary algorithms for sentiment categorisation. Five statistical keyword extraction techniques, together with classification and ensemble learning methods, were studied by Onan et al. [43]. Perikos and Hatzilygeroudis [44] used three classifiers for emotion recognition with majority voting: a knowledge-based tool, maximum entropy, and Naive Bayes. Text normalisation, semantic indexing, and classification were proposed by Lochter et al. [45] to classify sentiment in textual material.

A large number of randomly generated feature subspaces was used by Li et al. [47] to overcome the imbalanced sentiment analysis challenge. Song et al. [48] suggested two-way clustering-based algorithms to categorise imbalanced text data, in which oversampling and undersampling are combined using the SMOTE algorithm. Classifier performance on severely imbalanced Twitter data was studied by Prusa et al. [49] using data sampling in the selection, training, and evaluation stages of the classification process. Text resampling was examined by Moreo et al. [50] by expanding the minority class; the relevance of a feature's distribution across a vast corpus of textual data is used [46], and this text-based implementation outperforms resampling in numerical feature space.

3 Proposed Methodology

The authors describe a framework that handles the task of imbalanced text sentiment analysis in this section. The method described in [23] was used to calculate the sentiment score of reviews in domain-specific records. In general, domain-specific data provide the semantic meaning of words in the context of sales. To capture the local and nonlocal context of the reviews, a bidirectional LSTM is used. Consider a review $R = wo_1, wo_2, wo_3, \ldots, wo_n$ which has $n$ words. The authors identified the subjective words $wo^L_1, wo^L_2, wo^L_3, \ldots, wo^L_m$ in review $R$ using a lexicon resource $L$. Then, as shown in Eq. (1), the sentiment value of review $R$ is computed. The sentiment score of the subjective word $wo^L_{ij}$ is represented by $S\_value(wo^L_{ij})$. The sentiment weight and the bias of the review $R$ are represented by the factors $\alpha_{ij}$ and $b$, respectively. Because a review may contain several paragraphs, the researchers combined them into a single paragraph; as a consequence, there is only one bias term throughout the analysis.

$$S\_value(R) = \sum_{j=1}^{m} \alpha_{ij} \cdot S\_value\left(wo^L_{ij}\right) + b \qquad (1)$$
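Equation (1) is straightforward to sketch. The snippet below is an illustration only: `toy_lexicon` and the uniform weights are invented stand-ins for the SentiWordNet scores and the BLSTM-learned weights α used in the paper.

```python
def review_score(tokens, lexicon, weights=None, bias=0.0):
    """Eq. (1): weighted sum of the lexicon scores of subjective words.

    `lexicon` maps subjective words to sentiment scores; `weights` maps
    them to the attention weights alpha (learned by the BLSTM in the
    paper; uniform here for illustration).
    """
    subjective = [t for t in tokens if t in lexicon]
    if weights is None:
        weights = {t: 1.0 / max(len(subjective), 1) for t in subjective}
    return sum(weights[t] * lexicon[t] for t in subjective) + bias

# Hypothetical lexicon scores, not taken from SentiWordNet
toy_lexicon = {"good": 0.7, "bad": -0.6, "terrible": -0.9}
score = review_score("the plot was good but the acting was terrible".split(),
                     toy_lexicon)
# With uniform weights: 0.5 * 0.7 + 0.5 * (-0.9) = -0.1
```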

The proposed method's entire model is depicted in Fig. 1, which shows the general framework of the proposed model. Three distinct layers make up the model. (i) The embedding layer is in charge of transforming arriving words into dense, real-valued vectors; it sits between the input data and the bidirectional LSTM layer. (ii) The bidirectional LSTM layer processes the input data (the entire review) within a specific timeframe. It extracts local and non-local semantic information from the input pattern by combining past and future data from that time frame: one LSTM examines the entire input from left to right, while another LSTM examines the same data in the opposite direction (right to left). (iii) Finally, logistic regression is applied on top of the output nodes to conduct the text sentiment classification.

Fig. 1 Overview of the proposed model

3.1 Bidirectional Long Short-Term Memory (BLSTM)

LSTM introduces a series of gates that evaluate how much information must be kept from the previous state as well as how features are extracted from the current input pattern. Hochreiter and Schmidhuber (1997) first proposed LSTM to solve the gradient vanishing problem in recurrent neural networks. A BLSTM is a double-layer LSTM: the first LSTM computes the input data token by token from left to right, while another layer encodes the input pattern from right to left. The BLSTM model used in this study is shown in Fig. 2. BLSTM extracts features from an input sequence word by word. It also has a memory cell $c_{ts}$. These elements are in charge of directing information flow from the past $x_1, x_2, x_3, \ldots, x_{ts-1}$ and $h_1, h_2, h_3, \ldots, h_{ts-1}$ to the present state $h_{ts}$ and output gate $o_{ts}$. At each time stamp $ts$, the memory cell $c_{ts}$, input gate $i_{ts}$, forget gate $f_{ts}$, and output gate $o_{ts}$ are formally updated in the following way:

$$i_{ts} = \sigma\left(W_i \left[x_{ts}, h_{ts-1}, c_{ts-1}\right] + b_i\right) \qquad (2)$$

$$f_{ts} = 1 - i_{ts} \qquad (3)$$

$$g_{ts} = \tanh\left(W_g \left[x_{ts}, h_{ts-1}\right] + b_g\right) \qquad (4)$$

$$c_{ts} = f_{ts} * c_{ts-1} + i_{ts} * g_{ts} \qquad (5)$$

$$o_{ts} = \sigma\left(W_o \left[x_{ts}, h_{ts-1}\right] + b_o\right) \qquad (6)$$

$$h_{ts} = o_{ts} * \tanh\left(c_{ts}\right) \qquad (7)$$



Fig. 2 Model of LSTM unit

The embedded dense real-valued vector of the input word $wo_{ts}$ is denoted by $x_{ts}$. In the input gate, a sigmoid function (σ) is applied to the variable $W_i$, where $W_i$ is described as the concatenation of the parameters $w_i$, $p_i$, and $q_i$; these are the weight vectors of the input word, hidden state, and memory cell, respectively. $W_g$ is defined as the concatenation of the parameters $w_g$ and $p_g$, and $W_o$ as the concatenation of the parameters $w_o$ and $p_o$. The symbol ∗ represents the point-wise multiplication operation. Finally, $b_i$, $b_g$, and $b_o$ are the bias parameters of $i_{ts}$, $g_{ts}$, and $o_{ts}$, respectively.
To compute the sentiment weights $\alpha_{ij}$, each BLSTM layer generates a hidden state vector for each input word. The hidden states produced by the left-to-right and the right-to-left LSTM layers are denoted by $h_l$ and $h_r$, respectively.
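Equations (2)–(7) describe a coupled-gate LSTM cell, with the forget gate tied to the input gate through $f = 1 - i$. A minimal NumPy sketch follows; the tanh activations on the candidate and output path and the random initialisation are standard-LSTM assumptions, not details given by the chapter.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One step of the coupled-gate LSTM of Eqs. (2)-(7)."""
    Wi, Wg, Wo, bi, bg, bo = params
    xi = np.concatenate([x, h_prev, c_prev])   # [x_ts, h_{ts-1}, c_{ts-1}]
    xh = np.concatenate([x, h_prev])           # [x_ts, h_{ts-1}]
    i = sigmoid(Wi @ xi + bi)                  # Eq. (2): input gate
    f = 1.0 - i                                # Eq. (3): tied forget gate
    g = np.tanh(Wg @ xh + bg)                  # Eq. (4): candidate values
    c = f * c_prev + i * g                     # Eq. (5): memory cell update
    o = sigmoid(Wo @ xh + bo)                  # Eq. (6): output gate
    h = o * np.tanh(c)                         # Eq. (7): hidden state
    return h, c

# Toy dimensions for illustration (embedding size 4, hidden size 3)
rng = np.random.default_rng(0)
d, m = 4, 3
params = (rng.standard_normal((m, d + 2 * m)),    # W_i
          rng.standard_normal((m, d + m)),        # W_g
          rng.standard_normal((m, d + m)),        # W_o
          np.zeros(m), np.zeros(m), np.zeros(m))  # b_i, b_g, b_o
h, c = lstm_step(rng.standard_normal(d), np.zeros(m), np.zeros(m), params)
```

A bidirectional pass would run this cell left-to-right and right-to-left over the embedded words and use the two resulting hidden states $h_l$ and $h_r$.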

3.2 Calculating Sentiment Scores

This model’s section, the overall sentiment score of the review, was calculated by
the authors. To figure out the weights. Authors leveraged the hidden state vectors hl
and hr for each subjective word woij in the review R.
The following equations are used to calculate sentiment weights.
( )
FijR = σ WpR .Hij + (8)
( )
βijR = Wpw .FijR + (9)

The review’s overall bias score Rbias is calculated using the same methodology by
the authors. Rbias is calculated once for each overview and is focused on the whole
review, according to intellect. The following formula is being used to calculate Rbias .

F Bias = WFB . H + bFB (11)


Bidirectional LSTM-Based Sentiment Analysis … 291
( )
R bias = σ WBias . F Bias + (12)

Now, author’s leveraged Rbase and Rbias . Using the equation shown below, calculate
the overall sentiment score of the review R.

S_(R) = ∅R base + (1 − ∅)R bias (13)

∅ is a component with such a value between 0 and 1 which can be measured using
the equation below.

∅ = (Wϕ H B + bϕ ) (14)

The concatenation of the source and partiality component vectors is represented


by W∅ . H B estimate of the sum of the two hidden layer matrices as well as the previous
hidden vectors of left to right LSTM and right to left LSTM are combined to form
HB. Finally, logistic regression analysis was used to evaluate the probability values
for the hidden layers. The odds of establishing whether the evaluation is positively
or negatively are given below.

+ = (S_(R))
probRi (15)
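Equations (13)–(15) mix the lexicon-based score and the review bias through a learned gate φ. A minimal sketch with scalar stand-ins for the vectors $W_\phi$ and $H_B$ (all numeric values below are illustrative, not taken from the paper):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def final_score(r_base, r_bias, w_phi, h_b, b_phi):
    """Eqs. (13)-(15): blend the lexicon score and the review bias."""
    phi = sigmoid(w_phi * h_b + b_phi)       # Eq. (14): phi in (0, 1)
    s = phi * r_base + (1.0 - phi) * r_bias  # Eq. (13): blended score
    return sigmoid(s)                        # Eq. (15): P(review is positive)

p = final_score(r_base=0.8, r_bias=-0.2, w_phi=0.0, h_b=0.0, b_phi=0.0)
# With w_phi = b_phi = 0, phi = 0.5, so s = 0.3 and p = sigmoid(0.3) ≈ 0.574
```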

3.3 Resolving the Issue of Class Imbalance

Existing approaches and algorithms are based on the principle of generating new synthesised points from numerical data distributions. Traditional methods for dealing with imbalanced data have been successful in a wide range of applications, but they are restricted to numeric value distributions. To address the issue of unequal class distribution, the researchers used a new text sampling technique based on inversion and imitation [51]. This method generates new experimental texts by inverting samples of the majority class and imitating minority class examples. To be more specific, this technique balances the data while also generating new messages in the minority class, producing text based on the distribution of the minority class. It should be noted that the negative class is considered the minority class, whereas the positive class is considered the majority class.
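The exact inversion-and-imitation procedure of [51] is not spelled out in this chapter, so the sketch below illustrates only the imitation idea under a simplifying assumption: new minority samples are synthesised by recombining sentences drawn from existing minority reviews.

```python
import random

def imitate_minority(minority_texts, n_new, seed=42):
    """Illustrative text oversampling by imitation (not the exact
    method of [51]): new minority samples are synthesised by
    recombining sentences drawn from existing minority reviews."""
    rng = random.Random(seed)
    sentences = [s for t in minority_texts for s in t.split(". ") if s]
    new_samples = []
    for _ in range(n_new):
        k = rng.randint(1, min(3, len(sentences)))
        new_samples.append(". ".join(rng.sample(sentences, k)))
    return new_samples

# Hypothetical minority (negative) reviews
negatives = ["battery died fast. screen cracked", "poor sound. awful support"]
balanced_extra = imitate_minority(negatives, n_new=3)
```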

4 Experimental Study

In this section, the authors first present the experimental settings, followed by the results, which confirm the performance of the proposed method based on the experimental analysis.

4.1 The Experimental Environment

Four Amazon multi-domain review data sets and the SemEval-2013 Twitter data set (subtask B) are used to validate the proposed model. Electronics, DVD, kitchen, and books are the four Amazon multi-domain review data sets analysed. To produce an extremely imbalanced variant of the Amazon review data sets, the researchers used the same distribution rule as in [51]. Each domain has 1000 positive and 400 negative samples, so the imbalance ratio for each of the Amazon multi-domain review data sets is 1:2.5. There are 6400 Twitter posts in the SemEval-2013 Twitter data set. The authors used 3550 tweets as the training set (positive: 2600, negative: 950) and 2850 tweets as the testing set (positive: 1950, negative: 900). The SemEval-2013 Twitter data set has an imbalance ratio of 1:2.45. Stop words, punctuation symbols, and unexpected words were eliminated, and every word was converted to lowercase. The authors identified the subjective words and their scores using SentiWordNet 3.0 [52], a lexicon resource specifically developed for sentiment classification applications. In SentiWordNet 3.0, the scores of lexicon words range from 0 to 1.
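The preprocessing described above can be sketched as follows; the stop-word list here is a tiny illustrative stand-in for a full list.

```python
import string

def preprocess(text, stopwords=frozenset({"the", "a", "is", "and"})):
    """Lowercase, strip punctuation, and drop stop words, mirroring the
    preprocessing described for the review and tweet data sets."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [w for w in text.split() if w not in stopwords]

tokens = preprocess("The camera IS great, and the battery lasts!")
# → ['camera', 'great', 'battery', 'lasts']
```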

4.2 Results and Discussion

Assessment metrics for imbalanced data sets differ from those for balanced data sets. In this article, the authors use accuracy and area under curve (AUC) for performance evaluation. The authors employed the same techniques as in [53] and [51] to evaluate the efficacy of the proposed method. In terms of accuracy, the proposed model is also evaluated by comparison with the Se-HyRank lexicon [54], the cluster UC, and the cost-sensitive algorithms [55]. For the four Amazon multi-domain review data sets, the proposed model outperforms the comparison algorithms in terms of effectiveness, as is clear from the data in Table 1. On the SemEval-2013 data set, the proposed approach achieved an F1-score of 81.2%, slightly behind the best comparison methods. Table 1 reports the accuracy and F1-score results of the comparison methods and the proposed model. The F1-score is defined in Eq. (16) and is used as the metric for the SemEval-2013 Twitter data set.

$$\mathrm{F1score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (16)$$
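Equation (16) can be computed directly from the confusion-matrix counts:

```python
def f1_score(tp, fp, fn):
    """Eq. (16): harmonic mean of precision and recall, from counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g. 81 true positives, 19 false positives, 19 false negatives
# gives precision = recall = 0.81, so F1 = 0.81
f1 = f1_score(81, 19, 19)
```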

Table 1 Accuracy and F1-score results for the state of the art and the proposed model

Model               Books ACC   DVD ACC   Electronics ACC   Kitchen ACC   SemEval-2013 F1
Se-HyRank lexicon   –           –         –                 –             81.7
Cluster UC          71.3        73.1      79.7              80.3          –
Cost sensitive      66.3        71.2      74.8              77.3          –
Unified model       78.5        75.5      78.6              81.1          81.5
Proposed model      79.2        76.4      80.0              81.9          81.2

Fig. 3 ROC for books dataset

The ROC curves for the four Amazon multi-domain data sets (books, DVD, electronics, and kitchen) are shown in Figs. 3, 4, 5 and 6, respectively. The authors achieved an accuracy (ACC) of 79.2% for the books data set, 76.4% for the DVD data set, 80.0% for the electronics data set, and 81.9% for the kitchen data set. The corresponding AUC values were 77% for the books data set, 85% for the DVD data set, 87.5% for the electronics data set, and 82% for the kitchen data set.
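The AUC values quoted above can be computed without plotting the ROC curve, via the equivalent rank (Mann–Whitney) statistic; a minimal sketch with illustrative labels and scores:

```python
def auc(labels, scores):
    """Area under the ROC curve via the rank statistic: the probability
    that a randomly chosen positive example is scored above a randomly
    chosen negative example (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.2]))  # perfect ranking -> 1.0
```

Because AUC depends only on the ranking of scores, it is insensitive to class imbalance, which is why it is paired with accuracy in this evaluation.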

5 Conclusion

The authors proposed a method for addressing the task of imbalanced text sentiment analysis in this article. The method computes a review's sentiment value as a weighted combination of the sentiment of its subjective words, drawing on a lexicon resource to identify subjective terms in reviews. It uses a bidirectional LSTM to determine the sentiment value of reviews. The goal of using a
294 P. Krishna Kishore et al.

Fig. 4 ROC for DVD dataset

Fig. 5 ROC for electronics dataset

Fig. 6 ROC for kitchen dataset

BLSTM is to determine the sentiment strength of subjective words using both local and nonlocal semantic text features. Logistic regression on top of the BLSTM is then employed to produce the final sentiment prediction. Finally, using a text-based oversampling method, the authors generate additional training samples that mimic the underrepresented class. According to the results

of the experiments, the proposed method outperforms existing approaches to highly imbalanced text sentiment classification in terms of effectiveness.
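The oversampling step described in the conclusion can be approximated by simple random duplication of minority-class reviews; a minimal sketch, noting that the paper's text-based scheme is more elaborate than plain duplication:

```python
import random

def random_oversample(texts, labels, seed=0):
    """Duplicate minority-class examples until all classes are the same
    size (a simple stand-in for the paper's text-based oversampling)."""
    rng = random.Random(seed)
    by_class = {}
    for t, y in zip(texts, labels):
        by_class.setdefault(y, []).append(t)
    target = max(len(v) for v in by_class.values())
    out_texts, out_labels = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for t in items + extra:
            out_texts.append(t)
            out_labels.append(y)
    return out_texts, out_labels

texts = ["good", "great", "fine", "bad"]
labels = [1, 1, 1, 0]
bt, bl = random_oversample(texts, labels)
print(bl.count(0), bl.count(1))  # 3 3
```

Oversampling is applied to the training split only; duplicating minority examples in the test split would bias the reported metrics.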

References

1. Mishne G, Glance NS (2006) Predicting movie sales from blogger sentiment. In: AAAI spring
symposium: computational approaches to analyzing weblogs, pp 155–158
2. Dave K, Lawrence S, Pennock DM (2003) Mining the peanut gallery: opinion extraction and
semantic classification of product reviews. In: Proceedings of the 12th international conference
on World Wide Web, pp 519–528
3. Godbole N, Srinivasaiah M, Skiena S (2007) Large-scale sentiment analysis for news and blogs.
Icwsm 7(21):219–222
4. Bollen J, Mao H, Zeng X (2011) Twitter mood predicts the stock market. J Comput Sci 2(1):1–8
5. Goldsmith RE, Horowitz D (2006) Measuring motivations for online opinion seeking. J Interact
Advert 6(2):2–14
6. Pk MR (2018) Role of sentiment classification in sentiment analysis: a survey. Ann Libr Inf
Stud (ALIS) 65(3):196–209
7. Andreevskaia A, Bergler S (2008) When specialists and generalists work together: overcoming
domain dependence in sentiment tagging. In: Proceedings of ACL-08: HLT, pp 290–298
8. Medhat W, Hassan A, Korashy H (2014) Sentiment analysis algorithms and applications: a
survey. Ain Shams Eng J 5(4):1093–1113
9. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of
words and phrases and their compositionality. In: Advances in neural information processing
systems, pp 3111–3119
10. Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation.
In: Proceedings of the 2014 conference on empirical methods in natural language processing
(EMNLP), pp 1532–1543
11. Bengio Y, Ducharme R, Vincent P, Jauvin C (2003) A neural probabilistic language model. J
Mach Learn Res 3(Feb):1137–1155
12. Krawczyk B (2016) Learning from imbalanced data: open challenges and future directions.
Prog Artif Intell 5(4):221–232
13. Wang S, Yao X (2012) Multiclass imbalance problems: analysis and potential solutions. IEEE
Trans Syst Man Cybern Part B (Cybern) 42(4):1119–1130
14. Go A, Bhayani R, Huang L (2009) Twitter sentiment classification using distant supervision.
CS224N project report. Stanford 1(12)
15. Pang B, Lee L, Vaithyanathan S (2002) Thumbs up?: sentiment classification using machine
learning techniques. In: Proceedings of the ACL-02 conference on empirical methods in natural
language processing. 10:79–86. Association for Computational Linguistics
16. Hu M, Liu B (2004) Mining and summarizing customer reviews. In: Proceedings of the tenth
ACM SIGKDD international conference on knowledge discovery and data mining, pp 168–177
17. Abbasi A, France S, Zhang Z, Chen H (2010) Selecting attributes for sentiment classification
using feature relation networks. IEEE Trans Knowl Data Eng 23(3):447–462
18. Bespalov D, Bai B, Qi Y, Shokoufandeh A (2011) Sentiment classification based on supervised latent n-gram analysis. In: Proceedings of the 20th ACM international conference on information and knowledge management, pp 375–382
19. Ye Q, Zhang Z, Law R (2009) Sentiment classification of online reviews to travel destinations
by supervised machine learning approaches. Expert Syst Appl 36(3):6527–6535
20. Tripathy A, Anand A, Rath SK (2017) Document-level sentiment classification using hybrid
machine learning approach. Knowl Inf Syst 53(3):805–831
21. Liu Y, Bi JW, Fan ZP (2017) Multi-class sentiment classification: The experimental compar-
isons of feature selection and machine learning algorithms. Expert Syst Appl 80:323–339

22. Zhang R, Lee H, Radev D (2016) Dependency sensitive convolutional neural networks for
modeling sentences and documents. arXiv preprint arXiv:1611.02361
23. Teng Z, Vo DT, Zhang Y (2016) Context-sensitive lexicon features for neural sentiment analysis.
In: Proceedings of the 2016 conference on empirical methods in natural language processing,
pp 1629–1638
24. Kim Y (2014) Convolutional neural networks for sentence classification. arXiv preprint arXiv:
1408.5882
25. Kalchbrenner N, Grefenstette E, Blunsom P (2014) A convolutional neural network for
modelling sentences. arXiv preprint arXiv:1404.2188
26. Dong L, Wei F, Tan C, Tang D, Zhou M, Xu K (2014) Adaptive recursive neural network for
target-dependent twitter sentiment classification. In: Proceedings of the 52nd annual meeting
of the association for computational linguistics (vol 2: Short papers), pp 49–54
27. Liu P, Qiu X, Chen X, Wu S, Huang XJ (2015) Multi-timescale long short-term memory neural
network for modelling sentences and documents. In: Proceedings of the 2015 conference on
empirical methods in natural language processing, pp 2326–2335
28. Turney PD, Littman ML (2002) Unsupervised learning of semantic orientation from a hundred-billion-word corpus. arXiv preprint cs/0212012
29. Kamps J, Marx M, Mokken RJ, De Rijke M (2004) Using WordNet to measure semantic
orientations of adjectives. In: LREC (vol 4, pp 1115–1118)
30. Missen MMS, Boughanem M (2009) Using wordnet’s semantic relations for opinion detection
in blogs. In: European conference on information retrieval, pp 729–733. Springer, Berlin,
Heidelberg
31. Fernández-Gavilanes M, Álvarez-López T, Juncal-Martínez J, Costa-Montenegro E, González-
Castaño FJ (2016) Unsupervised method for sentiment analysis in online texts. Expert Syst
Appl 58:57–75
32. Hatzivassiloglou V, McKeown KR (1997) Predicting the semantic orientation of adjectives. In:
Proceedings of the 35th annual meeting of the association for computational linguistics and
eighth conference of the European chapter of the association for computational linguistics, pp
174–181. Association for computational linguistics
33. Zhu X, Ghahramani Z (2002) Learning from labeled and unlabeled data with label propagation
34. He Y, Zhou D (2011) Self-training from labeled features for sentiment analysis. Inf Process
Manage 47(4):606–616
35. Zhang P, He Z (2013) A weakly supervised approach to Chinese sentiment classification using partitioned self-training. J Inf Sci 39(6):815–831
36. Gao W, Li S, Xue Y, Wang M, Zhou G (2014) Semi-supervised sentiment classification with self-training on feature subspaces. In: Workshop on Chinese lexical semantics, pp 231–239. Springer, Cham
37. da Silva NFF, Coletta LF, Hruschka ER, Hruschka ER Jr (2016) Using unsupervised information
to improve semi-supervised tweet sentiment classification. Inf Sci 355:348–365
38. Tai YJ, Kao HY (2013) Automatic domain-specific sentiment lexicon generation with label
propagation. In: Proceedings of international conference on information integration and web-
based applications & services, pp 53–62
39. Hamilton WL, Clark K, Leskovec J, Jurafsky D (2016). Inducing domain-specific sentiment
lexicons from unlabeled corpora. In: Proceedings of the conference on empirical methods in
natural language processing. Conference on Empirical methods in natural language processing
(vol 2016, pp 595) NIH Public Access
40. Xia R, Zong C, Li S (2011) Ensemble of feature sets and classification algorithms for sentiment
classification. Inf Sci 181(6):1138–1152
41. Onan A, Korukoğlu S, Bulut H (2017) A hybrid ensemble pruning approach based on consensus
clustering and multi-objective evolutionary algorithm for sentiment classification. Inf Process
Manage 53(4):814–833
42. Onan A, Korukoğlu S, Bulut H (2016) A multiobjective weighted voting ensemble classifier
based on differential evolution algorithm for text sentiment classification. Expert Syst Appl
62:1–16

43. Onan A, Korukoğlu S, Bulut H (2016) Ensemble of keyword extraction methods and classifiers
in text classification. Expert Syst Appl 57:232–247
44. Perikos I, Hatzilygeroudis I (2016) Recognizing emotions in text using ensemble of classifiers.
Eng Appl Artif Intell 51:191–201
45. Lochter JV, Zanetti RF, Reller D, Almeida TA (2016) Short text opinion detection using
ensemble of classifiers and semantic indexing. Expert Syst Appl 62:243–249
46. Haixiang G, Yijing L, Shang J, Mingyun G, Yuanyue H, Bing G (2017) Learning from class-
imbalanced data: review of methods and applications. Expert Syst Appl 73:220–239
47. Li S, Wang Z, Zhou G, Lee SYM (2011). Semi-supervised learning for imbalanced sentiment
classification. In: Twenty-second international joint conference on artificial intelligence
48. Song J, Huang X, Qin S, Song Q (2016) A bi-directional sampling based on K-means method for
imbalance text classification. In: 2016 IEEE/ACIS 15th international conference on computer
and information science (ICIS). IEEE, pp 1–5
49. Prusa JD, Khoshgoftaar TM, Seliya N (2016) Enhancing ensemble learners with data sampling on high-dimensional imbalanced tweet sentiment data. In: The twenty-ninth international FLAIRS conference
50. Moreo A, Esuli A, Sebastiani F (2016) Distributional random oversampling for imbalanced text
classification. In: Proceedings of the 39th international ACM SIGIR conference on research
and development in information retrieval, pp 805–808
51. Li Y, Guo H, Zhang Q, Gu M, Yang J (2018) Imbalanced text sentiment classification using
universal and domain-specific knowledge. Knowl-Based Syst 160:1–15
52. Baccianella S, Esuli A, Sebastiani F (2010) Sentiwordnet 3.0: an enhanced lexical resource for
sentiment analysis and opinion mining. In: Lrec. vol 10:2200–2204
53. Loyola-González O, Martínez-Trinidad JF, Carrasco-Ochoa JA, García-Borroto M (2016)
Study of the impact of resampling methods for contrast pattern based classifiers in imbalanced
databases. Neurocomputing 175:935–947
54. Tang D, Wei F, Qin B, Yang N, Liu T, Zhou M (2015) Sentiment embeddings with applications
to sentiment analysis. IEEE Trans Knowl Data Eng 28(2):496–509
55. Li S, Ju S, Zhou G, Li X (2012) Active learning for imbalanced sentiment classification. In:
Proceedings of the 2012 joint conference on empirical methods in natural language processing
and computational natural language learning, pp 139–148. Association for computational
linguistics
Improving Streamflow Prediction Using
Hybrid BPNN Model Combined
with Particle Swarm Optimization

Nagarampalli Manoj Kumar, Ippili Saikrishnamacharyulu, Abinash Sahoo, Sandeep Samantaray, Mavoori Hitesh Kumar, Akash Naik, and Srinibash Sahoo

Abstract Because of global climate change, sustainable water resource management faces a severe challenge. In this context, streamflow estimation is highly
significant for managing different water resources schemes, such as water supply
and reservoir scheduling. This work improves stability and accuracy of stream-
flow estimations using a hybrid model integrating backpropagation neural network
with particle swarm optimisation (BPNN-PSO). Proposed models analysed histor-
ical monthly streamflow series of Panposh gauging station of Brahmani River, India.
Performance of the robust BPNN-PSO model is assessed based on the Nash–Sutcliffe efficiency (NSE) and root mean square error (RMSE) measures. Results show that the hybrid neural network model significantly enhanced the accuracy of streamflow predictions, with NSE = 0.9886 and RMSE = 0.364, compared to the standalone neural network. This work indicates that the proposed robust model can capture the nonlinear characteristics of the streamflow process and provide more precise forecasting outcomes.

Keywords BPNN · BPNN-PSO · Brahmani river · Streamflow

N. M. Kumar · I. Saikrishnamacharyulu · A. Naik · S. Sahoo
Department of Civil Engineering, GIET University Gunpur, Bhubaneswar, Odisha, India
e-mail: manojnagarampalli@giet.edu
I. Saikrishnamacharyulu
e-mail: saikrishnaar@giet.edu
A. Sahoo
Department of Civil Engineering, NIT Silchar, Silchar, Assam, India
S. Samantaray (B)
Department of Civil Engineering, OUTR Bhubaneswar, Bhubaneswar, Odisha, India
e-mail: sandeep1139_rs@civil.nits.ac.in
M. H. Kumar
Department of Civil Engineering, NIT Tiruchirappalli, Tamil Nadu, Tiruchirappalli, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 299
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_28
300 N. M. Kumar et al.

1 Introduction

Estimation of streamflow is significant for sustainable water resources allocation and flood management [1–4]. Consequently, in recent decades, the scientific community has extensively researched streamflow estimation. Simple artificial intelligence (AI) models cannot describe the hydrological behaviour of river flow, since it is very complex. Therefore, it is vital to investigate appropriate models for highly nonlinear and seasonal river flows. Several time-series estimation models have been proposed in hydrological forecasting [5, 20–23]. In recent years, AI methods have become prevalent data-driven approaches and have been effectively developed for modelling nonlinear and complex hydrological problems [14, 24, 25]. Amongst AI techniques, ANN, support vector machine (SVM), etc., have proven to be efficient tools for developing hydrologic models [22, 26]. On this basis, comparative assessments among various AI methods have been performed in the relevant literature, and attempts are still being made to find the most suitable one [27].
Wang et al. [6] used threshold-based, cluster-based, and periodic ANNs and an MLP-NN (multilayer perceptron) to forecast daily streamflow of the Yellow River, China. They found that the periodic ANN performed better than the other selected models. Sahoo et al. [7]
examined ANN, multiple regression analysis (MRA), and chaotic nonlinear dynamic
algorithm (CNDA) for predicting the temperature of stream water from accessible air
temperature and solar radiation. Sattari et al. [8] applied Time-Lag Recurrent Neural
Network (TLRNN) to model daily inflow into Eleviyan basin, Iran. Mehr et al. [9]
proposed Wavelet-ANN and linear genetic programming (LGP) techniques for fore-
casting monthly streamflow of Çoruh River, Turkey, and compared performance of
proposed models. Gowda and Mayya [10] used BPNN and GANN (genetic algo-
rithm) to develop streamflow prediction models using daily data from River Nethra-
vathi, India. Statistical analysis showed that GANN model performed much better
than BPNN model. Mehr et al. [11] applied feed-forward backpropagation (FFBP),
generalised regression neural network (GRNN), and radial basis function network
(RBFN) for predicting monthly streamflow of successive stations on Coruh River,
Turkey. Chen et al. [12] investigated artificial bee colony (ABC), ant colony optimiza-
tion (ACO), differential evolution (DE), and PSO to optimise forecasting problems
for hybrid NN model for determining the best model to forecast downstream stream-
flow. Results show that DE algorithm achieves best performance in generalisation
and forecasts. Peng et al. [13] proposed ANN-MVO (multi-verse optimiser), ANN-
PSO, and (BPNN) for streamflow forecasting of Yangtze River, China. Gao et al. [14]
proposed BPNN-PSO model to identify kinematic parameters of industrialised robots
with an improved convergence response. Their findings showed that proposed BPNN-
PSO parameter-identification method has fast convergence speed and fewer itera-
tions. Gao et al. [15] compared accuracy of basic extreme learning machine (ELM),
ELM-kernel, BPNN, random forest (RF), and support vector machine (SVM) to fore-
cast daily streamflow of Wei River Basin, China. They concluded that ELM-kernel
provided superior forecasts than other models. Zhang et al. [16] applied BPNN-GA
Improving Streamflow Prediction Using Hybrid BPNN … 301

model to develop a seepage prediction model. Outcomes showed that enhanced dam
seepage model increases ability of generalisation and nonlinear mapping.
This study aims to develop an efficient streamflow forecasting model based on
BPNN-PSO model using historical monthly streamflow data of Panposh gauging
station of Brahmani River, India, and investigate its performance against conventional
BPNN model.

2 Study Area

The Brahmani River basin lies between 20° 30' and 23° 36' N latitude and 83° 52' and 87° 00' E longitude and flows in the eastern region of India, with a total catchment area of 39,313 km2 (Fig. 1). The Brahmani is located between the River Baitarani on the left and the River Mahanadi on the right and has four separate sub-basins: Jaraikela, Tilga, Jenapur, and Gomlai. It receives a mean annual precipitation of 1305 mm, with maximum rain occurring during the four months of the southwest monsoon season (June–October). The temperature falls to a minimum of 4 °C in winter and reaches as high as 47 °C in summer.

3 Methodology

3.1 BPNN

Rumelhart et al. [28] developed BPNN on the basis of error back-propagation algo-
rithm (Fig. 2). BPNN is a commonly applied efficient NN comprising input, hidden,
and output layers [29, 30]. The complexity of mathematical problems governs the
count of hidden nodes, which is determined experimentally. The input I_j and output O_j of the j-th node are computed by

    I_j = Σ_i w_ij · O_i        (1)

    O_j = f(I_j + θ_j)        (2)

where w_ij is the weight between an input node and a hidden node (or between a hidden node and an output node), f(·) is the activation function, and θ_j is the bias input to the neuron.
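Equations (1)–(2) reduce to a weighted sum followed by an activation; a minimal sketch of a single node, assuming a sigmoid activation (the paper does not specify which activation is used):

```python
import math

def sigmoid(x):
    """Common choice of activation f for BPNN hidden nodes."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights, bias):
    """Eqs. (1)-(2): I_j = sum_i w_ij * O_i, then O_j = f(I_j + theta_j)."""
    i_j = sum(w * o for w, o in zip(weights, inputs))
    return sigmoid(i_j + bias)

print(round(node_output([1.0, 0.5], [0.4, -0.2], 0.1), 4))  # 0.5987
```

Backpropagation then adjusts the weights w_ij and biases θ_j by gradient descent on the output error, which is the part PSO replaces in the hybrid model.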

Fig. 1 Location of hydrological stations in Brahmani River Basin, India

3.2 PSO

Kennedy and Eberhart [17] developed a prevalent and well-known population-based metaheuristic algorithm, PSO, which has been applied successfully to several optimisation problems such as function optimisation and network training (Fig. 3). It was developed based on two significant movement behaviour characteristics of bird flocks, i.e., the position of the birds and their velocity [18]. In the optimisation process, the PSO position update at the l-th iteration can be expressed as:

    P_{l+1} = P_l + V_{l+1}        (3)



Fig. 2 Structure of BPNN

where V_l and P_l denote the velocity and position of the particle, respectively. In Eq. (3), the updated velocity is obtained from the best swarm position (P_g) and the personal best position (P_b) using the relation

    V_{l+1} = a · V_l + c1 · r1 · (P_b − P_l) + c2 · r2 · (P_g − P_l)        (4)

where a is the inertia weight, c1 and c2 are acceleration coefficients, and r1 and r2 are two random coefficients generated at each iteration. The iterative procedure continues until the stopping criterion is reached.
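The update rules of Eqs. (3)–(4) can be sketched as a minimal one-dimensional PSO loop; the inertia and acceleration constants below are illustrative defaults, not the study's tuned values:

```python
import random

def pso_minimise(f, lo, hi, n_particles=20, iters=100, seed=1):
    """Minimal 1-D PSO following Eqs. (3)-(4): velocity update from
    inertia, personal best (P_b), and swarm best (P_g)."""
    rng = random.Random(seed)
    a, c1, c2 = 0.7, 1.5, 1.5          # illustrative constants
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]
    gbest = min(pos, key=f)
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (a * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])   # pull to P_b
                      + c2 * r2 * (gbest - pos[i]))     # pull to P_g
            pos[i] += vel[i]                            # Eq. (3)
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = pos[i]
                if f(pos[i]) < f(gbest):
                    gbest = pos[i]
    return gbest

best = pso_minimise(lambda x: (x - 3.0) ** 2, -10, 10)
print(abs(best - 3.0) < 0.1)
```

In the hybrid BPNN-PSO, the "position" is the full vector of network weights and biases, and f is the training error, so the same loop runs in a much higher-dimensional space.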

4 Results and Discussion

The present work applies two data-driven modelling tools, BPNN and BPNN-PSO, to develop runoff models on the basis of six different input combinations for determining predictor variables. Performance assessment results of both models are given in Table 1. Predicted values were computed using data from 30 years (1990–2019). Evaluation is done using NSE and RMSE values between the observed data and predicted results. According to Table 1, when the flow predictions of the six scenarios are compared, the sixth scenario provides higher NSE values and lower RMSE values than the other scenarios. Similarly, for the BPNN-PSO model, scenario VI provides superior performance to the other five scenarios. Regarding the performance of the prediction models in Table 1, the BPNN-PSO model clearly outperforms the conventional BPNN method.
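The two reported measures can be computed directly from the observed and predicted series; a minimal sketch (the values below are illustrative, not the study's data):

```python
import math

def rmse(obs, pred):
    """Root mean square error between observed and predicted series."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations
    about their mean (1.0 = perfect fit, 0 = no better than the mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    svar = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / svar

obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 39.0]
print(round(rmse(obs, pred), 3), round(nse(obs, pred), 3))  # 2.121 0.964
```

NSE is scale-free while RMSE is in flow units, which is why the two are reported together in Table 1.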
Assessment of the prediction models in terms of graphical representations is shown in Figs. 4, 5 and 6. The performance of the two proposed models for streamflow forecasting is compared using scatter plots, time-series plots, and violin plots. Figure 4 shows that the coefficient of determination (R2) between observed flow data and predicted values equals 0.92965 and 0.97583 for

Fig. 3 Flow chart of the BPNN-PSO algorithm

Table 1 Performance of the proposed models

Station name   Model name   Training             Testing
                            RMSE      NSE        RMSE      NSE
Panposh        BPNN1        12.35     0.9418     19.367    0.9211
               BPNN2        11.9381   0.9442     18.002    0.9243
               BPNN3        11.367    0.945      16.9436   0.9258
               BPNN4        10.5218   0.9462     15.654    0.927
               BPNN5        10.1364   0.9486     15.0068   0.9294
               BPNN6        9.2213    0.9513     14.361    0.9306
               BPNN-PSO1    3.4965    0.9802     8.8834    0.9698
               BPNN-PSO2    2.63      0.9816     8.3762    0.9709
               BPNN-PSO3    2.005     0.9837     7.261     0.9721
               BPNN-PSO4    1.9534    0.985      6.995     0.9743
               BPNN-PSO5    1.397     0.9862     6.4398    0.976
               BPNN-PSO6    0.364     0.9886     5.974     0.9795

BPNN and BPNN-PSO techniques, respectively. These models generate RMSE values of 14.361 and 5.974, with NSE values of 0.9306 and 0.9795, respectively. This reveals that the BPNN-PSO model, with the highest R2 and NSE values and the lowest RMSE, performed better than the standard BPNN model, making the hybrid BPNN-PSO model an efficient neural network for streamflow forecasting.
The distributions of the values forecasted by the BPNN and BPNN-PSO models and the actual streamflow are portrayed in the violin plot in Fig. 6. The plot demonstrates that streamflow values range from 0 to 160 m3/s and that the streamflow distribution estimated by BPNN-PSO is close to the actual flow distribution. The BPNN was unable to model the peak runoff values successfully.

Fig. 4 Scatter plot of actual versus predicted streamflow for the testing period

Fig. 5 Actual versus predicted streamflow values at Panposh station



Fig. 6 Violin plots of observed streamflow data versus predicted data through standalone BPNN
and hybrid BPNN-PSO models

5 Conclusion

Predicting streamflow is vital for assessing impending flood risks and for evaluating and planning flood mitigation actions. Generally, in hydrological modelling, sensitivity and uncertainty are two essential considerations. The primary goal of this research was to analyse the predictive ability of the hybrid BPNN-PSO algorithm for monthly streamflow forecasts. Compared with the BPNN model, the ANN model trained by the PSO algorithm obtains better forecasting results. The obtained results are validated using different statistical measures, indicating that the proposed model is computationally fast and able to learn quickly. The present work considered only an ANN-based modelling technique and utilised data from one gauging station. Future studies can apply different time-series modelling methods, and additional data from other stations may be necessary to strengthen these conclusions.

References

1. Samanataray S, Sahoo A (2021) A comparative study on prediction of monthly streamflow


using hybrid ANFIS-PSO approaches. KSCE J Civ Eng 25(10):4032–4043
2. Sahoo A, Samantaray S, Ghose DK (2019) Stream flow forecasting in Mahanadi River basin
using artificial neural networks. Procedia Comput Sci 157:168–174
3. Yaseen ZM, Kisi O, Demir V (2016) Enhancing long-term streamflow forecasting and
predicting using periodicity data component: application of artificial intelligence. Water Resour
Manage 30(12):4125–4151
4. Tian X, Negenborn RR, van Overloop PJ, Maestre JM, Sadowska A, van de Giesen N
(2017) Efficient multi-scenario model predictive control for water resources management with
ensemble streamflow forecasts. Adv Water Resour 109:58–68
5. Samantaray S, Sumaan P, Surin P, Mohanta NR, Sahoo A (2022) Prophecy of groundwater
level using hybrid ANFIS-BBO approach. In: Proceedings of international conference on data
science and applications, pp 273–283. Springer, Singapore
6. Wang W, Van Gelder PH, Vrijling JK, Ma J (2006) Forecasting daily streamflow using hybrid
ANN models. J Hydrol 324(1–4):383–399

7. Sahoo GB, Schladow SG, Reuter JE (2009) Forecasting stream water temperature using regres-
sion analysis, artificial neural network, and chaotic non-linear dynamic models. J Hydrol
378(3–4):325–342
8. Sattari MT, Yurekli K, Pal M (2012) Performance evaluation of artificial neural network
approaches in forecasting reservoir inflow. Appl Math Model 36(6):2649–2657
9. Mehr AD, Kahya E, Olyaie E (2013) Streamflow prediction using linear genetic programming
in comparison with a neuro-wavelet technique. J Hydrol 505:240–249
10. Gowda CC, Mayya SG (2014) Comparison of back propagation neural network and genetic
algorithm neural network for stream flow prediction. J Comput Environ Sci
11. Mehr AD, Kahya E, Şahin A, Nazemosadat MJ (2015) Successive-station monthly stream-
flow prediction using different artificial neural network algorithms. Int J Environ Sci Technol
12(7):2191–2200
12. Chen XY, Chau KW, Busari AO (2015) A comparative study of population-based optimization
algorithms for downstream river flow forecasting by a hybrid neural network model. Eng Appl
Artif Intell 46:258–268
13. Peng T, Zhou J, Zhang C, Fu W (2017) Streamflow forecasting using empirical wavelet
transform and artificial neural networks. Water 9(6):406
14. Gao G, Liu F, San H, Wu X, Wang W (2018) Hybrid optimal kinematic parameter identification
for an industrial robot based on BPNN-PSO. Complexity
15. Li X, Sha J, Wang ZL (2019) Comparison of daily streamflow forecasts using extreme learning
machines and the random forest method. Hydrol Sci J 64(15):1857–1866
16. Zhang X, Chen X, Li J (2020) Improving dam seepage prediction using back-propagation
neural network and genetic algorithm. Math Probl Eng
17. Kennedy J, Eberhart R (1995) Particle swarm optimization. In: 1995 proceedings of the IEEE
international conference on neural networks, vol 4, pp 1942–1948
18. Cao Y, Zhang H, Li W, Zhou M, Zhang Y, Chaovalitwongse WA (2018) Comprehensive learning
particle swarm optimization algorithm with local search for multimodal functions. IEEE Trans
Evol Comput 23(4):718–731
19. Mohanta NR, Panda SK, Singh UK, Sahoo A, Samantaray S (2022) MLP-WOA is a successful
algorithm for estimating sediment load in Kalahandi Gauge Station, India. In: Proceedings of
International Conference on Data Science and Applications. Springer, Singapore, 319–329
20. Sridharam S, Sahoo A, Samantaray S, Ghose DK (2021) Assessment of flow discharge in a river
basin through CFBPNN, LRNN and CANFIS. Commun Softw networks. Springer, Singapore,
765–773
21. Samantaray S, Sahoo A (2021b) Modelling response of infiltration loss toward water table
depth using RBFN, RNN, ANFIS techniques. Int J Knowl-based and Intell Eng Syst. IOS
Press. 25(2):227–234
22. Kisi O (2015) Streamflow Forecasting and Estimation Using Least Square Support Vector
Regression and Adaptive Neuro-Fuzzy Embedded Fuzzy c-means Clustering. Water Resour
Manage 29:5109–5127. https://doi.org/10.1007/s11269-015-1107-7
23. Samantaray S, Sahoo A, Agnihotri A (2021) Assessment of flood frequency using statistical
and hybrid neural network method: Mahanadi River Basin, India. J Geol Soc India, Springer.
97(8):867–880
24. Samantaray S, Sahoo A (2021c) Prediction of suspended sediment concentration using hybrid
SVM-WOA approaches. Geocarto Int. Taylor & Francis. 1–27
25. Tien Bui D, Pham BT, Nguyen QP, Hoang ND (2016) Spatial prediction of rainfall-
induced shallow landslides using hybrid integration approach of Least-Squares Support Vector
Machines and differential evolution optimization: a case study in Central Vietnam. Int J Digital
Earth. 1077–1097
26. Samantaray S, Sahoo A, Ghose DK (2020) Infiltration loss affects toward groundwater fluctua-
tion through CANFIS in arid watershed: a case study. Smart intelligent comput appl. Springer,
Singapore. 781–789
27. Nourani V, Komasi M, Mano A (2009) A Multivariate ANN-Wavelet Approach for Rainfall–
Runoff Modeling. Water Resour Manage 23(14):2877–2894

28. Rumelhart GE, Hinton RJ, Williams R (1986) Learning representations by back-propagating
errors. Nature 323:533–536. https://doi.org/10.1038/323533a0
29. Samantaray S, Ghose DK (2020a) Modelling runoff in an arid watershed through integrated
support vector machine. H2Open Journal, IWA Publishing. 3(1):256–275
30. Samantaray S, Ghose DK (2020b). Assessment of suspended sediment load with neural
networks in arid watershed. J Inst Eng (India): Series A, Springer. 101(2):371–380
Prediction of Pullout Resistance
of Geogrids Using ANN

Ippili Saikrishna Amacharyulu, Balendra Mouli Marrapu, and Vasala Madhava Rao

Abstract Determination of the pullout resistance of geogrids in sandy soils is one of the important aspects of reinforced earth structures. It can be determined through laboratory testing, but performing these experiments requires standard equipment and skilled technicians, and preparing soil samples similar to field conditions and running the tests takes considerable time. To minimise this problem, Artificial Neural Network (ANN) analysis, a soft computing technique, can be used as an alternative. A well-trained ANN provides results similar to the experimental results within a short period of time. In the present study, an ANN model was developed for the determination of the pullout resistance of geogrids. Developing this ANN model requires known input and output training data, which were collected from the literature. The input parameters considered for training the ANN are the normal stress acting on the geogrid (q), length of embedment (L), width of geogrid (W), relative density of sand (Dr), and average friction angle between soil and grid (δ), and the output is the pullout force (P). Apart from the prediction of pullout forces (P), the ANN can also be used to identify the relative contribution of the pullout force parameters q, L, W, Dr, and δ using the Garson algorithm technique. From this study, it is observed that the normal stress acting on the geogrid (q) is the most significant parameter.

Keywords Reinforced earth · Geogrid · Pullout resistance · Artificial Neural Network · Relative contribution parameter

I. S. Amacharyulu (B) · V. M. Rao
Department of Civil Engineering, GIET University, Bhubaneswar, India
e-mail: saikrishnaar@giet.edu
V. M. Rao
e-mail: profvmrao@giet.edu
B. M. Marrapu
Department of Civil Engineering, AITAM, Visakhapatnam, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 309
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_29
310 I. S. Amacharyulu et al.

1 Introduction

Reinforced earth structures play a major role in the developmental activities of any nation. Under scarcity of land, reinforced earth minimizes land use by reducing the inclination of slopes in embankments, minimizing construction time, increasing the bearing capacity of pavements, stabilizing soils, etc. [2, 7, 9]. A reinforced earth structure is an arrangement of alternate layers of soil and reinforcement material. The main principle behind reinforced earth is that the vertical normal stresses the soil exerts on the reinforcement develop frictional resistance, so that tensile stresses are carried by the reinforcement. Many reinforcement materials are available in the market, such as geogrids, fibers, steel strips, geotextiles, geo-fabric strips, planks, geosynthetics, etc.; of these, geogrids are the most
commonly used in reinforced wall construction. Determining the actual pullout force for these geogrids in sandy soils is a difficult task. Geogrids are placed at different levels in embankment construction; finding the pullout force at every level is time consuming because the overburden pressure varies between layers. Determining the actual pullout force experimentally is time consuming [5]; an alternative is the Artificial Neural Network (ANN) model. ANN is a soft computing technique that can be used as an alternative to experimental methods.
In this study, the ANN model is developed for the determination of pullout forces
and the developed ANN model performances are compared with experimental results
in terms of statistical index performances like mean square error (MSE), coefficient
of determination (R2 ), root mean square error (RMSE), and variability accounted for
(VAF). Apart from the prediction of pullout forces, the relative contribution of the pullout force parameters is also determined; this helps identify the significance of each parameter participating in the pullout force. A parameter observed to be most significant is very important in the calculation of pullout force, as the accuracy of the estimated pullout force is strongly influenced by the input parameters with the highest contributions; even a small variation in such input parameters may produce a large variation in the output.

2 Pullout Force

Geogrid is a polymer material made of polypropylene or high-density polyethylene that is used in reinforced earth structures. Placing alternate layers of soil and geogrid with suitable spacing improves the shear strength of reinforced earth structures. The properties of these geogrids vary with the spacing of the grid, from uniaxial to biaxial, and from manufacturer to manufacturer. The manufacturer provides only tensile strength, grid size, and spacing at the time of purchase; for the design of a reinforced earth structure, pullout force is a very important parameter. Pullout force depends upon many factors such as normal stress acting on the geogrid (q), length of embedment (L), the width of the geogrid (W), relative density of sand (Dr ), and average friction angle
Prediction of Pullout Resistance of Geogrids Using ANN 311

Fig. 1 Sample geogrid material

between soil and grid (δ). L and W are geogrid properties and remaining are filling
material properties.
Pullout force can be determined experimentally using pullout test apparatus. In this study, pullout test result data were collected from the literature [5], where tests were performed on 35 geogrids with different lengths, widths, overburden pressures, and friction angles between soil and grid (δ). Here, δ is calculated using the procedure of [1] (δ = 2/3 ϕ). A sample geogrid material is shown in Fig. 1.

3 Artificial Neural Network (ANN)

Artificial Neural Network (ANN) is a soft computing technique working on a principle similar to the human nervous system. Like the human brain, it needs some data for its initial training. After successful training with known input and output data, and without modifying the network's interconnected weights and structure, the trained ANN can be applied to new, untrained data to predict the output [6, 8]. In this study, an ANN model was developed for the prediction of the pullout force of geogrids. For this purpose, data collected from the literature (Lentz [5]) were used. Details of these data are presented in Tables 1 and 2. Out of 35 cases, 30 were used as training data for the development of the ANN model, and the remaining 5 cases were used for verifying the developed model.
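The 30/5 split described above can be expressed as a simple hold-out; the seed and the random assignment of case indices below are illustrative only:

```python
import numpy as np

# 35 cases in total: hold out 5 for verification, train on the rest.
rng = np.random.default_rng(42)          # seed chosen arbitrarily
indices = rng.permutation(35)
test_idx, train_idx = indices[:5], indices[5:]
```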

3.1 Development of ANN Model

For the development of the ANN model, the network needs input parameters for the prediction of pullout force. In total, five input parameters are required: normal stress acting on the geogrid (q), length of embedment (L), width of the geogrid (W), relative density of sand (Dr ), and average friction angle between soil and geogrid (δ), together with one output parameter, the pullout force (P). Initially, the ANN is trained with 5 inputs, 4 hidden

Table 1 Training data collected from literature


S. No q (psi) L (in.) W (in.) Dr (%) δ (°) P (lb)
1 1.265 29.5 10.38 39 36.6 576
2 0.69 29.5 10.38 43 49 500
3 0.69 29.5 10.38 40 34 285
4 1.265 24.25 10.38 41 36 462
5 1.265 29.5 8.13 41 39.1 493
6 1.265 24.25 10.38 87 40.3 540
7 1.265 24.25 10.38 42 31.8 394
8 0.69 29.5 10.38 81 41.6 375
9 1.265 29.5 8.13 86 46.3 635
10 1.265 29.5 10.38 39 35.2 547
11 1.265 29.5 3.63 45 54.2 375
12 1.265 12.25 10.38 43 41.1 280
13 1.265 29.5 10.38 51 44.5 760
14 1.265 22.75 10.38 36 49.5 700
15 1.265 18.5 10.38 40 31.3 295
16 2.416 29.5 10.38 40 33.1 965
17 1.841 29.5 10.38 84 40.6 965
18 1.265 29.38 10.5 45 33.4 515
19 0.259 29.5 10.38 25 56.6 240
20 1.265 29.5 5.88 84 52.1 563
21 0.69 29.5 10.38 37 32.4 268
22 1.265 29.38 10.5 44 33.8 523
23 1.265 29.5 3.63 86 59.0 450
24 1.265 12.25 10.38 41 36.3 236
25 1.265 10.63 10.38 35 65.8 620
26 1.265 29.5 10.38 87 41.3 680
27 1.265 29.5 5.88 43 45.7 450
28 1.265 16.63 10.38 38 56.5 660
29 1.265 29.5 10.38 86 44.1 750
30 1.265 29.5 10.38 41 35.1 545

and 1 output layer (5-4-1), and its performance was observed; since the performance was not satisfactory, the number of hidden layer neurons had to be increased. Hidden layer neurons were therefore increased one by one, and the ANN performance was checked each time. It was observed that with 10 hidden neurons the performance was most satisfactory, so a 5-10-1 ANN architecture was adopted, as shown in Fig. 2. The training algorithm used was Bayesian regularization, and the transfer function used between the input and the hidden

Table 2 Data applied to real data and compared with ANN-predicted value
S. No. q L W Dr δ P ANN-P
1 1.265 29.5 10.38 37 43.7 740 735
2 1.265 18.5 10.38 84 45.4 493 511
3 1.265 12.25 10.38 87 52.2 415 430
4 1.265 18.25 10.38 42 33.7 320 317
5 1.841 29.5 10.38 40 34.6 777 766

Fig. 2 Considered ANN architecture for training data

layer was TANSIG, whereas PURELIN was used between the hidden layer and the output layer [15–17].
where

q normal stress (pound force per square inch, psi).
L length of embedment (inch).
W width of reinforcement (inch).
Dr relative density of sand (%).
δ average friction angle between soil and grid (deg).
P pullout force (lb).
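The 5-10-1 network described above can be sketched outside MATLAB as well; the following is a minimal illustration using scikit-learn's MLPRegressor with a tanh hidden layer and a linear output. scikit-learn offers no Bayesian regularization, so plain L2 weight decay (alpha) stands in for it, and the few training rows are taken from Table 1 purely for illustration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

# Rows in the order (q, L, W, Dr, delta, P), taken from Table 1.
data = np.array([
    [1.265, 29.5, 10.38, 39, 36.6, 576],
    [0.690, 29.5, 10.38, 43, 49.0, 500],
    [0.690, 29.5, 10.38, 40, 34.0, 285],
    [1.265, 24.25, 10.38, 41, 36.0, 462],
    [1.265, 29.5, 8.13, 41, 39.1, 493],
])
X, y = data[:, :5], data[:, 5]

# Scale the five inputs so they vary over the same range.
scaler = MinMaxScaler()
Xs = scaler.fit_transform(X)

# 5 inputs -> 10 hidden neurons (tanh, analogous to TANSIG)
# -> 1 linear output (analogous to PURELIN).
model = MLPRegressor(hidden_layer_sizes=(10,), activation="tanh",
                     solver="lbfgs", alpha=1e-3, max_iter=5000,
                     random_state=0)
model.fit(Xs, y)

# Predict the pullout force for an unseen case (values from Table 2).
pred = float(model.predict(scaler.transform([[1.265, 18.5, 10.38, 84, 45.4]]))[0])
```

With the full 30 training cases of Table 1, this setup mirrors the 5-10-1 model whose test predictions are compared in Table 2.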

3.2 Application of Developed ANN Model

After developing the ANN model successfully, it was applied to new, untrained data collected from the same literature, presented in Table 2. It was observed that the ANN-predicted pullout forces P were very close to the experimental pullout force values, as shown in Fig. 3. For further clarification of the performance of the ANN, the predicted results were compared using the statistical relations of

mean square error (MSE), coefficient of determination (R2 ), root mean square error
(RMSE), and variability accounted for (VAF). These values were calculated using
Eq. (1), Eq. (2), Eq. (3), and Eq. (4), respectively.

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - y_i'\right)^2 \quad [10] \qquad (1)$$

$$R^2 = \left[\frac{N\sum y\,y' - \left(\sum y\right)\left(\sum y'\right)}{\sqrt{\left[N\sum y^2 - \left(\sum y\right)^2\right]\left[N\sum y'^{\,2} - \left(\sum y'\right)^2\right]}}\right]^2 \quad [11, 13] \qquad (2)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - y_i'\right)^2} \quad [12, 14] \qquad (3)$$

$$\mathrm{VAF} = 100\left[1 - \frac{\mathrm{var}(y - y')}{\mathrm{var}(y)}\right] \qquad (4)$$

where
y pullout force from experiment,
y' pullout force from ANN,
N total number of cases.
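The four indices in Eqs. (1)–(4) translate directly into code; the sketch below applies them to the five test cases of Table 2 and yields an MSE of about 140, consistent with the value reported here:

```python
import numpy as np

def performance_indices(y, y_pred):
    """MSE, R^2, RMSE and VAF as defined in Eqs. (1)-(4)."""
    y, y_pred = np.asarray(y, float), np.asarray(y_pred, float)
    n = len(y)
    mse = np.sum((y - y_pred) ** 2) / n                    # Eq. (1)
    num = n * np.sum(y * y_pred) - np.sum(y) * np.sum(y_pred)
    den = np.sqrt((n * np.sum(y ** 2) - np.sum(y) ** 2) *
                  (n * np.sum(y_pred ** 2) - np.sum(y_pred) ** 2))
    r2 = (num / den) ** 2                                  # Eq. (2)
    rmse = np.sqrt(mse)                                    # Eq. (3)
    vaf = 100.0 * (1.0 - np.var(y - y_pred) / np.var(y))   # Eq. (4)
    return mse, r2, rmse, vaf

# Experimental vs ANN-predicted pullout forces (lb) from Table 2.
y_exp = [740, 493, 415, 320, 777]
y_ann = [735, 511, 430, 317, 766]
mse, r2, rmse, vaf = performance_indices(y_exp, y_ann)  # mse is about 140.8
```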
It was observed that the MSE was 140, which is slightly high but acceptable. The coefficient of determination R2 was close to 1,

Fig. 3 Comparison of ANN-predicted values with original tested values (pullout resistance force in lb versus test case number, experimental vs. ANN-predicted)



Fig. 4 Coefficient of determination (R2 = 0.995) for ANN-predicted values versus experimental values (experimental pullout force, lb, against predicted pullout force, lb)

RMSE was 11, and the VAF value was 99.58. These indices indicate that the ANN performance was very close to the experimental values. The coefficient of determination R2 can also be seen in Fig. 4.

4 Calculations of Relative Contribution Factors

After successfully developing the ANN model with 30 cases of data, the relative performance of the various factors contributing to pullout resistance is assessed from the connection weights between the neurons. For this calculation, the ranges of variation of the input parameters must be of the same order [3], which requires normalization of the training data. For this purpose, all training data parameters are normalized to the range [0, 1] using Eq. (5).

$$X_{\mathrm{norm}} = \frac{X - \mathrm{Min}(X)}{\mathrm{Max}(X) - \mathrm{Min}(X)} \qquad (5)$$

where X is the considered parameter value, Max(X) is the maximum value of that parameter, and Min(X) is its minimum value.
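Eq. (5) is standard min-max scaling and can be written as a one-line function; the sample q values below are taken from Table 1 for illustration:

```python
import numpy as np

def minmax_normalise(x):
    """Eq. (5): scale a parameter column to the [0, 1] range."""
    x = np.asarray(x, float)
    return (x - x.min()) / (x.max() - x.min())

# Normal stress values q (psi) from a few rows of Table 1.
q = np.array([1.265, 0.69, 0.69, 1.265, 2.416])
q_norm = minmax_normalise(q)  # 0 maps to min(q), 1 to max(q)
```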
After normalization, the normalized data are used again as training data without modifying the ANN architecture and transfer functions. After successful training with the normalized data, the interconnected weights are collected from the trained ANN and presented in Table 3. The sensitivity of the input parameters was calculated using Garson's procedure [3, 4]. The relative contribution of each parameter was then assessed, as shown in Fig. 5.

Table 3 Interconnected residual weights between input and hidden layers and hidden and output
layers
Weights q L W Dr δ Sub weights
1 0.21741 0.050567 0.090315 0.001530 0.10769 0.28683
2 − 0.16025 − 0.034861 − 0.072725 0.004811 − 0.062762 − 0.20685
3 0.11967 0.023289 0.052802 − 0.005020 0.04226 0.1513
4 − 0.61447 − 0.24765 − 0.51278 0.2689 − 0.038005 − 0.81159
5 − 0.70989 0.14546 − 0.018932 − 0.080583 − 0.64969 − 1.0713
6 − 0.15113 − 0.032127 − 0.068666 0.005114 − 0.057351 − 0.19419
7 0.36616 0.10538 0.20311 0.016883 0.20854 0.52003
8 − 0.23035 − 0.40549 − 0.031452 − 0.13675 − 0.9227 − 0.74142
9 1.0268 0.48102 0.35375 0.084676 − 0.75253 1.006
10 − 0.07513 − 0.012934 − 0.031304 0.003429 − 0.025275 − 0.093299
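Garson's procedure used above can be sketched as follows. Because Table 3 does not reproduce the hidden-output weights separately, the matrices below are randomly generated stand-ins; with the actual trained weights, the returned percentages would correspond to Fig. 5:

```python
import numpy as np

def garson_importance(w_ih, w_ho):
    """Garson's algorithm: relative contribution (%) of each input,
    from the input-hidden weight matrix w_ih (n_inputs x n_hidden)
    and the hidden-output weight vector w_ho (n_hidden,)."""
    w_ih = np.abs(np.asarray(w_ih, float))
    w_ho = np.abs(np.asarray(w_ho, float)).reshape(1, -1)
    # Share of each input in every hidden neuron, weighted by that
    # neuron's connection to the output.
    c = w_ih / w_ih.sum(axis=0, keepdims=True) * w_ho
    # Sum over hidden neurons and normalise to percentages.
    r = c.sum(axis=1)
    return 100.0 * r / r.sum()

# Hypothetical 5-input x 3-hidden network, for illustration only.
rng = np.random.default_rng(0)
shares = garson_importance(rng.normal(size=(5, 3)), rng.normal(size=3))
```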

Fig. 5 Relative contribution factors for pullout force (relative performance, %, of the parameters q, L, W, Dr and δ)

5 Conclusions

The main objective of this study is to determine the pullout force of geogrids using ANN and to calculate the relative contribution of the pullout force parameters using Garson's algorithm. The ANN-predicted pullout forces were very close to the experimental results, as shown by the R2 and VAF values, whereas the MSE and RMSE were slightly high but within an acceptable range. For further improvement of the results, the amount of training data needs to be increased; the training data considered in this study were limited to 30 cases, which is why the MSE and RMSE were slightly high. Apart from the prediction of pullout force, the relative contribution was also calculated in this study. It was observed that the contribution of q is highest (42%), implying that pullout force mainly depends on overburden pressure; the next most significant parameter was the friction between soil and geogrid, δ (26%), followed by width (17%) and length (12%), and the least significant was the relative density of sand (6%). The larger a parameter's contribution, the larger its role in the calculation of pullout force; even a small deviation in its input

value can result in a large variation in the calculated pullout force. Therefore, it is very important to identify the relative contribution of the various parameters in the calculation.

References

1. Barrett RJ (1985) Geotextiles in earth reinforcement. Geotech Fabr Report 3(2):15–19


2. Chakraborty D, Kumar J (2014) Bearing capacity of strip foundations in reinforced soils. Int J
Geomech 14(1):45–58
3. Garson GD (1991) Interpreting neural network connection weights. AI Expert 6(4):47–51
4. Goh ATC (1994) Seismic liquefaction potential assessed by neural networks. J Geotech Eng 120(9):1467–1480
5. Lentz RW, Pyatt JN (1988) Pull-out resistance of geogrids in sand. Transp Res Rec 1188
6. Marrapu BM, Jyothi A, Jakka RS (2021) Improvement in prediction of slope stability and
relative importance factors using artificial neural network. Geotech Geol Eng 39(5):1–16
7. Michalowski RL, Zhao A (1996) Failure of fiber-reinforced granular soils. J Geotech Eng
122(3):226–234
8. Sakellariou MG, Ferentinou MD (2005) A study of slope stability prediction using neural
networks. Geotech Geol Eng 23(4):419–445
9. Wang Y, Guo P, Lin H, Li X, Zhao Y, Yuan B, Liu Y, Cao P (2019) Numerical analysis
of fiber-reinforced soils based on the equivalent additional stress concept. Int J Geomech
19(11):04019122
10. Samantaray S, Ghose DK (2019) Dynamic modelling of runoff in a watershed using artifi-
cial neural network. In: Smart intelligent computing and applications, pp 561–568. Springer,
Singapore
11. Samantaray S, Sahoo A, Ghose DK (2020a) Infiltration loss affects toward groundwater fluc-
tuation through CANFIS in arid watershed: a case study. In: Smart intelligent computing and
applications, pp 781–789. Springer, Singapore
12. Samantaray S, Ghose DK (2020) Modelling runoff in an arid watershed through integrated
support vector machine. H2O J 3(1):256–275
13. Samantaray S, Sahoo A, Ghose DK (2020b) Assessment of sediment load concentration using
SVM, SVM-FFA and PSR-SVM-FFA in arid watershed, India: a case study. KSCE J Civil
Eng, pp 1–14
14. Samantaray S, Ghose DK (2021) Prediction of S12-MKII rainfall simulator experimental runoff
data sets using hybrid PSR-SVM-FFA approaches. J Water Clim Change. https://doi.org/10.
2166/wcc.2021.221
15. Samantaray S, Sahoo A, Ghose DK (2021) Watershed management and applications of AI.
CRC Press, Taylor and Francis
16. Samantaray S, Sahoo A (2021) A comparative study on prediction of monthly streamflow using
hybrid ANFIS-PSO approaches. KSCE J Civ Eng 25(10):4032–4043
17. Samantaray S, Sahoo A (2021b) Prediction of suspended sediment concentration using hybrid
SVM-WOA approaches. Geocarto Int, pp 1–27
Simulation of Water Table Depth Using
Hybrid CANFIS Model: A Case Study

Ippili Saikrishnamacharyulu, Nihar Ranjan Mohanta, Mavoori Hitesh Kumar, Sandeep Samantaray, Abinash Sahoo, Prameet Kumar Nanda, and Priyashree Ekka

Abstract Long-term water table depth (WTD) prediction in agricultural areas proves to be a challenging task. These regions have heterogeneous and complex hydrogeological characteristics, human activities, and boundary conditions, and there are nonlinear interactions between these aspects. Machine learning (ML) approaches have been broadly implemented for WTD forecasting because of their capability of modelling the nonlinearities between GWL and its conditioning factors. A new ML model, the co-active neuro-fuzzy inference system combined with the firefly algorithm (CANFIS-FA), was developed for estimating the monthly WTD of the Nuapada watershed located in Odisha state, India. Prediction results of the CANFIS-FA model showed good performance, with mean absolute error of 1.084–3.709 and a correlation coefficient above 0.98, demonstrating that the hybrid model is appropriate for assessing multifaceted groundwater systems. Therefore, it is evident that the proposed model can serve as an alternate method for WTD prediction, particularly in regions where hydrogeological data are challenging to acquire.

Keywords CANFIS · CANFIS-FA · WTD · Nuapada watershed

I. Saikrishnamacharyulu · P. K. Nanda · Priyashree Ekka
Department of Civil Engineering, GIET University Gunpur, Gunpur, Odisha, India
e-mail: saikrishnaar@giet.edu
N. R. Mohanta
Department of Civil Engineering, NIT Raipur, Raipur, Chhattisgarh, India
M. H. Kumar
Department of Civil Engineering, NIT Tiruchirappalli, Tiruchirappalli, Tamil Nadu, India
S. Samantaray
Department of Civil Engineering, OUTR Bhubaneswar, Bhubaneswar, Odisha, India
e-mail: sandeep1139_rs@civil.nits.ac.in
A. Sahoo (B)
Department of Civil Engineering, NIT Silchar, Silchar, Assam, India
e-mail: bablusahoo1992@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_30
320 I. Saikrishnamacharyulu et al.

1 Introduction

Water table depth forecasting is vital for water resource managers for managing
water supply, planning land development and creating efficient irrigation schedules.
Groundwater is one of the primary irrigation and food production sources in many countries [1–4]. Rapid population expansion, the constant advancement of industrialisation, climate and weather change, and the changing frequency and intensity of rainfall have had a severe influence on groundwater resources. Hence, groundwater resource evaluation is urgently necessary to guarantee sustainable usage and management [5–7]. Variations of the water level in wells directly quantify the effect of groundwater development, and essential information regarding aquifer dynamics is frequently embedded in continuously recorded WTD time series [8]. As a result, modelling and prediction of WTD are essential for water administrators and engineers to qualify and quantify groundwater resources and to keep a balance between demand and supply.
GWL mainly fluctuates due to natural processes, like precipitation or surface water
interaction and artificial effects like artificial recharge and groundwater pumping.
Predicting variations in GWLs is a complicated problem when natural time series
that affect GWL are diversified with very uneven anthropogenic impacts. Artificial
neural networks (ANNs) can give a more practical method to predict GWLs in a
highly uncertain and dynamic system. These are the reasons for which ANN models
based on time series have recently been broadly utilised in hydrology to estimate
and predict GWL [9, 10], water quality [11, 12], flood [13–15], precipitation [6,
16], etc. Gholami et al. [17] used CANFIS for simulating groundwater quality and
geographic information system (GIS) as a preprocessor tool for demonstrating spatial
changes in groundwater quality. Results confirmed high efficacy of combining neuro-
fuzzy methods and GIS. Allawi et al. [18] proposed a modified CANFIS model
to allow improved detection of high nonlinearity patterns faced during reservoir
inflow forecasting of Aswan High Dam (AHD), Egypt. Applied statistical measures
show better performance of modified CANFIS, which considerably outperformed
ANFIS. Zhang et al. [19] developed a long short-term memory (LSTM) model
to predict long-term WTD in agricultural areas and compared the obtained results
with traditional feed-forward neural network (FNN) model. CANFIS was developed
for estimating monthly pan evaporation at two stations located in Uttarakhand and
Uttar Pradesh states of India [20, 21]. They validated it against multilayer percep-
tron (MLP) and multiple linear regression (MLR) and concluded supremacy of the
developed CANFIS model. Also, MLP, MLR and CANFIS were applied, and their
performance was investigated in predicting drought index at different study loca-
tions [21, 22]. In both studies, CANFIS predicted drought index better than other
models. Some studies have also revealed that hybrid methodologies that integrate
different ML models with optimization algorithms and data pre-processing tech-
niques can provide more precise outcomes than conventional ML as certain patterns
in data (e.g. periodicities, level shifts, trends) can be well apprehended by hybrid
methods. Bayatvarkeshi et al. [23] evaluated ANN, CANFIS, ANN-PCA (principal
component analysis) and three conjoint models, comprising WANN, WPCA–ANN
Simulation of Water Table Depth Using Hybrid CANFIS Model … 321

and W-CANFIS for predicting daily relative humidity. Their findings revealed that
WPCA–ANN model was optimal model to estimate RH. Singh et al. [24] used
MLP, random forest (RF), decision tree (DT), CANFIS and support vector machine
(SVM) models and their wavelets W-MLP, W-DT W-RF, W-CANFIS and W-SVM
for predicting soil permeability corresponding to physical aspects of soil. It was
found that wavelet-based models simulated better soil permeability results than non-
wavelet models, and W-RF had the highest efficiency and accuracy. Supreetha et al.
[25] used a hybrid lion algorithm-long short-term memory (LA-LSTM) to develop a
GWL forecasting model and compared it with traditional LSTM and FNN models.
Results reveal that hybrid LA-LSTM model forecasted GWL with better accuracy
for a larger data set. Moravej et al. [26] applied hybrid LSSVR-ISA and LSSVR-GA
and conventional GP and ANFIS models for monthly GWL forecasting in Karaj
plain, Iran. Azizpour et al. [27] implemented ANFIS-FA and WANFIS-FA to esti-
mate qualitative and quantitative groundwater parameters using collected data from
Karnachi Well, Kermanshah, Iran. They found that WANFIS-FA was successful
in groundwater parameters forecasting. The current study employed FA to tune CANFIS (CANFIS-FA) for WTD simulation and forecasting in the Nuapada watershed and evaluates the potential of the proposed technique against conventional CANFIS.

2 Study Area

Nuapada district lies between 20° 0′ N and 21° 5′ N latitude and between 82° 20′ E and 82° 40′ E longitude. It falls in the western region of Odisha, with an area of 3407.5 km2. Because of the noticeable lack of industry, the economy of this region mostly revolves around agricultural activities. The soils are red, mixed red and black. Agriculture in the district faces systematic risk and uncertainty from drought and acidic soils. It receives an average rainfall of 1286 mm, mainly through monsoon rains. The summer is scorching, and the temperature may rise to 48 °C (Fig. 1).

3 Methodology

3.1 CANFIS

CANFIS is an integrated form of artificial neural network and fuzzy system with precise and quick capabilities [28]. The usefulness of ANFIS or CANFIS lies in the use of nonlinear fuzzy rules. Utilising fuzzy rules, correlations amongst outputs in CANFIS are formulated with collective membership values [29]. If an FIS with one output z and two inputs x1 and x2 is used, then for a CANFIS network, a characteristic ruleset with two fuzzy IF–THEN rules for a first-order Sugeno fuzzy model can be articulated as given below:

Fig. 1 Location of Nuapada watershed

Rule 1 : if(x1 is A1 ) and (x2 is B1 ) then z 1 = p1 x1 + q1 x2 + r1 (1)

Rule 2 : if(x1 is A2 ) and (x2 is B2 ) then z 2 = p2 x1 + q2 x2 + r2 (2)

where A1 , A2 , B1 , B2 are the membership functions (MFs) for inputs x1 and x2 , respectively, and p1 , q1 , r1 and p2 , q2 , r2 are the output function parameters. The main building blocks of a CANFIS are its network structure, fuzzy operator, MF, training algorithm, and activation function. The complete output is the sum of all incoming signals weighted by the normalised firing strengths, calculated as:

$$z = \sum_i \bar{w}_i c_i = \bar{w}_1 c_1 + \bar{w}_2 c_2 = \frac{w_1 c_1 + w_2 c_2}{w_1 + w_2} \qquad (3)$$

where $\bar{w}_i$ is the normalised firing strength of the ith rule and $c_i$ the adaptive node.
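The rule evaluation in Eqs. (1)–(3) can be illustrated with a two-rule first-order Sugeno sketch; the Gaussian membership parameters and the consequent coefficients below are all hypothetical:

```python
import numpy as np

def gaussian_mf(x, c, sigma):
    """Gaussian membership function with centre c and width sigma."""
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def sugeno_output(x1, x2):
    # Firing strengths: rule 1 uses (A1, B1), rule 2 uses (A2, B2);
    # the membership centres and widths are illustrative.
    w1 = gaussian_mf(x1, 0.0, 1.0) * gaussian_mf(x2, 0.0, 1.0)
    w2 = gaussian_mf(x1, 1.0, 1.0) * gaussian_mf(x2, 1.0, 1.0)
    # First-order consequents z_i = p_i*x1 + q_i*x2 + r_i (Eqs. 1-2).
    z1 = 0.5 * x1 + 0.2 * x2 + 0.1
    z2 = 1.0 * x1 - 0.3 * x2 + 0.4
    # Eq. (3): weighted average with normalised firing strengths.
    return (w1 * z1 + w2 * z2) / (w1 + w2)

out = sugeno_output(0.5, 0.5)  # equidistant from both rules -> 0.6
```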



3.2 CANFIS-FA

Yang [30] developed FA based on the flashing behaviour of fireflies. The three primary guidelines in FA are: all fireflies are unisex; every firefly has its own brightness; and luminous intensity governs the attractiveness between fireflies [31]. Figure 2 illustrates the training flowchart of CANFIS-FA. The elementary stages involved are elaborated in the following steps.
Step I: Initialise the firefly population, in which a set of fireflies is randomly generated on the basis of the scaling outcome of the firefly population. Each firefly thus maps one set of CANFIS parameters.
Step II: The fitness function f (u) represents the light intensity of each firefly.
Step III: On the basis of the light intensity of each firefly, compute the attractiveness β using Eq. (4); then, comparing pairs of fireflies, move the lower-intensity firefly u i towards the higher-intensity firefly u j using Eq. (5).

$$\beta(r) = \beta_o \exp\left(-\gamma r^2\right) \qquad (4)$$

$$u_i = u_i + \beta_o \exp\left(-\gamma r_{ij}^2\right)\left(u_j - u_i\right) + \alpha(\mathrm{rand} - 0.5) \qquad (5)$$

where $u_i$ and $u_j$ are the fireflies with low and high light intensity, $\gamma$ the absorption coefficient, $r_{ij}$ the Euclidean distance between the ith and jth fireflies, $\alpha$ the random movement factor, $\beta_o$ the attractiveness β at r = 0, and rand an arbitrary number generated in the interval [0, 1].
Step IV: If the iterations have reached the maximum permissible number or the system fitness has reached the fitness threshold, stop; otherwise, return to Step II and repeat the procedure. The output is the fitness value and position of the preeminent firefly.
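Steps I–IV above can be condensed into a minimal firefly loop. In CANFIS-FA each firefly position would encode one CANFIS parameter set scored by training error; here a simple sphere function stands in for that fitness, and all algorithm constants are illustrative:

```python
import numpy as np

def firefly_minimise(f, dim, n_fireflies=15, n_iter=100,
                     beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Minimal firefly algorithm (Steps I-IV) minimising fitness f."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(-1.0, 1.0, size=(n_fireflies, dim))  # Step I
    light = np.array([f(x) for x in u])                  # Step II (lower is better)
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] < light[i]:                  # u_j outshines u_i
                    r2 = np.sum((u[i] - u[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2)   # Eq. (4)
                    u[i] = (u[i] + beta * (u[j] - u[i])  # Eq. (5)
                            + alpha * (rng.random(dim) - 0.5))
                    light[i] = f(u[i])
    best = int(np.argmin(light))                         # Step IV output
    return u[best], light[best]

best_x, best_f = firefly_minimise(lambda x: float(np.sum(x ** 2)), dim=2)
```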

4 Results and Discussions

The results of the CANFIS and CANFIS-FA models are compared to assess their potential in predicting WTD. Mean absolute error (MAE) and coefficient of determination (R2 ) are used as statistical metrics. Values of MAE and R2 are shown in Table 1, which reveals that the best CANFIS model predicted WTD with R2 = 0.95184 and MAE = 7.9614 in the training stage, while the best CANFIS-FA model estimated WTD with R2 = 0.99163 and MAE = 1.084 m. Comparison of the obtained results indicates that the precision of the CANFIS model increased with the application of the optimization algorithm: the testing MAE reduced from 13.89 with CANFIS to 4.11 with CANFIS-FA.
On the whole, the comparison between the CANFIS and CANFIS-FA models shows that hybrid CANFIS-FA performed better than CANFIS in both stages. The key reason for the better performance of CANFIS-FA is that the model incorporates both neural

Fig. 2 Working flowchart of CANFIS-FA model



Table 1 Performance indicators (MAE, R2 ) values using CANFIS and CANFIS-FA methods
Station name Model name MAE training R2 MAE testing R2
Nuapada CANFIS 1 11.62 0.94815 16.3847 0.92736
CANFIS 2 10.3766 0.9497 14.906 0.92905
CANFIS 3 9.005 0.95026 14.217 0.9308
CANFIS 4 7.9614 0.95184 13.89 0.93177
CANFIS-FA 1 3.709 0.98421 5.752 0.96733
CANFIS-FA 2 3.1971 0.98596 5.114 0.9689
CANFIS-FA 3 1.92 0.9872 4.7899 0.96941
CANFIS-FA 4 1.084 0.99163 4.11 0.97098

network and fuzzy logic principles with the FA optimization algorithm; therefore, it can harness the advantages of both in a single framework. Thus, the CANFIS-FA model is more satisfactory than the CANFIS model in establishing a good relationship between WTD and its predictors.
Figure 3 illustrates the scatter plots of the best models in the testing phase, with R2 = 0.93177 and 0.97098 for CANFIS and CANFIS-FA, respectively. Results of observed WTD with the corresponding values predicted by the CANFIS and CANFIS-FA models for the best input combination are presented in Fig. 4. Figure 5 shows the range of observed and predicted values in the form of histogram plots.
Overall, the results of this study are promising and indicate that CANFIS-FA is a favourable prediction model in groundwater hydrology. In addition, this work provides case-study evidence that the approach is appropriate for circumstances where only low-dimensional input data are accessible. The present study utilised data from a single station only, and additional data from other zones can be utilised to verify this study's conclusions. Moreover, further research is required to improve prediction accuracy when a sudden variation occurs in WTD over consecutive periods.

Fig. 3 Scatter plots of predicted and calculated values by CANFIS and CANFIS-FA models in
testing period

Fig. 4 Comparison plot of observed and predicted monthly water table depth

Fig. 5 Variation of results in terms of histogram plot

5 Conclusion

Long-term WTD forecasting presents a significant challenge and is vital for the sustainable management of water and environmental resources. However, because of the nonlinear interactions between GWL and its drivers, and their multiscale behaviour that varies with time, producing accurate water table depths is difficult. To address such problems, this study tested the potential of the CANFIS-FA model for predicting the WTD of the Nuapada watershed, Odisha, India. The applied model provides a capable new technique for predicting WTD, verified by satisfactory WTD prediction performance at the specified location. The robust CANFIS-FA model can work as a valuable tool for predicting WTD, and the results of this study will serve as a guideline for government policymakers and authorities in future water management projects.

References

1. Samantaray S, Sumaan P, Surin P, Mohanta NR, Sahoo A (2022) Prophecy of groundwater level using hybrid ANFIS-BBO approach. In: Proceedings of international conference on data science and applications. Springer, Singapore, pp 273–283
2. Sridharam S, Sahoo A, Samantaray S, Ghose DK (2021) Estimation of water table depth using
wavelet-ANFIS: a case study. In: Communication software and networks. Springer, Singapore,
pp 747–754
3. Samantaray S, Sahoo A, Ghose DK (2019) Assessment of groundwater potential using
neural network: a case study. In: International conference on intelligent computing and
communication. Springer, Singapore, pp 655–664
4. Shahid S, Wang XJ, Rahman MM, Hasan R, Harun SB, Shamsudin S (2015) Spatial assessment
of groundwater over-exploitation in northwestern districts of Bangladesh. J Geol Soc India
85(4):463–470
5. Samantaray S, Sahoo A (2021) Modelling response of infiltration loss toward water table depth
using RBFN, RNN, ANFIS techniques. Int J Knowl Based Intell Eng Syst 25(2):227–234
6. Samantaray S, Sahoo A, Ghose DK (2020) Infiltration loss affects toward groundwater fluc-
tuation through canfis in arid watershed: a case study. In: Smart intelligent computing and
applications. Springer, Singapore, pp 781–789
7. Singh KP, Gupta S, Rai P (2014) Investigating hydrochemistry of groundwater in Indo-Gangetic
alluvial plain using multivariate chemometric approaches. Environ Sci Pollut Res 21(9):6001–
6015
8. Butler JJ Jr, Stotler RL, Whittemore DO, Reboulet EC (2013) Interpretation of water level
changes in the High Plains aquifer in western Kansas. Groundwater 51(2):180–190
9. Samanataray S, Sahoo A (2021) A comparative study on prediction of monthly streamflow
using hybrid ANFIS-PSO approaches. KSCE J Civ Eng 25(10):4032–4043
10. Sahoo A, Samantaray S, Ghose DK (2019) Stream flow forecasting in Mahanadi River Basin
using artificial neural networks. Procedia Comput Sci 157:168–174
11. Noori N, Kalin L, Isik S (2020) Water quality prediction using SWAT-ANN coupled approach.
J Hydrol 590:125220
12. Ahmed AN, Othman FB, Afan HA, Ibrahim RK, Fai CM, Hossain MS, Ehteram M, Elshafie
A (2019) Machine learning methods for better water quality prediction. J Hydrol 578:124084
13. Agnihotri A, Sahoo A, Diwakar MK (2021) Flood prediction using hybrid anfis-aco model: a
case study. In: Proceedings of ICICIT 2021 p 169
14. Sahoo A, Samantaray S, Ghose DK (2021) Prediction of flood in Barak River using hybrid
machine learning approaches: a case study. J Geol Soc India 97(2):186–198
15. Samantaray S, Sahoo A, Agnihotri A (2021) Assessment of flood frequency using statistical and
hybrid neural network method: Mahanadi River basin, India. J Geol Soc India 97(8):867–880
16. Sarkar D, Sarkar T, Saha S, Mondal P (2021) Compiling non-parametric tests along with CA-
ANN model for precipitation trends and variability analysis: a case study of Eastern India.
Water Cycle 2:71–84
17. Gholami V, Khaleghi MR, Sebghati M (2017) A method of groundwater quality assessment
based on fuzzy network-CANFIS and geographic information system (GIS). Appl Water Sci
7(7):3633–3647
18. Allawi MF, Jaafar O, Hamzah FM, Mohd NS, Deo RC, El-Shafie A (2018) Reservoir inflow
forecasting with a modified coactive neuro-fuzzy inference system: a case study for a semi-arid
region. Theoret Appl Climatol 134(1):545–563
19. Zhang J, Zhu Y, Zhang X, Ye M, Yang J (2018) Developing a Long Short-Term Memory (LSTM)
based model for predicting water table depth in agricultural areas. J Hydrol 561:918–929
20. Malik A, Kumar A (2015) Pan evaporation simulation based on daily meteorological data using
soft computing techniques and multiple linear regression. Water Resour Manage 29(6):1859–
1872
21. Malik A, Rai P, Heddam S, Kisi O, Sharafati A, Salih SQ, Al-Ansari N, Yaseen ZM (2020) Pan
evaporation estimation in Uttarakhand and Uttar Pradesh States, India: validity of an integrative
data intelligence model. Atmosphere 11(6):553
22. Malik A, Kumar A, Rai P, Kuriqi A (2021) Prediction of multi-scalar standardized precipitation
index by using artificial intelligence and regression models. Climate 9(2):28
23. Bayatvarkeshi M, Mohammadi K, Kisi O, Fasihi R (2020) A new wavelet conjunction approach
for estimation of relative humidity: wavelet principal component analysis combined with ANN.
Neural Comput Appl 32(9):4989–5000
24. Singh VK, Kumar D, Kashyap PS, Singh PK, Kumar A, Singh SK (2020) Modelling of soil
permeability using different data driven algorithms based on physical properties of soil. J
Hydrol 580:124223
25. Supreetha BS, Shenoy N, Nayak P (2020) Lion algorithm-optimized long short-term memory
network for groundwater level forecasting in Udupi District, India. Appl Comput Intell Soft
Comput
26. Moravej M, Amani P, Hosseini-Moghari SM (2020) Groundwater level simulation and fore-
casting using interior search algorithm-least square support vector regression (ISA-LSSVR).
Ground Water Sustain Dev 11:100447
27. Azizpour A, Izadbakhsh MA, Shabanlou S, Yosefvand F, Rajabi A (2022) Simulation of time-
series groundwater parameters using a hybrid metaheuristic neuro-fuzzy model. Environ Sci
Pollut Res pp 1–17
28. Abyaneh HZ, Varkeshi MB, Golmohammadi G, Mohammadi K (2016) Soil temperature esti-
mation using an artificial neural network and co-active neuro-fuzzy inference system in two
different climates. Arab J Geosci 9(5):377
29. Mohanta NR, Patel N, Beck K, Samantaray S, Sahoo A (2021) Efficiency of river flow prediction
in river using wavelet-CANFIS: a case study. In: Intelligent data engineering and analytics.
Springer, Singapore, pp 435–443
30. Yang XS (2009) Firefly algorithms for multimodal optimization. In: International symposium
on stochastic algorithms. Springer, Berlin, Heidelberg, pp 169–178
31. Poursalehi N, Zolfaghari A, Minuchehr A, Moghaddam HK (2013) Continuous firefly
algorithm applied to PWR core pattern enhancement. Nucl Eng Des 258:107–115
Monthly Runoff Prediction by Support
Vector Machine Based on Whale
Optimisation Algorithm

Aiswarya Mishra, Abinash Sahoo, Sandeep Samantaray,


Deba Prakash Satapathy, and Suresh Chandra Satapathy

Abstract This study was conducted in the catchment area of the Baitarani River at Jaraikela,
situated in Eastern India. The Baitarani is one of the most important rivers of the
eastern region of peninsular India and ultimately joins the Bay of Bengal. This
region frequently experiences floods owing to its erratic rainfall patterns and
climatic conditions, which makes runoff prediction important for planning better
watershed management techniques and mitigation strategies. To simulate the rainfall-
runoff process, an SVM model integrated with the Whale Optimisation Algorithm (WOA)
has been used; WOA improves the results by reducing the error margin of SVM.
Statistical data spanning 1981–2020 have been used for calibration, validation and
testing of the model. The results show that the hybrid SVM-WOA model outperforms
the classical SVM model in terms of forecasting accuracy and efficiency, as measured
by the root mean squared error (RMSE), mean absolute error (MAE), and Nash–Sutcliffe
efficiency (NE) performance evaluation measures.

Keywords SVM · Baitarani River · Jaraikela · SVM-WOA

1 Introduction

Runoff prediction and simulation in watersheds are prerequisites for several practical
applications concerning environmental disposal and conservation and management
of water resources [1–3]. Over the last decade, machine learning and optimisation
algorithms have been widely used for creating hydrological models of rainfall-runoff

A. Mishra · S. Samantaray (B) · D. P. Satapathy


Department of Civil Engineering, OUTR Bhubaneswar, Bhubaneswar, Odisha, India
e-mail: sandeep1139_rs@civil.nits.ac.in
A. Sahoo
Department of Civil Engineering, NIT Silchar, Silchar, Assam, India
S. C. Satapathy
Department of Computer Science and Engineering, KIIT University, Bhubaneswar, Odisha, India
e-mail: dpsatapathy@cet.edu.in; sureshsatapathy@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_31

relationships [4–6], sediment load modelling [7–9], flood prediction [10, 11] (Sahoo
et al. 2020), groundwater prediction [7, 12] and many more. These techniques gained
momentum among scholars and researchers due to better forecasting accuracy. SVM
has greater prediction capability than classical ANN models: it incorporates the
structural risk minimisation principle, which reduces the risk (error) and addresses
several of the shortcomings of ANN models. Proper mid- and long-term
prediction of runoff is significant for better urban planning, watershed management,
urban construction and flood mitigation.
Bray and Han [13] illustrated the rainfall-runoff process using SVM in the region
of Bird Creek, USA. A transfer function (TF) model was compared with the unit response
curve from SVM, and the TF model was observed to outperform SVM for short-range
predictions; the study also highlighted the difficulty of finding global optima for an
SVM model. Behzad et al. [14] compared applicability of SVM, ANN, and ANN-GA
(genetic algorithm) models for predicting one-day ahead streamflow of River Bakhti-
yari, Iran, utilising local rainfall and climate data. Their findings showed that SVM
was the most efficient as it provides high accuracy outcomes, thus demonstrating the
forecasting capabilities of SVM. Misra et al. [15] performed runoff estimation for
Vamsadhara River basin, using SVM and ANN. It was concluded that SVM provided
robust and accurate estimation, with significantly better results than ANN. Sharma
et al. [16] examined the performance of SVM, ANN, and simple regression models
to estimate runoff in the Nepal watershed, and SVM proved to be superior to the
other two models, under comparable accuracy predictions. Sedighi et al. (2016)
used ANN model and SVM for simulating rainfall-runoff process subjective to snow
water equivalent height in Roodak catchment, Iran. The results displayed that SWE
enhanced the performance and precision of SVM. Regardless of extensive applica-
tion of these techniques, there are substantial disadvantages to applying these models.
A key disadvantage is their necessity to tune parameters of optimal learning proce-
dure, whereas the major concern is performance and predictability of these models.
In recent times, the application of metaheuristic optimisation algorithms revealed a
significant solution to ease the complications in the parameterisation of these models
[17–19].
Wang et al. [20] demonstrated the superiority of EEMD-SVM-PSO over ordi-
nary least-squares (OLS) regression and feed-forward neural network (FFNN) for
rainfall-runoff forecasting at Yellow River, China. Komasi and Sharghi [21] vali-
dated the dominance of wavelet SVM model over standalone classical ANN and
SVM models in predicting both long-term and short-term runoff discharge by taking
into consideration the seasonal effects in two catchments. Obtained outcomes showed
that wavelet SVM model could estimate both short- and long-term flow discharges.
Feng et al. [22] adopted SVM-QPSO to determine the input–output relationships in
Yangtze Valley, China. Test outcomes indicated that hybrid model gives improved
forecasting precision than several classical techniques like ANN and extreme learning
machines (ELM). Mohammadi et al. [23] applied standalone SVM and SVM-WOA
models to predict daily evapotranspiration at three meteorological stations in Iran.

It was found that hybrid SVR-WOA model delivered the best performance. Al-
Zoubi et al. (2018) incorporated the SVM-WOA model to detect spam profiles on
social networking sites in multiple lingual contexts (English, Arabic, Spanish and
Korean). This hybrid model outperformed other commonly used classical models.
The proposed model detects spam profiles automatically and provides information
about the most influencing features during detection procedure, with a high degree
of accuracy. Anaraki et al. [24] estimated flood frequency using changing climatic
conditions in the region of Karun River basin, Iran, using machine learning, decom-
position and metaheuristic algorithms. MARS and M5 tree models are applied to
classify precipitation, WOA is used to train LSSVM, wavelet transform (WT) is
performed to decompose temperature and precipitation, ANN, K-nearest neigh-
bour (KNN), LSSVM and LSSVM-WOA are applied to downscale temperature and
precipitation, and discharge is simulated under the considered time period. Results
showed the superiority of LSSVM-WOA-WT model in simulating discharge. Vahed-
doost et al. [25] developed a metaheuristic optimisation tool, ANN-WOA model, to
define soil parameters of East and West Azerbaijan provinces in Iran. This hybrid
model turned out to be superior to other core optimisation models of ANN and
multilinear regression (MLR).
The objective of the current research is to develop a runoff forecasting model based on
the hybrid SVM-WOA method, emphasising high-value flows. Outcomes of the standalone
SVM model are utilised as a benchmark to demonstrate the hybrid method's performance.

2 Study Area

Baitarani River originates from hill ranges of Keonjhar District, Odisha, with an
elevation varying between 32 and 1024 m. Being one of the major rivers of Odisha,
the river flows mainly through the state of Odisha and some parts through Jharkhand.
It lies between 20°35' N and 22°15' N latitude and 85°03' E and 87°03' E longitude,
spreading over 14,351 km2 in Odisha with a major part of the basin being covered by
agricultural land. The monsoon season extends from June till October with minimum
and maximum annual rainfall of 800 mm and 2000 mm, respectively, with an average
yearly precipitation of about 1400 mm (Fig. 1).

3 Methodology

3.1 Support Vector Machine

Vapnik [26] first introduced SVM as a two-layer network (weights are nonlinear in the
first layer and linear in the second). Whereas an ANN in general adapts all its
parameters (using clustering or gradient-based methods), SVM selects training input
vectors as the parameters of its first layer, which reduces dimensionality [27–30].
In mathematical terms, the primary function of the statistical learning procedure is

y = f(x) = Σ_{i=1}^{M} a_i φ_i(x) = w φ(x)    (1)

where the outcome is a linearly weighted sum over M basis functions and φ carries out
the nonlinear transformation. The decision function of SVM is expressed by:

y = f(x) = Σ_{i=1}^{L} a_i K(x_i, x) − b    (2)

where K is the kernel function, a_i and b are parameters obtained by maximising the
objective function, L is the number of training data points, x_i are the vectors
utilised in the training phase, and x is the independent vector.

Fig. 1 Location of Jaraikela River gauge station (Brahmani River)
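Equation (2) can be evaluated directly once the support vectors, coefficients a_i, bias b and a kernel are fixed. A minimal numerical sketch is given below; the RBF kernel and all numeric values are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

def rbf_kernel(xi, x, gamma=0.5):
    # K(x_i, x) = exp(-gamma * ||x_i - x||^2), a common kernel choice
    return float(np.exp(-gamma * np.sum((xi - x) ** 2)))

def svm_decision(x, support_vectors, a, b, gamma=0.5):
    # Eq. (2): y = sum_i a_i * K(x_i, x) - b
    return sum(ai * rbf_kernel(xi, x, gamma)
               for ai, xi in zip(a, support_vectors)) - b

# Toy example with two support vectors and illustrative coefficients
sv = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
a = [1.5, -0.5]
b = 0.1
y = svm_decision(np.array([0.0, 0.0]), sv, a, b)
```

In practice the coefficients a_i and bias b come from solving the SVM training (dual) problem; here they are fixed by hand purely to show how the decision function is assembled.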

3.2 Whale Optimisation Algorithm

The WOA optimisation algorithm was first introduced by Mirjalili and Lewis [31],
motivated by the bubble-net feeding behaviour of humpback whales, which usually hunt
smaller fish and other aquatic animals by generating an enclosing curtain of bubbles.
In WOA, the target prey is regarded as the best solution found so far. The position of
a humpback whale relative to the target is updated by:

X(t + 1) = X*(t) − A · |C · X*(t) − X(t)|    (3)

where X is the position vector of a whale, t is the running iteration, and X* is the
position vector of the best solution, which is updated whenever a better solution is
found. The coefficient vectors A and C are given by

A = 2a · r − a
C = 2 · r

where a decreases linearly from 2 to 0 over the course of iterations and r is a random
vector in [0, 1]. The spiral update is

X(t + 1) = D′ · e^(bl) · cos(2πl) + X*(t)    (4)

where D′ = |X*(t) − X(t)| is the distance between the ith whale and the prey, b is a
constant determining the logarithmic helix-shaped path, and l is a random number in
[−1, 1]. By shrinking circles, whales move around the prey along spiral-shaped paths,
and this concurrent behaviour is modelled as:

X(t + 1) = X*(t) − A · D                      if p < 0.5
X(t + 1) = D′ · e^(bl) · cos(2πl) + X*(t)     if p ≥ 0.5    (5)

where D = |C · X*(t) − X(t)| and p is a random number in [0, 1] that selects between
the encircling and the spiral position-update modes at each iteration.
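Equations (3)–(5) can be sketched as a minimal implementation on a toy objective. The bounds, population size, iteration count and b = 1 below are illustrative choices, and the random-search exploration branch of the full algorithm (used when |A| ≥ 1) is omitted for brevity.

```python
import numpy as np

def woa_minimize(f, dim=2, n_whales=10, t_max=200, seed=0):
    # Minimal WOA sketch after Mirjalili and Lewis; not the paper's tuned setup.
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5.0, 5.0, (n_whales, dim))   # whale positions
    best = min(X, key=f).copy()                   # X*: best solution so far
    for t in range(t_max):
        a = 2.0 - 2.0 * t / t_max                 # a decreases linearly from 2 to 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random(dim) - a     # A = 2a.r - a
            C = 2.0 * rng.random(dim)             # C = 2.r
            p = rng.random()
            l = rng.uniform(-1.0, 1.0)
            if p < 0.5:
                # encircling prey, Eq. (3): X(t+1) = X* - A.|C.X* - X|
                X[i] = best - A * np.abs(C * best - X[i])
            else:
                # spiral update, Eq. (4) with b = 1: X(t+1) = D'.e^l.cos(2*pi*l) + X*
                D = np.abs(best - X[i])
                X[i] = D * np.exp(l) * np.cos(2.0 * np.pi * l) + best
            if f(X[i]) < f(best):                 # keep the best solution found
                best = X[i].copy()
    return best, f(best)

sphere = lambda x: float(np.sum(x ** 2))
best, val = woa_minimize(sphere)
```

On the sphere function the population collapses onto the incumbent best as a shrinks, which is the same mechanism the hybrid model uses to tune SVM parameters against a prediction-error objective.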

4 Results and Discussion

Table 1 shows the performance of the applied models based on the considered quantitative
statistical indices. The values of RMSE, MAE and NE during the testing phase are
14.2279, 16.6328 and 0.9598 for the standalone SVM and 6.74, 7.9376 and 0.9864 for the
hybrid SVM-WOA model, respectively. The hybrid prediction model has lower RMSE and MAE
statistics and a higher NE, making it significantly better than the classical model
(Fig. 2).
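The three evaluation measures can be computed as below; these are the standard definitions, and the toy arrays are illustrative, not values from the study.

```python
import numpy as np

def rmse(obs, sim):
    # Root mean squared error
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def mae(obs, sim):
    # Mean absolute error
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.mean(np.abs(obs - sim)))

def nse(obs, sim):
    # Nash-Sutcliffe efficiency: 1 - SSE / spread of observations about their mean
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

obs = [10.0, 20.0, 30.0, 40.0]
sim = [12.0, 18.0, 33.0, 39.0]
scores = (rmse(obs, sim), mae(obs, sim), nse(obs, sim))
```

Lower RMSE/MAE and an NSE closer to 1 indicate a better fit, which is exactly the ordering used to compare SVM and SVM-WOA in Table 1.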
Scatter plot analysis (Fig. 3) is done to validate the efficiency of SVM-WOA
and standalone SVM algorithm. The mean monthly changes in the maximum and
minimum runoff concerning proposed algorithms are shown in Fig. 4.
The box plots of peak runoff (Fig. 5) suggest that the regionalised and calibrated
parameterisations were generally comparable to observed values in terms of prediction
accuracy for the median and the first and third quartiles. In contrast, spatial
estimation overestimated the highest peak flow values for most basins, which led
to larger extreme errors. Comparative outcomes indicate that the SVM-WOA
model is certainly better for hydrological research and engineering designs when
observed data is limited in a region or basin.

Table 1 Assessment results of prediction models in training and testing phases


Station name  Model       Training                    Testing
                          RMSE     MAE      NE        RMSE     MAE      NE
Jaraikela     SVM 1       13.804   15.3887  0.9675    16.9032  19.455   0.9547
              SVM 2       13.49    15.001   0.9689    16.55    19.0032  0.9554
              SVM 3       12.907   14.67    0.9694    16.2201  18.7886  0.9565
              SVM 4       12.488   14.3159  0.97      15.668   18.2874  0.957
              SVM 5       12.1742  13.8897  0.9708    15.1706  17.99    0.9579
              SVM 6       11.8697  13.027   0.9717    14.83    17.843   0.9587
              SVM 7       11.374   12.6728  0.9721    14.2279  16.6328  0.9598
              SVM-WOA 1   5.0031   4.68     0.9909    9.065    10.221   0.98
              SVM-WOA 2   4.939    4.2216   0.9916    9.4872   9.8963   0.981
              SVM-WOA 3   4.6714   3.7882   0.9928    9.0296   9.466    0.9819
              SVM-WOA 4   2.36     3.0624   0.9931    8.6854   9.1874   0.9827
              SVM-WOA 5   1.903    2.3389   0.9937    8.129    8.9327   0.983
              SVM-WOA 6   1.4632   1.95     0.994     7.2278   8.004    0.9843
              SVM-WOA 7   0.968    1.7621   0.9956    6.74     7.9376   0.9864

Fig. 2 SVM-WOA flowchart

5 Conclusion

This paper uses SVM (machine learning tool) to model monthly rainfall-runoff data
of the Baitarani River Basin at Jaraikela, Odisha, India. The kernel functions used
were RBF and Gaussian. Among these, Gaussian showed the highest efficacy. The
results obtained from the standalone model were compared with that of SVM-WOA
model. Although the SVM model generated fairly good results, SVM-WOA could

Fig. 3 Scatter plots for best models of observed vs. predicted (SVM and SVM-WOA) runoff values

Fig. 4 Monthly predicted runoff by SVM and SVM-WOA models for Jaraikela station

bridge the few gaps in the SVM model and produce better prediction in terms of
variability and efficiency. The training RMSE values of SVM-WOA and SVM were 0.968 and
11.374, while the corresponding NE values were 0.9956 and 0.9721, which clearly
indicates that SVM-WOA is superior to standalone SVM. Thus, we conclude
that SVM-WOA is suitable to carry out data estimation of runoff and can also be
applied for flood and groundwater forecasting.

Fig. 5 Box plots of


developed models in runoff
estimation

References

1. Samantaray S, Sahoo A (2021) Modelling response of infiltration loss toward water table depth
using RBFN, RNN, ANFIS techniques. Int J Knowl Based Intell Eng Syst 25(2):227–234
2. Sahoo A, Singh UK, Kumar MH, Samantaray S (2021) Estimation of flood in a river basin
through neural networks: a case study. In: Communication software and networks. Springer,
Singapore, pp 755–763
3. Mohanta NR, Biswal P, Kumari SS, Samantaray S, Sahoo A (2021) Estimation of sediment
load using adaptive neuro-fuzzy inference system at Indus River Basin, India. In: Intelligent
data engineering and analytics. Springer, Singapore, pp 427–434
4. Samantaray S, Sahoo A (2020) Prediction of runoff using BPNN, FFBPNN, CFBPNN
algorithm in arid watershed: a case study. Int J Knowl Based Intell Eng Syst 24(3):243–251
5. Jimmy SR, Sahoo A, Samantaray S, Ghose DK (2021) Prophecy of runoff in a river basin using
various neural networks. In: Communication software and networks. Springer, Singapore, pp
709–718
6. Samantaray S, Sahoo A (2021) Modelling response of infiltration loss toward water table
depth using RBFN, RNN, ANFIS techniques. Int J Knowl Based Intell Eng Syst 25(2):227–234
7. Samantaray S, Sahoo A, Ghose DK (2020) Prediction of sedimentation in an arid watershed
using BPNN and ANFIS. In: ICT analysis and applications. Springer, Singapore, pp 295–302
8. Mohanta NR, Patel N, Beck K, Samantaray S, Sahoo A (2021) Efficiency of river flow predic-
tion in river using Wavelet-CANFIS: a case study. Intelligent data engineering and analytics.
Springer, Singapore, pp 435–443
9. Sahoo A, Samantaray S, Singh RB (2020) Analysis of velocity profiles in rectangular straight
open channel flow. Pertanika J Sci Technol 28(1)
10. Agnihotri A, Sahoo A, Diwakar MK (2021) Flood prediction using hybrid ANFIS-ACO
model: a case study. In: Proceedings of ICICIT 2021, inventive computation and information
technologies, p 169
11. Sahoo A, Samantaray S, Ghose DK (2021) Prediction of flood in Barak River using hybrid
machine learning approaches: a case study. J Geol Soc India 97(2):186–198
12. Samantaray S, Sahoo A, Ghose DK (2019) Assessment of groundwater potential using
neural network: a case study. In: International conference on intelligent computing and
communication. Springer, Singapore, pp 655–664
13. Bray M, Han D (2004) Identification of support vector machines for runoff modelling. J
Hydroinform 265–280

14. Behzad M, Asghari K, Eazi M, Palhang M (2009) Generalization performance of support
vector machines and neural networks in runoff modeling. Expert Syst Appl 36:7624–7629
15. Misra D, Oommen T, Agarwal A, Mishra SK, Thompson AM (2009) Application and analysis
of support vector machine based simulation for runoff and sediment yield. J Biosyst Eng
103:527–535
16. Sharma N, Zakaullah M, Tiwari H, Kumar D (2015) Runoff and sediment yield modeling using
ANN and support vector machines: a case study from Nepal watershed. J Model Earth Syst
Environ 1(3):1–8
17. Samantaray S, Biswakalyani C, Singh DK, Sahoo A, Prakash Satapathy D (2022) Prediction
of groundwater fluctuation based on hybrid ANFIS-GWO approach in arid Watershed, India.
Soft Comput 26(11):5251–5273. Springer, Berlin Heidelberg
18. Mohanta NR, Panda SK, Singh UK, Sahoo A, Samantaray S (2022) MLP-WOA is a successful
algorithm for estimating sediment load in Kalahandi Gauge Station, India. Proceedings of
international conference on data science and applications. Springer, Singapore, pp 319–329
19. Kisi O, Sanikhani H, Zounemat-Kermani M, Niazi F (2015) Long-term monthly evapotranspi-
ration modeling by several data-driven methods without climatic data. Comput Electron Agric
115:66–77
20. Wang WC, Xu DM, Chau KW, Chen S (2013) Improved annual rainfall-runoff forecasting
using PSO–SVM model based on EEMD. J Hydroinform 15(4):1377–1390
21. Komasi M, Sharghi S (2016) Hybrid wavelet-support vector machine approach for modelling
rainfall–runoff process. J Water Sci Technol 73(8):1937–1953
22. Feng ZK, Niu WJ, Tang ZY, Jiang ZQ, Xu Y, Liu Y, Zhang HR (2020) Monthly runoff time
series prediction by variational mode decomposition and support vector machine based on
quantum-behaved particle swarm optimization. J Hydrol 583:124627
23. Samantaray S, Sahoo A, Ghose DK (2020) Infiltration loss affects toward groundwater fluc-
tuation through CANFIS in arid watershed: a case study. In: Smart intelligent computing and
applications. Springer, Singapore, pp 781–789
24. Anaraki MV, Farzin S, Mousavi SF, Karami H (2021) Uncertainty analysis of climate change
impacts on flood frequency by using hybrid machine learning methods. J Water Resour Manage
35(1):199–223
25. Vaheddoost B, Guan Y, Mohammadi B (2020) Application of hybrid ANN-whale optimization
model in evaluation of the field capacity and the permanent wilting point of the soils. J Environ
Sci Pollut Res 27(12):13131–13141
26. Vapnik V (1995) The nature of statistical learning theory. Springer, New York
27. Mohammadi B, Mehdizadeh S (2020) Modeling daily reference evapotranspiration via a novel
approach based on support vector regression coupled with whale optimization algorithm. J
Agricult Water Manage 237:106145
Ala'M AZ, Faris H, Alqatawna JF, Hassonah MA (2018) Evolving support vector machines
using whale optimization algorithm for spam profiles detection on online social networks in
different lingual contexts. Knowl Based Syst 153:91–104
29. Samantaray S, Ghose DK (2021) Prediction of S12-MKII rainfall simulator experimental runoff
data sets using hybrid PSR-SVM-FFA approaches. J Water Clim Change
30. Samantaray S, Ghose DK (2020) Modelling runoff in an arid watershed through integrated
support vector machine. H2 Open J 3(1):256–275
31. Mirjalili S, Lewis A (2016) The whale optimization algorithm. Adv Eng Softw 95:51–67
Application of Adaptive Neuro-Fuzzy
Inference System and Salp Swarm
Algorithm for Suspended Sediment Load
Prediction

Gopal Krishna Sahoo, Abinash Sahoo, Sandeep Samantara,


Deba Prakash Satapathy, and Suresh Chandra Satapathy

Abstract Due to the sheer importance of suspended sediment load (SSL) in water-
shed management and design of engineering structures and considering the impact
of rainfall, temperature, and runoff parameters in quantifying and understanding
nonlinear interdependence, it has been a crucial task to predict suspended sediment
load based on these parameters. For this purpose of prediction, a soft computing
model (adaptive neuro-fuzzy inference system (ANFIS)) is optimized with Salp
swarm algorithm (SSA), and the results were validated against a well-established
classical ANFIS model. Data from Jaraikela catchment area in Jharkhand with some
part of it in Sundergarh district of Odisha were used in the analysis. The perfor-
mance of the models was evaluated based on MSE and WI performance indicators.
In comparing the results of the models used, it is evident that ANFIS-SSA model
proved its ascendancy over ANFIS.

Keywords Sediment load · ANFIS · ANFIS-SSA · Jaraikela station

1 Introduction

Sediment load is one of the most indispensable hydrological and hydraulic criteria,
which affects efficiency of water diversion projects and hydraulic structures. It can
cause environmental issues like damaging the aquatic ecosystem and reducing the
quality of surface water. Besides that, the reservoir capacity is reduced, and oper-
ational policy (i.e., energy generation, irrigation, and water supply) is affected due

G. K. Sahoo · S. Samantara (B) · D. P. Satapathy


Department of Civil Engineering, OUTR Bhubaneswar, Bhubaneswar, Odisha, India
e-mail: sandeep1139_rs@civil.nits.ac.in
D. P. Satapathy
e-mail: dpsatapathy@cet.edu.in
A. Sahoo
Department of Civil Engineering, NIT Silchar, Silchar, Assam, India
S. C. Satapathy
Department of Computer Science and Engineering, KIIT University, Bhubaneswar, Odisha, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_32

to the transport of sediments. Hence, the evaluation and forecasting of SL in rivers


are critical in water resources management, and accurate prediction of sediment
load will assist in river engineering, watershed management, and reservoir opera-
tions. Various approaches to sediment load prediction have been developed, which
include numerical, empirical, and machine learning models. Several researchers have
utilized soft computing approaches for estimating different hydrological parameters
like sediment load [1–3], flood [4, 5], groundwater level [6–8], etc.
Kisi et al. [9] proved dominance of genetic programming (GP) over ANFIS,
support vector machine (SVM), and artificial neural network (ANN) for lagged
time discharge and SSL of Cumberland River, USA. Upon investigation of ANFIS
membership functions (MF), the triangular MF proved its superiority over other
MFs in ANFIS. Mehdi [10] evaluated the performance of ordinary kriging (OK), co-
kriging (CK), ANN, and ANFIS, using the daily precipitation, streamflow, and SSL
time series for Kojori forest watershed and substantiated the ascendancy of ANFIS
and ANN over OK and CK models. Kaveh et al. [11] illustrated the supremacy of
the Levenberg–Marquardt learning algorithm with ANFIS over BP algorithm for
time series modeling of SSL of River Schuylkill in the USA. Madvar and Seifei [12]
validated the application of ANFIS over rivers, using the Monte Carlo uncertainty
analysis.
Although conventional soft computing models have good potential to estimate
SSL, optimization of these algorithms is necessary for obtaining more precise
outcomes [13]. It is observed that machine learning methods combined with optimization
algorithms work better than stand-alone machine learning models [5–7, 14]. Samet
et al. [15] performed evaluative analysis for the Maku dam watershed in Iran and
proved the supremacy of ANFIS over ANN and genetic algorithm. It is observed
that ANFIS with the “gauss” MF outperformed other MFs. Kisi and Yaseen [16]
analogized embedded ANFIS-GP (grid partition), ANFIS-FCM (fuzzy c-means),
and ANFIS-SC (subtractive clustering) models for SSL prediction of Eel river, Cali-
fornia, USA. The results proved the superiority of evolutionary fuzzy (EF) over
other models. Yaseen et al. [17] assessed the potential of various models like SSA-
ELM (extreme learning machine), ELM, SVM, RF, and GRNN for monthly river
flow forecasting of Tigris river, Iraq. They established superiority of the SSA-ELM
model compared to other artificial intelligence models. Moayedi et al. [18] synthe-
sized SSA, elephant herding optimization (EHO), shuffled frog leaping algorithm
(SFLA), and wind-driven optimization (WDO) with MLP neural network for fore-
casting shear strength of the soil. Results of his study ratified the eminence of
the SSA-MLP model over other neural-metaheuristic ensembles. Malik et al. [19]
authenticated the robustness of SVM optimized with SSA (SVM-SSA) over SVR-MVO
(Multi-Verse Optimizer), SVM-WOA, SVM-PSO, SVM-SHO (Spotted hyena opti-
mizer), and Penman model for estimating daily pan-evaporation at Hisar, Bathinda,
and Ludhiana meteorological sites. Babanezhad et al. [20] compared performance
of different MFs of ANFIS with the ant colony optimisation-based FIS (ACOFIS)
for simulating suspended sediment load dataset of Cumberland River, USA. They
deduced that ANFIS technique performed best with tri-membership function.
Darabi et al. [21] used parameters like discharge, rainfall, and sediment for the

prediction of SSL for Talar basin, Iran, and Eagle Creek basin, USA. They compared
stand-alone ANFIS with other neural network models and also the hybridization
of ANFIS, integrated with four optimization algorithms, i.e., particle swarm opti-
mization (PSO), sine–cosine algorithm (SCA), bat algorithm (BA), and firefly algo-
rithm (FA). Results specified that ANFIS-SCA outperformed other applied models.
The goal of the current study is to integrate stand-alone ANFIS model with SSA
for creating a robust model for SSL prediction. For this purpose, recorded data
(2001–2020) of the Jaraikela catchment, India, were used for predicting monthly SSL
outcomes.

2 Study Area

Jaraikela catchment area spreads from 21° 50' N to 23° 36' N latitude and 84° 29' E to
85° 49' E longitude. This catchment area extends around the River Koel, a tributary of
River Brahmani, that originates close to Palamu Tiger Reserve, Jharkhand. Elevation
of watershed area stretches from 185 m at Jaraikela gauge station to 640 m in the
upper portion of the watershed. Total drainage area of Jaraikela catchment is almost
9160 km2 . A major part of this catchment lies in Jharkhand state and some part of
the catchment covers the Sundergarh district of Odisha. About 80% of rainfall in
this catchment occurs during the monsoon season. The catchment’s topography is
undulating and flat, shielded with deep forest and cultivable lands, and the climate
is categorized as sub-humid (Fig. 1).

3 Methodology

3.1 Adaptive Neuro-Fuzzy Inference System

The fuzzy logic (FL) method is designed to handle imprecise linguistic expressions
rather than numerical ambiguity, and ANFIS possesses learning, construction, and
classification capabilities. A FIS is a rule-based system comprising three conceptual
constituents: a rule base containing fuzzy if-then rules, a database describing the
MFs, and an inference mechanism that combines the fuzzy rules and produces the system
outcomes. FL modeling in the initial stage consists of determining the MFs of the
input-output variables; in the next stage, it constructs the fuzzy rules; lastly, it
determines the output features, output MF, and system outcomes. In ANFIS, a
multi-layered feed-forward network uses artificial neural network learning algorithms
and fuzzy reasoning to map the input space to the output space.

Fig. 1 Study area showing Jaraikela catchment

Two rules can be expressed for a typical first-order Sugeno inference system:

Rule 1: IF x is A1 AND y is B1 THEN f1 = p1 · x + q1 · y + r1    (1)

Rule 2: IF x is A2 AND y is B2 THEN f2 = p2 · x + q2 · y + r2    (2)



where x and y are the crisp inputs to node i, Ai and Bi are linguistic labels
characterized by appropriate MFs, and pi, qi, and ri are the consequent parameters
(i = 1 or 2).
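The two rules above can be sketched as a weighted-average Sugeno inference; the Gaussian MFs and the consequent parameters (p, q, r) below are illustrative choices, not fitted values from the paper.

```python
import math

def gauss_mf(x, c, sigma):
    # Gaussian membership function centred at c with width sigma
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def sugeno_two_rules(x, y):
    # Firing strengths: product of the antecedent memberships of each rule
    w1 = gauss_mf(x, 0.0, 1.0) * gauss_mf(y, 0.0, 1.0)
    w2 = gauss_mf(x, 1.0, 1.0) * gauss_mf(y, 1.0, 1.0)
    # First-order consequents, Eqs. (1)-(2): f = p*x + q*y + r
    f1 = 1.0 * x + 2.0 * y + 0.5
    f2 = 0.5 * x - 1.0 * y + 1.0
    # Weighted-average defuzzification
    return (w1 * f1 + w2 * f2) / (w1 + w2)

out = sugeno_two_rules(0.5, 0.5)
```

Training an ANFIS amounts to adjusting the MF parameters (here the Gaussian centres and widths) and the consequent parameters (p, q, r); the optimizer described next takes over that role in the hybrid model.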

3.2 SSA

For our study, we devised an integrated SSA-ANFIS approach in which the learning
parameters of the ANFIS controller were enhanced using SSA for prediction of sediment
load (Fig. 2). Salps, belonging to the family Salpidae, are barrel-shaped gelatinous
organisms that move by pumping water through their body. Salps live in colonies and
move in chains consisting of a leader and follower salps, with the leader salp
positioned at the front of the chain. SSA possesses balanced exploitation and
exploration capabilities for reaching an optimum solution while avoiding traps in
local optima, similar to other optimization algorithms.
The core parameter D1 is described by the following mathematical expression:

D1 = 2e^(−(4t/Tmax)^2)    (3)

where D1 balances exploration and exploitation in SSA, t is the current iteration, and Tmax is the maximum number of iterations. In addition, the location of the follower salps is given by the following equation:

Y_j^i = (Y_j^i + Y_j^(i−1)) / 2    (4)

where Y_j^i denotes the position of the ith salp in the jth dimension. The SSA approach is applied to the fitness function given by the following equation:

λ = Min(LP) (5)

where LP is the learning parameter of ANFIS tuned by SSA; after each iteration, the λ value is updated and checked for convergence to an accurate value. The best number of iterations can be decided at the end of the run. The objective function of SSA is defined by the following equation:
F = √[(1/N) Σ_{1}^{N} (OB(R = Pa, Ia, Da))^2]    (6)
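A minimal sketch of one SSA iteration ties Eqs. (3) and (4) together. The leader-salp update around the food source, with random coefficients c2 and c3, follows the standard SSA formulation, which the paper does not spell out; averaging each follower with the already-updated salp ahead of it is one common variant:

```python
import math
import random

def salp_swarm_step(positions, food, lb, ub, t, t_max):
    """One SSA iteration: move the leader around the food source and
    let each follower average with the salp ahead of it (Eq. 4)."""
    c1 = 2 * math.exp(-(4 * t / t_max) ** 2)    # Eq. (3): exploration/exploitation balance
    dim = len(food)
    new = [row[:] for row in positions]
    for j in range(dim):                        # leader salp (index 0)
        c2, c3 = random.random(), random.random()
        step = c1 * ((ub[j] - lb[j]) * c2 + lb[j])
        new[0][j] = food[j] + step if c3 >= 0.5 else food[j] - step
        new[0][j] = min(max(new[0][j], lb[j]), ub[j])   # keep inside the bounds
    for i in range(1, len(positions)):          # follower salps, Eq. (4)
        for j in range(dim):
            new[i][j] = (positions[i][j] + new[i - 1][j]) / 2
    return new
```

As t approaches t_max, c1 shrinks, so the leader stays close to the best solution found so far and the chain contracts around it.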

Fig. 2 Flowchart depicting working procedure of ANFIS-SSA model

4 Results and Discussions

The ANFIS and ANFIS-SSA models were utilized to model the SSL time series for the considered input combinations. The performance of the proposed models is given in Table 1. MSE and WI measures were employed to evaluate model efficiency for all scenarios. Based on Table 1, WI and MSE for the ANFIS models range from 0.93816 to 0.96883 and from 21.4963 to 14.908, respectively. For the ANFIS-SSA models, these parameters are in the ranges 0.97336 to 0.99489 and 9.1998 to 1.0132, respectively.
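For reference, the two measures reported in Table 1 can be computed as below; the Willmott index formula used here is the standard one, which the paper does not state explicitly:

```python
def mse(obs, pred):
    # Mean squared error between observed and predicted series
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)

def willmott_index(obs, pred):
    # WI = 1 - sum((p - o)^2) / sum((|p - mean_o| + |o - mean_o|)^2)
    mean_o = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in zip(obs, pred))
    return 1 - num / den
```

A perfect model gives MSE = 0 and WI = 1, which is why lower MSE and higher WI in Table 1 both indicate better agreement.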
Predicted and observed cumulative SSL during the testing phase are illustrated as scatter plots in Fig. 3. Predictions by ANFIS-SSA were closer to the 45° straight line than those of the stand-alone ANFIS model in the Jaraikela catchment. As seen from Fig. 4, the ANFIS-SSA models give better SSL predictions than the stand-alone ANFIS models. The ANFIS model consistently underestimated peak SSL, whereas the ANFIS-SSA model provided results consistent with peak SSL occurrences in the Jaraikela catchment. The magnitudes of the ANFIS-SSA models' low, medium, and high SSL predictions were nearer to the observed values. The cumulative SSL estimated by the optimal ANFIS-SSA model was in good agreement with the collected data. Overall, the ANFIS-SSA model showed superior performance to the ANFIS model.

Table 1 Performance assessment of applied models


Algorithm Model name MSE (training) WI (training) MSE (testing) WI (testing)
ANFIS Pt 17.1093 0.95987 21.4963 0.93816
Pt , Pt−1 16.698 0.9604 20.88 0.9394
Pt , Pt−1 , Pt−2 16.39 0.96195 20.4892 0.94007
Pt , Pt−1 , Pt−2 , Rt 16.0025 0.96302 20.183 0.94105
Pt , Pt−1 , Pt−2 , Rt , Rt−1 15.6746 0.96441 19.6377 0.94288
Pt , Pt−1 , Pt−2 , Rt , Rt−1 , Rt−2 15.2289 0.9669 19.21 0.9432
Pt , Pt−1 , Pt−2 , Rt , Rt−1 , Rt−2 , T t 14.908 0.96883 18.8896 0.94581
ANFIS-SSA Pt 4.0781 0.98844 9.1998 0.97336
Pt , Pt−1 3.6982 0.98921 8.683 0.9749
Pt , Pt−1 , Pt−2 3.0183 0.99005 8.0187 0.97504
Pt , Pt−1 , Pt−2 , Rt 2.4388 0.9917 7.8832 0.97791
Pt , Pt−1 , Pt−2 , Rt , Rt−1 1.894 0.99206 7.157 0.9798
Pt , Pt−1 , Pt−2 , Rt , Rt−1 , Rt−2 1.07 0.993 6.8829 0.98002
Pt , Pt−1 , Pt−2 , Rt , Rt−1 , Rt−2 , T t 1.0132 0.99489 5.62 0.98055

Fig. 3 Scatter plot graphical representation between observed and prediction models

5 Conclusion

In this research, the integrated SSA-ANFIS approach is evaluated using data from the Jaraikela catchment area of the Koel river. The learning parameters of the ANFIS controller were enhanced using the salp swarm algorithm. The results were then assessed against those of classical ANFIS using rainfall, runoff, and temperature parameters for suspended sediment load prediction. The performance indices indicated

Fig. 4 Observed versus predicted SSL by ANFIS and ANFIS-SSA models

that the ANFIS-SSA model outperformed the ANFIS model, with MSE = 1.0132, WI = 0.99489 and MSE = 14.908, WI = 0.96883, respectively. Hence, this study revealed the potential of the ANFIS-SSA model in predicting suspended sediment load. The outcome of this research can be expanded further by including more hydrological and climatological data in the proposed model and by trying different input combinations for various time series data.

Maturity Status Estimation of Banana
Using Image Deep Feature and Parallel
Feature Fusion
Ashoka Kumar Ratha, Prabira Kumar Sethy, Nalini Kanta Barpanda,
and Santi Kumari Behera

Abstract After mango, the banana (Musa sp.) is India's second most important fruit crop. Bananas are also among the most important fruits in worldwide trade and the most widely consumed, ranking second only to citrus in terms of value. The size, colour, and ripeness of the fruits are the primary grading factors. Bananas are graded by maturity into four stages: green, yellowish-green, mid-ripen, and over-ripen. Here, the maturity label of a banana is estimated using the deep features of VGG16 and texture features with an SVM classifier. Classification performance is measured individually for the deep features and the texture features, and also for both combined using parallel fusion. The accuracy and AUC using the deep feature, the texture feature, and both (using parallel feature fusion) are 92.34% and 0.99, 89.99% and 0.97, and 99.87% and 1, respectively.

Keywords Banana grading · Deep feature · Parallel feature fusion · VGG16 ·


Texture feature

1 Introduction

Deep learning has been hailed as the cutting-edge technology in computer vision
techniques for image classification in the age of computerization. The quality of
fresh banana fruit is the most significant source of concern for purchasers and fruit
processors, and ripeness is the most important element in determining the fruit’s
storage life. The productivity of a banana’s development stage and the speed with
which it can be classified is the most conclusive variables influencing its quality

A. K. Ratha · P. K. Sethy (B) · N. K. Barpanda


Department of Electronics, Sambalpur University, Burla, Odisha 768019, India
e-mail: prabirsethy.05@gmail.com
N. K. Barpanda
e-mail: nkbarpanda@suniv.ac.in
S. K. Behera
Department of Computer Science and Engineering, VSSUT Burla, Burla, Odisha 768018, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 349
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_33
350 A. K. Ratha et al.

[1]. Fruit ripeness can be determined by a variety of characteristics, the most important of which is skin colour [2]. Most of the time, human specialists use optical identification to determine the maturity stage of the fruit, which is prone to inaccuracy. Consequently, it is important to apply image processing techniques to determine the ripening stage of freshly arriving banana bundles at the appropriate time. The amount of ripeness in a banana has
the greatest impact on the eating quality and market price of the banana fruit [3]. In
this research, a computer vision model is proposed to automatically detect the ripening stages of bananas, which are outlined at four different stages of maturity in this article. The deep feature, the texture feature, and the SVM are all used in this task. A data set is prepared consisting of images of 104 green bananas, 48 yellowish-green bananas, 88 mid-ripen bananas, and 32 over-ripen bananas. The proposed system categorises bananas according to their maturity level, which makes them more easily marketable.
There have been numerous reports on the use of image processing and machine learning techniques for the classification of bananas. Mendoza and co-workers used image analysis techniques to categorise ripened bananas into seven ripening classes. Forty-nine banana samples were classified into their seven ripening stages with an accuracy of 98% using the L*, a*, and b* bands, brown area percentage, and contrast. Several chemical parameters, including Brix and pH, were used to verify the findings [4]. Prabha
et al. proposed an image processing technique that may be used to precisely detect the
maturity stage of fresh banana fruit based on the colour and size values of the images
they took. A total of 120 images were used, with 40 images from each stage of devel-
opment, such as under-mature, mature, and over-mature. The accuracy was 85% [5].
Diez et al. used hyperspectral imaging techniques in the visible and near-infrared
(400–1000 nm) wavelength ranges to investigate the ripening stages of banana fruits
over their storage time in a ripening chamber (12 °C and 80–90% relative humidity)
[6]. Two batches of bananas, containing seven and fourteen bananas, were observed.
It was possible to discern between the spectral patterns associated with the various
ripening stages. The most significant changes in the relative reflectance spectra occur
around 680 nm, which is the wavelength at which an absorption band of chlorophyll is
centred. Principal component analysis applied to a calibration set of spectra revealed
statistically significant differences between the seven maturity classes based on the
scores of the first principal component (94.6% of the explained variance). Mesa et al.
proposed a deep learning model utilising morphological features and hyperspectral
imaging for grading bananas into three categories based on their quality [7]. This
method took into account both the external and internal characteristics of the banana
and achieved an accuracy of 98.45%. Mohapatra et al. [8] created a quick and non-destructive method for measuring the ripening stage of bananas using their dielectric properties. Olaniyi et al. developed an automated method for distinguishing between healthy and unhealthy bananas using the GLCM texture feature and SVM [9]; the approach obtained an accuracy of 100%. An autonomous computer vision system for identifying the ripening
phases of bananas was created by Mazen and colleagues. First, a four-class handcrafted database is constructed. Second, a framework based on artificial
Maturity Status Estimation of Banana Using Image … 351

neural networks is used to categorise and grade the ripening stage of banana fruits.
The system takes into account colour, the development of brown spots, and Tamura
statistical texture data. According to [10], this approach had a 97.45% of accuracy.
According to the state of the art, the maximum accuracy reached for banana grading into three maturity classes is 85% when image processing and machine learning are used in conjunction. As a result, it is necessary to improve the accuracy of the system while including more banana maturity classes.

2 Methodology

The methodology comprises two phases. In the first phase, the performance of two classification models, namely VGG16 plus SVM and GLCM texture features plus SVM, is evaluated for classifying bananas into four classes as per their maturity level. In the second phase, parallel feature fusion is adopted. In our previous work [11] on recognising 40 kinds of fruits, it was observed that the deep learning approach of VGG16 plus SVM and the machine learning approach of GLCM features plus SVM outperformed the other classification models. Hence, we evaluated these two models individually and obtained satisfactory results. To further enhance performance, parallel feature fusion is adopted: the deep features of VGG16 (extracted from the fc8 layer) and the 13 GLCM features are fused in a parallel fashion, so a total of 1013 features (1000 deep features of VGG16 plus 13 GLCM texture features) are fed to the SVM for classification. The detailed flow of the methodology is depicted in Fig. 2.
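The parallel fusion step amounts to column-wise concatenation of the two per-image feature blocks. The per-block min-max scaling in this sketch is an assumption, since the paper does not say whether features are normalised before the SVM:

```python
import numpy as np

def parallel_fuse(deep_feats, glcm_feats):
    """Fuse an (n_images, 1000) block of VGG16 fc8 deep features with an
    (n_images, 13) block of GLCM texture features into (n_images, 1013)."""
    def minmax(block):
        lo, hi = block.min(axis=0), block.max(axis=0)
        rng = hi - lo
        rng[rng == 0] = 1.0          # guard against constant columns
        return (block - lo) / rng
    # Scale each block so neither dominates, then concatenate column-wise
    return np.hstack([minmax(deep_feats), minmax(glcm_feats)])
```

The fused matrix is then passed to an SVM (e.g. scikit-learn's SVC) exactly as a single 1013-dimensional feature set.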
Images of the four stages of bananas are collected, and the data set is then enhanced by introducing different flipping and rotating operations. The distribution of the original and enhanced data sets is detailed in Table 1. Banana samples for the four ripening stages are illustrated in Fig. 1. The data set is enhanced by executing horizontal right-flip, horizontal left-flip, rotate-right-90°, and rotate-left-90° operations, so the data set is increased five-fold (Fig. 2).
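The five-fold enhancement can be sketched on image arrays. Because two distinct "horizontal" flips would coincide, this sketch substitutes a vertical flip for one of them; that substitution is an assumption about what the paper's two flips mean:

```python
import numpy as np

def augment_five_fold(img):
    """Return the original image plus four variants, growing the set five-fold."""
    return [
        img,                     # original
        np.fliplr(img),          # horizontal (left-right) mirror flip
        np.flipud(img),          # vertical flip, stand-in for the second flip
        np.rot90(img, k=-1),     # rotate right 90 degrees
        np.rot90(img, k=1),      # rotate left 90 degrees
    ]
```

Applied to the 272 original images, this yields the 1360 enhanced images summed over the rows of Table 1.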


Fig. 1 Samples of banana. a Green banana b yellowish-green banana c mid-ripen banana d over-
ripen banana

Table 1 Detail of banana images

Banana with its stage Original images Enhanced images
Green 104 520
Yellowish-green 48 240
Mid-ripen 88 440
Over-ripen 32 160

Fig. 2 Overall process flow chart of banana grading

3 Result and Discussion

Initially, the classification models, i.e. VGG16 plus SVM and GLCM plus SVM, are evaluated. The models are executed on a Windows 10, 5th-generation Core i5, 8 GB RAM laptop with an in-built NVIDIA GeForce GPU, on the MATLAB 2020a platform. The
deep feature of VGG16 with SVM resulted in an accuracy of 92.34% and an AUC
of 0.99. Further, the SVM with the GLCM texture feature resulted in an accuracy
of 89.99% and an AUC of 0.97. With the adoption of parallel feature fusion, the SVM achieved an accuracy of 99.87% and an AUC of 1. This experimentation revealed that performance increased significantly with the parallel feature fusion technique. Hence, the deep features of VGG16 combined with the GLCM texture features and SVM form the best classification model for grading bananas into four maturity levels.

4 Conclusion

An automatic system for grading bananas according to their maturity level is important for the stock and export markets. Here, three classification models are evaluated: a deep learning approach, a machine learning approach, and a hybrid approach. The deep learning approach uses the deep features of VGG16 with SVM, and the machine learning approach uses the GLCM texture features with SVM. These two models were chosen from their respective strategies based on our expert knowledge and prior results for fruit recognition. Both the deep learning and machine learning approaches are satisfactory for grading bananas into four maturity levels. The deep learning approach, i.e. deep features of VGG16 plus SVM, achieved an accuracy of 92.34% and an AUC of 0.99, while the machine learning approach, i.e. GLCM texture features plus SVM, attained an accuracy of 89.99% and an AUC of 0.97. With the adoption of the parallel feature fusion technique, performance increased significantly, to an accuracy of 99.87% and an AUC of 1. This automatic approach is helpful for grading ripened bananas.

References

1. http://nhb.gov.in/report_files/banana/BANANA.html
2. Prasad K, Jacob S, Siddiqui MW (2018) Fruit maturity, harvesting, and quality standards. In:
Preharvest modulation of postharvest fruit and vegetable quality. Academic Press, pp 41–69
3. Maduwanthi SDT, Marapana RAUJ (2019) Induced ripening agents and their effect on fruit quality of banana. Int J Food Sci 2019, Article ID 2520179. https://doi.org/10.1155/2019/2520179
4. Mendoza F, Aguilera JM (2004) Application of image analysis for classification of ripening
bananas. J Food Sci 69(9):E471–E477
5. Prabha DS, Satheesh Kumar J (2015) Assessment of banana fruit maturity by image processing
technique. J Food Sci Technol 52(3):1316–1327
6. Diez B et al (2016) Grading banana by VNIR hyperspectral imaging spectroscopy. In: VIII
international postharvest symposium: enhancing supply chain and consumer benefits-ethical
and technological issues, pp 1194
7. Mesa AR, Chiang JY (2021) Multi-input deep learning model with RGB and hyperspectral
imaging for banana grading. Agriculture 11(8):687
8. Mohapatra A, Shanmugasundaram S, Malmathanraj R (2017) Grading of ripening stages of red
banana using dielectric properties changes and image processing approach. Comput Electron
Agric 143:100–110
9. Olaniyi E et al (2017) Automatic system for grading banana using GLCM texture feature
extraction and neural network arbitrations. J Food Process Eng 40(6):e12575
10. Mazen F, Nashat A (2019) Ripeness classification of bananas using an artificial neural network.
Arab J Sci Eng 44(8):6901–6910
11. Behera SK, Rath A, Sethy PK (2020) Fruit recognition using support vector machine based on deep features. Karbala Int J Mod Sci 6(2), Article 16. https://doi.org/10.33640/2405-609X.1675
Application of a Combined GRNN-FOA
Model for Monthly Rainfall Forecasting
in Northern Odisha, India

Deba Prakash Satapathy, Harapriya Swain, Abinash Sahoo,


Sandeep Samantaray, and Suresh Chandra Satapathy

Abstract Rainfall is considered the most complex variable in the hydrological cycle, and its cause-impact relationship often cannot be articulated in simple or complex mathematical terms. Because of climate change, the varying amount of rain can lead to either surplus or scarcity in reservoirs. This research introduces a novel hybrid model, a generalised regression neural network integrated with the fruit fly optimisation algorithm (GRNN-FOA), to forecast monthly rainfall. Rainfall data were collected from a local meteorological station from 1971 to 2020 and utilised in this study to assess model performance. The performance of each approach is assessed utilising root mean squared error (RMSE), Nash-Sutcliffe efficiency (NSE), and Willmott index (WI). Results specify that the hybrid GRNN-FOA model is consistent and accurate in estimating the risk level of significant rainfall events. Our proposed robust model shows improved performance over conventional techniques, providing a new direction in the area of rainfall prediction. This artificial intelligence-based study would also help in quickly and accurately predicting monthly rainfall.

Keywords GRNN · GRNN-FOA · Rainfall forecasting · Keonjhar station

1 Introduction

The most significant meteorological event with a substantial impact on human life
is rainfall. It is also considered one of the essential constituents of feasible planning
and design. Hence, proper understanding of rainfall-runoff process is significant in

D. P. Satapathy · H. Swain · S. Samantaray (B)


Department of Civil Engineering, OUTR Bhubaneswar, Bhubaneswar, Odisha, India
e-mail: sandeep1139_rs@civil.nits.ac.in
D. P. Satapathy
e-mail: dpsatapathy@cet.edu.in
A. Sahoo
Department of Civil Engineering, NIT Silchar, Silchar, Assam, India
S. C. Satapathy
Department of Computer Science and Engineering, KIIT University, Bhubaneswar, Odisha, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 355
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_34
356 D. P. Satapathy et al.

water-sensitive urban designing approaches [5, 17, 21, 22]. As a result, understanding
and modelling rainfall have turned out to be essential in solving many water engi-
neering and flood problems and maintaining a stable agro-economic system fulfilling
necessities of sustainable growth [4, 7, 15, 21, 22, 24, 25]. Then again, inadequate
rain for an extended period causes droughts. Hence, rainfall prediction is essential
for protecting and improving the human lives, aquatic environment, and water usage
[3]. Recently, AI-based methods have become popular and are being broadly utilised
for forecasting/prediction purposes in different areas of science and engineering [1,
9, 11, 18–20]. These techniques are generalised data-driven methodologies that can
model linear and non-linear systems.
Nagahamulla et al. [12] investigated applicability of combined multilayer feed-
forward networks (MLFNs) with backpropagation (BP) algorithm, radial basis func-
tion network (RBFN), and GRNN for forecasting precipitation in Colombo, Sri
Lanka. Outcomes revealed that performance of integrated model is superior to perfor-
mances of other models. In another study, Chen et al. [2] applied MLFN, RBFN, and
GRNN to predict streamflow of River Jinsha, China. Lu et al. [8] employed GRNN,
support vector machine (SVM), and an autoregressive model to forecast monthly
rainfall. Their findings revealed that performance of both SVM and GRNN models
was better. Modaresi et al. [10] assessed performance of artificial neural network
(ANN), GRNN, least square-SVM, and K-nearest neighbour (KNN) for monthly
inflow forecasting to Karkheh dam, Iran, in different environments. Sanikhani et al.
[23] applied GRNN, multivariate adaptive regression splines (MARS), random forest
(RF), and extreme learning machines (ELMs) to estimate air temperature deprived
of climate-based inputs. They found that GRNN model was capable of estimating
temperature without climate-based inputs. Kamel et al. [6] employed GRNN and
RBFN to predict sub-surface evaporation rate considering wind speed, temperature,
water depth, and humidity as input parameters. Results showed that neural network
models have the potential for accurate prediction of evaporation rate.
Despite their expected flexibility, recent investigations have revealed that stand-alone AI techniques are not adequate for forecasting rainfall at longer time scales, predominantly in semi-arid and arid areas where rainfall time series are highly intermittent. Niu et al. [13] proposed GRNN-FOA for improving the stability
and accuracy of icing prediction on transmission lines. Results indicated that GRNN-
FOA model provided better robustness, generality, and accuracy in icing forecasting.
Ruiming and Shijie [14] developed a reference evapotranspiration (ET0 ) prediction
model for daily ET0 prediction of Tieguanyin on the basis of integration of GRNN
and mathematical morphology clustering (MMC). FOA was utilised for optimising
GRNN’s smoothing factor. Predictions of different seasons under multifaceted mete-
orological conditions showed that projected model is effective with higher precision
and has better flexibility. Salehi et al. [16] aimed at forecasting and optimising paclitaxel biosynthesis and growth utilising a GRNN-FOA data mining approach. Results revealed that the GRNN-FOA model produced better forecasting outputs than a multilayer perceptron-genetic algorithm (MLP-GA).
Application of a Combined GRNN-FOA Model for Monthly … 357

2 Study Area

Keonjhar District lies between 21° 1' N to 22° 10' N latitudes and 85° 11' E to
86° 22' E longitudes and covers a geographical area of 8303 km2 (Fig. 1). This
region is structurally and geologically complicated and is characterised by diverse
geomorphological set-up, leading to broadly varying hydrogeologic conditions. Because of its tropical humid climate, Keonjhar receives moderate to heavy rain from the southwest monsoon between June and September and a little from the northeast monsoon between December and January. Average annual precipitation varies between 150 and 200 cm, and mean annual temperature varies between 22 and 27 °C.

Fig. 1 Location of Keonjhar district, Odisha



3 Methodology

3.1 GRNN

GRNN has robust non-linear mapping capability and is suitable for solving both linear and non-linear regression problems. GRNN converges quickly to good outcomes for both small and large sample datasets. In mathematical terms, the solution of GRNN can be expressed as
E[Y | X] = ∫_{−∞}^{+∞} Y f(X, Y) dY / ∫_{−∞}^{+∞} f(X, Y) dY    (1)

where X is the input vector, Y is the predicted output of GRNN, E[Y | X] is the expected value of the output Y given the input X, and f(X, Y) is the joint probability density function of X and Y. The GRNN
algorithm's architecture includes an input layer, hidden layers, and an output layer. The first hidden layer is a pattern layer (an RBF layer) with a Gaussian function, whereas the second is a summation layer with a linear function. Even though GRNN is considered a straightforward and quick predictor, its usability is limited to regression models. It also lacks the ability to extrapolate. Moreover, GRNN is closely related to kernel methods and is negatively affected by issues associated with dimensionality. GRNN cannot ignore inappropriate inputs without significant adjustments to its elementary algorithm.
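Substituting a Gaussian kernel density estimate for f in Eq. (1) reduces GRNN to Nadaraya-Watson kernel regression. The one-dimensional sketch below is illustrative only, not the authors' implementation; σ is the smoothing factor that FOA will later tune:

```python
import math

def grnn_predict(x, train_x, train_y, sigma):
    # Pattern layer: Gaussian kernel weight for each training sample
    weights = [math.exp(-((x - xi) ** 2) / (2 * sigma ** 2)) for xi in train_x]
    # Summation layer: weighted average of the training targets, as in Eq. (1)
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)
```

With a small σ the prediction follows the nearest training sample; with a very large σ it flattens towards the mean of the targets, which is why choosing σ well matters.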

3.2 FOA

Pan (2012) proposed FOA, an optimisation algorithm that mimics the foraging behaviour of fruit flies. FOA simulates the procedure fruit flies use to find food by exploiting their keen senses of vision and smell. FOA applies an iterative space search to find solutions (Cao and Wu 2016). Its computations are simple and minimal, its convergence rate is quick, and it is generally easy to implement (Li et al. 2020; Cao and Wu 2016). FOA has therefore been the focus of many investigations in the optimisation domain (Mao et al. 2014). In addition, FOA can overcome the difficulty of finding the optimum GRNN smoothing factor σ faced by the prevailing GRNN method, thus enhancing prediction accuracy. Figure 2 shows the flowchart of the FOA optimisation procedure for GRNN, with detailed steps.

Fig. 2 Flowchart of GRNN-FOA model



The steps to implement FOA can be summarised as follows:

Step 1: Initialise the optimisation problem and the algorithm's constraints.
Step 2: Iterate until the stopping conditions are met. First, randomly choose a location through the distance and smell-concentration decision value. Second, compute its fitness function (Si). Third, find the fruit fly with the maximum smell concentration amongst the swarm. Lastly, rank the solutions and move towards the best one.
Step 3: Post-process and visualise the outcomes.
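The steps above can be sketched as a simplified scalar FOA that tunes the GRNN smoothing factor σ. The random-step search below abstracts away the distance and smell-concentration (S = 1/Dist) bookkeeping of the full algorithm, so it is a hedged illustration rather than the authors' implementation:

```python
import random

def foa_tune_sigma(fitness, sigma0=1.0, n_flies=20, t_max=50):
    """Minimise fitness(sigma), e.g. the validation RMSE of a GRNN, with FOA.
    Each fly takes a random step around the swarm location; the swarm then
    flies towards the best-smelling (lowest-fitness) fly found so far."""
    best_sigma, best_fit = sigma0, fitness(sigma0)
    loc = sigma0
    for _ in range(t_max):                                  # Step 2: iterate
        for _ in range(n_flies):
            cand = abs(loc + random.uniform(-0.5, 0.5)) + 1e-6  # keep sigma > 0
            f = fitness(cand)                               # smell concentration
            if f < best_fit:
                best_sigma, best_fit = cand, f
        loc = best_sigma                                    # swarm moves to best fly
    return best_sigma, best_fit                             # Step 3: report optimum
```

For instance, foa_tune_sigma(lambda s: (s - 2.0) ** 2) converges close to σ = 2; in the hybrid model the quadratic stand-in is replaced by the GRNN's prediction error.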

4 Results and Discussion

This section describes the rainfall modelling results of the GRNN and GRNN-FOA models under different scenario conditions. Performance indicators NSE, RMSE, and WI are employed for evaluating model efficacy. The results reveal that GRNN-FOA performs better than standard GRNN, with NSE, RMSE, and WI values of 0.9964, 1.39, and 0.9937 during the training phase, against corresponding GRNN values of 0.9547, 10.3398, and 0.9533. The performance of the proposed algorithms for all five scenario conditions is given in Table 1.
Figure 3 shows scatter plots of predicted rainfall at the Keonjhar gauge station, with prominent R2 values of 0.94353 and 0.96875 for GRNN and GRNN-FOA, respectively. Figure 4 compares the predicted (GRNN and GRNN-FOA) and observed rainfall for the Keonjhar gauge station. Box plots for the actual values and the predicted models (GRNN and GRNN-FOA) are given in Fig. 5.
It is observed that GRNN-FOA is superior to GRNN as a prospective approach. Where GRNN alone cannot fully capture the non-linearity in a dataset, GRNN-FOA ends up being
Table 1 Results presented in training and testing phases using GRNN and GRNN-FOA models
Station name Model name NSE (training) RMSE (training) WI (training) NSE (testing) RMSE (testing) WI (testing)
Keonjhar GRNN 1 0.9516 13.5219 0.9486 0.9448 17.0254 0.9426
GRNN 2 0.9521 12.964 0.9502 0.9452 16.934 0.943
GRNN 3 0.9532 12.36 0.951 0.9463 16.128 0.9447
GRNN 4 0.954 11.047 0.9526 0.947 15.4126 0.9451
GRNN 5 0.9547 10.3398 0.9533 0.9487 14.394 0.9462
GRNN-FOA 1 0.9932 4.5872 0.9908 0.9658 8.8746 0.9617
GRNN-FOA 2 0.9943 3.478 0.9916 0.9664 8.103 0.9631
GRNN-FOA 3 0.995 2.9634 0.9924 0.967 7.4469 0.965
GRNN-FOA 4 0.9956 2.1178 0.993 0.9683 6.72 0.9668
GRNN-FOA 5 0.9964 1.39 0.9937 0.9699 5.3301 0.9672

Fig. 3 Scatter plots showing R2 and linearly fitted equation between observed and forecasted
rainfall values

Fig. 4 Observed and forecasted rainfall for a GRNN and b GRNN-FOA models

Fig. 5 Box plots showing rainfall values of the observed data and the prediction models

useful in such situations. In addition, the root mean square error (RMSE) is computed
for both models to assess their performance. The outcomes of the present research
reveal that stand-alone ML models are capable of predicting rainfall with a standard
level of precision, but applying hybrid ML algorithms certainly provides more precise
rainfall predictions.

5 Conclusion

As rainfall time series are non-stationary and non-linear, conventional approaches may
not find the best outcomes. Hence, in this study, the GRNN-FOA model's performance
was evaluated for predicting rainfall in the Keonjhar district of Odisha, India.
The models were validated using R2 and RMSE. The rainfall estimation outcomes
revealed that GRNN-FOA was the most appropriate model, considerably outperforming
the benchmark at the specified study station. Furthermore, we found that GRNN-FOA
performed more stably in the scatter plots than the other models. Utilisation of FOA
as an optimiser for GRNN resulted in a solid system that significantly improves the
prediction accuracy, owing to the optimum parameter values achieved in the hybrid
model. In the future, it may be worthwhile to integrate predictive models with
recently established deep learning techniques, such as the stacked autoencoder
algorithm, to further improve the performance of rainfall predictions.

References

1. Agnihotri A, Sahoo A, Diwakar MK (2021) Flood prediction using hybrid ANFIS-ACO model:
a case study. In: Inventive computation and information technologies: proceedings of ICICIT
2021, p 169
2. Chen L, Singh VP, Guo S, Zhou J, Ye L (2014) Copula entropy coupled with artificial neural
network for rainfall–runoff simulation. Stoch Env Res Risk Assess 28(7):1755–1767
3. Danandeh Mehr A, Nourani V, Karimi Khosrowshahi V, Ghorbani MA (2019) A hybrid support
vector regression-firefly model for monthly rainfall forecasting. Int J Environ Sci Technol
(IJEST) 16(1)
4. Hartmann H, Snow JA, Stein S, Su B, Zhai J, Jiang T, Krysanova V, Kundzewicz ZW (2016)
Predictors of precipitation for improved water resources management in the Tarim River basin:
creating a seasonal forecast model. J Arid Environ 125:31–42
5. Jimmy SR, Sahoo A, Samantaray S, Ghose DK (2021) Prophecy of runoff in a river basin using
various neural networks. In: Communication software and networks. Springer, Singapore, pp
709–718
6. Kamel AH, Afan HA, Sherif M, Ahmed AN, El-Shafie A (2021) RBFNN versus GRNN
modeling approach for sub-surface evaporation rate prediction in arid region. Sustain Comput
Inform Syst 30:100514
7. Kusiak A, Wei X, Verma AP, Roz E (2012) Modeling and prediction of rainfall using radar
reflectivity data: a data-mining approach. IEEE Trans Geosci Remote Sens 51(4):2337–2342
8. Lu W, Chu H, Zhang Z (2015) Application of generalized regression neural network and
support vector regression for monthly rainfall forecasting in western Jilin Province, China. J
Water Supply Res Technol—AQUA 64(1):95–104
9. Moustris KP, Larissi IK, Nastos PT, Paliatsos AG (2011) Precipitation forecast using artificial
neural networks in specific regions of Greece. Water Resour Manage 25(8):1979–1993
10. Modaresi F, Araghinejad S, Ebrahimi K (2018) A comparative assessment of artificial neural
network, generalized regression neural network, least-square support vector regression, and
K-nearest neighbor regression for monthly streamflow forecasting in linear and nonlinear
conditions. Water Resour Manage 32(1):243–258
11. Mohanta NR, Patel N, Beck K, Samantaray S, Sahoo A (2021) Efficiency of river flow prediction
in river using wavelet-CANFIS: a case study. In: Intelligent data engineering and analytics.
Springer, Singapore, pp 435–443
12. Nagahamulla HR, Ratnayake UR, Ratnaweera A (2012) An ensemble of artificial neural
networks in rainfall forecasting. In: International conference on advances in ICT for emerging
regions (ICTer2012). IEEE, pp 176–181
13. Niu D, Wang H, Chen H, Liang Y (2017) The general regression neural network based on
the fruit fly optimization algorithm and the data inconsistency rate for transmission line icing
prediction. Energies 10(12):2066
14. Ruiming F, Shijie S (2020) Daily reference evapotranspiration prediction of Tieguanyin tea
plants based on mathematical morphology clustering and improved generalized regression
neural network. Agric Water Manage 236:106177
15. Sahoo A, Samantaray S, Paul S (2021) Efficacy of ANFIS-GOA technique in flood prediction:
a case study of Mahanadi river basin in India. H2Open J 4(1):137–156
16. Salehi M, Farhadi S, Moieni A, Safaie N, Hesami M (2021) A hybrid model based on general
regression neural network and fruit fly optimization algorithm for forecasting and optimizing
paclitaxel biosynthesis in Corylus avellana cell culture. Plant Methods 17(1):1–13
17. Samantaray S, Sahoo A (2020) Prediction of runoff using BPNN, FFBPNN, CFBPNN
algorithm in arid watershed: a case study. Int J Knowl Based Intell Eng Syst 24(3):243–251
18. Samantaray S, Sahoo A (2021) Modelling response of infiltration loss toward water table depth
using RBFN, RNN, ANFIS techniques. Int J Knowl Based Intell Eng Syst 25(2):227–234
19. Samantaray S, Sahoo A, Ghose DK (2019) Assessment of groundwater potential using
neural network: a case study. In: International conference on intelligent computing and
communication. Springer, Singapore, pp 655–664

20. Samantaray S, Sahoo A, Ghose DK (2020) Prediction of sedimentation in an arid watershed
using BPNN and ANFIS. In: ICT analysis and applications. Springer, Singapore, pp 295–302
21. Samantaray S, Sahoo A, Mohanta NR, Biswal P, Das UK (2021) Runoff prediction using hybrid
neural networks in semi-arid watershed, India: a case study. In Communication software and
networks. Springer, Singapore, pp 729–736
22. Samantaray S, Sahoo A, Agnihotri A (2021) Assessment of flood frequency using statistical and
hybrid neural network method: Mahanadi River Basin, India. J Geol Soc India 97(8):867–880
23. Sanikhani H, Deo RC, Samui P, Kisi O, Mert C, Mirabbasi R, Gavili S, Yaseen ZM (2018)
Survey of different data-intelligent modeling strategies for forecasting air temperature using
geographic information as model predictors. Comput Electron Agric 152:242–260
24. Trinh TA (2018) The impact of climate change on agriculture: findings from households in
Vietnam. Environ Resource Econ 71(4):897–921
25. Wang B, Xiang B, Li J, Webster PJ, Rajeevan MN, Liu J, Ha KJ (2015) Rethinking Indian
monsoon rainfall prediction in the context of recent global warming. Nat Commun 6(1):1–9
Guided Image Filter and SVM-Based
Automated Classification of Microscopy
Images

Vikrant Bhateja, Disha Singh, and Ankit Yadav

Abstract Microscopy images are acquired by capturing the microscopic view of a
blood sample under a microscope using a camera. The image quality is often not
reliable, which makes bacteria classification a tedious task; manual classification
further requires a specialist's recommendation. The guided image filter (GIF) is a
suitable filter for contrast enhancement of such images. Otsu thresholding (OT) is a
suitable algorithm for segmentation of microscopy images. Scale invariant feature
transform (SIFT) is an appropriate method for feature extraction. The support vector
machine (SVM) classifier is well suited to large datasets. In this paper, a
combinative approach using all the aforesaid methods is proposed for the
classification of the bacterial cells in microscopy images. The image quality
assessment (IQA) of the enhanced image is evaluated using parameters like standard
deviation (SD) and entropy. The performance evaluation of the classifier has been
carried out using a confusion matrix.

Keywords GIF · OT · SIFT · SVM · Entropy · SD

1 Introduction

Bacterial species identification is necessary because biological information about
microorganisms is essential in the fields of medicine, veterinary science, farming,
biochemistry, and the food industry. Although many microorganisms are useful in our
daily lives (such as Streptococcus for the fermentation of dairy and vegetable
products), many others cause diseases, which are often infectious. Identifying them
therefore becomes obligatory so that the infection level can be diagnosed and treated
properly [1]. Microscopes are used because microorganisms are not visible to the naked
eye. Microscopy images are termed as the images of the microorganisms captured

V. Bhateja (B) · D. Singh · A. Yadav


Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial College
of Engineering and Management (SRMCEM), Lucknow, Uttar Pradesh 226028, India
e-mail: bhateja.vikrant@gmail.com
Dr. A. P. J. Abdul Kalam Technical University (AKTU), Lucknow, Uttar Pradesh 226031, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 365
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_35
366 V. Bhateja et al.

under a microscope. Microscopy images are acquired using samples (blood, oral
cavity, or urine) from the human body, from which a slide is prepared and placed
under a microscope for capturing an image. Using these images, microorganisms are
detected and classified by the pathologist [2]. Because of poor visibility, low
contrast, acquired noise, etc., the reliability of these images for classification
purposes is questionable. To increase the accuracy of classification, enhancement of
the microscopy images is necessary. To achieve better results, pre-processing
techniques are used, followed by machine learning algorithms for an automated
classification. In the works [3–5], multi-scale retinex (MSR) is used for contrast
enhancement of the image; dynamic range compression and color constancy are obtained
by this method. The GIF [6] is used for enhancement as well as noise filtering in
microscopy images; that work emphasizes the various uses of GIF such as noise
reduction, contrast enhancement, and feathering. In [7], OT has been used for
segmentation of the images. In [8], SIFT feature extraction is used for
classification of the images. In [9], SVM is used for classification of bio-medical
images. A combinative approach using these techniques can be used to develop an
automated classifier for microscopy images. The rest of the paper is organized as
follows: Sect. 2 gives a general overview of GIF, OT, SIFT, and SVM; Sect. 3
discusses the IQA, a summary of the results, the performance metrics, and the
outcomes of this work; Sect. 4 concludes the work.

2 Proposed Automated Classification of Microscopy Images

2.1 Guided Image Filter (GIF)

GIF is a widely used edge-preserving filter, notable for its strong edge-preserving
property and very high efficiency. Given a guidance image and a specified window,
GIF computes the output pixel values using a local linear model. GIF has three
controlling parameters: the regularization parameter, the window size, and the
multiplication factor. The values of these parameters are chosen according to the
image selected as the input image [10]. With suitable values of these parameters,
gradient-reversal artifacts are prevented in GIF, making it more reliable than
conventional bilateral filters. GIF can be used in a large number of applications
such as smoothing, enhancement, feathering, and noise removal. If the guidance image
and all the controlling parameters are selected and used appropriately, GIF can make
the image smoother and more structured [11].
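The local linear model described above can be sketched for a single-channel image as follows. The window radius `r` and regularization value `eps` are illustrative defaults, not the parameters tuned in this work, and `box_filter` is a helper introduced here for the sketch.

```python
import numpy as np

def box_filter(img, r):
    """Mean over a (2r+1) x (2r+1) window via cumulative sums (edge-padded)."""
    padded = np.pad(img, r, mode="edge")
    c = padded.cumsum(0).cumsum(1)
    c = np.pad(c, ((1, 0), (1, 0)))          # prepend zeros for differencing
    k = 2 * r + 1
    s = c[k:, k:] - c[:-k, k:] - c[k:, :-k] + c[:-k, :-k]
    return s / (k * k)

def guided_filter(guide, src, r=4, eps=1e-2):
    """Guided image filter: q = mean(a) * I + mean(b), per pixel."""
    mean_i = box_filter(guide, r)
    mean_p = box_filter(src, r)
    corr_ip = box_filter(guide * src, r)
    corr_ii = box_filter(guide * guide, r)
    cov_ip = corr_ip - mean_i * mean_p       # covariance of guide and source
    var_i = corr_ii - mean_i * mean_i        # variance of the guide
    a = cov_ip / (var_i + eps)               # per-window linear coefficients
    b = mean_p - a * mean_i
    return box_filter(a, r) * guide + box_filter(b, r)
```

When the image itself is used as the guide, small `eps` preserves edges while flat regions are smoothed.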
Guided Image Filter and SVM-Based Automated Classification … 367

2.2 Otsu Thresholding (OT)

Segmentation is applied on the pre-processed image to retain the bacteria present in
the image and remove the unwanted background. The process in which the image
is partitioned into multiple segments, to achieve a more meaningful image which is
easier to analyze, is called segmentation [12]. In this work, OT is used for segmenting
the pre-processed image to extract the meaningful regions of the input image. The
pre-processed images are converted to gray level, followed by binarization. OT works
according to the mathematical Eq. (1) [13]:

p(x, y) = 255 if q(x, y) ≥ t, and p(x, y) = 0 if q(x, y) < t    (1)

where p(x, y) is the pixel value of the result at (x, y), q(x, y) is the pixel value
of the input image, and t is the optimum threshold value.
OT is an automatic region-based segmentation algorithm whose result largely depends
on the threshold value selected. To achieve reliable segmentation, special attention
must be given to selecting an optimal threshold value.
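A minimal sketch of how the optimal threshold t of Eq. (1) can be found by maximising the between-class variance, assuming an 8-bit gray-level image; the helper names are ours, not from the paper.

```python
def otsu_threshold(gray):
    """Return the threshold t maximising between-class variance (Otsu).

    `gray` is any iterable of integer intensities in 0..255.
    """
    hist = [0] * 256
    n = 0
    for v in gray:
        hist[v] += 1
        n += 1
    total_sum = sum(i * h for i, h in enumerate(hist))
    w_bg = sum_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]                 # background weight (intensities <= t)
        if w_bg == 0:
            continue
        w_fg = n - w_bg                 # foreground weight (intensities > t)
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mu_bg = sum_bg / w_bg
        mu_fg = (total_sum - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mu_bg - mu_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarise(gray, t):
    """Binarisation in the spirit of Eq. (1): intensities above t map to 255."""
    return [255 if v > t else 0 for v in gray]
```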

2.3 Scale Invariant Feature Transform (SIFT)

SIFT is used for the extraction of local features, known as the key points, of an
image. The key points extracted from the microscopy images are invariant to rotation
and scale and are used for image matching and image classification by machine
learning algorithms. These SIFT features depend on the appearance of the bacterial
cells in the microscopy image [14] and are insensitive to the illumination level of
the image, minor changes in viewpoint, and noise in the images. The extracted SIFT
key points are saved as a visual vocabulary for the classification of images. This
visual vocabulary is known as the bag of visual words (BoVW) model and is used as a
reference by the classification algorithms.
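Assuming the SIFT descriptors have already been extracted (e.g. with an off-the-shelf implementation, not shown here), the BoVW vocabulary and per-image histograms can be sketched as follows; the tiny k-means routine is illustrative, not the clustering used in this work.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Tiny k-means for building a visual vocabulary from descriptors."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: dist2(p, centres[c]))
            clusters[j].append(p)
        for j, cl in enumerate(clusters):
            if cl:  # keep old centre if a cluster emptied out
                centres[j] = [sum(col) / len(cl) for col in zip(*cl)]
    return centres

def bovw_histogram(descriptors, vocabulary):
    """Map an image's descriptors to a normalised visual-word histogram."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        j = min(range(len(vocabulary)), key=lambda c: dist2(d, vocabulary[c]))
        hist[j] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]
```

Each image is thus represented by a fixed-length histogram over the vocabulary, which is what the classifier consumes.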

2.4 Support Vector Machines (SVMs)

SVM is a supervised machine learning algorithm used for the classification of images.
SVM is suitable in cases where the input dataset used for training is large. SVM
takes the input features and marks the best hyperplane separating the various classes
of training samples [15]. The resulting decision boundary can take different shapes:
it is a single line for two classes of training samples, and it takes the shape of a
'Y' in the case of three classes. In our work, we have used three types of images

Fig. 1 Design methodology of automated classification of bacterial cells

for classification; therefore, the SVM decision boundary is 'Y'-shaped. During the
testing phase, the data points (i.e. the extracted features) are classified into the
category to which they are closest. The proposed design methodology for the automated
classification is shown in Fig. 1.
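A minimal sketch of the underlying idea: a binary linear SVM trained with a hinge-loss sub-gradient method on toy 2-D points. The paper's classifier is a multi-class SVM over BoVW features (a multi-class extension would combine several such binary machines); the learning rate, regularisation, and epoch count below are illustrative assumptions.

```python
import random

def train_linear_svm(xs, ys, eta=0.1, lam=0.001, epochs=100, seed=0):
    """Binary linear SVM trained by stochastic sub-gradient descent on the
    regularised hinge loss. xs: feature vectors; ys: labels in {-1, +1}.
    Returns (weights, bias) of the separating hyperplane."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        order = list(range(len(xs)))
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, xs[i])) + b
            w = [wj * (1.0 - eta * lam) for wj in w]   # L2 shrinkage
            if ys[i] * score < 1:                      # inside the margin
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
                b += eta * ys[i]
    return w, b

def predict(w, b, x):
    """Classify x by the side of the hyperplane it falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```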

3 Results and Discussion

3.1 Performance Evaluation

In the above-discussed methods, contrast enhancement of the colored images is
carried out by GIF. For achieving proper results in the segmentation using OT, the
IQA of the GIF response is of utmost importance. Contrast enhancement of the colored
microscopy images is evaluated through entropy [16] and SD [17] for both input and
output images, as they are absolute values. The higher the values of entropy and SD,
the better the result. The performance evaluation of the classifier is carried out
using the confusion matrix of the SVM, and the achieved confusion matrix is compared
with those of other classifiers.
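The two IQA parameters can be computed from the intensity histogram and pixel values as follows; this is a sketch of the standard definitions, assuming an 8-bit gray-level representation.

```python
import math

def iqa_entropy(gray):
    """Shannon entropy (bits) of the intensity histogram of an image."""
    hist = [0] * 256
    for v in gray:
        hist[v] += 1
    n = float(len(gray))
    return -sum((h / n) * math.log2(h / n) for h in hist if h)

def iqa_sd(gray):
    """Standard deviation of the pixel intensities."""
    n = len(gray)
    mean = sum(gray) / n
    return math.sqrt(sum((v - mean) ** 2 for v in gray) / n)
```

A well-enhanced image spreads its histogram, raising both measures relative to the original.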

3.2 Experimental Results

The above-discussed simulation is performed on a dataset of microscopy images of
bacterial cells taken from the DIBaS dataset [18]. In the initial stage, some of the
images from the dataset are selected as test images, and the GIF is applied on these
test images to assess its performance. After satisfactory tuning of the GIF, the
dataset of bacterial cells is used for contrast enhancement, followed by segmentation
using OT. Features of a particular type of bacterial cell are extracted using SIFT
as discussed in Sect. 2.3. Based on these features, the bacterial cells are classified

Fig. 2 a Original test image#1 (SD = 0.81, entropy = 1.65), b contrast enhancement
using GIF (SD = 3.21, entropy = 9.51), c segmented image, d SIFT feature extraction

using SVM as discussed in Sect. 2.4. Based on this classifier, a confusion matrix is
obtained for evaluating the overall accuracy of the classifier. The responses of the
simulation are shown in Figs. 2 and 3: Fig. 2 depicts the results obtained at each
step of the workflow, while Fig. 3 depicts the confusion matrix obtained for the
classifier. The IQA parameters SD and entropy are used for comparing the responses
for the original image and the GIF output.

3.3 Discussions

Figure 2 depicts a test image and the responses at all the subsequent steps, while
Fig. 3 shows the confusion matrix obtained for the SVM classifier. The original test
image consists of bacterial cells, but the appearance of these cells in the
microscopy image is not very clear, making classification very challenging. The poor
visual characteristics of the input microscopy images should be improved

Fig. 3 Confusion matrix for SVM classifier

for proper segmentation and classification. The contrast enhancement achieved by GIF
is shown in Fig. 2b. It can be clearly inferred from this image that the GIF has
improved the image quality by improving the contrast as well as sharpening the edges.
This is further proved by the increased values of SD and entropy in the GIF response
as compared to the original microscopy image. The bacterial cells are properly
visible and are separated from each other as well as from the background. After the
contrast enhancement, OT is used for segmentation: the unwanted background is removed
from the images, while the meaningful regions (bacterial cells) are retained, as
depicted in Fig. 2c. After the segmentation of the images, features are extracted in
the form of key points, which are further used for training the SVM for the
successful classification of the bacterial cells. The SVM is trained for the
classification of six species of bacterial cells, and the confusion matrix thus
obtained is depicted in Fig. 3. On the basis of the confusion matrix, the accuracy of
the classifier is calculated to be 93.3%.
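The overall accuracy follows from the confusion matrix as the ratio of the trace (correct predictions on the diagonal) to the total count; the matrix values in the test are illustrative, not the paper's actual results.

```python
def accuracy_from_confusion(matrix):
    """Overall accuracy = trace / total count of a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total
```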

4 Conclusion

In this paper, an improved approach for the classification of bacterial cells in
microscopy images is presented, in which image pre-processing is carried out with
GIF, segmentation is carried out with OT, and SIFT features are extracted for
training the SVM classifier for the classification of six genera of bacterial cells.
The resultant image after GIF is quite satisfactory, as it yields a significant
result after segmentation, followed by a proper extraction of key points in the
images for classification purposes. The image quality after GIF is assessed using SD
and entropy, and a significant increment in these values proves the quality of the
image enhancement. The accuracy obtained after

the classification is shown in the form of the confusion matrix in Fig. 3, on the
basis of which the accuracy is calculated to be 93.3%, which is also very convincing.
This work can be further improved by refining the various methods used. The GIF used
in this work is manually tuned, which could instead be tuned automatically using
various optimization algorithms [19]. OT is a very simple segmentation algorithm,
which can be replaced by a more sophisticated technique. Further, the SVM classifier
used here can be improved by using deep learning algorithms for a more accurate
classification. Thus, by incorporating these techniques, a more flexible classifier
can be developed.

References

1. Zielinski B, Oleksiak AS, Rymarczyk D, Piekarczyk A (2020) Deep learning approach to
describing and classifying fungi microscopic images. arXiv preprint, pp 1–21
2. Nizar A, Yigit A, Isik Z, Alpkocak A (2019) Identification of Leukemia subtypes from
microscopic images using convolutional neural network. Diagnostics 9(3):1–11
3. Jobson DJ, Rahman Z, Woodell GA (2001) Spatial aspect of color and scientific implications
of retinex image processing. Vis Inf Process Int Soc Opt Photonics 4388:117–128
4. Rahman ZU, Jobson D, Woodell J, Woodell GA (2004) Retinex processing for automatic image
enhancement. J Electron Imaging 13(1):100–110
5. Barnard K, Funt B (1997) Analysis and improvement of multi-scale retinex in color and
imaging. Soc Imaging Sci Technol 1997(1):221–226
6. Sharma A, Bhateja V, Sinha AK (2015) Synthesis of flash and no-flash image pairs using
guided image filtering. In: Proceedings of 2nd international conference on signal processing
and integrated networks (SPIN). Noida, India, pp 768–773
7. Win KY, Choomchuay S (2017) Automated segmentation of cell nuclei in cytology pleural
fluid images using OTSU thresholding. In: Proceedings of international conference on digital
arts, media and technology (ICDAMT). Chiang Mai, Thailand, pp 14–18
8. Azhar R, Tuwohingide D, Kamudi D, Suciati N (2015) Batik image classification using SIFT
feature extraction, bag of features and support vector machine. Proc Comput Sci 72:24–30
9. Bhateja V, Tiwari A, Gautam A (2017) Classification of mammograms using sigmoidal
transformation and SVM. In: Smart computing and informatics. Singapore, pp 193–199
10. He K, Sun J, Tang X (2010) Guided image filtering. In: European conference on computer
vision. Berlin, Heidelberg, pp 1–14
11. Awasthi N, Katare P, Gorthi SS, Yalavarthy PK (2020) Guided filter based image enhancement
for focal error compensation in low cost automated histopathology microscopic system. J
Biophotonics 13(11):1–23
12. Pandey A, Yadav A, Bhateja V (2013) Volterra filter design for edge enhancement of mammo-
gram lesions. In: Proceedings of 3rd IEEE international advance computing conference (IACC).
Ghaziabad, India, pp 1219–1222
13. Bhateja V, Srivastava A, Singh G, Singh J (2014) A modified speckle suppression algo-
rithm for breast ultrasound images using directional filters. In: ICT and critical
infrastructure: proceedings of the 48th annual convention of Computer Society of India,
vol II. Cham, pp 219–226
14. Mohamed BA, Afify HM (2018) Automated classification of bacterial images extracted
from digital microscope via bag of words model. In: Proceedings of 9th Cairo international
biomedical engineering conference (CIBEC). Cairo, Egypt, pp 86–89

15. Bhateja V, Taquee A, Sharma DK (2019) Pre-processing and classification of cough sounds in
noisy environment using SVM. In: Proceedings of 4th international conference on information
systems and computer networks (ISCON). Mathura, India, pp 822–826
16. Bhateja V, Nigam M, Bhadauria AS, Arya A, Zhang EY (2019) Human visual system based opti-
mized mathematical morphology approach for enhancement of brain MR images. J Ambient
Intell Humanized Comput 1–9
17. Sahu A, Bhateja V, Krishn A (2014) Medical image fusion with Laplacian pyramids.
In: Proceedings of international conference on medical imaging, M-health and emerging
communication systems (MedCom). Greater Noida, India, pp 448–453
18. The bacterial image dataset (DIBaS) is available online at: http://misztal.edu.pl/software/databases/dibas/. Last visited on 10 Dec 2020
19. Jordehi AR (2015) Enhanced leader PSO (ELPSO): a new PSO variant for solving global
optimisation problems. Appl Soft Comput 26:401–417
Application of Machine Learning
Algorithms for Creating a Wilful
Defaulter Prediction Model

B. Uma Maheswari, Hari Shankar Chandran, R. Sujatha, and D. Kavitha

Abstract A “wilful defaulter” is a borrower who has the financial means to repay the
bank but chooses not to do so. With the increasing cases of such defaulters creating
serious economic implications for the country, it is essential to develop a robust
credit assessment model to predict defaulters. The primary objective of this study
is to develop an efficient wilful defaulter prediction model through the deployment
of machine learning algorithms like logistic regression, Naïve Bayes, and random
forest. The dataset comprised 250 publicly listed companies, drawn from the lists
published by the RBI and the All India Bank Employees Association for 2020–2021. The
analysis showed that the debt service coverage ratio, debt-equity ratio, profit after
tax, and governance factors like board size, promoters, and board composition are
crucial factors in this prediction. The study helps organizations by giving them a
framework to focus on these factors to avoid financial complications in the future.

Keywords Wilful defaulters · Machine learning · Predictive analytics · Firm


performance · Prediction models

1 Introduction

The term “default” refers to any individual or company that has not paid their loans
within the agreed-upon payback time and has breached the lending authority’s terms
and conditions. A “wilful defaulter” is a borrower who has the financial means to
repay the bank but chooses not to do so. The Reserve Bank of India (RBI) has
issued guidelines for identifying a wilful defaulter [16]. The defaulter ratio has been
steadily rising over a period of time for a number of reasons: taking advantage of the

B. Uma Maheswari (B) · H. S. Chandran · R. Sujatha · D. Kavitha


PSG Institute of Management, Coimbatore, Tamil Nadu 641004, India
e-mail: uma@psgim.ac.in
R. Sujatha
e-mail: sujatha@psgim.ac.in
D. Kavitha
e-mail: kavitha@psgim.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 373
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_36
374 B. Uma Maheswari et al.

country’s weak governance systems, ineffective economic and legal systems, and the
inability of the financial institutions’ risk assessment models to predict defaulters.
However, with the implementation of machine learning models, data analytics can
be effectively deployed to improve the prediction rate. This study aims to propose
a model that would not only prove to be an efficient credit assessment solution but
also act as a tool in minimizing the risk of defaults by predicting potential defaulters
and analysing the attributes that lead to the scenario.

2 Literature Review

The banking sector has contributed significantly to the country’s economic develop-
ment. One of the most important banking activities is lending loans to customers,
corporate organizations, micro, small, and medium enterprises (MSMEs), and start-
ups. Bank credit increased at a compounded annual growth rate (CAGR) of 0.29%
from fiscal year (FY) 16 to fiscal year (FY) 21, with total credit extended in FY21
totalling $1487.60 billion. The current challenges faced by a multitude of financial
institutions in the banking sector in India are high non-performing assets (NPAs) due
to an increasing number of defaulters with outstanding payments exceeding 25 lakhs
and 1 crore, according to RBI [16]. In March 2021, Indian banks were said to have
gross NPAs valued at over Rs. 8.3 lakh crore. Many businesses that were affected
by the current pandemic are experiencing liquidity shortages, which has resulted in
payments and reimbursements being delayed. Post-COVID defaults are expected to
increase to 10–11 lakh crores by March 2022 [4]. Wilful defaulters rose by over 200
from 2208 to 2494 in FY21, as of March 2021 [13] according to RBI data. The top
100 wilful defaulters owe lenders Rs. 84,632 crore [14]. Furthermore, legacy debts
or backlogs on the bank balance sheets must still be accounted for as they account
for a large share of NPAs. The Ministry of Finance in India has recently proposed
a scheme by setting up bad banks, collectively called the National Asset Recon-
struction Company (NARCL), with funding to help the banks recover from backlogs
during the first phase of recovery. This will help stabilize the margins of banks and,
furthermore, contribute to the GDP of the country [6]. Initially, the government has
agreed to lend Rs. 90,000 crores to NARCL and provide additional incremental funds
with an overall target of covering 2 lakh crores of NPA.
Many studies have developed models for credit risk assessment using unstructured
and structured data [2]. In certain studies, it was found that, in comparison with the
final prediction model, more emphasis should have been given to the data prepro-
cessing steps to achieve better accuracy [1]. The relevant features were extracted prior
to model construction by using a compromised analytic-hierarchy process (AHP)
approach [5]. In contrast to a mere quantitative metric-based approach, the financial
characteristics of companies were studied, which classified defaulters with respect to
industry and identified the likeliness of a defaulter trait using the Altman Z Score [9].
Application of Machine Learning Algorithms for Creating … 375

Another study focused on the various types of loan products and the credit scoring
models used to grant loans [1]. The characteristic traits were used to build a credit risk
model using logistic regression along with net cash flow from financing activities,
investment activities, and cash inflows and outflows [8].
A CatBoost model with synthetic characteristics was used to create a prediction model
that focuses significantly on categorical features [15]. Research on the Indian
banking sector and significant factors responsible for the non-performing assets have
been identified. According to this study, non-priority sector lending contributes more
to NPAs than priority sector loans [5]. In addition to this, it was also inferred that
a fraudulent credit rating affects the probability of default. It is seen that increasing
liquidity and competition are partly responsible for the growing NPAs [11]. An
empirical study was undertaken to examine the non-performing assets in India’s
governmental, corporate, and foreign sector banks [3]. In the case of educational loan
defaults, the impact of macroeconomic conditions greatly improved the classification
accuracy [7]. A few others studied the measures taken by the government to accelerate
the loan recovery rate, which can help with new model design [12].
Studies also dealt with the partitioning and clustering of real-time data by imple-
menting incremental k-means for clustering with reduced iterations, modified k-
modes using frequency measures, and K-prototype algorithms for a hybrid method,
thereby reducing the cost of implementation [10]. A thorough review of existing
literature in this domain indicates that mostly quantitative and financial factors were
considered for prediction of such credit assessment models, and the qualitative factors
were neglected. This study addresses the limitations of the previous research and uses
financial indicators, macroeconomic factors represented by industry performance,
and factors relating to corporate governance. The primary objective of the paper is
to design a machine learning model to predict the wilful defaulters using logistic
regression, Naive Bayes, and random forest algorithms and to understand the key
influential factors that lead to wilful default and outstanding loans.

3 Methodology

The research starts with the selection of a relevant dataset comprising wilful
defaulters and non-defaulters, extracted from the recently published list of wilful
defaulters by the RBI based on outstanding payments, credit ratings, history of
repayments, and potential bankruptcy. From this list, 150 wilful defaulters were
selected, and 100 non-defaulter companies were selected from
the NIFTY 50 index. The resultant dataset is the combination of these 250 publicly
listed companies. The dataset considers variables that the literature has identified
as factors that contribute significantly to a firm’s performance in terms of financial,
industrial, governance, and firm performance characteristics. The data pertaining to
the parameters that define the characteristics of a wilful defaulter has been extracted
376 B. Uma Maheswari et al.

Table 1 Variables used in the study

Parameters        | Independent variables                                   | Description
Financial         | Current assets and current liabilities                  | Financial strength and stability
                  | Current ratio, quick ratio, interest coverage           | Liquidity ratios
                  | Profit after tax, total income                          | Profitability measures
Industry          | Total assets and liabilities, sales, cost of goods sold | Market share, potential, expense
                  | DSCR (times) and debt-equity ratio                      | Debt payment ability of the firm
Governance        | Board size, % of independent directors                  | Independent and non-independent
                  | Gender diversity, number of meetings                    | Female directors, AGM attended
                  | CEO duality (CEO and chairperson same)                  | 0—No; 1—Yes
                  | Promoters number, equity, foreign directors             | Board structure, shares (nos.)
Firm performance  | Return on assets                                        | Net income to total assets ratio
                  | Return on net worth                                     | Net income to shareholder equity
                  | Net income                                              | Sales − cost of goods sold

from the CMIE Prowess IQ database. The final dataset contains 30 variables under
these four main categories. The financial variables comprise key ratios like liquidity
ratios, profitability ratios, and leverage ratios. The industry characteristics include
market share, expenses, total assets, and total debts. The governance attributes include
board size, composition of directors, board diversity, number of meetings attended,
and CEO duality. The firm’s performance is measured using return on assets (ROA),
net worth, and net income. Among these variables, the dependent variable is the vari-
able indicating a wilful defaulter which is a binary classification variable with the
values “Yes” (defaulter) and “No” (non-defaulter). The sub-categories under each of
the indicators are presented in Table 1. The data preprocessing was done by identifying
the outliers and missing values and treating them. The outliers were treated by
winsorization at the 95th and 5th percentiles to achieve better results, and missing
(NA) values were treated using mean imputation.
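The preprocessing steps just described can be sketched in pure Python. This is an illustrative version only; the paper does not give its implementation, and the helper names `winsorize` and `impute_mean` are ours.

```python
def percentile(sorted_vals, p):
    """Nearest-rank percentile of an already-sorted list (illustrative)."""
    k = max(0, min(len(sorted_vals) - 1, round(p / 100 * (len(sorted_vals) - 1))))
    return sorted_vals[k]

def winsorize(values, lower=5, upper=95):
    """Clip outliers to the 5th and 95th percentiles, as in the study."""
    s = sorted(values)
    lo, hi = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo), hi) for v in values]

def impute_mean(values):
    """Replace missing (None) entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]
```

In practice a library routine (e.g. `scipy.stats.mstats.winsorize`) would be used, but the logic reduces to clipping at the two percentile bounds and filling gaps with the column mean.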

3.1 Prediction Model

Figure 1 presents the prediction model, showing how variables such as financial
indicators, industry characteristics, and firm performance determine whether a
company would be a wilful defaulter or not.
Application of Machine Learning Algorithms for Creating … 377

Fig. 1 Prediction model

3.2 Model Performance Measures

The model performance measures used in this study are discussed here. Accuracy
measures overall model performance as the percentage of correct predictions.
Sensitivity (the true positive rate) is the percentage of correct positive predictions,
and specificity (the true negative rate) is the percentage of correct negative
predictions. The area under the curve (AUC) is the area under the receiver operating
characteristic (ROC) curve; the higher the AUC, the better the model distinguishes
between the positive and negative classes. The Gini coefficient ranges from 0 (no
discriminatory power) to 1 (perfect discrimination). The Kolmogorov–Smirnov (KS)
statistic estimates the degree of separation between the positive and negative score
distributions; the higher the value, the better the model distinguishes between the
classes.
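All of these measures can be computed directly from labels and scores. The sketch below is illustrative (the function names are ours, not the paper's): AUC is computed via the Mann–Whitney rank formulation, the Gini coefficient as 2·AUC − 1, and KS as the maximum gap between the cumulative score distributions of the two classes.

```python
def confusion_rates(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate), specificity (true negative rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return (tp + tn) / len(y_true), tp / (tp + fn), tn / (tn + fp)

def auc_gini_ks(y_true, scores):
    """AUC via the Mann-Whitney rank statistic, Gini = 2*AUC - 1, and the
    Kolmogorov-Smirnov maximum gap between the two empirical CDFs."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Fraction of positive/negative pairs in which the positive score wins
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    # KS: maximum gap between the class-wise cumulative distributions
    ks = max(abs(sum(p <= s for p in pos) / len(pos)
                 - sum(n <= s for n in neg) / len(neg)) for s in set(scores))
    return auc, 2 * auc - 1, ks
```

The identity Gini = 2·AUC − 1 is why the two numbers in Table 3 move together for the first two models.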

4 Analysis and Discussion

The analysis shows that there is a significant difference between defaulters and
non-defaulters across the financial indicators, and the ratios of the non-defaulters
are better than those of the defaulters (Table 2).
Three machine learning models (logistic regression, Naïve Bayes, and random
forest) were applied to the dataset. The model outputs and the corresponding accuracy
measures are presented in Table 3.
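A minimal sketch of how three such classifiers could be fit and scored with scikit-learn is shown below. This is an assumption-laden illustration, not the paper's training code: the exact split, hyperparameters, and data are unspecified, so a synthetic stand-in for the 250-company, 30-variable dataset is used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the 250-company, 30-variable dataset
X, y = make_classification(n_samples=250, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {model.score(X_te, y_te):.3f}")
```

The random forest's `feature_importances_` attribute is what a variable importance plot such as Fig. 2 is built from.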
Table 3 shows that the random forest algorithm has the highest accuracy in
comparison with the other models. Specificity is high for all the models, which
indicates that the negative class is predicted accurately. The KS statistic is highest
for the random forest, and the Gini coefficient ranges from 0.15 to 0.60 across the
models. Overall, the random forest algorithm gives the best results in terms of
predicting wilful defaulters. Therefore, the variable importance plot (Fig. 2) of the
random forest algorithm was analysed in order to understand the significant variables
influencing defaulters.
Among the four main categories of financial, industry, governance, and firm
performance, the variable importance plot (Fig. 2) shows that under the

Table 2 Summary statistics of financial indicators for wilful defaulters ("Yes") and non-defaulters ("No")

                    Wilful defaulter ("Yes")                  Wilful defaulter ("No")
Financial indicator Mean      S.D       Min      Max          Mean     S.D        Min       Max
Current ratio       1.10      3.27      0        37.86        2.06     1.99       0.01      10.61
Quick ratio         0.79      3.17      0        36.88        1.62     2.03       0.01      10.56
Debt to equity      16.05     64.89     0        421.61       3.19     17.12      0         163.98
DSCR                −3.67     27.46     −233     23           11.48    42         0.01      353.26
Interest cover      2.09      2.85      0.03     21.59        33.67    89.36      0.02      562.54
ROA                 0.04      0.18      −0.94    0.92         0.09     0.24       −0.84     1.416
RONW                −61.07    461.0     −4424.8  199.85       119.16   1125.06    10,304    48.9
PAT (in millions)   −2817.1   8713      −83,470  6807.7       870.02   28,195.80  −251,976  65,885
Net income          1330.05   7068.6    −14,253  68,576.7     5407.90  32,670.19  −101,812  218,254.61

Table 3 Model performance measures

Models              Accuracy (%)  Sensitivity (%)  Specificity (%)  AUC    KS     Gini
Logistic regression 92.06         88.00            94.74            0.913  0.827  0.603
Naïve Bayes         80.00         76.92            82.05            0.791  0.582  0.584
Random forest       93.85         92.59            94.74            0.936  0.873  0.153

Fig. 2 Random forest variable importance plot



financial indicators, the most significant variables include profit after tax (PAT),
the debt-to-equity ratio, and the debt service coverage ratio (DSCR). PAT is an
important indicator of the operational efficiency and performance of the organization
and determines the dividends distributed to shareholders or the earnings retained. A
corporation's DSCR is important when the corporation has borrowings such as bonds,
loans, or lines of credit. A DSCR below one indicates negative cash flow, meaning
the borrower can only partially service its debts; this helps lenders assess borrowers
effectively. A DSCR of 1 or above means the firm has sufficient revenue to settle its
debts. Hence, DSCR is an important parameter in risk assessment and loan approval.
The debt-equity (D/E) ratio measures a firm's ability to repay loans based on the
equity distributed among its shareholders. Thus, lenders can be assured of the
creditworthiness of the borrowing firm before granting loans.
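The two screening rules above reduce to simple ratio checks. The helpers below use hypothetical names (not from the paper) to make the DSCR threshold explicit.

```python
def dscr(net_operating_income, total_debt_service):
    """Debt service coverage ratio: below 1, debts can only be partly serviced."""
    return net_operating_income / total_debt_service

def debt_equity(total_debt, shareholder_equity):
    """Debt-equity (D/E) ratio: reliance on borrowed funds versus equity."""
    return total_debt / shareholder_equity

def can_service_debt(net_operating_income, total_debt_service):
    """Lender's rule of thumb from the text: DSCR of 1 or above covers the debts."""
    return dscr(net_operating_income, total_debt_service) >= 1
```

For instance, a firm earning 80 against a debt service of 100 has a DSCR of 0.8 and would fail this screen.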
The other common variables affecting the performance of a company and causing
potential defaults include the short-term liquidity of the company as represented by
current assets and current liabilities. Creditors use this ratio to assess a company’s
capacity to pay short-term obligations. Before a loan approval is granted, lenders
understand the ability of the firm to pay off its debts and other obligations. In general,
higher assets indicate that the company is expanding. The asset turnover of a corpo-
ration is a measure of its capacity to produce revenue from its assets. The higher the
asset turnover, the more efficient the company is; on the other hand, a lower value
indicates that the company is not successfully utilizing its assets to generate revenue.
This is an important factor to consider while assessing the credit worthiness of a
borrower. If the total volume of assets is not sufficient, it could possibly
indicate that the market performance and liquidity of that company are weak. In such
cases, it is better to avoid taking risks by reconsidering loan approvals. The measure
of total assets is not only critical for loan grants but also determines the market
potential of a company in attracting potential investors. A high debt-to-equity ratio
is believed to signal that a company is having financial troubles and may be unable
to pay its creditors. If it is too low, the company is relying too heavily on equity to
support its operations, which can be expensive and inefficient. This ratio displays
the proportions of equity and debt utilized to fund a company’s assets, as well as the
extent to which shareholder equity can satisfy creditors’ obligations in the event of
a company’s failure.
Furthermore, the variables that fall under governance characteristics include the
number and percentage of promoters, board size, board composition, and the number
of directors, both independent and non-independent. Independent directors are those
who are outside board members with minimal responsibilities, like attending the
annual general meetings and being present during executive board events. The non-
independent board directors are those who are directly involved in the process of
decision making. These variables define the administrative efficiency of a company
and its corporate governance. By taking these into consideration, a lender can gain
insights into the management, hierarchy, and internal work ethics of an organiza-
tion. The ownership structure of a company has a significant impact on its long-term
performance, and research shows that a high concentration of promoter ownership

has a beneficial impact on profitability. The number of promoters on a board is
proportional to the number of responsibilities and obligations shared; the more promoters
on the board, the less risky it is for a firm to survive a financial crisis. A larger board
of directors means more directors who oversee and supervise the firm’s performance
in order to protect the interests of stakeholders. Hence, it can be used to evaluate the
firm from a lender’s point of view.

5 Conclusion

In India, the majority of the lenders grant loans to individuals and firms based upon
historical payment behaviour, credit ratings, and current financial position. Any finan-
cial institution with the authority to provide funding must have a stable credit risk
assessment model to avoid outstanding debt payments. At present, risk assessment
systems use financial metrics and neglect administrative attributes. The potential
scope for the future lies with predictive analytics, which serves the purpose of
identifying fraud and preventing the risks involved in approving bad loans. This
paper proposes a machine learning model that takes into account all the major
influential factors. The prediction model built using the random forest algorithm
provides the highest accuracy of 93.85%. The models have also identified key
significant predictor variables that impact defaulter prediction. More weightage
should be given to industry parameters like total assets and liabilities, DSCR, and
the debt-equity ratio. In addition, governance factors like board size, promoters, and
board composition, and financial indicators like PAT were also identified as important.
The model can play an important role in the mitigation of non-performing assets
(NPAs) in the country by predicting
potential wilful defaulters. The outcome of the paper highlights the advantages of
using machine learning to predict defaulters and promises a stable and effective credit
assessment solution that can be integrated with banks and financial institutions in
future. The study helps organizations by giving a framework to concentrate on the
factors that need to be focused upon to avoid such financial complications in future.

References

1. Ahmad Itoo R, Selvarasu A (2017) Loan products and credit scoring methods by commercial
banks. Int J Latest Trends Finance Econ Sci 7(1):1297–1304
2. Attigeri G, Pai MMM, Pai RM (2019) Framework to predict NPA/willful defaults in corporate
loans: a big data approach. Int J Electr Comput Eng (IJECE) 9(5):3786
3. Bhasin ML (2017) Galloping non-performing assets bringing a stress on India’s banking sector:
an empirical study of an Asian country. Int J Manage Sci Bus Res 6(3):1–26
4. Das JK, Dey S (2019) Factors contributing to non-performing assets in India: an empirical
study. Rev Prof Manage J New Delhi Inst Manage 16(2):62
5. Eddy YL, Nazri EM, Mahat NI (2020) Identifying relevant predictor variables for a credit
scoring model using compromised-analytic hierarchy process (compromised-AHP). J Adv
Res Bus Manage Stud 20(1):1–13

6. Financial Express (2021). https://www.financialexpress.com/industry/banking-finance/bad-bank-to-solve-rs-2-lakh-crore-bad-loans-take-npas-off-banks-books-heres-how-it-will-work/2332436/
7. Jayadev M, Shah N, Vadlamani R (2019) Predicting educational loan defaults: application of
artificial intelligence models. SSRN Electron J
8. Karthik L, Subramanyam M, Shrivastava A, Joshi AR (2018) Prediction of wilful defaults: an
empirical study from Indian corporate loans. Int J Intell Technol Appl Stat 11(1):15–41
9. Madhavi S, Jhaveri M (2017) An analysis of characteristics of wilful defaulters in India. In:
Seventeenth AIMS international conference on management, pp 3–7
10. Madhuri R, Murty MR, Murthy JVR, Reddy PVGDP, Satapathy SC (2014) Cluster analysis
on different data sets using K-modes and K-prototype algorithms. In: ICT and critical infras-
tructure: proceedings of the 48th annual convention of computer society of India, vol II, pp
137–144
11. Munkhdalai L, Munkhdalai T, Namsrai OE, Lee J, Ryu K (2019) An empirical comparison of
machine-learning methods on bank client credit assessments. Sustainability 11(3):699
12. Rajeev M, Mahesh HP (2010) Banking sector reforms and NPA: institute for social and
economic change
13. The Indian Express (2021). https://indianexpress.com/article/business/banking-and-finance/
wilful-defaulters-rise-by-over-200-to-2494-in-fy21-nirmala-sitharaman-7425706/
14. The Economic Times. Top 100 wilful defaulters owe lenders Rs. 84,632 crore. indiatimes.com
15. Wang H (2021) CatBoost model with synthetic features in application of loan risk. arXiv preprint.
https://arxiv.org/abs/2106.07954
16. Wilful defaulter regulations as per RBI Master circular: RBI/2015-16/100. https://www.rbi.
org.in/Scripts/BS_ViewMasCirculardetails.aspx?id=9044
Design of Metamaterial-Based Multilayer
Dual Band Circularly Polarized
Microstrip Patch Antenna

Chirag Arora

Abstract This chapter presents a miniaturized multilayer circularly polarized dual
band microstrip antenna, operating at 5.8 GHz and 2.45 GHz, which is inspired by
a metamaterial. The two layers are placed one over the other such that there is no
air gap between them. To achieve circular polarization, a single feed is used and
suitable mitering is introduced on the upper patch, which is fed by a single probe.
Further, dual band characteristics have been achieved by using a plus-shaped slot
on the lower patch. The antenna so designed provides a gain of 9.6 dBic and a 3 dB
bandwidth of 525 MHz at the resonant frequency of 5.8 GHz. At 2.45 GHz, the
antenna provides a gain of 5.9 dBic and a 3 dB bandwidth of 270 MHz. To further
enhance the performance of this antenna, a metamaterial has been employed, realized
by introducing via holes and L-shaped slots on the lower patch. This metamaterial-
inspired antenna provides a 3 dB axial ratio bandwidth of 590 MHz and 325 MHz at
5.8 GHz and 2.45 GHz, respectively, while its gain remains almost the same at both
frequencies. The antenna has been designed on a low-cost FR-4 substrate.

Keywords Patch antenna · Circular polarization · Metamaterial · Via holes

1 Introduction

Reduction in multipath effect and suppression of interference due to rain can be easily
achieved by circularly polarized antennas. Therefore, such antennas are good candi-
dates for various applications such as satellite communication [1–7]. Researchers
have adopted various techniques to enhance the gain and three dB axial ratio band-
width of the circularly polarized antenna [8–14]. In [8], Guo and Tan designed a
multilayered circularly polarized patch antenna. This antenna uses single feed and
provides wide bandwidth but its fabrication is quite tedious since very high preci-
sion is required to maintain the desired gap between different layers of the antenna.

C. Arora (B)
KIET Group of Institutions, Delhi-NCR, Ghaziabad, Uttar Pradesh, India
e-mail: c_arora2002@yahoo.co.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 383
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_37

Wang et al., in [10], proposed a circularly polarized antenna fed by a substrate
integrated waveguide. The proposed antenna shows good circular polarization
characteristics, but this feeding technique is not easy to implement practically. The design
proposed by Cheng and Dong, in [11], provides wide three dB axial ratio bandwidth
of 22.58% and −10 dB impedance bandwidth of 48.75%. But these characteristics
have been obtained by use of two suspended metal rods, making the antenna bulky.
Pan and Dong proposed circularly polarized stacked antenna for radio frequency
reader applications [14]. This antenna produces good bore sight gain but at the cost
of a director patch, which is in addition to the parasitic patch. Thus, from the literature
survey presented above, it is concluded that the most common technique adopted to
enhance the performance of circularly polarized antenna is the use of multilayered
structure with an air gap between the different layers. This technique results in low
fabrication cost, a low antenna profile, and low dielectric loss. However, it requires
dielectric posts to support the upper patch, thus causing difficulty in fabrication. To
the best of the author's knowledge, limited work has been done toward performance
improvement of circularly polarized patch antennas using multilayered structures in
which the layers are not separated by air but are fixed together with an adhesive.
In this chapter, a two-layered circularly polarized microstrip patch antenna has
been explored, where the two layers are not separated by an air gap but are bonded
together with an adhesive. The upper layer of the proposed antenna is composed of
a mitered square patch to acquire circular polarization. Metamaterials have been
used to improve the performance of this antenna. These are specially designed
structures which possess peculiar properties not found in naturally occurring
materials. Such materials were proposed by Veselago in 1967 [15]; then, in the 1990s,
Pendry et al. showed electric plasma from wire-shaped structures [16] and magnetic
plasma from ring-shaped structures [17]. However,
these structures were not planar in nature and hence difficult to fabricate. Since then,
many two-dimensional metamaterial structures have been designed by various
researchers and are widely used in antennas as well as other microwave and
millimeter wave applications [18–28]. Metamaterials can be incorporated into
conventional antennas in several ways, of which the most common techniques are
loading them as a substrate [29, 30] or a superstrate [31–34]. Though these two
techniques significantly improve the performance parameters of patch antennas, they
do so at the cost of the tedious design of a metamaterial unit-cell array or an increased
profile, respectively.
In this paper, the author has realized the metamaterial characteristics through thin
slots on the lower patch and via holes which extend from the L-shaped patches to
the ground plane. The narrow slots provide the left-handed series capacitance, and
the via holes contribute the left-handed shunt inductance. This technique of
metamaterial realization thus eliminates the requirement of designing an array of
metamaterial unit cells.
Moreover, to reduce the profile of the patch antenna, the use of multi-band patch
antennas proves to be very beneficial. Several techniques have been discussed in the
literature to design multi-band antennas, such as stacking two different structures
[35], using stubs [18], using a defected ground plane [36], and slotting the radiator
to create perturbations [37]. Of these techniques, creating perturbations by slotting
the radiator appears to be the simplest, as it neither increases the profile of the
antenna nor requires special arrangements for fabrication. Taking advantage of this
fact, in this chapter the author has etched a plus-shaped slot on the lower patch so
that dual band operation can be achieved. Thus, the lower patch of the proposed
antenna serves two functions: it realizes the metamaterial effect, and it helps in
achieving dual band operation with the help of the plus-shaped slot.
Thus, in this chapter a metamaterial-based multilayer dual band circularly polarized
microstrip patch antenna is proposed. This antenna comprises two layers, which are
pasted together with an adhesive, thus eliminating the problem of aligning the two
layers. The upper layer has a mitered square patch to achieve circular polarization,
and the lower layer possesses a plus-shaped slot to obtain dual band behavior. To
further improve the performance of this antenna, metamaterials have been used,
realized by means of L-shaped slots and via holes. The antenna is simulated on an
FR-4 substrate of thickness h = 1.48 mm, dielectric constant εr = 4.3, and loss
tangent 0.01.

2 Antenna Design

A. Configuration of Antenna

The top views of the two layers of this antenna are presented in Figs. 1 and 2,
whereas the side view of the composite structure is given in Fig. 3. The designed
antenna is composed of two layers, which are pasted together with the help of an
adhesive. The upper layer is composed of a mitered square patch of dimensions
12 mm × 16 mm, and the dimensions of the lower patch are the same. Both substrates
have a length and width of 60 mm × 60 mm. The perturbation caused by the mitered
patch leads to circular polarization. The width of all slots on the lower patch is 4 mm,
and the radius of each via hole is 0.3 mm. The upper patch is mitered at a length of
4 mm. The antenna is excited
using a probe feed. The lower patch possesses a plus-shaped slot to obtain dual band
behavior. The performance improvement of this antenna is achieved by using
metamaterials. The metamaterial behavior is realized by etching L-shaped slots on
the four corners of the lower patch and making via holes from each slot to the ground
plane. The L-shaped patches account for the left-handed series capacitance, whereas
the via holes contribute the shunt inductance. These four slots are etched on the four
corners of the lower patch, symmetrically with respect to the central plus-shaped
patch. The locations of the via holes are selected in such a way that their presence
does not affect the circular polarization. Since the current intensity at the corners is
usually weak, the via holes are introduced at the corners of the patches.
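For context, the amount of corner truncation for a singly fed CP patch is commonly chosen from the classical perturbation-segment rule (a standard design formula, not derived in this chapter), where \(Q_0\) is the unloaded quality factor of the unperturbed patch, \(S\) its area, and \(\Delta S\) the total truncated area:

```latex
\frac{\Delta S}{S} = \frac{1}{2 Q_0}
```

This relation is the usual starting point for selecting the miter length of a corner-truncated square patch before fine-tuning in simulation.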

Fig. 1 Geometric sketch of perturbed square patch (upper patch)

Fig. 2 Geometric sketch of lower patch comprising plus-shaped patch with capacitive slots and inductive via holes

B. Theory
As discussed in the literature [38], metamaterial-based antennas can be modeled as
a composite right/left-handed (CRLH) transmission line with an open termination,
whose equivalent circuit is shown in Fig. 4. As observed from Fig. 4, a metamaterial
structure not only comprises the traditional right-handed shunt capacitance and series
inductance, but also possesses a left-handed shunt inductance and series capacitance.
These left-handed characteristics can be realized by thin slots on the microstrip patch and via holes

Fig. 3 Side view of the two-layered metamaterial-inspired dual band circularly polarized microstrip
patch antenna

Fig. 4 Equivalent circuit of the composite (right/left-handed) transmission line, with series elements
Lseries and Cseries and shunt elements Cshunt and Lshunt

to the ground. This results in both negative and positive phase-constant regions.
Compared with a conventional half-wavelength antenna of the same electrical length,
metamaterial antennas possess lower resonant frequencies; hence, metamaterial
structures can be used to realize compact antennas. Moreover, due to the presence
of multiple left- and right-handed resonant frequencies, metamaterial antennas also
provide dual band operation.
The two opposite cut corners help to produce the perturbation needed for circular
polarization using a single feed. The locations of the via holes are chosen to coincide
with the direction of current flow so that the circular polarization performance is
enhanced.
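For completeness, the unit cell of Fig. 4 obeys the standard CRLH dispersion relation (taken from general CRLH transmission-line theory, not stated in the chapter; \(p\) denotes the unit-cell length):

```latex
\cos(\beta p) = 1 + \frac{Z(\omega)\,Y(\omega)}{2},\qquad
Z(\omega) = j\left(\omega L_{\mathrm{series}} - \frac{1}{\omega C_{\mathrm{series}}}\right),\qquad
Y(\omega) = j\left(\omega C_{\mathrm{shunt}} - \frac{1}{\omega L_{\mathrm{shunt}}}\right)
```

The phase constant β is negative (left-handed) below the series and shunt resonances and positive (right-handed) above them, which is the origin of the dual band behavior described above.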

3 Results

This section presents the simulated results of the proposed antenna with and
without metamaterial loading. Figure 5 shows the simulated return loss characteristics
of the conventional dual band circularly polarized microstrip patch antenna and the
metamaterial-loaded antenna. It is seen that the unloaded
antenna resonates at 5.8 GHz and 2.45 GHz with bandwidths of 525 MHz and

Fig. 5 Return loss (S11, dB) curves of the unloaded (shown in black) and loaded (shown in green)
antenna at 5.8 GHz and 2.45 GHz

Fig. 6 Elevation plane radiation pattern curves of the unloaded and loaded proposed antenna at
a 5.8 GHz, b 2.45 GHz

270 MHz, whereas when this traditional patch antenna is loaded with the metamaterial,
the bandwidth reaches 590 MHz at the resonant frequency of 5.8 GHz and 325 MHz
at the resonant frequency of 2.45 GHz. From Fig. 6, it is observed that the gain of the
proposed antenna is almost the same at both resonant frequencies for the loaded and
unloaded conditions.

4 Conclusions

A multilayered dual band circularly polarized metamaterial-inspired microstrip patch
antenna has been proposed. The antenna resonates at 5.8 GHz and 2.45 GHz. Its
performance has been improved using metamaterials, which are realized using slots
and via holes, at no additional hardware cost or increase in size. A performance
comparison of the traditional and the metamaterial-inspired antenna shows that
metamaterials are promising candidates for improving the parameters of traditional
antennas.

References

1. Jeon SI, Kim YW (2000) New active phased array antenna for mobile direct broadcasting
satellite reception. IEEE Trans Broadcast 46(1):34–40
2. Sajal S, Latif SI, Spencer E (2018) Circularly polarized small-footprint hybrid ring-patch
stacked antenna for pico-satellites. In: IEEE international symposium on antennas and
propagation & USNC/URSI national radio science meeting. Boston, MA, USA
3. Satapathy SC et al (eds) (2016) Information systems design and intelligent applications. In:
Proceedings of third international conference INDIA 2016, vol 2. Springer India
4. Satapathy SC et al (2016) Computer communication, networking and internet security. In:
Proceedings of IC3T
5. Matsunaga M, Yamamoto M (2018) A double-band circularly polarized antenna for satellite
signal bands in the ratio of 3:8. In: IEEE conference on antenna measurements & applications
(CAMA). Sweden
6. Arora C (2021) Metamaterial-inspired circularly polarized microstrip patch antenna. In:
Proceedings of international conference on computer communication, networking and IoT.
Lecture notes in networks and systems book series, vol 197. LNNS, pp 183–190
7. Arora C (2021) Metamaterial-loaded circularly polarized patch antenna array for C band appli-
cations. In: Proceedings of 6th international conference on recent trends in computing. Lecture
notes in networks and systems book series, vol 177. LNNS, pp 57–64
8. Guo YX, Tan DCH (2009) Wideband single-feed circularly polarized patch antenna with
conical radiation pattern. IEEE Antennas Wirel Propag Lett 8:924–926
9. So KK, Wong H, Luk KM, Chan CH (2015) Miniaturized circularly polarized patch antenna
with low back radiation for GPS satellite communications. IEEE Trans Antennas Propag
63(12):5934–5938
10. Wang Y, Zhu F, Gao S (2018) 24-GHz circularly polarized substrate integrated waveguide-fed
patch antenna. In: International applied computational electromagnetics society symposium.
Beijing, China
11. Cheng Y, Dong Y (2021) Wideband circularly polarized split patch antenna loaded with
suspended rods. IEEE Antennas Wirel Propag Lett 20(2):229–233
12. Satapathy SC, Bhateja V, Joshi A (eds) (2016) Proceedings of the international conference on
data engineering and communication technology: ICDECT 2016, Volume 2, vol 469. Springer
13. Satapathy SC, Bhateja V, Das S (2018) Smart intelligent computing and applications. In:
Proceedings of the second international conference on SCI, vol 1
14. Pan Y, Dong Y (2020) Circularly polarized stack Yagi RFID reader antenna. IEEE Antennas
Wirel Propag Lett 19(7):1053–1057
15. Veselago VG (1968) The electrodynamics of substances with simultaneously negative values
of ε and μ. Soviet Physics Uspekhi 10(4):509–514
16. Pendry JB, Holden AJ, Stewart WJ, Youngs I (1996) Extremely low frequency plasmons in
metallic mesostructures. Phy Rev Lett 76(25):4773–4776
17. Pendry JB, Holden AJ, Robbins DJ, Stewart WJ (1999) Magnetism from conductors and
enhanced nonlinear phenomena. IEEE Trans Microw Theory Tech 47(11):2075–2084
18. Ali T, Biradar RC (2017) A compact multiband antenna using λ/4 rectangular stub loaded with
metamaterial for IEEE 802.11N and IEEE 802.16E. Micro Opt Tech Lett 59(5):1000–1006
19. Alu A, Engheta N, Erentok A, Ziolkowski RW (2007) Single negative, double-negative, and
low index metamaterials and their electromagnetic applications. IEEE Antennas Propag Mag
49(1):23–36
20. Rezaeieh SA, Antoniades MA, Abbosh AM (2017) Miniaturization of planar Yagi antennas
using Mu-negative metamaterial-loaded reflector. IEEE Trans Antenna Propag 65(12):6827–
6837
21. Chen PY, Alu A (2010) Dual-mode miniaturized elliptical patch antenna with μ–negative
metamaterials. IEEE Antenna Propaga Lett 9:351–354
22. Joshi JG, Pattnaik SS, Devi S, Lohokare MR (2012) Metamaterial embedded wearable
rectangular microstrip patch antenna. Int J Antenna Propag 2012:1–9

23. Arora C, Pattnaik SS, Baral RN (2015) SRR inspired microstrip patch antenna array. J. Prog
Electromag Res C 58(10):89–96
24. Arora C, Pattnaik SS, Baral RN (2015) Microstrip patch antenna array with metamaterial
ground plane for Wi-MAX applications. In: Proceedings of the springer second international
conference on computer and communication technologies (IC3T-2015). India, pp 665–671
25. Arora C, Pattnaik SS, Baral RN (2016) Metamaterial superstrate for performance enhancement
of microstrip patch antenna array. In: Proceedings of 3rd international conference on signal
processing and integrated networks (SPIN-2016). India, pp 775–779
26. Palandoken M, Grede A, Henke H (2009) Broadband microstrip antenna with left-handed
metamaterials. IEEE Trans Antennas Propag 57(2):331–338
27. Du G, Tang X, Xiao F (2011) Tri-band metamaterial-inspired monopole antenna with modified
S-shaped resonator. Prog Electromag Res Lett 23:39–48
28. Gao XJ, Cai T, Zhu L (2016) Enhancement of gain and directivity for microstrip antenna using
negative permeability metamaterial. AEU Int J Electron Commun 70(7):880–885
29. Li M, Luk KM, Ge L, Zhang K (2016) Miniaturization of magnetoelectric dipole antenna by
using metamaterial loading. IEEE Trans Antennas Propag 64(11):4914–4918
30. Dong Y, Toyao H, Itoh T (2012) Design and characterization of miniaturized patch antennas
loaded with complementary split-ring resonators. IEEE Trans Antennas Propag 60(2):772–785
31. Arora C, Pattnaik SS, Baral RN (2017) SRR superstrate for gain and bandwidth enhancement
of microstrip patch antenna array. Prog Electromag Res B 76:73–85
32. Arora C, Pattnaik SS, Baral RN (2017) Performance enhancement of patch antenna array for
5.8 GHz Wi-MAX applications using metamaterial inspired technique. Int J Electron Commun
(AEÜ) 79:124–131
33. Chung KL, Chaimool S (2012) Broadside gain and bandwidth enhancement of microstrip patch
antenna using a MNZ-metasurface. Microw Opt Technol Lett 54(2):529–532
34. Wu Z, Li L, Li Y, Chen X (2016) Metasurface superstrate antenna with wideband circular
polarization for satellite communication application. IEEE Antennas Wirel Propag Lett 15:374–
377
35. Shafai L, Chamma W, Seguin G, Sultano N (1997) Dual-band dual polarized microstrip
antennas for SAR applications. In: Proceedings of IEEE antennas and propagation international
symposium. Canada, pp 1866–1869
36. Zayed ASA, Shameena VA (2016) Planar dual-band monopole antenna with an extended
ground plane for WLAN applications. Int J Antennas Propag 1–10
37. Mok WC, Wong SH, Luk KM, Lee KF (2013) Single-layer single-patch dual-band and triple
band patch antennas. IEEE Trans Antenna Propag 61(8):4341–4344
38. Caloz C, Itoh T (2002) Application of the transmission line theory of left-handed (LH) materials
to the realization of a microstrip LH line. In: 2002 IEEE antennas and propagation society
international symposium, vol 2, pp 412–415
Heart Disease Prediction in Healthcare
Communities by Machine Learning Over
Big Data

Lingala Thirupathi, B. Srinivasulu, Unnati Khanapurkar, D. Rambabu,
and C. M. Preeti

Abstract In today's world, big data is the fastest-growing and most widely used tool in
every industry. The medical and healthcare industries flourish with the help of vast data:
with accurate analysis of medical data, early illness prediction becomes possible and
patient information can be securely stored and used. However, the accuracy of the
analysis may be harmed by factors such as incomplete medical information and regional
disease characteristics, which might otherwise be used to anticipate outbreaks. In this
work, we show how to use a machine learning algorithm to accurately predict disease;
to do so, we collect hospital data from a specific region. Where data is missing, latent
factor models can be used to reconstruct the incomplete information. Prior work used a
convolutional neural network-based unimodal disease risk prediction (CNN-UDRP)
algorithm to forecast illness. The CNN-MDRP algorithm, based on multimodal disease
risk prediction, overcomes the shortcoming of CNN-UDRP, which works only with
structured data, by making use of all of the hospital's structured and unstructured data.
None of the previous studies in medical big data analysis focused on both types of data.

Keywords Machine learning · Big data · Health care · Disease prediction

L. Thirupathi (B)
CSE Department, Stanley College of Engineering and Technology for Women, Hyderabad,
Telangana, India
e-mail: thiru1274@gmail.com
B. Srinivasulu
CSE Department, Vidya Jyothi Institute of Technology, Hyderabad, Telangana, India
U. Khanapurkar
CSE Department, Methodist College of Engineering & Technology, Hyderabad, Telangana, India
D. Rambabu
CSE Department, Sreenidhi Institute of Science & Technology, Hyderabad, Telangana, India
C. M. Preeti
CSE Department, Institute of Aeronautical Engineering, Hyderabad, Telangana, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 391
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_38

1 Introduction

Medical X-rays are images commonly used to examine sensitive body parts such
as the bones, chest, teeth, and skull. For decades, experts have used this method to
study and visualize fractures or abnormalities in human organs. Beyond their
non-invasive nature and low cost, X-rays are extremely effective diagnostic tools
for detecting pathological alterations. Chest X-ray (CXR) images can reveal chest
infections in the form of cavitations, consolidations, infiltrates, and small widely
distributed nodules. Radiologists examine the chest X-ray image for a variety of
conditions and illnesses, including inflammatory disease, infiltration, nodules,
pathology, deviations from the norm, fractures, and many others. Classifying chest
X-ray abnormalities is considered a repetitive task for radiologists, so researchers
have proposed several algorithms to perform this task accurately. Over the past
decades, computer-aided diagnosis (CAD) systems have been developed to extract
helpful information from X-rays and give specialists quantitative insight into an
X-ray. However, these CAD systems have not reached a level at which they can
make determinations on the type or condition of disease in an X-ray. Deep networks
have shown remarkable accuracy in performing such tasks. This success motivated
researchers to apply these networks to medical images for disease classification
tasks, and it turned out that deep networks can efficiently extract useful features
that distinguish different categories of images.
The most commonly used deep learning architecture is the convolutional neural
network (CNN). CNN has been applied to varied medical image classification
tasks thanks to its capability of extracting features at different levels from images.
Following the related research, in this paper a deep convolutional neural network
(CNN) is used to improve the performance of chest disease classification in terms
of accuracy and the least square error achieved. For this purpose, traditional and
deep learning-based networks are used to classify the most common chest illnesses
and to present comparative results.

2 Literature Survey

Accurate prediction saves time by reducing the need to search frequent item sets
and improves the accuracy of classification results that anticipate a person's risk of
developing heart disease, together with a succinct explanation of the classification
rules used. In [1–4], the authors used various methods to predict diseases. In
[5–19], the authors focused on security-related aspects of disease prediction. In
[20–32], the authors give an overview of big data in healthcare domains. In [33],
enhanced fingerprinting and trajectory prediction for IoT localization were
developed. In [34–39], the authors used different models in the healthcare domain
to assess risks in data mining. Heart disease is the most common disease to be
detected, and an accurate model is required to reduce human effort. On the basis of
mining rules and the given inputs, these works attempt to address the problem of
predicting cardiomyopathy. In [40–43], data analysis, news recognition, and plant
leaf disease detection are performed with machine learning.

3 Proposed System

We use Python libraries to carry out all transformations on the values in the data
set on which we conduct our work. Our paper's architecture is depicted in Fig. 1.
We use pandas to load the data into a data frame for manipulation. Age, chest pain
frequency and type, blood pressure, cholesterol levels, and other variables are
included in our data set. In Python, we employ machine learning methods such as
decision trees and SVM through the scikit-learn package, and we visualize the
results of our implementation with the Matplotlib library. After that, we compare
the algorithms to see which one performs best in terms of accuracy. The input layer
is followed by a convolutional layer with 16 kernels and ReLU activation, and
twenty-five per cent of the nodes are dropped by the dropout layer in the
subsequent layer. A second convolutional layer is applied with eight kernels and
the same settings as before, again followed by a 25% dropout layer, and an output
layer computes the prediction probabilities. For training and testing purposes, the
cleansed data is divided into 80% training and 20% testing. The same data set is
examined using a variety of machine learning classifiers, including logistic
regression (LR), NB, KNN, and SVM with various kernels, such as linear and
RBF, as well as simple neural networks. Throughout this study, we use a CNN to
accurately predict whether or not a patient has a cardiac problem.
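The classical-classifier part of this pipeline can be sketched with scikit-learn. The synthetic data frame below is a stand-in for the hospital data set (which is not public); the column names, value ranges, and toy label rule are illustrative assumptions only, and GaussianNB stands in for the paper's "NB" on these numeric features:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical stand-in for the hospital records described in the text.
X = pd.DataFrame({
    "age": rng.integers(29, 78, 300),
    "chest_pain_type": rng.integers(0, 4, 300),
    "resting_bp": rng.normal(130, 15, 300),
    "cholesterol": rng.normal(240, 40, 300),
})
y = (X["age"] + X["cholesterol"] / 10 > 95).astype(int)  # toy risk label

# 80% training / 20% testing split, as in the paper.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, clf in [("LR", LogisticRegression(max_iter=1000)),
                  ("NB", GaussianNB()),
                  ("KNN", KNeighborsClassifier()),
                  ("SVM-linear", SVC(kernel="linear")),
                  ("SVM-RBF", SVC(kernel="rbf"))]:
    clf.fit(X_tr, y_tr)
    print(f"{name}: accuracy = {clf.score(X_te, y_te):.3f}")
```

On real data, each classifier's accuracy would then be compared against the CNN's, as Sect. 6 does.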

Fig. 1 Architecture design



4 Algorithm

Neural network using convolution. In general, a CNN is made up of two parts. One
is the feature extraction layer, which connects each neuron's input to the local
receptive fields of the previous layer and extracts the local features. Once the local
features are extracted, the spatial relationship between them and other features is
determined as well. Figure 2 shows the CNN multimodal disease risk prediction
algorithm.

4.1 Algorithm Process Flow

Stage 1: Select the data set.
Stage 2: Feature selection using information gain and ranking.
Stage 3: Classification calculation.
Stage 4: For each feature, calculate the f(x) value at the input layer.
Stage 5: Compute the gradient class of each component.
Stage 6: Next, generate the feature map and pass it forward to the input layer.
Stage 7: Calculate the convolution kernels in the feature design.
Stage 8: Produce sub-sampling layers and feature values.
Stage 9: Back-propagate the input deviation of the kth neuron in the output layer.
Stage 10: Finally, output the selected features and classification results.
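The forward pass implied by Stages 4–8 (feature maps from convolution kernels, then sub-sampling) can be illustrated with a minimal NumPy sketch. The 28 × 28 input, the single 3 × 3 kernel, and the random output weights are arbitrary stand-ins, not the paper's trained model:

```python
import numpy as np

def conv2d(img, kernel):
    """'Valid' 2-D convolution of a single-channel image with one kernel (Stage 7)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0)

def max_pool(x, size=2):
    """Non-overlapping max pooling: the sub-sampling layer of Stage 8."""
    h, w = x.shape[0] - x.shape[0] % size, x.shape[1] - x.shape[1] % size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

rng = np.random.default_rng(1)
img = rng.normal(size=(28, 28))             # stand-in input patch
kernel = rng.normal(size=(3, 3))            # one untrained convolution kernel
feat = max_pool(relu(conv2d(img, kernel)))  # feature map: 28x28 -> 26x26 -> 13x13
flat = feat.ravel()
w_out = rng.normal(size=flat.size)          # untrained output weights
p_risk = 1.0 / (1.0 + np.exp(-(flat @ w_out)))  # sigmoid output probability
print(feat.shape, float(p_risk))
```

A real CNN would learn `kernel` and `w_out` through the back-propagation of Stage 9 rather than drawing them at random.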

Fig. 2 CNN multimodal disease risk prediction algorithm



Fig. 3 Convolution layer

Fig. 4 Pooling layer

5 Implementation

The convolutional neural network algorithm consists of three layers: convolutional,
pooling, and fully connected, which are represented in Figs. 3, 4 and 5, respectively.

6 Results

Fig. 5 Fully connected layer

We applied the convolutional neural network algorithm to our data set, and the
results produced are given in the form of a confusion matrix, which shows the
accuracy of the model in terms of true positive and true negative values.
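The true-positive/true-negative summary described above can be computed with scikit-learn's `confusion_matrix`; the eight 0/1 labels below are hypothetical, not outputs from the paper's model:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 1 = high risk, 0 = low risk.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]

# For binary labels, ravel() yields (tn, fp, fn, tp) in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / len(y_true)
print(f"TP={tp} TN={tn} FP={fp} FN={fn} accuracy={accuracy:.3f}")
```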
Figure 6 depicts the user interface we designed. Here the user is requested to enter
the input image to check whether the patient is at low risk or high risk. In Fig. 7
(x-axis: one unit = 0.5 s; y-axis: one unit = 0.5 epoch), the input we give is chest
X-ray images. The system analyses the images of both infected and normal people.
We assign the value 1 for high risk and 0 for low risk (Figs. 8 and 9).

Fig. 6 User interface

Fig. 7 Threshold curve for our data set

Fig. 8 High risk data input

Fig. 9 Low risk data input

The output will be in the form of 0 or 1, i.e., low or high risk.
In Figs. 10 and 11 (x-axis: one unit = 0.5 s; y-axis: one unit = 0.5 epoch), Fig. 10
depicts the loss value, which gradually decreases as the number of training images
increases, and Fig. 11 depicts the accuracy value, which gradually increases with
the number of input images.

Fig. 10 Plotting loss values

Fig. 11 Plotting accuracy values

7 Conclusion

For simple diseases, structured data alone is enough for prediction, but for complex
diseases both structured and unstructured data are required. In this article, we used
a new convolutional neural network-based multimodal disease risk prediction
(CNN-MDRP) algorithm using structured and unstructured data from the data set.
To the best of our knowledge, none of the previous works in medical big data
analytics focused on both categories of data. In comparison with several common
prediction algorithms, our proposed algorithm has a prediction accuracy of 94.8%
and a convergence speed faster than that of the CNN-based unimodal disease risk
prediction (CNN-UDRP) algorithm.

References

1. Akbarizadeh G, Tirandaz Z (2020) Segmentation parameter estimation algorithm based on
Curvelet transforms coefficients energy for feature extraction and texture description of SAR
images. In: 7th conference on information and knowledge technology (IKT), pp 1–4
2. Arel I, Rose DC, Karnowski TP (2010) Deep machine learning—a new frontier in artificial
intelligence research. IEEE Comput Intell Mag 5(4):13–18
3. Bentes C, Frost A, Velotto D, Tings B (2016) Ship-iceberg discrimination with convolutional
neural networks in high resolution SAR images. In: Proceedings of EUSAR 2016: 11th
European conference on synthetic aperture radar, pp 1–4
4. Agrawal KK, Sharma S, Tomar S, Kumar S (2020) Disease prediction for the deprived using
machine learning. Int J Innov Technol Explor Eng (IJITEE) 9(7). ISSN 2278-3075. https://
doi.org/10.35940/ijitee.F3076.059720
5. Thirupathi L, Padmanabhuni VNR (2021) Multi-level protection (Mlp) policy implementation
using graph database. Int J Adv Comput Sci Appl (IJACSA) 12(3). https://doi.org/10.14569/
IJACSA.2021.0120350
6. Thirupathi L, Nageswara Rao PV (2020) Developing a multi-level protection framework using
EDF. Int J Adv Res Eng Technol (IJARET) 11(10):893–902
7. Thirupathi L, Padmanabhuni VNR (2020) Protected framework to detect
and mitigate attacks. Int J Anal Exp Modal Anal XII(VI):2335–2337.
18.0002.IJAEMA.2020.V12I6.200001.0156858943
8. Thirupathi L, Rekha G (2016) Future drifts and modern investigation tests in wireless sensor
networks. Int J Adv Res Comput Sci Manage Stud 4(8)
9. Thirupati L, Pasha R, Prathima Y (2014) Malwise system for packed and polymorphic malware.
Int J Adv Trends Comput Sci Eng 3(1):167–172
10. Lingala T, Galipelli A, Thanneru M (2014) Traffic congestion control through vehicle-to-
vehicle and vehicle to infrastructure communication. Int J Comput Sci Inf Technol (IJCSIT)
5(4):5081–5084
11. Swathi M, Thirupathi L (2013) Algorithm for detecting cuts in wireless sensor networks. Int J
Comput Trends Technol (IJCTT) 4(10)
12. Thirupathi L, Reddemma, Gunti S (2009) A secure model for cloud computing based storage
and retrieval. SIGCOMM Comput Commun Rev 39(1):50–55
13. Thirupathi L, Nageswara RPV (2018) Understanding the influence of ransomware: an inves-
tigation on its development, mitigation and avoidance techniques. Grenze Int J Eng Technol
(GIJET) 4(3):123–126
14. Lingala T, Ravikanti S (2017) Social media: to deal crisis circumstances. Int J Innov Adv
Comput Sci (IJIACS) 6(9)
15. Rekha S, Thirupathi L, Renikunta S, Gangula R (2021) Study of security issues and solutions in
Internet of Things (IoT). Mater Today Proc. ISSN 2214-7853. https://doi.org/10.1016/j.matpr.
2021.07.295
16. Gangula R, Thirupathi L, Parupati R, Sreeveda K, Gattoju S (2021) Ensemble machine learning
based prediction of dengue disease with performance and accuracy elevation patterns. Mater
Today Proc. ISSN 2214-7853. https://doi.org/10.1016/j.matpr.2021.07.270
17. Nalajala S, Thirupathi L, Pratap NL (2020) Improved access protection of cloud using feedback
and de-duplication schemes. J Xi’an Univ Archit Technol XII(IV)
18. Srividya V, Swarnalatha P, Thirupathi L (2018) Practical authentication mechanism using
PassText and OTP. Grenze Int J Eng Technol Special Issue, Grenze ID: 01.GIJET.4.3.27,
© Grenze Scientific Society
19. Thirupathi L, Rehaman Pasha MD, Reddy GS (2013) Game based learning (GBL). Int J Res
Eng Adv Technol 1(4)
20. Groves P, Kayyali B, Knott D, Kuiken SV (2016) The ‘big data’ revolution in healthcare:
accelerating value and innovation
21. Chen M, Mao S, Liu Y (2014) Big data: a survey. Mob Netw Appl 19(2):171–209

22. Jensen PB, Jensen LJ, Brunak S (2012) Mining electronic health records: towards better
research applications and clinical care. Nat Rev Genet 13(6):395–405
23. Tian D, Zhou J, Wang Y, Lu Y, Xia H, Yi Z (2015) A dynamic and self-adaptive network
selection method for multimode communications in heterogeneous vehicular telematics. IEEE
Trans Intell Transp Syst 16(6):3033–3049
24. Chen M, Ma Y, Li Y, Wu D, Zhang Y, Youn C (2017) Wearable 2.0: enable human-cloud
integration in next generation healthcare system. IEEE Commun 55(1):54–61
25. Chen M, Ma Y, Song J, Lai C, Hu B (2016) Smart clothing: connecting human with clouds and
big data for sustainable health monitoring. ACM/Springer Mob Netw Appl 21(5):825–845
26. Chen M, Zhou P, Fortino G (2016) Emotion communication system. IEEE Access. https://doi.
org/10.1109/ACCESS.2016.2641480
27. Qiu M, Sha EH-M (2009) Cost minimization while satisfying hard/soft timing constraints for
heterogeneous embedded systems. ACM Trans Des Autom Electron Syst (TODAES) 14(2):25
28. Wang J, Qiu M, Guo B (2017) Enabling real-time information service on telehealth system
over cloud-based big data platform. J Syst Architect 72:69–79
29. Bates DW, Saria S, Ohno-Machado L, Shah A, Escobar G (2014) Big data in health care: using
analytics to identify and manage high-risk and high-cost patients. Health Aff 33(7):1123–1131
30. Qiu L, Gai K, Qiu M (2016) Optimal big data sharing approach for tele-health in cloud
computing. In: IEEE international conference on smart cloud (SmartCloud). IEEE, pp 184–189
31. Zhang Y, Qiu M, Tsai C-W, Hassan MM, Alamri A (2015) Health CPS: healthcare cyber-
physical system assisted by cloud and big data. IEEE Syst J
32. Lin K, Luo J, Hu L, Hossain MS, Ghoneim A (2016) Localization based on social big data
analysis in the vehicular networks. IEEE Trans Ind Inform
33. Lin K, Chen M, Deng J, Hassan MM, Fortino G (2016) Enhanced fingerprinting and trajectory
prediction for IoT localization in smart buildings. IEEE Trans Autom Sci Eng 13(3):1294–1307
34. Oliver D, Daly F, Martin FC, McMurdo ME (2004) Risk factors and risk assessment tools for
falls in hospital in-patients: a systematic review. Age Ageing 33(2):122–130
35. Marcoon S, Chang AM, Lee B, Salhi R, Hollander JE (2013) Heart score to further risk stratify
patients with low timing scores. Crit Pathw Cardiol 12(1):1–5
36. Bandyopadhyay S, Wolfson J, Vock DM, Vazquez-Benitez G, Adomavicius G, Elidrisi M,
Johnson PE, O’Connor PJ (2015) Data mining for censored time-to-event data: a Bayesian
network model for predicting cardiovascular risk from electronic health record data. Data Min
Knowl Disc 29(4):1033–1069
37. Qian B, Wang X, Cao N, Li H, Jiang Y-G (2015) A relative similarity based method for
interactive patient risk prediction. Data Min Knowl Disc 29(4):1070–1093
38. Singh A, Nadkarni G, Gottesman O, Ellis SB, Bottinger EP, Guttag JV (2015) Incorporating
temporal data in predictive models for risk stratification of renal function deterioration. J
Biomed Inform 53:220–228
39. Wan J, Tang S, Li D, Wang S, Liu C, Abbas H, Vasilakos A (2017) A manufacturing big data
solution for active preventive maintenance. IEEE Trans Ind Inf. https://doi.org/10.1109/TII.
2017.2670505
40. Thirupathi L et al (2021) J Phys Conf Ser 2089:012049. https://doi.org/10.1088/1742-6596/
2089/1/012049
41. Lingala T et al (2021) J Phys Conf Ser 2089:012050. https://doi.org/10.1088/1742-6596/2089/
1/012050
42. Pratapagiri S, Gangula R, Ravi G, Srinivasulu B, Sowjanya B, Thirupathi L (2021) Early
detection of plant leaf disease using convolutional neural networks. In: 2021 3rd international
conference on electronics representation and algorithm (ICERA), pp 77–82. https://doi.org/10.
1109/ICERA53111.2021.9538659
43. Padmaja P, Sophia IJ, Hari HS, Kumar SS, Somu K et al (2021) Distribute the message over
the network using another frequency and timing technique to circumvent the jammers. J Nucl
Energy Sci Power Gener Technol 10:9
A Novel Twitter Sentimental Analysis
Approach Using Naive Bayes
Classification

Lingala Thirupathi, G. Rekha, S. K. Shruthi, B. Sowjanya,
and Sowmya Jujuroo

Abstract The world is mutating into a better place due to the innovations happening
around the globe. Since people spend much time regularly on social media to express
their opinions, social networks are the primary sources of information regarding
people’s opinions and feelings on various topics. Twitter is a micro blogging and
social networking site that allows users to post brief status updates of up to 140
characters in length. It is a rapidly growing service. This project resolves the issue
of sentiment analysis on Twitter. Sentiment analysis is a form of natural language
processing used to monitor public opinion on a specific product or subject. Sentiment
analysis, also known as opinion mining, entails creating a framework to capture and
analyze product opinions expressed in blog posts, comments, reviews, or tweets.
The objective of this report is to provide an illustration of this fascinating problem as
well as a model for performing sentiment analysis on Twitter tweets using the Naïve
Bayes classification algorithm.

Keywords NLP · Twitter · Naïve Bayes · Classification

1 Introduction

Sentiment analysis, also known as "opinion mining" or "emotion Artificial Intelligence,"
refers to the use of natural language processing (NLP), text mining, machine
learning, and other techniques to analyze people's feelings. Biometrics and compu-
tational linguistics are also used to recognize, extract, and systematically analyze
the subjective information and emotional states of public opinion.

L. Thirupathi (B)
CSE Department, Stanley College of Engineering and Technology for Women, Hyderabad,
Telangana, India
e-mail: thiru1274@gmail.com
G. Rekha
CSE Department, Kakatiya Institute of Technology & Science, Warangal, Telangana, India
S. K. Shruthi · B. Sowjanya · S. Jujuroo
CSE Department, Methodist College of Engineering & Technology, Hyderabad, Telangana, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 401
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_39

In this big data era, almost every individual has access to the Internet. As a result,
people find it easy to share their thoughts and opinions on social and cultural issues
on global platforms like Twitter. Twitter provides an opportunity to communicate
with strongly influential people; hence, common people often address their issues
through tweets to bring them to politicians' notice. Many brands launch their
products on Twitter. Influential people such as actors share their life events and
experiences through tweets with their fans, and their tweets get numerous replies.
As a consequence, large volumes of data are collected every single day, every
single hour, every single minute, and every single second on Twitter. This data when
put to the right use can benefit Businesses to make major decisions. Sentiment anal-
ysis is crucial since it allows companies to easily consider their consumers’ overall
views. Twitter sentiment analysis allows you to follow what people are saying on
social media about your product or service, and it can help you discover disgruntled
customers or unfavorable mentions before they become a major problem.
Simultaneously, sentiment analysis on Twitter may provide useful information.
What are the characteristics of your business that your customers enjoy the most?
What are the most frequently mentioned negative aspects?
Sentiment analysis’ primary notion is to determine the polarity of brief sentences
and classify them accordingly. The starting point The polarity of a sentiment can
be classified as “good,” “bad,” or “balanced” Because sentiment analysis in the
context of micro blogging is a relatively new study field, there is plenty of room
for more research. Prior work has been conducted on sentiment analysis of user
comments, papers, web blogs/articles, and general phrase analysis. The 280-character
limit distinguishes these from Twitter. Although work on unsupervised and semi-
supervised approaches is complete, there is still much room for improvement.
The major goal of this research is to investigate and evaluate a sentimental analysis
model based on nave Bayes classification.

2 Literature Survey

Sentiment analysis is a category of classification problems; it deals with the
classification and identification of sentiments and opinions appearing in text. In
[1–7], the authors proposed various methods for sentiment analysis.
Research in the field of sentiment or opinion analysis arose in the early 1990s. The
authors of [8–26] implemented different techniques in the machine learning,
networks, and security domains. Later, many approaches were introduced which
made it much easier to analyze the sentiment of user-generated data, be it in the
form of tweets, product reviews, or political views, which helps in knowing the
opinion of a huge crowd in various domains. Twitter posts are preprocessed and
categorized as positive, negative, or neutral based on their emotional content.
The accuracy of the classifier is improved by pre-processing textual data to reduce
noise. The authors of [27–30] proposed predictions and a survey on deep learning
domains. Another study attempted to pre-process the dataset, extract the adjectives
that carry significant meaning (the feature vector), select the feature vector array,
and apply machine learning algorithms such as naïve Bayes, maximum entropy,
and SVM. Finally, they evaluated the classifier's efficiency in terms of recall,
precision, and accuracy. Naïve Bayes, according to Bo Pang and Lillian Lee, is the
most efficient method with the highest accuracy.

Fig. 1 Proposed methodology for twitter sentiment analysis

3 Proposed Methodology

The proposed methodology is a classifier built using the naïve Bayes algorithm.
The following steps are performed, as shown in Fig. 1.
(1) Creating a Twitter developer account, (2) Obtaining access keys and access
tokens, (3) Connecting to Twitter API, (4) Data Acquisition, (5) Data preprocessing,
(6) Feature Extraction, (7) Training the classifier, (8) Classification.
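Steps 4–6 of this pipeline can be sketched as follows. Tweet acquisition is stubbed with example tweets from Sect. 4 (calling the Twitter API needs live credentials), and the cleaning rules are a simplified assumption about the preprocessing used:

```python
import csv
import re

# Stubbed "data acquisition" (step 4): hard-coded tweets with labels.
tweets = [
    ("It's always great hearing from happy customers!", "positive"),
    ("I am disappointed. It's not live in CA anymore :(", "negative"),
    ("Where is the Eiffel tower entrance?", "neutral"),
]

def preprocess(text):
    """Step 5: substitution (drop URLs, @mentions, '#'), normalization
    to lowercase, then tokenization into word tokens."""
    text = re.sub(r"http\S+|@\w+|#", "", text.lower())
    return re.findall(r"[a-z']+", text)

# Store the cleaned tweets in a CSV file for later feature extraction (step 6).
with open("tweets.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for text, label in tweets:
        writer.writerow([" ".join(preprocess(text)), label])

print(preprocess(tweets[0][0]))
```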

4 Functionality and Design

Initially, a Twitter developer account has to be created by answering a few questions
on the purpose of the account. Twitter then takes a few hours to verify the response
and activate the account. Once the account is created, we can request an access key,
access token, consumer key, and consumer secret. Next, we use the Twitter API
through a library that provides a Python interface to it. We then search for positive,
negative, and neutral tweets and store the tweets obtained in CSV files. Figure 2
shows the CSV file of tweets.
Fig. 2 Example of tweets stored in CSV files

Examples of positive tweets:
• It's always great hearing from happy customers! Show your support for small
businesses like ours.
• Happy 6 months to this beautiful girl, I can't wait for what the future has for us.
• To all runners training for the Every Step Counts 5 K event, May the 4th is with
you! Happy for you! Register for our ESC 5 K today.
Examples of negative tweets:
• It makes me feel so sad that bullying still exists. I think it will never stop in this
world and that is so horrible.
• Stormy Daniels is a very selfish person. For her to bring this up from years ago.
Sad, not thinking of Melania.
• I am disappointed. It's not live in CA anymore :(
Examples of neutral tweets:
• Are there any flights flying from nyc to CA this afternoon?
• Where is the Eiffel tower entrance?
• Where can I get my license renewed?
As seen in the examples above, tweets may contain useful information and express
opinions on any subject, but they also contain many irrelevant characters; hence
preprocessing of the data is important.
We apply tokenization, normalization, and substitution preprocessing techniques,
and then extract the features. To build a training model, we choose one of the text
classification algorithms and feed the training corpus to the classifier. We have
chosen the multinomial naïve Bayes classifier. Once we have the training model,
we can feed it the testing data and get a classification prediction.
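A minimal sketch of the training and classification steps, using scikit-learn's CountVectorizer (feature extraction) and MultinomialNB; the six-tweet corpus and its labels are toy stand-ins for the CSV data described above:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training corpus standing in for the cleaned CSV tweets.
train_texts = [
    "great product so happy with it", "love this beautiful day",
    "happy customers always welcome",
    "so sad and disappointed", "this is horrible never again",
    "very selfish and sad behaviour",
]
train_labels = ["positive"] * 3 + ["negative"] * 3

# Bag-of-words features feeding a multinomial naive Bayes classifier.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train_texts, train_labels)

pred = clf.predict(["what a happy surprise, love it"])[0]
print(pred)
```

On the real corpus, the classifier would instead be fitted on the labeled tweets loaded from the CSV files.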

Fig. 3 Example of a positive tweet

The naïve Bayes learning algorithm is widely used in text classification problems
because it is computationally efficient and simple to implement. There are two
different event models: the multinomial event model and the multivariate Bernoulli
event model.
The multinomial event model is referred to as multinomial naïve Bayes. The
multinomial naïve Bayes algorithm is a probabilistic learning method prominent in
natural language processing (NLP). It calculates each tag's probability for a given
sample and outputs the tag with the highest probability.
Bayes' theorem gives a route to estimating the posterior probability P(c|x) from
P(c), P(x), and P(x|c): P(c|x) = P(x|c) P(c) / P(x).
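As a tiny numeric illustration of the theorem (all probabilities below are made up):

```python
# c = "tweet is positive", x = "tweet contains the word 'happy'".
p_c = 0.4            # prior P(c)
p_x_given_c = 0.30   # likelihood P(x|c)
p_x = 0.15           # evidence P(x)

# Bayes' theorem: posterior = likelihood * prior / evidence.
p_c_given_x = p_x_given_c * p_c / p_x
print(round(p_c_given_x, 2))  # prints 0.8
```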

Figures 3 and 4 below show examples of positive and negative tweets, respectively.

5 Conclusion and Future Work

Fig. 4 Example of a negative tweet

Text classification is amongst the most crucial elements of text data mining, and it
is used to perform sentiment analysis. Although today's data is exploding,
classifying vast amounts of data has become a challenge. Through this
methodology, we can collect and analyze the sentiment of tweets based on the
keyword searched. Sentiment analysis is still in its early phases, especially in the
context of micro blogging, and is far from complete. As a result, we have come up
with a few ideas that we think are worth investigating in the future and could lead
to even better outcomes.

References

1. Fouad MM, Gharib TF, Mashat AS (2018) Efficient Twitter sentiment analysis system with
feature selection and classifier ensemble. In: International conference on advanced machine
learning technologies and applications. Springer, pp 516–527
2. Musto C, Semeraro G, Polignano M (2014) A comparison of lexicon-based approaches for
sentiment analysis of microblog posts. Info Filtering Retrieval 59
3. Kharde V, Sonawane P (2016) Sentiment analysis of twitter data: a survey of techniques.
arXiv preprint arXiv:1601.06971
4. Harb A, Plantié M, Dray G, Roche M, Trousset F, Poncelet P (2008) Web Opinion Mining:
How to extract opinions from blogs? In: Proceedings of the 5th international conference on
Soft computing as transdisciplinary science and technology, ACM, pp 211–217
5. Pang B, Lee L, Vaithyanathan S (2002) Thumbs up?: sentiment classification using machine
learning techniques. In: Proceedings of the ACL-02 conference on Empirical methods in natural
language processing vol 10, Association for Computational Linguistics, pp 79–86
6. Silge J, Robinson D (2017) Text mining with R: a tidy approach, O’Reilly Media
7. Abirami A, Gayathri V (2017) A survey on sentiment analysis methods and approach. In: 2016
eighth international conference on Advanced computing (ICoAC), IEEE, pp 72–76
8. Thirupathi L, Rao PVN (2021) Multi-level protection (Mlp) policy implementation using graph
database. Int J Adv Comput Sci Appl (IJACSA) 12(3). https://doi.org/10.14569/IJACSA.2021.
0120350
9. Thirupathi L, Rao PVN (2020) Developing a multilevel protection framework using EDF. Int
J Adv Res Eng Technol (IJARET) 11(10):893–902
10. Thirupathi L, Padmanabhuni VNR (2020) Protected framework to detect and mitigate attacks.
Int J Anal Exp Modal Anal 12(4):2335–2337. https://doi.org/18.0002.IJAEMA.2020.V12I6.
200001.0156858943

11. Thirupathi L, Rekha G (2016) Future drifts and modern investigation tests in wireless sensor
networks. Int J Adv Res Comput Sci Manag Stud 4(8)
12. Thirupati L, Pasha R, Prathima Y (2014) Malwise system for packed and polymorphic malware.
Int J Adv Trends Comput Sci Eng 3(1):167–172
13. Thirupathi L, Galipelli A, Thanneru M (2014) Traffic congestion control through vehicle-to-
vehicle and vehicle to infrastructure communication. (IJCSIT) Int J Comput Sci Info Technol
5(4):5081–5084
14. Swathi M, Thirupathi L (2013) Algorithm for detecting cuts in wireless sensor networks. Int J
Comput Trends Technol (IJCTT) 4(10)
15. Thirupathi L, Reddemma Y, Gunti S (2009) A secure model for cloud computing based storage
and retrieval. SIGCOMM Comput Commun Rev 39(1):50–55
16. Thirupathi L, Nageswara RPV (2018) Understanding the influence of ransomware: an inves-
tigation on its development mitigation and avoidance techniques. Grenze Int J Eng Technol
(GIJET) 4(3):123–126
17. Thirupathi L, Sandeep R (2017) Social media: to deal crisis circumstances. Int J Innov Adv
Comput Sci (IJIACS) 6(9)
18. Rekha S, Thirupathi L, Renikunta S, Gangula R (2021) Study of security issues and solutions in
Internet of Things (IoT). Mater Today Proc ISSN 2214–7853. https://doi.org/10.1016/j.matpr.
2021.07.295
19. Gangula R, Thirupathi L, Parupati R, Sreeveda K, Gattoju S (2021) Ensemble machine learning
based prediction of dengue disease with performance and accuracy elevation patterns, Mater
Today Proc ISSN 2214–7853. https://doi.org/10.1016/j.matpr.2021.07.270
20. Nalajala S, Thirupathi L, Pratap NL (2020) Improved access protection of cloud using feedback
and de-duplication schemes. J Xi’an Univ Architect Technol 12(4)
21. Srividya V, Swarnalatha P, Thirupathi L (2018) Practical authentication mechanism using
passtext and OTP. Grenze Int J Eng Technol Spec Issue Grenze ID 1 GIJET.4.3.27,© Grenze
Scientific Society
22. Thirupathi L, Rehaman Pasha MD, Reddy GS (2013) Game based learning (GBL). Int J Res
Eng Adv Technol 1(4)
23. Thirupathi L et al. (2021) J Phys Conf Ser 2089:012049. https://doi.org/10.1088/1742-6596/
2089/1/012049
24. Thirupathi L et al. (2021) J Phys Conf Ser 2089:012050. https://doi.org/10.1088/1742-6596/
2089/1/012050.
25. Pratapagiri S, Gangula R, Ravi G, Srinivasulu B, Sowjanya B, Thirupathi L (2021) Early
detection of plant leaf disease using convolutional neural networks. In: 2021 3rd International
conference on electronics representation and algorithm (ICERA), pp 77–82. https://doi.org/10.
1109/ICERA53111.2021.9538659
26. Padmaja P, Sophia IJ, Hari HS, Kumar SS, Somu K et al (2021) Distribute the message over
the network using another frequency and timing technique to circumvent the jammers. J Nucl
Ene Sci Power Generat Techno 10:9
27. Reddy CKK, Anisha PR, Shastry R, Ramana Murthy BV (2021) Comparative study on internet
of things: enablers and constraints. Adv Intell Syst Comput
28. Reddy CKK, Babu BV (2015) ISPM: improved snow prediction model to nowcast the presence
of snow/no-snow. Int Rev Comput Softw
29. Reddy CKK, Rupa CH, Babu BV (2015) SLGAS: supervised learning using gain ratio as
attribute selection measure to nowcast snow/no-snow. Int Rev Comput Softw
30. Reddy CKK, Rupa CH, Babu BV (2014) A pragmatic methodology to predict the presence of
snow/no-snow using supervised learning methodologies. Int J Appl Eng Res
Recognition and Adoption
of an Abducted Child Using Haar
Cascade Classifier and JSON Model

Ghousia Begum, C. Kishor Kumar Reddy, and P. R. Anisha

Abstract The purpose of this paper is to recognize an abducted child from the photos
of children available, with the help of face recognition and face matching techniques,
and to enable adoption of an abandoned child from the available data, with the help
of a string comparison technique. The work in this paper is extended by providing
the concept of an adoption system for the abandoned child. The Haar cascade
classifier, an OpenCV classifier, can be used for face recognition; the mean method
can be used for face matching; and the DB can be used for string comparisons. The
dataset consists of 500 images for child recognition and child matching, and 50
images for child adoption.
The accuracy obtained from the face recognition algorithm is 80.7%.

Keywords Haar cascade classifier · JSON model · OpenCV and DB

1 Introduction

Image processing enables us to modify and manipulate a number of images at a time
and extract helpful information from them. It has a wide range of applications in
almost every field. According to [1], the country loses hundreds of children each
day. According to [2], NISMART-2 defined a missing child in two ways: in the
first instance, those who were missing from their caretakers (“caretakers missing”);
and in the second instance, those who were reported to an agency as missing from
their caretakers, to help in locating them (“reported missing”).
These abductions can happen in many ways, by strangers and even by family
members. Cases include kidnapped children, lost children, runaway children,
trafficked children, and family abduction.
In India, several children are abducted every day. These are children who got
separated from their elders, guardians, or family. As per the NCRB report cited by
the MHA in Parliament, the number of reported missing children in India between
2016 and 2018 is more than 193,000, and many

G. Begum · C. Kishor Kumar Reddy (B) · P. R. Anisha


Stanley College of Engineering and Technology for Women, Hyderabad, Telangana, India
e-mail: kishoar23@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 409
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_40

of them remain untraced. This project will be helpful to these abducted children,
their parents, and the authorities who are searching for these children.
Even after getting traced, some of the children are not taken by their real parents
or guardians. These are abandoned children. There are some parents who don’t want
to take their children back and there are also some parents who want to adopt the
child. These parents, who want to adopt the child, are known as adoptive parents.
This project will be helpful for these adoptive parents who can adopt the child and
live happily.

2 Literature Survey

2.1 A Survey on Previous Papers

In paper [3], a methodology is proposed for a missing child identification system
that combines facial feature extraction and matching. A deep learning method is
used for feature extraction, and a support vector machine for matching. Face
detection is done using the HOG algorithm. A box is bounded around the detected
face, and using the face landmark estimation algorithm, sixty-eight specific points
(landmarks) on the face are located. After passing these images to the deep CNN,
128 measurements are obtained. An SVM classifier takes the measurements from a
test image and gives the closest match as output; a face is recognized using this
classifier. The dataset in this system consists of 43 child cases, and the accuracy
of the system is 99.4%. The system does not perform well on large data sets
because the required training time is higher.
In paper [4], a method for tracking people online and identifying them using
RFID and Kinect was proposed. A Kinect V2 sensor was used for tracking,
generating a body skeleton for up to six persons. Identification was performed using
both Kinect and passive RFID: a person’s skeleton is first measured, then their
RFID tag is measured using the reader antenna positions as references, and the
best match is made between the two. Only six people can be tracked by this system
simultaneously, the effective area is limited to four meters, and people have to
physically wear the RFID tag.
In paper [5], the system presented an e-crime alert using a robust face
recognition system. It works on the LEM algorithm to detect points, the LSD is
calculated, and finally the feature is computed. The system is 85% efficient, but
it doubles the computing time.
In paper [6], a system is developed using deep learning for face detection and
tagging: a deep dense face detector is used for face detection, and the LBPH method is
used to recognize the detected faces. The system is extended by providing the concept
of a tagging system for the detected faces. For the faces detected successfully, the
system achieved an accuracy of 85% for tagging the faces.

In paper [7], the author presented work on aging deep face features for identifying
missing children. The system proposed an age-progression module responsible
for age-progressing the deep face features given by any commodity face matcher. Three
face matchers, FaceNet, CosFace, and COTS, were used to evaluate the face
matching results. The dataset, named the ITWCC dataset, consists of
7990 images of 745 child celebrities.
In paper [8], a methodology is proposed for a missing child identification system
using deep learning and a multi-class SVM. It combines facial feature extraction
and matching: a deep learning method is used for feature extraction, and a support
vector machine for matching. Face recognition is done using the VGG-Face network.
An SVM classifier takes the measurements from a test image and gives the closest
match as output. The dataset in this system is user defined and consists of 846
child face images covering 43 unique children’s cases. The accuracy of the system
is 99.41%. The system does not perform well on large data sets because the
required training time is higher.
In paper [9], the work presented ML-based methods to recommend a missing
person’s search level, proposing ML methods to support police decisions in
searches for missing persons. According to the author, the time between the moment
of a person’s disappearance and the moment of decision making must be short.
Several methods were explored, including decision trees, random forests, naive
Bayes, support vector machines, and multi-layer perceptrons. Among these,
decision trees and random forests gave Fit factor values indicating the best
adaptation to the classified information. The weakness of the system was the small
number of real cases.
In paper [10], an application is proposed for uploading complaints about a missing
person to an AWS web server. It can be accessed by government officials and by
local people for matching a missing person’s face. Using face recognition, this
application matches the image of a missing person on any Android platform. It
consists of several layers: a Presentation Layer for the front end, a Business
Layer for requests and responses, and a Database Layer for storing data. The
application obtained good accuracy, but it is limited to Android devices and
requires an internet connection.

2.2 Challenges and Gaps Identified

. Some of the papers do not have an authentic missing child image dataset; for
example, reference [7] includes only an image dataset but not a missing child
image dataset.
. According to reference [8], a deeply disturbing fact about India’s missing children
is that many children go missing every day, and half of them remain untraced.
. According to reference [8], the earliest methods for face recognition, which
commonly used features such as LBPH, HOG, SIFT, or SURF, do not give good
performance.

. Even after getting traced, the abandoned children are not provided with any option
in the system.
. There is no adoption module for the abandoned children.
. No other module is provided for those users who want to adopt the children from
the abandoned ones.
. There is no option provided for the authorities or the official people regarding
how to proceed with the abandoned children.

3 Methodology

3.1 Dataset

The dataset in this system is user-defined and a real-time dataset. The data used
to build the dataset is collected from the authentic website of the Ministry of
Women and Child Development (MWCD), Govt. of India. This website provides
information on both Missing Children and Recovered Children.
The data is downloaded by web scraping, written to a .txt file, and then loaded
into a MySQL DB after preprocessing. For the missing children, the data covers the
period 11 June 2021 to 13 August 2021 and consists of 500 records (RCF images).
Ten more images were downloaded for missing children whose details are not added
to the DB (RCNF images), and ten more random images of things were downloaded
whose details are not added to the DB (NRCNF images). For the adoptable children,
the data was downloaded before March 2021, and it consists of 50 records of
recovered children.
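The scrape-then-load step can be sketched as follows, using Python's built-in sqlite3 in place of MySQL; the table schema and field order are illustrative assumptions, not the actual MWCD record format:

```python
import sqlite3

# Illustrative schema; the real MWCD fields and table name are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE missing_children
                (child_id INTEGER PRIMARY KEY, name TEXT, age TEXT,
                 gender TEXT, status TEXT, uploaded_date TEXT)""")

def load_records(conn, lines):
    """Load scraped lines (as read from the .txt file) into the DB.

    Each line is assumed comma-separated: id, name, age, gender, status, date.
    Stripping whitespace is the preprocessing step mentioned in the text.
    """
    rows = [tuple(field.strip() for field in line.split(","))
            for line in lines if line.strip()]
    conn.executemany("INSERT INTO missing_children VALUES (?,?,?,?,?,?)", rows)
    conn.commit()
    return len(rows)
```

In the described system the same pattern would run against a MySQL connection instead of sqlite3.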

3.2 System Architecture

The system architecture consists of two main modules: (1) Child Recognition and
(2) Child Adoption. These are subdivided into four modules: Child Recognition into
an Official Module and a Public Module, and Child Adoption into a Welfare Module
and an Adopting Parent Module (Fig. 1).

Module 1: Official Module The Official Module is responsible for uploading
abducted child images. Child recognition and matching of the image begin once
the user uploads the suspected child images. The missing or abducted child DB
contains all the information along with the status and upload date. If the status
shows that the child is found in the DB, the officials can trace the child. If the
child is not taken by the guardians or parents within six months, the child’s details
will be sent to the Welfare Module for the adoption process.

Fig. 1 System architecture

Module 2: Public Module The Public Module is accessible to those users who want
to upload suspected child details. The public can upload the suspected child image
with other details, after which face recognition and face matching of the image
begin. The screen then shows the status of the uploaded child image as one of the
following three outputs:

1. Thank you for uploading. It is not recognized as a face, and no child found
(NRCNF).
2. Thank you for uploading. It is recognized as a face, but no child found (RCNF).
3. Thank you for uploading. It is recognized as face and child found (RCF).

When the user uploads an image of some thing rather than a child image, the
status shows the first output. When the user uploads a suspected child image that is
not in the missing database, the status shows the second output. When the user
uploads a suspected child’s photo that is stored in the missing database, the
status shows the third output.

Module 3: Welfare Module The Welfare Module is responsible for uploading the
adoptive child images. Welfare staff add the images according to the conditions
provided by the adoptive parents. There are six folders containing adoptive child
images, organized by the requirements (age and gender) provided by the adoptive
parents. Using string comparison through the DB, the welfare staff match the data
provided by the adoptive parents at the time of registration with the data provided
when resubmitting the child details. If the details match, the respective child images
are displayed on the screen; if not, the adoptive parent will not have access to the
adoptive child images. After adoption, the welfare staff generate the adoption
certificate.
Module 4: Adopting Parent Module The Adopting Parent Module is accessible to
those users who want to adopt an abandoned child. It is responsible for providing the
abandoned children to the adoptive parents. This module requires the adopting
parent to log in. Before the adoption process, the adopting parent has to go through
the adoption rules, provide the required documents, and strictly adhere to those
rules. After reading the adoption rules, the adopting parent has to register, and can
then log in with the username and password created at registration. For verification,
the adopting parent has to resubmit the child details that were submitted at
registration; the system compares both strings through the database (DB). If the
strings match, a screen appears showing the Adoption Child Images List, from which
adopting parents can adopt an abandoned child, get the adoption certificate, and
log out.

3.3 Methods

Face Recognition by Haar Cascade Classifier Face recognition using the
classifier is done in the following way. First, we input the abducted child images
and crop only the face regions from those images. We recognize the faces using the
Haar cascade classifier and classify the group of features into stages of classifiers.
This classifier gives a cropped face after the required number of iterations. After
this, the JSON model is loaded and used for parsing; then the weights, saved in
HDF5 format, are loaded. After loading the weights, we use a function to predict
the child faces. The image is resized to 64 × 64 pixels (height × width). The Haar
cascade classifier is an OpenCV classifier used for face detection and for extracting
features from the images. Haar features, shown in Fig. 3, are just like a convolutional
kernel: each feature is a single value obtained by subtracting the sum of pixels under
the white rectangle from the sum of pixels under the black rectangle (Figs. 2
and 3).
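In practice these rectangle sums are computed in constant time with an integral image. Below is a pure-Python sketch of one two-rectangle Haar feature; it is a simplified illustration of the idea, not OpenCV's implementation, and the left-white/right-black layout is an assumption:

```python
def integral_image(img):
    # ii[y][x] = sum of img[0..y-1][0..x-1]; padded with a zero row and column.
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    # Sum of pixels in the rectangle with top-left (x, y) and size w x h,
    # from four integral-image lookups.
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect(ii, x, y, w, h):
    # Two-rectangle edge feature per the text: the sum under the black
    # rectangle minus the sum under the white one (white = left half here).
    half = w // 2
    white = rect_sum(ii, x, y, half, h)
    black = rect_sum(ii, x + half, y, half, h)
    return black - white
```

The cascade evaluates thousands of such features per window, which is why the constant-time rectangle sum matters.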
Among all these calculated features, however, most are irrelevant: in an image,
most of the area is a non-face region. So it is a good idea to have a simple
method that checks whether a window does not contain a face region; if it does not,
discard it and do not process that window again. Instead, focus on areas which may
contain a face, so that more time is spent checking plausible face regions. For this,
the concept of a Cascade of Classifiers is introduced.
Face Matching by Using a Mean Method First, we get the recognized face from
the face recognition method. The mean of the pixels of the original face image,
which is in the first folder, is calculated. Then the mean of the pixels of the
matching face image, which is in the second folder, is calculated.

Fig. 2 Flow of face recognition system

mean(x) = (sum of the pixel values) / (total number of pixels)        (1)

If both means match, the system deletes the image from the first folder, adds it
to the third folder, and shows the message “matched”. Otherwise, it shows the
message “did not match”.
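A minimal sketch of this comparison, per Eq. (1), on nested pixel lists in pure Python; the optional tolerance parameter is an added assumption, since the text compares the means for equality:

```python
def mean_pixels(img):
    # Eq. (1): sum of the pixel values divided by the total number of pixels.
    total = sum(sum(row) for row in img)
    count = sum(len(row) for row in img)
    return total / count

def match_faces(original, candidate, tol=0.0):
    # Declare a match when the two means agree (within an optional tolerance).
    return abs(mean_pixels(original) - mean_pixels(candidate)) <= tol
```

On a match, the described system would then move the image from the first folder to the third.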

String Comparing Through DB The two important strings are Child Age and Child
Gender. This method takes both strings, age and gender, from the log-in page and
compares them with the strings stored in the database for that particular adopting
parent, attribute by attribute, using Python “for” and “if” statements. If the strings
match, the system displays a screen showing the appropriate list of abandoned child
images, using “if” and “elif” conditions. If the strings do not match, the system
displays a screen showing the message “log in failed”.
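That comparison can be sketched with plain "for"/"if" logic; the record layout, usernames, and return strings below are illustrative assumptions:

```python
# Stored registration details, keyed by adopting-parent username (illustrative).
registrations = {
    "parent01": {"child_age": "3", "child_gender": "F"},
    "parent02": {"child_age": "7", "child_gender": "M"},
}

def verify_details(username, submitted):
    """Compare resubmitted details with stored ones, attribute by attribute."""
    stored = registrations.get(username)
    if stored is None:
        return "log in failed"
    for attribute in ("child_age", "child_gender"):
        if submitted.get(attribute) != stored[attribute]:
            return "log in failed"
    return "show adoption child images list"
```

In the described system the stored values would come from a database query rather than an in-memory dict.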

Fig. 3 Calculating Haar features in face recognition

4 Results and Discussions

The dataset consists of 500 missing child (1–17 years old both male and female)
images. From these 500 images, the system is tested for 396 missing child images.
There are 10 more images downloaded for the missing children whose details are
not added to DB. From these 10 images, the system is tested for 4 images. There are
10 more random (things) images downloaded whose details are not added to DB.
From these 10 images, the system is tested for 5 images. Therefore, the system is
tested for a total of 405 images. Among the 405 images, 396 belong to RCF, 4 belong
to RCNF, and 5 belong to NRCNF. The actual 396 RCF images predicted 318 as
RCF, 0 as RCNF, and 78 as NRCNF. The actual 4 RCNF images predicted 0 as RCF,
4 as RCNF, and 0 as NRCNF. The actual 5 NRCNF images predicted 0 as RCF, 0 as
RCNF, and 5 as NRCNF.
The average accuracy obtained is 0.8074074074074075 i.e., 80.74%. The preci-
sion for RCF obtained is 1.0. The precision for RCNF obtained is 1.0. The
precision for NRCNF obtained is 0.060240963855421686. The average preci-
sion for all the classes obtained is 0.6867469879518072. The recall for RCF
obtained is 0.803030303030303. The recall for RCNF obtained is 1.0. The recall
for NRCNF obtained is 1.0. The average recall for all the classes obtained is
0.9343434343434343. F1-score for all the classes obtained is 0.7916369505649186
(Fig. 4).
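As a check, the reported metrics can be reproduced in pure Python from the confusion matrix described in the preceding paragraph; note that the reported F1-score corresponds to the harmonic mean of the averaged precision and recall:

```python
# Confusion matrix from the text: rows = actual class, columns = predicted,
# in the order (RCF, RCNF, NRCNF).
cm = [
    [318, 0, 78],  # actual RCF (396 images)
    [0,   4,  0],  # actual RCNF (4 images)
    [0,   0,  5],  # actual NRCNF (5 images)
]

n = len(cm)
total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(n)) / total

# Per-class precision (column-wise) and recall (row-wise).
precision = [cm[i][i] / sum(cm[r][i] for r in range(n)) for i in range(n)]
recall = [cm[i][i] / sum(cm[i]) for i in range(n)]

avg_p = sum(precision) / n
avg_r = sum(recall) / n
# Harmonic mean of the averaged precision and recall.
f1 = 2 * avg_p * avg_r / (avg_p + avg_r)

print(round(accuracy, 4), round(avg_p, 4), round(avg_r, 4), round(f1, 4))
# → 0.8074 0.6867 0.9343 0.7916
```

All four values agree with the figures reported above.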

Fig. 4 Output screen for entering suspected child details

5 Conclusion

In this paper, we have proposed a system for recognizing an abducted child using
a classifier. Even after getting traced, some of the children are not taken by their
real parents or guardians. This research will be helpful for these abducted children,
their parents, authorities, abandoned children and the adoptive parents who can adopt
the child and live happily. The work in this research is extended by providing the
concept of an adoption system for the abandoned child. The performance of the
face recognition and matching algorithm is evaluated using precision, recall, and
F1-score. The accuracy obtained from the face recognition
algorithm is 80.7%.

References

1. Ministry of home affairs (2019) Report on missing women and children in India. National
Crime Records Bureau
2. Flores JR (2002) National incidence studies of missing, abducted, runaway, and thrown-away
children
3. Kumar BK, Supriya G, Divya N, Bhargavi T, Venkatesh T (2020) Missing children identification
system using deep learning and multiclass SVM. J Info Comput Sci
4. Arniker SB (2014) RFID based missing person identification system. In: Conference Paper
5. Pate S, Deepak GM, Vinit JM, Parmesh KY (2016) Robust face recognition system for E-crime
alert. Int J Res Eng Appl Manage (IJREAM)
6. Mehta J, Ramnani E, Singh S (2018) Face detection and tagging using deep learning. Int Conf
Comput Commun Signal Proc (ICCCSP)
7. Deb D, Aggarwal D, Jain AK (2019) Finding missing children: aging deep face features
8. Chandran PS, Byju NB, Deepak RU, Nishakumari KN, Devanand P, Sasi PM (2018) Missing
child identification system using deep learning and multiclass SVM. In: IEEE recent advances
in intelligent computational systems (RAICS)

9. Pierzchała D, Gutowski T, Czuba P, Antkiewicz R (2021) Machine learning-based method for
recommendation of missing person’s search level
10. Ansari A, Singh A, Sagar A, Komal (2020) Android based Application–missing person. Int
Res J Eng Technol (IRJET)
Automatic Brain Tumor Detection Using
Convolutional Neural Networks

Amtul B. Ifra and Madiha Sadaf

Abstract Artificial Intelligence has the potential to bring about a paradigm shift
in the detection of brain tumors. Many health organizations have identified
brain tumors as the second leading cause of mortality in humans worldwide.
Effective medical therapy is possible if a brain tumor is identified at an
early stage. For appropriate diagnosis, Magnetic Resonance Imaging (MRI) is firmly
recommended for individuals with brain tumor indications. The immense geograph-
ical and structural variety of the brain tumor’s surrounding environment makes auto-
matic brain tumor classification a challenging task. The differences in the tumor site,
structure, and size present a significant difficulty for brain tumor identification. This
research proposes the design and implementation of Convolutional Neural Networks
(CNN) classification for enabling automatic brain tumor detection. When compared
to other cutting-edge methodologies such as Support Vector Machines (SVM) and
Deep Neural Networks (DNN), the obtained results demonstrate that the CNN
achieves 97.5% accuracy with minimal complexity.

Keywords Artificial Intelligence · Deep Neural Networks · Brain tumor ·


Convolutional Neural Networks · Magnetic Resonance Imaging · Support Vector
Machines

1 Introduction

Artificial Intelligence (AI) is a field in computer science that aims to provide machines
with human-resembling intelligence, allowing them to learn, analyze, and solve prob-
lems when confronted with multitudinous forms of data. In recent times, the infusion
of Artificial Intelligence into the healthcare system has helped clinical experts give
quality patient care. AI has been demonstrated in research to have a positive impact

A. B. Ifra
Shadan Women’s College of Engineering and Technology, Hyderabad, Telangana, India
M. Sadaf (B)
Chaitanya Bharathi Institute of Technology, Hyderabad, India
e-mail: memad321@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 419
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_41

on many preoperative stages such as diagnosis, assessment, and planning [1]. One
of the important organs in the human body is the brain, which is made up of billions
of cells. Irregular cell division produces an abnormal group of cells, often known
as a tumor. Low grade and high grade are the two forms of brain tumors. Low-
grade brain tumors are known as benign brain tumors. Malignant refers to a tumor
with a high grade. Because the malignant tumor is cancerous, it spreads rapidly and
endlessly throughout the body. It causes immediate death [2]. X-rays, CT scans, and
magnetic resonance imaging are among the imaging techniques available (MRI).
The X-ray provides visual evidence of the brain or skull’s living structures and
overall synthesis. However, neuroimaging, such as MRI, is still the fundamental
basis for diagnosing brain tumors [3]. Brown et al. devised a Natural Language
Processing (NLP) ML system that analyzed brain MRI inputs and then determined
the most optimal MRI brain imaging sequence to generate the most therapeutically
valuable images, demonstrating the influence of AI even before radiological images
are obtained [4]. ML-based sequential algorithms could help standardize the MRI
sequence protocol, increasing the clinical utility of the scans produced [5]. Moreover,
researchers observed that radiologist sequence selection is frequently challenged by
unusual situations, indicating that the ML technique performed particularly well in
these instances [1]. Using publicly available datasets [6], the goal of this study is to
create a completely automatic CNN model for brain tumor detection. The rest of
the paper is organized as follows: Sects. 2 and 3 summarize the present system and
the proposed solution. Section 4 digs much deeper into the proposed CNN model. A
full comparison of the proposed method to current methodologies, as well as a
description of the experimental results, is included in Sect. 5. Section 6 concludes
the paper.

2 Present System

Since it became possible to capture and transmit image data to the computer, auto-
matic algorithms for brain tumor detection and type labeling have been used with
brain MRI images. Over the last decade, NN and SVM are the most commonly
used approaches for their application in classifying brain tumor images and their
ease of use, whereas deep learning models have recently established an emerging
trend in ML that represents composite relations with the least possible number of
nodes. Hence, as matter of fact, they gradually ascended towards the top of their
respective healthcare sectors, such as medical image analysis, healthcare analytics,
and bioinformatics. Various algorithms like the FCM, SVM, and DNN are used for
the partial fulfillment of the requirements to arrive at the best results for brain tumor
detection. FCM (Fuzzy C Means Clustering), is a soft clustering method in which
each data point is allocated a probability or likelihood score to belong to that cluster.
It is preferred when we have overlapped datasets. Support Vector Machines (SVMs)
are supervised learning algorithms for classifying, predicting, and detecting outliers.
They are used in high-dimensional settings and in situations where dimensionality

exceeds the amount of samples because of their extreme effectiveness [7]. SVM clas-
sifier is used to detect a cluster of malignant tumor cells in a segment of Magnetic
Resonance (MR) and fragment the tumor cells to assess the size of the tumor present
in that segmented area of the brain. The SVM approach is not appropriate for large
data sets and therefore does not function effectively when there is more distortion in
the data set. The support vector classifier has no probabilistic explanation because
it operates via positioning the data points around the classifying hyperplane [8]. So
far, many algorithms have been implemented to detect and extract tumors in
medical images, using techniques such as a hybrid approach with Support Vector
Machines (SVM), backpropagation, and the dice coefficient. Among these, the
algorithms that used backpropagation as the base classifier had the highest
accuracy, 90% [7]. The Deep Neural Network (DNN) is another deep learning
framework that has
been successfully used for classification and regression in a spectrum of areas. This
is a feed-forward network in which the input is routed through numerous hidden
layers from the input layer to the output layer (more than two). With a 0.97 recall
rate, the DNN classification rate is 96.97% [9].
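The hyperplane behavior attributed to SVMs above can be illustrated with a minimal scikit-learn sketch. The two Gaussian feature clusters and all parameter values here are illustrative assumptions, not data or settings from the paper:

```python
# Hedged sketch: a linear SVM separating two synthetic feature clusters,
# illustrating max-margin hyperplane classification. Data is purely illustrative.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated Gaussian clusters standing in for "tumor" / "non-tumor" features
X = np.vstack([rng.normal(0.0, 0.5, (50, 10)),
               rng.normal(2.0, 0.5, (50, 10))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear")   # no probabilistic output by default,
clf.fit(X, y)                # consistent with the hyperplane view in [8]
print(clf.score(X, y))       # separable toy data -> 1.0
```

Note that `SVC` produces decision values relative to the hyperplane rather than probabilities unless calibration is explicitly enabled, matching the limitation discussed above.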

3 Proposed System

The significance of early identification and recognition of brain cancers cannot be
overstated. Computer-aided diagnostic (CAD) tools are now frequently used to
diagnose neurological conditions in a systematic and specific manner [10]. The
current study feeds brain tumor MRIs to a CNN for computer-aided diagnosis of brain
cancers. The CNN gathers features from labeled data and learns to identify images
with or without a brain tumor. This supervised CNN model uses preprocessed images to
improve performance. The key steps of the study include gathering a recent brain
tumor imaging dataset, image preprocessing, training the model in gradual steps, and
finally assessing performance by testing the model on unseen MRI samples. Earlier
trials using FCM, DNN, and SVM showed a high degree of complexity, long tumor
processing times, and low accuracy. Such models cannot be used in real time, where
high speed and accuracy are required. To overcome this problem, we propose an
automatic brain tumor detection system that offers both high speed and accuracy.
This model can be used in real-time situations to obtain comparatively better
results in less time, with 97.5% accuracy.

4 Methodology

A neural network is designed and implemented to model the human brain. Vector
quantization, data aggregation, optimization, approximation, pattern recognition,
and classification are all common uses for neural networks.

422 A. B. Ifra and M. Sadaf

A neural network's interconnections fall into three categories: feedback,
feed-forward, and recurrent. A typical neural network cannot handle resized images,
whereas a convolutional neural network can. A Convolutional Neural Network (CNN) is
made up of an input layer, a convolution layer, a Rectified Linear Unit (ReLU)
layer, a pooling layer, and a fully connected layer. The convolution layer divides
the image into small patches. The ReLU layer applies an element-by-element
activation function. The pooling layer is optional, and is used mainly for
downsampling. In the last layer (the fully connected layer), the class label is
produced from a probability score between 0 and 1 [11]. A block diagram of brain
tumor classification using convolutional neural networks is shown in Fig. 1. The
CNN-based brain tumor categorization process is divided into two phases: the
training phase and the testing phase. Using labels such as tumor and non-tumor, the
images are divided into categories. The training phase includes preprocessing,
semantic segmentation, and categorization using a loss function to create a
prediction model. First, the image collection is labeled for training; image
resizing is then performed during preprocessing.
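The layer sequence just described (convolution over small image patches, element-wise ReLU, then downsampling by pooling) can be sketched in plain NumPy. The 3×3 averaging filter and the 128×128 input are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid convolution: slide the filter over small image patches."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):                       # element-by-element activation
    return np.maximum(0, x)

def max_pool(x, size=2):           # downsampling
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.random.rand(128, 128)          # stands in for a preprocessed MRI slice
fmap = max_pool(relu(conv2d(img, np.ones((3, 3)) / 9.0)))
print(fmap.shape)                       # (63, 63): 126 x 126 map pooled by 2
```

A real CNN stacks many such filters per layer and learns the kernel weights; this sketch only traces the data flow through one convolution, activation, and pooling stage.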
The loss function is minimized using a gradient descent-based approach. A scoring
function maps the raw image pixels to class scores, and the loss function measures
how good a particular set of parameters is, defined by how closely the generated
scores match the ground-truth labels in the training data. The loss function
calculation is critical for improving accuracy: accuracy and loss are inversely
related, so when one is high the other is low, and vice versa. The gradient descent
method is driven by the value of the loss function, and the gradient of the loss
function is evaluated repeatedly during training.
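The repeated gradient evaluation described above can be illustrated with a tiny NumPy gradient-descent loop minimizing a cross-entropy loss for a single-output (tumor / non-tumor) score. The synthetic data, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))                  # toy feature vectors
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = (X @ w_true > 0).astype(float)             # ground-truth labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w):
    """Binary cross-entropy between scores sigmoid(Xw) and labels y."""
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

w = np.zeros(5)
for _ in range(500):                           # evaluate the gradient many times
    grad = X.T @ (sigmoid(X @ w) - y) / len(y) # gradient of the cross-entropy loss
    w -= 0.5 * grad                            # gradient-descent step
# the loss falls as accuracy rises (they are inversely related)
print(round(loss(np.zeros(5)), 3), round(loss(w), 3))
```

In the CNN the same loop runs over millions of parameters via backpropagation, but the mechanics (score, loss, gradient, step) are identical.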

Fig. 1 Flowchart of the proposed CNN classification system for brain tumors

4.1 Datasets

The proposed CNN is trained on the Kaggle dataset [6], which contains MRIs of brain
tumors. There are a total of 253 images in the MRI dataset [6]. We used “ImageData-
Generator” provided by Keras among other techniques for data augmentation [12].
It replaces the original batch with a new batch of images that have been randomly
modified. The images are flipped, rotated, tilted, and brightened before being resized
to 128 × 128 pixels.
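The augmentation described above might be configured as in the following Keras sketch. The paper names only the transformations (flip, rotate, tilt, brighten); the specific parameter values and the dummy input batch are illustrative assumptions:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Flip, rotate, tilt (shear), and brighten, as described above; values are illustrative.
datagen = ImageDataGenerator(
    horizontal_flip=True,
    rotation_range=15,           # random rotation in degrees
    shear_range=0.1,             # "tilting"
    brightness_range=(0.8, 1.2)  # random brightening/darkening
)

# Four dummy RGB images already resized to 128 x 128
images = np.random.rand(4, 128, 128, 3).astype("float32")
batch = next(datagen.flow(images, batch_size=4, shuffle=False))
print(batch.shape)   # the original batch is replaced by randomly modified copies
```

Because the generator yields a fresh randomly transformed batch on every call, the effective size of the 253-image dataset grows during training without storing new files.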

4.1.1 Training Dataset

For training the CNN, the Kaggle dataset is used, which contains 98 brain MRIs
without tumor and 155 with tumor. To improve the performance of the model,
"ImageDataGenerator" is used, which continuously generates new data from the dataset
provided for training. Grayscale images are produced by converting multi-channel
images into single-channel images [13]. The model is trained using 80% of the images
from this dataset. All images are pre-processed before being fed to the CNN: after
geometric and color augmentation, the images are scaled, tilted, and rotated before
being resized to 128 × 128 pixels. The CNN can then generalize across a variety of
imaging conditions.

4.1.2 Testing Datasets

The trained model is now tested with the remaining 20% of the data. A snapshot of
the data set is shown in Fig. 2.

Fig. 2 Augmented images



Fig. 3 Brain MRI images from the dataset

4.2 CNN-Based Algorithm Classification

The CNN architecture designed in this study comprises three convolution layers. The
convolution layer, the network's fundamental building block, convolves the image
with a convolution filter to produce a feature map. The three layers extract feature
maps and build up richer information for categorization (Fig. 3).

Algorithm:
1. Add a convolution filter to the first layer.
2. Smooth the convolution filter to reduce its sensitivity.
3. An activation layer is responsible for signal transmission between layers.
4. A Rectified Linear Unit (ReLU) is used to reduce training time.
5. The neurons in the following layers are fully connected to each other.
6. A loss layer is added to the neural network to provide feedback at the end of
training (Fig. 4).
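Assuming a Keras implementation (the paper states the simulation is in Python but does not give exact layer sizes, so the filter counts and dense width below are illustrative), the six steps could be sketched as:

```python
from tensorflow.keras import layers, models

# Illustrative layer sizes; the paper specifies three convolution layers,
# ReLU activations, optional pooling, a fully connected layer, and a loss.
model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),            # preprocessed grayscale MRI
    layers.Conv2D(32, (3, 3), activation="relu"), # steps 1-4: convolution + ReLU
    layers.MaxPooling2D((2, 2)),                  # optional pooling (downsampling)
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),          # step 5: fully connected
    layers.Dense(1, activation="sigmoid"),        # probability score in [0, 1]
])
# Step 6: the loss providing feedback during training
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
print(model.output_shape)   # (None, 1)
```

The single sigmoid output realizes the probability score from which the tumor / non-tumor label is derived in the final stage.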

5 Discussions

Our dataset comprises the tumor and non-tumor MRI scans from Kaggle [6] and contains
real patient cases. In this study, a convolutional neural network is used to enable
effective automatic brain tumor detection, with the simulation carried out in
Python. The accuracy is calculated and compared with all other existing approaches;
the efficiency of the proposed brain tumor classification system is determined by
calculating the training accuracy, validation accuracy, and validation loss. The
current technique for detecting brain lesions is SVM-based categorization, which
accepts the feature-extraction output, generates the classification output, and
determines accuracy from the feature values.

Fig. 4 CNN-based algorithm classification for brain tumor images

In SVM-based tumor and non-tumor identification, the computation rate is slow and
the accuracy is low. The proposed CNN-based classification requires no separate
feature-extraction step, as the feature values are computed by the CNN itself. The
classification of tumor and non-tumor brain images is shown in Fig. 3. As a result,
the complexity and computation time are reduced while the accuracy remains high. The
accuracy of brain tumor classification is shown in Fig. 5. Finally, the segmentation
labels each scan as a tumor brain or a non-tumor brain, based on the probability
score value.
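The final decision described above reduces to thresholding the probability score at 0.5 and comparing predictions with labels. A minimal sketch with purely illustrative scores (not real model outputs):

```python
import numpy as np

# Illustrative probability scores from the final sigmoid layer (not real outputs)
scores = np.array([0.91, 0.08, 0.64, 0.30, 0.97])
labels = np.array([1, 0, 1, 0, 1])          # 1 = tumor, 0 = non-tumor

pred = (scores >= 0.5).astype(int)          # probability score -> class label
accuracy = (pred == labels).mean()
print(pred.tolist(), accuracy)              # [1, 0, 1, 0, 1] 1.0
```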

Fig. 5 Comparison of different brain tumor classifications



6 Conclusion

The fundamental goal of this research is to create an accurate, fast, and
easy-to-use automatic brain tumor classification system. Tumor classification has
traditionally relied on Fuzzy C-Means (FCM)-based segmentation, texture and shape
feature extraction, and SVM- and DNN-based classification. With these methods the
level of complexity is high, tumor processing takes long, and the accuracy is poor.
The suggested system uses CNN-based classification to improve accuracy and reduce
time complexity. The resulting outputs are labeled as either tumor or normal brain
images. The training accuracy is 97.5%, and the validation loss and validation
accuracy are also determined; validation accuracy is likewise good, with negligible
validation loss. Convolutional neural networks are a developing field that will
likely aid radiologists in providing more accurate patient care. This paper provides
a fundamental review of automated segmentation, allowing the reader to be
well-informed about the field. By further developing segmentation techniques for
brain tumors, this approach could be applied to other fields of radiology.

References

1. Williams S, Layard Horsfall H, Funnell JP, Hanrahan JG, Khan DZ, Muirhead W, Stoyanov
D, Marcus HJ (2021) Artificial intelligence in brain tumor surgery-an emerging paradigm.
Cancers 13(19):5010. https://doi.org/10.3390/cancers13195010
2. Zhang J et al. (2018) Brain tumor segmentation based on refined fully convolutional neural
networks with a hierarchical dice loss. In: Cornell university library, computer vision, and
pattern recognition
3. Ranjbar Zadeh R, Bagherian Kasgari A, Jafarzadeh Ghoushchi S et al (2021) Brain tumor
segmentation based on deep learning and an attention mechanism using MRI multi-modalities
brain images. Sci Rep 11:10930. https://doi.org/10.1038/s41598-021-90428-8
4. Brown AD, Marotta TR (2018) Using machine learning for sequence-level automated MRI
protocol selection in neuroradiology. J Am Med Inform Assoc 25:568–571. https://doi.
org/10.1093/jamia/ocx125
5. Brown AD, Marotta TR (2017) A natural language processing-based model to
automate MRI brain protocol selection and prioritization. Acad Radiol 24:160–166.
https://doi.org/10.1016/j.acra.2016.09.013
6. Abdalslam L (2019) Brain tumor detection CNN, retrieved 10 November 2021 from. https://
www.kaggle.com/loaiabdalslam/brain-tumor-detection-cnn/data
7. Pedapati P, Tanneedi RV (2017) Masters thesis electrical engineering, December 2017. http://
www.diva-portal.org/smash/get/diva2:1184069/FULLTEXT02.pdf
8. Dhiraj K (2019) Top 4 advantages and disadvantages of Support Vector Machine or
SVM, Medium. https://dhirajkumarblog.medium.com/top-4-advantages-and-disadvantages-
of-support-vector-machine-or-svm-a3c06a2b107
9. Mohsen H, El-Dahshan ESA, El-Horbaty ESM, Salem ABM (2018) Classification using deep
learning neural networks for brain tumors. Future Comput Info J 3(1):68–71, ISSN 2314–7288.
https://doi.org/10.1016/j.fcij.2017.12.001
10. Alam MS, Rahman MM, Hossain MA, Islam MK, Ahmed KM, Ahmed KT, Singh BC, Miah
MS (2019) Automatic human brain tumor detection in MRI image using template-based K
means and improved fuzzy C means clustering algorithm. Big Data Cogn Comput 3(2):27.
https://doi.org/10.3390/bdcc3020027

11. Seetha J, Raja SS (2018) Brain tumor classification using convolutional neural networks.
Biomed Pharmacol J 11(3). https://doi.org/10.13005/bpj/1511
12. Naseer A, Yasir T, Azhar A, Shakeel T, Zafar K (2021) Hindawi Int J Biomed Imaging 2021,
Article ID 5513500. https://doi.org/10.1155/2021/5513500
13. Reddy CKK, Anisha PR, Apoorva K (2021) Early prediction of pneumonia using convolutional
neural network and X-Ray images. In: Smart innovation, systems and technologies
Deep Learning and Blockchain
for Electronic Health Record
in Healthcare System

Ch. Sravanthi and Smitha Chowdary

Abstract Emerging technologies such as artificial intelligence and blockchain have
a wide range of applications in healthcare systems. Deep neural networks, the deep
learning branch of artificial intelligence that works in a manner similar to the
human brain, coupled with blockchain technology, provide effective tracking and
personalized collection of data in the medical field. Integrating these two
technologies allows data security and transparency in the medical care system with
high accuracy. A review of several research papers using deep learning and
blockchain technology illustrates the security and efficiency advances for
prediction and decision-making in biomedical applications. Blockchain technology
stores the cryptographically protected data that artificial intelligence requires,
and real-life data increases the accuracy of regression or classification models in
deep neural networks. Blockchain technology also ensures the safety of data
exchange and analysis among data suppliers. This comparative study of deep learning
and blockchain technology in the medical field aims to give brief information and a
process flow for their integration in the electronic healthcare sector. Integrating
artificial intelligence and blockchain technology provides timely, more accurate
results while keeping the exchanged data safe.

Keywords Deep learning · Blockchain · Prediction · Artificial intelligence ·
Machine learning

1 Introduction

Medical services are entering a new age in which abundant biomedical information
plays an increasingly critical part. The tremendous

Ch. Sravanthi (B)


G. Narayanamma Institute of Technology & Science, Hyderabad, Telangana, India
e-mail: sravanthi.cvr@gmail.com
S. Chowdary
Koneru Lakshmaiah Educational Foundation, Vijayawada, Andhra Pradesh, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 429
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_42
430 Ch. Sravanthi and S. Chowdary

availability of biomedical data offers great opportunities and challenges for
healthcare research. In particular, discovering the relations among all the
different pieces of information in these datasets is a central problem in developing
trustworthy medical tools based on data-driven strategies and AI. Given its
established performance in various fields and the rapid pace of methodological
improvements, the deep learning paradigm presents exciting new opportunities for
biomedical informatics. Efforts to apply deep learning approaches to medical care
are already planned or under way. For example, Google DeepMind has announced
projects to apply its expertise to medical services, and Enlitic is using deep
learning to recognize medical problems on X-rays and Computed Tomography (CT)
scans. Nonetheless, deep learning methods have not been broadly surveyed across the
complete scope of clinical problems that could benefit from their capabilities. Deep
learning has several features that could be useful in medical services, such as its
superior performance, end-to-end learning with integrated feature learning, and its
ability to handle complex and multi-modality data. To advance these efforts, the
deep learning research field as a whole must address various challenges relating to
the characteristics of medical care data [1].
Blockchain, with its inherent features of decentralization, transparency, and
anonymization, was introduced with the cryptocurrency Bitcoin in 2008. This
prompted the idea that blockchain technology could be valuable in a variety of
other data-driven areas, including medical care. According to IBM, 70% of
healthcare leaders expect that the most important impacts of blockchain within the
health sector will be improved clinical trial management and regulatory compliance,
and a decentralized framework for distributing electronic health records (EHR).
Moreover, the worldwide blockchain technology market in the healthcare industry is
likely to cross $500 million by 2022 [2].
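The tamper-evidence that makes blockchain attractive for health records rests on hash-chaining blocks: each block stores the hash of its predecessor, so any edit invalidates everything after it. A minimal, illustrative Python sketch (the patient records are hypothetical and this is not a production EHR design):

```python
import hashlib
import json

def block_hash(body):
    """Deterministically hash a block's contents, including the previous hash."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def add_block(chain, record):
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = {"index": len(chain), "record": record, "prev_hash": prev}
    chain.append({**body, "hash": block_hash(body)})

def valid(chain):
    for i, b in enumerate(chain):
        body = {k: b[k] for k in ("index", "record", "prev_hash")}
        if b["hash"] != block_hash(body):
            return False                      # block contents were altered
        if i > 0 and b["prev_hash"] != chain[i - 1]["hash"]:
            return False                      # link to predecessor is broken
    return True

chain = []
add_block(chain, {"patient": "P001", "note": "MRI scheduled"})  # hypothetical records
add_block(chain, {"patient": "P001", "note": "MRI completed"})
print(valid(chain))                  # True
chain[0]["record"]["note"] = "tampered"
print(valid(chain))                  # False: any edit breaks the hash chain
```

Real healthcare blockchains add consensus, access control, and encryption on top of this chaining, but the integrity guarantee discussed in the sections below derives from this same mechanism.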

2 Related Work

In this article, we discuss the most recent applications of deep learning and
blockchain in clinical science, emphasizing the distinguishing features that can
considerably influence health care. Based on several published peer-reviewed
studies, we focus especially on electronic health records and the safer mechanisms
for storing them offered by deep learning and blockchain.

2.1 Deep Learning for Healthcare

The major models applied to the clinical care domain have generally been based on
convolutional neural networks (CNNs) [3].

Deep Learning and Blockchain for Electronic Health Record… 431

K-Nearest Neighbor (K-NN) has been used for classification and prediction, but its
prediction time is long, so improved estimates have been obtained with Long
Short-Term Memory (LSTM) networks combined with Recurrent Neural Networks (RNNs)
[4], as well as Restricted Boltzmann Machines (RBMs) [5] and Autoencoders (AEs) [6].

2.2 Drug Discovery

Patel et al. [7] describe how one of the significant AI techniques used for drug
discovery is proposing sensible molecular candidates from training inputs by means
of QSAR. A neural-network-based retrosynthesis algorithm has also been developed,
achieving 90% accuracy. Their paper surveys the major roles AI plays in drug
discovery within pharma intelligence.
Keshavarzi Arshadi et al. [8] reviewed the advancement of COVID-19 drug and vaccine
discovery through artificial intelligence techniques. If researchers have a
sufficient amount of data, a model can be trained on it to predict the required
vaccine candidates. They therefore gathered a dataset of compounds, peptides, and
epitopes, found either in silico or in vitro, from CoronaDB-AI. The results
evaluated from the trained dataset identified effective viral therapies.

2.3 Alzheimer’s Disease Prediction

Jo et al. [9] show that, on the basis of neuroimaging techniques and data, the use
of deep learning in detecting Alzheimer's disease is growing rapidly in healthcare.
The accuracy of Alzheimer's disease prediction is improved by combining machine
learning with stacked autoencoders (SAE). Alzheimer's disease research is still
developing, and performance is typically improved by fusing hybrid data types,
omics data, and so on. Fisher et al. [10] explained how unsupervised machine
learning can predict Alzheimer's disease for dozens of patients simultaneously,
using a technique called the Conditional Restricted Boltzmann Machine (CRBM). The
dataset was collected from 44 clinical sites, with 18-month trajectories from 1909
patients. This unsupervised technique accurately predicts changes in ADAS-Cog
scores.

2.4 Clinical Imaging

Lundervold and Lundervold [11] elaborated on machine learning algorithms in
clinical picture handling and image analysis, covering MRI analysis from
segmentation to disease prediction. Esteva et al. [12] illustrated deep learning in
medical computer vision and how it benefits applications such as cardiology,
pathology, dermatology, and ophthalmology; their work also covers the challenges
and hurdles of implementing these technologies in the real world. Gulshan et al.
[13] used CNNs to detect diabetic retinopathy in retinal fundus photographs,
achieving high sensitivity and specificity over roughly 10,000 test images with
respect to certified ophthalmologist interpretations. CNNs have also achieved
performance comparable to 21 board-certified dermatologists in classifying
biopsy-proven clinical images of various types of skin disease over a large dataset
of 130,000 images.

2.5 Blockchain for Healthcare

The need for patient-driven facilities and for connecting disparate systems has
motivated the adoption of blockchain. Blockchain gives patients full authority over
their health records. Patient data are highly sensitive and must be stored and
shared in a protected and private way; they are thus a significant target for
malicious attacks such as Denial of Service (DoS), mining attacks, storage attacks,
and dropping attacks. Blockchain provides a secure and reliable platform for
medical care against failures and attacks, since it contains diverse mechanisms of
access control [14].
The use of blockchain technology in medical services does not focus on patient
confidentiality and security alone; it is also applied to additional significant
subjects such as interoperability. Applying protected methods to share clinical
data is challenging because of the heterogeneous data structures among different
parties, which leads to incompatibility. Data interpretation can be fragmented
because of divergent usage of healthcare terminology, so it is necessary to agree
on both the structure and the semantics of data in order to share therapeutic data.
One application example is Guardtime, a Netherlands-based information security
firm, which partnered with the government of Estonia to create a blockchain-based
system to confirm patient identities. A second EHR-related implementation is
MedRec, a venture started between the MIT Media Lab and Beth Israel Deaconess
Medical Center. This platform offers a decentralized method for handling
permissions, authorization, and data sharing among medical care systems [15].

2.6 Block Chain Based Electronic Health Record

Reegu et al. [16] proposed an EHR for handling secure data using blockchain
techniques. They also examined how the pandemic could be managed more efficiently,
for example by monitoring the supply chain for vaccines, aggregating data,
forecasting further spread of infection in the population, and issuing COVID
certificates for individuals; blockchain technology thus provided accurate and
secure data storage during the pandemic period. Dubovitskaya et al. [17]
illustrated how clinical care services have grown, with specialization spread
across multiple hospitals for the prediction and diagnosis of chronic diseases
such as cancer. Collaborating with Stony Brook University Hospital, they developed
ACTION-EHR to support radiation treatment for cancer. The system is built on the
Hyperledger Fabric blockchain framework. By adopting blockchain technology,
therapy for cancer can proceed efficiently and without delay.
Fatokun et al. [18] aimed to demonstrate how the blockchain concept can solve
privacy, security, and data-exchange issues, and implemented an Ethereum
consortium blockchain. Some drawbacks remain to be addressed in future work: the
need for a scalable blockchain system, the extra overhead on bandwidth resources,
and the suggestion to include machine learning to detect intrusions.
Wang and Song [19] proposed a secure cloud-based EHR framework built on blockchain
and an attribute-based cryptosystem. To encrypt clinical data, they used a
combination of identity-based encryption and identity-based signatures at the same
time to implement digital signing. On top of the blockchain, different strategies
are used to protect the integrity and traceability of clinical data. While the
three aforementioned studies centered on cryptographic features to secure EHR
blocks, Roehrs et al. [20] addressed the difficulties of merging scattered health
records and of access control for healthcare providers and patients. These two
matters were settled by proposing OmniPHR, a distributed model for integrating
personal health records (PHR) that uses a replicated database to store PHR in
blocks and combines structural and semantic interoperability with a unified view
of the various PHR setups. Finally, in a completely different strategy, Hussein et
al. [21] built a framework for protecting medical records with blockchain
technology based on genetic algorithms and discrete wavelet transforms. The
proposed technique uses a redesigned cryptographic hash generator to produce the
required user security key. In addition, MD5 (a message-digest algorithm using a
hash function that yields a 128-bit hash value) strings were used to make another
key setup by applying a discrete wavelet transform. This strategy further improves
overall system security and resilience to various attacks.

2.7 Healthcare IoT and Medical Devices

Griggs et al. [22] proposed extending WBANs with blockchain smart contracts for a
secure real-time patient monitoring and clinical intervention system. The study
proposes integrating blockchain to execute smart contracts that evaluate data
gathered by a patient's IoT healthcare devices against tailored threshold rules.
This is done to overcome the problem of logging data transactions in an IoT
healthcare structure. Rahman et al. [23] introduced a smart dyslexia analysis
solution in which a decentralized big data repository was used to store data and
then share it with healthcare organizations and individuals using blockchain.
Mobile hypermedia health data was captured during dyslexia examinations and kept
in a decentralized big data repository, which could be shared for further clinical
examination and statistical assessment. Ichikawa et al. [24] built a
tamper-resistant mobile health framework using blockchain tools to ensure the
reliability of records. The objective of their study was to develop a mobile
health system, built around a smartphone application, for cognitive behavioral
therapy for insomnia.

2.8 Secure Blockchain Technology and Deep Learning Disease Prediction

Deep learning enhances data analysis and decision making, and efficient data
sharing and reliability improve its accuracy. Decentralizing data using blockchain
technology puts the emphasis on data sharing: the shared data should be secure and
legitimate, and the blockchain concept enables exactly this. The combination of
the two techniques, deep learning and blockchain, yields high accuracy together
with security and dependability of the shared data, which supports healthcare
intelligence.
Tagde et al. [25] show that integrating blockchain and deep learning concepts
makes a significant difference in healthcare, generalizing the analytical
technology into an integrated risk-management approach: healthcare uses the data
in blockchain medical records, and deep learning techniques analyze it with the
proposed algorithms to settle issues and track them down. The work in [26]
surveyed the latest advancements of blockchain and AI approaches in healthcare
monitoring systems, focusing on a sustainable framework integrating these two
technologies, the characteristics of healthcare supply chains, the impact of these
techniques on humans, and emerging technologies such as big data, IoT, and AI.
Bhattacharya et al. [27] demonstrated Healthcare 4.0 with decentralization and
provided the necessary inputs for user data privacy based on analysis of previous
electronic health records. The proposed architecture, BinDaaS (Blockchain-based
Deep-learning as-a-Service in Healthcare 4.0 applications), incorporates methods
for accurate prediction. The research contributes a lattice-key-based scheme to
resist quantum attacks, along with validation of the security design and
prediction model against existing state-of-the-art frameworks. The model comprises
an enormous number of parameters based on Gaussian distributions.
Kumar et al. [28] address the new problem faced worldwide with the increase of
COVID-19 cases: how best to collect information and maintain it securely.
Diagnosing COVID-19 patients was hampered by the shortage and limited reliability
of testing kits, and it was a tough time for everyone to deal with the growing
number of positive cases and predictions. Another issue confronted was sharing
information between hospitals internationally in view of privacy concerns. They
therefore proposed an integrated deep learning and blockchain approach: small
amounts of data were gathered from various hospitals, and data-normalization
techniques dealt with the heterogeneity of data coming from different kinds of CT
scanners. The results improved the recognition of CT images for detecting
COVID-19 patients.

3 Conclusion

Early applications of deep learning to biomedical data revealed powerful
possibilities to model, interpret, and learn from such diverse and heterogeneous
sources. Deep learning can open the way toward the next generation of predictive
healthcare solutions, which can scale to incorporate billions of patient records
and rely on a single comprehensive patient representation to effectively support
clinicians in their daily activities.
On the other hand, blockchain shows remarkable potential in the medical care area,
since it settles issues connected to clinical records while providing privacy,
security, interoperability, verification, and assurance. Blockchain offers answers
to real present questions such as unreported clinical trials, healthcare
information gaps, and questionable data issues.
This paper aimed to assess the potential of deep learning and blockchain in the
medical services field, reviewing their applications in the medical care industry,
particularly for prediction. There are still various open challenges that need
additional study. We also propose more exploration of groundbreaking solutions
that promote deep learning and blockchain as a service for the IoMT model.

References

1. Miotto R, Wang F, Wang S, Jiang X, Dudley JT (2018) Deep learning for healthcare: review,
opportunities and challenges. Brief Bioinform 19(6):1236–1246
2. Hasselgren A, Kralevska K, Gligoroski D, Pedersen SA, Faxvaag A (2020) Blockchain in
healthcare and health sciences—a scoping review. Int J Med Info 134:104040
3. Ismail WN, Hassan MM, Alsalamah HA, Fortino G (2020) CNN-based health model for regular
health factors analysis in internet-of-medical things environment. IEEE Access 8:52541–52549
4. Aldahiri A, Alrashed B, Hussain W (2021) Trends in using IoT with machine learning in health
prediction system. Forecasting 3(1):181–206
5. Cifuentes J, Yao Y, Yan M, Zheng B (2020) Blood transfusion prediction using restricted
Boltzmann machines. Comput Methods Biomech Biomed Engin 23(9):510–517
6. Baucum M, Khojandi A, Vasudevan R (2021) Improving deep reinforcement learning with
transitional variational autoencoders: a healthcare application. IEEE J Biomed Health Inform
25(6):2273–2280
7. Patel L, Shukla T, Huang X, Ussery DW, Wang S (2020) Machine learning methods in drug
discovery. Molecules 25(22):5277
8. Keshavarzi Arshadi A, Webb J, Salem M, Cruz E, Calad-Thomson S, Ghadirian N, Collins J, Diez-Cecilia E, Kelly B, Goodarzi H, Yuan JS (2020) Artificial intelligence for COVID-19 drug discovery and vaccine development. Front Artif Intell 3:65
9. Jo T, Nho K, Saykin AJ (2019) Deep learning in Alzheimer’s disease: diagnostic classification
and prognostic prediction using neuroimaging data. Front Aging Neurosci 11:220
10. Fisher CK, Smith AM, Walsh JR (2019) Machine learning for comprehensive forecasting of
Alzheimer’s disease progression. Sci Rep 9(1):1–14
11. Lundervold SA, Lundervold A (2018) An overview of deep learning in medical imaging
focusing on MRI. Zeitschrift für Medizinische Physik S0939388918301181
12. Esteva A, Chou K, Yeung S, Naik N, Madani A, Mottaghi A, Socher R (2021) Deep learning-
enabled medical computer vision. NPJ Digit Med 4(1):1–9
13. Gulshan V, Peng L, Coram M et al (2016) Development and validation of a deep learning
algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316:2402–
2410
14. Zubaydi HD, Chong YW, Ko K, Hanshi SM, Karuppayah S (2019) A review on the role of
blockchain technology in the healthcare domain. Electronics 8(6):679
15. Angraal S, Krumholz HM, Schulz WL (2017) Blockchain technology: applications in health
care. Circ Cardiovasc Qual Outcomes 10(9):e003800
16. Reegu FA, Daud SM, Alam S, Shuaib M (2021) Blockchain-based electronic health record
system for efficient Covid-19 pandemic management
17. Dubovitskaya A, Baig F, Xu Z, Shukla R, Zambani PS, Swaminathan A, Jahangir MM
et al. (2020) ACTION-EHR: Patient-centric blockchain-based electronic health record data
management for cancer care. J Med Internet Res 22(8):e13598
18. Fatokun T, Nag A, Sharma S (2021) Towards a blockchain assisted patient owned system for electronic health records. Electronics 10:580
19. Wang H, Song Y (2018) Secure cloud-based EHR system using attribute-based cryptosystem
and blockchain. J Med Syst 42(8):1–9
20. Roehrs A, da Costa CA, da Rosa Righi R, da Silva VF, Goldim JR, Schmidt DC (2019) Analyzing the performance of blockchain-based personal health record implementation. J Biomed Inform 92:103140
21. Hussein AF, ArunKumar N, Ramirez-Gonzalez G, Abdulhay E, Tavares JMR, de Albuquerque
VHC (2018) A medical records managing and securing blockchain based system supported by
a genetic algorithm and discrete wavelet transform. Cogn Syst Res 52:1–11
22. Griggs KN, Ossipova O, Kohlios CP, Baccarini AN, Howson EA, Hayajneh T (2018) Healthcare
blockchain system using smart contracts for secure automated remote patient monitoring. J Med
Syst 42(7):1–7
23. Rahman MA, Hassanain E, Rashid MM, Barnes SJ, Hossain MS (2018) Spatial blockchain-
based secure mass screening framework for children with dyslexia. IEEE Access 6:61876–
61885
24. Ichikawa D, Kashiyama M, Ueno T (2017) Tamper-resistant mobile health using blockchain
technology. JMIR Mhealth Uhealth 5(7):e7938
25. Tagde P, Tagde S, Bhattacharya T, Tagde P, Chopra H, Akter R, Rahman M (2021) Blockchain and artificial intelligence technology in e-health. Environ Sci Pollut Res 1–22
26. Gao Y, Wang J, Yuan Y, Jiang YZ, Yue X (2021) Machine learning and blockchain technology for smart healthcare and human health. J Healthc Eng
27. Bhattacharya P, Tanwar S, Bodke U, Tyagi S, Kumar N (2019) BinDaaS: blockchain-based deep-learning as-a-service in healthcare 4.0 applications. IEEE Trans Netw Sci Eng 1–1
28. Kumar R, Khan AA, Kumar J, Zakria A, Golilarz NA, Zhang S, Wang W (2021) Blockchain-federated-learning and deep learning models for COVID-19 detection using CT imaging. IEEE Sens J
Artificial Neural Networks
in Improvement of Spatial Resolution
of Thermal Infrared Data

Mallam Gurudeep, Gaddam Samatha, Sandeep Ravikanti,


and Gopal Rao Kulkarni

Abstract In the visible (VIS) and near-infrared (NIR) range of the electromagnetic (EM) spectrum, the available EM energy is very high, but in the thermal infrared (TIR) range the EM energy is very low. This results in poor or coarse spatial resolution and a lower amount of detail in TIR images. An Artificial Neural Network (ANN) approach is adopted to improve the coarse spatial resolution (120 m) of Landsat Thematic Mapper (TM) TIR data by utilizing the advantages of the fine-resolution (30 m) VIS and NIR data. The model takes the 3 VIS and NIR band data, together with the raw TIR data, as input. This results in a substantial improvement of spatial resolution at the output of the model.

Keywords Artificial Neural Networks · Multi spectral · Thermal infrared · Spatial


resolution · TIR

1 Introduction

The Sun is the basic and major source of electromagnetic (EM) energy for remote sensing, with the Earth as the secondary source. The solar energy available in the visible (VIS) and near-infrared (NIR) range (0.4–0.7 µm) of the EM spectrum is very high (about 10^8 W/µm/m²), and in the TIR range (8–15 µm) the EM energy is very low (about 10 W/µm/m²) [1]. The high energy in the VIS–NIR range results in fine or high spatial resolution of the acquired images [1], and the low energy in the TIR range results in

M. Gurudeep
Department of ECE, MCET Hyderabad, Hyderabad, India
G. Samatha
Department of ECE, JBIET Hyderabad, Hyderabad, India
S. Ravikanti (B)
Methodist College of Engineering & Technology, CSE, Hyderabad, India
e-mail: rsd.sandeep@gmail.com
G. R. Kulkarni
IIT BOMBAY, Mumbai, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 437
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_43

coarse resolution. TIR data, in general, is 3–4 times coarser than the data in the VIS–NIR range. Spatial resolution in digital imagery refers to the area on the ground (the ground resolution element, GRE) covered by one picture element (a pixel), i.e., one data value in a digital image. The amount of detail conveyed by satellite imagery mainly depends on the spatial resolution of the sensor, apart from the spectral resolution. Finer spatial resolution results in a higher amount of detail. Artificial Neural Networks (ANNs) [2], owing to their desirable properties, are used in many applications [2], such as classification of Multi-Spectral (MS) data [3] and urban land use classification [4]. TIR data add thermal responses [4, 5] in many Earth resources applications.

1.1 Study Area and Data
The study area falls in the North-East part of Bombay (Plate 1). Thane creek is the most dominant feature in the area, flowing from North to South. The study area covers three prominent lakes, viz. Tulsi, Vihar and Powai, from North to South on the left side of the creek. The urban area spreading from Thane to Kurla falls on the left (West) of the creek, and New Bombay falls on the right (East) of the creek. There are mangroves on either side of the creek [6], and forests cover the hilly areas around Tulsi and Vihar lakes. The Eastern Express Highway runs on the left side of Thane creek. The Central Railway and Lal Bahadur Shastri Marg, running in the North–South direction, are also prominently seen in the area. Landsat 5 Thematic Mapper (TM) VIS–NIR data of 30 m spatial resolution (bands 2, 3, 4) and TIR data of 120 m resolution (band 6), acquired on 20th Dec 1989, were available for the study area and used in the present studies [6]. Plate 2 shows the Landsat 5 TM band 3 (red, 0.6–0.7 µm) image (left) at 30 m resolution, and Plate 3 shows the raw TIR band 6 (10.4–12.5 µm) image (right) at 120 m resolution. Plate 1 is an FCC image of bands 2, 3, and 4 of the Landsat 5 TM (Fig. 3).

2 Artificial Neural Networks

Artificial Neural Networks (ANNs), with their desirable properties [2], are found to be far superior in many applications. ANNs are highly interconnected systems of information processing cells/nodes, arranged into a few layers with several nodes in each layer. All the nodes in each layer are connected to all the nodes of the next layer. Generally, a 3-layer network is used, with 1 input layer (I), 1 output layer (O), and 1 or more hidden layers (H) in between. The number of input nodes (NI) is known and equal to the number of input variables considered (e.g., the number of bands), and the number of output nodes (NO) is likewise known and equal to the number of output variables (e.g., land use classes). The number of nodes in the hidden layer (NH) is chosen empirically [7].
A good number of samples is extracted from selected areas of the input image data representing the land use classes and scaled to 0.0–1.0 for the sigmoid activation function. These data are used to train the chosen network to derive the requisite weights (W_ij and W_jk) for the connections between the NI-NH and NH-NO layers [8]. For training the network, the TIR data (120 m) is enlarged 4 times to match the resolution (30 m) of the VIS–NIR data (Fig. 4).
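The scaling and enlargement steps above can be sketched as follows. This is a minimal numpy sketch (not the authors' "C" programmes), and the use of nearest-neighbour pixel replication for the 4x enlargement is an illustrative assumption:

```python
import numpy as np

def scale01(band):
    """Scale a band of digital numbers (DN) to 0.0-1.0 for the sigmoid activation."""
    band = band.astype(float)
    lo, hi = band.min(), band.max()
    return (band - lo) / (hi - lo)

def enlarge4x(tir):
    """Enlarge a 120 m TIR band 4x by pixel replication so it aligns
    with the 30 m VIS-NIR grid (nearest-neighbour resampling)."""
    return np.repeat(np.repeat(tir, 4, axis=0), 4, axis=1)
```

A 112 × 112 tile of 120 m TIR pixels then becomes a 448 × 448 array aligned with the 30 m VIS–NIR bands.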

The inputs B_i from all input nodes i are multiplied by their respective weights W_ij and summed up at each hidden node: H_j = Σ_i (B_i × W_ij). A non-linear activation function such as the sigmoid is used to transform these sums; the hidden-node outputs are then multiplied by their respective weights W_jk and summed up at the output node to form the output. The output of the network is compared with the desired output and, if there is any discrepancy, the weights of the network are modified. The back-propagation algorithm [6] is used, with several iterations (forward and backward) carried out to improve and derive the final weights for use in the resolution improvement process [9].
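The forward computation just described can be written out as a small sketch. This is illustrative numpy code: following the text, the sigmoid is applied at the hidden nodes and the output node forms a plain weighted sum, and all weight values are placeholders:

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation used at the hidden nodes."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(b, w_ij, w_jk):
    """One forward pass of the 3-layer (I, H, O) network:
    H_j = sigmoid(sum_i B_i * W_ij); output = sum_j H_j * W_jk."""
    h = sigmoid(b @ w_ij)   # hidden-node outputs
    return h @ w_jk         # weighted sum at the output node(s)
```

During training, back-propagation would adjust w_ij and w_jk after comparing this output with the desired output.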

3 TIR Information Spatial Resolution Enhancement

The primary goal of the current research is to enhance the spatial resolution of TIR data from 120 m to 30 m. Different methods [6] have been employed in the past to improve spatial resolution, but ANNs have proven to be significantly superior. A 3-layer (I, H, O) ANN is employed in the current investigations, and programmes developed [10] in "C" in previous studies are used in the process. Two cases are considered in the process of improving the effective spatial resolution of the TIR data. In case 1, only the 3-band (VIS–NIR) data is used as input. In case 2, raw TIR data is used along with the 3 VIS–NIR bands as input. Only a single forward pass is used in the improvement of resolution. The output is the improved-resolution [11] (30 m) TIR data.

3.1 Improvement of Spatial Resolution: Case 1

In the first case, only the data of bands 2, 3 and 4 (VIS and NIR) are used as input. The network used for this case is shown in Fig. 2. The output of the network is the improved-resolution (30 m) TIR data.

3.2 Improvement of Spatial Resolution Case 2

In the second case, the data of bands 2, 3, 4 and the raw daytime band 6 (TIR) data are used as input. The network is shown in Fig. 5. The output of the network is the improved-resolution (30 m) TIR data.

Fig. 1 Plate 1 False colour


composite (FCC) using 30 m
resolution Landsat 5 TM
bands 4 (red), 3 (green) and
2 (blue)

3.3 Improvement of Resolution

In the improvement stage, the pixel values of bands 2, 3 and 4 at 30 m resolution (448 lines × 448 pixels) and the 4-times-enlarged, 120 m resolution band 6 data (448 lines × 448 pixels) are given as input to the network. The procedure adopted for the improvement of resolution in Case 2 is exactly the same as in Case 1 (Figs. 5 and 6), except that there are now 4 bands (2, 3, 4 and 6) as input.
In a forward pass, (i) the normalized input data (4 bands) is multiplied by the first set of final adjusted weights (W_ij), (ii) summed up at the hidden nodes, (iii) transformed using the sigmoid function to find the output of the hidden nodes, (iv) multiplied by the second set of weights (W_jk), and (v) summed up at the output nodes to derive the final output. This output is scaled back to the original digital number (DN) values to derive the improved-resolution (30 m) TIR data [12]. The procedure is repeated for all the pixels [13] of all the lines to derive the complete improved TIR image. There is no backward pass and no iterations through the network.
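The five steps above amount to one vectorized forward pass per pixel. The sketch below is illustrative: the weights are placeholders, and applying a sigmoid at the output before rescaling to DN values is an assumption made here to keep the network output in the 0–1 range:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def improve_tir(bands, w_ij, w_jk, dn_min, dn_max):
    """Single forward pass over every pixel (no back-propagation here).
    bands: (lines, pixels, 4) array of normalised inputs (bands 2, 3, 4
    and the 4x-enlarged raw TIR band).  The 0-1 network output is scaled
    back to digital numbers to give the improved 30 m TIR image."""
    hidden = sigmoid(bands @ w_ij)            # steps (i)-(iii)
    out = sigmoid(hidden @ w_jk)[..., 0]      # steps (iv)-(v), kept in 0-1
    return dn_min + out * (dn_max - dn_min)   # scale back to DN values
```

For a 448 × 448 scene the same call processes all lines and pixels at once, matching the "repeated for all the pixels of all the lines" loop in the text.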
The improved-resolution image [14] obtained by the ANN using 3 inputs (VIS–NIR bands 2, 3, and 4 only), as in Case 1 (Fig. 1), is shown in Plate 4. The improved-resolution image obtained by the ANN using 4 inputs (VIS–NIR bands 2, 3, and 4 and the raw TIR [15] band), as in Case 2 (Fig. 5), is shown in Plate 5. In [16–22], the authors implemented various techniques in the machine learning and networks domains for security and disease predictions (Figs. 7 and 8).

4 Results and Observations

The higher resolution of the VIS–NIR data is taken advantage of in the process. Two cases are considered: (i) using only the 3 bands of VIS–NIR data as input, and (ii) using the 3 bands of

Fig. 2 Plate 2 Landsat 5 TM


band 3 image (red 0.6–0.7 m)
at 30 m resolution

Fig. 3 Plate 3 Raw TIR


band 6 image (10.4–12.5 m)
at 120 m resolution

Wij Wjk
Input B1
Input B2 output B6
Input B3
Input B4

Input Layer (i) hidden layer(j) output layer(k)

Fig. 4 A 3-layer (I, H, O) neural network and connection weights

VIS–NIR data together with the raw TIR data as input. The results of improved TIR data obtained with the ANN method are found to be far superior to those obtained by the statistical approaches carried out earlier.

Band 2
Band 3 Improved
Band 6

Input Layer Output layer

Fig. 5 Network for improvement of resolution with 3 (VIS–NIR) bands by ANN

Fig. 6 Network for improvement of resolution

Fig. 7 Plate 4 ANN


improved TIR image with
bands 2, 3 and 4 as input

Histograms of the raw TIR data (Fig. 9a), the ANN improved TIR data with 3 bands only (Fig. 9b), and the ANN improved TIR data with 4 bands (VIS–NIR and raw TIR) (Fig. 9c) are obtained to compare the results. The range of DN values in the ANN improved TIR data (128–182) is found to be within the range of values of the raw TIR data (120–183) (Table 1).
Visual interpretation of the ANN improved images (Plate 4 and Plate 5) also shows far superior results, with a higher amount of land use detail and ease of interpretation. The results of Case 2 (VIS–NIR and raw TIR) are found to be far better than those of Case 1.

Fig. 8 Plate 5 ANN


improved TIR imagewith
bands 2, 3 and 4 and raw TIR
as input

(a) Histograms of Raw TIR data

(b) TIR data case 1

(c) Improved TIR data case 2

Fig. 9 a Histograms of Raw TIR data, b TIR data case 1, c Improved TIR data case 2

Table 1 Statistical parameters of raw and improved TIR images


S. No. Image description Mean Median S.D DN range
1 Raw thermal infrared image (120 m resolution) 144.2 144 10.2 120–183
2 Improved thermal infrared image with input as 146 145 7.0 128–182
VIS–NIR bands (30 m resolution)
3 Improved thermal infrared image with input as 146.6 146 4.9 135–180
VIS–NIR and TIR band (30 m resolution)
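The statistical parameters reported in Table 1 (mean, median, standard deviation, and DN range) can be reproduced for any image array with a short numpy helper, sketched here for illustration:

```python
import numpy as np

def image_stats(img):
    """Mean, median, standard deviation and DN range of an image,
    as tabulated in Table 1."""
    img = np.asarray(img, dtype=float)
    return {
        "mean": img.mean(),
        "median": float(np.median(img)),
        "sd": img.std(),
        "dn_range": (int(img.min()), int(img.max())),
    }
```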

5 Conclusion

The major goal of this research was to use the desirable features of ANNs to improve the effective spatial resolution of TIR data. It has been shown that artificial neural networks (ANNs) can be utilized to improve the effective spatial resolution of low-resolution (120 m) TIR data. The ANN technique has been proven to provide significantly improved resolution and detail, and to be very good at retaining the thermal patterns of the original images. Using only the high-resolution VIS and NIR band data resulted in a significant increase in spatial resolution. The addition of raw TIR band data to the high-resolution VIS and NIR data as input improves the resolution even further. According to the histograms of the ANN improved TIR images, the DN ranges of the raw images are maintained well.

Acknowledgements The authors express their gratitude to Dr. K. Gopal Rao, Prof. (Retd.), IIT Bombay, for his guidance and valuable support.

References

1. Lillesand TM, Kiefer R (1994) Remote sensing and image interpretation. In: 3rd (ed) John
Wiley & Sons, Inc
2. Abiodun OI, Jantan A, Omolara AE, Dada KV, Mohamed NA, Arshad H (2018) State-of-the-art in artificial neural network applications: a survey. Heliyon 4
3. Kahle BA, Michael JA, Frank DP, John PS (1984) Geological mapping using thermal images,
Rem Sens Environ 16:13–33
4. Price JC (1981) The contribution of thermal data in landsat multi spectral classification.
Photogram Eng Remote Sens 47(2):229–236
5. Stephen ML, Venugopal G (1990) Thematic mapper thermal infrared data in discriminating
selected urban features. Int J Remote Sens 11(5):841–857
6. Valdes M, Inamura M (2001) Improvement of remotely sensed low spatial resolution images by
backpropagated neural networks using data fusion technique. Int J Remote Sens 22(4):629–642
7. Zhang X, Van Gendern JL, Kroonenberg SB (1997) A method to evaluate the capability of
landsat TM band 6 data for sub-pixel coal fire detection. Int J Remote Sens 15:3279–3288
8. Heermann PD, Khazenie K (1992) Classification of multispectral remote sensing data using a
back propagation neural networks. IEEE Trans Geosci Remote Sens 30(1):81–88
9. Chavez SP, Stuart CS, Labrey AA (1991) Comparison of three different methods to merge
multi resolution and multi spectral data: landsat TM data and SPOT panchromatic. Photogram
Eng Remote Sens 57(3):295–303

10. Ravikanti S (2017) Internet of everything (IoE): a new technology era will have impact on
every facet of our life
11. Ravikanti S, Preeti G (2016) Future’s smart objects in IOT, Based on big-data and cloud
computing technologies
12. A hybrid forecasting method based on exponential smoothing and multiplicative neuron model
artificial neural network, (IRSYSC-2017)
13. Fidalgo JN (2015) Neural networks applied to spatial load forecasting in GIS. INESC Porto and Depart Electric Eng Comput
14. Application of artificial neural network and climate indices to drought forecasting in south-
central Vietnam, September 2019
15. Mehdy MM, Ng PY, Shair EF, MdSaleh NI, Gomes C (2017) Artificial neural networks in
image processing for early detection of breast cancer. Hindawi Comput Math Meth Med 2017,
Article ID 2610628
16. Thirupathi L, Padmanabhuni VNR (2021) Multi-level Protection (Mlp) policy implementation
using graph database. Int J Adv Comput Sci Appl (IJACSA) 12(3). http://dx.doi.org/https://
doi.org/10.14569/IJACSA.2021.0120350
17. Thirupathi L et al (2021) J Phys: Conf Ser 2089:012049
18. Lingala T et al (2021) J Phys: Conf Ser 2089:012050
19. Pratapagiri S, Gangula RRG, Srinivasulu B, Sowjanya B, Thirupathi L (2021) Early detection
of plant leaf disease using convolutional neural networks. In: 2021 3rd International conference
on electronics representation and algorithm (ICERA), pp 77–82. https://doi.org/10.1109/ICE
RA53111.2021.9538659
20. Padmaja P, Sophia IJ, Hari HS, Kumar SS, Somu K et al (2021) Distribute the message
over the network using another frequency and timing technique to circumvent the jammers. J
NuclEneSci Power Generat Techno 10:9
21. Reddy CKK, VijayaBabu B (2015) ISPM: improved snow prediction model to nowcast the
presence of snow/no-snow. Int Rev Comput Softw
22. Reddy CKK, Rupa CH, VijayaBabu B (2015) SLGAS: supervised learning using gain ratio as
attribute selection measure to nowcast snow/no-snow. Int Rev Comput Softw
23. Artificial neural networks-based machine learning for wireless networks: IEEE, 03 July 2019
Facial Micro-expression Recognition
Using Deep Learning

Nasaka Ravi Praneeth, Godavarthi Sri Sai Vikas, Ravuri Naveen Kumar,
and T. Anuradha

Abstract Micro-expressions are expressions that reveal a person's true intentions, because they last for less than 0.5 s. In that short time, there is no chance to hide or fake one's emotions. This helps to know the real intention of a person in unexpected situations. Subtle expression recognition is gaining popularity owing to its capability to reveal the hidden intentions of humans, particularly under high-stakes conditions. The main applications of micro-expression recognition are detecting lies and investigating suspects. Human facial micro-expressions are divided into several universal emotions such as happy, sad, angry, fearful, neutral, and surprised. For detecting these expressions, the proposed research has taken videos from the MEVIEW dataset, converted the videos into frames according to their expressions, and also considered images from the SAMM dataset for micro-expression detection. To identify the face prior to detecting the micro-expression, the pre-trained Haar cascade model is used, with the help of which a bordered rectangular box appears around the face. A deep learning technique called convolutional neural network (CNN) is used for image processing. A precise model is defined to measure the level of emotions with the help of the SAMM and MEVIEW datasets, and it obtained better results compared to previous research in the literature.

Keywords Micro-expression recognition · CNN · SAMM · Haar cascade ·


MEVIEW

1 Introduction

Subtle expressions involuntarily cause an emotional leakage that exposes true feel-
ings of a person. Because these expressions occur inadvertently, they can also be

N. R. Praneeth · G. S. S. Vikas · R. N. Kumar · T. Anuradha (B)


Department of Information Technology, Velagapudi Ramakrishna Siddhartha Engineering
College, Vijayawada, India
e-mail: anuradha_it@vrsiddhartha.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 447
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_44

considered as a person's true feelings. These expressions are called micro-expressions because they do not last long. In order to detect these subtle expressions in real-life situations, a well-trained, accurate model is needed to identify the real motives. Human micro-expression recognition can be applied in many areas, such as security, lie detection, or gathering information about a person. Nowadays, facial micro-expression recognition is widely used in day-to-day life. Happy, sad, angry, fearful, neutral, and surprised human emotions can be detected using facial micro-expressions.
The MEVIEW dataset consists of 146 video clips containing different categories of micro-expressions in sequential order [1]. These sequenced datasets help us to recognize spontaneous changes of expression even in live video. A model was built using the SAMM [2, 3] and MEVIEW datasets and a convolutional neural network (CNN) classifier for detecting the micro-expressions. The network needs to be trained carefully with a large number of layers, as it must recognize the correct class even for a minute change in expression, since micro-expressions occur as fast as 1/15–1/25 of a second. The model we built classifies the expressions with an accuracy of 89%. The model also shows the emotional levels in the form of percentages.

2 Literature Survey

Choi and Song presented their work on recognizing facial micro-expressions in [4]. They proposed a 2D landmark feature map technique which helps to predict the micro-expression based on coordinate-based landmarks. Reddy et al. worked on recognizing facial micro-expressions with a 3D spatiotemporal CNN technique by proposing the MicroExpFuseNet model [5]. Dubey, Bhavya Takkar, and Lamba implemented micro-expression recognition using a 3D-CNN [6]. Adegun and Vadapalli presented their work using a machine learning approach called the extreme learning machine (ELM); in this work, as the data size grows, feature selection becomes difficult [7]. Takalkar and Xu presented their work using a deep learning technique (CNN) on small-sized datasets; they faced difficulty in assigning class labels for nuanced faces [8]. Yap et al. in [9] used a 3D-CNN-based approach for spotting micro- and macro-expressions. They concluded that LCN provided a unique improvement in the performance of the model. Zhao and Xu brought up their model using a CNN approach; they used a compound micro-expression database (CMED) synthesized from existing ones [10].
Peng et al. proposed in [11] a dual temporal-scale CNN, a two-stream network used for recognizing these subtle expressions; the model they built can avoid gradient vanishing. Husák, Cech, and Matas presented their idea in [1] on spotting micro-expressions; they used an SVM classifier and compared the evaluations with the baseline method. Davison et al. described SAMM, a spontaneous micro-facial movement dataset. They concluded

that deep learning models performed micro-expression detection better than machine learning methods [2].
Qu and their team explained the CAS(ME)2 database and its characteristics. They employed the LBP method for spotting and evaluating micro- and macro-expressions [12]. Adegun et al. presented their work on recognizing micro-expressions using a combination of LBP on three orthogonal planes and ELM. They concluded that detecting expressions from static images is not effective for subtle movements [13].

3 Proposed System

The specific domain for this model is research areas such as detecting human physiological interactions, especially at the time of interrogations. Building a model for micro-expression recognition is a heavy task, as it has to be trained thoroughly to detect micro-expressions that last for 0.2–0.5 s, meaning the upper bound of such expressions is less than half a second. The proposed system, as shown in Fig. 1, captures the face of the user using a webcam. Frames are extracted from the video, and all these frames are converted to grayscale for pixel formatting. The model built helps to detect these facial micro-expressions effectively, as it is well trained with the existing spontaneous micro-expression databases SAMM and MEVIEW.
MEVIEW consists of a set of video clips that help us to train the model to predict facial micro-expressions precisely. The videos were split into frames, and some preprocessing techniques such as image rescaling, cropping, conversion of RGB images to grayscale, and image rotation were applied on the dataset for

Fig. 1 Architecture diagram



predicting the class labels of a micro-expression accurately and easily. The images of the training dataset were 48 × 48 pixels in size.

3.1 Frames Extraction

Usually, a person's subtle expressions come out when they are in a stressful, high-stakes situation, such as when they want to hide their real feelings. As a video is a sequence of images, every type of expression can be extracted, since there is continuity. So, we extracted the frames from each video and used them to train our model so that it can predict micro-expressions.
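Extracting frames from the video clips can be sketched as below. The OpenCV import is deferred so the sampling logic stands alone; the `step` parameter and the helper names are illustrative, not from the paper:

```python
def frame_indices(total_frames, step):
    """Indices of the frames kept when sampling every `step`-th frame."""
    return list(range(0, total_frames, step))

def extract_frames(video_path, step=1):
    """Read a video clip and return every `step`-th frame as an image array.
    cv2 (OpenCV) is imported lazily, only when actually decoding video."""
    import cv2
    frames, cap = [], cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:           # end of clip
            break
        if idx % step == 0:
            frames.append(frame)
        idx += 1
    cap.release()
    return frames
```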

3.2 Data Preprocessing

From the video clips, the frames are extracted. Some preprocessing techniques like image rescaling, rotation, and cropping were applied, and all the images were converted from RGB to grayscale to extract the information easily from the image by reducing the pixel values. The images of the training dataset are 48 × 48 pixels in size.
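The preprocessing chain (grayscale conversion and resizing to 48 × 48, scaled for the network) can be sketched in plain numpy. In practice OpenCV's `cv2.cvtColor` and `cv2.resize` would likely be used; the luma weights and the nearest-neighbour resize below are stand-ins to keep the sketch self-contained:

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an RGB image (H, W, 3) to grayscale with the usual luma weights."""
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

def resize_nn(img, size=48):
    """Nearest-neighbour resize to size x size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def preprocess(rgb):
    """Grayscale 48 x 48 image scaled to 0-1, as fed to the CNN."""
    return resize_nn(to_grayscale(np.asarray(rgb, dtype=float))) / 255.0
```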

3.3 Interface and Class Label Prediction

3.3.1 Detecting the Face

Face detection plays a key role in recognizing facial micro-expressions. For this, the Haar cascade model, provided in OpenCV as a pre-trained method, was implemented [14, 15]. A pop-up window appears on the screen showing the webcam feed. Haar cascade is an object detection algorithm which helps to draw a bounding box around the face, denoting the identification of the object we are looking for.
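The face detection step can be sketched with OpenCV's bundled pre-trained frontal-face Haar cascade. The `scaleFactor` and `minNeighbors` values and the largest-face heuristic are illustrative choices, not taken from the paper:

```python
def largest_face(boxes):
    """Pick the biggest detection (w * h) when the cascade returns several boxes."""
    return max(boxes, key=lambda b: b[2] * b[3]) if len(boxes) else None

def detect_face(gray_frame):
    """Detect faces with OpenCV's pre-trained frontal-face Haar cascade.
    Returns (x, y, w, h) of the largest detected face, or None."""
    import cv2  # deferred import keeps largest_face usable without OpenCV
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    boxes = cascade.detectMultiScale(gray_frame, scaleFactor=1.1, minNeighbors=5)
    return largest_face(boxes)
```

Drawing the bounding box around the returned `(x, y, w, h)` is then a `cv2.rectangle` call on the frame.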

3.3.2 Prediction of Class Label of Micro-expression

As described in Fig. 1, the detected frame will be given as input to the trained model for predicting the class label of that particular expression. The model calculates and extracts the features from the image, learning feature detection via its hidden layers. These extracted features will be compared with the training sets of data. Thus, the class label will be displayed on top of the bounding box around the face. Also, all the labels of micro-expressions that were

captured while the webcam is on will be visible in the background, with the prediction score of each respective image.
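Picking the displayed class label from the model's softmax output reduces to an argmax plus a percentage score, sketched below. The label list and its ordering are hypothetical, since the paper does not give the output ordering:

```python
import numpy as np

# Hypothetical class ordering; the actual order depends on how the
# training data was encoded.
LABELS = ["angry", "fearful", "happy", "neutral", "sad", "surprised"]

def top_label(probs, labels=LABELS):
    """Class label and its percentage score from the softmax output vector."""
    probs = np.asarray(probs, dtype=float)
    i = int(probs.argmax())
    return labels[i], round(100.0 * probs[i], 2)
```

In the full system, the cropped 48 × 48 grayscale face would go through `model.predict`, and this helper would choose the label drawn above the bounding box.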

3.4 Limitations

Since assessing facial subtle expressions is a very laborious task, the face must be clearly visible to assess them accurately. Using a high-quality camera that can capture every frame clearly is therefore helpful in accurately assessing subtle expressions.

4 Proposed Algorithm

We have used a deep learning approach for image processing: the convolutional neural network (CNN). A convolutional neural network consists of different layers. We built this CNN model with six convolutional layers and max-pooling layers. The input image must be of 48 × 48 dimensions. A convolution layer applies filters to the original image or to other feature maps. The convolution layers use the ReLU activation function and can be applied with different numbers of filters. In this model, the kernel size of each convolution layer is 3 × 3, as shown in Fig. 2.
Max-pooling is a layer of the CNN model used for selecting the dominant elements from the region of the feature map covered by the filter. The max-pooling size is 2 × 2 in the model that has been created, as shown in Fig. 2. Thus, the output after the max-pooling layer is a feature map retaining the salient features of the previous feature map. A flattening layer converts the data into a one-dimensional array for inputting it to the next layer.

Fig. 2 CNN architecture



Finally, the flatten layer is connected to the fully connected layer, which is the last layer, for the output. Softmax is used as the activation function, since this is a multi-class classification problem with more than two classes of emotions to predict. Micro-expressions are detected within 0.5 s; every micro-expression is detected, and the levels of micro-expressions are displayed in the background.
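A Keras-style sketch of the kind of network described (six 3 × 3 convolution layers with ReLU, 2 × 2 max-pooling, flatten, dense, and softmax) is given below. The filter counts, the grouping into three blocks, and the dense-layer width are illustrative assumptions; the paper does not specify them. The small helper computes how the 48 × 48 input shrinks through unpadded convolutions and pooling:

```python
def feature_map_size(n=48, conv_blocks=3, kernel=3, pool=2):
    """Spatial size after stacked (two conv + one max-pool) blocks,
    with unpadded convolutions: each conv loses kernel-1 pixels."""
    for _ in range(conv_blocks):
        n = (n - (kernel - 1) * 2) // pool
    return n

def build_model(num_classes=6):
    """Six 3x3 conv layers (ReLU) in three blocks, each followed by 2x2
    max-pooling, then flatten, dense, and a softmax output.  Filter counts
    are placeholders, not the authors' exact network."""
    from tensorflow import keras  # deferred: heavy optional dependency
    from tensorflow.keras import layers
    model = keras.Sequential([keras.Input(shape=(48, 48, 1))])
    for filters in (32, 64, 128):
        model.add(layers.Conv2D(filters, 3, activation="relu"))
        model.add(layers.Conv2D(filters, 3, activation="relu"))
        model.add(layers.MaxPooling2D(2))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Under these assumptions the 48 × 48 input shrinks to a 2 × 2 feature map before flattening.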

5 Results

The model results were analyzed on different types of micro-expressions of a person when the webcam is on and the face is detected. Here, each figure represents a class according to the changes in the expressions of a person. Figure 3 represents input images of human emotions, namely neutral, surprised, happy, angry, and fearful, respectively. Some preprocessing techniques like image resizing, rotating, and rescaling were applied, and all the images were converted from RGB to grayscale to extract the information easily from the image by reducing the pixel size. Then, the images were trained with a 2D-CNN classifier.
Using a live webcam, we captured some of the micro-expressions predicted by our model, such as neutral, happy, sad, angry, fearful, and surprised, as shown in Fig. 4. Neutral is the emotion of feeling indifferent, with a lack of preference; it is the rest state of the human face. Surprise occurs when sudden or unexpected events happen, and the surprised emotion can be either positive or negative. Happy is an emotional state of joy, satisfaction, and fulfillment. Angry is an emotional state of antagonism toward someone and strong discomfort. The fear emotion occurs when a person is in danger or under threat.
All the micro-expressions, which were resulted throughout the webcam is on, will
be visible in the background as shown in Fig. 5.
Experimentation was done by training the model with the Adam and RMSprop optimizers, using ReLU as the activation function. We trained our model for 100 epochs so that it learns to detect micro-expressions. Table 1 presents the accuracy and loss of the resulting models. From Table 1, it was concluded that the model built with the Adam optimizer has higher accuracy but also higher validation loss than the model built with RMSprop.

Fig. 3 Image dataset


Facial Micro-expression Recognition Using Deep Learning 453

Fig. 4 Various detected facial micro-expressions

Fig. 5 Percentage levels of detected micro-expressions

Table 1 Model accuracy and loss with different optimizers

Optimizer   Training accuracy (%)   Validation accuracy (%)   Training loss   Validation loss
Adam        89.62                   85.25                     0.80            1.57
RMSProp     75.56                   70.31                     0.98            1.48

6 Conclusion

Facial micro-expression recognition with a 2D-CNN classifier was proposed. Experimentation was done on the SAMM dataset, which consists of images, and the MEVIEW dataset, which consists of videos. The built model identifies the classes of expressions with an accuracy of 89%. The model can also show the emotion levels in the form of percentages.
The accuracy of the model can be improved by adding more layers. A high-quality camera is required to capture every frame, and the face must be clearly visible to accurately assess subtle expressions.

References

1. Husák P, Cech J, Matas J (2017) Spotting facial micro-expressions “in the wild”. In: 22nd
computer vision winter workshop (Retz). https://cmp.felk.cvut.cz/~cechj/ME/
2. Davison K, Lansley C, Costen N, Tan K, Yap MH (2018) Samm: a spontaneous micro-facial
movement dataset. IEEE Trans Affect Comput 9(1):116–129
3. Davison AK, Merghani W, Yap MH (2018) Objective classes for micro-facial expression
recognition. J Imaging 4(10):119
4. Choi DY, Song BC (2020) Facial micro-expression recognition using two-dimensional land-
mark feature maps. IEEE Access 8:121549–121563. https://doi.org/10.1109/ACCESS.2020.
3006958
5. Reddy S, Karri ST, Dubey SR, Mukherjee S (2019) Spontaneous facial micro-expression recog-
nition using 3D spatiotemporal convolutional neural networks, pp 1–8. https://doi.org/10.1109/
IJCNN.2019.8852419
6. Dubey V, Takkar B, Lamba PS (2020) Micro-expression recognition using 3D—CNN. Fusion:
Pract Appl 1(1):5–13. https://doi.org/10.54216/FPA.010101
7. Adegun IP, Vadapalli HB (2020) Facial micro-expression recognition: a machine learning
approach. Sci Afr 8:e00465, ISSN 2468-2276. https://doi.org/10.1016/j.sciaf.2020.e00465
8. Takalkar MA, Xu M (2017) Image based facial micro-expression recognition using deep
learning on small datasets. Int Conf Digital Image Comput: Tech Appl (DICTA) 2017:1–7.
https://doi.org/10.1109/DICTA.2017.8227443
9. Yap CH, Yap MH, Davison AK, Cunningham R (2021) 3D-CNN for facial micro- and macro-
expression spotting on long video sequences using temporal oriented reference frame. https://
arxiv.org/abs/2105.06340v3
10. Zhao Y, Xu J (2019) A convolutional neural network for compound micro-expression
recognition. Sensors 19:5553. https://doi.org/10.3390/s19245553
11. Peng M, Wang C, Chen T, Liu G, Fu X (2017) Dual temporal scale convolutional neural network
for micro-expression recognition. Front Psychol 8:1745. Published 2017 Oct 13. https://doi.
org/10.3389/fpsyg.2017.01745
12. Qu F, Wang S-J, Yan W-J, Li H, Wu S, Fu X (2017) CAS(ME)^2: a database for spontaneous macro-expression and micro-expression spotting and recognition. IEEE Trans Affect Comput
13. Adegun P, Vadapalli HB (2016) Automatic recognition of micro-expressions using local binary
patterns on three orthogonal planes and extreme learning machine. In: 2016 pattern recognition
association of South Africa and robotics and mechatronics international conference (PRASA-
RobMech), pp 1–5. https://doi.org/10.1109/RoboMech.2016.7813187.
14. Sri BR, Akanksha Y, Puthali R, Anuradha T (2021) Early driver drowsiness detection using
convolution neural networks. In: Proceedings of the 2nd international conference on electronics
and sustainable communication systems, ICESC 2021, pp 1779–1784
15. Teja PR, AnjanaGowri G, PreethiLalithya G, Anuradha T, Kumar CSP (2021) Driver drowsi-
ness detection using convolution neural networks, smart innovation, systems and technologies
224:617–62
Precision Agriculture with Weed
Detection Using Deep Learning

I. Deva Kumar, J. Sai Rashitha Sree, M. Devi Sowmya, and G. Kalyani

Abstract Agriculture is a field that needs care and attention, and it remains the backbone of the Indian economy. Nowadays, production yield decreases due to the increasing variety of crop diseases and weeds. Identification and elimination of weeds is a tedious task. To reduce the burden on farmers and increase crop productivity, machine learning and deep learning can be used to detect weeds and diseases. Various studies have been conducted in this area using machine learning algorithms such as Random Forest (RF) and Support Vector Machine (SVM). For better accuracy, the deep learning techniques InceptionV4 and Xception are used here to detect weeds at higher speed with less use of computing resources.

Keywords CNN · SVM · Deep learning · Machine learning · InceptionV4 ·


Xception

1 Introduction

Weeds grow on farmland by feeding on nutrients in the soil that are meant for the crop. Weeds compete with crop plants for the resources necessary for growth and deplete the nutrients available to the crop. Therefore, pulling weeds is an inevitable task in farming. However, manually pulling weeds on huge crop lands is time-consuming for farmers. This problem raises the need for automated weeding, the process of removing weeds using machines. Weed identification, also called weed detection, plays an important role in automated weeding. One crop that suffers from many types of weeds is cotton [1].
Identifying crop and weeds (hereinafter referred to as detection) is the first impor-
tant step in an automated weed control process. The development of computer-vision
algorithms for weed detection has a long history, and research dates back to the 1980s.

I. Deva Kumar · J. Sai Rashitha Sree · M. Devi Sowmya · G. Kalyani (B)


Velagapudi Ramakrishna Siddhartha Engineering College, Vijayawada, Andhra Pradesh, India
e-mail: kalyanichandrak@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 455
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_45

Many algorithms have been used to distinguish between weeds and crops. Combining feature detection with machine learning algorithms can improve performance, and recent studies have shown that deep learning algorithms can improve it further. These algorithms learn the characteristics of the associated images and detect weeds directly from the camera image.
Hence, in this paper, a total of eleven weed types found in the cotton crop are classified using convolutional neural network algorithms. The models used are InceptionV4 and Xception. InceptionV4 is a CNN architecture that builds on previous iterations of the Inception family by simplifying the architecture and using more Inception modules than InceptionV3. The module was developed, among other things, to address computational cost and overfitting.
Xception is a deep convolutional neural network architecture based on depthwise separable convolutions, proposed by Google researchers. They presented the Inception module as an intermediate step between a regular convolution and a depthwise separable convolution (a depthwise convolution followed by a pointwise convolution). From this point of view, a depthwise separable convolution can be understood as an Inception module with the largest possible number of towers. Based on this observation, they proposed a new deep convolutional neural network architecture, inspired by Inception, in which the Inception modules are replaced with depthwise separable convolutions.

2 Review of Literature

Different papers have been published on detecting and controlling weeds in real time. One of them is "A Multiclass Weed Species Image Dataset for Deep Learning" [2]: 17,509 images of eight different species were collected from fields in Australia, and InceptionV3 and ResNet-50 were used to identify the weeds. However, the authors focused only on preparing the dataset.
The next paper is "A Deep Learning Approach for Weed Detection in Lettuce Crops Using Multispectral Images" [3]. The authors gathered images from the field with the help of a drone and used the Support Vector Machine algorithm together with convolutional neural networks (YOLO and R-CNN) to identify the weeds.
The authors of the paper titled "Weed Location and Recognition Based on UAV Imaging and Deep Learning" took 2000 weed images from fields in China [4]. To classify the weeds, the deep learning techniques YOLOv3 and YOLOv3-tiny were used. The disadvantage is that the model failed to identify weeds that were in the growing stage or small in size.
The models used in this paper are InceptionV4 and Xception. InceptionV4 uses stronger architectural constraints than InceptionV3. The Xception network incorporates ResNet's novel skip connections along with depthwise and pointwise convolution techniques. Ultimately, by using high-end models like these,
we have significantly reduced training times, and the accuracy on important weeds is higher than that of models available in the market.
Other papers using YOLO and its versions involve a significant amount of preprocessing and could not be used in real time. A model needs to learn from new data in a timely manner, but updating the weights from new data and from the model's mistakes becomes very difficult given the amount of preprocessing the model demands and the training time. Ultimately, this project rectifies several drawbacks of the above papers and provides higher accuracy for some class labels, better training time, and higher throughput during testing. InceptionV4 and Xception are computationally simpler, yet more efficient, than models like R-CNN.

3 Proposed Architecture and Methodology

3.1 Proposed Architecture

Figure 1 shows the architecture of the proposed work. Weed images taken under different soil conditions, growth stages, etc., are fed into the preprocessing stage, where data augmentation and normalization techniques are applied. Before normalization, the pixel values are between 0 and 255; afterwards, they are between 0 and 1. After the preprocessing step, the image dataset is separated into training data and testing data. The training images are given to the Inception classification model, whereas the testing images are used for evaluating it. Once the iterations are complete, the model is saved and evaluated against the test data. If the accuracy is acceptable, the model can be used on real-world data.

Fig. 1 Architecture of the proposed work



3.2 Proposed Methodology of the Work

3.2.1 Data Preprocessing

Data preprocessing is used to enhance the performance of the model and is an important step in developing reliable machine learning models. The preprocessing operators used in this work are as follows:
RandomResizedCrop()—crops a random region of the image and resizes it to the given size; the size parameter specifies the height and width of the output image.
RandomHorizontalFlip()—flips the input image randomly with the given probability (0.5 by default).
CenterCrop()—crops the given image at the center. If the image is smaller than the output size, it is padded with zeros and then center-cropped.
Normalize()—normalizes the given image with a mean and standard deviation, which are given as inputs to the method.
Resize()—resizes the image to the given size.

3.2.2 Algorithm

Pretrained models from torch are imported and used in this project. Here, we summarize how the algorithm works.

Inception Algorithm:
1. Input the image to the model
2. for i := 1 to epochs
   2.1 parallel filtering is applied with different sizes (1 × 1, 3 × 3, 5 × 5)
   2.2 the features are extracted by applying the filters
   2.3 average pooling is applied to reduce the number of parameters
   2.4 if predicted_label does not match actual_label
       2.4.1 a loss function is applied
   2.5 if classification, then
       2.5.1 Cross_Entropy_Loss is applied
   2.6 else
       2.6.1 Mean_Squared_Loss is applied
3. Output the probabilities of the class labels
4. The class label with the highest probability is the predicted label

Optimization algorithms are used to boost and give momentum to the model. The torch.optim package implements various optimization algorithms; the one used in this paper is SGD (stochastic gradient descent). The torch.optim.lr_scheduler module helps adjust the learning rate after each epoch based on the optimization algorithm used.
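A minimal sketch of this training setup, using torch.optim.SGD with a torch.optim.lr_scheduler schedule; the tiny linear model, the random data, and the step-decay schedule are stand-ins for the Inception network, the weed images, and whatever schedule the authors actually used:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 11)                      # placeholder for the 11 weed classes
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
# Drop the learning rate by a factor of 10 every 5 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

x = torch.randn(32, 10)                        # stand-in feature batch
y = torch.randint(0, 11, (32,))                # stand-in labels

for epoch in range(10):
    optimizer.zero_grad()
    loss = criterion(model(x), y)              # loss penalises mismatched labels
    loss.backward()
    optimizer.step()
    scheduler.step()                           # adjust learning rate each epoch

print(optimizer.param_groups[0]["lr"])         # reduced twice: 0.01 -> 1e-3 -> 1e-4
```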

3.2.3 Loss Function

Cost or loss functions are used to increase the accuracy and optimize the model; the main aim is to decrease the loss. If the loss is high, the model is not preferred; if the loss is low, the model is preferred for real-time usage. The cross-entropy loss function is mainly used for classification problems and is preferred when the dataset is unbalanced. The formula for the cross-entropy loss is

loss(x, class) = −log( exp(x[class]) / Σ_j exp(x[j]) ) = −x[class] + log( Σ_j exp(x[j]) )   (1)
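Equation (1) is exactly the quantity computed by PyTorch's CrossEntropyLoss from raw (unnormalized) scores; a quick numerical check of the identity:

```python
import torch
import torch.nn as nn

x = torch.tensor([[2.0, 0.5, -1.0]])           # raw scores for three classes
target = torch.tensor([0])                     # the true class

# Right-hand side of Eq. (1): -x[class] + log(sum_j exp(x[j]))
manual = -x[0, target[0]] + torch.logsumexp(x[0], dim=0)
builtin = nn.CrossEntropyLoss()(x, target)

print(float(manual), float(builtin))           # the two values agree
```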

4 Discussion on Experimental Investigations

4.1 Dataset

The cotton crop was selected for this project, and the cotton weed images were taken from Kaggle [6]. The dataset contains 11 labels and 5187 images of weeds, taken under different conditions: natural light, varied weed growth stages, and different soil types. The names of the weeds include Carpetweeds, Crabgrass, Ragweed, Sickle pod, Spurred Anoda, Swinecress, Water hemp, Morning glory, Spotted Spurge, and Prickly Sida. Figure 2 shows sample images of each class.

Fig. 2 Sample images of 11 weed classes in the dataset



4.2 Discussion on Results

The environment used to construct and validate the model is Google Colab, a free notebook environment that can connect to Google Drive. Colab has many pre-installed machine learning libraries, which can be loaded into a notebook using the import keyword followed by the library name. The libraries used in this project include PyTorch, Seaborn, Matplotlib, time, multiprocessing, and csv. Figure 3 shows the predicted results for some weed images as part of the validation process. The first three weeds in the figure are predicted correctly, but the last one is wrongly predicted as Morning glory. From the confusion matrix of InceptionV4, Carpetweeds were classified with 96% accuracy, but Swinecress and Spurred Anoda were not classified properly due to their structural similarity to other weeds. The per-class precision, recall, and F1-score are shown graphically in Fig. 4.
Different models were constructed on the same dataset with different CNN algorithms. Even though Xception and EfficientNet-B5 achieved high training and validation accuracy, these models failed to predict real-time weeds in the cotton fields, possibly due to overfitting of the data. InceptionV4 predicts the labels of real weed images with higher accuracy. The training time, training accuracy, validation accuracy, etc., for the different models and epochs are shown in Table 1. The training and validation loss for the InceptionV4 model over 25 epochs is shown in Fig. 5. At

Fig. 3 Input images and predicted labels



Fig. 4 InceptionV4 model evaluation metrics per each class

epoch 0, the validation loss is 2.2558 and the training loss is 2.2456; at epoch 25, the validation loss is approximately 0.8328 and the training loss 0.3127.

5 Conclusion

Weed detection is essential for automatic weeding machinery. Weed-specific herbicides can be applied by classifying the weeds in an area, since running machinery over the total

Table 1 Training and validation accuracy of different models

Model             Training time (m)   Trainable parameters   Best training accuracy (%)   Best validation accuracy (%)
Xception          136                 20,837,687             81                           60
InceptionV2       49                  41,165,871             72                           16
Efficientnet-B5   219                 28,371,519             93                           60
InceptionV4       98                  41,159,723             96                           66

Fig. 5 Training and validation loss of InceptionV4 model

crop cannot be afforded. Weed-pulling applications demand high accuracy and speed, and our project concentrates on higher accuracy with increased speed. The project also facilitates retraining when a sufficient amount of new data is added. Therefore, models with lower computational cost and higher efficiency are used as part of this project.

References

1. Alex O, Konovalov DA, Philippa B, Ridd P, Wood JC, Johns J, Banks W et al (2019) DeepWeeds:
a multiclass weed species image dataset for deep learning. Sci Rep 9(1):1–12
2. Zhang R et al (2020) Weed location and recognition based on UAV imaging and deep learning. Int
J Precision Agric Aviat 3(1)
3. Arif S et al (2021) Weeds detection and classification using convolutional long-short-term
memory. ResearchSquare
4. Islam N, Rashid MM, Wibowo S, Wasimi S, Morshed A, Xu C, Moore S (2020) Machine
learning based approach for weed detection in chilli field using RGB images
5. Li L, Zhang S, Wang B (2021) Plant disease detection and classification by deep learning. IEEE
6. Chen D, Lu Y, Li Z, Young S (2021) Performance evaluation of deep transfer learning
on multiclass identification of common weed species in cotton production systems.
arXiv:2110.04960[cs.CV]
An Ensemble Model to Detect
Parkinson’s Disease Using MRI Images

T. Sri Lakshmi, B. Lakshmi Ramani, Rohith Kumar Jayana, Satwik Kaza,


Soma Sai Surya Teja Kamatam, and Bhimala Raghava

Abstract Parkinson’s disease (PD) is a highly common progressive central nervous


system disorder caused by the decrease in neurons that produce dopamine in the basal
ganglia and substantia nigra regions of the brain that control the body’s movement. To
diagnose this disorder in the early stages, an extensive analysis of Magnetic Reso-
nance Imaging (MRI) capable of capturing the pathophysiological changes in the
brain that can determine the deficiency of dopamine in Parkinson’s disease affected
patients is required. In our study, an ensemble of deep neural networks has been
implemented to accurately detect and classify the brain MR images of patients into
Parkinson’s disease and healthy control (HC). These deep learning models help clin-
icians use models with good classification performances of specific feature sets and
better classify images in the early diagnoses of Parkinson’s disease. An ensemble
of popular convolutional neural networks VGG16 and ResNet50 is performed. The
model helps combine the best performance of the two models in extracting specific
features of the images and is tested on a large dataset to observe an overall high
classification performance compared to the individual performance of the models. A
weighted average ensemble is used, which takes the ideal weights of the two models
based on their contribution to classification. An accuracy of 96.09% is observed.

Keywords Parkinson’s disease · MRI · Deep learning · Convolutional neural


networks · VGG16 · ResNet50 · Ensemble

1 Introduction

PD is a neural disorder that affects the human brain’s motor system. The disease
occurs when neurons that control the human body’s movement become impaired
and eventually die. When this phenomenon occurs, neurons produce less dopamine,

T. S. Lakshmi (B) · B. L. Ramani · R. K. Jayana · S. Kaza · S. S. S. T. Kamatam · B. Raghava


Prasad V. Potluri, Siddhartha Institute of Technology, Vijayawada, Andhra Pradesh, India
e-mail: tslakshmi@pvpsiddhartha.ac.in
B. L. Ramani
e-mail: blramani@pvpsiddhartha.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 465
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_46

a brain chemical whose deficiency is responsible for the disease's movement problems. The symptoms appear when 60% of the dopaminergic neurons have begun deteriorating. Significant PD symptoms are shaking, rigidity, and difficulty with walking, balance, and coordination. These symptoms typically begin slowly and worsen over time. PD is the second most widespread neurological disorder in older adults, after Alzheimer's. Age is identified as an extreme risk factor for Parkinson's; the disease's incidence peaks at around 80 years of age. The number of patients is expected to grow by more than 30% by 2030. Environmental and genetic factors play a considerable role in the cause of PD.
Detection of PD in the early stages is crucial for impeding the disease's progression and giving patients some opportunity to access good treatment. The disease is usually diagnosed using patient history and neurological investigations [1]. But the condition may not be identified accurately, as other neurodegenerative diseases with similar symptoms exist. A diagnosis is typically made when there is already a heavy loss of dopamine. The exact detection of PD is a task that still poses a challenge. Various clinical tests can diagnose it; but, as these tests are related to biological brain changes, visual image inspection can be an appropriate technique for diagnosis. Several neuroimaging methods, such as Single Photon Emission Computed
Tomography (SPECT), Positron Emission Tomography (PET), Magnetic Resonance
Imaging (MRI), Functional Magnetic Resonance Imaging (fMRI), and Transcranial
Sonography are used for diagnoses of PD [2]. However, MRI has seen many recent
improvements, making the diagnosis relatively easier. Convolutional neural network
(CNN) is a deep learning (DL) technique that has recently demonstrated excellent
results in classifying images in visual content analysis. Over the years, researchers
and students have studied and worked upon artificial neural networks to solve crucial
image classification challenges [3]. In this work, an ensemble of popular convolu-
tional neural networks VGG16 and ResNet50 is performed to observe an overall high
classification performance compared to the models’ individual performance.

2 Literature Review

The authors of [4] discuss the existing deep learning architectures for image detec-
tion, segmentation, classification, etc., of MRI. It mainly focuses on deep learning’s
application in disease detection using the MRI modality and numerous problems
and current advances in deep learning linked to image processing. In paper [5], a
CNN architecture has been implemented to efficiently classify Alzheimer’s subjects
from healthy control subjects using fMRI data. The authors of [6] have shown how
complex networks can be proficiently used to define novel brain connectivity and
introduce accurate PD markers. In the study of [7], the authors proposed a custom
CAD-based CNN model for classifying healthy patterns and MRI patches related to
Parkinson’s.
The authors of [8] proposed a framework to classify MRI scans of Parkinson’s
disease and healthy control subjects by combining data augmentation techniques

with a transfer-learned CNN such as AlexNet. In [9], the authors have shown how the dropout algorithm and the use of batch normalization can affect accuracy in diagnosing Parkinson's disease; 97.92% accuracy is achieved when these are applied with the LeNet-5 architecture.
In the study [10], two DL models were used to classify PD subjects from healthy
control subjects at the early stages of diagnosis. The flair and T2-weighted MRI
scans were extracted from the public database of PPMI. To improve model perfor-
mance, pre-processing was performed in four stages in the following order: N4 bias
correction, histogram matching, z-score normalization, and image rescaling. They
regulated the model having almost 123 million parameters using dropout and a ridge
regularizer and achieved a high accuracy of 88.9%.
All the above-proposed models have achieved considerable accuracy in classifying
the brain MR images.

3 Materials and Methods

This section contains a brief explanation of the methodology involved in our proposed
work. It covers the MRI database that is taken, the pre-processing of the image dataset,
and the architectures of the CNN models used.

3.1 Dataset

For our work, we extracted axial plane-oriented PD (Proton Density)-T2-weighted


TSE sequenced MRI images from a public domain database, namely Parkinson’s
Progression Markers Initiative (PPMI). Authors of [11] have identified that TSE
sequenced MRI images have more diagnostic reliability. The PPMI dataset used in
our proposed study consists of 186 subjects, of which 102 are PD subjects and 84
HC subjects. The subjects’ demographics present in the extracted dataset are given
in Table 1.
Dataset pre-processing
The MR images present in our dataset are normalized in this pre-processing stage to
maintain constant intensity and uniformity across all the images in the dataset. The

Table 1 Demographics

                  HC                 PD
Subject records   84                 102
Sex (F/M)         38/46              38/62
Age               58 ± 11 (approx.)  60 ± 9 (approx.)

images are normalized to the (0, 1) range using normalization methods. A filtering operation is then applied to these images to reduce noise: a 2D Gaussian filter with an optimized standard deviation of 0.8 and a 5 × 5 kernel is applied to smooth the previously normalized images, reducing the intensity inconsistencies.
To increase the size of the dataset and better train the models, we perform real-
time augmentation, which creates new iterations in real time while the model is being
trained. We create multiple transformed copies of the same image using this image
augmentation technique by applying different transformations to original dataset
images.
Pre-processing is then done on the raw images shown in Fig. 1a by applying a
Gaussian filter to reduce noise in the images shown in Fig. 1b. The image dataset is
split into training, validation, and test sets in the ratio of 70:15:15, indicating that
70% of the dataset is split for training, 15% for validation, and 15% for the test set.
The models are then made to train with images of size 224 × 224 pixels for
VGG16 and 256 × 256 pixels for ResNet50, which are processed in their respective
layers from the input layer to the final output layer. An average of 45 ± 10 image
slices from each patient has been used in the input dataset according to the criteria
given in Table 2.
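The normalization and Gaussian smoothing steps described above can be sketched as follows. SciPy's gaussian_filter is used here as an assumed implementation, with truncate chosen so that sigma = 0.8 yields a 5 × 5 support; the random array stands in for an MRI slice:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def preprocess(img: np.ndarray) -> np.ndarray:
    """Scale pixel values to (0, 1), then smooth with a 2D Gaussian filter."""
    img = img.astype(np.float32) / 255.0
    # truncate=2.5 limits the kernel radius to 2 px at sigma=0.8, i.e. 5x5 support
    return gaussian_filter(img, sigma=0.8, truncate=2.5)

raw = np.random.randint(0, 256, size=(224, 224), dtype=np.uint8)  # stand-in slice
smooth = preprocess(raw)
print(smooth.shape, smooth.min() >= 0.0, smooth.max() <= 1.0)
```

Because the filter computes a weighted average with a normalized kernel, the smoothed values stay inside the (0, 1) range of the normalized image.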

Fig. 1 MR images: a raw images; b noise reduction using Gaussian blur

Table 2 Structure of dataset for each split and each class

Classes                 HC     PD     Total
Images for training     2813   2798   5611
Images for validation   594    608    1202
Images for testing      589    614    1203

3.2 Methodology

The ensemble model used in our study consists of two CNN models, namely VGG16 and ResNet50. VGG16 is a 16-layer convolutional neural network that substitutes large kernel-sized filters with multiple 3 × 3 kernels one after another, followed by max-pooling over a 2 × 2 pixel window with stride 2, and three fully connected layers [12]. ResNet50, a 50-layer neural network, is trained using residual (ResNet) units. Its core concept is residual learning, which is shown to be effective in dealing with network degradation, and it produces good results despite having fewer parameters than VGGNet [13].
To tackle the issue of a large number of parameters, models that have already been
pre-trained on different image datasets and contain pre-trained weights are used to
ease the process for classifying new datasets [14]. Both of our proposed VGG16
and ResNet50 models have been trained previously on the ImageNet dataset, which
consists of around 15 million images and 1000 classes. This knowledge helps in
better classification of our dataset, giving improved accuracy.
Ensemble Modeling
The proposed methodology’s core concept involves ensembling two CNN models
to increase accuracy considerably; ensembling was first discussed in [15]. Ensembling is the process of combining multiple learning models or algorithms to achieve collectively improved predictive performance. While many models are available to classify or predict data individually, a single model may yield lower accuracy because it does not fit the whole training data, or because some models identify specific features better than others. Combining such models boosts the overall accuracy, leading to a better classification of images. Our study ensembles a VGG16 model and a ResNet50 model.
The proposed method shown in Fig. 2 uses the weighted average ensemble method.
The weighted average ensemble combines the models based on their effectiveness
and contribution in classifying the given dataset, since some models tend to classify
a specific set of features better than the others. Our study combines the models by
finding the ideal weights to achieve maximum possible accuracy.
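A minimal sketch of a weighted-average ensemble with the weight chosen by grid search on validation accuracy; the random probabilities here merely stand in for the two models' softmax outputs:

```python
import numpy as np

def ensemble_acc(p1, p2, y, w):
    """Accuracy of the weighted average w*p1 + (1-w)*p2 against labels y."""
    return ((w * p1 + (1 - w) * p2).argmax(axis=1) == y).mean()

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)            # validation labels (HC=0, PD=1)
p1 = rng.dirichlet([1, 1], size=200)        # stand-in softmax outputs, model 1
p2 = rng.dirichlet([1, 1], size=200)        # stand-in softmax outputs, model 2

grid = np.linspace(0.0, 1.0, 21)            # candidate weights 0.00, 0.05, ..., 1.00
best_w = max(grid, key=lambda w: ensemble_acc(p1, p2, y, w))
# Because w=0 and w=1 are in the grid, the ensemble can never score below
# either individual model on the validation set.
print(best_w, ensemble_acc(p1, p2, y, best_w))
```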

4 Experimental Results

The MR image dataset is trained on the proposed models. The raw data is fed into
both models’ first convolution layers, and image slices’ convolution is performed
with filters. Prominent features that help recognize images are extracted at each
convolution layer. The fully connected (FC) layer is then fed by the features learnt
by all the previous layers. The VGG16 and ResNet50 models are trained for 25
epochs using the training set and validation set we prepared earlier. The models’
parameters are tuned based on the validation set for every epoch. The training loss

Fig. 2 Ensemble model

and accuracy and the validation loss and accuracy are calculated for every epoch. The learning rate is initialized to 1e-4 and is decreased if the validation loss does not improve over several epochs. The model is saved whenever the validation loss improves. The best models saved during the training process over a cycle of 25 epochs are used, and the metrics, loss value, and accuracy are reported. The parameters used are shown in Table 3.
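The schedule just described can be sketched in PyTorch with ReduceLROnPlateau and a best-model checkpoint; the tiny model and random tensors are placeholders for VGG16/ResNet50 and the MRI batches (the authors' framework is not stated, so this is an assumption):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)                     # placeholder for VGG16/ResNet50
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=3)
criterion = nn.CrossEntropyLoss()

x, y = torch.randn(64, 8), torch.randint(0, 2, (64,))   # stand-in data
best_val = float("inf")
for epoch in range(25):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

    val_loss = criterion(model(x), y).item()  # placeholder validation pass
    scheduler.step(val_loss)                  # lower LR when the loss plateaus
    if val_loss < best_val:                   # checkpoint only improving models
        best_val = val_loss
        torch.save(model.state_dict(), "best_model.pt")

print(best_val < float("inf"))                # True: at least one checkpoint saved
```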
During the training process of VGG16, the validation loss is initially in the range
of 0.6–0.7. It gradually decreases with each epoch as the model learns and tunes
its hyperparameters based on the validation set. The learning rate is reduced if the
validation loss does not improve for a set number of iterations. The validation loss
reaches 0.38 by the end of 25 epochs. Graphs for training and validation metrics are
plotted in Fig. 3.
Similarly, for ResNet50, the validation loss is initially observed to be in the range
of 1.2–1.6 and decreases gradually with each iteration as the model learns and tunes
its hyperparameters based on the validation set. The learning rate decreases if the
validation loss does not improve for a set number of iterations. The validation loss
improves to 0.27 by the end of 25 epochs. Graphs for the training process are plotted
in Fig. 4.

Table 3 Parameters used for training

Parameters            VGG16                      ResNet50
Activation function   Softmax                    Softmax
Optimizer             Adam                       Adam
Loss                  Categorical cross entropy  Categorical cross entropy
Batch size            64                         64
Epochs                25                         25
Learning rate         1e-4                       1e-4
An Ensemble Model to Detect Parkinson’s Disease Using MRI Images 471

Fig. 3 Accuracy and loss graphs for training and validation—VGG16

Fig. 4 Accuracy and loss graphs for training and validation—ResNet50

These models are then used to predict the classes of the images in the test
dataset. The final fully connected layer with the softmax function gives two
outputs: class probabilities between 0 and 1. Using the predicted labels
and true labels, the performance of the classifiers of VGG16, ResNet, and ensemble
models are obtained by measuring their accuracy, recall, precision, and F1-score.
The classification results for VGG16, ResNet50, and ensemble model are tabulated
in Table 4.
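The per-class metrics in Table 4 follow the standard definitions. A minimal numpy sketch of how accuracy, precision, recall, and F1-score are derived from predicted and true labels (illustrative only, not the authors' evaluation code):

```python
import numpy as np

def class_report(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class, plus overall accuracy."""
    tp = np.sum((y_pred == positive) & (y_true == positive))  # true positives
    fp = np.sum((y_pred == positive) & (y_true != positive))  # false positives
    fn = np.sum((y_pred != positive) & (y_true == positive))  # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = np.mean(y_true == y_pred)
    return precision, recall, f1, accuracy
```

Computing this once with PD as the positive class and once with HC yields the two rows reported per architecture.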
ResNet50 performed better in classifying PD, while VGG16 classified HC better.
The weighted average ensemble is applied using these models, combining the predic-
tions from each model based on the ideal weights. Ideal weights for each model are
found using the grid search algorithm. The contribution of each model is weighted
proportionally to its capability and effectiveness, which in turn results in the best
achievable combination and maximum performance, which can be seen in the table.
Table 5 shows the comparison of VGG16, ResNet, and ensemble models’ accuracy
on test data. We can see that the ensemble model achieved higher accuracy than the
ResNet50 and VGG16 models for the detection and classification of Parkinson’s
disease.

Table 4 Classification report


Architectures Precision Recall F1-Score Support
VGG16 HC 0.87 0.94 0.90 589
PD 0.94 0.87 0.90 614
Accuracy 0.90 1203
ResNet50 HC 0.98 0.85 0.91 589
PD 0.88 0.99 0.93 614
Accuracy 0.92 1203
Ensemble HC 0.99 0.93 0.96 589
PD 0.93 0.99 0.96 614
Accuracy 0.96 1203

Table 5 Comparison of accuracies of all models

Model                                       Accuracy (%)
VGG16 90.19
ResNet50 92.18
Ensemble model (VGG16 and ResNet50) 96.09

5 Conclusion

Parkinson’s disease has no cure to date, and effective early diagnosis is essential
so that proper care can be taken before the disease severely affects patients.
MRI has been increasingly used in recent years for neuroimaging anal-
ysis of degenerative diseases. In this work, we performed a study on the classifi-
cation of MRI scanned images of HC and PD patients by applying state-of-the-art
deep learning architectures and techniques. We have used pre-trained VGG16 and
ResNet50 models for detection and classification purposes. The final FC layer is
fine-tuned for classifying HC and PD classes. Later, we built an ensemble of the
best performing versions of the two models using the weighted average ensemble
technique where each model’s prediction is multiplied by their ideal weights, and
then, their average is calculated. The contribution of each model to the final predic-
tion is weighted by their individual performance. An accuracy of 90.19% is achieved
with VGG16 and 92.18% with ResNet50 individually. The proposed ensemble model
achieved an accuracy of 96.09%, showing better discriminatory capability than the
individual deep learning models. In future, modern state-of-the-art CNN models
with deeper architectures like EfficientNet and DenseNet with millions of parameters
and their ensembles can be used to classify MR images with very high accuracies,
making the diagnoses of PD no longer an arduous job for clinicians.

Classification of Diabetic Retinopathy
Using Deep Neural Networks

J. Hyma, M. Ramakrishna Murty, S. Ranjan Mishra, and Y. Anuradha

Abstract Diabetic retinopathy (DR) is a disorder that commonly occurs among
diabetic patients and gradually affects the eye. This disorder
has to be identified at an early stage, or else it can damage the eyesight
permanently. Since the fundus oculi are so easily visible, retinopathy is the most
commonly recorded chronic complication of diabetes and, as a result, the one we
know the most about in terms of epidemiology and natural history. Clinicians can use
empirical but effective methods to postpone the initiation and development of diabetic
retinopathy by achieving near-normal blood glucose and blood pressure levels. In
order to identify this abnormality, ophthalmologists examine “fundus images” of the
eye, that is, retinal images. Detecting the abnormality with the naked eye, however,
is difficult: it is time-consuming, costly, and prone to misjudgement. So, “deep learning” can be
used to detect the diabetic retinopathy at an early stage. There are many techniques
of deep learning using “convolutional neural network (CNN)” to achieve the results
of the level of eye damage. In this work, “Residual Networks” are also experimented
with to improve classification accuracy.

Keywords Diabetic retinopathy · Deep learning · Convolutional neural networks ·


ResNet

J. Hyma (B)
Department of CSE, GITAM University (Deemed to Be), Visakhapatnam, India
e-mail: hjanapan@gitam.edu
M. Ramakrishna Murty · S. Ranjan Mishra
Department of CSE, Anil Neerukonda Institute of Technology & Sciences (ANITS),
Visakhapatnam, India
e-mail: mramakrishna.cse@anits.edu.in
Y. Anuradha
Department of CSE, G.V.P College of Engineering (A), Visakhapatnam, India
e-mail: anuradhayarlagaddag@gvpce.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 475
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_47
476 J. Hyma et al.

1 Introduction

Diabetic retinopathy (DR), a complication of diabetes mellitus (DM), is a leading
cause of vision loss in people of working age. Medical signs of vascular defects
in the eye are used to diagnose DR. Increased vascular permeability and capillary
occlusion are two key findings in the retinal vasculature in non-proliferative
diabetic retinopathy (NPDR), which represents the
early stage of DR. Fundus photography can detect retinal pathologies such as microa-
neurysms, haemorrhages, and hard exudates at this point, even if the patients are
asymptomatic.
Proliferative diabetic retinopathy (PDR), a more advanced stage of DR, is
distinguished by neovascularization. When the new irregular vessels bleed into the vitreous (vitreous haemorrhage)
or when tractional retinal detachment is present, the patients may experience extreme
vision impairment. Diabetic macular oedema (DME) is characterized by swelling or
thickening of the macula caused by sub- and intra-retinal fluid accumulation, a
result of breakdown of the blood-retinal barrier (BRB). DME can occur
at any level of DR, causing visual image distortion and a loss of visual acuity.
Laser photocoagulation has proved extraordinarily successful even when other
treatments have failed and retinopathy has progressed to the point of sight loss.
Despite this, retinopathy continues to be a leading cause of blindness, and there is no
evidence that diabetes-related vision loss is declining in developed countries. This
may be due to the mixed blessing of longer survival of diabetic patients who were
diagnosed when metabolic regulation was less stringent than it is now. Screening
for sight-threatening retinopathy is the most cost-effective medical technique docu-
mented, and it can help improve the usage of diagnostic and therapeutic services, but
most healthcare systems are still stuck in a state of stagnation and lack of interest. In
order to identify this abnormality, ophthalmologists examine “fundus images” of the
eye, that is, retinal images.
Detecting the abnormality with the naked eye, however, is difficult: it is
time-consuming, costly, and prone to misjudgement. Several traditional image processing techniques have
been experimented on DR detection. More advanced deep learning techniques are
also used to detect the diabetic retinopathy at an early stage. There are many tech-
niques of deep learning using “convolutional neural network (CNN)” to achieve the
results of the level of eye damage. This work is extended with experimentation with
the “Residual Network” to improve accuracy.
Diabetic retinopathy is divided into five categories: “No Diabetic Retinopathy
(NDR)”, “Mild Non-proliferative Retinopathy”, “Moderate Non-proliferative
Retinopathy”, “Severe Non-proliferative Retinopathy”, and “Proliferative
Retinopathy”.
1. No Diabetic Retinopathy (NDR): At this stage, there is no irregular blood
vessel development (proliferation).
2. Mild Non-proliferative Retinopathy: This is the first stage of the disorder, which
is characterized by microaneurysms, which are small balloon-like swellings that
develop within the retina’s tiny blood vessels.
Classification of Diabetic Retinopathy Using Deep Neural Networks 477

3. Moderate Non-proliferative Retinopathy: At this point, the blood vessels that


nourish the retina have become blocked as the amount and size of those balloon-
like swellings or microaneurysms have increased significantly.
4. Severe Non-proliferative Retinopathy: At this point, a large number of blood
vessels have become blocked, depriving many areas of the retina of blood supply.
As a result of the signals transmitted by the retina to the brain, new blood vessels
expand to compensate for the loss of blood supply and nourishment.
5. Proliferative Retinopathy: This is the most advanced stage of DR progression, in
which the retina sends a signal to the brain to compensate for lost nourishment,
resulting in a significant growth of new blood vessels. These new irregular blood
vessels are extremely fragile, vulnerable to damage and blood and fluid leakage
within the retina, as well as trespassing into the transparent, vitreous gel that
makes up the white area of the eye. Extreme vision loss or even blindness may
result from prolonged blood and fluid leakage.

2 Literature Study

The work proposed in [1] aimed to find the abnormalities with a proper detection of
abnormal features of the retinal fundus images. It focused on pre-processing steps
like image enhancement, noise removal, etc., which are crucial in detecting important
features. The results depicted the successful extraction of features and their classifica-
tion to various DR stages. Another work proposed in [2] developed a saliency-based
technique for leakage detection in the angiography. The work proposed in [3] has used
Principal Component Analysis (PCA) for better feature selection and also used back-
propagation neural networks for classifying retinal images into non-diabetic or diabetic
classes. In [4], a hybrid classifier has been proposed with a combination of m-medoids
with Gaussian mixture model to detect retinal lesions with an improved accuracy.
Digital colour images of retinas have been considered for automatic detection of
retinopathy [5]. Another work proposed in [6] came up with an advanced method
for automatic extraction of anatomical features with more precision and accuracy to
detect and diagnose glaucoma. Convolutional neural networks and advanced deep
learning techniques in defining and analysing the deviations in the DR fundus images
from the non-DR fundus images (the input data) were proposed in the paper [7]. The
work given in [8] used the convolutional neural network; current DR screening
systems usually use retinal fundus imaging, which is manually assessed by profes-
sional readers. The aim of this research was to create a reliable diagnostic technology
that could be used to automate DR screening. Using their local data collection, they
achieved a 0.97 AUC with 94% sensitivity and 98% specificity after fivefold
cross-validation. They also developed a systematic analysis of causes of
vision loss.

3 Methodology

Permanent blindness can result if DR is not treated in time. Early detection of DR is
one of the major prerequisites for successful treatment by medical practitioners;
accurate and timely detection spares many people from threatening complications.
Currently, images are classified by professional doctors, and only a few of them
are capable of such detection. Even when the doctors grade the photographs,
reports take 15–20 days to arrive since patients must undergo several scans.
Moreover, since humans cannot be 100% precise all of the time, there is no
guarantee that manual grading is accurate enough. It is therefore preferable
to send the images to computational models in order to be accurate in classifying the
images to various classes of DR. The main goal here is to divide a diabetic patient’s
retinal images into five categories based on the degree of the damage to the eye. The
proposed work aims to experiment with a few convolutional neural network models to see
which one provides the best accuracy since neural networks are the best in image
processing, and the methodology is depicted in Fig. 1 [1].

3.1 Dataset Description

There are several freely accessible datasets for detecting DR and vessels in the retina.
These datasets are often used to train, verify, and evaluate systems, as well as to
compare the performance of one system to that of others. Retinal imaging includes
fundus colour images and optical coherence tomography (OCT). OCT images are
two- and three-dimensional images of the retina taken with low-coherence light that
reveal a lot about the shape and thickness of the retina, while fundus images are
two-dimensional images taken with reflected light. OCT retinal images have been

Fig. 1 CNN model for retinal image classification



Fig. 2 Fundus image

available for a few years now. A wide range of publicly accessible fundus image
datasets are widely used; a sample fundus image is shown in Fig. 2.
Kaggle: It comprises 88,702 high-resolution images obtained from various
cameras, with resolutions ranging from 433 × 289 pixels to 5184 × 3456 pixels. Each
picture is assigned to one of the five DR levels. Only the ground truths for training
photographs are open to the public. Many of the photographs on Kaggle are of low
quality and have inaccurate labelling. We have also created a CSV file with the image
names corresponding to the degree of eye injury, which ranges from 0 to 4.

3.2 Pre-processing

Images from patients of various ethnicities, genders, and lighting conditions in fundus
photography were included in the dataset. This has an effect on the pixel intensity
values in the images, resulting in unneeded variance that is unrelated to classification
levels. To combat this, the Python Image Library package was used to apply colour
normalization to the files. The images were also of high resolution, requiring a large
amount of memory. The images were resized to 128 × 128 pixels.
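As a rough illustration of this stage (the paper uses the Python Image Library; this numpy stand-in uses nearest-neighbour sampling and per-channel normalisation, both of which are assumptions for illustration):

```python
import numpy as np

def preprocess(img, size=128):
    """Nearest-neighbour resize to size x size, then per-channel normalisation
    (zero mean, unit variance) to damp lighting and camera variation."""
    h, w, _ = img.shape
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    small = img[rows][:, cols].astype(np.float64)
    mean = small.mean(axis=(0, 1))
    std = small.std(axis=(0, 1)) + 1e-8  # guard against a constant channel
    return (small - mean) / std
```

After this step every image has the same shape and comparable per-channel intensity statistics, removing variance unrelated to the DR grade.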

3.3 Training

The CNN was first pre-trained on 1000 photographs until it achieved a significant
degree of accuracy. This was essential in order to get a fast classification result
without wasting a lot of training time. After two epochs of training on the initial
images, the model was trained on 1000 training images for another epoch. Over-fitting is a
problem for neural networks, particularly in a dataset like ours, dominated by images with no signs

of retinopathy. The class weights were changed with a ratio proportional to how
many images in the training batch were graded as having no signs of DR for each
batch loaded for back-propagation. The probability of over-fitting to a specific class
was significantly decreased as a result of this. To stabilize the weights, more epochs
were used [9]. This was followed by increasing the model’s accuracy to over 75%.
The network was then trained with a low learning rate on the entire training set of
photographs. Various layers and their importance in reaching the objective are given
below.
. Pooling Layer
Pooling is a nonlinear down-sampling technique. Pooling can be implemented
using a variety of nonlinear functions, the most common of which is max pooling.
It divides the input image into rectangular regions and outputs the maximum of
each region. A feature’s exact location is less important than its rough location
relative to other features; pooling in convolutional neural networks is based on
this concept, also termed down-sampling. In
a CNN architecture, a pooling layer is often inserted between successive convo-
lutional layers (each of which is usually accompanied by an activation feature,
such as a ReLU layer). Pooling layers contribute to local translation invariance in
a CNN, but they do not have global translation invariance unless global pooling
is used.
. Rectified Linear Unit (ReLU)
Rectified Linear Units (ReLUs) are frequently used in deep learning models: if the
function receives a negative value, it returns 0; if it receives a positive value,
it returns the same value [10]. The function is defined as:

f(x) = max(0, x) (1)

The ReLU helps the deep learning model account for
nonlinearities and complex interaction effects. The ReLU function has the advan-
tage of being a relatively inexpensive function to compute due to its simplicity.
The model can be trained and run in a short amount of time because there is
no complex math involved. Similarly, it converges faster, implying that the slope
does not plateau as X increases. Unlike other functions such as sigmoid or tanh,
ReLU avoids the vanishing gradient problem [11]. Finally, ReLU is only partially
activated since the output is zero for all negative inputs.
. SoftMax
The SoftMax function is a generalization of the logistic function; it squashes a
vector of real values into values between 0 and 1 that sum to 1, thus satisfying
the probability constraints. If one of the inputs is large, it maps to a large
probability, but the output always remains between 0 and 1 [12, 13].
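The three layer operations described above can be sketched in a few lines of numpy. This is an illustrative sketch of the standard definitions, not the network's actual implementation.

```python
import numpy as np

def max_pool2d(x, k=2):
    """Max pooling: keep the maximum of each k-by-k block (down-sampling)."""
    h, w = x.shape
    return x[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k).max(axis=(1, 3))

def relu(x):
    """Eq. (1): negative inputs map to 0, positive inputs pass through unchanged."""
    return np.maximum(0.0, x)

def softmax(logits):
    """Squash logits into probabilities summing to 1 (max subtracted for stability)."""
    e = np.exp(logits - np.max(logits))
    return e / e.sum()
```

In the classifier, pooling and ReLU act inside the convolutional stack, while softmax is applied at the final layer to turn logits into class probabilities.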

4 Results

To validate the proposed algorithm, the Kaggle dataset, with clinically specified
referable diabetic retinopathy as a benchmark, was used. The proposed model was
trained for a sufficient number of epochs on the Kaggle dataset. For validation
purposes, 20,000 images from the dataset were set aside. Accuracy is the fraction
of patients with a correct classification. The network’s classifications were
numerically described as follows: 0—No DR, 1—Mild DR, 2—Moderate DR, 3—Severe DR,
and 4—Proliferative DR. The entropy image computed from the grey level outperforms
the raw photograph: this approach, which uses the greyscale component, is more
accurate and sensitive than using the entropy of luminance of the fundus
photograph. Using the entropy image of the greyscale component improves accuracy
and prevents under-diagnosis. When the CNN is given the greyscale entropy image as
input, the accuracy is around 73.5%, better than when the CNN is given the raw
image. The Residual Network achieved 75.5% accuracy on the final validation
dataset, as presented in Figs. 3 and 4.
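The entropy image referred to here can be illustrated with a small numpy sketch that computes the Shannon entropy of the grey-level histogram over image patches. The patch size and bin count below are assumptions for illustration, not values from the paper.

```python
import numpy as np

def patch_entropy(gray, patch=8, bins=32):
    """Shannon entropy of the grey-level histogram for each non-overlapping patch.
    Uniform patches score 0; textured patches score higher."""
    h, w = gray.shape
    out = np.zeros((h // patch, w // patch))
    for i in range(h // patch):
        for j in range(w // patch):
            block = gray[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch]
            hist, _ = np.histogram(block, bins=bins, range=(0, 256))
            p = hist / hist.sum()
            p = p[p > 0]                      # drop empty bins: 0 * log 0 := 0
            out[i, j] = -(p * np.log2(p)).sum()
    return out
```

Feeding such an entropy map, rather than the raw greyscale image, is the variant reported to improve the CNN's accuracy.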

Fig. 3 CNN and ResNet performance analysis (accuracy versus batch size for the CNN model; accuracy versus epoch for ResNet)

Fig. 4 CNN versus ResNet (comparison of model accuracies)



5 Conclusion

In diabetic patients, deep learning can improve the accuracy of diagnosing retinal
pathologies. The proposed method starts with the grey component of the RGB
picture; its entropy image helps with accuracy and sensitivity. In this case, the
CNN trained on the pre-processed greyscale input image achieved a lower accuracy
than the Residual Network.
technology will benefit the automated retinal image analysis system and will assist
ophthalmologists in diagnosing referable diabetic retinopathy. As a future work,
more deep models can be experimented to get better accuracy.

References

1. Raman V, Then P, Sumari P (2016) Proposed retinal abnormality detection and classification
approach: computer-aided detection for diabetic retinopathy by machine learning approaches.
In: 2016 8th IEEE international conference on communication software networks (ICCSN).
IEEE
2. Zhao Y (2017) Intensity and compactness enabled saliency estimation for leakage detection in
diabetic and malarial retinopathy. IEEE Trans Med Imaging 36(1):51–63
3. Prasad DK, Vibha L, Venugopal KR (2015) Early detection of diabetic retinopathy from digital
retinal fundus images. In: 2015 IEEE recent advances in intelligent computational systems
(RAICS). IEEE
4. Akram MU (2014) Detection and classification of retinal lesions for grading of diabetic
retinopathy. Comput Biol Med 45:161–171
5. Winder RJ (2009) Algorithms for digital image processing in diabetic retinopathy. Comput
Med Imaging Graphics 33(8):608–622
6. Haleem MS (2013) Automatic extraction of retinal features from colour retinal images for
glaucoma diagnosis: A review. Comput Med Imaging Graph 37(7):581–596
7. Harshitha C, Asha A, Pushkala JLS, Anogini RNS, Karthikeyan C (2021) Predicting the stages
of diabetic retinopathy using deep learning. In: 2021 6th international conference on inventive
computation technologies (ICICT), 2021, pp 1–6. https://doi.org/10.1109/ICICT50816.2021.
9358801
8. Goh JK, Cheung CY, Sim SS, Tan PC, Tan GS, Wong TY (2016) Retinal imaging techniques
for diabetic retinopathy screening. J Diabetes Sci Technol 10(2):282–94. https://doi.org/10.
1177/1932296816629491. PMID: 26830491, PMCID: PMC4773981
9. Praneel ASV, Rao TS, Ramakrishna Murty M (2019) A survey on accelerating the classifier
training using various boosting schemes within cascades of boosted ensembles. Springer SIST
series 169:809–825
10. Agarap AF (2018) Deep learning using rectified linear units (relu). arXiv preprint arXiv:1803.
08375
11. Hanin B (2019) Universal function approximation by deep neural nets with bounded width and
relu activations. Mathematics 7(10):992
12. Wang M, Lu S, Zhu D, Lin J, Wang Z (2018) A high-speed and low-complexity architecture for
softmax function in deep learning. In: 2018 IEEE Asia pacific conference on circuits and
systems (APCCAS). IEEE, pp 223–226
13. de La Torre J, Valls A, Puig D (2020) A deep learning interpretable classifier for diabetic
retinopathy disease grading. Neurocomputing 5(396):465–476
A Deep Learning Model for Stationary
Audio Noise Reduction

Sanket S. Kulkarni, Ansuman Mahapatra, and T. Bala Sundar

Abstract The primary aim of the paper is to reduce the noise in the audio using
deep learning techniques. Speech denoising is a long-standing problem. Given a noisy
input signal, we aim to filter out the undesired noise without degrading the signal
of interest. Noise is a widespread problem faced during calls, video recordings, and
many other situations. The proposed work focuses on short-time Fourier analysis,
which can be done on the sound wave. The Convolutional neural network (CNN)
architecture removes the noise from analyzed data. Then inverse Short-Time Fourier
analysis is used to reconstruct the sound wave. The work targets stationary noise
like wind or thermal noise and non-stationary noises like strings, music, chattering,
etc. The result is compared with existing work to show the efficacy of the proposed
framework.

Keywords Noise reduction · Neural network · CNN · Deep learning · Audio ·


Stationary noise

1 Introduction

Audio noise reduction plays a vital role in scenarios such as traveling by bike or
bus, when an individual answers an urgent call but the person on the other end
hears the noise from the wind. This problem is addressed by many different signal
processing techniques. For instance, in the case of mobile phones, this problem has
already been addressed using multiple mics. However, keeping multiple mics on the
phone to remove noise from recorded audio consumes extra space, making the mobile
thicker.
The above problems can be addressed if we shift from a hardware-based solution
to a software-based solution. The current work presents one such software-based
solution using deep learning techniques. The deep learning technique used in this
work is a kind of convolutional neural network (CNN). The reason behind using a

S. S. Kulkarni · A. Mahapatra (B) · T. Bala Sundar


National Institute of Technology Puducherry, Puducherry, India
e-mail: ansuman.mahapatra@nitpy.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 483
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_48
484 S. S. Kulkarni et al.

CNN is that it works well with 2D matrices, and its number of parameters is low,
requiring less computation.
Not only in phones, this solution can be utilized in many cases where arranging
multiple mic setups is not possible due to power, cost, and space constraints (like
hearing aid). However, this work focuses on reducing noise from signals received
during phone calls. This work concentrates on noises like stationary noises like
thermal noise. Noise reduction is considered as the process of
1. Identifying the presence of noise in the audio signal
2. Removing noise from the signal
3. Reproducing the clean signal
The objective of this work is to reduce noise in audio due to wind, which would
be feasible for both real-time applications like calls and non-real-time applications
like audio from a video recording. Current-generation phones use multiple mics to
reduce background noise. The latest iPhones use four mics, which take more phone
space and add cost. The RNNoise software developed by Mozilla uses handpicked
features, which limits the performance of the neural network. Many other works
that use non-RNN solutions suffer from high latency due to their sampling rates.
Some of the applications include noise-free calls, removing noise from video
recordings, and removing noise caused by cochlear implants. Noise cancellation is
thus a much-needed mechanism in many cases. Besides that, noise reduction in the
video can help viewers retain concentration. It can be used to clean the audio before
using it as an input to other neural networks like caption maker, speech summarizer,
or sentiment analyzer. In the case of cochlear implants, it is very much necessary
because most of the patients complain of thermal noise in them. Thus, the necessity
of noise reduction mechanisms is undeniable.

2 Literature Review

The work entitled Two-Microphone noise reduction using spatial information-based


spectral amplitude estimation by Li et al. proposes a method that uses spatial infor-
mation to reduce noise [1]. Similar approaches are followed by present-day devices
for eliminating the noise from the speech signal. However, to follow this approach,
multiple mics are required to calculate the audio’s spatial data.
Then due to the availability of large data sets, machine learning approaches were
introduced to remove noise from the signals. The work titled a regression approach to
speech enhancement based on deep neural networks by Xu et al. provides a regression
approach to remove noise and enhance speech [2].
A special kind of neural network called the recurrent neural network (RNN), based
on David Rumelhart’s work, can accept inputs of variable lengths. This has
provided a way to remove noise from the signal in real time. Mozilla made an
RNN-based denoiser called RNNoise. The Mozilla research RNNoise work shows how to apply
deep learning to noise suppression. It combines classic signal processing with deep
A Deep Learning Model for Stationary Audio Noise Reduction 485

learning. A paper named “Recurrent Neural Networks for Noise Reduction in Robust
ASR” [3] was published by Maas et al. It suggests an end-to-end deep recurrent neural
network with three hidden layers. It presents how end-to-end neural networks,
given enough data, can reduce a wide variety of noises.
The technical work on speech processing for cochlear implants with the discrete
wavelet transform [4] suggests an alternative to the traditional filter-bank
spectral analysis strategies: the speech signal can be analyzed using the discrete
wavelet transform (DWT). Preliminary tests were conducted to compare the DWT and
the filter-bank analysis methods. Additionally, the intelligibility of the speech
processed with the proposed strategy was tested on normal-hearing people using
acoustic simulations, and a comparison was made with respect to traditional
CI algorithms.
The paper by Park and Lee explains why a CNN is well suited for noise reduction
in real-time situations and provides a way to build a CNN for noise reduction and
evaluate it using SDR values [5].

3 Proposed Methodology

The central part of the architecture is the neural network. This neural network takes
the Fourier-transformed data of the noisy signal as input, processes it, and returns the
Fourier-transformed data of the clean signal. First, the Fourier transform is applied
to all the audio files, both clean and noisy. The Fourier transforms of the noisy audio
files are fed to the input side of the neural network, and the Fourier transforms of
the clean files to the output side. Then the neural network is trained.
Once the model is trained, it is used for making predictions. The noisy audio is
taken as input, and the Fourier transform is applied to it. This data is then fed into
the neural network to obtain the Fourier transform of the clean audio as the output.
The inverse Fourier transform is then applied to the output to get the final noise-free
audio.
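As a minimal sketch of this prediction pipeline (the function names, the short-time framing, and the Hann window are illustrative assumptions, not the chapter's exact implementation; `model` stands in for the trained network):

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    # Frame the signal, apply a Hann window, and take a real FFT per frame.
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def istft(S, n_fft=256, hop=128):
    # Overlap-add reconstruction with the same Hann window.
    win = np.hanning(n_fft)
    out = np.zeros(hop * (len(S) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, spec in enumerate(S):
        out[i * hop:i * hop + n_fft] += np.fft.irfft(spec) * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def denoise(noisy, model):
    # `model` is any callable mapping a noisy spectrum to a clean one.
    return istft(model(stft(noisy)))
```

With an identity `model`, the transform/inverse-transform round trip reconstructs the waveform, which is a useful sanity check before plugging in the trained network.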
Figure 1 denotes the training model with Fourier analysis, with X-Train as the input
parameter and Y-Train as the output parameter. The sample rates used for the Fourier
transform must be chosen so that they neither introduce much processing latency nor
degrade the speech data in the original audio.
Dataset Collection: The dataset used in this work is obtained from FloydHub [6].
The dataset contains recordings of speech audio files of different persons. The
noises are obtained from the urban noise dataset. These clean audio files are combined
with the noises to make the overall dataset. This overall dataset is saved as .npy
(NumPy) files so that the audio-to-NumPy conversion does not need to be repeated
every time the model is trained.
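The chapter only states that clean recordings are combined with urban noises; one common way to do this, sketched below, is additive mixing at a target signal-to-noise ratio (the SNR value and this scheme are assumptions for illustration):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db=5.0):
    """Additively mix a noise clip into clean speech at a target SNR (dB)."""
    noise = np.resize(noise, clean.shape)          # loop/trim noise to length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise

# Pairs of (noisy, clean) arrays can then be cached once with np.save(...)
# so the audio-to-NumPy conversion is not repeated for every training run.
```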
Model: A CNN-based model is used for this work. The CNN takes as input a 128 ×
128 × 1 matrix obtained from the Fourier transform of the audio signal.
486 S. S. Kulkarni et al.

Fig. 1 Architecture for training model

Each successive layer halves the matrix size and doubles the number of kernels,
converting the values into features until the final matrix dimensions become 8 × 8
and the number of kernels becomes 256. From the next layer onwards, the exact
opposite happens, and finally we get a 128 × 128 × 1 matrix which can be used to
construct the wave later. To preserve the original values, these layers are
concatenated with the previous layers of the same size.
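The halving/doubling schedule above fixes the encoder's shape progression. Assuming the first layer starts with 16 kernels (an inferred value; the chapter only fixes the 8 × 8, 256-kernel bottleneck), the schedule can be computed as:

```python
def encoder_shapes(size=128, kernels=16, bottleneck=8):
    """Each encoder step halves the spatial size and doubles the kernel
    count until the bottleneck; the decoder mirrors this schedule."""
    shapes = [(size, kernels)]
    while size > bottleneck:
        size //= 2
        kernels *= 2
        shapes.append((size, kernels))
    return shapes

print(encoder_shapes())
# [(128, 16), (64, 32), (32, 64), (16, 128), (8, 256)]
```

The concatenation of each decoder layer with the same-sized encoder layer is the familiar U-Net-style skip connection, which preserves raw detail that the bottleneck would otherwise discard.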

Figure 2 demonstrates the model visualization of various layers in the CNN model,
which uses the Fourier transform of the audio signal as the input.
Post-processing is the exact reverse of the preprocessing. It contains three steps.
1. Rescale the output values with the same parameters used for scaling.
2. Change the shape of the array as required.
3. Apply the inverse short-time Fourier transform to convert the wave from the
   frequency domain back to the time domain.
The last stage is amplitude scaling: after applying the Fourier and inverse Fourier
analysis, the wave's amplitude is reduced, so the wave has to be amplified again.
This is done by taking the averages of the input wave and the denoised wave, then
multiplying the amplitudes of the denoised wave by the ratio between the mean
amplitude of the input wave and the mean amplitude of the denoised wave. Due to
this, the speech sounds slightly louder than in the original audio: the original audio
contains noise and the denoised audio does not, so making the averages of the noisy
audio and the denoised audio the same slightly increases the loudness of the speech
signal.
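A sketch of this rescaling step in NumPy (taking the mean absolute value as the "mean amplitude" is an assumption about the averaging used):

```python
import numpy as np

def rescale_amplitude(noisy_input, denoised):
    """Scale the denoised wave so its mean amplitude matches the input's."""
    ratio = np.mean(np.abs(noisy_input)) / np.mean(np.abs(denoised))
    return denoised * ratio
```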

4 Results and Discussion

Huber loss is used as the cost function because the noise reduction use-case can have
many outliers that must be handled. The model converged after nearly 200 epochs,
as demonstrated in Fig. 3, which shows the training and validation loss.
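Huber loss is quadratic for small errors and linear for large ones, which is what makes it robust to outliers. A reference sketch (the threshold δ = 1.0 is an assumed default, not a value stated in the chapter):

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Quadratic for |error| <= delta, linear beyond it (robust to outliers)."""
    err = np.abs(y_true - y_pred)
    clipped = np.minimum(err, delta)
    return np.mean(0.5 * clipped ** 2 + delta * (err - clipped))
```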
Figure 4 denotes the waveform of the clean signal without noise attenuation, and
Fig. 5 represents the waveform of the noisy signal. Figure 6 denotes the waveform
of the denoised signal. Figure 7 denotes the

Fig. 2 Deep learning model visualization

spectrogram of the clean audio signal. Figure 8 denotes the spectrogram of the noisy
signal with various levels of disturbance. Figure 9 denotes the spectrogram of the
denoised and amplified signal. This shows the efficacy of the proposed architecture.
Signal-to-distortion ratio: The SDR, given in Eq. (1), is used to measure the error
between the clean signal and the denoised signal.

SDR = 10 log( ||y||^2 / ||f(x) − y||^2 )    (1)
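Eq. (1) can be computed directly, with y the clean signal and f(x) the network output (the base-10 logarithm is an assumption, conventional for dB-scaled ratios):

```python
import numpy as np

def sdr(clean, denoised):
    """Signal-to-distortion ratio per Eq. (1); higher means less distortion."""
    return 10 * np.log10(np.sum(clean ** 2) / np.sum((denoised - clean) ** 2))
```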

Fig. 3 Convergence of the model

Fig. 4 Waveform of clean signal

Fig. 5 Waveform of noisy signal

The paper by Park and Lee [5], which is used for comparison, reports an SDR of
8.62 using 15 convolutional layers. As tabulated in Table 1, the SDR obtained by
the proposed method is 8.64 using 11 convolutional layers. The reduction in the
number of layers removes a significant number of parameters from the neural network
and hence improves the latency and computational power requirements.

Fig. 6 Waveform of denoised signal

Fig. 7 Spectrogram of clean audio signal

Fig. 8 Spectrogram of noisy signal

5 Conclusion and Future Scope

The proposed work provides a completely software-based solution for noise reduc-
tion, which reduces hardware cost and space requirements in phone manufacturing.
Even though this work handles many noises, many more new kinds of noises can be
added to the training set, and generalization can be achieved. Short-duration noises
like non-stationary noises could be handled by reducing the sampling rates, but this can

Fig. 9 Spectrogram of denoised and amplified signal

Table 1 Comparative analysis

                 Park and Lee [5]   Proposed work
SDR              8.62               8.64
No. of layers    15                 11

cause more latency than present-day mobiles can handle. This whole work is based
on Python. Cython can be used to optimize the builds, and the resultant files can be
used for the end deployment.

References

1. Li K, Guo Y, Fu Q, Li J, Yan Y (2012) Two-microphone noise reduction using spatial information-
based spectral amplitude estimation. IEICE Trans Inf Syst 95(5):1454–64
2. Xu Y, Du J, Dai LR, Lee CH (2014) A regression approach to speech enhancement based on
deep neural networks. IEEE/ACM Trans Audio Speech Lang Process 23(1):7–19
3. Maas A, Le QV, O'Neil TM, Vinyals O, Nguyen P, Ng AY (2012) Recurrent neural networks for
noise reduction in robust ASR
4. Paglialonga A, Tognola G, Baselli G, Parazzini M, Ravazzani P, Grandori F (2006) Speech
processing for cochlear implants with the discrete wavelet transform: feasibility study and
performance evaluation. In: International conference of the IEEE engineering in medicine and
biology society, pp 3763–3766. IEEE
5. Park SR, Lee J (2016) A fully convolutional neural network for speech enhancement. arXiv:
1609.07132
6. Noisy audio dataset. https://blog.floydhub.com
Optimizing Deep Neural Network
for Viewpoint Detection in 360-Degree
Images

Surya Raj and Ansuman Mahapatra

Abstract A 360° image enables the user to interact with the view and explore the
whole environment around the camera. As there can be an infinite number of viewports
in a 360° image, the viewer's task becomes cumbersome and confusing. This work
aims to study 360° images, to classify them into 10 different categories based on
place, and then to predict the viewpoint using deep CNN architectures. This study
explores the advantages of transfer learning and uses it to create a classifier that
classifies 360° images into different categories. Further, two approaches are
proposed to predict the viewpoint in a 360° image in order to recommend the best
viewport to the viewer.

Keywords 360° images · Viewport prediction · Best view synthesis · CNN · Place
classification · Spherical video

1 Introduction

360° images (spherical images) and videos have taken over the world because of their
effectiveness in providing users with a rich, immersive experience of a view or
scenario. With the emergence of virtual reality as a mainstream trend, 360° images
and videos have become more and more popular. Users can explore the whole view
using their mobile phone or computer screen.
Due to the COVID-19 lockdown, there has been huge growth in people's interest in
virtual tours of places. Cheap 360° consumer cameras help to supercharge the 360°
content on the internet. 360° photos and videos provide a controllable spherical view
that surrounds the center point from which the shot was taken. These images provide
the user with the flexibility to look around the place where the shot was taken.
However, as the viewer can view only a part of the whole sphere at any particular
time, it is a tedious task for the viewer to identify where to look.

S. Raj · A. Mahapatra (B)


National Institute of Technology Puducherry, Karaikal, Puducherry 609602, India
e-mail: ansuman.mahapatra@nitpy.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 491
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_49

Identifying the best viewport is one main problem for which a solution is sought.
There have been only a few research attempts in this area. Classifying 360° images
into different categories can make use of the many already established models that
are trained over millions of normal, limited field-of-view images. Though the curved
view in 360° images has different features compared to limited field-of-view images,
the study shows that building on an already established model provides better
accuracy. The limited availability of data is one of the biggest challenges in
establishing an accurate model.
Deep learning models help in studying the complex features of images that help in
distinguishing them into different categories. Several state-of-the-art deep learning
architectures have successfully classified limited-view images into various categories.
The dataset is one such challenging resource that needs to be fed to the model for good
results. Both the quality of coverage and the quantity of images in the dataset greatly
influence the end results. Various standard benchmark datasets are available for
limited-view images, but the number of datasets available for 360° image processing
is very limited.
Captured pictures have the potential to hold one or more important viewports. The
number of subjects in a limited field-of-view image is smaller compared to a 360°
image, and often not all the subjects in a 360° image are of importance to the user.
The most important or interesting subject in a 360° image is called the viewpoint.
It helps in showing the user the most important subject of interest in the picture:
they need not turn the 360° image around to find the main subject of interest. The
viewpoint gives the user a hint about the main subject of interest, based on which
they can decide whether they want to see beyond the viewpoint.
The objective of this work is to explore 360° images and the various suitable
architectures and methods that can be used to classify the images and predict the
viewpoint in a given 360° image.
The rest of the paper is organized as follows. In Sect. 2, the literature survey
related to the work is discussed. Then in Sect. 3, the proposed methodology is
explained. Section 4 presents the implementation details and the results. Finally,
the conclusion and future scope are discussed in Sect. 5.

2 Literature Review

The study related to this work covers (1) 360° images, (2) choosing a dataset, (3)
preprocessing techniques used for 360° images, and (4) various architectures and
methodologies. Xiao et al. devise a model to classify a given limited-view photo
into the place category of the panorama to which it belongs [1]. The dataset consists
of various panoramas categorized by the place shown, using Amazon Mechanical
Turk workers. The model is first trained with the limited-view photos from the
panorama images. The model takes advantage of the symmetry of images to find
the best viewpoint of the observer, that is, the direction towards which the observer
is facing the view. They used a two-stage greedy algorithm. In the first stage it
predicts the category of the limited-view photo. There are 26 panorama categories,
and the given limited-view photo is classified into one of them. In the second stage
of the algorithm the panorama is aligned and the observer's direction is identified.
In addition, the model also finds the average representation of each panorama
category. Given a limited-view photo, the view beyond the given image can be filled
in using the average representation of the panorama. They used the support vector
machine (SVM) model for the classification of photos and the identification of the
viewpoint. In the first iteration of training they used sample limited-view photos
from one panorama to train the model, then used the model to predict on new
panoramas. The one with the highest confidence score is then used to train the model
again; thus, they implemented a greedy approach to train the model. They obtained
an accuracy of 51.9% for SUN360 place classification, 50.2% for viewpoint
prediction, and 24.2% for combined place classification and viewpoint prediction.
The authors Raja et al. worked on a KNN classifier to achieve reasonable performance
in classifying images into indoor and outdoor scenes using limited resources [2].
The work provided good accuracy considering the limited resources used. Their
work mainly consists of training the KNN model using a small dataset and using
the features learned during the training phase to predict for new queries. Instead of
the RGB color model, the authors chose to use the HSV color model for the images.
The HSV color model is based on how humans perceive colors and hence is
considered a more natural way to capture the details. Low-level features of the images
like color, texture, and entropy are considered important factors that help in achieving
better accuracy. The proposed methodology was experimented on two different
datasets: the first, IITM-SCID2, consists of 907 images, and the second consists of
3442 images downloaded from the internet.
The authors Li et al. devised a viewport-based convolutional neural network
(V-CNN) for viewport prediction [3]. The work is based on studying the real head
movement and eye movement of observers viewing 360° videos. The work consists
of three stages. The first stage is a viewpoint prediction network, in which the
various viewpoints in the 360° images are identified from the data. The second stage
consists of viewpoint alignment methods and a viewpoint quality network. The
viewpoint alignment method maps the angular viewpoint obtained in the first stage
to a flat plane, which helps in smooth processing of the viewpoints in further stages.
The quality of the aligned viewpoint is assessed using the viewpoint quality network.
In the final stage, the video quality assessment (VQA) score is computed by
integrating the scores of the individual viewpoints obtained in the second stage.
They used a CNN as the base model and performed the experiment on the 360°
video dataset VQA-ODV, which consists of 540 360° videos: 432 for training and
108 for testing. In addition to the videos, the dataset contains the head and eye
movements of 200+ subjects while watching the videos. Accuracy of the system is
measured in normalized scanpath saliency and correlation coefficient.
The authors Afzal et al. studied YouTube 360° videos in comparison to normal videos
and found that 360° videos demand about 3 times the resolution of normal videos [4].
This causes higher bandwidth usage in the network. They developed a technique to
find the resolution requirement of 360° videos based on the field of view of the VR
headset. These techniques help to reduce the bandwidth requirement of 360° videos
in a network.
The authors Caglayan et al. discuss a new standard database for image classification
problems [5]. The new database consists of 10 million images categorized into 365
different categories. The exhaustiveness of the database was measured with respect
to various classic architectures like GoogLeNet, VGG, and ResNet. The study showed
that these architectures, when trained on this dataset, Places365, perform better than
with other popular databases like ImageNet and SUN. The models that were solely
trained on the Places365 dataset showed only a 6% loss compared to models that
were trained with an additional 1.2 million images. The study shows the
exhaustiveness and robustness of the Places365 dataset. Standard datasets like SUN
contain a large coverage of various place and scene categories; though exhaustive,
SUN lacked quantity, while deep learning algorithms need to be fed with a large
amount of data for good results. SUN contained 397 categories and a total of around
100,000 images. The Places365 dataset attempted to overcome this drawback and
succeeded in the effort. The Places dataset contains 10,624,928 images from 434
different categories. There are four benchmarks for this dataset: Places365-Standard,
Places365-Challenge, Places205, and Places88. Each subset contains different
quantities of images for training, validation, and testing.
The authors Yang et al. proposed a model to predict the future viewport users will
be interested in, based on eye movement while watching 360° videos [6]. Further,
they extended the study to find the interesting viewports over a future duration, also
called the viewport trajectory. They also combined an RNN and a CFVT to explore
the correlation between viewport and video content, and were able to improve the
accuracy by up to 40%. They used a CNN for viewport prediction, an RNN for
viewport trajectory prediction, and a CFVT for content-aware viewport prediction.
The dataset used for the experiment consists of head motion data of 153 different
volunteers for each of 16 different 360° videos, with 985 views in total. The viewport
trajectory predicts the various viewpoints the observer may look at over a future
duration. To achieve this, they used many CNN models, each predicting one of the
possible viewpoints from the trajectory; combining the results of all the models
yields the many viewpoints predicted from the head movements of the observer.
The correlation filter-based viewpoint tracker (CFVT) performed content-aware
viewpoint prediction; it helps in finding a target as in a normal video. The spherical
image is mapped to a plane surface before being processed by the model. The
combination of the RNN with the CFVT helped in increasing the accuracy of the
model.
The authors Simonyan and Zisserman designed the classic convolutional network
architecture VGG16 [7]. The architecture was a breakthrough for its time. The
authors studied the performance of deep CNNs by varying the depth of the
convolutional layers and keeping all other parameters constant; the best of these
models won the ImageNet challenge 2014. They designed about 5 models by varying
the number of convolutional layers in each of them, with all other parameters kept
constant. All the models consisted of 5 max-pool layers, 3 fully connected layers,
and a final softmax layer. The number of convolutional layers varied from the 1st
model to the 5th: they started with 8 convolutional layers in the first and added more
convolutional layers in each subsequent model, so the last model has 16
convolutional layers. Not every convolutional layer is followed by a max-pool layer.
The width of the convolutional layers is rather small, starting from 64 in the first
layer and increasing by a factor of 2. The number of parameters required is
considerably small compared to shallower neural networks with larger widths. The
model was trained on the ImageNet dataset and has shown good performance.

3 Proposed Methodology

Two different approaches are proposed to predict the viewpoint in a 360° image. The
first approach is viewpoint prediction using feature maps (Sect. 3.1), and the second
is viewpoint prediction using object detection models (Sect. 3.2).

3.1 Viewpoint Prediction Using Feature Map

The feature map of an image is the feature vector extracted by the layers of a
convolutional neural network: the convolution of a convnet filter with an input image
results in a feature map. Each layer of a CNN is capable of detecting specific features
in an image. The filters in lower layers help in detecting subtle features like edges
and lines, whereas the filters in higher layers capture complex patterns specific to a
class. A model trained to classify images will learn a specific common feature in
each class and will show it as the most activated region for that class. For example,
what distinguishes a bedroom from a classroom is the presence of a bed in the
bedroom. There is a high probability that a model trained to distinguish bedrooms
from classrooms will capture the bed as the most activated region for all images that
contain a bed. And most often the bed is one of the most sought-after sights in a
bedroom, and hence it is a potential viewpoint. That is, whatever the model captures
as the most activated region for a class of images has the potential to be a viewpoint.
The intuition behind this approach is that the final feature map of an image will
contain the subject that is specific to the given class, often the viewpoint, as the
most activated region. Figure 1 shows the flowchart of the first approach.
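A minimal sketch of locating the most activated region and mapping it back to image coordinates (the argmax-cell scheme below is an illustrative assumption about how the "one-unit area" around the peak is recovered):

```python
import numpy as np

def most_activated_region(feature_map, image_hw):
    """Return the pixel bounding box of the peak cell in a (H, W) feature map."""
    fh, fw = feature_map.shape
    r, c = np.unravel_index(np.argmax(feature_map), feature_map.shape)
    sy, sx = image_hw[0] / fh, image_hw[1] / fw      # feature -> pixel scale
    return (int(r * sy), int(c * sx), int((r + 1) * sy), int((c + 1) * sx))
```

The returned box is what gets cropped from the original image for the subjective evaluation.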
Various pre-trained models are trained to fit and classify the 360° image dataset
SUN360 into different classes. The dataset contains 80 categories and a total of
67,583 panoramic images. Due to computational constraints, only a total of 700

Fig. 1 Flowchart of approach 1

images belonging to 10 categories are used. Each of the 10 classes contains 50
images; among these, 40 images are used for training and 10 images for the
validation and test sets, respectively. The classic DNN architectures VGG16 and
ResNet50 are trained on the dataset. The best hyperparameters for the model are set
by trial and error. The best result is obtained for the model with ResNet50 as the
base architecture. The ResNet50 architecture has skip connections, which help in
transferring the raw data from the previous input to the next layer as well. This helps
later layers avoid missing information.
The feature maps of the trained images are then obtained to find the most activated
region. The corresponding region in the original image is then cropped to perform
the subjective evaluation. Figure 2 shows the feature map and the corresponding
region in a trained image.
On subjective evaluation it is noted that the model considers a combination of
subjects in a class to distinguish it from other classes. For example, in Fig. 2 the
expected viewpoints are the balcony or the bed, but on considering a one-unit area
around the most activated region, many subjects come into the frame, including part
of the bed, a door, a mirror, the ceiling, and a chair. Thus, it is concluded that the
accuracy of this approach is low (Fig. 3).

Fig. 2 Viewpoint as obtained from the feature map of the image



Fig. 3 a 360° image and b annotated viewpoints

Fig. 4 Flowchart of approach 2

3.2 Viewpoint Prediction Using Object Detection Model

In this approach, the potential of an object detection model to predict the viewpoint
in a 360° image is explored. The flowchart for the same is illustrated in Fig. 4. The
difficult task in using an object detection model is preprocessing the data, which
requires annotating the viewpoints in each 360° image; annotating the viewpoints
is manual work. The SUN360 dataset is used for this experiment [1]. The viewpoint
is marked in an XML file corresponding to an image. In case an image contains
more than one viewpoint, each is annotated by mentioning the diagonal coordinates
of those viewpoints in the XML file. 100 images are used for training and 30 images
for testing in each class. Figure 3a and b shows a sample frame and the annotated
viewpoints respectively. Faster R-CNN is used to train and fit the dataset.
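The annotation files can be read with the standard library. The Pascal-VOC-style schema and the example coordinates below are assumptions for illustration, not the exact format used in this work:

```python
import xml.etree.ElementTree as ET

EXAMPLE_XML = """<annotation>
  <filename>bedroom_001.jpg</filename>
  <object>
    <name>viewpoint</name>
    <bndbox><xmin>210</xmin><ymin>80</ymin><xmax>470</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""

def read_viewpoints(xml_text):
    """Return the diagonal (xmin, ymin, xmax, ymax) coordinates of every
    viewpoint annotated in the XML file."""
    root = ET.fromstring(xml_text)
    return [tuple(int(obj.find("bndbox/" + t).text)
                  for t in ("xmin", "ymin", "xmax", "ymax"))
            for obj in root.iter("object")]
```

Images with several viewpoints simply carry several `<object>` entries, so the parser returns a list of boxes.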

4 Results and Discussion

Two different approaches are proposed in this study to predict the viewpoint in a
360° image.

It is observed from Table 1 that the classification result obtained with the pretrained
ResNet50 model is better than that of the SVM method proposed in [1]. However,
the first approach gave poor performance on subjective evaluation.
The second approach has shown better results. The metric mean average precision
(mAP) is used to evaluate the model in the latter approach. Average precision (AP)
is very similar to the F1-score in that it provides a harmonic-mean-like value between
precision and recall; the higher the score, the better the model is at predicting the
viewpoint. The mAP and AP become the same only when a single class of viewpoint
is detected. Keeping the learning rate at 0.0001 and varying the epochs, Tables 2
and 3 show the AP scores obtained for two different classes trained separately on
the model proposed in this paper. The score depends on the type of viewpoint in
each class as well. For example, for the bedroom class the viewpoint is less complex
compared to the viewpoint in a church: in a bedroom the viewpoint is mostly the
bed, whereas in a church the viewpoint is the altar. The altar has more complex
features than a bed in terms of pattern, arrangement, and detail, and it varies a lot
from one image to another within the class. So even with more epochs the AP score
obtained is only 61%, compared to the bedroom class, which has an AP score of 86%.
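AP/mAP scoring rests on matching predicted boxes to annotated viewpoints by intersection-over-union: a prediction counts as correct when its IoU with a ground-truth box exceeds a threshold, commonly 0.5 (the threshold is an assumption, as the chapter does not state it). A minimal IoU sketch:

```python
def iou(a, b):
    """Intersection-over-union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```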
Based on the results of the object detection model, the viewpoint in 360° images is
predicted. Figure 5 shows the original image, the manual annotations, and the
prediction by the model. The results obtained from the trained model can be used
for viewpoint detection. The model has shown an average recall value of 75% and
an average precision value of 65%. The recall is higher for the model, which means
the model detects most of the viewpoints correctly. In the sample prediction shown
in Fig. 5c, it is clear that the model predicted the positive sample correctly. Along
with that, the reflection of the bed in the mirror and objects like the table were
predicted as false positives.
A comparative evaluation of the Faster R-CNN-based model proposed in this study
and various others is shown in the tables below. Various authors have used
Table 1 Comparison of classification models

Model       Learning rate   Epochs   Number of batches   Accuracy (%)
VGG16       0.0001          50       32                  75
ResNet50    0.0001          70       32                  86
Xiao [1]    –               –        –                   51

Table 2 Evaluation result of viewpoint prediction model for class bedroom

Epochs   AP (%)
50       82
100      84
150      85
200      85

Table 3 Evaluation result of viewpoint prediction model for class church

Epochs   AP (%)
50       60
100      61
150      61
200      61

Fig. 5 a Original 360° image, b manually annotated image, c viewpoint predicted by model

different metrics to evaluate their results. Xiao et al. [1] evaluated viewpoint
prediction using the precision metric; the model proposed in this study obtained a
better result in comparison. The model based on the YOLO architecture used by
Yang et al. [8] obtained a mAP score of 30.29% for 360° images. Thus, on evaluating
both metrics against the existing methodologies, the proposed model has obtained
better results (Tables 4 and 5).

Table 4 Comparative analysis of models (mAP)

Model      mAP
Proposed   70.5
Yang [8]   30.29

Table 5 Comparative analysis of models (Precision)

Model      Precision
Proposed   65
Xiao [1]   50.2

5 Conclusion and Future Scope

The proposed work studies 360° images and various deep learning architectures
through which 360° images can be classified into various classes and viewpoints
can be predicted. This study uses the transfer learning approach to classify the
images into 10 categories. Viewpoints extracted from the feature maps of images in
classification models give lower accuracy compared to object detection models like
Faster R-CNN, which showed good results compared to existing models. An 86%
AP is obtained in predicting the viewpoint for the bedroom class. This work can be
further extended to experiment with larger datasets and more classic and new
state-of-the-art architectures.

References

1. Xiao J, Ehinger KA, Oliva A, Torralba A (2012) Recognizing scene viewpoint using panoramic
place representation. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition.
IEEE, pp 2695–2702
2. Raja R, Roomi SM, Dharmalakshmi D, Rohini S (2013) Classification of indoor/outdoor
scene. In: 2013 IEEE International Conference on Computational Intelligence and Computing
Research. IEEE, pp 1–4
3. Li C, Xu M, Jiang L, Zhang S, Tao X (2019) Viewport proposal CNN for 360 video quality assess-
ment. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
IEEE, pp 10169–10178
4. Afzal S, Chen J, Ramakrishnan KK (2017) Characterization of 360-degree videos. In:
Proceedings of the Workshop on Virtual Reality and Augmented Reality Network, pp 1–6
5. Caglayan A, Imamoglu N, Can AB, Nakamura R (2020) When CNNs meet random RNNs:
towards multi-level analysis for RGB-D object and scene recognition. arXiv preprint arXiv:
2004.12349
6. Yang Q, Zou J, Tang K, Li C, Xiong H (2019) Single and sequential viewports prediction for
360-degree video streaming. In: 2019 IEEE International Symposium on Circuits and Systems
(ISCAS). IEEE, pp 1–5
7. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image
recognition. arXiv preprint arXiv:1409.1556
8. Yang W, Qian Y, Kämäräinen JK, Cricri F, Fan L (2018) Object detection in equirectangular
panorama. In: 2018 24th International Conference on Pattern Recognition (ICPR). IEEE, pp
2190–2195
ConvNet of Deep Learning in Plant
Disease Detection

J. Gajavalli and S. Jeyalaksshmi

Abstract Balancing yield with the population of the country is one of the most
important challenges faced by farmers in the agricultural field. A large number of
farmers all over the world are still struggling due to natural disasters, unexpected
rainfall, nutrient deficiency in the soil, etc. But, above all, the major problem is pest
infection. Many researchers have used various techniques to detect plant diseases.
Deep learning is widely used to solve image-oriented problems using convolutional
neural networks. The CNN (ConvNet) model of deep learning is an effective and
efficient technique for analysing an image. This work compares the different models
used to detect plant diseases using CNNs. Finally, this research paper outlines the
existing achievements, limitations, and suggestions for future plant disease detection
research using convolutional neural networks.

Keywords Image processing · Convolutional neural network · Deep learning · AlexNet · GoogLeNet

1 Introduction

Plant infections are a regular problem in farmers' lives, since most farmers are unaware of plant pathology, the study of plant diseases. Plant diseases are normally classified under two major categories, biotic and abiotic: plants infected by microorganisms come under the biotic category, whereas abiotic plant diseases are caused by natural factors like temperature, rainfall, humidity, etc. (Fig. 1) [1].
Population pressure is one of the most important factors governing Indian agriculture: directly or indirectly, seventy per cent of the population depends on farmers for their food. Agricultural production is affected by various problems such as

J. Gajavalli (B) · S. Jeyalaksshmi


Vels Institute of Science, Technology and Advanced Studies, Chennai, Tamil Nadu, India
e-mail: gajavallimaheshwaran@gmail.com
S. Jeyalaksshmi
e-mail: pravija.lakshmi@gmail.com

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 501
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_50

Fig. 1 Classification of plant diseases into biotic factors (virus, bacteria, fungus) and abiotic factors (temperature, rainfall, humidity, nutrient deficiency)

a lack of scientific methodology and insufficient knowledge of fertilizers, insecticides and pesticides (Fig. 2, Table 1) [2, 3].
From the farmer's point of view, diagnosing plant infection at an early stage is the most important process in the agricultural field [5]. Most plant infections are analysed using the leaves of the plant, and computer vision has been suggested for diagnosing plant disease in the early stage of the plant life cycle. Various researchers have implemented numerous methods and techniques using traditional methods of

Fig. 2 Production versus


consumption chart clearly
depicts the year wise
consumption with respect to
productions in tonnes that
highlights the existence of
population pressure [4]

Table 1 Crops affected by microorganisms: microorganism-caused plant diseases, the crops they affect and their symptoms

Diseases          Crops affected                        Symptoms
Black rot         Brassicas                             Leaf margins yellow to light brown
Bacterial canker  Tomato, Capsicum, Chilli              Leaf corners turn brown with a yellow border
Bacterial spot    Lettuce, Cucurbits, Tomato, Capsicum  Brownish-black circular areas on leaves
Mildew            Onion, Leeks, Garlic                  Powdery mildew, i.e. pale yellow leaf spots
Rust              Beans, String beans                   Orange or yellow spots on the leaves
Mosaic            Tomato, Carrot, Celery                Leaves curled, malformed, or reduced in size

image processing. In the next milestone of image processing, deep learning techniques have been widely adopted to identify the affected region of the plant; convolutional neural network models in particular have achieved the required accuracy in such detection tasks.

2 Image Processing

Image processing performs operations on digital images to solve image-related problems. It consists of a series of phases: image acquisition, image preprocessing, image segmentation, feature extraction, and detection and classification (Fig. 3) [6].

Fig. 3 Phases of image processing: image acquisition → image preprocessing → image segmentation → feature extraction → detection and classification

2.1 Image Acquisition

Image acquisition is the process of collecting image inputs. Data in digital form is collected using hardware such as cameras and scanners. This is the initial step of image processing.

2.2 Image Preprocessing

This stage is used to improve the image for further operations like removing unwanted
distortions and enhancing the image features for further analysis. Important tech-
niques were used for preprocessing such as sizing, noise reduction, brightness
corrections and geometric transformations.
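The noise-reduction step mentioned above can be sketched as a small median filter; this is an illustrative pure-Python toy (the kernel size and border handling are assumptions, since the surveyed works do not specify their filters):

```python
import statistics

def median_filter(img, k=3):
    # Apply a k x k median filter to a 2D grayscale image (list of lists).
    # Border pixels are copied unchanged for simplicity.
    h, w = len(img), len(img[0])
    r = k // 2
    out = [row[:] for row in img]
    for y in range(r, h - r):
        for x in range(r, w - r):
            window = [img[y + dy][x + dx]
                      for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            out[y][x] = statistics.median(window)
    return out

# A flat patch corrupted by one salt-noise pixel: the filter removes it.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
print(median_filter(noisy)[1][1])  # -> 10
```

Median filtering is a common choice for this step because, unlike a mean filter, it removes isolated noise pixels without blurring leaf-lesion edges.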

2.3 Image Segmentation

In the image segmentation stage, the input image is divided into multiple regions, grouping portions with similar features for further analysis.

2.4 Feature Extraction

Feature extraction means understanding the image by extracting the required parameters from it; each extracted piece of information describes the image in more detail. Typical feature parameters are the texture, shape and colour of an image, which play an important role in further processing.
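As an illustration of a colour feature, a coarse intensity histogram can serve as a simple feature vector; the bin count and normalization below are assumptions for illustration, not drawn from any surveyed paper:

```python
def colour_histogram(img, bins=4):
    # Coarse intensity histogram feature vector for a 2D grayscale image
    # with pixel values in 0..255, normalized to sum to 1.
    hist = [0] * bins
    step = 256 // bins
    for row in img:
        for p in row:
            hist[min(p // step, bins - 1)] += 1
    total = sum(hist)
    return [h / total for h in hist]

img = [[0, 64], [128, 255]]
print(colour_histogram(img))  # -> [0.25, 0.25, 0.25, 0.25]
```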

2.5 Detection and Classification

The classification and detection phase locates the actual region of interest in an image based on the extracted features. Various classification and detection methodologies have been implemented to identify the diseased portion of a leaf; familiar classification methods include artificial neural networks, probabilistic neural networks, K-nearest neighbours, SVM and back propagation.
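Of the classifiers listed, K-nearest neighbour is the simplest to sketch; the toy feature vectors and labels below are invented purely for illustration:

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    # Classify a feature vector by majority vote among the k nearest
    # labelled training vectors (Euclidean distance; math.dist needs 3.8+).
    nearest = sorted(train, key=lambda fv_lbl: math.dist(fv_lbl[0], query))
    votes = Counter(lbl for _, lbl in nearest[:k])
    return votes.most_common(1)[0][0]

# Toy features: (mean greenness, lesion ratio) -> class label
train = [((0.9, 0.05), "healthy"), ((0.85, 0.1), "healthy"),
         ((0.3, 0.6), "diseased"), ((0.25, 0.7), "diseased"),
         ((0.2, 0.65), "diseased")]
print(knn_classify(train, (0.28, 0.62)))  # -> diseased
```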

3 Deep Learning

Deep learning is a subfield of machine learning in which a model learns and predicts using a series of neurons, analogous to the functioning of the human brain. A deep learning algorithm creates its features and models on its own rather than by manual definition, and a large amount of labelled data is required to train a deep learning model to high accuracy [7]. The information is processed by many hierarchical layers in a non-linear manner, in which lower-level features and concepts help to define higher-level features and concepts. Well-known types of deep learning neural networks include convolutional neural networks (CNN), recurrent neural networks (RNN) and artificial neural networks (ANN) (Fig. 4) [8].

4 Convolutional Neural Network

The CNN model of deep learning is used to perform computer vision tasks in image and video recognition. More advanced than the plain ANN, this neural network model is used in

Fig. 4 This deep neural network diagram shows the process flow of finding the predicted output
[9]

Fig. 5 CNN diagram describes the architecture of convolutional neural network [11]

image recognition and classification. A CNN works on the pixels of images and trains the model to extract features for better classification [10]. The layers of a CNN consist of (i) an input layer, (ii) an output layer and (iii) hidden layers, which include (a) multiple convolutional layers, (b) pooling layers, (c) fully connected layers and (d) normalization layers (Fig. 5).
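The convolutional and pooling layers named above can be sketched in a few lines of pure Python; this toy (valid cross-correlation plus non-overlapping max pooling) is illustrative only, not any of the surveyed architectures:

```python
def conv2d(img, kernel):
    # Valid 2D convolution (strictly, cross-correlation, as in most
    # deep-learning frameworks) of a 2D image with a 2D kernel.
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(w)] for y in range(h)]

def max_pool(fmap, size=2):
    # Non-overlapping max pooling that downsamples each spatial dimension.
    return [[max(fmap[y + i][x + j] for i in range(size) for j in range(size))
             for x in range(0, len(fmap[0]) - size + 1, size)]
            for y in range(0, len(fmap) - size + 1, size)]

# A vertical-edge kernel applied to an image containing a step edge:
# the feature map responds strongly at the edge column, and pooling
# keeps that strongest response.
img = [[0, 0, 9, 9]] * 4
edge = conv2d(img, [[-1, 1], [-1, 1]])
print(max_pool(edge))  # -> [[18]]
```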
Existing research on plant disease detection focuses on the traditional image processing phases used to detect the diseased part of an input image. Currently, researchers are turning to convolutional neural network architectures of deep learning for detecting the infected region of a leaf. The familiar milestone convolutional neural network models are listed in Table 2 [12].
This paper is a comparative study of the CNN architectures used by various researchers to detect plant diseases. The collected information is compared across the following three phases: data collection, data augmentation, and data detection and classification (Fig. 6).

Table 2 Milestones of various CNN models
CNN models   Paper title   Author and year
LeNet-5 “Gradient-based learning applied to document Lecun et al. [13]
recognition”
AlexNet “ImageNet classification with deep convolutional neural Krizhevsky et al. [14]
networks”
VGG “Very deep convolutional networks for large-scale image Simonyan et al. [15]
recognition”
GoogLeNet “Going deeper with convolutions” Szegedy [16]
ResNet “Deep residual learning for image recognition” He [17]

Fig. 6 Process flow of the CNN architecture, consisting of three major phases: data acquisition, data preprocessing, and detection and classification. Data acquisition concentrates on collecting a data set, labelled or unlabelled according to the implementation. In the preprocessing phase, the collected data is cleaned and augmented for creating the model. The detection and classification phase is responsible for creating the required model, which is trained and evaluated against the desired output

4.1 Data Collection Phase

This phase describes dataset collection by the various researchers: capturing images directly from the field, from existing repositories

Table 3 Comparison of the data collection phase of CNN across various researchers
Authors and year          Plant common name                      Source of dataset                                 No. of classes  No. of images
Prabhakar et al. [18]     Tomato                                 Open-access Plant Village database                38 classes      50,000
Chhillar et al. [19]      Corn, Pepper, Potato, Tomato           RGB format from Kaggle                            4 classes       950
Toda et al. [20]          14 crop species                        http://github.com/spMohanty/PlantVillage-Dataset  N/A             54,306
Hari et al. [21]          Maize, Grape, Tomato, Potato           Open-source Plant Village dataset                 10 classes      14,810
Konstantinos et al. [22]  Apple, Watermelon, Banana, Strawberry, Collected from laboratory and                     58 classes      87,848
                          Blueberry, Tomato, Cabbage, Squash,    cultivation fields
                          Cantaloupe, Orange, Cassava, Eggplant,
                          Celery, Potato, Cherry, Peach, Corn,
                          Pumpkin, Cucumber, Raspberry, Gourd,
                          Soybean, Grape, Onion, Pepper
Darwish et al. [23]       Maize                                  Kaggle                                            4 classes       15,408
Gupta et al. [24]         Tomato, Apple, Corn, Potato, Grapes    Plant Village                                     24 classes      31,048
Gutierrez et al. [25]     Tomato                                 Plant Village                                     7 classes       4331
Wang et al. [26]          Apple                                  Plant Village                                     4 classes       2086

and curated datasets. Image quality is most important for further processing, so quality measures were adhered to by the researchers while collecting the images. Table 3 lists the source dataset for each plant.

4.2 Data Augmentation Phase

The data augmentation phase increases the size of the dataset by adding slightly modified copies of existing images. Popular augmentation techniques are cropping, scaling, flipping, rotation, translation, resizing and Gaussian noise. Image preprocessing is normally recommended to remove unwanted portions and enhance the image for further processing [27]. The CNN input layer accepts images of the following sizes: 227 × 227 for AlexNet and 224 × 224 for other architectures such as DenseNet, ResNet and VGG (Table 4) [18].
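The flip and rotation augmentations listed above can be sketched as pure-Python transforms on a 2D pixel array (an illustrative toy; the surveyed works use library implementations):

```python
def hflip(img):
    # Horizontal flip of a 2D image.
    return [row[::-1] for row in img]

def rotate90(img):
    # Rotate a 2D image 90 degrees clockwise.
    return [list(col) for col in zip(*img[::-1])]

img = [[1, 2],
       [3, 4]]
print(hflip(img))     # -> [[2, 1], [4, 3]]
print(rotate90(img))  # -> [[3, 1], [4, 2]]
```

Each transform yields a new labelled training sample at essentially no cost, which is why augmentation is so common when plant-disease datasets are small.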

Table 4 Comparison of the data augmentation phase of CNN: image sizing and augmentation techniques used in each research work
Authors and year          Plant common name                      Image sizing dimension                        Augmentation techniques
Prabhakar et al. [18]     Tomato                                 AlexNet 227 × 227; other networks 224 × 224   Rotation, translation, scaling
Chhillar et al. [19]      Corn, Pepper, Potato, Tomato           256 × 256                                     Flipping, rotation, zooming
Toda et al. [20]          14 crop species                        224 × 224                                     N/A
Konstantinos et al. [22]  Apple, Watermelon, Banana, Strawberry, 256 × 256                                     Size reduction, cropping
                          Blueberry, Tomato, Cabbage, Squash,
                          Cantaloupe, Orange, Cassava, Eggplant,
                          Celery, Potato, Cherry, Peach, Corn,
                          Pumpkin, Cucumber, Raspberry, Gourd,
                          Soybean, Grape, Onion, Pepper
Darwish et al. [23]       Maize                                  256 × 256                                     Rotation, shear, fill mode, width shift, height shift, horizontal flip, zoom
Gupta et al. [24]         Tomato, Apple, Corn, Potato, Grapes    N/A                                           Resize, segmentation, crop, flipping, rotation, zooming, noise removal, background removal
Gutierrez et al. [25]     Tomato                                 N/A                                           Flipping, rotation, crop
Wang et al. [26]          Apple                                  Shallow networks 256 × 256; VGG16, VGG19      Resize, flipping, rotation, zooming
                                                                 and ResNet50 224 × 224; InceptionV3 299 × 299

Table 5 The comparison of various CNN models used for identifying the plant diseases
Authors and year Plant name CNN architecture Accuracy rate
Prabhakar et al. [18] Tomato ResNet101 Training—97.6%
and testing—94.6%
Chhillar et al. [19] Corn, Pepper, Potato, CNN 96.54% accuracy
Tomato achieved
Hari et al. [21] Maize, Grape, Tomato, PDDNN—17 layers Implemented from
Potato Scratch and produces
accuracy of 86%
Konstantinos et al. [22] Apple, Watermelon, VGG 99.53%
Banana, Strawberry,
Blueberry, Tomato,
Cabbage, Squash,
Cantaloupe, Orange,
Cassava, Eggplant,
Celery, Potato, Cherry,
Peach, Corn, Pumpkin,
Cucumber, Raspberry,
Gourd, Soybean,
Grape, Onion, Pepper
Darwish et al. [23] Maize VGG16 97.9%
VGG19 97.7%
AE Model 98.2%
Gupta et al. [24] Tomato, Apple, Corn, VGG13 95.21%
Potato, Grapes
Gutierrez et al. [25] Tomato Faster RCNN 82.51%
Wang et al. [26] Apple VGG16 90.40%
Lu et al. [29] Rice AlexNet 95.48%
Mohanty et al. [30]     14 crop species         AlexNet and GoogLeNet     99.34%

4.3 Data Detection and Classification Phase

This phase covers the detection and classification of plant diseases, examined here through the different CNN architectures for image classification. The most familiar CNN architectures are [28]: (i) LeNet-5, (ii) AlexNet, (iii) VGG-16, (iv) Inception-v1, (v) Inception-v3, (vi) ResNet-50, (vii) Xception, (viii) Inception-v4, (ix) Inception-ResNet and (x) ResNeXt-50 (Table 5) (Fig. 7).

5 Conclusion

Automation in the agricultural field is essential for coping with population pressure. Automatic plant disease detection is a required form of automation in the agricultural

Fig. 7 Accuracy rates of the various CNN models (in per cent: 97.6, 96.54, 86, 99.53, 98.2, 95.21, 82.51, 90.4, 95.48, 99.34) visualized as a bar chart

field, promoting production and helping to decrease farmers' losses. Automatic plant disease detection has been achieved with traditional image processing methods and algorithms. Currently, many researchers focus on deep learning CNN architectures for analysing an image and detecting the required region; in plant disease detection especially, various CNN models are employed to identify the diseased region. This comparative study highlights the implementation methods of many researchers on plant disease detection using deep convolutional neural networks. The collected details are presented across the three phases of a CNN pipeline: (i) data collection, (ii) data augmentation and (iii) data detection and classification. This research should motivate the search for the best CNN architecture for plant disease detection in the future.

References

1. Vishnoi VK, Kumar K, Kumar B (2021) Plant disease detection using computational intel-
ligence and image processing. J Plant Diseases Protection 128(1):19–53. https://doi.org/10.
1007/s41348-020-00368-0
2. Goyal SK, Rai JP, Singh SR (2016) Indian agriculture and farmers problems and reforms
3. Jeyalaksshmi S, Rama V, Suseendran G (2019) Data mining in soil & plant nutrient manage-
ment, recent advances and future challenges in organic crops. Int J Recent Technol Eng 8(2)
S11:pp 213–216. https://www.ijrte.org/wpcontent/uploads/papers/v8i2S11/B10350982S1119
4. https://www.world-grain.com/articles/13645-focus-on-india
5. Rama V, Jeyalaksshmi S (2019) Data mining based integrated nutrient and soil management
system for agriculture—a survey. CIKITUSI J Multidisciplinary Res 6(5):144–147, ISSN NO:
0975-6876. http://www.cikitusi.com
6. Devaraj A, Rathan K, Sarvepalli J, Indira K (2019) Identification of plant disease using image processing technique. In: 2019 International conference on communication and signal processing (ICCSP), Chennai, India, pp 0749–0753. https://doi.org/10.1109/ICCSP.2019.8698056

7. https://www.smlease.com/entries/technology/machine-learning-vs-deep-learning-what-is-
the-difference-between-ml-and-dl
8. https://roboticsbiz.com/different-types-of-deep-learning-models-explained
9. https://commons.wikimedia.org/wiki/File:MultiLayerNeuralNetworkBigger_english.png
10. Amara J, Bouaziz B, Algergawy A (2017) A deep learning-based approach for banana leaf diseases classification. In: Mitschang B, Nicklas D, Leymann F, Schöning H, Herschel M, Teubner J, Härder T, Kopp O, Wieland M (eds) Datenbanksysteme für Business, Technologie und Web (BTW 2017)—Workshopband. Gesellschaft für Informatik, Bonn, pp 79–88
11. https://docs.ecognition.com/eCognition_documentation/User%20Guide%20Developer/8%
20Classification%20-%20Deep%20Learning.htm
12. https://machinelearningmastery.com/review-of-architectural-innovations-for-convolutional-
neural-networks-for-image-classification/
13. Lecun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document
recognition. Proc IEEE 86(11):2278–2324. https://doi.org/10.1109/5.726791
14. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems 25 (NIPS 2012)
15. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
16. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich
A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer
vision and pattern recognition, pp 1–9
17. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In:
Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
18. Prabhakar M, Purushothaman R, Awasthi DP (2020) Deep learning based assessment of disease
severity for early blight in tomato crop. Multimed Tools Appl 79:28773–28784. https://doi.org/
10.1007/s11042-020-09461-w
19. Chhillar A, Thakur S (2021) Plant disease detection using image classification. In: Tiwari S,
Suryani E, Ng AK, Mishra KK, Singh N (eds) Proceedings of international conference on big
data, machine learning and their applications. Lecture notes in networks and systems, vol 150.
Springer, Singapore. https://doi.org/10.1007/978-981-15-8377-3_23
20. Toda Y, Okura F (2019) How convolutional neural networks diagnose plant disease. Plant
Phenomics, p 14, ArticleID 9237136. https://doi.org/10.34133/2019/9237136
21. Hari SS, Sivakumar M, Renuga P, Karthikeyan S, Suriya (2019) Detection of plant disease by
leaf image using convolutional neural network. In: 2019 International conference on vision
towards emerging trends in communication and networking (ViTECoN), pp 1–5. https://doi.
org/10.1109/ViTECoN.2019.8899748
22. Ferentinos KP (2018) Deep learning models for plant disease detection and diagnosis. Comput Electron Agricult 145:311–318, ISSN: 0168-1699. https://doi.org/10.1016/j.compag.2018.01.009. https://www.sciencedirect.com/science/article/pii/S0168169917311742
23. Darwish A, Ezzat D, Hassanien AE (2020) An optimized model based on convolutional neural
networks and orthogonal learning particle swarm optimization algorithm for plant diseases
diagnosis. Swarm Evol Comput 52:100616, ISSN: 22106502. https://doi.org/10.1016/j.swevo.
2019.100616. https://www.sciencedirect.com/science/article/pii/S2210650219305462
24. Gupta S, Garg G, Mishra P, Joshi RC (2021) CDMD: an efficient crop disease detection and
pesticide recommendation system using mobile vision and deep learning. In: Tiwari S, Suryani
E, Ng AK, Mishra KK, Singh N (eds) Proceedings of international conference on big data,
machine learning and their applications. Lecture notes in networks and systems, vol 150.
Springer, Singapore. https://doi.org/10.1007/978-981-15-8377-3_25
25. Gutierrez A, Ansuategi A, Susperregi L, Tubío C, Rankić I, Lenža L (2019) A benchmarking
of learning strategies for pest detection and identification on tomato plants for autonomous
scouting robots using internal databases. J Sens. 2019:15. ArticleID 5219471. https://doi.org/
10.1155/2019/5219471

26. Wang G, Sun Y, Wang J (2017) Automatic image-based plant disease severity estimation using
deep learning. Comput Intell Neurosci 2017:8. https://doi.org/10.1155/2017/2917536
27. Srivastava P, Mishra K, Awasthi V, Sahu V, Pawan Kumar P (2021) Plant disease detection
using convolutional neural network. Int J Adv Res 09:691–698. https://doi.org/10.21474/IJA
R01/12346
28. https://towardsdatascience.com/illustrated-10-cnn-architectures-95d78ace614d
29. Lu Y, Yi S, Zeng N, Liu Y, Zhang Y (2017) Identification of rice diseases using deep
convolutional neural networks. Neurocomputing 267:378–384, ISSN: 09252312. https://doi.
org/10.1016/j.neucom.2017.06.023. https://www.sciencedirect.com/science/article/pii/S09
25231217311384
30. Mohanty SP, Hughes DP, Salathé M (2016) Using deep learning for image-based plant disease detection. Front Plant Sci 7:1419. https://doi.org/10.3389/fpls.2016.01419. https://www.frontiersin.org/article/10.3389/fpls.2016.01419. ISSN: 1664-462X
Recognition of Iris Segmentation Using
CNN and Neural Networks

S. Jeyalaksshmi and P. J. Sai Vignesh

Abstract Iris segmentation is important as the first stage of an iris identification system, and the iris is regarded as one of the most accurate biometrics. We gathered a large dataset of iris pictures, purposefully sampling a wider range of quality than that utilized by existing commercial iris identification algorithms. Recent deep learning research suggests that neural networks have much potential in the field of biometric security. This study discusses iris region segmentation for iris recognition as a biometric personal identification and verification method. To improve accuracy, we employed CNNs and neural networks in our technique.

Keywords Biometric · Iris segmentation · Convolutional neural networks (CNNs) · Neural networks (NNs)

1 Introduction

Nowadays, biometric recognition has become a solid method for identifying and
recognizing individuals based on physiological or behavioral features. Traditional
means of identity verification, passwords and identity cards, for example, are not
always trustworthy since they might be forgotten or stolen. Biometric identification
has been utilized in security systems such as authentication and information protec-
tion. Biometric technologies use behavioral (such as handwriting) or physiological
(such as fingerprint, face, and iris) features to correctly validate human identification.
In comparison to other biometric techniques, iris recognition has the best accuracy
in identifying individuals among various biometric technologies [1]. If a system can
automatically identify a human person based on differences in biological features
among humans, it will be revolutionary; it is referred to as biometric recognition [2].

S. Jeyalaksshmi (B)
Vels Institute of Science, Technology and Advanced Studies, Chennai, Tamil Nadu, India
e-mail: jlakshmi.scs@velsuniv.ac.in
P. J. Sai Vignesh
Rajalakshmi Engineering College, Thandalam, Chennai, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 515
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_51

Iris texture patterns are thought to be unique to each individual, and even to the two eyes of the same person. It is also stated that the iris patterns of a given person rarely change after youth. In investigations to date, very high recognition/verification rates for iris recognition systems have been recorded [3]. As a result, the iris is regarded as the most accurate and reliable means of identifying people, and it has attracted a lot of attention in the recent decade [4]. In this paper, we use CNNs and neural networks for better iris recognition.

2 Related Works

Shashidhara and Aswath [5] demonstrated an iris area segmentation method for iris
recognition. The human iris is a one-of-a-kind feature that differs from person to
person. Human irises are unique, much like fingerprints, according to biological
research. In addition, any vision-capturing device may simply access the iris. The
iris’s two-dimensional structure makes the technology much more useful.
Hu et al. [6] created a unique approach for improving colour iris segmentation accuracy and reliability for both static and mobile device captures. Their approach is a fusion technique that selects segmentation results from a number of different methods: they propose and investigate a three-model iris segmentation framework, demonstrating that selecting among the three models' outputs can yield improvements.
Sreeja and Jeyalakshmi [7] presented the issues and disputes with existing iris
biometric systems.
Hofbauer et al. [8] showed that CNN-based pupil classification systems outperform traditional iris segmentation techniques in terms of segmentation error metrics. They created a method for parameterizing CNN-based segmentation that bridges the gap between the rubber sheet transform and the CNN.
Abiyev and Altunkaya [9] presented an NN-based biometric security technology for physical identity verification. Personal identification starts with locating the iris region and creating an iris image dataset, followed by iris pattern recognition. The iris is extracted from an eye picture and, after normalization and augmentation, represented as a dataset; using this data collection, an NN is used to classify iris patterns.
Arsalan et al. [10] observed that existing iris identification systems rely heavily on particular conditions; picture capturing distance and the stop-and-stare environment, for example, both demand significant user engagement. For the non-cooperative scenario, they introduced a two-stage CNN-based approach for identifying the true iris boundary in noisy iris images [11, 12].

Fig. 1 Image taken from CASIA database

3 Proposed System

3.1 Image Data

One of the most difficult aspects of iris recognition is capturing a high-quality picture of the iris. First, pictures of the iris with adequate resolution and sharpness to allow identification are desirable. Second, without a certain amount of light, the inner iris pattern lacks sufficient contrast. To circumvent this, we used iris pictures from the CASIA database (Fig. 1).

3.2 Grayscale Conversion

If the recorded image is in colour (RGB), it is converted to grayscale before being stored for further processing. One picture must be used to detect the inner circle and another to detect the outer circle; the picture is captured just once to locate the pupil area. The image is converted to black and white when a certain threshold value is reached.
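The conversion and thresholding described above can be sketched as follows; the luminance weights are the common ITU-R BT.601 values, an assumption since the paper does not state its exact formula:

```python
def to_grayscale(rgb_img):
    # Convert an RGB image (2D list of (r, g, b) tuples) to grayscale
    # using the standard BT.601 luminance weights.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_img]

def binarize(gray_img, threshold):
    # Threshold a grayscale image to white (255) / black (0).
    return [[255 if p >= threshold else 0 for p in row] for row in gray_img]

rgb = [[(255, 255, 255), (0, 0, 0)]]
gray = to_grayscale(rgb)
print(gray)                 # -> [[255, 0]]
print(binarize(gray, 128))  # -> [[255, 0]]
```

After thresholding, the dark pupil stands out as a solid black region, which is what the scanning steps below rely on.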

4 Determination of Center and Radius

A vertical scan is performed from one side of the image to the other. If the scan starts on the left side of the picture, for example, we obtain a tangent to the circle on the left; repeating the operation from the other side produces the other tangent. The distance between these two tangent points gives the diameter, from which the radius and center are derived.
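The tangent-scan idea can be sketched on a binary image: locate the extreme columns of the dark pupil blob and derive the centre and radius from their distance (a simplification assuming the pupil is the only dark region):

```python
def pupil_extent(binary_img, dark=0):
    # Find the left and right tangent columns of the dark pupil blob,
    # then derive the centre x-coordinate and the radius.
    cols = [x for row in binary_img for x, p in enumerate(row) if p == dark]
    left, right = min(cols), max(cols)
    radius = (right - left) / 2
    centre_x = left + radius
    return centre_x, radius

# 1 = background, 0 = pupil pixels spanning columns 2..4
img = [[1, 1, 1, 1, 1, 1],
       [1, 1, 0, 0, 0, 1],
       [1, 1, 0, 0, 0, 1],
       [1, 1, 1, 1, 1, 1]]
print(pupil_extent(img))  # -> (3.0, 1.0)
```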

5 Determining Outer Boundary of Iris

To obtain the iris's outer boundary, step (ii) is repeated with a higher threshold value (0.38).

6 Calculating the Outer Boundary of the Circle

Since the two circles are concentric, they share the same center. A horizontal scan is performed from this point to obtain a tangent, from which the radius of the iris's outer edge is calculated. With the radii of the iris's inner and outer rings and the common center in hand, the final step is to draw the circles using the equations below.

u = r cos θ + c₁ (1)

v = r sin θ + c₂ (2)

where r is the radius, θ the angle and (c₁, c₂) the center coordinates.
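Equations (1) and (2) parameterize the circle boundary by radius, angle and centre; a minimal sketch that traces boundary points:

```python
import math

def circle_points(radius, centre, n=8):
    # Trace the circle boundary of Eqs. (1) and (2):
    # u = r*cos(theta) + cx, v = r*sin(theta) + cy, sampled at n angles.
    cx, cy = centre
    return [(radius * math.cos(t) + cx, radius * math.sin(t) + cy)
            for t in (2 * math.pi * k / n for k in range(n))]

pts = circle_points(10, (50, 50), n=4)
print([(round(u), round(v)) for u, v in pts])
# -> [(60, 50), (50, 60), (40, 50), (50, 40)]
```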

7 Segmentation

7.1 CNN

CNNs, a kind of multi-layer neural network, are frequently used in pattern recognition applications. They improve resilience to minor changes in the input and require little preprocessing, since they do not rely on hand-selected feature extractors tuned to specific iris properties. In this work, we use a VGG-Face model that has been pre-trained and then fine-tuned on our training images.
σ(x) = 1 / (1 + e^(−x)) (3)

As indicated in Eq. (3), the sigmoid activation function maps the input into the range (0, 1): for large negative inputs the output approaches 0, and for large positive inputs it approaches 1.
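Equation (3) can be implemented directly; the values below illustrate the saturating behaviour described above:

```python
import math

def sigmoid(x):
    # Sigmoid activation of Eq. (3): squashes any real input into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))              # -> 0.5
print(round(sigmoid(10), 4))   # -> 1.0
print(round(sigmoid(-10), 4))  # -> 0.0
```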

7.2 Neural Network

The iris patterns in this article are identified using a neural network (NN). The
normalized and improved iris picture is represented as a two-dimensional array in

Fig. 2 Extracted pupil

Fig. 3 Neural network architecture

Table 1 Accuracy of CNN and neural network in iris segmentation
Data   CNN accuracy (%)   Neural network accuracy (%)
D1     91.3               88.6
D2     90.6               86.4
D3     89.7               83.1

this method. The array stores the grayscale values of the iris pattern's texture, and these values are the input signals to the neural network. Figure 3 depicts the NN architecture, which employs two hidden layers: X1, X2, …, Xm are the grayscale input array values that characterize the iris texture information, whereas P1, P2, …, Pn are the output patterns that characterize the irises.
Table 1 and Fig. 4 show the accuracy of the CNN and the neural network in iris segmentation. We can conclude that the CNN achieves better accuracy than the plain neural network in iris recognition.

8 Conclusion

Once the inner and outer circles have been obtained, the iris region is determined and examined for pattern recognition; CNN and neural network algorithms are used for iris segmentation. It is worth mentioning that the eyelids and lashes fall within the outer circle, so this approach is effective for pattern matching within a specified area of the obtained region. As a result,

Fig. 4 Accuracy of CNN and neural network in iris segmentation (accuracy in percentage, 78–92% range, for datasets D1–D3)

an efficient biometric verification may be performed. In this paper, we conclude that the CNN achieves better accuracy than the neural network in iris recognition.

References

1. Ohmaid H, Eddarouich S, Bourouhou A, Timouyas M (2020) Iris segmentation using a new unsupervised neural approach. IAES Int J Artif Intell 9(1):58
2. Tobji R, Di W, Ayoub N (2019) FMnet: iris segmentation and recognition by using fully and
multi-scale CNN for biometric security. Appl Sci 9(10):2042
3. Liu X, Bowyer KW, Flynn PJ (2005) Experiments with an improved iris segmentation
algorithm. In: Fourth IEEE workshop on automatic identification advanced technologies
(AutoID’05). IEEE, pp 118–123
4. Huang J, Wang Y, Tan T, Cui J (2004) A new iris segmentation method for recognition. In:
Proceedings of the 17th international conference on pattern recognition ICPR vol 3. IEEE, pp
554–557
5. Shashidhara HR, Aswath AR (2014) A novel approach to circular edge detection for iris image
segmentation. In: 2014 Fifth international conference on signal and image processing. IEEE,
pp 316–320
6. Hu Y, Sirlantzis K, Howells G (2015) Improving colour iris segmentation using a model
selection technique. Pattern Recogn Lett 57:24–32
7. Sreeja VS, Jeyalaksshmi S (2017) An overview of iris recognition. J Adv Res Dyn Control
Syst 9(6):76–81. ISSN: 1943-023X
8. Hofbauer H, Jalilian E, Uhl A (2019) Exploiting superior CNN-based iris segmentation for
better recognition accuracy. Pattern Recogn Lett 120:17–23
9. Abiyev RH, Altunkaya K (2009) Neural network based biometric personal identification with
fast iris segmentation. Int J Control Autom Syst 7(1):17–23
10. Arsalan M, Hong HG, Naqvi RA, Lee MB, Kim MC, Kim DS, Park KR (2017) Deep learning-
based iris segmentation for iris recognition in visible light environment. Symmetry 9(11):263
Recognition of Iris Segmentation … 521

11. Jeyalaksshmi S, Padmapriya D, Midhunchakkravarthy D, Ameen A (2020) Detection of hard
exudate from diabetic retinopathy image using fuzzy logic. In: Intelligent computing and
innovation on data science. Springer, Singapore, pp 543–550
12. Juneja S, Anand R (2018) Contrast enhancement of an image by DWT-SVD and DCT-SVD.
In: Data engineering and intelligent computing. Springer, Singapore, pp 595–603
Popularity of Optimization Techniques
in Sentiment Analysis

Priyanka and Kirti Walia

Abstract In today's scenario, online marketing and social networking rely on senti-
ment analysis for opinion mining to understand their customers and users. Sentiment
analysis involves extracting information from the text and symbols shared by individ-
uals over websites reflecting their opinions, and it describes the various emotions of
customers regarding a product. Sentiment analysis is applied to monitor social
media and recognize the mood of customers towards a brand or any other product.
It has been observed that a variety of techniques have been used to optimize the features
extracted during sentiment analysis. In the present paper, the authors present a
detailed literature survey to outline the popularity of optimization techniques used
in the field of sentiment analysis. The literature review, conducted over authenti-
cated research published in the last decade, illustrates that most researchers
have implemented Ant Colony Optimization (ACO) and Particle Swarm Optimization
(PSO) as optimization techniques. In addition, hybrid optimization has also
been emerging in recent years. The work outcomes are supported by graphical
illustrations to show the rising popularity of optimization techniques in the field of
sentiment analysis.

Keywords Sentiment analysis · Optimization techniques · Machine learning ·


ACO · PSO

1 Introduction

Sentiment analysis (SA) is used to predict positive or negative sentiment in text. It is mostly
preferred by businesses to predict sentiment in social data, monitor brand reputation,
and check product reviews from customer feedback. Natural language processing (NLP)
is used to determine the sentiment as negative, positive, or neutral. It is beneficial for businesses

Priyanka (B) · K. Walia


University Institute of Computing, Chandigarh University Gharuan, Punjab, India
e-mail: priyankatuli1986@gmail.com
K. Walia
e-mail: kirti.e8889@cumail.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 523
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_52

Table 1 Sentiment analysis


Sentiments Description
Positive Happy and satisfied customer
Neutral Wants more in future
Negative Dissatisfied with the service

because it predicts the actual needs of the customers through customer feedback.
Many studies based on sentiment analysis have been provided by several researchers.
In the year 2012, Liu [3] presented a survey giving a detailed study of sentiment
analysis, its definitions, problems, and development. Nowadays, several applications
based on sentiment analysis are available, such as for the political sector, public
opinion, medical analysis, business analysis, etc. [2]. One of the recent research
works, presented by Saha et al., is based upon sentiment analysis of COVID-19 with
the help of Twitter data sets [1]. Several challenges and applications of sentiment
analysis were discussed in detail by Makinist et al. [4] and Tang et al. [5].
Sentiment analysis is an intelligent technique because it helps to capture and
predict the various opinions, attitudes, feelings, and emotions of customers from
different sources such as speech, databases, and text. The main target of sentiment
analysis is to find out the various emotions and attitudes of customers through
product feedback [6]. Table 1 defines the emotions that are used for sentiment
analysis. In sentiment analysis, three classes of emotion are mostly preferred:
positive, neutral, and negative [7].
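The three polarity classes in Table 1 can be illustrated with a minimal lexicon-based scorer; the word lists and the simple counting rule below are illustrative assumptions, not taken from the paper:

```python
# Minimal lexicon-based sentiment scorer illustrating the three
# polarity classes of Table 1. The word lists are illustrative only.
POSITIVE = {"happy", "satisfied", "great", "love", "excellent"}
NEGATIVE = {"dissatisfied", "bad", "poor", "hate", "terrible"}

def classify(text: str) -> str:
    """Count positive and negative lexicon hits and return the polarity."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("I am happy and satisfied with the service"))  # positive
```

Real systems replace such hand-made lexicons with learned models, as discussed in the following sections.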
Social media such as blogs, wikis, review sites, social networking sites, and tweets help
customers to share their experiences, knowledge, and thoughts. In the last few years,
people have preferred social networking due to its several advantages, so social
media plays a major role in sentiment analysis [8]. Microblogging is mostly preferred
for sentiment analysis; it is a network-based service that enables the exchange of
information according to customers' feedback with the help of messages, videos,
images, etc. [9]. Several machine learning techniques are used in sentiment analysis;
reference [10] provides a deep study of the ML techniques that have been implemented
for sentiment analysis.
Contributions of the study: Overall, the paper provides a review showing the popu-
larity of optimization techniques in the field of sentiment analysis, based on published
and authenticated work. The major contributions of the review study are:
• The key stages of sentiment analysis and classification work are discussed.
• The various optimization techniques that have been frequently utilized for
sentiment analysis, concerning the feature extraction stage, are highlighted.
• A year-wise assessment is provided to show the changing trend in the
popularity of optimization techniques for sentiment analysis.
• A literature review is summarized to show the goal of existing works behind the
integration of optimization techniques at various stages.

2 Sentiment Analysis

Sentiment analysis is very important because it is used to analyse information from
social media [11]. Sentiment analysis is not an easy task, as it proceeds through
several steps. The generalized methodology is shown in Fig. 1.

2.1 Data Collection from Different Sources

In sentiment analysis, the first step performed is data collection. Social media
helps in this step because data are collected from blogs, tweets, forums, reviews,
etc., and NLP is used for the mining and classification of the data [12].
However, the majority of research works focus on Twitter data sets for analysing
the positive, neutral, and negative sentiments of customers or consumers to support
various review or survey studies.

Fig. 1 Steps for sentiment analysis

2.2 Preparation of Text

After the collection of data, the text is prepared and non-textual or irrelevant
content is eliminated. This step, also known as pre-processing, is respon-
sible for cleaning the data before initiating any type of analysis. At this stage, all
unwanted or irrelevant words are removed from the collected text [13]; these
usually include repeated words and phrases, stop words, punctuation, etc.
Three techniques are widely used in pre-processing the text for sentiment
analysis: normalization, punctuation removal, and stop word removal.
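The three pre-processing techniques named above can be sketched in a few lines; the stop-word list here is a small illustrative sample (real pipelines use larger lists, e.g. from NLTK):

```python
import re
import string

# Illustrative stop-word list; real systems use far larger lists.
STOP_WORDS = {"the", "is", "a", "an", "and", "to", "of", "in", "it", "this"}

def preprocess(text: str) -> list[str]:
    """Normalize case, strip punctuation, and remove stop words."""
    text = text.lower()                                               # normalization
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation removal
    tokens = re.split(r"\s+", text.strip())
    return [t for t in tokens if t and t not in STOP_WORDS]           # stop-word removal

print(preprocess("This product is GREAT, and the delivery was fast!"))
# ['product', 'great', 'delivery', 'was', 'fast']
```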

2.3 Feature Extraction

The previous step results in the preparation of the most relevant textual data for
sentiment analysis. At this step, the features representing the refined data are
extracted [14, 15] and then passed to the next step. Techniques such as bag of
words, N-gram, and TF-IDF have been popularly used by various researchers to
extract features from textual data for sentiment analysis.
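As a sketch of the TF-IDF weighting mentioned above, the following uses the basic tf × log(N/df) formulation over a tiny tokenized corpus; a production pipeline would typically rely on a library implementation instead:

```python
import math
from collections import Counter

def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute TF-IDF weights for a small tokenized corpus."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

docs = [["good", "phone"], ["bad", "phone"], ["good", "battery"]]
weights = tf_idf(docs)
# "phone" appears in 2 of 3 docs, so it carries less weight than "bad".
```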

2.4 Feature Optimization

The output of the feature extraction step is further refined by applying various opti-
mization techniques. This step increases the accuracy of sentiment analysis and
classification work. Popular nature-inspired optimization techniques include Artifi-
cial Bee Colony, Particle Swarm Optimization, Ant Colony Optimization, Firefly
Optimization, etc. [16–18].
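As an illustration of how such a technique refines the extracted features, the following is a minimal binary Particle Swarm Optimization sketch for feature-subset selection; the toy fitness function and the hyper-parameters are illustrative assumptions, whereas a real system would score subsets by classifier accuracy:

```python
import math
import random

def binary_pso(n_features, fitness, n_particles=10, iters=30, seed=0):
    """Minimal binary PSO for feature-subset selection (a sketch).

    `fitness` maps a 0/1 mask over features to a score to be maximized,
    e.g. classifier accuracy minus a penalty on subset size. The inertia
    and acceleration constants are common illustrative defaults.
    """
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                # Sigmoid maps the velocity to the probability of selecting bit d.
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-vel[i][d])) else 0
            val = fitness(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy fitness: features 0 and 2 are "useful"; a small penalty discourages
# large subsets (a stand-in for a real classifier-accuracy fitness).
fit = lambda mask: mask[0] + mask[2] - 0.1 * sum(mask)
best, score = binary_pso(5, fit)
```

The same wrapper structure applies to the ACO and ABC variants discussed later: only the rule that generates candidate feature masks changes.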

2.5 Sentiment Classification and Detection

In sentiment detection, the extracted sentences related to customer opinions and
reviews are examined deeply. Only sentences with subjective expressions are
considered, and objective communication is removed [19]. After sentiment
detection, the next step is sentiment classification, which classifies the
subjective sentences as positive, negative, or neutral. Sentences can also be
classified according to likes and dislikes, or good and bad, based on different
points of view [20, 21]. This is performed using machine learning algorithms such
as Naïve Bayes, SVM, and neural networks.
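A compact multinomial Naïve Bayes with Laplace smoothing, one of the classifiers named above, can be sketched as follows (a generic textbook formulation, not the exact setup of any surveyed work):

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative sketch)."""

    def fit(self, docs, labels):
        self.classes = set(labels)
        # Log class priors from label frequencies.
        self.priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        # Per-class word counts.
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict(self, doc):
        def log_prob(c):
            # Laplace (+1) smoothing over the shared vocabulary.
            total = sum(self.counts[c].values()) + len(self.vocab)
            return self.priors[c] + sum(
                math.log((self.counts[c][w] + 1) / total) for w in doc)
        return max(self.classes, key=log_prob)

train_docs = [["good", "great"], ["bad", "poor"], ["great", "service"], ["poor", "service"]]
train_labels = ["pos", "neg", "pos", "neg"]
nb = NaiveBayes().fit(train_docs, train_labels)
print(nb.predict(["good", "service"]))  # pos
```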

2.6 Final Output

The last step of the sentiment analysis is the output that is represented in the form of
pictorial representation through bar charts, pie charts, and line graphs [22].
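The reporting step can be as simple as aggregating the predicted labels into percentages, which would then feed a bar or pie chart; the sample results below are invented for illustration:

```python
from collections import Counter

# Summarize classified sentiments as percentages for reporting;
# these proportions would typically feed a bar or pie chart.
results = ["positive", "positive", "negative", "neutral", "positive"]
counts = Counter(results)
total = len(results)
for sentiment, count in counts.most_common():
    print(f"{sentiment}: {count / total:.0%}")
# positive: 60%
# negative: 20%
# neutral: 20%
```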

3 Sentiment Analysis Using Optimization Techniques

Sentiment analysis is used to collect the textual form of the opinions that are based on
customer feedback, and it defines the various forms of customer emotions. In the past few
years, several authors have proposed studies on sentiment analysis [23]. One study
used Twitter data to examine sentiments on a particular subject: the tweets were
categorized into two opinion classes, negative or positive, and a hybrid of the machine
learning algorithm SVM and ACO was applied for classification. The simulation results
show that the average classification accuracy improved from 75.54% (using SVM) to
86.74% (using SVM-ACO) [24]. Another work introduced a hybrid algorithm integrating
ACO and the KNN algorithm, applied for feature selection, with simulated results on
customer review data sets. The proposed work was compared with baseline algorithms
such as information gain (IG), the genetic algorithm (GA), and rough set attribute
reduction (RSAR); the overall evaluation shows that the proposed method improves on
these baselines [25]. A further system based on sentiment analysis was used for the
election of the West Java Governor; in this work, PSO and information gain help to
select suitable attributes from the documents, SVM is used as the classifier, and the
accuracy achieved is 94.8% with an AUC of 0.98 [26]. Other authors proposed a hybrid
approach using a swarm intelligence-based optimization algorithm, in which pre-
processing is performed through steps such as tokenization, stemming, and removal of
emoticons and stop words. ACO and PSO are utilized because they select the best
features and reduce the number of paths; the optimization is performed before text
categorization, and the Naïve Bayes (NB) and Support Vector Machine (SVM) techniques
of machine learning are implemented for the classification of tweets [27]. One study
used PSO for feature selection to check the performance of various classification
algorithms on two data sets, a sentiment analysis data set and an SMS spam detection
data set; the overall analysis shows that PSO reduces the space complexity and
provides better classifier accuracy [28]. Another study set out to overcome the
difficulty of feature selection in sentiment analysis using ACO, which is often
considered the best feature selection approach, with a KNN classifier implemented
to generate the optimum candidate feature subset. The experimental results show the
relationship between the features and the sentiment, determined through accuracy
based on precision, recall, and F-score; overall, the proposed ACO-KNN algorithm
obtained a better feature subset and also improved the accuracy of sentiment
classification. Shekhawat et al. [29] likewise presented a study on sentiment
analysis and optimization in which ACO is considered a better approach than the
others, while some authors proposed PSO or ABC as the best optimization techniques
for sentiment analysis. The following literature survey is based on the different
optimization techniques being used for sentiment analysis. In the current era,
Twitter is an important blogging platform used to collect different customers'
opinions in the form of "tweets". A detailed analysis of the published work on
"sentiment analysis" and "optimization techniques" is presented in the next section.

4 Result and Discussion

The overall inferences drawn from the aforementioned survey analysis are discussed
in this section of the paper. Critically, the study focuses on sentiment analysis
performed over the last decade. With the rising popularity of meta-heuristics and
swarm intelligence, it can be concluded that a lot of work has been done on sentiment
analysis using different optimization techniques to address the feature selection stage.
Table 2 describes the different optimization techniques that were implemented by
the researchers in combination with machine learning while focusing on the purpose
of implementation of the optimization techniques. It has been observed that Ahmad
et al. (2015) integrated GA at the feature selection stage and later, in 2017, combined
KNN with ACO for the same purpose. Table 2 also shows that several researchers
have integrated swarm-based optimization techniques at the feature selection stage
because of their objective fitness functions, which can resolve the optimization and
selection issues. Further, it has also been observed that integrating optimization
approaches enhances the classification accuracy of existing machine learning
techniques.
The published resources summarized in Table 2 give a constructive outline of the
existing research and lay down the foundation for future research. It has been observed
that in recent years the optimization techniques ACO, PSO, ABC, and hybrid
techniques have been popularly implemented by the research community. Moreover,
a rising trend towards integrating optimization approaches to enhance the
classification performance of machine learning-based sentiment classification is
also evident. Graphical presentations of these observations are used to illustrate
the interpretations of the review study.
Further, Fig. 2 presents a pie chart of sentiment analysis work broken down by
optimization technique. It shows the popularity of ACO, PSO, ABC, and hybrid
techniques, as illustrated by the published papers cited within the present
survey. It is observed that the popularity of ACO

Table 2 Different optimization techniques for sentiment analysis


Author's detail | Implemented techniques | Optimization techniques | Data sets | Purpose of optimization technique

• S. R. Ahmad, A. A. Bakar, and M. R. Yaakub (2015) [21] | NA | GA and rough set theory | Document data sets | Feature selection
• S. R. Ahmad, N. M. M. Yusop, A. A. Bakar, and M. R. Yaakub (2017) [22] | KNN | ACO | Customer review data sets consisting of five electronic product data sets from the Amazon website | Feature selection
• D. K. Gupta, K. S. Reddy, and A. Ekbal (2015) [23] | Learning framework of conditional random field (CRF) | PSO | NA | Feature selection
• S. R. Ahmad, A. A. Bakar, and M. R. Yaakub (2019) [19] | K-nearest neighbour (KNN) | ACO | Customer review data sets consisting of five electronic product data sets from the Amazon website | Feature selection
• E. M. Badr, M. A. Salam, M. Ali, and H. Ahmed (2019) [17] | Naïve Bayes (NB) and support vector machine (SVM) | ACO and PSO | Twitter data sets | Feature selection
• S. Kumar, M. Yadava, and P. P. Roy (2019) [24] | Random forest (RF) | ABC | EEG data sets | Boost the overall performance based upon local and global ratings
• Nagarajan and Gandhi (2019) [25] | Decision tree (DT) | PSO | Twitter data sets | Feature selection
• Orkphol and Yang (2019) [26] | K-means, term frequency–inverse document frequency (TF–IDF), and singular value decomposition (SVD) | ABC | Twitter data sets | Find the best initial state of the centroids
• A. Jain, B. Pal Nandi, C. Gupta, and D. K. Tayal (2020) [27] | Senti-NSetPSO | PSO | Blitzer, aclIMDb, Polarity, and subjective data sets | Categorize the documents
• K. Machová, M. Mikula, X. Gao, and M. Mach (2020) [28] | Naïve Bayes | Particle swarm optimization | Movie and general data sets | Sentiment labelling
• S. S. Shekhawat, S. Shringi, and H. Sharma (2020) [29] | SVM and NB | Spider monkey optimization | Sender2 and Twitter | Identify optimal cluster-heads of the data set
• A. Jain, B. Pal Nandi, C. Gupta, and D. K. Tayal (2020) [27] | NA | PSO with Neutrosophic Set | Blitzer, aclIMDb, Polarity, and subjective data sets | Classify large-sized text data sets
• Naresh and Venkata (2021) [30] | SVM | Sequential minimal optimization (SMO) | Twitter data set | Multistage optimized classification
• Datta and Chakrabarti (2021) [31] | Recurrent neural network (RNN) | Firefly algorithm (FF) and multi-verse optimization (MVO) | Demonetization tweets | Optimization of the weights of the polarity scores
• A. Hosseinalipour, F. S. Gharehchopogh, M. Masdari, and A. Khademi (2021) [32] | Fuzzy C-means data clustering, decision tree (DT), and Naïve Bayes (NB) | Social spider optimization (SSO) | ISEAR data set, sentiment polarity data sets, and Stanford sentiment treebank data sets | Feature selection
• Vasudevan and Kaliyamurthie (2021) [33] | Support vector machine (SVM) | PSO | Amazon data sets | Feature selection

has increased to 32%, in comparison with PSO (26%) and ABC (21%), in the field
of sentiment analysis. This means that ACO is an emerging optimization tech-
nique that has proved its strength over the earlier ABC and PSO techniques. Moreover,
the hybridization of various optimization techniques has also attracted significant
attention as a means to improve sentiment analysis and classification work. Usually, these

Fig. 2 Popular optimization techniques for sentiment analysis

hybrid techniques combine two or more optimization techniques to overcome their
limitations.
The yearly assessment of the published research against the keywords "sentiment
analysis" and "optimization techniques" since 2015 is shown in Fig. 3. Published
articles from various resources, namely Google Scholar, Elsevier, Springer, Science-
Direct, and other publishers, were used to prepare the assessment graph. It is
observed that the number of publications for the analysed keywords has been rising
constantly since 2015, showing the attraction of the scientific community towards the
implementation of optimization techniques for sentiment analysis. For instance,
in 2015 only 4% of the publications focused on the integration of optimization
approaches, a share which increased to 27% by the end of 2020. In between, various
fluctuations in the number of publications were observed from 2019 onwards. The
count for 2021 is still lower, as the year was incomplete at the time of the survey.
Overall, the review of the articles published since 2015 shows a tremendous rise
in the popularity of optimization techniques to address the optimal selection that
adjoins sentiment analysis and classification work. Further, ACO has emerged as the
most popular optimization technique implemented by various researchers in the past
few years.

5 Conclusion

Sentiments reflect the thoughts of an individual, and sentiment analysis belongs
to text-based analysis that reflects customers' emotions and opinions. It has been
observed that for sentiment analysis, two optimization techniques, ACO and PSO, have
been popularly implemented and form the first choice of most researchers for the feature

[Bar chart "Assessment based on published resources": share of published articles per year from 2015 to 2021, rising from 4% in 2015 to 27% in 2020]

Fig. 3 Yearly assessment of sentiment analysis and optimization approaches

selection stage. Further, it is observed that ABC, PSO, and ACO hold the major
proportion among the various optimization techniques applied to sentiment analysis.
However, the highest popularity is observed for ACO as an optimization approach
for feature selection in the field of sentiment analysis. The overall results indicate
that ACO has emerged as the leading approach for sentiment analysis since 2015,
with Twitter data sets used in most of the cases.

References

1. Saha G, Roy S, Maji P (2021) Sentiment analysis of twitter data related to COVID-19. In: Impact
of AI and data science in response to coronavirus pandemic, Singapore, Springer, Singapore,
pp 169–191
2. Pang B, Lee L (2008) Opinion mining and sentiment analysis. Foundations and trends® in
information retrieval 2(1–2):1–135
3. Liu B (2012) Sentiment analysis and opinion mining. Synth Lect Hum Lang Technol 5(1):1–167
4. Makinist S, Hallaç IR, Karakuş BA, Aydın G (2017) Preparation of improved Turkish DataSet
for sentiment analysis in social media. ITM Web Conf 13:01030
5. Tang H, Tan S, Cheng X (2009) A survey on sentiment detection of reviews. Expert Syst Appl
36(7):10760–10773
6. Trupthi M, Pabboju S, Narasimha G (2016) Improved feature extraction and classification—
sentiment analysis, pp 1–6
7. Mouthami K, Devi KN, Bhaskaran VM (2013) Sentiment analysis and classification based
on textual reviews. In: 2013 International conference on information communication and
embedded systems (ICICES), Chennai
8. Alessia D, Ferri F, Grifoni P, Guzzo T (2015) Approaches, tools and applications for sentiment
analysis implementation. Int J Comput Appl 125(3)

9. Kaur J, Sehra SS, Sehra SK (2016) Sentiment analysis of twitter data using hybrid method of
support vector machine and ant colony optimization. Int J Comput Sci Inf Secur 14(7):222
10. Ahmad SR, Yusop NMM, Bakar AA, Yaakub MR (2017) Statistical analysis for vali-
dating ACO-KNN algorithm as feature selection in sentiment analysis. In: 2nd international
conference on applied science and technology 2017 (ICAST’17), Kedah, Malaysia
11. Kurniawati I, Pardede HF (2018) Hybrid method of information gain and particle swarm opti-
mization for selection of features of SVM-based sentiment analysis. In: 2018 international
conference on information technology systems and innovation (ICITSI), Bandung-Padang,
Indonesia
12. Bakshi G, Shukla R, Yadav V, Dahiya A, Anand R, Sindhwani N, Singh H (2021) An optimized
approach for feature extraction in multi-relational statistical learning. J Sci Ind Res (JSIR)
80(6):537–542
13. Gupta A, Anand R, Pandey D, Sindhwani N, Wairya S, Pandey BK, Sharma M (2021) Prediction
of breast cancer using extremely randomized clustering forests (ERCF) technique: prediction
of breast cancer. Int J Distrib Sys Technol (IJDST) 12(4):1–15
14. Anand R, Chawla P (2016) A review on the optimization techniques for bio-inspired antenna
design. In: 2016 3rd international conference on computing for sustainable global development
(INDIACom), IEEE. pp 2228–2233
15. Srivastava A, Gupta A, Anand R (2021) Optimized smart system for transportation using RFID
technology. Math Eng Sci Aerosp 12(4):953–965
16. Anand R, Chawla P (2020) A hexagonal fractal microstrip antenna with its optimization for
wireless communications. Int J Adv Sci Technol 29(3s):1787–1791
17. Badr EM, Salam MA, Ali M, Ahmed H (2019) Social media sentiment analysis using machine
learning and optimization techniques. Int J Comput Appl 975:8887
18. Bajeh AO, Funso BO, Usman-Hamza FE (2019) Performance analysis of particle swarm
optimization for feature selection. FUOYE J Eng Technol 4(1)
19. Ahmad SR, Bakar AA, Yaakub MR (2019) Ant colony optimization for text feature selection
in sentiment analysis. Intell Data Anal 23(1):133–158
20. Nayar N, Gautam S, Singh P, Mehta G (2021) Ant colony optimization: a review of literature
and application in feature selection. In: Inventive computation and information technologies.
Springer, Singapore, pp 285–297
21. Ahmad SR, Bakar AA, Yaakub MR (2015) Metaheuristic algorithms for feature selection
in sentiment analysis. In: 2015 Science and information conference (SAI), London, United
Kingdom
22. Ahmad SR, Bakar AA, Yaakub MR, Yusop NMM (2017) Statistical validation of ACO-KNN
algorithm for sentiment analysis. J Telecommun Electron Comput Eng JTEC 9(2–11):165–170
23. Gupta DK, Reddy KS, Ekbal A (2015) Pso-asent: Feature selection using particle swarm
optimization for aspect based sentiment analysis. pp 220–233
24. Kumar S, Yadava M, Roy PP (2019) Fusion of EEG response and sentiment analysis of products
review to predict customer satisfaction. Inf Fusion 52:41–52
25. Nagarajan SM, Gandhi UD (2019) Classifying streaming of Twitter data based on sentiment
analysis using hybridization. Neural Comput Appl 31(5):1425–1433
26. Orkphol K, Yang W (2019) Sentiment analysis on microblogging with K-means clustering and
artificial bee colony. Int J Comput Intell Appl 18(03):1950017
27. Jain A, Pal Nandi B, Gupta C, Tayal DK (2020) Senti-NSetPSO: large-sized document-level
sentiment analysis using Neutrosophic Set and particle swarm optimization. Soft Comput
24(1):3–15
28. Machová K, Mikula M, Gao X, Mach M (2020) Lexicon-based sentiment analysis using the
particle swarm optimization. Electronics (Basel) 9(8):1317
29. Shekhawat SS, Shringi S, Sharma H (2021) Twitter sentiment analysis using hybrid Spider
Monkey optimization method. Evol Intell 14(3):1307–1316
30. Naresh A, Krishna PV (2021) An efficient approach for sentiment analysis using machine
learning algorithm. Evol Intell 14(2):725–731

31. Datta S, Chakrabarti S (2021) Aspect based sentiment analysis for demonetization tweets by
optimized recurrent neural network using fire fly-oriented multi-verse optimizer. Sādhanā 46(2)
32. Hosseinalipour A, Gharehchopogh FS, Masdari M, Khademi A (2021) Toward text psychology
analysis using social spider optimization algorithm. Concurr Comput 33(17)
33. Vasudevan P, Kaliyamurthie KP (2021) Product sentiment analysis using particle swarm opti-
mization based feature selection in a large-scale cloud. In: Proceedings of the 1st international
conference on computing, communication and control system, I3CAC 2021
Predominant Role of Artificial
Intelligence in Employee Retention

Ravinder Kaur and Hardeep Kaur

Abstract The present study throws light on the role of artificial intelligence in human
resources. As technology is changing very rapidly, many industries have adopted such
systems to give more satisfaction to their employees. An employee plays a vital role in
the organization, and new techniques and technologies are used by organizations to
retain their employees; it is important for all organizations to offer more bene-
fits to the employee. The present study is based on factors identified in previous
literature research; with the help of this literature review, a structured questionnaire
was developed, and the validity and reliability of the questionnaire were validated
by Cronbach's alpha. AI technology will continue to grow, and at some point in the
future AI will be the norm, making old-fashioned recruiting and hiring processes seem
stone-age. There is a positive relationship between hiring and training for employees
with AI, and the study identifies the factors for which employees require AI. With the
help of SPSS, the study found that AI shows a positive relationship with the
human resource department as well as with employees. With the help of the random
sampling technique, the data were collected from employees of different companies
(N = 50).

Keywords Artificial intelligence · Role · Employee · Human resource · North
region

1 Introduction

Artificial intelligence is a tool that applies human knowledge in various areas and
improves its adoption; it is a creative development used across organizations to
improve utility and performance. In this article, we take a broad approach that
encompasses expert systems, simulation and modelling, robotics, natural language
processing (NLP), the use of innovatively derived algorithms, and so on.
Thus, we include assisted, augmented, and autonomous intelligence in the different

R. Kaur (B) · H. Kaur


University Business Schools, Chandigarh University, Mohali, India
e-mail: ravinder.e1795@cumail.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 535
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_53

ways that people are supported or supplanted by AI [1–3]. Numerous advancements
that were once seen as "cutting edge" (and part of AI) have become routinized
and are thus no longer perceived as part of the change cycle [4].
In addition, the processing carried out by robots is "more advanced than earlier
tools for the automation of business processes", precisely because they imitate
human behaviour by "taking in data from numerous IT systems and consuming it" [5].
This implies that they can collect, search, and record information from numerous
sources with an effortlessness that individuals could only envy in 2018 [6].
Another improvement will be to streamline recruitment. At Unilever, for example,
the recruitment options have been reduced by 75% [6].
AI can take over front-end recruiting activities on the grounds that the benefits
of speed and accuracy will outweigh the benefits of human involvement. Human
resources will likely stay in touch with certain parts of the recruiting process,
regardless of whether or not the manager has assumed the first leadership role.
This can mean facilitating the exchange about the details of a job offer, agreeing
on start dates, or answering questions about contracts.
An explicit advantage of using AI is the consistency with which it handles
repetitive assignments: "If you submit a calculation on the same subject twice,
you will get the same return. That just doesn't apply to individuals" [7].

[Diagram: the role of AI connecting the organization, human resources, and employees]

2 Literature Review

A study in the year 2021 found that AI is contributing to the success of the recruitment
process: identifying, selecting, and retaining talented people. Howard identified that
AI is playing a role in everybody's life, and that strategic foresight on AI workplace
applications will shift professional research and practice from a reactive attitude
to a proactive attitude [8]. Understanding the opportunities and challenges of AI for
the future of work will help mitigate the adverse effects of AI on the safety, health,
and well-being of workers.
A 2019 study emphasized the use of AI in human resource management in terms of team estimation, recruitment and selection, employability, turnover, HR performance measurement, corporate education and training, development (HRD), management by competencies, and quality of life at work [9]. Another study, working from organizational requirements, found that chatbots attract employees and help the organization keep its employees engaged; with the help of AI, the recruitment procedure also became easier for organizational functions such as production, performance management, sales, strategic planning, customer relationship management, banking, coaching, training, and taxes [10]. It was also found that, with the help of AI, time is saved, i.e. tasks can be completed in less time; AI also supports learning and development, which helps the organization focus on critical thinking [11].
Artificial intelligence has now entered the overall procedures, methods, and techniques of organizations, and one of the areas where AI has replaced humans is the human resource department, where functions such as candidate interviews, recruitment, orientation of human resource activities, and performance management, among others, are carried out with technology [12] (Fig. 1).

The reviewed studies are summarized below (serial number; research paper name, year, sector; findings):

1. "Emotional intelligence or artificial intelligence—an employee perspective", 2020, IT sector: emotional intelligence has a major impact on employee retention and performance, according to the findings [13].
2. "Impact of artificial intelligence on HR management—a review", 2020, IT sector: this study determines the positive and negative effects of adopting artificial intelligence, as well as how some firms are employing it in real-world scenarios [14].
3. "A study of artificial intelligence and its role in human resource management", 2019, IT sector: AI plays a growing role in the various operations carried out in human resource departments, where robotics businesses may manage recruitment, hiring, data analysis, and data collection, lowering workplace workload and optimizing workplace efficacy [15].
4. "Artificial intelligence chatbots are new recruiters", 2019, food processing sector: AI chatbots are very productive instruments in the recruitment process and will be useful in developing a recruitment strategy for the industry [16].
5. "Artificial intelligence: implications for the future of work", 2019, manufacturing sector: taking a proactive approach to AI workplace applications will transform occupational research and practice from a reactive to a proactive position [17].
6. "Impact of artificial intelligence in recruitment, selection, screening, and retention outcomes in the Irish market in view of the global market", 2019, manufacturing sector: the researcher attempted to establish a link between mass hiring drives and the success of using AI to identify top performers who will be interested in long-term organisational development [18].
7. "The influence of organizational culture on employee", 2019, manufacturing sector: the research found that some employees showed poor task performance and lacked discipline in carrying out tasks, such as coming to and leaving work without following applicable regulations; some employees carried out their tasks without following applicable guidelines (resulting in poor work quality); and there were delays in reporting by employees [11].
9. "Evolution of artificial intelligence research in human resources", 2019, IT sector: it is thought that the interdisciplinary practice of AI and HR has not yet resulted in a theoretical break or a new conceptual field [19].
10. "Artificial intelligence in human resources management: challenges and a path forward", 2019, industry sector: solutions for (1) the complexity of HR phenomena, (2) restrictions imposed by small data sets, (3) ethical concerns related to fairness and legal constraints, and (4) employee reaction to management by data-based algorithms [20].
11. "Artificial intelligence and the future of HR practices", 2018, IT sector: covers AI in HR, the benefits of AI, the obstacles to implementing AI, and the way ahead; AI and machine learning are two critical tech trends that must be adopted if inch-perfect decision-making and successful people management are to be achieved [21].
12. "Can artificial intelligence change the way in which companies recruit, train, develop, and manage human resources in workplace?", 2018, industry sector: according to the findings, AI has a positive impact [22].
13. "Artificial intelligence in human resource management", 2018, IT sector: artificial neural networks for turnover prediction, knowledge-based search engines for candidate search, genetic algorithms for staff rostering, text mining for HR sentiment analysis, information extraction for résumé data acquisition, and interactive voice response for employee self-service [9].
14. "Employee turnover prediction and retention policies design: a case study", 2017, industry sector: reduces employee turnover [23].
15. "Prediction of employee turnover in organizations using machine learning algorithms", 2016, IT sector: machine learning has a far greater accuracy rate when predicting employee turnover [12].
16. "Artificial intelligence for marketing", 2015, IT sector: assists the organization in the development of new strategies and innovations [24].
17. "Retention: a case of Google", 2011, IT sector: the retention policy should be revisited, as anonymized logs could be shared with third parties without prior user approval [25].

3 AI and Its Impact on Various Occupations

By determining the elements of employees' intention to stay, turnover activity can be anticipated more precisely and actions to forestall turnover can be taken in advance [26]. Low worker engagement results in the intention to leave; AI-based insight helps HR officials achieve transparency at several points of the hiring process, which is one of the great strengths of AI in HR [27]. Artificial intelligence supports HR teams in screening candidates and identifying the best among them. By using it, the organization saves time and can select the most suitable candidates for the job by assessing attributes such as qualities, abilities, and experience. Likewise, tailoring training to each newly joined employee's particular learning preferences is another major impact of AI.
Advanced artificial intelligence systems, on the other hand, employ proprietary
algorithms designed expressly to match specific job performance measurements to
possible applicants who best exhibit these characteristics. HireVue is a firm dedicated to the development of advanced AI human resource solutions. Currently, its flagship programme is a video interview with questions "particularly crafted to elicit reactions" in order to "find the correct behaviours that are predictive of work success" [28].

4 Research Methodology

The research methodology is descriptive in nature; with the help of the above literature review, one structured questionnaire was prepared. The information was gathered from the respective HR managers using AI in their organizations, by means of a self-administered survey. The factor analysis technique was used

Fig. 1 Flow chart of the design procedure that helped to develop the questionnaire to understand the role of AI in human resources. [Figure labels: vacation request; team training; employee learning; biases removed; develop leadership skills; analyse the potential of employees; all feeding into: retain the employee with the organization by using AI]

to identify the factors for which managers can most easily use AI; these factors were derived with the help of the above literature review (Fig. 2).

[Figure labels: research methodology; literature survey (publications); industry survey (industry contacts, by email, by visit); analysis]

Fig. 2 Flow chart of the design procedure that helped in data collection



[Figure labels: data are coded; data are categorized on the basis of objectives; statistical tools for data analysis]

Fig. 3 Statistical method used to describe variability among observed variables

5 Statistical Treatment of Data

In this study, the analysis was performed with the help of SPSS. Using the factor analysis technique, hiring, training, vacations, appraisal, and engagement emerged as the factors that best suit employees, making them feel motivated and well treated. This analysis sheds light on retention as well as on employee satisfaction (Fig. 3).

6 Factor Analysis

Using this technique, 50 questions were reduced to six factors, so that managers can easily understand which factors give employees satisfaction and enable them to be retained for the long term. Factor loadings take values between −1 and 1; values close to the extremes indicate that the factor is strongly influential.
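As an illustration of how such a reduction is commonly carried out, the sketch below applies the Kaiser criterion (retain components of the item correlation matrix with eigenvalue above 1) to synthetic data; the data, sizes, and seed are assumptions, not the study's 50-item survey:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic survey responses: 100 respondents x 8 items driven by 2 latent factors
latent = rng.normal(size=(100, 2))
loadings = rng.normal(size=(2, 8))
items = latent @ loadings + 0.3 * rng.normal(size=(100, 8))

corr = np.corrcoef(items, rowvar=False)      # 8 x 8 item correlation matrix
eigvals = np.linalg.eigvalsh(corr)[::-1]     # eigenvalues, largest first
n_factors = int((eigvals > 1.0).sum())       # Kaiser criterion: keep eigenvalues > 1
print(n_factors)
```

With real survey data, `items` would be the respondents-by-questions score table; the retained factors are then rotated and interpreted, which SPSS performs as part of its factor analysis procedure.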

Reliability Test

Cronbach's alpha   No. of items
0.88               24

Cronbach alpha tests were conducted on the 24 parameters evaluated for analysing the success of employee retention tactics in the selected organisation, in order to validate the questionnaire. The Cronbach alpha for the 24 items is 0.88, which is above the 0.8 threshold level for the social sciences. As a result, the values used in the evaluation of the research factors are internally consistent.
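For reference, Cronbach's alpha is the ratio-based statistic sketched below; the 4 x 3 response matrix is illustrative only, not the study's 24-item data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed scale
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Illustrative 5-point responses from 4 respondents on 3 items (not the study's data)
scores = np.array([[4, 5, 4],
                   [2, 3, 2],
                   [5, 5, 4],
                   [3, 4, 3]], dtype=float)
print(round(cronbach_alpha(scores), 3))  # → 0.975
```

Values above roughly 0.8 are conventionally taken to indicate acceptable internal consistency, which is the threshold invoked above.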

7 R Statistics

The correlation coefficient tells the strength and direction of the relationship between variables, with r ranging from −1 to +1.



Correlations

                               Hiring    Training
Hiring    Pearson correlation  1         0.572**
          Sig. (2-tailed)                <0.001
          N                    116       116
Training  Pearson correlation  0.572**   1
          Sig. (2-tailed)      <0.001
          N                    116       116

** Correlation is significant at the 0.01 level (2-tailed)
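The Pearson r values in this table can be reproduced with the textbook formula; the two score lists below are illustrative, not the study's N = 116 sample:

```python
import numpy as np

def pearson_r(x, y) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()
    return float((xm * ym).sum() / np.sqrt((xm ** 2).sum() * (ym ** 2).sum()))

# Illustrative hiring- and training-satisfaction scores (not the study's N = 116 data)
hiring = [3, 4, 2, 5, 4, 3]
training = [2, 4, 1, 5, 5, 3]
r = pearson_r(hiring, training)
print(round(r, 3))  # strongly positive, as in the table above
```
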

8 Result

Using factor analysis in SPSS, six factors were identified:
– Hiring with the help of AI
– Training with the help of AI
– Vacation requests
– Employee development
– Appraisals through AI
– Employee engagement
– It was shown that there is a strong and favourable association between AI-
assisted hiring and employee retention techniques (r = 0.634).
– It was revealed that AI-assisted training was linked to organisational manpower
involvement, as demonstrated by (r = 0.585).
– It was discovered that vacation request has a statistically significant link with
staff retention techniques (r = 0.680).
– It was discovered that staff development has a significant impact on employee
retention, as evidenced by (r = 0.445).
– AI-assisted appraisals have been found to boost the retention rate in organisations (r = 0.564).
– Through AI, it was shown that employee engagement strategies had a direct
association with appraisals (r = 0.551).

9 Discussion

All of these applications are novel, and as fascinating as they may appear, there are a few risks to be aware of. The key point is that AI cannot function without training data; algorithms, in other words, learn from their experiences. If your existing management techniques are prejudiced, discriminatory, punitive, or overly rigid, you may wind up automating all of the things you despise. To ensure that these techniques [algorithms, methods] are doing the right thing, we need visible and

adaptable AI. Just as early cars did not always go straight, our early algorithms will require nudges and adjustments in order to learn how to operate more precisely. Methods exist to detect bias. Consider this scenario: your company has never employed a female engineer and only a few African-American engineers. An AI recruitment model trained on that history may conclude that women and Black engineers are less likely to rise into management. This form of bias should be carefully removed from algorithms, and it will take time to do so successfully.
There’s also the risk of data breaches and misuse. Consider the widespread use,
universal, and integrated analysis in which we attempt to predict the likelihood of
the most productive employee leaving the company. In fact, informing management
that this person is more likely to leave the company may result in the manager firing or
disregarding the employee. Instead of being an independent decision-making process,
modern AI is a tool for recommendation and improvement. The need of establishing
interpretative and transparent AI systems was underlined by AI scientists at Entelo.
To put it another way, whenever a system makes a decision, it must explain why it
made that decision so that we, as humans, can assess whether the approach it employs
is still effective. Unfortunately, most AI algorithms today are completely opaque, and this opacity is one of the most significant weaknesses of the latest tools.

10 Conclusions and Suggestions

By determining the drivers of employees' intention to stay, turnover activity can be anticipated more systematically and measures to avoid turnover can be taken at an early stage. Low professional engagement leads to the intention to leave, and connecting HR processes with AI-based insight into candidates can have a deeper impact on improving overall performance. Although AI applications are unlikely to possess the emotional and intuitive capacities that humans have, these AI-powered HR applications can still verify, predict, and analyse, and it is a considerable benefit that such a connection is real. The fear sweeping the global workforce shows how AI is impacting work in diverse fields around the world. However, it is not that these advances are displacing people; rather, people must adapt to and embrace these developments in order to achieve prosperity and success. Some workers will be affected by the limits of AI, and the relationship between accountability and regulators remains unresolved in terms of expected outcomes. Furthermore, in our estimation, the majority of organizations will eventually have enough AI-based tools to choose from in the areas where AI touches HR: recruitment, organization, onboarding, performance evaluation, retention, and so on, and organizations will become comfortable enough with the mix to adopt it. In summary, the implementation of AI should be seen as an opportunity: AI improves life, and it improves the future when it is clearly understood and used in a responsible way.

References

1. Singh H, Rehman TB, Gangadhar C, Anand R, Sindhwani N, Babu M (2021) Accuracy detection of coronary artery disease using machine learning algorithms. Appl Nanosci 1–7
2. Sindhwani N, Anand R, Meivel S, Shukla R, Yadav MP, Yadav V (2021) Performance analysis
of deep neural networks using computer vision. Mach Learn 15:17
3. Bakshi G, Shukla R, Yadav V, Dahiya A, Anand R, Sindhwani N, Singh H (2021) An optimized
approach for feature extraction in multi-relational statistical learning. J Sci Ind Res (JSIR)
80(6):537–542
4. Keller S, Meaney M (2017) Attracting and retaining the right talent. McKinsey global institute
study
5. Davenport TH, Ronanki R (2018) Artificial intelligence for the real world. Harv Bus Rev
96(1):108–116
6. Heric M (2018) HR new digital mandate-digital technologies have become essential for HR to
engage top talent and add value to the business. Accessed 20 August 2018
7. Kahneman D, Brynjolfsson E (2018) Where humans meet machines: intuition expertise and
learning
8. Allal-Chérif O, Aránega AY, Sánchez RC (2021) Intelligent recruitment: how to identify, select,
and retain talents from around the world using artificial intelligence. Technol Forecast Soc
Chang 169:120822
9. Jatobá M, Santos J, Gutierriz I, Moscon D, Fernandes PO, Teixeira JP (2019) Evolution of
artificial intelligence research in human resources. Procedia Comput Sci 164:137–142
10. Razzaq S, Shujahat M, Hussain S, Nawaz F, Wang M, Ali M, Tehseen S (2019) Knowledge
management, organizational commitment and knowledge-worker performance: the neglected
role of knowledge management in the public sector. Bus Process Manage J
11. Yawalkar V (2019) A study of artificial intelligence and its role in human resource management.
IJRAR 6(1):20–24
12. Rathi DR (2018) Artificial intelligence and the future of HR practices. Int J Appl Res 4(6):113–
116
13. Nicastro M, Arbore M, Davis E, Feldman T (2021) Creating a campaign supporting residential
fire sprinkler uptake in Australia
14. Bersin J (2018) AI in HR: a real killer app
15. Dirican C (2015) The impacts of robotics, artificial intelligence on business and economics.
Procedia Soc Behav Sci 195:564–573
16. Lawler JJ, Elliot R (1996) Artificial intelligence in HRM: an experimental study of an expert
system. J Manag 22(1):85–111
17. Prentice C, Dominique Lopes S, Wang X (2020) Emotional intelligence or artificial intelligence–an employee perspective. J Hosp Market Manag 29(4):377–403. https://doi.org/10.1080/19368623.2019.1647124
18. Zahidi F, Imam Y, Hashmi AU, Baig MM (2020) Impact of artificial intelligence on HR
management–A
19. Nawaz N, Gomes AM (2019) Artificial intelligence chatbots are new recruiters. IJACSA 10(9)
20. Howard J (2019) Artificial intelligence: implications for the future of work. Am J Ind Med
62(11):917–926
21. Chanda A (2019) Impact of artificial intelligence in recruitment, selection, screening and reten-
tion outcomes in the Irish market in view of the global market. Ph.D. diss., Dublin, National
College of Ireland
22. Omondi DO (2014) The influence of organizational culture on employee job performance:
a case study of Pacis insurance company limited. Ph.D. diss., United States international
university-Africa
23. Tambe P, Cappelli P, Yakubovich V (2019) Artificial intelligence in human resources
management: challenges and a path forward. Calif Manage Rev 61(4):15–42
24. Iqbal FM (2018) Can artificial intelligence change the way in which companies recruit, train,
develop and manage human resources in workplace? Asian J Soc Sci Manag Stud 5(3):102–104

25. Kumar BSP, Nagrani K Artificial intelligence in human resource management. JournalNX
106–118
26. Ajit P (2016) Prediction of employee turnover in organizations using machine learning
algorithms. Algorithms 4(5): C5
27. Sterne J (2017) Artificial intelligence for marketing: practical applications. Wiley
28. Toubiana V, Nissenbaum H (2011) An analysis of google log retention policies
Semantic Segmentation of Brain MRI
Images Using Squirrel Search
Algorithm-Based Deep Convolution
Neural Network

B. Tapasvi, E. Gnana Manoharan, and N. Udaya Kumar

Abstract In recent years, brain tumor has become a severe threat to human lives.
These tumors are often poorly contrasted and irregularly dispersed. In recent years, brain tumors have been detected automatically using semantic segmentation.
However, the variability in the size of brain tumors and the low contrast of brain
imaging are the two major problems affecting the performance of semantic segmen-
tation. To address this problem, a squirrel search algorithm-based deep convolution
neural network (SSA-DCNN) proposed for semantic segmentation of the medical
images in this paper. The proposed method is a blend of deep convolution neural
network (DCNN) and squirrel search algorithm (SSA). The SSA is used to fine-
tune the performance of DCNN by optimizing the hyperparameters of the DCNN,
which in turn enhances the accuracy of the semantic segmentation. The proposed
method is implemented and validated by performance metrics such as accuracy, loss,
IoU, and BF score. The performance of SSA-DCNN is compared with the jellyfish
algorithm-based deep convolution neural network (JA-DCNN), oppositional-based
seagull optimization algorithm (OSOA-3DCNN), and particle swarm optimization
(PSO)-DCNN.

Keywords Squirrel search · Deep convolution network · Brain tumor · Semantic


segmentation

B. Tapasvi (B) · E. Gnana Manoharan


ECE Department, Annamalai University, Chidambaram, India
e-mail: tapasvi07@gmail.com
N. Udaya Kumar
ECE Department, SRKR Engineering College, ChinnaAmiram, Bhimavaram 534204, India
e-mail: nuk@srkrec.ac.in

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 547
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_54
548 B. Tapasvi et al.

1 Introduction

A brain tumor is the growth of abnormal cells in the brain, out of which few of
them may be malignant and can cause cancer. Glioma is a typical tumor that occurs in the brain and spinal cord. Based on the glioma cell involved in the tumor,
the gliomas can be classified into three types: Astrocytoma, Ependymomas, and
Oligodendrogliomas. Gliomas are also classified into low-grade gliomas (LGG) and high-grade gliomas (HGG), the latter having greater aggressiveness and penetration than the former [1]. A glioma usually develops aggressively and penetrates deeply on
the basis that it rapidly attacks the central nervous system (CNS). According to the U.S. National Cancer Institute, about 18,000 Americans are affected by glioma, and a significant number of them die within 14 months [2]. In clinical practice, clinical imaging, mainly computed tomography (CT) and magnetic resonance imaging (MRI), has been used to determine (1) the presence of a tumor, (2) peritoneal edema, and (3) its localization. However, the variety and complexity of cerebral tumors in MRI often make it difficult for radiologists and other physicians to identify and classify tumors [3, 4]. Therefore, the automatic segmentation of tumors in clinical images, relieving physicians from the burden of manually delineating tumors, can favourably affect clinical treatment [5, 6]. Many different types of semantic segmentation methods have been developed by researchers. Several deep neural network architectures
available in the literature are showing great efficiency in classification and object
recognition. However, the computational complexity of building these architectures
is very high because they have to be custom designed for each different application
and problem domain in a manual fashion. Therefore, there is a need for reducing the
computational complexity in designing the neural network architecture for a specific
application. To this extent, this paper proposes to use a squirrel search-based genetic
algorithm to automatically optimize the structure of deep neural network to be suit-
able for the semantic segmentation of brain tumor images to distinguish between
tumor and the rest of the brain.
The remaining part of the paper is organized as follows; Sect. 2 provides a detailed
description of the proposed methodology. Section 3 provides a detailed description
of the results of the semantic segmentation. The conclusion of the paper is presented
in Sect. 4.

2 SSA-Based Deep Convolution Neural Network

The concept diagram of the squirrel search algorithm-based deep convolution neural network (SSA-DCNN) is shown in Fig. 1. Initially, the brain tumor database for training the SSA-DCNN is collected from an open-source database of medical images. A database of two hundred images is taken from the source, and to improve
the classification accuracy, the database is augmented by shifting, scaling, and rota-
tion of the pixels in the image. This database is presented to the proposed SSA-DCNN
Semantic Segmentation of Brain MRI Images Using Squirrel… 549

Fig. 1 Concept diagram of SSA-DCNN

for classification of the tumor. In general, all the layers in the DCNN may not be
needed for classification of brain tumor. The decision of skipping some layers, while
retaining other layers of the DCNN to optimize the architecture would be made by
the squirrel search algorithm. After training the SSA-DCNN with the database, the
DCNN is tested for its performance by giving the test images to the network.
The heart of the proposed method lies in two sub-systems: the deep convolution
neural network and the squirrel search algorithm. Therefore, the details of these
two sub-systems are elaborated with respect to the classification of tumor in the
brain MRI images. To optimize the architecture of the DCNN, the squirrel search
algorithm must be presented with the input database, all the layers of the DCNN and
the classification labels of the required application. So, the first step in the proposed
system is to design a base DCNN architecture to be provided for the SSA algorithm
for optimization. The base DCNN for the brain tumor classification is presented in
Fig. 2.
The size of the images in the input layer is chosen to be 110 × 110 × 3 and is
connected to a convolution layer as shown in Fig. 2. The DCNN consists of two
convolution layers with a 3 × 3 filter size and 1 × 1 stride. Each convolution layer is
followed by a batch normalization layer to improve the classification accuracy. Two
max pooling layers are used to reduce the dimensions of the features extracted after
convolution. Two fully connected layers are used to flatten the output and the final
fully connected layer is connected to a softmax layer for classifying the brain image
with or without the tumor.
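The spatial dimensions implied by this description can be checked with the usual shape arithmetic; the sketch below assumes "valid" (no-padding) convolutions and 2 x 2, stride-2 pooling, which the text does not state explicitly:

```python
def conv_out(n: int, k: int = 3, stride: int = 1, pad: int = 0) -> int:
    """Spatial size after a k x k convolution."""
    return (n - k + 2 * pad) // stride + 1

def pool_out(n: int, k: int = 2, stride: int = 2) -> int:
    """Spatial size after max pooling."""
    return (n - k) // stride + 1

size = 110                 # 110 x 110 x 3 input images
size = conv_out(size)      # first 3 x 3 convolution, stride 1 -> 108
size = pool_out(size)      # first max pooling -> 54
size = conv_out(size)      # second 3 x 3 convolution, stride 1 -> 52
size = pool_out(size)      # second max pooling -> 26
print(size)                # spatial size entering the fully connected layers
```

Batch normalization layers do not change the spatial size, so they are omitted from the arithmetic.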

[Figure labels: input image; convolutional + ReLU layers; batch normalization layers; max pooling layers; fully connected layers; softmax layer]
Fig. 2 Deep convolution neural network architecture

Now, the details of the DCNN are presented to the squirrel search algorithm for
optimizing the architecture and thereby reducing the computational complexity.

Squirrel Search Algorithm


The main purpose of optimization is to determine the decision variables of a process so that a function attains a maximum or a minimum value. Optimization with respect to the DCNN refers to the problem of minimizing the number of layers of the DCNN to reduce the computational complexity. The squirrel search algorithm is a nature-inspired optimization algorithm that mimics the natural foraging behavior of the
flying squirrels. Generally, the food acquisition by the flying squirrels is affected by
two scenarios, viz.: summer and winter seasons. During summer, the flying squirrels
glide from one tree to another tree and collect two different types of food resources.
They gather acorn nuts in the forest to keep up their energy levels during summer
and collect hickory nuts (optimization sources) for the winter season. During winter,
the flying squirrels take rest and utilize the hickory nuts collected during summer.
Mathematically, this process can be divided into five phases:

1. Initialization of algorithm parameters


2. Initialization and sorting of flying squirrel’s locations
3. Generation of new locations through gliding
4. Verification of seasonal monitoring conditions
5. Stopping.
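As a hedged illustration of how these five phases fit together, the toy loop below optimizes a sphere function in place of the paper's DCNN fitness; the population size, bounds, gliding constants, and the simplified target selection (scenario 3 omitted) are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 20, 5                    # squirrels and decision variables (placeholder sizes)
dg, Gc, Pdp = 0.8, 1.9, 0.1     # gliding distance, gliding constant, predator probability

def fitness(x):
    """Toy sphere objective standing in for the DCNN fitness of Sect. 2."""
    return np.sum(x ** 2, axis=-1)

# Phases 1-2: initialize parameters and squirrel locations, then sort by fitness
S = rng.uniform(-5.12, 5.12, size=(N, D))
init_best = fitness(S).min()
for _ in range(100):            # phase 5: a fixed iteration budget as the stopping rule
    S = S[np.argsort(fitness(S))]
    hickory, acorns = S[0].copy(), S[1:4].copy()
    for i in range(1, N):       # phase 3: glide toward better food sources
        target = hickory if i < 4 else acorns[rng.integers(3)]
        if rng.random() >= Pdp:                      # phase 4: predator check
            S[i] = S[i] + dg * Gc * (target - S[i])
        else:                                        # predator present: relocate
            S[i] = rng.uniform(-5.12, 5.12, size=D)
best_fit = fitness(S).min()
print(best_fit <= init_best)    # the best squirrel is never discarded, so True
```

Because the best location (the hickory nut tree) is never overwritten within an iteration, the best fitness found is monotonically non-increasing over the run.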

In the initial population phase, a number of flying squirrels are initialized as DCNN hyperparameter vectors; each flying squirrel's location is represented as a vector. The locations of all flying squirrels can be represented as a matrix, given as follows:
S = \begin{bmatrix} S_{1,1} & S_{1,2} & \cdots & S_{1,d} \\ S_{2,1} & S_{2,2} & \cdots & S_{2,d} \\ \vdots & \vdots & & \vdots \\ S_{N,1} & S_{N,2} & \cdots & S_{N,d} \end{bmatrix} \quad (1)

where S_{i,j} represents the jth dimension of the ith flying squirrel. The initial location of each flying squirrel is allocated under the assumption of a uniform distribution over the forest. The fitness of each flying squirrel's location is computed by substituting the decision variable values into a user-defined fitness function. The fitness values are stored in the array below.
\text{FF} = \begin{bmatrix} F_1([S_{1,1}, S_{1,2}, \ldots, S_{1,d}]) \\ F_2([S_{2,1}, S_{2,2}, \ldots, S_{2,d}]) \\ \vdots \\ F_N([S_{N,1}, S_{N,2}, \ldots, S_{N,d}]) \end{bmatrix} \quad (2)

The fitness function is mathematically formulated as follows:

\text{FF} = \max\{\text{PSNR}\} \quad (3)

\text{PSNR} = 10 \log_{10}\left(\frac{\text{MAX}_P^2}{\text{MSE}}\right) \quad (4)

\text{MSE} = \frac{1}{N \cdot M} \sum_{A=1}^{N} \sum_{B=1}^{M} \left[I_{\text{image}}(A, B) - I_{d\text{-image}}(A, B)\right]^2 \quad (5)

where I_{d-image}(A, B) denotes the segmented image and I_{image}(A, B) the input image. Based on the fitness function, the DCNN configurations that enhance the semantic segmentation process are selected. Once the fitness values are computed, they are stored in an array and sorted in ascending order. The flying squirrel with the minimum fitness value is considered to be on the hickory nut tree. The next three best flying squirrels are considered to be on acorn nut trees and move toward the hickory nut tree. The remaining flying squirrels are considered to be on normal trees. The foraging behaviour of flying squirrels is affected by the presence of predators, which is modelled by updating locations with a predator presence probability function. New solutions are generated from the dynamic foraging behaviour of the flying squirrels, which can be understood through three scenarios: scenario 1, in which the flying squirrels move from the acorn nut trees to the hickory nut tree; scenario 2, in which

the flying squirrels move to the acorn nut trees; and scenario 3, in which the squirrels are on the normal trees. The mathematical description of the three scenarios is presented below.
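Before the scenario updates, it may help to see Eqs. (3)-(5) concretely; the sketch below computes the PSNR-based fitness for a pair of toy arrays, assuming 8-bit intensities (MAX_P = 255):

```python
import numpy as np

def psnr(image: np.ndarray, seg_image: np.ndarray, max_p: float = 255.0) -> float:
    """PSNR between an input image and a segmented image, per Eqs. (4)-(5)."""
    mse = np.mean((image.astype(float) - seg_image.astype(float)) ** 2)   # Eq. (5)
    if mse == 0.0:
        return float("inf")            # identical images: infinite PSNR
    return 10.0 * np.log10(max_p ** 2 / mse)                              # Eq. (4)

img = np.full((4, 4), 200.0)           # toy 4 x 4 "input image"
seg = img.copy()
seg[0, 0] = 180.0                      # one pixel differs in the "segmented image"
print(round(psnr(img, seg), 2))        # MSE = 400/16 = 25
```

In the proposed method, this PSNR value is the quantity the squirrel search algorithm seeks to maximize over candidate DCNN hyperparameters.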
Scenario 1: Flying squirrels on the acorn nut trees move toward the hickory nut tree. The new location is computed as follows:

S_{AT}^{T+1} = \begin{cases} S_{AT}^{T} + d_g \times G_c \times \left(S_{HT}^{T} - S_{AT}^{T}\right), & r_1 \ge P_{DP} \\ \text{random location}, & \text{otherwise} \end{cases} \quad (6)

where S_{HT}^{T} is the location of the flying squirrel that has reached the hickory nut tree, r_1 is a random number in [0, 1], d_g is the gliding distance, G_c is the gliding constant, and T is the current iteration.
Scenario 2: In this scenario, flying squirrels on the normal trees move toward the acorn nut trees to obtain their required food. The new location is computed as follows:

S_{NT}^{T+1} = \begin{cases} S_{NT}^{T} + d_g \times G_c \times \left(S_{AT}^{T} - S_{NT}^{T}\right), & r_2 \ge P_{DP} \\ \text{random location}, & \text{otherwise} \end{cases} \quad (7)

where r_2 is a random number in the range [0, 1].
Scenario 3: In this scenario, squirrels on normal trees that have already consumed acorn nuts may move toward the hickory nut tree to store hickory nuts for times of food scarcity. The new location is computed as follows:

S_{NT}^{T+1} = \begin{cases} S_{NT}^{T} + d_g \times G_c \times \left(S_{HT}^{T} - S_{NT}^{T}\right), & r_3 \ge P_{DP} \\ \text{random location}, & \text{otherwise} \end{cases} \quad (8)

where r_3 is a random number in the range [0, 1], and P_{DP} is the predator presence probability, taken as 0.1 in all three scenarios.
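Eqs. (6)-(8) share one template: glide toward a chosen target unless a predator appears. A sketch of that shared update follows; the gliding distance, gliding constant, and search-space bounds are placeholder values:

```python
import numpy as np

rng = np.random.default_rng(2)
Pdp = 0.1                                  # predator presence probability

def glide_update(current, target, dg=0.8, Gc=1.9, lo=-5.12, hi=5.12):
    """One SSA location update; Eqs. (6)-(8) differ only in the target chosen."""
    if rng.random() >= Pdp:                # no predator: glide toward the target
        return current + dg * Gc * (target - current)
    return rng.uniform(lo, hi, size=np.shape(current))   # predator: random relocation

acorn = np.array([1.0, -2.0, 0.5])         # squirrel on an acorn nut tree
hickory = np.array([0.2, 0.1, -0.3])       # squirrel on the hickory nut tree (best)
new_loc = glide_update(acorn, hickory)     # scenario 1: acorn tree toward hickory tree
print(new_loc)
```

Scenarios 2 and 3 reuse the same call with a normal-tree squirrel as `current` and the acorn- or hickory-tree squirrel as `target`.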
In the SSA, seasonal changes significantly affect the foraging activity of the squirrels, since they suffer heat loss at very low temperatures [12]. A seasonal constant is therefore introduced to capture this, computed as follows:
S_c^{T} = \sqrt{\sum_{K=1}^{D} \left(S_{AT,K}^{T} - S_{HT,K}^{T}\right)^2} \quad (9)

where T = 1, 2, 3.
The relocation of the flying squirrels is modeled by the following equation:

$$
S_{NT}^{\text{new}} = S_L + \text{Levy}(n) \times (S_U - S_L)
\tag{10}
$$

where $S_L$ and $S_U$ are the lower and upper bounds of the search space and $\text{Levy}(n)$ is a Levy-flight step.
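Equations (9) and (10) can be sketched as follows; the Levy step is assumed to be supplied by the caller (in the SSA literature it is usually drawn via Mantegna's algorithm), and the function names are hypothetical:

```python
import math

def seasonal_constant(s_at, s_ht):
    """Eq. (9): Euclidean distance between an acorn-tree squirrel and
    the hickory-tree squirrel over all D dimensions."""
    return math.sqrt(sum((a - h) ** 2 for a, h in zip(s_at, s_ht)))

def relocate(s_l, s_u, levy_step):
    """Eq. (10): relocation between the lower bound s_l and the upper
    bound s_u, scaled by a Levy-flight step."""
    return [lo + levy_step * (hi - lo) for lo, hi in zip(s_l, s_u)]
```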
Semantic Segmentation of Brain MRI Images Using Squirrel… 553

Function tolerance is a commonly used convergence criterion, in which a small permissible threshold is defined between the last two successive results. In addition, the maximum iteration count is checked. Based on this algorithm, the optimal DCNN hyperparameters are selected.
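The stopping rule just described can be sketched as follows (the threshold and iteration defaults here are illustrative, not the paper's exact settings):

```python
def should_stop(prev_best, curr_best, iteration, tol=1e-6, max_iter=100):
    """Stop when the change between the last two best results falls
    below the function tolerance, or when the iteration budget is spent."""
    return abs(curr_best - prev_best) < tol or iteration >= max_iter
```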

3 Results and Discussion

The results obtained by applying the proposed SSA-DCNN architecture to brain tumor MRI images are presented in this section. The proposed method is implemented on a laptop with an Intel Core i5-2450M CPU @ 2.50 GHz and 6 GB RAM, in MATLAB R2016b. To validate the performance of the proposed method, databases are collected from [25, 26], consisting of 253 MRI images of the brain. The SSA-DCNN is tested for its ability to perform semantic segmentation for brain tumor detection with the following parameter settings: the boundaries are set from −5.12 to 5.12, the population size is 50, there are 5 decision variables, and training is run for 100 iterations. A set of sample images from the brain MRI database is shown in Fig. 3.
The tumor is located at a different location in each MRI image in Fig. 3a–h. The proposed system is applied to this set of images, and the results obtained after semantic segmentation are shown in Fig. 4. Here, the purple-shaded region in each image corresponds to the tumor region, and the yellow region corresponds to the normal portion of the brain. Our proposed system efficiently differentiates the two regions, and the demarcation between them is clearly visible in Fig. 4a–h. Further, it is observed that the tumors detected by the proposed system closely match the ground-truth images. The classification accuracy obtained by the proposed system is 98.455%, and its loss is 0.12 for the brain images, as shown in Fig. 5.
Comparison Analysis
The proposed method is validated through comparative analysis, in which it is compared with conventional methods such as OSOA-3DCNN, JA-DCNN, and PSO-DCNN. The comparison of accuracy is illustrated in Fig. 6, of the IoU score in Fig. 7, and of the BF score in Fig. 8. From Figs. 6, 7 and 8, it is clearly visible that the proposed method shows better results than OSOA-3DCNN, JA-DCNN, and PSO-DCNN.
So far, the results and performance comparisons of the proposed SSA-DCNN have been discussed; the next section concludes the paper.

Fig. 3 Sample images from brain MRI database

4 Conclusion

In this paper, SSA-DCNN has been developed for the semantic segmentation of medical images. Initially, brain tumor images were collected from an open-source repository. The proposed semantic segmentation process is a combination of DCNN and SSA: in the DCNN, the hyperparameters are selected with the help of the SSA algorithm to enhance the segmentation accuracy. The proposed method has been implemented and validated with performance metrics such as accuracy, loss, IoU, and BF score, and compared with conventional methods such as JA-DCNN, OSOA-3DCNN, and PSO-DCNN. From the results, the proposed methodology achieved the best results in terms of accuracy, loss, IoU, and BF score. In the future, efficient methods will be developed to achieve the best segmentation outcomes on different medical images.

Fig. 4 Semantic segmentation results of brain MRI images



Fig. 5 Analysis of brain tumor a accuracy and b loss


Fig. 6 Comparison analysis of the accuracy


Fig. 7 Comparison analysis of the IoU




Fig. 8 Comparison analysis of the BF score

References

1. Zhang D, Huang G, Zhang Q, Han J, Han J, Yu Y (2021) Cross-modality deep feature learning
for brain tumor segmentation. Pattern Recogn 110:107562
2. Naser MA, Deen MJ (2020) Brain tumor segmentation and grading of lower-grade glioma using
deep learning in MRI images. Comput Biol Med 121:103758
3. Khan H, Shah PM, Shah MA, ul Islam S, Rodrigues JJ (2020) Cascading handcrafted features
and convolutional neural network for IoT-enabled brain tumor segmentation. Comput Commun
153:196–207
4. Aboelenein NM, Songhao P, Koubaa A, Noor A, Afifi A (2020) HTTU-Net: hybrid two track
U-net for automatic brain tumor segmentation. IEEE Access 8:101406–101415
5. Yogananda CGB, Shah BR, Vejdani-Jahromi M, Nalawade SS, Murugesan GK, Yu FF, Pinho MC
et al (2020) A fully automated deep learning network for brain tumor segmentation. Tomography
6(2):186–193
6. Zhang W, Yang G, Huang H, Yang W, Xu X, Liu Y, Lai X (2021) ME-net: multi-encoder net
framework for brain tumor segmentation. Int J Imaging Syst Technol
Top Five Machine Learning Libraries in
Python: A Comparative Analysis

Mothe Rajesh and M. Sheshikala

Abstract Nowadays, machine learning (ML) is used in all sorts of fields like health care, retail, travel, finance, social media, etc. An ML system learns from input data to construct a suitable model by continuously estimating, optimizing, and tuning the parameters of the model. To this end, Python is one of the most flexible programming languages, and it contains special libraries for ML applications, namely SciKit-Learn, TensorFlow, PyTorch, Keras, Theano, etc., which are great for linear algebra and for getting to know kernel methods of machine learning. Python is a great language to use when working with ML algorithms and has relatively easy syntax. When taking a deep dive into ML, choosing a framework can be daunting. The most common concern is to understand which of these libraries has the most momentum in ML system modeling and development. The major objective of this paper is to provide extensive knowledge of various Python libraries and to compare different ML algorithms in order to meet multiple application requirements. This paper also reviews various ML algorithms and application domains.

Keywords Machine Learning · Libraries · Python

1 Introduction

Machine learning (ML) is the most popular technology in today's world. The ML domain is very broad and is developing quickly, being constantly divided into various sub-specialties and types [1]. ML is the field of study that enables computers to learn without being explicitly programmed, and it tackles problems that cannot be solved by purely mathematical means. Among the various kinds of ML tasks, a pivotal distinction is drawn between supervised and unsupervised learning. In supervised machine learning, the program is "trained" on a predefined set of "training examples", which then supports its ability to reach an
M. Rajesh (B) · M. Sheshikala


SR University, Warangal, Telangana, India
e-mail: mraj1210@gmail.com

559
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_55
560 M. Rajesh and M. Sheshikala

accurate conclusion when given new data. In unsupervised machine learning, the program is given a large amount of data and must discover patterns and relationships within it.
These days, the strength of an organization is measured by the amount of data it has. Organizations analyze this data and extract valuable information. For example, e-marts keep suggesting products based on your buying patterns, and Facebook and Twitter constantly recommend friends and posts in which you might be interested. Data in raw form is like crude oil: we need to refine crude oil to make petrol and diesel. Similarly, machine learning is helpful in processing data to obtain valuable insights.
Machine learning algorithms are being applied in many places in interesting ways. They are becoming increasingly ubiquitous, with an ever-growing number of applications in areas one might not expect, such as the medical field, academics, and data center optimization. In the medical field, machine learning plays a fundamental role and is increasingly applied to clinical image segmentation, image registration, multi-modal image fusion, computer-aided diagnosis, image-guided interventions, image annotation, and image database retrieval, where failure could be fatal [2]. In academics, teachers need to prepare teaching materials, manually grade students' homework, and give feedback to students on their learning progress. Students, on the other hand, often face an extremely difficult "one-size-fits-all" training process that is not customized to their capabilities, needs, and learning context [3].
With recent advances, ML provides new opportunities to tackle challenges in education systems by gathering and analyzing the data students produce when they interact with a learning system. Those large numbers of racks of humming servers use tremendous amounts of energy; together, all existing data centers use roughly 2% of the world's electricity, and if left unchecked, this energy demand could grow as quickly as Internet use. So, making data centers run as efficiently as possible is a very big deal. ML is well suited to the data center environment given the complexity of plant operations and the abundance of existing monitoring data, and one of the tasks handled by machine learning is data center plant configuration optimization [4].

2 Overview of Machine Learning Libraries

Python is becoming famous day by day and has started to replace many well-known languages in the industry. The simplicity of Python has attracted many developers to build libraries for machine learning and data science; because of this wealth of libraries, Python is almost as popular as R for data science. It is an appealing choice for algorithmic development and exploratory data analysis [5, 6]. An overview of the top machine learning libraries in Python is shown in Fig. 1.
Top Five Machine Learning Libraries … 561

Fig. 1 Overview of top machine learning libraries in Python

2.1 Scikit-Learn

Scikit-learn [7] is a famous ML library in Python. It integrates a wide range of state-of-the-art ML algorithms for medium-scale supervised and unsupervised problems. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, enabling its use in both academic and commercial settings. Scikit-learn comes with support for various tasks, for example, classification, regression, clustering, dimensionality reduction, model selection, and pre-processing.

Scikit-learn has several benefits: it is a go-to package that comprises a multitude of methods for executing the standard ML algorithms; it has a simple and consistent interface that helps fit and transform a model over any dataset; it is well suited for creating pipelines that help build a model quickly; and it offers a solid collection of machine learning models. On the other hand, it has a few drawbacks as well: it is not suited to feeding categorical data directly to algorithms, and it is heavily dependent on the SciPy stack.
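As a brief illustration of the pipeline style mentioned above (this example is ours, not from the paper, and assumes scikit-learn is installed):

```python
# Minimal scikit-learn sketch: a Pipeline chaining pre-processing and a
# classifier, fitted and scored on a built-in toy dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# make_pipeline builds the "fit and transform over any dataset" workflow
# the section describes: scaling is learned on the training split only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
model.fit(X_tr, y_tr)
accuracy = model.score(X_te, y_te)
```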

2.2 TensorFlow

TensorFlow [8] (TF) is created by Google Brain and a team at Google. It has decent documentation and a lot of functionality beyond the basics, and it is possible to make code entirely customizable. Since it is written as a low-level library, it is a bit harder to master. TensorBoard is a visualization tool that comes with every standard installation of TF. It allows users to monitor their models, parameters, losses, and much more. Some of the main areas where TensorFlow shines are handling deep neural networks, NLP, abstraction capabilities, image, text, and speech recognition, and effortless collaboration on ideas and code. The core purpose of TensorFlow is to build deep learning models. Figure 2 illustrates the data flow processing mechanism in TensorFlow.
Fig. 2 Data flow processing in TensorFlow

TensorFlow has many benefits. For example, it helps us implement reinforcement learning; we can directly visualize machine learning models using TensorBoard, a tool in the TensorFlow library; and we can deploy models built using TensorFlow on CPUs as well as GPUs. On the other hand, it has a few weaknesses as well: it can run significantly slower on the same CPUs/GPUs than other frameworks, and its computational graphs are slow when executed.

2.3 Keras

Keras [9] is built on top of TensorFlow and works at a higher level than programming in TF directly. The cost of that is harder customization of code; notably, customization and tweaking of code are much easier when coding at a low level. Keras includes several of the building blocks and tools essential for creating a neural network, for example, neural layers, activation and cost functions, objectives, batch normalization, dropout, and pooling.
Keras has several benefits: it is excellent for research work and product prototyping; the Keras framework is portable; it allows a simple representation of neural networks; and it is highly productive for visualization and modeling. On the other hand, it has a few limitations, such as being slow because it requires a computational graph before carrying out an operation.

2.4 PyTorch

PyTorch [10] (PT) is created as well as used by Facebook. It was developed later than TensorFlow, but its community is growing quickly. PyTorch runs its code in a more procedural style, while in TensorFlow one first needs to define the entire model and then run it inside a session. For this reason, it is much easier to debug code in PyTorch. It has more "pythonic" code; it is easier to learn and easier to use for quick prototyping. PyTorch and Keras also have good documentation. Some of the crucial features that set PyTorch apart from TensorFlow are tensor computing with the ability for accelerated processing through GPUs; ease of learning, use, and integration with the rest of the Python ecosystem; and support for neural networks built on a tape-based autodiff system. The various modules PyTorch comes with that help create and train neural networks are tensors (torch.Tensor), optimizers (the torch.optim module), neural networks (the nn module), and Autograd.
PyTorch has several benefits: its framework is famous for its speed of execution, it is capable of handling large graphs, and it integrates well with other Python objects and libraries. On the other hand, it has a few weaknesses: the community for PyTorch is not as extensive, and it lags in providing content for questions. In comparison with other Python frameworks, PyTorch has fewer features in terms of providing visualizations and application debugging.

2.5 Theano

The Theano [11] library provides a Python interface to an optimizing compiler. After optimization and compilation, the functions become available as regular Python functions, but with high performance. Vector, matrix, and tensor operations are supported and efficiently parallelized on the available hardware. Some of the features that make Theano a robust library for scientific computation are its support for GPUs, which handle heavy computations better than CPUs, its tight integration with NumPy, faster and stable evaluations of even the trickiest of expressions, and the ability to generate custom C code for your mathematical operations. Theano has several benefits: it supports GPUs, which help applications perform complex computations efficiently; it is straightforward to use because of its integration with NumPy; and there is a huge community of developers using Theano. On the other hand, it has several drawbacks: it is slower on the back end, there are various issues with Theano's low-level API, and it produces a lot of back-end errors. Also, the Theano library has a steep learning curve.

The overall comparison of the machine learning libraries is done in the next section; here, in Table 1, the basic information about the libraries is listed: who developed them, the year they launched, the languages they are written in, and their well-known applications.

3 Comparative Analysis of Machine Learning Libraries

Table 2 gives information about the libraries taken from GitHub, a popular online hosting service for version control [12]. The information is based on parameters like the number of stars, forks, contributors, and activity on the library repository [13]. Looking at the table, all the libraries have a good number of stars, which means all the libraries are performing well and users are giving good feedback.

Table 1 Machine learning libraries basic information

Library      | Developed by                    | Launched year | Written in             | Well-known applications
Scikit-Learn | David Cournapeau                | 2007          | Python, C, C++, Cython | Spotify, Inria
TensorFlow   | Google Brain and team of Google | 2015          | Python, CUDA, and C++  | Lib
PyTorch      | Facebook's AI Research Lab      | 2016          | Python, CUDA, and C++  | Apple and Samsung Electronics
Keras        | François Chollet                | 2015          | Python                 | Uber and Netflix
Theano       | MILA, University of Montreal    | 2007          | Python, CUDA           | Zetaops and Vuclip

Table 2 GitHub comparative information about machine learning libraries

Library      | Stars   | Forked | Contributors | Activity
Scikit-Learn | 33,337  | 16,358 | 1253         | 38 (94)
TensorFlow   | 120,548 | 72,009 | 1835         | 193 (1889)
PyTorch      | 24,782  | 5877   | 933          | 151 (912)
Keras        | 38,197  | 14,585 | 774          | 21 (52)
Theano       | 2033    | 475    | 46           | 3 (17)

Among the listed libraries, TensorFlow has the most contributors, which implies that it is more popular than all the other machine learning libraries [14–17]. Theano is also becoming famous day by day in machine learning applications.

4 Conclusions

There are many more libraries in the ML world, but these are the most popular and widely used. ML is a huge field and the most promising technology right now. No matter the programming language or the area a developer is working in, learning to work with libraries is important. Doing so helps in simplifying things and cutting tedious effort.

References

1. https://www.toptal.com/machine-learning/machine-learningtheory-an-introductory-prime
2. There A, Jeon M, Sethi IK, Xu B (2017) Machine learning theory and applications for
healthcare. J Healthc Eng 2017. Article ID 5263570

3. Dilhara M, Ketkar A, Dig D (2021) Understanding software-2.0: a study of machine learning library usage and evolution. ACM Trans Softw Eng Methodol (TOSEM) 30(4):1–42
4. Gao J (2014) Machine learning applications for data center optimization
5. Dubois PF (ed) (2007) Python: batteries included, volume 9 of computing in science &
engineering. IEEE/AIP
6. Milmann KJ, Avaizis M (eds) (2011) Scientific Python, volume 11 of computing in science &
engineering. IEEE/AIP
7. Pedregosa F et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–
2830
8. Abadi M et al (2016) TensorFlow: a system for large-scale machine learning. In: Proceedings
of the 12th USENIX symposium on operating systems design and implementation (OSDI’16)
9. Chollet F (2019) Keras. https://github.com/fchollet/keras
10. PyTorch (2019) https://pytorch.org/
11. Theano Development Team (2016) Theano: a Python framework for fast computation of
mathematical expressions. eprint: arXiv:1605.02688
12. GitHub (2019) https://github.com/
13. Sheshikala M, Kothandaraman D, Vijaya Prakash R, Roopa G (2019) Natural language
processing and machine learning classifier used for detecting the author of the sentence. Int J
Recent Technol Eng 8(3):936–939
14. Ravi Kumar R, Babu Reddy M, Praveen P (2019) An evaluation of feature selection algorithms
in machine learning. Int J Sci Technol Res 8(12):2071–2074
15. Kumar RR, Reddy MB, Praveen P (2019) Text classification performance analysis on machine
learning. Int J Adv Sci Technol 28(20):691–697
16. Kollem S, Reddy KRL, Rao DS (2019) A review of image denoising and segmentation methods
based on medical images. Int J Mach Learn Comput 9(3):288–295
17. Tizpaz-Niari S, Černý P, Trivedi A (2020) Detecting and understanding real-world differential
performance bugs in machine learning libraries. In: Proceedings of the 29th ACM SIGSOFT
international symposium on software testing and analysis. Association for Computing
Machinery, New York, NY, pp 189–199. https://doi.org/10.1145/3395363.3404540
A Novel Technique of Threshold
Distance-Based Vehicle Tracking System
for Woman Safety

B. V. D. S. Sekhar, V. V. S. S. S. Chakravarthy, S. Venkataramana,


Bh. V. S. R. K. Raju, N. Udayakumar, and S. Krishna Rao

Abstract In the current days, developing efficient tracking systems is a popular and innovative line of work, and it becomes an easy task with a smart phone application. The main objective of a vehicle tracking system is to provide the vehicle's location on maps using tools like the Global Positioning System (GPS) and the Global System for Mobile Communications (GSM) (Hlaing et al. in Int J Trend Sci Res Dev 3, 2019; Ramadan et al. in Int J Mach Learn Comput 2, 2012; Shafee et al. in Int J Adv Comput Sci Appl 4, 2013), which operate using base stations and satellites over the Internet in both web and Android applications. Markers on the map provide complete information about the vehicle, including time, address, and vehicle ID. Server-side scripting is done in PHP, which is used to insert and retrieve the location information of vehicles. An Android smart phone application is developed so that a user can track a vehicle from this application on his smart phone. The application continuously monitors the position and updates the database every five minutes. Registered vehicles are issued a track ID, which is confined to only one smart phone; this is achieved by MAC-based authentication. The user has to specify the track ID, which is basically the transport vehicle number, and the MAC address of his smart phone during the registration process. The paper keeps its emphasis on the tracking system as the core concept and explains various application areas in which the system can be implemented. This work provides real-time results through experimentation and implementation.

Keywords Global Positioning System (GPS) · Global System for Mobile Communications (GSM) · Threshold distance · Tracking · Authentication · Woman safety

B. V. D. S. Sekhar · S. Venkataramana · Bh. V. S. R. K. Raju · N. Udayakumar


S R K R Engineering College, Bhimavaram, Andhra Pradesh, India
e-mail: bvdssekhar@srkrec.ac.in
V. V. S. S. S. Chakravarthy (B)
Raghu Institute of Technology, Dakamari, Visakhapatnam, Andhra Pradesh, India
e-mail: sameervedula@ieee.org
S. K. Rao
Sir C R R Engineering College, Eluru, Andhra Pradesh, India

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2023 567
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3_56
568 B. V. D. S. Sekhar et al.

1 Introduction

Tracking systems for vehicles were initially implemented for the shipping industry, where the system provides the current position and whereabouts of a vehicle at any given instant of time. With the needs of people and innovations in technology, such tracking systems are now implemented in real time. Moving further in this work, we have successfully implemented a real-time tracking system with which people can know the exact location of an intended vehicle on their handheld Android mobile phones or smart phone applications.

In this work, we used technologies like GPS and GSM to provide real-time location and time information anywhere on the globe. PHP, a server-side scripting language, is also used to send values from an Android device to the MySQL database, to retrieve values from the database, and to perform analytics on the GPS data available in the database.

1.1 Global Positioning Systems (GPS)

A GPS network involves GPS transmitters, GPS receivers, and satellites sending radio signals at a particular frequency which are detected by the GPS receiver. The messages transmitted from the satellites are time-coded at this particular frequency, with exact time as well, because the satellites use atomic clocks.

The GPS receiver detects the satellites it can hear and then collects their messages. As stated above, the messages contain the time, the current position of the satellite, and bits of other related information. These streams of messages are delayed so as to conserve power and make the messages easy to read, since all satellites transmit on the same frequency. Hence, locating a position on a normal GPS takes 30–60 s.

1.2 Assisted GPS (AGPS)

As a normal GPS is delayed in providing the exact location, Assisted GPS (AGPS) was introduced. In AGPS, mobiles with GPS collect the satellite information or messages from local cellular network towers [1–4]. There is a short switching time from cellular to GPS which is not noticeable. The unprocessed data is sent to the mobile company, which in turn processes the data and sends it back to the mobile for the exact location.
A Novel Technique of Threshold Distance-Based Vehicle... 569

For wireless transmission of data [5–8], the short message service (SMS) is commonly implemented over the GSM network. In this communication model, a GSM modem provides the track ID which enables users to track the vehicle. People who are travelling in anonymous vehicles use our application by entering the vehicle number as the track ID and a mobile number to which the track ID has to be sent, so that a parent or friend can use this track ID to track the person. Vehicles within a specified range can also be identified using our smart phone application, in which the results are viewed as markers on a map that give complete information.

2 Existing System

The existing vehicle tracking systems cover the aspect of tracking a vehicle by using GPS satellites and an Android smart phone application. They do not deal with application areas such as implementing the tracking system for the aspect of women's safety. The existing systems also cannot identify the vehicles that are located within a threshold distance of our vehicle or device [9–11].

3 Proposed System

3.1 Architecture

The architecture of the proposed system can be described as follows (Fig. 1).

i. Register and get the track ID.
ii. Install the tracking application on the monitoring Android device.
iii. The driver enters the bus ID, or the woman enters the track ID in the application before entering the anonymous vehicle.
iv. Press the track button in the application after entering the track ID.
v. To track where the anonymous vehicle is travelling, send the track ID to parents via a personal message.
vi. Trace the route the vehicle is travelling by entering the track ID on our website after the login process.
vii. Any user can know the current location of a bus by entering the bus vehicle number/service number.

A dedicated code snippet to get the location from GPS/mobile network as a


failover. In this module, server has to capture details sent by the application, and
these have to store in database. After storing the details, website has to retrieve these
values from db when user requests the website to track by entering track ID.
The tracking application mainly deals with tracking a vehicle, assuming the vehicle is associated with an Android device which runs our application. As soon as the vehicle starts, the driver has to tap the track button. The structure of the code can be explained in the following three specifications.

Fig. 1 Architecture for proposed system

i. A code snippet is used to send the geo-coordinate data, the associated address obtained by reverse geocoding, and the time to our web server. The address is obtained from the GPS coordinates. Further, the data is retrieved and inserted into the dedicated database. This data is used for the women's safety module and for threshold distance-based tracking.
ii. Whenever a user wants to track the vehicle, he enters the track ID in the application or on the website to get the position, which can be viewed on the Google Map associated with it.
iii. To confine a track ID to a specific device, MAC-based authentication is used, in which the user has to register the device with the track ID by entering the MAC address of the mobile.
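The MAC-based authentication of specification (iii) can be sketched as follows. The in-memory dict stands in for the server-side database the paper uses, and all names (including the sample vehicle number) are hypothetical:

```python
# Registered track IDs mapped to the device MAC captured at registration.
registrations = {}

def register(track_id, mac):
    """Bind a track ID (the vehicle number) to exactly one device MAC."""
    if track_id in registrations:
        return False  # track ID is already confined to another device
    registrations[track_id] = mac.lower()
    return True

def is_authorized(track_id, mac):
    """Only the device registered for this track ID may publish locations."""
    return registrations.get(track_id) == mac.lower()
```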

3.2 Bus Tracking Mechanism

The Android application retrieves the GPS location of the device every 5 min and sends the location details to the database along with the device ID associated with the device, which is basically a vehicle number/service number. Alternatively, the bus driver starts the trip by entering the service number/vehicle number and starting the application, so that the Android application starts tracking bus locations every 5 min and sending them to our database. We provide a website to register the bus by entering an ID and other details. Users can track the bus by entering the ID via our application, or they can use a normal browser where the markers can be seen on a Google Map. The map can be set to satellite view for more clarity. By tapping on a marker, users can get information such as the bus ID, which is basically the vehicle number issued by the Road Transport Authority, and the service number, from which the route the bus travels can be known; the time and the address where the bus is located are also shown. We also get the path along which the bus travels as soon as tracking starts (Fig. 2).

Fig. 2 Architecture for bus tracking module
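The periodic update in this module amounts to packaging the device's coordinates with its ID and timestamp and posting them to the server-side PHP script. A sketch of the payload construction (the 5-minute interval follows the paper; the field names and the sample vehicle number are our assumptions):

```python
import json
import time

UPDATE_INTERVAL_S = 5 * 60  # the paper's 5-minute reporting period

def build_location_update(vehicle_id, lat, lng, address, ts=None):
    """Build the JSON body sent to the server-side script, which inserts
    it into the vehicle-locations table."""
    return json.dumps({
        "track_id": vehicle_id,  # vehicle no. / service no.
        "lat": lat,
        "lng": lng,
        "address": address,
        "timestamp": ts if ts is not None else int(time.time()),
    })

body = build_location_update("AP37Z5678", 16.544, 81.521, "Bhimavaram",
                             ts=1700000000)
```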

3.3 Protect Her

Dealing with the aspect of women's safety, this application can protect her. Women who are travelling in anonymous vehicles can be assured of safety by our application. The woman should enter the vehicle number before entering the anonymous vehicle and start the application. She also has to enter the mobile number of a person, typically her father, who is to receive the track ID in order to track the vehicle. As soon as she enters the track ID and mobile number, she has to tap the track button. The application sends this vehicle number, which is the track ID, as an SMS to the entered mobile number, so that by entering the vehicle number on our website or in the Android application, her family can track the vehicle if there is any problem (Fig. 3).

Fig. 3 Architecture for woman safety module

3.4 Trace Route for Vehicles in Threshold Distance

Consider a scenario in which a person is stranded on a highway and wants to know which buses are available within a specified threshold distance. Through our Android application or website, he can list the vehicles within that distance. By taking a bus's service number and checking its route, he can retrieve the source and destination of the bus. This is done by taking his own location and running data analytics over the location data in our database, selecting only the fixes that lie within 5 km of his position. In this way he or she can identify a vehicle that is approaching and, from its service number, determine the path that the bus travels. We also provide the possible path from the user's location to each vehicle available within the threshold distance, which can be viewed on the Google Map provided in both the Android application and the website (Fig. 4).
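The 5 km filter described above can be sketched with the haversine great-circle distance. The record layout (service number, latitude, longitude) and function names are assumptions for illustration.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two GPS coordinates, in kilometres."""
    dlat, dlng = radians(lat2 - lat1), radians(lng2 - lng1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def vehicles_within_threshold(user_lat, user_lng, vehicles, threshold_km=5.0):
    """Keep only the (service_no, lat, lng) records that lie within the
    threshold distance of the user's own location."""
    return [v for v in vehicles
            if haversine_km(user_lat, user_lng, v[1], v[2]) <= threshold_km]
```

In practice the same cutoff would be applied as a query over the locations table rather than over an in-memory list, but the distance computation is identical.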

Fig. 4 Architecture for trace route module

4 Experimental Results

The experimental results were obtained in practice and are presented for the following scenarios:
1. Testing the smartphone application.
2. Testing the web server.
3. Testing the results in both the Android application and the website.
4. Testing all the modules.
The testing objectives and modules, with their related data and status, are given in Table 1.
Real-time tracking screenshots of the proposed tracking system and the mobile interface are shown in Figs. 5, 6, 7 and 8.
Several extensions are possible. Adding a fuel-level sensor to this system would allow prediction of the distance the vehicle can still travel and an estimate of whether it can reach the nearest refilling centre. Based on the data available in the database, the speed of the vehicle and the distance it has covered can be calculated. Data analytics on the location log can also predict a person's likely whereabouts, i.e. the location in which the user is most often present. An anti-theft module can be implemented using a microcontroller with a GSM module. The tracking system can further be applied to delivery instances to track items, and to finding the details of the nearest ambulance or school buses within the threshold-distance

Table 1 Testing of different proposed system modules and their outcomes


Test 2.1: Toggle GPS ON. Objective: enabling GPS. Steps: start the GPS service. Test data: GPS_system. Expected behaviour: GPS not enabled, go to settings. Actual: as expected. Status: Yes.
Test 2.2: Enabling high accuracy (A-GPS). Objective: enable the Google location service. Steps: tap the tracker button. Test data: –. Expected behaviour: Lat: 0, Lng: 0 and a message to turn ON the Google location service. Actual: as expected. Status: Yes.
Test 2.3: Message to the entered mobile number. Objective: message to the parent. Steps: send the message. Test data: mobile number entered in the edit text view. Expected behaviour: notification that the message was sent. Actual: as expected. Status: Yes.
Test 2.4: Sending data to the server. Objective: Lat, Lng, time, address and ID to the server. Steps: enter the track ID and press track. Test data: latitude, longitude, time and address. Expected behaviour: data entered successfully in the database. Actual: as expected. Status: Yes.
Test 2.5: Displaying markers on the map with a specific ID. Objective: track ID and vehicle ID. Steps: enter the ID and press the Get button. Test data: track ID. Expected behaviour: map with markers. Actual: as expected. Status: Yes.
Test 2.6: Displaying markers with information on the map. Objective: the track ID should be entered. Steps: enter the track ID and press the track bus button on the website. Test data: track ID. Expected behaviour: map with multiple markers and the route from the origin to all destinations. Actual: as expected. Status: Yes.

module. Using satellite images, the nearest water bodies can also be found, and an application can be developed that lists the buses arriving at a particular station within five minutes.
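For instance, under the assumption that the database stores time-ordered fixes as (ISO-8601 timestamp, latitude, longitude) records, the speed and distance mentioned above might be derived as in the sketch below; the record layout and function names are illustrative, as the paper does not specify them.

```python
from datetime import datetime
from math import radians, sin, cos, asin, sqrt

def _haversine_km(lat1, lng1, lat2, lng2):
    # Great-circle distance between two consecutive GPS fixes, in km.
    dlat, dlng = radians(lat2 - lat1), radians(lng2 - lng1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))

def speed_and_distance(fixes):
    """fixes: time-ordered (iso_timestamp, lat, lng) records for one vehicle.
    Returns (distance travelled in km, average speed in km/h)."""
    total_km = sum(_haversine_km(la1, lo1, la2, lo2)
                   for (_, la1, lo1), (_, la2, lo2) in zip(fixes, fixes[1:]))
    hours = (datetime.fromisoformat(fixes[-1][0])
             - datetime.fromisoformat(fixes[0][0])).total_seconds() / 3600.0
    return total_km, (total_km / hours if hours > 0 else 0.0)
```

Summing pairwise segment distances approximates the path actually driven, so the estimate improves as fixes are recorded more frequently.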

Fig. 5 Real-time tracking on 3D Google map

Fig. 6 Real-time tracking on 2D map

5 Conclusions

A tracking system gives companies the advantage of locating their vehicles and retrieving each vehicle's exact position. It can be utilized by organizations that deal with product delivery, and the recently evolved online cab services can use it as well. Even a group of people can keep a log of their location details, so that each member knows how far apart the others are. It is also useful in investigation-related activities carried out by the police or military departments. In future, we may even expect a default option in smartphones that enables a location log under a unique ID or the registered mobile number itself. The result is a centralized system capable of giving complete information on the location of a mobile device, which can be useful

Fig. 7 Real-time tracking on satellite view of Google map

Fig. 8 Mobile application interface



in robbery situations, or for finding the exact location of the person who is calling (constraint: the mobile number is the track ID). GLONASS and GALILEO, which are more accurate than GPS, could also be used, although their receivers are costly. The same system can likewise be built from a GPS + GPRS module and a microcontroller board.

Author Index

A D
Aakunoori Suryanandh, 239 Deba Prakash Satapathy, 273, 329, 339, 355
Abinash Sahoo, 273, 299, 319, 329, 339, Deepanshi Agarwal, 29
355 Deepti Barhate, 85, 115
Adi Narayana Reddy, K., 107 Deva Kumar, I., 455
Aiswarya Mishra, 329 Devi Sowmya, M., 455
Akash Naik, 299 Dharmesh Shah, 261
Aluri Lakshmi, 179 Disha Singh, 365
Amit Gupta, 207 Dutta Sai Eswari, 283
Amtul B. Ifra, 419
Anisha, P. R., 409
Anjali Singhal, 29 E
Ankit Yadav, 365 Ebin Deni Raj, 93
Ansuman Mahapatra, 483, 491 Esha Singh, 29
Anuradha, T., 447
Anuradha, Y., 475
G
Arkajyoti Ray, 273
Gaddam Samatha, 437
Ashoka Kumar Ratha, 349
Gajavalli, J., 501
Ashutosh Kumar Dubey, 85, 115
Ghousia Begum, 409
Gnana Manoharan, E., 547
Godavarthi Sri Sai Vikas, 447
B Gopal Krishna Sahoo, 339
Bala Sundar, T., 483 Gopal Rao Kulkarni, 437
Balendra Mouli Marrapu, 309
Bharathi Uppalapati, 39
Bhimala Raghava, 465 H
Bhoomika, S. S., 73 Harapriya Swain, 355
Hardeep Kaur, 535
Hari Shankar Chandran, 373
Hyma, J., 475
C
Chaitanya P. Agrawal, 157
Chakravarthy, V. V. S. S. S., 567 I
Chilupuri Supriya, 239 Indrasena Reddy, M., 135
Chirag Arora, 1, 383 Ippili Saikrishna Amacharyulu, 309
© The Editor(s) (if applicable) and The Author(s), under exclusive license 579
to Springer Nature Singapore Pte Ltd. 2023
V. Bhateja et al. (eds.), Intelligent System Design, Lecture Notes in Networks
and Systems 494, https://doi.org/10.1007/978-981-19-4863-3

Ippili Saikrishna Macharyulu, 273, 299, 319 Navya Thampan, 15
Nihar Ranjan Mohanta, 273, 319

J P
Jeet Santosh Nimbhorkar, 61 Padmaja Usharani, D., 227
Jeevesh, K., 61 Padma Vasavi, K., 197
Jeyalaksshmi, S., 501, 515 Poornima, K. M., 73
Juthika Mahanta, 171 Prabira Kumar Sethy, 349
Prameet Kumar Nanda, 319
Prathima, K., 135, 283
K Praveen, P., 239
Kakunuri Sandya, 187 Preeti, C. M., 391
Kalyani, G., 455 Priyanka, 523
Kavitha, D., 373 Priyashree Ekka, 319
Kirill Krinkin, 147
Kirti Walia, 523
Kishore, T. S., 125 R
Kishor Kumar Reddy, C., 409 Raghavaiah, B., 207
Konda Srikar Goud, 283 Raghavender Raju, L., 251
Kranthi, A., 217 Rajat Valecha, 29
Krishna Kishore, P., 283 Raju, Bh. V. S. R. K., 567
Krishna Rao, S., 567 Rakesh, B., 135
Kurapati Sreenivas Aravind, 61 Ramakrishna Murty, M., 475
Rambabu, D., 391
Rambabu Pemula, 227
L Ranjan Mishra, S., 475
Lakshmana Rao, K., 125 Ravi Mohan Sharma, 157
Lakshmi, L., 51 Ravinder Kaur, 535
Lakshmi Ramani, B., 465 Ravinder Reddy, R., 251
Latha, D., 179 Ravuri Naveen Kumar, 447
Lingala Thirupathi, 391, 401 Rekha, G., 401
Lolla Kiran Kumar, 1 Remya Raveendran, 93
Ritika Malik, 29
Rohith Kumar Jayana, 465
M
Madiha Sadaf, 419
Mallam Gurudeep, 437 S
Mandala Nischitha, 239 Sachin Sharma, 261
Mavoori Hitesh Kumar, 299, 319 Sagenela Vijaya Kumar, 227
Mitta Yogitha, 239 Sai Rashitha Sree, J., 455
Mohan Gopal Raje Urs, 147 Sai Vignesh, P. J., 515
Mothe Rajesh, 559 Sakshi Zanje, 261
Murali Nath, R. S., 51 Sandeep Ravikanti, 437
Sandeep Samantara, 339
Sandeep Samantaray, 273, 299, 319, 329,
N 355
Naga Kalyani, A., 51 Sanket S. Kulkarni, 483
Nagarampalli Manoj Kumar, 299 Santi Kumari Behera, 349
Naga Satish, G., 51 Satwik Kaza, 465
Nageswara Rao, A. V., 207 Sekhar, B. V. D. S., 567
Nalini Kanta Barpanda, 349 Senthil Arumugam Muthukumaraswamy,
Nasaka Ravi Praneeth, 447 15
Naveen Kumar Laskari, 107 Shanmuga Sundari, M., 217

Shashi Kant Dargar, 207 Sujatha, R., 373


Sheshikala, M., 559 Sunil Pathak, 85, 115
Shiva Prakash, S. P., 147 Suresh C. Satapathy, 329, 339, 355
Shruthi, S. K., 401 Surya Raj, 491
Shyam Chandra Prasad, G., 107
Smitha Chowdary, 429
Soma Sai Surya Teja Kamatam, 465 T
Sowjanya, B., 401 Tapasvi, B., 547
Sowmya Jujuroo, 401
Sravanthi, Ch., 429
Sreekanth, N., 107 U
Sreenivasa Rao, S., 1 Udaya Kumar, N., 547, 567
Sridevi, G., 227 Uma Maheswari, B., 373
Sri Lakshmi, T., 465 Unnati Khanapurkar, 391
Srinibash Sahoo, 299 Upendra Kumar, P., 125
Srinivasa Rao, P., 1, 39
Srinivasa Rao, S., 39
Srinivasa Reddy, K., 135 V
Srinivasulu, B., 391 Varsha Nemade, 85, 115
Subhadra Kompella, 187 Vasala Madhava Rao, 309
Subhankar Jana, 171 Venkata Krishna Reddy, M., 251
Sudha Rani, M., 217 Venkataramana, S., 567
Suja Palaniswamy, 61 Vikrant Bhateja, 365
